What is that search engine doing anyway?

"What is that search engine doing anyway?" Lee Giles asked the lunchtime crowd at the second conversation of the spring season. Giles, professor in Penn State's School of Information Science and Technology, highlighted the sophisticated technology and burgeoning societal impact of today's search engines.

Web expert Lee Giles explains search engines. Photo: Emily Wiley

"Search is actually an old idea," Giles told the group. "Card catalogues may be considered the first searchable index." In a field defined by rapid change, Giles cited search engines Alta Vista and Google as true innovators. AltaVista, Giles notes, was the first text-based indexing technology, and Google, the first to incorporate link analysis and ranking capabilities.

Speaking to a crowd that included local residents as well as colleagues, Giles explained how search engines manage to be both thorough and virtually instantaneous. "Search engines 'crawl' the Internet, intranets, and databases of digitized information," he explained. "An index is then built, providing a ranking of the information in an easy-to-use interface." This, Giles said, is where Google outpaces its competition.
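To make the crawl, index, and rank steps Giles described a little more concrete, here is a minimal Python sketch. It assumes a tiny hypothetical in-memory "web" (TOY_WEB and the crawl, build_index, and search functions are illustrative names, not anything from the talk) and ranks pages by simple word counts; real engines such as Google combine far richer signals.

```python
from collections import defaultdict, deque

# Hypothetical toy "web": page name -> (page text, outgoing links).
TOY_WEB = {
    "home": ("search engines crawl and index pages", ["about", "news"]),
    "about": ("card catalogues were an early searchable index", ["home"]),
    "news": ("search engines rank pages for every query", ["home", "about"]),
}

def crawl(start, web):
    """Breadth-first crawl from a start page, visiting each page once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web[page][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(pages, web):
    """Inverted index: word -> {page: number of times the word appears there}."""
    index = defaultdict(lambda: defaultdict(int))
    for page in pages:
        for word in web[page][0].lower().split():
            index[word][page] += 1
    return index

def search(query, index):
    """Rank pages by the total count of query words they contain (ties alphabetical)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for page, count in index.get(word, {}).items():
            scores[page] += count
    return sorted(scores, key=lambda p: (-scores[p], p))
```

For example, `search("search index", build_index(crawl("home", TOY_WEB), TOY_WEB))` ranks the home page first, since it is the only page containing both words.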

"Google is the epitome of the modern-day search engine, with eight billion plus pages and over 300 million queries every day," Giles noted, mentioning Yahoo, MSN, AOL, and AskJeeves as runners-up.

"AskJeeves has a loyal clientele, despite former problems with natural language recognition," Giles said. "For instance, if you asked it, 'What is the best way to deliver a baby?' the results included FedEx and UPS."

Photo: Emily Rowlands

Research Unplugged is held Wednesdays from noon to one at the Penn State Downtown Theatre.

Some clever search-engine users have figured out how to pull Google's strings. Giles explained that Google ranks a Web site in part by counting how many times a given phrase appears in the anchor text of links pointing to it from other sites. Users can therefore "Google bomb" a site by creating many links that contain a particular phrase, skewing the results for that phrase. In some cases, Google bombing is a form of political commentary. Giles pointed out that a Google search for the phrase "miserable failure" returns as its top result the biography of George W. Bush on the White House Web site.
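The effect Giles described can be illustrated with a deliberately simplified model that scores a page only by how many inbound links carry the query phrase in their anchor text. This is an assumption for illustration, not Google's actual ranking algorithm; the page names and the anchor_text_rank function are hypothetical.

```python
from collections import Counter

def anchor_text_rank(links, phrase):
    """Rank target pages by how many inbound links contain the phrase in their anchor text.

    links: list of (source_page, target_page, anchor_text) tuples.
    """
    phrase = phrase.lower()
    counts = Counter(
        target for _source, target, anchor in links
        if phrase in anchor.lower()
    )
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda t: (-counts[t], t))

# Hypothetical link graph before any coordinated linking.
links = [
    ("blog-a", "page-x", "a miserable failure of a product"),
    ("blog-b", "page-y", "news coverage"),
]

# A hypothetical "Google bomb": many unrelated sites link to page-y with the same phrase.
bomb = [(f"site-{i}", "page-y", "miserable failure") for i in range(5)]
```

Before the coordinated links, only page-x matches "miserable failure"; with the five extra links added, page-y outranks it even though nothing on page-y itself changed.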

Giles also pointed out that search engines are partly driven by advertising, whether paid, sponsored, or auctioned. He recommended "pay per click" ads as the most direct way to raise a particular site's placement in search results.

For Web designers and developers who want their sites to rank higher, Giles suggested some tactics for getting noticed. "Both on-page and off-page factors are important," he said. "You must build your pages for two entities: humans and computers. If you do not build a site for a computer, it will be ignored by the search engine." Giles explained that a site's content and coding, along with the anchor text in its links and its Web address, must be relevant and meaningful. It also helps to have other sites link to yours, he said.

Giles has experience in both realms: he has built Web pages as well as search engines, including CiteSeer, a "niche" engine for scientific literature, and SmealSearch, for business literature. His greatest concern is making search more relevant and specific to the needs of the user. For example, CiteSeer searches academic papers and returns not only matching articles but their citations as well. That capability makes research easier and allows scholars to build a web of knowledge on any given subject.
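A citation index of the kind Giles described can be sketched by inverting each paper's reference list, so that a search reports both what a matching paper cites and which papers cite it. The mini-corpus and the function names below are hypothetical, and this is a toy illustration of the idea rather than CiteSeer's actual implementation.

```python
from collections import defaultdict

# Hypothetical mini-corpus: paper id -> (title, ids of papers it cites).
PAPERS = {
    "p1": ("Indexing the web", []),
    "p2": ("Link analysis for ranking", ["p1"]),
    "p3": ("Citation indexing of web papers", ["p1", "p2"]),
}

def build_cited_by(papers):
    """Invert the reference lists: paper id -> ids of papers that cite it."""
    cited_by = defaultdict(list)
    for citing in sorted(papers):
        for cited in papers[citing][1]:
            cited_by[cited].append(citing)
    return cited_by

def search_papers(term, papers):
    """Return papers whose title contains the term, with their citation links."""
    cited_by = build_cited_by(papers)
    results = []
    for pid in sorted(papers):
        title, refs = papers[pid]
        if term.lower() in title.lower():
            results.append({
                "id": pid,
                "title": title,
                "cites": refs,                       # outgoing citations
                "cited_by": cited_by.get(pid, []),   # incoming citations
            })
    return results
```

Searching this toy corpus for "web" returns p1 and p3; for p1 it also reports that p2 and p3 cite it, which is the extra context a plain keyword match would miss.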

As the discussion wound down, someone asked Giles about the future. He didn't have to search his own thoughts long to reply. His vision? The merging of search technologies to create a single device that simply does everything.

Lee Giles, Ph.D., is the David Reese Professor of Information Sciences and Technology and associate director of the eBusiness Research Center at Penn State; giles@ist.psu.edu.

Last Updated March 02, 2005