Desperately Seeking

Alison Balmat
September 01, 2000

Kitchen appliances.

Type words in little white box. Click "Search." Wait. See results pop up. Click on first link. Skim webpage. Log off.

"That's how searching the World Wide Web seems to work for most people," explains Purvi Shah. "They log on, enter a query or two, scan a web page, and log off."

Shah is one of four undergraduate students working with Amanda Spink, an associate professor in Penn State's School of Information Sciences and Technology. Using data from Excite!, they are studying people's behavior on the web in order to develop more effective search engines.

Say you really wanted an electric can opener—the kind that is cordless, lightweight, and opens 20 cans on a single charge. Says Spink, "Can openers might not even pop up on that first page of search results." You'll only get convection ovens, waffle irons, and stainless steel refrigerators unless you search further, but most people give up after "kitchen appliances."

Very few queries incorporate the search engine's advanced features (options that allow for a more specific and accurate search), adds Michelle Sollenberger, another undergraduate working with Spink. For example, fewer than five percent of all queries use Boolean operators (words such as "AND" and "OR," which narrow or broaden a search), and mistakes such as failing to capitalize the operator are common. The "+" and "-" modifiers, which specify a term to include in or exclude from a search, are rarely seen, and most queries do not use quotation marks to create phrases.

"When you search," explains Sollenberger, "using the advanced features will break the query up into smaller pieces and the results will be more specific." But when users try to take advantage of these features—which is rare—they tend to use them incorrectly.

People use symbols such as ":" or "&" to separate terms, as you might in everyday writing, yet the Excite! search engine cannot recognize them. Stephanie Milchak, another of Spink's students, explains that finding patterns of mistakes like these is vital to improving the search engines. "Engine designers want to know exactly what users do," she says. "Our results could lead to a new generation of web-searching tools that work with people."
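To make the features and mistakes described above concrete, here is a minimal Python sketch that checks a single query for them. The rules are assumptions for illustration only and are not Excite!'s actual query syntax; the function name `describe_query` and the report keys are hypothetical.

```python
import re

# Assumption: illustrative rules only, not Excite!'s real query grammar.
BOOLEAN_OPERATORS = {"AND", "OR"}
UNSUPPORTED_SEPARATORS = {":", "&"}


def describe_query(query):
    """Report which advanced features a query uses and which look like mistakes."""
    # Text inside double quotes is treated as a phrase.
    phrases = re.findall(r'"([^"]+)"', query)
    # Drop quoted phrases so their words are not mistaken for operators.
    remainder = re.sub(r'"[^"]*"', " ", query)
    tokens = remainder.split()

    return {
        "phrases": phrases,
        # Correctly capitalized Boolean operators.
        "boolean_operators": [t for t in tokens if t in BOOLEAN_OPERATORS],
        # Words such as "and" or "Or" that may have been meant as operators.
        "miscapitalized_operators": [
            t for t in tokens
            if t.upper() in BOOLEAN_OPERATORS and t not in BOOLEAN_OPERATORS
        ],
        # "+term" marks a term to include, "-term" a term to exclude.
        "required_terms": [t[1:] for t in tokens if t.startswith("+") and len(t) > 1],
        "excluded_terms": [t[1:] for t in tokens if t.startswith("-") and len(t) > 1],
        # Symbols people type as separators that the engine would not recognize.
        "unsupported_separators": sorted(
            {ch for ch in remainder if ch in UNSUPPORTED_SEPARATORS}
        ),
    }
```

Run over a whole query log, a report like this could be tallied to estimate how often each feature is used and how often it is used incorrectly.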

But first, the data must be scrutinized. Excite! recently compiled 30 billion queries to analyze and "happily gave me a chunk of that data to play around with," laughs Spink.

Undergraduate Darcy Comstock, for instance, has a stack of papers several inches high, each page filled with 12-digit numbers (anonymous user-identification numbers) and a list of every word for which that user searched. She is looking at the number of queries each user enters, the number of words per query, and how the queries change (if the user adds or subtracts words) during the session.
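The three measures Comstock tracks could be computed along these lines. This is only a sketch under stated assumptions: the log is assumed to be a list of (user ID, query) pairs in the order they were entered, which is not necessarily how the Excite! data is laid out, and `summarize_sessions` is a hypothetical name.

```python
from collections import defaultdict


def summarize_sessions(log):
    """Summarize each user's session.

    log: iterable of (user_id, query) pairs in the order they were entered
    (an assumed layout, chosen for illustration).
    """
    queries_by_user = defaultdict(list)
    for user_id, query in log:
        queries_by_user[user_id].append(query.lower().split())

    summary = {}
    for user_id, queries in queries_by_user.items():
        # Compare each query with the one before it to see what changed.
        changes = []
        for previous, current in zip(queries, queries[1:]):
            changes.append({
                "added": sorted(set(current) - set(previous)),
                "removed": sorted(set(previous) - set(current)),
            })
        summary[user_id] = {
            "query_count": len(queries),
            "mean_words_per_query": sum(len(q) for q in queries) / len(queries),
            "changes": changes,
        }
    return summary
```

For a user who types "kitchen appliances" and then "kitchen appliances can opener", the summary would show two queries, an average of three words per query, and "can" and "opener" added in the second query.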

"The search engine actually records each individual letter that is entered and stores it all." Adds Comstock: "The web companies are processing more than 30 million queries per day; that's a whole lot of data to tabulate."

So far, Spink and her students have found that, on average, people type 2.5 words into the little white box. More than half of these words are proper names or slang terms for which the search engines often cannot find exact matches. A small number of words, about 75, are repeated frequently. "There's lots of 'ands,' 'ofs,' and 'thes,' but we also see 'sex,' 'free,' 'nude,' 'university,' and 'music' a lot," Shah says.
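Figures like the average query length and the list of most frequent words can be tallied with a few lines of Python. The sketch below assumes the queries are available as plain strings; the sample queries are invented for illustration and are not taken from the Excite! data.

```python
from collections import Counter


def query_statistics(queries, top_n=5):
    """Return the mean number of words per query and the most frequent words."""
    word_lists = [query.lower().split() for query in queries]
    mean_words = sum(len(words) for words in word_lists) / len(word_lists)
    # Count every word across all queries.
    counts = Counter(word for words in word_lists for word in words)
    return mean_words, counts.most_common(top_n)


# Invented sample queries, for illustration only.
sample = ["free music", "university music", "the music of free jazz"]
mean_words, top_words = query_statistics(sample, top_n=2)
print(mean_words)
print(top_words)
```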

Shah, Comstock, and Milchak are categorizing each word—as entertainment, sex, or travel, for example—and will then look within each category for patterns showing how users search for information.

Meanwhile, Sollenberger is tallying spelling errors—work that will eventually, Spink hopes, lead to the development of a dictionary that can automatically correct common mistakes.
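One way such a dictionary might work is sketched below. This is not the researchers' design: the table of known mistakes and the vocabulary are hypothetical examples, and the fallback uses fuzzy matching from Python's standard `difflib` module, a technique swapped in here purely for illustration.

```python
import difflib

# Hypothetical examples: a table of tallied mistakes and a list of valid words.
KNOWN_CORRECTIONS = {"refridgerator": "refrigerator"}
VOCABULARY = ["appliances", "opener", "refrigerator", "university"]


def correct_word(word, cutoff=0.8):
    """Return a corrected spelling, or the word unchanged if nothing fits."""
    lower = word.lower()
    # First, look the word up in the table of common mistakes.
    if lower in KNOWN_CORRECTIONS:
        return KNOWN_CORRECTIONS[lower]
    if lower in VOCABULARY:
        return lower
    # Otherwise fall back to the closest vocabulary word, if one is similar enough.
    matches = difflib.get_close_matches(lower, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else lower


def correct_query(query):
    """Correct each word of a query independently."""
    return " ".join(correct_word(word) for word in query.split())
```

A lookup table catches the mistakes that have already been tallied, while the fuzzy fallback handles misspellings nobody has recorded yet, at the risk of occasionally "correcting" a proper name or slang term that was typed as intended.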

"The number of queries posed on the web is huge, but searching isn't giving people the results that it potentially can," Spink says. Search engines can be tricky. Says Spink, "We want users to persevere and find the best answers to their problems." Even if that "problem" is just finding a fancy can opener.

Amanda Spink, Ph.D., is associate professor in the School of Information Sciences and Technology, 511 Rider Bldg., University Park, PA 16802; 814-865-4454. Her research is funded by the National Science Foundation, NEC, IBM, and Excite!. Darcy Comstock, Michelle Sollenberger, and Purvi Shah are information sciences and technology majors. Stephanie Milchak is a computer engineering major in the College of Engineering. All four are participating in the Women in Science and Engineering Research (WISER) program, for which first-year women students receive credit their first semester in the lab and payment the second semester. WISER is administered by the Pennsylvania Space Grant Consortium and funded by the College of Agricultural Sciences, the College of Earth and Mineral Sciences, the College of Engineering, the Eberly College of Science, EOPC, Lockheed Martin Services Group, NASA, and the National Science Foundation.

Writer Alison Balmat will graduate in May 2002 with a B.A. in French and geography, with honors in geography. Illustrator Livio Ramondelli is an undergraduate majoring in visual arts.

Last Updated September 01, 2000