Penn State IST researchers to enhance search engine

August 26, 2005

University Park, Pa. --- The National Science Foundation has awarded a $1.2-million grant to researchers in the Penn State School of Information Sciences and Technology (IST) and the University of Kansas to enhance the CiteSeer academic search engine, which receives more than 1 million hits a day and is heavily indexed by Google and Yahoo!.

Since its launch in 1997, CiteSeer has provided the public with access to more than 700,000 documents in computer and information sciences. The Next Generation CiteSeer will archive more documents, allow new types of searching, offer CiteSeer as a Web service, include personalized recommendations and searches, and permit synchronous live-object collaboration.

Lee Giles, the David Reese Professor of Information Sciences and Technology, is the principal investigator for the NSF Computing Research Infrastructure Collaborative Grant. Jack Carroll, the Edward M. Frymoyer Professor of Information Sciences and Technology; Jim Jansen, assistant professor of information sciences and technology; and Susan Gauch, University of Kansas, are co-investigators.

Funded for four years, the Next Generation CiteSeer project will expand CiteSeer's database and add and improve services. Among the new features will be a parsing service, which will extract acknowledgments and analyze document headers, and an enhanced indexing service for documents and their citations.

Besides the new services, the Next Generation CiteSeer architecture will be open source, making it easier to use and more reliable, Giles said. The new architecture also will be a collection of Web services, which will enable greater access to CiteSeer metadata.

CiteSeer was created at the NEC Research Institute (now NEC Labs) by Giles and others. IST now hosts the search engine and digital library.

Since CiteSeer's inception, the Web has grown in size, necessitating new crawler strategies. The growth of the computer and information sciences communities sparked interest in making CiteSeer a collaborative resource. In addition to online discussion forums, Next Generation CiteSeer will provide opportunities for joint authoring in an environment streamlined for efficiency and ease of participation. For improved access, mirror sites for CiteSeer will be located throughout the world, with mirrors already at MIT and the University of Zurich.

While CiteSeer currently focuses on computer and information sciences, Giles has developed a business version, SMEALSearch. The Next Generation CiteSeer project will enable the search engine to be easily adapted to other academic areas as well, Giles said.


Last Updated March 10, 2010