Research

Digital library/search engine to help scholars

University Park, Pa. -- A new digital library and search engine created by Penn State researchers now holds more than 1 million journal articles and other scholarly works that can be easily accessed by anyone.

CiteSeerX (http://citeseerx.ist.psu.edu), based in Penn State's College of Information Sciences and Technology (IST), is designed to enhance the dissemination of scientific literature by making papers and other documents easier to locate online. The library provides resources such as algorithms, data, metadata, services, techniques and software that are transferable to other digital libraries -- supplying users with more than just an index of search results. The newest version, released in early 2009, added the capability to search tables.

The search engine was developed by C. Lee Giles, David Reese professor of information sciences and technology, and Isaac G. Councill, a Penn State Ph.D. recipient. It is based on open-source software, which means anyone can adapt it as needed to fit users' requirements.

"We won't keep it to ourselves," Giles said. "We'll give it to other people and they can build similar systems. Because it's modular, it can be changed to meet their needs."

The search engine also includes a feature called MyCiteSeerX, a customizable personal space where individual users can add tags, make corrections, create their own collections and monitor paper updates.

Other tools currently being developed include Our CiteSeerX, an environment where collaborating teams can work and share information within the library, and a feature that will allow users to receive alerts about new papers of personal interest.

Giles has published several papers on CiteSeerX; his most recent was "Graph-based Crawler Seed Selection," which was presented at the 18th International Conference on the World Wide Web. Councill based his Ph.D. dissertation, "Characterizing Scientific Contributions through Automatic Indexing and Citation Analysis," on the project.

CiteSeerX was funded by the National Science Foundation, Microsoft, NASA and the College of IST.

Last Updated April 6, 2010