Mitra presents research on AI at international conferences

August 24, 2011

Prasenjit Mitra, associate professor of information sciences and technology in Penn State's College of Information Sciences and Technology (IST), and Cornelia Caragea, a postdoctoral researcher in the college, recently presented their research in artificial intelligence at two conferences in Spain.

Mitra and Caragea presented “Classifying Scientific Publications Using Abstract Features” at the Ninth Symposium on Abstraction, Reformulation and Approximation (SARA), held in Catalonia, and “Context Sensitive Topic Models for Author Influence in Document Networks” at the International Joint Conference on Artificial Intelligence (IJCAI), held in Barcelona.

The aim of SARA is to provide a forum for interaction among researchers in all areas of artificial intelligence and computer science with an interest in abstraction, reformulation or approximation.

“Classifying Scientific Publications Using Abstract Features” was co-written by Adrian Silvescu, a research scientist at Naviance Inc. in Oakland, Calif.; Saurabh Kataria, a doctoral student in the College of IST; and Doina Caragea, an assistant professor in the Department of Computing and Information Sciences at Kansas State University.

In the article, the authors state that recent technological advances, as well as the popularity of Web 2.0 applications, have resulted in large amounts of online text data, such as news articles, weblogs and scientific documents.

“Efficient and effective classification methods are required in order to deliver the appropriate information to specific users or groups,” the authors wrote.

Machine learning algorithms are used to train computer programs to classify text data accurately, Mitra and Caragea said. However, according to the authors, “the choice of appropriate features to encode such data is crucial for the performance and the complexity of the learning algorithms.”

According to the article, feature selection methods reduce the number of features by choosing a subset of the available features based on some chosen criteria. One application where this matters, Mitra said, is classifying e-mails as spam or non-spam. Existing methods typically rely on mutual information and topical words to select or group features.
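
As a concrete illustration of that idea, the following is a minimal sketch of mutual-information feature selection applied to the spam example, assuming scikit-learn is available; the tiny corpus and parameter choices are hypothetical and not drawn from the authors' work.

# Minimal sketch: keep only the word features that share the most mutual
# information with the spam/non-spam label (illustrative example only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

emails = [
    "win a free prize now",         # spam
    "claim your free money today",  # spam
    "meeting agenda for tomorrow",  # not spam
    "project report attached",      # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Encode each e-mail as a bag-of-words feature vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Select the three features with the highest mutual information scores.
selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_reduced = selector.fit_transform(X, labels)

# Print the names of the surviving (selected) features.
print(vectorizer.get_feature_names_out()[selector.get_support()])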

As an alternative to traditional methods, Mitra and his colleagues propose feature abstraction methods: data organization techniques that reduce a model's input size by grouping “similar” features into clusters, each identified by an abstract feature. The clusters form a hierarchy, and a cut through the hierarchy specifies a compressed model in which the nodes on the cut represent abstract features.

“Ideally, we would want very similar features to be grouped together and form an abstract feature,” Mitra said. “A good cut keeps semantically similar features together but groups differing features into different abstract feature groups.”
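
The following is a toy sketch of that abstraction idea, assuming SciPy's hierarchical clustering as a stand-in for the authors' method: word features are grouped into a hierarchy, and a distance threshold plays the role of the cut, mapping each word to an abstract feature. The word vectors are invented purely for illustration.

# Toy sketch: build an abstraction hierarchy over word features and take
# a cut through it (illustrative assumption, not the authors' exact method).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented "semantic" vectors for six word features (one row per word).
words = ["neural", "network", "learning", "protein", "gene", "genome"]
vectors = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],    # machine-learning words
    [0.1, 0.9], [0.2, 0.8], [0.15, 0.85],  # biology words
])

# Build the abstraction hierarchy by agglomerative clustering.
hierarchy = linkage(vectors, method="average")

# A "cut" through the hierarchy: words below the distance threshold fall
# into the same cluster, i.e. the same abstract feature.
abstract_ids = fcluster(hierarchy, t=0.5, criterion="distance")

for word, cluster in zip(words, abstract_ids):
    print(f"{word} -> abstract feature {cluster}")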

Mitra, Caragea and their collaborators compared the abstraction approach with traditional feature selection methods and evaluated the quality of the scoring function used to identify a “good” cut through an abstraction hierarchy on two data sets of scientific publications: Cora, a standard benchmark data set of research articles, and the CiteSeer digital library.

The results of their experiments, Mitra said, showed that feature abstraction produces higher-performance models than feature selection methods using mutual information and topical words.

“People who need to run classifiers to categorize text should prefer feature abstraction to other methods, especially if they want the classification to run fast using a smaller feature space,” he said. 

IJCAI is the main international gathering of researchers in artificial intelligence. Held biennially in odd-numbered years since 1969, the conference is sponsored jointly by IJCAI and the national AI societies of the host nations.

At IJCAI in Barcelona, Mitra and Caragea presented “Context Sensitive Topic Models for Author Influence in Document Networks,” which was co-authored by Kataria and C. Lee Giles, David Reese Professor of Information Sciences and Technology. According to the article, in a document network, such as a citation network of scientific documents or weblogs, the content produced by authors exhibits their interest in certain topics. In addition, some authors influence other authors’ interests.

Mitra, Caragea and their collaborators propose to model the influence of cited authors along with the interests of citing authors. They present two generative models for inter-linked documents, the author link topic (ALT) model and the author cite topic (ACT) model, which simultaneously model the content of documents as well as the interests and the influence of authors in certain topics.

“Influential authors are ones who have pioneered some research area or have made seminal contributions in the area,” Mitra said. “Interested authors are those who have published a lot of work in the area and are thus competent experts, but they may not have spawned new, exciting works in the area.”

In addition, the researchers hypothesize that the context in which a cited document appears in a citing document indicates how the authors of the cited document have influenced the contributions by the citing authors. ACT extends ALT to incorporate the citation context, which could provide additional information about the cited authors.
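
To make the general idea concrete, the following is a highly simplified, LDA-style generative sketch in which each word of a citing document is drawn either from the citing author's topic interests or from a cited author's interests. It is an assumption about the flavor of such models, made for illustration only, and is not the ALT or ACT model itself.

# Toy generative sketch of author influence in a citing document
# (illustrative simplification; all numbers are invented).
import numpy as np

rng = np.random.default_rng(0)

K = 2                      # number of topics
vocab = ["learning", "model", "citation", "network", "influence", "topic"]
V = len(vocab)

# Per-topic word distributions (assumed for illustration).
topic_word = rng.dirichlet(np.ones(V), size=K)

# Topic interests of the citing author and one cited (influential) author.
citing_interests = np.array([0.9, 0.1])   # mostly topic 0
cited_influence  = np.array([0.2, 0.8])   # mostly topic 1

def generate_document(n_words=10, p_influence=0.4):
    """Generate words, each attributed to the citing or the cited author."""
    words = []
    for _ in range(n_words):
        # With probability p_influence the word reflects the cited author.
        if rng.random() < p_influence:
            topic = rng.choice(K, p=cited_influence)
            source = "cited"
        else:
            topic = rng.choice(K, p=citing_interests)
            source = "citing"
        word = vocab[rng.choice(V, p=topic_word[topic])]
        words.append((word, source))
    return words

for word, source in generate_document():
    print(f"{word:10s} (drawn from {source} author's topics)")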

The author influence models, Mitra said, are an extension of the research he and Caragea conducted last year on citation recommendation models. Author influence models can be helpful, he added, when seeking experts in a particular field to serve on a program committee, or when university departments are granting awards, promotions and tenure to professors.
