
Multi-institutional team to use AI to evaluate social, behavioral science claims

DARPA-funded program aims to build automated tool that scores confidence of published research findings, could aid in decisions regarding national security

UNIVERSITY PARK, Pa. -- A computer program that could quickly and accurately evaluate the validity of published research claims in social and behavioral sciences literatures will be the focus of a team of researchers from Penn State, Texas A&M, Old Dominion and Microsoft Research. The team is led by C. Lee Giles, the David Reese Professor of Information Sciences and Technology. It received $1.1 million in competitive funding from the Defense Advanced Research Projects Agency (DARPA) to develop the program and, if successful, the project is projected to receive $2,930,995 over its 27-month execution.

The aim of the program, SCORE (Systematizing Confidence in Open Research and Evidence), is to develop automated tools to score the confidence — or reproducibility and replicability — of scientific claims in areas such as psychology, sociology and political science. As DARPA’s program description notes, a large number of studies have documented how scientific results and claims vary dramatically in terms of their ability to be independently corroborated. This could have significant real-world implications for the Department of Defense’s ability to make decisions regarding national security related to how people think and act.

“Rigorous understanding of human behavior is critical to national security,” said Sarah Rajtmajer, assistant professor of IST and member of the research team. “Ideally, the prolific social and behavioral sciences literature should be a primary resource for the Department of Defense and its partners. Published work should be taken as evidence in support or refute of specific findings, and confidence therein should derive from a preponderance of well-understood evidence.”

In their approach, the team will develop artificial prediction markets — an artificial intelligence equivalent of a group of human experts — to assign confidence scores to claims in social science literatures. Artificial agents, or trader bots, will learn trading patterns from human market participants, then act on those patterns in synthetic markets, reasoning on information extracted from published work and rich metadata. The end result will be a market asking price for each finding, which will be interpreted as confidence in that claim.
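To make the mechanism concrete, here is a minimal sketch of how a synthetic market might turn several agents' beliefs into a single price. It is an illustration only, not the team's method: the announcement does not say which market mechanism the project uses, so the sketch assumes a logarithmic market scoring rule (LMSR) market maker, and the names `LMSRMarket`, `run_market`, `beliefs` and `step` are hypothetical. The agents' beliefs are supplied by hand here, whereas the project's trader bots would derive theirs from published work and learned trading patterns.

```python
import math


class LMSRMarket:
    """Binary market run by an LMSR market maker (an assumed mechanism)."""

    def __init__(self, liquidity=10.0):
        self.b = liquidity   # larger values make prices move more slowly
        self.q_yes = 0.0     # outstanding "claim will replicate" shares
        self.q_no = 0.0      # outstanding "claim will not replicate" shares

    def price(self):
        """Current price of a YES share, a number strictly between 0 and 1."""
        return 1.0 / (1.0 + math.exp((self.q_no - self.q_yes) / self.b))

    def cost(self):
        """LMSR cost function, computed in a numerically stable form."""
        m = max(self.q_yes, self.q_no)
        total = math.exp((self.q_yes - m) / self.b) + math.exp((self.q_no - m) / self.b)
        return m + self.b * math.log(total)

    def trade_to(self, target):
        """Buy the shares that move the YES price to `target`; return the cost paid."""
        target = min(max(target, 1e-6), 1.0 - 1e-6)
        before = self.cost()
        gap = self.b * math.log(target / (1.0 - target)) - (self.q_yes - self.q_no)
        if gap >= 0:
            self.q_yes += gap    # price too low: buy YES shares
        else:
            self.q_no -= gap     # price too high: buy NO shares
        return self.cost() - before


def run_market(beliefs, step=0.5, rounds=20, liquidity=10.0):
    """Let each agent repeatedly nudge the price toward its own belief."""
    market = LMSRMarket(liquidity)
    for _ in range(rounds):
        for belief in beliefs:
            current = market.price()
            market.trade_to(current + step * (belief - current))
    return market.price()   # read as a confidence score for the claim


confidence = run_market([0.6, 0.75, 0.4])
```

In this toy version every agent has an unlimited budget and trades in a fixed order, so the final price depends on who trades last; a fuller simulation would add budgets and randomized ordering.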

Previous work has suggested that prediction markets populated by human expert participants may do well at scoring the credibility of research claims. However, according to Rajtmajer, organizing and running prediction markets with human expert participants can be time-consuming and expensive. Humans have limitations and shortcomings that machines do not, including the scope of available information and inherent cognitive biases. Machines, on the other hand, are very good at rapidly ingesting and processing large amounts of data, but lack the range of human cognitive abilities.

"In the context of the specific objectives of this program, we have a unique opportunity to address a challenge that is paramount to modern AI generally. That is, how we can combine the best of both worlds — human intuition and machine reasoning — for a much more intelligent technology,” Giles said.

Other Penn State collaborators on the project include Anna Squicciarini, associate professor of IST; Christopher Griffin, associate research professor in Penn State’s Applied Research Laboratory; and Anthony Kwasnica, professor in the Smeal College of Business.

Last Updated January 22, 2020
