Interdisciplinary center seeks to leverage power of big data analytics

The explosive growth in big data has enabled researchers and scientists in many fields to harness information that has the potential to change the way governments, organizations, and academic institutions conduct business and make discoveries. The massive amounts of data that are being generated, however, require sophisticated algorithms, techniques and software tools to make that information useful. A new interdisciplinary center at Penn State seeks to leverage the talents of researchers across the University as part of a joint effort to maximize the potential of big data.

“Data is everywhere,” said Vasant Honavar, professor and Edward Frymoyer Chair at Penn State’s College of Information Sciences and Technology (IST). “But having this data is not enough. You have to get some useful, actionable knowledge out of it.”

Honavar is the director of the Center for Big Data Analytics and Discovery Informatics, which is co-sponsored by the College of IST, the Institute for CyberScience, the Huck Institutes of the Life Sciences, and the Social Science Research Institute. The goal of the center is to pursue interdisciplinary fundamental and applied research, and research-based advanced training in big data analytics and discovery informatics, covering topics in areas such as artificial intelligence, computational discovery, machine learning, and social network analytics. The center will serve as a focal point that links faculty across the campus who have an interest in research and education in the data sciences. The center will engage faculty in multiple colleges, including the College of IST, the Eberly College of Science, the College of Engineering, the College of Medicine, the College of Health and Human Development, the College of the Liberal Arts, and the College of Earth and Mineral Sciences.

“Big data cuts across all those areas,” Honavar said. “Our goal is to bring the different areas of expertise together to advance techniques and technologies for extracting knowledge from data.”

Big data analytics is a term that is increasingly used to describe the process of applying serious computing power -- the latest in machine learning and artificial intelligence -- to massive and highly complex sets of information. Discovery informatics is an emerging field that brings together computing and information scientists, statisticians, cognitive and social scientists as well as experts in specific areas of sciences and humanities to: understand and formalize the representations and processes that are crucial to discovery in the sciences as well as the humanities; design, develop and evaluate computing and information artifacts that embody such understanding; and apply the resulting artifacts and systems to facilitate discovery.

According to Honavar, Penn State researchers in areas such as biology, engineering and health are seeking to generate research that uses the vast amounts of data that are now available to them. However, he added, the researchers “need sophisticated data analytics techniques” to utilize that data. Researchers at the College of IST and the Department of Statistics are developing new algorithms for analyzing data that would complement the work that is being done by researchers in other disciplines.

Honavar received his doctorate in computer science and cognitive science in 1990 from the University of Wisconsin-Madison, specializing in artificial intelligence. In addition to serving as the Edward Frymoyer Chair Professor of IST at Penn State, he is on the faculty of the Huck Institutes of the Life Sciences, the Institute for CyberScience, and the bioinformatics and genomics graduate programs. He has extensive experience with research collaborations that leverage innovations in data analytics to advance bioinformatics, social informatics, health informatics, energy informatics, and security informatics. He says he is especially excited about the potential of big data in health sciences.

“Every discipline is grappling with new types of data,” he said.

For example, he said, in the health domain, the wide adoption of electronic medical records offers unprecedented opportunities for innovations in health care that help deliver higher quality care at lower cost by leveraging large quantities of data to support evidence-based approaches to clinical practice.

According to Honavar, modern data analytics techniques that integrate sophisticated probabilistic models, statistical inference, and data structures into machine learning algorithms have resulted in powerful ways to extract actionable knowledge from data in virtually every human endeavor. Creative applications of data analytics are enabling biologists to gain insights into how living systems acquire, encode, process and transmit information; health scientists to not only diagnose and treat diseases but also help individuals make healthy choices; economists to understand markets; and security analysts to uncover threats to national security.

“A lot of this potential comes from integrating and analyzing data that previously resided in silos,” he said.

Despite recent technological advances, Honavar said, “there remains a huge gap between our ability to acquire data and our ability to make effective use of data to advance discovery.” The challenge is further compounded, he added, by the fact that many scientific investigations increasingly need to draw on expertise and results from multiple disciplines.

“Closing this gap calls for increasing use of computational tools to automate many of the tasks that underlie scientific discovery,” he said.

The Center for Big Data Analytics and Discovery Informatics would seek research support from a variety of sources, including the National Science Foundation, the Department of Defense, the Department of Homeland Security, and corporate sponsors. Partnerships with faculty from the College of Medicine, the Huck Institutes of the Life Sciences, and the Social Science Research Institute would assist in the pursuit of funding from the National Institutes of Health and other agencies.

Last Updated August 25, 2016