Big data, big science: Students share 'big data' research at poster session

Sayali Phadke, a doctoral student in statistics, discusses her research during the Big Data Social Science poster session on April 14. Credit: Emilee Spokus / Penn State. Creative Commons

UNIVERSITY PARK, Pa. — Today alone, enough data will be produced to fill 250,000 Libraries of Congress, according to a 2016 report from Mikal Khoso of Northeastern University. On a broader scale, estimates indicate that 4.4 zettabytes of data (that’s 4.4 trillion gigabytes) existed in the world in 2013, an amount that is expected to increase tenfold by 2020.

That data comes in all shapes and sizes, from text-based tweets to satellite imagery, and its variability was on display at the Big Data Social Science (BDSS) poster session held April 14 on the University Park campus. Poster topics included identifying bullying tweets (Amy Zhang, statistics, and Diane Felmlee, sociology); social covariates of the HIV epidemic (Ben Sheng, Xun Cao and Le Bao, statistics); virtual reality and decision making (Mark Simpson and Alexander Klippel, geography); and racial segregation of both home and work environments (Robert Zuchowski and Stephen Matthews, sociology and demography).

Students Matthew Denny (political science), Cassie McMillan (sociology and demography), and Sayali Phadke (statistics) also presented at the poster session. As doctoral students in the BDSS Integrative Graduate Education and Research Traineeship (IGERT) program, funded by the National Science Foundation, each of them is working to improve current analytic techniques and apply them to the rapidly growing volume of political, geographic, and social network data produced every day.

The work that Denny presented takes a nuanced approach to network analysis by examining not only whether particular nodes are connected, but also the strength of those connections. To put this in context, imagine a celebrity’s connections on a social network like Facebook. Everyone they are friends with can be considered a connection, but distinguishing the friends from the fans requires more information. One way to do that is to consider the strength of those connections by examining how often the celebrity and their “friends” like each other’s posts. Denny and his adviser, Bruce Desmarais, Penn State associate professor of political science, recently published a model in the journal Social Networks that deals with just that sort of weighted network, applied to lending data from 17 countries.
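For readers who want to see the distinction concretely, the celebrity example can be sketched in a few lines of Python. This is only an illustration of the difference between an unweighted and a weighted view of the same network, not the model Denny and Desmarais published; the names, like counts, and threshold below are all invented.

```python
# Hypothetical interaction counts: how many times a celebrity and each
# contact liked each other's posts. All names and numbers are invented.
likes = {
    ("celebrity", "close_friend"): 42,
    ("celebrity", "coworker"): 7,
    ("celebrity", "fan_a"): 0,
    ("celebrity", "fan_b"): 1,
}


def binary_ties(weights):
    """Unweighted view: every listed pair counts as one connection."""
    return {pair: 1 for pair in weights}


def strong_ties(weights, threshold):
    """Weighted view: keep only pairs whose weight meets the threshold."""
    return {pair: w for pair, w in weights.items() if w >= threshold}


unweighted = binary_ties(likes)
friends = strong_ties(likes, threshold=5)  # the threshold is an assumption

print(len(unweighted))                      # all four contacts look alike
print(sorted(pair[1] for pair in friends))  # only the frequent interactors
```

In the unweighted view all four contacts are indistinguishable; once the like counts are kept as weights, the two frequent interactors separate from the fans.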

“We think that there’s a big hole in the market for people trying to understand systemic risk and that there are some really interesting applications in terms of improving risk management in the financial system by adopting these network analytic techniques,” Denny said in explaining the potential implications of his work. For example, he believes their model could help them understand “how the relationship between banks or economies or countries underlays the risk of financial collapse and the way in which countries can respond.”

Phadke, meanwhile, is exploring another direction — she is examining how influence spreads through networks. To explain her work, Phadke called on a ubiquitous aspect of modern life: advertisements.

“Let’s say there is a company who wants to study the effect of an advertisement,” she began. “In all of the classical statistical methods you assume that two units [people who see the ad] are independent of each other, but, the moment you have a network set up, you are researching units that are communicating. So if you go into it assuming that showing an ad to one person means you are going to affect one person’s purchase outcome, you are possibly looking at underestimating the effect of your ad and putting more money into it than you really need.”
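The point about independence can be made with a toy calculation. The sketch below is not Phadke's model; it assumes a small invented friendship network and made-up effect sizes, and it simply compares the effect of one ad under the independence assumption with the effect once friends of the viewer are also influenced.

```python
# Hypothetical friendship network; all names and effect sizes are invented.
friends = {
    "ana": ["ben", "cho"],
    "ben": ["ana"],
    "cho": ["ana", "dev"],
    "dev": ["cho"],
}
DIRECT_LIFT = 0.10     # assumed rise in purchase probability for a viewer
SPILLOVER_LIFT = 0.04  # assumed rise for each friend of a viewer


def total_lift(shown_to):
    """Sum the assumed lift over viewers and the friends they talk to."""
    lift = {}
    for person in shown_to:
        lift[person] = lift.get(person, 0.0) + DIRECT_LIFT
        for friend in friends[person]:
            lift[friend] = lift.get(friend, 0.0) + SPILLOVER_LIFT
    return sum(lift.values())


naive = DIRECT_LIFT * 1          # independence assumption: one ad, one person
networked = total_lift(["ana"])  # ana plus her two friends

print(naive, networked)
```

Under these assumptions the networked total is larger than the naive figure, which is the underestimate the quote describes: the ad reaches more purchase decisions than the one person who saw it.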

The model that Phadke is developing has more applications than just saving companies money, however. She suggested that the model could be used to assess the effectiveness of public health initiatives or even international trade regulations.

While Phadke and Denny have concentrated on improving statistical models, McMillan is applying them to address a common problem: bullying. McMillan used network analysis to assess the likelihood of bullying among students at two time points. She found that, contrary to the plot of many teen dramas, bullying is more common among students of similar social status.

“Our project has the potential to better inform school prevention and intervention programs that target adolescent bullying behavior,” McMillan said. “Popular culture often characterizes the victims of bullying as adolescents who are on the peripheries of their social networks, while the aggressors are more popular peers with no other social relations to their victims. While this characterizes some of the bullying behavior observed in our sample, a lot of school bullying occurs between friends and between those who are similarly positioned in their social networks.

“When designing prevention and intervention programs, professionals should keep in mind that adolescents often bully one another as an attempt to gain social status and this is best achieved by picking on those who are more popular and who are similarly positioned in the social hierarchy.”

To learn more about ongoing research and other information about Penn State’s BDSS-IGERT program, please visit http://bdss.psu.edu.

Last Updated April 28, 2017

Contact

- amc497@psu.edu
