Penn State doctoral student creates digital map to the global future

UNIVERSITY PARK, Pa. -- There seems to be a problem in Kansas.

The city of Wichita is lighting up month after month on a time-lapse map showing every documented protest event across the globe from 1979 to the present. At first, there are few dots in the U.S., with most showing up predictably in New York and Washington, D.C. The only other dot is persistently blinking right in the middle of Kansas, a seemingly unlikely location for so much unrest.

“We get a lot of questions about that,” John Beieler, political science doctoral student at Penn State and creator of the map, said with a chuckle. “It’s because any protest event labeled as occurring in the U.S. without being attributed to a specific city is displayed in the geo-center of the country, which just happens to be in Kansas.”

The result makes the state appear as a hotbed of protest activity in the 1980s, but this becomes less noticeable as the months cycle by. Activity ramps up in the late '90s, perhaps with the advent of the Internet and greater accessibility to information. By the time the digital counter hits 2010, the entire map pulses with protesting dots.
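The Kansas effect Beieler describes can be illustrated with a small sketch. This is not the code behind the map; the lookup tables, the function name `locate` and the coordinates are hypothetical placeholders chosen for illustration. It only shows the general idea of falling back to a country-level point when no city is attached to an event.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

# Hypothetical lookup tables; the values are illustrative placeholders,
# not coordinates taken from the GDELT data.
CITY_COORDS: Dict[Tuple[str, str], Point] = {
    ("US", "New York"): (40.71, -74.01),
    ("US", "Washington"): (38.91, -77.04),
}
COUNTRY_CENTERS: Dict[str, Point] = {
    "US": (38.0, -97.0),  # assumed point in central Kansas
}


def locate(country: str, city: Optional[str]) -> Optional[Point]:
    """Return city coordinates when known, else the country's center point."""
    if city and (country, city) in CITY_COORDS:
        return CITY_COORDS[(country, city)]
    # No usable city: every such event lands on the same country-level dot.
    return COUNTRY_CENTERS.get(country)
```

Under these assumptions, every U.S. event without a city resolves to the same point, which is why a single dot in Kansas appears to blink constantly.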

Visible on the map are the anti-apartheid protests in South Africa, Cold War demonstrations, and more recent Occupy Wall Street and Arab Spring uprisings.

“The idea came about as a ‘proof-of-concept.’ You can look at this data in a spreadsheet and not really see the big picture; you just see a list of numbers. We wanted to make all this information truly visual.”

            -- John Beieler, political science doctoral student

Beieler created the map as a trainee in Penn State’s National Science Foundation-funded Big Data Social Science Integrative Graduate Education and Research Traineeship (IGERT) program, using data culled from the Global Database of Events, Language, and Tone (GDELT), an enormous repository chronicling every documented social event accessible on the Internet. These include protests, bombings, speeches, peace agreements and myriad others.

In the world of political science, the data set is a big deal. Although similar databases have been created, Beieler said what makes the GDELT data set stand out is its scale.

“The scope of the data set is what really makes it amazing,” Beieler said. “It doesn’t just tell you there was a protest in Egypt on a specific day, it also specifies who did what to whom.”

For example, a GDELT entry wouldn’t just note that a bombing happened in Iran. Each entry has 57 fields, storing such information as the date, the type of event, the perpetrator (including ethnicity, race and political standing), the location, and the tone of the coverage (on a scale from positive to negative). This level of detail throughout the database makes for a comprehensive picture of the world.

The GDELT has been more than 20 years in the making. In the 1980s, Philip Schrodt, then a professor at the University of Kansas, laid the early foundations of the database, with his interests lying mainly in Middle Eastern and Asian countries. Years later, Kalev Leetaru (then a graduate student at the University of Illinois, and now the Yahoo! Fellow at Georgetown University) helped bring the project to fruition, creating the technical infrastructure and workflows to scale it up to a global database that monitored tens of thousands of news sources on a daily basis. The completed data set was announced last April.

“I was very interested in teasing apart and exploring the connection between emotions and physical behavior,” Leetaru said. “I think the GDELT has succeeded in capturing people’s imaginations, and it’s telling that something like this has gone mainstream — and now even people outside of political science have begun exploring event data.”

Because Leetaru had added so much to the database, extracting the information Beieler needed for his animated map was difficult. At 85 gigabytes, the GDELT is too big to process on one computer, so he used a computer cluster to spread the data set across multiple systems. Beieler then had to separate the data about protest events from millions of other entries.
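The separation step can be sketched in a few lines of Python. This is a minimal illustration, not Beieler’s actual code: it assumes a tab-delimited file with a header row, a column named `EventRootCode`, and that protest events carry the root code "14" (the CAMEO event coding convention). The sample rows are invented. Reading one row at a time means the full data set never has to fit in memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Assumption: protest events are marked with root code "14".
PROTEST_ROOT_CODE = "14"


def iter_protest_rows(
    lines: Iterable[str], code_field: str = "EventRootCode"
) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose event root code marks a protest."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:  # streams row by row instead of loading everything
        if row.get(code_field) == PROTEST_ROOT_CODE:
            yield row


# Invented sample data standing in for a real event file.
sample = io.StringIO(
    "EventDate\tEventRootCode\tCity\n"
    "19790101\t14\tNew York\n"
    "19790102\t19\tTehran\n"
    "19790103\t14\t\n"
)
protests = list(iter_protest_rows(sample))
```

In practice the same function could be pointed at an open file handle for each chunk of the data, with the matching rows written out for mapping.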

Once he sorted the data, Beieler compiled it in a spreadsheet, did some minor coding and worked with fellow Big Data Social Science-IGERT trainee and doctoral student, Josh Stevens, to map the data using software called CartoDB. Since its completion, the map has been featured by such news sites as The Guardian, Slate, Foreign Policy and Wired Japan.

But even more exciting than the press the map has been getting are the possibilities for future research opened by the GDELT data, Beieler said.

“For almost any research question you have, you can find something in this data set to help you answer it. It’s truly a record of what humanity has been doing at these specific days in time, and if we can harness that to predict future events, then that’s very powerful.”

                                                                  -- John Beieler

Scholars have already started using the GDELT data to track and predict social unrest. In May, a reporter at New Scientist isolated conflict entries occurring in Syria to create a map of the country’s civil war. And a few months earlier, in March, Penn State student Jay Yonamine wrote his doctoral dissertation on mapping and predicting levels of violence in districts in Afghanistan.

Leetaru went one step further than Beieler, predicting that the GDELT database will eventually become a way to better understand the evolving landscape of human interaction.

“My ultimate dream for the GDELT is for it to become an evolving, dynamic system for understanding global media and for cataloging and understanding human society in new ways,” he said. “I see it bringing together all available, open data sources into a single unified platform to compile information and essentially make human society computable.”

To view and experiment with the map, visit http://johnbeieler.org/protest_mapping/. For more stories about IT at Penn State, visit Current at http://current.it.psu.edu/.

Last Updated November 25, 2013