Grassroots data science initiative seeks to create connections, collaborations

Liam Jackson
September 21, 2020

UNIVERSITY PARK, Pa. — Dave Hunter knows how valuable a serendipitous connection can be. Throughout his career, Hunter, professor of statistics at Penn State, said that “being at the right place at the right time” helped him find new partnerships and projects to advance his science and his academic career. Now, Hunter is providing similar opportunities to other Penn State researchers involved in data science. 

With help from a Teaching and Learning with Technology (TLT) faculty fellowship, Hunter spearheaded the launch of a grassroots data science initiative designed, in part, to give other researchers more chances to create new connections, conversations and collaborations. Now that the community has grown, the Institute for Computational and Data Sciences (ICDS) and the University Libraries have joined the initiative, which is already leading to new collaborations.

Data science for all

Hunter was first inspired to form a data science community when the University launched a major in data sciences in 2016. As head of the Department of Statistics at the time, Hunter was involved in conversations about the proposed major with faculty members in the College of Information Sciences and Technology and the School of Electrical Engineering and Computer Science, each of which had faculty and students deeply involved in data science education. 

“In the end we decided to take an intercollege approach to the major,” Hunter said. “It was a nice way to build community around this new major, and that’s where I started to dive into this idea of data science at Penn State, when I started to have lots of chance conversations around data science with various people.”

In one of those conversations, Hunter learned from a colleague — Scott McDonald, professor of science education — about the TLT Faculty Fellows program. 

“Scott told me he had been working with the fellows program for a while, and I thought, wouldn’t it be great if this data science thing were bigger than just the three departments involved in the major? It could involve everybody who is trying to do something with big data. It seemed like a nice way to build a community that would not be exclusive,” he said.

The TLT Faculty Fellows program provides faculty members with a team of support staff for projects that are “at the intersection of pedagogy and technology,” said Bart Pursel, interim director of innovation with TLT. A key component of the program is its flexibility: it is designed to adapt to the changing nature of projects that work with cutting-edge ideas and technology.

“Projects might meander and end up veering off in a couple of directions, and that's okay. The fellows program is designed for that,” said Pursel. “We want to take projects from innovation to scale, and we take projects on with the idea that we don't just want to do a niche thing. We want projects to grow legs and go into other disciplines, or impact different parts of the University.”

Hunter was paired with TLT’s Data Empowered Learning team, where he said he found support from a whole team who had experience with data science. But Hunter credits one person, Hannah Williams, project manager, with his ability to launch the initiative.

“There are many great people from TLT involved in this initiative, but Hannah has truly gone above and beyond,” he said. 

After several months of brainstorming and conversations with stakeholders from around the University, the team came up with a plan for investigating the possibility of creating a community. 

Putting shape to a data science community

The first task was to identify how to shape a data science initiative at Penn State. To keep it as inclusive as possible, Hunter wanted a grassroots movement that allowed the membership to guide the direction of the initiative. Fittingly for a data science initiative, the team sought to collect data, using a detailed survey sent out to Penn State faculty. The biggest question they sought to answer was: Does the Penn State community want to be involved in a data science initiative?

“The answer was a resounding yes,” said Williams. “People wanted interaction and engagement, and we started to realize that people feel very strongly about all sorts of things in this space of data science: policy, privacy, applications, methodologies, education, internal collaborations, grant proposals, and just being competitive in general.”

Considering how ubiquitous data science is, creating a University-wide community would require a careful, thoughtful approach. Even defining data science is a challenge for many institutions, but Hunter prefers a definition that, at first, seems simple.

“Data science is about deriving meaning from data, and often that means big data,” Hunter said. 

As Hunter notes, this definition implies that data science requires a team approach, often an interdisciplinary one.

“Often, this means the data themselves are not easy to obtain, so you need some technical expertise to obtain data,” he said. “Then you have to put it into a format where it can be analyzed, which requires a different type of expertise. Then you need to know what you're trying to learn, which requires subject matter expertise. You need to know how to answer the questions you’re trying to answer, and this requires some statistical expertise.”

The TLT team helped Hunter launch a web presence for the community, datascience.psu.edu. Their real success came after realizing that people needed more opportunities for conversations around different aspects of data science. An informal lecture series would be the best way to gather anyone with an interest in data science, an idea supported by their survey data. 

Anyone interested in joining the data science community can get more information on the data science community website.

Modeling their talk series after the Penn State Materials Research Institute’s Millennium Café series, the group strategically built in opportunities for conversation wherever they could. First, each talk would include two speakers, one on the development side of data science and one on the application side. Next, each speaker would be asked to tell a little bit about themselves, outside of their researcher or practitioner role, to provide a “human element” to the talks, said Williams. Finally, they booked the space for four hours, far longer than two 13-minute talks would require, and there were times when even that didn’t seem like enough.

“People were sticking around for sometimes several hours afterward to talk shop, whether to find out what open-source packages people are using, or just sharing best practices and ideas,” said Pursel. “It’s been encouraging to see that.” 

In its first year, the Data Science Talks featured 17 researchers from 15 departments and three campuses. One of them, Michael Rutter, associate professor of statistics and mathematics at Penn State Behrend, gave his talk using a Beam robot in the Dreamery.

Hannah Williams stands with Michael Rutter, associate professor of statistics and mathematics at Penn State Behrend, who joined a fall 2019 data science community meeting via Beam robot to present his work.

IMAGE: Dave Hunter / Penn State

The initiative has led to more than just conversations for several researchers. After giving a data science talk about his work with PlantVillage, David Hughes, associate professor of entomology and biology, was approached by an audience member, Medha Uppala, postdoctoral researcher in the College of the Liberal Arts' Center for Social Data Analytics.

“As a new postdoc at Penn State, I was keen on forming new contacts with researchers in the data science community. This was especially important to me as I’m an applied statistician and I was looking for new field applications that I can pursue my research in,” Uppala said. “David spoke about his research with PlantVillage, the farmer networks in Kenya and how they adopt new technologies. It so happens that some of my background is in social networks. So I approached him to chat about his work and if he had any open social network problems I could collaborate on. It all began there.”

Hughes said the two are involved in a new project that is “going to be a huge study in Kenya on social networks.”

“It’s a big change in our work, which is great and the whole point of such forums,” Hughes said. “Diversity is always good. Bringing together different ideas is the essence of advancement in knowledge.”

The future of the community

The success of the series grabbed the attention of two Penn State units that, like TLT, serve the entire University community — ICDS and the University Libraries. Both have become co-organizers of the initiative. 

Hunter opted to hand the reins over to new faculty leadership both because he is going on sabbatical in the fall 2020 semester, and because he believes in an inclusive community.

“This is the sort of thing that by its nature should not be directed by the same person year after year after year,” he said.

Two new faculty leads — Briana Ezray, research data librarian, and Xiaofeng Liu, associate professor of civil and environmental engineering and ICDS co-hire — volunteered to take on a leadership role with the community, and they will be announcing the fall semester speakers at the first fall 2020 data science meeting on Sept. 24.

Williams noted that seeing the evolution and growth of the initiative is a sign of what she hopes are many good things to come. 

“This is now this amazing partnership with other important units in the University, ICDS and the Libraries, who have a vested interest in this and want to see it move forward,” she said. “That’s a huge success, and it’s only going to get better from there.”

Last Updated September 30, 2020