New major prepares next generation of data scientists

UNIVERSITY PARK, Pa. – As a member of the Amazon Prime Air team, the project developing drones that deliver packages, Brigid Smith sees the benefit of the data sciences classes she took as a graduate student in computer science and engineering. Smith was using graphs to correlate related objects during object recognition. That is, she was using data science to keep track of the objects that tend to be seen, so that the search for them can be prioritized as more and more objects are detected.

“While my professional experience has been almost entirely within this hardware-focused space, the data sciences as a concept is a very in-demand field,” said Smith, a 2015 CSE graduate. “Having that background has been very useful. Prime Air is a team comprised of people from disciplines from aerospace to mechanical engineering, so having a wider breadth of knowledge has helped me be more comfortable in such a varied team.”

Because of the growing demand for data scientists, the School of Electrical Engineering and Computer Science is now offering data sciences as an undergraduate major in collaboration with the College of Information Sciences and Technology and the Department of Statistics in the Eberly College of Science. Approved by the Penn State Board of Trustees in February, the data sciences major will be available in the summer.

According to John Hannan, interim associate department head of computer science and engineering, the major is important because data scientists are in high demand right now. The field touches many aspects of our lives, drawing on the social, physical, and computational sciences.

“Data scientists are important because of the vast amounts of heterogeneous data that are routinely being generated and collected,” said Daniel Kifer, an associate professor in computer science and engineering who has been teaching data sciences courses already. “Analyzing and making sense of the data requires proficiency in modern statistical and machine learning techniques as well as strong programming skills that take advantage of modern big data frameworks. This is a relatively rare combination of skills.”

Students majoring in data sciences will learn the technical fundamentals of data sciences with a focus on developing the knowledge and skills needed to manage and analyze large-scale, unstructured data to address an expanding range of problems in industry, government, and academia.

“Data sciences is a highly interdisciplinary field that can be viewed through the lenses of several different disciplines,” said Raj Acharya, professor and director of the school. “This new program will provide a unique opportunity by providing students with a broad background in these different disciplinary aspects of data science while specializing in depth in one of the disciplinary approaches.”

After taking core courses during the pre-major stage, students will choose among options focused on application (IST), computation (Engineering), and statistical modeling (Science). Students in all three options will come together in their junior and senior years for two shared capstone experiences.

Faculty in EECS will teach the computational data sciences courses. According to Hannan, this option will concentrate on the mathematical techniques used to filter and process signals to remove unnecessary information. From a computer science and engineering perspective, students will be taught to develop software and hardware that can be used to solve the challenges big data generates.

“More data sciences programs exist, but it’s a bit more unusual at an undergraduate level,” said Hannan. “The need for more data scientists needs to be addressed so this is a young discipline. Its essence has been around for decades but it’s fairly new to formalize it as a standalone discipline; this reflects the trend that we need more experts to handle all this data.”

Kifer added that data scientists are in demand in a variety of fields, with companies hiring them internally or as consultants.

“In general, any business that can collect large-scale data about its operations is in need of a data scientist,” said Kifer. “This includes finance, health care, retail, manufacturing, advertising and social media/online services.”

Last Updated March 28, 2016