Computer scientists aim to accelerate data-intensive workflows

UNIVERSITY PARK, Pa. -- Four Penn State researchers have received a three-year, $850,000 grant from the National Science Foundation (NSF) to reduce the time-to-completion of data- and compute-intensive bioinformatics workflows on supercomputers.

Kamesh Madduri, assistant professor of computer science and engineering, is principal investigator of the project, titled "End-to-End Acceleration of Genomic Workflows on Emerging Heterogeneous Supercomputers."

The team includes Mahmut Kandemir, Paul Medvedev and Padma Raghavan, who are all faculty in the Department of Computer Science and Engineering. Madduri and Medvedev are part of the Genome Sciences Institute in the Huck Institutes of the Life Sciences, and Medvedev is also a faculty member in the Department of Biochemistry and Molecular Biology.

Their research will explore lightweight data layout reorganizations at multiple granularities to enhance locality; co-tuning of tasks to overlap compute, communication and I/O phases; and locality-aware load balancing and coordinated resource partitioning to exploit high-performance computing platforms. It will initially target accelerating multiple components of the genetic variant detection workflow in bioinformatics.

One of their key goals is to design methodologies and task-specific optimizations targeting the massive parallelism and scalability potential of current heterogeneous supercomputers, so that the developed techniques can be easily transferred and applied to dedicated academic cluster and commercial cloud environments.

Madduri explained, "Our new bioinformatics workflow acceleration methodologies aim to minimize data movement between various levels of the memory hierarchy, exploit compute-node heterogeneity and maximize data locality."

About the potential project impact, Medvedev said, "Genetic variation detection workflows are ubiquitous in disease studies and will soon become a key component of personalized medicine technologies."

The grant is part of the NSF’s Exploiting Parallelism and Scalability program, which supports groundbreaking research leading to a new era of parallel computing.

Last Updated September 26, 2014