John Graham publishes book on the problem of missing data in research

UNIVERSITY PARK, Pa. – What's a researcher to do when, halfway through her study, several participants drop out, citing changes to their health status or their availability? Missing data have long plagued those conducting applied research in the social, behavioral, and health sciences. Good missing-data analysis solutions are available, but practical information about implementing them has been lacking.

In a new book titled "Missing Data: Analysis and Design," John Graham, Penn State professor of biobehavioral health and human development and family studies, offers practical information to help researchers who are not statisticians implement modern missing-data procedures properly in their research and reap the benefits of improved accuracy and statistical power.

Graham's own research focuses on the evaluation of health promotion and disease prevention interventions. He specializes in evaluation research methods, including missing data analysis and design, structural equation modeling, and measurement.

"Missing data are a problem for three reasons," said Graham. "The first is a practical reason. Most statistical analysis tools were built with complete data in mind, so for most analyses, there is simply no convenient way of handling the missing values. The second reason is that missing data behave in many respects like data that were never collected in the first place. If 200 of a person's 1,000 subjects drop out of the study, statistical power is reduced to what it would have been if the person had collected data on only 800 subjects to begin with. The third reason is that because of the missing data, any conclusions based on the statistical analysis of the data may be biased or misleading. Modern missing data analysis procedures help with all three of these problems."

According to Graham, for researchers with limited missing-data analysis experience, the book offers an easy-to-read introduction to the theoretical underpinnings of analysis of missing data; provides clear, step-by-step instructions for performing state-of-the-art multiple imputation analyses; and offers practical advice, based on 20 years of experience, for avoiding and troubleshooting problems. For more advanced readers, the book provides unique discussions of attrition, non-Monte-Carlo techniques for simulations involving missing data, evaluation of the benefits of auxiliary variables, and cost-effective planned missing data designs.

A related website offers free downloads of the supplementary software, sample empirical data sets, and a variety of practical exercises described in the book to reinforce the reader’s learning. Together, the book and its website help beginners gain confidence in their ability to conduct missing-data analysis and help more advanced readers expand their skill set.

More information about the book is available at


Last Updated September 10, 2012