IST undergrad uses internship, research to explore bias in machine learning

Jordan Ford
July 16, 2018

UNIVERSITY PARK, Pa. — This past spring, Josh Irwin, a rising senior majoring in information sciences and technology, was an intern with the United States Military Academy. Now, he's returning from an international conference where he discussed a paper he co-authored as part of a research team at the Army Cyber Institute.

Irwin was third author of “Predicting Bias in Machine Learned Classifiers Using Clustering,” which was accepted to the proceedings of the 2018 International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation. Irwin's internship and research were supported in part by the DARPA Explainable Artificial Intelligence program, and he used a travel grant awarded by the conference to attend the event, held July 10-13 in Washington, D.C. 

The research team, which also included Robert Thomson, Elie Alhajjar and Travis Russell from the Military Academy, investigated how to identify bias in machine learning and, ideally, how to remove that bias from the system before it affects results.

The researchers noted that artificial intelligence is only as useful as the data used to train it, and that it often reflects the biases of those who create it. As a result, bias can become implicit in the system.

“There are a lot of times when bias in machine learning shows, particularly with regard to race and gender,” Irwin said. 

For example, virtual assistants have often had difficulty understanding and responding to requests spoken in different English dialects and vernaculars. As the use of artificial intelligence grows more widespread, certain groups of people might be left vulnerable or excluded from various opportunities.

Past research has mainly focused on proving that a machine learning system is biased against a certain attribute, rather than identifying which attribute is the cause of the bias. Irwin’s group focused on locating the source of the bias and removing it from the process. 

To do this, the researchers examined 70,000 images of handwritten numerals and trained a system to determine which number — zero through nine — appeared in the image. Then, they grouped the results into clusters based on how accurately the system classified the numbers. 
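As a rough illustration of grouping classifier results by how accurately each digit was identified, the Python sketch below sorts examples into groups keyed by the true digit and whether the prediction was correct. It is a simplified stand-in and not the clustering algorithm from the paper; the function name group_by_outcome and the toy labels are assumptions made up for this example.

```python
from collections import defaultdict


def group_by_outcome(true_labels, predicted_labels):
    """Group example indices by (true digit, whether the prediction was correct).

    Hypothetical helper for illustration only; the paper's actual
    clustering method is not described in this article.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    groups = defaultdict(list)
    for index, (truth, guess) in enumerate(zip(true_labels, predicted_labels)):
        groups[(truth, truth == guess)].append(index)
    return dict(groups)


# Toy data, invented for this example: six images of the digits 3 and 7.
true_labels = [3, 3, 3, 7, 7, 7]
predicted_labels = [3, 3, 8, 7, 1, 1]

groups = group_by_outcome(true_labels, predicted_labels)
for (digit, correct), members in sorted(groups.items()):
    status = "correct" if correct else "incorrect"
    print(f"digit {digit}, {status}: {len(members)} example(s)")
```

With ten digits, a grouping like this yields at most 20 groups, one correctly classified and one misclassified group per digit, which loosely mirrors the strong and weak clusters the team expected to find.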

“We predicted that we’d have ten strong clusters that did a good job at identifying the correct number and ten weak clusters that were likely to show bias,” explained Irwin. 

They used 40,000 of the numerals to train the system and an additional 20,000 to remove bias from the different clusters. Once the system was accurately trained, they tested it on the remaining 10,000 numerals. If accuracy dropped below 95 percent, they retrained the system with new images.
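The three-way split and the 95 percent check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the article does not say how images were assigned to each subset, so the random shuffle, the seed, and the helper names split_indices, accuracy and needs_retraining are all hypothetical.

```python
import random


def split_indices(n_total=70_000, n_train=40_000, n_debias=20_000, seed=0):
    """Split image indices into training, bias-removal and test subsets.

    Assumption: images are assigned at random; the article does not say.
    """
    if n_train + n_debias > n_total:
        raise ValueError("subsets are larger than the dataset")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    debias = indices[n_train:n_train + n_debias]
    test = indices[n_train + n_debias:]  # whatever remains is held out
    return train, debias, test


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def needs_retraining(acc, threshold=0.95):
    """True when accuracy falls below the 95 percent bar from the article."""
    return acc < threshold


train, debias, test = split_indices()
print(len(train), len(debias), len(test))
```

Holding the last 10,000 images out of both earlier stages means the final accuracy figure is measured on numerals the system has never seen.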

Additionally, the group tested their algorithm on a series of well-known datasets related to voting records, diabetes and the census. Ultimately, their process showed greater accuracy in identifying bias in machine learning that would otherwise be difficult, if not impossible, to detect. 

“One advantage of our method is that the user need not identify the source of bias since the source of the bias is identified by a random clustering algorithm,” the group wrote in their paper. 

Added Irwin, “The ultimate goal is to further refine this research and apply our mitigation algorithms to other datasets.” 

Though his internship with the Military Academy ended in the spring, the Lansdale, Pennsylvania, native continues to work as a research assistant in the College of Information Sciences and Technology. Working with Frank Ritter, professor of IST, Irwin assists on various funded research projects, including the development of web-based tools that allow users to create online tutors. 

Each of these experiences is helping Irwin do more of what he loves. 

“I really enjoy the computer programming part of my work,” he said. “When there’s a problem that can be identified and solved, it’s really exciting.”

Last Updated July 16, 2018