Map Wise

Heather Fletcher
May 01, 2000

Sanshzar Kettebekov stands in front of the camera. He raises his arm and a red cursor-hand appears on the map on the big-screen TV in front of him. He points at a building and asks the computer what department is housed there.

"Electrical Engineering," the computer responds in Kettebekov's voice, accent included—because he's the one who programmed the vocal data into the computer.

[Image: cartoon weatherman in front of a map]

To make a talking map, researchers first had to teach a computer to recognize human gestures, to track them, and to understand what they mean.

"Show me the nearest parking," Kettebekov says. The computer draws a line connecting the building to the closest lot.

Kettebekov is a graduate student in industrial engineering at Penn State. His adviser, engineer Rajeev Sharma, is working to give computers vision: He gives them "eyes," then teaches them to understand what those eyes see. Already, Sharma and his students have produced an application that represents a significant step toward their goal: an interactive computerized map of the University Park campus.

To develop the map, Sharma and his students first had to teach a computer to recognize human gestures, to track them, and to understand what they mean. Sharma needed gesture data to analyze, so he recorded hours of footage from The Weather Channel. His team then produced algorithms, step-by-step computer procedures, that allowed a computer to track a weather person's hands and categorize the gestures. "This is outlining," Sharma demonstrates, circling something in the New England area of an imaginary U.S. map. "This is pointing." He touches his index finger to his note pad. Then he makes a motion that looks like a basketball referee calling a "traveling" offense. "These motions are called beats. You have to filter this stuff out."
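The article doesn't include the team's actual algorithms, but the sorting step can be pictured in a few lines. The heuristics and thresholds below are invented for illustration: a trajectory that loops back on itself is treated as outlining, rapid back-and-forth motion as a beat, and a hand that travels out and stops as pointing.

```python
# A minimal sketch (not the team's method) of labeling a tracked hand
# trajectory, assumed to be a list of (x, y) positions from the camera.
import math

def classify_gesture(trajectory):
    """Label a hand trajectory as 'pointing', 'outlining', 'beat', or 'hold'."""
    xs = [p[0] for p in trajectory]
    ys = [p[1] for p in trajectory]
    path = sum(math.dist(trajectory[i], trajectory[i + 1])
               for i in range(len(trajectory) - 1))     # total distance traveled
    net = math.dist(trajectory[0], trajectory[-1])      # start-to-end displacement
    spread = max(max(xs) - min(xs), max(ys) - min(ys))  # territory covered

    # Thresholds are arbitrary placeholders, not measured values.
    if spread < 0.05:
        return "hold"        # hand barely moves
    if net < 0.1 * path and spread > 0.15:
        return "outlining"   # long, roughly closed loop (circling New England)
    if path > 3 * spread:
        return "beat"        # lots of motion over little territory; filter out
    return "pointing"        # hand travels out and stops on a target
```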

Sharma's team also used the weather footage to teach the computer how to correlate words with gestures. "If someone says, 'Show me how to get from here to there,'" Sharma explains, "it doesn't mean anything without the pointing. I need to know that you want to get from Hammond Building to Pond Lab."
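One way to picture that correlation: each vague word like "here" or "there" gets resolved to whatever the user was pointing at closest in time. The toy example below is not the project's code; the function name and the timings are invented to show the idea.

```python
# Toy illustration: resolve deictic words ("here", "there") to buildings
# by matching each word's time stamp to the nearest pointing gesture.
DEICTICS = {"here", "there", "this", "that"}

def resolve_deictics(words, pointing_events):
    """words: list of (time_sec, word); pointing_events: list of (time_sec, building)."""
    resolved = []
    for t_word, word in words:
        if word.lower() in DEICTICS and pointing_events:
            _, building = min(pointing_events, key=lambda e: abs(e[0] - t_word))
            resolved.append(building)
        else:
            resolved.append(word)
    return " ".join(resolved)

words = [(0.0, "show"), (0.2, "me"), (0.4, "how"), (0.5, "to"), (0.7, "get"),
         (0.9, "from"), (1.1, "here"), (1.3, "to"), (1.5, "there")]
points = [(1.0, "Hammond Building"), (1.6, "Pond Lab")]
print(resolve_deictics(words, points))
# -> show me how to get from Hammond Building to Pond Lab
```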

A tripod-mounted camera serves as the computer's eyes; a clip-on microphone acts as its ears. The program runs on three Unix workstations and one PC networked together: one computer operates voice recognition software, one controls tracking, one is responsible for gesture recognition, and one retrieves the requested information. (This configuration helps the researchers troubleshoot.)
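In outline, that division of labor is a four-stage pipeline. The sketch below is purely illustrative; the class names and stand-in behaviors are invented, and in the real system each stage runs on its own networked machine rather than in one process.

```python
# Illustrative four-stage pipeline mirroring the article's division of labor.
class SpeechService:                        # voice-recognition machine
    def recognize(self, audio):
        return audio                        # stand-in: treat audio as already-transcribed text

class TrackingService:                      # tracking machine
    def update(self, frame):
        return {"hand": frame.get("hand")}  # stand-in: head/hand coordinates from the camera

class GestureService:                       # gesture-recognition machine
    def classify(self, track):
        return "pointing" if track["hand"] else "none"

class RetrievalService:                     # information-retrieval PC
    def answer(self, words, gesture):
        return f"Looking up '{words}' (gesture: {gesture}) in the campus map database."

def handle_request(frame, audio):
    """One pass through the pipeline; each stage could live on its own computer."""
    words = SpeechService().recognize(audio)
    track = TrackingService().update(frame)
    gesture = GestureService().classify(track)
    return RetrievalService().answer(words, gesture)

print(handle_request({"hand": (320, 240)}, "show me the nearest parking"))
```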

The map is programmed to recognize human heads and hands. When no one is in front of the camera, the little squares that the tracking computer uses to define head and hand fields move to the top of the monitor screen in a computer version of boredom. When the computer sees someone it can lock onto, it uses a statistical technique called a Kalman filter to guess where the person's face and hands will go next. This allows the system to interact with the user in almost real time. (Networking causes a slight delay like the one you experience when your desktop computer is thinking.)
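The Kalman filter itself is standard textbook machinery: it predicts the next position from its current estimate of position and velocity, then corrects that guess with the next camera measurement. The sketch below uses a constant-velocity model in two dimensions; the frame rate and noise settings are invented, and this is not the team's implementation.

```python
# Constant-velocity Kalman filter for one tracked square (head or hand).
# State is [x, y, vx, vy]; only position is measured. All numbers are
# placeholder values, not the project's settings.
import numpy as np

dt = 1 / 30                                  # assume ~30 camera frames per second
F = np.array([[1, 0, dt, 0],                 # motion model: position += velocity * dt
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],                  # measurement model: we see position only
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-3                         # process noise: how erratically people move
R = np.eye(2) * 1e-2                         # measurement noise: camera jitter

def kalman_step(x, P, z):
    """Predict where the head or hand goes next, then correct with measurement z."""
    x_pred = F @ x                           # the "guess" the article describes
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                       # how wrong the guess was
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain: how much to trust the camera
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new, x_pred[:2]

x, P = np.zeros(4), np.eye(4)
for z in [np.array([100.0, 200.0]), np.array([102.0, 198.0]), np.array([104.0, 196.0])]:
    x, P, guess = kalman_step(x, P, z)       # 'guess' is the predicted next position
```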

Once the map program has recognized a gesture, it uses statistical methods called Hidden Markov Models to determine whether the gesture is meaningful. At the same time, another part of the program hears and records the user's speech, and both the visual and the audio data are given a time stamp. The computer uses these time stamps to pair the gesture with the words. The map then analyzes the combined signals, identifies the user's request, and searches for the answer.
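The Hidden Markov Model step can be pictured as a competition between models: score the observed motion under a model of meaningful, pointing-like gestures and under a model of beats, and keep the gesture only if the meaningful model explains it better. The tiny models below, and all of their probabilities, are invented for illustration; they are not the project's trained models.

```python
# Minimal forward-algorithm sketch: score a sequence of coarse motion
# symbols under two tiny HMMs and keep the gesture only if the
# "meaningful" model is the better explanation.

def forward(obs, start, trans, emit):
    """Return P(obs | model) for a discrete-observation HMM."""
    states = list(start)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())

# Observations are coarse motion symbols from the tracker: 'move', 'stop', 'jitter'.
point_hmm = {
    "start": {"moving": 0.8, "holding": 0.2},
    "trans": {"moving": {"moving": 0.6, "holding": 0.4},
              "holding": {"moving": 0.1, "holding": 0.9}},
    "emit":  {"moving": {"move": 0.8, "stop": 0.1, "jitter": 0.1},
              "holding": {"move": 0.1, "stop": 0.8, "jitter": 0.1}},
}
beat_hmm = {
    "start": {"flick": 1.0},
    "trans": {"flick": {"flick": 1.0}},
    "emit":  {"flick": {"move": 0.3, "stop": 0.1, "jitter": 0.6}},
}

obs = ["move", "move", "stop", "stop"]   # hand travels out, then holds on a target
p_point = forward(obs, **point_hmm)
p_beat = forward(obs, **beat_hmm)
print("meaningful" if p_point > p_beat else "filter out as a beat")
```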

Creating the interactive map was complicated, Sharma says, because it incorporates elements from cognitive science, natural language processing, linguistics, computer vision, and speech recognition. But limiting the gesture recognition model to a single, well-defined context, a campus map, is what made the project succeed. "Facilitating general human-computer interaction is very difficult because the more general the situation, the more gestures you could use," Sharma explains.

Another approach, Sharma adds, is to work with predefined sets of gestures. But this requires that people be taught the gestures they need to communicate with a computer. Sharma believes the time has come to make computers work harder. "They are fast enough and smart enough now—let them do the work to try to understand me. We don't want to confine the operator. We want people to act how they would normally act."

Rajeev Sharma, Ph.D., is assistant professor of computer science and engineering in the College of Engineering, 317 Pond Lab, University Park, PA 16802; 814-863-0147; rsharma@cse.psu.edu. This project is supported by the National Science Foundation and the Army Research Lab. Sanshzar Kettebekov is a doctoral student in industrial engineering; 863-4799; kettebek@cse.psu.edu. Other graduate students currently working on the project are Hongseok Kim, Ediz Polat, Jiongyu Cai, and Yuhui Zhou.

Writer Heather L. Fletcher is a graduate student in chemistry.