Speech: A Sight to Behold

The McGurk Effect

Research on the role of vision in speech perception gained momentum in the 1970s, when psychologist Harry McGurk dubbed the spoken syllables "bah," "kah," "gah," or "pah" onto a video of a real woman saying a different one of the four sounds.

Just as in Massaro's later work with the computerized talking head, McGurk found that people who watched the videos often misheard the syllables actually spoken. If the woman mouthed "gah" but the soundtrack said "bah," people usually heard "dah." When viewers turned away from the video and only listened, they heard "bah" correctly.

McGurk also discovered that viewers couldn't force themselves to hear the actual vocal sounds, even when told that the visual information was mismatched. The brain combined the two types of information automatically. McGurk published these findings with his colleague John MacDonald in a 1976 Nature paper.

With "Baldy," Massaro runs the same kinds of studies, with the added bonus of an animated face that can make facial movements, over and over, in ways humans can't. For example, Baldy can mouth and say a sound halfway between "bah" and "dah." Massaro can then measure what people think the talking head said. From this work, he has concluded that visible speech and auditory speech are given equal weight in determining what we hear.
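One way to make "equal weight" concrete is the two-alternative integration rule associated with Massaro's fuzzy logical model of perception (FLMP). This is a simplified sketch, not a reproduction of Massaro's analyses: a and v are assumed support values between 0 and 1 that the sound and the face, respectively, lend to "dah" over "bah." The rule treats a and v symmetrically, and for a sound exactly halfway between the two syllables (a = 1/2) the auditory term cancels, leaving the face to decide:

```latex
\begin{aligned}
P(\text{dah} \mid a, v) &= \frac{a\,v}{a\,v + (1 - a)(1 - v)} \\[4pt]
P\!\left(\text{dah} \mid \tfrac{1}{2}, v\right) &= \frac{\tfrac{1}{2}\,v}{\tfrac{1}{2}\,v + \tfrac{1}{2}(1 - v)} = v
\end{aligned}
```

Swapping a and v leaves the formula unchanged, so in this rule neither sense is privileged.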

However, our eyes and ears each have their 60 seconds of fame every now and then. The brain appears to give more weight to what we hear when someone makes sounds that don't leave a distinct signature on the face, such as the "geh" and "deh" in "get" and "debt." Conversely, visible speech matters more when we listen to someone in a noisy place, or when the speech sounds are easily confused by ear, such as the "taa" and "paa" in "tack" and "pack."
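The shifting influence of eyes and ears can be sketched with the multiplicative rule associated with Massaro's fuzzy logical model of perception (FLMP), reduced here to two alternatives, X and Y. This is a toy illustration, not a model fitted to any data: the function name `flmp_two_choice` and every support value (0 = clearly Y, 1 = clearly X, 0.5 = uninformative) are hypothetical.

```python
def flmp_two_choice(a: float, v: float) -> float:
    """Probability of reporting alternative X rather than Y.

    a: auditory support for X, v: visual support for X, both in [0, 1].
    0.5 means the cue carries no information; values near 0 or 1 are clear.
    Sketch of the two-alternative multiplicative rule associated with
    Massaro's FLMP; all numbers used below are made up for illustration.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("support values must lie in [0, 1]")
    support_x = a * v                  # joint support for X from both senses
    support_y = (1.0 - a) * (1.0 - v)  # joint support for Y from both senses
    total = support_x + support_y
    if total == 0.0:
        raise ValueError("one cue says certainly X, the other certainly Y")
    return support_x / total


# Hypothetical (auditory, visual) support values, chosen only to show the pattern.
cases = {
    # "get" vs "debt": /g/ and /d/ look alike, so the face is uninformative.
    "face ambiguous, sound clear": (0.9, 0.5),
    # "tack" vs "pack", or a noisy room: the sound is uninformative, the lips are clear.
    "sound ambiguous, face clear": (0.5, 0.9),
    # Both cues clear and in agreement: the combined percept is stronger than either.
    "both clear, agreeing": (0.8, 0.8),
}

for label, (a, v) in cases.items():
    print(f"{label}: P(X) = {flmp_two_choice(a, v):.2f}")
# face ambiguous, sound clear: P(X) = 0.90
# sound ambiguous, face clear: P(X) = 0.90
# both clear, agreeing: P(X) = 0.94
```

Because an uninformative cue (0.5) cancels out of the ratio, the result simply equals the other cue's support. The rule weights sound and sight identically, yet whichever cue is clearer ends up deciding, which is one way to reconcile equal weighting with each sense taking the lead at different times.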