New technology gleans the gist of stories a person hears while lying in a brain scanner

A video presents a stylized depiction of a new language decoding process. A decoder generates multiple word sequences (paper strips) and predicts how similar each candidate word sequence is to the actual word sequence (beads of light) by comparing predictions of the user’s brain responses against the actual recorded responses. Credit: Jerry Tang/Alexander Huth

Functional magnetic resonance imaging (fMRI) captures coarse, colorful snapshots of the brain in action. While this specialized type of magnetic resonance imaging has transformed cognitive neuroscience, it isn’t a mind-reading machine: neuroscientists can’t look at a brain scan and tell what someone was seeing, hearing or thinking in the scanner.

But gradually scientists are pushing against that fundamental barrier to translate internal experiences into words using brain imaging. This technology could help people who can’t speak or otherwise outwardly communicate, such as those who have suffered strokes or are living with amyotrophic lateral sclerosis. Current brain-computer interfaces require the implantation of devices in the brain, but neuroscientists hope to use non-invasive techniques such as fMRI to decipher internal speech without the need for surgery.

Now researchers have taken a step forward by combining fMRI’s ability to monitor neural activity with the predictive power of artificial intelligence language models. The hybrid technology has resulted in a decoder that can reproduce, with a surprising level of accuracy, the stories that a person listened to or imagined telling in the scanner. The decoder could even guess the story behind a short film that someone watched in the scanner, though with less accuracy.

“There’s a lot more information in brain data than we initially thought,” said Jerry Tang, a computational neuroscientist at the University of Texas at Austin and the study’s lead author, during a press briefing. The research, published on Monday in Nature Neuroscience, is what Tang describes as “a proof of concept that language can be decoded from noninvasive recordings of brain activity.”

The decoder technology is in its infancy. It must be trained extensively for each person who uses it, and it doesn’t construct an exact transcript of the words they heard or imagined. But it is still a notable advance. Researchers now know that the AI language system, an early relative of the model behind ChatGPT, can help make informed guesses about the words that evoked brain activity just by looking at fMRI brain scans. While current technological limitations prevent the decoder from being widely used, for good or ill, the authors emphasize the need to enact proactive policies that protect the privacy of one’s internal mental processes. “What we’re getting is still kind of a ‘gist,’ or more like a paraphrase, of what the original story was,” says Alexander Huth, a computational neuroscientist at the University of Texas at Austin and the study’s senior author.

Here’s an example of what one study participant heard, as transcribed in the paper: “i got up from the air mattress and pressed my face against the glass of the bedroom window expecting to see eyes staring back at me but instead finding only darkness.” Inspecting the person’s brain scans, the model went on to decode, “i just continued to walk up to the window and open the glass i stood on my toes and peered out i didn’t see anything and looked up again i saw nothing.”

“Overall, there is definitely a long way to go, but the current results are better than anything we had before in fMRI language decoding,” says Anna Ivanova, a neuroscientist at the Massachusetts Institute of Technology who was not involved in the study.

The model misses a lot about the stories it decodes. It struggles with grammatical features such as pronouns. It can’t decipher proper nouns such as names and places, and sometimes it just gets things wrong altogether. But it achieves a high level of accuracy, compared with past methods. Between 72 and 82 percent of the time in the stories, the decoder was more accurate at decoding their meaning than would be expected from random chance.

“The results just look really good,” says Martin Schrimpf, a computational neuroscientist at the Massachusetts Institute of Technology, who was not involved in the study. Previous attempts to use AI models to decode brain activity showed some success but eventually hit a wall. Here Tang’s team used “a much more accurate model of the language system,” Schrimpf says. That model is GPT-1, which came out in 2018 and was the first in the series of models that led to GPT-4, the model that now underpins ChatGPT.

Neuroscientists have been working to decipher fMRI brain scans for decades to connect with people who can’t outwardly communicate. In a key 2010 study, scientists used fMRI to pose “yes or no” questions to an individual who couldn’t control his body and outwardly appeared to be unconscious.

But decoding entire words and phrases is a more imposing challenge. The biggest roadblock is fMRI itself, which doesn’t directly measure the brain’s rapid firing of neurons but instead tracks the slow changes in blood flow that supply those neurons with oxygen. Tracking these relatively sluggish changes leaves fMRI scans temporally “blurry”: Picture a long-exposure photograph of a bustling city sidewalk, with facial features obscured by the movement. Trying to use fMRI images to determine what happened in the brain at any particular moment is like trying to identify the people in that photograph. This is a glaring problem for deciphering language, which flies by fast, with one fMRI image capturing responses to as many as 20 words or so.
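To make the “blurry” picture concrete, here is a minimal Python sketch. The speech rate (2 words per second) and the response duration (10 seconds) are illustrative assumptions chosen to reproduce the “about 20 words” figure; neither number comes from this article. The `blur` helper is a hypothetical stand-in that uses a simple trailing average, not a real model of blood flow.

```python
def words_per_image(words_per_second, response_seconds):
    """Rough count of words whose responses overlap in a single image."""
    return words_per_second * response_seconds


def blur(signal, window):
    """Trailing moving average: each sample mixes in the samples before it."""
    blurred = []
    for i in range(len(signal)):
        recent = signal[max(0, i - window + 1): i + 1]
        blurred.append(sum(recent) / len(recent))
    return blurred


# Assumed values for illustration only.
overlap = words_per_image(words_per_second=2, response_seconds=10)

# A single brief burst of activity gets smeared across later samples.
smeared = blur([0, 0, 3, 0, 0], window=3)
```

In this toy version, one sharp spike spreads over three consecutive samples, which is why a single image cannot be pinned to a single word.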

Now it appears that the predictive abilities of AI language models can help. In the new study, three participants lay stock-still in an fMRI scanner for 15 sessions that totaled 16 hours. Through headphones, they listened to excerpts from podcasts and radio shows such as The Moth Radio Hour and the New York Times’ Modern Love. Meanwhile the scanner tracked the blood flow across different language-related regions of the brain. These data were then used to train an AI model that found patterns in how each subject’s brain activated in response to certain words and concepts.

After uncovering these patterns, the model took a new series of brain images and predicted what a person was hearing at the time they were taken. It worked gradually through the story, comparing the new scans to the AI’s predicted patterns for a host of candidate words. To prevent having to check every word in the English language, the researchers used GPT-1 to predict which words were most likely to appear in a particular context. This created a small pool of possible word sequences, from which the most likely candidate could be chosen. Then GPT-1 moved on to the next string of words until it had decoded an entire story.
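The guess-and-check loop described above resembles a search strategy known as beam search, and it can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers’ code: the vocabulary and its two-number “meaning” vectors are made up, `propose_next_words` is a hypothetical stand-in for GPT-1, `predict_response` is a hypothetical stand-in for the per-person model of brain responses, and the sketch pretends there is one brain response per word, which real fMRI does not provide.

```python
import math

# Hypothetical toy vocabulary: each word gets a made-up 2-D "meaning" vector.
WORD_FEATURES = {
    "i": (0.1, 0.0),
    "saw": (0.5, 0.2),
    "nothing": (0.0, 1.0),
    "darkness": (0.1, 0.9),
    "window": (0.9, 0.1),
}


def propose_next_words(sequence):
    """Stand-in for the language model: list plausible next words.

    This toy rule only forbids repeating the previous word.
    """
    previous = sequence[-1] if sequence else None
    return [word for word in WORD_FEATURES if word != previous]


def predict_response(sequence):
    """Stand-in for a per-person model of brain responses.

    Averages the features of the last two words, mimicking how a slow
    signal blends neighboring words together.
    """
    recent = sequence[-2:]
    return tuple(
        sum(WORD_FEATURES[word][dim] for word in recent) / len(recent)
        for dim in range(2)
    )


def decode(recorded_responses, beam_width=3):
    """Keep only the few word sequences whose predictions best match the data."""
    beam = [(0.0, [])]  # (cumulative score, word sequence)
    for recorded in recorded_responses:
        candidates = []
        for score, sequence in beam:
            for word in propose_next_words(sequence):
                extended = sequence + [word]
                # A higher (less negative) score means a closer match.
                match = -math.dist(predict_response(extended), recorded)
                candidates.append((score + match, extended))
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1]


def simulate_recording(words):
    """Fake 'recorded' responses generated from a known story, one per word."""
    return [predict_response(words[: k + 1]) for k in range(len(words))]


decoded = decode(simulate_recording(["i", "saw", "nothing"]))
```

Because the fake recordings here are noise-free, the sketch recovers the exact words. Real brain data are noisy, which is one reason the actual decoder produces a paraphrase rather than a transcript.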

The researchers used the same methods to decode stories that participants only imagined telling. They instructed participants to picture themselves narrating a detailed, one-minute story. While the decoder’s accuracy decreased, it still worked better than expected, compared with random chance. This indicates that similar brain regions are involved in imagining something versus simply perceiving it. The ability to translate imagined speech into words is critical for designing brain-computer interfaces for people who are unable to communicate with language.

What’s more, the findings went beyond language. In the most surprising result, the researchers had people watch animated short films without sound in the scanner. Despite being trained explicitly on spoken language, the decoder could still decipher stories from brain scans of participants watching the silent movies. “I was more surprised by the video than the imagined speech,” Huth says, because the movies were muted. “I think we are decoding something that is deeper than language,” he said at the press briefing.

Still, the technology is many years away from being used as a brain-computer interface in everyday life. For one thing, the scanning technology isn’t portable: fMRI machines occupy entire rooms at hospitals and research institutions and cost millions of dollars. But Huth’s team is working to adapt these findings for existing brain-imaging systems that can be worn like a cap, such as functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG).

The technology in the new study also requires intense customization, with hours of fMRI data needed for each individual. “It’s not like earbuds, where you can just put them in, and they work for you,” Schrimpf says. With each user, the AI models need to be trained to “adapt and adjust to your brain,” he adds. Schrimpf guesses that the technology will require less customization as researchers uncover commonalities across people’s brains in the future. Huth, by contrast, thinks that the more accurate models will be more detailed, requiring even more precise customization.

The team also tested the technology to see what might happen if someone wanted to resist or sabotage the scans. A study participant could spoof it by just telling another story in their head. When the researchers asked participants to do this, the results were gibberish, Huth says. “[The decoder] just kind of fell apart entirely.”

Even at this early stage, the authors stress the importance of considering policies that protect the privacy of our inner words and thoughts. “This can’t work yet to do really nefarious things,” Tang says, “but we don’t want to let it get to that point before we maybe have policies in place that would prevent that.”



Allison Parshall is a science journalist, multimedia editor, podcast host and current news intern at Scientific American. Follow her on Twitter @parshallison
