Artificial Intelligence

AI Decodes Brain Activity Into Speech With High Accuracy

Brain-computer interface uses AI to transform brain activity into speech.

Posted September 4, 2023 | Reviewed by Kaja Perina

Image source: Geralt/Pixabay

A new study published in this month’s Journal of Neural Engineering demonstrates how a brain-computer interface (BCI) uses deep learning, a form of artificial intelligence (AI), to translate brain activity into speech, with individual words decoded from the reconstructed speech at up to 100% accuracy.

“The present study demonstrates that high accuracy and robust decoding can be achieved on rather small datasets (10 repetitions of 12 words) if using speech reconstructions for classification,” wrote lead author Julia Berezutskaya, a postdoctoral researcher at Radboud University Donders Institute for Brain, Cognition and Behavior and University Medical Centre (UMC) Utrecht Brain Centre, along with Zachary V Freudenburg, Mariska J Vansteensel, Erik Aarnoutse, Nick Ramsey, and Marcel van Gerven. “These results highlight the potential of this approach for further use in BCI.”

Brain-computer interfaces, also called brain-machine interfaces (BMIs), offer hope to those who have lost the ability to speak or move by decoding patient intentions from brain activity in order to operate and control robotic limbs, computer software applications such as email, and other external devices.

“Thus far, no comprehensive study on optimization of deep learning models for speech reconstruction has been performed,” the researchers wrote. “Moreover, there is a lack of consensus regarding choices of brain and audio speech features that are used in such models.”

For this study, the team validated and enhanced a neural decoding method that reconstructs speech from high-density electrocorticography recordings of brain activity in the sensorimotor cortex during speech production.

“Understanding which decoding strategies deliver best and directly applicable results is crucial for advancing the field,” the scientists wrote.

The speech reconstruction models took brain activity data as input and produced speech spectrograms, which are visual representations of how the frequency content of a sound changes over time. The brain activity data came from high-density electrocorticography (HD ECoG) recordings of the sensorimotor area in five participants, each of whom spoke 12 words out loud ten times. The participants had implanted HD ECoG grids, and the recordings were made with the NeuroPort neural recording system by Blackrock Microsystems.
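For readers unfamiliar with spectrograms, the idea can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the study's audio pipeline: the function name `spectrogram`, the frame length, the hop size, and the synthetic 1,000 Hz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame."""
    # Hann window: tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Discrete Fourier transform, keeping the non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a pure 1,000 Hz tone sampled at 8,000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
# Find the frequency bin with the most energy in the first time frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames;", len(spec[0]), "bins; peak near", peak_hz, "Hz")
```

Each row of `spec` is one slice of time and each column is one frequency band, which is the grid a spectrogram image displays. A spoken word would produce a pattern that changes from frame to frame rather than the single steady peak of this test tone.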

The researchers evaluated three different deep learning speech reconstruction models: a sequence-to-sequence (S2S) recurrent neural network (RNN), a multilayer perceptron (MLP), and a DenseNet (DN) convolutional neural network (CNN).

Across all of the models, machine learning classifiers decoded individual words from the reconstructed speech with 92% to 100% accuracy, according to the scientists. They also found that more accurate speech reconstruction requires more complex deep neural network models.
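To put those percentages in context, it helps to compare them with chance. The sketch below is illustrative only: the `accuracy` helper and the six simulated errors are invented for the example and are not data from the study. It assumes each participant's 12 words spoken ten times give 120 balanced trials, so guessing at random would be right about one time in 12.

```python
def accuracy(predicted, actual):
    """Fraction of trials where the predicted word matches the spoken word."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


n_words = 12
n_repetitions = 10
n_trials = n_words * n_repetitions   # 120 spoken-word trials per participant
chance_level = 1 / n_words           # about 0.083 when every word is equally likely

# Hypothetical labels: word indices 0..11, each repeated ten times.
actual = [word for word in range(n_words) for _ in range(n_repetitions)]

# Hypothetical classifier output: correct except for six made-up mistakes.
predicted = list(actual)
for i in (0, 15, 37, 64, 99, 118):
    predicted[i] = (actual[i] + 1) % n_words

acc = accuracy(predicted, actual)    # 114 correct out of 120
print(f"accuracy {acc:.1%} vs. chance {chance_level:.1%}")
```

With only 120 trials, every misclassified word moves the score by a little under one percentage point, which is worth keeping in mind when reading accuracy figures from small datasets.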

The multilayer perceptron (MLP), with its relatively simple architecture of basic linear operations followed by a non-linear activation function, was outperformed by models that use more complex computational operations: the recurrent sequence-to-sequence model, with its attention mechanism and state memory, and the convolutional DenseNet, with its skip connections and local convolutions.
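The difference between a plain perceptron layer and a skip connection can be sketched as follows. This is a simplified illustration, not the architecture used in the study: the function names, the choice of ReLU as the activation, and the toy weights are assumptions, and a real DenseNet applies convolutions rather than the fully connected layer used here.

```python
def linear(x, weights, bias):
    """Basic linear operation: one weighted sum plus a bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Non-linear activation: negative values are replaced by zero."""
    return [max(0.0, v) for v in values]


def mlp_layer(x, weights, bias):
    """One perceptron layer: linear operation followed by the activation."""
    return relu(linear(x, weights, bias))


def dense_skip_layer(x, weights, bias):
    """DenseNet-style skip connection: the layer's input is passed along
    unchanged, concatenated with the layer's output."""
    return x + mlp_layer(x, weights, bias)


# Toy input and parameters, chosen only to make the arithmetic easy to follow.
x = [1.0, -2.0]
weights = [[1.0, 0.5], [-1.0, 1.0]]
bias = [0.5, 0.0]

plain = mlp_layer(x, weights, bias)           # [0.5, 0.0]
skipped = dense_skip_layer(x, weights, bias)  # [1.0, -2.0, 0.5, 0.0]
print(plain, skipped)
```

In the plain layer, the original input is gone once the layer has run. With the skip connection, later layers still see the input alongside the new features, which is one reason such networks can be made deeper and more expressive.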

The study results suggest that the combination of artificial intelligence and a brain-computer interface for direct speech reconstruction from brain activity in the sensorimotor area yields highly accurate word decoding.

“These results have the potential to further advance the state-of-the-art in speech decoding and reconstruction for subsequent use in BCIs for communication in individuals with severe motor impairments,” the research team concluded.