Columbia neuroengineers have created a system that translates thought into intelligible, recognizable speech: devices monitor brain activity while artificial intelligence reconstructs the words a person hears. By combining speech synthesizers with artificial intelligence, the breakthrough could lead to new ways for computers to communicate directly with the brain.
People with amyotrophic lateral sclerosis (ALS) or those recovering from a stroke could regain their ability to communicate with the outside world.
The researchers plan to test more complicated words and sentences next, and they want to run the same tests on brain signals recorded while a person speaks or imagines speaking. Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer's thoughts directly into words.
The researchers mapped electrode positions onto brain anatomy by registering the post-implant computed tomography (CT) scan to the pre-implant MRI via the post-operative MRI. After coregistration, the electrodes were identified on the post-implant CT using BioImage Suite, and the subdural grid and strip electrodes were snapped to the closest points on the brain surface reconstructed from the pre-implant MRI.
They used a common deep neural network architecture that consists of two stages: feature extraction and feature summation.
They tested five different architectures for the feature-extraction stage of the network:
1. fully connected network (FCN, also known as the multilayer perceptron, or MLP),
2. locally connected network (LCN),
3. convolutional neural network (CNN),
4. FCN + CNN, and
5. FCN + LCN.
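The difference between the three base layer types comes down to connectivity and weight sharing: a fully connected layer lets every output see every input, a convolutional layer slides one small shared filter across the input, and a locally connected layer uses small receptive fields like a CNN but learns a separate filter at each position. A minimal NumPy sketch of one layer of each (toy sizes, random weights, not the paper's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)  # toy 16-sample neural feature vector

# Fully connected: every output unit sees every input (one dense weight matrix).
W_fc = rng.standard_normal((8, 16))
y_fc = np.maximum(W_fc @ x, 0)  # ReLU nonlinearity

# Convolutional: one small 3-tap filter, shared across all positions.
w_conv = rng.standard_normal(3)
y_conv = np.maximum(np.convolve(x, w_conv, mode="valid"), 0)

# Locally connected: 3-tap receptive fields like the CNN,
# but a *different* filter at each of the 14 positions (no weight sharing).
W_loc = rng.standard_normal((14, 3))
windows = np.lib.stride_tricks.sliding_window_view(x, 3)  # shape (14, 3)
y_loc = np.maximum(np.sum(W_loc * windows, axis=1), 0)

print(y_fc.shape, y_conv.shape, y_loc.shape)  # (8,) (14,) (14,)
```

The FCN + CNN and FCN + LCN variants simply stack a fully connected front end with one of the locally structured layer types.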
For auditory spectrogram reconstruction, they directly regressed the 128 frequency bands using a multilayer FCN model for feature extraction.
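A multilayer FCN regressing all 128 frequency bands at once can be sketched as a stack of affine layers with ReLU nonlinearities and a linear output layer. The input and hidden sizes below are placeholders, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0)

# Hypothetical sizes: n_feat neural features in, 128 auditory spectrogram
# frequency bands out; the hidden width is an illustrative placeholder.
n_feat, hidden, n_bands = 100, 256, 128
W1 = rng.standard_normal((hidden, n_feat)) * 0.1
W2 = rng.standard_normal((hidden, hidden)) * 0.1
W3 = rng.standard_normal((n_bands, hidden)) * 0.1

def fcn_forward(x):
    """Multilayer FCN: two ReLU hidden layers, then a linear output
    layer that regresses all 128 frequency bands in one shot."""
    h = relu(W1 @ x)
    h = relu(W2 @ h)
    return W3 @ h  # linear output, suitable for regression

x = rng.standard_normal(n_feat)     # one time step of neural features
spectrogram_frame = fcn_forward(x)
print(spectrogram_frame.shape)      # (128,)
```

Each forward pass maps one time step of neural features to one 128-band spectrogram frame; stepping through time yields the full reconstructed auditory spectrogram.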
DNN training and cross-validation
The networks were implemented in Keras with a TensorFlow backend. The weights were initialized using a previously proposed method developed specifically for deep multilayer networks with rectified linear units (ReLUs) as their nonlinearities.
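The initialization scheme referenced here matches He initialization for ReLU networks: zero-mean Gaussian weights with variance 2/fan_in, which keeps activation variance roughly constant across deep ReLU stacks. A NumPy sketch of the idea (assuming this is the scheme the authors mean):

```python
import numpy as np

rng = np.random.default_rng(42)

def he_init(fan_in, fan_out):
    """He-style initialization for ReLU layers: zero-mean Gaussian
    weights scaled so the variance is 2 / fan_in."""
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

W = he_init(512, 512)
# Empirical variance of the sampled weights should sit near 2/512.
print(W.var())
```

In Keras this corresponds to the built-in `he_normal` initializer, which can be passed to a layer via `kernel_initializer="he_normal"`.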
From the researchers' paper:

Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish a direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications.

To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges.

Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communications for paralyzed patients but also have the potential to transform human-computer interaction technologies.
Brian Wang is a futurist thought leader and a popular science blogger with 1 million readers per month. His blog, Nextbigfuture.com, is ranked the #1 science news blog. It covers many disruptive technologies and trends including space, robotics, artificial intelligence, medicine, anti-aging biotechnology, and nanotechnology.
Known for identifying cutting-edge technologies, he is currently a co-founder of a startup and a fundraiser for high-potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.