The lateral surface of the human cortex contains neural populations that encode key representations of both perceived and produced speech Recent investigations of the underlying mechanisms of these speech representations have shown that acoustic and phonemic speech content can be decoded directly from neural activity in superior temporal gyrus (STG) and surrounding secondary auditory regions during listening. Similarly, activity in ventral sensorimotor cortex (vSMC) can be used to decode characteristics of produced speech based primarily on kinematic representations of the supralaryngeal articulators and the larynx for voicing and pitch17. A major challenge for these approaches is achieving high single-trial accuracy rates, which is essential for a clinically relevant implementation to aid individuals who are unable to communicate due to injury or neurodegenerative disorders.
Recently, speech decoding paradigms have been implemented in real-time applications, including the ability to map speech-evoked sensorimotor activations, generate neural encoding models of perceived phonemes, decode produced isolated phonemes, detect voice activity, and classify perceived sentences. These demonstrations are important steps toward the development of a functional neuroprosthesis for communication that decodes speech directly from recorded neural signals. However, to the best of our knowledge there have not been attempts to decode both perceived and produced speech from human participants in a real-time setting that resembles natural communication. Multimodal decoding of natural speech may have important practical implications for individuals who are unable to communicate due to stroke, neurodegenerative disease, or other causes. Despite advances in the development of assistive communication interfaces that restore some communicative capabilities to impaired patients via non-invasive scalp electroencephalography, invasive microelectrode recordings, electrocorticography (ECoG), and eye tracking methodologies, to date there is no speech prosthetic system that allows users to have interactions on the rapid timescale of human conversation.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance’s identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
SOURCES- Facebook, Nature Communications
Written By Brian Wang, Nextbigfuture.com