A perceived social partner alters the neural processing of speech

Mounting evidence suggests that social interaction changes how communicative behaviors (e.g., spoken language, gaze) are processed, but the precise neural bases by which social-interactive context may alter communication remain unknown. Various perspectives suggest that live interactions are more rewarding, more attention-grabbing, or require increased mentalizing—thinking about the thoughts of others. Dissociating between these possibilities is difficult because most extant neuroimaging paradigms examining social interaction have not directly compared live paradigms to conventional “offline” (or recorded) paradigms. We developed a novel fMRI paradigm to assess whether and how an interactive context changes the processing of speech matched in content and vocal characteristics. Participants listened to short vignettes—which contained no reference to people or mental states—believing that some vignettes were prerecorded and that others were presented over a real-time audio-feed by a live social partner. In actuality, all speech was prerecorded. Simply believing that speech was live increased activation in each participant’s own mentalizing regions, defined using a functional localizer. Contrasting live to recorded speech did not reveal significant differences in attention or reward regions. Further, higher levels of autistic-like traits were associated with altered neural specialization for live interaction. These results suggest that humans engage in ongoing mentalizing about social partners, even when such mentalizing is not explicitly required, illustrating how social context shapes social cognition. Understanding communication in social context has important implications for typical and atypical social processing, especially for disorders like autism where social difficulties are more acute in live interaction.


The Role of Corticostriatal Systems in Speech Category Learning

One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning.

The auditory representation of speech sounds in human motor cortex

In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.


Dissociating Contributions of the Motor Cortex to Speech Perception and Response Bias by Using Transcranial Magnetic Stimulation

In the present TMS study, we addressed this question by using signal detection theory to calculate sensitivity (i.e., d′) and response bias (i.e., criterion c). We used repetitive TMS to temporarily disrupt the lip or hand representation in the left motor cortex. Participants discriminated pairs of sounds from a “ba”–“da” continuum before TMS, immediately after TMS (i.e., during the period of motor disruption), and after a 30-min break. We found that the sensitivity for between-category pairs was reduced during the disruption of the lip representation. In contrast, disruption of the hand representation temporarily reduced response bias.

fMRI activations in inferior temporal lobe during intelligible speech comprehension

The aim of this study was to use intelligible and unintelligible (spectrally rotated) sentences to determine if the vATL could be detected during a passive speech comprehension task using a dual-echo acquisition. A whole brain analysis for an intelligibility contrast showed bilateral superior temporal lobe activations and a cluster of activation within the left vATL.

Rapid and automatic speech-specific learning mechanism in human neocortex

We found a robust index of neurolexical memory-trace formation: a rapid enhancement of the brain's activation elicited by novel words during a short (~ 30 min) perceptual exposure, underpinned by fronto-temporal cortical networks, and, importantly, correlated with behavioural learning outcomes. Crucially, this neural memory trace build-up took place regardless of focused attention on the input or any pre-existing or learnt semantics.

Stimulus-independent semantic bias misdirects word recognition in older adults

Rogers & Wingfield in JASA:

Older adults' normally adaptive use of semantic context to aid in word recognition can have a negative consequence of causing misrecognitions, especially when the word actually spoken sounds similar to a word that more closely fits the context. Word-pairs were presented to young and older adults, with the second word of the pair masked by multi-talker babble varying in signal-to-noise ratio. Results confirmed older adults' greater tendency to misidentify words based on their semantic context compared to the young adults, and to do so with a higher level of confidence. This age difference was unaffected by differences in the relative level of acoustic masking.

Prediction across sensory modalities: A neurocomputational model of the McGurk effect ($)

Here we assessed the role of dynamic cross-modal predictions in the outcome of AV speech integration using a computational model that processes continuous audiovisual speech sensory inputs in a predictive coding framework. The model involves three processing levels: sensory units, units that encode the dynamics of stimuli, and multimodal recognition/identity units. The model exhibits a dynamic prediction behavior because evidence about speech tokens can be asynchronous across sensory modality, allowing for updating the activity of the recognition units from one modality while sending top–down predictions to the other modality. We explored the model's response to congruent and incongruent AV stimuli and found that, in the two-dimensional feature space spanned by the speech second formant and lip aperture, fusion stimuli are located in the neighborhood of congruent /ada/, which therefore provides a valid match. Conversely, stimuli that lead to combination percepts do not have a unique valid neighbor. In that case, acoustic and visual cues are both highly salient and generate conflicting predictions in the other modality that cannot be fused, forcing the elaboration of a combinatorial solution.

Encoding speech sequence probability in human temporal cortex

In my quick first pass, this seems like a nice demonstration of phonotactic probability (likelihood of auditory transitions) being reflected in superior temporal gyrus. Though, the effects of lexicality suggests something more than pure transition probability is going on here.

Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge.

Divided attention disrupts perceptual encoding during speech recognition

Adding cognitive load increased the likelihood that listeners would select a word acoustically similar to the target even though its frequency was lower than that of the target. Thus, there was no evidence that cognitive load led to a high-frequency response bias. Rather, cognitive load seems to disrupt sublexical encoding, possibly by impairing perceptual acuity at the auditory periphery.