The timing of regular sequences: Production, perception, and covariation

The temporal structure of behavior provides information that allows temporal regularity to be tracked in the sensory and sensorimotor domains. In turn, temporal regularity allows predictions to be generated about upcoming events and behavior to be adjusted accordingly. These mechanisms are essential for behavior that goes beyond mere reaction; however, adequate internal representations of temporal structure can only be established through efficient temporal processing. The current study used two simple paradigms, namely, finger-tapping at a regular self-chosen rate (spontaneous motor tempo) and event-related potentials (ERPs) of the EEG recorded during attentive listening to temporally regular and irregular “oddball” sequences, to explore the capacity to encode and use temporal regularity in production and perception. The results show that specific aspects of the ability to time a regular sequence of events in production covary with the ability to time a regular sequence in perception, probably pointing toward the engagement of domain-general mechanisms.
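In spontaneous-motor-tempo paradigms, the quantities of interest are typically the mean inter-tap interval (the produced tempo) and its variability. A minimal sketch of how such metrics can be computed from a train of tap timestamps is given below; the tap times and the choice of the coefficient of variation as the regularity measure are illustrative assumptions, not taken from this study.

```python
import statistics

def smt_metrics(tap_times):
    """Compute the mean inter-tap interval (ITI, in seconds) and its
    coefficient of variation (CV) from a list of tap timestamps."""
    itis = [b - a for a, b in zip(tap_times, tap_times[1:])]
    mean_iti = statistics.mean(itis)
    # Lower CV = more regular tapping, independent of the chosen tempo
    cv = statistics.stdev(itis) / mean_iti
    return mean_iti, cv

# Hypothetical tap train at a self-chosen ~2 Hz rate, with slight jitter
taps = [0.00, 0.51, 1.00, 1.52, 2.01, 2.50]
mean_iti, cv = smt_metrics(taps)
```

Normalizing the variability by the mean interval (CV rather than raw SD) lets tapping regularity be compared across participants who choose different spontaneous tempi.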

Prediction across sensory modalities: A neurocomputational model of the McGurk effect

Here we assessed the role of dynamic cross-modal predictions in the outcome of audiovisual (AV) speech integration using a computational model that processes continuous AV speech inputs in a predictive coding framework. The model involves three processing levels: sensory units, units that encode stimulus dynamics, and multimodal recognition/identity units. The model exhibits dynamic prediction behavior because evidence about speech tokens can be asynchronous across sensory modalities, allowing the model to update recognition-unit activity from one modality while sending top-down predictions to the other. We explored the model's response to congruent and incongruent AV stimuli and found that, in the two-dimensional feature space spanned by the speech second formant and lip aperture, fusion stimuli are located in the neighborhood of the congruent /ada/ template, which therefore provides a valid match. Conversely, stimuli that lead to combination percepts have no unique valid neighbor: acoustic and visual cues are both highly salient and generate conflicting predictions in the other modality that cannot be fused, forcing the elaboration of a combinatorial solution.
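The feature-space argument can be illustrated with a static, toy Bayesian-fusion sketch (it omits the dynamic, asynchronous top-down updating that is central to the full model). The token templates in (second formant, lip aperture) space and the noise widths below are invented for illustration only.

```python
import math

# Hypothetical token templates in the 2-D feature space from the abstract:
# (second formant F2 in Hz, lip aperture in arbitrary units).
TOKENS = {"aba": (1100.0, 0.2), "ada": (1700.0, 0.5), "aga": (2300.0, 0.9)}

def likelihood(observed, template, sigma):
    """Unnormalized Gaussian likelihood of one feature given a template."""
    return math.exp(-((observed - template) ** 2) / (2 * sigma ** 2))

def recognize(f2, lip, sigma_f2=300.0, sigma_lip=0.25):
    """Combine auditory (F2) and visual (lip aperture) evidence over the
    shared recognition units; return a normalized posterior per token."""
    post = {}
    for tok, (t_f2, t_lip) in TOKENS.items():
        post[tok] = likelihood(f2, t_f2, sigma_f2) * likelihood(lip, t_lip, sigma_lip)
    z = sum(post.values())
    return {tok: p / z for tok, p in post.items()}

# McGurk-style fusion stimulus: auditory /aba/ (low F2) dubbed onto visual
# /aga/ (wide lip aperture). Neither cue's own template fits both features;
# the intermediate /ada/ template is the nearest joint neighbor.
posterior = recognize(f2=1100.0, lip=0.9)
winner = max(posterior, key=posterior.get)
```

With these assumed templates, the incongruent pair is best explained by the intermediate /ada/ unit, mirroring the abstract's point that fusion percepts arise when a single template is a valid neighbor of both cues.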