Interspeech 2019 Special Session: Dynamics of Emotional Speech Exchanges in Multimodal Communication

Graz, Austria, Sep. 15-19, 2019

Special Session Format:

The format of the special session allows 20 minutes for each accepted presentation, including questions.
A round table is planned (scheduled time: 1 hour), on the theme: “The dependability of voice in interactional exchanges”.

This Special Session is promoted by the H2020 EMPATHIC RIA (grant 769872) and H2020 MENHIR MSCA-RISE (grant 823907) actions.

A special issue of the journal Computer Speech and Language is foreseen as an outcome of this special session.

Important Dates:

  • March 29, 2019 Paper submission deadline
  • April 5, 2019 Final paper submission deadline
  • June 17, 2019 Acceptance/rejection notification
  • July 1, 2019 Camera-ready paper due

Paper Submission:

Guidelines for paper submissions and paper presentations, and a detailed author kit, can be found at the Interspeech web site.

Papers must be submitted through the Interspeech electronic paper submission system.

Please indicate that the paper should be included in the Special Session on: Dynamics of Emotional Speech Exchanges in Multimodal Communication



Program Chairs and Contacts:

ANNA ESPOSITO, Università della Campania “Luigi Vanvitelli”, Italy

MARIA INÉS TORRES, Universidad del País Vasco UPV/EHU, Spain

OLGA GORDEEVA, Acapela Group, Belgium

RAQUEL JUSTO, Universidad del País Vasco UPV/EHU, Spain

ZORAIDA CALLEJAS CARRIÓN, Universidad de Granada, Spain

KRISTIINA JOKINEN, AIST AI Research Center in Tokyo, Japan

GENNARO CORDASCO, Università della Campania “Luigi Vanvitelli”, Italy

BJÖRN SCHULLER, ICL, UK, and University of Augsburg, Germany

CARL VOGEL, Trinity College Dublin, Ireland

ALESSANDRO VINCIARELLI, University of Glasgow, Glasgow, UK

GERARD CHOLLET, Intelligent Voice, London, UK

NEIL GLACKIN, Intelligent Voice LTD, London, UK

Interspeech 2019 will be held in Graz (Austria) from September 15 to 19, 2019. We invite you to submit your work to a special session of Interspeech 2019, Dynamics of Emotional Speech Exchanges in Multimodal Communication.


Emotional expression plays a vital role in creating social linkages, producing cultural exchanges, influencing relationships and communicating experiences. Emotional information is transmitted and perceived simultaneously through verbal (the semantic content of a message as well as its linguistic form) and nonverbal (non-linguistic vocalizations, voice quality, facial expressions, gestures, paralinguistic information, turn-taking, response selection) communicative channels. These channels each constitute communication modes.

Research devoted to understanding the relationship between verbal and nonverbal communication modes, and investigating the perceptual and cognitive processes involved in the coding/decoding of emotional states (as well as their mathematical modelling and algorithmic implementation) is particularly relevant in the fields of Human-Human and Human-Computer Interaction for developing friendly and emotionally coloured technologies, whether assistive or entertainment-oriented.

When it comes to speech, it is clear that the same linguistic expression may be uttered for teasing, challenging, stressing, supporting, inquiring, answering or expressing genuine doubt. The appropriate continuation of the interaction depends on detecting the addresser’s mood.

To progress towards a better understanding and modelling of such interactional facets of communication, there is a need for more accurate solutions to the following challenges, which make this session special:

  1. Identify signal processing algorithms able to capture emotional features from multimodal social signals and, in particular, from speech, realize a coherent multimodal fusion of such features, and produce coherent emotional responses;
  2. Implement fast and efficient computational models trained to classify vocal emotional features retaining their hierarchically structured, time-dependent and reciprocally connected relationships from multimodal channels;
  3. Identify the emotional and empathic contents (either successful or unsuccessful) underpinning daily interactional exchanges in order to generate affective models of them for user-centered human-machine interaction, and assistive ICT interfaces;
  4. Build models that integrate emotional behaviour in interaction strategies (elicit emotional response, react to emotion, favour engagement and rapport);
  5. Explore what kind of impact affective user models have on the development of practical applications that reproduce emotional behaviour;
  6. Identify relevant ethical aspects and discuss the societal impact of affective technology.

The themes of this special session are multidisciplinary in nature and closely connected in their final aim of identifying features from realistic dynamics of emotional speech exchanges. They include formal and informal social signals, communication modes, hearing processes, and physical or cognitive functionalities. Of particular further interest are analyses of visual, textual and audio information and corresponding computational efforts to automatically detect and interpret their semantic and pragmatic contents. Related applications of these interdisciplinary facets are ICT systems and interfaces able to detect the health and affective states of their users, interpret their psychological and behavioural patterns, and support them through positively designed interventions to improve their quality of life.

Themes include but are not limited to:

  • Vocal signals for detecting affective well-being and emotional states
  • Interpretation of features of interaction
  • Detection of health and psychological states from speech-based interaction
  • Speech communication to identify and/or manage emotional disorders
  • Empathic voice user interfaces
  • Quantification, analysis and/or promotion of engagement and rapport
  • Context effects in detecting emotional vocal expressions
  • Supervised and unsupervised learning algorithms in affective speech systems
  • Human and/or machine encoding and decoding of vocal behavioural patterns
  • Age, language and cultural variability in daily speech expressions
  • Spontaneous and acted speech databases
  • Emotional voices in social networks
  • Affective and emotional tagging of spoken databases (with or without interaction)
  • Semantics and extraction of emotional information from text
  • Emotional speech in human-machine interaction
  • Models for managing emotion in human-machine interaction
  • Generation of affective user models
  • Emotion in human and machine conversational behaviour (grounding, turn-taking, dialogue act selection…)
  • Embedding emotion in dialogue strategies and dialogue management

The development of these themes promises to substantially improve interaction with technologies likely to become part of our everyday life in the coming years, including virtual assistants like Alexa or Siri, social robots and embodied conversational agents.

Another factor that makes the session special is the diversity of disciplinary backgrounds of the likely contributors: psychologists, health scientists, computer scientists, cognitive scientists and philosophers. Such an interdisciplinary confluence at an event that is traditionally more engineering-oriented may bring a richer discussion than one might otherwise expect.
