The format of the special session allows 20 minutes for each accepted presentation, including questions.
A round table is planned (scheduled time: 1 hour), on the theme: “The dependability of voice in interactional exchanges”.
This Special Session is promoted by the H2020 EMPATHIC RIA (grant 769872, http://www.empathic-project.eu/) and H2020 MENHIR MSCA-RISE (grant 823907) actions.
A special issue of the journal Computer Speech and Language is foreseen as an outcome of this special session.
Guidelines for paper submissions and paper presentations, and a detailed author kit, can be found at the Interspeech web site: https://www.interspeech2019.org/
Papers must be submitted through the Interspeech electronic paper submission system.
Please indicate that the paper should be included in the Special Session on: Dynamics of Emotional Speech Exchanges in Multimodal Communication
Graz, Austria, Sep. 15-19, 2019.
ANNA ESPOSITO, Università della Campania “Luigi Vanvitelli”, Italy, email@example.com, firstname.lastname@example.org
MARIA INÉS TORRES, Universidad del País Vasco UPV/EHU, Spain, email@example.com
OLGA GORDEEVA, Acapela Group, Belgium, firstname.lastname@example.org
RAQUEL JUSTO, Universidad del País Vasco UPV/EHU, Spain, email@example.com
ZORAIDA CALLEJAS CARRIÓN, Universidad de Granada, Spain, firstname.lastname@example.org
KRISTIINA JOKINEN, AIST AI Research Center in Tokyo, Japan, email@example.com
GENNARO CORDASCO, Università della Campania “Luigi Vanvitelli”, Italy, firstname.lastname@example.org
BJÖRN SCHULLER, ICL, UK, and University of Augsburg, Germany, email@example.com
CARL VOGEL, Trinity College Dublin, Ireland, firstname.lastname@example.org
ALESSANDRO VINCIARELLI, University of Glasgow, Glasgow, UK, Alessandro.Vinciarelli@glasgow.ac.uk
GERARD CHOLLET, Intelligent Voice, London, UK, email@example.com
NEIL GLACKIN, Intelligent Voice LTD, London, UK, firstname.lastname@example.org
Emotional expression plays a vital role in creating social linkages, producing cultural exchanges, influencing relationships and communicating experiences. Emotional information is transmitted and perceived simultaneously through verbal (the semantic content of a message as well as its linguistic form) and nonverbal (non-linguistic vocalizations, voice quality, facial expressions, gestures, paralinguistic information, turn-taking, response selection) communicative channels. These channels each constitute communication modes.
Research devoted to understanding the relationship between verbal and nonverbal communication modes, and investigating the perceptual and cognitive processes involved in the coding/decoding of emotional states (as well as their mathematical modelling and algorithmic implementation) is particularly relevant in the fields of Human-Human and Human-Computer Interaction for developing friendly and emotionally coloured technologies, whether assistive or entertainment-oriented.
When it comes to speech, it is clear that the same linguistic expression may be uttered for teasing, challenging, stressing, supporting, inquiring, answering or expressing genuine doubt. The appropriate continuation of the interaction depends on detecting the addresser’s mood.
To progress towards a better understanding and modelling of such interactional facets of communication, more accurate solutions are needed to the following challenges, which make this special session special:
Identify signal processing algorithms able to capture emotional features from multimodal social signals and, in particular, from speech, realize a coherent multimodal fusion of such features, and produce coherent emotional responses;
Implement fast and efficient computational models trained to classify vocal emotional features retaining their hierarchically structured, time-dependent and reciprocally connected relationships from multimodal channels;
Identify the emotional and empathic contents (either successful or unsuccessful) underpinning daily interactional exchanges in order to generate affective models of them for user-centered human-machine interaction, and assistive ICT interfaces;
Build models that integrate emotional behaviour in interaction strategies (elicit emotional response, react to emotion, favour engagement and rapport);
Explore what kind of impact affective user models have on the development of practical applications that reproduce emotional behaviour;
Identify relevant ethical aspects and discuss the societal impact of affective technology.
The themes of this special session are multidisciplinary in nature, and closely connected in their final aim of identifying features from realistic dynamics of emotional speech exchanges. They include formal and informal social signals, communication modes, hearing processes, and physical or cognitive functionalities. Of particular further interest are analyses of visual, textual and audio information and corresponding computational efforts to automatically detect and interpret their semantic and pragmatic contents. Related applications of these interdisciplinary facets are ICT systems and interfaces able to detect the health and affective states of their users, interpret their psychological and behavioural patterns, and support them through positively designed interventions to improve their quality of life.
Themes include but are not limited to:
Vocal signals for detecting affective well-being and emotional states
Interpretation of features of interaction
Detection of health and psychological states from speech-based interaction
Speech communication to identify and/or manage emotional disorders
Empathic voice user interfaces
Quantification, analysis and/or promotion of engagement and rapport
Context effects in detecting emotional vocal expressions
Supervised and unsupervised learning algorithms in affective speech systems
Human and/or machine encoding and decoding of vocal behavioural patterns
Age, language and cultural variability in daily speech expressions
Spontaneous and acted speech databases
Emotional voices in social networks
Affective and emotional tagging of spoken databases (with or without interaction)
Semantics and extraction of emotional information from text
Emotional speech in human-machine interaction
Models for managing emotion in human-machine interaction
Generation of affective user models
Emotion in human and machine conversational behaviour (grounding, turn-taking, dialogue act selection…)
Embedding emotion in dialogue strategies and dialogue management
The development of these themes promises to substantially improve interaction with technologies likely to become part of our everyday life in the coming years, including virtual assistants like Alexa or Siri, social robots and embodied conversational agents.
Another factor that makes the session special is the diversity of disciplinary backgrounds of the likely contributors: psychologists, health scientists, computer scientists, cognitive scientists and philosophers. Such an interdisciplinary confluence at an event that is traditionally more engineering-oriented may bring a richer discussion than one might otherwise expect.