Real-time Interface for Speech Emotion Recognition (Bachelor's thesis)


Emotion recognition is a growing field that integrates many multimodal analysis techniques, such as speech paralinguistic analysis. Paralinguistics is the study of the short-term states (e.g., emotion) and long-term traits (e.g., gender) of a speaker. With an abundance of speech-based features that can be extracted automatically (e.g., prosody and loudness), the key advantage of speech-based emotion monitoring systems is their non-intrusive nature.

As a means of demonstrating the capability of existing machine learning-based methods for such a task, this topic involves the development of an interface for monitoring individuals' emotions through the analysis of their speech. The interface should be able to capture speech signals, process them in close to real time to identify the emotional state of the individual, and display the probability of the input speech signal belonging to each of the six basic emotions (i.e., surprise, disgust, happiness, sadness, fear, and anger).
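As a rough illustration of the final display step, the sketch below (plain numpy, with hypothetical function and label names; the raw scores stand in for the output of a trained classifier) maps one score per emotion to a labelled probability distribution via a softmax:

```python
import numpy as np

# The six basic emotions the interface should report probabilities for.
EMOTIONS = ["surprise", "disgust", "happiness", "sadness", "fear", "anger"]

def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def emotion_probabilities(logits):
    """Map one raw score per emotion to a labelled probability dict."""
    probs = softmax(logits)
    return dict(zip(EMOTIONS, probs))

# Dummy scores standing in for a trained model's output:
probs = emotion_probabilities([0.2, -1.0, 2.5, 0.0, -0.5, 1.1])
assert abs(sum(probs.values()) - 1.0) < 1e-9  # valid distribution
```

In the demonstrator, these per-emotion probabilities would be refreshed continuously as new speech segments are classified.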


In this research project, pre-existing tools such as openSMILE [1] will be used for real-time speech analysis, whilst also considering state-of-the-art approaches for speaker state analysis (e.g., CNN-based Mel-spectrogram classification [2]).
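For orientation, the following is a minimal numpy-only sketch of the log-Mel spectrogram representation that a CNN-based approach such as [2] typically takes as input (in practice, a library such as openSMILE or librosa would compute this; the frame, FFT, and filterbank parameters here are illustrative assumptions):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum, apply a Mel filterbank, and log-compress."""
    # Hann-windowed, overlapping frames
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (n_frames, n_fft//2 + 1)

    # Triangular filters spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope

    return np.log(power @ fb.T + 1e-10)                 # (n_frames, n_mels)

# Example: one second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
S = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

The resulting time-by-frequency matrix can be treated as an image and fed to a 2D CNN classifier.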

The system is envisioned to work in both German and English, so the use of established emotional speech datasets, such as IEMOCAP, SAVEE and EMO-DB, is encouraged. Ultimately, the expected outcome of this project is a real-time demonstrator with an interpretable interface.

Utilises: openSMILE, CNN Mel-spectrogram classification
Requirements: Knowledge of machine learning, good Python programming skills
Languages: English preferred

Alice Baird

Adria Mallol-Ragolta