EmoVoice has recently been integrated as a toolbox into the Social Signal Interpretation (SSI)
framework, which is also developed by the Lab for Multimedia Concepts and Applications.
In combination with SSI, EmoVoice includes the following modules:
- database creation
- feature extraction + classifier building and testing
- online recognition
Database creation
ModelUI, the graphical user interface of SSI, supports the creation of an emotional speech database. Stimuli to elicit emotions can be presented by the interface, for example a set of emotional sentences to be read aloud. We have defined a set of sentences loosely based on the Velten mood induction technique, which should encourage speakers to actually experience the target emotions. The sentences can also be personalised to help readers immerse themselves more deeply in the emotional states. This procedure reduces the effort of building a prototypical personalised emotion recogniser to just a few minutes.
Of course, existing emotional speech databases can also be used with EmoVoice.
Feature extraction + classifier building
The phonetic analysis largely uses algorithms from the Praat
phonetic software and the ESMERALDA
environment for speech recognition. Features are based on global statistics derived from pitch, energy, MFCCs, duration, voice quality and spectral information.
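The idea of global statistics can be illustrated with a small sketch: a frame-level contour (here a toy pitch track) is collapsed into a handful of utterance-level values. The statistic names and the treatment of unvoiced frames are illustrative assumptions, not EmoVoice's actual feature set.

```python
import statistics

def global_stats(contour):
    """Collapse a frame-level contour (e.g. pitch in Hz) into
    utterance-level statistics, in the spirit of global acoustic
    features; the selection here is only an example."""
    voiced = [v for v in contour if v > 0]  # treat 0 Hz frames as unvoiced
    if not voiced:
        return {"mean": 0.0, "stdev": 0.0, "min": 0.0, "max": 0.0, "range": 0.0}
    return {
        "mean": statistics.fmean(voiced),
        "stdev": statistics.pstdev(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
    }

pitch = [0.0, 180.0, 190.0, 200.0, 0.0, 210.0]  # toy pitch track, Hz
feats = global_stats(pitch)
print(feats["mean"], feats["range"])  # 195.0 30.0
```

In the real system such statistics are computed not only over pitch but also over energy, MFCC, duration, voice-quality and spectral contours, yielding a large fixed-size feature vector per utterance.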
Currently, two classifiers are integrated into the framework: Naive Bayes as a fast but simple classifier, and Support Vector Machines as a more sophisticated classifier.
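To show why Naive Bayes is the "fast but simple" option, here is a minimal Gaussian Naive Bayes sketch over toy two-dimensional features. This is a from-scratch illustration, not EmoVoice's implementation, and the feature values and class labels are invented.

```python
import math
from collections import defaultdict

class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: per class, fit a mean and variance
    per feature, then classify by maximum log-posterior."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for xs, label in zip(X, y):
            by_class[label].append(xs)
        self.stats = {}
        n = len(y)
        for label, rows in by_class.items():
            means = [sum(col) / len(rows) for col in zip(*rows)]
            vars_ = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(zip(*rows), means)]
            self.stats[label] = (math.log(len(rows) / n), means, vars_)
        return self

    def predict(self, xs):
        def log_posterior(label):
            prior, means, vars_ = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(xs, means, vars_))
        return max(self.stats, key=log_posterior)

# toy utterance features: [mean pitch in Hz, mean energy]
X = [[120, 0.2], [130, 0.25], [250, 0.8], [240, 0.75]]
y = ["calm", "calm", "excited", "excited"]
clf = TinyGaussianNB().fit(X, y)
print(clf.predict([245, 0.7]))  # excited
```

Training and prediction here are a few arithmetic operations per class, which is why Naive Bayes suits real-time use; Support Vector Machines trade some of that speed for more flexible decision boundaries.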
Online recognition
Online recognition runs as a command-line application that writes its results to the console or sends them over a socket using the Open Sound Control (OSC) protocol. The tool continuously reads from the microphone and extracts suitable voice segments by voice activity detection. After feature extraction, each segment is directly assigned an emotion label by a previously trained classifier.
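The segmentation step can be sketched with a very simple energy-threshold voice activity detector. The threshold, minimum length and frame energies below are assumptions for illustration; EmoVoice's actual VAD is more elaborate.

```python
def segment_voice(frames, threshold=0.1, min_len=3):
    """Energy-based voice activity detection sketch: return
    (start, end) frame index pairs for stretches whose energy stays
    at or above `threshold` for at least `min_len` frames."""
    segments, start = [], None
    for i, e in enumerate(frames):
        if e >= threshold and start is None:
            start = i                      # segment opens
        elif e < threshold and start is not None:
            if i - start >= min_len:       # keep only long-enough segments
                segments.append((start, i))
            start = None                   # segment closes
    if start is not None and len(frames) - start >= min_len:
        segments.append((start, len(frames)))  # segment runs to the end
    return segments

# toy per-frame energies from a microphone stream
energy = [0.0, 0.02, 0.3, 0.4, 0.35, 0.05, 0.0, 0.5, 0.6, 0.55, 0.6, 0.01]
print(segment_voice(energy))  # [(2, 5), (7, 11)]
```

Each returned segment would then be passed through feature extraction and the trained classifier, and the resulting label written to the console or sent via OSC.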
So far, we have developed a number of demo applications that use EmoVoice, and it has been used for showcases by partners in the Callas project. Examples include:
- an emotional kaleidoscope:
- the virtual agent Greta as an emotionally reacting listener. Greta mirrors the emotion of the speaker with her face and gives emotionally appropriate verbal feedback:
- E-Tree, a Callas showcase, an AR art installation of a tree that grows and changes colour and shape according to multimodal input, including emotional valence and arousal recognised from the voice:
EmoVoice is part of SSI and freely available for download under the GNU General Public License.
- Download free binaries here.
- The source code of EmoVoice is available as part of SSI here.
If you use EmoVoice for your own projects or publications, please cite the following papers:
T. Vogt, E. André and N. Bee, "EmoVoice - A framework for online recognition of emotions from voice," in Proceedings of Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008.
J. Wagner, F. Lingenfelser, and E. André, "The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition," in Proceedings of INTERSPEECH 2011, Florence, Italy, 2011.
Velten, E. (1968). A laboratory task for induction of mood states. Behaviour Research and Therapy, 6:473-482.
Sichert, J. (2008). Visualisierung des emotionalen Ausdrucks aus der Stimme. Vdm Verlag Dr. Müller, Saarbrücken, Germany.
de Rosis, F., Pelachaud, C., Poggi, I., Carofiglio, V., and de Carolis, B. (2003). From Greta's mind to her face: modelling the dynamics of affective states in a conversational embodied agent. International Journal of Human-Computer Studies, 59:81-118.
Gilroy, S. W., Cavazza, M., Chaignon, R., Mäkelä, S.-M., Niiranen, M., André, E., Vogt, T., Billinghurst, M., Seichter, H., and Benayoun, M. (2007). An emotionally responsive AR art installation. In Proceedings of ISMAR Workshop 2: Mixed Reality Entertainment and Art, Nara, Japan.