The impact of F0 extraction errors on the classification of prominence and emotion

A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, L. Devillers, L. Vidrascu, N. Amir, L. Kessous and V. Aharonson

Published 2007 in "International Conference on Phonetic Sciences (ICPhS)"


Traditionally, pitch has been assumed to be the most important prosodic feature for marking prominence and other phenomena such as boundaries or emotions. Recent studies have called this role into question. Because larger databases are nowadays always processed automatically, it is unclear to what extent the possibly lower relevance of pitch can be attributed to extraction errors rather than to other factors. We present some ideas on a phenomenological difference between pitch and duration, and compare the performance of automatically extracted and manually corrected F0 values for the automatic recognition of prominence and emotion in spontaneous speech (children giving commands to a pet robot). The difference in classification performance between corrected and automatically extracted pitch features turns out to be consistent but not very pronounced.

