
Towards Conditional Adversarial Training for Predicting Emotions from Speech


Motivated by the encouraging results recently obtained by generative adversarial networks in various image processing tasks, we propose a conditional adversarial training framework to predict dimensional representations of emotion, i.e., arousal and valence, from speech signals. The framework consists of two networks trained in an adversarial manner: the first network tries to predict emotion from acoustic features, while the second network aims at distinguishing between the predictions provided by the first network and the emotion labels from the database, using the acoustic features as conditional information. We evaluate the performance of the proposed conditional adversarial training framework on the widely used emotion database RECOLA. Experimental results show that the proposed training strategy outperforms the conventional training method, and is comparable with, or even superior to, other recently reported approaches, including deep and end-to-end learning.
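The two-network setup described in the abstract can be sketched as a conditional GAN-style objective: a predictor maps acoustic features to arousal/valence values, and a discriminator scores (features, emotion) pairs as real labels versus predictions. The following is a minimal NumPy sketch under assumed linear models; the function names, shapes, and loss form are illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np

def predictor(W, x):
    # Maps acoustic features x (n, d) to arousal/valence predictions (n, 2).
    # A linear model stands in for the real network.
    return x @ W

def discriminator(V, x, y):
    # Scores a (features, emotion) pair; the acoustic features x act as
    # the conditional information. Returns sigmoid probability of "real".
    z = np.concatenate([x, y], axis=1)
    return 1.0 / (1.0 + np.exp(-(z @ V)))

def adversarial_losses(V, W, x, y_true):
    # Standard conditional-GAN losses: the discriminator labels database
    # emotion labels as real (1) and predictor outputs as fake (0); the
    # predictor tries to make its outputs indistinguishable from labels.
    y_fake = predictor(W, x)
    d_real = discriminator(V, x, y_true)
    d_fake = discriminator(V, x, y_fake)
    d_loss = -np.mean(np.log(d_real + 1e-8) + np.log(1.0 - d_fake + 1e-8))
    g_loss = -np.mean(np.log(d_fake + 1e-8))
    return d_loss, g_loss

# Tiny usage example with random data (shapes only; not real RECOLA features).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))       # 8 frames of 5-dim acoustic features
y = rng.normal(size=(8, 2))       # gold arousal/valence labels
W = rng.normal(size=(5, 2))       # predictor parameters
V = rng.normal(size=(5 + 2, 1))   # discriminator parameters (features + emotion)
d_loss, g_loss = adversarial_losses(V, W, x, y)
```

In training, the two losses would be minimized alternately with respect to the discriminator and predictor parameters, which is what "trained in an adversarial manner" refers to.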
Lecturer: Jing Han
Date: 14:00 13-04-2018
Building/Room: Eichleitnerstraße 30 / F1 304