Sequence to Sequence Autoencoders for Unsupervised Representation Learning from Audio


This paper describes our contribution to the Acoustic Scene Classification task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2017). We propose a system for this task that uses a recurrent sequence to sequence autoencoder for unsupervised representation learning from raw audio files. First, we extract mel-spectrograms from the raw audio files. Second, we train a recurrent sequence to sequence autoencoder on these spectrograms, which are treated as time-dependent sequences of frequency vectors. Then, we extract the learnt representations of the spectrograms from a fully connected layer between the encoder and decoder units; these serve as feature vectors for the corresponding audio instances. Finally, we train a multilayer perceptron neural network on these feature vectors to predict the class labels. An accuracy of 88.0% is achieved on the official development set of the challenge, a relative improvement of 17.7% over the challenge baseline.
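
The following is a minimal, illustrative sketch of the described pipeline, not the authors' implementation. It assumes librosa and PyTorch are available; all layer sizes, function names, and hyperparameters are placeholder assumptions (only the 15-class output matches the DCASE 2017 Task 1 setup).

# Sketch of the four-step pipeline from the abstract; hypothetical sizes/names.
import librosa
import torch
import torch.nn as nn

def extract_mel_spectrogram(path, n_mels=128):
    """Step 1: mel-spectrogram as a time-dependent sequence of frequency vectors."""
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).T  # shape: (time_steps, n_mels)

class Seq2SeqAutoencoder(nn.Module):
    """Step 2: recurrent encoder-decoder trained to reconstruct the input spectrogram."""
    def __init__(self, n_mels=128, hidden=256, feat_dim=256):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        # Fully connected layer between the encoder and decoder; its activations
        # are taken as the learnt representation (Step 3).
        self.bottleneck = nn.Linear(hidden, feat_dim)
        self.init_dec = nn.Linear(feat_dim, hidden)
        self.decoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, x):                               # x: (batch, time, n_mels)
        _, h = self.encoder(x)                          # h: (1, batch, hidden)
        feat = torch.tanh(self.bottleneck(h[-1]))       # learnt feature vector
        h0 = self.init_dec(feat).unsqueeze(0)           # decoder initial state
        # Teacher forcing: the decoder sees the input shifted by one frame.
        shifted = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        dec_out, _ = self.decoder(shifted, h0)
        return self.out(dec_out), feat                  # reconstruction, features

# Step 4: an MLP classifier trained on the extracted feature vectors.
classifier = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 15)  # 15 acoustic scene classes
)

The autoencoder is trained by minimising the reconstruction error between input and output spectrograms, so no class labels are needed for the representation learning step; only the final MLP classifier uses the labels.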
Title: Sequence to Sequence Autoencoders for Unsupervised Representation Learning from Audio
Lecturer: Shahin Amiriparian
Date: 14-11-2017
Building/Room: Eichleitnerstraße 30 / 207
Contact: U Augsburg/TUM