Multimodal Integration in Speech Recognition and Speaker Localization

Abstract: Speech recognition has profited immensely from recent developments in deep learning, and together with signal enhancement strategies for multiple microphones, it is now possible to employ speech recognition successfully even in difficult environments. This talk will focus on strategies for achieving even greater robustness by including visual information, i.e., lip movements, in addition to the acoustic channel. Human listeners often employ this strategy in noisy environments, and this talk will show how that capability can aid machine listening as well. Together with an appropriate stream weighting strategy, the addition of video information can cut the error rates of neural-network-based speech recognition in half in difficult situations, while still achieving reliable improvements in good acoustic conditions. The same strategy is also applicable to speaker localization, where, again, stream weighting is of significant value for gaining the most from the availability of both sources of information. This talk will discuss the architecture of recognition and tracking systems that enable such improvements, video features that can be employed, and, importantly, the adaptive stream weighting that allows one to profit from the addition of video information under all circumstances.

Bio: Prof. Dr.-Ing. Dorothea Kolossa has headed the Cognitive Signal Processing group at the University of Bochum since 2010. There, she works on robust speech and pattern recognition, developing methods and algorithms that make pattern recognition usable in difficult and changing environments as well. Prof. Kolossa first pursued this topic in her doctoral thesis at the Technical University of Berlin and has since worked on it in many projects and during several research visits, including at NTT (Kyoto), at the University of Hong Kong, and in 2009 as visiting faculty at UC Berkeley. More than eighty publications and patents and a book on robust speech recognition have arisen from this work. Current collaborations, including one with the International Computer Science Institute (ICSI) in Berkeley, aim to make today's speech recognition technology reliable enough for everyday mobile use.
Title: Multimodal Integration in Speech Recognition and Speaker Localization
Lecturer: Prof. Dr.-Ing. Dorothea Kolossa
Date: 22 June 2018 / 15:45
Building/Room: 1058 N