Banner and Advertisement Detection and Localisation in YouTube Videos Utilising Pseudo-Supervised Deep Learning


The use of large in-the-wild datasets is beneficial for both research and industry. In-the-wild data, however, vary more in granularity and contain more noise than laboratory data. To simplify the joint use of both kinds of data, noise and particularly disturbing training influences have to be automatically detected, extracted and removed. YouTube is a very good data source because of its public availability and extensive content. Its videos, however, often include banners that highlight additional information in textual form. These video elements are disturbing training influences that can confuse deep-learning-based feature extraction frameworks. Removing the banners by hand would mean extra effort for the creators, reducing the chance of obtaining their permission to use the videos for research purposes.

The aim of this study is to automatically detect and localise distracting elements in videos using state-of-the-art deep learning algorithms. For this purpose, a label generator has to be developed that projects realistic boxes and texts into the video at random positions and in different sizes. Because the generator knows where it placed each element, these elements can serve as pseudo labels in the subsequent training process. The developed neural network should learn to predict these elements and their positions in a video sequence (see Pixel CNNs).
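The label generator idea can be illustrated with a minimal sketch. It is not the method to be developed in the thesis, only an assumption of how one generation step might look: the function name `add_pseudo_banner`, its parameters and the NumPy-based frame format (height x width x 3, `uint8`) are hypothetical. The sketch draws a single opaque, randomly coloured box at a random position and size and returns the bounding box and a binary mask as pseudo labels. Text rendering and realistic banner styling are omitted.

```python
import numpy as np


def add_pseudo_banner(frame, rng, min_frac=0.1, max_frac=0.5):
    """Draw one opaque, randomly coloured box into a copy of `frame`.

    Hypothetical helper: returns the augmented frame, the box as
    (x0, y0, x1, y1) in pixel coordinates, and a binary mask.
    """
    h, w = frame.shape[:2]

    # Random box size, drawn as a fraction of the frame size.
    bw = max(int(rng.integers(int(min_frac * w), int(max_frac * w) + 1)), 1)
    bh = max(int(rng.integers(int(min_frac * h), int(max_frac * h) + 1)), 1)

    # Random top-left corner, chosen so the box stays inside the frame.
    x0 = int(rng.integers(0, w - bw + 1))
    y0 = int(rng.integers(0, h - bh + 1))
    x1, y1 = x0 + bw, y0 + bh

    # Paint the box into a copy; the original frame is left untouched.
    out = frame.copy()
    out[y0:y1, x0:x1] = rng.integers(0, 256, size=3, dtype=np.uint8)

    # The mask marks exactly the pixels covered by the box (pseudo label).
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return out, (x0, y0, x1, y1), mask


# Usage: augment one (here: black) frame with a reproducible random box.
rng = np.random.default_rng(0)
frame = np.zeros((120, 160, 3), dtype=np.uint8)
augmented, box, mask = add_pseudo_banner(frame, rng)
```

The bounding box could serve as a target for a detection model such as an R-CNN, while the mask could serve as a target for a segmentation model. Which of the two fits better is left open here.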

Task: In this thesis, the student(s) will develop a state-of-the-art data generator and deep learning method for banner detection.
Utilises: Advanced Data Augmentation, Video/Image Segmentation/Masking, R-CNN
Requirements: Preliminary knowledge of deep learning and computer vision; good programming skills (e.g. Python, C++)
Languages: German or English
Supervisor: Lukas Stappen, M. Sc. (