Visual Representation Learning from Text Input


In recent years, deep learning has become very popular in the research community, leading to steady advances in the development of new neural network architectures. Many architectures emerged in the field of computer vision and were subsequently adapted to further modalities such as text and audio. For example, convolutional neural networks, originally inspired by human visual perception, also achieve state-of-the-art performance on many text classification tasks. Most of these approaches apply 1D convolutions to pre-trained word embeddings but do not exploit the full potential of visual layers. One example in this direction is (visual) character quantization ( The aim of this study is to explore new ways to learn visual text representations complementary to word embeddings, utilising typical image properties (such as colour), and to compare them on a simple CNN.
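To make the idea of character quantization concrete, the following is a minimal sketch (the alphabet, maximum length, and function names are illustrative choices, not part of the thesis description): each character of a text is mapped to a one-hot row, producing a matrix that a convolutional layer can process like a single-channel image.

```python
import numpy as np

# Illustrative alphabet; real work (e.g. Zhang et al.'s character-level
# CNNs) typically uses a larger set including punctuation.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, max_len=16):
    """Encode a string as a (max_len, |alphabet|) one-hot matrix.

    Characters outside the alphabet map to an all-zero row; texts
    longer than max_len are truncated, shorter ones are zero-padded.
    The result can be fed to a 1D or 2D convolutional input layer.
    """
    mat = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            mat[pos, idx] = 1.0
    return mat

m = quantize("Amazon review")  # shape (16, 37)
```

A visual representation as explored in this thesis would go beyond such binary matrices, e.g. by encoding additional information into colour channels.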

Task In this thesis, the student will design and implement a new text representation suitable for a visual input layer. This representation will then be compared to alternatives such as visual word2vec embeddings and traditional word embeddings. For this purpose, a benchmark will be performed on popular NLP tasks, e.g., text sentiment classification on the Amazon Review 5-class polarity dataset.
Utilises TensorFlow/Keras, visual representation learning for text, CNNs
Requirements Advanced knowledge of machine learning and natural language processing, good programming skills (e.g. Python, C++)
Languages German or English
Supervisor Lukas Stappen, M. Sc. (