

Master Thesis at ISMLL

Current Master Theses at ISMLL

Mr. Torben Windler
Automated Customs Tariff Classification

As a result of globalization and its effects on inner- and intercontinental flows of goods, companies are increasingly faced with the challenge of coping with the growing bureaucratic effort involved in exporting goods. The process of tariff classification in particular, in which every article to be exported must be assigned to a specific customs tariff group, consumes an extremely large amount of working time. Many employees are tied up in repetitive manual work, which costs both time and money. In this context, advances in data analysis represent significant potential: with algorithms from the field of machine learning, this process could be at least partially automated.
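As an illustration of how such an automated classifier might look, the sketch below assigns product descriptions to tariff codes by lexical overlap with labelled examples. The descriptions, the HS-style codes, and the similarity measure are all invented placeholders, not part of the thesis.

```python
# Toy sketch: tariff classification as text classification.
# All descriptions and codes below are invented examples.
from collections import Counter

train = [
    ("steel bolt hex head", "7318"),
    ("cotton t-shirt short sleeve", "6109"),
    ("laptop computer 15 inch", "8471"),
]

def tokens(text):
    """Bag-of-words representation of a product description."""
    return Counter(text.lower().split())

def classify(description):
    """Assign the tariff code of the most lexically similar training item."""
    q = tokens(description)
    def overlap(item):
        return sum((q & tokens(item[0])).values())
    return max(train, key=overlap)[1]

print(classify("hex steel bolt"))  # expected: 7318
```

A real system would replace the token-overlap heuristic with a learned text classifier trained on historical customs declarations.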

Contact: Ahmed Rashed
Mr. Edgar Jimenez
Neural Factorization Machines for Recommender Systems

Recommender systems are currently one of the main focus areas in industry, used in many marketing campaigns to improve and grow the sale and distribution of products and services. In the search for the best recommendations, researchers often base their predictions on Factorization Machines (FM), a popular solution for efficiently modelling feature interactions, in particular the second-order interaction between two dimensions, such as the relation between a user and an item. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex structure inherent in real-world data: such data contains information in categorical values and implicit structure that is not always linear. Deep neural networks, on the other hand, have recently been applied to learn non-linear feature interactions, but they can be difficult to train. In this thesis I implement a Neural Factorization Machine (NFM), using two different datasets full of categorical data to create recommendations. NFM is a model that combines the linearity of FM in modelling second-order feature interactions with the non-linearity of neural networks in modelling higher-order feature interactions. Using NFM, we can take advantage of every categorical feature in our datasets. I will compare the model on two datasets, one provided by Kaggle and one from a private company, to show that the model can perform efficiently on real-world data.
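The contrast between FM's second-order term and NFM's Bi-Interaction pooling can be sketched with NumPy; the dimensions, random data, and one-hidden-layer MLP below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 6, 4
x = rng.random(n_features)                 # feature vector (e.g. one-hot user/item + context)
V = rng.standard_normal((n_features, k))   # latent factor embeddings

# FM second-order term in O(nk) via the identity
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
s = V.T @ x                                # (k,)
fm_pairwise = 0.5 * (s**2 - (V**2).T @ (x**2)).sum()

# Brute-force check of the identity
brute = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n_features) for j in range(i + 1, n_features))
assert np.isclose(fm_pairwise, brute)

# NFM's Bi-Interaction pooling keeps the k-dimensional vector instead of
# summing over f, and feeds it into an MLP (here one hidden ReLU layer)
# to model non-linear, higher-order interactions.
bi = 0.5 * (s**2 - (V**2).T @ (x**2))      # (k,)
W1, b1 = rng.standard_normal((8, k)), np.zeros(8)
h = np.maximum(0, W1 @ bi + b1)
print(bi.shape, h.shape)                   # (4,) (8,)
```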

Contact: Ahmed Rashed
Ms. Krithika Murugesan
Knowledge Graph Embeddings

Knowledge graphs represent information via entities (nodes) and their relationships (edges). The ability of a knowledge graph to capture relationships among entities enriches the context of the stored data, enabling better reasoning during information retrieval. Approaches to constructing knowledge graphs fall into two major categories: probabilistic methods and embeddings. The paramount concern with probabilistic methods is their inability to generalize to unseen entities and relations; they are also computationally expensive. The alternative is embeddings: dense vector representations that extract comparatively more relationships from a document by learning latent representations. They can be applied in two steps, first extracting the relationships among entities and then completing the graph itself. The latent features can be learned by several existing methods such as DeepWalk, LINE, node2vec and PTE; [1] established theoretical evidence that such methods perform implicit matrix factorization with closed forms based on negative sampling. The authors also proposed a new general framework, NetMF, which performs explicitly the factorization done implicitly by DeepWalk and LINE. NetMF outperforms DeepWalk and LINE, though, as the authors note, there is scope for faster approximations of the closed form. [2] and [3] enumerate different sampling strategies for learning knowledge graph representations: [2] stresses three such techniques for graph-based loss functions, while [3] attempts this via spectral sparsification involving Gaussian sampling. GraphGAN [4], an altogether new approach, uses Generative Adversarial Networks (GANs) to simulate a minimax game between existing discriminative and generative methods in order to learn graph embeddings. Since the training is adversarial, the model, unlike the aforementioned techniques, learns from positive samples as well.
This thesis is directed towards discovering potential new approaches to learning knowledge graph embeddings by augmenting NetMF with different sampling strategies, including a possible integration with GraphGAN, thereby making the representation more robust, as it can then learn from positive samples as well.
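A minimal sketch of the NetMF construction on a toy graph may help make this concrete: build DeepWalk's implicit matrix, apply the truncated logarithm, and factorize it with an SVD. The graph, window size, and embedding dimension below are arbitrary illustrative choices.

```python
import numpy as np

# NetMF sketch: factorize DeepWalk's implicit matrix
# M = vol(G)/(b*T) * (sum_{r=1..T} P^r) * D^{-1}, with truncated log.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy undirected graph
deg = A.sum(axis=1)                          # node degrees (diagonal of D)
vol = deg.sum()                              # volume of the graph
P = A / deg[:, None]                         # random-walk transition matrix D^{-1} A
T, b, d = 3, 1.0, 2                          # window size, negative samples, dimension

S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
M = vol / (b * T) * S / deg[None, :]         # right-multiply by D^{-1}
logM = np.log(np.maximum(M, 1.0))            # truncated elementwise logarithm

# Rank-d SVD yields the node embeddings
U, sigma, _ = np.linalg.svd(logM)
emb = U[:, :d] * np.sqrt(sigma[:d])
print(emb.shape)                             # (4, 2)
```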

Contact: Ahmed Rashed
Valerie Chikukwa
Handling Missing Data in Multivariate Time Series

Multivariate time series data is usually characterized by many missing values in practical applications ranging from biology and geoscience to health care. When sampling a clinical time series, missing values become ubiquitous due to a variety of factors, such as the frequency of medical events (when a blood test is performed, for example). These missing values hinder the application of advanced analysis methods, because further data analysis and processing steps often rely on complete data. There is therefore a need to replace missing values with reasonable ones, but imputing missing data in a multivariate time series dataset remains a challenging problem (Bashir and Wei, 2018). Recent work in time series prediction and related tasks has shown that missing values and their missing patterns are often correlated with the target label; Che, Purushotham, Cho, Sontag, and Liu (2018) refer to this as informative missingness. The goal of the thesis is to classify time series correctly, approaching the accuracy reported in the literature and trying to obtain similar or better results. To do this we plan to explore an implementation of Neural Ordinary Differential Equations (Chen et al., 2018) based on the ODESolve module, a continuous-time generative model. As an extension, in order to understand model behavior under different missing patterns, we plan to generate synthetic datasets from a regularly sampled dataset with no missing values.
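To make informative missingness concrete, the sketch below constructs the observation mask and time-since-last-observation features used by Che et al.'s GRU-D, together with a forward-fill baseline, on an invented toy series (values and timestamps are placeholders).

```python
import numpy as np

# Toy univariate clinical series with missing measurements (NaN).
x = np.array([0.5, np.nan, np.nan, 1.2, np.nan, 0.8])
t = np.arange(len(x), dtype=float)            # regular timestamps

mask = (~np.isnan(x)).astype(float)           # 1 where observed, 0 where missing

# delta[i]: time elapsed since the last observed value before step i
delta = np.zeros_like(t)
for i in range(1, len(x)):
    delta[i] = t[i] - t[i - 1] + (0 if mask[i - 1] else delta[i - 1])

# Simple forward-fill imputation as a baseline
filled = x.copy()
for i in range(1, len(x)):
    if np.isnan(filled[i]):
        filled[i] = filled[i - 1]

print(mask)    # [1. 0. 0. 1. 0. 1.]
print(delta)   # [0. 1. 2. 3. 1. 2.]
```

A model such as GRU-D would consume `filled`, `mask`, and `delta` jointly, letting the missing pattern itself carry predictive signal.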

References:

  • [1] Bashir, F., and Wei, H.-L. (2018). Handling missing data in multivariate time series using a vector autoregressive model-imputation (var-im) algorithm. Neurocomputing, 276 , 23–30.
  • [2] Che, Z., Purushotham, S., Cho, K., Sontag, D., and Liu, Y. (2018). Recurrent neural networks for multivariate time series with missing values. Scientific reports, 8 (1), 6085.
  • [3] Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural ordinary differential equations. In Advances in neural information processing systems (pp. 6571–6583).

Junaid Ghauri
Parking Occupancy Prediction

Parking availability prediction is rapidly gaining interest within the community as an operationally cheap approach to identifying empty parking locations. Parking locations accommodate multiple vehicles and are rarely completely occupied, which makes it difficult to predict occupied locations without augmenting external data, as the data becomes highly imbalanced. Existing forecasting models neither encapsulate the heterogeneous modes/types of parking data nor handle sparse measurements. In this thesis, your task will be to develop a sequence-to-sequence learning framework that addresses occupancy forecasting. Throughout this project, you will build on top of existing methods for multi-step forecasting and deliver a model that is competitive with recent work.
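As a point of comparison for a seq2seq model, a simple recursive multi-step baseline can be sketched as follows; the synthetic occupancy series, lag order, and horizon are illustrative choices only, not the thesis setup.

```python
import numpy as np

# Multi-step occupancy forecasting baseline: fit a linear autoregressive
# model on lagged values, then roll it forward recursively. A seq2seq
# encoder-decoder would replace this linear model.
rng = np.random.default_rng(1)
occ = 0.5 + 0.3 * np.sin(np.arange(200) * 2 * np.pi / 24)  # toy daily pattern
occ += 0.02 * rng.standard_normal(200)                     # observation noise

p, horizon = 24, 6                          # lag order, forecast steps
X = np.stack([occ[i:i + p] for i in range(len(occ) - p)])
y = occ[p:]
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # AR(p) coefficients

history = list(occ[-p:])
forecast = []
for _ in range(horizon):                    # recursive multi-step prediction
    nxt = np.dot(w, history[-p:])
    forecast.append(nxt)
    history.append(nxt)
print(len(forecast))                        # 6
```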

References:

  • [1] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
  • [2] Lin, Trista, et al. "A survey of smart parking solutions." IEEE Transactions on Intelligent Transportation Systems 18.12 (2017): 3229-3253.
  • [3] Chen, Xiao. "Parking occupancy prediction and pattern analysis." Dept. Comput. Sci., Stanford Univ., Stanford, CA, USA, Tech. Rep. CS229-2014 (2014).

Muhammad Usman Syed
Scalable Distributed Machine Learning Algorithm

Parallel stochastic gradient descent algorithms implemented in a centralized architecture face multiple challenges. Communication between the worker nodes via the server node becomes a bottleneck on a limited or congested network, so synchronizing the worker nodes is costly and challenging. We propose and devise two experiments to find the setting at which an SGD algorithm balances efficient communication against the best moment to synchronize or average the parameters of all workers. The first experiment seeks the best averaging frequency F, i.e. how often the server node should average the incoming parameters from the worker nodes. Previous papers have discussed the pros and cons of averaging at the end of all iterations or after every iteration in convex and non-convex settings; they fail, however, to give a time factor or any numerical value that quantifies the best time to average the parameters. We explore these values in search of a better convergence rate. The second experiment rethinks the state-of-the-art ring topology used by many distributed systems to average and update parameters. We present a 2-tier ring topology in which every ring has a ring leader; in a second step, communication among the ring leaders to average the parameters reduces complexity in the network. The goal is to test AllReduce with this new topology so that communication overhead is minimized. The datasets used in the experiments are MNIST, CIFAR10 and ImageNet.
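The first experiment can be simulated in a few lines: K workers each run F local SGD steps on their own data shard, after which all parameters are averaged, standing in for one AllReduce round. The least-squares problem, sizes, and learning rate below are toy choices, not the thesis setup.

```python
import numpy as np

# Local SGD with averaging frequency F on a shared least-squares objective.
rng = np.random.default_rng(0)
K, dim, F, rounds, lr = 4, 5, 10, 50, 0.05
w_true = rng.standard_normal(dim)            # common optimum for all shards
shards = []
for _ in range(K):
    A = rng.standard_normal((100, dim))
    shards.append((A, A @ w_true))           # noiseless targets per worker

workers = [np.zeros(dim) for _ in range(K)]
for _ in range(rounds):
    for k, (A, b) in enumerate(shards):
        w = workers[k]
        for _ in range(F):                   # F local stochastic gradient steps
            i = rng.integers(len(b))
            grad = (A[i] @ w - b[i]) * A[i]  # gradient of 0.5*(a_i.w - b_i)^2
            w = w - lr * grad
        workers[k] = w
    avg = sum(workers) / K                   # parameter averaging (AllReduce round)
    workers = [avg.copy() for _ in range(K)]

print(float(np.linalg.norm(avg - w_true)))   # distance to the optimum
```

Varying F in such a simulation trades per-step communication against the drift of the local models between averaging rounds.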

References:

  • [1] Rendle, Steffen, et al. "Robust Large-Scale Machine Learning in the Cloud." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  • [2] Li, Mu, et al. "Communication efficient distributed machine learning with the parameter server." Advances in Neural Information Processing Systems. 2014.
  • [3] Zhang, Huan, Cho-Jui Hsieh, and Venkatesh Akella. "Hogwild++: A new mechanism for decentralized asynchronous stochastic gradient descent." Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 2016.

Contact: Mohsan Jameel
Daniel Pototzky
Pre-Training Methods for Object Detection

The success of machine learning models strongly depends on the quality of the available labelled data. Unfortunately, creating labelled datasets is very time-consuming and expensive. While labelled data is in short supply, unlabeled data is easily accessible in many areas; an almost unlimited number of images, for example, is freely available on the internet. The question therefore naturally arises whether and how we can make use of unlabeled data. This thesis addresses the issue for the case of unsupervised/self-supervised learning of visual representations, based on two existing approaches. The first, called Noise as Targets, maps the output features of a neural network to random noise vectors. The second, RotNet, transforms input images and trains the network to recognize the transformation. Several modifications of the existing setups, as well as a combination of both, are proposed, and first results are presented.
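The RotNet pretext task can be sketched in a few lines: each unlabeled image yields four rotated copies whose rotation index serves as a free label. The image data and sizes here are dummy placeholders.

```python
import numpy as np

# RotNet-style self-supervision: rotate each unlabeled image by
# 0/90/180/270 degrees; the network is trained to predict which
# rotation was applied, so labels come for free.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32))             # toy unlabeled grayscale images

def make_rotation_batch(batch):
    xs, ys = [], []
    for img in batch:
        for k in range(4):                   # rotation by k * 90 degrees
            xs.append(np.rot90(img, k))
            ys.append(k)                     # self-supervised label
    return np.stack(xs), np.array(ys)

X, y = make_rotation_batch(images)
print(X.shape, y.shape)                      # (32, 32, 32) (32,)
```

A classifier trained on `(X, y)` learns representations without any manual annotation; those representations can then be fine-tuned for detection.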

Contact: Mohsan Jameel
Past Master theses at University of Hildesheim
Past Master theses at University of Freiburg