Bachelor and Master thesis topics:
(methodological focus, technical focus)

Analysis of Meta-Features with Respect to Their Use for Similarity Estimation Between Data Sets

Many machine learning methods need hyperparameter tuning to achieve useful results. Hyperparameter tuning is usually done by experts or with brute-force strategies such as grid search, both of which are time-consuming.

Recent work tries to automate the hyperparameter tuning process or even improve it. Knowledge from past experiments is transferred to new experiments. The fundamental idea is that similar data sets also behave similarly with respect to hyperparameter configurations. This similarity is estimated using meta-features. Examples of meta-features are simple properties of a data set, such as its size, but also the performance of simple machine learning algorithms on it.
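
The idea can be illustrated with a minimal sketch in Python. It assumes two hypothetical meta-features of the "simple property" kind (number of instances, number of attributes) and one of the "simple algorithm" kind (leave-one-out accuracy of a 1-nearest-neighbour classifier), and it uses Euclidean distance between meta-feature vectors as the similarity estimate. The function names and the choice of meta-features are assumptions made for illustration only, not part of the topic description.

```python
import math


def one_nn_loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Serves as an example of a meta-feature based on the performance
    of a simple learning algorithm. Requires at least two instances.
    """
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def meta_features(X, y):
    """Return a small meta-feature vector for a data set (hypothetical choice)."""
    return [
        math.log10(len(X)),         # data set size (log scale)
        math.log10(len(X[0])),      # number of attributes (log scale)
        one_nn_loo_accuracy(X, y),  # performance of a simple algorithm
    ]


def distance(mf_a, mf_b):
    """Euclidean distance between two meta-feature vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mf_a, mf_b)))


# Two toy data sets: the second one has noisier labels.
X_a, y_a = [[0.0], [0.1], [1.0], [1.1]], [0, 0, 1, 1]
X_b, y_b = [[0.0], [0.1], [1.0], [1.1]], [0, 1, 0, 1]

mf_a = meta_features(X_a, y_a)
mf_b = meta_features(X_b, y_b)
print(distance(mf_a, mf_b))
```

In practice the meta-features would need to be scaled to comparable ranges before distances are computed; the sketch omits this step.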

Contact: Martin Wistuba
Empirical Comparison of Move Prediction Algorithms for Go Computers

Go is an old two-player board game that originated in Asia. It is easy to learn but hard to master, which makes it popular all over the world. Go is also interesting for artificial intelligence research. For chess, programs already exist that can beat any human player. For Go, the strongest AI players are still far from beating human experts. There are two main reasons: i) the search space of Go is extremely large because many moves are possible in each position, and ii) moves can have important long-term effects, so no good heuristic for evaluating a board state has been found. Thus, techniques that led to success in games like chess cannot be applied successfully to Go.

Strong state-of-the-art Go programs combine Monte Carlo Tree Search (a heuristic search algorithm) with move prediction algorithms that prune the search space and guide the search. So far, these algorithms are specialized for the task of move prediction in Go. It is unclear whether this specialization is necessary or whether state-of-the-art learning-to-rank algorithms can perform even better. The aim of this thesis is to create a tool that converts Go game records into a format that can be read by learning-to-rank algorithms. Furthermore, a direct comparison of Go move predictors with learning-to-rank algorithms shall be carried out.
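
One possible shape of the conversion step is sketched below in Python. It assumes that each board position is treated as a query, each candidate move as a document, and the move actually played as the only relevant document, and it writes the SVMlight-style ranking format (`label qid:<id> <index>:<value> ...`) that many learning-to-rank tools accept. Parsing the game records and computing move features are left out; the feature values and the function name `to_ranking_lines` are hypothetical.

```python
def to_ranking_lines(positions):
    """Yield one text line per candidate move in SVMlight-style ranking format.

    positions: iterable of (candidates, played_move) pairs, where
    candidates maps a move name to its list of feature values.
    Each position becomes one query (qid); the played move gets label 1,
    all other candidate moves get label 0.
    """
    for qid, (candidates, played_move) in enumerate(positions, start=1):
        for move, features in sorted(candidates.items()):
            label = 1 if move == played_move else 0
            feature_str = " ".join(
                f"{index}:{value:g}" for index, value in enumerate(features, start=1)
            )
            # The trailing comment keeps the move name for later inspection.
            yield f"{label} qid:{qid} {feature_str} # {move}"


# Toy example: one position with two candidate moves and made-up features.
example_positions = [
    ({"D4": [1.0, 0.5], "Q16": [0.0, 2.0]}, "Q16"),
]
lines = list(to_ranking_lines(example_positions))
print("\n".join(lines))
```

Grouping by position through `qid` lets a ranking algorithm compare only moves that were available in the same board state.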

Contact: Martin Wistuba
Johann Witowski
Continuous Integration and Unit Tests for Recommender Systems

Test-driven development using unit testing and Continuous Integration is one of the pillars of agile software development.

The task for this Bachelor thesis is the design, development and deployment of a testing environment for a recommender system library developed at ISMLL. It consists of the following sub-tasks:

  • Survey of continuous integration tools for Mono/C# (e.g. CruiseControl.NET)
  • Selection of suitable tools for the task
  • Design of integration tests based on a given specification
  • Adaptation and enhancement of existing unit tests
  • Setup of an automated testing environment

Requirements:

  • programming skills in C# or Java
  • some experience with unit testing
  • Prior knowledge or classes in machine learning/recommender systems are not required.
  • taken
    Finding Anomalies in Time Series

    A time series is a stream of data ordered in time. Such data is observed in many real-world domains, such as statistics, signal processing, and medical measurements (ECG, EEG). The analysis of time series has attracted considerable interest, yet various aspects are still the subject of research. One of the most important challenges is to identify anomalies in a series, that is, surprising or interesting patterns. An example of anomaly detection is finding anomalous subsequences in the heart signal of an ECG time series.

    Your task is to implement the referenced research paper, which presents a technique for detecting surprising patterns. The expected result is a piece of software that detects and displays anomalies.

    Reference: Eamonn J. Keogh, Stefano Lonardi, Bill Yuan-chi Chiu: Finding surprising patterns in a time series database in linear time and space. KDD 2002: 550-556
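
To make the notion of a "surprising pattern" concrete, here is a deliberately simplified Python sketch. It is a stand-in for illustration and does not reproduce the data structures or statistical estimates of the referenced paper: the series is discretized into symbols with equal-width bins, and a pattern in a test series counts as surprising when it occurs more often than its frequency in a reference series would suggest. All function names, the bin layout, and the pattern length `k` are assumptions; the sketch also assumes the reference series is not constant and that both series have at least `k` values.

```python
from collections import Counter


def discretize(series, lo, hi, alphabet="abc"):
    """Map each value to a symbol using equal-width bins on [lo, hi]."""
    width = (hi - lo) / len(alphabet)
    symbols = []
    for value in series:
        index = int((value - lo) / width)
        index = max(0, min(index, len(alphabet) - 1))  # clamp out-of-range values
        symbols.append(alphabet[index])
    return "".join(symbols)


def pattern_counts(symbols, k):
    """Count all substrings of length k."""
    return Counter(symbols[i:i + k] for i in range(len(symbols) - k + 1))


def surprise_scores(reference, test, k=3, alphabet="abc"):
    """Score each length-k pattern in test: observed count minus expected count.

    The expected count is the pattern's count in the reference series,
    rescaled to the length of the test series.
    """
    lo, hi = min(reference), max(reference)
    ref_counts = pattern_counts(discretize(reference, lo, hi, alphabet), k)
    test_counts = pattern_counts(discretize(test, lo, hi, alphabet), k)
    scale = (len(test) - k + 1) / (len(reference) - k + 1)
    return {
        pattern: count - scale * ref_counts.get(pattern, 0)
        for pattern, count in test_counts.items()
    }


# The reference repeats 0, 1, 2; the test contains an unusual plateau at 2.
reference = [0, 1, 2] * 10
test = [0, 1, 2, 0, 1, 2, 2, 2, 2, 0, 1, 2]
scores = surprise_scores(reference, test)
most_surprising = max(scores, key=scores.get)
print(most_surprising, scores[most_surprising])
```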

    Communication Efficient Distributed Classification in Peer-to-Peer Networks

    Mining patterns from large-scale distributed networks, such as Peer-to-Peer (P2P) networks, is a challenging task because centralizing the data is not feasible. The goal is to develop mining algorithms that are communication-efficient, scalable, asynchronous, and robust to peer dynamism, and that achieve accuracy as close as possible to that of centralized algorithms. In this project, we aim to implement classification models that can be learned locally on each peer in a distributed network setting and that produce a strongly reduced, lightweight representation of local knowledge to be shared with direct neighbors. The desired outcome is a method that maximizes the prediction accuracy on each peer (through the exchange of knowledge among neighbors) while keeping the communication overhead as low as possible. Proposed learning technique: Relevance Vector Machines (RVM), as described in the paper "Sparse Bayesian Learning and the Relevance Vector Machine".

    Contact: Umer Khan

    If you are interested in other topics, please ask one of us directly.

    Past Bachelor theses at University of Hildesheim
    Past Bachelor theses at University of Freiburg