Courses in summer term 2007 / "Praktikum"/Project on Data Mining and Machine Learning

Time: no regular schedule
Begin: Fri. April 13, 11:15
Although many tedious tasks can be automated by manually modelling the behavior of a (computer) system, many problems require a system that adapts its responses based on feedback on earlier actions, i.e., that learns how to act better in the future. Other tasks are simply too large-scale for humans to oversee, so help from computers is needed.

Machine Learning (also known as Data Mining, Pattern Recognition, Data Analysis, and Classification) is a research area at the intersection of computer science, artificial intelligence, mathematics, and statistics that addresses these problems. It covers general methods and techniques that can be applied to a wide range of applications, such as predicting customer behavior, steering a robot, detecting spam, and predicting the folding of a protein, to name just a few.

In this project we offer several practical topics from the area of data mining and machine learning. For each topic, the task is to design and implement an application. This application should then be applied to data from different domains (provided by us).

The project allows students to gain practical knowledge of, and skills in using, data mining and machine learning algorithms.

  • Each topic is intended for a small group of 3-4 students.
  • Software should be written in Java or C++.
  • Final talks can be given in English or German.
  • Each topic consists of a generic tool and its proof-of-concept application in an example domain.


  • Groups can start immediately.
  • Each group is expected to give at least two presentations:
    • a first presentation about ongoing work, showing a first implementation and commenting on problems (around mid-term),
    • a final presentation of the whole work (end of term).


  • Kickoff meeting Friday 13.04.2007, 11:15, C202
  • Registration for topics is open now via email (preisach@ismll.uni-hildesheim.de).
  • Topics will be assigned in order of arrival of the registration emails.
  • If you list several topics in decreasing order of preference, you will be assigned the first one that is still available.
  • Registration of pre-formed groups is preferred.


1. Recommender System
example domains: movies; bibliographic metadata.

Design and implementation of a recommender system application that has two user modes:

i) administrator mode: allows users to analyze product and user data as well as explicit and implicit rating information, and to configure different recommender system settings;

ii) user mode: allows users to rate products, view rated products, get recommendations, and explore more information about their recommendations.

The following tasks have to be performed:

1) Design a recommender system that models products and user data, captures explicit and implicit rating information, and provides the functionality for both user modes.

2) Implementation of the system.

3) As a proof of concept, apply the recommender system to movie data (e.g. use data from IMDB).
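As a starting point for the user mode (rate products, get recommendations), the following is a minimal sketch and not a prescribed design. It assumes explicit ratings only and uses one possible technique, a similarity-weighted average over other users' ratings. The class name `RatingStore` and all method names are hypothetical, and cosine similarity over co-rated products is only one of many possible choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: explicit ratings plus a simple user-based recommender. */
class RatingStore {
    // user -> (product -> explicit rating)
    private final Map<String, Map<String, Double>> ratings = new HashMap<>();

    /** Stores or overwrites a user's explicit rating for a product. */
    void rate(String user, String product, double rating) {
        ratings.computeIfAbsent(user, u -> new HashMap<>()).put(product, rating);
    }

    /** Returns the products a user has rated (empty if the user is unknown). */
    Map<String, Double> ratedBy(String user) {
        return ratings.getOrDefault(user, Collections.emptyMap());
    }

    /** Cosine similarity restricted to co-rated products; 0 if there are none. */
    double similarity(String a, String b) {
        Map<String, Double> ra = ratedBy(a);
        Map<String, Double> rb = ratedBy(b);
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : ra.entrySet()) {
            Double other = rb.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Top-n unrated products, ranked by similarity-weighted mean rating. */
    List<String> recommend(String user, int n) {
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightSum = new HashMap<>();
        Map<String, Double> own = ratedBy(user);
        for (String other : ratings.keySet()) {
            if (other.equals(user)) continue;
            double sim = similarity(user, other);
            if (sim <= 0) continue; // ignore users without positive similarity
            for (Map.Entry<String, Double> e : ratings.get(other).entrySet()) {
                if (own.containsKey(e.getKey())) continue; // already rated
                weightedSum.merge(e.getKey(), sim * e.getValue(), Double::sum);
                weightSum.merge(e.getKey(), sim, Double::sum);
            }
        }
        List<String> candidates = new ArrayList<>(weightedSum.keySet());
        // Highest predicted rating first; ties broken alphabetically.
        candidates.sort(
            Comparator.comparingDouble((String p) -> -(weightedSum.get(p) / weightSum.get(p)))
                      .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

Implicit rating information (for example, which products a user viewed) and the administrator mode are deliberately left out of this sketch.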

2. Collaborative Tagging
example domains: internet radio, bookmarks.

A folksonomy is a flat and lightweight knowledge structure available in Internet-mediated social environments. There, user resources such as Web pages, online photographs, and Web links are labelled with freely chosen keywords, usually called tags. Some recent experiments on folksonomies have raised the assumption that two users sharing similar resources also tend to share the same tags, and vice versa. The task of this project is to use advanced data-analysis techniques to gather more evidence for this assumption and, if possible, to confirm it.
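One simple way to start probing this assumption is sketched below; it is an illustration, not the required method. For every pair of users, compute the overlap of their resource sets and the overlap of their tag sets, then check whether the two overlaps are correlated across pairs. The Jaccard coefficient and the Pearson correlation used here are assumptions chosen for brevity, and the class name `FolksonomyOverlap` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: overlap measures for comparing users in a folksonomy. */
class FolksonomyOverlap {

    /** Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0 for two empty sets. */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    /**
     * Pearson correlation of two equally long samples, e.g. x[i] = resource
     * overlap and y[i] = tag overlap of the i-th user pair.
     * Returns 0 when either sample has no variance (correlation undefined).
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty and equally long");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) return 0.0;
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

A clearly positive correlation between resource overlap and tag overlap would be one piece of evidence in favour of the assumption; a proper analysis would also need a significance test and a baseline.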

3. Object Identification
example domains: price comparison; car descriptions.

Design and implement an application that can identify identical objects that are described in slightly different ways.

The following tasks have to be performed:

1) Find car descriptions on the internet

2) Extract relevant attributes for object identification

3) Apply algorithms to identify identical objects (for example, simple similarity measures)

4) Evaluate algorithms
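As an illustration of a "simple similarity measure" from step 3, the sketch below compares two textual descriptions using normalized Levenshtein (edit) distance. This is only one candidate measure; the class name `ObjectMatcher`, the normalization rule, and the threshold parameter are all assumptions made for this example.

```java
import java.util.Locale;

/** Hypothetical sketch: edit-distance similarity between object descriptions. */
class ObjectMatcher {

    /** Classic Levenshtein distance using two rolling rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Lower-cases and collapses every run of non-alphanumerics into one space. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    /** Similarity in [0, 1]: 1 means identical after normalization. */
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int maxLength = Math.max(x.length(), y.length());
        if (maxLength == 0) return 1.0;
        return 1.0 - (double) levenshtein(x, y) / maxLength;
    }

    /** Declares two descriptions the same object if similarity reaches the threshold. */
    static boolean sameObject(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }
}
```

A suitable threshold has to be determined in the evaluation step (step 4), for example on a set of description pairs that were labelled by hand.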

4. Similarity Search for Time Series
example domains: medical data; technical measurements from aerospace and automobile domain.

Design and implement an application that allows users to search for similar time series using global curve similarity.

The following tasks have to be performed:

1) Use similarity/distance measures to find the most similar time series (some measures are already available).

2) Implement indexing methods to accelerate the search.

3) Design and implement a graphical interface that allows users to choose the distance measure and displays the similar time series.
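The tasks above can be sketched as a baseline before any indexing is added: a linear scan that ranks all stored series by a pluggable distance measure. The sketch assumes equally long series and shows Euclidean and Manhattan distance as two example measures; the class name `TimeSeriesSearch` and its methods are hypothetical and do not refer to the measures provided with the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: linear-scan nearest-neighbour search over time series. */
class TimeSeriesSearch {

    /** Euclidean distance between two series of equal length. */
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan (L1) distance between two series of equal length. */
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("series must have equal length");
        }
    }

    /**
     * Returns the indices of the k series closest to the query, nearest first.
     * Every stored series is compared, so the cost grows linearly with the
     * database size; an index would replace this loop.
     */
    static List<Integer> nearest(double[] query, List<double[]> database, int k,
                                 ToDoubleBiFunction<double[], double[]> distance) {
        double[] distances = new double[database.size()];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < database.size(); i++) {
            distances[i] = distance.applyAsDouble(query, database.get(i));
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble((Integer i) -> distances[i]));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }
}
```

Passing the distance as a function (for example `TimeSeriesSearch::euclidean`) is one way to let a graphical interface switch between measures without changing the search code.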