Information Systems and Machine Learning Lab, University of Hildesheim, Germany

Master/Diploma and Bachelor/Studienarbeiten thesis topics (methodological focus, technical focus):

Past Bachelor theses at University of Freiburg:

Stefan Hauger

2006

Probabilistic Model Estimation for Collaborative Filtering using Item Attributes

Many e-commerce sites use recommender systems to help their customers find suitable products in a large database. A recommender system (RS) learns from users and recommends products based on each user's preferences. Collaborative Filtering (CF) is one of the most prevalent methods for RS, yet only a few approaches have attempted to overcome its shortcomings. One of them is the probabilistic model of B.M. Kim and Q. Li. They proposed a model in which items are partitioned into groups using item attributes, and predictions for a user are made by considering the Gaussian distribution of user ratings in each group. This probabilistic model has shown better results on some public datasets.

This Studienarbeit should investigate this algorithm on other datasets, such as the EachMovie dataset, and extend the investigation of the impact of attributes when different (sub)sets of attributes are used.

The tasks of this Studienarbeit are as follows: (i) implement the probabilistic model, (ii) evaluate it on other datasets, (iii) evaluate it with varying sets of attributes, and (iv) observe and analyze the impact of attributes on the model.
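The group-wise Gaussian idea described above can be illustrated with a minimal sketch. It is not the model of Kim and Li; it assumes that a single attribute (a genre label) defines the item groups, fits a mean and variance to each user's ratings per group, and predicts the group mean for an unseen item. The function names, the default rating and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fit_user_groups(ratings, item_group):
    """Fit a Gaussian (mean, variance) per (user, item group).

    ratings: list of (user, item, rating) triples.
    item_group: dict mapping item -> group label (assumed to come from one attribute).
    """
    buckets = defaultdict(list)
    for user, item, rating in ratings:
        buckets[(user, item_group[item])].append(rating)
    return {key: (mean(vals), pvariance(vals)) for key, vals in buckets.items()}


def predict(model, user, item, item_group, default=3.0):
    """Predict the mean of the user's Gaussian for the item's group.

    Falls back to a hypothetical default rating if the user has no ratings in that group.
    """
    mu, _variance = model.get((user, item_group[item]), (default, 0.0))
    return mu


# Toy data (hypothetical): two genres used as item groups.
item_group = {"i1": "comedy", "i2": "comedy", "i3": "drama", "i4": "drama"}
ratings = [("u1", "i1", 5), ("u1", "i3", 2), ("u2", "i1", 3), ("u2", "i2", 4)]
model = fit_user_groups(ratings, item_group)
```

Varying which attribute (or combination of attributes) builds `item_group` is one simple way to set up the comparison of attribute (sub)sets described in the tasks.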

Tobias Lang

2006

Bayesian Models for Recommender Systems

In the last few years, several Bayesian networks and hierarchical Bayesian models for recommender systems have been proposed. Most of these models have in common that (i) they can easily overfit the training data and (ii) they are learned by means of an Expectation Maximization (EM) algorithm that converges slowly. Therefore, EM is typically stopped early.

The task of this Studienarbeit is to implement two of these models (the so-called "Aspect model" and the "Latent Dirichlet Allocation" model) and to experiment with different initialization schemes, stopping criteria and EM acceleration methods.
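As a rough illustration of where initialization and stopping criteria enter, the following sketch runs EM for a small aspect model of the form P(u, i) = sum_z P(z) P(u|z) P(i|z). It is a toy under stated assumptions: random initialization, a stopping rule based on the log-likelihood improvement, no acceleration and no safeguard against overfitting. The function name, parameters and count data are hypothetical.

```python
import math
import random


def em_aspect_model(counts, n_aspects, max_iter=200, tol=1e-6, seed=0):
    """EM for a small aspect model P(u, i) = sum_z P(z) P(u|z) P(i|z).

    counts: dict mapping (user_index, item_index) -> observation count.
    Returns the parameters and the log-likelihood recorded in each iteration.
    """
    rng = random.Random(seed)
    n_users = 1 + max(u for u, _ in counts)
    n_items = 1 + max(i for _, i in counts)

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    def random_dist(size):
        return normalized([rng.random() + 0.1 for _ in range(size)])

    # Random initialization; other initialization schemes can be swapped in here.
    p_z = random_dist(n_aspects)
    p_u = [random_dist(n_users) for _ in range(n_aspects)]
    p_i = [random_dist(n_items) for _ in range(n_aspects)]

    history = []
    for _ in range(max_iter):
        new_z = [0.0] * n_aspects
        new_u = [[0.0] * n_users for _ in range(n_aspects)]
        new_i = [[0.0] * n_items for _ in range(n_aspects)]
        loglik = 0.0
        for (u, i), n in counts.items():
            # E-step: posterior over aspects for this (user, item) pair.
            joint = [p_z[z] * p_u[z][u] * p_i[z][i] for z in range(n_aspects)]
            total = sum(joint)
            loglik += n * math.log(total)
            for z in range(n_aspects):
                weight = n * joint[z] / total
                new_z[z] += weight
                new_u[z][u] += weight
                new_i[z][i] += weight
        history.append(loglik)
        # M-step: renormalize the expected counts.
        p_z = normalized(new_z)
        p_u = [normalized(row) for row in new_u]
        p_i = [normalized(row) for row in new_i]
        # Stopping criterion: log-likelihood improvement below the tolerance.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return p_z, p_u, p_i, history


# Toy counts (hypothetical): two user groups with mostly disjoint item preferences.
counts = {(0, 0): 3, (0, 1): 2, (1, 0): 2, (1, 1): 3,
          (2, 2): 4, (2, 3): 1, (3, 2): 1, (3, 3): 4}
p_z, p_u, p_i, history = em_aspect_model(counts, n_aspects=2)
```

Stopping on training log-likelihood is only one option; stopping on held-out likelihood is a common alternative when overfitting is the concern.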

Till Knorr

2006

Temporal Data Mining on Hemodialysis Treatment Data

A growing number of patients suffer from end-stage renal failure. For these patients, either transplantation or regular dialysis treatment is necessary for survival. However, the cost of providing dialysis care is high, and the survival span of patients on dialysis is significantly lower than that of patients with transplants. During dialysis, two dozen parameters can be observed on a regular basis. Analyzing these data with data mining algorithms can help to identify critical factors for patient survival.

The tasks of this Studienarbeit are as follows: (i) prepare the data set for data mining (e.g., cleaning), (ii) apply standard data mining techniques and evaluate the results (Kusiak et al. 2005), and (iii) build more sophisticated models that make use of the temporal structure (e.g., motif finding as in Keogh et al. 2003) and evaluate the benefit (if any) compared to simpler models.

This Studienarbeit is offered jointly with Calcucare GmbH, Freiburg.
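To make the notion of motif finding from the tasks above concrete, here is a minimal brute-force sketch. It is not the algorithm of Keogh et al.; it simply searches for the closest pair of non-overlapping windows of a fixed width under plain Euclidean distance, without normalization. The function name and the toy series are hypothetical.

```python
import math


def find_motif(series, width):
    """Return (distance, i, j) for the closest pair of non-overlapping windows.

    i and j are the start indices of the two windows, each of length `width`.
    Brute force: compares all window pairs, so it is quadratic in the series length.
    """
    windows = [series[s:s + width] for s in range(len(series) - width + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start j at i + width so that the two windows do not overlap.
        for j in range(i + width, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Toy series (hypothetical): the shape 0, 1, 3, 1, 0, 0 occurs twice.
series = [0, 0, 1, 3, 1, 0, 0, 5, 2, 0, 1, 3, 1, 0, 0]
distance, first, second = find_motif(series, width=3)
```

For real treatment data, each observed parameter would typically be normalized first, and a faster search would be needed for long series.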

Manuel Stritt

2006

Learning Anonymous Recommender Systems

Anonymous recommender systems help anonymous users/customers to find items/products they are interested in by asking them some questions about their interests and then providing a list of items/products in return. The goal of this Studienarbeit is to learn a classification model that predicts the items of interest from the answers to the questions. Historical data from an existing e-commerce recommender system is available for training and evaluation.

Methodologically, one of the challenges of this task is the large number of possible values of the target variable (about 100). Several model setups (1-vs-rest, 1-vs-1, hierarchical contrasts) as well as several base classifiers (logistic regression, naive Bayes classifier, decision tree, SVM) should be compared using the data mining toolkit Weka. If time permits, ensemble models and feature selection strategies should also be investigated.
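The practical difference between the first two model setups can be seen by counting the binary subproblems each one creates. The sketch below assumes exactly 100 classes (the text says "about 100") and uses a hypothetical helper name; it only enumerates the subproblems and does not train any classifier.

```python
from itertools import combinations


def binary_problems(classes, scheme):
    """Enumerate the binary subproblems of a multi-class decomposition."""
    if scheme == "1-vs-rest":
        # One classifier per class: that class against all others.
        return [(c, "rest") for c in classes]
    if scheme == "1-vs-1":
        # One classifier per unordered pair of classes.
        return list(combinations(classes, 2))
    raise ValueError(f"unknown scheme: {scheme}")


classes = [f"item_{k}" for k in range(100)]  # assumed: exactly 100 target values
n_rest = len(binary_problems(classes, "1-vs-rest"))  # 100
n_pairs = len(binary_problems(classes, "1-vs-1"))    # 4950
```

With K classes, 1-vs-rest needs K classifiers trained on all data, while 1-vs-1 needs K(K-1)/2 classifiers, each trained only on the examples of two classes.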

The Studienarbeit comprises the following tasks: (i) extract suitable transaction data from an operational database, (ii) design several experiments, (iii) set up several models and run the experiments, (iv) compare the results with those of a static system as well as of the status-quo system, and (v) discuss the results.

Dominik Benz

2005

Automatic Bookmark Classification Systems

Bookmarks (or favorites, hotlists, ...) are a common way to assist users in retrieving information from the World Wide Web. Contemporary browsers such as Mozilla and Internet Explorer allow the creation of personalized local URL repositories to store interesting web locations. Most of these bookmarking facilities offer the possibility to organize the bookmarks in a hierarchy of folders. This work addresses the problem of bookmark organization, focusing on the bookmark classification problem, i.e., the question of which bookmark folder a newly bookmarked website should be stored in. A collaborative approach is used to assign a user to a bookmarking peer group of users with similar interests and to generate recommendations for the classification of new bookmarks.

The Studienarbeit comprises the following tasks: (i) investigate existing bookmark systems, (ii) design a bookmark classification system, (iii) implement the system, and (iv) evaluate and discuss the final system.
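One simple way to picture the collaborative approach described above is a similarity-weighted vote among peers. The sketch below is an assumption-laden illustration, not the system built in the thesis: it measures user similarity by the Jaccard overlap of bookmarked URLs and returns the folder name used by the most similar peers, which presumes that folder names are comparable across users. All names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_folder(target, url, users, k=2):
    """Suggest a folder for `url` by a similarity-weighted vote of up to k peers.

    users: dict mapping user -> {url: folder}.
    Only users who have already bookmarked `url` can act as peers.
    """
    mine = set(users[target])
    peers = sorted(
        (u for u in users if u != target and url in users[u]),
        key=lambda u: jaccard(mine, set(users[u])),
        reverse=True,
    )[:k]
    votes = {}
    for u in peers:
        folder = users[u][url]
        votes[folder] = votes.get(folder, 0.0) + jaccard(mine, set(users[u]))
    return max(votes, key=votes.get) if votes else None


# Toy bookmark collections (hypothetical).
users = {
    "ann": {"python.org": "dev", "numpy.org": "dev"},
    "bob": {"python.org": "code", "numpy.org": "code", "scipy.org": "code"},
    "eve": {"bbc.com": "news", "scipy.org": "misc"},
}
suggestion = recommend_folder("ann", "scipy.org", users)
```

A fuller system would still have to map the peers' folder names onto the target user's own folder hierarchy.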

Steffen Rendle

2005

Classification Models for Automatic Vowel / Consonant Distinction

Linguists are interested in annotating audio speech data with linguistic properties, e.g., in the simplest case, which parts of the speech are vowels and which are consonants. Large hand-labeled corpora exist, e.g., the LeaP speech corpus, which contains audio files and corresponding linguistic annotations. The challenge of this topic is to build a model that automatically segments a given speech signal into intervals and labels them as vowels or consonants.

Concrete tasks are:
- implement a preprocessing tool that combines audio and annotation data into a single table-based data set containing spectrogram and annotation data (based on existing tools for spectrogram computation) and that allows different parametrizations to be selected (e.g., window size, selected frequencies).
- create data sets for different model setups and parameter constellations.
- run experiments on these data sets with different classification algorithms (generalized linear models, neural nets, SVMs).
- analyse the results of these experiments.
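The annotation half of the preprocessing step can be sketched as follows. The example assumes that annotations are given as (start, end, label) intervals in integer milliseconds and that each analysis frame receives the label of the interval covering its center; the spectrogram values themselves, which the text says come from existing tools, are left out. The function name, the labeling rule and the toy intervals are assumptions.

```python
def label_frames(duration, window, hop, intervals):
    """Assign each analysis frame the label of the interval covering its center.

    duration, window and hop are integer milliseconds.
    intervals: list of (start, end, label) with half-open ranges [start, end).
    Frames whose center is not covered by any interval get the label None.
    """
    rows = []
    n_frames = (duration - window) // hop + 1
    for k in range(n_frames):
        start = k * hop
        center = start + window // 2
        label = next((lab for s, e, lab in intervals if s <= center < e), None)
        rows.append((start, label))
    return rows


# Toy annotation (hypothetical): consonant, vowel, consonant over 100 ms.
intervals = [(0, 35, "C"), (35, 70, "V"), (70, 100, "C")]
rows = label_frames(duration=100, window=20, hop=10, intervals=intervals)
labels = [label for _start, label in rows]
```

Changing `window` and `hop` yields the different parametrizations of the data set; each row would then be joined with the spectrogram values of the same frame.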

Christine Preisach

2005

Graph-Based Recommender Systems Implementation and Experiments

The graph-based model of Huang et al. (2004) addresses the sparsity problem of Collaborative Filtering. In their paper, the transitive associations of users are computed and recommendations are generated based on these association values. Huang et al. have shown significantly better results, however only on non-public datasets. This Studienarbeit investigated these algorithms on public e-commerce datasets: MovieLens and EachMovie.

The Studienarbeit comprises the following tasks: (i) implement the existing models from Huang et al., (ii) evaluate the models on two public datasets, MovieLens and EachMovie, (iii) set up and run experiments, and (iv) discuss the results.

Zan Huang, Hsinchun Chen, Daniel Zeng (2004): Applying Associative Retrieval Techniques to Alleviate the Sparsity Problem in Collaborative Filtering. ACM Transactions on Information Systems, Vol. 22, No. 1, pp. 116-142.
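The effect of transitive associations on sparse data can be illustrated with a simplified walk-counting sketch on the bipartite user-item graph. This is not one of the algorithms of Huang et al.; it only counts walks of increasing length from a user to items, which shows how longer walks reach items that direct co-purchase neighbors do not. The function name and the toy purchase data are hypothetical.

```python
def spread(purchases, user, hops):
    """Count walks from `user` to items that use `hops` user-to-item steps.

    purchases: dict mapping user -> set of items.
    hops=2 corresponds to user-item-user-item walks (direct neighbors);
    larger values follow longer, transitive associations.
    Items the user already has are removed from the result.
    """
    item_weight = {item: 1 for item in purchases[user]}
    for _ in range(hops - 1):
        # Item -> user step: a user collects the weight of all items they hold.
        user_weight = {}
        for other, items in purchases.items():
            w = sum(item_weight.get(i, 0) for i in items)
            if w:
                user_weight[other] = w
        # User -> item step: each user passes their weight on to their items.
        item_weight = {}
        for other, w in user_weight.items():
            for i in purchases[other]:
                item_weight[i] = item_weight.get(i, 0) + w
    return {i: w for i, w in item_weight.items() if i not in purchases[user]}


# Toy data (hypothetical): "a" and "c" share no item, but are linked through "b".
purchases = {"a": {"x"}, "b": {"x", "y"}, "c": {"y", "z"}}
direct = spread(purchases, "a", hops=2)      # {"y": 1}
transitive = spread(purchases, "a", hops=3)  # {"y": 4, "z": 1}
```

In the toy data, item "z" only becomes recommendable for user "a" once the longer walks are included, which is the intuition behind using transitive associations on sparse rating data.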
