Machine Learning Project (summer term 2009): Topics
Martin Ortmann
Distributed Hyperparameter Search
example domains: medical data; multimedia; bibliographic data; text.

Hyperparameters are the settings of a machine learning algorithm that are fixed before training instead of being learned from the data. Choosing them well for a given application can be crucial for the performance of the learned model. However, the best hyperparameters are usually not known beforehand, so in practice suitable combinations are found by running the algorithm many times with different settings. This can be quite time-consuming, depending on the amount of data, the algorithm, and the number and range of the hyperparameters.

As the different runs of the learning algorithms are independent of each other, it is straightforward to distribute hyperparameter search over several computers.

  1. Choose several classifiers from Weka.
  2. Implement a generic hyperparameter search method for the Sun Grid Engine, including a visualization GUI, for those classifiers.
  3. Deploy it on ISMLL's cluster infrastructure and test it on different application datasets.
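
Because the runs are independent, the search can be sketched as a grid of candidate settings that are evaluated in parallel. The following is a minimal illustration only: `evaluate` is a hypothetical stand-in for one training run, the parameter names `C` and `gamma` are assumptions, and a local thread pool takes the place of jobs submitted to a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Hypothetical stand-in for one training run; returns a validation score.

    A real run would train a classifier with `params` and measure its
    accuracy. This toy score simply peaks at C=10, gamma=0.1.
    """
    return -((params["C"] - 10.0) ** 2) - ((params["gamma"] - 0.1) ** 2)


def grid(space):
    """Yield every combination of the values listed in `space`."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def parallel_grid_search(space, evaluate_fn, workers=4):
    """Evaluate all candidates independently and return the best one."""
    candidates = list(grid(space))
    # Each candidate is an independent job; on a cluster, every call to
    # evaluate_fn would become one submitted job instead of one thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate_fn, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


space = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = parallel_grid_search(space, evaluate)
print(best_params, best_score)
```

The search method only needs the list of candidates and a function that scores one candidate, which is what makes it generic across classifiers.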

available
String Kernels for Text Classification
example domains: CRM; news articles; email; document management.

String and word sequence kernels allow the use of kernel-based methods like support-vector machines (SVMs) directly on text, without any (or at least with less) data preprocessing.

The task is to implement one or several string/word sequence kernels for LIBSVM in C++ or Java and then to compare their performance with the standard approach of polynomial kernels with bag-of-words features. Existing code could be used as a starting point.
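
To give an idea of what a string kernel computes, here is a minimal sketch of one simple member of the family, the p-spectrum kernel, which counts the substrings of length p that two texts share. It is an illustrative choice, not necessarily the kernel to be used in the project, and the function names are assumptions.

```python
from collections import Counter


def spectrum(text, p):
    """Count every contiguous substring of length p in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p=3):
    """Inner product of the p-spectrum count vectors of s and t."""
    counts_s, counts_t = spectrum(s, p), spectrum(t, p)
    # Iterate over the smaller spectrum; substrings missing from the
    # other text contribute zero.
    if len(counts_s) > len(counts_t):
        counts_s, counts_t = counts_t, counts_s
    return sum(n * counts_t[sub] for sub, n in counts_s.items())


def gram_matrix(docs, p=3):
    """Kernel values for all pairs of documents."""
    return [[spectrum_kernel(a, b, p) for b in docs] for a in docs]


docs = ["machine learning", "machine vision", "string kernels"]
for row in gram_matrix(docs):
    print(row)
```

A matrix like this operates directly on raw text; it could, for instance, be passed to an SVM implementation that accepts precomputed kernel values.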

available
Tag-Aware Collaborative Filtering
example domains: movies; music; bibliographic data; images; bookmarks.

Folksonomies are user-generated, flat and lightweight vocabularies that can help to organize massive amounts of data items on websites. Collaborative filtering (CF) is a key technology for recommender systems (RS). It is based on the assumption that users who bought, clicked or rated similar items will also behave similarly on items they have not yet seen. Because of their widespread adoption in domains like online shopping (see amazon.com for an example), and because of the one-million-dollar Netflix Prize, CF and RS have gained publicity in recent years.

The task is to implement several state-of-the-art collaborative filtering algorithms which also take into account folksonomy data, and evaluate them on public datasets.
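
One simple way to make collaborative filtering tag-aware, shown here only as a sketch and not as one of the state-of-the-art algorithms meant above, is to describe each user by the items and the tags they used, and to compute user similarity on that extended profile. The data layout, the Jaccard similarity and all names below are assumptions made for illustration.

```python
from collections import defaultdict


def profile(posts):
    """Turn one user's (item, tag) posts into a set of features."""
    features = set()
    for item, tag in posts:
        features.add(("item", item))
        features.add(("tag", tag))
    return features


def jaccard(a, b):
    """Size of the overlap of two sets relative to the size of their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, posts_by_user, k=2, n=3):
    """Recommend unseen items from the k most similar users."""
    profiles = {u: profile(p) for u, p in posts_by_user.items()}
    seen = {item for item, _ in posts_by_user[target]}
    similarities = (
        (jaccard(profiles[target], profiles[u]), u)
        for u in profiles if u != target
    )
    # Keep only users with some overlap, most similar first.
    neighbours = sorted((s for s in similarities if s[0] > 0), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, user in neighbours:
        for item in {i for i, _ in posts_by_user[user]} - seen:
            scores[item] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


posts = {
    "ann": [("m1", "scifi"), ("m2", "scifi")],
    "bob": [("m1", "scifi"), ("m3", "scifi")],
    "cat": [("m4", "romance"), ("m5", "romance")],
    "dan": [("m7", "scifi")],
}
print(recommend("ann", posts))
```

In this toy data, "dan" shares no item with "ann" and would be ignored by a purely item-based similarity; the shared tag is what makes his item a candidate.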

available
Shape Recognition
example domains: historical images [Wang 2008]; video analysis [Yankov 2007]; machine vision; robotics.

Eamonn Keogh et al. [Keogh 2006] have introduced a new method for shape recognition. It is based on converting the shapes in an image into time series.

The task consists of three subtasks:

  1. review shape recognition methods,
  2. re-implement the method proposed by Keogh and evaluate it against shape matching implemented in OpenCV,
  3. adapt the method proposed by Keogh for a special task like recognition of traffic signs.
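
The idea of turning a shape into a time series can be sketched as follows. This is a common conversion (distance from the centroid to each contour point) combined with a brute-force rotation-invariant comparison; it is an assumption for illustration and not a re-implementation of the method in [Keogh 2006]. Contours are assumed to be given as equally long lists of (x, y) points.

```python
import math


def centroid_distance_series(contour):
    """Convert a closed contour into a z-normalised time series."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    series = [math.hypot(x - cx, y - cy) for x, y in contour]
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    # Normalising makes the series independent of the shape's size.
    return [(v - mean) / std if std > 0 else 0.0 for v in series]


def rotation_invariant_distance(a, b):
    """Smallest Euclidean distance over all circular shifts of b."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    n = len(a)
    best = math.inf
    for shift in range(n):
        d = math.sqrt(sum((a[i] - b[(i + shift) % n]) ** 2 for i in range(n)))
        best = min(best, d)
    return best


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# Same square, traced from a different starting point.
shifted = square[3:] + square[:3]
rectangle = [(0, 0), (2, 0), (4, 0), (4, 1), (4, 2), (2, 2), (0, 2), (0, 1)]

s, sh, r = (centroid_distance_series(c) for c in (square, shifted, rectangle))
print(rotation_invariant_distance(s, sh), rotation_invariant_distance(s, r))
```

Trying every circular shift costs quadratic time per comparison, which is acceptable for a sketch but is exactly the kind of cost that more refined methods aim to avoid.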

available
Data Mining Cup
example domains: online shops; direct marketing; couponing.

The Data Mining Cup is an annual data analysis competition for undergraduate students. Participants receive a real-life dataset; the task is then to make predictions using the given data.

This year's competition will start on April 15 and last until May 25, so the main workload of this project will be in those 40 days.