Courses in Winter Semester 2013/2014 / BSc/MSc Practical: Business Analytics, Machine Learning and Artificial Intelligence / Topics

Parallel Machine Learning Systems: How to Mine Big Data?

In recent years we have seen a rapidly growing gap between the amount of collected data and the data processing capabilities of conventional computers. This is not surprising: according to Moore's Law, the processing power of an "average computer" doubles every 18 months, while, according to Lyman and Varian of Berkeley, the amount of stored data doubles every 12 months.
Modern data-mining applications, often called "big-data" analysis, require us to manage immense amounts of data more quickly, more precisely, and more intelligently. To deal with such applications, a new software stack of parallel machine learning systems has evolved. In this practical, we shall focus on understanding and using these systems.
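
To make the size of this gap concrete, the two doubling rates quoted above can be compared directly. The following is only a back-of-the-envelope sketch: it assumes both quantities grow as clean exponentials, and the function names are made up for this illustration.

```python
def growth_factor(months, doubling_period_months):
    """Growth after `months` of a quantity that doubles every given period."""
    return 2.0 ** (months / doubling_period_months)


def data_to_compute_gap(months, data_period=12.0, compute_period=18.0):
    """How much faster stored data has grown than processing power.

    The defaults are the doubling periods quoted in the text (12 and 18 months).
    """
    return growth_factor(months, data_period) / growth_factor(months, compute_period)


# The ratio equals 2 ** (months / 36), so the gap itself doubles every 3 years.
for years in (3, 6, 9):
    print(years, "years: gap factor", data_to_compute_gap(12 * years))
```

Under these assumptions the gap between stored data and processing power doubles every three years, which is why a single faster machine does not keep up and parallel systems become attractive.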

Practical Objectives

Students will gain hands-on experience with state-of-the-art distributed data mining systems such as Hadoop, MapReduce, Mahout, and GraphLab by applying them to benchmark machine learning tasks.
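
As a first orientation, the MapReduce programming model can be imitated on a single machine. The sketch below is plain Python and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) are chosen for this illustration only. It shows the three steps a real framework would distribute across many machines: map, shuffle, and reduce.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["big data", "big models need big data"])
print(counts)
```

Because each call to map_phase depends on one record only and each call to reduce_phase on one key only, both phases can run in parallel, which is the property the systems named above exploit.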

Predict Flight Delays

Did you know airlines are constantly looking for ways to make flights more efficient? From gate conflicts to operational challenges to air traffic management, the dynamics of a flight can change quickly and lead to costly delays. There is good news: advancements in real-time big data analysis are changing the course of flight as we know it. Imagine if pilots could augment their decision-making with real-time business intelligence available in the cockpit, allowing them to adjust their flight patterns.

Practical Objectives

Develop a machine learning algorithm to analyze and predict flight departure delays using the provided flights dataset. Identify the factors most likely to cause flight delays. Predict whether an individual flight will be delayed and, if so, by how much.
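
One possible starting point for these objectives is a simple grouped baseline, sketched below. It is not the expected solution: the rows are invented for illustration, the field names (carrier, period, delay_min) and the 15-minute threshold are assumptions, and the provided dataset may look different. The sketch ranks factors by how strongly the delay rate varies across their values, then predicts in two stages (delayed or not, then by how many minutes).

```python
from collections import defaultdict

# Toy rows invented for illustration; the provided dataset may use other fields.
FLIGHTS = [
    {"carrier": "A", "period": "morning", "delay_min": 0},
    {"carrier": "A", "period": "evening", "delay_min": 40},
    {"carrier": "A", "period": "morning", "delay_min": 5},
    {"carrier": "B", "period": "evening", "delay_min": 60},
    {"carrier": "B", "period": "morning", "delay_min": 10},
    {"carrier": "B", "period": "evening", "delay_min": 20},
    {"carrier": "A", "period": "evening", "delay_min": 30},
    {"carrier": "B", "period": "morning", "delay_min": 0},
]

DELAY_THRESHOLD_MIN = 15  # assumed definition of "delayed"


def fit_group_stats(rows, factor):
    """Per value of `factor`: (share of delayed flights, mean delay when delayed)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row["delay_min"])
    stats = {}
    for value, delays in groups.items():
        late = [d for d in delays if d >= DELAY_THRESHOLD_MIN]
        mean_late = sum(late) / len(late) if late else 0.0
        stats[value] = (len(late) / len(delays), mean_late)
    return stats


def factor_spread(rows, factor):
    """Crude importance score: range of delay rates across the factor's values."""
    rates = [rate for rate, _ in fit_group_stats(rows, factor).values()]
    return max(rates) - min(rates)


def predict(stats, value):
    """Stage 1: is the flight delayed? Stage 2: expected delay in minutes."""
    rate, mean_late = stats[value]
    delayed = rate >= 0.5
    return delayed, (mean_late if delayed else 0.0)


ranking = sorted(["carrier", "period"], key=lambda f: factor_spread(FLIGHTS, f), reverse=True)
print(ranking)
print(predict(fit_group_stats(FLIGHTS, "period"), "evening"))
```

A learned classifier and regressor would replace the per-group averages in a real solution, but the two-stage structure and the need for a held-out evaluation set stay the same.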


It is also possible to join an active competition (FlightQuest2) at Kaggle.com and win a prize.

Predict Loan Defaulters

Give me some credit. No, first we should predict the probability of default.
Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years.

Practical Objectives

Given a data set, build a model that predicts the probability that a customer will fail to repay a loan within the next two years.
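
A common baseline for this kind of probability estimate is logistic regression. The sketch below trains one with batch gradient descent in plain Python; it is an illustration, not the required approach. The two features (debt ratio, number of past late payments) and all toy rows are invented here, and the real data set may contain different fields.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Linear score of one customer before the sigmoid is applied."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(score(weights, bias, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_default_probability(model, x):
    """Estimated probability that the customer defaults."""
    weights, bias = model
    return sigmoid(score(weights, bias, x))


# Toy customers invented for illustration: [debt ratio, past late payments].
X = [[0.1, 0], [0.2, 0], [0.3, 1], [0.8, 2], [0.9, 3], [0.7, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = defaulted within two years

model = train_logistic(X, y)
p_low = predict_default_probability(model, [0.1, 0])
p_high = predict_default_probability(model, [0.9, 3])
print(p_low < p_high)
```

Since the task asks for a probability rather than a yes/no decision, a ranking-based measure such as AUC on held-out customers is a natural way to compare models.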

More topics to be added soon