Information Systems and Machine Learning Lab, University of Hildesheim, Germany

Courses in Summer term 2007 / Seminar on Text Mining:

Time:	Wednesday 14:00 - 16:00
Location:	B 25
Begin:	11.04

Having machines understand text written in a human language such as English or German is one of the major goals of artificial intelligence and machine learning. While it may not be entirely clear what "understanding" means for a machine, and anything worth being called true machine understanding is still decades away, for many interesting applications the ability to deal with textual data in a very narrow context is already sufficient. For example:

(i) All of us rely on our spam filter to automatically sort incoming emails into legitimate and spam email. Such a filter is a binary classifier learned from example texts/emails.

(ii) Retailers and comparison shopping platforms that integrate offers from hundreds of different providers have to identify offers of the same product based on textual descriptions that typically vary slightly from provider to provider, so that their customers can view all offers for the same product in a single place.

(iii) Information portals for bibliographic data, such as CiteSeer, or for job offerings often crawl their information automatically from the web and therefore have to extract relevant pieces of information such as addresses, job titles, etc. from texts.
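
To make example (i) concrete, the following is a minimal sketch of a binary text classifier learned from example emails. It uses a multinomial Naive Bayes model with Laplace smoothing, which is only one of many possible choices and is an assumption of this sketch, not a method prescribed by the seminar. The function names (tokenize, train, classify), the labels "spam" and "ham", and the four training emails are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")  # hypothetical label names for the two classes


def tokenize(text):
    """Lowercase the text and split it on whitespace (deliberately naive)."""
    return text.lower().split()


def train(examples):
    """Count words per class from a list of (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set().union(*(word_counts[label] for label in LABELS))
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        # Log prior; assumes every label occurs at least once in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
examples = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for the seminar", "ham"),
    ("seminar talk on text mining", "ham"),
]
model = train(examples)

print(classify("win a prize now", model))
print(classify("agenda for the seminar talk", model))
```

With this toy data, the first message is assigned to the spam class and the second to the legitimate class. A realistic filter would need far more training data, better tokenization, and a proper evaluation on held-out emails.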

Methods that address these tasks have recently been developed in different research communities, such as Statistical Natural Language Processing (NLP) and Computational Linguistics, Text Mining, Information Retrieval, Information Extraction, etc.

More recently, the output of such methods has also been represented formally in logic, e.g., entities as instances, relations between entities as predicates, etc. In particular, fragments of first-order logic known as description logics, sometimes also called ontologies, have been used for this purpose. In this context, the task is often called ontology learning.

This seminar aims to present a broad overview of methods for dealing with texts that address some of these problems.

Talks can be given in English or German.

Supervisor: Karen Tso

Topics:

-- Introduction --
Overview of Text Mining.
Named Entity Recognition.
Google-PageRank.
Word Sense Disambiguation.
Text Summarization.
Sentiment Analysis.
Text analytics.
Text Clustering.
Information Extraction by Rule Induction.
Technology.

Interested students can now register for a topic via email to .