Information Systems and Machine Learning Lab, University of Hildesheim, Germany

Courses in summer term 2006: Seminar on Text Mining and Ontology Learning


Time:	Tue. 14-16
Location:	SR 01-018, Building 101
Preliminary meeting:	Mon., April 24, 16-18 (Room 026, Building 101)
Begin:	25.04.2006

Enabling machines to understand text written in a human language such as English or German is one of the major goals of artificial intelligence and machine learning. While it is not entirely clear what "understanding" means for machines, and anything worthy of being called true machine understanding is still decades away, for many interesting applications the ability to deal with textual data in a very narrow context is already sufficient. For example:

(i) All of us rely on our spam filters to automatically sort incoming emails into legitimate email and spam; such a filter is a binary classifier learned from example texts/emails.

(ii) Retailers and comparison shopping platforms that integrate offers from hundreds of different providers have to identify offers of the same product based on textual descriptions that typically vary slightly from provider to provider, so that their customers can view all offers for the same product in a single place.

(iii) Information portals for bibliographic data, such as CiteSeer, or for job offerings often crawl their information automatically from the Web and therefore have to extract relevant pieces of information such as addresses, job titles, etc. from texts.

Methods that address these tasks have been developed in recent years in different research communities, such as Statistical Natural Language Processing (NLP) and Computational Linguistics, Text Mining, Information Retrieval, and Information Extraction.

More recently, the output of such methods has also been represented formally in logic, e.g., entities as instances and relations between entities as predicates. In particular, fragments of first-order logic such as description logics have been used for this task; the resulting formal representations are sometimes also called ontologies. In this context, the task is often called ontology learning.

This seminar aims to present a broad overview of methods for dealing with texts that address some of these problems.

Talks can be given in English or German.

Interested students can register for a topic from now on via email to . Topics will also be assigned at the joint seminar introduction session of the Department of Computer Science on Monday, April 24, 16-18 (Room 026, Building 101).

Supervisors are: Prof. Dr. Lars Schmidt-Thieme, Christine Preisach, Karen Tso and Leandro Balby Marinho

Topics and preliminary schedule:

	Tue. 25.04	(0)	-- Introduction --
I. Text Classification
	Tue. 16.05	(1)	A Survey of Text Classification Methods, especially Support Vector Machines
	Tue. 23.05	(2)	Text Classification considering Background Knowledge
	Tue. 30.05		No Seminar
	Tue. 06.06		Pentecost holidays
	Tue. 13.06	(3)	Automatic Classification based on semantic hierarchies
II. Some Basic Problems
	Tue. 20.06	(4)	Named Entity Recognition
	Tue. 27.06	(5)	Word Sense Disambiguation
	Tue. 04.07	(6)	Coreference Resolution
III. Learning Concept Taxonomies
	Tue. 11.07	(7)	Ontology Semantic Similarity
	--	(8)	Evaluation of information extraction tasks and ontologies
	Tue. 18.07	(9)	Learning Concept Taxonomies
IV. Learning General Relations
	Tue. 25.07	(10)	Learning Relations using Association Rules
	--	(11)	Learning Relations using Kernel Methods
	--	(12)	Adaptive Information Extraction (LP2 algorithm)
V. Applications
	--	(13)	Text Summarization