Courses in summer term 2005 / Seminar on Predictive Modelling / readings:

List of readings (ee = link to electronic edition; ask me for the other references):

  • For those who are interested, part 2 of Topic 7 will be presented in an extra slot on Tue. 12.7., 13:30-14:15.
  • Tue. 12.4. (0) Introduction
    I. Some Fundamentals
    --- (1) Evaluation of Classifier Performance
    • [ee] R. Caruana and A. Niculescu-Mizil (2004): Data mining in metric space: an empirical analysis of supervised learning performance criteria. ACM KDD

    optionally:
    • [ee] K.C. Klauer and W.H. Batchelder (1996): Structural analysis of subjective categorical data. Psychometrika 61
    • [ee] A.P. Sinha and J.H. May (2005): Evaluating and Tuning Predictive Data Mining Models Using Receiver Operating Characteristic Curves. Journal of Management Information Systems 21/3
    Tue. 10.5. (2) Support Vector Machines (SVMs) Speaker: Daniel Weisser
    • [ee] Th. Joachims (1999): Making Large Scale SVM learning practical.
    • [ee] J. Platt (1998): Sequential minimal optimization: a fast algorithm for training support vector machines.

    background: some of
    • T. Hastie, R. Tibshirani and J. Friedman (2001): The elements of statistical learning, chapter 12.1-12.3
    • L. Wasserman (2005): All of statistics, chapter 22.9-22.10
    • Lectures "Machine Learning" and "Computer Vision"
    • P.S. Bradley, O.L. Mangasarian and D.R. Musicant (2002): Optimization methods in massive data sets.
    Tue. 24.5. (3) Missing Values and the Expectation Maximization (EM) algorithm Speaker: Corina Mitrohin
    • [ee] Z. Ghahramani and M. Jordan (1994): Supervised learning from incomplete data via an EM approach. Advances in Neural Information Processing Systems 6

    background: some of
    • G.J. McLachlan and T. Krishnan (1997): The EM algorithm and extensions.
    • P.D. Allison (2001): Missing data.
    • R.J.A. Little and D.B. Rubin (2002): Statistical Analysis with Missing Data.
    II. Some Basic Problems
    --- (4) Imbalanced class distributions
    • [ee] Gary M. Weiss (2004): Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter
    • [ee] G. Batista, R.C. Prati and M.C. Monard (2004): A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter

    optionally: one of
    • [ee] H. Guo and H.L. Viktor (2004): Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explorations Newsletter
    • [ee] T. Jo and N. Japkowicz (2004): Class imbalances versus small disjuncts. ACM SIGKDD Explorations Newsletter
    • [ee] B. Raskutti and A. Kowalczyk (2004): Extreme re-balancing for SVMs: a case study. ACM SIGKDD Explorations Newsletter
    Tue. 31.5. (5) Multi-class predictions (aka multi-label, multi-category) Speaker: Stefan Hauger
    • [ee] C.-W. Hsu and C.-J. Lin (2002): A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 13

    optionally:
    • [ee] T.-K. Huang, R.C. Weng, and C.-J. Lin (2004): A Generalized Bradley-Terry Model: From Group Competition to Individual Skill. NIPS
    Tue. 7.6. [cancelled]
    Fri. 10.6. (6) Hierarchical targets Speaker: Jens Heidrich
    • [ee] A. Sun and E.-P. Lim (2001): Hierarchical Text Classification and Evaluation. ICDM

    optionally:
    • [ee] O. Dekel, J. Keshet and Y. Singer (2004): Large margin hierarchical classification. ICML

    background:
    • [ee] M. Granitzer (2003): Hierarchical Text Classification using Methods from Machine Learning (section 3). Master Thesis Graz University of Technology
    III. Classification of structured objects (structured input) and
         Predicting structured targets (structured output)
    Tue. 14.6. (7) Classification of sequences with Support Vector Machines Speaker: Yang Li
    • [ee] C. Leslie and R. Kuang (2003): Fast kernels for inexact string matching.
    • [ee] H. Lodhi, J. Shawe-Taylor, N. Cristianini and Ch. Watkins (2002): Text Classification using String Kernels.

    background:
    • [ee] Th. Gärtner (2003): A survey of kernels for structured data.
    Tue. 21.6. [cancelled]
    Fri. 24.6. (9) Predicting sequences Speaker: Johannes Rudolph
    • [ee] Y. Altun, I. Tsochantaridis and Th. Hofmann (2003): Hidden Markov Support Vector Machines. ("HM-SVM")
    • [ee] J. Lafferty, A. McCallum and F. Pereira (2001): Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
    Fri. 24.6. (10) Predicting structured targets in general (e.g., rankings) Speaker: Till Knorr
    • [ee] I. Tsochantaridis, Th. Hofmann, Th. Joachims and Y. Altun (2004): Support Vector Machine Learning for Interdependent and Structured Output Spaces
    • [ee] P. Bartlett, M. Collins, D. McAllester and B. Taskar (2004): Large margin methods for structured classification: exponentiated gradient algorithms and PAC-Bayesian generalization bounds.
    IV. Using unlabeled data (transductive inference, transduction,
         semi-supervised classification)
    --- (11) Transductive ridge regression.
    • [ee] O. Chapelle, V. Vapnik and J. Weston (1999): Transductive inference for estimating values of functions.

    background:
    • [ee] M. Seeger (2002): Learning with labeled and unlabeled data.
    Tue. 28.6. (12) Transductive Support Vector Machines Speaker: Teodora Vatahska
    • [ee] Th. Joachims (1999): Transductive Inference for Text Classification using Support Vector Machines. ("TSVM")
    • K. Bennett (1999): Combining support vector and mathematical programming methods for classification.

    background:
    • V. Vapnik (1998): Statistical learning theory.
    • [ee] U. Brefeld and T. Scheffer (2004): Co-EM Support Vector Learning. [sections 1-3]
    --- (13) Transductive k-nearest neighbor classifiers
    • [ee] Th. Joachims (2003): Transductive Learning via Spectral Graph Partitioning.
    • [ee] A. Blum and S. Chawla (2001): Learning from labeled and unlabeled data using graph mincuts.

    background:
    • [ee] J. Kleinberg and E. Tardos (2000): Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields.

    optionally:
    • [ee] A. Blum, J. Lafferty, M.R. Rwebangira and R. Reddy (2004): Semi-supervised learning using randomized mincuts.
    • X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions.
    Tue. 5.7. (8) Classification of graphs with Support Vector Machines Speaker: Dominic Blasius
    • [ee] T. Horvath, Th. Gärtner and St. Wrobel (2004): Cyclic Pattern Kernels for Predictive Graph Mining.
    • H. Kashima, K. Tsuda, and A. Inokuchi (2003): Marginalized kernels between labeled graphs.
    • Th. Gärtner, P. Flach and S. Wrobel (2003): On graph kernels: Hardness results and efficient alternatives. COLT
    • [ee] Th. Gärtner (2002): Exponential and geometric kernels for graphs.

    background:
    • [ee] Th. Gärtner (2003): A survey of kernels for structured data.
    V. Using relations between instances (relational learning, collective inference)
    Tue. 5.7. (14) Co-training Speaker: -----
    • [ee] R. Ghani (2002): Combining labeled and unlabeled data for multiclass text categorization.
    • [ee] K. Nigam and R. Ghani (2000): Analyzing the effectiveness and applicability of co-training.
    • [ee] A. Blum and Th. Mitchell (1998): Combining labeled and unlabeled data with co-training

    optionally:
    • [ee] U. Brefeld and T. Scheffer (2004): Co-EM Support Vector Learning.
    Tue. 12.7. (7) Classification of sequences with Support Vector Machines - Part II (13:30-14:15) Speaker: Yang Li
    Tue. 12.7. (15) Collective Inference Speaker: Robert Kaschuba
    • [ee] J. Neville and D. Jensen (2000): Iterative Classification in Relational Data. AAAI Workshop on Learning Statistical Models from Relational Data
    • [ee] B. Taskar, E. Segal and D. Koller (2001): Probabilistic Classification and Clustering in Relational Data. UAI

    optionally:
    • [ee] D. Jensen, J. Neville and B. Gallagher (2004): Why collective inference improves relational classification.