Topics for projects and theses:
(methodological focus, technical focus)


Master Thesis at ISMLL

Available Supervisors

Name                               Available Slots
Dr. Josif Grabocka                 2
Mohsan Jameel, M.Sc.               From June 2019
Rafael Rêgo Drumond, M.Sc.         From September 2020
Ahmed Rashed, M.Sc.                From December 2019
Hadi Samer Jomaa, M.Eng.           1
Randolf Scholz, M.Sc.              0
Jonas Falkner, M.Sc.               2
Vijaya Krishna Yalavarthi, M.Eng.  From April 2020
Mofassir ul Islam Arif             3
Lukas Brinkmeyer                   From July 2020
Eya Boumaiza                       From April 2020
Shayan Jawed                       3

Available Master Thesis topics at ISMLL

available
Exploration in parallel/distributed reinforcement learning

Reinforcement learning (RL) has seen wide uptake by the research community as well as by industry. The RL setting consists of an agent that interacts with an environment and learns a policy that is optimal for solving a certain problem. If the environment is complex, the learning process can slow down, requiring a significant amount of resources to reach the optimal solution. The learning process can be sped up by using parallel and distributed concepts as well as by enhancing the exploration capacity of the agents.

In this thesis you will develop a scalable RL framework that exploits parallelism to speed up the learning process and to diversify the exploration capacity of the agents.
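
To make the setting concrete, here is a minimal sketch (an illustration, not the thesis framework) of how several Q-learning workers with different exploration rates could share one value table; the chain environment and all hyperparameters are toy assumptions:

```python
# Several Q-learning workers run in parallel with different exploration
# rates and share one Q-table, so exploratory workers feed value estimates
# to the greedier ones (cf. the asynchronous updates of A3C [2]).
import multiprocessing as mp
import random

N_STATES, N_ACTIONS, GOAL = 10, 2, 9      # toy chain MDP: actions move left/right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def greedy(q, s):
    vals = [q[s * N_ACTIONS + a] for a in range(N_ACTIONS)]
    best = max(vals)
    return random.choice([a for a, v in enumerate(vals) if v == best])

def worker(q, epsilon, episodes=300, alpha=0.1, gamma=0.95, max_steps=200):
    for _ in range(episodes):
        s = random.randrange(N_STATES - 1)          # random start, never the goal
        for _ in range(max_steps):
            # diversified exploration: each worker has its own epsilon
            a = random.randrange(N_ACTIONS) if random.random() < epsilon else greedy(q, s)
            s2, r, done = step(s, a)
            # asynchronous update of the shared table
            q[s * N_ACTIONS + a] += alpha * (
                r + gamma * max(q[s2 * N_ACTIONS + b] for b in range(N_ACTIONS))
                - q[s * N_ACTIONS + a])
            s = s2
            if done:
                break

if __name__ == "__main__":
    q = mp.Array("d", N_STATES * N_ACTIONS)         # shared, lock-protected doubles
    procs = [mp.Process(target=worker, args=(q, eps)) for eps in (0.05, 0.2, 0.5)]
    for p in procs: p.start()
    for p in procs: p.join()
    print("greedy action per state:", [greedy(q, s) for s in range(N_STATES)])
```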

References:

  • 1. Dimakopoulou, Maria and Van Roy, Benjamin. "Coordinated Exploration in Concurrent Reinforcement Learning" Proceedings of the 35th International Conference on Machine Learning, 2018.
  • 2. Volodymyr Mnih, et al. "Asynchronous Methods for Deep Reinforcement Learning." Proceedings of the 33rd International Conference on Machine Learning, 2016.

Contact: Mohsan Jameel
available
Towards a distributed screening algorithm for big data optimization with sparse models

Big data is generally characterized by a large number of data instances and features. This growth in data size is driven by sophisticated means of capturing data via sensors and by the growth of online retail services. In both cases the size of the data ranges from terabytes to petabytes, with billions of features. However, only a small subset of the features contributes significantly to learning, which is why sparse machine learning models such as the LASSO are typically used. Existing sparse models that work in distributed settings usually start with large models and therefore incur a high communication cost, which dominates the execution time.

In the literature, many screening algorithms [2] exist that reduce the dimensionality of the model, but they are applied as a sequential preprocessing step. In this thesis you will explore existing screening algorithms by implementing and evaluating them in a distributed setting. An interesting aspect would be to study the compactness of the resulting models and its impact on the communication cost and on convergence. A novel aspect would be to design an efficient distributed optimization algorithm that exploits the compactness of the model.
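
As a starting point, the basic SAFE rule of [2] can be stated and tested in a few lines. The sketch below is single-machine and uses synthetic data; distributing it across feature partitions would be part of the thesis:

```python
# A minimal single-machine sketch of the basic SAFE screening rule for the
# LASSO [2]; data dimensions and the choice of lambda are toy assumptions.
import numpy as np

def safe_screen(X, y, lam):
    """Return a boolean mask of features that survive the SAFE test.

    For min_w 0.5*||y - Xw||^2 + lam*||w||_1, feature j can be safely
    discarded (its optimal weight is zero) whenever
        |x_j^T y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max,
    with lam_max = max_j |x_j^T y| (Ghaoui et al., 2012).
    """
    corr = np.abs(X.T @ y)                      # |x_j^T y| for every feature
    lam_max = corr.max()
    threshold = lam - np.linalg.norm(X, axis=0) * np.linalg.norm(y) \
                      * (lam_max - lam) / lam_max
    return corr >= threshold                    # True = keep the feature

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))
w_true = np.zeros(5000); w_true[:10] = 3.0      # only 10 informative features
y = X @ w_true + 0.1 * rng.standard_normal(200)
lam = 0.9 * float(np.abs(X.T @ y).max())
keep = safe_screen(X, y, lam)
print(f"kept {keep.sum()} of {keep.size} features before solving the LASSO")
```

Note that this basic rule only screens aggressively for large values of lambda, which is one motivation for the stronger sequential rules of [3].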

References:

  • 1. Li, Qingyang, et al. "Parallel Lasso Screening for Big Data Optimization." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  • 2. L. E. Ghaoui, V. Viallon, and T. Rabbani. Safe feature elimination for the lasso and sparse supervised learning problems. Pacific Journal of Optimization, (8):667–698, 2012.
  • 3. R. Tibshirani, J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor, and R. J. Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(2):245–266, 2012.

Contact: Mohsan Jameel
available
Hierarchical Reinforcement Learning

Dividing the reinforcement learning task into a hierarchy of subproblems is one of the solutions proposed for environments with a large state space. Some of the existing methods focus on decomposing the reward and learning a separate value function per component [1]. Other methods, referred to as feudal learning, split the learning agent into modules, namely a master and a worker: the master learns abstract actions and propagates them to the worker, which receives an intrinsic reward if it complies with the master and aims to maximize the discounted reward [2]. In this thesis, the objective is to focus on another aspect of hierarchical learning, compositional Q-learning [3]. Simply put, in compositional Q-learning the agent aims to maximize the reward of elemental tasks as well as of composite tasks, which are ordered sequences of elemental tasks. This type of learning is interesting for Atari games as well as for robotic control, both of which have readily available source code in the OpenAI repository [4].
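
To make the compositional setting concrete, the following toy sketch (my own illustration of the idea in [3], with an assumed corridor world) learns one Q-table per elemental subgoal and then executes the elemental policies in sequence to solve the composite task:

```python
# One Q-table per elemental task ("reach a subgoal"); the composite task is
# solved by running the elemental policies in order. World, subgoals, and
# hyperparameters are toy assumptions for illustration.
import random

N = 12  # corridor cells 0..11; action 0 = left, 1 = right

def move(s, a):
    return max(0, min(N - 1, s + (1 if a == 1 else -1)))

def greedy(Q, s):
    best = max(Q[s])
    return random.choice([a for a in (0, 1) if Q[s][a] == best])

def q_learn(goal, episodes=400, alpha=0.2, gamma=0.95, eps=0.2):
    """Tabular Q-learning for the elemental task 'reach cell goal'."""
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(200):                     # step cap per episode
            if s == goal:
                break
            a = random.randrange(2) if random.random() < eps else greedy(Q, s)
            s2 = move(s, a)
            r = 1.0 if s2 == goal else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

subgoals = [4, 9]                          # composite task: reach 4, then 9
policies = {g: q_learn(g) for g in subgoals}

s, trace = 0, [0]
for g in subgoals:                         # compose the elemental policies
    while s != g:
        s = move(s, greedy(policies[g], s))
        trace.append(s)
print("composite trajectory:", trace)      # expected: 0..4 then 4..9
```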

References:

  • [1] Van Seijen, Harm, et al. "Hybrid reward architecture for reinforcement learning." Advances in Neural Information Processing Systems. 2017.
  • [2] Vezhnevets, Alexander Sasha, et al. "Feudal networks for hierarchical reinforcement learning." arXiv preprint arXiv:1703.01161 (2017).
  • [3] Tham, Chen K., and Richard W. Prager. "A modular q-learning architecture for manipulator task decomposition." Machine Learning Proceedings 1994. 1994. 309-317.
  • [4] https://github.com/openai/gym

available
Uncertainty-driven Exploration in Q-learning

One of the persistent challenges in reinforcement learning is exploration. After an agent is trained, it tends to stick to the actions that have maximized its previous rewards, without any incentive to explore other, possibly more rewarding actions. Several approaches have been proposed to boost exploration, including noise injection into the action space [1], noise injection into the parameter space [2], and curiosity-driven exploration, in which curiosity leads to a higher intrinsic reward [3]. In this thesis, you will focus on uncertainty-driven exploration in discrete action spaces. In this method, every action is associated with a degree of uncertainty, and the best action is selected based on a modified reward that accommodates the level of uncertainty. Similar ideas have been investigated in the deep learning community and have been shown to improve overall performance [4].
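
As a toy illustration of the idea (a sketch of my own, not taken from the references), the following bandit uses an ensemble of bootstrapped value estimates: the disagreement of the ensemble serves as the per-action uncertainty, and actions are chosen by mean value plus an uncertainty bonus:

```python
# Bootstrapped ensemble of value estimates; ensemble disagreement is the
# action uncertainty. Bandit payoffs, ensemble size, and beta are arbitrary
# assumptions for this illustration.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])        # hidden payoff of each action
K, E, beta = len(true_means), 10, 1.0         # actions, ensemble size, bonus weight

est = rng.normal(0.5, 0.5, size=(E, K))       # diverse random priors
counts = np.ones((E, K))

for t in range(3000):
    mean, std = est.mean(axis=0), est.std(axis=0)   # disagreement = uncertainty
    a = int(np.argmax(mean + beta * std))           # uncertainty-adjusted value
    r = rng.normal(true_means[a], 0.5)              # noisy reward
    # bootstrap mask: each member sees this sample with probability 1/2
    for e in np.where(rng.random(E) < 0.5)[0]:
        counts[e, a] += 1
        est[e, a] += (r - est[e, a]) / counts[e, a]

print("value estimates:", est.mean(axis=0).round(2))
print("greedy action:", int(np.argmax(est.mean(axis=0))))
```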

References:

  • [1] Osband, Ian, et al. "Deep exploration via bootstrapped DQN." Advances in neural information processing systems. 2016.
  • [2] Plappert, Matthias, et al. "Parameter space noise for exploration." arXiv preprint arXiv:1706.01905 (2017).
  • [3] Pathak, Deepak, et al. "Curiosity-driven exploration by self-supervised prediction." International Conference on Machine Learning (ICML). Vol. 2017. 2017.
  • [4] Kendall, Alex, and Yarin Gal. "What uncertainties do we need in bayesian deep learning for computer vision?." Advances in neural information processing systems. 2017.

available
Hyperparameter optimization with reinforcement learning

Choosing an optimal set of hyperparameters is an important step in tuning a learning architecture. Existing methods include naive grid search and random search over a given space of values. More statistical methods include Bayesian optimization [1,2] as well as gradient-based tuning [3]. With the recently revived interest in reinforcement learning, it is now feasible to add a controller that is trained simultaneously with the proposed architecture, while the agent adaptively chooses new hyperparameters from a continuous action space [4]. In this thesis, you will work on formalizing the reinforcement learning task by defining the state and the reward associated with every action (hyperparameter). At the first level, you will focus on simple models with a small number of hyperparameters, for example the regularization factor of an SVM, and then progress to adaptively tuning neural networks by selecting the right dropout values, L2 regularization factor, etc.
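
The sketch below shows one deliberately simplified formalization: the "environment" trains a ridge-regression model with the chosen regularization strength and returns the negative validation error as reward, while an epsilon-greedy agent over a discretized grid stands in for the continuous-action controller of [4]. Data and action grid are illustrative assumptions:

```python
# RL view of hyperparameter tuning: action = regularization strength,
# reward = negative validation MSE after training with that action.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 50))
w = rng.standard_normal(50)
y = X @ w + rng.standard_normal(120)
Xtr, ytr, Xval, yval = X[:80], y[:80], X[80:], y[80:]

def reward(lam):
    """One environment step: train ridge regression with lam, reward = -val MSE."""
    w_hat = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(50), Xtr.T @ ytr)
    return -np.mean((Xval @ w_hat - yval) ** 2)

actions = np.logspace(-3, 3, 13)           # candidate regularization strengths
q = np.zeros(len(actions)); n = np.zeros(len(actions))
for t in range(100):
    a = rng.integers(len(actions)) if rng.random() < 0.3 else int(np.argmax(q))
    r = reward(actions[a])
    n[a] += 1; q[a] += (r - q[a]) / n[a]   # incremental value estimate

best = int(np.argmax(q))
print(f"best lambda ~ {actions[best]:.3g}, validation MSE = {-q[best]:.3f}")
```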

References:

  • [1] Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. "Sequential model-based optimization for general algorithm configuration." International Conference on Learning and Intelligent Optimization. Springer, Berlin, Heidelberg, 2011.
  • [2] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems. 2012.
  • [3] Bengio, Yoshua. "Gradient-based optimization of hyperparameters." Neural computation 12.8 (2000): 1889-1900.
  • [4] Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).

Not available
Dynamic Recommender Systems

Traditional recommender systems assume that the user profile and the item attributes are static factors. This assumption ignores the temporal dynamics that affect the user profile or the items, e.g., a user's interest in an item may change with the season of the year, such as Christmas movies being watched in winter but not in summer. This thesis will identify potential approaches to build a dynamic recommender system that can predict users' interest in items based on their historical interactions and behaviors.
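
As one minimal illustration of temporal dynamics (an assumption for exposition, far simpler than the recurrent model of reference 1), the sketch below builds the user profile as an exponentially decayed sum of the embeddings of past items, so recent interactions dominate the predicted interest:

```python
# Time-aware user profile: exponentially decayed sum of past item embeddings.
# Item embeddings would be learned in practice; here they are random toys.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim, half_life = 100, 16, 30.0          # half-life in days
item_emb = rng.standard_normal((n_items, dim))

def user_profile(history, now):
    """history: list of (item_id, timestamp in days)."""
    p = np.zeros(dim)
    for item, t in history:
        p += 0.5 ** ((now - t) / half_life) * item_emb[item]
    return p

def recommend(history, now, k=5):
    scores = item_emb @ user_profile(history, now)
    scores[[i for i, _ in history]] = -np.inf    # do not re-recommend seen items
    return np.argsort(scores)[::-1][:k]

history = [(3, 0.0), (7, 5.0), (42, 58.0)]       # old vs. recent interactions
print("top-5 at day 60:", recommend(history, now=60.0))
```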

References:

  • 1) Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, and How Jing. 2017. Recurrent Recommender Networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17). ACM, New York, NY, USA, 495-503. DOI: https://doi.org/10.1145/3018661.3018689
  • 2) Yongfeng Zhang, Min Zhang, Yi Zhang, Guokun Lai, Yiqun Liu, Honghui Zhang, and Shaoping Ma. 2015. Daily-Aware Personalized Recommendation based on Feature-Level Time Series Analysis. In Proceedings of the 24th International Conference on World Wide Web (WWW '15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1373-1383. DOI: https://doi.org/10.1145/2736277.2741087

Contact: Ahmed Rashed
available
Location Prediction with Deep Learning In Social Networks

Location-based Social Networks (LSNs) allow users to see where their friends are, to search location-tagged content within their social graph, and to meet others nearby. The recent availability of open mobile platforms, such as Apple iPhones and Google Android phones, makes LSN data much more accessible. This thesis will identify potential deep learning approaches to predict a user's next location based on their historical movements and interactions, and on extra auxiliary information such as topics, tags, and images that might enhance the prediction accuracy.
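
As a simple non-deep baseline to compare against (my own illustration), the sketch below predicts the next location with a first-order Markov model over check-in sequences; a deep model would replace the count table with a recurrent network that can also ingest the auxiliary information:

```python
# First-order Markov baseline for next-location prediction from toy check-ins.
from collections import Counter, defaultdict

checkins = {                         # user -> sequence of visited places
    "u1": ["home", "work", "gym", "home", "work", "cafe", "home"],
    "u2": ["home", "cafe", "work", "home", "cafe", "work"],
}

transitions = defaultdict(Counter)   # transitions[prev][next] = count
for seq in checkins.values():
    for prev, nxt in zip(seq, seq[1:]):
        transitions[prev][nxt] += 1

def predict_next(current):
    """Most frequent successor of the current location, or None if unseen."""
    if not transitions[current]:
        return None
    return transitions[current].most_common(1)[0][0]

print("after 'work' ->", predict_next("work"))
```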

References:

  • 1) Ye, Jihang, Zhe Zhu, and Hong Cheng. "What's your next move: User activity prediction in location-based social networks." Proceedings of the 2013 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2013.
  • 2) McGee, Jeffrey, James Caverlee, and Zhiyuan Cheng. "Location prediction in social media based on tie strength." Proceedings of the 22nd ACM international conference on Information and Knowledge Management. ACM, 2013.

Contact: Ahmed Rashed
Not available
Calculating feature similarity across data-sets with Siamese Networks for Model Agnostic Meta-Learning

Siamese networks have grown quite popular in the field of machine learning, especially in research on object tracking and text equivalence, since they are able to compare samples and estimate their similarity. The hypothesis is that this architecture can be a useful tool for comparing the similarity of features from different data-sets, making it an asset for performing Model-Agnostic Meta-Learning across tasks with different schemas.

In this thesis you will implement a siamese network to improve the performance of Model Agnostic Meta-Learning Methods that work across tasks with different schemas.
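
A minimal sketch of such a component, assuming PyTorch: a shared encoder maps two feature columns (represented here by padded vectors of sample values, an assumed encoding) to a common embedding space and predicts their similarity. Architecture and toy data are illustrative, not the thesis design:

```python
# Siamese feature matcher: one shared encoder, similarity from the dot
# product of the two embeddings. Trained on a toy task where a positive
# pair is two noisy views of the same feature column.
import torch
import torch.nn as nn

class SiameseFeatureMatcher(nn.Module):
    def __init__(self, n_values=64, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_values, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, feat_a, feat_b):
        za, zb = self.encoder(feat_a), self.encoder(feat_b)
        return torch.sigmoid((za * zb).sum(-1))   # similarity in [0, 1]

model = SiameseFeatureMatcher()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(200):
    base = torch.randn(32, 64)                    # batch of feature columns
    pos = base + 0.1 * torch.randn_like(base)     # noisy view: same feature
    neg = torch.randn(32, 64)                     # unrelated feature
    sim_pos, sim_neg = model(base, pos), model(base, neg)
    loss = loss_fn(sim_pos, torch.ones(32)) + loss_fn(sim_neg, torch.zeros(32))
    opt.zero_grad(); loss.backward(); opt.step()

print("pos/neg similarity:", sim_pos.mean().item(), sim_neg.mean().item())
```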

References:

  • 1. Zhang, Zhipeng, and Houwen Peng. "Deeper and wider siamese networks for real-time visual tracking." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • 2. Mueller, J., Thyagarajan, A. (2016, March). Siamese recurrent architectures for learning sentence similarity. In thirtieth AAAI conference on artificial intelligence.
  • 3. Brinkmeyer, L., Drumond, R. R., Scholz, R., Grabocka, J., Schmidt-Thieme, L. (2019). Chameleon: Learning model initializations across tasks with different schemas.
  • 4. Drumond, R. R., Brinkmeyer, L., Grabocka, J., Schmidt-Thieme, L. (2020). HIDRA: Head Initialization across Dynamic targets for Robust Architectures. In Proceedings of the 2020 SIAM International Conference on Data Mining (pp. 397-405). Society for Industrial and Applied Mathematics.

available
Evolutionary Computation for Periodic Vehicle Routing Problems

The classical vehicle routing problem (VRP) designs optimal delivery routes where each vehicle only travels one route, each vehicle has the same characteristics and there is only one central depot. The goal of the VRP is to find a set of least-cost vehicle routes such that each customer is visited exactly once by one vehicle, each vehicle starts and ends its route at the depot, and the capacity of the vehicles is not exceeded. This classical VRP has been extended in many ways by introducing additional real-life aspects or characteristics, resulting in a large number of variants of the VRP. In the standard periodic vehicle routing problem (PVRP), customers require visits on one or more days within a planning period, and there are a set of feasible visit options for each customer. Customers must be assigned to a feasible visit option and a VRP is solved for each day in the planning period. The typical objective is to minimize the total distance traveled over the planning period. The PVRP arises in a diverse array of applications, from the collection of recyclables, to the routing of home healthcare nurses, to the collection of data in wireless networks. The wide applicability and versatility of the problem, coupled with the problem’s difficulty, has led to continuing interest and research efforts.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character. In evolutionary computation, an initial set of candidate solutions is generated and iteratively updated. Each new generation is produced by stochastically removing less desired solutions, and introducing small random changes. In biological terminology, a population of solutions is subjected to natural selection (or artificial selection) and mutation. As a result, the population will gradually evolve to increase in fitness, in this case the chosen fitness function of the algorithm. Evolutionary computation techniques can produce highly optimized solutions in a wide range of problem settings, making them popular in computer science.
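
To make the evolutionary machinery concrete, the sketch below applies it to the simplest routing subproblem, a single tour over random customers; evolving the visit-day assignments of the PVRP on top of this would be part of the thesis. The operators (order crossover, swap mutation) and rates are illustrative assumptions:

```python
# Tiny genetic algorithm for a single tour: truncation selection, order
# crossover (OX), and swap mutation over permutations of the customers.
import random, math

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(15)]  # pts[0] = depot

def tour_length(perm):
    route = [0] + list(perm) + [0]                 # start and end at the depot
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(route, route[1:]))

def order_crossover(p1, p2):
    i, j = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[i:j] = p1[i:j]                           # keep a slice of parent 1
    rest = [g for g in p2 if g not in child]       # fill the rest in p2's order
    for k in range(len(p1)):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child

pop = [random.sample(range(1, 15), 14) for _ in range(60)]
for gen in range(200):
    pop.sort(key=tour_length)                      # selection pressure
    survivors = pop[:20]
    children = []
    while len(children) < 40:
        c = order_crossover(*random.sample(survivors, 2))
        if random.random() < 0.3:                  # swap mutation
            i, j = random.sample(range(14), 2)
            c[i], c[j] = c[j], c[i]
        children.append(c)
    pop = survivors + children

print("best tour length:", round(min(map(tour_length, pop)), 3))
```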

References:

  • 1. Braekers, Kris, Katrien Ramaekers, and Inneke Van Nieuwenhuyse. “The Vehicle Routing Problem: State of the Art Classification and Review.” Computers and Industrial Engineering 99 (September 1, 2016): 300–313. https://doi.org/10.1016/j.cie.2015.12.007.
  • 2. Campbell, Ann Melissa, and Jill Hardin Wilson. “Forty Years of Periodic Vehicle Routing.” Networks 63, no. 1 (January 2014): 2–15. https://doi.org/10.1002/net.21527.
  • 3. Zhou, Aimin, Bo-Yang Qu, Hui Li, Shi-Zheng Zhao, Ponnuthurai Nagaratnam Suganthan, and Qingfu Zhang. “Multiobjective Evolutionary Algorithms: A Survey of the State of the Art.” Swarm and Evolutionary Computation 1, no. 1 (March 1, 2011): 32–49. https://doi.org/10.1016/j.swevo.2011.03.001.
  • 4. Ombuki, Beatrice, Brian J. Ross, and Franklin Hanshar. “Multi-Objective Genetic Algorithms for Vehicle Routing Problem with Time Windows.” Applied Intelligence 24, no. 1 (February 1, 2006): 17–30. https://doi.org/10.1007/s10489-006-6926-z.

Contact: Jonas Falkner
available
RL Approaches to Periodic Vehicle Routing Problems

The periodic vehicle routing problem (PVRP) is the same setting as in the previous topic: customers require visits on one or more days within a planning period and must be assigned to a feasible visit option, and a VRP is solved for each day, with the typical objective of minimizing the total distance traveled over the planning period. Its applications range from the collection of recyclables to the routing of home healthcare nurses and the collection of data in wireless networks.

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. In the operations research and control literature, reinforcement learning is called approximate dynamic programming or neuro-dynamic programming (NDP). It describes a recent methodology that deals with the approximate solution of large and complex dynamic programming (DP) problems. Accordingly, NDP combines simulation, learning, approximation architectures (e.g., neural networks) and the central ideas of DP to break the curse of dimensionality that is typical of DP. NDP could therefore be a promising approach to solving large-scale and complex VRPs.
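
As a concrete toy instance of this idea (my own illustration, not taken from the references), the sketch below uses tabular Q-learning where the state is the pair (current node, set of visited nodes) and the learned Q-values approximate the cost-to-go; real instances require function approximation instead of a table:

```python
# Tabular Q-learning builds a tour node by node on a tiny instance;
# Q estimates the remaining tour cost, so the greedy action is the argmin.
import random, math
from collections import defaultdict

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(7)]   # pts[0] = depot

def dist(a, b):
    return math.dist(pts[a], pts[b])

Q = defaultdict(float)            # Q[(node, visited_frozenset, action)] = cost-to-go
alpha, gamma, eps = 0.2, 1.0, 0.2

for episode in range(5000):
    node, visited = 0, frozenset([0])
    while len(visited) < len(pts):
        choices = [c for c in range(len(pts)) if c not in visited]
        if random.random() < eps:
            a = random.choice(choices)
        else:
            a = min(choices, key=lambda c: Q[(node, visited, c)])
        cost = dist(node, a)
        nxt_visited = visited | {a}
        if len(nxt_visited) == len(pts):          # tour done: return to the depot
            target = cost + dist(a, 0)
        else:
            nxt_choices = [c for c in range(len(pts)) if c not in nxt_visited]
            target = cost + gamma * min(Q[(a, nxt_visited, c)] for c in nxt_choices)
        key = (node, visited, a)
        Q[key] += alpha * (target - Q[key])
        node, visited = a, nxt_visited

# greedy rollout of the learned policy
node, visited, tour, total = 0, frozenset([0]), [0], 0.0
while len(visited) < len(pts):
    choices = [c for c in range(len(pts)) if c not in visited]
    a = min(choices, key=lambda c: Q[(node, visited, c)])
    total += dist(node, a); tour.append(a)
    node, visited = a, visited | {a}
total += dist(node, 0)
print("tour:", tour + [0], "length:", round(total, 3))
```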

References:

  • 1. Braekers, Kris, Katrien Ramaekers, and Inneke Van Nieuwenhuyse. “The Vehicle Routing Problem: State of the Art Classification and Review.” Computers and Industrial Engineering 99 (September 1, 2016): 300–313. https://doi.org/10.1016/j.cie.2015.12.007.
  • 2. Campbell, Ann Melissa, and Jill Hardin Wilson. “Forty Years of Periodic Vehicle Routing.” Networks 63, no. 1 (January 2014): 2–15. https://doi.org/10.1002/net.21527.
  • 3. Sutton, Richard S., and Andrew G. Barto. “Reinforcement Learning: An Introduction.” https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
  • 4. Secomandi, Nicola. “Comparing Neuro-Dynamic Programming Algorithms for the Vehicle Routing Problem with Stochastic Demands.” Computers and Operations Research 27, no. 11 (September 1, 2000): 1201–25. https://doi.org/10.1016/S0305-0548(99)00146-X.

Contact: Jonas Falkner
available
Action Space Incremental Reinforcement Learning

In standard reinforcement learning the action space is predefined: the number of actions is fixed throughout the task. In real-world situations, however, new actions may become necessary to accomplish the task. The same can be observed in video games, where the number of available actions may grow with the level of the game.

As an example, consider the well-known game DAVE. In level one, the available actions are {up}, {left}, {right}. Once we enter level three, we gain the additional action {shoot}. If we do not anticipate that new actions may appear, the reinforcement learning algorithm may have to be retrained from scratch with the enlarged action set. This takes a lot of time, and the knowledge gained in the previous levels is wasted.

In this thesis, you will develop a novel learning technique that adapts to new actions incrementally rather than learning from scratch. The algorithm shall not lose the existing knowledge (e.g., knowledge of previous levels in video games).
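
One simple way to picture the requirement (an illustrative assumption, not a proposed method): widen the Q-table with a column for the new action, initialized from the existing values, so prior knowledge is kept and only the new action needs to be explored:

```python
# Widening a Q-table when a new action appears, without touching the
# values already learned for the old actions.
import numpy as np

n_states = 10
Q = np.random.rand(n_states, 3)     # trained with actions {up, left, right}

def add_action(Q):
    """Append one action column; old columns stay exactly as learned."""
    # start the new action at each state's current mean value, so it gets
    # tried without destroying the ordering of the known good actions
    new_col = Q.mean(axis=1, keepdims=True)
    return np.hstack([Q, new_col])

Q = add_action(Q)                   # level three unlocks {shoot}
print(Q.shape)                      # (10, 4): prior knowledge preserved
```

Training would then simply continue with the standard Q-learning update over the widened table.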

References:

  • 1. Venkatesan, Rajasekar, and Meng Joo Er. "A novel progressive learning technique for multi-class classification." Neurocomputing 207 (2016): 310-321.
  • 2. Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.

available
Distribution over training losses for learning

In general, while training a classification algorithm, we have true labels (Y) and predicted labels (Y'). The common way of updating the parameters of the algorithm is to backpropagate a loss that averages the absolute differences (or an RMS loss). However, recent work shows that the top-k loss [1] and the average top-k loss [2] can provide better results than this traditional method.

In this work, one would analyse the results obtained with the above methods and search for a better distribution over the losses that increases the accuracy of the algorithm.

For information:

Top-k loss: the k-th maximum value in |Y - Y'| (similarly for the RMS loss)

Average top-k loss: the average of the top k values in |Y - Y'| (similarly for the RMS loss)
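
The two definitions above translate directly into code; the sketch below uses absolute errors (the RMS variants are analogous):

```python
# Top-k and average top-k losses over the absolute errors |Y - Y'|.
import numpy as np

def topk_loss(y, y_pred, k):
    errs = np.sort(np.abs(y - y_pred))[::-1]    # errors in descending order
    return errs[k - 1]                          # the k-th maximum value

def avg_topk_loss(y, y_pred, k):
    errs = np.sort(np.abs(y - y_pred))[::-1]
    return errs[:k].mean()                      # mean of the k largest errors

y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.4, 0.2, 1.0, 0.1])    # errors: 0.1, 0.4, 0.8, 0.0, 0.1
print(topk_loss(y, y_pred, k=2))                # 0.4
print(avg_topk_loss(y, y_pred, k=2))            # (0.8 + 0.4) / 2 = 0.6
```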

References:

  • 1. S. Shalev-Shwartz and Y. Wexler. Minimizing the maximal loss: How and why. In ICML, 2016
  • 2. Yanbo Fan, Siwei Lyu, Yiming Ying, Bao-Gang Hu. Learning with Average Top-k Loss. In NeurIPS, 2017.
  • 3. Leonard Berrada, Andrew Zisserman, M. Pawan Kumar. Smooth Loss Functions for Deep Top-k Classification. In ICLR, 2018.

If you are interested in other topics, please ask one of us directly.

Past Master theses at University of Hildesheim
Past Master theses at University of Freiburg