Courses in winter term 2020 / Master-Seminar: Data Analytics 2
Readings

Materials:

  • [1] V. Mnih et al., “Playing Atari with Deep Reinforcement Learning,” arXiv preprint arXiv:1312.5602, 2013.
  • [2] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in International Conference on Machine Learning, 2014, pp. 387–395.
  • [3] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning, 2015, pp. 1889–1897.
  • [4] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in International Conference on Machine Learning, 2016, pp. 1928–1937.
  • [5] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
  • [6] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [7] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning,” arXiv preprint arXiv:1708.02596, 2017.
  • [8] D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv preprint arXiv:1712.01815, 2017.
  • [9] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning, 2018, pp. 1861–1870.
  • [10] M. Hessel et al., “Rainbow: Combining improvements in deep reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
  • [11] W. Dabney, G. Ostrovski, D. Silver, and R. Munos, “Implicit Quantile Networks for Distributional Reinforcement Learning,” arXiv preprint arXiv:1806.06923, 2018.
  • [12] J. Fu, K. Luo, and S. Levine, “Learning Robust Rewards with Adversarial Inverse Reinforcement Learning,” arXiv preprint arXiv:1710.11248, 2017.
  • [13] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” arXiv preprint arXiv:1506.02438, 2015.
  • [14] A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative Q-Learning for Offline Reinforcement Learning,” arXiv preprint arXiv:2006.04779, 2020.
  • [15] J. Schrittwieser et al., “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model,” Nature, vol. 588, no. 7839, pp. 604–609, Dec. 2020, doi: 10.1038/s41586-020-03051-4.
  • [16] L. Chen et al., “Decision Transformer: Reinforcement Learning via Sequence Modeling,” arXiv preprint arXiv:2106.01345, 2021.
  • [17] L. Ouyang et al., “Training language models to follow instructions with human feedback,” arXiv preprint arXiv:2203.02155, 2022.
  • [18] C. Lu, J. G. Kuba, A. Letcher, L. Metz, C. S. de Witt, and J. Foerster, “Discovered Policy Optimisation,” arXiv preprint arXiv:2210.05639, 2022.
  • [19] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering Diverse Domains through World Models,” arXiv preprint arXiv:2301.04104, 2023.