Courses in winter term 2020 / Master-Seminar: Data Analytics 2
Readings

Materials:

  • [1] V. Mnih et al., “Playing Atari with Deep Reinforcement Learning,” arXiv preprint arXiv:1312.5602, 2013.
  • [2] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in International Conference on Machine Learning, 2014, pp. 387–395.
  • [3] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning, 2015, pp. 1889–1897.
  • [4] V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in International Conference on Machine Learning, 2016, pp. 1928–1937.
  • [5] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
  • [6] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [7] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning,” arXiv preprint arXiv:1708.02596, 2017.
  • [8] D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv preprint arXiv:1712.01815, 2017.
  • [9] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International Conference on Machine Learning, 2018, pp. 1861–1870.
  • [10] M. Hessel et al., “Rainbow: Combining improvements in deep reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
  • [11] W. Dabney, G. Ostrovski, D. Silver, and R. Munos, “Implicit Quantile Networks for Distributional Reinforcement Learning,” arXiv preprint arXiv:1806.06923, 2018.
  • [12] J. Fu, K. Luo, and S. Levine, “Learning Robust Rewards with Adversarial Inverse Reinforcement Learning,” arXiv preprint arXiv:1710.11248, 2017.
  • [13] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High-Dimensional Continuous Control Using Generalized Advantage Estimation,” arXiv preprint arXiv:1506.02438, 2015.
  • [14] A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative Q-Learning for Offline Reinforcement Learning,” arXiv preprint arXiv:2006.04779, 2020.
  • [15] J. Schrittwieser et al., “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model,” Nature, vol. 588, no. 7839, pp. 604–609, Dec. 2020, doi: 10.1038/s41586-020-03051-4.
  • [16] L. Chen et al., “Decision Transformer: Reinforcement Learning via Sequence Modeling,” arXiv preprint arXiv:2106.01345, 2021.
  • [17] L. Ouyang et al., “Training language models to follow instructions with human feedback,” arXiv preprint arXiv:2203.02155, 2022.
  • [18] C. Lu, J. G. Kuba, A. Letcher, L. Metz, C. S. de Witt, and J. Foerster, “Discovered Policy Optimisation,” arXiv preprint arXiv:2210.05639, 2022.
  • [19] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering Diverse Domains through World Models,” arXiv preprint arXiv:2301.04104, 2023.