Markov Decision Processes in Artificial Intelligence (English)
In recent years, we have witnessed spectacular progress in applying reinforcement learning techniques to problems that had long been considered out of reach, such as the game of Go or autonomous driving. This course is about Markov decision processes, which form the mathematical foundation of reinforcement learning. The style of the course will be twofold. On the one hand, the lectures will provide rigorous definitions and proofs for the most central concepts in Markov decision processes. On the other hand, this theory will be illustrated by hands-on implementations reflecting the most recent developments in this fast-moving field.
Topics
- Deep Q-learning
- Policy Gradient Theorem
- Value and Policy Iteration (see the sketch below)
- Linear-Quadratic-Gaussian Control
- Kalman Filter
- Multi-Armed Bandits
- Stochastic Approximation Algorithms
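
To give a flavour of the hands-on part of the course, below is a minimal value iteration sketch for a toy two-state MDP. The transition probabilities, rewards, and discount factor are made-up illustrative values, not material from the lecture.

```python
# Minimal value iteration sketch for a toy two-state MDP.
# All numbers (transitions, rewards, discount factor) are illustrative assumptions.
import numpy as np

n_states = 2
gamma = 0.9  # discount factor

# P[a, s, s'] = probability of moving from s to s' under action a
P = np.array([
    [[0.8, 0.2],   # action 0 from state 0
     [0.3, 0.7]],  # action 0 from state 1
    [[0.1, 0.9],   # action 1 from state 0
     [0.6, 0.4]],  # action 1 from state 1
])
# R[s, a] = expected immediate reward for taking action a in state s
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality update: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged value function
print("Optimal values:", V)
print("Greedy policy: ", policy)
```

Running the script prints the converged value function and the corresponding greedy policy; policy iteration would instead alternate full policy evaluation with greedy improvement steps.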
Lecturer
Prof. Dr. Christian Hirsch
Schedule
Lecture: Monday, 10:15–11:45, B6, A203; Tuesday, 10:15–11:45, A5, C012
Exam: tba
News
The oral exams will take place on Tue, 17.12.2019, and Fri, 31.1.2020. Please register by e-mail no later than 22.11.2019, stating whether you would prefer a date in the first or the second examination period.
Literature
https://github.com/Christian-Hirsch/Space_Invaders
https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
V. S. Borkar and S. P. Meyn (2000), The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour (2000), Policy Gradient Methods for Reinforcement Learning with Function Approximation
V. S. Borkar (2008), Stochastic Approximation: A Dynamical Systems Viewpoint
D. P. Bertsekas (2005, 3rd edition), Dynamic Programming and Optimal Control, Vol. I
T. Lattimore and C. Szepesvári (2019), Bandit Algorithms, https://tor-lattimore.com/downloads/book/book.pdf
M. L. Puterman (2009, 2nd edition), Markov Decision Processes: Discrete Stochastic Dynamic Programming
Lecture Notes
Irregularly updated lecture notes. Use at your own risk: no guarantees of completeness or correctness!