Markov Decision Processes in Artificial Intelligence (English)

In recent years, we have witnessed spectacular progress in applying reinforcement learning techniques to problems that were long considered out of reach, such as the game of Go or autonomous driving. This course is about Markov decision processes, which are the mathematical foundation of reinforcement learning. The course has a twofold style. On the one hand, the lectures will provide rigorous definitions and proofs for the central concepts in Markov decision processes. On the other hand, this theory will be illustrated by hands-on implementations reflecting recent developments in this fast-moving field.


  • Deep Q-learning
  • Policy Gradient Theorem
  • Value and Policy Iteration
  • Linear-Quadratic-Gaussian Control
  • Kalman Filter
  • Multi-Armed Bandits
  • Stochastic Approximation Algorithms
  • Lecturer

    Prof. Dr. Christian Hirsch

  • Schedule

    Lecture: Monday, 10.15-11.45, B6, A203; Tuesday, 10.15-11.45, A5, C012

    Exam: tba

  • News

    There are additional lectures on

    1. Tuesday, September 3, 13.45-15.15, B6, A104
    2. Tuesday, September 17, 17.15-18.45, B6, A301
    3. TBA

    There are no lectures on

    1. Monday, September 30
    2. Tuesday, October 1
    3. TBA
  • Literature

    Dimitri P. Bertsekas (2005, 3rd edition) Dynamic Programming and Optimal Control, Vol. I

    Tor Lattimore and Csaba Szepesvári (2019) Bandit Algorithms

    Martin L. Puterman (2009, 2nd edition) Markov Decision Processes: Discrete Stochastic Dynamic Programming