Reinforcement Learning

This lecture (9 ECTS) will lay the foundations of reinforcement learning (RL). The lecture is divided into three parts: multi-armed bandits, tabular RL, and non-tabular RL.

  • Multi-armed bandits (mostly from the algorithmic point of view) – 5 lectures
    • Explore-then-commit
    • Greedy
    • UCB (upper confidence bound; see the Python sketch after this list)
    • Boltzmann exploration
    • Softmax (policy gradient)
  • Tabular MDP basics – 5 lectures
    • Foundations of dynamic programming
    • value iteration
    • policy iteration
  • Tabular Q-learning and TD-learning – 6 lectures
    • Monte Carlo evaluation
    • Tsitsiklis' convergence proof for stochastic fixed point iterations
    • One-step approximate dynamic programming (TD(0), SARSA, Q-learning)
    • Double Q-learning
    • Multi-step approximate dynamic programming (n-step methods, forward and backward views of TD(λ))
  • Policy Gradient Schemes – 10 lectures
    • policy gradient theorems
    • variance reduction tricks such as baselines and actor-critic methods
    • gradient descent and stochastic gradient descent
    • neural networks in RL
    • SAC, TRPO, PPO
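
To give a first flavour of the coding part, here is a minimal Python sketch of the UCB1 bandit algorithm mentioned above; the Bernoulli arm means, the horizon, and the exploration constant are illustrative choices, not material from the lecture.

```python
import numpy as np

def ucb1(means, horizon=10_000, rng=None):
    """Minimal UCB1 sketch for a Bernoulli bandit with the given arm means."""
    rng = np.random.default_rng(rng)
    k = len(means)
    counts = np.zeros(k)          # number of pulls per arm
    sums = np.zeros(k)            # sum of observed rewards per arm
    regret = 0.0                  # accumulated pseudo-regret
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1           # pull every arm once to initialise
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = float(rng.random() < means[arm])    # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += max(means) - means[arm]
    return regret

if __name__ == "__main__":
    print("pseudo-regret:", ucb1([0.4, 0.5, 0.6], horizon=10_000, rng=0))
```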

We will prove everything that we consider necessary for a proper understanding of the algorithms, but we will also get into the coding (Python). For many RL algorithms convergence proofs are still open (even worse, some algorithms are known to diverge). We will cover theoretical results around RL, which sometimes lead to good educated guesses for RL algorithms even when the theoretical assumptions behind the techniques cannot be checked (or are violated).
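
As a small illustration of what the Python part could look like, here is a rough sketch of tabular Q-learning with ε-greedy exploration on a toy chain MDP; the environment and the parameters (learning rate, discount factor, exploration rate, number of episodes) are placeholders, not the settings used in the lecture.

```python
import numpy as np

def q_learning(n_states=6, episodes=2_000, alpha=0.1, gamma=0.95, eps=0.1, rng=0):
    """Tabular Q-learning on a toy chain MDP: move left/right, reward 1 at the right end."""
    rng = np.random.default_rng(rng)
    n_actions = 2                                   # 0 = left, 1 = right
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False                          # every episode starts at the left end
        while not done:
            # epsilon-greedy behaviour policy with random tie-breaking
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = (s_next == n_states - 1)
            r = 1.0 if done else 0.0
            # one-step Q-learning update (off-policy TD target)
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

if __name__ == "__main__":
    Q = q_learning()
    print("greedy policy (1 = move right):", np.argmax(Q, axis=1))
```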