Reinforcement Learning: The State of the Art
Michael N. Katehakis
Distinguished Professor and Chair
Department of Management Science and Information Systems
Rutgers Business School Newark and New Brunswick
Reinforcement Learning (RL) refers to techniques for sequential decision making in which a system must "learn" a strategy that maximizes a reward (or minimizes a cost) criterion when some parameters of the underlying model are not known in advance. RL is rapidly gaining recognition owing to successful applications in many areas of machine learning (ML).
In this talk we survey the state of the art in computing optimal data-driven RL algorithms. We then compare the performance of the classic UCB policy of Burnetas and Katehakis (1987) with that of recently proposed algorithms: optimistic programming, MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and a method based on posterior sampling (MDP-PS).
We also discuss the origins of RL and its connection with the model and theory of the so-called multi-armed bandit problem.