Risk-averse learning by temporal difference methods with Markov risk measures

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a novel reinforcement learning methodology where the system performance is evaluated by a Markov coherent dynamic risk measure with the use of linear value function approximations. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
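The risk-averse dynamic programming equations described in the abstract replace the expectation in the Bellman target with a one-step coherent risk mapping. A minimal sketch of this idea, assuming a mean-semideviation risk mapping, a small synthetic Markov chain, and linear features (all illustrative choices, not the paper's experimental setup):

```python
import numpy as np

# Hypothetical small Markov cost process; all names and parameters are illustrative.
rng = np.random.default_rng(0)
n_states, gamma, kappa = 5, 0.9, 0.5                 # kappa: semideviation weight
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition probabilities
c = rng.uniform(0.0, 1.0, size=n_states)             # per-state costs
# Linear value-function approximation v(s) = Phi[s] @ w
Phi = np.column_stack([np.ones(n_states), np.arange(n_states) / n_states])

def risk(z, p):
    """Mean-semideviation coherent risk measure of values z under distribution p."""
    m = p @ z
    return m + kappa * (p @ np.maximum(z - m, 0.0))

w = np.zeros(Phi.shape[1])
s = 0
for t in range(20000):
    alpha = 0.5 / (1 + t / 200)                  # diminishing step size
    v = Phi @ w
    target = c[s] + gamma * risk(v, P[s])        # risk-averse Bellman target at s
    w += alpha * (target - v[s]) * Phi[s]        # TD(0)-style stochastic update
    s = rng.choice(n_states, p=P[s])             # follow the Markov chain
```

Here the one-step risk is evaluated exactly from the known transition kernel for clarity; the paper's methods instead work from sampled transitions via stochastic approximation, which is what the convergence analysis addresses.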

Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 22
State: Published - 2021
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

Keywords

  • Dynamic Risk Measures
  • Linear Function Approximation
  • Reinforcement Learning
  • Stochastic Approximation
  • Temporal Difference Methods
