Learning transition models with time-delayed causal relations

Junchi Liang, Abdeslam Boularias

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper introduces an algorithm for discovering implicit and delayed causal relations between events observed by a robot at arbitrary times, with the objective of improving data-efficiency and interpretability of model- based reinforcement learning (RL) techniques. The proposed algorithm initially predicts observations with the Markov assumption, and incrementally introduces new hidden variables to explain and reduce the stochasticity of the observations. The hidden variables are memory units that keep track of pertinent past events. Such events are systematically identified by their information gains. The learned transition and reward models are then used for planning. Experiments on simulated and real robotic tasks show that this method significantly improves over current RL techniques.

Original languageEnglish (US)
Title of host publication2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages8087-8093
Number of pages7
ISBN (Electronic)9781728162126
DOIs
StatePublished - Oct 24 2020
Event2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2020 - Las Vegas, United States
Duration: Oct 24 2020Jan 24 2021

Publication series

NameIEEE International Conference on Intelligent Robots and Systems
ISSN (Print)2153-0858
ISSN (Electronic)2153-0866

Conference

Conference2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2020
Country/TerritoryUnited States
CityLas Vegas
Period10/24/201/24/21

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Learning transition models with time-delayed causal relations'. Together they form a unique fingerprint.

Cite this