TY - GEN
T1 - Apprenticeship learning via soft local homomorphisms
AU - Boularias, Abdeslam
AU - Chaib-draa, Brahim
PY - 2010
Y1 - 2010
N2 - We consider the problem of apprenticeship learning when the expert's demonstration covers only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient solution to this problem based on the assumption that the expert acts optimally in a Markov Decision Process (MDP). However, past work on IRL requires an accurate estimate of the frequency of encountering each feature of the states when the robot follows the expert's policy. Given that the complete policy of the expert is unknown, the feature frequencies can only be empirically estimated from the demonstrated trajectories. In this paper, we propose to use a transfer method, known as soft homomorphism, to generalize the expert's policy to unvisited regions of the state space. The generalized policy can be used either as the robot's final policy or to calculate the feature frequencies within an IRL algorithm. Empirical results show that our approach is able to learn good policies from a small number of demonstrations.
AB - We consider the problem of apprenticeship learning when the expert's demonstration covers only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient solution to this problem based on the assumption that the expert acts optimally in a Markov Decision Process (MDP). However, past work on IRL requires an accurate estimate of the frequency of encountering each feature of the states when the robot follows the expert's policy. Given that the complete policy of the expert is unknown, the feature frequencies can only be empirically estimated from the demonstrated trajectories. In this paper, we propose to use a transfer method, known as soft homomorphism, to generalize the expert's policy to unvisited regions of the state space. The generalized policy can be used either as the robot's final policy or to calculate the feature frequencies within an IRL algorithm. Empirical results show that our approach is able to learn good policies from a small number of demonstrations.
UR - http://www.scopus.com/inward/record.url?scp=77955793370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955793370&partnerID=8YFLogxK
U2 - 10.1109/ROBOT.2010.5509717
DO - 10.1109/ROBOT.2010.5509717
M3 - Conference contribution
AN - SCOPUS:77955793370
SN - 9781424450381
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 2971
EP - 2976
BT - 2010 IEEE International Conference on Robotics and Automation, ICRA 2010
T2 - 2010 IEEE International Conference on Robotics and Automation, ICRA 2010
Y2 - 3 May 2010 through 7 May 2010
ER -