Reinforcement Learning for POMDP: Rollout and Policy Iteration with Application to Sequential Repair

Citation:

Thomas Wheeler, Ezhil Bharathi, and Stephanie Gil. 5/20/2019. “Reinforcement Learning for POMDP: Rollout and Policy Iteration with Application to Sequential Repair.” IEEE International Conference on Robotics and Automation (ICRA).

Abstract:

We study rollout algorithms that combine limited lookahead with terminal cost function approximation for partially observable Markov decision processes (POMDPs). We demonstrate their effectiveness on a sequential pipeline repair problem, a problem that also arises in search-and-rescue settings. We provide performance bounds and empirical validation of the methodology, both for a single rollout iteration and for multiple iterations with intermediate policy space approximations.
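
To make the idea of rollout concrete, here is a minimal sketch of one-step lookahead followed by a base-policy simulation and a terminal cost approximation. It is not the paper's model or code. The line of repair sites, the `step` model, the `base_policy` heuristic, the `terminal_cost` function, and all parameter values are hypothetical choices made for illustration. The toy is also deterministic and fully observed; in a POMDP the state would be a belief state, and each cost would be an expectation, typically estimated by simulation.

```python
from typing import Callable, Tuple

# Hypothetical toy state: (robot position, damage flag per site).
State = Tuple[int, Tuple[int, ...]]
ACTIONS = ("left", "right", "repair")


def step(state: State, action: str) -> Tuple[State, float]:
    """Assumed deterministic model: returns (next_state, stage_cost)."""
    pos, damaged = state
    sites = list(damaged)
    if action == "left":
        pos = max(pos - 1, 0)
    elif action == "right":
        pos = min(pos + 1, len(sites) - 1)
    elif action == "repair":
        sites[pos] = 0
    # Every site still damaged after the action costs 1 for this stage.
    return (pos, tuple(sites)), float(sum(sites))


def base_policy(state: State) -> str:
    """Assumed heuristic: repair here if damaged, else sweep right first."""
    pos, damaged = state
    if damaged[pos]:
        return "repair"
    return "right" if any(damaged[pos + 1:]) else "left"


def terminal_cost(state: State) -> float:
    """Crude cost-to-go estimate: one more stage per damaged site."""
    return float(sum(state[1]))


def rollout_action(state: State, horizon: int = 10) -> str:
    """One-step lookahead; each action is scored by following the base
    policy for up to `horizon` stages and then adding terminal_cost."""
    best_action, best_q = ACTIONS[0], float("inf")
    for action in ACTIONS:
        s, q = step(state, action)
        for _ in range(horizon):
            if not any(s[1]):  # everything repaired, stop simulating
                break
            s, cost = step(s, base_policy(s))
            q += cost
        q += terminal_cost(s)
        if q < best_q:
            best_action, best_q = action, q
    return best_action


def total_cost(policy: Callable[[State], str], state: State,
               max_steps: int = 50) -> float:
    """Accumulated cost of running `policy` until all sites are repaired."""
    cost = 0.0
    for _ in range(max_steps):
        if not any(state[1]):
            break
        state, c = step(state, policy(state))
        cost += c
    return cost


start: State = (1, (1, 0, 0, 1))
print(base_policy(start), total_cost(base_policy, start))        # right 8.0
print(rollout_action(start), total_cost(rollout_action, start))  # left 6.0
```

In this toy instance the base heuristic sweeps right first and pays a total cost of 8, while the rollout policy repairs the nearer site first and pays 6. This illustrates, for one hand-picked case only, the kind of improvement over a base policy that rollout aims for.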
Last updated on 07/06/2021