Multi-Robot Sequential Decision Making

Simulation Video 

 

Decision-making under uncertainty with multiple-robots systems is essential for various applications. Real-world robotic sequential decision problems entail several challenges, including partial observation, dynamic environment, a large state space due to a complex system, and a large control space given by multiple agents, partial communication. We consider infinite-horizon discounted Markov decision problems under partial observation (POMDP) with finite, discrete state and control space. The size of the state space is 1037, and the control space is 107 for a team of 10 agents. 

In the paper [1], we discuss an algorithm that uses multi-step lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline. We present a favorable comparison of our algorithm with the standard rollout and other existing algorithms, including POMCP and DESPOT. 

In the paper [2], we discuss methods that specifically address the computational challenges of partially observable multiagent problems. We discuss and compare algorithms that simultaneously or sequentially optimize the agents’ controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This method dramatically reduces required computation while preserving the key cost improvement property of the standard methods. The per-step computational requirements for our method are linear as compared with exponential in terms of the number of agents for the standard methods. We apply our method to a challenging problem with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. We provide a simulation study that compares our methods with existing methods. Additionally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.

Presentation on Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems in CoRL 2020

        

Multi-Robot Sequential Decision Making Publications

Orhan Eren Akgün, Arif Kerem Dayı, Stephanie Gil, and Angelia Nedić. Submitted. “Learning Trust Over Directed Graphs in Multiagent Systems (extended version)”. Publisher's VersionAbstract
We address the problem of learning the legitimacy of other agents in a multiagent network when an unknown subset is comprised of malicious actors. We specifically derive results for the case of directed graphs and where stochastic side information, or observations of trust, is available. We refer to this as ``learning trust'' since agents must identify which neighbors in the network are reliable, and we derive a protocol to achieve this. We also provide analytical results showing that under this protocol i) agents can learn the legitimacy of all other agents almost surely, and that ii) the opinions of the agents converge in mean to the true legitimacy of all other agents in the network. Lastly, we provide numerical studies showing that our convergence results hold in practice for various network topologies and variations in the number of malicious agents in the network.
Sushmita Bhattacharya, Siva Kailas, Sahil Badyal, Stephanie Gil, and Dimitri Bertsekas. 11/9/2020. “Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems.” In 4th Conference on Robot Learning. Cambridge MA, USA. Publisher's VersionAbstract
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of O(Cm) as compared with O(Cm) for standard rollout, where C is the maximum cardinality of the constraint set for the control component of each agent, and m is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size 1037 and control space size 107, respectively). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.
Reinforcement Learning for POMDP: Rollout and Policy Iteration with Application to Sequential Repair
Thomas Wheeler, Ezhil Bharathi, and Stephanie Gil. 5/20/2019. “Reinforcement Learning for POMDP: Rollout and Policy Iteration with Application to Sequential Repair.” IEEE International Conference on Robotics and Automation (ICRA).Abstract
We study rollout algorithms which combine limited lookahead and terminal cost function approximation in the context of POMDP. We demonstrate their effectiveness in the context of a sequential pipeline repair problem, which also arises in other contexts of search and rescue. We provide performance bounds and empirical validation of the methodology, in both cases of a single rollout iteration, and multiple iterations with intermediate policy space approximations.
Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems
Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, and Dimitri Bertsekas. 1/23/2020. “Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems.” RAL 1/23/2020. Abstract
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approxi- mated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
More