Selected Publications

In Preparation
Matthew Cavorsi, Orhan Eren Akgün, Michal Yemini, Andrea Goldsmith, and Stephanie Gil. In Preparation. “Exploiting Trust for Resilient Hypothesis Testing with Malicious Robots.”
We develop a resilient binary hypothesis testing framework for decision making in adversarial multi-robot crowdsensing tasks. This framework exploits stochastic trust observations between robots to arrive at tractable, resilient decision making at a centralized Fusion Center (FC) even when i) there exist malicious robots in the network and their number may be larger than the number of legitimate robots, and ii) the FC uses one-shot noisy measurements from all robots. We derive two algorithms to achieve this. The first is the Two Stage Approach (2SA) that estimates the legitimacy of robots based on received trust observations, and provably minimizes the probability of detection error in the worst-case malicious attack. Here, the proportion of malicious robots is known but arbitrary. For the case of an unknown proportion of malicious robots, we develop the Adversarial Generalized Likelihood Ratio Test (A-GLRT) that uses both the reported robot measurements and trust observations to estimate the trustworthiness of robots, their reporting strategy, and the correct hypothesis simultaneously. We exploit special problem structure to show that this approach remains computationally tractable despite several unknown problem parameters. We deploy both algorithms in a hardware experiment where a group of robots conducts crowdsensing of traffic conditions on a mock-up road network similar in spirit to Google Maps, subject to a Sybil attack. We extract the trust observations for each robot from actual communication signals which provide statistical information on the uniqueness of the sender. We show that even when the malicious robots are in the majority, the FC can reduce the probability of detection error to 30.5% and 29% for the 2SA and the A-GLRT respectively.
exploiting_trust_for_resilient_hypothesis_testing_with_malicious_robots.pdf
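As a rough illustration of how trust observations can gate a fusion center's decision, the following Python sketch first labels each robot legitimate when its average trust observation exceeds 0.5, then runs an equal-prior likelihood ratio test over the binary reports of the robots labeled legitimate. This is a simplified two-step heuristic assumed for illustration, not the 2SA or the A-GLRT from the paper; the threshold, the symmetric error probability `p_err`, the data layout, and the function names are all hypothetical.

```python
import math
from statistics import mean


def classify_legitimate(trust_obs, threshold=0.5):
    """Label a robot legitimate if its mean trust observation exceeds the threshold.

    trust_obs: one list of trust values in [0, 1] per robot (hypothetical format).
    """
    return [mean(obs) > threshold for obs in trust_obs]


def fusion_decision(reports, legit, p_err):
    """Equal-prior likelihood ratio test over binary reports from robots labeled legitimate.

    Each legitimate report is assumed correct with probability 1 - p_err (p_err < 0.5).
    Returns 1 (decide H1) if the log-likelihood ratio is positive, else 0 (decide H0).
    """
    w = math.log((1 - p_err) / p_err)  # evidence contributed by a single report
    llr = 0.0
    for report, ok in zip(reports, legit):
        if ok:
            llr += w if report == 1 else -w
    return 1 if llr > 0 else 0


# Three legitimate robots (one with a noisy report) and five malicious robots reporting 0.
trust = [[0.8, 0.7], [0.9, 0.6], [0.7, 0.8]] + [[0.2, 0.3]] * 5
reports = [1, 1, 0] + [0] * 5
legit = classify_legitimate(trust)
print(fusion_decision(reports, legit, p_err=0.2))       # 1: trust-gated decision
print(fusion_decision(reports, [True] * 8, p_err=0.2))  # 0: trusting every robot
```

With five of eight robots malicious, fusing every report follows the malicious majority, while the trust-gated test recovers H1 from the three legitimate reports.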
Daniel Garces, Sushmita Bhattacharya, Stephanie Gil, and Dimitri Bertsekas. In Preparation. “Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand.” In ICRA. Philadelphia, PA.
We derive a learning framework to generate routing/pickup policies for a fleet of vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, considering a priori unknown potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in adapting to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) online play, a lookahead optimization method that improves the performance of rollout methods via an approximate policy iteration step, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching out the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in downtown San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuations in demand distributions. Our numerical results demonstrate that our method outperforms rollout-based reinforcement learning, as well as several benchmarks based on classical methods from the field of operations research.
multiagent_reinforcement_autonomous_routing_adaptation_variable_demand.pdf
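The switching rule described above can be sketched in Python under strong simplifying assumptions: each demand model is reduced to an equal-size one-dimensional sample (for example, hypothetical request counts per interval), the 1-Wasserstein distance between two such samples is the mean absolute gap between their sorted values, and a fixed `radius` stands in for the q-valid radius. The names below are hypothetical; the paper works with spatial demand distributions over a city map.

```python
def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, the optimal coupling matches sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_offline_model(current, trained, historical, radius):
    """Keep the trained approximation while current demand lies within `radius`;
    otherwise return the name of the closest historical demand model."""
    if wasserstein_1d(current, trained) <= radius:
        return "trained"
    return min(historical, key=lambda name: wasserstein_1d(current, historical[name]))


off_peak = [1, 2, 2, 3]
on_peak = [6, 7, 8, 9]
history = {"off_peak": off_peak, "on_peak": on_peak}
print(select_offline_model([1, 2, 3, 3], off_peak, history, radius=1.0))  # trained
print(select_offline_model([6, 7, 7, 9], off_peak, history, radius=1.0))  # on_peak
```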
Matthew Cavorsi, Beatrice Capelli, Lorenzo Sabattini, and Stephanie Gil. In Preparation. “Multi-Robot Adversarial Resilience using Control Barrier Functions.”

In this paper we present a control barrier function (CBF)-based resilience controller that provides resilience to adversaries in a multi-robot network. Previous approaches provide resilience by virtue of specific linear combinations of multiple control constraints. These combinations can be difficult to find and are sensitive to the addition of new constraints. Unlike previous approaches, the proposed CBF provides network resilience and is easily amenable to multiple other control constraints, such as collision and obstacle avoidance. The inclusion of such constraints is essential in order to implement a resilience controller on realistic robot platforms. We demonstrate the viability of the CBF-based resilience controller on real robotic systems through case studies on a multi-robot flocking problem in cluttered environments in the presence of adversarial robots.

Ninad Jadhav, Meghna Behari, Robert Wood, and Stephanie Gil. In Preparation. “Multi-Robot Exploration without Explicit Information Exchange.”
We present a novel method to coordinate a team of multiple robots in a frontier-based exploration task that leverages wireless signals to retrieve relative positions and weights the information gain of different frontiers accordingly. Previous multi-robot frontier-based exploration methods depend on explicit information exchange amongst robots, such as a shared global map or real-time position exchanges. We do not rely on explicit information exchange and instead use range and bearing extracted directly from wireless signals, which allow robots to estimate the relative positions of their neighbors within the team, even in non-line-of-sight scenarios. We show that even without a centralized system, and without explicit information exchange, a team of multiple robots can coordinate exploration using these relative positions in order to minimize simultaneous overlap of their individual explored regions. We validate our proposed algorithm on real robots in a cluttered 300 m² environment as well as in simulation.
author_version.pdf
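One way to picture how relative positions can shape frontier selection is the hypothetical Python scoring rule below: a neighbor's relative position is recovered from WiFi-derived range and bearing, and each frontier's information gain is reduced by a Gaussian crowding penalty near the estimated neighbor positions. The penalty form, the parameters `lam` and `sigma`, and the function names are assumptions for illustration, not the weighting used in the paper.

```python
import math


def neighbor_position(range_m, bearing):
    """Relative (x, y) of a neighbor from a range in meters and a bearing in radians."""
    return (range_m * math.cos(bearing), range_m * math.sin(bearing))


def choose_frontier(frontiers, neighbors, lam=1.0, sigma=2.0):
    """Pick the frontier whose information gain, minus a crowding penalty, is largest.

    frontiers: list of ((x, y), gain) in the robot's local frame.
    neighbors: list of estimated (x, y) neighbor positions in the same frame.
    """
    def score(item):
        (fx, fy), gain = item
        penalty = sum(math.exp(-((fx - nx) ** 2 + (fy - ny) ** 2) / (2 * sigma ** 2))
                      for nx, ny in neighbors)
        return gain - lam * penalty

    return max(frontiers, key=score)[0]


neighbor = neighbor_position(5.0, 0.0)         # neighbor estimated 5 m straight ahead
frontiers = [((5.0, 0.5), 1.0), ((-5.0, 0.0), 1.0)]
print(choose_frontier(frontiers, [neighbor]))  # (-5.0, 0.0)
```

Both frontiers offer the same gain, so the penalty steers the robot away from the region its neighbor is already likely to cover.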
Michal Yemini, Angelia Nedić, Andrea J. Goldsmith, and Stephanie Gil. In Preparation. “Resilient Distributed Optimization for Multi-Agent Cyberphysical Systems.”
Enhancing resilience in distributed networks in the face of malicious agents is an important problem for which many key theoretical results and applications require further development and characterization. This work focuses on the problem of distributed optimization in multi-agent cyberphysical systems, where a legitimate agent’s dynamics are influenced both by the values it receives from potentially malicious neighboring agents and by its own self-serving target function. We develop a new algorithmic and analytical framework to achieve resilience for the class of problems where stochastic values of trust between agents exist and can be exploited. In this case we show that convergence to the true global optimal point can be recovered, both in mean and almost surely, even in the presence of malicious agents. Furthermore, we provide expected convergence rate guarantees in the form of upper bounds on the expected squared distance to the optimal value. Finally, we present numerical results that validate the analytical convergence guarantees we present in this paper, even when the malicious agents compose the majority of agents in the network.
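A minimal Python sketch of the general idea, assuming quadratic local costs f_i(x) = (x - a_i)^2 / 2, a complete communication graph, and a trust step that has already decided which neighbors to use (passed in as a hypothetical `trusted(i, j)` callback): each legitimate agent averages its own value with those of neighbors it trusts, then takes a gradient step with a diminishing step size. Malicious agents simply broadcast a fixed value. The setup, names, and step-size rule are illustrative assumptions; this is not the paper's algorithm or its convergence analysis.

```python
def resilient_dgd(targets, malicious_value, n_malicious, trusted, iters=2000):
    """Trust-filtered distributed gradient descent on f_i(x) = (x - a_i)^2 / 2.

    targets: a_i for each legitimate agent; agents 0..len(targets)-1 are legitimate
    and the remaining n_malicious agents always broadcast malicious_value.
    trusted(i, j): hypothetical trust decision of legitimate agent i about agent j.
    """
    n_legit = len(targets)
    x = [0.0] * n_legit
    for t in range(iters):
        step = 1.0 / (t + 2)  # diminishing step size
        # Values broadcast this round: legitimate states, then malicious constants.
        broadcast = x + [malicious_value] * n_malicious
        new_x = []
        for i in range(n_legit):
            used = [broadcast[j] for j in range(len(broadcast)) if j == i or trusted(i, j)]
            v = sum(used) / len(used)                  # consensus over trusted values
            new_x.append(v - step * (v - targets[i]))  # local gradient step at v
        x = new_x
    return x


a = [1.0, 2.0, 6.0]  # the minimizer of sum_i f_i is mean(a) = 3.0
filtered = resilient_dgd(a, 100.0, 5, trusted=lambda i, j: j < len(a))
naive = resilient_dgd(a, 100.0, 5, trusted=lambda i, j: True)
print([round(v, 2) for v in filtered])  # values near 3.0
print([round(v, 2) for v in naive])     # values pulled toward the malicious 100.0
```

Even with malicious agents in the majority (five of eight), excluding untrusted values lets the legitimate agents reach the optimum of their own costs.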
Submitted
Orhan Eren Akgün, Arif Kerem Dayı, Stephanie Gil, and Angelia Nedić. Submitted. “Learning Trust Over Directed Graphs in Multiagent Systems (extended version).”
We address the problem of learning the legitimacy of other agents in a multiagent network when an unknown subset consists of malicious actors. We specifically derive results for the case of directed graphs where stochastic side information, or observations of trust, is available. We refer to this as “learning trust” since agents must identify which neighbors in the network are reliable, and we derive a protocol to achieve this. We also provide analytical results showing that under this protocol i) agents can learn the legitimacy of all other agents almost surely, and ii) the opinions of the agents converge in mean to the true legitimacy of all other agents in the network. Lastly, we provide numerical studies showing that our convergence results hold in practice for various network topologies and variations in the number of malicious agents in the network.
learning_trust_over_directed_graphs_in_multiagent_systems.pdf
Forthcoming
Ninad Jadhav, Weiying Wang, Diana Zhang, Swarun Kumar, and Stephanie Gil. Forthcoming. “Toolbox Release: A WiFi-Based Relative Bearing Framework for Robotics.” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022.
This paper presents the WiFi-Sensor-for-Robotics (WSR) open-source toolbox. It enables robots in a team to obtain relative bearing to each other, even in non-line-of-sight (NLOS) settings, which is a very challenging problem in robotics. It does so by analyzing the phase of their communicated WiFi signals as the robots traverse the environment. This capability, based on the theory developed in our prior works, is made available for the first time as an open-source toolbox. It is motivated by the lack of easily deployable solutions that use robots' local resources (e.g., WiFi) for sensing in NLOS. This has implications for multi-robot mapping and rendezvous, ad-hoc robot networks, and security in multi-robot teams, amongst other applications. The toolbox is designed for distributed and online deployment on robot platforms using commodity hardware and on-board sensors. We also release datasets demonstrating its performance in NLOS and line-of-sight (LOS) settings and for a multi-robot localization use case. Empirical results for hardware experiments show that the bearing estimation from our toolbox achieves accuracy with a mean and standard deviation of 1.13 and 11.07 degrees, respectively, in LOS, and 6.04 and 26.4 degrees, respectively, in NLOS, in an indoor office environment.
author_version.pdf
2022
Ninad Jadhav, Weiying Wang, Diana Zhang, Oussama Khatib, Swarun Kumar, and Stephanie Gil. 9/26/2022. “A wireless signal-based sensing framework for robotics.” International Journal of Robotics Research, 2022, Volume 41, Issue 11-12, Pp. 955–992.
In this paper, we develop the analytical framework for a novel Wireless signal-based Sensing capability for Robotics (WSR) by leveraging robots' mobility in 3D space. It primarily allows robots to measure relative direction, or Angle-of-Arrival (AOA), to other robots, while operating in non-line-of-sight unmapped environments and without requiring external infrastructure. We do so by capturing all of the paths that a wireless signal traverses as it travels from a transmitting to a receiving robot in the team, which we term an AOA profile. The key intuition behind our approach is to enable a robot to emulate antenna arrays as it moves freely in 2D and 3D space. The small differences in the phase of the wireless signals are thus processed with knowledge of robots' local displacement to obtain the profile, via a method akin to Synthetic Aperture Radar (SAR). The main contribution of this work is the development of i) a framework to accommodate arbitrary 2D and 3D motion, as well as continuous mobility of both signal transmitting and receiving robots, while computing AOA profiles between them and ii) a Cramér-Rao Bound analysis, based on antenna array theory, that provides a lower bound on the variance in AOA estimation as a function of the geometry of robot motion. This is a critical distinction from previous work on SAR-based methods, which restrict robot mobility to prescribed motion patterns, do not generalize to the full 3D space, and require transmitting robots to be stationary during data acquisition periods. We show that allowing robots to use their full mobility in 3D space while performing SAR results in more accurate AOA profiles and thus better AOA estimation. We formally characterize this observation as the informativeness of the robots' motion, a computable quantity for which we derive a closed form. All analytical developments are substantiated by extensive simulation and hardware experiments on air/ground robot platforms using 5 GHz WiFi.
Our experimental results bolster our analytical findings, demonstrating that 3D motion provides enhanced and consistent accuracy, with a total AOA error of less than 10 degrees for 95% of trials. We also analytically characterize the impact of displacement estimation errors on the measured AOA, and validate this theory empirically using robot displacements obtained from an off-the-shelf Intel Tracking Camera T265. Finally, we demonstrate the performance of our system on a multi-robot task where a heterogeneous air/ground pair of robots continuously measure AOA profiles over a WiFi link to achieve dynamic rendezvous in an unmapped, 300 m² environment with occlusions.
author_version.pdf
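To make the antenna-array emulation idea concrete, here is a simplified Python sketch, assuming a far-field transmitter, a single line-of-sight path, a static transmitter, perfectly known 2D receiver displacements, and a 5 GHz carrier: channel phases sampled along the receiver's trajectory are combined with a Bartlett (delay-and-sum) beamformer to form an AOA profile whose peak estimates the bearing. It ignores the multipath, noise, 3D motion, and transmitter motion that the paper handles, and all names and the circular trajectory are hypothetical.

```python
import cmath
import math

C = 299_792_458.0       # speed of light (m/s)
WAVELENGTH = C / 5.0e9  # about 6 cm at a 5 GHz carrier


def channel_samples(positions, true_aoa):
    """Simulated noiseless far-field channel phases seen at each receiver position."""
    ux, uy = math.cos(true_aoa), math.sin(true_aoa)
    k = 2 * math.pi / WAVELENGTH
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in positions]


def aoa_profile(positions, samples, n_angles=360):
    """Bartlett profile: normalized power steered toward each candidate azimuth."""
    k = 2 * math.pi / WAVELENGTH
    profile = []
    for a in range(n_angles):
        theta = 2 * math.pi * a / n_angles
        ux, uy = math.cos(theta), math.sin(theta)
        s = sum(h * cmath.exp(-1j * k * (x * ux + y * uy))
                for (x, y), h in zip(positions, samples))
        profile.append(abs(s) ** 2 / len(samples) ** 2)  # equals 1.0 at a perfect match
    return profile


# Receiver moves on a circle of radius 0.2 m (hypothetical trajectory), 64 samples.
positions = [(0.2 * math.cos(2 * math.pi * i / 64), 0.2 * math.sin(2 * math.pi * i / 64))
             for i in range(64)]
samples = channel_samples(positions, math.radians(40))
profile = aoa_profile(positions, samples)
est = max(range(len(profile)), key=profile.__getitem__)
print(est)  # 40, the estimated azimuth in degrees
```

The circular path keeps consecutive samples closer than half a wavelength, so the emulated array has no grating lobes and the profile has a single peak.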
Michal Yemini, Stephanie Gil, and Andrea J. Goldsmith. 8/23/2022. “Cloud-Cluster Architecture for Detection in Intermittently Connected Sensor Networks.” IEEE Transactions on Wireless Communications, 1536-1276, Pp. 1.
We consider a centralized detection problem where sensors experience noisy measurements and intermittent connectivity to a centralized fusion center. The sensors collaborate locally within predefined sensor clusters and fuse their noisy sensor data to reach a common local estimate of the detected event in each cluster. The connectivity of each sensor cluster is intermittent and depends on the available communication opportunities of the sensors to the fusion center. Upon receiving the estimates from all the connected sensor clusters, the fusion center fuses the received estimates to make a final determination regarding the occurrence of the event across the deployment area. We refer to this hybrid communication scheme as a cloud-cluster architecture. We propose a method for optimizing the decision rule for each cluster and analyzing the expected detection performance resulting from our hybrid scheme. Our method is tractable and addresses the high computational complexity caused by heterogeneous sensors’ and clusters’ detection quality, heterogeneity in their communication opportunities, and non-convexity of the loss function. Our analysis shows that clustering the sensors provides resilience to noise in the case of low sensor communication probability with the cloud. For larger clusters, a steep improvement in detection performance is possible even for a low communication probability by using our cloud-cluster architecture.
cloud-cluster_architecture_for_detection_in_intermittently_connected_sensor_networks.pdf
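The cloud-cluster scheme can be illustrated with a small Monte Carlo sketch in Python, assuming i.i.d. binary sensors that are each correct with probability `p_correct`, majority voting inside each cluster, a cluster that reaches the cloud whenever at least one of its sensors has a communication opportunity (each independently with probability `p_connect`), and majority voting at the fusion center with ties, including the case where no cluster connects, broken by a coin flip. These modeling choices and names are hypothetical and much simpler than the optimized decision rules in the paper.

```python
import random


def majority(votes, rng):
    """Majority vote over binary votes; ties (and empty input) broken uniformly at random."""
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones == zeros:
        return rng.randint(0, 1)
    return 1 if ones > zeros else 0


def detection_error(n_clusters, cluster_size, p_correct, p_connect, trials=10000, seed=0):
    """Monte Carlo estimate of the fusion center's error probability when the event occurred."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        received = []
        for _ in range(n_clusters):
            readings = [1 if rng.random() < p_correct else 0 for _ in range(cluster_size)]
            estimate = majority(readings, rng)  # local fusion within the cluster
            # The cluster reaches the cloud if any member has a communication opportunity.
            if any(rng.random() < p_connect for _ in range(cluster_size)):
                received.append(estimate)
        if majority(received, rng) != 1:  # the true hypothesis is 1 (event occurred)
            errors += 1
    return errors / trials


# 20 sensors in total, each with a 10% chance per round of reaching the cloud.
flat = detection_error(20, 1, p_correct=0.7, p_connect=0.1)       # no clustering
clustered = detection_error(4, 5, p_correct=0.7, p_connect=0.1)   # four clusters of five
print(flat, clustered)
```

In this toy model the clustered layout should report a noticeably lower error, because a cluster reaches the cloud whenever any of its five members does and its local majority is already more reliable than a single sensor.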
Matthew Cavorsi, Ninad Jadhav, David Saldaña, and Stephanie Gil. 2022. “Adaptive malicious robot detection in dynamic topologies.” In IEEE Conference on Decision and Control (CDC). Cancún, Mexico.
We consider a class of problems where robots gather observations of each other to assess the legitimacy of their peers. Previous works propose accurate detection of malicious robots when robots are able to extract observations of each other for a long enough time. However, they often consider static networks where the set of neighbors a robot observes remains the same. Mobile robots experience a dynamic set of neighbors as they move, making the acquisition of adequate observations more difficult. We design a stochastic policy that enables the robots to periodically gather observations of every other robot, while simultaneously satisfying a desired robot distribution over an environment modeled by sites. We show that with this policy, any pre-existing or new malicious robot in the network will be detected in a finite amount of time, which we minimize and also characterize. We derive bounds on the time needed to obtain the desired number of observations for a given topological map and validate these bounds in simulation. We also show and verify in a hardware experiment that the team is able to successfully detect malicious robots, and thus estimate the true distribution of cooperative robots per site, in order to converge to the desired robot distribution over sites.
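One standard way to build a stochastic site-to-site policy with a prescribed long-run robot distribution is a Metropolis-Hastings random walk on the site graph; the Python sketch below shows that construction and checks the resulting stationary distribution by power iteration. This is an assumed stand-in chosen for illustration: the paper's policy additionally guarantees that robots periodically observe one another, which this sketch does not model, and the site graph and target distribution are hypothetical.

```python
def metropolis_hastings_policy(adjacency, target):
    """Transition matrix over sites whose stationary distribution equals `target`.

    adjacency: list of neighbor lists (undirected, connected site graph, no self-loops).
    target: desired fraction of robots at each site (positive, sums to 1).
    """
    n = len(adjacency)
    deg = [len(nbrs) for nbrs in adjacency]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adjacency[i]:
            # Propose a uniformly random neighbor, accept with the Metropolis-Hastings ratio.
            P[i][j] = (1.0 / deg[i]) * min(1.0, (target[j] * deg[i]) / (target[i] * deg[j]))
        P[i][i] = 1.0 - sum(P[i])  # rejected proposals keep the robot at its site
    return P


def stationary(P, iters=5000):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


sites = [[1], [0, 2], [1, 3], [2]]  # four sites along a corridor
desired = [0.1, 0.2, 0.3, 0.4]
P = metropolis_hastings_policy(sites, desired)
print([round(p, 3) for p in stationary(P)])  # [0.1, 0.2, 0.3, 0.4]
```

The acceptance ratio enforces detailed balance with respect to the desired distribution, so each robot following this policy independently spends the desired fraction of time at each site in the long run.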
Matthew Cavorsi, Beatrice Capelli, Lorenzo Sabattini, and Stephanie Gil. 2022. “Multi-robot adversarial resilience using control barrier functions.” In Robotics: Science and Systems (RSS) Conference.

In this paper we present a control barrier function (CBF)-based resilience controller that provides resilience to adversaries in a multi-robot network. Previous approaches provide resilience by virtue of specific linear combinations of multiple control constraints. These combinations can be difficult to find and are sensitive to the addition of new constraints. Unlike previous approaches, the proposed CBF provides network resilience and is easily amenable to multiple other control constraints, such as collision and obstacle avoidance. The inclusion of such constraints is essential in order to implement a resilience controller on realistic robot platforms. We demonstrate the viability of the CBF-based resilience controller on real robotic systems through case studies on a multi-robot flocking problem in cluttered environments in the presence of adversarial robots.
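For readers unfamiliar with control barrier functions, the Python sketch below shows the basic safety-filter idea on a single-integrator robot (x_dot = u) with one circular obstacle: with h(x) = ||x - o||^2 - r^2, the filter returns the control closest to a nominal one that satisfies dh/dt >= -alpha h. With a single affine constraint, the usual quadratic program has the closed-form projection used here. This illustrates CBFs in general under the single-integrator assumption; it is not the paper's resilience controller, and the names and parameter values are hypothetical.

```python
def cbf_filter(x, u_nom, obstacle, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h(x) . u >= -alpha * h(x) for x_dot = u.

    h(x) = ||x - obstacle||^2 - radius^2 is nonnegative outside the obstacle disk.
    """
    dx = [xi - oi for xi, oi in zip(x, obstacle)]
    h = sum(d * d for d in dx) - radius ** 2
    grad = [2.0 * d for d in dx]  # gradient of h
    slack = sum(g * u for g, u in zip(grad, u_nom)) + alpha * h
    if slack >= 0.0:  # the nominal control already satisfies the constraint
        return list(u_nom)
    norm_sq = sum(g * g for g in grad)
    # Closed-form solution of min ||u - u_nom||^2 s.t. grad . u + alpha * h >= 0.
    return [u + (-slack / norm_sq) * g for u, g in zip(u_nom, grad)]


# Robot at the origin, obstacle of radius 1 centered at (2, 0).
print(cbf_filter([0.0, 0.0], [3.0, 0.0], obstacle=[2.0, 0.0], radius=1.0))   # [0.75, 0.0]
print(cbf_filter([0.0, 0.0], [-1.0, 0.0], obstacle=[2.0, 0.0], radius=1.0))  # [-1.0, 0.0]
```

A control heading straight at the obstacle is slowed just enough to satisfy the barrier condition, while a control moving away passes through unchanged. Additional constraints, such as collision avoidance with other robots, would each add one affine inequality and in general require a small QP solver rather than this single-constraint projection.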

Matthew Cavorsi and Stephanie Gil. 2022. “Providing local resilience to vulnerable areas in robotic networks.” In IEEE International Conference on Robotics and Automation (ICRA), Pp. 4929-4935. Philadelphia, PA.
We study how information flows through a multi-robot network in order to better understand how to provide resilience to malicious information. The notion of global resilience is well studied, and one way existing methods provide it is by bringing robots closer together to improve the connectivity of the network. However, large changes in network structure can impede the team from performing other functions such as coverage, where the robots need to spread apart. Our goal is to mitigate the trade-off between resilience and network structure preservation by applying resilience locally in areas of the network where it is needed most. We introduce a metric, Influence, to identify vulnerable regions in the network requiring resilience. We design a control law targeting local resilience to the vulnerable areas by improving the connectivity of robots within these areas so that each robot has at least 2F+1 vertex-disjoint communication paths between itself and the high-influence robot in the vulnerable area. We demonstrate the performance of our local resilience controller in simulation and in hardware by applying it to a coverage problem and comparing our results with an existing global resilience strategy. For the specific hardware experiments, we show that our control provides local resilience to vulnerable areas in the network while requiring only 9.90% and 15.14% deviations from the desired team formation, compared to the global strategy.
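Checking the 2F+1 condition amounts to counting vertex-disjoint paths between two robots, which by Menger's theorem equals a maximum flow once every intermediate vertex is split into an in-node and an out-node joined by a unit-capacity arc. The Python sketch below implements that standard reduction with breadth-first augmenting paths (Edmonds-Karp); the graph format and function name are assumptions, and it only verifies the condition rather than reproducing the paper's controller.

```python
from collections import deque


def vertex_disjoint_paths(adjacency, s, t):
    """Maximum number of internally vertex-disjoint s-t paths (s != t).

    adjacency: dict mapping each vertex to its neighbors (undirected, symmetric).
    Every vertex other than s and t is split into (v, "in") -> (v, "out") with
    capacity 1, so each relay robot can carry at most one path.
    """
    cap = {}

    def add_arc(a, b):
        cap[(a, b)] = cap.get((a, b), 0) + 1
        cap.setdefault((b, a), 0)  # residual arc

    for v, nbrs in adjacency.items():
        if v not in (s, t):
            add_arc((v, "in"), (v, "out"))
        for w in nbrs:
            add_arc((v, "out"), (w, "in"))

    out_arcs = {}
    for a, b in cap:
        out_arcs.setdefault(a, []).append(b)

    source, sink, flow = (s, "out"), (t, "in"), 0
    while True:
        parent = {source: None}  # BFS tree for one augmenting path
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b in out_arcs.get(a, []):
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:  # push one unit of flow along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1


F = 1
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
print(vertex_disjoint_paths(k4, 0, 1) >= 2 * F + 1)  # True: 3 disjoint paths in K4
```

Splitting vertices is what distinguishes this from edge connectivity: in two triangles sharing a single robot, there are two edge-disjoint paths across, but only one vertex-disjoint path, because every route passes through the shared robot.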
2021
Michal Yemini, Angelia Nedić, Andrea Goldsmith, and Stephanie Gil. 2021. “Characterizing Trust and Resilience in Distributed Consensus for Cyberphysical Systems.” Transactions on Robotics Journal.
This work considers the problem of resilient consensus where stochastic values of trust between agents are available. Specifically, we derive a unified mathematical framework to characterize convergence, deviation of the consensus from the true consensus value, and expected convergence rate, when there exists additional information of trust between agents. We show that under certain conditions on the stochastic trust values and consensus protocol: 1) almost sure convergence to a common limit value is possible even when malicious agents constitute more than half of the network connectivity, 2) the deviation of the converged limit, from the case where there is no attack, i.e., the true consensus value, can be bounded with probability that approaches 1 exponentially, and 3) correct classification of malicious and legitimate agents can be attained in finite time almost surely. Further, the expected convergence rate decays exponentially with the quality of the trust observations between agents.
characterizing_trust.pdf
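The role of stochastic trust values can be illustrated with a simplified Python sketch of trust-gated consensus: agent i accumulates beta_ij, the running sum of (alpha_ij - 1/2) over rounds, for each neighbor j and, in a given round, averages only over itself and neighbors whose accumulated value is nonnegative. The uniform trust-value model (legitimate neighbors with mean 0.65, malicious ones with mean 0.35), the complete graph, and the fixed malicious value are hypothetical choices, not the paper's exact protocol or assumptions. Because early rounds can still admit malicious values, the agreed value can differ from the legitimate agents' average, which is why the abstract speaks of bounding that deviation.

```python
import random


def trust_gated_consensus(init, n_malicious, malicious_value, rounds=200, seed=1):
    """Consensus among legitimate agents on a complete graph with trust gating.

    Agents 0..len(init)-1 are legitimate; the rest always broadcast malicious_value.
    Trust observations alpha_ij are uniform on [0.3, 1.0] for legitimate j and on
    [0.0, 0.7] for malicious j (hypothetical model). Returns (values, beta).
    """
    rng = random.Random(seed)
    n_legit = len(init)
    n = n_legit + n_malicious
    x = list(init)
    beta = [[0.0] * n for _ in range(n_legit)]  # accumulated trust of i in j
    for _ in range(rounds):
        sent = x + [malicious_value] * n_malicious
        new_x = []
        for i in range(n_legit):
            used = [x[i]]
            for j in range(n):
                if j == i:
                    continue
                alpha = rng.uniform(0.3, 1.0) if j < n_legit else rng.uniform(0.0, 0.7)
                beta[i][j] += alpha - 0.5
                if beta[i][j] >= 0.0:  # neighbor currently trusted
                    used.append(sent[j])
            new_x.append(sum(used) / len(used))
        x = new_x
    return x, beta


x, beta = trust_gated_consensus([0.0, 1.0, 2.0], n_malicious=5, malicious_value=100.0)
print(max(x) - min(x))  # close to 0: the legitimate agents agree
print(all(beta[i][j] < 0 for i in range(3) for j in range(3, 8)))  # True
```

Since the accumulated trust of a malicious neighbor drifts downward by about 0.15 per round, malicious agents are eventually excluded even though they outnumber the legitimate ones.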
Frederik Mallmann-Trenn, Matthew Cavorsi, and Stephanie Gil. 2021. “Crowd Vetting: Rejecting Adversaries via Collaboration--with Application to Multi-Robot Flocking.” Transactions on Robotics Journal.
We characterize the advantage of using a robot's neighborhood to find and eliminate adversarial robots in the presence of a Sybil attack. We show that by leveraging the opinions of its neighbors on the trustworthiness of transmitted data, robots can detect adversaries with high probability. We characterize the number of communication rounds required to achieve this result as a function of the communication quality and the proportion of legitimate to malicious robots. This result enables increased resiliency of many multi-robot algorithms. Because our results are finite-time rather than asymptotic, they are particularly well suited for time-critical problems. We develop two algorithms: FindSpoofedRobots, which determines trusted neighbors with high probability, and FindResilientAdjacencyMatrix, which enables distributed computation of graph properties in an adversarial setting. We apply our methods to a flocking problem where a team of robots must track a moving target in the presence of adversarial robots. We show that by using our algorithms, the team of robots is able to maintain tracking of the dynamic target.
crowd_vetting.pdf
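The dependence on the number of communication rounds and on opinion quality can be illustrated with an exact binomial calculation, assuming (hypothetically) that each of n independent opinions about a neighbor is correct with probability p > 1/2 and that the robot decides by strict majority. The Python sketch below computes the probability that the majority is wrong, which shrinks as n grows over odd n; it is a textbook majority-vote calculation used for illustration, not the FindSpoofedRobots algorithm or its bound.

```python
from math import comb


def majority_error(n, p):
    """Probability that a strict majority of n independent opinions is wrong,
    when each opinion is correct with probability p (ties count as errors)."""
    # An error occurs when the number of correct opinions k satisfies 2k <= n.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1) if 2 * k <= n)


for n in (1, 5, 11, 21):
    print(n, round(majority_error(n, 0.7), 4))
```

Because the error decays exponentially in n for fixed p > 1/2, a finite number of rounds suffices for any target confidence, which is the kind of finite-time guarantee the abstract emphasizes.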
2020
Andrea Goldsmith, Stephanie Gil, and Michal Yemini. 12/7/2020. “Exploiting Local and Cloud Sensor Fusion in Intermittently Connected Sensor Networks.” In GLOBECOM 2020 - 2020 IEEE Global Communications Conference. Taipei, Taiwan: IEEE.
We consider a detection problem where sensors experience noisy measurements and intermittent communication opportunities to a centralized fusion center (or cloud). The objective of the problem is to arrive at the correct estimate of event detection in the environment. The sensors may communicate locally with other sensors (local clusters), where they fuse their noisy sensor data to estimate the detection of an event locally. In addition, each sensor cluster can intermittently communicate with the cloud, where a centralized fusion center fuses estimates from all sensor clusters to make a final determination regarding the occurrence of the event across the deployment area. We refer to this hybrid communication scheme as a cloud-cluster architecture. To our knowledge, minimizing the expected loss function of networks where noisy sensors are intermittently connected to the cloud, as in our hybrid communication scheme, has not been investigated. We leverage recently improved concentration inequalities to arrive at an optimized decision rule for each cluster and we analyze the expected detection performance resulting from our hybrid scheme. Our analysis shows that clustering the sensors provides resilience to noise in the case of low communication probability with the cloud. For larger clusters, a steep improvement in detection performance is possible even for a low communication probability by using our cloud-cluster architecture.
Sushmita Bhattacharya, Siva Kailas, Sahil Badyal, Stephanie Gil, and Dimitri Bertsekas. 11/9/2020. “Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems.” In 4th Conference on Robot Learning. Cambridge, MA, USA.
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of O(Cm) as compared with O(C^m) for standard rollout, where C is the maximum cardinality of the constraint set for the control component of each agent, and m is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size 10^37 and control space size 10^7, respectively). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.
bhattacharya21a.pdf
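The computational saving comes from optimizing one agent's control at a time while the agents not yet optimized follow the base policy. The Python sketch below assumes a hypothetical Q-factor function `q` that scores a full joint control (in practice this would be estimated by simulating the base policy) and represents the base policy by a fixed joint control `base`; it counts Q-factor evaluations, C·m for one-agent-at-a-time rollout versus the C^m needed to enumerate all joint controls.

```python
from itertools import product


def one_agent_at_a_time(q, base, controls):
    """Multiagent rollout: agent l picks its control with agents < l already fixed
    and agents > l following the base controls. Returns (joint control, #evaluations)."""
    joint = list(base)
    evals = 0
    for agent in range(len(base)):
        best_u, best_val = joint[agent], None
        for u in controls:
            trial = joint[:agent] + [u] + joint[agent + 1:]
            val = q(tuple(trial))
            evals += 1
            if best_val is None or val < best_val:
                best_u, best_val = u, val
        joint[agent] = best_u
    return tuple(joint), evals


def brute_force(q, m, controls):
    """Standard all-at-once minimization over the C^m joint controls."""
    joint_controls = list(product(controls, repeat=m))
    return min(joint_controls, key=q), len(joint_controls)


targets = (2, 0, 1, 2)


def q(u):
    # Hypothetical separable stand-in for a rollout Q-factor.
    return sum((a - b) ** 2 for a, b in zip(u, targets))


base = [0, 0, 0, 0]
print(one_agent_at_a_time(q, base, (0, 1, 2)))  # ((2, 0, 1, 2), 12)
print(brute_force(q, 4, (0, 1, 2)))             # ((2, 0, 1, 2), 81)
```

For a separable cost like this one the two searches coincide; in general, one-agent-at-a-time minimization only guarantees a cost no worse than the base controls, since each agent's options include its own base control.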
Ramin Hasani, Andres F. Salazar-Gomez, Stephanie Gil, Joseph DelPreto, Frank H. Guenther, and Daniela Rus. 8/9/2020. “Plug-and-Play Supervisory Control Using Muscle and Brain Signals for Real-Time Gesture and Error Detection.” Autonomous Robots, Volume 44, Pp. 1303–1322.
Effective human supervision of robots can be key for ensuring correct robot operation in a variety of potentially safety-critical scenarios. This paper takes a step towards fast and reliable human intervention in supervisory control tasks by combining two streams of human biosignals: muscle and brain activity acquired via EMG and EEG, respectively. It presents continuous classification of left and right hand gestures using muscle signals, time-locked classification of error-related potentials using brain signals (unconsciously produced when observing an error), and a framework that combines these pipelines to detect and correct robot mistakes during multiple-choice tasks. The resulting hybrid system is evaluated in a “plug-and-play” fashion with 7 untrained subjects supervising an autonomous robot performing a target selection task. Offline analysis further explores the EMG classification performance, and investigates methods to select subsets of training data that may facilitate generalizable plug-and-play classifiers.
plug-and-play_supervisory_control.pdf
Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, and Dimitri Bertsekas. 1/23/2020. “Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems.” RAL.
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
2019
Weiying Wang, Ninad Jadhav, Paul Vohs, Nathan Hughes, Mark Mazumder, and Stephanie Gil. 12/29/2019. “Active Rendezvous for Multi-Robot Pose Graph Optimization using Sensing over Wi-Fi.” In International Symposium on Robotics Research (ISRR). Hanoi: Springer Proceedings in Advanced Robotics.
We present a novel framework for collaboration amongst a team of robots performing Pose Graph Optimization (PGO) that addresses two important challenges for multi-robot SLAM: i) that of enabling information exchange “on-demand” via Active Rendezvous without using a map or the robot’s location, and ii) that of rejecting outlying measurements. Our key insight is to exploit relative position data present in the communication channel between robots to improve ground-truth accuracy of PGO. We develop an algorithmic and experimental framework for integrating Channel State Information (CSI) with multi-robot PGO; it is distributed, and applicable in low-lighting or featureless environments where traditional sensors often fail. We present extensive experimental results on actual robots and observe that using Active Rendezvous results in a 64% reduction in ground-truth pose error and that using CSI observations to aid outlier rejection reduces ground-truth pose error by 32%. These results show the potential of integrating communication as a novel sensor for SLAM.
Thomas Wheeler, Ezhil Bharathi, and Stephanie Gil. 5/20/2019. “Reinforcement Learning for POMDP: Rollout and Policy Iteration with Application to Sequential Repair.” IEEE International Conference on Robotics and Automation (ICRA).
We study rollout algorithms which combine limited lookahead and terminal cost function approximation in the context of POMDP. We demonstrate their effectiveness on a sequential pipeline repair problem, a problem that also arises in other contexts such as search and rescue. We provide performance bounds and empirical validation of the methodology, in both cases of a single rollout iteration, and multiple iterations with intermediate policy space approximations.
