Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation

1OMRON SINIC X2University of Tokyo* Work done as an intern at OMRON SINIC X


Multi-robot navigation is the task of finding trajectories for a team of robotic agents to reach their destinations as quickly as possible without collisions. In this work, we introduce a new problem: fair-delay multi-robot navigation, which aims not only to enable such efficient, safe travels but also to equalize the travel delays among agents in terms of actual trajectories as compared to the best possible trajectories. Learning a navigation policy to achieve this objective requires resolving a nontrivial credit assignment problem with robotic agents having continuous action spaces. Hence, we developed a new algorithm called Navigation with Counterfactual Fairness Filter (NCF2). With NCF2, each agent performs counterfactual inference on whether it can advance toward its goal or should stay still to let other agents go. Doing so allows us to address the credit assignment problem effectively and improve fairness regarding travel delays while maintaining high efficiency and safety. Our extensive experimental results in several challenging multi-robot navigation environments demonstrate the greater effectiveness of NCF2 as compared to state-of-the-art fairness-aware multi-agent reinforcement learning methods.

Considering fairness in multi-robot navigation tasks

Time is equally precious to everyone. Having only certain people experience delays in services where efficiency is critical would lead to unfairness. Familiar examples include food delivery and cab dispatch services: imagine a situation in which your afternoon meeting is about to start, but only the food that you ordered has not been delivered!

We are interested in achieving such temporal fairness among multiple autonomous mobile robotic agents. Fair-delay multi-robot navigation is a novel task that extends fairness-aware multi-agent reinforcement learning (MARL) for multi-robot navigation domains. In this task, each robot must travel to its destination not only efficiently, safely, but also fairly in terms of temporal delays. Specifically, we aim to find navigation policies that can equalize the delays among agents between their actual trajectories and the best possible trajectories they could have taken by ignoring the presence of other agents while also pursuing high efficiency and safety.

Concept of Navigation with Counterfactual Fairness Filter (NCF2)

In multi-agent navigation, an agent's actions are influenced by its own actions and those of neighboring agents. This makes it difficult to calculate which actions contribute to fairness, a problem referred to as the credit assignment problem. We propose Navigation with Counterfactual Fairness Filter (NCF2) to address this issue.

NCF2 is a decentralized, counterfactual algorithm for fair-delay multi-robot navigation tasks. The key idea is to equip each agent with a policy comprising a navigation module and a counterfactual fairness filter (CF2) module. The navigation module enables agents to take continuous actions to reach their destinations efficiently and safely. At the same time, the CF2 module is learned via counterfactual inference to decide if the agents can advance toward their goal or should stay still to let others go to improve the trade-off between fairness and efficiency.


We employ the following steps to learn the CF2 module.

  1. The navigation module computes the cooperative actions, considering all neighboring agents' actions, which results in a conservative action.
  2. CF2 module decides whether the agent should advance toward its goal or should stay still to let other agents go and pass and send messages to the agent who will stay still or not to neighboring agents. The orange agent's CF2 module decides to stop and try to make space for the blue agent.
  3. The navigation module recalculates its action, taking into account the message from the CF2 module. If the CF2 module of a neighboring agent decides to stop, the agent can ignore this neighbor's next action, allowing it to take a more aggressive approach. The blue agent can ignore the orange agent's action so that the blue agent can move forward.

Through these steps, the CF2 module can calculate how much its own stopping action contributes to the actions of neighboring agents, thereby effectively solving complex credit assignment problems.


We evaluate the effectiveness of our approach on a set of challenging multi-agent navigation environments involving multiple wheeled robotic agents with a simple but practical kinematics model and a continuous action space. NCF2 is compared against several state-of-the-art fairness-aware MARL baselines, including Fair-Efficient Networks (FEN) [Jiang+, 2019] and Self-Oriented Team-Oriented Network (SOTO) [Zimmer+, 2021].

NCF2 had a much better balance between success rate (safety), makespan (efficiency), and variance of delays (fairness) than the baseline methods.

FEN [Jiang+, 2019]SOTO [Zimmer+, 2021]NCF2 (Proposed)
Uniform-8-25Success Rate (↑)Makespan (↓)Variance of Delays (↓)
Corner-12-25Success Rate (↑)Makespan (↓)Variance of Delays (↓)


# arXiv version
  title={Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation},
  author={Asano, Hikaru and Yonetani, Ryo and Nishimura, Mai and Kozuno, Tadashi},
  journal={arXiv preprint arXiv:2305.11465},
# AAMAS version
  title={Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation},
  author={Asano, Hikaru and Yonetani, Ryo and Nishimura, Mai and Kozuno, Tadashi},
  booktitle={Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems},