Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards

Lukas Brunke 1,2,3, Yanni Zhang 1, Ralf Römer 1, Jack Naimer 1,2,
Nikola Staykov 1, Siqi Zhou 1, Angela P. Schoellig 1,2,3
1Technical University of Munich, 2University of Toronto, 3Vector Institute

We propose a semantic safety filter that combines 3D perception and large language models (LLMs) to enable robots to comply with "common-sense" safety constraints.

Abstract

Ensuring safe interactions in human-centric environments requires robots to understand and adhere to constraints recognized by humans as “common sense” (e.g., “moving a cup of water above a laptop is unsafe as the water may spill” or “rotating a cup of water is unsafe as its contents may pour out”). Recent advances in computer vision and machine learning have enabled robots to acquire a semantic understanding of and reason about their operating environments. While extensive literature on safe robot decision-making exists, semantic understanding is rarely integrated into these formulations. In this work, we propose a semantic safety filter framework to certify robot inputs with respect to semantically defined constraints (e.g., unsafe positional relationships, behaviours, and poses) and geometrically defined safety constraints (e.g., environment-collision and self-collision constraints). In our proposed approach, given perception inputs, we build a semantic map of the 3D environment and leverage the contextual reasoning capabilities of large language models to infer semantically unsafe conditions. These semantically unsafe conditions are then mapped to safe actions through a control barrier certification formulation. We evaluated our semantic safety filter in teleoperated tabletop manipulation tasks and pick-and-place tasks, demonstrating its effectiveness in incorporating semantic constraints to ensure safe robot operation beyond collision avoidance.

Approach

We give an overview of our proposed semantic safety filter framework. The perception module segments the visual input and builds a semantic world representation. The LLM is queried with the list of semantic labels and the manipulated object. It outputs the semantic context, which contains a list of unsafe position-based semantic constraints for each object in the scene, a list of behavioural semantic constraints, and a pose-based semantic constraint. The semantic context, together with the point clouds of the objects in the scene, is then used to define safe sets for our proposed semantic safety filter. Additionally, based on the semantic context, the safety filter's parameters are adapted, for example, to prevent end effector rotations or to approach certain objects more carefully. In every robot control loop, either a human operator or a motion policy provides an uncertified input. The high-level command is mapped to joint velocities through differential inverse kinematics, certified by the proposed semantic safety filter, and then sent to the robot system.
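A minimal sketch of this semantic-context query is shown below, assuming an OpenAI-style chat API. The prompt wording, the SemanticContext container, and the JSON keys are illustrative placeholders, not the exact interface used in our implementation.

```python
# Illustrative sketch of the semantic-context query (not the exact
# prompt/schema from the paper). Assumes an OpenAI-style chat API.
import json
from dataclasses import dataclass, field

from openai import OpenAI  # pip install openai


@dataclass
class SemanticContext:
    # {object label: unsafe spatial relationship}, e.g. {"laptop": "above"}
    spatial: dict[str, str] = field(default_factory=dict)
    behavioural: list[str] = field(default_factory=list)  # e.g. ["move slowly"]
    pose: str | None = None                               # e.g. "keep upright"


def query_semantic_context(scene_labels: list[str], manipulated: str) -> SemanticContext:
    prompt = (
        f"The robot is holding: {manipulated}. Objects in the scene: "
        f"{', '.join(scene_labels)}. For each object, list unsafe spatial "
        "relationships (e.g. 'above'), any behavioural constraints, and a "
        "pose constraint for the end effector. Answer with JSON only, using "
        "the keys 'spatial', 'behavioural', and 'pose'."
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    data = json.loads(response.choices[0].message.content)
    return SemanticContext(
        data.get("spatial", {}), data.get("behavioural", []), data.get("pose")
    )
```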


Semantic Safety Filter

For each scene, environment-collision constraints are generated from the point clouds of the individual objects, while the semantic constraints are synthesized from these point clouds and their labels together with the semantic safety conditions provided by the LLM. The semantic safety conditions are further categorized into spatial relationship constraints (blue text), behavioural constraints (orange text), and end effector pose constraints (green text).
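Each of these constraints enters the filter as a control barrier function (CBF) condition. The sketch below shows the standard CBF quadratic program for a velocity-controlled end effector (single-integrator model): the uncertified velocity is minimally modified so that every barrier condition holds. The barrier values, gradients, gain alpha, and velocity bound here are placeholders; the paper's exact constraint construction and robot dynamics differ.

```python
# Standard CBF-QP input certification for a velocity-controlled end
# effector (single-integrator model x_dot = u). Barriers and gains are
# placeholders for this sketch.
import cvxpy as cp
import numpy as np


def certify(u_des: np.ndarray, h_vals: np.ndarray, h_grads: np.ndarray,
            alpha: float = 1.0, u_max: float = 0.5) -> np.ndarray:
    """Minimally modify u_des so every CBF condition h_dot >= -alpha * h holds.

    h_vals:  (m,)   barrier values, h_i(x) >= 0 on the safe set
    h_grads: (m, 3) gradients dh_i/dx at the current end-effector position
    """
    u = cp.Variable(3)
    constraints = [
        h_grads @ u >= -alpha * h_vals,  # one CBF condition per constraint
        cp.norm(u, "inf") <= u_max,      # input (velocity) limits
    ]
    cp.Problem(cp.Minimize(cp.sum_squares(u - u_des)), constraints).solve()
    return u.value
```

The certified end-effector velocity is then mapped to joint velocities through differential inverse kinematics, as described in the approach overview.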


Robot Experiments

We present the experimental evaluation of the proposed semantic safety filter. In the real-world experiment, a Franka Emika FR3 robotic manipulator is deployed with our semantic safety filter in closed loop to prevent potentially unsafe commands from a non-expert user or a motion policy. Below we show robot manipulation experiments using an unsafe motion policy in closed loop with our semantic safety filter. The teleoperation experiments can be found above or here: tiny.cc/semantic-manipulation.

Our scene contains 17 objects of various types and is represented using fitted superquadrics (shown in red). First, the robot is tasked to transport a dry sponge across the table from left to right. As the sponge is dry and soft, there are no unsafe semantic constraints between the manipulated object and the objects in the scene, and the manipulator may move at a normal speed. Additionally, the robot's end effector is allowed to rotate freely and move above the laptop while avoiding collisions, as shown in the following video.
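Superquadrics are convenient collision primitives because they admit a closed-form inside-outside function. Below is a minimal sketch of this function in the superquadric's body frame; the scale parameters and shape exponents are assumed to come from the fitting step, and the world-to-body transform is omitted.

```python
import numpy as np


def superquadric_io(p: np.ndarray, a: np.ndarray, eps: np.ndarray) -> float:
    """Inside-outside function of a superquadric in its body frame.

    p:   (3,) query point, already transformed into the superquadric frame
    a:   (3,) semi-axis lengths from the fitting step
    eps: (2,) shape exponents (eps[0] shapes z, eps[1] the xy cross-section)
    Returns F(p): < 1 inside, = 1 on the surface, > 1 outside.
    """
    x, y, z = np.abs(p / a)
    return ((x ** (2.0 / eps[1]) + y ** (2.0 / eps[1])) ** (eps[1] / eps[0])
            + z ** (2.0 / eps[0]))
```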

Second, the robot is tasked to transport a cup of water across the table from right to left. In this case, semantically unsafe conditions arise once the robot picks up the cup of water. The unsafe spatial relationships, for example {laptop, above}, are highlighted in blue. Moreover, the robot is required to move cautiously and to reduce end effector rotations.
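To illustrate, an unsafe spatial relationship such as {laptop, above} can be encoded as a barrier that keeps the end effector out of the vertical column over the object's footprint. The axis-aligned, margin-inflated footprint below is a simplifying assumption for this sketch, not the exact constraint geometry used in the paper.

```python
import numpy as np


def above_barrier(p_ee: np.ndarray, obj_points: np.ndarray,
                  margin: float = 0.05) -> float:
    """Barrier for an unsafe {object, above} relationship.

    Returns h >= 0 when the end effector at p_ee is outside the vertical
    column over the object's inflated axis-aligned footprint, or below the
    object's top surface; h < 0 when hovering above the object (unsafe).
    """
    lo = obj_points.min(axis=0) - margin
    hi = obj_points.max(axis=0) + margin
    # L-infinity-style signed distance to the xy footprint: > 0 outside.
    outside_xy = np.maximum(lo[:2] - p_ee[:2], p_ee[:2] - hi[:2]).max()
    below_top = hi[2] - p_ee[2]  # > 0 when below the object's top
    # Safe if outside the footprint OR below the top: take the max.
    return max(outside_xy, below_top)
```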

With these semantic constraints, the robot end effector slowly moves around the laptop while minimizing the rotation of the cup to avoid spilling water over the laptop, as shown in the following video. The robot then successfully places the cup of water on the left side of the table, and the semantic constraints become inactive again.


Watch the Full Video

BibTeX

@misc{semantic-manipulation2024,
  title={Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards},
  author={Lukas Brunke and Yanni Zhang and Ralf R{\"o}mer and Jack Naimer and Nikola Staykov and Siqi Zhou and Angela P. Schoellig},
  year={2024},
  eprint={2410.15185},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2410.15185},
}