Self-Preserving Robots Combining Reinforcement Learning with NMPC

The research paper is available on IEEE Xplore: "A Neuro-Inspired Control Architecture to Enhance Robot Self-Preservation and Adaptation in Autonomous Navigation Tasks" by Andrea Usai and Alessandro Rizzo.

Autonomous robots are increasingly deployed in complex environments, such as search-and-rescue missions, where survival is just as important as completing the assigned task. Traditional navigation methods often struggle to balance these competing needs: model-based approaches require tedious manual tuning, while learning-based methods frequently fail to generalize in dynamic settings. To address these limitations, researchers have developed a new control architecture inspired by the human brain's mechanisms for processing fear and self-preservation.


The core concept parallels the "Low Road" pathway from neuroscience, which is responsible for rapid, unconscious physiological responses to threats. This biological model explains how the brain produces immediate survival reactions via the amygdala, without waiting for slower cortical processing. By mimicking this structure, the proposed system aims to give robots a similar instinct for self-preservation when facing dangerous obstacles in unknown environments.

IEEE Reference

A Neuro-Inspired Control Architecture to Enhance Robot Self-Preservation and Adaptation in Autonomous Navigation Tasks
A. Usai and A. Rizzo, IEEE Robotics and Automation Letters, vol. 10, no. 8, pp. 8491-8497, Aug. 2025.

The architecture is built upon three distinct blocks that replicate biological functions. First, the Thalamus acts as a nonlinear filter that processes raw sensory data to extract key features, such as obstacle positions and the robot's state relative to its goal. This refined information is passed to the Amygdala, which is modeled as a Soft Actor-Critic reinforcement learning agent. The Amygdala assesses the "fear level" of the environment based on the proximity of threats and outputs a set of optimal tuning weights rather than direct motor commands.
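The Thalamus-then-Amygdala pipeline can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature names, the hand-written fear heuristic standing in for the trained Soft Actor-Critic policy, and the weight values are all assumptions for clarity.

```python
import math

def thalamus_filter(robot_pos, goal, obstacles):
    """Thalamus block (sketch): reduce raw sensing to the key features
    the paper describes, namely goal-relative state and nearest-obstacle
    proximity. Positions are (x, y) tuples."""
    goal_dist = math.hypot(goal[0] - robot_pos[0], goal[1] - robot_pos[1])
    obstacle_dist = min(
        math.hypot(o[0] - robot_pos[0], o[1] - robot_pos[1]) for o in obstacles
    )
    return {"goal_dist": goal_dist, "obstacle_dist": obstacle_dist}

def amygdala_policy(features):
    """Amygdala block (stand-in): where the real system uses a trained
    Soft Actor-Critic agent, this stub maps features to NMPC tuning
    weights. Note it outputs weights, not motor commands."""
    fear = 1.0 / (1.0 + math.exp(features["obstacle_dist"] - 1.0))
    return {"w_goal": 1.0 - 0.5 * fear, "w_safe": 10.0 * fear}
```

A nearby obstacle drives the fear score up, which raises the safety weight handed to the downstream controller; a distant obstacle leaves the goal-tracking weight dominant.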

 

The final component represents the Brainstem-Cerebellum connection and is implemented as a Nonlinear Model Predictive Controller (NMPC). Unlike standard controllers that rely on fixed parameters, this NMPC receives dynamic weights from the Amygdala in real time. The controller can therefore optimize the robot's trajectory while respecting physical constraints and adjusting its behavior to the perceived level of danger. For example, the robot behaves more cautiously when the Amygdala detects a high-risk obstacle.
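To see how dynamic weights change the optimization, here is an illustrative stage cost of the kind an NMPC would sum over its prediction horizon. The term structure and parameter names are assumptions for illustration, not the paper's exact objective.

```python
import math

def stage_cost(state, goal, obstacle, w_goal, w_safe):
    """One stage of a sketched NMPC objective. `w_goal` and `w_safe`
    are the dynamic weights the Amygdala agent would supply at run time."""
    # Quadratic goal-tracking term: penalizes distance from the goal.
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    goal_term = dx * dx + dy * dy
    # Safety term: grows sharply as the robot nears the obstacle.
    dist = math.hypot(obstacle[0] - state[0], obstacle[1] - state[1])
    safety_term = 1.0 / (dist * dist + 1e-6)
    return w_goal * goal_term + w_safe * safety_term
```

With a fixed-weight controller both terms are traded off the same way everywhere; here, raising `w_safe` near a threatening obstacle makes states close to it expensive, so the optimizer naturally detours, which matches the cautious behavior described above.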

 

Training the system involves a specialized reward function that penalizes collisions and infeasible solutions while rewarding efficient progress toward the goal. Fear is quantified mathematically by applying a sigmoid function to the inverse distance to obstacles, which normalizes the input. This normalization gives the learning agent stable observation variables, leading to more robust performance than traditional reinforcement learning setups, which often suffer from overfitting.
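The sigmoid-of-inverse-distance idea can be written directly. The `gain` and `offset` parameters below are illustrative tuning knobs, not values from the paper; the point is that the output is bounded in (0, 1) regardless of how close the obstacle gets, which is what keeps the agent's observations stable.

```python
import math

def fear_level(distance, gain=1.0, offset=1.0):
    """Sigmoid of the inverse obstacle distance (sketch of the paper's
    normalization idea): far obstacles map near 0, very close obstacles
    map near 1, and the output never blows up as distance -> 0."""
    x = gain * (1.0 / max(distance, 1e-6)) - offset
    return 1.0 / (1.0 + math.exp(-x))
```

Without the sigmoid, the raw inverse distance is unbounded as the robot approaches an obstacle, which destabilizes learning; the squashed value feeds the agent a well-scaled signal.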

 

In static testing environments, the neuro-inspired approach clearly outperformed conventional methods such as Artificial Potential Fields. While the potential-field method produced erratic paths and took significantly longer to reach the destination, the new architecture maintained a safe distance from dangerous objects without sacrificing efficiency. The standard NMPC approached obstacles too closely, whereas the neuro-inspired version modulated its path to reduce the computed fear level and keep a larger safety margin.

 

Figure: Flow from the research paper.

The benefits were even more pronounced in dynamic scenarios with moving obstacles. Standard controllers often failed to anticipate movement, leading to constraint violations and unsafe proximity to threats. In contrast, the learning-based system effectively differentiated between obstacles' danger levels and adjusted the robot's path to avoid collisions. Furthermore, the adaptive nature of the system allowed a shorter prediction horizon, which reduced the average execution time by a significant margin compared with the standard NMPC.

This research establishes a foundation for robots that can autonomously balance task completion with self-preservation in hazardous areas. Future developments aim to incorporate the "High Road" pathway, potentially using Large Language Models to handle complex social norms and strategic planning. This dual-process approach promises intelligent systems capable of operating safely alongside humans in unpredictable real-world environments.