Embodied Intelligence via Learning and Evolution

Gupta et al., 2021

Summary

The relationship between environmental complexity, evolved morphology, and learnability of intelligent control is not well understood
Deep Evolutionary Reinforcement Learning (DERL) evolves diverse agent morphologies to learn locomotion and manipulation tasks in complex environments using egocentric sensory information
Demonstrates that environmental complexity fosters the evolution of morphological intelligence
- Evolution selects morphologies that learn faster (a morphological Baldwin effect) because they are more physically stable and energy efficient

Animals exhibit high degrees of embodied intelligence by leveraging their morphologies to solve complex tasks
- In contrast, AI has generally focused on disembodied cognition
Artificial evolution of morphologies is difficult:
- Combinatorially large number of possible morphologies
- Significant compute to evaluate fitness through lifetime learning
DERL enables scaling along three axes of complexity: environmental, morphological, and control
- Mimics process of Darwinian evolution over generations and neural learning within a lifetime
Previous evolutionary simulations used generational evolution, which scales poorly since evolution occurs only after every individual is trained

DERL uses asynchronous, tournament-based evolution, with each tournament run among a group of four agents
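A minimal sketch of one tournament step may make this concrete. It is not the authors' implementation: `tournament_step`, `mutate`, `train`, and `toy_run` are hypothetical names, the genome is a single number, and dropping the oldest member to keep the population size fixed is an assumption that the notes above do not state.

```python
import random
from collections import deque


def tournament_step(population, mutate, train, group_size=4, rng=random):
    """Run one tournament and return the newly trained child.

    population: deque of (genome, fitness) pairs, oldest first.
    mutate, train: hypothetical callables supplied by the caller.
    """
    # Sample a small group and pick its fittest member as the parent.
    group = rng.sample(list(population), group_size)
    parent_genome, _ = max(group, key=lambda pair: pair[1])
    # The child is a mutated copy; its fitness comes from lifetime learning.
    child_genome = mutate(parent_genome, rng)
    child = (child_genome, train(child_genome))
    # Assumption: keep the population size fixed by dropping the oldest member.
    population.popleft()
    population.append(child)
    return child


def toy_run(steps=200, pop_size=16, seed=0):
    """Evolve a toy population and return (initial, final) mean fitness."""
    rng = random.Random(seed)
    # Toy stand-ins: a genome is one number, and "training" just returns it.
    population = deque((g, g) for g in (rng.random() for _ in range(pop_size)))
    mutate = lambda g, r: g + r.gauss(0.0, 0.1)
    train = lambda g: g
    start = sum(f for _, f in population) / pop_size
    for _ in range(steps):
        tournament_step(population, mutate, train, rng=rng)
    end = sum(f for _, f in population) / pop_size
    return start, end
```

Because each step touches only one small group, several workers could in principle run such steps independently, without waiting for every individual in a generation to finish training.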
Each agent receives egocentric proprioceptive and exteroceptive observations and learns its policy with PPO
- Proprioceptive observations: joint angles and angular velocities; head velocity, acceleration, and angular acceleration; and touch sensors on limbs and head
- Exteroceptive observations: local terrain profile, goal location, and positions of objects and obstacles
- Controller reward is a combination of forward velocity and a small penalty for large torques, but only forward progress is used for fitness
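The split between the learning signal and the selection signal can be sketched as two small functions. The quadratic form of the torque penalty, the coefficient `torque_penalty`, and both function names are assumptions for illustration; the notes only say that the penalty is small and that fitness uses forward progress alone.

```python
def step_reward(forward_velocity, torques, torque_penalty=1e-3):
    """Per-step controller reward: forward velocity minus a small torque penalty.

    Assumption: the penalty is quadratic in the torques with a made-up coefficient.
    """
    return forward_velocity - torque_penalty * sum(t * t for t in torques)


def episode_fitness(x_start, x_end):
    """Fitness used for selection: forward progress only, no torque term."""
    return x_end - x_start
```

Keeping the torque term out of the fitness means selection compares morphologies on task progress alone, while the penalty only shapes how each controller learns.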
UNIMAL: UNIversal aniMAL morphological design space that is expressive yet controllable
- Kinematic tree genotype corresponding to a hierarchy of 3D rigid parts connected via motor actuated hinge joints
- Three classes of mutations:
  - Grow or delete limbs
  - Modify physical properties of existing limbs (e.g. length or density)
  - Modify properties of joints (e.g. DoF, limits of rotation, or gear ratios)
- Preserve bilateral symmetry by using paired mutations, which keeps the center of mass on the sagittal plane
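The genotype and mutation classes above can be illustrated with a small tree structure. This is a sketch under stated assumptions, not the UNIMAL implementation: the `Limb` fields, the `mirrored` flag (one node standing for a left/right pair, so any change to it is automatically a paired mutation), and all numeric values are hypothetical placeholders.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Limb:
    length: float
    density: float
    joint_limit_deg: float   # rotation limit of the hinge joint to the parent
    gear: float              # motor gear ratio of that joint
    mirrored: bool = False   # True: this node stands for a left/right pair
    children: List["Limb"] = field(default_factory=list)


def nodes(root):
    """Return every node of the kinematic tree, root first."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def count_limbs(node):
    """Count physical parts, expanding each mirrored node into two copies."""
    copies = 2 if node.mirrored else 1
    return copies * (1 + sum(count_limbs(c) for c in node.children))


def mutate(root, rng):
    """Return a mutated copy of the tree; the original is left untouched."""
    child = copy.deepcopy(root)
    target = rng.choice(nodes(child))
    kind = rng.choice(["grow", "delete", "limb", "joint"])
    leaves = [c for c in target.children if not c.children]
    if kind == "grow":
        # Class 1: grow a limb (placeholder parameters), possibly as a mirrored pair.
        target.children.append(
            Limb(0.3, 1000.0, 60.0, 150.0, mirrored=rng.random() < 0.5))
    elif kind == "delete" and leaves:
        # Class 1: delete a leaf limb.
        target.children.remove(rng.choice(leaves))
    elif kind == "limb":
        # Class 2: modify a physical property of an existing limb.
        target.length *= rng.uniform(0.8, 1.2)
    else:
        # Class 3: modify a joint property (also the fallback when nothing can be deleted).
        target.gear *= rng.uniform(0.8, 1.2)
    return child
```

For example, a head with one mirrored limb counts as three parts: `count_limbs` expands the mirrored node into its left and right copies.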
Three levels of environmental complexity: flat terrain (FT), variable terrain (VT), and non-prehensile manipulation in variable terrain (MVT)

Each experiment averages 10 generations of evolution and searches over 4,000 morphologies, with 5 million agent-environment interactions per morphology
Relatively high average initial fitness indicates the efficacy of UNIMAL
Asynchronous parallel tournaments in DERL enable ancestors with lower initial fitness to still contribute highly fit descendants to the final population
Assessing morphological intelligence
- Eight tasks divided into three domains: agility, stability, and manipulation
- Controllers learned from scratch in each task, ensuring differences in performance are a result of morphology
- Agents evolved in MVT outperformed those evolved in FT in seven tasks; agents evolved in VT beat FT in agility and stability but performed the same in manipulation. This indicates that complex environments promote morphological intelligence
Morphological Baldwin effect, where learning time to reach a given level of fitness is reduced over generations
- Evolution selects for morphologies with better passive stability and energy efficiency, which enables better and faster learning
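One way to quantify the effect described above is to measure, for each generation, how many training steps a learning curve needs to reach a fixed fitness level. The sketch below uses a hypothetical helper `steps_to_threshold` and toy curves with made-up numbers; it is not data from the paper.

```python
def steps_to_threshold(curve, threshold):
    """Return the first training step at which fitness reaches threshold, or None."""
    for step, fitness in enumerate(curve):
        if fitness >= threshold:
            return step
    return None


# Toy learning curves (made-up numbers), keyed by generation.
curves = {
    0: [0.0, 0.2, 0.4, 0.6, 0.8],
    5: [0.0, 0.4, 0.7, 0.9, 1.0],
    10: [0.1, 0.6, 0.9, 1.0, 1.1],
}
times = {gen: steps_to_threshold(c, 0.6) for gen, c in curves.items()}
print(times)  # {0: 3, 5: 2, 10: 1}
```

A learning time that shrinks across generations, as in this toy output, is the signature the notes call the morphological Baldwin effect.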

Large-scale evolutionary simulations by DERL yield insights into how the interaction between learning, evolution, and environmental complexity can lead to morphological intelligence
Performance appears to still be increasing at the end of lifetime learning (5 million environment interactions), which confounds selection pressure for final performance with selection pressure for learning speed
Would be interesting to further investigate the various design choices (morphological design space, evolution hyperparameters, environments, etc.)
Morphological intelligence is just one example of useful information that is encoded in the genome