Embodied Intelligence via Learning and Evolution
Gupta et al., 2021
Summary
- The relationship between environmental complexity, evolved morphology, and learnability of intelligent control is not well understood
- Deep Evolutionary Reinforcement Learning (DERL) evolves diverse agent morphologies to learn locomotion and manipulation tasks in complex environments using egocentric sensory information
- Demonstrates that environmental complexity fosters the evolution of morphological intelligence
- Evolution selects morphologies that learn faster (the morphological Baldwin effect) due to better physical stability and energy efficiency
Background
- Animals exhibit high degrees of embodied intelligence by leveraging their morphologies to solve complex tasks
- In contrast, AI has generally focused on disembodied cognition
- Artificial evolution of morphologies is difficult:
  - Combinatorially large number of possible morphologies
  - Significant compute to evaluate fitness through lifetime learning
- DERL enables scaling along three axes of complexity: environmental, morphological, and control
- Mimics process of Darwinian evolution over generations and neural learning within a lifetime
- Previous evolutionary simulations used generational evolution, which scales poorly since evolution occurs only after every individual is trained
Methods
- DERL uses asynchronous tournament-based evolution in groups of four
- Each agent receives egocentric proprioceptive and exteroceptive observations; its policy is learned with PPO
- Proprioceptive observations: joint angles and angular velocities; head velocity, acceleration, and angular acceleration; and touch sensors on limbs and head
- Exteroceptive observations: local terrain profile, goal location, and positions of objects and obstacles
- Controller reward is a combination of forward velocity and a small penalty for large torques, but only forward progress is used for fitness
- UNIMAL: UNIversal aniMAL morphological design space that is expressive yet controllable
- Kinematic tree genotype corresponding to a hierarchy of 3D rigid parts connected via motor-actuated hinge joints
- Three classes of mutations:
  - Grow or delete limbs
  - Modify physical properties of existing limbs (e.g. length or density)
  - Modify properties of joints (e.g. DoF, limits of rotation, or gear ratios)
- Preserve bilateral symmetry by using paired mutations, which results in the center of mass lying on the sagittal plane
- Three levels of environmental complexity: flat terrain (FT), variable terrain (VT), and non-prehensile manipulation in variable terrain (MVT)
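The tournament-based evolution and the three mutation classes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Limb` and `Morphology` classes, the parameter ranges, and `toy_fitness` are hypothetical stand-ins (a flat tuple of limbs replaces the kinematic tree, and a closed-form score replaces PPO training in a simulator). The rule of dropping the oldest individual after each tournament is also an assumption, bilateral symmetry is not modeled, and the loop runs sequentially where DERL runs tournaments asynchronously in parallel.

```python
import random
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limb:
    length: float   # hypothetical units and ranges
    density: float
    gear: float     # gear ratio of the joint attaching this limb


@dataclass(frozen=True)
class Morphology:
    limbs: tuple    # flat tuple of Limb; a stand-in for the kinematic tree


def random_limb(rng):
    return Limb(rng.uniform(0.2, 0.4), rng.uniform(500.0, 1000.0),
                rng.uniform(150.0, 300.0))


def mutate(parent, rng):
    """Apply one mutation drawn from the three classes listed above."""
    limbs = list(parent.limbs)
    kind = rng.choice(["grow_or_delete", "limb_property", "joint_property"])
    i = rng.randrange(len(limbs))
    if kind == "grow_or_delete":
        if len(limbs) > 1 and rng.random() < 0.5:
            limbs.pop(i)                      # delete, keeping at least one limb
        else:
            limbs.append(random_limb(rng))    # grow
    elif kind == "limb_property":
        limbs[i] = replace(limbs[i], length=rng.uniform(0.2, 0.4))
    else:
        limbs[i] = replace(limbs[i], gear=rng.uniform(150.0, 300.0))
    return Morphology(tuple(limbs))


def toy_fitness(morph):
    """Placeholder for lifetime learning; not a real simulator or PPO run."""
    return sum(l.length for l in morph.limbs) - 0.05 * len(morph.limbs) ** 2


def tournament_step(population, rng, group_size=4):
    """One tournament: sample a group, mutate its winner, add the child."""
    contenders = rng.sample(list(population), group_size)
    winner, _ = max(contenders, key=lambda entry: entry[1])
    child = mutate(winner, rng)
    population.append((child, toy_fitness(child)))
    population.popleft()  # assumed replacement rule: drop the oldest individual


def evolve(pop_size=16, steps=200, seed=0):
    rng = random.Random(seed)
    population = deque()
    for _ in range(pop_size):
        morph = Morphology((random_limb(rng),))
        population.append((morph, toy_fitness(morph)))
    for _ in range(steps):  # sequential stand-in for parallel asynchronous workers
        tournament_step(population, rng)
    return population
```

Calling `evolve()` returns a deque of `(morphology, fitness)` pairs, and `max(population, key=lambda entry: entry[1])` picks the fittest survivor. Because each tournament only needs the fitness of its own four contenders, no step has to wait for the whole population to finish training, which is the property that lets this scheme scale better than generational evolution.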
Results
- Experiments average 10 generations of evolution over 4000 morphologies, with 5 million agent-environment interactions per morphology
- Relatively high average initial fitness indicates the efficacy of UNIMAL
- Asynchronous parallel tournaments in DERL enable ancestors with lower initial fitness to still contribute highly fit descendants to the final population
- Assessing morphological intelligence
- Eight tasks divided into three domains: agility, stability, and manipulation
- Controllers learned from scratch in each task, ensuring differences in performance are a result of morphology
- Agents evolved in MVT outperformed those evolved in FT in seven tasks; agents evolved in VT outperformed those evolved in FT in agility and stability but performed the same in manipulation. This indicates that complex environments promote morphological intelligence
- Morphological Baldwin effect, where learning time to reach a given level of fitness is reduced over generations
- Evolution selects for morphologies with better passive stability and energy efficiency, which enables better and faster learning
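One way to make the morphological Baldwin effect concrete is to measure, for each generation, how many learning iterations a morphology needs to reach a fixed fitness threshold. The sketch below assumes that definition; the function name `steps_to_threshold`, the threshold, and the learning curves are illustrative toy values, not data from the paper.

```python
def steps_to_threshold(curve, threshold):
    """Return the index of the first fitness value >= threshold, or None."""
    for step, fitness in enumerate(curve):
        if fitness >= threshold:
            return step
    return None


# Toy learning curves (fitness per training iteration), keyed by generation.
# These numbers are made up purely to illustrate the measurement.
curves_by_generation = {
    0: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    1: [0.0, 0.3, 0.6, 0.9, 1.0, 1.1],
    2: [0.1, 0.5, 0.9, 1.1, 1.2, 1.2],
}

threshold = 0.8
learning_time = {
    generation: steps_to_threshold(curve, threshold)
    for generation, curve in curves_by_generation.items()
}
print(learning_time)  # {0: 4, 1: 3, 2: 2}
```

A learning time that shrinks across generations, as in these toy curves, is the signature of the effect: later morphologies reach the same fitness with less lifetime learning.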
Conclusion
- Large-scale evolutionary simulations by DERL yield insights into how the interaction between learning, evolution, and environmental complexity can lead to morphological intelligence
- Performance appears to still be increasing at the end of lifetime learning (5 million environment interactions), which confounds selection pressure for final performance with selection pressure for learning speed
- Would be interesting to further investigate the various design choices (morphological design space, evolution hyperparameters, environments, etc.)
- Morphological intelligence is just one example of useful information that is encoded in the genome