Contrastively-trained Structured World Models (C-SWMs) depart from traditional pixel-based reconstruction losses and use an energy-based hinge loss for learning object-centric world models.
Demonstrates the potential of forward prediction for solving PHYRE physical reasoning tasks by investigating various combinations of object-based and pixel-based forward-prediction and task-solution models.
Hierarchical Relational Inference (HRI) learns hierarchical object representations and their relations directly from raw visual inputs, but is evaluated against limited baselines on simple datasets.
IntPhys provides a well-designed benchmark for evaluating a system's understanding of a few core concepts about the physics of objects.
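
The energy-based hinge loss mentioned in the C-SWM summary can be illustrated with a minimal sketch. It assumes squared Euclidean distance as the energy, a margin of 1.0, and a single flat latent vector per state; the function names and toy numbers are hypothetical and are not taken from the C-SWM code. The idea is that the predicted next state is pulled toward the true next state, while a negative (for example, randomly sampled) state is penalized only if it lies within the margin.

```python
def squared_distance(a, b):
    """Energy (assumed here): squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_hinge_loss(predicted_next, true_next, negative_state, margin=1.0):
    """Sum of a positive term and a hinge term on a negative sample.

    positive term: energy between the predicted and the true next state
    hinge term:    max(0, margin - energy between negative and true next state)
    """
    positive_energy = squared_distance(predicted_next, true_next)
    negative_energy = squared_distance(negative_state, true_next)
    return positive_energy + max(0.0, margin - negative_energy)


# Toy 2-D latent states (made-up numbers for illustration only).
# Negative sample far from the true next state: the hinge term vanishes.
loss_far = contrastive_hinge_loss([1.0, 0.0], [1.0, 0.0], [3.0, 0.0])
# Negative sample inside the margin: the hinge term is active.
loss_near = contrastive_hinge_loss([1.0, 0.0], [1.0, 0.0], [1.5, 0.0])
```

Because the hinge term is zero once a negative sample is farther than the margin, no pixel-level reconstruction is needed to keep the latent space from collapsing, which is the contrast with reconstruction losses that the summary draws.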