Physical Scene Understanding

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

In line with Rich Sutton's 'The Bitter Lesson', video prediction performance improves as model capacity increases, which leaves open the question of how far we can get by combining maximal model capacity with minimal inductive bias.

Contrastive Learning of Structured World Models

Contrastively-trained Structured World Models (C-SWMs) depart from traditional pixel-based reconstruction losses and use an energy-based hinge loss for learning object-centric world models.
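
To make the energy-based hinge loss concrete, here is a minimal single-sample sketch. It assumes the formulation in which the energy is a squared Euclidean distance in latent space, the transition model predicts a latent change that is added to the current state, and the negative example is the encoding of a randomly sampled state. The function names, the margin value, and the flat-list state representation are illustrative assumptions, not the authors' code. The object-slot factorization, the encoder, and the transition model itself are left out.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_hinge_loss(z_t, delta, z_next, z_neg, margin=1.0):
    """Hinge loss on latent-space energies for one transition (illustrative).

    z_t    : latent encoding of the current state
    delta  : latent change predicted by a (hypothetical) transition model
    z_next : latent encoding of the observed next state
    z_neg  : latent encoding of a randomly sampled negative state
    margin : hinge margin (assumed value, a hyperparameter)
    """
    # Positive energy: how far the predicted next latent is from the real one.
    predicted = [a + b for a, b in zip(z_t, delta)]
    positive_energy = squared_distance(predicted, z_next)

    # Negative energy: how far an unrelated state is from the real next latent.
    negative_energy = squared_distance(z_neg, z_next)

    # Pull the prediction toward the target; push negatives at least
    # `margin` away. No pixel reconstruction term is involved.
    return positive_energy + max(0.0, margin - negative_energy)


if __name__ == "__main__":
    # Perfect prediction with a distant negative: no loss.
    print(contrastive_hinge_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [5.0, 5.0]))
    # Negative collapsed onto the target: the hinge term contributes the margin.
    print(contrastive_hinge_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]))
```

The hinge term is what prevents the trivial solution in which every state is mapped to the same latent point: such a collapse would zero the positive energy but leave the loss at the margin.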

Forward Prediction for Physical Reasoning

Demonstrates the potential of forward prediction for solving PHYRE physical reasoning tasks by investigating various combinations of object-based and pixel-based forward-prediction and task-solution models.