In line with Rich Sutton's 'The Bitter Lesson', the improvement of video prediction performance as model capacity increases leaves an open question about how far we can get by finding the right combination of maximal model capacity and minimal inductive bias.
The development of artificial neural networks should leverage the insight that much of animal behavior is innate, arising from wiring rules encoded in the genome that were shaped by billions of years of evolution.
The object-centric perception, prediction, and planning (OP3) framework demonstrates strong generalization to novel configurations in block-stacking tasks by symmetrically processing entity representations extracted from raw visual observations.
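The symmetric (permutation-equivariant) processing of entities can be sketched as applying one shared network to every entity plus a pooled pairwise-interaction term. This is an illustrative sketch with made-up weight matrices, not OP3's actual architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def symmetric_update(entities, W_self, W_pair):
    """Update each entity with a shared network plus summed pairwise
    interactions; identical weights are applied to every entity, so
    permuting the inputs permutes the outputs in the same way."""
    K, D = entities.shape
    pair_msgs = np.zeros((K, D))
    for i in range(K):
        for j in range(K):
            if i != j:
                # Shared pairwise interaction between entity i and entity j.
                pair_msgs[i] += relu(np.concatenate([entities[i], entities[j]]) @ W_pair)
    return relu(entities @ W_self) + pair_msgs

rng = np.random.default_rng(0)
K, D = 4, 8
entities = rng.normal(size=(K, D))
W_self = rng.normal(size=(D, D))
W_pair = rng.normal(size=(2 * D, D))

out = symmetric_update(entities, W_self, W_pair)

# Permutation equivariance: shuffling the entities shuffles the outputs identically.
perm = rng.permutation(K)
out_perm = symmetric_update(entities[perm], W_self, W_pair)
assert np.allclose(out[perm], out_perm)
```

Because no weight depends on an entity's index, the model generalizes to novel configurations (and, with pooling, to different numbers of entities).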
The Control What You Can (CWYC) method learns to control components of the environment to achieve multi-step goals by combining task planning with intrinsic motivation based on surprise and learning progress.
A large-scale, comprehensive study challenges common assumptions in learning disentangled representations, motivating future work that demonstrates concrete benefits in robust experimental setups.
A novel method predicts video from a single frame by decomposing the scene into entities with location and appearance features, capturing ambiguities with a global latent variable.
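The decomposition can be sketched as per-entity (location, appearance) features whose shared dynamics are conditioned on one sampled global latent, so that different samples yield different plausible futures. This is a toy sketch under invented dynamics and rendering, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_next_frame(locations, appearances, W, H=16):
    """Illustrative sketch: drift each entity with z-dependent shared
    dynamics, then render it as a Gaussian blob scaled by its appearance.
    The single global latent z captures ambiguity in the future."""
    z = rng.normal()                      # one global latent per sampled future
    frame = np.zeros((H, H))
    ys, xs = np.mgrid[0:H, 0:H]
    for loc, app in zip(locations, appearances):
        next_loc = np.clip(loc + 0.5 * z * W @ loc, 0, H - 1)
        blob = np.exp(-((xs - next_loc[0]) ** 2 + (ys - next_loc[1]) ** 2) / 4.0)
        frame += app * blob
    return frame

locations = rng.uniform(2, 14, size=(3, 2))   # (x, y) per entity
appearances = rng.uniform(0.5, 1.0, size=3)   # one appearance feature per entity
W = rng.normal(scale=0.1, size=(2, 2))        # shared dynamics weights
frame = predict_next_frame(locations, appearances, W)
```

Resampling `z` and calling `predict_next_frame` again produces a different but coherent future, since all entities share the same latent.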