FAIR

Scaling and Benchmarking Self-Supervised Visual Representation Learning

Demonstrates that scaling up self-supervised methods along data size, model capacity, and problem complexity enables them to match or surpass ImageNet supervised pre-training on a variety of tasks.

ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

Demonstrates that explanations and model criticism can be useful tools for improving the reliability of ImageNet-trained CNNs for end-users.

Training data-efficient image transformers & distillation through attention

Produces a competitive convolution-free transformer trained only on ImageNet.

Forward Prediction for Physical Reasoning

Demonstrates the potential of forward prediction for solving PHYRE physical reasoning tasks by investigating various combinations of object-based and pixel-based forward-prediction and task-solution models.

IntPhys 2019: A Benchmark for Visual Intuitive Physics Understanding

IntPhys provides a well-designed benchmark for evaluating a system's understanding of a few core concepts about the physics of objects.

Occlusion resistant learning of intuitive physics from videos

Combines a compositional rendering network with a recurrent interaction network to learn dynamics in scenes with significant occlusion, but relies on ground-truth object positions and segmentations.

Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Analysis of invariances in representations from contrastive self-supervised models reveals that they leverage aggressive cropping on object-centric datasets to improve occlusion invariance at the expense of viewpoint and category instance invariance.

Compositional Video Prediction

Proposes a method for video prediction from a single frame that decomposes the scene into entities with location and appearance features and captures ambiguities with a global latent variable.

Embodied Multimodal Multitask Learning

Proposes a multitask model that jointly learns semantic goal navigation and embodied question answering.