A Simple Framework for Contrastive Learning of Visual Representations

SimCLR, a simple unsupervised contrastive learning framework, uses data augmentation for positive pairs, a nonlinear projection head, normalized temperature-scaled cross entropy loss, and large batch sizes to achieve SotA in self-supervised, semi-supervised, and transfer learning domains.

Learning Physical Graph Representations from Visual Scenes

Introduces a novel hierarchical representation of visual scenes, Physical Scene Graphs (PSGs), as well as a network for learning them from RGB movies, PSGNet, which outperforms other unsupervised methods in scene segmentation.

Embodied Multimodal Multitask Learning

Proposes multitask model to jointly learn semantic goal navigation and embodied question answering.