Uses bijective networks to identify large subspaces of invariance-based vulnerability and introduces the independence cross-entropy loss which partially alleviates it.
Demonstrates that scaling up self-supervised methods along data size, model capacity, and problem complexity enables them to match or surpass ImageNet supervised pre-training on a variety of tasks.
Demonstrates the benefit of curriculum learning with different scoring and pacing functions on various small datasets.
Applies task-agnostic, web-scale pre-training to computer vision using natural language supervision, enabling powerful zero-shot transfer to many datasets.
Demonstrates that providing explanations and model criticism can be useful tools to improve the reliability of ImageNet-trained CNNs for end-users.
Demonstrates that humans use scene information to guide search towards likely target sizes, resulting in higher miss rates for mis-scaled targets, an effect that does not occur for object detection DNNs.
Produces a competitive convolution-free transformer trained only on ImageNet.
Presents the idea of using hyperbolic embeddings for hierarchical representations and provides experiments classifying actions within an action hierarchy.
Evaluates object recognition performance of humans and CNNs on images with varying levels of shape and texture cues, revealing contrasting biases that can be partially alleviated by training CNNs on stylized images.
Applies self-supervised learning algorithms to developmentally realistic, longitudinal, egocentric video from young children and demonstrates the emergence of high-level visual representations.