Uses bijective networks to identify large subspaces of invariance-based adversarial vulnerability and introduces the independence cross-entropy loss, which partially alleviates it.
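A minimal sketch of how such an objective could be set up, assuming the logits of an invertible network are split into a semantic part and a nuisance part, with the nuisance readout trained adversarially; names, shapes, and the exact minimax formulation here are illustrative, not the paper's precise loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def independence_ce_terms(z, y, num_classes, nuisance_head):
    # Split the logits of an invertible network into a semantic part z_s
    # (one dimension per class) and a nuisance part z_n (everything else).
    z_s, z_n = z[:, :num_classes], z[:, num_classes:]
    semantic_ce = F.cross_entropy(z_s, y)  # classify from z_s as usual
    nuisance_ce = F.cross_entropy(nuisance_head(z_n), y)
    # Minimax idea: the nuisance head minimizes nuisance_ce, while the
    # encoder maximizes it, pushing label information out of the nuisance
    # subspace so the network cannot be invariant to semantic changes there.
    return semantic_ce, nuisance_ce

# Illustrative shapes: batch of 8, 64-dim logits, 10 classes.
z = torch.randn(8, 64, requires_grad=True)
y = torch.randint(0, 10, (8,))
head = nn.Linear(64 - 10, 10)
sem, nui = independence_ce_terms(z, y, 10, head)
```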
Demonstrates that scaling up self-supervised methods along data size, model capacity, and problem complexity enables them to match or surpass ImageNet supervised pre-training on a variety of tasks.
Applies task-agnostic, web-scale pre-training to computer vision using natural language supervision, enabling powerful zero-shot transfer to many datasets.
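A minimal zero-shot classification sketch in this spirit, assuming OpenAI's `clip` package is installed; the model variant, prompt template, class names, and image path are all illustrative:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # variant is an assumption

# Build a zero-shot classifier purely from class names via a prompt template.
class_names = ["dog", "cat", "car"]  # illustrative labels
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each class prompt, then softmax.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```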
Demonstrates that humans use scene information to guide search towards likely target sizes, resulting in higher miss rates for mis-scaled targets; object detection DNNs show no such effect.
Presents the idea of using hyperbolic embeddings for hierarchical representations and provides experiments on classifying actions within an action hierarchy.
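For concreteness, the geodesic distance on the Poincaré ball that such hyperbolic embeddings typically rely on (a plain NumPy sketch; the embedding coordinates below are made up):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v inside the unit Poincare ball.
    Distances grow rapidly near the boundary, which is what lets tree-like
    hierarchies embed with low distortion: parents sit near the origin,
    leaves near the rim."""
    duv = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v)) + eps
    return np.arccosh(1.0 + 2.0 * duv / denom)

# Illustrative: a "parent" near the origin vs. two "leaves" near the boundary.
root = np.array([0.01, 0.0])
leaf_a = np.array([0.95, 0.0])
leaf_b = np.array([-0.95, 0.0])
print(poincare_distance(root, leaf_a), poincare_distance(leaf_a, leaf_b))
```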
Evaluating the object recognition performance of humans and CNNs on images with varying levels of shape and texture cues reveals contrasting biases (humans favor shape, ImageNet-trained CNNs favor texture); the CNN texture bias can be partially alleviated by training with stylized images.
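A small sketch of the cue-conflict shape-bias metric used in this line of work: among predictions that match either the shape class or the texture class of a cue-conflict image, count the fraction that follow shape (function and label names are illustrative):

```python
def shape_bias(preds, shape_labels, texture_labels):
    """Fraction of shape-consistent decisions among all decisions that match
    either cue. Near 1.0 means shape-biased (human-like); near 0.0 means
    texture-biased."""
    shape_hits = texture_hits = 0
    for p, s, t in zip(preds, shape_labels, texture_labels):
        if p == s:
            shape_hits += 1
        elif p == t:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")

# Illustrative call: predictions on cat-shape / elephant-texture images.
print(shape_bias(["cat", "elephant", "cat"], ["cat"] * 3, ["elephant"] * 3))
```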
Applies self-supervised learning algorithms to developmentally realistic, longitudinal, egocentric video from young children and demonstrates the emergence of high-level visual representations.