Learning Transferable Visual Models From Natural Language Supervision

Applies task-agnostic, web-scale pre-training to computer vision using natural language supervision, enabling powerful zero-shot transfer to many datasets.
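A minimal sketch of the zero-shot transfer idea: score an image embedding against a set of text-prompt embeddings by cosine similarity and pick the best match. The function name and inputs are illustrative, not CLIP's actual API; it assumes embeddings have already been produced by the image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the text prompt most similar to the image.

    image_emb: (d,) image embedding; text_embs: (k, d) prompt embeddings.
    (Hypothetical helper illustrating the idea, not CLIP's real interface.)
    """
    # Normalize so dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img          # cosine similarity per prompt
    return int(np.argmax(logits))
```

Because classes are described by free-form text prompts (e.g. "a photo of a dog"), new label sets need no retraining, only new prompt embeddings.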

Scaling Laws for Neural Language Models

A large-scale empirical investigation of scaling laws shows that performance has a power-law relationship to model size, dataset size, and training compute, while architectural details have minimal effect.
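The model-size scaling law has the power-law form L(N) = (N_c / N)^α. A small sketch, using the constants reported in the paper for language modeling loss as illustrative defaults:

```python
def scaling_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Predicted loss as a function of non-embedding parameter count N.

    L(N) = (N_c / N)^alpha; default constants are the values the paper
    reports for language models, used here purely for illustration.
    """
    return (n_c / n_params) ** alpha
```

The key qualitative property is that loss falls smoothly and predictably as N grows, which is what makes extrapolating to larger models possible.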

Attention Is All You Need

The Transformer, a sequence transduction model that replaces recurrent layers and relies entirely on attention mechanisms, achieves new state-of-the-art results on machine translation tasks while significantly reducing training time.
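The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / √d_k)V. A minimal NumPy sketch (single head, no masking or batching):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n, d_k) queries; K: (m, d_k) keys; V: (m, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, m) similarity scores
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (n, d_v) weighted values
```

The √d_k scaling keeps the dot products from saturating the softmax as the key dimension grows.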

Stance Detection for Fake News Identification

*Stanford NLP with Deep Learning (CS 224N) Project*, 2017