Uses bijective networks to identify large subspaces of invariance-based vulnerability and introduces the independence cross-entropy loss which partially alleviates it.
Demonstrates the benefit of curriculum learning with different scoring and pacing functions on various small datasets.
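A minimal sketch of the scoring-and-pacing idea, assuming difficulty scores are already available (e.g., from a pretrained model's loss); the function names and pacing parameters below are hypothetical, not the paper's settings:

```python
import numpy as np

def exponential_pacing(step, n_total, start_frac=0.3, inc=1.9, step_length=100):
    """Pacing function: how many of the easiest examples are exposed at a given step."""
    frac = min(1.0, start_frac * inc ** (step // step_length))
    return max(1, int(frac * n_total))

def curriculum_batches(data, difficulty, batch_size, n_steps, seed=0):
    """Sample batches only from the easiest subset allowed by the pacing function."""
    rng = np.random.default_rng(seed)
    order = np.argsort(difficulty)  # scoring function output, easiest examples first
    for step in range(n_steps):
        subset = order[: exponential_pacing(step, len(data))]
        yield data[rng.choice(subset, size=batch_size)]
```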
Demonstrates that explanations and model criticism can be useful tools for improving the reliability of ImageNet-trained CNNs for end-users.
Produces a competitive convolution-free transformer trained only on ImageNet.
Presents the idea of using hyperbolic embeddings for hierarchical representations and provides some experiments classifying actions within a hierarchy of actions.
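For reference, a small sketch of the distance function in the Poincaré ball model, which is what makes hyperbolic space attractive for tree-like hierarchies; the example points are arbitrary, not from the paper:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit ball (Poincaré ball model)."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_dist / (denom + eps))

# Points near the boundary grow exponentially far apart, mirroring branching hierarchies.
root, leaf_a, leaf_b = np.array([0.0, 0.0]), np.array([0.7, 0.6]), np.array([-0.7, 0.6])
print(poincare_distance(root, leaf_a), poincare_distance(leaf_a, leaf_b))
```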
A large-scale empirical investigation of scaling laws shows that performance has a power-law relationship to model size, dataset size, and training compute, while architectural details have minimal effects.
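To make the power-law claim concrete, a hedged sketch of the reported functional form L(N) = (N_c / N)^alpha and how one might fit it in log-log space; the data points below are illustrative, not the paper's measurements:

```python
import numpy as np

def power_law_loss(n, n_c, alpha):
    """L(N) = (N_c / N) ** alpha — the functional form of loss vs. scale."""
    return (n_c / n) ** alpha

# Fit alpha and N_c from (model size, loss) pairs via linear regression in log-log space.
sizes  = np.array([1e6, 1e7, 1e8, 1e9])   # illustrative values only
losses = np.array([5.0, 4.1, 3.4, 2.8])
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_hat = -slope                        # log L = -alpha * log N + alpha * log N_c
n_c_hat = np.exp(intercept / alpha_hat)
print(alpha_hat, n_c_hat)
```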
The success of reasonably sized neural networks hinges on the symmetry, locality, and polynomial log-probability of data from the natural world.
Trains a single-stream vision-language transformer on VisualCOMET, a large-scale repository of Visual Commonsense Graphs, enabling it to generate inferences about past and future events by integrating the image with textual descriptions of the present event and location.
The Transformer, a sequence transduction model that replaces recurrent layers and relies entirely on attention mechanisms, achieves new SotA on machine translation tasks while reducing training time significantly.
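The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; a minimal NumPy sketch (single head, no masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — each query position mixes the values by relevance."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy check: 3 query positions attending over 4 key/value positions of width 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```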
Development of artificial neural networks should leverage the insight that much of animal behavior is innate, a result of wiring rules encoded in the genome and learned through billions of years of evolution.