The Transformer, a sequence transduction model that dispenses with recurrence and relies entirely on attention mechanisms, achieves new SotA results on machine translation tasks while requiring significantly less training time.
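A minimal sketch of the scaled dot-product attention at the heart of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the single-head function and toy shapes below are illustrative assumptions, not the paper's full multi-head implementation:

```python
import math
import torch

def scaled_dot_product_attention(queries, keys, values, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, values: (batch, seq_len, d_k) tensors (illustrative shapes).
    mask: optional boolean tensor, True where attention is disallowed
          (e.g. future positions in a decoder).
    """
    d_k = queries.size(-1)
    # Compare every query with every key; scale so softmax inputs stay well-behaved.
    scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over positions
    return torch.matmul(weights, values)     # weighted sum of value vectors

# Toy self-attention over a batch of 2 sequences of length 5 with d_k = 8.
x = torch.randn(2, 5, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 8])
```

Dividing by sqrt(d_k) keeps the dot products from growing with dimensionality and pushing the softmax into regions with vanishing gradients.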
SimCLR, a simple framework for contrastive self-supervised learning of visual representations, composes data augmentations to form positive pairs, adds a nonlinear projection head before the contrastive loss, and trains with a normalized temperature-scaled cross-entropy (NT-Xent) loss and large batch sizes, achieving SotA results in self-supervised, semi-supervised, and transfer learning settings.
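A minimal sketch of the NT-Xent loss SimCLR applies to the projection-head outputs of two augmented views of each image; the function name, the `temperature` default, and the random tensors standing in for an encoder plus projection head are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i]).

    z1, z2: (N, d) projection-head outputs for two augmented views of the
            same N images; all other 2N - 2 embeddings act as negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = torch.matmul(z, z.t()) / temperature           # cosine similarities / tau
    # Exclude each embedding's similarity with itself from the softmax denominator.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # The positive for index i is its other view: i + N (first half) or i - N (second half).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random "projections" in place of an encoder + projection head.
z1, z2 = torch.randn(4, 128), torch.randn(4, 128)
print(nt_xent_loss(z1, z2).item())
```

Each view's only positive is its counterpart from the same image, and every other embedding in the batch serves as a negative, which is why larger batch sizes help.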
Adversarial examples crafted against an ensemble of CNNs with a retinal preprocessing layer transfer to time-limited human observers, reducing their accuracy in a two-alternative forced-choice image classification task.
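The paper's actual attack pipeline and retinal preprocessing layer are not reproduced here; as a hedged illustration, the sketch below applies a generic one-step FGSM-style perturbation to the averaged loss of an ensemble of classifiers (the function name, `epsilon`, and the toy models are assumptions):

```python
import torch
import torch.nn.functional as F

def ensemble_fgsm(models, images, labels, epsilon=8 / 255):
    """One-step sign-gradient perturbation against an ensemble's averaged loss.

    models: list of classifiers in eval mode.
    images: (N, C, H, W) inputs in [0, 1]; labels: (N,) ground-truth classes.
    Returns adversarial images clipped back to the valid pixel range.
    """
    images = images.clone().detach().requires_grad_(True)
    # Averaging the loss over ensemble members forces the perturbation to
    # fool all of them at once, which tends to improve transferability.
    loss = torch.stack([F.cross_entropy(m(images), labels) for m in models]).mean()
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

# Toy usage with a tiny stand-in "CNN" ensemble on random data.
toy = [torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
       for _ in range(2)]
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
print(ensemble_fgsm(toy, x, y).shape)  # torch.Size([4, 3, 32, 32])
```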
A large-scale, comprehensive empirical study challenges common assumptions in the unsupervised learning of disentangled representations and motivates future work to demonstrate concrete benefits of disentanglement in sound, reproducible experimental setups.