Applies self-supervised learning algorithms to developmentally realistic, longitudinal, egocentric video from young children and demonstrates the emergence of high-level visual representations.
In line with Rich Sutton's 'The Bitter Lesson', video prediction performance improves as model capacity increases, which raises an open question: how far can we get by combining maximal model capacity with minimal inductive bias?
The Transformer, a sequence transduction model that dispenses with recurrent layers and relies entirely on attention mechanisms, achieves a new state of the art (SotA) on machine translation tasks while requiring significantly less training time.
Adversarial examples generated against an ensemble of CNNs with a retinal preprocessing layer reduce the accuracy of time-limited human observers in a two-alternative forced-choice task.
The Control What You Can (CWYC) method learns to control components of the environment to achieve multi-step goals by combining task planning with intrinsic motivation based on surprise and learning progress.
Computer vision models trained on data from head-mounted cameras worn by children perform better than models trained on data from adults.
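The Transformer summary above says the model relies entirely on attention mechanisms. The core operation in the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal pure-Python illustration under stated assumptions: it covers a single head with no masking and no learned projection matrices, and it favors readability over efficiency. It is not the paper's full multi-head architecture. The names `softmax` and `scaled_dot_product_attention` are illustrative helpers and do not come from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of scores."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for one head, without masking.

    Assumed shapes: q is n_queries x d_k, k is n_keys x d_k, v is n_keys x d_v.
    """
    d_k = len(k[0])
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q_row in q:
        # Dot-product similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value vectors.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    # Self-attention: queries, keys and values all come from the same 3 tokens.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in scaled_dot_product_attention(x, x, x):
        print([round(val, 3) for val in row])
```

Each output row is a convex combination of the value rows, because the softmax weights are non-negative and sum to 1. The 1/sqrt(d_k) factor keeps dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.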