The unlearnability, high-dimensionality, and unboundedness of the real world necessitate the integration of intrinsic motivation with other developmental constraints, such as sensorimotor primitives, task space representations, maturational mechanisms, and social guidance.
Provides evidence that human experimentation in physical environments is effective at revealing properties of interest, and that what is learned from observation depends on the learner's goals.
Presents the idea of using hyperbolic embeddings for hierarchical representations and reports experiments on classifying actions within a hierarchy of actions.
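As a concrete illustration, here is a minimal sketch of the geodesic distance in the Poincaré ball model, the quantity such hyperbolic embeddings are typically trained against; the example points below are hypothetical:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points in the Poincare ball model.
    Assumes u and v are vectors with Euclidean norm < 1."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps))

# Distances grow rapidly toward the boundary of the ball, which is what
# lets trees and other hierarchies embed with low distortion: parents sit
# near the origin, leaves near the boundary.
root = np.array([0.0, 0.0])
leaf_a = np.array([0.8, 0.1])
leaf_b = np.array([-0.7, 0.4])
print(poincare_distance(root, leaf_a))    # ~2.2
print(poincare_distance(leaf_a, leaf_b))  # ~4.4, far above the Euclidean ~1.5
```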
A large-scale empirical investigation of scaling laws shows that performance follows a power law in model size, dataset size, and training compute, while architectural details have minimal effect.
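To make the power-law claim concrete, here is a sketch of how such a law can be fit in log-log space; the (scale, loss) pairs are invented for illustration, and alpha and n_c are just names for the fitted parameters:

```python
import numpy as np

# Hypothetical (scale, loss) pairs for illustration only; real studies
# sweep model size, dataset size, and compute independently.
scale = np.array([1e6, 1e7, 1e8, 1e9])
loss = np.array([5.2, 4.1, 3.3, 2.6])

# A power law L(n) = (n_c / n)**alpha is linear in log-log space:
#   log L = alpha * log(n_c) - alpha * log(n)
slope, intercept = np.polyfit(np.log(scale), np.log(loss), 1)
alpha = -slope                   # fitted power-law exponent
n_c = np.exp(intercept / alpha)  # fitted scale constant
print(f"alpha = {alpha:.3f}, n_c = {n_c:.2e}")
```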
Evaluating the object recognition performance of humans and CNNs on images with varying levels of shape and texture cues reveals contrasting biases: humans rely predominantly on shape, while CNNs rely on texture. The gap can be partially closed by training CNNs on stylized images.
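A minimal sketch of a shape-bias metric for such cue-conflict evaluations, assuming each image pairs the shape of one class with the texture of another; the class names below are hypothetical:

```python
def shape_bias(preds, shapes, textures):
    """On cue-conflict images (shape of one class, texture of another),
    the fraction of class-consistent decisions that follow shape."""
    shape_hits = sum(p == s for p, s in zip(preds, shapes))
    texture_hits = sum(p == t for p, t in zip(preds, textures))
    return shape_hits / (shape_hits + texture_hits)

# Toy example: a strongly shape-biased observer would score near 1.0,
# a strongly texture-biased one near 0.0.
preds    = ['cat', 'elephant', 'cat', 'clock']
shapes   = ['cat', 'cat', 'cat', 'bottle']
textures = ['elephant', 'elephant', 'clock', 'clock']
print(shape_bias(preds, shapes, textures))  # 2 / (2 + 2) = 0.5
```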
Reviews recent work analyzing the egocentric view of infants, highlighting the connection between the structure of this visual data and the internal machinery for statistical learning.
Applies self-supervised learning algorithms to developmentally realistic, longitudinal, egocentric video from young children and demonstrates the emergence of high-level visual representations.
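As an illustration of one such objective (not necessarily the exact algorithm used in that work), here is a minimal temporal contrastive loss in PyTorch that treats consecutive frame embeddings as positive pairs:

```python
import torch
import torch.nn.functional as F

def temporal_infonce(frame_embeddings, temperature=0.1):
    """Contrastive loss over a clip of consecutive frame embeddings (T, D):
    frame t's positive is frame t+1; all other frames act as negatives."""
    z = F.normalize(frame_embeddings, dim=1)
    logits = z[:-1] @ z[1:].T / temperature  # (T-1, T-1) similarities
    targets = torch.arange(len(z) - 1)       # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: in practice the embeddings come from a visual encoder applied to
# raw video frames; random vectors stand in for them here.
frames = torch.randn(16, 128, requires_grad=True)
loss = temporal_infonce(frames)
loss.backward()
```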
Proposes a new set of ImageNet labels that addresses limitations of the original labels, such as images containing multiple objects and classes with synonymous names.
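A minimal sketch of the resulting multi-label style of evaluation, where a prediction counts as correct if it falls within the set of labels annotators judged valid; the class ids below are made up:

```python
def multi_label_accuracy(predictions, valid_label_sets):
    """Fraction of images whose top-1 prediction falls inside the set of
    labels annotators judged valid; images with no valid labels are skipped."""
    scored = [(p, v) for p, v in zip(predictions, valid_label_sets) if v]
    return sum(p in v for p, v in scored) / len(scored)

# Toy example: image 0 shows both a laptop (620) and a desk (526),
# so predicting either one is accepted.
preds = [620, 14, 981]
valid = [{620, 526}, {14}, {282}]
print(multi_label_accuracy(preds, valid))  # 2/3
```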
In line with Rich Sutton's 'The Bitter Lesson', video prediction performance keeps improving as model capacity increases, leaving open the question of how far the right combination of maximal model capacity and minimal inductive bias can take us.