Uses bijective networks to identify large subspaces of invariance-based adversarial vulnerability and introduces the independence cross-entropy loss, which partially mitigates this vulnerability.
Evaluating object recognition performance of humans and CNNs on images with varying levels of shape and texture cues reveals contrasting biases (CNNs favor texture, humans favor shape), which can be partially reduced by training CNNs on stylized images.
Contrastively-trained Structured World Models (C-SWMs) depart from traditional pixel-based reconstruction losses and use an energy-based hinge loss for learning object-centric world models.
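A minimal NumPy sketch of an energy-based hinge loss of this kind: a predicted next-state embedding is pulled toward the encoding of the true next state, while negative samples are pushed at least a margin away. The function name, the plain squared-distance energy, and the margin value are illustrative assumptions, not the paper's exact formulation (C-SWMs additionally scale the energy and obtain predictions from a learned, object-factored transition model).

```python
import numpy as np

def contrastive_hinge_loss(z_next_pred, z_next, z_neg, gamma=1.0):
    """Hedged sketch of a C-SWM-style contrastive hinge loss.

    z_next_pred: predicted next-state embeddings, shape (batch, dim)
    z_next:      encodings of the observed next states, shape (batch, dim)
    z_neg:       negative-sample embeddings (e.g. shuffled states), same shape
    """
    # Energy: squared Euclidean distance between latent states
    pos_energy = np.sum((z_next_pred - z_next) ** 2, axis=-1)
    neg_energy = np.sum((z_neg - z_next) ** 2, axis=-1)
    # Positive term pulls the prediction toward the true next state;
    # hinge term pushes negatives until their energy exceeds gamma
    return float(np.mean(pos_energy + np.maximum(0.0, gamma - neg_energy)))
```

With a perfect prediction and distant negatives the loss reaches zero, which is the intended optimum of such a margin-based objective.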