Combines a compositional rendering network with a recurrent interaction network to learn dynamics in scenes with significant occlusion, but relies on ground-truth object positions and segmentations.
RELATE builds upon the interpretable, structured latent parameterization of BlockGAN by modeling the correlations between object parameters to generate realistic videos of dynamic scenes, using raw, unlabeled data.
Novel method for video prediction from a single frame by decomposing the scene into entities with location and appearance features, capturing ambiguities with a global latent variable.