CMU

Compositional Video Prediction

Novel method for video prediction from a single frame by decomposing the scene into entities with location and appearance features, capturing ambiguities with a global latent variable.

Embodied Multimodal Multitask Learning

Proposes multitask model to jointly learn semantic goal navigation and embodied question answering.