Region Proposal Interaction Networks (RPIN) learn to reason about object trajectories in a latent region-proposal feature space that captures both object and contextual information.
The object-centric perception, prediction, and planning (OP3) framework demonstrates strong generalization to novel configurations in block-stacking tasks by symmetrically processing entity representations extracted from raw visual observations.