Encoding 3D contextual information for dynamic scene understanding