A central challenge in scalable autonomy is multimodal, dynamic 3D scene understanding. Specific challenges include integrating multiple sensors (which can operate at different spatial and temporal resolutions), understanding the semantics of a dynamic scene, and incorporating prior knowledge in the form of a map (which may be outdated and require updates). Finally, it is crucial for such understanding to happen in a streaming setting with low latency, robust performance, and graceful degradation and error handling.

Autonomy in the open world requires understanding and forecasting the behaviors, intentions, and goals of other agents. This is particularly challenging because the problem is inherently multi-agent: the behavior of any one agent can have a profound effect on the others. Forecasting future behaviors requires encoding and representing uncertainty about the many ways the world could evolve. Our key strategy here is to make use of prior knowledge and experience, obtained by observing past examples of naturalistic behavior.


Traditional autonomous vehicle pipelines are highly modularized, with separate subsystems for localization, perception, actor prediction, planning, and control. Though this approach is easy to interpret, it can be difficult to generalize to unseen environments and to scale to new cities without hand-engineering. Motivated by this observation, the center is also exploring end-to-end learning approaches that make use of historical data and simulators to enable autonomy stacks that are self-tuning and adaptive.

Argoverse Dataset by Argo AI (code)