3D object detection is an essential perception task for understanding the
environment in autonomous driving. Bird’s-Eye-View (BEV) representations have
significantly improved the performance of 3D detectors with camera inputs on
popular benchmarks. However, the field still lacks a systematic understanding of
the robustness of these vision-dependent BEV models, which is closely related
to the safety of autonomous driving systems. In this paper, we evaluate the
natural and adversarial robustness of various representative models under
extensive settings, to fully understand how explicit BEV features influence
their behavior compared with models without BEV. In addition to the classic
settings, we propose a 3D consistent patch attack that applies adversarial
patches in 3D space to guarantee spatiotemporal consistency, which is more
realistic for autonomous driving scenarios. Through extensive experiments, we
arrive at several findings: 1) BEV models tend to be more stable
than previous methods under different natural conditions and common corruptions
due to their expressive spatial representations; 2) BEV models are more
vulnerable to adversarial noise, mainly because of the redundant BEV features;
3) Camera-LiDAR fusion models have superior performance under different
settings with multi-modal inputs, but the BEV fusion model is still vulnerable
to adversarial noise on both point clouds and images. These findings highlight
safety issues in the application of BEV detectors and could facilitate the
development of more robust models.