BiTAA: Adversarial Attacks on 3D Detection and Depth Estimation with Gaussian Splatting
As autonomous systems become more integrated into daily life—from self-driving cars to delivery drones—the reliability of 3D perception pipelines is non-negotiable. BiTAA, short for Bi-Task Adversarial Attack, shines a light on a previously underexplored vulnerability: the ability to simultaneously degrade both object detection and depth estimation in 3D scenes. Built on Gaussian splatting, BiTAA highlights how fragile multi-task perception can be when adversaries exploit the shared representations that underlie both tasks.
BiTAA in a Nutshell
BiTAA is a method that targets two critical tasks at once: identifying where objects are in a 3D scene (detection) and judging how far away those objects are (depth estimation). Rather than optimizing a single objective, BiTAA uses a joint framework that crafts perturbations to the input or the scene representation so that both outputs misbehave in a coordinated way. The key idea is that perturbations effective for one task often leak into the other when the models share a common 3D representation, especially when that representation relies on a continuous, probabilistic description of the scene.
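As a toy illustration of this joint framework (not BiTAA's actual losses, models, or representation), the attack idea can be sketched as a weighted sum of two task losses that read the same shared parameters, maximized under a norm budget with projected sign-gradient ascent:

```python
import numpy as np

# Toy shared scene representation: a flat vector of "splat" parameters.
# The real attack operates on Gaussian-splat attributes through rendering
# and perception networks; the smooth losses below are stand-ins chosen
# only so the example runs end to end.
rng = np.random.default_rng(0)
theta = rng.normal(size=8)           # clean scene parameters
delta = np.zeros_like(theta)         # adversarial perturbation

def det_loss(x):                     # stand-in for a detection loss
    return -np.sum(np.cos(x))

def depth_loss(x):                   # stand-in for a depth-error loss
    return -np.sum(np.sin(x + 0.3))

def joint_loss(x, alpha=0.5):        # the attacker maximizes this weighted sum
    return alpha * det_loss(x) + (1 - alpha) * depth_loss(x)

def grad_joint(x, alpha=0.5):        # analytic gradient of the toy objective
    return alpha * np.sin(x) - (1 - alpha) * np.cos(x + 0.3)

eps, step = 0.25, 0.05               # L_inf budget and step size
for _ in range(40):
    delta += step * np.sign(grad_joint(theta + delta))  # PGD-style ascent
    delta = np.clip(delta, -eps, eps)                   # project onto budget
```

Because both surrogate losses consume the same perturbed parameters, a single in-budget `delta` raises the joint objective rather than trading one task off against the other.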
Gaussian Splatting and 3D Perception
Gaussian splatting represents a 3D scene as a collection of 3D Gaussians that blend to render depth, occupancy, and color. This approach can offer smoothness, high fidelity, and efficient rendering for complex scenes. However, because the Gaussians act as a shared substrate for multiple perceptual heads, they can be nudged in ways that simultaneously perturb bounding boxes and depth estimates. In practical terms, a small, carefully constrained adjustment to the splats can ripple through the network, producing outsized errors in where objects are located and how distant they appear.
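A minimal 1D sketch makes the depth sensitivity concrete. The numbers and the perturbation here are hypothetical; the compositing rule is the standard front-to-back alpha blending used by splatting-style renderers:

```python
import numpy as np

# Splats sorted front-to-back along one camera ray. Rendered depth is the
# transmittance-weighted sum of per-splat depths (front-to-back compositing).
def render_depth(depths, alphas):
    T = 1.0                          # accumulated transmittance
    d = 0.0
    for z, a in zip(depths, alphas):
        d += T * a * z               # this splat's weighted depth contribution
        T *= (1.0 - a)               # light remaining for splats behind it
    return d

depths = np.array([2.0, 3.5, 5.0])   # splat centers along the ray (meters)
alphas = np.array([0.3, 0.5, 0.9])   # per-splat opacities

clean = render_depth(depths, alphas)                               # 3.4 m
# A small nudge to one foreground splat's opacity shifts the rendered depth.
perturbed = render_depth(depths, alphas + np.array([0.15, 0.0, 0.0]))
```

Raising the front splat's opacity by 0.15 pulls the rendered depth from 3.4 m to 3.1 m: a localized parameter change propagates into a scene-level depth error, which is exactly the lever a splat-space adversary exploits.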
The Bi-Task Attack: Why Two Tasks Matter
Why emphasize a bi-task attack? In real-world systems, perception rarely operates in a vacuum. A misdetected object that is also assigned an incorrect depth can lead to dangerous decisions—such as misjudging crossing distances for a pedestrian or misplacing a vehicle in the scene. BiTAA formalizes this risk by optimizing a joint objective that balances detection errors (missed objects, incorrect classifications) with depth errors (biased or noisy distance estimates). The result is a more realistic and challenging threat model than single-task attacks, underscoring the need for defenses that account for task interdependencies rather than treating detection and depth in isolation.
Robustness in 3D perception hinges on recognizing how interconnected tasks amplify defects. A defense that only hardens the detector may leave depth estimation fragile, and vice versa. BiTAA pushes the field to think in terms of multi-task resilience.
Impact on Safety, Trust, and Deployment
In high-stakes environments, adversarial vulnerabilities aren’t abstract concerns—they translate to real-world risk. A successful BiTAA-style perturbation could cause an autonomous system to underreact to a nearby obstacle, overestimate the distance to an obstacle and erode its safety margin, or misinterpret clutter as free space. For developers, this means prioritizing cross-task robustness, not just performance metrics on a single task. It also motivates better evaluation protocols that stress-test detectors and depth estimators under coordinated, multi-task perturbations.
Defenses and Best Practices
- Adversarial training for multi-task perception: Exposing models to coordinated perturbations during training helps them learn joint invariances across detection and depth.
- Robust representation learning: Encouraging disentangled or hierarchical representations can limit the spillover of perturbations between tasks.
- Certified or provable robustness: Developing bounds that guarantee performance within certain perturbation limits can provide formal assurances for critical deployments.
- Multi-modal sensor fusion: Combining LiDAR with camera images, radar, or other modalities can reduce single-source vulnerability by cross-validating depth cues and object hypotheses.
- Uncertainty-aware inference: Explicitly modeling and propagating uncertainty in depth and detection can help systems hedge against potential perturbations.
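The first item above can be sketched as an inner-maximization / outer-minimization loop: craft a joint perturbation, then update the model on the perturbed input. Everything here (the linear "model", the two quadratic task losses, the budget) is an illustrative stand-in, not a real perception pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4) * 0.1          # toy shared model parameters
x = rng.normal(size=4)                # one toy training input

def joint(w, x):                      # equal-weighted bi-task loss:
    u = w @ x                         # "detection" target 1.0,
    return 0.5 * (u - 1.0) ** 2 + 0.5 * (u - 2.0) ** 2   # "depth" target 2.0

def grad_x(w, x):                     # input gradient (for the attacker)
    return (2 * (w @ x) - 3.0) * w

def grad_w(w, x):                     # parameter gradient (for training)
    return (2 * (w @ x) - 3.0) * x

eps, lr = 0.1, 0.02
before = joint(w, x)
for _ in range(300):
    # Inner step: FGSM-style joint perturbation within an L_inf budget.
    x_adv = x + eps * np.sign(grad_x(w, x))
    # Outer step: descend the model parameters on the adversarial input.
    w -= lr * grad_w(w, x_adv)
after = joint(w, x)
```

The point of the pattern is that the model is always trained against perturbations that target both task losses at once, so it cannot satisfy one head while leaving the other exposed.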
Looking Ahead
BiTAA marks a compelling pivot in how we study 3D perception security. By focusing on the interplay between object detection and depth estimation, researchers and engineers can design more resilient pipelines that survive not just isolated attacks but coordinated, multi-task adversaries. The path forward involves building standardized benchmarks that simulate realistic Gaussian-splat representations, developing defense techniques that span tasks, and fostering a security-by-design mindset in autonomous perception—from data collection to deployment.