KDP-AD:
A Knowledge-Driven Diffusion Policy
for End-to-End Autonomous Driving Based on Expert Routing

Chengkai Xu1, Jiaqi Liu2, Yicheng Guo1, Peng Hang1, Jian Sun1

1Tongji University, 2UNC-Chapel Hill

Contact: 2534242@tongji.edu.cn

KDP-AD Framework Overview

Main Video

Case Study & Analysis

Case-study videos: Intersection, Roundabout, and In Ramp scenarios.
Abstract

End-to-end autonomous driving remains constrained by the need to generate multimodal actions, maintain temporal stability, and generalize across diverse scenarios. Existing methods often collapse multimodality, struggle with long-horizon consistency, or lack modular adaptability. This paper presents KDP, a knowledge-driven diffusion policy that integrates generative diffusion modeling with a sparse mixture-of-experts routing mechanism.

The diffusion component generates temporally coherent and multimodal action sequences, while the expert routing mechanism activates specialized and reusable experts according to context, enabling modular knowledge composition.

Extensive experiments across representative driving scenarios demonstrate that KDP achieves consistently higher success rates, reduced collision risk, and smoother control compared to prevailing paradigms. Ablation studies highlight the effectiveness of sparse expert activation and the Transformer backbone, and activation analyses reveal structured specialization and cross-scenario reuse of experts. These results establish diffusion with expert routing as a scalable and interpretable paradigm for knowledge-driven end-to-end autonomous driving.

Key Contributions

  • A knowledge-driven end-to-end driving framework. We reframe the experts of a mixture-of-experts (MoE) model as abstract driving knowledge units, enabling modular and compositional policy learning beyond task-centric formulations.
  • Integration of diffusion modeling with expert routing. We combine diffusion policies with MoE routing, maintaining long-horizon consistency while adapting to diverse scenarios through modularity and selective reuse of knowledge.
  • Comprehensive empirical validation. We demonstrate the effectiveness of the proposed approach on diverse driving scenarios, achieving superior safety, generalization, and efficiency compared with baselines.

Methodology

KDP-AD Framework

Framework of the proposed Knowledge-Driven Diffusion Policy. Scene inputs condition a diffusion-based policy to generate multi-modal, temporally coherent action sequences. A Mixture-of-Experts module refines these sequences by activating experts interpreted as abstract knowledge units, whose combinations express diverse and extensible driving skills such as interaction, maneuvering, and adaptation.


Diffusion Policy for Driving

Step 1: Forward Diffusion of Action Sequences

The forward diffusion process perturbs expert action sequences with gradually injected Gaussian noise to create supervised training targets, enabling the policy to learn how to recover clean and temporally coherent trajectories during the reverse denoising process.

\( q(a_t \mid a_0) = \mathcal{N}\big(\sqrt{\bar{\alpha}_t} \, a_0, \, (1-\bar{\alpha}_t) I \big) \)
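To make the corruption step concrete, here is a minimal PyTorch sketch that draws \( a_t \sim q(a_t \mid a_0) \); the linear schedule, step count, and tensor shapes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the forward diffusion step (illustrative, not the authors' code).
import torch

num_steps = 100                                  # assumed number of diffusion steps T
betas = torch.linspace(1e-4, 2e-2, num_steps)    # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product \bar{alpha}_t

def q_sample(a0, t):
    """Sample a_t ~ q(a_t | a_0) = N(sqrt(abar_t) a_0, (1 - abar_t) I).

    a0: clean expert action sequence, shape (B, horizon, action_dim)
    t:  integer timestep per batch element, shape (B,)
    Returns the noised actions and the injected noise (the denoising target).
    """
    noise = torch.randn_like(a0)
    abar = alpha_bar[t].view(-1, 1, 1)           # broadcast over horizon and action dims
    a_t = abar.sqrt() * a0 + (1.0 - abar).sqrt() * noise
    return a_t, noise
```

During training, one would sample random timesteps per batch element, e.g. `a_t, eps = q_sample(actions, torch.randint(0, num_steps, (B,)))`, and regress the policy's noise prediction onto `eps`.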

Step 2: Reverse Denoising Process as Driving Policy

The reverse denoising process serves as the driving policy by iteratively transforming Gaussian noise into coherent, multi-modal action sequences conditioned on observations, ensuring temporal consistency and capturing the variability of human demonstrations.

\( p_\theta(a_{t-1} \mid a_t, o_{t}) = \mathcal{N}\!\big(\mu_\theta(a_t, o_{t}, t), \, \Sigma_\theta(a_t, o_{t}, t)\big), \quad t = T,\ldots,1 \)
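A minimal DDPM-style sampling loop under the epsilon-parameterization is sketched below, reusing the assumed schedule from the previous snippet; `eps_model` is a hypothetical stand-in for the conditional noise predictor \( \epsilon_\theta(a_t, o_t, t) \).

```python
# Sampling loop that serves as the driving policy (a sketch, not the released code).
import torch

@torch.no_grad()
def sample_actions(eps_model, obs, betas, alpha_bar, horizon=16, action_dim=2):
    """Iteratively denoise a_T ~ N(0, I) into an action sequence conditioned on obs."""
    B = obs.shape[0]
    a = torch.randn(B, horizon, action_dim)              # start from pure noise
    for t in reversed(range(len(betas))):                # t = T, ..., 1
        t_batch = torch.full((B,), t, dtype=torch.long)
        eps = eps_model(a, obs, t_batch)                 # predicted injected noise
        alpha_t, abar_t = 1.0 - betas[t], alpha_bar[t]
        # Posterior mean mu_theta(a_t, o_t, t) under the epsilon parameterization
        a = (a - (1.0 - alpha_t) / (1.0 - abar_t).sqrt() * eps) / alpha_t.sqrt()
        if t > 0:                                        # no noise at the final step
            a = a + betas[t].sqrt() * torch.randn_like(a)
    return a
```

Because each rollout starts from fresh Gaussian noise, repeated sampling yields distinct yet coherent trajectories, which is how the policy preserves multimodality.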

MoE-based Knowledge Routing

The MoE-based knowledge routing extends the diffusion driving policy with modular experts and a top-K router, enabling dynamic composition of specialized knowledge units while ensuring balanced and efficient expert utilization.

Step 1: Integration of MoE into Diffusion Policy: \( \epsilon_{\theta}(a_t, o_t, t) = \sum_{i \in \mathcal{S}_t} g_{\phi}^{\,i}(o_t)\,\epsilon_{\theta_i}(a_t, o_t, t) \), where \( \mathcal{S}_t \) denotes the set of top-K experts selected by the router.
Step 2: Routing Mechanism: \( g_{\phi}(x) = \mathrm{softmax}(xW) \)
Step 3: Training Objective with Routing Constraints: \( \mathcal{L} = \mathcal{L}_{\text{denoise}} + \lambda_{\text{bal}}\,\mathcal{L}_{\text{balance}} - \gamma\, I(K,E) \), where \( \mathcal{L}_{\text{balance}} \) penalizes unbalanced expert utilization and the \( I(K,E) \) term encourages informative expert selection.
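To make the routing concrete, here is a minimal PyTorch sketch of Steps 1–3, assuming 8 experts and K = 2 (both illustrative); `ExpertEpsNet`, its MLP architecture, and the squared-load balance term are stand-ins of our choosing, and the mutual-information regularizer \( I(K,E) \) is omitted for brevity.

```python
# Sparse top-K expert routing over per-expert noise predictors (illustrative
# sketch; module names, sizes, and the balance term are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertEpsNet(nn.Module):
    """Tiny MLP stand-in for one expert's noise predictor eps_{theta_i}."""
    def __init__(self, obs_dim=32, horizon=16, action_dim=2, hidden=128):
        super().__init__()
        self.h, self.d = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * action_dim))

    def forward(self, a_t, obs, t):
        x = torch.cat([a_t.flatten(1), obs, t.float().unsqueeze(-1)], dim=-1)
        return self.net(x).view(-1, self.h, self.d)

class MoEDenoiser(nn.Module):
    def __init__(self, obs_dim=32, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(obs_dim, num_experts, bias=False)  # g_phi(x) = softmax(xW)
        self.experts = nn.ModuleList(
            [ExpertEpsNet(obs_dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, a_t, obs, t):
        gates = F.softmax(self.router(obs), dim=-1)      # (B, E) routing weights
        topv, topi = gates.topk(self.top_k, dim=-1)      # keep only the top-K set S_t
        topv = topv / topv.sum(-1, keepdim=True)         # renormalize gates over S_t
        # Dense evaluation for clarity; a real sparse MoE dispatches each
        # sample only to its K selected experts.
        eps_all = torch.stack([e(a_t, obs, t) for e in self.experts], dim=1)  # (B,E,H,D)
        sparse = torch.zeros_like(gates).scatter(1, topi, topv)
        eps = torch.einsum('be,behd->bhd', sparse, eps_all)
        # Load-balance term: E * sum_i load_i^2 is minimized (= 1) at uniform usage.
        load = gates.mean(0)
        balance_loss = gates.size(1) * (load ** 2).sum()
        return eps, balance_loss
```

A training step would then add \( \lambda_{\text{bal}} \) times `balance_loss` to the denoising MSE between the mixture output and the injected noise, mirroring the objective in Step 3.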

Experimental Results

Main Results

Comparison of safety, efficiency, and comfort results for different methods across scenarios:

Scenario     | Model   | Success Rate | Collision Rate | Avg. Episodic Reward | Avg. Velocity (m/s) | Acceleration Variance | Avg. Completion Steps
In Ramp      | PPO-Lag | 0.95 | 0.02 | 191.86 | 7.06 | 0.38 | 250.67
In Ramp      | RPID    | 0.99 | 0.01 | 196.27 | 7.47 | 0.38 | 242.57
In Ramp      | IBC     | 0.86 | 0.11 | 190.28 | 8.03 | 0.35 | 220.78
In Ramp      | Ours    | 1.00 | 0.00 | 197.52 | 8.61 | 0.37 | 210.99
Intersection | PPO-Lag | 0.90 | 0.10 | 116.79 | 6.30 | 0.45 | 172.02
Intersection | RPID    | 0.63 | 0.37 |  98.57 | 7.28 | 0.39 | 125.42
Intersection | IBC     | 0.68 | 0.31 |  98.94 | 5.75 | 0.35 | 163.48
Intersection | Ours    | 0.94 | 0.06 | 121.54 | 6.34 | 0.45 | 174.05
Roundabout   | PPO-Lag | 0.64 | 0.18 | 142.77 | 6.56 | 0.45 | 203.43
Roundabout   | RPID    | 0.58 | 0.19 | 134.96 | 7.23 | 0.40 | 174.90
Roundabout   | IBC     | 0.70 | 0.22 | 139.71 | 6.41 | 0.35 | 245.98
Roundabout   | Ours    | 0.90 | 0.10 | 177.85 | 6.83 | 0.45 | 246.06

Expert Activation Analysis

Figure: Temporal activation patterns of experts across driving scenarios.

Figure: Scenario-level expert activation aggregated across episodes. Experts 1 and 3 dominate in Merge, Experts 1 and 5 in Intersection, and Experts 6 and 8 in Roundabout, while Experts 2 and 4 show occasional activation across multiple scenarios.

Ablation Study

Performance comparison (success rate) of model variants across driving scenarios, isolating the contributions of the Transformer backbone and the MoE stage:

Variant                    | Sce-1 | Sce-2 | Sce-3
Baseline-U                 | 0.68  | 0.71  | 0.91
Baseline-T (+ Transformer) | 0.76  | 0.84  | 0.98
Ours (+ MoE)               | 0.94  | 0.90  | 1.00

Checkpoints and Datasets


Checkpoints and the dataset will be released upon paper acceptance.

Citation


@article{xu2025kdp,
  title   = {A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing},
  author  = {Xu, Chengkai and Liu, Jiaqi and Guo, Yicheng and Hang, Peng and Sun, Jian},
  journal = {arXiv preprint arXiv:2509.04853},
  year    = {2025}
}