KDP-AD:
A Knowledge-Driven Diffusion Policy
for End-to-End Autonomous Driving Based on Expert Routing

Chengkai Xu1, Jiaqi Liu2, Yicheng Guo1, Peng Hang1, Jian Sun1

1Tongji University, 2UNC-Chapel Hill

Contact: 2534242@tongji.edu.cn

KDP-AD Framework Overview

Main Video

Case Study & Analysis

Case-study videos: Intersection, Roundabout, and In Ramp scenarios.
Abstract

End-to-end autonomous driving remains constrained by the need to generate multimodal actions, maintain temporal stability, and generalize across diverse scenarios. Existing methods often collapse multimodality, struggle with long-horizon consistency, or lack modular adaptability. This paper presents KDP, a knowledge-driven diffusion policy that integrates generative diffusion modeling with a sparse mixture-of-experts routing mechanism.

The diffusion component generates temporally coherent and multimodal action sequences, while the expert routing mechanism activates specialized and reusable experts according to context, enabling modular knowledge composition.

Extensive experiments across representative driving scenarios demonstrate that KDP achieves consistently higher success rates, reduced collision risk, and smoother control compared to prevailing paradigms. Ablation studies highlight the effectiveness of sparse expert activation and the Transformer backbone, and activation analyses reveal structured specialization and cross-scenario reuse of experts. These results establish diffusion with expert routing as a scalable and interpretable paradigm for knowledge-driven end-to-end autonomous driving.

Key Contributions

  • A knowledge-driven end-to-end driving framework. We reframe the experts of a mixture-of-experts (MoE) model as abstract driving knowledge units, enabling modular and compositional policy learning beyond task-centric formulations.
  • Integration of diffusion modeling with expert routing. We combine diffusion policies with MoE routing, maintaining long-horizon consistency while adapting to diverse scenarios through modularity and selective reuse of knowledge.
  • Comprehensive empirical validation. We demonstrate the effectiveness of the proposed approach on diverse driving scenarios, achieving superior safety, generalization, and efficiency compared with baselines.

Methodology

KDP-AD Framework

Framework of the proposed Knowledge-Driven Diffusion Policy. Scene inputs condition a diffusion-based policy to generate multi-modal, temporally coherent action sequences. A Mixture-of-Experts module refines these sequences by activating experts interpreted as abstract knowledge units, whose combinations express diverse and extensible driving skills such as interaction, maneuvering, and adaptation.


Diffusion Policy for Driving

Step 1: Forward Diffusion of Action Sequences

The forward diffusion process perturbs expert action sequences with gradually injected Gaussian noise to create supervised training targets, enabling the policy to learn how to recover clean and temporally coherent trajectories during the reverse denoising process.

\( q(a_t \mid a_0) = \mathcal{N}\big(\sqrt{\bar{\alpha}_t} \, a_0, \, (1-\bar{\alpha}_t) I \big) \)
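To make the corruption step concrete, here is a minimal PyTorch sketch that draws \( a_t \sim q(a_t \mid a_0) \); the linear schedule, step count, and tensor shapes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the forward diffusion step (illustrative, not the authors' code).
import torch

num_steps = 100                                  # assumed number of diffusion steps T
betas = torch.linspace(1e-4, 2e-2, num_steps)    # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product \bar{alpha}_t

def q_sample(a0, t):
    """Sample a_t ~ q(a_t | a_0) = N(sqrt(abar_t) a_0, (1 - abar_t) I).

    a0: clean expert action sequence, shape (B, horizon, action_dim)
    t:  integer timestep per batch element, shape (B,)
    Returns the noised actions and the injected noise (the denoising target).
    """
    noise = torch.randn_like(a0)
    abar = alpha_bar[t].view(-1, 1, 1)           # broadcast over horizon and action dims
    a_t = abar.sqrt() * a0 + (1.0 - abar).sqrt() * noise
    return a_t, noise
```

During training, one would sample random timesteps per batch element, e.g. `a_t, eps = q_sample(actions, torch.randint(0, num_steps, (B,)))`, and regress the policy's noise prediction onto `eps`.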

Step 2: Reverse Denoising Process as Driving Policy

The reverse denoising process serves as the driving policy by iteratively transforming Gaussian noise into coherent, multi-modal action sequences conditioned on observations, ensuring temporal consistency and capturing the variability of human demonstrations.

\( p_\theta(a_{t-1} \mid a_t, o_{t}) = \mathcal{N}\!\big(\mu_\theta(a_t, o_{t}, t), \, \Sigma_\theta(a_t, o_{t}, t)\big), \quad t = T,\ldots,1 \)
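A minimal DDPM-style sampling loop under the epsilon-parameterization is sketched below, reusing the assumed schedule from the previous snippet; `eps_model` is a hypothetical stand-in for the conditional noise predictor \( \epsilon_\theta(a_t, o_t, t) \).

```python
# Sampling loop that serves as the driving policy (a sketch, not the released code).
import torch

@torch.no_grad()
def sample_actions(eps_model, obs, betas, alpha_bar, horizon=16, action_dim=2):
    """Iteratively denoise a_T ~ N(0, I) into an action sequence conditioned on obs."""
    B = obs.shape[0]
    a = torch.randn(B, horizon, action_dim)              # start from pure noise
    for t in reversed(range(len(betas))):                # t = T, ..., 1
        t_batch = torch.full((B,), t, dtype=torch.long)
        eps = eps_model(a, obs, t_batch)                 # predicted injected noise
        alpha_t, abar_t = 1.0 - betas[t], alpha_bar[t]
        # Posterior mean mu_theta(a_t, o_t, t) under the epsilon parameterization
        a = (a - (1.0 - alpha_t) / (1.0 - abar_t).sqrt() * eps) / alpha_t.sqrt()
        if t > 0:                                        # no noise at the final step
            a = a + betas[t].sqrt() * torch.randn_like(a)
    return a
```

Because each rollout starts from fresh Gaussian noise, repeated sampling yields distinct yet coherent trajectories, which is how the policy preserves multimodality.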

MoE-based Knowledge Routing

The MoE-based knowledge routing extends the diffusion driving policy with modular experts and a top-K router, enabling dynamic composition of specialized knowledge units while ensuring balanced and efficient expert utilization.

Step 1: Integration of MoE into Diffusion Policy: \( \epsilon_{\theta}(a_t, o_t, t) = \sum_{i \in \mathcal{S}_t} g_{\phi}^{\,i}(o_t)\,\epsilon_{\theta_i}(a_t, o_t, t) \), where \( \mathcal{S}_t \) denotes the set of top-K experts selected by the router.
Step 2: Routing Mechanism: \( g_{\phi}(x) = \mathrm{softmax}(xW) \)
Step 3: Training Objective with Routing Constraints: \( \mathcal{L} = \mathcal{L}_{\text{denoise}} + \lambda_{\text{bal}}\,\mathcal{L}_{\text{balance}} - \gamma\, I(K,E) \), where \( \mathcal{L}_{\text{balance}} \) penalizes unbalanced expert utilization and the \( I(K,E) \) term encourages informative expert selection.
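To make the routing concrete, here is a minimal PyTorch sketch of Steps 1–3, assuming 8 experts and K = 2 (both illustrative); `ExpertEpsNet`, its MLP architecture, and the squared-load balance term are stand-ins of our choosing, and the mutual-information regularizer \( I(K,E) \) is omitted for brevity.

```python
# Sparse top-K expert routing over per-expert noise predictors (illustrative
# sketch; module names, sizes, and the balance term are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertEpsNet(nn.Module):
    """Tiny MLP stand-in for one expert's noise predictor eps_{theta_i}."""
    def __init__(self, obs_dim=32, horizon=16, action_dim=2, hidden=128):
        super().__init__()
        self.h, self.d = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * action_dim))

    def forward(self, a_t, obs, t):
        x = torch.cat([a_t.flatten(1), obs, t.float().unsqueeze(-1)], dim=-1)
        return self.net(x).view(-1, self.h, self.d)

class MoEDenoiser(nn.Module):
    def __init__(self, obs_dim=32, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(obs_dim, num_experts, bias=False)  # g_phi(x) = softmax(xW)
        self.experts = nn.ModuleList(
            [ExpertEpsNet(obs_dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, a_t, obs, t):
        gates = F.softmax(self.router(obs), dim=-1)      # (B, E) routing weights
        topv, topi = gates.topk(self.top_k, dim=-1)      # keep only the top-K set S_t
        topv = topv / topv.sum(-1, keepdim=True)         # renormalize gates over S_t
        # Dense evaluation for clarity; a real sparse MoE dispatches each
        # sample only to its K selected experts.
        eps_all = torch.stack([e(a_t, obs, t) for e in self.experts], dim=1)  # (B,E,H,D)
        sparse = torch.zeros_like(gates).scatter(1, topi, topv)
        eps = torch.einsum('be,behd->bhd', sparse, eps_all)
        # Load-balance term: E * sum_i load_i^2 is minimized (= 1) at uniform usage.
        load = gates.mean(0)
        balance_loss = gates.size(1) * (load ** 2).sum()
        return eps, balance_loss
```

A training step would then add \( \lambda_{\text{bal}} \) times `balance_loss` to the denoising MSE between the mixture output and the injected noise, mirroring the objective in Step 3.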

Experimental Results

Main Results

Comparison of safety, efficiency, and comfort results for different methods across scenarios:

Scenario     | Model   | Success Rate | Collision Rate | Avg. Episodic Reward | Avg. Velocity (m/s) | Acceleration Variance | Avg. Completion Steps
In Ramp      | PPO-Lag | 0.95 | 0.02 | 191.86 | 7.06 | 0.38 | 250.67
In Ramp      | RPID    | 0.99 | 0.01 | 196.27 | 7.47 | 0.38 | 242.57
In Ramp      | IBC     | 0.86 | 0.11 | 190.28 | 8.03 | 0.35 | 220.78
In Ramp      | Ours    | 1.00 | 0.00 | 197.52 | 8.61 | 0.37 | 210.99
Intersection | PPO-Lag | 0.90 | 0.10 | 116.79 | 6.30 | 0.45 | 172.02
Intersection | RPID    | 0.63 | 0.37 |  98.57 | 7.28 | 0.39 | 125.42
Intersection | IBC     | 0.68 | 0.31 |  98.94 | 5.75 | 0.35 | 163.48
Intersection | Ours    | 0.94 | 0.06 | 121.54 | 6.34 | 0.45 | 174.05
Roundabout   | PPO-Lag | 0.64 | 0.18 | 142.77 | 6.56 | 0.45 | 203.43
Roundabout   | RPID    | 0.58 | 0.19 | 134.96 | 7.23 | 0.40 | 174.90
Roundabout   | IBC     | 0.70 | 0.22 | 139.71 | 6.41 | 0.35 | 245.98
Roundabout   | Ours    | 0.90 | 0.10 | 177.85 | 6.83 | 0.45 | 246.06

Expert Activation Analysis

Figure: Temporal activation patterns of experts across driving scenarios.

Figure: Scenario-level expert activation aggregated across episodes. Experts 1 and 3 dominate in Merge, Experts 1 and 5 in Intersection, and Experts 6 and 8 in Roundabout, while Experts 2 and 4 show occasional activation across multiple scenarios.

Ablation Study

Performance comparison (success rate) of model variants across driving scenarios, isolating the contributions of the Transformer backbone and the MoE stage:

Variant                    | Sce-1 | Sce-2 | Sce-3
Baseline-U                 | 0.68  | 0.71  | 0.91
Baseline-T (+ Transformer) | 0.76  | 0.84  | 0.98
Ours (+ MoE)               | 0.94  | 0.90  | 1.00

Checkpoints and Datasets


Checkpoints and the dataset will be released upon paper acceptance.

Citation


@article{xu2025kdp,
  title   = {A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing},
  author  = {Xu, Chengkai and Liu, Jiaqi and Guo, Yicheng and Hang, Peng and Sun, Jian},
  journal = {arXiv preprint arXiv:2509.04853},
  year    = {2025}
}