Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

Xinyuan Chang1*, Maixuan Xue1,2*, Xinran Liu1†, Zheng Pan1, Xing Wei2‡
1Amap, Alibaba Group, 2Xi'an Jiaotong University
*Equal Contribution, †Project Leader, ‡Corresponding Author

For safe autonomous driving, accurate interpretation of lanes and traffic signs is crucial, ensuring vehicles maintain proper positioning and follow driving rules. This figure illustrates an intersection scene where extracted traffic sign rules are integrated into the corresponding lanes on the HD map.

Abstract

Ensuring adherence to traffic sign regulations is essential for both human and autonomous vehicle navigation. However, current online mapping solutions often prioritize the geometric and connectivity layers of HD maps and overlook the traffic regulation layer. To address this gap, we introduce MapDR, a novel dataset designed for the extraction of Driving Rules from traffic signs and their association with vectorized, locally perceived HD Maps. MapDR features over 10,000 annotated video clips that capture the intricate correlation between traffic sign regulations and lanes.

Building on this benchmark and the newly defined task of integrating traffic regulations into online HD maps, we provide a modular solution, VLE-MEE, and an end-to-end solution, RuleVLM, which offer strong baselines for advancing autonomous driving technology. This work fills a critical gap in the integration of traffic sign rules and contributes to the development of reliable autonomous driving systems.

Video Visualization of MapDR

Task Definition


Steps 1 to 4 show a case of driving by the rules. Steps 2 and 3 demonstrate the specific roles of the two sub-tasks, respectively.

The ability to discern rules from traffic signs and to associate them with specific lanes is pivotal for autonomous navigation. As depicted in the figure above, traffic signs are the primary indicators of lane-level rules. Our proposed task involves two core sub-tasks: 1) extracting lane-level rules from traffic signs, and 2) establishing correspondence between these rules and centerlines. Vehicles generally follow the center of a lane, so we use centerlines to represent lanes. This approach mirrors human drivers' instinct to observe traffic signs and then relate the indicated rules to the lanes they govern.

Dataset: MapDR


Multiple lane-level rules of a single traffic sign are annotated in {key:value} format. Directed lines indicate the correspondence between rules and particular centerlines.
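
As a rough illustration of this annotation format, the sketch below stores one sign's rules as key-value dictionaries and the directed rule-to-centerline links as index pairs. The class name `SignAnnotation`, the field names (`lane_type`, `effective_time`, `allowed_direction`), the centerline ids, and all values are hypothetical placeholders, not the actual MapDR schema.

```python
from dataclasses import dataclass, field


@dataclass
class SignAnnotation:
    """One traffic sign: its lane-level rules and rule-to-centerline links."""
    rules: list[dict[str, str]]  # each rule is a {key: value} dictionary
    # Directed links stored as (rule index, centerline id) pairs.
    links: set[tuple[int, str]] = field(default_factory=set)

    def rules_for(self, centerline_id: str) -> list[dict[str, str]]:
        """Return every rule linked to the given centerline."""
        return [self.rules[i] for i, cid in sorted(self.links) if cid == centerline_id]


# Hypothetical example: one sign carrying two rules that govern different lanes.
sign = SignAnnotation(
    rules=[
        {"lane_type": "bus_lane", "effective_time": "07:00-09:00"},
        {"lane_type": "direction_lane", "allowed_direction": "left_turn"},
    ],
    links={(0, "c3"), (1, "c1")},
)
```

Calling `sign.rules_for("c3")` returns only the bus-lane rule, which is the kind of lookup a planner would need once rules are attached to centerlines.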


The majority of the data originates from Beijing and Shanghai, with additional scenes from Guangzhou. The figure above illustrates the geographic spread and variety of traffic signs.

We introduce the MapDR dataset, meticulously annotated with traffic sign regulations and their correspondences to lanes, as shown in the figure above. The dataset encompasses a diverse range of scenarios, weather conditions, and traffic situations, with over 10,000 traffic scene segments, 18,000 driving rules, and 400,000 images. Traffic signs vary in their textual descriptions, text layouts, and positions on the road, which adds complexity to the task.

The dataset reflects a natural long-tail distribution, with a prevalence of bus and direction lanes and a scarcity of tidal flow lanes. We primarily focus on traffic signs that indicate lane-level rules, collected from cities with the most complex and diverse traffic scenarios in China, ensuring realistic and representative data. All images have undergone privacy and safety processing to obscure license plates and faces. More comprehensive dataset statistics and case demonstrations can be found in our paper.


MapDR includes traffic signs of various types and layouts, which encode a wide range of driving rules. This variety makes it both challenging and necessary to interpret these signs accurately and associate them with the corresponding lanes.

Modular Approach


The entire approach can be divided into two main parts: Rule Extraction from Traffic Sign (top) and Rule-Lane Correspondence Reasoning (bottom). The rule extraction model consists of two sequential stages that share the same VLE structure but not their parameters, and each stage is trained independently.

Vectors can be represented as sequences of points, similar to words in sentences. Inspired by this, we designed MEE akin to BERT.
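
To make the analogy concrete, here is a minimal sketch of turning a centerline polyline into a fixed-length sequence of point "tokens" that a BERT-style encoder could consume. The arc-length resampling step, the token count, and the function name `polyline_to_tokens` are illustrative assumptions; the text above does not specify how MEE prepares its input.

```python
import math


def polyline_to_tokens(points, num_tokens=8):
    """Resample a 2D polyline to num_tokens points evenly spaced by arc length."""
    if len(points) < 2 or num_tokens < 2:
        raise ValueError("need at least two points and two tokens")
    # Cumulative arc length at every vertex of the polyline.
    arc = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        arc.append(arc[-1] + math.hypot(x1 - x0, y1 - y0))
    tokens, seg = [], 0
    for k in range(num_tokens):
        target = arc[-1] * k / (num_tokens - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and arc[seg + 1] < target:
            seg += 1
        span = arc[seg + 1] - arc[seg]
        t = 0.0 if span == 0 else (target - arc[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        # Linear interpolation inside the current segment.
        tokens.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return tokens


# An L-shaped centerline resampled to five point tokens.
tokens = polyline_to_tokens([(0, 0), (4, 0), (4, 4)], num_tokens=5)
```

Resampling gives every centerline the same sequence length, much as tokenized sentences are brought to a common length before entering a Transformer encoder; padding with an attention mask would be an equally reasonable alternative.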

End-to-End Approach


Qwen-VL(TextPrompt) encodes the centerline coordinates in the PV image as text and feeds them into QwenLM. Qwen-VL(VisualPrompt) draws each centerline and its index on the PV image and feeds the result into QwenLM as a visual prompt. RuleVLM instead uses MEE, with the cross-attention layer removed, to encode the vectorized centerlines and aligns them with the LLM through an adapter.
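
The TextPrompt variant can be pictured with a small sketch that serializes indexed centerlines, given in PV-image pixel coordinates, into a text string. The prompt wording, the rounding to integer pixels, and the function name `centerlines_to_prompt` are assumptions made for illustration, not the format used by the authors.

```python
def centerlines_to_prompt(centerlines):
    """Serialize {index: [(u, v), ...]} PV-image centerlines into prompt text."""
    lines = []
    for idx in sorted(centerlines):
        # Round to whole pixels to keep the prompt short (an assumed choice).
        coords = " ".join(f"({round(u)},{round(v)})" for u, v in centerlines[idx])
        lines.append(f"centerline {idx}: {coords}")
    return "\n".join(lines)


# Hypothetical pixel coordinates for two centerlines.
prompt = centerlines_to_prompt({
    1: [(412.3, 880.0), (455.8, 640.2)],
    0: [(120.0, 900.0), (300.6, 650.0)],
})
```

Text serialization of this kind grows with the number of points per centerline, which is one motivation for encoding vectorized centerlines with a dedicated module instead.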

Experiment


The heuristic method and the Qwen-VL(TextPrompt) method serve as the baselines for the modular and end-to-end approaches, respectively. "−" denotes that correspondence reasoning (C.R.) cannot be evaluated independently for end-to-end approaches, because they do not use ground-truth rules for correspondence reasoning.
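
The metric definitions behind the table are not reproduced on this page. As one plausible way to score the task, the sketch below computes precision and recall over predicted versus ground-truth (rule, centerline) pairs. Exact matching as the success criterion, the pair representation, and all names are assumptions rather than the benchmark's official protocol.

```python
def precision_recall(predicted, ground_truth):
    """Precision and recall of predicted pairs against ground-truth pairs."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_pos = len(predicted & ground_truth)  # pairs that match exactly
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical predictions: two correct pairs and one spurious association.
pred = {("bus_lane", "c3"), ("left_turn", "c1"), ("left_turn", "c2")}
gt = {("bus_lane", "c3"), ("left_turn", "c1")}
p, r = precision_recall(pred, gt)
```

Under this scheme a spurious association lowers precision while a missed one lowers recall, so the two sub-tasks can be scored separately or jointly by changing what counts as a pair.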

BibTeX

@misc{chang2025drivingrulesbenchmarkintegrating,
      title={Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map}, 
      author={Xinyuan Chang and Maixuan Xue and Xinran Liu and Zheng Pan and Xing Wei},
      year={2025},
      eprint={2410.23780},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.23780}
}

Access Dataset & Evaluation Code

The full dataset and evaluation code are coming soon.

In the meantime, please contact changxinyuan.cxy@alibaba-inc.com for a dataset demo.