HeightLane: BEV Heightmap Guided 3D Lane Detection
WACV 2025 (Oral Presentation)
- Chaesong Park Seoul National University
- Eunbin Seo Seoul National University
- Jongwoo Lim Seoul National University
Abstract
Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling. Previous attempts to model the ground have often used a planar ground assumption with limited degrees of freedom, making them unsuitable for complex road environments with varying slopes. Our study introduces HeightLane, an innovative method that predicts a heightmap from monocular images by creating anchors based on a multi-slope assumption. This approach provides a detailed and accurate representation of the ground. HeightLane employs the predicted heightmap along with a deformable attention-based spatial feature transform framework to efficiently convert 2D image features into 3D bird's eye view (BEV) features, enhancing spatial understanding and lane structure recognition. Additionally, the heightmap is used for the positional encoding of the BEV features, further improving their spatial accuracy. This explicit view transformation bridges the gap between front-view perceptions and spatially accurate BEV representations, significantly improving detection performance. To address the lack of the necessary ground-truth heightmaps in the original OpenLane dataset, we leverage the Waymo dataset and accumulate its LiDAR data to generate a heightmap for the drivable area of each scene. The ground-truth heightmaps are used to train the heightmap extraction module from monocular images. Extensive experiments on the OpenLane validation set show that HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
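The ground-truth generation step described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the grid extents, the 0.5 m resolution, and the min-z aggregation rule (keeping the lowest return per cell so raised objects do not pollute the ground surface) are all assumptions, and the points are assumed to be already accumulated into a common ego/world frame.

```python
import numpy as np

def accumulate_height_map(points, x_range=(0.0, 100.0), y_range=(-10.0, 10.0), res=0.5):
    """Rasterize accumulated LiDAR points (N, 3) into a BEV heightmap.
    Axes assumed: x forward, y left, z up. NaN marks cells with no returns."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    height = np.full((nx, ny), np.nan)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, z in zip(ix[valid], iy[valid], points[valid, 2]):
        # keep the lowest z per cell as the ground estimate
        if np.isnan(height[i, j]) or z < height[i, j]:
            height[i, j] = z
    return height
```

In practice the per-scene maps would also be restricted to the drivable area and hole-filled, but the rasterization idea is the same.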
HeightLane Framework
HeightLane creates a predefined BEV grid for the ground and generates multiple heightmap anchors on this grid, assuming various slopes. These anchors are projected back onto the image to sample front-view features from the corresponding regions, enabling the model to efficiently predict a heightmap. To better align each BEV grid pixel with the 2D front-view features, height information from the predicted heightmap is added to the positional encoding of the BEV grid queries.
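The multi-slope anchor idea can be sketched as follows: each slope hypothesis assigns a height to every BEV cell, and the resulting 3D points are projected into the image to decide where front-view features should be sampled. The pinhole intrinsics, the axis convention, and the omission of camera extrinsics are simplifying assumptions for illustration.

```python
import numpy as np

def project_slope_anchors(xs, ys, slopes_deg, K):
    """For each slope hypothesis, assign height h = tan(slope) * x to every
    BEV cell (xs forward, ys left) and project (x, y, h) to pixels with
    intrinsics K. Camera frame assumed: x right, y down, z forward."""
    uv_per_slope = []
    for s in np.deg2rad(slopes_deg):
        h = np.tan(s) * xs                      # anchor height under this slope
        # BEV (forward, left, up) -> camera (right, down, forward)
        cam = np.stack([-ys, -h, xs], axis=-1)  # axis permutation; extrinsics omitted
        uvw = cam @ K.T
        uv_per_slope.append(uvw[:, :2] / uvw[:, 2:3])
    return np.stack(uv_per_slope)               # (num_slopes, num_cells, 2)
```

With a zero-degree slope every cell projects along the flat-ground ray; steeper hypotheses shift the sampling locations upward in the image, which is what lets the model pick up height cues from the correct image regions.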
Using the predicted heightmap along with deformable attention mechanisms, HeightLane explicitly performs spatial transformations of image features onto the BEV grid. This method significantly reduces the misalignment between the image and BEV features, ensuring more accurate representation and processing.
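A minimal sketch of this explicit view transform: each BEV cell is lifted to 3D with its predicted height, projected into the front-view feature map, and bilinearly sampled. This stands in for the paper's deformable-attention version, which additionally learns sampling offsets and attention weights around each projected reference point; the plain bilinear sampling here is an assumption for clarity.

```python
import numpy as np

def sample_bev_features(feat, uv):
    """feat: (C, H, W) front-view features; uv: (N, 2) projected pixel coords
    of BEV cells. Returns (N, C) sampled features, zeros outside the image."""
    C, H, W = feat.shape
    out = np.zeros((uv.shape[0], C))
    for n, (u, v) in enumerate(uv):
        if not (0 <= u <= W - 1 and 0 <= v <= H - 1):
            continue                            # cell projects outside the image
        u0, v0 = int(np.floor(u)), int(np.floor(v))
        u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
        du, dv = u - u0, v - v0
        out[n] = ((1 - du) * (1 - dv) * feat[:, v0, u0] +
                  du * (1 - dv) * feat[:, v0, u1] +
                  (1 - du) * dv * feat[:, v1, u0] +
                  du * dv * feat[:, v1, u1])
    return out
```

Because the projection uses the predicted per-cell height rather than a flat-ground assumption, BEV cells on slopes land on the correct image rows, which is the source of the reduced misalignment.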
Results
HeightLane achieves state-of-the-art F-score on the OpenLane validation set, surpassing previous methods. The explicit heightmap modeling yields accurate ground geometry, enabling robust 3D lane detection in complex road environments with varying slopes, elevations, and terrain conditions.
Citation
Acknowledgements
We thank the OpenLane and Waymo dataset teams for providing the benchmarks and data used in this research. This work was supported by ME & IPAI at Seoul National University.
The website template was borrowed from Michaël Gharbi.