The latest Chinese computer vision research proposes a LiDAR-inertial-visual fusion framework called R3LIVE++ to obtain robust and accurate state estimation while simultaneously reconstructing a radiance map on the fly.

Simultaneous Localization and Mapping (SLAM) estimates sensor poses while simultaneously reconstructing a 3D map of the surrounding environment from a sequence of sensor data (e.g., camera, LiDAR, IMU). Because it can estimate poses in real time, SLAM has been widely used for localization and feedback control in autonomous robots (e.g., unmanned aerial vehicles, automated ground vehicles, and autonomous cars). Meanwhile, thanks to its ability to reconstruct maps in real time, SLAM is essential for robot navigation, virtual and augmented reality (VR/AR), surveying, and mapping applications. Different applications often require different levels of map detail, such as a sparse feature map, a dense 3D point cloud map, or a 3D radiance map (i.e., a 3D point cloud map with radiance information).

Existing SLAM systems can be divided into two categories based on the sensor used: visual SLAM and LiDAR SLAM. Each map type serves different applications. The sparse visual feature map is suitable and widely used for camera localization, since sparse features detected in images can be used to compute the camera pose. The dense 3D point cloud captures the geometric structure of the environment, even for small objects. Finally, radiance maps, which combine geometry and radiance information, are used in mobile mapping, augmented and virtual reality (AR/VR), video games, 3D modeling, and surveying; these applications require both geometric structure and texture to generate virtual worlds that resemble the real one.

Visual SLAM relies on camera sensors that are low cost and have favorable size, weight, and power (SWaP), and it has achieved good localization accuracy. The reconstructed map is also easy for humans to interpret thanks to the rich, vivid information cameras capture. However, because cameras lack direct, accurate depth measurements, the accuracy and resolution of visual SLAM mapping are often lower than those of LiDAR SLAM. Visual SLAM maps the environment by triangulating disparities from multi-view images (e.g., structure from motion for a monocular camera, stereo vision for a stereo camera), an exceptionally compute-intensive operation that frequently requires hardware acceleration or server clusters.

In addition, the accuracy of the estimated depth decreases quadratically with measurement distance, due to pixel measurement noise and the limited baseline between multi-view images, which makes it difficult for visual SLAM to reconstruct large-scale outdoor scenes. Visual SLAM also only works in well-lit conditions and degrades in highly occluded or textureless environments. LiDAR SLAM, on the other hand, is based on LiDAR sensors. Thanks to the high measurement accuracy (a few millimeters) and long measurement range (hundreds of meters) of LiDAR sensors, LiDAR SLAM can achieve significantly higher accuracy and efficiency in localization and map reconstruction than visual SLAM. However, LiDAR SLAM frequently fails in environments with insufficient geometric features, such as long tunnel-like corridors or when facing a single large wall.
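To make the quadratic depth-error claim concrete, here is a minimal, illustrative Python sketch (not from the paper): for a stereo pair, depth is z = f·b/d, so a fixed disparity noise of sigma_d pixels maps to a depth error of roughly z²·sigma_d/(f·b). The focal length, baseline, and noise values below are assumptions for illustration only.

```python
# Minimal sketch (not from the paper): stereo depth error grows quadratically
# with range. Depth z = f * b / d, so a fixed disparity noise sigma_d maps to
# a depth error of about (z**2 / (f * b)) * sigma_d. All numbers are illustrative.

def stereo_depth_std(z, focal_px=700.0, baseline_m=0.12, disp_noise_px=0.5):
    """Approximate 1-sigma depth uncertainty (meters) at range z (meters)."""
    return (z ** 2) / (focal_px * baseline_m) * disp_noise_px

for z in (5.0, 20.0, 80.0):
    print(f"range {z:5.1f} m -> depth std ~ {stereo_depth_std(z):6.2f} m")
```

Running the sketch shows the uncertainty jumping from roughly 0.15 m at 5 m to tens of meters at 80 m, which is why purely visual reconstruction struggles at long range.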

Fusing LiDAR and camera measurements in SLAM can overcome sensor degeneration issues in localization and meet the needs of various mapping applications; moreover, LiDAR SLAM alone can only recover the geometric structure of the environment and carries no color information. Accordingly, the researchers propose R3LIVE++, a LiDAR-inertial-visual fusion framework that tightly couples two subsystems: LiDAR-inertial odometry (LIO) and visual-inertial odometry (VIO). The two subsystems collaborate in real time and incrementally build a 3D radiance map of the environment.

The LIO subsystem reconstructs the geometric structure by registering the points of each new LiDAR scan to the map, while the VIO subsystem recovers radiance information by mapping the colors of pixels in each image to points on the map. The framework uses a novel VIO architecture that tracks the camera pose (and estimates other system states) by minimizing the photometric difference between points in the radiance map and a sparse set of pixels in the current frame. Evaluating the direct photometric error on only a sparse set of individual pixels keeps the computational load low, and the frame-to-map alignment effectively reduces odometry drift.
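As a rough, hypothetical sketch of this idea (not the authors' implementation), the residual below is the difference between the radiance stored with a map point and the pixel intensity observed where that point projects into the current frame; the pinhole projection, data layout, and nearest-neighbor pixel lookup are simplifying assumptions introduced here for illustration.

```python
import numpy as np

def photometric_residuals(points_w, radiances, R_cw, t_cw, K, image):
    """Hypothetical sketch of a map-to-frame photometric residual.

    points_w  : (N, 3) map points in the world frame
    radiances : (N,)   radiance stored with each map point
    R_cw,t_cw : world-to-camera rotation (3x3) and translation (3,)
    K         : (3, 3) pinhole intrinsics
    image     : (H, W) grayscale image as floats
    """
    p_c = points_w @ R_cw.T + t_cw              # transform points into the camera frame
    in_front = p_c[:, 2] > 0.1                  # keep points in front of the camera
    uvw = p_c[in_front] @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]               # perspective division -> pixel coordinates
    h, w = image.shape
    u, v = uv[:, 0], uv[:, 1]
    valid = (u >= 0) & (u < w - 1) & (v >= 0) & (v < h - 1)
    # nearest-neighbor lookup for brevity; a real system would interpolate
    observed = image[v[valid].astype(int), u[valid].astype(int)]
    return radiances[in_front][valid] - observed  # residuals fed to the state estimator
```

Minimizing such residuals over the camera pose (and other states) is what aligns each new frame against the radiance map instead of against previous frames only.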

Additionally, based on the photometric errors, the VIO can estimate the camera exposure time online, which allows the true radiance information of the environment to be recovered. Benchmark results on 25 sequences from an open dataset (the NCLT dataset) show that R3LIVE++ outperforms existing SLAM systems (e.g., LVI-SAM, LIO-SAM, FAST-LIO2) in overall accuracy. Experiments on the authors' own dataset show that R3LIVE++ remains robust in extremely challenging conditions under which LiDAR and camera measurements degenerate (for example, when the device faces a single textureless wall).
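As a simplified illustration of online exposure estimation (not the paper's exact formulation), if an observed pixel intensity is modeled as I ~= tau * L, where L is the radiance stored in the map and tau is the frame's exposure time, then tau has a closed-form least-squares estimate from the tracked points. The real system also handles effects such as vignetting and the camera response, which this sketch ignores.

```python
import numpy as np

def estimate_exposure(map_radiance, observed_intensity):
    """Least-squares exposure estimate under the simple model I ~= tau * L.

    map_radiance       : (N,) radiance of tracked map points
    observed_intensity : (N,) corresponding pixel intensities in the current frame
    """
    L = np.asarray(map_radiance, dtype=float)
    I = np.asarray(observed_intensity, dtype=float)
    return float(L @ I) / float(L @ L)          # tau minimizing sum((I - tau * L)**2)
```

Dividing observed intensities by the estimated exposure is what lets the map store radiance that is consistent across frames taken with different exposure settings.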

Finally, compared with its competitors, R3LIVE++ estimates the camera's exposure time more accurately and reconstructs the true radiance of the environment with significantly lower error than the raw pixel values in the images. To the authors' knowledge, this is the first radiance map reconstruction framework capable of real-time performance on a PC with a standard CPU, without hardware or GPU acceleration. The system is open source to facilitate replication of the current work and aid future research. Together with a collection of offline tools for mesh reconstruction and texturing, the system has great potential in real-world applications such as 3D HDR imaging, physics simulation, and video games.

Code implementations and sample videos are available on GitHub.

This Article is written as a research summary article by Marktechpost Staff based on the research paper 'R3LIVE++: A Robust, Real-time, Radiance reconstruction package with a tightly-coupled LiDAR-Inertial-Visual state Estimator'. All Credit For This Research Goes To Researchers on This Project. Check out the paper and github link.
