Real-Time Dense Reconstruction of Indoor Scene

2021-12-14 06:06 Jinxing Niu, Qingsheng Hu, Yi Niu, Tao Zhang and Sunil Kumar Jha
Computers, Materials & Continua, 2021, Issue 9

Jinxing Niu, Qingsheng Hu, Yi Niu, Tao Zhang and Sunil Kumar Jha

1 Institute of Mechanics, North China University of Water Resources and Electric Power, Zhengzhou, 450011, China

2 IT Fundamentals and Education Technologies Applications, University of Information Technology and Management in Rzeszow, Rzeszow, 100031, Poland

Abstract: Real-time dense reconstruction of indoor scenes is of great research value for service robots, augmented reality, cultural relics conservation, and other fields. ORB-SLAM2 is one of the leading open-source visual SLAM algorithms and is often used for indoor scene reconstruction. However, it is time-consuming and, because it solves the camera pose from ORB features, it can only build a sparse scene map. To address these shortcomings, this article proposes an improved ORB-SLAM2 solution that solves the camera pose with a direct method based on light intensity. This greatly reduces the amount of computation: pose solving is about 5 times faster than with the ORB feature method. A parallel map reconstruction thread based on the surfel model is added, and the depth map and RGB map are fused to build a dense map. A Realsense D415 sensor is used as the RGB-D camera to obtain three-dimensional (3D) point clouds of an indoor environment. After calibration and alignment, the sensor is used in an indoor scene reconstruction experiment with the improved ORB-SLAM2 method. Results show that the improved ORB-SLAM2 algorithm substantially improves both the processing speed and the density of the reconstructed scene.

Keywords: Scene reconstruction; improved ORB-SLAM2; direct method; surfel

1 Introduction

Scene reconstruction is a research focus in the field of computer vision. It has wide applications in indoor positioning and navigation, semantic maps, augmented reality, virtual reality, cultural relics protection, etc. [1-5]. In recent years, new RGB-D sensors (such as Kinect V1, Kinect V2, Realsense SR300, and Realsense D415) have been used for the 3D (three-dimensional) reconstruction of indoor scenes, and new algorithms have appeared in succession (such as KinectFusion [6], DynamicFusion [7], ElasticFusion [8], Fusion4D [9], and BundleFusion [10]). KinectFusion is limited to small, static scenes and cannot be used for moving, large, or deforming scenes. DynamicFusion can be used to reconstruct non-rigid dynamic scenes. BundleFusion is used to reconstruct a complete large indoor scene. To achieve the required density and accuracy of the indoor scene map, these methods often rely on a GPU, or even multiple GPUs in parallel, for acceleration, which limits their application in some situations [11].

In recent years, SLAM (simultaneous localization and mapping) technology has often been combined with RGB-D sensors for 3D reconstruction of indoor scenes, for example in RGB-D SLAM v2 [12], ORB-SLAM2 [13], and others. RGB-D SLAM v2 is a 3D reconstruction solution based on the Kinect V1 sensor that can be used on robots, aircraft, and handheld equipment. It reconstructs the scene through operations such as feature point matching and graph optimization, and it needs GPU acceleration to run in real time. ORB-SLAM2 is Raul Mur-Artal's upgrade of ORB-SLAM [14], which targeted real-time reconstruction with a monocular camera; ORB-SLAM2 can also be used with binocular (stereo) cameras and RGB-D sensors. Because ORB-SLAM2 is based on ORB features, the reconstructed point cloud of the 3D scene is relatively sparse. This affects the accuracy of the 3D reconstruction of indoor scenes and may even cause holes in the generated 3D model.

This article proposes an improved ORB-SLAM2 method and applies it to the real-time reconstruction of indoor scenes. It improves the 3D reconstruction accuracy of indoor scenes while reducing the running time.

2 Methods

2.1 RGB-D Sensor Calibration

An Intel Realsense D415 RGB-D sensor is used to obtain the RGB and depth images of the indoor scene. It must be calibrated to obtain the internal and external parameters of its cameras. Fig. 1 shows the Realsense D415 RGB-D sensor and its calibration experiment.

Figure 1: Realsense D415 RGB-D sensor (left), RGB-D sensor calibration experiment (right)

Figure 2: Three coordinate systems

A point in pixel coordinates can be calculated by formula (1), where K is the internal parameter matrix of the camera and M is the external parameter matrix of the camera. The external parameters of the right infrared camera and the RGB camera relative to the left infrared camera can be calculated by formula (2). R_l and t_l are the rotation matrix and translation vector in the external parameters of the left infrared camera, respectively. R_{r/RGB} and t_{r/RGB} are the rotation matrix and translation vector in the external parameters of the right infrared camera and the RGB camera, respectively.
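The two relations described above can be illustrated with a minimal sketch. It assumes the standard pinhole form Z·[u, v, 1]^T = K·(R·P + t) for formula (1) and the usual relative-pose relation R_rel = R_c·R_l^T, t_rel = t_c − R_rel·t_l for formula (2); the function names and all numeric values (intrinsics, baseline, test point) are hypothetical and are not the calibration results of this paper.

```python
# Sketch: pinhole projection and relative extrinsics, stdlib only.
# All numeric values below are illustrative assumptions.

def mat_vec(A, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Transpose a 3x3 matrix."""
    return [list(col) for col in zip(*A)]

def project(K, R, t, P_world):
    """Project a world point to pixels: Z [u, v, 1]^T = K (R P + t)."""
    P_cam = [c + ti for c, ti in zip(mat_vec(R, P_world), t)]
    uvw = mat_vec(K, P_cam)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

def relative_extrinsics(R_l, t_l, R_c, t_c):
    """Pose of camera c relative to the left camera.

    From P_l = R_l P + t_l and P_c = R_c P + t_c it follows that
    P_c = R_rel P_l + t_rel, with R_rel = R_c R_l^T and t_rel = t_c - R_rel t_l.
    """
    R_rel = mat_mul(R_c, transpose(R_l))
    t_rel = [tc - x for tc, x in zip(t_c, mat_vec(R_rel, t_l))]
    return R_rel, t_rel

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics

# Left camera at the world origin; second camera offset by 5.5 cm along x, no rotation.
R_rel, t_rel = relative_extrinsics(I3, [0.0, 0.0, 0.0], I3, [-0.055, 0.0, 0.0])

# Project a point 1 m in front of the left camera.
u, v = project(K, I3, [0.0, 0.0, 0.0], [0.1, -0.05, 1.0])
print(R_rel, t_rel, (u, v))
```

With the left camera taken as the world origin, the relative extrinsics reduce to the second camera's own extrinsics, which is why calibration results are commonly reported relative to one reference camera.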

The RGB-D sensor can acquire images at a variety of resolutions. Here it is set to 640×480 pixels for both RGB images and depth images. Twelve pictures of the calibration board at different angles are collected by each of the left and right infrared cameras and the RGB camera. The calibration algorithm in OpenCV is used to calculate the internal and external parameters of the cameras. Fig. 3 shows part of the corner detection results during camera calibration.

Figure 3: Corner detection diagram during camera calibration

After calibrating the RGB-D sensor, we obtain the internal parameters of the left and right infrared cameras and the RGB camera (K_l, K_r, K_RGB), and the external parameters of the right infrared camera and the RGB camera relative to the left infrared camera ((R_{r|l}, t_{r|l}), (R_{RGB|l}, t_{RGB|l})), as shown in formulas (3)-(5):

According to the calibration results, there is no rotation between the left and right infrared cameras, and their relative horizontal translation is about 5.5 cm. Between the RGB camera and the left infrared camera there is both a rotation and a horizontal translation.

2.2 Align RGB Map and Depth Map

To fuse the RGB map and the depth map, they must first be aligned. Alignment converts each depth value in the depth map to a space point in the world coordinate system and then projects that point onto the RGB map. The schematic diagram of the alignment operation is shown in Fig. 4. The depth map of the Realsense sensor is acquired through the left and right infrared cameras and the infrared laser projector. According to the calibration results, the internal parameter matrices of the left and right cameras are the same. The internal parameter matrix of the depth map is also the same as that of the left and right cameras, K_d = K_l = K_r.

Figure 4: Schematic diagram of alignment operation

Assume that the pixel coordinates of a point in the RGB map are represented as (u_RGB, v_RGB, d_RGB)^T, where u_RGB, v_RGB, and d_RGB are the abscissa, ordinate, and depth value, respectively. The pixel coordinates in the depth map are represented as (u_d, v_d, d_d)^T, where u_d, v_d, and d_d are the abscissa, ordinate, and depth value, respectively. The conversions from the camera coordinate system to the pixel coordinate system for the RGB map and the depth map are shown in formulas (6) and (7), respectively.

Conversely, the conversions from the pixel coordinate system to the camera coordinate system for the RGB map and the depth map are shown in formulas (8) and (9):
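The two directions of conversion can be sketched as a pair of inverse functions. This is a minimal illustration assuming the standard pinhole model with focal lengths fx, fy and principal point (cx, cy); the function names and the numeric intrinsics are hypothetical, not the calibrated values.

```python
def camera_to_pixel(X, Y, Z, fx, fy, cx, cy):
    """Camera coordinates -> pixel coordinates plus depth (u, v, d)."""
    return (fx * X / Z + cx, fy * Y / Z + cy, Z)

def pixel_to_camera(u, v, d, fx, fy, cx, cy):
    """Pixel coordinates plus depth (u, v, d) -> camera coordinates (X, Y, Z)."""
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)

# Hypothetical intrinsics for a 640x480 image.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

# A pixel 100 columns right of the principal point, observed at 2 m depth.
point = pixel_to_camera(420.0, 240.0, 2.0, fx, fy, cx, cy)
pixel = camera_to_pixel(*point, fx, fy, cx, cy)
print(point, pixel)
```

Applying one function after the other returns the starting pixel, which is a quick sanity check that a given set of intrinsics is being used consistently in both directions.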

The relationship between the depth map and the RGB map in the camera coordinate system is expressed as formula (10). M′ is a 4×4 transformation matrix composed of a rotation matrix and a translation vector.

Substituting formulas (8) and (9) into (10), the following formula (11) can be obtained:

W is a 4×4 transformation matrix, which can be expressed as:

Then formula (12) can be expressed as:

According to formula (14), the transformation between the depth map and the RGB map can be calculated. The aligned RGB map and depth map from the experiments are shown in Fig. 5.
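The alignment chain (back-project a depth pixel, transform it into the RGB camera frame, re-project it with the RGB intrinsics) can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the function names, the (fx, fy, cx, cy) tuple layout, the nearest-surface rule for collisions, and all numeric values are assumptions, and both images are assumed to share one resolution.

```python
def align_depth_pixel(u_d, v_d, d_d, K_d, K_rgb, R, t):
    """Map one depth pixel (u_d, v_d, d_d) to RGB pixel coordinates (u, v, d).

    K_d and K_rgb are (fx, fy, cx, cy) tuples; R (3x3, list of rows) and t
    (3-vector) take depth-camera coordinates to RGB-camera coordinates.
    """
    fx, fy, cx, cy = K_d
    # Back-project into the depth camera frame.
    P = ((u_d - cx) * d_d / fx, (v_d - cy) * d_d / fy, d_d)
    # Rigid transform into the RGB camera frame.
    X, Y, Z = (sum(r * p for r, p in zip(row, P)) + ti for row, ti in zip(R, t))
    # Project with the RGB intrinsics.
    fx, fy, cx, cy = K_rgb
    return fx * X / Z + cx, fy * Y / Z + cy, Z

def align_depth_map(depth, K_d, K_rgb, R, t):
    """Warp a depth image (list of rows, 0 = no measurement) into the RGB view."""
    h, w = len(depth), len(depth[0])
    aligned = [[0.0] * w for _ in range(h)]
    for v_d, row in enumerate(depth):
        for u_d, d in enumerate(row):
            if d <= 0:
                continue  # skip invalid depth readings
            u, v, z = align_depth_pixel(u_d, v_d, d, K_d, K_rgb, R, t)
            if z <= 0:
                continue  # point ends up behind the RGB camera
            ui, vi = int(round(u)), int(round(v))
            # Keep the nearest surface when several depth pixels hit one RGB pixel.
            if 0 <= ui < w and 0 <= vi < h and (aligned[vi][ui] == 0 or z < aligned[vi][ui]):
                aligned[vi][ui] = z
    return aligned

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (600.0, 600.0, 320.0, 240.0)  # hypothetical intrinsics shared by both cameras

# A 1.5 cm horizontal offset moves the image centre by 600 * 0.015 / 1.0 = 9 pixels at 1 m.
u, v, z = align_depth_pixel(320, 240, 1.0, K, K, I3, (0.015, 0.0, 0.0))

# With an identity transform the depth map is unchanged.
small = [[0.0, 1.0], [2.0, 0.0]]
aligned = align_depth_map(small, (2.0, 2.0, 0.5, 0.5), (2.0, 2.0, 0.5, 0.5), I3, (0.0, 0.0, 0.0))
print((u, v, z), aligned)
```

The pixel shift caused by a fixed translation is inversely proportional to depth, so nearby surfaces move more than distant ones during alignment; this is why the alignment must be done per pixel rather than as a single image shift.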

Figure 5: The aligned RGB map and depth map

2.3 Improved ORB-SLAM2

ORB-SLAM2 is a visual SLAM method based on ORB features and nonlinear optimization. It mainly includes camera tracking based on ORB features, trajectory estimation, loop closure detection and relocalization, and local and global optimization [15]. Using ORB features, it can only construct a sparse point cloud map of an indoor scene. Here, a direct method replaces the ORB feature method used in the parallel tracking thread, and a surfel map is used to reconstruct the dense indoor scene. The improved ORB-SLAM2 flow diagram is shown in Fig. 6, where the red frames mark the improved parts. The improved ORB-SLAM2 is mainly composed of four parallel threads and one global optimization thread. The four parallel threads are the tracking thread, the local mapping thread, the loop closing detection thread, and the dense map builder thread.

Figure 6: Improved ORB-SLAM2 flow diagram

2.4 Direct Method to Solve Camera Pose

In ORB-SLAM2, the camera pose is estimated from ORB features extracted from two adjacent frames. The ORB feature is mainly composed of Oriented FAST and BRIEF. Extracting Oriented FAST keypoints and computing BRIEF descriptors are time-consuming, which makes real-time processing difficult on low-performance computers. To solve this problem, this article proposes to use the direct method to solve the camera pose.

In the direct method, the camera pose is obtained by minimizing the photometric error, without relying on feature correspondences between pixels. The image points of a point P(X, Y, Z) in world space at two moments are denoted p_1 and p_2, respectively, and can be expressed as Eqs. (15) and (16):

where Z_1 is the depth of P in the camera coordinate system at the first moment, Z_2 is the depth of P in the camera coordinate system at the second moment, K is the camera’s internal parameter matrix, and R and t are the rotation matrix and the translation vector, respectively.
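For reference, under the standard pinhole model and with the symbols defined here, the two projections are commonly written as follows. This is a sketch consistent with those definitions, assuming the camera frame at the first moment coincides with the world frame; it is not necessarily the authors' exact typesetting.

```latex
\begin{align}
p_1 &= \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}_1 = \frac{1}{Z_1}\, K P, \tag{15} \\
p_2 &= \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}_2 = \frac{1}{Z_2}\, K \left( R P + t \right). \tag{16}
\end{align}
```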

In contrast to the ORB feature method, which minimizes the reprojection error, the direct method minimizes the photometric error e = I_1(p_1) - I_2(p_2), where e is a scalar. The optimization is based on the assumption that the gray level of the same point is unchanged across images. For the space points P_i, the camera pose estimation problem becomes the optimization problem in formula (17).
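The photometric error can be made concrete with a small sketch that samples two grayscale images at sub-pixel locations and sums the squared residuals. The bilinear sampler, the function names, and the synthetic images are illustrative assumptions; the cost is taken here to be a plain sum of squares of e over the point pairs.

```python
import math

def bilinear(img, x, y):
    """Sample a grayscale image (list of rows) at a sub-pixel location (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def photometric_cost(I1, I2, pairs):
    """Sum of squared residuals e = I1(p1) - I2(p2) over pixel pairs (p1, p2)."""
    return sum((bilinear(I1, *p1) - bilinear(I2, *p2)) ** 2 for p1, p2 in pairs)

# Synthetic 4x4 images: I2 is I1 shifted one pixel to the right.
I1 = [[float(x + 2 * y) for x in range(4)] for y in range(4)]
I2 = [[float(x - 1 + 2 * y) for x in range(4)] for y in range(4)]

# Correct correspondence (shift of +1 in x) versus a wrong one (no shift).
cost_correct = photometric_cost(I1, I2, [((1.5, 1.0), (2.5, 1.0))])
cost_wrong = photometric_cost(I1, I2, [((1.5, 1.0), (1.5, 1.0))])
print(cost_correct, cost_wrong)
```

The cost vanishes only for the correspondence that matches the true motion, which is what allows the pose to be recovered from intensities alone when the gray-level constancy assumption holds.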

Figure 7: Comparison of results by ORB feature method (top) and direct method (bottom)

Solving the camera pose with the direct method is thus an optimization problem, which can be solved iteratively using the Gauss-Newton method. Fig. 7 shows the results calculated by the two methods. Tab. 1 compares the run-time of the two methods over several tests. The test results show that the direct method solves the pose about 5 times faster than the ORB feature method.
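To illustrate the Gauss-Newton iteration on a photometric error, the following toy sketch estimates a single unknown, a 1-D shift between two intensity profiles, instead of a full six-degree-of-freedom camera pose. The intensity function, the sample grid, and all names are assumptions made for the illustration; only the structure of the update (accumulate J², accumulate −J·e, solve, update) mirrors the general method.

```python
import math

TRUE_SHIFT = 0.7  # ground-truth motion the solver should recover (toy value)

def I1(x):
    """First 'image': a smooth Gaussian bump centred at x = 5."""
    return math.exp(-((x - 5.0) ** 2) / 4.0)

def I1_grad(x):
    """Analytic derivative of I1."""
    return -(x - 5.0) / 2.0 * I1(x)

def I2(x):
    """Second 'image': the same scene moved by TRUE_SHIFT."""
    return I1(x - TRUE_SHIFT)

def I2_grad(x):
    """Analytic derivative of I2."""
    return I1_grad(x - TRUE_SHIFT)

def gauss_newton_shift(samples, s0=0.0, max_iters=20, tol=1e-10):
    """Minimise sum_i (I1(x_i) - I2(x_i + s))^2 over the shift s."""
    s = s0
    for _ in range(max_iters):
        H, b = 0.0, 0.0
        for x in samples:
            e = I1(x) - I2(x + s)   # photometric residual
            J = -I2_grad(x + s)     # d e / d s
            H += J * J              # Gauss-Newton approximation of the Hessian
            b += -J * e             # right-hand side of the normal equation
        delta = b / H               # solve H * delta = b
        s += delta
        if abs(delta) < tol:
            break
    return s

samples = [0.5 * i for i in range(21)]  # x = 0.0, 0.5, ..., 10.0
estimate = gauss_newton_shift(samples)
print(estimate)
```

Because only image gradients at the sampled points are needed, no keypoint detection or descriptor matching is involved, which is the source of the speed advantage discussed above. The same linearization is only valid for small motions, so the initial guess must lie close enough to the true value.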

Table 1: Comparison of the run-time by ORB feature method and direct method

2.5 Dense Reconstruction of Scenes

In this article, surfels are used to fuse the depth map and RGB map obtained from the RGB-D sensor to reconstruct the indoor scene. Each surfel stores the location of the corresponding spatial point, the radius, the normal vector, the color, and the time information [16]. The position, normal vector, and color are updated according to the weighted fusion result, and the radius is obtained from the distance between the surface and the optical center of the camera [8]. The radius of each surfel is initialized according to the following formula:

where d is the depth value corresponding to the surfel, f is the focal length of the depth camera, and n_z is the z component of the normal obtained by central-difference estimation on the depth map. The surfels are updated and extended by continuously fusing the depth map and RGB map, and finally a dense 3D model based on surfels is reconstructed. Fig. 8 shows the dense three-dimensional model reconstructed with surfels.
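The weighted fusion of a new measurement into an existing surfel can be sketched as a running weighted average. This is a simplified illustration under stated assumptions: the `Surfel` fields follow the attributes listed above, but the scalar confidence weight, the rule of keeping the smaller radius, and all names and values are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Surfel:
    position: tuple   # (x, y, z) in metres
    normal: tuple     # unit normal vector
    color: tuple      # (r, g, b)
    radius: float     # disc radius in metres
    weight: float     # accumulated confidence (assumed scalar)
    last_update: int  # frame index of the latest fusion

def _blend(old, new, w_old, w_new):
    """Component-wise weighted average of two tuples."""
    total = w_old + w_new
    return tuple((w_old * o + w_new * n) / total for o, n in zip(old, new))

def fuse(surfel, position, normal, color, radius, frame_id, w_new=1.0):
    """Fuse one new measurement into a surfel and return the updated surfel."""
    pos = _blend(surfel.position, position, surfel.weight, w_new)
    col = _blend(surfel.color, color, surfel.weight, w_new)
    nrm = _blend(surfel.normal, normal, surfel.weight, w_new)
    length = math.sqrt(sum(c * c for c in nrm))
    nrm = tuple(c / length for c in nrm)  # re-normalise the averaged normal
    # Assumption: keep the smaller radius, i.e. the finer of the two observations.
    return Surfel(pos, nrm, col, min(surfel.radius, radius),
                  surfel.weight + w_new, frame_id)

s = Surfel((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (100.0, 100.0, 100.0), 0.004, 1.0, 0)
s = fuse(s, (0.0, 0.0, 1.2), (0.0, 0.0, -1.0), (120.0, 100.0, 80.0), 0.003, frame_id=1)
print(s)
```

Averaging with an accumulated weight means that a surfel observed many times changes little when one more measurement arrives, which damps the effect of depth noise in individual frames.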

Figure 8: Three-dimensional model reconstructed by Surfel

3 Experimental Results

In this experiment, a notebook computer with an Intel Core i5-4210U CPU and 12 GB of memory is used to carry out the 3D reconstruction of an indoor scene. The Realsense D415 sensor is driven through the API provided by Realsense SDK 2.0 to obtain RGB and depth images at a frame rate of 30 fps. A 3D map of the laboratory scene is reconstructed with the improved ORB-SLAM2 algorithm. The reconstruction process and result are shown in Fig. 9. Experimental results show that the improved ORB-SLAM2 algorithm greatly improves the processing speed and the density of the reconstructed scene map.

Figure 9: The diagram of 3D reconstruction process (left) and result of laboratory scene (right)

4 Conclusion

To address the time-consuming and sparse reconstruction of the ORB-SLAM2 scheme, a direct method based on light intensity is used to calculate the camera pose, and the surfel model is used for fusion. A dense scene reconstruction solution is proposed that uses the depth map and RGB map obtained from an RGB-D sensor. Results show that the improved ORB-SLAM2 scene reconstruction method greatly improves the processing speed and the density of the reconstructed scene map.

Acknowledgement: The authors would like to thank the anonymous reviewers and the editor for the very instructive suggestions that led to the much-improved quality of this paper.

Funding Statement: This work was supported by Henan Province Science and Technology Project under Grant No. 182102210065.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.