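Li Yanjun1, Huang Kangwei1,2, Xiang Ji3※
(1. Zhejiang University City College, Hangzhou 310015; 2. College of Control Science and Engineering, Zhejiang University, Hangzhou 310027; 3. College of Electrical Engineering, Zhejiang University, Hangzhou 310027)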
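Measuring farmed fish by hand to track their growth is time-consuming and laborious, and it disturbs the normal growth of the fish. To achieve dynamic perception of underwater fish information and rapid, non-destructive detection, this study proposes a stereo-vision-based method for measuring the dimensions of moving fish. Three-dimensional information is obtained with binocular stereo vision, a Mask-RCNN (Mask Region Convolution Neural Network) then detects and finely segments the fish, and finally 3D point clouds of the fish surface are generated, from which the external dimensions of multiple freely swimming fish are calculated. Experimental results show that the mean relative errors of length and width are about 4.7% and 9.2%, respectively. The method meets the need for visual management and contact-free measurement of fish size in aquaculture and can serve as a reference for graded rearing and rational feeding.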
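Based mainly on deep learning and stereo vision, this study achieves rapid, non-destructive measurement of the dimensions of different fish species in different scenarios. On a self-built fish dimension measurement platform, an aquaculture monitoring system and a fish dimension calculation program were developed. Binocular images are reconstructed in 3D through camera calibration, stereo rectification, and stereo matching. A dataset was built to train a Mask Region Convolution Neural Network (Mask-RCNN) model, which is combined with morphological operations and the GrabCut algorithm to detect and segment the fish. Surface data of each fish are then extracted from the 3D information of the segmented region, coordinate transformations unify the orientation and position of the fish, and the fish length and width are calculated. The method offers an approach for rapidly and automatically obtaining the dimensions of freely swimming fish.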
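The test specimens were 15 spotted sea bass with body lengths of 112.3–141.8 mm, 5 pearl gentian groupers with body lengths of 233.0–247.0 mm, and 5 perch with body lengths of 245.0–290.0 mm. Two scenarios were designed, "multiple fish per tank" and "one fish per tank", using as containers a round tank 1 m in diameter and 1 m high and a rectangular rearing box of 880 mm × 630 mm × 650 mm. The "multiple fish per tank" scenario, in which several fish share one tank, resembles a real aquaculture environment; it was used to capture images for the dataset and to verify the detection and segmentation performance of the deep learning model. In the "one fish per tank" scenario, a single fish whose dimensions had been measured manually was placed alone in the rearing box to verify the accuracy of the dimension measurement algorithm.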
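The self-developed fish dimension measurement system is shown in Fig.1. The binocular camera sits in a waterproof housing and transmits underwater video of the fish over a USB cable. The camera has a resolution of 2 560×720 pixels, a video frame rate of 30 Hz, and a baseline of 6 cm. A Raspberry Pi pushes the video stream, and a cloud server equipped with four Nvidia GTX 1080 Ti GPUs provides the computing power for deep learning. The software handles data acquisition, transmission, computation, and result output. The measurement algorithm is written in Python, image operations use the OpenCV computer vision library, and the deep learning model is built and trained with the TensorFlow-based Keras framework.
Fig.1 Fish dimension measurement system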
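The 2 560×720 resolution quoted above is consistent with a camera that delivers both views side by side in a single frame. Assuming that layout (an assumption, not stated in the text), with the left camera in the left half, the two 1280×720 views can be separated by slicing before calibration and rectification. `split_side_by_side` is a hypothetical helper:

```python
import numpy as np


def split_side_by_side(frame: np.ndarray) -> tuple:
    """Split one side-by-side stereo frame into (left, right) views.

    Assumption: both views are packed horizontally in a single frame,
    left camera in the left half (e.g. 2560x720 -> two 1280x720 views).
    """
    width = frame.shape[1]
    if width % 2:
        raise ValueError("side-by-side frame width must be even")
    half = width // 2
    return frame[:, :half], frame[:, half:]


# Dummy 720-row x 2560-column BGR frame standing in for a captured video frame
frame = np.zeros((720, 2560, 3), dtype=np.uint8)
left, right = split_side_by_side(frame)
print(left.shape, right.shape)  # (720, 1280, 3) (720, 1280, 3)
```

Slicing returns views rather than copies, so the split costs almost nothing per frame.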
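1.2.1 3D reconstruction
Note: p_left and p_right are the left and right imaging planes of the binocular camera; P is the object point; z is the depth of the object point, mm; (u_l, v_l) and (u_r, v_r) are the image coordinates of the object point on the two imaging planes; O_left and O_right are the principal points of the left and right cameras; t is the baseline length of the binocular camera, mm; f is the focal length after stereo rectification.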
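The symbols in the note above describe a rectified, parallel stereo pair. As a sketch of the standard similar-triangle relation (not necessarily the exact form of the paper's own equations), assuming u and v are measured from the principal point, f is expressed in pixels, and (x, y, z) is the object point in the left-camera frame (x and y are introduced here for illustration):

```latex
\frac{u_l}{f} = \frac{x}{z}, \qquad \frac{u_r}{f} = \frac{x - t}{z}
\;\Rightarrow\;
d = u_l - u_r = \frac{f\,t}{z}
\;\Rightarrow\;
z = \frac{f\,t}{d}, \qquad x = \frac{u_l\,z}{f}, \qquad y = \frac{v_l\,z}{f}
```

With t = 60 mm and illustrative values f = 700 pixels and d = 70 pixels, the point lies at z = 700 × 60 / 70 = 600 mm. Because z varies as 1/d, small disparities (distant points) are the most sensitive to matching errors.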
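The semi-global block matching (SGBM) algorithm [24-25] is used to obtain disparity values, and stereo matching yields a disparity map [26]. During stereo matching, some pixels have no matching point within the search range because of occlusion, noise, uniform (textureless) backgrounds, or disparities that are too large, so no disparity can be obtained for them. The 3D coordinates of an object point are computed from the image coordinates of its image point and its disparity, as given by Eq. (3).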
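The triangulation step can be sketched with NumPy. This is a minimal sketch, not the authors' code: it assumes a disparity map in pixels for the rectified left image and a rectified focal length f and principal point (cx, cy) in pixels, and it marks pixels without a valid disparity (value ≤ 0) as NaN, mirroring the unmatched pixels described above. `disparity_to_points` is a hypothetical name. If the disparity comes from OpenCV's `StereoSGBM`, its `compute()` output is a fixed-point map scaled by 16, so divide it by 16.0 first; `cv2.reprojectImageTo3D` with the Q matrix from `cv2.stereoRectify` is OpenCV's built-in equivalent.

```python
import numpy as np


def disparity_to_points(disparity, f, baseline_mm, cx, cy):
    """Triangulate a rectified-left disparity map into 3D points in mm.

    disparity   : HxW disparities in pixels; values <= 0 mean "no match".
    f           : focal length after rectification, in pixels.
    baseline_mm : stereo baseline t, in mm.
    cx, cy      : principal point of the rectified left image, in pixels.
    Returns an HxWx3 array of (x, y, z); unmatched pixels are NaN.
    """
    d = np.asarray(disparity, dtype=np.float64)
    h, w = d.shape
    # u varies along columns, v along rows (pixel coordinates)
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    z = np.full(d.shape, np.nan)
    valid = d > 0
    z[valid] = f * baseline_mm / d[valid]   # depth: z = f * t / d
    x = (u - cx) * z / f                     # NaN propagates to unmatched pixels
    y = (v - cy) * z / f
    return np.stack([x, y, z], axis=-1)


# Illustrative values: f = 700 px, t = 60 mm (the 6 cm baseline), principal point at (0, 0)
disp = np.array([[70.0, 0.0],
                 [35.0, 70.0]])
pts = disparity_to_points(disp, f=700.0, baseline_mm=60.0, cx=0.0, cy=0.0)
# pts[0, 0] holds (x, y, z) = (0, 0, 600) mm; pts[0, 1] is NaN (no match)
```

Masking the returned H×W×3 array with one fish's segmentation gives that fish's point cloud.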
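1.2.2 Detection and segmentation
Compared with traditional image segmentation methods, deep-learning-based object detection and segmentation are less sensitive to factors such as the application environment. This study uses the Mask-RCNN [27] network for fish detection and segmentation, which outputs target bounding boxes and segmentation masks. A dataset of 3 712 manually annotated images was built, with 2 662 training images and 750 validation images. Only fish whose tail base and snout are both inside the image, and which are complete and unoccluded, were annotated.
Fig.3 Refinement of segmentation results with the GrabCut interactive segmentation algorithm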
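Fig.3 refers to refining the Mask-RCNN mask with GrabCut, and the text states that morphological operations are combined with it to correct the mask near the fish edge. The exact procedure is not given here, so the following is only a plausible sketch: erode and dilate the coarse mask to build a GrabCut initialization mask, with sure foreground deep inside, probable foreground or background in a band around the edge, and sure background elsewhere. The band width and the helper names `grabcut_trimap` and `_morph` are hypothetical; the label values 0–3 are those of OpenCV's `GC_BGD`, `GC_FGD`, `GC_PR_BGD` and `GC_PR_FGD`.

```python
import numpy as np

# Integer labels used by OpenCV's GrabCut mask
# (same values as cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def _morph(mask, radius, reducer):
    """Binary erosion (reducer=np.min) or dilation (reducer=np.max)
    with a (2*radius+1) x (2*radius+1) square kernel; outside = 0."""
    padded = np.pad(mask, radius, mode="constant", constant_values=0)
    h, w = mask.shape
    windows = [padded[dy:dy + h, dx:dx + w]
               for dy in range(2 * radius + 1)
               for dx in range(2 * radius + 1)]
    return reducer(np.stack(windows), axis=0)


def grabcut_trimap(coarse_mask, band=5):
    """Turn a coarse binary mask into a GrabCut initialization mask."""
    m = (np.asarray(coarse_mask) > 0).astype(np.uint8)
    eroded = _morph(m, band, np.min)
    dilated = _morph(m, band, np.max)
    trimap = np.full(m.shape, GC_BGD, dtype=np.uint8)  # far outside: sure background
    trimap[dilated == 1] = GC_PR_BGD   # thin band just outside the coarse edge
    trimap[m == 1] = GC_PR_FGD         # inside the coarse mask but near its edge
    trimap[eroded == 1] = GC_FGD       # deep inside: sure foreground
    return trimap
```

The result could then be passed to OpenCV as `cv2.grabCut(image, trimap, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)`, where `image` is the 8-bit BGR frame and `bgd_model` and `fgd_model` are `np.zeros((1, 65), np.float64)`; pixels labelled `GC_FGD` or `GC_PR_FGD` afterwards form the refined fish mask. Limiting uncertainty to a thin band lets GrabCut adjust only the edge, which is where the coarse mask is said to be biased.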
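Segmentation performance is evaluated with the mean pixel intersection over union [30] (mIOU). The intersection over union (IOU) is the ratio of the number of pixels in the intersection of the model's segmentation and the annotation to the number of pixels in their union, and mIOU is the mean IOU over all predictions.
1.2.3 3D point cloud processing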
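The IOU and mIOU definitions above translate directly into NumPy. This sketch assumes each prediction has already been paired with one annotation mask (the pairing rule is not described here), and treating two empty masks as IOU = 1 is a convention chosen for this example. `iou` and `mean_iou` are hypothetical helper names.

```python
import numpy as np


def iou(pred, gt):
    """Pixel IOU between a predicted and an annotated binary mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pairs):
    """mIOU: mean IOU over (prediction, annotation) mask pairs."""
    return float(np.mean([iou(p, g) for p, g in pairs]))
```

For example, a prediction covering 4 pixels and an annotation covering 4 pixels that share 2 pixels give IOU = 2 / 6 ≈ 0.33.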
Note: The rectangle is the plane fitted to the contour point cloud; x axis, y axis and z axis are the axes before the first transformation, mm; O′, x′ axis, y′ axis and z′ axis are the origin and axes after the first transformation, mm; the ellipse is fitted to the projections of the contour point cloud onto the fitted plane; O″, x″ axis, y″ axis and z″ axis are the origin and axes after the second transformation, mm.
Fig.4 Procedure of coordinate transformations
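Fig.4 describes two transformations: a plane fitted to the contour point cloud, then a rotated ellipse fitted to the projections on that plane, after which length and width are the point-cloud extents along the new abscissa and ordinate. The NumPy sketch below follows the same idea with stated substitutions: the plane is a total-least-squares fit obtained by SVD, and the in-plane orientation comes from the principal directions of the same SVD (PCA) instead of an ellipse fit; OpenCV's `cv2.fitEllipse` would be the closer match to the paper's ellipse step. The helper name `fish_length_width` is hypothetical, and one point set is used both for fitting and for measuring.

```python
import numpy as np


def fish_length_width(points):
    """Length and width of a fish point cloud (same units as the input).

    points: (N, 3) array, N >= 3.
    """
    p = np.asarray(points, dtype=np.float64)
    centred = p - p.mean(axis=0)              # move the origin to the centroid
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    # vt[2] is the normal of the total-least-squares plane (new z axis);
    # vt[0] and vt[1] are the principal in-plane directions, used here in
    # place of the major/minor axes of a fitted ellipse (new x and y axes).
    local = centred @ vt.T                    # coordinates in the new frame
    extent = local.max(axis=0) - local.min(axis=0)
    return float(extent[0]), float(extent[1])


# Synthetic check: a 120 x 30 ellipse outline, tilted and shifted in space
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
outline = np.stack([60 * np.cos(theta), 15 * np.sin(theta), np.zeros_like(theta)], axis=1)
a, b = np.deg2rad(30.0), np.deg2rad(40.0)
rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
cloud = outline @ (rz @ rx).T + np.array([100.0, -50.0, 600.0])
length, width = fish_length_width(cloud)   # approximately (120.0, 30.0)
```

On real surface point clouds, stray points left by segmentation or matching errors inflate max-minus-min extents, so some outlier filtering before measuring is advisable.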
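To evaluate the deep learning model, the confidence threshold was set to 0.9, and Eq. (5) gives a precision of 88% and a recall of 84%. After GrabCut refinement, mIOU rose from 78% to 81%. Images were processed at 2.3 Hz. These results indicate that the trained Mask-RCNN network achieves good detection performance and that the segmentation refinement based on morphological operations and the GrabCut algorithm effectively improves segmentation accuracy.
Table 1 Calculated dimensions of different fish species
Fig.7 Mean relative errors of length and width at different angles, and histogram of angle frequency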
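Assuming Eq. (5) uses the usual definitions, precision = TP/(TP+FP) and recall = TP/(TP+FN), the evaluation at the 0.9 confidence threshold can be sketched as follows. Each detection is assumed to have already been marked as matched or unmatched to an annotated fish (the matching rule, for example an IOU cut-off, is an assumption), and `precision_recall` is a hypothetical helper.

```python
def precision_recall(detections, num_ground_truth, score_threshold=0.9):
    """Precision and recall at a fixed confidence threshold.

    detections: iterable of (score, is_true_positive) pairs; is_true_positive
        says whether the detection matched an annotated fish.
    num_ground_truth: number of annotated fish (TP + FN).
    """
    kept = [matched for score, matched in detections if score >= score_threshold]
    tp = sum(1 for matched in kept if matched)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Three detections pass the threshold, two of them correct; four fish annotated
p, r = precision_recall([(0.95, True), (0.92, True), (0.91, False), (0.5, True)], 4)
```

Raising the threshold usually trades recall for precision, since low-confidence detections, correct or not, are discarded.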
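1) 3 712 underwater fish images were collected and annotated with a polygon annotation tool to build a fish segmentation dataset for aquaculture environments, which was used to train a Mask Region Convolution Neural Network (Mask-RCNN) model for fish detection and segmentation. On the validation set, the model achieved a precision of 88% and a recall of 84%. Refining the segmentation near the edges with the GrabCut interactive segmentation algorithm raised the mean intersection over union (mIOU) from 78% to 81%, improving segmentation accuracy.
3) Compared with manual measurements, the mean relative error was 4.7% for length and 9.2% for width, at a computation rate of 2 Hz. These results show that the proposed method for measuring swimming fish underwater is accurate, fast, and generalizes well, and that larger fish have lower mean relative measurement errors. It offers a feasible approach for contact-free measurement of swimming fish in aquaculture.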
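The mean relative errors reported above compare each calculated dimension with the manual measurement of the same fish. Assuming the usual definition, the mean of |calculated - manual| / manual over all measured fish, the computation is a one-liner; `mean_relative_error` is a hypothetical helper name.

```python
def mean_relative_error(estimated, measured):
    """Mean of |estimated - measured| / measured over paired measurements."""
    if len(estimated) != len(measured) or len(measured) == 0:
        raise ValueError("need equally long, non-empty sequences")
    return sum(abs(e - m) / m for e, m in zip(estimated, measured)) / len(measured)


# Two fish: 105 mm vs 100 mm and 190 mm vs 200 mm are both 5% off
mre = mean_relative_error([105.0, 190.0], [100.0, 200.0])
```

Because the error is divided by the true size, the same absolute error in millimetres weighs less for a long fish, which is consistent with the lower relative errors reported for larger fish.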
Measurement of dynamic fish dimension based on stereoscopic vision
Li Yanjun1, Huang Kangwei1,2, Xiang Ji3※
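(1. Zhejiang University City College, Hangzhou 310015, China; 2. College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China; 3. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China)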
Fish dimension information, especially length, is very important for aquaculture, since it can be used for grading and for developing feeding strategies. To acquire accurate fish size information, the traditional measurement method requires taking the fish out of the water, which is not only time-consuming and laborious but may also affect the growth rate of the fish. In this study, a dynamic measurement method for fish body dimensions based on stereo vision was proposed, which could calculate the dimensions of multiple fish simultaneously without restricting their movement. The method was implemented and verified on an intelligent monitoring system designed and built by the authors, with attention to hardware compatibility and satisfactory overall performance. Through this system, videos of underwater fish were captured and uploaded to a remote cloud server for further processing. Three main procedures were then developed, namely 3D reconstruction, fish detection and segmentation, and 3D point cloud processing, to obtain the size of fish swimming freely in a real aquaculture environment. In the 3D reconstruction part, 3D information was restored from binocular images through camera calibration, stereo rectification, and stereo matching in sequence. First, the binocular camera was calibrated with a chessboard to obtain the camera parameters, including the intrinsic matrices and the relative translation and rotation between the left and right cameras. Then, the captured binocular images were rectified to be row-aligned according to the calibrated camera parameters. Finally, stereo matching based on the semi-global block matching (SGBM) method was applied to the rectified image pairs to extract accurate 3D information and complete the 3D reconstruction.
In the fish detection and segmentation part, a Mask Region Convolution Neural Network (Mask-RCNN) model was trained to locate fish in the image with bounding boxes and to extract the fish pixels within each bounding box as a raw segmentation. The raw segmentation was refined with an interactive segmentation method called GrabCut, combined with morphological processing, to correct errors around the edges. In the 3D point cloud processing part, two coordinate transformations were carried out to unify the point clouds of fish with various locations and orientations. The transformation parameters were calculated from a 3D plane fitted to the contour point cloud and from a rotated ellipse fitted to the transformed point cloud, respectively. After the transformations, the length and width directions of the fish point cloud were parallel to the coordinate axes, so the fish length and width were given by the extent of the point cloud along the abscissa and ordinate axes. Experiments were conducted with the self-designed system, and the results for fish of various species and sizes were compared with manual measurements. The average relative error was about 4.7% for length and about 9.2% for width. In terms of running time, the system could process 2.5 frames per second for fish dimension calculation. The results also showed that the trained Mask-RCNN model achieved a precision of 88% and a recall of 84% with satisfactory generalization performance. After segmentation refinement, the mean intersection over union increased from 78% to 81%, demonstrating the effectiveness of the refinement method. The results further showed that the longer the fish, the smaller the average relative measurement error.
These results demonstrate that the proposed method can measure the dimensions of multiple underwater fish by combining stereo vision with deep-learning-based image segmentation and coordinate transformation. This study could provide a new approach to flexible measurement of fish body size and improve dynamic information perception for rapid, non-destructive detection of underwater fish in aquaculture.
Keywords: fish; machine vision; three-dimensional reconstruction; image segmentation; deep learning; Mask-RCNN; 3D point cloud processing
Li Yanjun, Huang Kangwei, Xiang Ji. Measurement of dynamic fish dimension based on stereoscopic vision[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(21): 220-226. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2020.21.026 http://www.tcsae.org