Tu Shuqin, Huang Lei, Liang Yun※, Huang Zhengxin, Li Chengjie, Liu Xiaolong
(College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China)
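To detect individual group-housed pigs accurately and track them in real time under different scenarios (daytime and nighttime, sparse and dense pigs), this study proposes a Joint Detection and Embedding (JDE) model. First, a feature extraction module extracts image features at different scales from the input video sequence and produces three prediction heads. Through multi-task collaborative learning, the prediction heads output three branches: classification information, bounding-box regression information, and appearance information. The three kinds of information are processed in a data association module: the classification and bounding-box regression outputs give the positions of the detection boxes, which, combined with the appearance information, are passed to a data association algorithm built on the Kalman filter and the Hungarian algorithm to output the tracked video sequence. Experimental results show that on the public and self-built datasets the JDE model achieves an overall mean Average Precision (mAP) of 92.9%, a Multiple Object Tracking Accuracy (MOTA) of 83.9%, an IDF1 score of 79.6%, and a speed of 73.9 Frames Per Second (FPS). On the public dataset, compared with the Separate Detection and Embedding (SDE) model, in which object detection and tracking are separate modules, the JDE model improves MOTA by 0.5 percentage points and FPS by 340%, addressing the insufficient real-time performance of SDE-based multi-object tracking. Compared with the TransTrack model, the JDE model improves MOTA and IDF1 by 10.4 and 6.6 percentage points, respectively, and FPS by 324%. The model achieves real-time multi-object tracking of group-housed pigs in the farming environment and can provide technical support for the precision management of large-scale pig farming.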
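The performance of multi-object tracking depends largely on the performance of object detection. Traditional object detection algorithms, such as the background subtraction used by Zhao et al.[25] to detect moving cows, the optical-flow-estimation-based moving object detection method proposed by Zhang et al.[26], and the method based on optical flow and feature statistics proposed by Yu et al.[27] for detecting abnormal behavior in fish schools, cannot meet the speed and accuracy requirements of real scenes. Deep-learning-based object detection algorithms have since matured, with marked gains in both accuracy and speed, and are now adequate for practical applications. They fall into one-stage and two-stage algorithms. Two-stage algorithms first generate region proposals and then classify and refine them, which gives relatively high accuracy; typical examples are R-CNN (Regions with CNN features)[28], Fast R-CNN[29], and Faster R-CNN[30]. For example, Wang et al.[31] used an improved Faster R-CNN to locate group-housed pigs within the pen, with a recognition accuracy of 96.7%. One-stage algorithms skip region proposal generation and directly regress object classes and bounding boxes, as in the YOLO series[32-35]. For example, Jin et al.[36] used YOLOv3[32] to identify individual pigs and reached a mean identification accuracy of 95.16% for sows. One-stage algorithms detect faster than two-stage algorithms.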
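As an illustration of how a one-stage detector such as YOLOv3 regresses a box directly from each grid cell, the sketch below decodes one raw prediction into a pixel-space box using the YOLOv3 parameterization (sigmoid cell offsets, exponential scaling of an anchor). The function and argument names are hypothetical, and the JDE model described in this paper may parameterize its boxes differently.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_yolo_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """YOLOv3-style decoding of one raw prediction (tx, ty, tw, th).

    (cx, cy) is the integer index of the grid cell, the anchor size is in
    pixels, and stride is the downsampling factor of the feature map.
    Returns (center_x, center_y, width, height) in input-image pixels.
    """
    bx = (_sigmoid(tx) + cx) * stride  # offset stays inside the cell
    by = (_sigmoid(ty) + cy) * stride
    bw = anchor_w * math.exp(tw)       # anchor is scaled, never negative
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh
```

With all raw outputs at zero, the box is centered in its cell and has exactly the anchor's size, e.g. cell (6, 6) at stride 32 gives a center of (208, 208).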
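For multi-object tracking, most existing applications follow the Tracking by Detection (TBD) paradigm, i.e., the SDE (Separate Detection and Embedding) model: a detector first outputs detections, and a back-end tracker based on the Kalman filter and the Hungarian algorithm then performs the tracking. Examples are SORT (Simple Online and Realtime Tracking)[37] and DeepSORT[38]; DeepSORT builds on SORT by extracting deep appearance features of the targets for re-identification, which improves multi-object tracking. Zhang et al.[39] combined an improved YOLOv3 with DeepSORT for multi-object tracking of beef cattle, and Zhang et al.[40] combined CenterNet with an optimized DeepSORT to track weaned piglets. These methods are two-step pipelines that detect first and track afterwards; because the detection and tracking modules are separate, tracking is slow and cannot run in real time.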
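The Hungarian step of a SORT-style tracker can be sketched as a minimum-cost assignment on an IoU cost matrix between predicted track boxes and new detections. The pure-Python sketch below assumes boxes given as (x1, y1, x2, y2) and a hypothetical minimum IoU of 0.3 for accepting a match; DeepSORT and JDE add appearance distances to this cost, which is omitted here. In practice `scipy.optimize.linear_sum_assignment` solves the same assignment problem.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment (Kuhn-Munkres with potentials) on a list-of-lists matrix.

    Returns (row, col) pairs; every row is assigned when rows <= cols.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n == 0 or m == 0:
        return []
    if n > m:  # solve the transposed problem so that rows <= cols
        t = [[cost[i][j] for i in range(n)] for j in range(m)]
        return [(r, c) for c, r in hungarian(t)]
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials, 1-indexed
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]


def associate(tracks, detections, iou_min=0.3):
    """Match predicted track boxes to detections; iou_min is a hypothetical gate."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    ious = [[iou(t, d) for d in detections] for t in tracks]
    cost = [[1.0 - x for x in row] for row in ious]
    matches = sorted((r, c) for r, c in hungarian(cost) if ious[r][c] >= iou_min)
    used_t = {r for r, _ in matches}
    used_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections would typically start new tracks, and unmatched tracks are kept alive for a few frames before deletion.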
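This study merges object detection and tracking into a single process and proposes JDE (Joint Detection and Embedding), a real-time, non-contact multi-object tracking algorithm for group-housed pigs. One end-to-end network simultaneously outputs the classification, bounding-box regression, and appearance information of multiple targets, which reduces the running time of the algorithm and enables real-time tracking. The JDE algorithm is compared with the SDE algorithm on the same public test dataset to verify its speed, and with TransTrack[41] to further verify its accuracy and real-time performance.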
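The text does not state how the classification, box-regression, and appearance losses are balanced. The original JDE work combines them with learnable task-uncertainty weights, summing 0.5·(exp(-s_i)·L_i + s_i) over the tasks; the sketch below assumes that scheme, with a hypothetical function name and plain floats standing in for tensors. During training each s_i would be a learnable parameter updated together with the network weights.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Fuse per-task losses L_i with learnable log-variances s_i.

    Each term is 0.5 * (exp(-s_i) * L_i + s_i): a larger s_i down-weights
    its task, while the + s_i term keeps s_i from growing without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    return sum(0.5 * (math.exp(-s) * loss + s) for loss, s in zip(task_losses, log_vars))


# Hypothetical values: classification, box regression, appearance losses.
total = uncertainty_weighted_loss([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

With all s_i = 0 the fused loss is simply half the sum of the task losses.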
Fig. 1 JDE-based multi-object tracking algorithm for group-housed pigs
Fig. 2 Structure of the feature extraction network
Table 1 Structural parameters of the Darknet-53 network
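For a 416×416 input (the training size used in this paper) and a Darknet-53 backbone with three YOLOv3-style prediction heads at strides 32, 16, and 8, the grid sizes and per-head channel counts can be worked out as below. The channel layout (2A classification + 4A box-regression + D embedding channels per head) follows the original JDE design; the anchor count A = 4 and embedding size D = 512 are hypothetical illustration values, not values stated in this paper.

```python
def jde_head_shapes(img_size=416, strides=(32, 16, 8), num_anchors=4, emb_dim=512):
    """Grid size and output channels of each JDE-style prediction head.

    num_anchors (A) and emb_dim (D) are hypothetical illustration values.
    """
    heads = []
    for stride in strides:
        if img_size % stride:
            raise ValueError(f"input size {img_size} is not divisible by stride {stride}")
        g = img_size // stride
        heads.append({
            "stride": stride,
            "grid": (g, g),
            "cls_channels": 2 * num_anchors,  # foreground/background score per anchor
            "box_channels": 4 * num_anchors,  # box-regression deltas per anchor
            "emb_channels": emb_dim,          # one appearance embedding per grid cell
            "total_channels": 6 * num_anchors + emb_dim,
        })
    return heads


def num_candidate_boxes(heads, num_anchors=4):
    """Anchor boxes scored per frame across all heads."""
    return sum(h["grid"][0] * h["grid"][1] * num_anchors for h in heads)


heads = jde_head_shapes()
```

Under these assumptions each frame yields (13² + 26² + 52²) × 4 = 14 196 candidate boxes before confidence filtering and non-maximum suppression.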
Fig. 3 Pig tracking flow combining the Kalman filter and the Hungarian algorithm
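In a tracking flow of this kind, the Kalman filter predicts each track's box in the next frame and then corrects it with the detection matched by the Hungarian algorithm. A minimal constant-velocity sketch is shown below, assuming numpy, a state of box center, width, and height plus their velocities, and hypothetical noise values; the filter actually used may define its state and noise differently (DeepSORT, for instance, tracks aspect ratio instead of width).

```python
import numpy as np


class BoxKalmanFilter:
    """Constant-velocity Kalman filter over (cx, cy, w, h) and their velocities."""

    def __init__(self, box, pos_var=10.0, vel_var=100.0, q=1.0, r=1.0):
        # State x = [cx, cy, w, h, vcx, vcy, vw, vh]; velocities start at zero.
        self.x = np.concatenate([np.asarray(box, dtype=float), np.zeros(4)])
        self.P = np.diag([pos_var] * 4 + [vel_var] * 4)  # hypothetical initial uncertainty
        self.F = np.eye(8)
        self.F[:4, 4:] = np.eye(4)  # position += velocity, with dt = 1 frame
        self.H = np.hstack([np.eye(4), np.zeros((4, 4))])  # only the box is observed
        self.Q = q * np.eye(8)  # hypothetical process noise
        self.R = r * np.eye(4)  # hypothetical measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R           # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P
        return self.x[:4]


kf = BoxKalmanFilter([50, 50, 20, 40])
predicted = kf.predict()
corrected = kf.update([52, 50, 20, 40])
```

After the update the center lies between the prediction and the measurement, and the filter has learned a positive x-velocity that carries into the next prediction.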
Table 2 Public dataset
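First, FFmpeg was used to clip the videos and extract dense, sparse, daytime, and nighttime segments, giving 21 videos across the two datasets. The data were then annotated with DarkLabel: the public dataset contains 11 videos with 3 300 images in total, and the self-built dataset contains 10 videos with 1 000 images. Sample images are shown in Fig. 4. To compare the detection and tracking ability of the model in different scenes, different videos were selected for training and testing, and no training video was used for testing. Three experiments were designed. Experiment 1 uses videos 4, 6, and 12, all dense daytime scenes, as the test set and the remaining videos as the training set. Experiment 2 uses videos 2, 5, and 8 as the test set, where videos 5 and 8 are sparse and dense nighttime scenes, respectively, and video 2 is a sparse daytime scene; the remaining videos form the training set. Experiment 3 uses 7 videos of the self-built dataset (videos 3, 11, 14, 16, 18, 19, and 21) as the training set and the other 3 videos (videos 13, 17, and 20) as the test set. Pig activity levels are defined as follows, based on manual observation of the videos. During the day (10:00-12:30), eating and playing are frequent, so this period is defined as the daytime high activity level. During the day (12:30-17:00) or at night (17:00-20:00), eating and playing are less frequent than during 10:00-12:30, so these periods are defined as the daytime or nighttime medium activity level. During the day (7:00-10:00) or at night (20:00-7:00), eating and playing are rare and lying dominates, so these periods are defined as the daytime or nighttime low activity level.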
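The activity-level definitions above amount to a lookup from clock time to a (period, level) pair. The sketch below encodes them, assuming half-open intervals [start, end) because the text does not say which period a boundary time such as 12:30 belongs to; the function name is hypothetical.

```python
from datetime import time

# (start, end, period, level); intervals are assumed half-open [start, end).
_SCHEDULE = [
    (time(7, 0), time(10, 0), "day", "low"),
    (time(10, 0), time(12, 30), "day", "high"),
    (time(12, 30), time(17, 0), "day", "medium"),
    (time(17, 0), time(20, 0), "night", "medium"),
]


def activity_level(t):
    """Map a clock time to (day/night, high/medium/low) per the definitions above."""
    for start, end, period, level in _SCHEDULE:
        if start <= t < end:
            return period, level
    # The remaining span, 20:00-07:00, wraps past midnight.
    return "night", "low"
```

Handling the overnight span as the fall-through case avoids a special comparison across midnight.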
Fig. 4 Sample images from the datasets
All experiments were run on the same computer: a 12th Gen Intel(R) Core i9-12900KF CPU, an NVIDIA GeForce RTX 3090 GPU, 32 GB of RAM, 64-bit Linux, PyTorch 1.7.1, Python 3.8, and CUDA 11.0.
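During training, images were resized to 416×416 pixels, the batch size was 32, the initial learning rate was 0.01, and the momentum was 0.9. The model was trained for 30 epochs with stochastic gradient descent (SGD), and the parameters with the highest accuracy during training were saved for testing.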
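The training settings above can be collected into a small configuration object; in PyTorch the optimizer itself would be `torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)`. The sketch below spells out, on plain floats, the momentum update that this optimizer applies when dampening and Nesterov momentum are off, plus the best-checkpoint selection described in the text. All names are hypothetical, and the validation metric used to rank epochs is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    img_size: int = 416   # inputs resized to 416x416 pixels
    batch_size: int = 32
    lr: float = 0.01      # initial learning rate
    momentum: float = 0.9
    epochs: int = 30


def sgd_momentum_step(params, grads, velocity, lr, momentum):
    """In-place SGD with momentum: v = momentum * v + g, then p = p - lr * v."""
    for k, g in enumerate(grads):
        velocity[k] = momentum * velocity[k] + g
        params[k] -= lr * velocity[k]


def best_epoch(val_scores):
    """Index and value of the highest validation score; the earliest epoch wins ties."""
    i = max(range(len(val_scores)), key=lambda e: (val_scores[e], -e))
    return i, val_scores[i]


cfg = TrainConfig()
```

Keeping only the best-scoring epoch means later epochs that overfit cannot replace an earlier, better checkpoint.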
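Precision (P), recall (R), and mean Average Precision (mAP) are used to evaluate detection performance. Precision measures how accurately the model detects pig targets, as in Eq. (5), where $D_{TP}$ is the number of correctly detected targets and $D_{FP}$ is the number of incorrectly detected targets.
$$P = \frac{D_{TP}}{D_{TP} + D_{FP}} \tag{5}$$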
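A sketch of the detection metrics, assuming detections have already been matched to ground truth (for example at an IoU threshold of 0.5, which this section does not specify) and using all-point interpolated AP. Recall is taken as D_TP/(D_TP + D_FN), where D_FN (missed pigs) is a hypothetical symbol not defined in this section. With a single class (pig), mAP reduces to the AP of that class.

```python
def precision(d_tp, d_fp):
    """Eq. (5): fraction of detections that are correct."""
    return d_tp / (d_tp + d_fp) if d_tp + d_fp else 0.0


def recall(d_tp, d_fn):
    """Fraction of ground-truth pigs that were detected (d_fn: missed pigs, assumed symbol)."""
    return d_tp / (d_tp + d_fn) if d_tp + d_fn else 0.0


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP from scored detections already labelled TP/FP."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Add sentinel points, then make precision non-increasing in recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Area under the resulting step-shaped precision-recall curve.
    return sum((mrec[k] - mrec[k - 1]) * mpre[k] for k in range(1, len(mrec)))


def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)
```

For three detections scored 0.9, 0.8, 0.7 where the middle one is a false positive and there are two pigs, AP = 0.5 × 1 + 0.5 × 2/3 = 5/6.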
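Multiple Object Tracking Accuracy (MOTA) and the IDF1 score (ID F1 score) are the main evaluation metrics for multi-object tracking. MOTA measures how well the tracker detects targets and maintains their trajectories. IDF1 is the F1 score computed with target identity (ID) labels taken into account; because it incorporates IDs, IDF1 places more weight on the ability to keep each target's trajectory. MOTA is calculated as in Eq. (8).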
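Assuming Eq. (8) is the standard CLEAR MOT definition, MOTA = 1 - Σ(FN + FP + IDS)/ΣGT, and IDF1 has its standard form 2·IDTP/(2·IDTP + IDFP + IDFN), both can be sketched as below, with all counts summed over every frame. The symbols FN, FP, GT, IDTP, IDFP, and IDFN and the function names are assumptions not defined in this section.

```python
def mota(num_fn, num_fp, num_ids, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / GT, counts summed over all frames.

    Can be negative when the tracker makes more errors than there are objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (num_fn + num_fp + num_ids) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1: harmonic mean of ID precision and ID recall."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

Because IDS enters MOTA only as one count among many, a tracker can score a high MOTA while still swapping identities; IDF1 penalizes such swaps directly.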
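Other related metrics are the number of fragmentations (FM); mostly tracked targets (MT), whose trajectories are tracked for more than 80% of their length; mostly lost targets (ML), tracked for less than 20%; partially tracked targets (PT), tracked for between 20% and 80% inclusive; the number of identity switches (IDS), i.e., how often a trajectory changes its target ID; and the average frames per second (FPS).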
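The per-target metrics above can be sketched from a per-frame record of which tracker ID, if any, was matched to one ground-truth pig. The MT/PT/ML thresholds follow the definitions in the text (MT above 80%, ML below 20%); the ID-switch and fragmentation rules are assumed CLEAR MOT-style conventions, and evaluation toolkits such as py-motmetrics may count edge cases differently. Function names are hypothetical.

```python
def track_stats(matched_ids, mt_thresh=0.8, ml_thresh=0.2):
    """Statistics for one ground-truth target.

    matched_ids holds, for each frame in which the target is present, the
    tracker ID matched to it, or None if the target was missed.
    """
    n = len(matched_ids)
    tracked = [i for i, tid in enumerate(matched_ids) if tid is not None]
    ratio = len(tracked) / n if n else 0.0
    if ratio > mt_thresh:
        category = "MT"
    elif ratio < ml_thresh:
        category = "ML"
    else:
        category = "PT"
    # IDS: the matched ID differs from the last ID matched to this target.
    ids, last = 0, None
    for tid in matched_ids:
        if tid is None:
            continue
        if last is not None and tid != last:
            ids += 1
        last = tid
    # FM: tracked -> untracked transitions between the first and last tracked frame.
    fm = 0
    if tracked:
        span = matched_ids[tracked[0]:tracked[-1] + 1]
        fm = sum(1 for a, b in zip(span, span[1:]) if a is not None and b is None)
    return {"ratio": ratio, "category": category, "ids": ids, "fm": fm}


def fps(num_frames, elapsed_seconds):
    """Average processing speed in frames per second."""
    return num_frames / elapsed_seconds
```

Summing `ids` and `fm` over all targets gives the dataset-level IDS and FM, and counting categories gives MT, PT, and ML.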
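Table 3 Object detection results of the JDE model
Table 4 Multi-object tracking results of the JDE model
Fig. 6 Visualization results for different pig distributions in daytime and nighttime
Table 5 Multi-object tracking results of the SDE model
Fig. 7 Comparison of visualization results of the JDE and SDE models for different pig distributions
Table 6 Results of the TransTrack model
Fig. 8 Comparison of visualization results of the JDE and TransTrack models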
[1] Rowe E, Dawkins M S, Gebhardt-Henrich S G. A systematic review of precision livestock farming in the poultry sector: Is technology focussed on improving bird welfare?[J]. Animals (Basel), 2019, 9(9): 614.
[2] Cowton J, Kyriazakis I, Plötz T, et al. A combined deep learning GRU-autoencoder for the early detection of respiratory disease in pigs using multiple environmental sensors[J]. Sensors (Basel), 2018, 18(8): 2521.
[3] Fournel S, Rousseau A N, Laberge B. Rethinking environment control strategy of confined animal housing systems through precision livestock farming[J]. Biosystems Engineering, 2017, 155: 96-123.
[4] Zambelis A, Wolfe T, Vasseur E. Technical note: Validation of an ear-tag accelerometer to identify feeding and activity behaviors of tiestall-housed dairy cattle[J]. Journal of Dairy Science, 2019, 102(5): 4536-4540.
[5] Giovanetti V, Decandia M, Molle G, et al. Automatic classification system for grazing, ruminating and resting behaviour of dairy sheep using a tri-axial accelerometer[J]. Livestock Science, 2017, 196: 42-48.
[6] McLennan K M, Skillings E A, Rebelo C J B, et al. Technical note: Validation of an automatic recording system to assess behavioural activity level in sheep (Ovis aries)[J]. Small Ruminant Research, 2015, 127: 92-96.
[7] Chen C, Zhu W X, Ma C H, et al. Image motion feature extraction for recognition of aggressive behaviors among group-housed pigs[J]. Computers and Electronics in Agriculture, 2017, 142: 380-387.
[8] Chen C, Zhu W X, Guo Y Z, et al. A kinetic energy model based on machine vision for recognition of aggressive behaviours among group-housed pigs[J]. Livestock Science, 2018, 218: 70-78.
[9] Chen C, Zhu W X, Liu D, et al. Detection of aggressive behaviours in pigs using a RealSense depth sensor[J]. Computers and Electronics in Agriculture, 2019, 166: 105003.
[10] Chen C, Zhu W X, Steibel J, et al. Recognition of aggressive episodes of pigs based on convolutional neural network and long short-term memory[J]. Computers and Electronics in Agriculture, 2020, 169: 105166.
[11] Alameer A, Kyriazakis I, Bacardit J. Automated recognition of postures and drinking behaviour for the detection of compromised health in pigs[J]. Scientific Reports, 2020, 10(1): 13665.
[12] Lao F, Brown B, Stinn J P, et al. Automatic recognition of lactating sow behaviors through depth image processing[J]. Computers and Electronics in Agriculture, 2016, 125: 56-62.
[13] Zhu W X, Guo Y Z, Jiao P P, et al. Recognition and drinking behaviour analysis of individual pigs based on machine vision[J]. Livestock Science, 2017, 205: 129-136.
[14] Leonard S M, Xin H, Brown-Brandl T M, et al. Development and application of an image acquisition system for characterizing sow behaviors in farrowing stalls[J]. Computers and Electronics in Agriculture, 2019, 163: 104866.
[15] Yang A Q, Huang H S, Zheng B, et al. An automatic recognition framework for sow daily behaviours based on motion and image analyses[J]. Biosystems Engineering, 2020, 192: 56-71.
[16] Zhang Y Q, Cai J H, Xiao D Q, et al. Real-time sow behavior detection based on deep learning[J]. Computers and Electronics in Agriculture, 2019, 163: 104884.
[17] Nasirahmadi A, Hensel O, Edwards S, et al. Automatic detection of mounting behaviours among pigs using image analysis[J]. Computers and Electronics in Agriculture, 2016, 124: 295-302.
[18] Li D, Chen Y F, Zhang K F, et al. Mounting behaviour recognition for pigs based on deep learning[J]. Sensors (Basel), 2019, 19(22): 4924.
[19] Nasirahmadi A, Sturm B, Olsson A, et al. Automatic scoring of lateral and sternal lying posture in grouped pigs using image processing and support vector machine[J]. Computers and Electronics in Agriculture, 2019, 156: 475-481.
[20] Zheng C, Zhu X M, Yang X F, et al. Automatic recognition of lactating sow postures from depth images by deep learning detector[J]. Computers and Electronics in Agriculture, 2018, 147: 51-63.
[21] Zhu X M, Chen C X, Zheng B, et al. Automatic recognition of lactating sow postures by refined two-stream RGB-D faster R-CNN[J]. Biosystems Engineering, 2020, 189: 116-132.
[22] Zheng C, Yang X F, Zhu X M, et al. Automatic posture change analysis of lactating sows by action localisation and tube optimisation from untrimmed depth videos[J]. Biosystems Engineering, 2020, 194: 227-250.
[23] Jorquera-Chavez M, Fuentes S, Dunshea F R, et al. Remotely sensed imagery for early detection of respiratory disease in pigs: A pilot study[J]. Animals (Basel), 2020, 10(3): 451.
[24] Jorquera-Chavez M, Fuentes S, Dunshea F R, et al. Using imagery and computer vision as remote monitoring methods for early detection of respiratory disease in pigs[J]. Computers and Electronics in Agriculture, 2021, 187: 106283.
[25] Zhao K X, He D J. Target detection method for moving cows based on background subtraction[J]. International Journal of Agricultural and Biological Engineering, 2015, 8(1): 42-49.
[26] Zhang Y G, Zheng J, Zhang C, et al. An effective motion object detection method using optical flow estimation under a moving camera[J]. Journal of Visual Communication and Image Representation, 2018, 55: 215-228.
[27] Yu Xin, Hou Xiaojiao, Lu Huanda, et al. Anomaly detection of fish school behavior based on features statistical and optical flow methods[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2014, 30(2): 162-168. (in Chinese with English abstract)
[28] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Columbus, OH, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014: 580-587.
[29] Girshick R. Fast R-CNN[C]// Santiago, Chile, IEEE International Conference on Computer Vision (ICCV), 2015: 1440-1448.
[30] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[31] Wang Hao, Zeng Yaqiong, Pei Hongliang, et al. Recognition and application of pigs' position in group pens based on improved Faster R-CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(21): 201-209. (in Chinese with English abstract)
[32] Redmon J, Farhadi A. YOLOv3: An incremental improvement[EB/OL]. 2018-04-08, https://pjreddie.com/media/files/papers/YOLOv3.pdf.
[33] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Las Vegas, NV, USA, Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 779-788.
[34] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Honolulu, HI, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 7263-7271.
[35] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[EB/OL]. 2020-04-23, https://arxiv.org/pdf/2004.10934.pdf.
[36] Jin Yao, He Xiuwen, Wan Shizhu, et al. Individual pig identification method based on YOLOv3[J]. Journal of Chinese Agricultural Mechanization, 2021, 42(2): 178-183. (in Chinese with English abstract)
[37] Bewley A, Ge Z Y, Ott L, et al. Simple online and realtime tracking[C]//Phoenix, Arizona, USA. IEEE International Conference on Image Processing (ICIP), 2016: 3464-3468.
[38] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]//Beijing, China. IEEE International Conference on Image Processing (ICIP), 2017: 3645-3649.
[39] Zhang Hongming, Wang Run, Dong Peijie, et al. Multi-object tracking method for beef cattle based on DeepSORT algorithm[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(4): 249-256. (in Chinese with English abstract)
[40] Zhang Wei, Shen Mingxia, Liu Longshen, et al. Research on weaned piglet target tracking method based on CenterNet collocation optimized DeepSORT algorithm[J]. Journal of Nanjing Agricultural University, 2021, 44(5): 973-981. (in Chinese with English abstract)
[41] Sun P Z, Cao J K, Jiang Y, et al. TransTrack: Multiple object tracking with transformer[EB/OL]. 2021-05-04, https://arxiv.org/abs/2012.15460v1.
[42] Psota E T, Schmidt T, Mote B, et al. Long-term tracking of group-housed livestock using keypoint detection and MAP estimation for individual animal identification[J]. Sensors (Basel), 2020, 20(13): 3670.
[43] Tu S Q, Yuan W J, Liang Y, et al. Automatic detection and segmentation for group-housed pigs based on PigMS R-CNN[J]. Sensors (Basel), 2021, 21(9): 3251.
Multiple object tracking of group-housed pigs based on JDE model
Tu Shuqin, Huang Lei, Liang Yun※, Huang Zhengxin, Li Chengjie, Liu Xiaolong
Pig production has always been a pillar of the livestock industry in China, so the pig industry is closely tied to food safety, social stability, and the coordinated development of the national economy. Intelligent video surveillance can greatly support large-scale animal husbandry under the current labor shortage, and it is necessary to accurately track group-housed pigs and identify their abnormal behavior in the breeding scene. Much effort has been devoted to Multiple Object Tracking (MOT) for pig detection and tracking. Most methods follow the Tracking By Detection (TBD) paradigm, i.e., the Separate Detection and Embedding (SDE) model, which has two parts: a detector first detects the pig objects, and a tracker based on the Kalman filter and the Hungarian algorithm (SORT or DeepSORT) then tracks them. Running detection and association as separate steps increases the running and training time of the model, so this dominant MOT strategy cannot fully meet the real-time tracking requirements of group-housed pigs. In this study, a Joint Detection and Embedding (JDE) model was proposed to automatically detect pigs and track each individual in complex scenes (day or night, sparse or dense). The core of the JDE model is to integrate the detector and the embedding model into a single network for a real-time MOT system. Specifically, the JDE model incorporates the appearance model into a single-shot detector, so that detections and their corresponding appearance embeddings are output simultaneously, which reduces runtime and improves operational efficiency. The JDE model is trained with a single multi-task learning loss that combines three losses: classification, box regression, and appearance. This design has three merits.
First, with the multi-task learning loss, object detection and appearance embedding are learned in a shared model, which reduces memory usage. Second, the forward pass for all tasks is computed at once, which reduces the overall inference time and improves the efficiency of the MOT system. Third, all prediction heads share the same low-level features and feature pyramid network architecture, which benefits the performance of each head. Finally, a data association module processes the outputs of the detection and appearance heads of the JDE model to produce position predictions and ID tracking for multiple objects. The JDE model was validated on a dedicated dataset under a variety of settings. The dataset contains 21 video segments and 4 300 images annotated with the DarkLabel video annotation software, in two parts: a public dataset with 11 video sequences and 3 300 images, and a private dataset with 10 video segments and 1 000 images. The experimental results show that the mean Average Precision (mAP), Multiple Object Tracking Accuracy (MOTA), IDF1 score, and FPS of the JDE model on all test videos were 92.9%, 83.9%, 79.6%, and 73.9 frames/s, respectively. The JDE model was also compared with the SDE model and the TransTrack method on the public dataset. Compared with the SDE model on the same test dataset, the JDE model improved FPS by 340% and MOTA by 0.5 percentage points, indicating that the JDE model meets the real-time requirement of MOT. Compared with the TransTrack model, the JDE model improved MOTA and IDF1 by 10.4 and 6.6 percentage points, respectively, and FPS by 324%.
Visual tracking results demonstrated that the JDE model achieved better detection and tracking than the SDE and TransTrack models in all four scenarios: dense day, sparse day, dense night, and sparse night. These findings can provide effective and accurate detection for the rapid tracking of group-housed pigs in complex farming scenes.
object detection; object tracking; joint detection and tracking; data association; group-housed pigs
Tu Shuqin, Huang Lei, Liang Yun, et al. Multiple object tracking of group-housed pigs based on JDE model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(17): 186-195. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2022.17.020 http://www.tcsae.org