杨世强 罗晓宇 乔丹 柳培蕾 李德信
CLC number: TP391.4
Document code: A
Abstract: Concerning the problems that continuous action recognition has received relatively little research attention in the field of action recognition and that single algorithms perform poorly on continuous actions, a segmentation and recognition method for continuous actions was proposed on the basis of single-action modeling, combining the sliding window method with dynamic programming. Firstly, the single action model was constructed based on the combination of Deep Belief Network and Hidden Markov Model (DBN-HMM). Secondly, the log-likelihood of the trained action models and the sliding window method were used to score the continuous action sequence and detect the initial segmentation points. Thirdly, dynamic programming was used to optimize the locations of the segmentation points and to recognize the single actions. Finally, segmentation and recognition experiments were conducted on the public action database MSR Action3D. The experimental results show that dynamic programming based on sliding windows can optimize the selection of segmentation points and thereby improve recognition accuracy, so the proposed method can be used for continuous action recognition.
Key words: Hidden Markov Model (HMM); action segmentation; action recognition; sliding window; dynamic programming
0 Introduction
Human action recognition has become a research hotspot in many fields in recent years [1], such as video surveillance [2] and human-computer interaction [3]. With the aging of the population, service robots will play an important role in daily life, and observing and responding to human actions will become one of their basic skills [4]. Action recognition is gradually being applied to many aspects of people's life and work, and it has far-reaching application value.
Action behavior generally takes the form of continuous actions composed of multiple single actions. According to the order of segmentation and recognition, methods can be divided into direct segmentation and indirect segmentation. Direct segmentation first determines segment boundaries from simple changes in parameter values and then recognizes the resulting segments; for example, Bai et al. [5] performed initial segmentation of action sequences according to changes in joint velocity and joint angle. Such methods are simple and fast, but their segmentation error is large for more complex continuous actions (a simplified sketch of this idea is given after this paragraph). Indirect segmentation performs segmentation and recognition simultaneously: in practice the two are mutually coupled, segmentation results affect recognition, and segmentation generally requires the support of recognition. Algorithms widely used for continuous action recognition include Dynamic Time Warping (DTW) [6], Continuous Dynamic Programming (CDP) [7], and the Hidden Markov Model (HMM).
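As a concrete illustration of direct segmentation (a minimal sketch, not the exact method of [5]; the array layout and threshold are illustrative assumptions), the following cuts a skeleton sequence wherever the mean joint speed falls below a hypothetical threshold:

```python
import numpy as np

def direct_segment(joints, speed_threshold=0.01):
    """Cut a skeleton sequence at low-motion frames.

    joints: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    Returns frame indices used as candidate segment boundaries.
    """
    # Per-frame speed: mean Euclidean displacement of all joints
    # between consecutive frames.
    displacement = np.linalg.norm(np.diff(joints, axis=0), axis=2)  # (T-1, J)
    speed = displacement.mean(axis=1)                               # (T-1,)
    # Frames whose overall speed drops below the threshold become cuts.
    return [t + 1 for t in range(len(speed)) if speed[t] < speed_threshold]
```

Because the cut decision depends only on a fixed threshold, such a scheme cannot distinguish a genuine action boundary from a brief pause inside a complex action, which is the segmentation-error problem noted above.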
Gong et al. [8] used Dynamic Manifold Warping (DMW) to compute the similarity between two multivariate time series and thus achieve action segmentation and recognition. Zhu et al. [9] used an online segmentation method based on feature displacement to divide the feature sequence into posture feature segments and motion feature segments, and computed, through online model matching, the likelihood that each feature segment can be labeled as an extracted key posture or atomic motion. Lei et al. [10] proposed a hierarchical framework (CNN-HMM) combining a Convolutional Neural Network (CNN) and an HMM that segments and recognizes continuous actions simultaneously; it extracts effective and robust action features, achieves good recognition results on action video sequences, and the HMM component is highly extensible. Kulkarni et al. [11] designed a visual alignment technique, Dynamic Frame Warping (DFW), which trains a super template for each action video and can segment and recognize multiple actions; however, at test time the computation of distances between test-sequence frames and the templates is expensive, and compared with probabilistic methods its model-learning capability is weaker. Evangelidis et al. [12] used sliding windows to construct frame-wise Fisher vectors classified by a multi-class Support Vector Machine (SVM); because the sliding window fixes the length of the action sequence, recognition is poor for actions of the same class whose lengths differ greatly.
To address the above problems, this paper first models each single action with DBN-HMM, a composite model combining a Deep Belief Network (DBN) with an HMM; this composite model has strong modeling and learning capabilities for time-series data. A scoring mechanism combined with the sliding window method is then used to detect initial segmentation points, and finally dynamic programming is used to optimize the segmentation points and recognize the single actions. The sliding window reduces the computational complexity of dynamic programming, while dynamic programming compensates for the fixed window length, so that the optimal segmentation points can ultimately be detected.
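To make the scoring step concrete, the following is a minimal sketch, not the paper's exact implementation: it assumes each trained action model exposes a `score()` method returning the log-likelihood of an observation sequence (as, e.g., hmmlearn's HMM classes do), and the window length and stride are illustrative values.

```python
import numpy as np

def window_scores(features, models, win_len=30, stride=5):
    """Score every sliding window under each trained action model.

    features: (T, D) frame-level feature sequence.
    models:   dict name -> trained model with a score(X) method that
              returns the log-likelihood of observation sequence X.
    Returns (starts, scores), where scores[i, k] is the length-normalized
    log-likelihood of window i under model k.
    """
    starts = list(range(0, len(features) - win_len + 1, stride))
    names = list(models)
    scores = np.empty((len(starts), len(names)))
    for i, s in enumerate(starts):
        window = features[s:s + win_len]
        for k, name in enumerate(names):
            # Normalize by window length so windows are comparable.
            scores[i, k] = models[name].score(window) / win_len
    return starts, scores

def initial_cuts(starts, scores):
    """Place an initial segmentation point wherever the best-scoring
    model changes between consecutive windows."""
    best = scores.argmax(axis=1)
    return [starts[i] for i in range(1, len(best)) if best[i] != best[i - 1]]
```

These initial cuts are only coarse (they are quantized by the stride and biased by the fixed window length), which is why the subsequent dynamic-programming stage refines their locations.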
1 Single Action Modeling
For continuous action recognition, each single action in the continuous sequence is first modeled separately; here the DBN-HMM model, which combines a DBN with an HMM, is used for action modeling.
1.1 Feature Extraction
A human action can be represented as rotations of different limbs in three-dimensional space. Combined with a human skeleton model composed of joints, a human posture can be represented by the 3D spatial coordinates of 20 joints: head, left/right shoulder, shoulder center, left/right elbow, spine center, left/right wrist, left/right hand, left/right hip, hip center, left/right knee, left/right ankle, and left/right foot. In the limb angle model, a limb is represented by the relative spatial position of two adjacent joints among these 20. Assuming that all joints extend outward from the spine joint, for a limb formed by two adjacent joints, the joint closer to the spine is defined as the parent joint and the other as the child joint. The relative position of each limb is represented by converting the world coordinate system into a local spherical coordinate system: the parent joint of each limb is taken as the origin, the length of the line from the parent joint to the child joint is r, the angle between this line and the Z axis is φ, and the angle between its projection onto the XOY plane and the X axis is θ, so a limb angle model can be expressed as (r, θ, φ), as shown in Fig. 1. Since the distance r is affected by body size, r is discarded and the limb angle model is represented by (θ, φ).
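The coordinate conversion described above can be sketched as follows; the joint array layout and the parent/child pairing in `PARENT` are illustrative assumptions, not the paper's exact joint indexing.

```python
import numpy as np

# Hypothetical child -> parent joint indices; in the paper every limb is a
# (parent, child) pair radiating outward from the spine joint.
PARENT = {1: 0, 2: 1}

def limb_angles(frame):
    """Convert one frame of 3D joint positions into limb angle features.

    frame: (J, 3) world coordinates of the J skeleton joints.
    Each child joint is expressed in a spherical coordinate system whose
    origin is its parent joint; the length r is dropped to remove the
    influence of body size, keeping only (theta, phi) per limb.
    """
    feats = []
    for child, parent in PARENT.items():
        v = frame[child] - frame[parent]              # limb vector
        r = np.linalg.norm(v)
        theta = np.arctan2(v[1], v[0])                # XOY projection vs. X axis
        phi = np.arccos(v[2] / r) if r > 0 else 0.0   # angle with Z axis
        feats.extend([theta, phi])
    return np.array(feats)
```

Stacking `limb_angles` over all frames yields the (θ, φ) feature sequence that serves as the observation input to the DBN-HMM.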
5 Conclusion
To address the problems that continuous action recognition has received relatively little research attention and that single algorithms perform poorly on continuous actions, this paper presented a segmentation and recognition method for continuous actions that combines the sliding window method with dynamic programming. The constructed DBN-HMM has strong modeling capability, and detecting segmentation points by combining sliding windows with dynamic programming makes the two methods complementary: computational complexity is reduced while the fixed-length limitation is removed. Experimental results show that the proposed method achieves good recognition results in the segmentation and recognition of complex continuous actions. The recognition rate still has room for improvement; future work will consider segmentation and recognition on self-collected continuous action videos.
References
[1] HU Q, QIN L, HUANG Q M. A survey on visual human action recognition [J]. Chinese Journal of Computers, 2013, 36(12): 2512-2524. (in Chinese)
[2] AGGARWAL J K, RYOO M S. Human activity analysis: a review [J]. ACM Computing Surveys, 2011, 43(3): Article No. 16.
[3] KOPPULA H S, SAXENA A. Anticipating human activities using object affordances for reactive robotic response [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 1-14.
[4] ZHANG C, TIAN Y. RGB-D camera-based daily living activity recognition [J]. Journal of Computer Vision and Image Processing, 2012, 2(4): 1-7.
[5] BAI D T, ZHANG L, HUANG H. Recognition continuous human actions from RGB-D videos [J]. China Science Paper, 2016(2): 168-172. (in Chinese)
[6] DARRELL T, PENTLAND A. Space-time gestures [C]// Proceedings of the 1993 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 1993: 335-340.
[7] OKA R. Spotting method for classification of real world data[J]. Computer Journal, 1998, 41(8): 559-565.
[8] GONG D, MEDIONI G, ZHAO X. Structured time series analysis for human action segmentation and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1414-1427.
[9] ZHU G, ZHANG L, SHEN P, et al. An online continuous human action recognition algorithm based on the Kinect sensor [J]. Sensors, 2016, 16(2): 161-179.
[10] LEI J, LI G, ZHANG J, et al. Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model[J]. IET Computer Vision, 2016, 10(6): 537-544.
[11] KULKARNI K, EVANGELIDIS G, CECH J, et al. Continuous action recognition based on sequence alignment[J]. International Journal of Computer Vision, 2015, 112(1): 90-114.
[12] EVANGELIDIS G D, SINGH G, HORAUD R. Continuous gesture recognition from articulated poses [C]// Proceedings of the 2014 European Conference on Computer Vision. Cham: Springer, 2014: 595-607.
[13] SONG Y, GU Y, WANG P, et al. A Kinect based gesture recognition algorithm using GMM and HMM [C]// Proceedings of the 2013 6th International Conference on Biomedical Engineering and Informatics. Piscataway, NJ: IEEE, 2013: 750-754.
[14] VITERBI A J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm [J]. IEEE Transactions on Information Theory, 1967, 13(2): 260-269.
[15] TAYLOR G W, HINTON G E, ROWEIS S. Modeling human motion using binary latent variables [C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007: 1345-1352.
[16] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[17] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points [C]// Proceedings of the 2010 IEEE Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2010: 9-14.