
2022-01-22 10:34:31 Yang Jianwei, Yan Zhenhua, Wang Cailing
Computer Era, 2022, Issue 1


Unsupervised feature embedding learning via deep clustering

Yang Jianwei1, Yan Zhenhua2, Wang Cailing1

(1. School of Automation, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China;

2. Wuerth Electronic Tianjin Co., Ltd.)

Abstract: To improve the discriminative power of the image features produced by unsupervised embedding learning, an unsupervised learning method based on deep clustering is proposed. The embedded features of the images are clustered to obtain pseudo-class information about the images, and the network is then optimized by minimizing a clustering loss, so that the model learns highly discriminative image features. Image retrieval results on three standard datasets show that the method is effective and outperforms most current methods.

Key words: unsupervised learning; embedding learning; deep clustering

CLC number: TP391          Document code: A     Article ID: 1006-8228(2022)01-19-03

0 Introduction




1 Deep clustering



[D(i)=\min_{j}\,\lVert f_\theta(x_i)-c_j\rVert_2,\quad j=1,2,\dots,m]  ⑴


[P(i)=\frac{D(i)^2}{\sum_{i=1}^{n}D(i)^2}] ⑵
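Equations (1) and (2) have the form of distance-weighted center seeding: D(i) is the distance from an embedding to its nearest existing center, and P(i) is that distance squared, normalized over all samples. The sketch below illustrates this reading in plain Python. The function names, the uniform choice of the first center, and the use of the unsquared Euclidean distance for D(i) are assumptions made for illustration, not details confirmed by the text.

```python
import math
import random


def nearest_center_distance(feature, centers):
    """D(i): Euclidean distance from one embedding to its nearest center (Eq. 1)."""
    return min(math.dist(feature, c) for c in centers)


def selection_probabilities(features, centers):
    """P(i): squared nearest-center distance, normalized over all samples (Eq. 2)."""
    squared = [nearest_center_distance(f, centers) ** 2 for f in features]
    total = sum(squared)
    if total == 0.0:
        # Every sample coincides with a center; fall back to a uniform choice.
        return [1.0 / len(features)] * len(features)
    return [s / total for s in squared]


def seed_centers(features, m, rng):
    """Pick m initial centers: the first uniformly (assumption), the rest by P(i)."""
    centers = [list(rng.choice(features))]
    while len(centers) < m:
        probs = selection_probabilities(features, centers)
        idx = rng.choices(range(len(features)), weights=probs, k=1)[0]
        centers.append(list(features[idx]))
    return centers


# Toy 2-D "embeddings" standing in for f_theta(x_i).
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
probs = selection_probabilities(features, [(0.0, 0.0)])
centers = seed_centers(features, 2, random.Random(0))
```

With a single center at the origin, the sample sitting on that center gets probability zero and the far samples dominate, which is what spreads the initial centers apart.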



[\text{s.t.}\quad y_i^{\mathsf{T}}\mathbf{1}_k=1] ⑶



[L_c=\frac{1}{N}\sum_{i=1}^{n}\frac{\lVert f_\theta(x_i)-c^{+}\rVert_2}{\lVert f_\theta(x_i)-c^{-}\rVert_2}]  ⑷
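One plausible reading of Eq. (4) is a ratio loss: for each sample, the distance to its assigned (positive) center c+ divided by the distance to a negative center c-, averaged over the samples. The sketch below follows that reading. Taking c- as the nearest non-assigned center, using unsquared Euclidean distances, and adding a small `eps` for numerical safety are all assumptions; the surviving text does not define these details.

```python
import math


def clustering_loss(features, centers, assignments, eps=1e-12):
    """Mean ratio of positive-center distance to negative-center distance.

    assignments[i] is the index of the center that sample i belongs to.
    The negative center is taken to be the nearest other center (assumption).
    """
    total = 0.0
    for feature, pos in zip(features, assignments):
        d_pos = math.dist(feature, centers[pos])
        d_neg = min(
            math.dist(feature, c) for j, c in enumerate(centers) if j != pos
        )
        total += d_pos / (d_neg + eps)  # eps guards against division by zero
    return total / len(features)


# Two well-separated centers and one sample near each of them.
centers = [(0.0, 0.0), (10.0, 0.0)]
features = [(1.0, 0.0), (9.0, 0.0)]
loss_good = clustering_loss(features, centers, [0, 1])  # correct assignments
loss_bad = clustering_loss(features, centers, [1, 0])   # swapped assignments
```

Under this reading the loss shrinks as embeddings move toward their own center and away from the others, so minimizing it pulls same-cluster features together.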

During training, the network and the image features are updated progressively, while the cluster centers are updated once every 20 epochs.
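The update schedule can be sketched as a loop that re-clusters on a fixed period and trains on every epoch. The callback names `recluster` and `train_one_epoch` are hypothetical placeholders, and running the first clustering at epoch 0 is an assumption; the text only states the 20-epoch period.

```python
def run_training(num_epochs, recluster, train_one_epoch, recluster_every=20):
    """Train for num_epochs, refreshing cluster centers every recluster_every epochs.

    Returns the list of epochs at which the centers were refreshed.
    """
    recluster_epochs = []
    for epoch in range(num_epochs):
        if epoch % recluster_every == 0:
            recluster()  # recompute centers from the current embeddings
            recluster_epochs.append(epoch)
        train_one_epoch(epoch)  # update the network with the clustering loss
    return recluster_epochs


# Dummy callbacks that only count how often they are called.
calls = {"recluster": 0, "train": 0}


def fake_recluster():
    calls["recluster"] += 1


def fake_train(epoch):
    calls["train"] += 1


refresh_epochs = run_training(60, fake_recluster, fake_train)
```

Keeping the centers fixed between refreshes gives the network a stable set of pseudo-labels to fit for a while before they are recomputed.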

2 Experiments

2.1 Datasets


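2.2 Experimental settings

The experiments use GoogLeNet[21] pretrained on ImageNet as the feature extraction network, and the network is fine-tuned. A 512-dimensional fully connected layer is added after the network's global pooling layer as the output layer. In the training stage, all images are cropped to 227×227; in the test stage, each image is center-cropped and then used as the test input. The Adam optimizer[22] is used with a momentum of 0.9, and the weight decay is set to 0.0005. For the clustering module, 100 cluster centers are set for CUB200 and Cars196, and 10000 cluster centers for SOP. The whole network is trained on NVIDIA GeForce RTX 2080Ti GPUs, and the image retrieval performance R@K is used as the standard evaluation metric.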
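As an illustration of the R@K metric, the sketch below uses the common retrieval definition: a query counts as a hit if at least one of its K nearest neighbors in embedding space, excluding the query itself, shares its class label. The function name, the Euclidean distance, and the toy data are assumptions for illustration; the text does not spell out the exact evaluation protocol.

```python
import math


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with a same-label item among their k nearest neighbors."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by Euclidean distance to the query.
        neighbors = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(query, embeddings[j]),
        )[:k]
        if any(labels[j] == labels[i] for j in neighbors):
            hits += 1
    return hits / len(embeddings)


# Four toy embeddings on a line: two tight pairs far apart.
embeddings = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
r1_clustered = recall_at_k(embeddings, [0, 0, 1, 1], k=1)
r1_mixed = recall_at_k(embeddings, [0, 1, 0, 1], k=1)
r2_mixed = recall_at_k(embeddings, [0, 1, 0, 1], k=2)
```

When the labels follow the spatial pairs, every query retrieves a correct neighbor at K=1; when the labels cut across the pairs, hits only appear once K is large enough to reach the other pair.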

2.3 Experimental results


3 Conclusion



References:

[1] Manmatha R, Wu C, Smola A, et al. Sampling matters in deep embedding learning[C]//IEEE International Conference on Computer Vision (ICCV), 2017: 2859-2867

[2] Song H, Xiang Y, Jegelka S, et al. Deep metric learning via lifted structured feature embedding[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 4004-4012

[3] Wang X, Han X, Huang W, et al. Multi-similarity loss with general pair weighting for deep metric learning[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 5022-5030

[4] Zhou T, Fu H, Gong C, et al. Multi-mutual consistency induced transfer subspace learning for human motion segmentation[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 10277-10286

[5] Li T, Liang Z, Zhao S, et al. Self-learning with rectification strategy for human parsing[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

[6] Woo S, Park J, Lee J, et al. Learning descriptors for object recognition and 3d pose estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 3109-3118

[7] He X, Zhou Y, Zhou Z, et al. Triplet-center loss for multi-view 3d object retrieval[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 1945-1954

[8] Grabner A, Roth P, Lepetit V. 3d pose estimation and 3d model retrieval for objects in the wild[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 3022-3031

[9] Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition[C]//European Conference on Computer Vision (ECCV), 2016: 499-515

[10] Tao R, Gavves E, Smeulders A. Siamese instance search for tracking[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 1420-1429

[11] Yu R, Dou Z, Bai S, et al. Hard-aware point-to-set deep metric for person re-identification[C]//European Conference on Computer Vision (ECCV), 2018: 188-204

[12] Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification[EB/OL]. arXiv preprint arXiv:1703.07737, 2017

[13] Iscen A, Tolias G, Avrithis Y, et al. Mining on manifolds: metric learning without labels[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7642-7651

[14] Huang J, Dong Q, Gong S, et al. Unsupervised deep learning by neighbourhood discovery[C]//ACM International Conference on Machine Learning (ICML), 2018: 7642-7651

[15] Ye M, Zhang X, Yuen P, et al. Unsupervised embedding learning via invariant and spreading instance feature[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019: 6210-6219

[16] Ye M, Shen J. Probabilistic structural latent representation for unsupervised embedding[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 5457-5466

[17] Deng J, Dong W, Socher R, et al. A large-scale hierarchical image database[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009: 248-255

[18] Wah C, Branson S, Welinder P, et al. Caltech-UCSD birds 200[R]. California Institute of Technology, 2010

[19] Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization[C]//IEEE International Conference on Computer Vision Workshops (ICCVW), 2013: 554-561

[20] Khosla A, Jayadevaprakash N, Yao B, et al. Novel dataset for fine-grained image categorization[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011

[21] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

[22] Kingma D P, Ba J. Adam: a method for stochastic optimization[EB/OL]. arXiv preprint arXiv:1412.6980, 2015