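(Image and Graphics Division, State Key Laboratory of Intelligent Technology and Systems, Department of Electronic Engineering, Tsinghua University, Beijing 100084)
At present, deep learning and big data are a major research focus in face recognition. Existing large-scale face datasets include: Face++, the vision service platform of Beijing Megvii Technology Co., Ltd. (5 million images of 20000 subjects); FaceNet, Google's artificial intelligence system (100-200 million images of 8 million subjects); Tencent-BestImage, from Tencent's YouTu team (1 million images of 20000 subjects); and the dataset of the Institute of Automation, Chinese Academy of Sciences (494414 images of 10575 subjects).
As of 7 May 2015, data from China National Knowledge Infrastructure (CNKI) and Web of Science (WOS) show that a search on the term "face recognition" retrieves 4612 and 4795 journal articles respectively. This special topic ranks the data by number of publications per institution, number of publications per author, number of publications per journal, and citation frequency; the results are as follows.
According to Web of Science statistics, the highly cited papers retrieved with the term "face recognition" rank as follows.
Based on the Web of Science search results, the top 30 documents by LCS (Local Citation Score) were selected as nodes for analysis with the HistCite software. Combined with expert opinion, this gives the following recommended classic literature for the field.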
Source: Journal of Cognitive Neuroscience,1991,3(1):71-86
A comparative study of texture measures with classification based on feature distributions
Ojala,T; Pietikainen,M; Harwood,D
Abstract: This paper evaluates the performance both of some texture measures which have been successfully used in various applications and of some new promising approaches proposed recently. For classification a method based on Kullback discrimination of sample and prototype distributions is used. The classification results for single features with one-dimensional feature value distributions and for pairs of complementary features with two-dimensional distributions are presented.
Keywords: texture analysis; classification; feature distribution; Brodatz textures; Kullback discriminant; performance evaluation
Source: Pattern Recognition,1996,29(1): 51-59
Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection
Belhumeur,PN; Hespanha,JP; Kriegman,DJ
Abstract: We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach,we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face,under varying illumination but fixed pose,lie in a 3D linear subspace of the high dimensional image space - if the face is a Lambertian surface without shadowing. However,since faces are not truly Lambertian surfaces and do indeed produce self-shadowing,images will deviate from this linear subspace. Rather than explicitly modeling this deviation,we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's Linear Discriminant and produces well separated classes in a low-dimensional subspace,even under severe variation in lighting and facial expressions. The Eigenface technique,another method based on linearly projecting the image space to a low dimensional subspace,has similar computational requirements. Yet,extensive experimental results demonstrate that the proposed "Fisherface" method has error rates that are lower than those of the Eigenface technique for tests on the Harvard and Yale Face Databases.
Keywords: appearance-based vision; face recognition; illumination invariance; Fisher's linear discriminant
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,1997,19(7): 711-720
Robust real-time face detection
Viola,P; Jones,MJ
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm(Freund and Schapire,1995)to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems(Sung and Poggio,1998;Rowley et al.,1998; Schneiderman and Kanade,2000; Roth et al.,2000). Implemented on a conventional desktop,face detection proceeds at 15 frames per second.
Keywords: face detection; boosting; human sensing
Source: International Journal of Computer Vision,2004,57(2): 137-154
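The integral image named in the abstract above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows; the function names `integral_image` and `rect_sum` are illustrative and do not come from the paper. Each table entry holds the sum of all pixels above and to the left, so any rectangle sum needs only four lookups.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] = sum of img[0:y][0:x]."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
# A two-rectangle Haar-like response: left column minus right column.
haar = rect_sum(table, 0, 0, 3, 1) - rect_sum(table, 0, 2, 3, 3)
```

Once the table is built, the cost of a rectangle sum does not depend on the rectangle's size, which is what makes evaluating many rectangle features per window cheap.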
PCA versus LDA
Martinez,AM; Kak,AC
Abstract: In the context of the appearance-based paradigm for object recognition,it is generally believed that algorithms based on LDA(Linear Discriminant Analysis)are superior to those based on PCA(Principal Components Analysis). In this communication,we show that this is not always the case. We present our case first by using intuitively plausible arguments and,then,by showing actual results on a face database. Our overall conclusion is that when the training data set is small,PCA can outperform LDA and,also,that PCA is less sensitive to different training data sets.
Keywords: face recognition; pattern recognition; principal components analysis; linear discriminant analysis; learning from undersampled distributions; small training data sets
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2001,23(2): 228-233
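Computer face recognition has developed gradually only over the last 20 years, and it became a research focus in the 1990s. Between 1990 and 1998 alone, several thousand related papers can be retrieved from EI. The face databases used in recognition experiments are usually small; the most common ones contain only about 100 face images. The MIT, Yale and CMU databases, for example, are all small, and because acquisition conditions differ from one database to another, it is hard to compare different recognition programs. To promote deeper research on face recognition algorithms and their practical use, the US Department of Defense launched the Face Recognition Technology (FERET) program [1], which comprises a common face database and a common set of test standards. The FERET database can be used to test and compare all kinds of face recognition algorithms. In 1997 it held 14126 images of 1199 people, and the images of one person vary in expression, illumination, head pose and capture date (taken more than 18 months apart). The FERET database is still being expanded, and face recognition programs are evaluated against it periodically; the analysis of these tests gives some guidance for future work. Because the FERET database includes pictures of military personnel, it cannot be obtained outside the United States, so researchers in other countries have to use local face databases, such as the Manchester face database in the UK [2].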
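Research on face recognition began in the late 1960s. The earliest work is reported in [7], where Bledsoe built a semi-automatic face recognition system using parameters such as distances and ratios between facial feature points. Early research followed two main directions. The first extracts geometric features of the face [7], including normalized inter-point distances and ratios of facial components and the two-dimensional topology formed by feature points such as the eye corners, mouth corners and nose tip. The second is template matching, which recognizes a face by computing the correlation between a template and the image gray levels. In 1993 Berto gave a fairly comprehensive introduction to and comparison of the two approaches and concluded that template matching is superior to geometric features [8]. Current research also follows two main directions. The first is the holistic approach, which considers the global properties of the pattern; it includes the eigenface method, the SVD method [9], isodensity-line analysis and matching [10], elastic graph matching [11], the hidden Markov model (HMM) method [12] and neural networks. The second is the feature-analysis approach, which combines the relative ratios of facial fiducial points with other shape or category parameters describing facial features into a recognition feature vector. Holistic recognition preserves both the topological relations among facial components and the information of the components themselves, whereas component-based recognition designs specific algorithms from extracted local contour and gray-level information. Reference [8] argues that analysis of the whole face is better than component-based analysis because the former retains more information. This claim is debatable: component-based recognition is more intuitive than the holistic approach, and it extracts and uses the most useful features, such as the positions of key points and the shapes of components. In holistic recognition, by contrast, the whole face image serves as the pattern, so illumination, viewing angle and face size strongly affect recognition, and removing these disturbances effectively is crucial. Component-based recognition has its own difficulty, namely how to build a good model to represent the components. A recent trend is to combine holistic recognition with feature analysis, as in the analytic-and-holistic method of Kin-Man Lam [13] and the method of Andreas Lanitis, which uses flexible models to interpret and code faces [14].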
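Before the main face recognition methods are introduced, the other methods applied to face recognition are outlined briefly. The SVD method and the eigenface method both belong to statistical analysis: each reduces the dimension of the large amount of image data representing a face and then classifies the pattern, and they differ only in how the transform basis is obtained. Isodensity-line analysis tries to capture three-dimensional information about a face by extracting isodensity lines (lines of equal gray level) from the two-dimensional face image; the rationale is that, just as contour lines on a map reflect terrain, the isodensity lines of different faces can be compared to measure their similarity. The HMM is a statistical method that has been successful in speech processing. Neural network methods usually take the face as a one-dimensional input vector, so the number of input nodes is very large, and an important goal is dimensionality reduction. According to the analysis of self-organizing neural networks in [15], P nodes of such a network can represent the original N inputs (P < N); but since the P outputs are then classified, the result is only equivalent to classification after extracting feature vectors of the face space. Recognition with this kind of network can therefore only reach the level of eigenfaces, and neural networks are not treated separately in this paper. Note that faces lie in a high-dimensional space (a 100×100 image has 10000 dimensions), so the network has a very large number of input nodes and many parameters to train, which makes implementation difficult. The advantage of neural networks is that a subspace can be designed for a specific problem; for example, they can be used for gender recognition [15].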
If m (m < n) eigenvectors are chosen as an orthogonal basis, the following approximate expression is obtained in the subspace they span
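A minimal numerical sketch of this truncated expansion follows. It assumes the usual eigenface notation, which this passage does not define: a mean face, the leading m eigenvectors u_i of the training covariance, and coefficients y_i = u_i^T (x - mean). The function names and the random toy data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fit_eigenfaces(X, m):
    """X: (num_images, n) matrix of flattened faces. Returns mean and m basis vectors."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are orthonormal eigenvectors of the sample covariance,
    # ordered by decreasing variance (SVD avoids forming the n x n matrix).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m]


def project(x, mean, basis):
    """Coefficients y_i = u_i^T (x - mean)."""
    return basis @ (x - mean)


def reconstruct(y, mean, basis):
    """Approximation x ~ mean + sum_i y_i u_i."""
    return mean + basis.T @ y


rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))      # 20 toy "images" of 64 pixels each
mean_face, U = fit_eigenfaces(faces, m=5)
coeffs = project(faces[0], mean_face, U)
approx = reconstruct(coeffs, mean_face, U)
```

Recognition in this setting compares the short coefficient vectors rather than the full images; the approximation error shrinks as m grows.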
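A good improvement on eigenfaces is the Fisherface method [17]. The Fisher linear discriminant criterion is a classical method in pattern recognition. Applying it generally assumes that the classes are linearly separable in pattern space and that the differences between the faces of different people are the main source of that separability. The Fisher criterion requires samples of different classes to be as far apart as possible and samples of the same class to be as close together as possible. After comparing eigenfaces obtained with the KL transform and with the Fisher criterion, [17] concludes that eigenfaces largely reflect differences such as illumination, whereas Fisherfaces suppress differences between images that are irrelevant to identity. In Belhumeur's experiments [17] on 160 face images (16 people, 10 images each under different conditions), the recognition rate was 81% with the KL transform and 99.4% with the Fisher method, a clear improvement. Building on the KL transform, Chengjun Liu proposed the PRM (Probabilistic Reasoning Models) [18]. PRM uses a Bayes classifier, which classifies by maximum a posteriori probability, and estimates the variance parameters of the class-conditional densities from the within-class scatter matrix. PRM uses the Mahalanobis distance rather than the minimum Euclidean distance as its decision criterion, and both eigenfaces and Fisherfaces can be regarded as special cases of PRM.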
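For reference, the Fisher criterion described above is commonly written as below. The symbols are assumptions, since the text does not define them: c classes X_i with N_i samples and class means \mu_i, overall mean \mu, between-class scatter S_B, within-class scatter S_W, and projection matrix W.

```latex
S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^{T}, \qquad
S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T}

W_{opt} = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|}
```

Maximizing the ratio pushes class means apart (numerator) while keeping each class compact (denominator), which is the "different classes far, same class near" rule stated in the text.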
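The improvement in [19] classifies the differences between face images into interpersonal and intrapersonal differences. Intrapersonal differences are the possible variations of one person's face, interpersonal differences express the essential differences between people, and the observed difference between two face images is the sum of the two. The difference image of two faces is then analyzed: if it is more likely to be an intrapersonal difference than an interpersonal one, the two faces probably belong to the same person; otherwise they probably belong to different people. Assuming both kinds of difference are Gaussian, the required conditional probability densities are estimated first [19], and the problem finally reduces to projecting the difference image onto the intrapersonal and interpersonal difference eigenspaces. If the Fisherface method tries to reduce external disturbances such as illumination, [19] is an effective, though still preliminary, attempt to handle the disturbance caused by expression. According to [19], in the FERET face recognition test conducted by ARPA in 1996 this algorithm achieved the best recognition results, and its overall performance exceeded that of every other algorithm in the test.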
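The decision rule can be sketched as a comparison of two Gaussian likelihoods of the difference vector. This is a simplified illustration and not the algorithm of [19]: it assumes zero-mean Gaussians with diagonal covariance (already expressed in some feature space) and equal priors, and every name and number below is hypothetical.

```python
import math


def log_gaussian(delta, variances):
    """Log density of a zero-mean Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + d * d / v)
        for d, v in zip(delta, variances)
    )


def same_person(delta, var_intra, var_extra, prior_intra=0.5):
    """True when the intrapersonal class has the larger posterior."""
    log_intra = log_gaussian(delta, var_intra) + math.log(prior_intra)
    log_extra = log_gaussian(delta, var_extra) + math.log(1.0 - prior_intra)
    return log_intra > log_extra


# Toy variances: intrapersonal differences are small, interpersonal ones large.
var_intra = [0.1, 0.1, 0.1]
var_extra = [2.0, 2.0, 2.0]
small_difference = [0.05, -0.1, 0.02]
large_difference = [1.5, -2.0, 1.0]
```

With these toy values, `same_person(small_difference, var_intra, var_extra)` accepts the pair and `same_person(large_difference, var_intra, var_extra)` rejects it.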
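The eigenface method still has the following drawbacks for face recognition. First, as a statistical method on images it gives every pixel equal status, yet disturbances in angle, illumination, size and expression make the recognition rate drop sharply; the better algorithms [19] therefore rectify the face and consider only the bare face. Second, according to [2], faces are distributed approximately as a Gaussian in face space, with ordinary faces near the mean and unusual faces at the edge of the distribution, so the more ordinary a face is, the harder it is to recognize. The eigenface method does capture the statistical properties of a population, but good representational power does not imply good discriminative power. Third, eigenfaces reflect the statistics of a particular database and are not generally representative, whereas wide application requires trained eigenfaces that are generally valid. Fourth, the method rests on the key assumption that faces lie in a low-dimensional linear space, that is, that the sum or difference of faces is still a face [2]. This cannot hold: even when position and size are the same, the relative positions of the components differ, so added or subtracted faces are still blurred. For this reason [14] proposes the shape-free face (shapeless face), in which a face is warped to a standard face according to facial fiducial points before eigenface processing. In short, an effective eigenface method needs a great deal of preprocessing to reduce disturbances, and representing and removing the expression factor is another key to recognition.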
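Reference [14] proposes a model that separates shape from gray level, describing a face by its shape, its overall gray level and its local gray-level distribution (Figures 1, 2 and 3). The point distribution model (Figure 1) describes the shape of the face, and the local gray-level information at each point of the model (Figure 3 uses directional projections near a point on the ear) describes the local gray-level features. The point distribution model is then used to warp the image into a shape-free face (Figure 2), on which eigenface analysis is performed to obtain the overall gray-level pattern of the face. Combining the three gives a recognition rate of 92% (300 faces). Although the method makes some improvements, it is still based on the KL transform. In the ordinary eigenface method the eigenface subspace is generated from face images scanned by rows or columns; here there are three feature subspaces generated from three types of parameters. The method first takes the coordinates of each point in order and arranges them as training data to generate the shape subspace. Then, for each point of the point distribution model (such as the point near the ear in Figure 3), local projections are taken to represent the gray-level features near that point, and training yields a local gray-level subspace for that point. Finally, the key points of every face are warped to prescribed positions to produce shape-free faces, and eigenface analysis of all shape-free faces gives the eigenface subspace. Each subspace can be used alone to recognize faces, but a complete description of a face needs the parameters from all three. Reference [14] also tries to isolate expression-related parameters through the shape subspace; the aim of separating shape and gray level is to obtain a good face model. In experiments the model was applied to 3D pose recovery, identity recognition, gender recognition, expression recognition and face reconstruction, with some success in each.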
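The similarity of faces can be expressed as a "distance" between topological graphs, and the best match should account for both the match of the vertex feature vectors and the match of relative geometric positions. In Figure 6 (as in Figure 5, every vertex is a feature vector), feature matching pairs vertex i of S1 with the corresponding vertex j of S (j = M(i), where M is the matching function), and the degree of feature match is the similarity of the feature vectors at i and j. Geometric matching means that two vertices close together in S should map to two vertices of S1 that are also close together. Reference [11] therefore uses the following energy function E(M) to evaluate the match between the vector field of the face image to be recognized and that of a known face in the database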
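The idea of scoring a match by a feature term plus a geometry term can be sketched in code. This is an illustrative cost only and should not be read as the E(M) of [11], whose exact form may differ: the cosine similarity, the squared edge-distortion penalty, the weight `lam` and all names are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_energy(feat1, pos1, feat2, pos2, edges, mapping, lam=1.0):
    """Lower is better. mapping[i] = j matches vertex i of S1 to vertex j of S."""
    # Feature term: penalise dissimilar feature vectors at matched vertices.
    feature_cost = sum(
        1.0 - cosine_similarity(feat1[i], feat2[j]) for i, j in mapping.items()
    )
    # Geometry term: matched edges should keep the same displacement vector.
    geometry_cost = 0.0
    for i, k in edges:
        (x1, y1), (x2, y2) = pos1[i], pos1[k]
        (u1, v1), (u2, v2) = pos2[mapping[i]], pos2[mapping[k]]
        dx = (x2 - x1) - (u2 - u1)
        dy = (y2 - y1) - (v2 - v1)
        geometry_cost += dx * dx + dy * dy
    return feature_cost + lam * geometry_cost


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
positions = [(0, 0), (4, 0), (2, 3)]
graph_edges = [(0, 1), (1, 2), (0, 2)]
identity = {0: 0, 1: 1, 2: 2}
swapped = {0: 1, 1: 0, 2: 2}
```

Matching a graph against itself with `identity` gives an energy of essentially zero, while `swapped` is penalised by both terms; a matcher would search over `mapping` for the lowest energy.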
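Jun Zhang [15] combined the MIT, Olivetti, Weizmann and Bern face databases into a composite database of 272 photographs and compared the KL method with elastic matching on it [15]; the recognition rates were 66% and 93% respectively. The low rate of the KL transform is mainly due to the large illumination differences among images from the four databases. One reason [15] concludes that elastic graph matching outperforms the KL transform is that the graph vertices use wavelet features, which have some invariance to lighting, transformation, size and angle. Wavelet feature analysis is a time-frequency analysis, here a space-frequency analysis: the responses at different frequencies of the region around a point form the feature string of that point, where the high-frequency part corresponds to fine detail within a small neighborhood and the low-frequency part to the general appearance of a larger area around the point. On this principle [20] proposes replacing the wavelet features with multiscale (multiresolution) feature vectors formed by erosion and dilation from mathematical morphology, and shows that they have an effect similar to wavelet features and reflect the high- and low-frequency information around a point. It has been shown that elastic graph matching preserves the spatial correlation of the two-dimensional image, whereas the eigenface method loses much of it when the image is arranged as a one-dimensional vector. These are the reasons elastic matching outperforms eigenfaces. In addition, when a new face is added to the database the existing eigenfaces cannot be guaranteed to remain valid, so they may have to be recomputed, while elastic matching only needs the new template data added and leaves existing data unchanged. The major drawback of elastic matching is its computational cost. By the definitions of low-level and high-level features given in the introduction, the wavelet features here resemble the response of the retina to the outside scene; they are low-level features with no notion of line, surface or pattern. Redundant information in low-level features makes computation expensive, and because much information irrelevant to recognition is not filtered out, the recognition rate suffers. Eigenfaces have the same problem; a typical example of useless information is hair.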
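Because the face databases differ, recognition algorithms cannot be compared directly, and the discussion above compares them on theoretical grounds as far as possible. In the 1996 FERET face database test reported by Moghaddam et al. [19], the Bayesian eigenface method, which distinguishes intrapersonal from interpersonal differences, performed best: among 5000 probe images the first-candidate recognition rate was 89.5%. The flexible model that separates gray level and shape reached 92% on 300 images. In the test of [15] on a composite database of 2000 face images, elastic graph matching with wavelet features achieved 93%, whereas PCA reached only 66%.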
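The recognition methods introduced and analyzed in this paper also apply to faces captured by a video camera. For camera images, face localization and expression analysis can additionally exploit the correlation between successive frames; for example, 2D and 3D motion can be estimated from the dynamic input to build a 3D face model. Because the dynamic input carries a great deal of information, effective expression analysis may also be possible as an aid to identification. This paper gives only a selective introduction to the techniques currently applied to face recognition and supplements [3] and [15]. The theory of face recognition is still incomplete and many aspects of implementing specific algorithms remain to be studied, so practical computer face recognition still requires sustained effort from many researchers.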
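The paper first briefly reviews the research background and development of automatic computer face recognition. It then surveys methods for recognizing frontal face images, classified by the features they use, covering mainly the eigenface method, elastic matching based on wavelet features, the flexible model that separates shape and gray-level models, and traditional component-modelling methods. Through analysis and comparison of these methods it summarizes several factors that affect the practical use of face recognition, proposes several important aspects to consider in developing successful face recognition technology, and looks ahead to future directions.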
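Keywords: biometric recognition; identity recognition; identity authentication; face recognition; fingerprint recognition; iris recognition; hand-shape recognition; palmprint recognition; signature recognition; speaker recognition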
Eigenfaces for recognition
Turk,M; Pentland,A
Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection
Belhumeur,PN; Hespanha,JP; Kriegman,DJ
Shape matching and object recognition using shape contexts
Belongie,S; Malik,J; Puzicha,J
Abstract: We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework,the measurement of similarity is preceded by 1)solving for correspondences between points on the two shapes,2)using the correspondences to estimate an aligning transform. In order to solve the correspondence problem,we attach a descriptor,the shape context,to each point. The shape context at a reference point captures the distribution of the remaining points relative to it,thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts,enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences,we estimate the transformation that best aligns the two shapes;regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points,together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes,trademarks,handwritten digits,and the COIL data set.
Keywords: shape; object recognition; digit recognition; correspondence problem; MPEG7; image registration; deformable templates
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,24(4):509-522
Robust face recognition via sparse representation
Wright,J; Yang,AY; Ganesh,A; et al.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination,as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by l(1)-minimization,we propose a general classification algorithm for(image-based)object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction,we show that if sparsity in the recognition problem is properly harnessed,the choice of features is no longer critical. What is critical,however,is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as Eigenfaces and Laplacianfaces,as long as the dimension of the feature space surpasses certain threshold,predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard(pixel)basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
Keywords: face recognition; feature extraction; occlusion and corruption; sparse representation; compressed sensing; l(1)-minimization;validation and outlier rejection
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2009,31(2): 210-227
From few to many: Illumination cone models for face recognition under variable lighting and pose
Georghiades,AS; Belhumeur,PN; Kriegman,DJ
Abstract: We present a generative appearance-based method for recognizing human faces under variation in lighting and viewpoint. Our method exploits the fact that the set of images of an object in fixed pose,but under all possible illumination conditions,is a convex cone in the space of images. Using a small number of training images of each face taken with different lighting directions,the shape and albedo of the face can be reconstructed. In turn,this reconstruction serves as a generative model that can be used to render-or synthesize-images of the face under novel poses and illumination conditions. The pose space is then sampled and,for each pose,the corresponding illumination cone is approximated by a low-dimensional linear subspace whose basis vectors are estimated using the generative model. Our recognition algorithm assigns to a test image the identity of the closest approximated illumination cone(based on Euclidean distance within the image space). We test our face recognition method on 4050 images from the Yale Face Database B; these images contain 405 viewing conditions(9 poses x 45 illumination conditions)for 10 individuals. The method performs almost without error,except on the most extreme lighting directions,and significantly outperforms popular recognition methods that do not use a generative model.
Keywords: face recognition; image-based rendering; appearance-based vision; face modeling; illumination and pose modeling; lighting;illumination cones; generative models
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2001,23(6):643-660
Face recognition by elastic bunch graph matching
Wiskott,L; Fellous,JM; Kruger,N; et al.
Abstract: We present a system for recognizing human faces from single images out of a large database containing one image per person. Faces are represented by labeled graphs,based on a Gabor wavelet transform. Image graphs of new faces are extracted by an elastic graph matching process and can be compared by a simple similarity function. The system differs from the preceding one in three respects. Phase information is used for accurate node positioning. Object-adapted graphs are used to handle large rotations in depth. Image graph extraction is based on a novel data structure,the bunch graph,which is constructed from a small set of sample image graphs.
Keywords: face recognition; different poses; Gabor wavelets; elastic graph matching; bunch graph; ARPA/ARL FERET database; Bochum database
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,1997,19(7): 775-779
Face recognition using Laplacianfaces
He,XF; Yan,SC; Hu,YX; et al.
Abstract: We propose an appearance-based face recognition method called the Laplacianface approach. By using Locality Preserving Projections(LPP),the face images are mapped into a face subspace for analysis. Different from Principal Component Analysis(PCA)and Linear Discriminant Analysis(LDA)which effectively see only the Euclidean structure of face space,LPP finds an embedding that preserves local information,and obtains a face subspace that best detects the essential face manifold structure. The Laplacianfaces are the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the face manifold. In this way,the unwanted variations resulting from changes in lighting,facial expression,and pose may be eliminated or reduced. Theoretical analysis shows that PCA,LDA,and LPP can be obtained from different graph models. We compare the proposed Laplacianface approach with Eigenface and Fisherface methods on three different face data sets. Experimental results suggest that the proposed Laplacianface approach provides a better representation and achieves lower error rates in face recognition.
Keywords: face recognition; principal component analysis; linear discriminant analysis; locality preserving projections; face manifold;subspace learning
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(3): 328-340
Face recognition - features versus templates
Brunelli,R; Poggio,T
Abstract: Over the last 20 years,several different techniques have been proposed for computer recognition of human faces. The purpose of this paper is to compare two simple but general strategies on a common database(frontal images of faces of 47 people: 26 males and 21 females,four images per person). We have developed and implemented two new algorithms; the first one is based on the computation of a set of geometrical features,such as nose width and length,mouth position,and chin shape,and the second one is based on almost-grey-level template matching. The results obtained on the testing sets(about 90% correct recognition using geometrical features and perfect recognition using template matching)favor our implementation of the template-matching approach.
Keywords: classification; face recognition; karhunen-loeve expansion; template matching
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,1993,15(10):1042-1052
Two-dimensional PCA: A new approach to appearance-based face representation and recognition
Yang,J; Zhang,D; Frangi,AF; et al.
Abstract: In this paper,a new technique coined two-dimensional principal component analysis(2DPCA)is developed for image representation. As opposed to PCA,2DPCA is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead,an image covariance matrix is constructed directly using the original image matrices,and its eigenvectors are derived for image feature extraction. To test 2DPCA and evaluate its performance,a series of experiments were performed on three face image databases: ORL,AR,and Yale face databases. The recognition rate across all trials was higher using 2DPCA than PCA. The experimental results also indicated that the extraction of image features is computationally more efficient using 2DPCA than PCA.
Keywords: Principal Component Analysis(PCA); eigenfaces; feature extraction; image representation; face recognition
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(1):131-137
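The image covariance matrix mentioned in this abstract can be illustrated with a short sketch. It assumes NumPy is available and that all images share the same height and width; the function names and the random toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_2dpca(images, d):
    """images: (M, h, w) array. Returns the top-d eigenvectors, shape (w, d)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # G = (1/M) * sum_k (A_k - mean)^T (A_k - mean), a w x w matrix built
    # directly from the image matrices, with no flattening to vectors.
    G = np.einsum("khi,khj->ij", centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order]


def extract_features(image, axes):
    """Feature matrix Y = A X, of shape (h, d)."""
    return image @ axes


rng = np.random.default_rng(0)
toy_images = rng.normal(size=(10, 8, 6))   # 10 toy images, 8 rows x 6 columns
X_axes = fit_2dpca(toy_images, d=2)
Y = extract_features(toy_images[0], X_axes)
```

The eigenproblem here is only w x w (6 x 6 in the toy data) rather than (h*w) x (h*w), which is the source of the efficiency the abstract refers to.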
PCA versus LDA
Martinez,AM; Kak,AC
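Abstract: In view of the rapid development of sparse representation (SR) for feature extraction and dimensionality reduction of high-dimensional data such as face images, the original sparsity preserving projection (SPP) algorithm is improved and an algorithm called discriminant sparsity preserving embedding (DSPE) is proposed. A least-squares problem is solved to update the sparse weights in SPP, giving discriminant sparse weights that reflect discriminant information more faithfully; the low-dimensional feature subspace of the high-dimensional data is then computed with the objective of best preserving these sparse weight relations. The algorithm is a linear supervised learning algorithm which, by introducing discriminant information, reduces the dimension of high-dimensional data effectively. Extensive experiments on the ORL, Yale, Extended Yale B and CMU PIE databases verify its effectiveness.
Abstract: To address the low accuracy of face recognition systems, a face recognition algorithm based on histograms of nonsubsampled Contourlet oriented gradients (HNOG) is proposed. The face image first undergoes the nonsubsampled Contourlet transform (NSCT), and each resulting coefficient matrix is divided into blocks. The histogram of oriented gradients (HOG) of each block is computed, the histograms of all blocks are concatenated to form the HNOG feature of the face image, and a multi-channel nearest-neighbor classifier performs the classification. Experiments on the YALE, ORL and CAS-PEAL-R1 face databases show that the HNOG feature is highly discriminative, has a small dimension, and is robust to changes in illumination, expression and pose.
Abstract: Because patterns of oriented edge magnitudes (POEM) cannot capture enough descriptive information under severe illumination changes, this paper analyzes the characteristics of relative gradient magnitude images and proposes a relative gradient histogram feature descriptor. According to the gradient orientation of the image, the method decomposes and filters the relative gradient magnitude image, applies local binary pattern coding and reduces the feature dimension, giving a low-dimensional histogram feature that is robust to illumination changes, especially non-uniform ones. Face recognition experiments on subsets of FERET and YaleB show that under small illumination changes the relative gradient histogram descriptor performs on a par with POEM, and both are significantly better than the classical local binary pattern feature; under severe illumination changes its recognition accuracy is at least 5% higher than that of POEM and it significantly outperforms both POEM and local binary patterns, which demonstrates the effectiveness of the descriptor and its robustness to illumination changes.
Abstract: The traditional Retinex algorithm has difficulty removing shadows under strong side lighting. A logarithmic transfer function is therefore proposed, which gives good illumination compensation. To raise the recognition rate, face recognition is treated as a typical pattern classification problem, and a support vector machine (SVM) face recognition method based on local binary pattern (LBP) features is proposed; the "one-against-one" strategy converts the multi-class problem into two-class problems that SVM classifiers can solve, giving efficient face recognition. Simulations on CMU PIE, AR, CAS-PEAL and a self-collected face database show that the method removes the effect of illumination effectively and recognizes better than traditional methods.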
Facenet: A unified embedding for face recognition and clustering
Florian Schroff; Dmitry Kalenichenko; James Philbin
Abstract:Despite significant recent advances in the field of face recognition,implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system,called FaceNet,that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced,tasks such as face recognition,verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. Our method uses a deep convolutional network trained to directly optimize the embedding itself,rather than an intermediate bottleneck layer as in previous deep learning approaches. To train,we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128-bytes per face. On the widely used Labeled Faces in the Wild(LFW)dataset,our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets.
Source: preprint arXiv:1503.03832,2015
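The triplet training signal mentioned in this abstract can be sketched as a hinge on squared Euclidean distances. This is a minimal illustration written from the general description of triplet losses, not the paper's training code; the margin value, the two-dimensional toy embeddings and the function names are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


anchor = [1.0, 0.0]
positive = [0.9, 0.1]          # same identity, close to the anchor
easy_negative = [-1.0, 0.0]    # different identity, already far away
hard_negative = [0.8, 0.2]     # different identity, still close
```

Here `triplet_loss(anchor, positive, easy_negative)` is zero, so that triplet contributes nothing, while the hard negative yields a positive loss; this is why selecting informative triplets matters for training.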
Face Search at Scale: 80 Million Gallery
Dayong Wang; Charles Otto; Anil K. Jain
Abstract:Due to the prevalence of social media websites,one challenge facing computer vision researchers is to devise methods to process and search for persons of interest among the billions of shared photos on these websites. Facebook revealed in a 2013 white paper that its users have uploaded more than 250 billion photos,and are uploading 350 million new photos each day. Due to this humongous amount of data,large-scale face search for mining web images is both important and challenging. Despite significant progress in face recognition,searching a large collection of unconstrained face images has not been adequately addressed. To address this challenge,we propose a face search system which combines a fast search procedure,coupled with a state-of-the-art commercial off the shelf(COTS)matcher,in a cascaded framework. Given a probe face,we first filter the large gallery of photos to find the top-k most similar faces using deep features generated from a convolutional neural network. The k candidates are re-ranked by combining similarities from deep features and the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that the deep features are competitive with state-of-the-art methods on unconstrained face recognition benchmarks(LFW and IJB-A). Further,the proposed face search system offers an excellent trade-off between accuracy and scalability on datasets consisting of millions of images. Additionally,in an experiment involving searching for face images of the Tsarnaev brothers,convicted of the Boston Marathon bombing,the proposed face search system could find the younger brother's(Dzhokhar Tsarnaev)photo at rank 1 in 1 second on a 5M gallery and at rank 8 in 7 seconds on an 80M gallery.
Source: preprint arXiv:1507.07242,2015
Non-rigid visible and infrared face registration via regularized Gaussian fields criterion
Ma,JY; Zhao,J; Ma,Y; et al.
Abstract: Registration of multi-sensor data(particularly visible color sensors and infrared sensors)is a prerequisite for multimodal image analysis such as image fusion. Typically,the relationships between image pairs are modeled by rigid or affine transformations. However,this cannot produce accurate alignments when the scenes are not planar,for example,face images. In this paper,we propose a regularized Gaussian fields criterion for non-rigid registration of visible and infrared face images. The key idea is to represent an image by its edge map and align the edge maps by a robust criterion with a non-rigid model. We model the transformation between images in a reproducing kernel Hilbert space and a sparse approximation is applied to the transformation to avoid high computational complexity. Moreover,a coarse-to-fine strategy by applying deterministic annealing is used to overcome local convergence problems. The qualitative and quantitative comparisons on two publicly available databases demonstrate that our method significantly outperforms the state-of-the-art method with an affine model. As a result,our method will be beneficial for fusion-based face recognition.
Keywords: registration; image fusion; infrared; non-rigid; face recognition; Gaussian fields
Source: Pattern Recognition,2015,48(3): 772-784
Contact email: Ma,JY; jiayima@whu.edu.cn
Fully automatic 3D facial expression recognition using polytypic multi-block local binary patterns
Li,XL; Ruan,QQ; Jin,Y; et al.
Abstract: 3D facial expression recognition has been greatly promoted for overcoming the inherent drawbacks of 2D facial expression recognition and has achieved superior recognition accuracy to the 2D. In this paper,a novel holistic,full-automatic approach for 3D facial expression recognition is proposed. First,3D face models are represented in 2D-image-like structure which makes it possible to take advantage of the wealth of 2D methods to analyze 3D models. Then an enhanced facial representation,namely polytypic multi-block local binary patterns(P-MLBP),is proposed. The P-MLBP involves both the feature-based irregular divisions to depict the facial expressions accurately and the fusion of depth and texture information of 3D models to enhance the facial feature. Based on the BU-3DFE database,three kinds of classifiers are employed to conduct 3D facial expression recognition for evaluation. Their experimental results outperform the state of the art and show the effectiveness of P-MLBP for 3D facial expression recognition. Therefore,the proposed strategy is validated for 3D facial expression recognition; and its simplicity opens a promising direction for fully automatic 3D facial expression recognition.
Keywords: 3D facial expression recognition; automatic data normalization; P-MLBP; feature-based irregular divisions; feature fusion
Source: Signal Processing,2015,108: 297-308
Contact email: Li XL; 09112087@bjtu.edu.cn
UGC-JU face database and its benchmarking using linear regression classifier
Seal,A; Bhattacharjee,D; Nasipuri,M; et al.
Abstract: In this paper,a new face database has been presented which will be freely available to academicians and research community for research purposes. The face database consists of both visual and thermal face images of 84 persons with varying poses,expressions and occlusions(39 different variations for each type,visual or thermal). A new thermal face image recognition technique based on Gappy Principal Component Analysis and Linear Regression Classifier has also been presented here. The recognition performance of this technique on the thermal face images of this database is found to be 98.61%,which can be considered as the initial benchmark recognition performance for this database.
Keywords: thermal face images; visual images; face database; GappyPCA; LRC classifier; decision level fusion
Source: Multimedia Tools and Applications,2015,74(9): 2913-2937
Contact email: Seal,A; ayanseal30@ieee.org
Learning face representation from scratch
Dong Yi; Zhen Lei; Shengcai Liao; Stan Z. Li
Abstract: Pushing by big data and deep convolutional neural network(CNN),the performance of face recognition is becoming comparable to human. Using private large scale training datasets,several groups achieve very high performance on LFW,i.e.,97% to 99%. While there are many open source implementations of CNN,none of large scale face dataset is publicly available. The current situation in the field of face recognition is that data is more important than algorithm. To solve this problem,this paper proposes a semi-automatical way to collect face images from Internet and builds a large scale dataset containing about 10000 subjects and 500000 images,called CASIA-WebFace. Based on the database,we use a 11-layer CNN to learn discriminative representation and obtain state-of-the-art accuracy on LFW and YTF. The publication of CASIA-WebFace will attract more research groups entering this field and accelerate the development of face recognition in the wild.
Source: preprint arXiv:1411.7923,2014
Joint sparse representation for robust multimodal biometrics recognition
Shekhar,S; Patel,VM; Nasrabadi,NM; et al.
Abstract: Traditional biometric recognition systems rely on a single biometric signature for authentication. While the advantage of using multiple sources of information for establishing the identity has been widely recognized,computational models for multimodal biometrics recognition have only recently received attention. We propose a multimodal sparse representation method,which represents the test data by a sparse linear combination of training data,while constraining the observations from different modalities of the test subject to share their sparse representations. Thus,we simultaneously take into account correlations as well as coupling information among biometric modalities. A multimodal quality measure is also proposed to weigh each modality as it gets fused. Furthermore,we also kernelize the algorithm to handle nonlinearity in data. The optimization problem is solved using an efficient alternative direction method. Various experiments show that the proposed method compares favorably with competing fusion-based methods.
Keywords: Multimodal biometrics; feature fusion; sparse representation
Source publication: IEEE Transactions on Pattern Analysis and Machine Intelligence,2014,36(1): 113-126
Contact email: Shekhar,S; sshekha@umiacs.umd.edu
Half-quadratic-based iterative minimization for robust sparse representation
He,R; Zheng,WS; Tan,TN; et al.
Abstract: Robust sparse representation has shown significant potential in solving challenging problems in computer vision such as biometrics and visual surveillance. Although several robust sparse models have been proposed and promising results have been obtained,they are either for error correction or for error detection,and learning a general framework that systematically unifies these two aspects and explores their relation is still an open problem. In this paper,we develop a half-quadratic(HQ)framework to solve the robust sparse representation problem. By defining different kinds of half-quadratic functions,the proposed HQ framework is applicable to performing both error correction and error detection. More specifically,by using the additive form of HQ,we propose an l(1)-regularized error correction method by iteratively recovering corrupted data from errors incurred by noises and outliers; by using the multiplicative form of HQ,we propose an l(1)-regularized error detection method by learning from uncorrupted data iteratively. We also show that the l(1)-regularization solved by soft-thresholding function has a dual relationship to Huber M-estimator,which theoretically guarantees the performance of robust sparse representation in terms of M-estimation. Experiments on robust face recognition under severe occlusion and corruption validate our framework and findings.
Keywords: l(1)-minimization; half-quadratic optimization; sparse representation; M-estimator; correntropy
Source publication: IEEE Transactions on Pattern Analysis and Machine Intelligence,2014,36(2): 261-275
Contact email: He,R; rhe@nlpr.ia.ac.cn
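The dual relationship between soft-thresholding and the Huber M-estimator mentioned in the abstract above can be checked numerically in the scalar case. The following is a minimal sketch, not the authors' code: the function names and the threshold `lam` are illustrative assumptions. It relies on the standard identity that the minimum over e of 0.5*(x - e)^2 + lam*|e| equals the Huber loss of x with threshold lam, the minimizer being the soft-thresholded value of x.

```python
def soft_threshold(x, lam):
    """Minimizer of 0.5*(x - e)**2 + lam*abs(e) over e (scalar case)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def huber(x, lam):
    """Huber loss with threshold lam: quadratic near zero, linear in the tails."""
    if abs(x) <= lam:
        return 0.5 * x * x
    return lam * abs(x) - 0.5 * lam * lam


def l1_penalized_value(x, lam):
    """Value of 0.5*(x - e)**2 + lam*abs(e) at e = soft_threshold(x, lam)."""
    e = soft_threshold(x, lam)
    return 0.5 * (x - e) ** 2 + lam * abs(e)


if __name__ == "__main__":
    lam = 1.0  # hypothetical threshold, chosen only for illustration
    for x in (-3.0, -0.5, 0.0, 0.7, 2.5):
        # The last two printed columns should agree for every x.
        print(x, l1_penalized_value(x, lam), huber(x, lam))
```

In this reading, a large residual is absorbed by the error term e and penalized only linearly, which is why the l(1)-regularized error term behaves like a robust M-estimator.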
Image Quality Assessment for Fake Biometric Detection: Application to Iris,Fingerprint,and Face Recognition
Galbally,Javier; Marcel,Sebastien; Fierrez,Julian
Abstract: To ensure the actual presence of a real legitimate trait in contrast to a fake self-manufactured synthetic or reconstructed sample is a significant problem in biometric authentication,which requires the development of new and efficient protection measures. In this paper,we present a novel software-based fake detection method that can be used in multiple biometric systems to detect different types of fraudulent access attempts. The objective of the proposed system is to enhance the security of biometric recognition frameworks,by adding liveness assessment in a fast,user-friendly,and non-intrusive manner,through the use of image quality assessment. The proposed approach presents a very low degree of complexity,which makes it suitable for real-time applications,using 25 general image quality features extracted from one image(i.e.,the same acquired for authentication purposes)to distinguish between legitimate and impostor samples. The experimental results,obtained on publicly available data sets of fingerprint,iris,and 2D face,show that the proposed method is highly competitive compared with other state-of-the-art approaches and that the analysis of the general image quality of real biometric samples reveals highly valuable information that may be very efficiently used to discriminate them from fake traits.
Keywords: image quality assessment; biometrics; security; attacks; countermeasures
Source publication: IEEE Transactions on Image Processing,2014,23(2): 710-724
Contact email: Galbally,Javier; javier.galbally@jrc.ec.europa.es
Robust face recognition via occlusion dictionary learning
Ou,WH; You,XG; Tao,DC; et al.
Abstract: Sparse representation based classification(SRC)has recently been proposed for robust face recognition. To deal with occlusion,SRC introduces an identity matrix as an occlusion dictionary on the assumption that the occlusion has sparse representation in this dictionary. However,the results show that SRC's use of this occlusion dictionary is not nearly as robust to large occlusion as it is to random pixel corruption. In addition,the identity matrix renders the expanded dictionary large,which results in expensive computation. In this paper,we present a novel method,namely structured sparse representation based classification(SSRC),for face recognition with occlusion. A novel structured dictionary learning method is proposed to learn an occlusion dictionary from the data instead of an identity matrix. Specifically,a mutual incoherence of dictionaries regularization term is incorporated into the dictionary learning objective function which encourages the occlusion dictionary to be as independent as possible of the training sample dictionary. So that the occlusion can then be sparsely represented by the linear combination of the atoms from the learned occlusion dictionary and effectively separated from the occluded face image. The classification can thus be efficiently carried out on the recovered non-occluded face images and the size of the expanded dictionary is also much smaller than that used in SRC. The extensive experiments demonstrate that the proposed method achieves better results than the existing sparse representation based face recognition methods,especially in dealing with large region contiguous occlusion and severe illumination variation,while the computational cost is much lower.
Keywords: face recognition; occlusion dictionary learning; mutual incoherence; structured sparse representation
Source publication: Pattern Recognition,2014,47(4): 1559-1572
Contact email: You,XG; you1231cncn@gmail.com
Discriminative multimanifold analysis for face recognition from a single training sample per person
Lu,JW; Tan,YP; Wang,G
Abstract: Conventional appearance-based face recognition methods usually assume that there are multiple samples per person(MSPP)available for discriminative feature extraction during the training phase. In many practical face recognition applications such as law enhancement,e-passport,and ID card identification,this assumption,however,may not hold as there is only a single sample per person(SSPP)enrolled or recorded in these systems. Many popular face recognition methods fail to work well in this scenario because there are not enough samples for discriminant learning. To address this problem,we propose in this paper a novel discriminative multimanifold analysis(DMMA)method by learning discriminative features from image patches. First,we partition each enrolled face image into several nonoverlapping patches to form an image set for each sample per person. Then,we formulate the SSPP face recognition as a manifold-manifold matching problem and learn multiple DMMA feature spaces to maximize the manifold margins of different persons. Finally,we present a reconstruction-based manifold-manifold distance to identify the unlabeled subjects. Experimental results on three widely used face databases are presented to demonstrate the efficacy of the proposed approach.
Keywords: face recognition; manifold learning; subspace learning; single training sample per person
Source publication: IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1): 39-51
Contact email: Lu,JW; jiwen.lu@adsc.com.sg
Fast l(1)-minimization algorithms for robust face recognition
Yang,AY; Zhou,ZH; Balasubramanian,AG; et al.
Abstract: l(1)-minimization refers to finding the minimum l(1)-norm solution to an underdetermined linear system b = Ax. Under certain conditions as described in compressive sensing theory,the minimum l(1)-norm solution is also the sparsest solution. In this paper,we study the speed and scalability of its algorithms. In particular,we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition,where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination,facial disguise,and pose variation. Although the underlying numerical problem is a linear program,traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework,known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l(1)-minimization solvers,including interior-point method,Homotopy,FISTA,SESOPCD,approximate message passing,and TFOCS. To aid peer evaluation,the code for all the algorithms has been made publicly available.
Keywords: l(1)-minimization; augmented Lagrangian methods; face recognition
Source publication: IEEE Transactions on Image Processing,2013,22(8): 3234-3246
Contact email: Yang,AY; yang@eecs.berkeley.edu
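To make the problem in the abstract above concrete, the following is a minimal sketch of l(1)-regularized recovery for an underdetermined system b = Ax. It is not the augmented Lagrangian method studied in the paper: it uses plain iterative soft-thresholding (ISTA, the unaccelerated basis of the FISTA solver listed among the compared methods) on the relaxed objective 0.5*||Ax - b||^2 + lam*||x||_1. The helper names, the toy matrix, and the value of `lam` are illustrative assumptions.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(v, t):
    """Scalar soft-thresholding: shrink v toward zero by t."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


def ista(A, b, lam, iters=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by iterative soft-thresholding."""
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)
        x = [soft(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    # Toy underdetermined system (2 equations, 3 unknowns), assumed for illustration.
    A = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
    b = [1.0, 1.0]
    print(ista(A, b, lam=0.1))
```

For this toy system, the minimum l(1)-norm solution of b = Ax is (0, 0, 1), and the regularized iterate concentrates on the same single coordinate, slightly shrunk by the penalty. Simple first-order schemes of this kind are easy to write but can need many iterations, which is the scalability concern the paper addresses.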
Hybrid Deep Learning for Face Verification
Yi Sun; Xiaogang Wang; Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network(ConvNet)-Restricted Boltzmann Machine(RBM)model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features,which indicate identity similarities,from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
Source publication: 2013 IEEE International Conference on. IEEE,2013: 1489-1496.
Abstract: We have developed a near-real-time computer system that can locate and track a subject's head,and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory,as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional(2-D)recognition problem rather than requiring recovery of three-dimensional geometry,taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors(principal components)of the set of faces; they do not necessarily correspond to features such as eyes,ears,and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features,and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner,and that it is easy to implement using a neural network architecture.
Keywords: superior temporal sulcus; human faces; neurons; macaque; monkey; cortex
