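Evaluation of Huangshan Maofeng tea grades based on feature variable selection using Elastic Net
Pan Tianhong1,2, Li Yuqiang2, Chen Qi3, Chen Shan2
(1. School of Electrical Engineering and Automation, Anhui University, Hefei 230061, China; 2. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China; 3. Tea Quality and Safety Research Center, Huangshan Customs, Huangshan 245000, China)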
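To simplify the chemical analysis of tea and achieve high-accuracy grade evaluation, this study took Huangshan Maofeng tea as the research object and, combining measurements of tea polyphenols, catechins, caffeine, gallic acid and amino acids, proposed a grade evaluation method based on Elastic Net feature variable selection and built a grade evaluation model for Huangshan Maofeng tea based on feature components. A total of 96 Huangshan Maofeng tea samples in 6 grades were selected and 19 components were analyzed in every sample. Elastic Net selected 9 feature components (gallic acid, epicatechin gallate, catechin, epicatechin, gallocatechin gallate, epigallocatechin, glutamic acid, arginine and the catechin bitterness index), which were used to build the grade evaluation model; the model was compared with principal component analysis (PCA). Over 100 Monte Carlo trials, the mean prediction-set accuracy of the model based on Elastic Net feature selection was 78.72%, compared with 70.79% for PCA. On this basis, a radar chart of the Elastic Net feature variables was constructed to visualize the multivariate grade evaluation of Huangshan Maofeng tea. The results show that the proposed method effectively selects characteristic tea components and improves the accuracy of grade evaluation for Huangshan Maofeng tea, providing a reference for high-accuracy tea grade evaluation.
models; quality control; Elastic Net; feature variable selection; Huangshan Maofeng tea; grade evaluation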
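Huangshan Maofeng, one of China's Top Ten Famous Teas, has gained a share of the international market thanks to its high, long-lasting aroma [1-2]. However, as the market expands, recurring adulteration not only damages the market image of Huangshan green tea but also holds back local economic development, and accurate tea evaluation has become the key problem limiting the development of green tea [3]. Traditional sensory analysis evaluates tea quality and identifies origin through the human senses, but it is highly subjective and poorly reproducible. Moreover, because manual assessment is inefficient, it cannot handle large batches of samples [4-5].
The grade differences in green tea quality are expressed mainly through five sensory attributes (appearance, liquor color, taste, aroma and infused leaf), and what underlies these apparent properties is the types and contents of the chemical substances the tea contains [6-8]. To avoid the subjectivity of sensory assessment, many analytical methods based on differences in intrinsic components have been proposed [7,9-10]. Wang et al. [11] used near-infrared spectroscopy to build quantitative prediction models for the moisture and crude fiber contents of fresh Huangshan Maofeng leaves and thereby identified adulterated tea; Wu et al. [12] proposed a grade evaluation model based on morphological feature parameters, using the morphology of tea during sorting to select tea; Wu et al. [13] used Fourier transform infrared spectroscopy to build a tea grading model based on fuzzy clustering; Sun et al. [14] proposed a tea variety identification method based on a low-rank auto-encoder and hyperspectral imaging that classified different varieties.
Although these methods achieve fairly high accuracy in tea variety identification and grade evaluation, teas of different grades from the same origin have largely similar infrared fingerprints and image features, so effective feature variables cannot be extracted from spectral fingerprints or hyperspectral images [15-19]. Chemical analysis therefore remains the most effective means of assessing tea quality, but it is tedious, slow and costly, which hampers market supervision [20]. Previous studies found that grade differences among teas from different regions or origins depend mainly on the contents of major components and mineral elements [5]. For teas of different grades from one specific region such as Huangshan Maofeng, however, the contents of the major components are similar and only a few characteristic components differ. Selecting these characteristic components can therefore reduce the number of chemical indicators needed in practical quality analysis, cut testing cost and time, and improve the accuracy of the corresponding models.
Taking Huangshan Maofeng tea as the research object, this study uses Elastic Net to analyze and select the characteristic components of tea, builds a grade evaluation model based on these components, and uses the Monte Carlo method to analyze the stability of the grade evaluation model, providing a theoretical basis for the practical grade evaluation of Huangshan Maofeng tea.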
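Fresh tea leaves were picked in batches at three representative production sites in Huizhou District, Huangshan City (Fuxi, Yangcun and Xintian villages) and processed into Huangshan Maofeng tea by hand. The process consisted of fixing, rolling and baking [21]: 1) Fixing: for each batch, about 500 g of fresh leaves were spread evenly on the bottom of a copper pan and fixed under cover at 150 ℃ for 2 min, then stir-fried at a pan temperature of 130 ℃ until the leaves could be rolled into a ball and the tender stems no longer snapped easily. 2) Rolling: after removal from the pan, the fixed leaves were spread out evenly and, once the heat had dissipated, rolled repeatedly for 1-2 min until they curled into strips. 3) Baking: the rolled leaves were spread evenly on the top of a baking cage in a layer 0.5-1.5 cm thick, and the dryness was checked repeatedly until the moisture content of the tea reached 4%-6%.
Seven tea tasters were invited to evaluate the prepared samples by sensory assessment, and a total of 96 standard samples of Huangshan Maofeng tea were selected, with 1 000 g collected for each standard sample. The number of standard samples in each grade is listed in Table 1; grades were assigned according to picking time.
Table 1 Number of standard samples in each grade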
Note: AD 1st grade, AD 2nd grade and AD 3rd grade denote advanced first grade, advanced second grade and advanced third grade, respectively; the same below.
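The instruments used were a liquid chromatography-quadrupole Orbitrap high-resolution mass spectrometer (Thermo Fisher, USA), an ACQUITY UPLC I-Class ultra-performance liquid chromatograph (Waters, USA), an S-433D amino acid analyzer (SYKAM, Germany), a CEM MARS 5 microwave extraction system (LCTech, Germany), a Mettler-AL204-IC electronic balance (METTLER TOLEDO, Switzerland), an HH-6 digital thermostatic water bath (Shanghai Puguang), a Hettich Universal 320R benchtop centrifuge (Hettich, Germany), a UV2550 spectrophotometer (Shimadzu, Japan), an S40 SevenMulti pH meter (Mettler, Germany), a Vortex-Genie 2 vortex mixer (Scientific Industries, USA), a KQ200DE ultrasonic cleaner (Kunshan Ultrasonic Instruments Co., Ltd.), a Milli-Q Gradient ultrapure water system (Millipore, USA) and a 1095 sample mill (FOSS, Sweden).
Total tea polyphenols were determined according to Section 4, "Determination of tea polyphenols in tea", of GB/T 8313-2018 "Determination of total polyphenols and catechins content in tea". Total free amino acids were determined according to GB/T 8314-2013 "Tea - Determination of free amino acids content". Twenty-six amino acids were determined with the amino acid analyzer, and catechins, gallic acid and caffeine were determined simultaneously by microwave-assisted extraction combined with ultra-performance liquid chromatography-quadrupole Orbitrap high-resolution mass spectrometry.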
1.3.1 Determination of catechins, gallic acid and caffeine in tea
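Sample treatment: 0.2 g of ground sample was weighed into a 50 mL tube, 10 mL of 70% (volume fraction) methanol pre-heated to 70 ℃ was added, and the sample was extracted in a 70 ℃ water bath for 10 min (shaken once at 5 min). The tube was then centrifuged at 3 000 r/min for 10 min and the supernatant was transferred to a 50 mL volumetric flask. The extraction was repeated twice, the supernatants were combined, the pipette tip was rinsed with 5 mL of 70% methanol, and the flask was made up to the mark with water.
Sample clean-up: 250 μL of the sample extract was diluted 4-fold with water and filtered through a 0.22 μm aqueous membrane filter into a vial for ultra-performance liquid chromatography (UPLC) analysis.
Chromatographic conditions: column, Waters ACQUITY UPLC BEH C18 (2.1 mm × 100 mm, 1.7 μm); column temperature, 35 ℃; injection volume, 5 μL; detector, UV; detection wavelength, 278 nm. The mobile phases were based on those specified for catechins in GB/T 8312-2013: mobile phase A, 2.5% acetic acid in water; mobile phase B, acetonitrile. Gradient program: 0-0.8 min, 5%-10% B; 0.8-2.4 min, 10% B; 2.4-3.2 min, 10%-20% B; 3.2-4.0 min, 20% B; 4.0-4.8 min, 20%-10% B; 4.8-5.0 min, 10%-5% B.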
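The gradient program above is a piecewise-linear schedule of mobile phase B over time. The minimal Python sketch below transcribes the breakpoints from the text and interpolates %B at any time point, which can help when checking a method file; it assumes the pump ramps linearly between breakpoints (instrument curve settings may differ), and the names `GRADIENT` and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# Breakpoints (time in min, % mobile phase B) transcribed from the gradient program.
GRADIENT = [(0.0, 5.0), (0.8, 10.0), (2.4, 10.0), (3.2, 20.0),
            (4.0, 20.0), (4.8, 10.0), (5.0, 5.0)]

def percent_b(t: float, program=GRADIENT) -> float:
    """Return %B at time t (min), assuming linear ramps between breakpoints."""
    times = [point[0] for point in program]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t) - 1          # index of the segment that contains t
    if i == len(program) - 1:               # t is exactly the last breakpoint
        return program[-1][1]
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

print(percent_b(0.4), percent_b(2.8), percent_b(4.9))
```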
1.3.2 Determination of 26 amino acids in tea
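Sample treatment: 2.0 g of ground tea was weighed into a 250 mL stoppered conical flask, 100 mL of freshly boiled water was added, the flask was capped and heated in a boiling water bath for 30 min (shaken every 5 min). After removal, the tea was allowed to settle, and 5 mL of supernatant was transferred to a 50 mL centrifuge tube. Then 15 mL of 4% (mass fraction) sulfosalicylic acid solution was added, and the mixture was vortexed for 30 s, left to stand for 10 min and centrifuged at 5 000 r/min for 5 min to remove the proteins completely. Next, 1 mL of supernatant was transferred to another centrifuge tube, diluted with 1 mL of sample diluent and vortexed to mix, and the solution was filtered through a 0.22 μm aqueous membrane into a sample vial for injection.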
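Both sample-preparation protocols end in a chain of dilutions, so the content in the original tea is obtained by multiplying the vial concentration back through that chain. The sketch below assumes concentrations in mg/mL, assumes the extract volume is exactly the stated volume (no correction for water retained by the leaves is mentioned in the protocol), and reads "diluted 4-fold" in 1.3.1 as 250 μL made up to 1 mL; `content_mg_per_g` and the example concentrations are hypothetical.

```python
def content_mg_per_g(c_vial_mg_per_ml: float, extract_volume_ml: float,
                     sample_mass_g: float, dilution_steps) -> float:
    """Back-calculate content (mg/g of tea) from the concentration measured in the vial.

    dilution_steps: iterable of (aliquot_ml, final_volume_ml) pairs, one per
    dilution applied to the extract after extraction.
    """
    factor = 1.0
    for aliquot, final in dilution_steps:
        factor *= final / aliquot            # each step dilutes by final/aliquot
    return c_vial_mg_per_ml * factor * extract_volume_ml / sample_mass_g

# Amino acids (1.3.2): 2.0 g in 100 mL water; 5 mL + 15 mL sulfosalicylic acid
# (20 mL in total); 1 mL + 1 mL diluent (2 mL in total): overall factor 4 x 2 = 8.
amino = content_mg_per_g(0.01, 100.0, 2.0, [(5.0, 20.0), (1.0, 2.0)])

# Catechins (1.3.1): 0.2 g extract made up to 50 mL; 0.25 mL diluted to 1 mL.
catechin = content_mg_per_g(0.05, 50.0, 0.2, [(0.25, 1.0)])
print(amino, catechin)
```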
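Instrument conditions: injection volume, 50 μL; column, lithium-form sulfonated strong-acid cation-exchange column; mobile phase A, pH 2.90; mobile phase B, pH 4.20; mobile phase C, pH 8.00; reagent, ninhydrin solution; elution pump flow rate, 0.45 mL/min; derivatization pump flow rate, 0.25 mL/min; detection wavelengths of the dual-channel photometer, 570 nm and 440 nm; reactor temperature, 130 ℃.
Identification and quantification of amino acids: amino acids were identified by their retention times and quantified by the external standard method using reference standards. For the external-standard calibrations under these chromatographic conditions, the coefficients of determination of all amino acids exceeded 0.999, except for theanine (0.998).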
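External-standard quantification fits a calibration line of detector response against standard concentration, checks its coefficient of determination, and inverts the line for unknown samples. A minimal sketch assuming an ordinary least-squares line with intercept; the standard series below is hypothetical and not data from this study, and the function names are illustrative.

```python
import numpy as np

def fit_external_standard(conc, response):
    """Least-squares calibration line response = slope * conc + intercept, plus R^2."""
    conc = np.asarray(conc, dtype=float)
    response = np.asarray(response, dtype=float)
    slope, intercept = np.polyfit(conc, response, 1)   # highest power first
    predicted = slope * conc + intercept
    ss_res = np.sum((response - predicted) ** 2)
    ss_tot = np.sum((response - response.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def quantify(response, slope, intercept):
    """Invert the calibration line to obtain concentration from a peak response."""
    return (response - intercept) / slope

# Hypothetical standard series (concentration vs. peak area).
std_conc = [0.05, 0.10, 0.20, 0.40, 0.80]
std_area = [1.02, 2.05, 3.98, 8.10, 15.95]
slope, intercept, r2 = fit_external_standard(std_conc, std_area)
print(f"R^2 = {r2:.4f}, unknown = {quantify(5.0, slope, intercept):.3f}")
```

A calibration would be accepted in the spirit of the text only if `r2` clears the chosen threshold (e.g. 0.999).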
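In this study the catechin bitterness index was also treated as a component variable, so the final number of component variables for Huangshan Maofeng tea is 19.
The chemical analysis data contain 19 component variables, but the quality of Huangshan Maofeng tea stems from components characteristic of its specific production area, and the quality differences between grades depend on only a few characteristic components. Selecting these components therefore not only shortens testing time and lowers testing cost but can also improve the accuracy of the results, which is important in practical analysis.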
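Consider the multivariate linear regression model
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where $\mathbf{X}\in\mathbb{R}^{n\times p}$ holds the $p$ component variables of $n$ samples, $\mathbf{y}$ is the response vector, $\boldsymbol{\beta}$ is the vector of regression coefficients and $\boldsymbol{\varepsilon}$ is the error term. Penalizing the coefficients gives the regularized estimate
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}\left\{\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_q\right\} \tag{3}$$
where $\lambda$ is the penalty coefficient and $\|\boldsymbol{\beta}\|_q$ denotes the $\ell_q$ norm of the regression vector.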
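When $q = 1$, Eq. (3) becomes the least absolute shrinkage and selection operator (LASSO). LASSO uses the $\ell_1$ norm as the penalty term to shrink the regression coefficients, setting coefficients of small magnitude exactly to zero and thereby achieving feature variable selection and sparse coefficient estimation [23-25]:
$$\hat{\boldsymbol{\beta}}_{\text{LASSO}} = \arg\min_{\boldsymbol{\beta}}\left\{\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_1\right\}$$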
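The way the $\ell_1$ penalty zeroes small coefficients can be seen in cyclic coordinate descent, where each coefficient update is a soft-thresholding step. This is a minimal sketch for the objective written above (note the threshold $\lambda/2$ that follows from differentiating the squared-error term); it is not the LAR solver the paper uses, and it assumes no all-zero columns in `X`. The function names are illustrative.

```python
import numpy as np

def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0): shrinks z and zeroes it when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)                 # x_j^T x_j for every column
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]      # partial residual without feature j
            rho = X[:, j] @ r_j
            b[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return b

# With an orthonormal design each coefficient is y_j soft-thresholded at lam/2.
print(lasso_cd(np.eye(3), [3.0, 0.2, -2.0], lam=1.0))
```

The middle coefficient, whose magnitude is below the threshold, is set exactly to zero; this is the variable-selection behaviour described in the text.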
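Elastic Net combines the $\ell_1$ penalty of LASSO with the squared $\ell_2$ penalty of ridge regression [26]:
$$J(\boldsymbol{\beta}) = \frac{1}{2n}\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\left[\frac{1-\alpha}{2}\|\boldsymbol{\beta}\|_2^2 + \alpha\|\boldsymbol{\beta}\|_1\right] \tag{6}$$
It follows that when $\alpha = 0$ and $\alpha = 1$, Elastic Net reduces to ridge regression and LASSO, respectively [24,26]. The problem can be transformed into LASSO form and solved as such. Write the criterion as $\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda_2\|\boldsymbol{\beta}\|_2^2 + \lambda_1\|\boldsymbol{\beta}\|_1$, where $\lambda_1$ and $\lambda_2$ weight the $\ell_1$ and squared $\ell_2$ penalties (up to a constant scaling, $\lambda_1 \propto \lambda\alpha$ and $\lambda_2 \propto \lambda(1-\alpha)$). For given data $(\mathbf{X}, \mathbf{y})$ with $\mathbf{X}\in\mathbb{R}^{n\times p}$ and parameters $(\lambda_1, \lambda_2)$, define the augmented data set $(\mathbf{X}^*, \mathbf{y}^*)$ [27]:
$$\mathbf{X}^* = (1+\lambda_2)^{-1/2}\begin{bmatrix}\mathbf{X}\\ \sqrt{\lambda_2}\,\mathbf{I}_p\end{bmatrix},\quad \mathbf{y}^* = \begin{bmatrix}\mathbf{y}\\ \mathbf{0}_p\end{bmatrix},\quad \gamma = \frac{\lambda_1}{\sqrt{1+\lambda_2}},\quad \boldsymbol{\beta}^* = \sqrt{1+\lambda_2}\,\boldsymbol{\beta}$$
that is,
$$\hat{\boldsymbol{\beta}}^* = \arg\min_{\boldsymbol{\beta}^*}\left\{\|\mathbf{y}^*-\mathbf{X}^*\boldsymbol{\beta}^*\|_2^2 + \gamma\|\boldsymbol{\beta}^*\|_1\right\},\qquad \hat{\boldsymbol{\beta}} = \frac{\hat{\boldsymbol{\beta}}^*}{\sqrt{1+\lambda_2}}$$
After the transformation the sample size becomes $n+p$ and $\mathbf{X}^*$ has rank $p$, so Elastic Net can in principle select all $p$ variables, overcoming the limits that LASSO faces on the number of selectable variables and under collinearity.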
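The augmentation trick can be checked numerically: for any coefficient vector, the LASSO criterion on $(\mathbf{X}^*, \mathbf{y}^*)$ with $\gamma$ and $\boldsymbol{\beta}^*$ equals the (naive) Elastic Net criterion on the original data. A minimal sketch with random data and arbitrary $(\lambda_1, \lambda_2)$; it also shows that $\mathbf{X}^*$ has full column rank $p$ even when $p > n$. Function names and values are illustrative.

```python
import numpy as np

def augment(X, y, lam1, lam2):
    """Build (X*, y*, gamma) so that the naive elastic net becomes a LASSO problem."""
    p = X.shape[1]
    scale = 1.0 / np.sqrt(1.0 + lam2)
    X_star = scale * np.vstack([X, np.sqrt(lam2) * np.eye(p)])   # (n + p) x p
    y_star = np.concatenate([y, np.zeros(p)])                      # length n + p
    gamma = lam1 / np.sqrt(1.0 + lam2)
    return X_star, y_star, gamma

def naive_enet_obj(X, y, b, lam1, lam2):
    return np.sum((y - X @ b) ** 2) + lam2 * np.sum(b ** 2) + lam1 * np.sum(np.abs(b))

def lasso_obj(X, y, b, lam):
    return np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # p = 8 > n = 5: LASSO alone keeps at most 5 variables
y = rng.normal(size=5)
b = rng.normal(size=8)           # any coefficient vector
lam1, lam2 = 0.7, 0.3

X_star, y_star, gamma = augment(X, y, lam1, lam2)
b_star = np.sqrt(1.0 + lam2) * b
print(X_star.shape, np.linalg.matrix_rank(X_star))
print(np.isclose(naive_enet_obj(X, y, b, lam1, lam2),
                 lasso_obj(X_star, y_star, b_star, gamma)))
```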
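Model performance was evaluated by prediction accuracy:
$$P = \frac{N_c}{N}\times 100\%$$
where $N$ is the number of samples in the analyzed data set, $N_c$ is the number of correctly predicted samples and $P$ is the prediction accuracy of the model.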
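The accuracy formula is a simple ratio; the sketch below computes it from true and predicted grade labels. As a worked check, 23 correct out of 29 prediction-set samples (the 6 errors reported for the single split later in the paper) gives 79.31%. The function name is illustrative.

```python
def accuracy(y_true, y_pred) -> float:
    """Prediction accuracy in percent: correctly predicted samples / all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_correct / len(y_true)

# 29 prediction-set samples with 6 misclassified: 23 / 29.
print(round(accuracy([1] * 29, [1] * 23 + [2] * 6), 2))
```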
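The measured tea components are listed in Table 2. Because the components were determined against different measurement bases, the data differ by orders of magnitude; to avoid losing feature variables because of differences in units and scale, all component data were standardized before further analysis.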
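The paper does not state which standardization was applied; a common choice is the column-wise z-score, sketched below under that assumption (sample standard deviation, `ddof=1`). Returning the mean and standard deviation lets the training-set statistics be reused on the prediction set; a constant column would cause division by zero and would need to be dropped first. The function name is illustrative.

```python
import numpy as np

def zscore(data):
    """Column-wise z-score (x - mean) / std so components on different scales are comparable."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)          # sample standard deviation
    return (data - mean) / std, mean, std

# Two hypothetical components whose magnitudes differ by a factor of 100.
z, mu, sd = zscore([[10, 0.1], [20, 0.2], [30, 0.3]])
print(z)
```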
Table 2 Component analysis of Huangshan Maofeng tea
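The correlation matrix of the standardized components is given in Table 3. Most pairwise correlations are below 0.6; strong correlations exist only between GA and ECG (0.90), caffeine and total catechins (0.92), ECG and arginine (0.78), ECG and the catechin bitterness index (0.73), aspartic acid and glutamic acid (0.82), aspartic acid and theanine (0.80), aspartic acid and arginine (0.74), and glutamic acid and arginine (0.83). Because these correlated pairs carry overlapping information, it is worth identifying the feature components to guide the practical grade evaluation of Maofeng tea.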
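Screening a correlation matrix for strongly correlated pairs, as done in the paragraph above with a 0.6 cut-off, can be sketched as follows. Pearson correlation is unaffected by z-score standardization, so raw or standardized data give the same matrix. The toy columns are hypothetical, not the study's data, and the function name is illustrative.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.6):
    """List variable pairs whose |Pearson r| exceeds the threshold, strongest first."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)   # columns = variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 2)))
    return sorted(pairs, key=lambda t: -abs(t[2]))

# Toy columns: b is a rescaled copy of a, c is only weakly related to either.
toy = np.column_stack([[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]])
print(strong_pairs(toy, ["a", "b", "c"]))
```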
Table 3 Correlation analysis of the components of Huangshan Maofeng tea
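From Eq. (6), the Elastic Net objective function contains the mixing coefficient α (0 < α < 1) and the regularization coefficient λ (λ > 0). To determine these parameters, α was first determined by 10-fold cross-validation, and λ was then chosen by the minimum mean squared error (MSE) criterion [26]. With α = 0.2 from cross-validation, the MSE curve over different regularization coefficients is shown in Fig. 1, where the arrow marks the minimum MSE. The optimal regularization coefficient under the MSE criterion is λ = 0.6.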
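The λ search can be sketched as k-fold cross-validated MSE over a grid with α fixed at the cross-validated value 0.2. This sketch swaps in cyclic coordinate descent for the objective in Eq. (6) instead of the LAR solver named in the paper, assumes `X` is already standardized, centers `y` within each training fold as a stand-in intercept, and uses synthetic data and a hypothetical λ grid; all names are illustrative.

```python
import numpy as np

def enet_cd(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*((1-alpha)/2*||b||^2 + alpha*||b||_1)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]              # partial residual
            rho = X[:, j] @ r_j / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            b[j] = shrunk / (col_sq[j] + lam * (1.0 - alpha))
    return b

def cv_mse(X, y, lambdas, alpha, k=10, seed=0):
    """k-fold cross-validated MSE for each lambda."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    mse = np.zeros(len(lambdas))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        y_mean = y[train_idx].mean()                      # intercept from training fold
        for i, lam in enumerate(lambdas):
            b = enet_cd(X[train_idx], y[train_idx] - y_mean, lam, alpha)
            resid = y[test_idx] - (X[test_idx] @ b + y_mean)
            mse[i] += np.sum(resid ** 2)
    return mse / n

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=60)
lambdas = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 1.0])
mse = cv_mse(X, y, lambdas, alpha=0.2)
best = lambdas[np.argmin(mse)]                            # minimum-MSE lambda
b_best = enet_cd(X, y - y.mean(), best, alpha=0.2)
print(best, np.round(b_best, 2))
```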
Fig.1 Mean squared error curves for different regularization coefficients (α=0.2)
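With the selected optimal parameters (α = 0.2, λ = 0.6), Elastic Net computed the sparse coefficients of the 19 component variables iteratively with the least angle regression (LAR) algorithm [26]; the component variables with nonzero coefficients are the feature components. Based on these coefficients, 9 feature components were selected (GA, ECG, C, EC, GCG, EGC, glutamic acid, arginine and the catechin bitterness index). Their contribution rates are shown in Fig. 2; in descending order of contribution, the selected components are ECG, GA, EC, arginine, EGC, the catechin bitterness index, C, glutamic acid and GCG.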
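The paper does not define "contribution rate". One plausible reading, assumed here, is each nonzero coefficient's share of the total absolute coefficient on standardized data; the sketch keeps the nonzero coefficients and ranks them under that assumption. The coefficient values and names are hypothetical.

```python
import numpy as np

def rank_features(coef, names, tol=1e-10):
    """Keep variables with nonzero coefficients and rank them by |b_j| / sum(|b_k|)."""
    coef = np.asarray(coef, dtype=float)
    keep = np.flatnonzero(np.abs(coef) > tol)              # selected feature indices
    share = np.abs(coef[keep]) / np.abs(coef[keep]).sum()
    order = np.argsort(-share)                             # largest contribution first
    return [(names[keep[i]], round(float(100 * share[i]), 1)) for i in order]

ranked = rank_features([0.0, 0.5, -0.3, 0.0, 0.2], ["v1", "v2", "v3", "v4", "v5"])
print(ranked)
```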
Fig.2 Contribution rates of the feature components
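To verify the effectiveness of Elastic Net variable selection, the distributions of the feature components across grades were visualized (Fig. 3). The contents of the selected feature components differ clearly between grades. Overall, the higher the grade, the higher the mean contents of ECG, GA, glutamic acid, arginine and the catechin bitterness index, and the lower the mean contents of EC, EGC and GCG. The three components with the largest contribution rates (ECG, GA and EC) show clear grade differences, whereas the other variables overlap between grades. Elastic Net can therefore effectively select feature components whose distributions differ among grades.
Note: ECG, GA, EC, EGC, C and GCG denote epicatechin gallate, gallic acid, epicatechin, epigallocatechin, catechin and gallocatechin gallate, respectively.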
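The grades of Huangshan Maofeng tea were labeled in order as AD 1st grade (#1), AD 2nd grade (#2), AD 3rd grade (#3), 1st grade (#4), 2nd grade (#5) and 3rd grade (#6). With GA, ECG, C, EC, GCG, EGC, glutamic acid, arginine and the catechin bitterness index as input variables and the corresponding grade as output, all samples were randomly divided into a training set (67 samples, 70%) and a prediction set (29 samples, 30%) for modeling. The prediction results are shown in Fig. 4: the model built on the feature components selected by Elastic Net reached a prediction accuracy of 79.31%, a fairly high grade evaluation accuracy. The 6 misclassified samples fall mainly between adjacent grades, possibly because samples of different grades were collected from the same production area, where identical or similar geographical conditions lead to largely similar component contents.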
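A simple (non-stratified) random 70/30 split of the 96 samples reproduces the set sizes quoted above, since round(0.7 × 96) = 67. This is a minimal sketch; the seed is arbitrary, and a stratified split by grade would be an alternative design if every grade must appear in both sets. The function name is illustrative.

```python
import numpy as np

def random_split(n_samples, train_frac=0.7, seed=None):
    """Randomly split sample indices into training and prediction sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = random_split(96, seed=42)
print(len(train_idx), len(test_idx))
```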
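To verify the effectiveness of Elastic Net feature variable selection, the raw data were taken as the baseline and, using the same training and prediction sample sets, 100 Monte Carlo trials [28] were run for PCA (2 principal components, cumulative contribution 99.42%) and for the Elastic Net regression model. To keep the comparison fair, only the first 8 feature variables (cumulative contribution 99.35%) were used in the Monte Carlo trials. The training-set and test-set prediction accuracies of the models are listed in Table 4. Compared with the mean prediction-set accuracy on the raw data (69.55%), PCA did not substantially improve prediction accuracy (70.79%), whereas the Elastic Net model improved markedly: its mean training-set and prediction-set accuracies rose from 70.92% and 69.55% to 77.48% and 78.72%, respectively. In addition, the standard deviation of the prediction-set accuracy shows that the model built on the variables selected by Elastic Net is more stable and can evaluate Huangshan Maofeng tea grades with fairly high accuracy.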
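The Monte Carlo evaluation repeats the random split many times and summarizes the prediction-set accuracy by its mean and standard deviation. The sketch below takes any `fit_predict(X_train, y_train, X_test)` callable; the nearest-centroid classifier and the two-class toy data are hypothetical stand-ins, not the paper's Elastic Net regression model or its data.

```python
import numpy as np

def monte_carlo_accuracy(X, y, fit_predict, n_runs=100, train_frac=0.7, seed=0):
    """Repeat random train/prediction splits; return mean and std of prediction accuracy (%)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_runs):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        y_hat = fit_predict(X[tr], y[tr], X[te])
        scores.append(100.0 * np.mean(y_hat == y[te]))
    return float(np.mean(scores)), float(np.std(scores, ddof=1))

def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: assign each sample to the class with the closest mean vector."""
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(dist, axis=1)]

# Two well-separated toy classes with 20 samples each.
X = np.vstack([np.linspace(0, 1, 20), np.linspace(10, 11, 20)]).reshape(-1, 1)
y = np.repeat([1, 2], 20)
mean_acc, std_acc = monte_carlo_accuracy(X, y, nearest_centroid)
print(mean_acc, std_acc)
```

A smaller standard deviation across runs is what the text reads as higher model stability.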
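Note: values on the main diagonal are the numbers of samples whose grade was predicted correctly; the other values are the numbers of misclassified samples.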
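The confusion matrix described in the note, and the observation that errors fall mainly between adjacent grades, can both be computed directly. A minimal sketch with grades coded 1 to 6 as #1 to #6; the example predictions are hypothetical, and `adjacent_error_share` is an illustrative helper rather than a measure used in the paper.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=6):
    """Rows: true grade #1..#n, columns: predicted grade; the diagonal counts correct predictions."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t - 1, p - 1] += 1
    return m

def adjacent_error_share(m):
    """Fraction of misclassified samples that were assigned to a neighbouring grade."""
    errors = m.sum() - np.trace(m)
    if errors == 0:
        return 0.0
    adjacent = sum(m[i, i + 1] + m[i + 1, i] for i in range(len(m) - 1))
    return adjacent / errors

y_true = [1, 1, 2, 3, 4, 5, 6, 6]
y_pred = [1, 2, 2, 3, 3, 5, 6, 4]
m = confusion(y_true, y_pred)
print(np.trace(m), adjacent_error_share(m))
```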
Table 4 Comparison of Monte Carlo experiment results
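Building on the chemical analysis of tea quality and the Elastic Net feature selection method, this study proposed a grade evaluation method for Huangshan Maofeng tea based on Elastic Net feature variable selection and tested it on a data set of 96 samples in 6 grades. The results show that:
1) Selecting characteristic tea components reduces the number of chemical indicators to be measured and improves the performance of the corresponding grade evaluation model, providing useful guidance for simplifying practical tea analysis.
2) As a feature selection method, Elastic Net selects feature variables effectively: it reduced the 19 chemically measured component variables to 9 feature components for the grade evaluation of Huangshan Maofeng tea.
3) Compared with the raw data (69.55%) and PCA-reduced data (70.79%), the grade evaluation model based on the features selected by Elastic Net achieved higher accuracy (78.72%) and better stability, improving model performance while reducing the number of chemical indicators.
4) The feature variables selected by Elastic Net make it easy to build a radar chart of the characteristic components of Huangshan Maofeng tea, visualizing the multivariate grade evaluation.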
[1] Chen Bo, Jin Baohui, Yan Zhi, et al. Discrimination of 6 kinds of Chinese tea by combination of organic components and multielement analysis[J]. Food Science, 2014, 35(18): 119-123. (in Chinese with English abstract)
[2] Ren Guangxin, Ning Jingming, Wu Weiguo, et al. Investigation of the technological parameters for processing line of Huangshan Maofeng green tea[J]. Journal of Anhui Agricultural University, 2013, 40(1): 124-129. (in Chinese with English abstract)
[3] Xue Dawei, Kong Huifang, Yang Chunlan. Huangshan Maofeng tea quality detection based on principal component analysis and neural network[J]. Computers and Applied Chemistry, 2014, 31(5): 578-582. (in Chinese with English abstract)
[4] Wang Shuhui, Long Limei, Song Shasha, et al. Analysis of characteristic flavor components and cultivar discrimination of three varieties of famous green tea[J]. Food Science, 2016, 37(2): 128-131. (in Chinese with English abstract)
[5] Cheng Huan, He Wei, Zhao Lei, et al. Correlation between sensory attributes and chemical components of black and green tea[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2012, 28(1): 375-380. (in Chinese with English abstract)
[6] Yang Xingbin, Cui Yanmang, Lu Xinshan, et al. Protective effects of polyphenols-enriched extract from Huangshan Maofeng green tea against CCl4-induced liver injury in mice[J]. Chemico-Biological Interactions, 2014, 220(5): 75-83.
[7] Dong Chunwang, Liang Gaozhen, An Ting, et al. Near-infrared spectroscopy detection model for sensory quality and chemical constituents of black tea[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(24): 306-313. (in Chinese with English abstract)
[8] Zhang Yang, Xiao Weihua, Ji Guanya, et al. Effects on physicochemical properties of black tea by mechanical superfine and general grinding[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(11): 295-301. (in Chinese with English abstract)
[9] Wen Tao, Zheng Lizhang, Gong Zhongliang, et al. Rapid identification of geographical origin of camellia oil based on near infrared spectroscopy technology[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(16): 293-299. (in Chinese with English abstract)
[10] Li Xiaoli, Wei Yuzhen, Xu Jie, et al. EGCG distribution visualization in tea leaves based on hyperspectral imaging technology[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(7): 180-186. (in Chinese with English abstract)
[11] Wang Man, Zhang Zhengzhu, Ning Jingming, et al. Study on quality and class rapid evaluation of tea leaf materials based on near infrared spectroscopy[J]. Science and Technology of Food Industry, 2014, 35(22): 57-60. (in Chinese with English abstract)
[12] Wu Zhengmin, Cao Chengmao, Wang Errui, et al. Tea selection method based on morphology feature parameters[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(11): 315-321. (in Chinese with English abstract)
[13] Wu Xiaohong, Zhai Yanli, Wu Bin, et al. Classification of tea varieties via FTIR spectroscopy based on fuzzy uncorrelated discriminant C-means clustering[J]. Spectroscopy and Spectral Analysis, 2018, 38(6): 1719-1723. (in Chinese with English abstract)
[14] Sun Jun, Jin Haitao, Wu Xiaohong, et al. Tea variety identification based on low-rank stacked auto-encoder and hyperspectral image[J]. Transactions of the Chinese Society for Agricultural Machinery, 2018, 49(8): 316-322. (in Chinese with English abstract)
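[15] Ning Jingming, Zhang Zhengzhu, Fang Shihui, et al. Fingerprint technology and its application in tea quality control[J]. China Tea Processing, 2009, 3(14): 39-41. (in Chinese)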
[16] Ning Jingming, Li Shuhuan, Wang Yujie, et al. Hyperspectral imaging for quality prediction model in digital blending of congou black tea[J]. Food Science, 2019, 40(4): 318-323. (in Chinese with English abstract)
[17] Chen Quansheng, Zhao Jiewen, Cai Jianrong, et al. Inspection of tea quality by using multi-sensor information fusion based on NIR spectroscopy and machine vision[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2008, 24(3): 5-10. (in Chinese with English abstract)
[18] Zou Xiaobo, Zhang Junjun, Huang Xiaowei, et al. Distinguishing watermelon maturity based on acoustic characteristics and near infrared spectroscopy fusion technology[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(9): 301-307. (in Chinese with English abstract)
[19] Zhu Yaodi, Zou Xiaobo, Shi Jiyong, et al. Rapidly detecting total acid distribution of vinegar culture based on hyperspectral imaging technology[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2014, 30(16): 320-327. (in Chinese with English abstract)
[20] Ning Jingming, Sun Jingjing, Zhu Xiaoyuan, et al. Discriminant of withering quality of Keemun black tea based on information fusion of image and spectrum[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(24): 303-308. (in Chinese with English abstract)
[21] Hua Jinjie, Yuan Haibo, Yin Junfeng, et al. Optimization of fixation process by electromagnetic roller-hot air coupling machine for green tea[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(12): 260-267. (in Chinese with English abstract)
[22] Shi Zhaopeng, Liu Zhonghua. Probe into mathematical model of chemical essence of bitterness and astringency in summer green tea[J]. Journal of Tea Science, 1987, 7(2): 7-12. (in Chinese with English abstract)
[23] Zou Hui. The adaptive lasso and its oracle properties[J]. Journal of the American Statistical Association, 2006, 101(476): 1418-1429.
[24] Li Yuqiang, Pan Tianhong, Li Haoran, et al. NIR spectral feature selection using LASSO method and its application in the classification analysis[J]. Spectroscopy and Spectral Analysis, 2019, 39(12): 3809-3815. (in Chinese with English abstract)
[25] Li Yuqiang, Pan Tianhong, Li Haoran, et al. Near infrared spectroscopy quantitative analysis for Tricholoma matsutake based on information extraction by using the elastic net[J]. Journal of Near Infrared Spectroscopy, 2020, 28(3): 125-132.
[26] Zou Hui, Hastie Trevor. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005, 67(2): 301-320.
[27] Zhao Anxin, Tang Xiaojun, Song Ya, et al. Spectral wavelength selection and dimension reduction using Elastic Net in spectroscopy analysis[J]. Infrared and Laser Engineering, 2014, 43(6): 1977-1981. (in Chinese with English abstract)
[28] Wen Quan, Wen Zhiyu. New near infrared wavelength selection algorithm based on Monte Carlo method[J]. Acta Optica Sinica, 2012, 30(12): 3637-3642. (in Chinese with English abstract)
Evaluation of Huangshan Maofeng tea grades based on feature variable selection using Elastic Net
Pan Tianhong1,2, Li Yuqiang2, Chen Qi3, Chen Shan2
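(1. School of Electrical Engineering and Automation, Anhui University, Hefei 230061, China; 2. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China; 3. Tea Quality and Safety Research Center, Huangshan Customs, Huangshan 245000, China)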
Huangshan Maofeng tea has become one of the most famous Chinese teas owing to its orchid fragrance and fresh, sweet taste. However, different quality grades of Huangshan Maofeng tea vary greatly in price, and quality evaluation poses a great challenge in the tea market. The quality grade of a tea is related to the types and concentrations of its chemical components. Traditional sensory evaluation depends heavily on the experience of the assessors and cannot achieve fast and accurate discrimination. Chemical analysis is therefore an essential alternative for tea quality evaluation, but analyzing all components is complex and time-consuming in large-scale production, especially as detection standards become increasingly refined with the rapid expansion of the tea market. Previous studies reveal that samples collected from the same production area or origin have similar component types and concentrations, so that differences between tea grades depend on only a few components. It is therefore reasonable to select the characteristic components that best distinguish grades and thereby streamline conventional chemical analysis. In this work, a new method based on feature selection using the Elastic Net was proposed to simplify conventional chemical analysis while improving grade evaluation. First, 96 samples of Huangshan Maofeng tea in 6 quality grades (advanced 1st to 3rd grades and 1st to 3rd grades) were produced by the traditional manual process from leaves collected at three origins (Fuxi, Yangcun and Xintian villages). Chemical analysis was used to determine the types and contents of 19 components.
Second, cross-validation was used to determine the optimal parameters of the Elastic Net, and 9 feature components (gallic acid, epicatechin gallate, catechin, epicatechin, gallocatechin gallate, epigallocatechin, glutamate, arginine and the catechin bitterness index) were selected at the minimum of the cost function. Third, a radar chart of the 9 selected components was drawn to visualize the tea grade evaluation. To quantify the classification, a quality grade evaluation model of Huangshan Maofeng tea was established on the selected feature components using partial least squares regression, and 100 Monte Carlo trials were run to evaluate the stability and robustness of the model. The proposed method reduced the number of components from 19 to 9 and raised the accuracy of grade evaluation from 69.55% to 79.31% compared with using all chemically analyzed components. Principal component analysis (PCA) was also used for comparison: in the Monte Carlo experiment, the prediction-set accuracies of PCA and the proposed method were 70.79% and 78.72%, respectively. The results demonstrate that selecting feature components is a feasible way to simplify conventional chemical analysis while improving prediction performance, and that a model based on the characteristic components offers a flexible option for tea quality identification.
models; quality control; Elastic Net; feature variables selection; Huangshan Maofeng tea; grade evaluation
Pan Tianhong, Li Yuqiang, Chen Qi, et al. Evaluation of Huangshan Maofeng tea grades based on feature variable selection using Elastic Net[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(13): 264-271. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2020.13.031 http://www.tcsae.org
Received date: 2020-03-19
Revised date: 2020-05-31
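Foundation items: National Key Research and Development Program of China (2017YFF0211301); Collaborative Innovation Project of Universities in Anhui Province (GXXT-2019-012)
Biography: Pan Tianhong, PhD, professor and doctoral supervisor, mainly engaged in research on detection technology and automation devices, and on agricultural electrification and automation. Email: thpan@live.com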
doi: 10.11975/j.issn.1002-6819.2020.13.031
CLC number: TP391.41
Document code: A
Article ID: 1002-6819(2020)-13-0264-08