Spectrum Quantitative Analysis Based on Bootstrap-SVM Model with Small Sample Set

2016-07-12 12:43, MA Xiao, ZHAO Zhong, XIONG Shanhai
Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2016, Issue 5

MA Xiao, ZHAO Zhong*, XIONG Shan-hai

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

A new spectrum quantitative analysis method based on a Bootstrap-SVM model for small sample sets is proposed in this paper. To build the spectrum quantitative analysis model for the bitumen penetration index, a total of 29 bitumen samples were collected from 6 companies. Based on these 29 samples, a spectrum quantitative analysis model for predicting the bitumen penetration index was built with the proposed method. To verify the feasibility and effectiveness of the proposed method, comparative experiments on predicting the bitumen penetration index were carried out with the proposed method, partial least squares (PLS) and support vector machine (SVM). The comparative results show that the proposed Bootstrap-SVM model achieves the lowest prediction root mean squared error (RMSE) on the small sample set. The proposed method provides a new way to build spectrum quantitative analysis models from small sample sets.

Keywords: Spectrum quantitative analysis; Small sample set; Bootstrap; Support vector machines; Partial least squares

Introduction

Spectrum quantitative analysis is an important research area in spectroscopy. Building a stable and accurate prediction model is the prerequisite for the spectrum quantitative analysis of unknown samples. A wide variety of modeling methods have been applied successfully in spectrum quantitative analysis, such as multiple linear regression (MLR)[1], principal component regression (PCR)[2], partial least squares (PLS)[3], artificial neural networks (ANN)[4] and support vector machine (SVM)[4]. MLR, PCR and PLS are usually applied to build linear prediction models, while ANN and SVM can be applied to build nonlinear prediction models. In real applications, it is often difficult to obtain complete information from samples because of the limitations of the sample sources. Spectrum quantitative analysis based on large sample sets has been well studied[1-4], whereas much less effort has been devoted to small sample sets. With a small sample set, it is usually difficult to build stable and accurate prediction models with traditional methods. Hence, it is important to study modeling methods for spectrum quantitative analysis with small sample sets.

In this paper, how to build a spectrum quantitative analysis model of the bitumen penetration index with a small sample set is studied. Bitumen is widely used as a pavement binding material in road engineering. The bitumen penetration index is one of the important indicators that reflect the hardness, consistency and shear resistance of bitumen. Although the bitumen penetration index is a physical property, it is closely related to the content of the bitumen components. Saturates and aromatics have high penetration indexes, while the penetration indexes of resins and asphaltenes are very low. According to JTG F40-2004, issued by the Ministry of Transportation of the People's Republic of China, the bitumen penetration index is measured by the “Standard Test Methods of Bitumen and Bituminous Mixtures for Highway Engineering (JTJ 052-2000)”. This measurement is time-consuming, difficult to operate and involves toxic solvents. Therefore, it is necessary to work out a fast, clean and convenient method to measure the bitumen penetration index. Infrared spectroscopy is a nondestructive and rapid analysis method, which can be applied to measure the bitumen penetration index. In this paper, a new spectrum quantitative analysis method based on a Bootstrap-SVM model for small sample sets is proposed for building the bitumen penetration index prediction model. The paper is organized as follows: in Section 1, the sample processing with the Bootstrap algorithm and machine learning with SVM are presented. The detailed description of the experiment is presented in Section 2. Section 3 is devoted to comparative experiments and discussion. The paper is concluded in Section 4.

1 Algorithms and theory

1.1 Sample processing

In the sample processing, Bootstrap resampling was applied to expand the sample set. Bootstrap resampling was proposed by Efron[6]. It is essentially a non-parametric resampling method that needs no assumption about the sample distribution. The basic idea of Bootstrap resampling is to simulate the sample generation process by repeatedly resampling the data. Because of the limitations of the sample sources, the spectrum quantitative analysis model for predicting the bitumen penetration index has to be built from a small sample set. The steps of sample processing with Bootstrap resampling are as follows:

(1) Define the original sample set as $X=(X_1,X_2,\ldots,X_n)$. Randomly generate the integers $i_1,i_2,\ldots,i_n\in[1,n]$;
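Step (1) can be illustrated with a minimal Python sketch. It assumes the usual bootstrap continuation, in which the sample at each drawn index is copied into the new set; that continuation and the function names `bootstrap_indices` and `bootstrap_resample` are illustrative assumptions, not taken from the text.

```python
import random

def bootstrap_indices(n, size, rng):
    """Draw `size` integers uniformly from 1..n, with replacement."""
    return [rng.randint(1, n) for _ in range(size)]

def bootstrap_resample(samples, size, seed=0):
    """Return `size` samples drawn with replacement from `samples`."""
    rng = random.Random(seed)
    idx = bootstrap_indices(len(samples), size, rng)
    # Indices are 1-based as in step (1), so shift by one for Python lists.
    return [samples[i - 1] for i in idx]

# Hypothetical original set of n = 5 samples, expanded to 12 samples.
original = ["X1", "X2", "X3", "X4", "X5"]
expanded = bootstrap_resample(original, size=12)
print(expanded)
```

Because the draw is with replacement, the expanded set may be larger than the original one and some original samples may appear several times while others are left out.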

1.2 Noise injection

In order to simulate the sampling process and improve the stability of the spectrum quantitative analysis model, noise injection[7-8] was applied to the expanded samples after resampling. Noise can be injected in three ways: into the input values, into the output values, or into both. The noise injection can be described as

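$$Z_V = Z + V \tag{1}$$

$Z_V$ is the data matrix after the noise injection, $Z$ is the source data matrix and $V$ represents the noise matrix. So,

$$Z_V=[zv_{ij}]_{M\times p},\quad Z=[z_{ij}]_{M\times p},\quad V=[v_{ij}]_{M\times p}$$

then,

$$zv_{ij}=z_{ij}+v_{ij},\quad i=1,\ldots,M,\ j=1,\ldots,p \tag{2}$$

$M$ is the total number of samples and $p$ is the length of each data sample. $zv_{ij}$ denotes a data item after noise injection, $z_{ij}$ denotes the original data item and $v_{ij}$ denotes the noise added to $z_{ij}$. In this paper, a Gaussian white noise matrix with $V_i\sim N(0,\sigma^2)$ was chosen as the noise matrix. The noise intensity can be adjusted by $\sigma$.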
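The element-wise addition of Gaussian white noise can be sketched in Python as follows. The matrix values, the seed and the function name `inject_noise` are hypothetical and serve only to illustrate the operation.

```python
import random

def inject_noise(Z, sigma, seed=0):
    """Return Z_V = Z + V, where every entry of V is drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[z_ij + rng.gauss(0.0, sigma) for z_ij in row] for row in Z]

# Hypothetical data matrix with M = 3 samples and p = 4 points per sample.
Z = [[0.10, 0.20, 0.30, 0.40],
     [0.15, 0.25, 0.35, 0.45],
     [0.05, 0.10, 0.15, 0.20]]

Z_V = inject_noise(Z, sigma=0.001)

# Largest absolute change introduced by the noise.
max_shift = max(abs(zv - z) for rv, r in zip(Z_V, Z) for zv, z in zip(rv, r))
print(max_shift)
```

With a small `sigma` the noisy matrix stays close to the source matrix, which is the behaviour that the noise intensity is meant to control.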

1.3 Support Vector Machine

Support vector machine (SVM) was proposed by Vapnik[9] on the basis of statistical learning theory. SVM is a machine learning method based on structural risk minimization, which can deal with small-sample, nonlinear and high-dimensional learning problems. To obtain the best generalization ability, SVM trades off the precision of data approximation against the complexity of the approximating function, and the learning process is transformed into a convex quadratic programming problem. Therefore, the global optimum can be obtained, and the problem of local minima that affects traditional learning methods such as multilayer feedforward neural networks is avoided. In SVM, a nonlinear transformation maps the samples into a high-dimensional feature space, where a linear decision function is constructed to classify the original samples. Therefore, the complexity of the learning process does not depend on the dimension of the sample set. In this paper, SVM is applied to build the spectrum quantitative analysis model for predicting the bitumen penetration index.

2 Experiment

2.1 Sample information

A total of 29 bitumen samples were collected from different factories. According to the origin of the crude oil, the collected samples can be divided into two classes: South American heavy oil and Xinjiang thickened oil. The penetration indexes of the samples were measured according to the “Standard Test Methods of Bitumen and Bituminous Mixtures for Highway Engineering (JTJ 052-2000)”. The calibration set and validation set are shown in Table 1.

Table 1 Category and distribution of bitumen samples

2.2 Instrument and working conditions

The spectra of the bitumen samples were collected by attenuated total reflectance (ATR) infrared spectroscopy in the analytical instrumentation center of Beijing University of Chemical Technology. The instrument parameters were set as follows: the wavenumber range was 4000 to 650 cm$^{-1}$, the resolution was 4 cm$^{-1}$ and the number of scans was 32. The samples were heated to 70 ℃ for the measurement, and a small amount of each sample was evenly coated on the surface of the ATR crystal. Each sample was measured three times and the average spectrum was used as the infrared spectrum of the sample.

2.3 Data processing

The quantitative models of PLS, SVM and Bootstrap-SVM are compared in this paper. First-order differentiation, data smoothing and mean centering were applied as pre-processing for PLS. Data normalization was applied for SVM and Bootstrap-SVM.

3 Result and discussion

3.1 Spectrum analysis

The main components of the road bitumen samples studied in this work are hydrocarbons, hydroxyl compounds and oxygenated compounds. The penetration index is one of the physical properties of bitumen, but it is closely related to the chemical composition and content of the bitumen. The infrared spectrum reflects the basic molecular vibration and rotation information of a material. Therefore, a quantitative prediction model of the penetration index can be built with infrared spectrum analysis. The bitumen infrared absorption spectrum is shown in Figure 1.

Fig.1 ATR IR spectrum of Bitumen samples

3.2 The spectrum quantitative analysis model with PLS

PLS is currently widely applied to the quantitative analysis of infrared spectra. The PLS model in this paper was built with the pre-processed data. The first three principal components were selected by cross-validation and the input and output data mapping. The input and output principal components and the proportion of eigenvalues are shown in Figure 2 and Figure 3, respectively. The prediction results of PLS are shown in Table 2.

Fig.2 Eigenvalue vs. PC Number

Fig.3 Eigenvalue vs. PC Number

Table 2 Result of PLS

Sample    Measured    Prediction
1         65          67.021
2         68.8        65.729
3         69.8        63.918
4         62.1        64.198
5         66          67.744
6         70          73.296
7         66.9        66.658
8         71          68.874
9         65          67.094
10        65          67.842
RMSE                  2.889
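The RMSE reported in Table 2 can be checked directly from the ten validation rows. The sketch below assumes that, in each row, the first value after the sample number is the measured penetration index and the second is the PLS prediction; the function name `rmse` is illustrative.

```python
import math

# Rows of Table 2 as (measured, predicted) pairs for validation samples 1 to 10.
table2 = [
    (65.0, 67.021), (68.8, 65.729), (69.8, 63.918), (62.1, 64.198),
    (66.0, 67.744), (70.0, 73.296), (66.9, 66.658), (71.0, 68.874),
    (65.0, 67.094), (65.0, 67.842),
]

def rmse(pairs):
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((pred - meas) ** 2 for meas, pred in pairs) / len(pairs))

print(round(rmse(table2), 3))  # 2.889
```

The result agrees with the value listed in the last row of the table, which supports this reading of the columns.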

3.3 The spectrum quantitative analysis model with SVM

For convenience, the LIBSVM tools developed by Professor Chih-Jen Lin were applied to build the spectrum quantitative model with SVM. The parameter settings were as follows: the SVM type was ε-SVR, the kernel function was RBF, and the parameters were set as -p 1.5, -c 0.01. The prediction results are shown in Table 3.
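In LIBSVM, the `-p` option sets the ε of the ε-SVR loss function and `-c` sets the cost parameter C. The two building blocks named above, the RBF kernel and the ε-insensitive loss, can be sketched in plain Python as follows. This is only an illustration of the definitions and not the LIBSVM implementation; the `gamma` value is a hypothetical choice, since the kernel width is not stated in the text.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Errors within +/- epsilon cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# With epsilon = 1.5 (the -p value above), an error of 1 is ignored
# and an error of 3 is penalized by 3 - 1.5.
print(eps_insensitive_loss(65.0, 66.0, epsilon=1.5))  # 0.0
print(eps_insensitive_loss(65.0, 68.0, epsilon=1.5))  # 1.5

# Identical inputs give the maximum kernel value.
print(rbf_kernel([0.1, 0.2], [0.1, 0.2], gamma=0.5))  # 1.0
```

A larger ε therefore tolerates larger deviations from the measured penetration index before they contribute to the training objective.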

Table 3 Result of SVM

3.4 The spectrum quantitative analysis model with Bootstrap-SVM

Firstly, the original sample set was expanded by the resampling method described in Section 1.1. The calibration set of 19 samples was expanded to 200 samples. Then, noise was injected into the 200 samples as described in Section 1.2. The noise intensity has to be adjusted because the noise level has a great influence on the accuracy of the analysis model. If the noise intensity is too small, the samples after noise injection are nearly identical to the original samples. If the noise intensity is too large, abnormal samples are generated. Man-made factors, instrument factors, temperature and other factors may result in subtle differences in the measured spectrum, and it is found that such subtle differences can cause large prediction errors. The noise intensity was therefore determined by several tests. In this paper, the noise intensity was taken as $\sigma_x=0.001$, $\sigma_y=0.1$. The SVM model was built with the LIBSVM tool. The parameters were chosen as -p 2.0, -c 0.03. The prediction results for the 10 validation samples are shown in Table 4.
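The expansion of the calibration set from 19 to 200 samples, with separate noise levels for the spectra and for the penetration index, might look like the following sketch. The data are synthetic stand-ins, not the measured spectra, and details such as whether the noise was added before or after normalization are not stated in the text, so the function `expand_calibration_set` should be read as an assumption-laden illustration.

```python
import random

def expand_calibration_set(spectra, targets, size, sigma_x, sigma_y, seed=0):
    """Bootstrap-resample (x, y) pairs and add Gaussian noise to both."""
    rng = random.Random(seed)
    n = len(spectra)
    new_x, new_y = [], []
    for _ in range(size):
        i = rng.randrange(n)  # draw one original sample, with replacement
        new_x.append([v + rng.gauss(0.0, sigma_x) for v in spectra[i]])
        new_y.append(targets[i] + rng.gauss(0.0, sigma_y))
    return new_x, new_y

# Synthetic stand-in for the 19 calibration samples (8 points per spectrum).
data_rng = random.Random(42)
spectra = [[data_rng.random() for _ in range(8)] for _ in range(19)]
targets = [data_rng.uniform(60.0, 72.0) for _ in range(19)]

X_big, y_big = expand_calibration_set(
    spectra, targets, size=200, sigma_x=0.001, sigma_y=0.1
)
print(len(X_big), len(y_big))  # 200 200
```

The expanded pairs would then be passed to the SVM training step in place of the original 19 samples.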

Table 4 Result of Bootstrap-SVM

4 Conclusion

In this paper, a new spectrum quantitative analysis method based on a Bootstrap-SVM model for small sample sets is proposed. Based on the 29 collected bitumen samples, a spectrum quantitative analysis model for predicting the bitumen penetration index was built with the proposed method. Comparative experiments on predicting the bitumen penetration index were carried out with the proposed method, partial least squares (PLS) and support vector machine (SVM). The comparative results show that the proposed Bootstrap-SVM model achieves the lowest prediction root mean squared error (RMSE) on the small sample set. It is also found that nonlinear models such as SVM and Bootstrap-SVM predict the bitumen penetration index more precisely. Although SVM, which is based on statistical learning theory, can be applied to build a prediction model with a small sample set, the accuracy and generalization ability of the SVM model can be clearly improved by Bootstrap resampling and noise injection.

[1] BIAN Zhao-qi, ZHANG Xue-gong. Pattern Recognition. Beijing: Tsinghua University Publishing Company, 2000. 192.

[2] Luo Wentao, Liu Guili. Modern Scientific Instruments, 2013, 6(3): 94.

[3] Roggo Y, Roeseler C, Ulmschneider M. J. Pharm. Biomed. Anal., 2004, 36(4): 777.

[4] Fontalvo-Gomez M, Colucci J A, Velez N, Romanach R J. Applied Spectroscopy, 2013, 67(10): 1142.

[5] Mao R, Zhu H, Zhang L, Chen A. Proc. ISDA, 2006, (1): 17.

[6] Lanouette R, Thibault J, Valade J L. Comput. Chem. Eng., 1999, 23(9): 1167.

[7] Fortuna L, Graziani S, Xibilia M G. IEEE Transactions on Instrumentation and Measurement, 2009, 58(8): 2444.

[8] Efron B. The Annals of Statistics, 1979, 7(1): 1.

[9] Grandvalet Y, Boucheron S. Neural Comput., 1997, 9(5): 1093.

*Corresponding author

CLC number: O657.3

Document code: A

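Spectrum Quantitative Analysis under Small-Sample Conditions Based on Bootstrap-SVM

MA Xiao, ZHAO Zhong*, XIONG Shan-hai

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029

A new method for building spectrum quantitative analysis models under small-sample conditions, the Bootstrap-SVM model, is proposed. Taking road bitumen as the research object, a total of 29 bitumen samples were collected from 6 different organizations, and a quantitative analysis model of bitumen penetration was built with the proposed method. Bootstrap-SVM consists of three steps: Bootstrap resampling, noise injection and SVM. To demonstrate the advantage of the proposed method, it was compared with the commonly used PLS model and the SVM model. The results show that the prediction root mean squared errors of Bootstrap-SVM, PLS and SVM are 0.7735, 2.889 and 1.7844, respectively. The proposed method gives the best prediction accuracy and provides a new and effective method for spectrum quantitative analysis under small-sample conditions.

Small sample; Bootstrap; Support vector machine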

Foundation item: Fundamental Research Funds for the Central Universities (YS1404)

DOI: 10.3964/j.issn.1000-0593(2016)05-1571-05

Received: 2015-03-02; accepted: 2015-07-09

Biography: MA Xiao (1990-), master's degree candidate at Beijing University of Chemical Technology, e-mail: maxiao2014job@163.com. *Corresponding author, e-mail: zhaozhong@mail.buct.edu.cn
