LIU Jiaxin,GUI Zhiguo,ZHANG Quan,SHANG Yu
(Shanxi Provincial Key Laboratory for Biomedical Imaging and Big Data, North University of China,Taiyuan 030051,China)
Abstract:By applying the Huber regression algorithm to a relatively new technology of diffuse correlation spectroscopy (DCS),the blood flow index (BFI)from light electric field temporal autocorrelation data is extracted accurately via the Nth-order linear (NL)algorithm.The NL algorithm can extract BFI from tissues with irregular geometric shapes,and its accuracy depends on iterative linear regression.The combination of Huber regression with the NL algorithm is proposed in this paper for the first time.The Huber regression is compared with traditional ordinary least square (OLS)regression through computer simulations for evaluation.The results show that the Huber regression is more accurate in extracting BFI than OLS.Compared to the OLS with an error rate of 4.58%,Huber achieves a much smaller error rate (3.54%),indicating its potential in future clinical applications.
Key words:diffuse correlation spectroscopy (DCS);blood flow;Huber regression;Nth-order linear (NL)algorithm
The diagnosis and treatment of many diseases in life are closely related to microvascular blood flow[1].For example,insufficient cerebral blood perfusion can lead to ischemic stroke[2-3];tumor hemodynamics manifests as abnormal increase in tissue blood flow and oxygen metabolism,as well as decrease in blood oxygen[4-5];apnea during sleep can lead to changes in cerebral blood flow and hypoxemia[6].Abnormal changes in the blood flow of specific tissue indicate the induction of certain disease,so it is crucial to measure the microvascular blood flow.
Although there are various methods for clinical measurement of blood oxygen,some disadvantages appear such as low time resolution,non-dynamic measurement,invasiveness,poor portability and high cost[7].In order to resolve the above problems,near-infrared diffuse optical spectroscopy (NIRS)is widely used to measure blood oxygen in deep tissues,providing a nearly perfect alternative method for monitoring tissue blood oxygen[8].The wavelength range of near-infrared light (NIR)is called as the “NIR window”,in which the tissue has relatively low photon absorption.Therefore,NIR can measure blood flow in deep tissues.
As a dynamic NIRS technology,near-infrared diffuse correlation spectroscopy/tomography (DCS/DCT)has been used by many researchers recently to measure tissue blood flow and blood oxygen[3,9].Unlike the NIRS that uses light intensity information to estimate the static balance between oxygen supply and consumption,DCS/DCT calculates the temporal autocorrelation of light electric field,from which the blood flow index (BFI),representing the movements of red blood cells in microvascular tissues,can be extracted[10].
The traditional approach to obtain the BFI is to fit the experimental data of the DCS (i.e.,the light electric field temporal autocorrelation function)with the analytical solution of the diffusion correlation equation (a partial differential equation,PDE)[10].It is usually assumed that the tissue is homogeneous with regular geometric shapes (e.g.,slab).This assumption is inconsistent with the actual situation of the tissue,so this solution will produce large errors in clinical application,especially for the BFI measurement of tissue with small volume and large curvature.
In order to overcome the above-mentioned problems,our laboratory previously proposed an Nth-order linear (NL)algorithm,which combines the integral form of the light electric field temporal autocorrelation function with the Nth-order Taylor polynomial[11-12].Monte Carlo (MC)simulation is used to obtain information such as photon path length,so as to fully consider the non-homogeneous and irregular geometry of the tissue.Both phantom experiments and clinical experiments have verified the accuracy of the NL algorithm.The key step of the NL algorithm is iterative linear regression,from which the temporal autocorrelation data of light field are fully used.
In the actual measurement,the signal-to-noise ratio (SNR)of the light field temporal autocorrelation data exists,which substantially affects the accuracy in extracting BFI with the NL algorithm.How to suppress the noisy DCS signals is extremely important for the accurate measurement of BFI in many diseases.Although the ordinary least square (OLS)method has been widely used,it is highly susceptible to the noisy data[12-13].In recent years,Huber regression has been widely used due to its accurate performance in fitting tasks[14].For the first time,the combination of Huber regression and NL algorithm is explored based on DCS theory.For objective evaluation,a computer simulation experiment is designed to compare the accuracy of Huber regression and OLS,and the application potential of the proposed approach is verified.
In contrast to the traditional analytical solution,the NL algorithm does not seek for the solution of the correlation diffusion equation.Instead,it originates from the integral form of the light electric field temporal autocorrelation function[11]
(1)
By combining Eq.(1)with the Nth-order Taylor polynomial and carrying out a series of mathematical derivations,the following approximations are obtained in the case thatτis sufficient small
(2)
(3)
where
(4)
For the first-order approximation,the unknown variableαDBonly appears on the right side of Eq.(2).It can be rewritten as
(5)
where
(6)
(7)
For the Nth-order approximation,the unknownαDBappears on both sides of the Eq.(3),and this variable can be derived iteratively using the following equations
(8)
(9)
Here,the left sides of Eqs.(5)and (8)are defined as the modified autocorrelation decays (MADs).Determination of the slope (Sp)fromτand MADs by iterative linear regression is the key procedure to extract the BFI (i.e.αDB).
The mathematical derivation process of the NL algorithm for heterogeneous tissue is similar to that for homogeneous tissue described above.The difference is that multiple slopes for different source-detector measurements are required to extract the BFIs from different tissue types.
By simplifying Eqs.(5)and (8),the expression of the linear regression model is obtained as
y=wx+b,
(10)
whereyis the MAD;xis the delay timeτ;wis the slope value of the linear regression;bis the intercept.
The mean square error (MSE)defined in Eq.(11)is often used to represent the target function for the minimization problem,due to the advantages of differentiable features and easy programming.Sis the residual error between the predicted value and the true value,and its calculation is given by Eq.(12).The solution to MSE minimization problem is termed as the OLS method.In other words,the OLS method aims to find a straight line (i.e.,regression line)on which the square sum of Euclidean distances for all samples are minimal.
(11)
S=yi-wxi-b,
(12)
where (x1,y1),(x2,y2),…,(xi,yi),xi,yi∈Zare the autocorrelation data points in curve ofg1(τ).The OLS can be readily implemented by the regular derivation methods.
When using MSE as the minimal criterion,the regression line is heavily affected by the noisy data,because their square values of Euclidean distance are relatively large and the weight is more influential.The use of mean absolute error (MAE)as minimal criterion can effectively reduce the weight of the noisy data on determining the regression line.The MAE has the expression as
(13)
However,the MAE is not differentiable,and thus it takes much longer time to solve the MAE optimization problem (i.e.,L1normalization).
The Huber loss function integrates the advantages of MAE and MSE and overcomes their limits.Its definition is
(14)
where
(15)
θis the scale transformation coefficient;Sis the residual error between the predicted value and the true value,and its calculation is given by Eq.(12).As a threshold,the smallerδvalue leads to the solution closer to MSE,while the largerδvalue leads to the solution closer to MAE.Huber loss function is differentiable and can converge easily,meanwhile,it could reduce the influence of noise via the threshold parameterδ.
For computer simulations,a spherical tissue model with the diameter of 20 cm was built to mimic the adult human head,which is similar to those reported in previous studies[11-12].Photon MC simulation was carried out via an open-source code MCVM (MC modeling of photon migration in voxelized media).A total of 10 million photon packets were injected into the tissue model.The outcomes derived from MC simulations,including the photon path length and weight factors,were incorporated with optical properties and the assumed BFI value to generate light electric field temporal autocorrelation functions according to the discrete form of Eq.(1)[11]
(16)
Noise is inevitable in actual measurement.In order to mimic the realistic autocorrelation data,a proper level of noise was added to the autocorrelation curveg1(τ)at each delay timeτ,according to the photon statistics theory[15-16].
σ(τ)=
2〈n〉-1β(1+e-2Γτ)+〈n〉2β(1+e-2Γτ]1/2,
(17)
whereTis the bin time of correlator;Γis the decay rate;〈n〉=IT,Iis the detected photon counting rate;σ(τ)is the standard deviation of theg1(τ)noise.
Fig.1 Four S-D separations containing a source (S)fiber and four detector (D)fibers
In order to evaluate the accuracy of the two approaches (OLS and Huber regression)for BFI reconstruction,the parameter of mean absolute percentage error (MAPE)was used to quantify the error between the reconstructed BFI and the target BFI,which is defined as
(18)
All of the computing programs were performed on a desktop computer with a CPU frequency of 2.3 GHz and a memory of 8 G.The simulation outcomes derived from the OLS and Huber regression were compared.
Fig.2 shows the autocorrelation functionsg1(τ)with noise generated by the computer simulations at four S-D separations respectively.
Fig.2 Autocorrelation functions g1(τ)with noise
Theg1(τ)noise matches the real data noise determined by the photon count rate (1 600 kcps to 20 kcps)at four S-D separations (1.5 cm to 3.0 cm).It is clearly seen that a larger S-D separation leads to more noise in the autocorrelation functions.At the largest S-D separation (i.e.,3.0 cm),the linear regression processed by OLS and Huber are presented in Fig.3.The DCS data with the delay timeτin the range of 0.2 μs<τ<30 μs (79 data points)were used for all linear regressions.As shown in Fig.3,the regression line is substantially affected by noisy data points when OLS is adopted.This influence is significantly alleviated via Huber regression.
Fig.3 Fifth-order linear regression lines by OLS and Huber
When the value ofδis large,the MAPE value of Huber is close to that obtained by OLS.The hyperparameterδhas been optimized.As the value ofδdecreases,the performance of Huber regression is greatly improved.Eventually its MAPE value tends to stabilize and reach convergence.Fig.4 shows the time-course of BFI values (100 data points)extracted by the NL algorithm (N=5)wherein the OLS and Huber regression were utilized respectively.Obviously,the BFI values extracted by OLS have larger fluctuation around the target value (1.0×10-8cm2/s).By contrast,the BFI values extracted by Huber are much closer to the target value with smaller fluctuations.
Fig.4 Time-course of CBF (i.e.,αDB)reconstructed from OLS and Huber
The larger S-D separation leads to a larger MAPE value with either of the two methods.At all S-D separations (1.5 cm to 3.0 cm),Huber performs significantly better than OLS in terms of extracting BFI.The MAPE values of the two methods are in the range of 0.58% to 3.54% (Huber)and 0.74% to 4.58% (OLS),respectively.
In this paper,Huber regression was adopted to extract the blood flow from DCS signals via combining with NL algorithm.Compared with other machine learning approaches such as support vector regression (SVR)or recurrent neural network (RNN),the Huber regression does not require big data set and numerous optimization of hyperparameters.Moreover,the Huber function is differentiable and easy to be implemented in the online programming,which is important for the real-time brain monitoring.The accuracy of Huber regression for data fitting has also been well validated[5].The OLS method is very susceptible to noisy data,and the square exponential leads to the regression line inclining to the noise data points.To deal with this problem,MAE is used in Huber regression to reduce the weight of the noisy data.For the fair comparisons with the conventional OLS approach,the same data set derived from computer simulations were used for evaluating their performance in linear regression.In computer simulations,the BFI was set as a known value (i.e.,αDB=1.0×10-8cm2/s),and the DCS signals (i.e.,g1(τ)curves)were generated.The simulated DCS signals aims to quantify the accuracy of the specific approach through assessing the MAPE.By comparing the MAPE values,the accuracy of Huber regression was objectively evaluated.
In conclusion,Huber regression in data fitting has been proposed in this paper for DCS blood flow measurement.This approach was compared with the conventional approach (i.e.,OLS)through computer simulations.The results have demonstrated the advantages of the proposed approach (Huber regression)over the traditional one (OLS).This study promotes the application of an advanced data fitting approach to the clinical hemodynamic assessment.
Journal of Measurement Science and Instrumentation2021年2期