Value of intravoxel incoherent motion in detecting and staging liver fibrosis: A meta-analysis

2020-07-10 07:10

World Journal of Gastroenterology, 2020, Issue 23

Zheng Ye, Yi Wei, Jie Chen, Shan Yao, Bin Song

Abstract

Key words: Liver fibrosis; Liver cirrhosis; Intravoxel incoherent motion; Diffusion-weighted imaging; Diffusion magnetic resonance imaging; Meta-analysis

INTRODUCTION

Liver fibrosis (LF) is characterized by the excessive accumulation of extracellular matrix (primarily collagen type I)[1]. It is a common pathological feature of chronic liver disease caused by various etiologies, which may progress to hepatic dysfunction, portal hypertension, and even hepatocellular carcinoma, resulting in increased morbidity and mortality[2]. Early or intermediate LF is considered to be reversible with timely medical intervention and anti-fibrotic treatments[3]. Hence, early detection and accurate staging of LF are of great clinical significance in making appropriate therapeutic decisions and evaluating patient prognosis.

Liver biopsy is the current reference standard in detecting and staging LF. According to histologic scoring systems, the spectrum of fibrosis severity can be divided into several stages; for example, the METAVIR system scores fibrosis semi-quantitatively as F0 (no fibrosis), F1 (portal fibrosis without septa), F2 (periportal fibrosis with few septa), F3 (septal fibrosis) and F4 (cirrhosis)[4]. However, liver biopsy is invasive, observer-dependent, and prone to sampling variability[5], all of which hamper its widespread use in clinical practice; thus, a noninvasive method to quantify LF is urgently needed. Recently, magnetic resonance imaging (MRI) techniques have been increasingly applied to LF detection and staging and could possibly be a noninvasive alternative to liver biopsy[6].

Diffusion-weighted imaging (DWI) can capture information on Brownian motion (the random motion of water molecules) and quantitatively reflect the degree of extracellular matrix accumulation via the apparent diffusion coefficient (ADC), which has been previously reported as a good diagnostic tool in LF[7-9]. However, the diffusion process can be mimicked and confounded by blood flow in capillaries (the perfusion process), thereby affecting diffusion MRI measurements[10]. Intravoxel incoherent motion (IVIM), a bi-exponential model based on DWI, allows for the separate evaluation of true molecular diffusion and perfusion-related diffusion, which is more informative than DWI[10,11]. Although several recent studies focused on the diagnostic performance of IVIM in LF staging, there were discrepancies in the reported results among studies[12-15]. In 2016, Zhang et al[16] conducted a meta-analysis on this topic; however, due to the limited number of included studies, they only pooled weighted mean differences to compare IVIM parameters among LF stages, and did not derive the pooled diagnostic indexes needed to comprehensively evaluate the value of IVIM in detecting and staging LF.

Therefore, with more eligible studies and patients included, the purpose of this meta-analysis was to investigate the diagnostic performance of IVIM in different LF stages with histology as reference.

MATERIALS AND METHODS

Literature search

Two independent investigators conducted a comprehensive literature search of the Cochrane Library, Ovid MEDLINE, Web of Science, EMBASE and Google Scholar databases to identify relevant publications (literature retrieval until December 2019). The following keywords and search strategy were used: “IVIM OR intravoxel incoherent motion OR biexponential DWI OR diffusion magnetic resonance imaging” AND “liver/hepatic fibrosis OR liver/hepatic cirrhosis.” The search was limited to articles in the English language.

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) IVIM was performed for LF detection and staging; (2) Hepatic histological analysis was used as the reference standard for all LF patients; and (3) Sufficient data were provided to calculate the values of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN). The studies were excluded if they were: (1) Reviews, letters, editorials, comments, case reports, or guidelines; (2) Duplicate publications; and (3) Ex vivo, phantom, or animal research.

Data extraction and quality assessment

The following information was extracted from each study: author, publication year, country, study design (prospective or retrospective), study population, patient baseline characteristics (sex ratio, mean age, disease spectrum), reference standard, histopathological characteristics, blinding procedure, detailed MRI protocol (scanner, field strength, trigger methods, b-values, scan time) and time interval between MRI examination and reference test. Meanwhile, the best diagnostic parameter and its diagnostic threshold as well as TP, FP, FN, and TN were recorded. For detecting and staging LF, we extracted diagnostic data and 2 × 2 contingency tables in four subgroups: LF ≥ F1 (F0 vs F1-F4, detecting LF from normal liver), LF ≥ F2 (F0-F1 vs F2-F4, differentiating moderate LF), LF ≥ F3 (F0-F2 vs F3-F4, differentiating severe LF) and LF = F4 (F0-F3 vs F4, detecting liver cirrhosis). The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) scale[17] was used to evaluate the quality of included studies. The other two investigators independently performed data extraction and quality assessment and reached consensus by discussion or by consulting a senior abdominal radiologist if opinions differed.
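The dichotomization described above can be illustrated with a minimal Python sketch. The function name `two_by_two` and the toy data are hypothetical and are not taken from any included study; in practice each LF group would use its own index-test cut-off, whereas this sketch reuses a single set of test calls only to show how the reference-standard split changes the 2 × 2 table.

```python
def two_by_two(stages, index_test_positive, threshold):
    """Build (TP, FP, FN, TN) with 'METAVIR stage >= threshold' as disease-positive.

    stages: METAVIR stages coded 0-4 (F0-F4) from histology.
    index_test_positive: whether the index test (e.g. an IVIM parameter
        crossing its cut-off) called each subject positive.
    """
    tp = fp = fn = tn = 0
    for stage, positive in zip(stages, index_test_positive):
        diseased = stage >= threshold
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical toy cohort (illustrative values only).
stages = [0, 1, 2, 3, 4, 4]
ivim_positive = [False, True, True, False, True, True]

for label, threshold in [("LF >= F1", 1), ("LF >= F2", 2),
                         ("LF >= F3", 3), ("LF = F4", 4)]:
    print(label, two_by_two(stages, ivim_positive, threshold))
```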

Statistical analysis

The pooled sensitivities, specificities, positive likelihood ratios, negative likelihood ratios and diagnostic odds ratios with corresponding 95% confidence intervals (CIs) were calculated by using a random-effects coefficient binary regression model[18]. Summary receiver operating characteristic (SROC) curves were constructed in each LF group, and the areas under the curves (AUCs) were also calculated[19]. Heterogeneity among included studies was evaluated by using the Q statistic of the χ² test and the inconsistency index (I²), with I² = 25%-50% indicating low heterogeneity, I² = 51%-75% indicating moderate heterogeneity and I² > 75% indicating substantial heterogeneity[20]. To explore the potential sources of heterogeneity, the threshold effect was first examined by computing the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 - specificity); a significant strong positive correlation (P < 0.05) would suggest the presence of a threshold effect[21]. Meta-regression or subgroup analysis (depending on the number of included studies) was performed to find possible sources of heterogeneity other than the threshold effect[22]. Sensitivity analyses were also conducted to evaluate the stability and reliability of the summary results. To evaluate potential publication bias of the included studies, Deeks’ funnel plot asymmetry test was conducted, and a P value higher than 0.05 in the linear regression test indicated that there was no publication bias[23]. All statistical analyses were performed using Meta-Disc (version 1.4), Stata (version 12.0) and Review Manager (version 5.3).
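For orientation, the per-study accuracy measures and the inconsistency index named above follow standard definitions, sketched below in Python. This is not the random-effects pooling performed in Meta-Disc or Stata; the function names are hypothetical, the inputs are illustrative, and the sketch assumes no zero cells in the 2 × 2 table (a continuity correction would otherwise be needed).

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Single-study accuracy measures from a 2 x 2 table (no zero cells assumed)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PLR": plr, "NLR": nlr, "DOR": dor}


def i_squared(q, n_studies):
    """Inconsistency index in percent: I^2 = max(0, (Q - df) / Q) * 100, df = k - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Illustrative numbers only, not data from the included studies.
print(diagnostic_indices(tp=80, fp=20, fn=20, tn=80))
print(i_squared(q=20.0, n_studies=5))
```

The diagnostic odds ratio computed this way equals (TP × TN) / (FP × FN), which is a convenient cross-check on extracted tables.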

RESULTS

Literature search

A total of 890 studies were initially identified in the databases. After removing the duplicates, the remaining 655 studies were assessed by title, abstract and full paper. Finally, 12 studies with 923 subjects were included in this meta-analysis. The flowchart of study inclusion and exclusion is shown in Figure 1.

Study characteristics and quality assessment

The baseline, methodological, and imaging protocol characteristics of the included studies are shown in Table 1 and Table 2. Of these 12 studies, there were 5 studies (n = 465) for LF ≥ F1[24-28], 9 studies (n = 757) for LF ≥ F2[25-27,29-34], 4 studies (n = 413) for LF ≥ F3[25-27,35] and 6 studies (n = 562) for LF = F4[25-27,29,31,33]. The best IVIM index and diagnostic threshold, as well as the reported TP, FP, FN, TN, sensitivity and specificity in the four LF groups, are shown in Table 3. The quality of included studies was good according to the QUADAS-2 scale (Figure 2).

Pooled diagnostic performance

The summarized diagnostic estimates are shown in Table 4. Pooled sensitivities and pooled specificities were estimated to be 0.78 (0.73-0.82) and 0.81 (0.74-0.86) for LF ≥ F1, 0.82 (0.79-0.86) and 0.80 (0.75-0.84) for LF ≥ F2, 0.85 (0.79-0.90) and 0.83 (0.77-0.87) for LF ≥ F3, and 0.90 (0.84-0.94) and 0.75 (0.70-0.79) for LF = F4, respectively. According to SROC analysis, the AUCs were 0.862 (0.811-0.914), 0.883 (0.856-0.909), 0.886 (0.865-0.907) and 0.899 (0.866-0.932) for LF ≥ F1, F2, F3 and F4, respectively. SROC curves of the four LF groups are shown in Figure 3. Forest plots of sensitivity and specificity are shown in Supplementary materials part 1.

Assessment of heterogeneity

There was moderate to substantial heterogeneity in our meta-analysis, with I² ranging from 0% to 77.9% in pooled sensitivity and pooled specificity (Supplementary materials part 1). A threshold effect was ruled out by visual assessment of the ROC plane, which showed no evidence of a “shoulder-arm” shape, and by the Spearman correlation coefficients, which were 0.10 (P = 0.87), 0.47 (P = 0.21), -0.20 (P = 0.80) and 0.66 (P = 0.16) for LF ≥ F1, F2, F3 and F4, respectively. According to the Cochrane handbook, meta-regression is generally not considered when there are fewer than ten studies, so we conducted subgroup analysis to explore the potential contributors to heterogeneity in the LF ≥ F2 group. The eligible studies for LF ≥ F1, F3 and F4 were too limited to perform meta-regression or subgroup analysis, and thus sensitivity analyses were conducted to test the robustness of results, which suggested our results were reliable (Supplementary materials part 2).
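The threshold-effect check reported above correlates the logit of sensitivity with the logit of (1 - specificity) across studies. A minimal standard-library sketch follows; the helper names are hypothetical, the per-study values are invented for illustration, and the P value (which Meta-Disc reports) is not computed here.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1 - s) for s in specificities]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study estimates (not from the included studies).
sens = [0.60, 0.70, 0.80, 0.90]
spec = [0.90, 0.80, 0.70, 0.60]
print(threshold_effect_rho(sens, spec))
```

In this toy example sensitivity rises exactly as specificity falls, which is the pattern a threshold effect would produce; a strong positive rho is the signal the test looks for.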

Subgroup analysis

We performed subgroup analysis to evaluate the possible sources of heterogeneity in the LF ≥ F2 group in terms of study design, blinding, field strength, number of low b-values and IVIM trigger methods. The retrospective and double-blinded studies showed slightly higher AUCs than the prospective studies and the studies with unclear blinding. The AUCs of studies using 3.0 T, more low b-values and a non-respiratory-triggered (RT) IVIM protocol were higher than those of studies using 1.5 T, fewer low b-values and an RT protocol. The detailed results of subgroup analysis are shown in Table 4.

Publication bias

The P values in Deeks’ tests were 0.18 for LF ≥ F1, 0.28 for LF ≥ F2, 0.20 for LF ≥ F3, and 0.84 for LF = F4, respectively, which suggested the absence of notable publication bias (Supplementary materials part 3).

DISCUSSION

With the accumulation of extracellular matrix (especially collagen) in the fibrotic liver, molecular water diffusion is restricted, and changes in fibrosis severity are reflected in the diffusion parameters[36,37]. However, due to the relatively high hepatic blood volume fraction, perfusion-related diffusion, which is caused by the incoherent motion of blood in the pseudorandom capillary network, can contribute significantly to the diffusion measurements, thus affecting the accuracy of the traditional ADC in DWI[13]. Therefore, Le Bihan et al[10] proposed the IVIM theory to capture information on tissue diffusivity and microcapillary perfusion separately. In this meta-analysis, we included 12 eligible studies and summarized the results based on a systematic and extensive statistical analysis, providing pooled diagnostic estimates to simulate a large-sample study and trying to overcome the limitations that previous studies have mentioned. According to our results, IVIM showed good but not perfect diagnostic accuracy in detecting and staging LF, with AUCs ranging from 0.862 (0.811-0.914) to 0.899 (0.866-0.932).

Figure 1 Flowchart of study inclusion and exclusion.

There are three diagnostic parameters in the IVIM model:

$$S_b/S_0 = (1-f)\cdot\exp(-bD_t) + f\cdot\exp[-b(D_t + D^*)]$$

where $D_t$ is the true diffusion coefficient, which is free from perfusion effects; $D^*$ is the pseudo-diffusion coefficient, or perfusion-related diffusion; and $f$ stands for the fraction of the perfusion component[11]. In most studies, $D^*$ was reported to decrease significantly with the progression of LF and was considered the best diagnostic parameter in detecting and staging LF, probably because of the architectural disruption and underlying hemodynamic changes of arterial and portal blood flow in the fibrotic liver[29,38]. However, in this meta-analysis, there were one or two studies suggesting $D_t$ or $f$ as the best diagnostic index in each LF group[25,34,35], as shown in Table 3, which may be attributed to the different b-value distributions in those studies and the relatively large variability of $D^*$[39]. Although we have validated the reliability of our results by conducting sensitivity analyses in terms of different diagnostic parameters, further investigations are needed to explore the optimal IVIM parameter and its threshold in LF detection and staging.
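To make the roles of the three parameters concrete, the bi-exponential signal model can be evaluated directly, as in the short Python sketch below. The function name and the parameter values are illustrative assumptions (they are not estimates reported by the included studies); b is taken in s/mm² and the diffusion coefficients in mm²/s.

```python
import math


def ivim_signal_ratio(b, d_t, d_star, f):
    """S_b / S_0 = (1 - f) * exp(-b * D_t) + f * exp(-b * (D_t + D*))."""
    return (1 - f) * math.exp(-b * d_t) + f * math.exp(-b * (d_t + d_star))


# Illustrative (assumed) parameter values, not taken from the paper.
D_T = 1.0e-3      # true diffusion coefficient, mm^2/s
D_STAR = 50.0e-3  # pseudo-diffusion coefficient, mm^2/s
F = 0.25          # perfusion fraction

for b in [0, 10, 20, 50, 100, 400, 800]:
    print(b, round(ivim_signal_ratio(b, D_T, D_STAR, F), 4))
```

Because the perfusion term decays with rate $D_t + D^*$, it has largely vanished by b of roughly 100 s/mm² under these assumed values, which is why the sampling of low b-values matters for estimating $D^*$ and $f$.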

LF ≥ F2 is considered clinically significant fibrosis and is a crucial time point for anti-fibrotic treatment[3]. In this meta-analysis, substantial heterogeneity was detected in the LF ≥ F2 group; therefore, we performed subgroup analyses to explore the possible contributors. To our knowledge, there is no clear consensus on the number and distribution of b-values in the IVIM protocol so far. Theoretically, four b-values would be sufficient for fitting a biexponential model; however, including more b-values would provide added robustness to the fitting process, and low b-values are particularly important in fitting the pseudo-diffusion coefficient[40]. In subgroup analysis, our results revealed that including three or more low b-values (0 < b < 50 s/mm²) would obtain a slightly higher diagnostic performance in detecting F2 fibrosis (AUC: 0.877 vs 0.890), which was in accordance with Cohen et al[41], who recommended including at least two low b-values to ensure accuracy when conducting liver IVIM research. Previous studies have tried to determine the optimal number and distribution of b-values in different clinical scenarios; however, the conclusions varied among those studies[42,43], and investigators have to balance the quality of parameter estimation against the acquisition time.

Apart from b-values, the IVIM triggering method is another key factor in acquisition time. Typically, the scanning time of free-breathing (FB) IVIM is predetermined and often less than 5 min, while the time of RT IVIM is unpredictable, usually longer (5-10 min) and highly dependent on the subject’s respiratory condition[44]. It is known that the RT technique reduces motion-related blurring by tracking movement through the respiratory cycle and acquiring data only in the same phase; however, patients’ irregular breathing can decrease the time-efficiency of the acquisition or, in some cases, make the navigator tracking unusable[45,46]. In our study, results of subgroup analysis showed that the diagnostic performance of IVIM was lower in the five studies with the RT method, compared with the four studies with a non-RT (FB or unclear) method (AUC: 0.867 vs 0.919). Although still controversial, our findings together with most previous studies indicated that the RT method offers no advantage in fitting IVIM parameters and could be substituted by the FB method, which is usually more comfortable for patients[45-47]. In addition, Riexinger et al[48] recently found that IVIM parameters of the liver showed a significant dependency on the applied field strength; hence, we also conducted subgroup analysis in this regard. Generally speaking, 3.0 T is much more sensitive to magnetic-susceptibility-induced artifacts and eddy-current-related distortion[37]; however, our results indicated higher diagnostic performance of IVIM in 3.0 T scanners, with an AUC of 0.904, compared with 1.5 T scanners, with an AUC of 0.839. Cui et al[49] also reported similar findings and concluded that the improved signal-to-noise ratio at high field strength may be the underlying reason. Therefore, standardized and optimized IVIM protocols at different field strengths should be investigated in the future for better clinical practice.

Table 1 Baseline characteristics of the included studies and subjects

Table 2 Methodological and imaging protocol characteristics of included studies

Figure 2 Quality assessment of included studies according to Quality Assessment of Diagnostic Accuracy Studies-2. The results showed that the quality of the included studies was good.

Other sophisticated diffusion models were also considered feasible in detecting and staging LF, including diffusion kurtosis imaging (DKI)[50], diffusion tensor imaging (DTI)[51], the tri-exponential IVIM model[52] and the stretched exponential model[53]. However, except for the stretched exponential model, these diffusion models showed no added diagnostic value over conventional DWI or bi-exponential IVIM for LF detection and staging[50-52]. Recently, Seo et al[31] and Fu et al[25] both reported a higher diagnostic potential of the distributed diffusion coefficient (DDC) in the stretched exponential model, compared with DWI and IVIM, for staging LF greater than F2. These results may be credited to the ability of DDC to capture a continuous distribution of diffusion coefficients from every diffusion compartment (determined by the “no tissue compartmentalization” assumption)[54,55]. Besides different diffusion techniques, magnetic resonance elastography (MRE) has also been utilized in many studies for LF staging[8,25,56]. Although MRE demonstrated excellent diagnostic ability, even greater than DWI or IVIM, it is currently not widely available around the world since it requires special equipment as well as technical expertise for data acquisition and image postprocessing. In contrast, IVIM is an easy-to-perform and relatively informative technique, which is more widely used in current clinical work.

We acknowledge some limitations in this study. First, although we used the QUADAS-2 scale to ensure the high quality of included studies, there were still some studies with a retrospective design and an unclear blinding method in interpreting IVIM or pathological results, which may introduce bias and non-objective interpretation of results. Second, substantial heterogeneity was detected in the pooled estimates of LF ≥ F2; therefore, we performed subgroup analysis in terms of study design, IVIM protocol, etc. to explore the potential contributors and used a random-effects model to summarize our data. However, due to the limited number of eligible studies (fewer than 10), we did not perform meta-regression to test heterogeneity sources statistically. Third, the number of included studies in LF ≥ F1, F3 and F4 was too limited to be further assessed, but the reliability of our results has been confirmed by sensitivity analyses, and we believe they should be valuable in clinical practice. In the future, more studies are needed to update this meta-analysis for a more comprehensive evaluation.

Table 3 Diagnostic raw data of intravoxel incoherent motion in each liver fibrosis group

In conclusion, with a larger sample size and comprehensive statistical analysis, our meta-analysis showed that IVIM is a good diagnostic tool in detecting and staging LF and may serve as a noninvasive substitute for liver biopsy. Moreover, establishing an optimized and standardized IVIM protocol for LF detection and staging would be one of the future directions for its widespread application in patient care.

Table 4 Summary diagnostic performance and subgroup analysis

Figure 3 Summary receiver operating characteristic curves of intravoxel incoherent motion in detecting and staging liver fibrosis. A and B: The areas under the curves are 0.862 (0.811-0.914) for liver fibrosis (LF) ≥ F1 (A) and 0.883 (0.856-0.909) for LF ≥ F2 (B); C and D: 0.886 (0.865-0.907) for LF ≥ F3 (C) and 0.899 (0.866-0.932) for LF = F4 (D), respectively. SROC: Summary receiver operating characteristic.

ARTICLE HIGHLIGHTS

Research results

Twelve studies with 923 subjects were included in this meta-analysis, with 5 studies (n = 465) for LF ≥ F1, 9 studies (n = 757) for LF ≥ F2, 4 studies (n = 413) for LF ≥ F3 and 6 studies (n = 562) for LF = F4. The pooled sensitivity and specificity were estimated to be 0.78 (95% confidence interval: 0.73-0.82) and 0.81 (0.74-0.86) for LF ≥ F1 detection with IVIM; 0.82 (0.79-0.86) and 0.80 (0.75-0.84) for staging F2 fibrosis; 0.85 (0.79-0.90) and 0.83 (0.77-0.87) for staging F3 fibrosis; and 0.90 (0.84-0.94) and 0.75 (0.70-0.79) for detecting F4 cirrhosis, respectively. The AUCs for LF ≥ F1, F2, F3 and F4 detection were 0.862 (0.811-0.914), 0.883 (0.856-0.909), 0.886 (0.865-0.907) and 0.899 (0.866-0.932), respectively. Moderate to substantial heterogeneity was observed, with the inconsistency index (I²) ranging from 0% to 77.9%. No publication bias was detected.

Research conclusions

IVIM is a noninvasive tool with good diagnostic performance in detecting and staging LF. Optimized and standardized IVIM protocols are needed to further improve its diagnostic accuracy in clinical practice.

Research perspectives

The results showed that IVIM is a valuable tool in noninvasively detecting and staging LF. However, field strength, the number and distribution of b-values, and the triggering method may affect the diagnostic accuracy. There is still a need to establish an optimized and standardized IVIM protocol for LF diagnosis in clinical practice.