Interobserver agreement for contrast-enhanced ultrasound of liver imaging reporting and data system: A systematic review and meta-analysis

2020-04-08 06:07
World Journal of Clinical Cases, 2020, Issue 22

Jun Li, Ming Chen, Zi-Jing Wang, Chun-Li Cao, Tian Sang, Department of Medical Ultrasound, The First Affiliated Hospital of Medical College, Shihezi University, Shihezi 832008, Xinjiang Uygur Autonomous Region, China

Shu-Gang Li, Department of Child, Adolescent Health and Maternal Health, School of Public Health, Capital Medical University, Beijing 100069, Beijing, China

Meng Jiang, Department of Medical Ultrasound, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, Hubei Province, China

Long Shi, Department of Medical Ultrasound, The Second People's Hospital of Jiangmen, Jingmen 448000, Hubei Province, China

Xin-Wu Cui, Department of Medical Ultrasound, Sino-German Tongji-Caritas Research Center of Ultrasound in Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, Hubei Province, China

Christoph F Dietrich, Department of Internal Medicine, Hirslanden Clinic, Berne 27804, Switzerland

Abstract

BACKGROUND

Hepatocellular carcinoma is the most common primary liver malignancy. From the results of previous studies, Liver Imaging Reporting and Data System (LI-RADS) on contrast-enhanced ultrasound (CEUS) has shown satisfactory diagnostic value. However, a unified conclusion on the interobserver stability of this innovative ultrasound imaging has not been determined. The present meta-analysis examined the interobserver agreement of CEUS LI-RADS to provide some reference for subsequent related research.

AIM

To evaluate the interobserver agreement of LI-RADS on CEUS and analyze the sources of heterogeneity between studies.

METHODS

Relevant papers on the subject of interobserver agreement on CEUS LI-RADS published before March 1, 2020 in China and other countries were analyzed. The studies were filtered, and the diagnostic criteria were evaluated. The selected references were analyzed using the “meta” and “metafor” packages of R software version 3.6.2.

RESULTS

Eight studies were ultimately included in the present analysis. Meta-analysis results revealed that the summary Kappa value of included studies was 0.76 (95% confidence interval, 0.67-0.83), which shows substantial agreement. Higgins I² statistics also confirmed the substantial heterogeneity (I² = 91.30%, 95% confidence interval, 85.3%-94.9%, P < 0.01). Meta-regression identified the variables, including the method of patient enrollment, method of consistency testing, and patient race, which explained the substantial study heterogeneity.

CONCLUSION

CEUS LI-RADS demonstrated overall substantial interobserver agreement, but heterogeneous results between studies were also obvious. Further clinical investigations should consider a modified recommendation about the experimental design.

Key Words: Contrast-enhanced ultrasound; Liver imaging reporting and data system; Interobserver agreement; Systematic review; Diagnosis; Meta-analysis

INTRODUCTION

Hepatocellular carcinoma (HCC) is the most common primary liver malignancy, and it is the second-most frequent cause of cancer-related deaths[1,2]. HCC often occurs in patients with risk factors, such as chronic hepatitis and cirrhosis[3].

The prognosis of patients with advanced HCC is poor, and curative treatments in patients with early stage HCC are needed[4]. Unlike other systemic malignancies, HCC is diagnosed noninvasively based on imaging characteristics without mandatory pathological confirmation in at-risk patients[5]. Countries with high incidence rates of HCC traditionally screen high-risk patients using imaging examinations, and these patients are closely followed. The focal liver lesions (FLLs) found during HCC screening on various imaging examinations are definitively diagnosed using contrast-enhanced diagnostic imaging examinations, including contrast-enhanced ultrasound (CEUS), contrast-enhanced computed tomography (CECT), and contrast-enhanced magnetic resonance imaging (CEMRI)[6].

Among the abovementioned contrast-enhanced imaging examinations, CEUS allows noninvasive assessment of the contrast enhancement pattern of HCC without the use of ionizing radiation and with a much higher temporal resolution than computed tomography (CT) and magnetic resonance imaging (MRI)[7,8]. CEUS continues to gain traction as a technique that complements traditional B-mode and Doppler ultrasound in the evaluation of the liver and other organs[9]. CEUS shows changes in microvascular flow within the lesion using real-time imaging of tissue perfusion, which also yields supplementary information, including flow in the microvasculature, slow flow, and perfusion kinetics[10]. CEUS exhibits high accuracy in the differential diagnosis of FLLs in cirrhotic and non-cirrhotic livers[11,12]. CEUS exhibits the same sensitivity and specificity as CECT and CEMRI for the differential diagnosis of FLLs, but it is more economical and effective[13].

Because imaging is important for the diagnosis and treatment decisions in HCC, it is necessary to standardize the imaging diagnosis of HCC and improve its diagnostic accuracy[14,15]. Considering this background, a group of international experts convened by the American College of Radiology (ACR) proposed the Liver Imaging Reporting and Data System (LI-RADS) to standardize the interpretation and reporting of HCC in 2014[16]. The ACR released the CEUS LI-RADS in 2016, with revisions in 2017, and it has become a standardized system for the technique, interpretation, reporting, and data collection for CEUS exams in patients who are at risk for developing HCC[17]. CEUS LI-RADS integrates with the previously released CT/MRI LI-RADS, which provides the criteria for ordinal categories and definitions of the major and ancillary features for HCC[18]. CEUS LI-RADS allows radiologists to (1) use consistent terminology, (2) reduce variability and mistakes in imaging interpretation, (3) promote communication with referring clinicians, and (4) facilitate research and quality assurance[18]. Therefore, the standardized diagnosis helps promote standardization and reproducibility across institutions and radiologists[19].

However, a unified conclusion on the interobserver stability of this innovative ultrasound imaging has not been determined. Several previous studies examined the repeatability of LI-RADS on CEUS, and their results conflicted markedly with respect to interobserver agreement. Schellhaas et al[20] demonstrated that the Kappa value was just 0.39, and Tan et al[21] showed a Kappa value of 0.94 using LI-RADS on CEUS.

Based on the abovementioned studies, the present meta-analysis examined the interobserver agreement of CEUS LI-RADS to provide some reference for subsequent related research.

MATERIALS AND METHODS

Literature retrieval

The meta-analysis was performed on relevant literature published up to March 1, 2020 in the databases of PubMed, Web of Science, Embase, China Biology Medicine disc, Cochrane Library, Google Scholar, China National Knowledge Infrastructure, WANFANG databases, and ClinicalTrials.gov. No restriction on language was applied. Search keywords were HCC, CEUS, LI-RADS, and their synonyms: ‘liver neoplasm’ or ‘liver cancer’ or ‘liver malignancy’ or ‘hepatocellular carcinoma’ or ‘HCC’; ‘Liver Imaging Reporting and Data System’ or ‘LI-RADS’; and ‘contrast-enhanced ultrasound’ or ‘CEUS’.

Inclusion and exclusion criteria

Inclusion criteria: Studies that met all of the following criteria were included: (1) Study type: observational studies, either retrospective or prospective; (2) Population: patients at risk of HCC who needed regular observation, such as patients with cirrhosis and hepatitis B virus carriers; (3) Index test: CEUS; and (4) Outcomes: sufficiently detailed information to evaluate interobserver agreement for CEUS LI-RADS.

Exclusion criteria: Papers were excluded if they met any of the following conditions: (1) Editorials, comments, letters, case reports, and reviews; (2) Experimentation on animals; (3) Duplicate studies and documents whose research topic did not meet the requirements; (4) Studies that were not related to the field of interest of the present research; and (5) Laboratory studies.

Data extraction

Two reviewers independently screened the appropriate articles according to the inclusion and exclusion criteria detailed above. Discrepancies in opinion between the two reviewers were resolved via consultation with an additional researcher for reevaluation at a consensus meeting.

The following data were extracted from the included studies using predefined data formats: (1) Article characteristics, including authors, publication years, and study designs (prospective or retrospective); (2) Process characteristics, including enrollment method of patients (selective or consecutive), the number of patients, age and gender ratio, the number of FLLs, and ratio of benign to malignant; (3) Ultrasound (US) system; (4) LI-RADS version; and (5) Reference standard, including pathology and synthesized clinical reference standard (SCRS). The Kappa value for categorical variables was extracted for each major feature and LI-RADS categorization.

Literature quality evaluation

The risk of bias of the included papers was assessed using the QUADAS-2 domains[22]. The answers to the signaling questions of each of the five sections were either “yes”, “no”, or “unclear”, corresponding to a judgment of the risk of bias as “low”, “high”, or “uncertain”. If the answers to every question were “yes”, the study was at “low risk”; if the answers to all questions were “no” or “unclear”, the study was judged as “high risk”. If one of these answers was “no” or “unclear”, the study was placed under “uncertain”. RevMan 5.3, the dedicated software of the Cochrane Collaboration, was used to output the QUADAS-2 results.

Statistical analysis

To calculate the meta-analysis summary estimates, the Kappa value for categorization and its standard error were summarized. We estimated the standard error from the 95% confidence interval (CI) if it was not reported in the original study. The pooled Kappa value with 95%CI was calculated using the DerSimonian-Laird model with Knapp and Hartung adjustment[23]. According to Landis and Koch, the Kappa value was categorized as follows: < 0.20, poor; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect agreement[24]. Substantial heterogeneity existed if the I² statistic exceeded 50% and the P value of the Cochran Q-test did not exceed 0.10. A sensitivity analysis or subgroup analysis was performed when heterogeneity was noted, and the random effect model was selected for data synthesis. If the reasons for heterogeneity required further exploration, a meta-regression analysis was performed using covariates in the bivariate model. Statistical significance was denoted at P < 0.05. A funnel plot for diagnostic meta-analysis was used to assess the publication bias of the included articles, and significant asymmetry was denoted at P < 0.10 for the slope coefficient. We used the “meta” and “metafor” packages in R software version 3.6.2 (R Foundation, Vienna, Austria) for analysis and synthesis.
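The calculations described above can be illustrated with a minimal Python sketch. It is not the R code used for the analysis: the function names are hypothetical, the standard error is recovered under the assumption of a symmetric 95%CI, pooling is done on the raw Kappa scale, and the Knapp and Hartung adjustment is omitted for brevity. The category thresholds follow the list given in the text.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a 95% CI, assuming the CI is symmetric."""
    return (upper - lower) / (2.0 * z)

def landis_koch(kappa):
    """Agreement category, using the thresholds as listed in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    Returns the pooled estimate, its standard error, Cochran's Q,
    tau^2, and I^2 (in percent). No Knapp-Hartung adjustment is applied.
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se_pooled, "q": q, "tau2": tau2, "i2": i2}

# Illustrative call with made-up inputs (not the data of the included studies).
result = dersimonian_laird([0.2, 0.8], [0.1, 0.1])
print(result["pooled"], landis_koch(result["pooled"]))
```

With real data, each study would contribute its reported Kappa value and a standard error, either reported directly or recovered with `se_from_ci`; a transformation of Kappa before pooling may be preferable when the reported intervals are asymmetric.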

RESULTS

Results of the literature search

A detailed flow chart of the study selection process is shown in Figure 1. A total of 129 articles were initially identified using the search strategy, and 54 articles were filtered after excluding duplicates. Twenty-two of the remaining studies were removed, including 19 articles that were unrelated to the field of interest and three review articles. The full texts of the remaining 13 studies were obtained. After review of the full texts, five other articles were excluded. Eight studies were ultimately eligible for meta-analysis.

Characteristics of eligible studies

The meta-analysis included eight studies (five in English, three in Chinese) with a total of 1177 patients and 1379 FLLs[3,6,20,21,25-28]. The major characteristics and basic information of the included articles are detailed in Table 1. The publication dates were 2017 to 2020. The patients who conformed to the inclusion and exclusion criteria from seven studies were Easterners[3,6,21,25-28], and only one article included Westerners[20]. Only one article was a cohort study[26], and the others were retrospective studies[3,6,20,21,25,27,28]. The enrollment method of one article was selective cohort[21], and the other articles used consecutive cohorts[3,6,20,25-28]. The classification standard for CEUS in four articles was the LI-RADS 2016 version[20,25,27,28], and the other studies selected the LI-RADS 2017 version[3,6,21,26]. Two articles published in the same year were from the same first author, but the samples and methods selected were not the same[3,6]. To distinguish these studies in the present meta-analysis, the article published in the Journal of Ultrasound in Medicine is identified using the first author (Wang et al[6]), and the article published in Ultrasound in Medicine & Biology is identified using the corresponding author (Cui et al[3]). Three articles were dissertations[25-27]. Only one study used pathology as a reference standard[28], and the other seven studies used a combination of pathology and SCRS[3,6,20,21,25-27].

Table 1 Characteristics of included studies

Literature quality evaluation

Results of the quality assessment of the included articles are shown in Figure 2. The results indicated that the included studies were of relatively acceptable quality.

Pooled interobserver agreement of LI-RADS for CEUS

Figure 1 Study flow chart.

The 95%CI of Kappa values of each study and the combined Kappa value of all included studies were estimated using analysis software, and the results are indicated in Table 2 and Table 3. Tan et al[21] concluded that the interobserver agreement of LI-RADS was in near perfect agreement (Kappa value = 0.94; 95%CI, 0.89-0.97). The Higgins I² statistics indicated substantial heterogeneity in the summary Kappa value (I² = 91.30%, 95%CI, 85.3%-94.9%, P < 0.01). Therefore, the pooled calculation used a random effect model. The pooled Kappa value from the random effect model was 0.76 (95%CI, 0.67-0.83), which showed substantial agreement. The forest plot of summary Kappa values is shown in Figure 3.

Publication bias

The P value of the linear regression test of funnel plot asymmetry was 0.39, and we determined that no significant publication bias existed. The Egger’s funnel plot is shown in Figure 4.
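The regression test for funnel plot asymmetry can be sketched as follows. This minimal Python example uses the classical Egger formulation, in which the standardized estimate (estimate divided by its standard error) is regressed on precision (the reciprocal of the standard error) and the intercept measures asymmetry; whether this matches the exact variant run in R is an assumption. The function name and inputs are hypothetical, and the P value, which would come from a t distribution with k - 2 degrees of freedom, is not computed here.

```python
import math

def egger_regression(estimates, std_errors):
    """Ordinary least squares fit of estimate/SE on 1/SE (classical Egger test).

    Returns (intercept, slope, t_statistic_of_intercept, degrees_of_freedom).
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("at least three studies are required")
    x = [1.0 / se for se in std_errors]                        # precision
    y = [est / se for est, se in zip(estimates, std_errors)]   # standardized estimate
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat, k - 2
```

An intercept close to zero, relative to its standard error, is consistent with a symmetric funnel plot.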

Sensitivity analysis

One study at a time was omitted from the analysis, and the results of the sensitivity analysis are shown in Table 4, which showed that no single study unduly influenced the pooled result. Exclusion of the included articles one by one revealed that the Higgins I² also did not change significantly. The pooled result was relatively stable.
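The leave-one-out procedure can be sketched in a few lines of Python. The function names and inputs are hypothetical, and the default pooling function is a simple fixed-effect inverse-variance mean used only as a stand-in; the analysis itself used a random effect model, which could be passed in through the `pool` argument.

```python
def inverse_variance_pool(estimates, std_errors):
    """Stand-in pooling function: fixed-effect inverse-variance weighted mean."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

def leave_one_out(labels, estimates, std_errors, pool=inverse_variance_pool):
    """Re-pool the data once per study, each time omitting that study.

    Inputs are lists of equal length; returns {omitted_label: pooled_estimate}.
    """
    results = {}
    for i, label in enumerate(labels):
        kept_estimates = estimates[:i] + estimates[i + 1:]
        kept_errors = std_errors[:i] + std_errors[i + 1:]
        results[label] = pool(kept_estimates, kept_errors)
    return results
```

If every entry of the returned mapping stays close to the overall pooled value, no single study dominates the result.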

Meta-regression analysis

Because of the strong heterogeneity of the pooled studies, a meta-regression was used. The meta-regression analysis examined several clinically relevant variables, including study design (retrospective or prospective), method of patient enrollment (consecutive or selective), LI-RADS version (2016 or 2017), number of observers (two or three), number of US systems used (one or more), race of chosen patients (Easterners or Westerners), number of diagnosed FLLs (less than 100 or more than 100), and reference standard (only pathology or combination of pathology and SCRS). The results of the regression analysis are shown in Table 5. The variables of method of patient enrollment (P < 0.01), number of observers (P < 0.01), and race of the chosen patients (P < 0.01) were statistically significant for the Kappa value. The Kappa value was 0.94 (95%CI, 0.89-0.97) in the study that used selective patient enrollment[21], and the pooled Kappa value was 0.73 (95%CI, 0.62-0.80) in the articles that used consecutive patient enrollment[3,6,20,25-28]. Estimations of the agreement of three observers indicated fair agreement (Kappa value = 0.39; 95%CI, 0.14-0.59)[20], and the consistency of two observers showed substantial agreement[3,6,21,25-28]. The race of chosen patients also contributed to the strong heterogeneity. The covariates of study design (P = 0.26), LI-RADS version (P = 0.66), number of US systems used (P = 0.93), number of FLLs (P = 0.16), and reference standard (P = 0.58) did not reach statistical significance in the heterogeneity test.

Table 2 Analysis of interobserver agreement for included studies

Table 3 Interobserver agreement of pooled included studies

Table 4 Sensitivity analysis eliminating studies one by one

Table 5 Results of meta-regression analysis of interobserver agreement of Liver Imaging Reporting and Data System in contrast-enhanced ultrasound

DISCUSSION

A meta-analysis of the interobserver agreement of LI-RADS on CEUS has not been reported previously. The summary Kappa value for the eight included studies was 0.76 (95%CI, 0.67-0.83) in our study, which showed substantial inter-reader agreement for the use of LI-RADS on CEUS.

CEMRI is another common noninvasive imaging method to assess benign and malignant FLLs. Notably, the LI-RADS on MRI to evaluate FLLs was developed in 2011 and recently updated in 2018[29]. The meta-analysis of Kang et al[30] revealed that the summary interobserver agreement of LI-RADS on MRI was 0.70 (95%CI, 0.56-0.85). Another multicenter international study, which used a large number of readers and a mixture of all LI-RADS category assignments, obtained a similar result (Kappa value = 0.73; 95%CI, 0.68-0.77)[19]. The interobserver agreement of LI-RADS on CEUS seems better than that of LI-RADS on MRI. Notably, CEUS avoids the disadvantages of MRI, such as high expense and a long inspection time. Some researchers demonstrated that the sensitivity of CEUS in the observation of arterial hypervascularity from nodules in liver cirrhosis was significantly higher than that of MRI[31-33]. Two recent meta-analyses showed that CEUS had excellent diagnostic accuracy in differentiating malignant from benign FLLs, with a summary sensitivity of 0.92 and summary specificity of 0.87, and that the sensitivity of CEMRI was slightly lower than that of CEUS, with a pooled sensitivity of 0.86 and pooled specificity of 0.89[34,35].

The majority of HCCs are not suitable for curative resection at the time of treatment, and difficulties of surgical resection may be related to size, site, and number of tumors, vascular and extrahepatic involvement as well as liver function of the patient[36]. Radiofrequency ablation (RFA) is another effective treatment for liver cancer, and it has emerged in clinical practice to expand the pool of patients considered for liver-directed therapies[37]. Traditionally, RFA is performed under B-mode US guidance. In recent years, some scholars have reported RFA guided by CEUS for HCC. Miyamoto et al[38] showed that the complete ablation rate after a single treatment session was significantly higher in the CEUS group than in the B-mode US group. Moreover, Masuzaki et al[39] reported in a large-scale study that the detectability of tumor nodules was 83.5% in B-mode US and 93.2% in CEUS (P = 0.04). Therefore, the use of CEUS guidance in RFA for liver cancer is an efficient approach.

The eight studies included in the present meta-analysis also exhibited some problems. For example, most studies did not list the interobserver agreement for the major features of CEUS in detail. Notably, the CEUS LI-RADS criteria require the combination of two major features, arterial phase hyperenhancement (APHE) and washout, to distinguish benign and malignant FLLs[17]. Unfortunately, just two articles mentioned the Kappa value of APHE and washout[3,20]. Therefore, it is recommended that further research on the interobserver agreement of CEUS LI-RADS add an extra consistency test of the major features of CEUS to increase the persuasiveness of the research.

The present meta-analysis used meta-regression to explore the high heterogeneity and analyzed the available covariates as potential causes. The method of patient enrollment had a significant impact on the Kappa value. Tan et al[21] used selective screening as an inclusion standard rather than consecutive screening, which was different from the other seven articles, and the Kappa value of their study was clearly higher than the summary value of the other articles (0.94 vs 0.73, P < 0.01). However, selective screening of patients resulted in a small sample size of FLLs, which may introduce potential confounders and bias.

Figure 3 Forest plot of pooled interobserver agreement for Liver Imaging Reporting and Data System on contrast-enhanced ultrasound. 1 This is our study. CI: Confidence interval.

Figure 4 Publication bias of the included studies.

Consistency tests of the eight included studies revealed that the number of reviewers also affected heterogeneity. Schellhaas et al[20] used three reviewers for interobserver agreement and calculated the Kappa value using pairwise comparisons, and this Kappa value was lower than in the other articles (0.39 vs 0.79, P < 0.01). The low Kappa value may be because the calculation of Kappa relies on the assumption that a significant proportion of agreement is due to chance, and if a feature is observed very frequently, then a low Kappa value between the observers results[40]. The authors indicated that the reason for the low Kappa value was that the calculation of Cohen’s Kappa was influenced by the frequency of a certain feature being observed[20]. Therefore, the use of the intraclass correlation coefficient rather than Cohen’s Kappa to represent interobserver agreement among multiple observers would be more satisfactory.
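The dependence of Cohen’s Kappa on how frequently a feature is observed can be shown with a small Python sketch. The function name and both contingency tables are hypothetical and are not taken from any included study; in each table the two readers agree on 90 of 100 lesions, but the feature is present in about half of the lesions in the first table and in about 90% in the second.

```python
def cohen_kappa(table):
    """Cohen's Kappa from a square contingency table.

    table[i][j] = number of cases rated category i by reader A
    and category j by reader B.
    """
    size = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(size)) / n
    row_margins = [sum(table[i]) / n for i in range(size)]
    col_margins = [sum(table[i][j] for i in range(size)) / n for j in range(size)]
    # Agreement expected by chance from the marginal frequencies.
    p_expected = sum(r * c for r, c in zip(row_margins, col_margins))
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)

balanced = [[45, 5], [5, 45]]   # feature present in about 50% of lesions
skewed = [[85, 5], [5, 5]]      # feature present in about 90% of lesions

print(round(cohen_kappa(balanced), 2))  # 0.8
print(round(cohen_kappa(skewed), 2))    # 0.44
```

The observed agreement is 0.90 in both tables, yet Kappa falls from 0.80 to about 0.44 because the expected chance agreement rises from 0.50 to 0.82 when one category dominates.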

The race of enrolled patients may contribute to the heterogeneity in the pooled analysis. Epidemiological surveys showed that chronic hepatitis B virus and hepatitis C virus infection led to HCC in eastern Asia and sub-Saharan Africa, and non-alcoholic fatty liver disease (NAFLD) was the major precipitating factor of HCC in Western countries[41]. NAFLD is one of the most prevalent causes of chronic liver disease in Western countries, with an estimated prevalence of 20%-40%[42]. In the context of fatty liver, the diagnosis of HCC that progressed from NAFLD may be more difficult. However, no relevant literature on this issue was found in the databases. Therefore, the CEUS diagnosis of HCC that progressed from NAFLD compared with HCC of viral origin remains an open issue of current interest.

Because most of the included articles were retrospective studies, the US systems and reference standards of these studies were partially diversified. However, the meta-regression analysis showed that the study design (0.78 vs 0.61, P = 0.26), the number of US systems (0.77 vs 0.76, P = 0.93), the number of FLLs (0.72 vs 0.82, P = 0.16), and reference standard (0.69 vs 0.77, P = 0.58) did not reach statistical significance for the heterogeneity, which suggested that the use of LI-RADS on CEUS for the diagnosis of FLLs was stable across these factors.

Notably, reliable interobserver agreement for LI-RADS categorization on CEUS was also observed for LI-RADS version 2016 and version 2017 (0.74 vs 0.78, P = 0.66). Because LI-RADS includes lexicons, detailed definitions, and illustrations for imaging features, a high interobserver agreement may be achieved. The two versions of LI-RADS on CEUS define APHE and washout clearly. For example, APHE may be considered present if it is demonstrated in the entire nodule or only a portion of the nodule[43]. CEUS characterization of washout requires assessment of its onset (late vs early) and degree (mild vs marked), not just its presence[17]. Generally, early (< 60 s) and/or marked washout is a major feature for LR-M, and late (≥ 60 s) and mild washout is a major feature for HCC[44,45]. The difference in version did not cause statistically significant heterogeneity, which supports the diagnostic stability of the two versions of LI-RADS in terms of interobserver agreement.

Measurements may differ due to bias between reviewers, which may be explained by the role of the reader’s judgment in interpreting the test results[46]. The bias may result from differences in training, learning, and experience between reviewers. The consistency test suggested that reviewers in similar working environments used similar diagnostic reasoning, and the remaining differences may be reduced with continuous education and updated definitions in LI-RADS on CEUS.

The present meta-analysis has several limitations. First, substantial heterogeneity between studies was observed. However, three significant factors for study heterogeneity were found using meta-regression. Second, the different study designs and participants from diverse geographic locations led to the heterogeneous distribution of the disease.

CONCLUSION

In conclusion, the summary results of the present meta-analysis showed substantial interobserver agreement for LI-RADS on CEUS. The heterogeneity factors included the method of enrolling patients, the method of consistency testing, and the race of patients, which should be considered in subsequent study design. A large, prospective, multicenter study is also needed to confirm our results.

ARTICLE HIGHLIGHTS

Research background

From the results of previous studies, Liver Imaging Reporting and Data System (LI-RADS) on contrast-enhanced ultrasound (CEUS) for diagnosing hepatocellular carcinoma (HCC) has shown a satisfactory diagnostic value. However, a unified conclusion on the interobserver stability of this innovative ultrasound imaging has not been determined. The present meta-analysis examined the interobserver agreement of CEUS LI-RADS to provide some reference for subsequent related research.

Research motivation

According to the inclusion and exclusion criteria, we included eight relevant articles to explore the interobserver agreement of LI-RADS on CEUS in a meta-analysis. The meta-analysis results revealed that the summary Kappa value of the included studies showed substantial agreement. The heterogeneity factors included the method of enrolling patients, the method of consistency testing, and the race of patients, which should be considered in subsequent study design.

Research objectives

The main objective of the present article is to explore the interobserver agreement of LI-RADS on CEUS for diagnosing HCC. The results of the meta-analysis showed that interobserver agreement is substantial and that the heterogeneity factors included the method of enrolling patients, the method of consistency testing, and the race of patients, which should be considered in subsequent study design.

Research methods

This article calculated the Kappa value to estimate the interobserver agreement of LI-RADS on CEUS for diagnosing HCC using the “meta” and “metafor” packages in R software version 3.6.2 (R Foundation, Vienna, Austria) for analysis and synthesis. The result of the consistency test has a vital reference value for the stability of LI-RADS.

Research results

This article showed substantial interobserver agreement for LI-RADS on CEUS. In addition, meta-regression identified several heterogeneity factors, including the method of enrolling patients, the method of consistency testing, and the race of patients, which should be considered in subsequent study design. A large, prospective, multicenter study is also needed to confirm our results.

Research conclusions

This study reported that interobserver agreement for LI-RADS on CEUS was substantial and that the method of enrolling patients, the method of consistency testing, and the race of patients may interfere with interobserver agreement, which should be considered in subsequent study design.

Research perspectives

The method of enrolling patients, the method of consistency testing, and the race of patients may interfere with interobserver agreement and should be considered in future research about LI-RADS on CEUS for diagnosing HCC.