Comparative study on artificial intelligence systems for detecting early esophageal squamous cell carcinoma between narrow-band and white-light imaging

2021-02-04 09:47:18
World Journal of Gastroenterology, 2021, Issue 3

Bing Li, Shi-Lun Cai, Wei-Min Tan, Ji-Chun Li, Ayimukedisi Yalikong, Xiao-Shuang Feng, Hon-Ho Yu, Pin-Xiang Lu, Zhen Feng, Li-Qing Yao, Ping-Hong Zhou, Bo Yan, Yun-Shi Zhong

Abstract

Key Words: Computer-aided detection; Esophageal squamous cell carcinoma; Endoscopy; Screening; Narrow-band imaging; White-light imaging

INTRODUCTION

Upper gastrointestinal endoscopy combined with biopsy is the method of choice for diagnosing esophageal squamous cell carcinoma (ESCC), and it has been widely adopted in population screening for ESCC[1,2]. However, early-stage ESCC is not always easy to identify during examination with white-light imaging (WLI), especially for inexperienced doctors[3,4]. The narrow-band imaging (NBI) system improves the visualization of microvasculature and mucosal patterns in the alimentary tract[5]. Nonmagnifying endoscopy with NBI (NM-NBI) has been used frequently in routine screening examinations with higher accuracy and specificity[6,7]. However, the sensitivity of NBI for screening of mucosal high-grade neoplasia differs significantly between experienced and less experienced endoscopists[8,9]. Early lesions missed at screening may not be identified until they become more advanced and less amenable to treatment. Thus, the experience of the operator plays a critical role in the screening result for early ESCC.

Artificial intelligence (AI) may be uniquely poised to compensate for the lack of operator experience. Studies have demonstrated the ability of AI to meet or exceed the performance of human experts as a triage or screening tool for gastrointestinal diseases[10,11]. In our previous research, we reported a novel computer-aided detection (CAD) system that localizes and identifies early ESCC under conventional endoscopic WLI with a sensitivity above 97%[12]. Here, a second CAD system, for use with NM-NBI in screening for early ESCC, was constructed and validated. More importantly, we compared the effectiveness of the two systems, based on WLI or NM-NBI, in helping endoscopists to detect early ESCC. On the basis of the results, we can determine which technique is most effective in helping endoscopists, i.e., whether to use CAD-NBI alone, CAD-WLI alone, or both.

MATERIALS AND METHODS

Study design

This study was performed at the endoscopy centers of three general hospitals (Zhongshan Hospital of Fudan University, Xuhui Hospital, and Kiang Wu Hospital) in partnership with the School of Computer Science of Fudan University. Patient data were anonymized, and any personal identifying information was excluded. This study was approved by the Institutional Review Board of Zhongshan Hospital, Fudan University (approval No. B2019-141R). All authors had access to the study data and reviewed and approved the final manuscript.

Datasets used for training and validation of the CAD-NBI system

First, we retrospectively obtained esophagoscopic NM-NBI images for the development of the CAD system for NBI images (CAD-NBI system). For the training dataset, a total of 2167 abnormal NM-NBI images of early ESCC from 235 cases and 2568 normal NM-NBI images from 412 cases were collected between January 2016 and April 2018 from three institutions. Then, we collected 316 pairs of images (133 abnormal and 183 normal), each pair including a WLI image and an NBI image taken at the same location and at the same angle, from 112 consecutive cases. This paired image dataset served three purposes: (1) All NBI images were used to test our newly established CAD-NBI system; (2) The WLI images paired with the NBI images were used to test the previously reported CAD system for WLI (CAD-WLI system)[12], so that the two CAD systems could be compared; and (3) Endoscopists were asked to review all the images in this validation dataset to evaluate their diagnostic ability with or without the help of the two CAD systems.

The criteria for normal and abnormal images follow our team's previous study on CAD for screening of early ESCC[12]. The criteria for choosing normal image data were as follows: (1) The initial endoscopic inspection results for the esophagus were negative; (2) The above-mentioned patients had no newly detected lesions until September 2019; and (3) The normal images were confirmed by endoscopists with ≥ 15 years of experience in endoscopy, all of whom judged the image to be normal. All patients with abnormal endoscopic images underwent endoscopic submucosal dissection procedures, and three gastrointestinal pathologists (two with > 10 years of experience and one with > 15 years of experience) conducted histological assessments in the pathology departments of both centers. Early ESCC includes low-grade and high-grade intraepithelial neoplasia and esophageal cancer (EC) that has invaded the mucosal or superficial submucosal layer.

Design and development of the CAD-NBI system

In this study, we consider the esophageal lesions in endoscopic images to be semantic objects. We demonstrate the development and validation of an endoscopist-level CAD-NBI system based on a deep learning algorithm for screening esophageal lesions. We propose to use a fully convolutional network (FCN) based on the Visual Geometry Group (VGG) model for semantic segmentation, where the semantics denote the esophageal lesions. The CAD-NBI system therefore predicts the location and irregular shape of esophageal lesions, which helps endoscopists to judge the size, area, and location of lesions more effectively.

To obtain an accurate predictor from a limited number of esophagoscopic images, the images are preprocessed before training. First, regions irrelevant to lesion detection, such as the black background, are cropped automatically using a simple image processing algorithm developed by us. Second, we randomly flip the esophagoscopic images horizontally and vertically to augment data diversity. Third, for further data augmentation, the esophagoscopic images and the corresponding lesion masks are resized to 300 × 300 and randomly cropped to 224 × 224. During training, the network parameters are updated with an initial learning rate of 0.0001, which is decayed every 2000 iterations with a decay rate of 0.9 in staircase mode. During inference, given an esophagoscopic image that the CAD-NBI system has never seen previously, the system directly outputs the segmentation result for the esophageal lesion.
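The augmentation and learning-rate schedule described above can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the nearest-neighbour resizing, the 0.5 flip probability, and the list-of-rows image representation are all assumptions made here for clarity. The one property the sketch is meant to show is that the image and its lesion mask must receive the same flips and the same crop window so that they stay aligned.

```python
import random

def resize_nearest(grid, size):
    """Nearest-neighbour resize of a list-of-rows image to size x size.
    (The interpolation method is an assumption; the paper does not state it.)"""
    h, w = len(grid), len(grid[0])
    return [[grid[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def augment_pair(image, mask, rng, resize_to=300, crop_to=224):
    """Apply identical random flips and an identical random crop to an
    image and its lesion mask (both given as lists of pixel rows)."""
    if rng.random() < 0.5:  # horizontal flip (probability is an assumption)
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    image = resize_nearest(image, resize_to)
    mask = resize_nearest(mask, resize_to)
    # One crop window, shared by image and mask.
    top = rng.randint(0, resize_to - crop_to)
    left = rng.randint(0, resize_to - crop_to)
    image = [row[left:left + crop_to] for row in image[top:top + crop_to]]
    mask = [row[left:left + crop_to] for row in mask[top:top + crop_to]]
    return image, mask

def staircase_lr(step, base_lr=1e-4, decay_rate=0.9, decay_steps=2000):
    """Learning rate that drops by decay_rate once every decay_steps iterations."""
    return base_lr * decay_rate ** (step // decay_steps)
```

With these defaults, `staircase_lr` stays at 0.0001 for iterations 0 to 1999, drops to 0.00009 at iteration 2000, and so on in discrete steps rather than decaying continuously.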

Comparison of the improvement of diagnostic capability of endoscopists under CAD-WLI and CAD-NBI

The accuracy of the CAD-NBI system was evaluated by the validation dataset established previously. We invited 20 endoscopists with varying experience from three centers to participate in this study in order to compare diagnostic performance between the CAD system and endoscopists. Moreover, we wanted to know the effectiveness of the two systems, CAD-WLI and CAD-NBI, for the improvement of diagnostic capabilities of endoscopists. Among the endoscopists, four were classified as highly experienced endoscopists who had performed more than 10000 conventional endoscopy examinations and 5000 NBI endoscopy examinations, eight were classified as mid-level endoscopists who had performed more than 5000 conventional endoscopy examinations and 2500 NBI endoscopy examinations, and eight were classified as junior endoscopists who had performed more than 2000 conventional endoscopy examinations and 1000 NBI endoscopy examinations.

To test the effectiveness of the two CAD systems, we designed a four-phase trial. In the first phase, the 20 endoscopists were asked to review every image pair of the validation dataset in digital format on a laptop. All of them were blinded to the histological data and reviewed the esophagoscopic images independently. The CAD-NBI system scanned and analyzed the NM-NBI image of every pair, saved in JPEG/PNG format on the hard drive, and the CAD-WLI system did the same for the NM-WLI images. In the second phase, after the WLI images had been marked by the CAD-WLI system (NBI images were not marked), we invited these endoscopists to screen every pair of images in the validation dataset again. Notably, in each phase the review sequence of the images was randomly reordered by computer to minimize the influence of impressions retained from the previous phase. In the third phase, the endoscopists reviewed every pair of images in the validation dataset once again, this time with all NBI images marked by the CAD-NBI system and the WLI images unmarked. In the last phase, the endoscopists made their final diagnosis for the paired validation images by referring to the results from both the CAD-WLI and CAD-NBI systems. The wash-out periods between consecutive phases were 1, 1, and 2 mo, respectively. Moreover, the 20 endoscopists were unaware of the performance of the CAD-WLI and CAD-NBI systems. A flowchart depicting the processes used during the study is shown in Figure 1.
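The per-phase reshuffling of the review sequence could be implemented along the following lines. This is a hypothetical sketch: the function name, the seed value, and the choice to give each reader a different order are assumptions made here, since the text states only that the sequence was altered randomly by computer.

```python
import random

def review_order(pair_ids, reader_id, phase, base_seed=2019):
    """Return a reproducible shuffled review order for one reader in one phase.
    Seeding per reader and per phase is an assumption, not a reported detail."""
    rng = random.Random(f"{base_seed}-{reader_id}-{phase}")
    order = list(pair_ids)
    rng.shuffle(order)
    return order

# Example: the order of the 316 image pairs for reader 7 in phases 1 and 2.
phase1 = review_order(range(316), reader_id=7, phase=1)
phase2 = review_order(range(316), reader_id=7, phase=2)
```

Seeding the generator from the reader and phase makes each sequence reproducible for auditing while still differing between phases.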

Outcome measures

The ability of the CAD-NBI system to identify early ESCC was assessed by the area under the receiver operating characteristic curve, and the sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were determined. The accuracy, sensitivity, specificity, PPV, and NPV were also compared between CAD-NBI, CAD-WLI, and the endoscopists.
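For reference, the five per-image measures follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name is an assumption made here. In the usage example, the 131 of 133 lesions detected by CAD-WLI are reported in the Results, whereas the split of the 183 normal images into 152 true negatives and 31 false positives is back-calculated here from the reported specificity and is not a figure stated in the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # detected lesions / all lesions
        "specificity": tn / (tn + fp),          # correct normals / all normals
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),                  # true lesions / all positive calls
        "npv": tn / (tn + fn),                  # true normals / all negative calls
    }

# tp and fn are reported for CAD-WLI; tn and fp are back-calculated assumptions.
cad_wli = diagnostic_metrics(tp=131, fn=2, tn=152, fp=31)
```

Because PPV and NPV depend on the proportion of abnormal images in the test set (133 of 316 here), they would differ in a screening population with a lower prevalence of early ESCC.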

Statistical analysis

The chi-square test and t test were used wherever applicable. A P value of < 0.05 was considered statistically significant. A two-sided McNemar test with a significance level of 0.05 was used to compare differences in accuracy, sensitivity, specificity, PPV, and NPV. All statistical analyses were performed using SPSS version 18.0 (SPSS Inc., Chicago, IL, United States).
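The McNemar test compares two classifiers on the same images by looking only at the discordant pairs, i.e., images that one classifier got right and the other got wrong. A minimal sketch of the exact (binomial) two-sided version is given below. The function name and the example counts are hypothetical, and the text does not state whether the exact or the chi-square form of the test was used.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.
    b: images correct under method A only; c: images correct under method B only."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of k or fewer successes under Binomial(n, 0.5), doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 1 image correct under A only, 9 correct under B only.
p_value = mcnemar_exact(1, 9)
```

Images that both methods classify correctly, or both misclassify, do not enter the calculation, which is why the paired design of the validation set matters for this comparison.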

RESULTS

The detailed characteristics of the patients and lesions in the validation image dataset are listed in Table 1.

Figure 1 Flowchart of the procedures. CAD: Computer-assisted detection; NBI: Narrow-band imaging; WLI: White-light imaging.

Performance of the CAD-NBI and CAD-WLI systems

The receiver operating characteristic curve of the CAD-NBI system is shown in Figure 2, and the area under the curve was 0.9761. For 316 NBI images in the validation dataset, the sensitivity, specificity, accuracy, PPV, and NPV of CAD-NBI were 91.0%, 96.7%, 94.3%, 95.3%, and 93.6%, respectively. For 316 WLI images in the validation dataset, CAD-WLI correctly identified 131 of the 133 early ESCC lesions, with sensitivity, specificity, and accuracy of 98.5%, 83.1%, and 89.5%, respectively. The PPV and NPV of the CAD-WLI system were 80.8% and 98.7%, respectively.

Comparison of the two CAD systems (Figure 3) revealed that the accuracy and specificity of CAD-NBI were superior to those of CAD-WLI (P = 0.028 and P ≤ 0.001). However, the sensitivity of the CAD-WLI system was higher than that of the CAD-NBI system (P = 0.006). The CAD-WLI and CAD-NBI systems recognized and marked the lesion with a blue square in paired images (Figure 4A and B). In addition, when CAD-WLI mistakenly identified normal mucosa as a lesion (blue square) in WLI, CAD-NBI, with its high specificity, could correct it in NBI (Figure 4C and D).

Comparison between CAD systems and the endoscopists

Table 2 compares the performance of the two CAD systems and the endoscopists for diagnosing early ESCC. The overall sensitivity, specificity, accuracy, PPV, and NPV of the 20 endoscopists were 73.9%, 87.7%, 81.9%, 81.7%, and 82.7%, respectively. The experienced endoscopists achieved significantly better diagnostic results than the less experienced ones, including mid-level and junior endoscopists. The average accuracy of the experienced endoscopists for early ESCC was 93.6%, which was similar to that of the CAD-NBI system and higher than that of the CAD-WLI system. CAD-WLI achieved the highest sensitivity (98.5%), whereas its specificity was lower than both that of CAD-NBI, which was the highest at 96.7%, and the average of all the endoscopists (87.7%).

Improvement after referring to the results from CAD-WLI and CAD-NBI

With the assistance of either CAD-WLI or CAD-NBI, all three groups of endoscopists showed improvement in accurately diagnosing early ESCC (Figure 5).

Table 3 shows the average diagnostic performance of endoscopists in the detection of early ESCC after referring to the results from the CAD-WLI system in the second phase and from the CAD-NBI system in the third phase. Next, we compared the advantages of the two systems in different aspects. The CAD-NBI system helped the endoscopists to achieve higher accuracy than the CAD-WLI system did, with a significant difference in the mid-level group (85.3% vs 88.4%, P = 0.012). Experienced and mid-level endoscopists showed no significant differences in their sensitivity for lesions between the two phases, while the CAD-WLI system helped junior endoscopists to achieve higher sensitivity than the CAD-NBI system did (83.0% vs 77.3%, P = 0.008). In addition, there were no significant differences in the specificity of experienced endoscopists when using CAD-WLI or CAD-NBI. However, specificity was significantly higher with CAD-NBI than with CAD-WLI for mid-level (85.9% vs 92.6%, P < 0.001) and junior endoscopists (88.6% vs 94.9%, P = 0.003).

Table 1 Patient and lesion characteristics in the validation image set

Table 2 Diagnostic performance of computer-assisted detection systems vs endoscopists

Table 4 shows the average diagnostic performance of endoscopists after referring to the results from both the CAD-WLI and CAD-NBI systems. In the fourth phase, the diagnostic capability of all the endoscopists improved to the highest level, with accuracy, sensitivity, and specificity of 94.9%, 92.4%, and 96.7%, respectively. The accuracy of junior endoscopists was 92.9%, which was significantly higher than that in the first (vs 78.5%, P < 0.001), second (vs 86.2%, P < 0.001), and third (vs 87.5%, P < 0.001) phases. The accuracy of mid-level endoscopists was 94.8%, which was significantly higher than that in the first (vs 79.4%, P < 0.001), second (vs 85.3%, P < 0.001), and third (vs 88.4%, P < 0.001) phases. In the experienced endoscopist group, the average accuracy was 98.8%, which was also significantly higher than that in the first (vs 93.6%, P = 0.011), second (vs 95.5%, P = 0.015), and third (vs 96.5%, P = 0.049) phases (Figure 6A). In terms of sensitivity and specificity for lesions, the average values of the mid-level and junior groups also rose to their highest levels after using both CAD-WLI and CAD-NBI (Figure 6B and C).

DISCUSSION

CAD has been developed to compensate for the limited diagnostic experience of young doctors. A recent study presented an AI system that can surpass human experts in breast cancer prediction[13]. In the field of gastroenterology, several CAD systems have shown excellent diagnostic potential compared with human endoscopists in colorectal polyp classification, determination of the invasion depth of gastric cancer, and identification of small bowel diseases[14-16]. The application of AI in automatically detecting and classifying lesions, especially in the context of medical imaging, is expected to help physicians provide more accurate diagnoses[17].

Table 3 Comparison of the improvement of endoscopists under computer-assisted detection-white-light imaging and computer-assisted detection-narrow-band imaging

Squamous cell carcinoma is the predominant histologic subtype of EC in Asia, where the rate of EC is quite high[18]. Several studies on CAD for improving the screening efficiency for ESCC have been reported. Horie et al[19] first evaluated the ability of a convolutional neural network (CNN) to detect EC in endoscopic images of superficial and advanced cancer and achieved a sensitivity of 98%. Zhao et al[20] developed a deep learning model based on magnifying NBI images to investigate the automated classification of intrapapillary capillary loops and assist endoscopic diagnosis of early ESCC. In a recent study, Guo et al[21] developed a CAD system for real-time automated diagnosis of precancerous lesions and ESCC. In 2019, our team reported a novel system using a deep neural network (DNN) to localize and identify early ESCC under endoscopic WLI with high accuracy and sensitivity[12]. Moreover, after referring to the results of DNN-CAD, the average diagnostic ability of the endoscopists improved significantly. However, this CAD system could only identify early ESCC in WLI, and the specificity was only 85.4%, which may lead to unnecessary biopsies. As NBI is also an accurate diagnostic tool for early ESCC, a better CAD system that can be used in the NBI mode of endoscopy needs to be developed. In addition, a comparative study of AI application in the WLI and NBI modes is lacking.

Table 4 Diagnostic performance of endoscopists in screening of esophageal squamous cell carcinoma after referring to the results from computer-assisted detection-white-light imaging and computer-assisted detection-narrow-band imaging

Figure 2 Receiver operating characteristic curve for the test dataset. The area under the curve (AUC) was above 97%.

Figure 3 A comparison between the computer-assisted detection-narrow-band imaging and computer-assisted detection-white-light imaging systems in detecting early esophageal squamous cell carcinoma. aP < 0.05; bP < 0.01; cP < 0.001. CAD: Computer-assisted detection; NBI: Narrow-band imaging; WLI: White-light imaging.

Figure 4 Examples of computer-assisted detection system-diagnosed images. A and B: under white-light imaging (WLI) and narrow-band imaging (NBI), computer-assisted detection (CAD)-WLI and CAD-NBI recognized the esophageal cancer lesion (blue square); C and D: CAD-WLI mistakenly identified the normal mucosa as a lesion (blue square) in WLI, while CAD-NBI corrected it in NBI.

Figure 5 Improved accuracy of diagnosis with the assistance of the two computer-assisted detection systems according to the groups. A: Improvement of endoscopists’ accuracy in the second phase with the assistance of computer-assisted detection (CAD)-white-light imaging; B: Improvement of endoscopists’ accuracy in the third phase with the assistance of CAD-narrow-band imaging. WLI: White-light imaging.

In the present study, we developed a CAD system to detect early ESCC under the NBI mode of endoscopy. Given our previously developed CAD system for WLI, we wanted to compare the characteristics of the CAD-NBI and CAD-WLI systems to validate the usefulness of CAD-NBI. Therefore, 316 pairs of images, each pair including a WLI image and an NBI image taken at the same location and at the same angle, were collected from three institutions. The results showed that CAD for NM-NBI images of the esophagus had a good ability to diagnose early ESCC, with an accuracy of 94.3%. For WLI images in the validation dataset, CAD-WLI correctly identified 131 of the 133 lesions. The diagnostic ability of CAD-WLI for this validation image dataset was similar to that reported in our previous study[12]. Comparison of the two systems showed that CAD-NBI had superior accuracy and specificity compared to CAD-WLI, while the CAD-WLI system had higher sensitivity than the CAD-NBI system. These results suggest that CAD-NBI may compensate for the low specificity of CAD-WLI and raise the overall accuracy, but its own sensitivity still needs to be improved further.

Figure 6 Average diagnostic performance of the three groups of endoscopists in four phases. A: Accuracy; B: Sensitivity; C: Specificity. aP < 0.05; cP < 0.001.

Endoscopists were asked to review the images from this validation dataset to evaluate their diagnostic ability. In the first phase, the average accuracy of the experienced endoscopists for early ESCC was 93.6%, which was similar to that of the CAD-NBI system and higher than that of the CAD-WLI system. However, the average diagnostic accuracy of the less experienced endoscopists, including the mid-level (79.4%) and junior groups (78.5%), was lower. Subsequently, the less experienced endoscopists in particular showed improvement in their diagnostic ability with the help of both CAD-WLI in the second phase and CAD-NBI in the third phase. Junior endoscopists made a greater improvement in sensitivity with the CAD-WLI system, whereas their diagnostic specificity improved further after referring to the results from CAD-NBI. In addition, mid-level endoscopists showed higher specificity and accuracy with the assistance of the CAD-NBI system than with the CAD-WLI system. In the fourth phase, by using both CAD-WLI and CAD-NBI, the average values of the three groups rose to their highest levels. After simulating the clinical use of the two CAD systems through four different situations, we found that the two systems had different advantages in terms of avoiding missed diagnoses and excessive biopsies, and thus, endoscopists could achieve the best diagnostic efficacy by using both CAD-WLI and CAD-NBI.

The sensitivity of a previous system reported by Horie et al[19] for the diagnosis of ESCC on WLI and NBI was 72% and 86%, respectively, and our CAD-WLI and CAD-NBI systems provided higher sensitivity of 98.5% and 91.0%, respectively. We have continuously developed the two CAD systems for WLI and NBI based on DNN and conducted for the first time a comparative study to reveal the respective advantages of both systems. The ultimate goal is to integrate the two systems into one to meet the needs of the different equipment found in hospitals at all levels in different regions. Although Guo et al[21] reported that their system for the automated diagnosis of early ESCC under NBI had sensitivity and specificity of 98.04% and 95.03%, respectively, the sensitivity of CAD-WLI and specificity of CAD-NBI were similar to their reported values. In addition, our study invited 20 endoscopists to review the images of the validation dataset with or without the help of the two CAD systems; 10 of the 20 endoscopists (2 mid-level and 8 junior) are from Xuhui Hospital, a secondary hospital that sees far fewer patients than a tertiary hospital. As secondary hospitals have lower capability to diagnose and treat ESCC than tertiary hospitals, we wanted to assess fully the effectiveness of our CAD system in helping less experienced endoscopists to detect early ESCC, especially doctors in basic-level hospitals. Our results confirmed that the improvement in diagnostic capability was most pronounced in less experienced endoscopists.

Next, we explain the reasons for the differences in the diagnostic characteristics of the two CAD systems. CAD-WLI and CAD-NBI are based on different concepts. CAD-WLI uses a bounding box method to detect the location of esophageal lesions in endoscopic images. The detection result of the CAD-WLI system unavoidably includes unnecessary areas such as background regions and parts of other lesions. In contrast, CAD-NBI uses a semantic segmentation method based on the FCN model, which outputs only the regions of esophageal lesions. Therefore, the accuracy of the CAD-NBI system is higher than that of the CAD-WLI system. In addition, the CAD-NBI system is developed by an end-to-end trainable approach, which differs from the sliding window approach of the CAD-WLI system that densely generates a large number of candidate boxes with different sizes and ratios on a given image. Thus, the CAD-NBI system has the advantage of computational speed when compared with the CAD-WLI system. Finally, NBI and WLI have different characteristics, which can help us obtain high accuracy, sensitivity, and specificity simultaneously. On the basis of this key observation, we can develop a multichannel DNN to extract and fuse the features of NBI and WLI simultaneously in future studies.
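The difference between a bounding box and a segmentation mask can be made concrete with a toy example. The sketch below is illustrative only and does not reproduce either CAD system: the function names and the 5 × 5 mask are assumptions made here. It measures what fraction of the tightest box around an irregular lesion is actually non-lesion tissue.

```python
def bounding_box(mask):
    """Tightest (top, left, bottom, right) box around the nonzero pixels
    of a binary mask given as a list of rows. Requires at least one lesion pixel."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return min(rows), min(cols), max(rows), max(cols)

def box_excess_fraction(mask):
    """Fraction of the bounding box area that is not lesion."""
    top, left, bottom, right = bounding_box(mask)
    box_area = (bottom - top + 1) * (right - left + 1)
    lesion_area = sum(sum(row) for row in mask)
    return (box_area - lesion_area) / box_area

# Hypothetical irregular lesion (1 = lesion pixel) running diagonally.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
excess = box_excess_fraction(toy_mask)  # 4 of the 9 boxed pixels are not lesion
```

For elongated or irregular lesions the excess grows, which is consistent with the observation above that box-based output includes background regions while a segmentation output does not.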

The present study has several limitations. First, the sample size, including images in the training and validation datasets, was small. The low detection rate of early ESCC limits our ability to obtain more images. In addition, our work on CAD for the early detection of ESCC was validated only on still images at a limited scale. Second, the performance of endoscopists in the later phases may have been slightly affected by their impressions of the images from the previous phases. Third, the CAD diagnosis was based on high-quality images, and bias might occur with poor-quality images such as out-of-focus images or images blurred by mucus during real-time gastroscopy.

CONCLUSION

In conclusion, we have constructed a CAD system under the NBI mode for screening early ESCC, and this CAD-NBI system has higher accuracy and specificity than the CAD-WLI system reported previously. Endoscopists could achieve the best diagnostic efficacy by using both CAD-WLI and CAD-NBI. Therefore, a novel system combining the characteristics of these two systems under WLI and NBI is needed.

ARTICLE HIGHLIGHTS

Research conclusions

The CAD-NBI system for screening early ESCC has higher accuracy and specificity than CAD-WLI. Endoscopists can achieve the best diagnostic efficacy by using both CAD-WLI and CAD-NBI.

Research perspectives

According to the results, the two CAD systems had different advantages in avoiding missed diagnoses and excessive biopsies, which could help endoscopists, especially those with less experience, to screen for early ESCC more efficiently. As the two CAD systems have unique characteristics, we plan to develop a multichannel deep neural network to extract and combine the features of NBI and WLI simultaneously in our future work.