Lu-Ming Huang, Wen-Juan Yang, Zhi-Yin Huang, Cheng-Wei Tang, Jing Li
Abstract Due to the rapid progression and poor prognosis of esophageal cancer (EC), early detection and diagnosis are of great value for improving patient prognosis. However, the endoscopic detection of early EC, especially Barrett's dysplasia or squamous epithelial dysplasia, is difficult. Therefore, the requirement for more efficient methods of detection and characterization of early EC has led to intensive research in the field of artificial intelligence (AI). Deep learning (DL) has brought about breakthroughs in processing images, videos, and other aspects, whereas convolutional neural networks (CNNs) have shed light on detection in endoscopic images and videos. Many studies on CNNs in endoscopic analysis of early EC demonstrate excellent performance, including sensitivity and specificity, and have progressed gradually from in vitro image analysis for classification to real-time detection of early esophageal neoplasia. When AI techniques are applied to pathological diagnosis, borderline lesions that are difficult to classify may become easier to determine. In gene diagnosis, because the diagnostic markers lack tissue specificity, they can only be used as supplementary measures at present. In predicting the risk of cancer, there is still a lack of prospective clinical research to confirm the accuracy of risk stratification models.
Key Words: Artificial intelligence; Early esophageal cancer; Barrett's esophagus; Esophageal squamous cell carcinoma; Endoscopic diagnosis; Pathological diagnosis
Esophageal cancer (EC) is the eighth most common cancer and the sixth leading cause of cancer death worldwide[1]. EC mainly consists of esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC). EAC is the most common pathological type in Western countries; more than 40% of patients with EAC are diagnosed after the disease has metastasized, and the 5-year survival rate is less than 20%[2,3]. Although the incidence of EAC has been increasing globally, ESCC remains the most common pathological type (80%) of all ECs, with the highest incidence across a ‘cancer belt’ extending from East Africa across the Middle East to Asia. Only 20% of patients with ESCC survive longer than 3 years, primarily due to late-stage diagnosis[4]. In low-resource settings, the 5-year survival is much lower at about 3.4%[5]. Early diagnosis may be associated with significantly improved outcomes for all ECs.
Barrett's esophagus (BE) is a premalignant condition characterized by the replacement of normal squamous esophageal epithelium by metaplastic intestinal epithelium containing goblet cells. It is a result of chronic inflammation of the esophagus and confers a significantly increased risk of EAC. Endoscopic surveillance of BE patients to enable early detection of dysplasia or carcinoma is recommended by gastrointestinal (GI) societies of Western countries[6,7]. Endoscopic surveillance in patients with BE requires random 4-quadrant biopsy specimens obtained every 1 to 2 cm to detect dysplasia[8]. This method is invasive, time-consuming, and difficult to comply with[9].
Gastroscopy remains the main method for detecting early ESCC. However, endoscopic features of these early lesions are subtle and easily missed with conventional white-light endoscopy (WLE)[10]. Intrapapillary capillary loops (IPCLs) are microvessels that have been considered a marker of ESCC because their changes in morphology correlate with the invasion depth of ESCC. Advanced endoscopic imaging modalities, such as narrow band imaging (NBI) in combination with magnification endoscopy, afford improved visualization of subtle microvascular patterns in the esophageal mucosa of patients with ESCC. Although NBI has shown a high sensitivity for the detection of ESCC, its performance in characterizing these lesions is still limited[11].
Therefore, the requirement for more efficient methods of detection and characterization of early EC has led to intensive research in the field of artificial intelligence (AI), which can be defined as intelligence exhibited by machines, in contrast to the natural intelligence displayed by humans and other animals[12]. Machine learning (ML) and deep learning (DL) are important parts of AI. ML can be divided into supervised and unsupervised methods. Unsupervised learning identifies groups within data according to commonalities, without prior knowledge of the number of groups or their significance. When the training set contains input-output pairs, a supervised learning model is used to map new inputs to outputs. Conventional ML techniques are limited in their ability to process natural data in their raw form. During the early stage of research and development, models were mainly trained with conventional ML, for which researchers had to manually extract possible disease features based on clinical knowledge. The power of such computer-aided diagnosis (CAD) systems is weak and insufficient for real-time clinical diagnosis.
Convolutional neural networks (CNNs) are supervised ML models inspired by the way the visual cortex of the human brain processes and recognizes images. Each artificial neuron is a computing unit, and the neurons are connected to each other to form a network. Through multiple network layers, CNNs can extract the key features from an image with minimal preprocessing and then provide a final classification through the fully connected layers as the output. The race for better performance has led to progressively more complex pooling layers, giving rise to the concept of DL[13]. The key aspect of DL is that these layers of features are not designed by human engineers; they are learned from data using a general-purpose learning procedure. DL has brought about breakthroughs in processing images, videos, and other aspects, whereas recurrent CNNs have shed light on detection in endoscopic images and videos.
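The layer sequence described above (convolution, pooling, then a fully connected classifier) can be illustrated with a minimal forward pass in plain Python. This is only a sketch under stated assumptions: the 8 × 8 input, the two 3 × 3 filters, the two output classes, and all function names are hypothetical, and the weights are random rather than learned, so it shows the data flow of a CNN and not any of the CAD systems reviewed here.

```python
import math
import random


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear unit."""
    return [[max(v, 0.0) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    rows = len(fmap) // size
    cols = len(fmap[0]) // size
    return [[max(fmap[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernels, fc_weights, fc_bias):
    """Convolution -> ReLU -> pooling per kernel, then one fully connected layer."""
    features = []
    for kernel in kernels:
        pooled = max_pool(relu(conv2d(image, kernel)))
        features.extend(v for row in pooled for v in row)
    scores = [sum(w * f for w, f in zip(weights, features)) + b
              for weights, b in zip(fc_weights, fc_bias)]
    return softmax(scores)


random.seed(0)
# Hypothetical 8x8 grayscale patch standing in for an endoscopic image.
image = [[random.random() for _ in range(8)] for _ in range(8)]
# Two 3x3 filters; random here, whereas a trained CNN learns them from data.
kernels = [[[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]
# 8x8 -> conv 6x6 -> pool 3x3 = 9 features per kernel, 18 in total.
fc_weights = [[random.gauss(0, 1) for _ in range(18)] for _ in range(2)]
fc_bias = [0.0, 0.0]
probs = forward(image, kernels, fc_weights, fc_bias)
print(probs)  # two class probabilities, e.g. non-neoplastic vs neoplastic
```

In a trained network the filter and classifier weights would be fitted to labelled images by backpropagation, which is the "general-purpose learning procedure" referred to above; that training step is omitted here for brevity.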
AI techniques have been applied to early EC detection for more than 15 years. Many studies on CNNs in endoscopic analysis of early EC demonstrate excellent performance, including sensitivity and specificity, and have progressed gradually from in vitro image analysis for classification to real-time detection of early esophageal neoplasia. In this manuscript, we will discuss the following: (1) Utility of AI technique in endoscopic detection of early EC; (2) Role of AI in pathological diagnosis of early EC; (3) AI in gene diagnosis of early EC; (4) AI in risk stratification of early EC; and (5) Conclusion and outlook.
AI based on WLE and NBI: WLE, a conventional technology, has limitations in recognizing BE-related early neoplastic lesions. High-definition WLE (HD-WLE) and NBI were once expected to enhance the accuracy of the diagnosis of BE-related early neoplastic lesions. However, the improvement still did not satisfy endoscopists. This stimulated the development of CAD systems for early neoplastic lesions in BE based on supervised ML[14,15]. However, it was still difficult for these systems to locate BE-related early neoplastic lesions and to select biopsy sites. To solve those problems, Ebigbo et al[16] established a CAD system based on DL. The accuracy of the system using HD-WLE was better than that of general endoscopists. Moreover, the system displayed the ability to locate the lesion, and the area coincidence rate between the lesions delineated by the system and by experts was up to 72%. Generally, the specificity of NBI was higher than that of conventional WLE[17]. Compared with the use of HD-WLE, the system showed no obvious advantage when using NBI[16]. However, there was a common problem in the above studies; that is, the same image dataset was used in both the training stage and the validation stage. This obviously cannot reflect clinical practice, so it is very important to use different image datasets for training and validation. de Groof et al[18] used different HD-WLE images for training and testing, and the results showed that the sensitivity and specificity of the system were significantly higher than those of general endoscopists. The combinations of AI and HD-WLE/NBI, which are widely used in the clinic, perform well in the diagnosis of BE-related early tumor lesions and are superior to general endoscopists. However, across studies, the precision of the CAD system in delineating lesions and the criteria for evaluating lesion localization are quite inconsistent. More high-quality studies are needed in the future.
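One common way to keep training and validation data separate, as called for above, is to split at the patient level so that no patient contributes images to both sets. The sketch below is a minimal illustration under stated assumptions: the function name `split_by_patient`, the record layout of (patient identifier, image name), and the 20% test fraction are all hypothetical and are not taken from any of the cited studies.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_name) records so that patients never overlap.

    All images from one patient go entirely to the training set or entirely
    to the test set, which avoids evaluating on near-duplicate images.
    """
    patients = sorted({patient_id for patient_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Hypothetical dataset: 10 patients with 3 images each.
records = [(f"patient_{p:02d}", f"img_{p:02d}_{k}.png")
           for p in range(10) for k in range(3)]
train, test = split_by_patient(records)
print(len(train), len(test))
```

Splitting individual images at random instead would allow images of the same lesion to appear on both sides of the split, which tends to inflate the measured performance.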
AI based on endoscopic optical coherence tomography and confocal laser endomicroscopy: In addition to WLE and NBI, endoscopic optical coherence tomography (EOCT) and confocal laser endomicroscopy (CLE) are also used to diagnose early EAC/BE-associated dysplasia. EOCT can identify BE-related early tumor lesions by analyzing esophageal mucosal and submucosal structures[19]. CLE can visualize mucosal tissue and cellular morphology to achieve optical biopsy[20]. However, the complexity of these two imaging technologies, the time-consuming reading of images, and the need for senior endoscopists limit their clinical use. To solve this problem, Qi et al[19,21] extracted multiple EOCT image features and combined one or multiple features to classify the lesions, but the results were not satisfactory. After that, Swager et al[22] used volumetric laser endomicroscopy (VLE, integrated with second-generation OCT) images for training and testing, and the results showed that the CAD system was superior to VLE experts. Veronese et al[20] used CLE images, and the results showed that the system could accurately distinguish gastric metaplasia (GM), intestinal metaplasia (IM), and neoplasia. Ghatwary et al[23] showed that the sensitivity of the system using CLE images to diagnose IM and neoplasia was significantly higher than that to diagnose GM. Similarly, the CAD system established by Hong failed to identify GM, the sensitivity of diagnosing IM was not significantly different from that of the above study, and the sensitivity of diagnosing neoplasia was slightly decreased[24]. This may be due to the limited number of images of GM and neoplasia. At present, there are few clinical studies in this field, and the available images are limited. In the future, more research is needed to confirm the value of VLE/CLE combined with AI in the diagnosis of early EAC.
Real-time diagnosis by AI: Currently, research in this field is limited. Ebigbo et al[25] continued to optimize the CAD system based on previous research and applied it to clinical real-time detection for the first time. During endoscopic examination of 14 patients with Barrett's neoplasia, 62 endoscopic images (36 early EAC and 26 BE without dysplasia) were captured and classified in real time by the CAD system, and the results showed that the sensitivity and specificity were 83.7% and 100%, respectively. There was no significant difference between the system and experienced endoscopists. However, there were still some shortcomings in this study. First, the numbers of patients and images were low. Second, the system still used images for diagnosis, not videos for real-time detection. Finally, the ability of AI to assist in the delineation of lesions and biopsy guidance was not verified. In addition, the CAD system built by Hashimoto et al[17] could meet the needs of clinical real-time detection; unfortunately, the researchers did not verify the performance of the system in real-time diagnosis.
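Because sensitivity and specificity are the main performance measures quoted throughout this review, a short sketch of how they follow from a 2 × 2 confusion table may be helpful. The function name and the counts used below are purely illustrative assumptions; they are not the data of Ebigbo et al or of any other cited study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and accuracy from confusion counts.

    tp: neoplastic images correctly flagged; fn: neoplastic images missed;
    tn: non-neoplastic images correctly cleared; fp: false alarms.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With small test sets, a single reclassified image shifts these percentages noticeably, which is one reason the low numbers of patients and images noted above limit the conclusions that can be drawn.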
AI based on WLE and NBI: Lugol's chromoendoscopy is the standard screening method for ESCC; however, in view of its low specificity and long procedure time, it is necessary to adopt new endoscopic techniques. Although WLE alone has been proven to be unsuitable for screening early ESCC, considering its clinical popularity, some researchers still hope to introduce AI to improve the accuracy of WLE. Cai et al[26] built a CAD system based on DL and tested it with WLE images. The results showed that the accuracy of the system in the diagnosis of early EC was significantly higher than that of junior and mid-level endoscopists, and there was no significant difference between the system and senior endoscopists. The results of Ohmori et al[27] showed that the sensitivity of NBI was higher than that of WLE, but the specificity was lower, and there was no significant difference between the overall performance of the system and that of endoscopic experts. The results of Horie et al[28] were similar to those of Ohmori et al[27].
In recent years, it has been found that the esophageal intrapapillary capillary loop (IPCL) represents an endoscopically visible feature of esophageal squamous cell neoplasia, and its morphological changes are closely related to the depth of tumor invasion[29]. The classification of IPCL based on NBI proposed by the Japanese Endoscopic Society has been widely used in clinical practice because it is easy to understand[30]. Nevertheless, classifying IPCL still requires sufficient experience, and its interpretation is still subjective. Therefore, to classify the IPCL more objectively and help less experienced endoscopists make full use of NBI, combining NBI with AI will be the best solution. Zhao et al[31] analyzed magnifying NBI images and showed that the diagnostic accuracy of the CAD system was better than that of junior and mid-level endoscopists, and there was no significant difference between the system and senior endoscopists. However, this study focused on the classification of IPCL only and did not further verify the accuracy of the system in determining the depth of tumor invasion. For early ESCC, accurately determining the depth of tumor invasion is the premise of choosing the appropriate treatment. Nakagawa et al[32] used non-magnifying endoscopic images to test the CAD system and found that the sensitivity of the system to diagnose epithelial-submucosal cancers invading up to 200 μm (EP-SM1) was higher than that of endoscopic experts, but the specificity was lower. There was no significant difference between the system and endoscopic experts based on magnifying endoscopic images. Further analysis showed that the system performed well in the diagnosis of EP/lamina propria mucosa (LPM) and had no significant difference from endoscopic experts. However, the sensitivity of the system to diagnose muscularis mucosa (MM)/SM1 was poor, as was the performance of endoscopic experts. In the diagnosis of SM2/3, the sensitivity of this system was slightly higher than that of endoscopic experts. Tokai et al[33] also used non-magnifying endoscopic images for analysis and found that the sensitivity of NBI was slightly higher than that of WLE, and the diagnostic performance of the system was better than that of endoscopic experts. Further analysis showed that the accuracy of the CAD system in diagnosing EP/LPM and MM was more than 90%, which was significantly higher than that in diagnosing SM1 and SM2. The reasons may be that the training image set did not contain normal esophageal images, and that the system mistook extramural compression and an incompletely extended esophageal wall for lesion features.
AI based on endocytoscopy and high-resolution microendoscopy: Endocytoscopy is a new technology that combines magnifying endoscopy with vital staining. Because of its excellent magnifying ability, endoscopists can clearly observe the epithelial cells of the esophageal mucosa to achieve an effect similar to pathological diagnosis[34]. However, if endoscopists want to use endocytoscopy to complete real-time detection independently, they need a solid foundation in pathology, which is obviously not practical. Therefore, AI may be the best option to assist endoscopic diagnosis. Kumagai et al[35] showed that the performance of the system using higher magnification images was better than that using lower magnification images. However, there was no stratified analysis of superficial EC and advanced EC in this study, so the results could not accurately reflect the ability of endocytoscopy combined with AI in the diagnosis of early EC. High-resolution microendoscopy (HRME) can be used to observe esophageal mucosal tissue and cellular morphology. Shin et al[36] tested the ability of different image features to distinguish tumors from non-tumor lesions, and the best features selected had an 84% sensitivity and 95% specificity. However, it took a long time to analyze a single image, and the sensitivity was not ideal. In the future, more high-quality studies are needed to confirm its role in the diagnosis of early EC.
Real-time diagnosis by AI: Quang et al[37] used the same HRME image data as Shin et al[36] for training and validation, and the results showed that the sensitivity was 95% and the specificity was 91%. The system was then applied during endoscopic examination of three patients with suspected EC and achieved 100% accuracy. However, there were some limitations in this study. First, the number of patients and images in real-time diagnosis was too small. Second, the system still used images for diagnosis, not videos for real-time detection. Everson et al[38] and Guo et al[39] established CAD systems based on DL with good diagnostic performance using NBI videos. Unfortunately, these systems have not been applied in real-time diagnosis. Endoscopic detection of early EC by various endoscopic techniques assisted by AI is summarized in Table 1.
Although AI combined with endoscopic diagnosis has made progress, endoscopic diagnosis is still unable to replace the gold standard of pathological diagnosis. However, there is a problem in the pathological diagnosis of early esophageal neoplasia; that is, the accuracy of diagnosing dysplasia is not ideal, with considerable interobserver variability. To solve this problem, Sabo et al[40] established two models to distinguish no dysplasia (ND) from low grade dysplasia (LGD) and LGD from high grade dysplasia (HGD) by extracting image features from hematoxylin and eosin (HE)-stained pathological sections of BE patients. The results showed that the two models performed well in the diagnosis of indistinguishable borderline lesions. Baak et al combined HE-stained pathological section image characteristics with p53/Ki67 immunohistochemical indicators and used pathological specimens of surgical resection[41] and endoscopic biopsy samples[42] of BE for testing. It was found that the system performed well in distinguishing ND from LGD and LGD from HGD; however, the accuracy of distinguishing HGD from intramucosal carcinoma needs to be improved. The performance of the system was better than that of the general pathologist and only slightly inferior to that of the experienced pathologist. In addition, some studies used immunohistochemical indicators only[43] or the characteristics of nuclear DNA structure and organization combined with DNA ploidy analysis[44] to grade BE dysplasia, but the results were not satisfactory. Therefore, HE-stained pathological image features supplemented by relevant immune indicators or other image enhancement techniques may be the main direction of AI-assisted pathological diagnosis in the future.
As mentioned above, the invasiveness of endoscopic diagnosis and the variability of pathological diagnosis, together with recent advances in understanding the pathogenesis of EC and the development of various omics technologies, have made early EC gene diagnosis a hot topic of research. Zhang et al[45] and Yu et al[46] used microRNAs and long noncoding RNAs specifically expressed in patients with ESCC to establish diagnostic models, and the results showed that both could effectively distinguish early from advanced tumors. Xing et al[47] selected characteristic secretory proteins based on the RNA transcriptome data of ESCC patients to establish a diagnostic model with a high sensitivity but unsatisfactory specificity for early ESCC. This may be because the selection of characteristic secretory proteins in this study was based on genes up-regulated in tumor tissues, and the down-regulated genes were not fully utilized. In addition, Shen et al[48] and Zhai et al[49] used microRNA and protein markers specifically expressed in the plasma of ESCC patients to build diagnostic models and found that both could effectively distinguish esophageal squamous dysplasia (ESD) from healthy controls. However, compared with ESD or early ESCC, the AI-assisted gene diagnosis of BE-related dysplasia and early EAC is less studied. The results of Slaby et al[50] showed that characteristic microRNAs in BE tissues could be used to distinguish BE with or without dysplasia. These markers may not be specific for EC; therefore, gene diagnosis can only be used as an auxiliary measure for endoscopic and pathological diagnosis until specific markers for EC gene diagnosis are established.
Table 1 Application of artificial intelligence in endoscopic detection of early esophageal cancer
AI: Artificial intelligence; EC: Esophageal cancer; HD-WLE: High-definition white light endoscopy; SVM: Support vector machine; BE: Barrett's esophagus; CNN: Convolutional neural network; NBI: Narrow band imaging; VLE: Volumetric laser endomicroscopy; CLE: Confocal laser endomicroscopy; GM: Gastric metaplasia; IM: Intestinal metaplasia; ESCC: Esophageal squamous cell carcinoma; IPCL: Intrapapillary capillary loop; EP-SM1: Epithelium-submucosal cancers invading up to 200 μm; HRME: High-resolution microendoscopy.
Considering that only 0.12%-0.43% of BE patients may progress to EAC every year[51-53], it is particularly necessary to establish an effective model to predict the risk of EAC in BE patients. Previous risk stratification was mainly based on the presence of dysplasia, but its effect was not ideal. Critchley-Thorne et al[54] established a predictive model based on the characteristic differences of tissue immunofluorescence markers and histopathological images between patients with BE who developed EAC and those who did not. The results were not satisfactory, with more than 30% of BE patients who developed EAC classified as low-risk. Li et al[55] established a predictive model based on the differences in single nucleotide polymorphisms in the biopsy tissues of BE patients, with good performance for predicting EC. Unlike EAC, ESCC has no recognized precancerous disease, and endoscopic screening for the general population is obviously impractical. Therefore, establishing an effective model to predict which individuals are more likely to develop ESD will be the focus of clinical screening. The model of Etemadi et al[56], based on the epidemiological data of patients with ESD and healthy controls, did not perform well. The predictive model of Moghtadaei et al[57], also based on epidemiological data, performed better than that of Etemadi et al. Unfortunately, there is a lack of prospective follow-up studies to verify the true accuracy of these models.
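To give a sense of scale for the annual progression rate quoted above, the cumulative risk over n years can be approximated as 1 − (1 − p)^n. The sketch below rests on a strong simplifying assumption that is not made in the cited studies, namely that the annual rate p is constant and independent from year to year; the function name and the chosen time horizons are likewise hypothetical.

```python
def cumulative_risk(annual_rate, years):
    """Probability of progression within `years`, assuming a constant and
    independent annual rate (a simplifying assumption for illustration)."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must lie between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_rate) ** years


LOW_RATE = 0.0012   # 0.12% per year, lower bound quoted in the text
HIGH_RATE = 0.0043  # 0.43% per year, upper bound quoted in the text

for years in (5, 10, 20):
    low = cumulative_risk(LOW_RATE, years)
    high = cumulative_risk(HIGH_RATE, years)
    print(f"{years:>2} years: {low:.2%} to {high:.2%}")
```

Even over long horizons the large majority of BE patients would not progress under this assumption, which is why a model that identifies the high-risk minority is of particular value.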
In conclusion, AI is being explored for the endoscopic detection, pathological diagnosis, gene diagnosis, and cancer risk prediction of early EC. It can help endoscopists and pathologists improve the accuracy of diagnosis and assist clinicians in planning treatment and follow-up strategies. The borderline lesions of EC pathology that remain difficult to determine may be the main direction of AI-assisted pathological diagnosis in the future. Gene diagnosis can only be used as an auxiliary measure for endoscopic and pathological diagnosis until specific markers for EC gene diagnosis are established. Higher precision of the CAD system in delineating lesions is necessary to improve the accuracy of diagnosis, and more accurate risk stratification models may benefit the prediction of EC risk.
World Journal of Gastroenterology, 2020, Issue 39