Arushi Kohli, Erik A Holzwanger, Alexander N Levy
Abstract: Inflammatory bowel disease (IBD) is a complex, immune-mediated gastrointestinal disorder with ill-defined etiology, multifaceted diagnostic criteria, and unpredictable treatment response. Innovations in IBD diagnostics, including developments in genomic sequencing and molecular analytics, have generated tremendous interest in leveraging these large data platforms into clinically meaningful tools. Artificial intelligence, through machine learning, facilitates the interpretation of large arrays of data and may provide insight into improving IBD outcomes. While potential applications of machine learning models are vast, further research is needed to generate standardized models that can be adapted to target IBD populations.
Key Words: Artificial intelligence; Machine learning; Automated diagnostics; Colorectal neoplasia screening; Multiomic data; Predictive models
Inflammatory bowel disease (IBD), comprising ulcerative colitis (UC) and Crohn’s disease (CD), is a set of chronic, immunologically mediated diseases of the gut that arise from a complex interaction of host genetics, the environment, and the gut microbiome[1]. Due to its chronic relapsing nature, IBD is associated with considerable morbidity and impairment of quality of life.
Accurate diagnosis of IBD relies on a combination of clinical, endoscopic, histologic, laboratory, and radiographic data. Significant heterogeneity exists in both the quality of these critical diagnostic components and their subsequent interpretation, as they are inherently dependent on confounders, such as the technical skill and experience of the provider. Consequently, diagnostic algorithms and management of IBD can vary considerably amongst gastroenterologists.
Recently, “big data” from large clinical trials, electronic health records, medical imaging, biobanks, and multiomic (genomic, transcriptomic, metabolomic, and proteomic) databases have been increasingly employed in an effort to improve diagnostic accuracy and predictability of treatment response[2]. However, the use of big data in development of predictive models is confounded by high dimensionality of clinical and non-clinical factors[2]. To overcome this challenge, machine learning (ML) has been increasingly utilized to organize and interpret these large datasets in an effort to identify clinically meaningful patterns and translate them into improved patient outcomes[3]. This review highlights the nascent efforts to incorporate artificial intelligence (AI) and machine learning in the field of IBD.
AI is an interdisciplinary branch of computer science focused on the development of machines that are programmed to perform tasks that imitate intelligent human behavior[2]. Machine learning is a subset of AI that utilizes algorithms with the goal of identifying patterns or generating predictive models as a result of “learning” from an input dataset[3]. This is achieved through supervised and unsupervised learning (Figure 1). In supervised learning, an algorithm is trained on a labeled training dataset to recognize patterns associated with specific groups (healthy vs diseased)[2]. The algorithm then uses what it has learned from the training dataset to place unseen data into specific categories. The most commonly employed supervised ML models are random forests, neural networks, and support vector machines[3]. In unsupervised learning, the training dataset is unlabeled, and the algorithm identifies patterns within the data guided by similar characteristics, without knowledge of the associated diagnosis or outcome[4]. Machine learning algorithms can provide a framework to identify previously undiscovered patterns within IBD and advance our understanding of IBD pathogenesis.
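The distinction between the two learning modes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the single "biomarker" value per subject is invented toy data, the nearest-centroid rule stands in for the far richer supervised models named above, and a one-dimensional two-cluster k-means stands in for unsupervised methods in general.

```python
from statistics import mean

# Hypothetical toy data: one made-up biomarker value per subject.
labeled = [(1.0, "healthy"), (1.2, "healthy"), (0.8, "healthy"),
           (3.0, "diseased"), (3.4, "diseased"), (2.9, "diseased")]

def train_nearest_centroid(samples):
    """Supervised: learn one mean value (centroid) per known label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Assign an unseen value to the label with the closest centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(cluster_low), mean(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)

# Supervised: labels are used for training, then an unseen value is classified.
centroids = train_nearest_centroid(labeled)
unseen_prediction = predict(centroids, 2.7)

# Unsupervised: the same values with labels discarded are grouped by similarity.
clusters = two_means([value for value, _ in labeled])
```

In this toy setting the clustering step recovers the same two groups without ever seeing the labels, which is the sense in which unsupervised methods may surface structure that was not specified in advance.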
Endoscopic evaluation with mucosal biopsies remains an integral component of diagnosing IBD, yet it carries several limitations. In addition to being invasive, endoscopic assessment of disease severity remains subjective, with high interobserver variability despite the use of scoring systems such as the endoscopic Mayo score. Furthermore, endoscopy may not be able to adequately distinguish between subtle overlapping features of the various IBD phenotypes, resulting in diagnostic dilemmas. Recent research has focused on integrating ML into the diagnostic paradigm in order to overcome some of these limitations.
Figure 1 Artificial intelligence and machine learning overview.
Genome-wide association studies have identified over 240 gene loci that have been linked to increased risk of developing IBD and can assist in distinguishing CD and UC[5,6]. Matrix factorization-based ML models, using a combination of genome sequence data and biological knowledge, have also been developed to distinguish patients with CD from healthy individuals (AUC = 0.816) without the need for histology[7]. In addition, ML-assisted metagenomic, proteomic, and microbial prediction models have been utilized to identify predictive signatures distinguishing CD and UC, allowing for improved characterization of the IBD subtypes and risk stratification[8-10]. For example, Seeley et al[11] were able to discern between CD and UC with 76.9% accuracy using a histology-based, mass spectrometry-trained support vector machine learning model that analyzed protein signatures from colonic tissue. Another study used a feature selection algorithm combined with a support vector machine program to differentiate UC patients from healthy subjects based on the expression of 32 genes in colon tissue samples[12].
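For readers less familiar with the AUC (area under the receiver operating characteristic curve) quoted above, the following is a minimal sketch of one standard way to read it: the probability that a randomly chosen diseased subject receives a higher model score than a randomly chosen healthy subject. The function name `auc` and all scores are hypothetical and are not taken from any of the cited studies.

```python
def auc(scores_positive, scores_negative):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical model scores (higher = more likely CD); not data from any study.
cd_scores = [0.9, 0.8, 0.6, 0.4]
healthy_scores = [0.7, 0.5, 0.3, 0.2]

example_auc = auc(cd_scores, healthy_scores)  # 13 of 16 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value near 0.8 indicates useful but imperfect discrimination.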
Similarly, the inherent subjectivity of endoscopic and radiographic assessment has led to a great interest in automating image interpretation. ML assisted analysis of computed tomography and magnetic resonance imaging has been shown to effectively identify structural bowel damage, such as stricturing disease in CD[13-15]. Image analyzing programs have also been adapted to improve the efficiency of previously time consuming manual review of video capsule endoscopy images[16].
AI has also been utilized to enhance interpretation of endoscopic images to assess disease severity[17-19].
For example, a convolutional neural network (CNN) was able to distinguish remission (Mayo 0-1) from moderate-to-severe disease (Mayo 2-3) with a sensitivity of 83.0% and specificity of 96.0%[18]. Another CNN model was able to detect severe CD ulcerations with high accuracy, 0.91 for grade 1 vs grade 3 ulcers[20]. Computer-aided diagnosis systems have also been shown to reliably predict persistent histologic inflammation during endocytoscopy with a sensitivity of 74%, specificity of 97%, and accuracy of 91%[21].
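The performance measures quoted above are all derived from the same four counts of a confusion matrix. As a minimal sketch, the counts below are invented solely so that the arithmetic yields a sensitivity of 83% and a specificity of 96%; they are not the case counts of any cited study, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # diseased cases correctly flagged
    specificity = tn / (tn + fp)   # non-diseased cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # all correct calls
    return sensitivity, specificity, accuracy

# Hypothetical counts: 100 diseased and 100 non-diseased image sets.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=83, fn=17, tn=96, fp=4)
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so the same model can show a different accuracy in populations with a different proportion of diseased cases.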
Clearly, computer assisted imaging assessment has the potential to improve how we interpret diagnostic imaging to assess disease activity. As a result, it is expected that use of AI assisted imaging will continue to expand as the technology evolves.
Despite numerous pharmacologic advances over the past decade, clinicians are not yet able to predict treatment response in IBD. The prevailing trial and error approach has resulted in substantial variation in response rates to IBD therapy. This inefficiency, in combination with the significant pharmacoeconomic impact of failed therapies, has led to a growing interest in developing personalized approaches to IBD management.
To this end, machine learning algorithms have been developed to analyze predictive indicators of response for several IBD medications. One study was able to demonstrate that an ML algorithm could outperform conventional thiopurine metabolite testing to predict response[22]. Subsequent studies by Waljee et al[23,24] incorporated clinical trial data from the GEMINI I and GEMINI II studies with vedolizumab, and demonstrated that ML models could also be used to predict steroid-free remission for UC and CD patients. In another study, an ML model used molecular and clinical data to identify biomarkers predictive of response to infliximab in refractory UC (accuracy = 70%). The authors identified tumor necrosis factor, interferon gamma, and lipopolysaccharide as potential regulators of infliximab response[25]. More recently, multi-omic factor analysis was used to identify transcriptomic and genomic biomarker panels that were predictive of ustekinumab response[26].
As a result of chronic inflammation, IBD patients are at increased risk for dysplasia and colorectal cancer. Patients with extensive colitis have up to a 19-fold increase in colorectal cancer risk when compared to the general population[27]. Despite the introduction of high-definition endoscopes and dye-based chromoendoscopy, the morbidity and mortality ascribed to IBD neoplasia have led to great interest in integrating AI-assisted detection systems into traditional colonoscopy. Multiple AI algorithms have been developed to alert the endoscopist to polyps in real time, using visual or auditory cues during colonoscopy[28]. CNNs have been successfully used to improve adenoma detection in the general population, even for more experienced endoscopists[29]. The CNN model, when compared to expert review of machine-overlaid videos, had a sensitivity and specificity of 0.98 and 0.93, respectively[30]. Another computer-aided diagnosis system detected polyps by evaluating polyp boundaries and generating energy maps that corresponded to the presence of a polyp[31]. Machine learning may also aid in differentiation of colitis-associated neoplasia, sporadic colorectal adenomas, and non-neoplastic lesions[32]. Artificial neural networks, when applied to complementary deoxyribonucleic acid microarray data, have the potential to discriminate the subtle differences between polyp subtypes[32]. This may have longstanding effects on decreasing colorectal malignancy and may also guide surveillance in this population.
While AI has the ability to identify high-risk subgroups and inform treatment decisions, there remain several obstacles to its routine implementation in clinical practice. Analysis of big datasets has generated several interesting disease observations, but these have not necessarily translated into clinically meaningful benefits. The cross-sectional nature of ML datasets, lack of validated AI models, and paucity of biologic explanations for proposed associations make it difficult to establish causation or adhere to AI algorithm-generated decision recommendations. Additionally, the datasets that machine learning systems depend upon can be incomplete or of poor quality, which can result in systematic errors or bias[2].
The highly sensitive nature of clinical data makes it logistically difficult to share freely amongst organizations, an obstacle potentially overcome by universal electronic medical records. There is also a need for adherence to unified data formats, as well as development of secure cloud storage facilities for easy extraction of large volumes of data[33]. Furthermore, a variable proportion of clinical data is in the form of written notes, making data collection for input into mathematical ML models difficult. This can be overcome by development of natural language processing software to extract data from plain text[34]. Other challenges include the high dimensionality of clinical data, over-fitting of ML models, data security issues, and the uncertain generalizability of models to the target population[2].
Overcoming the obstacles to machine learning in IBD will require collaborative efforts between clinicians, statisticians, and bioinformaticians to develop algorithms capable of generating clinically meaningful outputs. Prospective randomized trials are needed to confirm the efficacy and safety of AI assisted decision making before it can truly be translated to the bedside.
In summary, AI is a rapidly growing discipline that has the potential to revolutionize the field of inflammatory bowel disease. Machine learning approaches offer the ability to effectively synthesize and incorporate large amounts of data to improve diagnostic accuracy, uncover new disease associations, identify at-risk individuals, and guide therapeutic decision making. While challenges to the routine use of AI in IBD remain, continued exploration of possible applications is expected to accelerate the drive toward precision medicine.
World Journal of Gastroenterology, 2020, Issue 44