Hyun-Jong Jang, Ahwon Lee, J Kang, In Hye Song, Sung Hak Lee
Abstract
Key Words: Colorectal cancer; Mutation; Deep learning; Computational pathology; Computer-aided diagnosis; Digital pathology
Identifying genetic mutations in cancer patients has been increasingly important because mutational status can be very informative to determine the optimal therapeutic strategy[1]. However, molecular analysis is not performed routinely in every cancer patient, since it is not time and cost effective[2]. Thus, cost-effective alternatives for current molecular tests can be helpful in making appropriate treatment decisions. It has long been recognized that the histologic phenotypes reflect the genetic alterations in cancer tissues[3]. Since hematoxylin and eosin (H&E)-stained tissue slides are produced for almost every cancer patient, mutation prediction from the tissue slides can be a time- and cost-effective alternative method for individualized treatment. Thus, researchers attempted to examine the genotype–phenotype relationship in the H&E-stained tissue slides, and some gross tissue patterns related to specific molecular aberrations have been reported[4-9]. However, it remains largely unknown how specific molecular abnormalities are related to the specific histomorphologic findings, as it is not easy to capture the subtle features underlying the specific molecular alterations with the naked eye. To overcome the limitation of visual inspection of tissue structures by pathologists, various image analysis techniques have been applied for many decades to detect the subvisual characteristics of tissue patterns, not discernible to the unaided eyes[1]. Particularly, deep learning has been successfully applied to perform tasks considered too challenging for conventional image analysis techniques because it learns discriminative features directly from the large training dataset for any given task[10]. Therefore, deep learning is increasingly applied for tissue analysis tasks[11]. 
With the approval to use the digitized whole-slide images (WSIs) for diagnostic purposes, the digitization of tissue slides has been explosively increasing, providing huge digitized tissue data[12]. Combining the routine digitization of tissue slides with deep learning, the computer-aided analysis of WSIs could be adopted to support the evaluation of molecular alterations in H&E-stained cancer tissues in the near future. Although deep learning-based tissue analysis is still in its early phase, a few promising results have been published. For example, a recent study reported that deep learning-based molecular cancer subtyping can be performed directly from the standard H&E sections obtained from patients with colorectal cancers (CRCs)[13]. Microsatellite instability can also be predicted from the tissue slides[14]. Furthermore, positive results for the mutation prediction of specific genes from histopathologic images have been reported in patients with various cancer types[3,15-17].
Motivated by these recent studies, we tried to predict the frequently occurring and clinically meaningful mutations from the H&E-stained CRC tissue WSIs with deep learning-based classifiers. Based on the frequency of mutation and prognostic values of the genes, we chose the APC, KRAS, PIK3CA, SMAD4, and TP53 genes for the current study. The areas under the curve (AUCs) of the receiver operating characteristic (ROC) curves ranged from 0.645 to 0.809 for The Cancer Genome Atlas (TCGA) datasets, showing the potential for deep learning-based mutation prediction in the CRC tissue slides. Combining two different datasets for training further enhanced the prediction performance, suggesting that performance can improve as the datasets expand.
The TCGA program offers the opportunity to reveal the genotype-phenotype relationship because it provides extensive archives of digital pathology slides with multi-omics test results[18]. Both frozen section tissue slides and formalin-fixed paraffin-embedded (FFPE) diagnostic slides were provided by the program. The WSIs from the TCGA-COAD (colon cancer) and TCGA-READ (rectal cancer) projects were combined in this study because colonic and rectal adenocarcinomas share similar molecular and histological features[18]. After removing the WSIs with poor quality, 629 patients were included in the present study. We included genetic alterations comprising frameshift insertions and deletions, missense mutations, and nonsense mutations. For the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, 436, 249, 133, 74, and 340 patients were confirmed to have the mutations, respectively. Deep learning does not perform optimally when there is a huge imbalance between classes[19]. In a previous study, we failed to obtain balanced performance in tissue classification tasks unless the dataset itself was forced to have similar numbers between the classes[20]. Thus, we limited the difference in patient numbers between the mutation group and the wild-type group to less than 1.4-fold through random sampling. To meet this limit, we selected 263 patients with APC mutation because there were only 188 patients with the wild-type APC gene in the cohorts. The final patient IDs with their respective mutations are listed in Supplementary Table 1.
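The class-balancing step described above can be illustrated with a minimal sketch. The function name, the seed, and the use of simple uniform random sampling are assumptions made for illustration; the source states only that the larger group was randomly sampled so that the group sizes differ by less than 1.4-fold.

```python
import math
import random

def balance_by_downsampling(mutated_ids, wildtype_ids, max_ratio=1.4, seed=0):
    """Randomly downsample the larger group so that its size is at most
    max_ratio times the size of the smaller group (hypothetical helper)."""
    rng = random.Random(seed)
    mutated, wildtype = list(mutated_ids), list(wildtype_ids)
    if len(mutated) > len(wildtype):
        cap = math.floor(max_ratio * len(wildtype))
        if len(mutated) > cap:
            mutated = rng.sample(mutated, cap)
    else:
        cap = math.floor(max_ratio * len(mutated))
        if len(wildtype) > cap:
            wildtype = rng.sample(wildtype, cap)
    return mutated, wildtype

# APC example with the counts reported in the text (IDs are placeholders).
apc_mut = [f"M{i}" for i in range(436)]
apc_wt = [f"W{i}" for i in range(188)]
kept_mut, kept_wt = balance_by_downsampling(apc_mut, apc_wt)
print(len(kept_mut), len(kept_wt))  # 263 188
```

With 188 wild-type patients, the cap is floor(1.4 × 188) = 263, which matches the number of APC-mutated patients retained in the study.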
Various artifacts including air bubbles, compression artifacts, out-of-focus blur, pen markings, tissue folding, and white background are unavoidable in the WSIs. To make the prediction process fully automated, these artifacts should be automatically removed. Because it is impractical to analyze a WSI as a whole, small image patches are often sliced from a WSI and used for the analysis. Thus, we built a deep learning-based tissue/non-tissue classifier for 360 × 360 pixel image patches at 20× magnification to remove all of these artifacts at once (Figure 1A). The classifier was a simple convolutional neural network (CNN) with 12 (5 × 5), 24 (5 × 5), and 24 (5 × 5) convolutional filters, each followed by a (2 × 2) max pooling layer. The tissue/non-tissue classifier could filter out more than 99.9% of improper patches. Next, tumor tissues should be delineated to predict the mutational status of cancer cells. Because of the freezing process for frozen tissue preparation, the frozen and FFPE tissue WSIs can differ in their morphologic features. Thus, we built separate normal/tumor classifiers for the frozen and FFPE WSIs based on the 360 × 360 pixel tissue image patches using the Inception-v3 model, a widely used CNN architecture. To train the wild-type/mutation classifiers for each gene, frozen and FFPE tissue patches with a tumor probability higher than 0.9 by each tumor classifier were collected (Figure 1B). The tumor probability threshold of 0.9 was chosen arbitrarily because we decided to include only tissues with prominent tumor features. Although each slide may contain mixed regions of wild-type and mutated tissues owing to tumor heterogeneity, we assigned the same label to all tumor tissue patches in a WSI based on the mutational status of the patient. This labeling strategy was inevitable since we had no method to delineate the wild-type and mutated regions before the classifiers could be built.
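As an illustration of how a WSI might be sliced into 360 × 360 pixel patches, the following minimal sketch enumerates patch coordinates on a regular grid. The non-overlapping layout and the dropping of incomplete edge tiles are assumptions; the source does not specify the stride or the edge handling, and `patch_grid` is a hypothetical helper.

```python
def patch_grid(width, height, patch=360):
    """Return the (x, y) top-left corners of non-overlapping patch x patch
    tiles that lie fully inside an image of the given size.
    Assumption: stride equals patch size and partial edge tiles are dropped."""
    return [
        (x, y)
        for y in range(0, height - patch + 1, patch)
        for x in range(0, width - patch + 1, patch)
    ]

# A 1000 x 720 pixel region yields a 2 x 2 grid of full tiles.
print(patch_grid(1000, 720))  # [(0, 0), (360, 0), (0, 360), (360, 360)]
```

Each coordinate would then be read from the slide at 20× magnification and passed to the tissue/non-tissue classifier.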
The classifiers for the five genes were separately trained and validated with a patient-level ten-fold cross-validation scheme for frozen and FFPE WSIs. The slide-level mutation probability was calculated as the average of the probabilities of all the tumor patches in the WSI. For the training of the Inception-v3 models, we used a mini-batch size of 128, and cross-entropy was adopted as the loss function. Deep neural networks were implemented using the TensorFlow deep learning library (http://tensorflow.org). To minimize overfitting, data augmentation techniques, including random rotations by 90°, random horizontal/vertical flipping, and random perturbation of the contrast and brightness, were applied to the tissue patches during training. In addition, 10% of the training slides were used as a validation dataset for the early stopping of the training. At least five separate classifiers were trained for each gene and tissue modality, and the classifier with the best AUC on the test dataset was included in the results.
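A patient-level split means that all slides from one patient fall into the same fold, so that no patient contributes to both training and testing. The sketch below shows one way to do this; the round-robin assignment, the seed, and the `slide_to_patient` mapping are assumptions for illustration and are not taken from the source.

```python
import random
from collections import defaultdict

def patient_level_folds(slide_to_patient, n_folds=10, seed=0):
    """Assign slides to folds so that all slides of a patient share one fold.
    slide_to_patient maps a slide ID to its patient ID (hypothetical input)."""
    patients = sorted(set(slide_to_patient.values()))
    random.Random(seed).shuffle(patients)
    # Round-robin assignment of shuffled patients to folds.
    patient_fold = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for slide, patient in slide_to_patient.items():
        folds[patient_fold[patient]].append(slide)
    return [sorted(folds[i]) for i in range(n_folds)]

# Toy example: 20 patients with two slides each.
slides = {f"S{i}": f"P{i // 2}" for i in range(40)}
for i, fold in enumerate(patient_level_folds(slides)):
    print(i, fold)
```

For fold i, the slides in that fold would serve as the test set and the remaining nine folds as training data, from which 10% of slides could be held out for early stopping.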
Figure 1 Fully automated prediction of mutation with three consecutive classifiers. A: Proper tissue patches can be selected by the tissue/non-tissue classifier. The four insets in the middle panel demonstrated the tissue patches representing pen marking, blurry scanned area, background rich region, and tissue folding, clockwise from top left, all removed by the tissue/non-tissue classifiers. Then, the normal/tumor classifier delineates the tumor patches among the proper tissue patches; B: The wild-type/mutation classifiers are applied only for patches with tumor probability higher than 0.9. The patch-level probabilities of mutation are averaged to yield the slide-level probability.
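The aggregation in panel B of Figure 1 can be written as a short sketch: patches with a tumor probability above 0.9 are kept, and their mutation probabilities are averaged to give the slide-level probability. The input format and the handling of slides without any qualifying patch (returning `None`) are assumptions.

```python
def slide_mutation_probability(patches, tumor_threshold=0.9):
    """Average the mutation probability over patches whose tumor probability
    exceeds tumor_threshold.
    patches: iterable of (tumor_prob, mutation_prob) pairs (assumed format).
    Returns None when no patch passes the tumor filter (an assumption)."""
    selected = [mut for tumor, mut in patches if tumor > tumor_threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Two of three patches pass the tumor filter; their mean is about 0.6.
example = [(0.95, 0.8), (0.99, 0.4), (0.50, 0.1)]
print(slide_mutation_probability(example))
```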
Patient cohort: A total of 142 patients with CRC who previously underwent surgical resection at Seoul St. Mary’s Hospital between 2017 and 2019 were enrolled (SMH dataset). All cases were sporadic, without any familial history of CRCs. The clinicopathological parameters including age, sex, and tumor location were retrospectively reviewed from the medical records. The study was approved by the Institutional Review Board of the College of Medicine at the Catholic University of Korea, No. KC19SESI0787.
Mutation prediction on SMH dataset: For the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, 66, 75, 31, 23, and 98 patients were confirmed to have the mutations, respectively. The sequencing methods are described in Supplementary Methods. Because the SMH dataset was originally collected to externally validate the model trained on the TCGA datasets, we did not adjust the patient numbers between the classes. The normal/tumor classifier for TCGA FFPE tissues was also used to discriminate the tumor tissue patches of SMH WSIs. The normal/tumor classification accuracy was reviewed by Lee SH and Song IH and was confirmed to be valid. Again, patches with a tumor probability higher than 0.9 were collected for mutational status classification. Then, the SMH data were split into ten folds, and each SMH training fold was mixed with the TCGA training fold to build new classifiers trained on both datasets. The classification results of the new classifiers on the TCGA or SMH datasets were compared with those of the TCGA-based classifiers to investigate the effects of the expanded training dataset.
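The mixing of the two datasets can be sketched as follows. This assumes that fold i of the SMH data is paired with fold i of the TCGA data, which the source does not state explicitly; `combined_cv_splits` and the dictionary keys are hypothetical names.

```python
def combined_cv_splits(tcga_folds, smh_folds):
    """Build cross-validation splits that train on both datasets.
    For split i, fold i of each dataset is held out and the remaining folds
    of both datasets are merged into one training set (assumed pairing)."""
    if len(tcga_folds) != len(smh_folds):
        raise ValueError("both datasets must have the same number of folds")
    splits = []
    for i in range(len(tcga_folds)):
        train = [s for j, fold in enumerate(tcga_folds) if j != i for s in fold]
        train += [s for j, fold in enumerate(smh_folds) if j != i for s in fold]
        splits.append({
            "train": train,
            "test_tcga": list(tcga_folds[i]),
            "test_smh": list(smh_folds[i]),
        })
    return splits

# Toy example with one slide per fold in each dataset.
tcga = [[f"T{i}"] for i in range(10)]
smh = [[f"K{i}"] for i in range(10)]
first = combined_cv_splits(tcga, smh)[0]
print(len(first["train"]), first["test_tcga"], first["test_smh"])  # 18 ['T0'] ['K0']
```

Keeping the held-out folds of the two datasets separate makes it possible to report performance on TCGA and SMH slides independently, as done in the comparison with the TCGA-only classifiers.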
The ROC curves and their AUCs for all classifiers were presented to demonstrate the performance of each classifier. We used a permutation test with 1000 iterations to compare the differences between the two paired or unpaired ROC curves when necessary[21]. A P value of < 0.05 was considered significant.
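To make the AUC and the idea of a permutation test concrete, the sketch below computes the AUC from pairwise comparisons and estimates a P value against chance performance by shuffling the labels 1000 times. This is a simplified label-permutation test written for illustration; it is not Venkatraman’s test for comparing two ROC curves, and the function names and seed are assumptions.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_iter=1000, seed=0):
    """One-sided P value for AUC above chance, obtained by shuffling labels.
    Simplified illustration, not the test used in the study."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(labels, scores))  # 8 of 9 positive-negative pairs are ordered correctly
```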
This study aimed to investigate the feasibility of mutation prediction for the frequently occurring mutations in the CRC tissue WSIs. Since only tumor tissues would be meaningful for the prediction of the mutational status in the tissue slides, three different tissue patch classifiers were sequentially applied to discriminate between tissue/non-tissue, normal/tumor, and wild-type/mutation in order (Figure 1). Only proper tissue patches with high tumor probabilities were used to determine the mutational status (Figure 1B). Patient-level ten-fold cross validation was applied for both frozen and FFPE datasets to fully evaluate the properties of the TCGA CRC WSIs.
In Figures 2 to 6, the classification results for the APC, KRAS, PIK3CA, SMAD4, and TP53 genes are presented for both frozen (upper panels) and FFPE (lower panels) TCGA WSIs. In panels A and C of every figure, representative binary heatmaps demonstrating the distribution of tissue patches classified as wild-type or mutation are presented. From left to right, WSIs with gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation are presented, as determined with the probability threshold set to 0.5. The sensitivity and specificity of a classifier can be much improved by setting the threshold appropriately. However, we set the threshold to 0.5 in the figures for simplicity because the classifiers for different folds had different optimal thresholds. To demonstrate the differences in performance between folds, slide-level ROC curves for the folds with the lowest and highest AUCs are presented (left and middle ROC curves in the figures). Finally, the overall performance was inferred from the slide-level ROC curves drawn for the concatenated results from all ten folds (right ROC curves). For the APC gene (Figure 2), the AUCs per fold ranged from 0.648 to 0.819 for the frozen tissues and from 0.655 to 0.880 for the FFPE tissues. The concatenated AUCs were 0.771 and 0.742 for the frozen and FFPE tissues, respectively. For the KRAS gene (Figure 3), the performance was much better for the frozen tissues than for the FFPE tissues, with a per-fold AUC for the frozen tissues of 0.675-0.937 and a concatenated AUC of 0.778. For the FFPE tissues, the concatenated AUC was only 0.645, while the per-fold AUCs ranged from 0.594 to 0.736. With regard to the PIK3CA gene (Figure 4), the lowest and highest AUCs per fold were 0.669 and 0.775 for the frozen tissues and 0.597 and 0.857 for the FFPE tissues. The concatenated AUCs were 0.713 and 0.690, respectively. For the SMAD4 gene (Figure 5), the AUCs per fold ranged from 0.619 to 0.849 for the frozen tissues and from 0.587 to 0.926 for the FFPE tissues, while the concatenated AUCs were 0.693 and 0.763, respectively. With regard to the TP53 gene (Figure 6), the lowest and highest AUCs per fold were 0.707 and 0.963 for the frozen tissues and 0.737 and 0.805 for the FFPE tissues. The concatenated AUCs were 0.809 and 0.783, respectively. Overall, the wild-type/mutation classifiers for the TP53 gene yielded the highest AUCs for both frozen and FFPE tissues of the TCGA datasets. Between the ROC curves of the frozen and FFPE tissues, classifiers for the frozen tissues yielded better results for the APC and KRAS genes (P < 0.05, P < 0.001, P = 0.068, P = 0.057, and P = 0.115 between the frozen and FFPE classifiers for the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, respectively, by Venkatraman’s permutation test for unpaired ROC curves).
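The effect of the decision threshold on sensitivity and specificity, mentioned above in connection with the fixed threshold of 0.5, can be illustrated with a small sketch. Selecting the threshold by Youden’s J statistic (sensitivity + specificity - 1) is an assumption made for illustration; the source does not state how an optimal threshold would be chosen, and the toy data are invented.

```python
def sensitivity_specificity(labels, probs, threshold=0.5):
    """Slide-level sensitivity and specificity when slides with a
    probability >= threshold are called mutated (label 1 = mutated)."""
    tp = sum(1 for l, p in zip(labels, probs) if l == 1 and p >= threshold)
    fn = sum(1 for l, p in zip(labels, probs) if l == 1 and p < threshold)
    tn = sum(1 for l, p in zip(labels, probs) if l == 0 and p < threshold)
    fp = sum(1 for l, p in zip(labels, probs) if l == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def youden_threshold(labels, probs):
    """Observed probability that maximizes sensitivity + specificity
    (equivalent to maximizing Youden's J); lowest value wins ties."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: sum(sensitivity_specificity(labels, probs, t)))

# Invented slide-level probabilities for three mutated and three wild-type slides.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.4, 0.3, 0.2]
print(sensitivity_specificity(labels, probs))  # one mutated slide is missed at 0.5
best = youden_threshold(labels, probs)
print(best, sensitivity_specificity(labels, probs, best))  # 0.45 (1.0, 1.0)
```

In practice such a threshold would have to be chosen on training or validation data for each fold, which is why a single fixed threshold is simpler for presentation.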
The generalizability of a deep learning model to an external dataset is an important issue to be validated. Thus, we collected our own CRC FFPE WSIs with information on genetic mutation. The normal/tumor classifier for the TCGA FFPE tissues was applied to collect the tissue patches with high tumor probabilities. Then, the mutation classifiers for each gene trained on the TCGA FFPE tissues were applied to the tumor patches. The slide-level ROC curves for the five genes are presented in Supplementary Figure 1. The AUCs were 0.654, 0.581, 0.570, 0.652, and 0.775 for the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, respectively. For the APC, KRAS, and PIK3CA genes, the performance of the TCGA-based mutation classifiers on the SMH dataset was worse than that on the TCGA dataset (P < 0.01, P < 0.05, P < 0.05, P = 0.107, and P = 0.263 for the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, respectively, by Venkatraman’s permutation test for unpaired ROC curves). These results indicated that the mutation classifiers did not generalize well when they were trained only with the TCGA WSI datasets. It remained unclear whether the performance could be improved when more data are used for the training. Thus, we combined the TCGA and SMH datasets to train new sets of mutation classifiers. Patient-level ten-fold cross-validation schemes were also used for the mixed dataset. The performance on the SMH dataset showed an obvious improvement, since the SMH data were included in the training data in this setting. The AUCs for the APC and KRAS genes increased to 0.812 and 0.832 (Figure 7, P < 0.01 and P < 0.001 compared with the TCGA-trained classifiers by Venkatraman’s permutation test for paired ROC curves). Improved results were also obtained for PIK3CA, SMAD4, and TP53, with AUCs of 0.769, 0.782, and 0.845, respectively (Figure 8, P < 0.05, P < 0.01, and P < 0.05 by Venkatraman’s permutation test for paired ROC curves). More importantly, the performance on the TCGA data was also generally improved by the classifiers trained on both datasets (Supplementary Figure 2). The AUCs were 0.766, 0.694, 0.708, 0.791, and 0.822 for the APC, KRAS, PIK3CA, SMAD4, and TP53 genes, respectively (P = 0.072, P < 0.01, P = 0.091, P = 0.074, and P < 0.05 compared with the TCGA-trained classifiers). These results indicated that the deep learning-based classifiers for mutation prediction in tissue slides can yield better performance when more data are collected from various sources.
Figure 2 Classifiers to predict APC gene mutation for the Cancer Genome Atlas colorectal cancer tissue slides. A: Representative whole slide images (WSIs) of the frozen slides with APC gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right, obtained with the classifiers trained with the frozen tissues; C and D: Same as A and B, but the results were for the formalin-fixed paraffin-embedded WSIs. APC-M: APC mutated; APC-W: APC wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
Figure 3 Classifiers to predict KRAS gene mutation for the Cancer Genome Atlas colorectal cancer tissue slides. A: Representative whole slide images (WSIs) of the frozen slides with KRAS gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right, obtained with the classifiers trained with the frozen tissues; C and D: Same as A and B, but the results were for the formalin-fixed paraffin-embedded WSIs. KRAS-M: KRAS mutated; KRAS-W: KRAS wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
Figure 4 Classifiers to predict PIK3CA gene mutation for the Cancer Genome Atlas colorectal cancer tissue slides. A: Representative whole slide images (WSIs) of the frozen slides with PIK3CA gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right, obtained with the classifiers trained with the frozen tissues; C and D: Same as A and B, but the results were for the formalin-fixed paraffin-embedded WSIs. PIK3CA-M: PIK3CA mutated; PIK3CA-W: PIK3CA wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
In the present study, we selected the APC, KRAS, PIK3CA, SMAD4, and TP53 genes because mutations in these genes occurred frequently in both the TCGA and SMH CRC datasets and had prognostic value. APC is an important tumor suppressor known to play a role in CRC development. Inactivation of APC leads to the constitutive activation of the Wnt signaling pathway, which may contribute to tumor progression[22]. The frequency of APC mutations was 47% for the SMH dataset, which is a slightly higher mutational rate compared with that in previous studies (24.2%-44.8%). The RAS proto-oncogenes (HRAS, KRAS, and NRAS) play a pivotal role in numerous basic cellular functions, such as control of cell growth, differentiation, and apoptosis, and regulate key signaling cascades including the phosphoinositide 3-kinase (PI3K) and mitogen-activated protein kinase (MAPK) pathways[23,24]. Mutations in RAS family members are found in 20% of all human cancers, of which KRAS mutations account for 85%[25]. KRAS is mutated in 30% to 50% of patients with CRCs[25]. In the SMH dataset, the frequency was 53%. KRAS is a critical oncogene involved in the MAPK signaling pathway, and KRAS mutations promote colorectal adenoma growth in the early phase of carcinogenesis[26]. The presence of activating KRAS and NRAS mutations is a predictor of resistance to epidermal growth factor receptor (EGFR) inhibitors, such as cetuximab or panitumumab[27,28]. The PIK3CA gene is responsible for coordinating various cellular processes, including proliferation, migration, and survival. The PIK3CA mutation is associated with the activation of downstream PI3K/Akt signaling, which in turn deregulates other signaling pathways that contribute to oncogenic transformations[29]. The PIK3CA mutation occurs in 10%-30% of patients with CRCs[30]. In the present study, the frequency of the PIK3CA mutation was observed to be 22%. Recent studies have shown that PIK3CA mutations are associated with a worse clinical outcome and with a negative prediction for anti-EGFR targeted therapy[31]. SMAD4 is an essential intermediator in the TGFβ signaling pathway, exhibiting a pivotal role as a tumor suppressor gene in CRC[32]. SMAD4 mutations occur in 10%-20% of patients with CRC[32,33]. In the SMH dataset, the rate of the SMAD4 mutation was 16%. Recent studies have demonstrated that somatic SMAD4 mutations are more common in patients with advanced stages, and a decrease in the level of SMAD4 expression is associated with worse recurrence-free and overall survival in patients with CRC[32]. The tumor suppressor gene TP53 regulates DNA repair mechanisms and apoptosis. Loss of TP53 function is one of the major events in the development of CRC, which is thought to occur in the later stages of colon cancer progression[34]. The TP53 mutation rate in the SMH dataset was 69%, which is consistent with the frequencies reported in various studies (45%-84%)[35].
Figure 5 Classifiers to predict SMAD4 gene mutation for the Cancer Genome Atlas colorectal cancer tissue slides. A: Representative whole slide images (WSIs) of the frozen slides with SMAD4 gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right, obtained with the classifiers trained with the frozen tissues; C and D: Same as A and B, but the results were for the formalin-fixed paraffin-embedded WSIs. SMAD4-M: SMAD4 mutated; SMAD4-W: SMAD4 wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
Figure 6 Classifiers to predict TP53 gene mutation for the Cancer Genome Atlas colorectal cancer tissue slides. A: Representative whole slide images (WSIs) of the frozen slides with TP53 gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right, obtained with the classifiers trained with the frozen tissues; C and D: Same as A and B, but the results were for the formalin-fixed paraffin-embedded WSIs. TP53-M: TP53 mutated; TP53-W: TP53 wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
Figure 7 Mutation prediction of APC and KRAS genes for the Seoul St. Mary Hospital colorectal cancer tissue slides by the classifiers trained with both The Cancer Genome Atlas and Seoul St. Mary Hospital data. A: Representative whole slide images of the slides with APC gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right; C and D: Same as A and B, but the results were for the KRAS gene. SMH: Seoul St. Mary Hospital; APC-M: APC mutated; APC-W: APC wild-type; KRAS-M: KRAS mutated; KRAS-W: KRAS wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
In general, the APC mutation is thought to have no prognostic significance[36]. However, in a specific situation such as microsatellite stable proximal colon cancer, wild-type APC has been associated with poorer survival[37]. In contrast, KRAS, PIK3CA, SMAD4, and TP53 gene mutations were associated with poorer prognosis in CRCs[34,38-40]. Thus, information on the mutational status of these genes can be useful in making therapeutic decisions for CRC patients. On occasion, a specific gene mutation can be related to a specific visual characteristic in tissue histology. For example, the PIK3CA mutation often coincides with lymphovascular invasion, tumor budding, and a high number of poorly differentiated clusters in CRC tissues[39]. However, it is not always possible to discover the visually discernible features reflecting the mutation of a specific gene. Therefore, we adopted deep learning to predict the mutational status of the five genes because the discriminative features of the mutations can be automatically learned directly from the large training data of tissue images. To our knowledge, this is the first study to evaluate the mutation prediction capabilities of deep learning models for the frequently occurring mutations in the pathologic tissue slides of CRC patients.
Figure 8 Mutation prediction of PIK3CA, SMAD4, and TP53 genes for the Seoul St. Mary Hospital colorectal cancer tissue slides by the classifiers trained with both The Cancer Genome Atlas and Seoul St. Mary Hospital data. A: Representative whole slide images of the slides with PIK3CA gene mutation correctly classified as mutation, with wild-type gene correctly classified as wild-type, with gene mutation falsely classified as wild-type, and with wild-type gene falsely classified as mutation, from left to right; B: Receiver operating characteristic curves for the fold with lowest area under the curve (AUC), for the fold with highest AUC, and for the concatenated results of all ten folds, from left to right; C and D: Same as A and B, but the results were for the SMAD4 gene; E and F: Same as A and B, but the results were for the TP53 gene. PIK3CA-M: PIK3CA mutated; PIK3CA-W: PIK3CA wild-type; SMAD4-M: SMAD4 mutated; SMAD4-W: SMAD4 wild-type; TP53-M: TP53 mutated; TP53-W: TP53 wild-type; AUC: Area under the curve; FFPE: Formalin-fixed paraffin-embedded.
For all the mutation classifiers applied to the TCGA frozen and FFPE tissues, the slide-level discrimination was significantly better than chance (P < 0.001 for all five genes by permutation test). These results indicated that the Inception-v3 model learned valid features to discriminate the mutated tissue phenotypes of each gene. In the case of the APC and KRAS genes, the classifiers for the frozen tissues yielded better results compared with the FFPE tissues, although the frozen sections generally showed poorer tissue quality than did the FFPE sections. This can be explained by the fact that the frozen sections provided the best representation of the tissue contents on which the genomic signatures were tested[18]. Since the FFPE sections can be taken far from the frozen tissue sections, the mutational status can differ between them, considering the heterogeneity of large tumors. When we validated the classifiers trained with the TCGA FFPE tissues on the SMH WSIs, the performance was generally poorer (Supplementary Figure 1). Deep learning operates well under the condition that both the training and test datasets come from the same distribution[41]. For the H&E-stained tissue slides, the quality may vary because they undergo multiple preparation processes including formalin fixation, paraffin embedding, sectioning, and staining, which can be slightly different between institutes[42]. Furthermore, the ethnic difference between the TCGA and SMH datasets may also contribute to the difference in the performance. Although the difference can be negligible to the human eye, deep learning can be very sensitive to subtle differences in tissue conditions. Therefore, many researchers have emphasized the necessity of using large multi-national and multi-institutional datasets to enhance the generalizability of deep learning models[2,12]. Thus, we combined the two datasets to build new classifiers trained on both the TCGA and SMH datasets.
Naturally, the performance for the SMH data was greatly enhanced because the tissue features of the data were exposed to the classifiers in this setting. More importantly, the performance of the TCGA data was also enhanced by adding the WSIs from the SMH dataset for training. These results clearly demonstrated that multi-national and multi-institutional datasets can improve the performance of the mutation classifiers. However, it remains unclear how far the performance can be improved if much more data are supplied.
When we scrutinized the binary heatmaps of falsely classified WSIs, we recognized that the wild-type and mutated patches were generally aggregated rather than dispersed. The patterns implied the possibility that the tumor tissues in a tissue slide may have different mutational statuses between different regions. Large tumors can be molecularly heterogeneous, and the tumor heterogeneity can contribute to the resistance to treatment[43]. Therefore, tumor heterogeneity has been an important issue for both researchers and clinicians. To elucidate the spatial heterogeneity of a tumor, molecular methods with high spatial specificity such as multi-region sequencing and single-cell sequencing can be applied to examine a tissue sample. However, a random sampling of tissues for these molecular tests would be very inefficient. If possible regions of molecular heterogeneity in a tissue slide could be identified before the tests, molecular testing can be more specific and efficient. Furthermore, there are possibilities of false negative molecular tests because of the imprecise delineation of target regions in a tissue block[12]. Therefore, it is very important to objectively discriminate the tumor regions for the molecular evaluation of the tumor tissues. Thus, both normal/tumor and wild-type/mutation classifiers can be used to delineate the appropriate target sites for various molecular tests in cancer tissues. For example, Supplementary Figure 3 presents the heatmaps for the mutational status of all five genes in a TCGA frozen tissue slide, demonstrating how different regions of a slide can have different mutational statuses. When an overlaid probability map of mutation was drawn, areas with low and high mutational statuses can be recognized. It may not be easy to obtain this kind of information without the help of deep learning. Hence, molecular tests with high spatial specificity can be targeted to specific regions depending on the purpose of the tests. 
Therefore, these classifiers can make the selection of lesional regions for relevant multi-omics testing fully automated in the near future[2].
Limitations also exist for the deep learning-based tissue classifiers. One of the limitations is the sensitivity of deep learning to minute differences in the datasets. Because of this sensitivity, classifiers applied to even subtly different conditions should be separately built. For example, classifiers for the frozen and FFPE tissues should be separately trained for the same tasks. This requires additional data collection and training overhead. In clinical practice, pathologists should take an additional step to determine the kind of classifiers that should be applied for a specific specimen. It is currently inevitable to separately build classifiers to support various real-world tasks in the pathology laboratories. Therefore, manual selection of appropriate classifiers for target tasks is a necessary step that can limit the fully automated adoption of deep learning-based classifiers in the pathology workflows.
In the current study, we used the high-throughput cancer panel to identify mutations in CRC tissues of the SMH dataset. This panel test approach makes it possible to identify diverse clinically actionable mutations in a single assay. However, it is quite expensive to prepare the equipment necessary to perform the test and to store the large amount of data generated. This study demonstrated that a deep learning-based method could be a useful and effective tool for the prediction of actionable mutations from CRC WSIs. However, the interpretation of the decisions made by the deep learning-based classifier is unclear because of the black box nature of deep learning and should be further studied. Besides this aspect, the advantages and disadvantages of the mutation panel test (molecular test) and the deep learning method are described in Table 1.
Despite these limitations, with the increasing digitization of tissue slides, various computer-assisted methods will be introduced for histopathologic interpretation and clinical care. In the present study, we demonstrated the potential of deep learning-based classifiers to predict mutations in the CRC WSIs. Although the classifiers in this study are not yet accurate enough to be used for predicting the genetic mutations in the clinic, deep learning-based methods have the potential to learn features for discriminating the wild-type tissues from the mutated tissues, which are not easily discernible to the human eye. Thus, deep learning will be increasingly adopted to discover new tissue-based biomarkers, which provide fundamental information for personalized medicine. With the accumulation of large sets of WSI data, deep learning-based tissue analyses will play important roles in the better characterization of cancer patients and will be an essential part of digital pathology in the era of precision medicine.
In the present study, we demonstrated that APC, KRAS, PIK3CA, SMAD4, and TP53 mutations can be predicted from H&E pathology images using deep learning-based classifiers. Furthermore, by combining the TCGA and our datasets for training, the prediction performance was enhanced. Therefore, with the accumulation of tissue image data for training, deep learning can be used to supplement current molecular testing methods in the near future.
Table 1 The advantages and disadvantages between the mutation panel test and deep learning-based method
World Journal of Gastroenterology, 2020, Issue 40