Cheng Lu, Rakesh Shiradkar, Zaiyi Liu
1Biomedical Engineering Department, Case Western Reserve University, Cleveland 44106, OH, USA; 2Department of Radiology, Guangzhou First People’s Hospital, School of Medicine, South China University of Technology, Guangzhou 510080, China
Abstract In the last decade, the focus of the computational pathology research community has shifted from replicating the pathological examination performed by pathologists for diagnosis to unlocking and discovering “sub-visual” prognostic image cues from histopathological images. As knowledge and experience in digital pathology accumulate, the emerging goal is to integrate other omics or modalities that will contribute to building a better prognostic assay. In this paper, we provide a brief review of representative works that focus on integrating pathomics with radiomics and genomics for cancer prognosis. The review covers: correlation of pathomics and genomics; fusion of pathomics and genomics; and fusion of pathomics and radiomics. We also present challenges, potential opportunities and avenues for future work.
Keywords: Radiomics; pathomics; genomics; prognosis; digital pathology
Several recent works suggest that patterns discovered from high-dimensional, multi-modal data could improve estimation of disease aggressiveness and patient outcomes (1-3) compared to monomodal data. Data across multiple scales and modalities, including radiology images, histology images, genetic mutations and gene expression, have been used to create better companion diagnostic tools. Among these modalities, histological images are traditionally used for identifying and characterizing complex histopathological phenotypes, and histological examination is generally considered the “gold standard” for diagnosis of most solid tumors. With advances in high-speed, high-resolution whole slide image scanning hardware, histological tissue slides can be digitized and analyzed efficiently. Pathomics, or quantitative histomorphometric analysis, refers to the extraction and mining of computer-derived measurements from digitized histopathology images. While the visual reading of routine histopathology slides of tumors by pathologists can help predict cancer behavior to a certain degree, sophisticated pathomics has the potential to “unlock” more revealing sub-visual attributes of tumors (4). Perhaps even more importantly, pathomics enables a detailed spatial interrogation of the entire tumor landscape and its most invasive elements from a standard hematoxylin and eosin (H&E) slide. The research community has developed approaches that quantify nuclear arrangement, texture and orientation to assess disease presence, risk, aggressiveness, progression and survival. These include not only the nuclear architecture and graphical arrangement of a single histologic primitive, but also novel approaches that characterize the spatial arrangement (5-7) of tumor-infiltrating lymphocytes (TILs) and the interplay between multiple histological primitives simultaneously [e.g., the interplay of lymphocytes and cancer cells (8-10)], thus potentially providing a comprehensive portrait of a tumor’s morphologic heterogeneity.
On the other hand, radiological imaging, which typically involves non-invasive procedures, presents anatomic and functional characteristics at the macroscopic level. Imaging modalities [such as magnetic resonance imaging (MRI), ultrasound, computed tomography (CT) and X-ray] are typically used in the initial stages for cancer detection, diagnosis and localization prior to biopsy of specific tissues for confirmatory tests. They are also used for treatment planning, delivery of therapy and monitoring. Radiomics refers to quantitative measurements of texture and shape attributes extracted from these imaging modalities using advanced image processing and computer vision techniques. These measurements quantify underlying sub-visual tissue heterogeneity that is not always apparent to a human reader. Since imaging is acquired at the macroscopic scale, radiomics allows for interrogating not only the disease regions of interest, but also surrounding structures such as the peri-tumoral region (11). Radiology images can be used in conjunction with machine learning to build diagnostic, prognostic and predictive models (12-14).
Compared to imaging and pathology, which quantify disease phenotypes, genomic analysis focuses on cellular activities measured at the molecular level. Bulk gene expression data have been used to understand molecular differences between disease phenotypes, socio-economic environments and responses to therapies. The investigation of mutations, copy number changes, DNA methylation and gene expression that are correlated with tissue phenotypes enables the discovery of new cancer genes and an understanding of the underlying molecular mechanisms and drivers of tumor morphology associated with diseases. A typical prognostic model using genomic data is OncotypeDX for breast cancer patients, in which a recurrence risk score is generated by a linear combination of the expression of 21 genes (15).
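As a minimal sketch of what a linear gene-expression risk score looks like in general, the following Python snippet computes a weighted sum of expression values. The gene names, weights and the function `linear_risk_score` are hypothetical and chosen only for illustration; they are not the OncotypeDX genes, coefficients or scoring algorithm.

```python
def linear_risk_score(expression, weights, intercept=0.0):
    """Generic linear prognostic score: intercept + sum of weight * expression."""
    missing = set(weights) - set(expression)
    if missing:
        raise KeyError(f"expression missing for genes: {sorted(missing)}")
    return intercept + sum(w * expression[g] for g, w in weights.items())


# Hypothetical gene names and weights, for illustration only.
weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
# Hypothetical normalized expression values for one patient.
patient = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 0.5}

score = linear_risk_score(patient, weights)
print(score)
```

In practice, the weights of such a score would be fitted on a discovery cohort and then held fixed when the score is applied to new patients.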
In clinical settings, patient data comprising more than one of the imaging, pathologic or genomic modalities are often available over the course of diagnosis through treatment. Genomic data provide rich molecular resolution, while imaging data provide spatial phenotype information about the cancer in addition to pathology. Thus, multi-modal data offer a unique opportunity to comprehensively interrogate the cancer microenvironment, thereby enabling a more accurate assessment of disease aggressiveness. The integration of imaging phenotypes and genotypes could help us 1) understand the histological context of genetic data; 2) understand the underlying biological basis/process of specific quantitative imaging features; 3) gain complementary information for visualizing the spatial and molecular context of cancer; 4) resolve confounding effects of tissue heterogeneity; 5) discover new diagnostic/prognostic signatures; and 6) build a holistic model/approach to understand the progression of different diseases.
In this article, we provide a brief review of representative works that focus on integrating pathomics with radiomics and genomics for cancer diagnosis and prognosis. It includes: correlation of pathomics and genomics; fusion of pathomics and genomics; and fusion of pathomics and radiomics. We also present challenges, potential opportunities and avenues for future work. An overview of the fusion of pathomics, radiomics and genomics analysis is shown in Figure 1.
Figure 1 An overview of the fusion of pathomics, radiomics and genomics analyses. In radiomics analysis, quantitative image features are derived from radiology images; these may include traditional hand-crafted features (e.g., 1st and 2nd order statistics, Laws and Local Binary Patterns, gradient orientations and Gabor features) and features learnt by deep learning models. In pathomics analysis, quantitative image features are derived from histopathological images; these may include hand-crafted features such as nuclear shape, texture, global structure, local structure, stromal collagen pattern and TIL patterns, as well as features learnt by deep learning models. In genomics analysis, single nucleotide polymorphism (SNP), copy number variation (CNV), genome structure data and gene expression data [e.g., ribonucleic acid (RNA)-seq data] are analyzed. In the context of prognosis, features/signatures from different modalities that are associated with patient outcomes can be correlated and fused, in order to better understand the relationship between disease phenotypes and genotypes and to create better prognostic tools.
Correlating tumor morphology quantified by pathomics with large-scale genomic analyses is an emerging research topic in the recent literature, since the causal and inferential relationship between gene expression and pathomics is crucial in biomarker discovery. Such associations can be computed via classical Pearson correlation (16), or via advanced methods such as sparse canonical correlation analysis (17,18), which can identify correlated sets of genes and histomorphometrics for more effective analysis. In 2013, Wang et al. (19) established an automated pipeline for correlating histomorphometrics with gene expression data. In The Cancer Genome Atlas (TCGA) triple negative breast cancer (TNBC) cohort, correlations between histomorphometrics and gene expression were first calculated. The histomorphometrics that could significantly separate high-risk and low-risk patients were then identified in a local TMA cohort. In other datasets, gene clusters with strong correlations to these histomorphometrics were validated as biomarkers. Similarly, Ash et al. (17) used image features learnt by a convolutional auto-encoder and performed sparse canonical correlation analysis to identify sets of genes that correlate with histomorphometrics. Lu et al. (10) associated cellular diversity features derived from non-small cell lung carcinomas with bulk gene data to investigate the biological pathways underlying image features derived from pathological images. Subramanian et al. (18) showed that integrative approaches combining tissue phenotypes from images with genomic analysis can resolve confounding effects of tissue heterogeneity and should be used to identify new drivers in other cancers. AbdulJabbar et al. (5) used genomic data to validate a signature extracted from histology images. Cooper et al. (20) illustrated how morphological features extracted from histology images can be integrated with clinical and genomic data in a study of glioblastomas (GBMs). More specifically, the tumor microenvironment and transcriptional classification of GBM were explored. In addition, the authors showed that molecular and clinical associations were revealed through quantitative nuclear morphometry. Barsoum et al. (21) provided a brief review on how to use morphological features extracted from histology images to correlate with clinical behavior, host immune response and genomic information. They also discussed the combination of digital pathology and genetic studies and its correlation with tumor behavior. Table 1 gives an overview of research works relating to pathomics and genomics correlation.
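The classical Pearson approach mentioned above can be sketched as follows. This is a minimal, standard-library illustration that computes the correlation of every histomorphometric feature with every gene across patients; the feature names, gene names and values are hypothetical toy data, and the pipelines in the cited works involve additional steps (e.g., multiple-testing correction and validation cohorts) that are not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation is undefined, report 0
    return sxy / math.sqrt(sxx * syy)


def correlate(features, genes):
    """Return {(feature_name, gene_name): r} for every feature-gene pair."""
    return {(f, g): pearson(fv, gv)
            for f, fv in features.items()
            for g, gv in genes.items()}


# Hypothetical toy data: one value per patient, same patient order everywhere.
features = {"nuclear_area": [1.0, 2.0, 3.0, 4.0],
            "til_density": [4.0, 3.0, 2.0, 1.0]}
genes = {"GENE_A": [2.0, 4.0, 6.0, 8.0],
         "GENE_B": [1.0, 3.0, 2.0, 4.0]}

corr = correlate(features, genes)
# Rank feature-gene pairs by absolute correlation strength.
ranked = sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True)
for (feature, gene), r in ranked:
    print(f"{feature} vs {gene}: r = {r:.2f}")
```

Sparse canonical correlation analysis goes one step further by searching for sparse weighted combinations of genes and image features that are maximally correlated, rather than scoring one pair at a time.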
Due to intra-tumor heterogeneity, the expression level of certain genes may differ significantly between regions within the same tumor. The diagnostic slide of a tissue sample, on the other hand, provides a global view of tumor morphology, and thus pathomic analysis could alleviate the sampling issues that arise in genomic analysis. However, pathomic features may not correlate accurately with the clinical behavior of patients, and it may be difficult to provide a biological explanation for certain associations. Therefore, understanding the histological context of genomic data is essential for a full understanding of the clinical behavior of a tumor. Beyond the studies correlating pathomics and genomics described in the last section, many studies have attempted to combine the two to create better companion diagnostic tools.
A straightforward strategy to integrate pathomic and genomic signals is to concatenate the feature vectors (22-25). Shao et al. (26) introduced an ordinal multi-modal feature selection method that identifies important features from each modality while taking into account the intrinsic relationship between modalities. Chen et al. (27) proposed a sophisticated end-to-end integrated framework for fusing deep features learnt from histology images, at the patch level and the cell graph level, with genomic features learnt from genomic profiles. A gating-based mechanism was first used to control the contribution of each modality, followed by the Kronecker product to model feature interactions across modalities. Table 2 summarizes the representative research works that combine pathomics and genomics for better prognostication.
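The two fusion ideas described above, concatenation and gating followed by a Kronecker product, can be sketched in a few lines. This is a simplified illustration only, not the implementation of Chen et al.: the function names, the element-wise sigmoid gate and the zero-valued gate weights are assumptions made for the example, and in a real model the gate weights would be learnt end to end.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gate(h, context, weights):
    """Scale each element of h by a sigmoid gate computed from `context`.

    weights[i] is the (hypothetical) weight vector, of length len(context),
    that produces the gate for element i of h.
    """
    return [hi * sigmoid(sum(w * c for w, c in zip(wi, context)))
            for hi, wi in zip(h, weights)]


def kronecker(a, b):
    """Kronecker product of two vectors: every pairwise product a_i * b_j."""
    return [ai * bj for ai in a for bj in b]


def fuse(h_path, h_gene, w_path, w_gene):
    # Plain concatenation is the simple baseline fusion; here it also
    # serves as the context from which each modality's gate is computed.
    context = h_path + h_gene
    g_path = gate(h_path, context, w_path)
    g_gene = gate(h_gene, context, w_gene)
    # Appending 1 keeps the unimodal terms next to the pairwise interactions.
    return kronecker(g_path + [1.0], g_gene + [1.0])


# Hypothetical embeddings: 2 pathomic features, 1 genomic feature.
h_path, h_gene = [0.5, -1.0], [2.0]
# Zero gate weights give sigmoid(0) = 0.5, i.e., every feature is halved.
w_path = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_gene = [[0.0, 0.0, 0.0]]

fused = fuse(h_path, h_gene, w_path, w_gene)
print(len(fused), fused)
```

The fused vector has (d_path + 1) × (d_gene + 1) entries, so the dimensionality grows multiplicatively with the embedding sizes; this is one reason such schemes are usually applied to compact learnt embeddings rather than raw feature vectors.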
Table 1 Overview of research works on correlating pathomics with genomics
Table 2 Overview of research works on fusion of pathomics with genomics
Radiomics involves high-throughput extraction of computational features that quantify tissue heterogeneity at the macroscopic level using advanced image processing and computer vision techniques, whereas pathomics provides quantitative information at the microscopic scale. Fusion of radiomics and pathomics provides an opportunity to combine measurements of tumor heterogeneity at the macro and micro scales, which may complement each other and result in a stronger integrated signature.
Some previous works have explored correlations between radiomics and pathomics to explain the morphological basis of signatures observed on imaging. For instance, studies conducted by Alvarez-Jimenez et al., Penzias et al. and Shiradkar et al. correlated pathomic features with radiomic and quantitative imaging features to establish the morphologic basis of imaging (28-30). Other works combined radiomics and pathomics to build integrated models for disease characterization and classification (31-34). Vaidya et al. integrated radiomic and pathomic features to build a radio-pathomic signature for cancer prognosis (31-34). Saltz et al. (35) introduced a suite of tools to support the fusion of radiomic and pathomic features and discussed how this toolset can help to investigate the correlations between image features, molecular data and clinical outcome. Some of these works are summarized in Table 3.
Table 3 Overview of research works on fusion of pathomics with radiomics
Fusion of radiomics, pathomics and genomics would further allow for integrating multiple scales of data. However, such data are not easy to obtain, and very few studies have explored this aspect. Braman et al. (36) presented a strategy to intelligently fuse embeddings of radiomics, pathomics and genomics to derive an optimal complementary signature for predicting outcome in glioblastoma patients. Vaidya et al. (37) correlated lung CT-derived radiomic features with pathomic and genomic signatures to provide a biological rationale for radiomic signatures that were associated with better survival in non-small cell lung cancer patients.
The integration of quantitative measurements from multi-modality data for prognosis prediction remains a challenging task because of the high dimensionality and heterogeneity of the data.
Understanding how abstracted features from different modalities influence a model’s inference remains another significant problem. Deep learning models have been treated as “black box” methods since the learnt features and the model’s decision making are difficult to explain. Researchers have tried to open this box by using activation maps (12,38) and by visualizing learnt features (39). Compared to deep learning approaches, hand-crafted features extracted from histology and radiology images provide better explainability since the features are pre-defined, either in a domain-agnostic (13,40) or a domain-inspired (8) way. In multi-modality fusion studies, the interpretation of the extracted features becomes more difficult. A computational fusion method should consider not only the discriminative power of the extracted features for the task, but also their explainability. The fusion frameworks proposed by Shao et al. and Chen et al. (26,27) illustrated that the extracted features can be visualized and understood to some extent. Sharing the extracted features is still challenging since there is a lack of standards for the naming and parameter settings of these features. Therefore, open-access software that provides transparent information on the computational process is required to evaluate clinical decision support systems. The National Cancer Institute (NCI) launched the National Interim Clinical Imaging Procedure (NICIP) Code Set to help facilitate scientific collaboration in the cancer research community, which could help researchers reach a consensus on standardized methods/tools.
One barrier to translating the digital “biomarkers” discovered in pathology imaging-related studies into practice is the issue of generalizability. Discriminative features are often mined from a limited number of samples, which easily leads to the “overfitting” problem (41,42). That is, the discovered features and model perform well in differentiating patients with distinct outcomes in the discovery or training cohort, but fail in unseen validation cohorts. Therefore, in addition to the discovery cohort, independent validation cohorts are strongly recommended to further validate the robustness of the discovered biomarkers. One may claim that cross-validation helps to alleviate overfitting; however, the result may still be biased toward the discovery cohort. Overfitting may also be caused by the “batch effect”, e.g., artifacts associated with a specific scanner, so a proper quality check should be performed before the data are analyzed (43,44). In addition, stain variation may prevent a pretrained model from working well in unseen cohorts. Several approaches have been proposed to address stain variation, either by using stain normalization (45,46) or by training a robust model on a training set containing images with as much variation as possible, i.e., images scanned by different scanners and from different centers. We believe that the generalizability issue could be alleviated if more well-maintained benchmark datasets hosting pathology images, radiology images and genomic data were available.
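The discovery-versus-validation protocol described above can be illustrated with a deliberately small sketch: a risk cutoff is locked using the discovery cohort only and is then applied, without re-tuning, to an independent cohort. The cohort values, the median-based cutoff and the function names are assumptions made for the example; real studies would typically use survival-based metrics rather than plain accuracy.

```python
from statistics import median


def fit_cutoff(scores):
    """Lock a risk cutoff using the discovery cohort only."""
    return median(scores)


def accuracy(scores, labels, cutoff):
    """Fraction of patients whose high/low-risk call matches the label (1 = event)."""
    calls = [1 if s > cutoff else 0 for s in scores]
    return sum(c == y for c, y in zip(calls, labels)) / len(labels)


# Hypothetical model risk scores and outcomes for two cohorts.
disc_scores, disc_labels = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
val_scores, val_labels = [0.3, 0.6, 0.4, 0.7], [0, 1, 1, 0]

cutoff = fit_cutoff(disc_scores)  # never re-estimated on the validation cohort
disc_acc = accuracy(disc_scores, disc_labels, cutoff)
val_acc = accuracy(val_scores, val_labels, cutoff)
print(f"discovery accuracy: {disc_acc:.2f}, validation accuracy: {val_acc:.2f}")
```

In this toy example the locked model separates the discovery cohort perfectly but performs at chance level on the independent cohort, which is the pattern that an internal evaluation alone would not reveal.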
Developing and implementing a multi-modal fusion model requires access to matched pathology, radiology or genomic data. The TCGA project (47) is a landmark cancer genomics program that hosts over 2.5 petabytes of genomic, epigenomic, transcriptomic and proteomic data from over 20,000 primary cancers across 33 cancer types. More importantly, it also includes digital diagnostic FFPE histology tissue slides for most of the patients, along with clinical information. The Cancer Imaging Archive (TCIA) (48), on the other hand, provides radiology and histology images that partially overlap with the patients in TCGA, enabling cross-modality studies. The TCGA-TCIA interface provides a valuable platform for scientists who would like to perform multi-omics investigations. It is common for data from a certain modality, either imaging or genomic, to be missing, or for data labels to be lacking. Therefore, a fusion approach should be robust enough to learn a representation from the available data and should be agnostic to data modality and availability. For better associations or integrated signatures between modalities, generating spatially co-registered data from different modalities is a promising approach. For instance, Bourne et al. (49) introduced an approach for aiding histological validation of MRI studies of the human prostate, in which a 3D patient-specific mold was created to facilitate the co-registration of in vivo MRI and histology images.
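One simple way to make a fusion step tolerant of missing modalities is to pool only the embeddings that are actually present for a patient. The sketch below uses mean pooling as a stand-in; this is an assumption made for illustration and not a method from the works reviewed here, and the modality names and embedding values are hypothetical.

```python
def fuse_available(embeddings):
    """Average the embeddings of whichever modalities are present (None = missing)."""
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("no modality available for this patient")
    dim = len(present[0])
    if any(len(e) != dim for e in present):
        raise ValueError("all embeddings must share one dimension")
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


# Hypothetical patient with no radiology data.
patient = {
    "pathology": [1.0, 2.0],
    "radiology": None,
    "genomics": [3.0, 4.0],
}
fused = fuse_available(patient)
print(fused)
```

Because the pooled vector has the same dimension regardless of which modalities are present, a downstream prognostic model can be applied to every patient, although pooling of this kind assumes that the modality embeddings have already been mapped into a shared space.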
In the last decade, the focus of the computational pathology research community has shifted from replicating the pathological examination performed by pathologists for diagnosis to unlocking and discovering “sub-visual” prognostic image cues from histopathological images. As knowledge and experience in digital pathology accumulate, the emerging goal is to integrate other omics or modalities that will contribute to building a better prognostic or predictive assay.
Correlations between pathomics and radiomics or genomics have allowed domain-specific biological understanding of cancer morphology to be established. Integration of pathomics with radiomics and genomics has resulted in improved comprehensive signatures that are better associated with cancer sub-types and better at prognosticating treatment outcome. While there is significant potential and promise in complementing pathomics with other omics data, current studies have largely been limited to small, single-institution datasets. Efforts to make large-scale multi-modal datasets available to the research community will potentially allow for the development of sophisticated fusion strategies that further the potential of pathomics or quantitative histomorphometry.
This study is supported by the DoD Breast Cancer Research Program Breakthrough Level 1 Award W81XWH-19-1-0668; NIH-NCI R21 CA253108-01; DoD Prostate Cancer Research Program Idea Development Award W81XWH-18-1-0524; Key R&D Program of Guangdong Province, China (No. 2021B0101420006); National Science Fund for Distinguished Young Scholars, China (No. 81925023); and National Natural Science Foundation of China (No. 62002082, 62102103, 61906050, 81771912).
Conflicts of Interest: The authors have no conflicts of interest to declare.
Chinese Journal of Cancer Research, 2021, Issue 5