Evaluation of human epidermal growth factor receptor 2 status of breast cancer using preoperative multidetector computed tomography with deep learning and handcrafted radiomics features

2020-05-17 07:28
Chinese Journal of Cancer Research, 2020, Issue 2

Xiaojun Yang,Lei Wu,Ke Zhao,Weitao Ye,Weixiao Liu,Yingyi Wang,Jiao Li,Hanxiao Li,Xiaomei Huang,Wen Zhang,Yanqi Huang,Xin Chen,Su Yao,Zaiyi Liu,Changhong Liang

1Department of Radiology,Guangdong Provincial People’s Hospital,Guangdong Academy of Medical Sciences,Guangzhou 510080,China;2School of Medicine,South China University of Technology,Guangzhou 510006,China;3Department of Radiology,Guangzhou First People’s Hospital,Guangzhou 510180,China;4Department of Pathology,Guangdong Provincial People’s Hospital,Guangdong Academy of Medical Sciences,Guangzhou 510080,China

Abstract Objective:To evaluate the human epidermal growth factor receptor 2(HER2)status in patients with breast cancer using multidetector computed tomography(MDCT)-based handcrafted and deep radiomics features.Methods:This retrospective study enrolled 339 female patients(primary cohort,n=177;validation cohort,n=162)with pathologically confirmed invasive breast cancer.Handcrafted and deep radiomics features were extracted from the MDCT images during the arterial phase.After the feature selection procedures,handcrafted and deep radiomics signatures and the combined model were built using multivariate logistic regression analysis.Performance was assessed by measures of discrimination,calibration,and clinical usefulness in the primary cohort and validated in the validation cohort.Results:The handcrafted radiomics signature had a discriminative ability with a C-index of 0.739[95%confidence interval(95% CI):0.661−0.818]in the primary cohort and 0.695(95% CI:0.609−0.781)in the validation cohort.The deep radiomics signature also had a discriminative ability with a C-index of 0.760(95% CI:0.690−0.831)in the primary cohort and 0.777(95% CI:0.696−0.857)in the validation cohort.The combined model,which incorporated both the handcrafted and deep radiomics signatures,showed good discriminative ability with a C-index of 0.829(95% CI:0.767−0.890)in the primary cohort and 0.809(95% CI:0.740−0.879)in the validation cohort.Conclusions:Handcrafted and deep radiomics features from MDCT images were associated with HER2 status in patients with breast cancer.Thus,these features could provide complementary aid for the radiological evaluation of HER2 status in breast cancer.

Keywords:Breast cancer;human epidermal growth factor receptor 2;multidetector computed tomography;radiomics;deep learning

Introduction

Breast cancer is the most common malignancy in women worldwide and remains the leading cause of cancer-related death in women(1,2).Breast cancer is a heterogeneous disease with diverse phenotypes that exhibit distinctive biological behaviors(3).Approximately 15%−20% of breast cancers manifest as human epidermal growth factor receptor 2(HER2)protein overexpression or gene amplification,which is defined as an aggressive subtype associated with metastatic behavior and poor clinical outcomes(4).Nevertheless,the prognosis of the HER2-positive subtype of breast cancer has substantially improved since the development of anti-HER2 targeted therapies.Female patients with HER2-positive breast cancer show good responses and high pathological complete response rates after neoadjuvant chemotherapy with the HER2-blockade agent trastuzumab;further,these patients demonstrate a considerable improvement in both disease-free and overall survival(5-7).Therefore,HER2 status is important for the prognosis of breast cancer and in choosing the optimal individualized treatment strategy for such patients.

HER2-positive cancers tend to show accelerated growth and division of cancer cells,as well as stimulated cell proliferation and angiogenesis,all of which may contribute to tumor heterogeneity(8).In recent years,studies have suggested that the biological characteristics of tumors could be captured using medical images at both genetic and cellular levels(9).Radiomics,the extraction and analysis of quantitative imaging features,enables imaging phenotypes to be correlated with genetic information,which is of great significance for diagnosis,choosing individualized treatment strategies,and predicting the prognosis of tumors(9-13).

Recent studies have tried to use imaging features to assess their associations with the HER2-positive subtype of breast cancer.Although breast radiomics features derived from magnetic resonance imaging(MRI)(14-16)and mammography(MG)(17)are reportedly associated with the HER2-positive subtype,other studies have failed to find a correlation between HER2 status and the features extracted from MRI(18)and positron emission tomography/computed tomography(PET/CT)(19).Given these conflicting findings,the current evidence remains insufficient and further exploration is required.Multidetector computed tomography(MDCT)also plays an important role in the clinical practice of breast cancer(20),and the 2019 National Comprehensive Cancer Network(NCCN)guidelines of Breast Cancer(version 3.2019)recommend chest contrast CT examination for patients with breast cancer if pulmonary symptoms are present.Moreover,many patients with breast cancer undergo MDCT for other reasons as well(e.g.,chest pain).The radiomics features within MDCT images may be correlated with the HER2 status in breast cancer,which could provide supplementary assistance with non-invasive imaging evaluation.

Furthermore,conventional radiomics studies generally extracted handcrafted features,which quantify tumor shape,intensity,and texture information based on imaging.However,low-order handcrafted features may be inadequate as they reveal information about medical images from limited aspects;therefore,the tumor heterogeneity may not be fully characterized.Recently,with developments in image recognition and analysis tools,deep learning has drawn increased interest.Deep learning technology,especially convolutional neural network(CNN),is an artificial intelligence algorithm that learns on its own to extract the most predictive features directly from pixel images(21)and has shown remarkable classification and recognition performances in image analysis(22).Compared to handcrafted features,deep features reflect information in medical images from a different perspective and may add further predictive value to HER2 status prediction(23).

To the best of our knowledge,no studies have investigated the correlations between the radiomics features of MDCT images and HER2 status in patients with breast cancer.Therefore,the purpose of this study was to examine if the combination of handcrafted radiomics features and deep learning features based on preoperative MDCT images could evaluate the HER2 status of breast cancer and thus provide complementary aid for the radiological evaluation of HER2 status in patients with breast cancer.

Materials and methods

Patient population

This retrospective study was approved by the Medical Ethics Committee of Guangdong Provincial People’s Hospital,Guangdong Academy of Medical Sciences,and the requirement for informed consent was waived.The inclusion criteria were as follows:1)patients who underwent preoperative contrast-enhanced MDCT between January 2016 and December 2018 with visible breast cancer on the images;2)histopathological verification of primary invasive breast cancer with surgical resection;and 3)HER2 expression status confirmed by immunohistochemistry(IHC)and fluorescence in situ hybridization(FISH)tests of the surgical specimen.The exclusion criteria were as follows:1)patients with incomplete clinicopathologic data;2)patients who were treated with neoadjuvant chemotherapy before surgery;or 3)equivocal HER2 status determined by IHC and FISH tests.The patient recruitment flowchart is presented in Supplementary Figure S1.

In total,339 female patients who met the criteria were included in this study.Of these,27 patients had multi-focal or multi-centric lesions,and only the largest tumor lesion of each patient was used for analysis.Finally,the 339 patients were randomly divided into the primary cohort(n=177;age,50.53±10.49 years old;range,27−78 years)and the independent validation cohort(n=162;age,51.88±8.71 years old;range,35−73 years).The baseline clinical information,including age and tumor location,of recruited patients was collected from the institution archives.

Assessment of HER2 status

The HER2 status of all breast cancer patients included in this study was assessed on surgical specimens from patients who had not received neoadjuvant chemotherapy,and determined using IHC or FISH tests according to the 2013 guideline recommendations of the American Society of Clinical Oncology and College of American Pathologists.The IHC staining intensity of HER2 was graded as 0,1+,2+,or 3+.Grades 0 and 1+ were defined as negative,whereas grade 3+ was considered positive.Grade 2+ was equivocal and was further confirmed using FISH,which considered a HER2 gene copy number≥6 or a HER2/chromosome enumeration probe 17(CEP17)ratio≥2.0 as confirmation of HER2 protein overexpression(24).
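
The grading rule above can be written as a small decision function. This is only a sketch of the rule as stated in this section; the function name, argument names and the label "unconfirmed" are hypothetical. The text does not describe how FISH results below the stated thresholds were sub-classified, so the sketch does not attempt to separate FISH-negative from FISH-equivocal cases.

```python
from typing import Optional


def her2_status(ihc_grade: str,
                her2_copy_number: Optional[float] = None,
                her2_cep17_ratio: Optional[float] = None) -> str:
    """Classify HER2 status from an IHC grade and optional FISH results.

    Thresholds are the ones quoted in the text: copy number >= 6 or
    HER2/CEP17 ratio >= 2.0 confirms overexpression for IHC 2+ cases.
    """
    if ihc_grade in ("0", "1+"):
        return "negative"
    if ihc_grade == "3+":
        return "positive"
    if ihc_grade == "2+":
        # IHC 2+ is equivocal until a FISH result is available.
        if her2_copy_number is None and her2_cep17_ratio is None:
            return "equivocal"
        amplified = (
            (her2_copy_number is not None and her2_copy_number >= 6)
            or (her2_cep17_ratio is not None and her2_cep17_ratio >= 2.0)
        )
        # Assumption: anything below the thresholds is reported as
        # "unconfirmed" rather than split into negative/equivocal.
        return "positive" if amplified else "unconfirmed"
    raise ValueError(f"unknown IHC grade: {ihc_grade!r}")
```

For example, `her2_status("2+", her2_cep17_ratio=2.3)` returns `"positive"`, whereas `her2_status("2+")` without FISH data stays `"equivocal"`.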

Radiomics model workflow

The workflow of radiomics features modelling is illustrated in Figure 1 and includes tumor segmentation and the resizing process;handcrafted and deep feature extraction;feature selection;and radiomics signatures and model construction.

MDCT image acquisition and segmentation

Preoperatively,all patients underwent contrast-enhanced chest CT scans,which were performed at different MDCT facilities at Guangdong Provincial People’s Hospital between January 2016 and December 2018.A more detailed description regarding image acquisition and segmentation is provided in Supplementary materials.

Feature extraction

Handcrafted feature extraction

In this study,four categories of handcrafted radiomics features were extracted for analysis:1)first-order statistics features;2)size- and shape-based features;3)texture features;and 4)filter features.The extraction of handcrafted features is described in detail in the Supplementary materials.Handcrafted feature generation was performed via a toolbox developed in-house using MATLAB 2016b(Mathworks,Natick,MA,USA).

Deep feature extraction

Since medical image datasets are typically small compared with natural image sets,which makes it difficult to train CNN models from scratch,transfer learning has been proposed to overcome this shortage.Transfer learning is an approach which uses pre-trained models from images of other domains and makes them useful for new datasets(25).Currently,transfer learning is widely implemented in the area of medical deep learning and may alleviate the limitation of small datasets(26).

In this study,transfer learning was performed to extract deep features from two-dimensional MDCT images.Convolutional neural networks fast(CNN-F)(27),as the CNN model used in our study,consisted of five convolutional layers and three fully connected layers,and was pre-trained using the ILSVRC-2012 dataset.The hyperparameters of the pre-trained model were the same as those used by Krizhevsky(28)and are presented in Supplementary materials.

For each patient,the section containing the largest tumor area was selected as the input into the CNN-F.The pre-trained model required three-channel input images(RGB-coded images);however,the medical images in DICOM format were single-channel gray images.Therefore,we first selected and manually segmented the largest tumor area section and cropped the tumor area.Then,the gray values of the segmented region of interest(ROI)were converted into the range(0,255)using linear transformation(29)and each image was rescaled to 224×224 pixels using bicubic interpolation.Next,three rescaled images were used as the R,G and B channels and stacked into a three-channel image(224×224×3),which met the requirement of the pre-trained CNN-F model input.Finally,the deep features were calculated by forward propagation,and the outputs of the fully connected layer prior to the last fully connected layer were extracted as deep features for subsequent analysis.
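
The intensity mapping and channel stacking described above can be illustrated in plain Python. This is a minimal sketch under two assumptions: the linear transformation is taken to be a min-max mapping onto 0 to 255, and the three channels are taken to be copies of the same rescaled image. Neither detail is specified in the text. The bicubic resize to 224×224 is omitted because it needs an image library. The function names are hypothetical.

```python
def to_uint8_range(roi):
    """Linearly map the grey values of a 2D ROI (list of rows) onto 0..255."""
    flat = [v for row in roi for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant ROI has no contrast to stretch; map it to zeros.
        return [[0 for _ in row] for row in roi]
    return [[int(round((v - lo) * 255.0 / (hi - lo))) for v in row]
            for row in roi]


def stack_three_channels(gray):
    """Copy one grey image into R, G and B, giving an H x W x 3 nested list."""
    return [[[v, v, v] for v in row] for row in gray]


# Toy 2 x 2 "ROI" with CT-like grey values.
roi = [[0, 40], [120, 200]]
scaled = to_uint8_range(roi)
rgb = stack_three_channels(scaled)
```

In the study the resulting array would be resized to 224×224×3 before being passed through the pre-trained network.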

The entire deep feature extraction process was performed based on a MATLAB toolbox called MatConvNet(Version 1.0-beta25;http://www.vlfeat.org/matconvnet/).

Feature selection

To construct effective and robust radiomic signatures,a coarse-to-fine strategy was employed for feature selection.Firstly,depending on the different combinations of the independent segmentation of 100 patients,intra- and interclass correlation coefficients(ICCs)were used to determine the robust features.Features with ICC values>0.75 were classified as robust features for further analysis.Secondly,univariate analysis(the Mann-Whitney U test)was performed to compare the robust radiomics features between HER2-positive and HER2-negative groups.All features were sorted in ascending order in terms of P values generated from the univariate analysis;of the features,the top 20% were selected.Thirdly,the support vector machine with recursive feature elimination(SVM-RFE)algorithm was used for further feature selection(30).SVM-RFE is an efficient feature selection algorithm which ranks the features according to the weight of features based on support vectors.Finally,the number of key features selected for building a radiomics signature was determined by the C-index value using 10-fold cross-validation.This procedure was implemented on both handcrafted feature and deep feature selections in the primary cohort.
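
The first two stages of this coarse-to-fine strategy can be sketched as a filter followed by a ranking. The sketch assumes the ICC and the Mann-Whitney P value of each feature have already been computed; the function name, the dictionary layout and the choice to round the 20% cut upward are assumptions, not details from the study. The SVM-RFE and cross-validation stages are not shown.

```python
import math


def select_candidates(icc, p_values, icc_threshold=0.75, top_fraction=0.20):
    """Keep features with ICC > threshold, then the lowest-P top fraction.

    icc and p_values map a feature name to its ICC and to its univariate
    P value, respectively.
    """
    robust = [name for name, value in icc.items() if value > icc_threshold]
    # Ascending P value: the most discriminative features come first.
    ranked = sorted(robust, key=lambda name: p_values[name])
    if not ranked:
        return []
    n_keep = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[:n_keep]


# Toy example with ten hypothetical features f0..f9.
icc = {f"f{i}": 0.9 for i in range(8)}
icc.update({"f8": 0.50, "f9": 0.75})          # neither exceeds 0.75
p_values = {f"f{i}": 0.5 for i in range(10)}
p_values.update({"f3": 0.001, "f5": 0.01})
selected = select_candidates(icc, p_values)
```

With eight robust features, the top 20% rounds up to two, so the toy call keeps `f3` and `f5`.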

Radiomics signature construction

Using the selected key handcrafted features and deep features,the handcrafted- and deep-radiomics signatures for predicting HER2 status were respectively developed using multivariate logistic regression in the primary cohort.We calculated the handcrafted radiomics score(HRad-score)and deep radiomics score(DRad-score)for each patient with a linear combination of the selected features which were respectively weighted by their normalized coefficients.The signatures trained on the primary cohort were applied to the validation cohort for testing in independent cases.
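
A radiomics score of this kind is a weighted sum of the selected features, which a logistic model then maps to a probability. The sketch below shows only the arithmetic; the feature names and coefficient values are made up for illustration, and the intercept term is an assumption (the fitted formulas are given in the Supplementary materials, not here).

```python
import math


def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of selected feature values and their coefficients."""
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


def predicted_probability(score):
    """Logistic link: map a score on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical two-feature signature.
coefficients = {"feature_a": 0.5, "feature_b": 1.0}
patient = {"feature_a": 2.0, "feature_b": -1.0}
score = rad_score(patient, coefficients)
probability = predicted_probability(score)
```

Here the two terms cancel, so the score is 0 and the predicted probability is 0.5.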

Prediction model development

Multivariate logistic regression was used to select clinical predictive factors(i.e.,age,tumor location)for developing the prediction model in the primary cohort.Handcrafted- and deep-radiomics signatures were applied to develop a prediction model in the primary cohort.

Statistical analysis

The differences in clinical characteristics and radiomics scores between the HER2-positive and HER2-negative patients in both the primary and validation cohorts were assessed.The two-sided Mann-Whitney U test was used for continuous variables(i.e.,age,Rad-score)and the Chi-square test for categorical variables(i.e.,tumor location).The differences in predictive performance between the combined model and the radiomics signatures were tested using the DeLong test.Statistical analysis in this study was performed with R software(Version 3.5.2;http://www.R-project.org).The R packages that were used in this study are listed in the Supplementary materials.P-values<0.05 were considered to be statistically significant.

Evaluation of signatures and model performance

To evaluate the radiomics signatures and prediction model performance in this study,we measured the overall performance,discrimination,calibration and clinical usefulness of the model in the primary cohort and then validated it in the validation cohort(31).

Overall performance

The Brier score(32)was calculated to assess the overall performance of the radiomics signatures and the prediction model.The Brier score provided a measure of the agreement between the observed binary outcome(i.e.,HER2 positive vs. HER2 negative)and the predicted probability of that outcome.The Brier score for a model can range from 0(perfect model)to 0.25(non-informative model).Generally,a lower Brier score implies better model calibration and discrimination.
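
The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome. The sketch below (function name hypothetical) also shows where the 0.25 reference value comes from: it is the score obtained when every patient is assigned a probability of 0.5.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


outcomes = [1, 0, 1, 0]
non_informative = brier_score(outcomes, [0.5, 0.5, 0.5, 0.5])
perfect = brier_score(outcomes, [1.0, 0.0, 1.0, 0.0])
```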

Discrimination

The C-index was used to measure the discriminative ability of the radiomics signatures and prediction model.It is equal to the area under the receiver operating characteristic(ROC)curve and varies from 0.5(no apparent accuracy)to 1.0(perfect accuracy)(33).
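
For a binary outcome, the C-index can be computed directly as the fraction of HER2-positive/HER2-negative patient pairs in which the positive patient receives the higher score, with ties counted as one half. The following is an illustrative sketch with a hypothetical function name, not the R routine used in the study.

```python
def c_index(y_true, scores):
    """Concordance over all (positive, negative) pairs; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0
            elif pos == neg:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
example = c_index([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```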

Calibration

Calibration was used to describe the consistency between the actual outcomes and the predictions.The calibration curve was graphically presented as an assessment of calibration,with predictions on the x-axis and the actual outcome on the y-axis.Perfect predictions should be on the 45-degree line.The Hosmer-Lemeshow test was applied to assess the goodness-of-fit of the model,and a high P-value(>0.05)was considered to indicate reasonable calibration.
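
Both the calibration curve and the Hosmer-Lemeshow statistic rest on the same grouping: patients are sorted by predicted probability, split into groups, and the observed event counts are compared with the expected ones. The sketch below assumes equal-sized groups; the function names and the grouping scheme are illustrative. It stops at the test statistic, which the Hosmer-Lemeshow test then compares with a chi-square distribution to obtain the P value.

```python
def _groups(y_true, y_prob, n_groups):
    """Sort by predicted probability and split into roughly equal groups."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chunks = (pairs[g * n // n_groups:(g + 1) * n // n_groups]
              for g in range(n_groups))
    return [chunk for chunk in chunks if chunk]


def calibration_points(y_true, y_prob, n_groups=10):
    """(mean predicted probability, observed event rate) for each group."""
    return [(sum(p for p, _ in grp) / len(grp),
             sum(y for _, y in grp) / len(grp))
            for grp in _groups(y_true, y_prob, n_groups)]


def hosmer_lemeshow_statistic(y_true, y_prob, n_groups=10):
    """Sum over groups of (observed - expected)^2 / (n * p_bar * (1 - p_bar))."""
    statistic = 0.0
    for grp in _groups(y_true, y_prob, n_groups):
        expected = sum(p for p, _ in grp)
        observed = sum(y for _, y in grp)
        mean_p = expected / len(grp)
        variance = len(grp) * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


points = calibration_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_groups=2)
hl = hosmer_lemeshow_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9], n_groups=2)
```

Points that lie close to the 45-degree line, together with a small statistic, point to good calibration.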

Clinical usefulness

To evaluate the clinical utility of the combined model,the decision curve analysis was applied(34).In this study the standardized net benefit(sNB),which ranges from 0 to 1,was used as a function of the risk threshold in the decision curve with visualization.The clinical impact plot was used to visually show the estimated number of patients that had been deemed as high risk for each risk threshold and the true positive cases.An ROC components plot was used to show the constituents of sNB(i.e.,the true and false positive rates)(34).
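
At a given risk threshold, the net benefit weighs true positives against false positives, and dividing by the prevalence gives the standardized net benefit. The sketch below uses the usual decision-curve formula; the function names and the convention that a predicted probability at or above the threshold counts as high risk are assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """TP/n - FP/n * pt/(1 - pt) at risk threshold pt (0 < pt < 1)."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob)
                    if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def standardized_net_benefit(y_true, y_prob, threshold):
    """Net benefit divided by the prevalence of the positive outcome."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


outcomes = [1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.2]
snb_at_half = standardized_net_benefit(outcomes, probabilities, 0.5)
snb_at_03 = standardized_net_benefit(outcomes, probabilities, 0.3)
```

At a threshold of 0.5 the toy model flags both positives and no negatives, so the standardized net benefit reaches its maximum of 1. At 0.3 one false positive is added and the value drops.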

Results

Patients’clinical characteristics

There was no significant difference in the HER2 status between the primary and validation cohorts(P=0.893).There were also no significant differences in clinical characteristics between the two cohorts(P=0.198−0.893).

The differences in clinical characteristics between the HER2-positive and HER2-negative groups in both the primary and validation cohorts are shown in Table 1.No significant differences were found in clinical characteristics(age and tumor location)between the HER2-positive and HER2-negative patients in either cohort(P=0.065−0.883).

Feature selection and radiomics signature construction

In total,5,013 handcrafted and 4,096 deep features were extracted from the ROIs of patients with breast cancer.After feature selection,7 handcrafted and 7 deep features with preferable predictive value were finally selected in the primary cohort to construct the handcrafted and deep radiomics signatures,respectively.The handcrafted radiomics(HRad)-and deep radiomics(DRad)-scores of each patient were calculated using the formula constructed by the respective selected features(Supplementary materials).

Evaluation of handcrafted and deep radiomics signatures

The handcrafted radiomics signature demonstrated discriminative ability with a C-index of 0.739[95% confidence interval(95% CI):0.661−0.818]in the primary cohort and 0.695(95% CI:0.609−0.781)in the validation cohort.The deep radiomics signature demonstrated discriminative ability with a C-index of 0.760(95% CI:0.690−0.831)in the primary cohort and 0.777(95% CI:0.696−0.857)in the validation cohort.The Brier scores,Hosmer-Lemeshow test results,and sNBs of the handcrafted and deep radiomics signatures are shown in Table 2.

Table 1 Characteristics of patients in primary and validation cohort

Performance of combined model

No significant clinical predictors of HER2 status in breast cancer were found using logistic regression analysis,whereas the handcrafted and deep radiomics signatures were identified as independent predictors.Therefore,a combination of the handcrafted and deep radiomics signatures was used to form the prediction model in this study.

The combined model showed better performance for the prediction of HER2 status in breast cancer than either of the two radiomics signatures in both cohorts(Table 2).The overall performance of the combined model was improved;the Brier scores decreased from 0.220 and 0.231 to 0.211.The combined model showed good discriminative ability,achieving a C-index of 0.829(95% CI:0.767−0.890)in the primary cohort and 0.809(95% CI:0.740−0.879)in the validation cohort.

The calibration curve of the combined model for the prediction of HER2 status demonstrated good agreement between the predicted and observed outcomes in both the primary and validation cohorts(Figure 2).The Hosmer-Lemeshow test was non-significant in the primary(P=0.887)and validation cohorts(P=0.528),indicating good fit.

The decision curve analysis for the two radiomics signatures and the combined model in both cohorts is presented in Figure 3A,B.The decision curve analysis plot showed that the combined model outperformed the two radiomics signatures in the range of thresholds from 0.2 to 0.8.The clinical impact plot showed that,if a 0.5 risk threshold was used,the number of cases identified as being at high risk of HER2 positivity was close to the number of true HER2-positive cases(Figure 3C,D).Finally,the true and false positive rates were displayed as functions of the risk threshold in the ROC components plot(Figure 3E,F).

Discussion

In this study,we explored the potential association between the HER2 status of breast cancer and radiomics features extracted from MDCT images.The combined model,which incorporated both handcrafted and deep radiomics features,showed a good performance in evaluating HER2 status;thus,the combined model may be useful as a complementary aid in the radiological evaluation of breast cancer.

HER2-positive status has been shown to be a poor prognostic indicator for breast cancer,but it also indicates that patients may benefit significantly from anti-HER2 targeted therapies(7).Accurately correlating radiological features with the HER2 status in the radiological evaluation of patients with breast cancer is important.Although MDCT is not the primary radiological method for breast cancer evaluation,at Guangdong Provincial People’s Hospital,most patients with breast cancer undergo routine MDCT scans for staging before surgery or neoadjuvant chemotherapy.MDCT is also used for the follow-up of patients with advanced breast cancer.MDCT is advantageous as it has a fast scan time and is capable of multi-planar reconstruction;further,it can be performed when patients are in the supine position(35).It can also simultaneously evaluate the extension of lesions,as well as regional lymph nodes,the skin,chest wall and distant metastasis(20).

A previous study by Tamaki et al. found that MDCT features(tumor shape,enhancement pattern and density)are distinct among breast cancer subtypes(36).In recent years,radiomics has appeared as an emerging field that converts medical imaging into quantitative features,which may improve clinical diagnosis,prognosis and prediction,especially in oncology(12,37).In view of the ability of radiomics to noninvasively decode tumor heterogeneity,current studies of other tumors have shown a potential correlation between tumor genotype and CT-based radiomics characteristics(11,38,39).In this study,we examined if radiomics features based on MDCT images reflect HER2 status in patients with breast cancer.

Many previous studies have tried to investigate the relationship between image characteristics and the HER2-enriched subtype in breast cancer;the most commonly used imaging technique was MRI(14,15),followed by MG(17)and PET/CT(19).Chang et al. have demonstrated that breast MRI features that quantify the tumor heterogeneity could be used to indicate HER2 status with an area under the curve(AUC)of 0.8458(16).Although the AUC was higher than that of our study,it lacked independent validation.Moreover,other studies have found no correlation between the HER2 status and imaging features extracted from MRI(18)and PET/CT(19).As for mammographic features(including tumor size,nonspiculated mass,and calcification)in the study of Nie et al.(40)and MG-based quantitative radiomics features from the studies of Ma et al.(17)and Zhou et al.(41),these studies reported good performances in differentiating the HER2-enriched subtype,with AUCs of 0.75,0.78 and 0.787,respectively.These predictive performances were all slightly poorer than the C-index(which is equal to the AUC)of our study in the validation cohort(C-index:0.809).Additionally,compared with the generally small HER2-positive dataset in most of the above-mentioned studies,our study enrolled 117 HER2-positive patients(117/339)with surgical pathology for analysis.Therefore,the performance of MDCT radiomics features provides a complementary aid for the radiological evaluation of HER2 status in breast cancer.

In recent years,the integration of multiple markers into one model has been shown to be advantageous for the individualized management of patients and has also been shown to outperform the use of individual markers(12,42).To improve the performance of imaging features in our study,we extracted and incorporated handcrafted and deep radiomics features into the predictive model.CNNs are deep learning models which learn increasingly higher level features from input images through a series of successive linear and nonlinear layers(21,43).Compared to conventional handcrafted radiomics features,high-order deep radiomics features could provide further supplementary information to elevate the performance of the model(44,45).In this study,the model that incorporated both handcrafted and deep radiomics signatures outperformed the individual radiomics signatures in HER2 status evaluation,with a good discriminative ability(C-index=0.829 in the primary cohort;C-index=0.809 in the validation cohort).

As a preliminary study,this study had some limitations.First,we examined a relatively small dataset from a single institution.Second,we only used two-dimensional radiomics features of the largest slice for analysis.Third,MDCT is only a single radiological method;further combinations with radiomics findings from multiple radiological methods including MR and MG are expected to improve the radiological evaluation for HER2 status.

Conclusions

This study indicated that handcrafted and deep radiomics features extracted from MDCT images are associated with HER2 status in breast cancer and may provide complementary,noninvasive assistance in the radiological evaluation of the HER2 status in patients with breast cancer.

Acknowledgements

This work was supported by the National Key R&D Program of China(No.2017YFC1309100),the National Science Fund for Distinguished Young Scholars(No.81925023),the National Natural Science Foundation of China(No.81771912,81701662,81701782,81601469,and 81702322)and the Science and Technology Planning Project of Guangdong Province(No.2017B020227012).

Footnote

Conflicts of Interest:The authors have no conflicts of interest to declare.