Xin Wei, Xue-Jiao Yan, Yu-Yan Guo, Jie Zhang, Guo-Rong Wang, Arsalan Fayyaz, Jiao Yu
Abstract
Key Words: Undifferentiated early gastric cancer; Machine learning; Lymph node metastasis; Gray-level co-occurrence matrix; Feature selection; Prediction
Gastric cancer (GC) is one of the most common and fatal malignancies worldwide and an important contributor to the global cancer burden[1,2]. Undifferentiated early GC (UEGC) differs from differentiated-type GC in clinical features and disease state, and the two differ in treatment and prognosis[3]. Therefore, UEGC should be identified and diagnosed early.
The incidence of lymphatic vessel invasion and the risk of lymph node metastasis (LNM) in UEGC are high in surgical specimens of GC[4,5]. Endoscopic resection (ER), including endoscopic mucosal resection (EMR) and endoscopic submucosal dissection (ESD), has been considered a minimally invasive treatment option for early GC with a negligible risk of LNM[6,7]. Nevertheless, indications and curability criteria for ESD of undifferentiated GC (e.g., poorly differentiated adenocarcinoma, signet ring cell carcinoma, or mucinous adenocarcinoma) have not been established because of the potential risk of LNM. Although ER is a less invasive treatment, the incidence of LNM after non-curative ER ranges from 5.1% to 12.2%[8-10]. Additionally, ESD is applicable only to intramucosal cancer with a tumor diameter of ≤ 20 mm and without ulceration, so lesions that meet these ESD indications do not need to be treated surgically[11,12]. Conversely, when resection beyond the expanded criteria is considered non-curative, the potential risk of LNM cannot be ignored, and additional surgical resection with lymph node dissection should be performed. Unlike those for differentiated early GC, ER indications for UEGC are limited. Therefore, a precise tool for predicting LNM is needed to address this challenging problem.
Previous studies have mainly focused on risk factors for LNM or distant metastasis of differentiated-type early GC[13-15]. However, the risk factors for LNM differ in UEGC, and objective, universal indicators for evaluating this risk are lacking. In this study, we clarified the LNM risk factors in patients with UEGC who underwent surgical resection. We then combined clinicopathological factors with gray-level co-occurrence matrix (GLCM)-based image feature extraction to classify patients into LNM risk groups according to combinations of risk factors. This study aims to provide a reference for clinical diagnosis and treatment.
We retrospectively reviewed the clinical records of 526 patients with UEGC confirmed by pathological examination after radical gastrectomy without endoscopic treatment. All patients were treated between January 2015 and December 2021 at four tertiary hospitals: Shaanxi Provincial People’s Hospital, Shaanxi Provincial Tumour Hospital, the First Affiliated Hospital of Xi’an Jiaotong University, and the Second Affiliated Hospital of Xi’an Jiaotong University. The inclusion criteria were as follows: (1) imaging examination was performed; (2) complete medical data were available; (3) the primary lesion was resected via open or laparoscopic surgery and not via EMR or ESD; and (4) lymph node infiltration status was assessed by routine hematoxylin-eosin staining. To minimize the confounding effect of extraneous variables, the exclusion criteria were as follows: (1) insufficient extractable information or mismatched clinical data; and (2) no complete plain magnetic resonance imaging (MRI) scan, or unacceptable MRI image quality. This study complies with the Declaration of Helsinki (revised in 2013) and was approved by the Institutional Review Committee of Shaanxi Provincial People’s Hospital (2021-Y024). Figure 1 details the patient screening steps and modeling process.
All texture parameter post-processing was performed with Omni dynamics software (GE pharmaceuticals, Shanghai). Two radiologists with extensive experience in gastrointestinal imaging delineated the lesions on the apparent diffusion coefficient (ADC) map with reference to the MRI images. First, they manually outlined the entire tumor area on each slice of the map, avoiding intestinal gas, until the whole tumor volume was segmented. Second, the software automatically generated the texture features. The following GLCM texture parameters were selected in this study: total frequency, energy value, entropy, inertia value, correlation coefficient, inverse moment, cluster shadow, and cluster prominence.
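The feature definitions above can be illustrated with a minimal, pure-Python sketch. It assumes that the delineated ADC region has already been quantized to integer gray levels 0 to levels - 1. It uses common Haralick-style formulas: inertia as contrast, inverse moment as the inverse difference moment, and cluster shadow as the usual cluster shade. The helper names `glcm` and `texture_features`, the symmetric counting, and the offset convention ((dx, dy) = (1, 0) for 0°, (1, -1) for 45°, (0, -1) for 90°) are assumptions made for illustration. The Omni dynamics software may define, normalize, or average these features differently, and total frequency is not computed here.

```python
from math import log2, sqrt


def glcm(image, dx, dy, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    image: 2D list of integer gray levels in [0, levels).
    (dx, dy): column/row offset, e.g. (1, 0) for 0 degrees, (1, -1) for 45 degrees.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # pair (reference pixel, neighbour pixel)
                counts[j][i] += 1  # and its mirror, so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Haralick-style features of a normalized GLCM p (assumed definitions)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * log2(v) for _, _, v in cells if v > 0),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),  # a.k.a. contrast
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else 0.0,
        "inverse_moment": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # "cluster shadow" in the text; usually called cluster shade
        "cluster_shade": sum((i + j - mu_i - mu_j) ** 3 * v for i, j, v in cells),
        "cluster_prominence": sum((i + j - mu_i - mu_j) ** 4 * v for i, j, v in cells),
    }
```

For a hypothetical region `roi` quantized to 16 gray levels, `texture_features(glcm(roi, 1, 0, 16))` would return the 0° features. Angle-specific values, such as an IV_45-like inertia, come from changing the offset.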
For variables with missing values (typically < 10% missing), the missing entries were filled with the variable’s mean value. Variables with ≥ 10% missing values were excluded from the variable screening for the final model. Specifically, this study adopted univariate feature imputation for missing values that met the imputation requirements. That is, missing values can be imputed with a provided constant value or with a statistic (e.g., the mean, median, or most frequent value) of the column in which they are located[16,17].
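A minimal sketch of the missing-data rule described above, in plain Python. The function name `impute_or_drop`, the use of `None` to mark missing entries, and the column-wise dictionary layout are illustrative assumptions. The available fill strategies mirror those listed in the text (mean, median, or most frequent value). This is not the authors' code.

```python
from statistics import mean, median, mode

# Column statistics used to fill gaps, as listed in the text.
STRATEGIES = {"mean": mean, "median": median, "most_frequent": mode}


def impute_or_drop(columns, max_missing=0.10, strategy="mean"):
    """Fill missing values (None) column by column; drop columns with too many gaps.

    columns: dict mapping variable name -> list of values, None marking a missing value.
    A variable whose missing fraction is >= max_missing is excluded entirely.
    """
    fill_stat = STRATEGIES[strategy]
    kept = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(observed)) / len(values)
        if missing_fraction >= max_missing:
            continue  # >= 10% missing: excluded from variable screening
        fill = fill_stat(observed)
        kept[name] = [fill if v is None else v for v in values]
    return kept
```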
The following commonly used machine learning (ML) algorithms were included: random forest classifier (RFC), decision tree (DT), support vector machine (SVM), eXtreme gradient boosting (XGBoost), and artificial neural network (ANN). The RFC is an ensemble method that combines multiple relatively simple estimators to produce a cumulative effect; specifically, a random forest is an ensemble of DTs. The SVM is a generalized linear classifier that performs binary classification of data through supervised learning. The ANN is a nonlinear model comprising input, hidden, and output layers. Finally, XGBoost is an additive model in which each iteration optimizes only the sub-model added at the current step. In this study, we followed the guidelines proposed by Luo et al[18] for developing and reporting prediction models in biomedical research, whose list of reporting items was generated using the Delphi method.
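The additive principle attributed to XGBoost above, in which each iteration fits only the newest sub-model while earlier ones stay fixed, can be sketched with plain gradient boosting of one-split regression trees under squared loss. This is a simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradient information, regularization, and deeper trees. The names `fit_stump` and `boost` and the default settings are hypothetical.

```python
def fit_stump(xs, residuals):
    """Regression stump: one threshold on x, predicting the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant
        c = sum(residuals) / len(residuals)
        return lambda x: c
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Stagewise additive model F(x) = base + lr * sum_m h_m(x) under squared loss."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    for _ in range(n_rounds):
        # residuals = negative gradient of squared loss at the current model
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        # only the new stump is fitted; earlier stumps stay frozen
        stumps.append(fit_stump(xs, residuals))
    return predict
```

The learning rate shrinks each new sub-model's contribution, so many small additive steps gradually move the prediction toward the labels.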
To screen candidate variables, we relied mainly on bagging (repeated sampling with replacement), ranked the variables by weight, and selected the final predictors of the prediction model from the top 10 variables[19]. To evaluate model performance, the receiver operating characteristic (ROC) curve was used to assess the accuracy of the model. Decision curve analysis (DCA) and the clinical impact curve (CIC) were used to evaluate the model’s robustness and discrimination, respectively.
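The ROC-based evaluation can be made concrete. The AUC equals the probability that a randomly chosen LNM-positive patient receives a higher predicted score than a randomly chosen LNM-negative one (the Mann-Whitney formulation). Resampling with replacement, as in the variable screening described above, also yields a percentile confidence interval. The sketch assumes 0/1 labels. The names `auc` and `bootstrap_auc_ci`, the 1,000 resamples, and the seed are illustrative choices, not the study's settings. A percentile interval built this way always lies within [0, 1].

```python
import random


def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```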
Figure 1 Flowchart of patient selection and data processing. UEGC: Undifferentiated early gastric cancer; EMR: Endoscopic mucosal resection; ESD: Endoscopic submucosal dissection; RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGBoost: eXtreme gradient boosting; ROC: Receiver operating characteristic; DCA: Decision curve analysis; CIC: Clinical impact curve; LNM: Lymph node metastasis.
Continuous data are expressed as median (interquartile range, 25th-75th percentile), and categorical data as percentages (%). For between-group comparisons, continuous variables were compared using the independent-samples t test, or the Mann-Whitney U test when the data did not follow a normal distribution. Categorical data were compared using the chi-square test. Bonferroni-corrected P values were used for multiple comparisons of qualitative data[20]. Model visualization and other data analyses were performed using R software (version 4.0.4, http://www.r-project.org/). For between-group comparisons, P < 0.05 was considered statistically significant.
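As an illustration of the Bonferroni step, the adjustment multiplies each raw P value by the number of comparisons m (capping at 1). This is equivalent to testing each raw P value against 0.05/m. The sketch below uses a hypothetical helper `bonferroni`. In R, which the study used, `p.adjust(p, method = "bonferroni")` performs the same adjustment.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment: scale each P value by the number of comparisons, cap at 1."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    # significance judged on the adjusted values at the usual alpha
    return adjusted, [p_adj < alpha for p_adj in adjusted]
```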
Table 1 summarizes the baseline characteristics of the 526 hospitalized patients with UEGC. For internal validation, the patients were randomly divided into a training set (n = 368, 70%) and a validation set (n = 158, 30%) using the caret package. LNM occurred in 62 (16.85%) patients in the training cohort and 29 (18.35%) in the validation cohort. Previously reported clinical indicators (e.g., tumor size, infiltration depth, vascular invasion, and vascular tumor thrombus) differed significantly between the LNM and non-LNM groups. GLCM-based texture features also differed significantly between the two groups.
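The text states only that the split was random and was performed with the caret package. caret's `createDataPartition` samples within outcome classes when the outcome is a factor, so one plausible reading is a stratified 70/30 split, sketched below in Python. The helper `stratified_split` and the seed are hypothetical, and this is not a reproduction of the study's partition.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=2022):
    """Split indices into training/validation sets, sampling within each outcome class."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(train_frac * len(idx))  # per-class training count
        train.extend(idx[:k])
        valid.extend(idx[k:])
    return sorted(train), sorted(valid)
```

With 91 LNM-positive and 435 LNM-negative labels (totals derived from the counts reported above), this sketch yields 368 training and 158 validation indices.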
Based on the baseline comparisons, we performed a correlation analysis on the variables that differed significantly between groups. As shown in Figure 2A, the correlation matrix (based on Pearson correlation analysis) indicates that several GLCM characteristic variables were strongly correlated with LNM (r > 0.6). For example, Entropy, Haralick full angle (Haralick_all), Haralick 30° (Haralick_30), inverse gap full angle (IG_all), inverse gap 45° (IG_45), and inverse gap 0° (IG_0) were highly correlated with LNM. This suggests that these candidate variables could serve as LNM predictors in the subsequent models. Interestingly, in the models subsequently developed with ML algorithms, Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and inertia value 45° (IV_45) carried high weights as the top 7 GLCM-based factors (Figure 2B), with Entropy having the largest weight.
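Pearson correlation between a continuous texture feature and a 0/1 LNM indicator reduces to the point-biserial correlation. The sketch below computes r and keeps the features that exceed the 0.6 cut-off. Two points are assumptions: the cut-off is treated as |r| > 0.6, so that strongly negative correlations also qualify, and the names `pearson_r` and `strongly_correlated` are hypothetical. Constant features make the denominator zero and should be removed first.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongly_correlated(features, outcome, threshold=0.6):
    """Names of features whose |r| with the 0/1 outcome exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, outcome)) > threshold]
```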
Table 1 Patient baseline characteristics and image indices
| Variable (median, IQR) | Training set: Overall (n = 368) | Training set: LNM (n = 62) | Training set: Non-LNM (n = 306) | P value | Validation set: Overall (n = 158) | Validation set: LNM (n = 29) | Validation set: Non-LNM (n = 129) | P value |
|---|---|---|---|---|---|---|---|---|
| IV_0 | 149.85 (122.75, 186.75) | 96.40 (78.95, 125.82) | 163.20 (134.00, 195.65) | < 0.001 | 146.70 (112.78, 185.57) | 74.10 (65.60, 90.60) | 158.40 (131.40, 196.20) | < 0.001 |
| IV_45 | 239.55 (201.40, 284.75) | 164.40 (123.83, 188.62) | 254.30 (220.67, 290.60) | < 0.001 | 226.25 (201.25, 266.67) | 157.40 (133.90, 193.80) | 243.90 (214.30, 273.50) | < 0.001 |
| IV_90 | 129.00 (103.00, 154.00) | 101.00 (77.75, 119.00) | 134.00 (109.25, 159.00) | < 0.001 | 124.50 (109.00, 150.75) | 105.00 (77.00, 118.00) | 133.00 (117.00, 156.00) | < 0.001 |
| Haralick_all | 0.10 (0.09, 0.10) | 0.12 (0.11, 0.13) | 0.09 (0.09, 0.10) | < 0.001 | 0.10 (0.09, 0.10) | 0.12 (0.12, 0.14) | 0.09 (0.09, 0.10) | < 0.001 |
| Haralick_30 | 0.10 (0.09, 0.11) | 0.14 (0.12, 0.15) | 0.10 (0.09, 0.11) | < 0.001 | 0.10 (0.09, 0.11) | 0.14 (0.13, 0.15) | 0.10 (0.09, 0.11) | < 0.001 |
| Haralick_45 | 0.09 (0.08, 0.10) | 0.11 (0.10, 0.12) | 0.09 (0.08, 0.10) | < 0.001 | 0.09 (0.08, 0.10) | 0.11 (0.10, 0.13) | 0.09 (0.08, 0.10) | < 0.001 |
| Haralick_90 | 0.11 (0.10, 0.13) | 0.14 (0.12, 0.16) | 0.11 (0.09, 0.12) | < 0.001 | 0.12 (0.10, 0.13) | 0.15 (0.12, 0.16) | 0.11 (0.09, 0.13) | < 0.001 |
| CSV | 106.00 (102.00, 111.00) | 108.00 (105.00, 111.00) | 106.00 (101.00, 111.00) | 0.001 | 107.00 (102.25, 111.00) | 109.00 (105.00, 113.00) | 107.00 (102.00, 111.00) | 0.007 |
| CP | 65.50 (60.00, 70.00) | 68.00 (66.00, 71.00) | 64.00 (59.00, 70.00) | < 0.001 | 64.00 (60.00, 68.00) | 67.00 (64.00, 68.00) | 63.00 (59.00, 68.00) | 0.002 |

IQR: Interquartile range; LNM: Lymph node metastasis; TF: Total frequency; EV: Energy value; IV_0: Inertia value 0°; IV_45: Inertia value 45°; IV_90: Inertia value 90°; IG_0: Inverse gap 0°; IG_45: Inverse gap 45°; IG_90: Inverse gap 90°; IG_all: Inverse gap full angle; Haralick_30: Haralick 30°; Haralick_45: Haralick 45°; Haralick_90: Haralick 90°; Haralick_all: Haralick full angle; CSV: Cluster shadow value; CP: Cluster prominence.
When constructing the RFC model [training set: area under the ROC curve (AUC): 0.925, 95% confidence interval (CI): 0.378-1.472; testing set: AUC: 0.912, 95%CI: 0.355-1.469], we repeatedly drew N samples at random with replacement from the original training set of N samples to generate a new training set for each DT. This procedure was repeated to generate M DTs, which together form the random forest. As shown in Figure 3A and Supplementary Table 1, the split yielding the smallest Gini index was selected at each node. The variables selected in this way included Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45. Similarly, Haralick_30 and IG_all carried important weights at the DT branches (training set: AUC: 0.856, 95%CI: 0.309-1.403; testing set: AUC: 0.813, 95%CI: 0.256-1.370) (Figure 3B). In the ANN model (Figure 4), the accuracy of the prediction model built with the GLCM predictors reached 0.887 (95%CI: 0.340-1.434) in the training set and 0.837 (95%CI: 0.280-1.394) in the validation set. This accuracy was slightly inferior to that of the RFC model but better than those of the other prediction models (i.e., DT, XGBoost, and SVM). Table 2, Supplementary Table 1, and Figure 5 summarize the predictive performance of the ML-based models. Overall, every ML-based prediction model outperformed logistic regression in predicting LNM. This further supports the superiority of the ML algorithms, especially the robustness of the RFC.
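Two building blocks of the random forest procedure described above can be sketched as follows: the Gini criterion for choosing a split, and the bootstrap draw of N samples with replacement for each tree. The sketch handles a single feature and a 0/1 LNM label. A full random forest also samples candidate features at each split and grows deeper trees. The names `gini`, `best_split`, and `bootstrap_sample` are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p1^2 - p0^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of LNM-positive (label 1)
    return 1 - p ** 2 - (1 - p) ** 2


def best_split(xs, ys):
    """Threshold on one feature that minimizes the size-weighted Gini index of the children.

    Returns (threshold, weighted_gini); threshold is None if no split improves on the parent.
    """
    best_t, best_g = None, gini(ys)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement: the training set for one tree."""
    return [rng.choice(rows) for _ in rows]
```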
The RFC model showed the best predictive efficiency for the precise stratification of patients with LNM. To further evaluate the “stratification effect” of the RFC, we performed CIC analysis. The results indicated that the RFC model accurately distinguished patients at high risk of LNM and that no “cross-linking” occurred in the stratification process. The results of this model were consistent between the training and validation sets (Supplementary Table 2), implying that the robustness and LNM discrimination of the RFC model were satisfactory.
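Decision curve and clinical impact analyses, described in the methods and used above, rest on simple counts at a risk threshold p_t. Net benefit is TP/n - (FP/n) × p_t/(1 - p_t). A clinical impact curve plots, per 1,000 patients, how many are classified as high risk and how many of those actually have the event. The minimal sketch below assumes predicted probabilities and 0/1 LNM labels. `net_benefit` and `clinical_impact` are hypothetical helper names, not functions from any R package the authors may have used.

```python
def net_benefit(probs, labels, threshold):
    """Decision-curve net benefit at one risk threshold p_t."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # false positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)


def clinical_impact(probs, labels, threshold, population=1000):
    """(number classified high risk, number high risk with the event), per `population`."""
    n = len(labels)
    high_risk = sum(1 for p in probs if p >= threshold)
    high_risk_events = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    return population * high_risk / n, population * high_risk_events / n
```

Evaluating both functions over a grid of thresholds gives the curves; the gap between the two clinical-impact counts is the number of false positives at each threshold.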
Figure 2 Variable screening and weight allocation. A: Correlation matrix analysis of candidate features; B: Weight distribution of candidate variables for each ML-based model. RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGBoost: eXtreme gradient boosting.
Figure 3 Visualization of prediction models based on machine learning algorithms. A: Random forest classifier model; B: Decision tree model. Candidate factors associated with lymph node metastasis risk are ranked by the random forest classifier algorithm, and prediction nodes and weights are assigned by the decision tree algorithm.
The standard treatment for early GC is surgery. Recently, however, ER has become the standard local treatment for some patients with early GC without LNM[21]. For a long time, it has been used to treat differentiated-type early GC confined to the mucosa with a diameter of < 2 cm[22,23]. Recent studies have expanded ER indications to include UEGC ≤ 2 cm in diameter without ulceration or lymphatic vessel invasion[24]. However, whether ER is an acceptable standard treatment for UEGC remains debated, and additional surgery should be performed if curability is questionable. Given this situation, the risk factors for LNM or distant metastasis and for mortality after non-curative ER of UEGC should be investigated. Previous studies have also shown that patients with UEGC who have two or more risk factors (e.g., ulcer, submucosal invasion, and positive vertical margin) after non-curative ER benefit greatly from additional surgical resection[14,25]. However, owing to the heterogeneity of clinical characteristics, risk stratification based on these factors alone provides only a simple prediction that is difficult to apply in clinical practice.
The potential application of the GLCM in predicting LNM of UEGC has not yet been systematically explored. In this study, GLCM-based features were extracted from the underlying grayscale MRI images, and an ML-based LNM risk prediction model was developed for patients with UEGC. Our study has two important findings. First, the GLCM adds value to the accurate risk stratification of patients with UEGC who should undergo additional surgery. Second, a new ML-based prediction model was used to identify whether patients have LNM. According to previous studies[26], texture analysis can quantify the spatial distribution of pixels and the subtle differences reflected in gray values, which is consistent with the conclusions of this study. To some extent, we used GLCM features to capture spatial information and reduced overfitting by replacing the softmax layer with the ML-based algorithm.
Figure 4 Visualization of prediction models based on the artificial neural network algorithm. A: Artificial neural network model; B: Importance of variables using connection weights. Candidate factors associated with lymph node metastasis are ranked, and prediction nodes and weights are assigned, by the artificial neural network (ANN) algorithm. IV_0: Inertia value 0°; IV_45: Inertia value 45°; IG_0: Inverse gap 0°; IG_45: Inverse gap 45°; IG_all: Inverse gap full angle; Haralick_30: Haralick 30°; Haralick_all: Haralick full angle.
In this study, we created five types of ML-based models (i.e., RFC, ANN, DT, XGBoost, and SVM) that used GLCM features to predict LNM. Interestingly, predictive efficiency differed among the models built with different algorithms. For example, the RFC model, which incorporated Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45, had the highest predictive accuracy, whereas the ANN, DT, XGBoost, and SVM models performed worse. This suggests that the RFC predicts LNM more accurately than the other ML models. A previous study indicated that random forest algorithms are more efficient at solving classification problems, which is consistent with the results of this study[27]. Meanwhile, DT did not fit the data as well as the RFC, and the lower predictive ability of the ANN model suggests that “overfitting” may have occurred. In general, the different ML models showed broadly consistent accuracy, indicating that the prediction performance of ML can be improved through data processing.
Table 2 Receiver operating characteristic curve analysis of lymph node metastasis in each ML-based model
Figure 5 Predictive performance of candidate models based on machine learning algorithms. A: Decision curve analysis (DCA) for five ML-based models in the training sets; B: DCA for five ML-based models in the test sets. RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGBoost: eXtreme gradient boosting.
Our results support a GLCM-based LNM classification with a favorable predictive effect for the diagnosis and treatment of patients with UEGC. However, this study has several limitations. First, because this was a retrospective analysis, the case inclusion criteria may have biased the results, which should be confirmed by large prospective studies in the future. Second, relatively few cases were included, and only some GLCM parameters were extracted; thus, the prediction model should be verified with external data. Third, the model's ability to predict the presence or absence of LNM should be re-evaluated when data from large multicenter studies become available. Additionally, the GLCM is an important texture analysis method for UEGC imaging, and we will perform other image texture analyses in subsequent research.
GLCM-based feature extraction could serve as a robust and promising tool to improve the predictive efficiency for LNM in individual patients with UEGC. ML adopts the algorithm of “classification and pruning” together with clearer feature extraction, leading to better data fitting than conventional prediction models. The RFC model had the highest predictive accuracy, and its most important predictors were Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45. In the future, these prediction models should be validated and optimized with datasets from various settings so that they can be better applied in clinical practice.
The authors thank all study participants for consenting to the use of their medical records.
Author contributions: Yu J and Wei X conceived and designed the study and wrote the manuscript; Yan XJ, Guo YY, Zhang J, Wang GR, and Arsalan F collected the data, performed the data analysis, and interpreted the outcomes; and all authors critically reviewed the content of the manuscript and helped with the drafts.
Supported by the General Project-Social Development Field of Shaanxi Province Science and Technology Department, No. 2021SF-313; and Innovation Capability Support Plan of Shaanxi Science and Technology Department - Science and Technology Innovation Team, No. 2020TD-048.
Institutional review board statement: This study was approved by the Institutional Review Committee of Shaanxi Provincial People’s Hospital (2021-Y024).
Informed consent statement: Written informed consent was not required given the retrospective nature of the study from chart review.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
STROBE statement: The authors have read the STROBE Statement checklist of items, and the manuscript was prepared and revised according to it.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is noncommercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Country/Territory of origin: China
ORCID number: Jiao Yu 0000-0002-8707-8606.
S-Editor: Wang JJ
L-Editor: A
P-Editor: Wang JJ
World Journal of Gastroenterology, 2022, Issue 36