Zhi-Guo Zhang, Liang Xu, Peng-Jun Zhang, Lei Han
Zhi-Guo Zhang, Lei Han, Department of Oncology, Beijing Daxing District People's Hospital, Beijing 102600, China
Liang Xu, Peng-Jun Zhang, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Interventional Therapy Department, Peking University Cancer Hospital and Institute, Beijing 100142, China
Abstract
Key words: Gastric cancer; Gastric polyp; Serum; Artificial neural network; Detection
According to estimates by the World Health Organization, nearly 7 million people worldwide die from cancer every year, and this number continues to rise. Gastric cancer (GC) is a common malignant tumor that endangers human health and ranks second in cancer-related deaths. In China, GC is one of the most malignant tumors, with high morbidity and mortality[1]. GC deaths account for approximately 25% to 30% of all cancer deaths[2]. The pathogenesis of GC involves physical aging, eating habits and psychological factors[3-5]. The development and progression of GC is a multistage process involving multiple changes at the gene and molecular levels. In the early stage of GC, there are precancerous lesions, most of which remain unchanged and a small proportion of which develop into cancer. The Correa cascade is the most common pattern of GC development[6]. In current clinical practice, the main treatment for GC is surgery. The 5-year survival rate is very low[7]; however, if GC is detected early, the 5-year survival rate can be as high as 90%[8]. In developed countries such as Japan, where the early diagnosis rate of GC has reached 50%, the 5-year survival rate has reached 90%[9]. The early diagnosis and treatment of GC are therefore extremely important for patients with GC.
Currently, many methods for diagnosing GC are used in scientific research and clinical practice[10]. Among these methods, plasma biomarker detection is an important one. The most commonly used tumor markers for early GC detection include carcinoembryonic antigen (CEA), the carbohydrate antigens (CA) CA19-9, CA72-4, CA125, CA24-2 and CA50, pepsinogen and alpha-fetoprotein (AFP)[11]. However, these tumor biomarkers have poor specificity and sensitivity, and thus far, none has been used alone for the diagnosis of GC[11,12]. In early GC, tumor markers such as CEA and CA72-4 are increased in the blood. The levels of these markers have been used as important indexes for GC screening, early diagnosis and prognostic evaluation[13]. However, specific tumor markers have not yet been discovered, and diagnosis based on a single tumor marker has limited significance[14]. The detection rate of GC is still very low.
In this study, to distinguish healthy controls (Ctrls) vs GC and gastric polyp (GP) vs GC, we analyzed routine blood test indexes by using binary logistic regression, discriminant analysis, classification tree and artificial neural network. We aimed to use multiparameter joint analysis to improve diagnostic sensitivity and specificity and to provide a new potential method for the early diagnosis of GC in clinical practice.
The serum samples of the patients involved in this study were obtained from the blood samples of patients admitted to the Beijing Daxing District People's Hospital from April 2016 to April 2019 and confirmed by imaging and pathology. Sample collection and data screening were approved by the Ethics Committee of Beijing Daxing District People's Hospital.
The inclusion criteria for the disease group were complete clinical and pathological data, a clear imaging and pathological diagnosis, and no radiotherapy, chemotherapy or other immunotherapy before surgery. The exclusion criteria for the disease group were major diseases associated with the study, the presence of other types of tumors, or radiotherapy, chemotherapy, or other immunotherapy before surgery. As shown in Table 1, this study included 144 GP and 253 GC patients. A total of 370 healthy controls underwent tumor marker and imaging examinations. They had no diseases associated with this study, and both their tumor marker and imaging results were normal.
All subjects involved in the study provided early morning fasting peripheral blood samples. EDTA was used as an anticoagulant, and after centrifugation at 3500 r/min for 7 min, the serum was collected in a new Eppendorf tube. The serum was then dispensed into 3 tubes, labeled and immediately stored at -80°C. During the collection process, hemolyzed or lipemic serum samples must be removed, and repeated freezing and thawing must be avoided. For testing, the thawed samples are used directly.
Using SPSS 22.0 statistical software, 77 indexes were analyzed for GC vs GP and 64 indexes for GC vs Ctrls. The serum levels of each index were compared between GC and GP and between GC and Ctrls by an independent-samples t test[15]. The diagnostic value was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the cutoff value was determined by the Youden index. Combinations of indexes were analyzed by statistical methods such as binary logistic regression analysis, discriminant analysis, classification tree and artificial neural network[16-21]. P < 0.01 was considered statistically significant.
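The ROC-based evaluation described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: the function names (`roc_points`, `youden_cutoff`, `auc`) and the marker values are hypothetical, and the sketch assumes that higher marker values indicate disease (for a marker that falls in disease, the values could be negated first). The Youden index is J = sensitivity + specificity - 1, and the cutoff is the threshold that maximizes J.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every observed score.

    A sample is called positive when its score is >= the threshold.
    labels: 1 = disease, 0 = control.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Pick the threshold that maximizes J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical marker values for illustration only (not study data).
scores = [0.1, 0.2, 0.3, 0.35, 0.5, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
area = auc(scores, labels)
print(cutoff, sensitivity, specificity, area)
```

On this toy data the threshold 0.4 gives the largest J, because it captures every positive sample while misclassifying only one control.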
There were significant differences in 40 indexes between Ctrls vs GC, and 24 indexes showed no significant difference; 39 indexes of GP vs GC were significantly different, and 38 indexes showed no significant difference. ROC curves were generated for the 40 indexes with significant differences in Ctrls vs GC and the 39 indexes with significant differences in GP vs GC. Among these indexes, the largest AUC in Ctrls vs GC was for ALB, with a value of 0.907. When the ALB cutoff value was 42.05, the sensitivity and specificity were 93.0% and 79.1%, respectively. In GP vs GC, the largest AUC was for D-dimer, with a value of 0.729. When the D-dimer cutoff value was 0.435, the sensitivity and specificity were 55.3% and 81.2%, respectively.
The 27 indexes in Ctrls vs GC and 30 indexes in GP vs GC were used to establish a binary logistic regression analysis model (70% of the data). As shown in Figure 1A, the AUC for Ctrls vs GC was 0.989 (0.982, 0.995). When the cutoff value was 0.675, the sensitivity and specificity were 93.4% and 95.5%, respectively. As shown in Figure 1B, the AUC of GP vs GC was 0.929 (0.901, 0.958), and when the cutoff value was 0.477, the sensitivity and specificity were 85.1% and 87.6%, respectively. Binary logistic regression analysis was significantly better at distinguishing Ctrls vs GC than GP vs GC. The same indexes were used for discriminant analysis. As shown in Figure 1C, the AUC of Ctrls vs GC was 0.971 (0.957, 0.985), and when the cutoff value was 0.470, the sensitivity and specificity were 86.4% and 97.3%, respectively. As shown in Figure 1D, the AUC of GP vs GC was 0.914 (0.882, 0.946), and when the cutoff value was 0.462, the sensitivity and specificity were 78.0% and 92.1%, respectively. Discriminant analysis was also significantly better at distinguishing Ctrls vs GC than GP vs GC.
The 27 indexes in Ctrls vs GC and 30 indexes in GP vs GC were used to establish a classification tree analysis model. As shown in Figure 2A, the AUC of Ctrls vs GC was 0.863 (0.826, 0.900), and when the cutoff value was 0.520, the sensitivity and specificity were 74.0% and 76.3%, respectively. The prediction accuracy rate of the Ctrls was 100%, the prediction accuracy rate of the GC was 48.2%, and the overall prediction accuracy rate was 76.8%. As shown in Figure 2B, the AUC of GP vs GC was 0.739 (0.680, 0.799), and when the cutoff value was 0.290, the sensitivity and specificity were 85.8% and 75.3%, respectively. The prediction accuracy rate of the GP was 62.1%, the prediction accuracy rate of the GC was 67.8%, and the overall prediction accuracy rate was 65.9%. The same indexes were used to establish an artificial neural network model. As shown in Figure 2C, the AUC of Ctrls vs GC was 0.992 (0.980, 1.000). When the cutoff value was 0.837, the sensitivity and specificity were 96.0% and 99.6%, respectively; the prediction accuracy rate of the Ctrls was 97.5%, the prediction accuracy rate of the GC was 84.8%, and the overall prediction accuracy rate was 92.9%. As shown in Figure 2D, the AUC of GP vs GC was 0.969 (0.948, 0.990); when the cutoff value was 0.970, the sensitivity and specificity were 94.9% and 96.0%, respectively. The prediction accuracy rate of the GP was 71.0%, the prediction accuracy rate of the GC was 82.6%, and the overall prediction accuracy rate was 77.9%.
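The cutoff values reported for the regression model can be read as thresholds on a predicted probability. The following is a minimal sketch under that assumption; the intercept, coefficients, marker values and the names `predicted_probability` and `classify` are hypothetical and are not taken from the study. Only the 0.675 cutoff for Ctrls vs GC comes from the text.

```python
import math

# Cutoff reported in the text for the Ctrls vs GC logistic model.
# Assumption: it is applied to the model's predicted probability.
CUTOFF_CTRLS_VS_GC = 0.675


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, cutoff=CUTOFF_CTRLS_VS_GC):
    """Label a sample as GC when its probability reaches the cutoff."""
    return "GC" if probability >= cutoff else "Ctrl"


# Hypothetical two-marker model (illustrative numbers only).
intercept = -1.0
coefficients = [2.0, -0.5]

p_high = predicted_probability(intercept, coefficients, [1.5, 2.0])  # z = 1.0
p_low = predicted_probability(intercept, coefficients, [0.5, 2.0])   # z = -1.0

print(round(p_high, 3), classify(p_high))
print(round(p_low, 3), classify(p_low))
```

Raising the cutoff trades sensitivity for specificity, which is why each model in the text is reported with its own cutoff.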
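The per-group and overall prediction accuracy rates quoted for the classification tree and neural network models are related by simple arithmetic: the overall rate is the number of correctly predicted samples divided by all samples, so it weights each group by its size. A minimal sketch with hypothetical counts (not study data; `accuracy_summary` is an illustrative name):

```python
def accuracy_summary(correct_by_group, total_by_group):
    """Return per-group accuracy rates and the overall accuracy rate."""
    per_group = {
        group: correct_by_group[group] / total_by_group[group]
        for group in total_by_group
    }
    overall = sum(correct_by_group.values()) / sum(total_by_group.values())
    return per_group, overall


# Hypothetical counts for illustration only.
correct = {"Ctrls": 90, "GC": 40}
total = {"Ctrls": 100, "GC": 50}

per_group, overall = accuracy_summary(correct, total)
print(per_group, round(overall, 3))
```

Because the groups differ in size, a model can reach a high overall rate while performing poorly on the smaller group, which is why the per-group rates are reported alongside the overall rate.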
Figure 1 Binary logistic analysis and discriminant analysis results of normal control vs gastric cancer, gastric polyp vs gastric cancer. A: Receiver operating characteristic (ROC) of the binary logistic regression analysis of normal control vs gastric cancer (GC); B: ROC of the binary logistic regression analysis of gastric polyp vs GC; C: ROC of the discriminant analysis of normal control vs GC; D: ROC of the discriminant analysis of gastric polyp vs GC.
Through significance analysis and ROC curve analysis, 27 indexes were finally retained for Ctrls vs GC and 30 indexes for GP vs GC, each with a P value of < 0.01. Among these indexes, ALB had the largest AUC for Ctrls vs GC (0.907), and D-dimer had the largest AUC for GP vs GC (0.729). Pre-ALB levels have been demonstrated to correlate with the outcomes of surgical patients[22,23] and are usually used to assess nutritional status. Many studies have demonstrated that poor postoperative nutritional status in GC may be related to a worse prognosis[24,25]. In our study, we found that ALB was related to the development of GC. D-dimer is a widely used biomarker for evaluating coagulation and fibrinolysis and is involved in the progression of cancers[26]. Plasma D-dimer levels were significantly increased in GC patients with distant metastases, and D-dimer may be a promising biomarker for the detection of GC[27]. In addition, a high plasma D-dimer level may also predict a poor prognosis in gynecological tumors[28].
Figure 2 Artificial neural network analysis and classification tree analysis results of normal control vs gastric cancer, gastric polyp vs gastric cancer. A: Receiver operating characteristic (ROC) of the classification tree analysis of normal control vs gastric cancer (GC); B: ROC of the classification tree analysis of gastric polyp vs GC; C: ROC of the artificial neural network analysis of normal control vs GC; D: ROC of the artificial neural network analysis of gastric polyp vs GC.
With the rapid development of molecular technology, various molecular detection methods have been explored[29-33]. Many statistical methods are currently used in the multi-index joint detection analysis of cancer[15,21,34-36], such as binary logistic regression, discriminant analysis, classification tree and artificial neural network, and have achieved good results[16-20]. For example, an artificial neural network model was applied in lung cancer-assisted diagnosis, and the performance of a back-propagation neural network and a Fisher discriminant model in lung cancer screening was compared using the joint detection of four biomarkers. The results showed that the back-propagation neural network predicted lung cancer better than Fisher discriminant analysis and could provide an excellent and intelligent diagnostic tool for lung cancer[37]. Li et al[38] used binary logistic regression analysis to analyze various cytokines in serum for the early detection of GC. Feng et al[39] used an ANN model built on six serum tumor markers to distinguish lung cancer not only from benign lung diseases and healthy people but also from three common gastrointestinal cancers. These results showed that the artificial neural network model may be an excellent intelligent system for distinguishing lung cancer[39]. Su et al[40] applied a classification decision tree model to distinguish between GC patients and healthy volunteers. The sensitivity in the training set was 95.6%, and the specificity was 92.0%. In the blinded group, this model was able to distinguish GC samples from other samples with a specificity of 88.0%, a sensitivity of 85.3%, and an accuracy of 86.4%. These values were higher than those obtained in a parallel analysis that measured serum CEA and CA19-9 together. Therefore, a serum proteomic model based on decision tree analysis is likely to be useful for the diagnosis of GC[40].
For all four methods (binary logistic regression, discriminant analysis, classification tree analysis and artificial neural network), discrimination of Ctrls vs GC was significantly better than discrimination of GP vs GC. For binary logistic regression, discriminant analysis and artificial neural network analysis, the AUC of the ROC curve and the sensitivity and specificity at the optimal cutoff value were greater than those of the single index with the largest AUC. Therefore, the diagnostic effect of multiparameter joint analysis is significantly better than that of the single-index test. Comparing the four methods, the ability to distinguish Ctrls vs GC and GP vs GC ranked as follows: artificial neural network > binary logistic regression > discriminant analysis > classification tree. However, the results may be affected by the relatively small sample size and the lack of independent validation of the model built in our study. We propose that the artificial neural network analysis method has good prospects for the multi-index joint detection of tumors, and further research in this area should be carried out in the future.
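The ranking of the four methods can be checked against the AUC values reported in the results. The sketch below assumes that AUC is the ranking criterion (the text does not state this explicitly); the AUC values are those reported for Figures 1 and 2, and `rank_methods` is an illustrative helper name.

```python
# AUC values as reported in the results for each comparison and method.
reported_auc = {
    "Ctrls vs GC": {
        "binary logistic regression": 0.989,
        "discriminant analysis": 0.971,
        "classification tree": 0.863,
        "artificial neural network": 0.992,
    },
    "GP vs GC": {
        "binary logistic regression": 0.929,
        "discriminant analysis": 0.914,
        "classification tree": 0.739,
        "artificial neural network": 0.969,
    },
}


def rank_methods(auc_by_method):
    """Sort method names from highest to lowest AUC."""
    return sorted(auc_by_method, key=auc_by_method.get, reverse=True)


rankings = {name: rank_methods(aucs) for name, aucs in reported_auc.items()}
for name, order in rankings.items():
    print(name + ": " + " > ".join(order))
```

Both comparisons give the same order, with the artificial neural network first and the classification tree last.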
Tumor markers are increased in the blood in early gastric cancer (GC). The levels of these markers have been used as important indexes for GC screening, early diagnosis and prognostic evaluation.
Specific tumor markers have not yet been discovered. Diagnosis based on a single tumor marker has limited significance. The detection rate of GC is still very low.
In this study, we aimed to improve the diagnostic value of blood markers for GC.
In this study, to distinguish healthy controls (Ctrls) vs GC and gastric polyp (GP) vs GC, we analyzed routine blood test indexes by using binary logistic regression, discriminant analysis, classification tree and artificial neural network.
In the final analysis, 27 indexes in Ctrls vs GC had P values < 0.01; among them, albumin had the largest area under the curve (AUC), at 0.907. In GP vs GC, 30 indexes had P values < 0.01; among them, D-dimer had the largest AUC, at 0.729. The 27 indexes in Ctrls vs GC and 30 indexes in GP vs GC were used to build binary logistic regression, discriminant analysis, classification tree and artificial neural network models. For the artificial neural network model distinguishing Ctrls vs GC, the overall prediction accuracy was 92.9%, and the AUC was 0.992 (0.980, 1.000).
The diagnostic effect of multiparameter joint artificial neural network analysis is significantly better than that of the single-index test, and it may provide an auxiliary method for the detection of GC.
We propose that the artificial neural network analysis method has good prospects for the multi-index joint detection of tumors, and further research in this area should be carried out in the future.
World Journal of Gastrointestinal Oncology, 2020, Issue 4