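ZANG Weidong
Abstract Objective: To identify key genes related to self-renewal through bioinformatics analysis of breast cancer mammosphere samples, which have self-renewal capacity, and thereby provide a theoretical basis for breast cancer treatment. Methods: Differentially expressed genes (DEGs) were first obtained by comparing mRNA microarray expression data from primary breast cancer samples (BC) and breast cancer mammosphere samples (MS). A protein-protein interaction (PPI) network of the DEGs was then constructed, a highly connected subnetwork was extracted from it, and functional enrichment analysis was performed on the subnetwork. Results: A total of 1 083 DEGs were identified between the MS and BC samples. From the PPI network built on these DEGs, a highly connected subnetwork containing 49 DEGs was obtained, with tspo, igf1, fn1 and cdk1 as its core genes. Conclusion: These core genes may be self-renewal-related genes in breast cancer cells.
Keywords breast cancer; mammosphere; self-renewal; differentially expressed genes; protein-protein interaction network
CLC number: R737.9 Document code: A Article ID: 1006-1533(2018)01-0076-05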
Analysis of critical genes related to self-renewal in the mammosphere model of breast cancer by bioinformatics
ZANG Weidong*
(Shanghai Fengheng Biotechnology Co., Ltd., Shanghai 200240, China)
ABSTRACT Objective: To explore the key genes related to self-renewal in breast cancer by bioinformatics, which may provide a theoretical basis for the treatment of breast cancer. Methods: The mRNA microarray data from breast cancer (BC) samples and mammosphere samples (MS) were compared to obtain differentially expressed genes (DEGs). A protein-protein interaction (PPI) network of the DEGs was constructed, a highly connected subnetwork was screened out, and functional enrichment analysis was then performed on the subnetwork. Results: There were 1 083 DEGs between the MS and BC samples. A highly connected subnetwork containing 49 DEGs was obtained from the PPI network constructed from these DEGs. Notably, tspo, igf1, fn1 and cdk1 were identified as the core genes of the subnetwork. Conclusion: These core genes may be associated with self-renewal in breast cancer cells.
KEY WORDS breast cancer; mammosphere; self-renewal; differentially expressed genes; protein-protein interaction network
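Breast cancer (BC) is a malignant tumor arising in the glandular epithelial tissue of the breast. It occurs mostly in women, with men accounting for only 1% of cases, and there are about 1 million new cases and 400 000 deaths worldwide each year[1]. The breast is not an organ essential for sustaining life, so primary breast cancer is not in itself fatal; once the cancer cells metastasize, however, the disease becomes life-threatening. Some subpopulations of breast cancer cells (such as CD44+/CD24-/low cells) can resist treatment and cause the cancer to recur[2]. CD44+/CD24-/low cells can be isolated from breast cancer tissue and cultured in vitro as mammosphere samples (MS), which have self-renewal capacity[3]. In addition, MS culture provides a highly suitable model for further characterizing the tumor-initiating subpopulation of BC cells[4]. Creighton et al.[5] analyzed microarray expression profiles of primary breast cancer samples and breast cancer mammosphere samples and found that the CD44+/CD24-/low cells remaining after conventional treatment showed a high-expression signature in MS samples. Creighton et al.[5] suggested that target proteins related to epithelial-mesenchymal transition (EMT) might be used to treat cancer cells and suppress BC recurrence, but the target genes or proteins capable of suppressing BC recurrence were seldom discussed in their study. Here, we apply bioinformatics analysis to the microarray data of Creighton et al. in an attempt to identify key genes related to treatment resistance and recurrence of cancer cells, providing a theoretical basis for breast cancer research.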
1 Materials and methods
1.1 Acquisition of expression profile data
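The GSE7515 microarray expression dataset[5] was downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). The dataset contains 26 samples: 11 primary breast cancer samples and 15 breast cancer mammosphere samples. The arrays were run on the Affymetrix Human Genome U133 Plus 2.0 Array platform. The mRNA expression data of all samples were preprocessed with the GCRMA method[6] in the Affy package, and Probe IDs were then converted to Gene Symbols, yielding an expression matrix indexed by Gene Symbol with a total of 19 851 Gene Symbols.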
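The conversion from Probe IDs to Gene Symbols can be illustrated with a minimal Python sketch. The text does not state how multiple probes mapping to one Gene Symbol were handled, so averaging them is an assumption made here for illustration only; the function name, the probe IDs and the expression values are likewise hypothetical and are not taken from GSE7515.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(expr, probe_to_symbol):
    """Collapse a probe-level matrix to a gene-level matrix.

    expr: dict mapping probe ID -> list of per-sample expression values
    probe_to_symbol: dict mapping probe ID -> Gene Symbol
    Probes without a symbol are dropped; probes sharing a symbol are
    averaged sample by sample (an assumed rule, not the paper's stated one).
    """
    grouped = defaultdict(list)
    for probe_id, values in expr.items():
        symbol = probe_to_symbol.get(probe_id)
        if not symbol:
            continue  # unannotated probe: no Gene Symbol to assign
        grouped[symbol].append(values)
    # zip(*rows) transposes the rows so each column is one sample
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy input with two samples; all IDs and values are made up.
toy_expr = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 6.0],
    "probe_c": [1.0, 1.5],
    "probe_d": [9.0, 9.0],  # has no annotation below, so it is dropped
}
toy_map = {"probe_a": "CDK1", "probe_b": "CDK1", "probe_c": "FN1"}

gene_matrix = collapse_probes_to_genes(toy_expr, toy_map)
```

Other collapsing rules, such as keeping the probe with the highest mean expression, are also common; whichever rule is used determines the number of rows in the final Gene Symbol matrix.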