Comparative study of microarray and experimental data on Schwann cells in peripheral nerve degeneration and regeneration: big data analysis

2019-03-15 05:50

Ulfuara Shefa, Junyang Jung

1 Department of Biomedical Science, Graduate School, Kyung Hee University, Dongdaemun-gu, Seoul, Republic of Korea

2 Department of Anatomy and Neurobiology, College of Medicine, Kyung Hee University, Dongdaemun-gu, Seoul, Republic of Korea

Abstract Schwann cells have regenerative capabilities and are important cells in the peripheral nervous system. This microarray study is part of a bioinformatics study that focuses mainly on Schwann cells. Microarray data provide information on differences between microarray-based and experiment-based gene expression analyses. According to microarray data, several genes exhibit increased expression (fold change) but are weakly expressed in experimental studies (based on morphology, protein, and mRNA levels). In contrast, some genes are weakly expressed in microarray data and highly expressed in experimental studies; such genes may represent future target genes in Schwann cell studies. These studies allow us to learn about additional genes that could be used to achieve targeted results from experimental studies. In the current big data study, more than 5000 scientific articles were retrieved from PubMed or NCBI, Google Scholar, and Google, and 1016 (up- and downregulated) genes were determined to be related to Schwann cells. However, no experiment was performed in the laboratory; rather, the present study is part of a big data analysis. Our study will contribute to our understanding of Schwann cell biology by aiding in the identification of genes. Based on a comparative analysis of all microarray data, we conclude that the microarray could be a good tool for predicting the expression and intensity of different genes of interest in actual experiments.

Key Words: Schwann cells; big data analysis; peripheral nerve degeneration; peripheral nerve regeneration; microarray; matched genes; promising genes; gene ranking

Introduction

Although neuroscience aims to organize its own big data sets, there must be ways to standardize, integrate, and synthesize various types of data from different levels to exploit the full potential of such information (Sejnowski et al., 2014). Schwann cells in the adult peripheral nervous system (PNS) have the ability to dedifferentiate into immature states after nerve injury, and their plasticity is essential for the regeneration of injured peripheral nerves. Schwann cells also play essential roles in the PNS, including the phagocytosis of debris, the facilitation of regeneration via their secretions, and the remyelination of regenerating axons (Whitehead et al., 2018).

Microarrays are capable of simultaneously monitoring the expression levels of thousands of genes. Microarray experiments are also useful for identifying differentially expressed genes in a highly sequential manner (Sturn et al., 2002). Most microarray studies aim to identify differentially expressed genes and analyze the interactions between individual genes in pathways and networks to indirectly reveal the phenotype of these genes (Ham et al., 2018). Microarray technology offers a great way to take full advantage of the tremendous prospects of genomic data. Microarrays play an essential role in overcoming obstacles to target identification and drug discovery and development (Barrett and Kawasaki, 2003). Studies that would have been performed on a small number of hybridizations can now include tens or hundreds of assays. The challenge is also moving from generating, collecting, managing, and analyzing data to identifying statistically and biologically significant patterns of gene expression (Dudoit et al., 2003).

In a previous study, genomic microarrays were used to analyze gene expression, serving as an essential tool for mapping recessive diseases believed to be the result of alterations at the level of gene expression (Pollack et al., 1999). The successful application of microarray technology in the field of neuroscience provides both a molecular approach to studying systems neurobiology and insights into diverse areas of investigation, ranging from fundamental questions of developmental neurobiology to issues related to neurological and psychological disorders (Nisenbaum, 2002). This information on gene expression can also be used in medicine to compare clinically analogous groups, such as healthy versus diseased groups, revealing a new subclass of significant outcomes (e.g., response to therapy and survival) (Tarca et al., 2006).

The current study examined a number of genes that are part of a big data analysis. Big data methods are used to investigate how decision-making could depend on future-sightedness. Big data methods can also be applied to naturalistic data to reveal underlying psychological properties as well as processes (Thorstad and Wolff, 2018). The concept of “big data,” which is already in use in physics, astronomy, and genomics, has been introduced to the field of neuroscience. Despite its disadvantages, it offers a deeper understanding as well as new insights (Sejnowski et al., 2014).

This study aimed to determine whether the matching and mismatching of genes between real experiments and microarray data vary across studies, as this could imply differential gene expression. For example, a previous study showed that tenascin C (tnc), with a microarray value of 69.13, was strongly expressed in western blot analysis, and this gene therefore appears to be matched (Zhang et al., 2016). In another case, S-phase kinase-associated protein 2 (skp2), with a microarray value of 1.47, was strongly expressed in the experimental study, and this gene also seems to be matched, although its microarray value was low (Shen et al., 2008). By contrast, some downregulated genes, such as growth-associated protein 43 (gap43), are considered mismatched when their microarray values are compared with experimental results: gap43 was strongly expressed in the experimental study, although its microarray value is low (0.25). Because of these differences, it can be hypothesized that an analysis of many genes will likely reveal that some genes exhibit high experimental values. Thus, this article discusses the scope of microarray studies necessary for identifying Schwann cell-related genes.

Materials and Methods

Collection and arrangement of genes

First, Schwann cell-related microarray genes (1016 genes from sciatic nerve samples) were arranged according to various experimental conditions (e.g., in vitro or in vivo) by searching more than 5000 scientific articles from PubMed or NCBI, Google Scholar, and Google. All genes were entered into Microsoft Excel 2013 and were considered raw data. Differences between the control and injury data are presented as fold changes and were recorded in the same Excel file. In the Excel file, the fold changes of both up- and downregulated genes are listed in order from largest to smallest. Next, we searched for the genes again in PubMed or NCBI, Google Scholar, and Google. After reviewing articles on the genes of interest, we searched for experimental results on each particular gene. As several genes in the Excel list could be examined in Schwann cells in the future, it is possible that novel genes will be discovered. Identification of these molecules will be helpful in the field of Schwann cell biology (Figure 1).

Data unification

Figure 1 Working procedures of finding various types of promising genes.

Data unification is a means of expressing the fold changes of all genes on an identical scale. For example, upregulated genes have a value greater than 1, and downregulated genes have a value between 0 and 1. To clarify, a gene with a microarray fold change of 28.1 is clearly upregulated. By contrast, a gene such as neuronal cell adhesion molecule (nrcam), with a microarray value of 0.48, is downregulated. However, the downregulated gene SRY (sex-determining region Y)-box 2 (sox2) has a reported microarray value of -3.29. To unify values of this type, we simply took the inverse of the magnitude in the Excel file (= 1/3.29, or 0.30) so that the values of downregulated genes remain below 1.
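The unification rule can be illustrated with a minimal Python sketch. The study itself was carried out in Excel, so the function names below are hypothetical, and the handling of negative values assumes that a value such as -3.29 denotes a 3.29-fold decrease.

```python
def unify_fold_change(value: float) -> float:
    """Place a reported fold change on a single positive scale.

    Assumption: a negative value is a signed fold change (e.g. -3.29
    means a 3.29-fold decrease), so it is replaced by 1 / |value|.
    """
    if value == 0:
        raise ValueError("a fold change of 0 cannot be unified")
    if value < 0:
        return 1.0 / abs(value)
    return value


def direction(unified: float) -> str:
    """Label a unified fold change as up- or downregulated."""
    if unified > 1:
        return "up"
    if unified < 1:
        return "down"
    return "unchanged"


# Values taken from the text; "example_gene" is a placeholder name
# for the unnamed gene with a fold change of 28.1.
reported = {"example_gene": 28.1, "nrcam": 0.48, "sox2": -3.29}
for gene, raw in reported.items():
    unified = unify_fold_change(raw)
    print(gene, round(unified, 2), direction(unified))
```

After unification, every downregulated gene lies between 0 and 1, so a single descending sort can order up- and downregulated genes together.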

Gene ranking

Next, we arranged genes in order from largest to smallest (upper to lower) in the raw data Excel file after unification (Figure 1). For example, in this study, the highest microarray fold change value was identified in a gene named protocadherin-10 (Pcdh10), with a microarray value of 277.13, which was listed in the number 1 position. The lowest value was that of tumor necrosis factor receptor superfamily member 6 (Fas/Apo-1/CD95), with a microarray value of 0.04, which was listed at position number 1016. Based on this ranking, microarray ranking position 1 is the most highly upregulated gene, and microarray ranking position 1016 is the most downregulated gene (Figure 2B-D).
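The ranking step amounts to a descending sort of the unified fold changes. The following Python sketch uses only four of the values quoted in the text, so the positions it prints are relative to this small subset and not to the full list of 1016 genes; the function name is hypothetical.

```python
def rank_genes(fold_changes):
    """Return (position, gene, fold change) tuples, largest value first."""
    ordered = sorted(fold_changes.items(), key=lambda item: item[1], reverse=True)
    return [(pos, gene, fc) for pos, (gene, fc) in enumerate(ordered, start=1)]


# Unified fold changes quoted in the text (sox2 after inversion of -3.29).
unified = {"nrcam": 0.48, "Fas": 0.04, "Pcdh10": 277.13, "sox2": 0.30}

ranking = rank_genes(unified)
for position, gene, fold_change in ranking:
    print(position, gene, fold_change)
```

In this subset, Pcdh10 takes position 1 and Fas takes the last position, mirroring their positions 1 and 1016 in the full ranking.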

Gene search for citations

Genes were arranged in the Excel file and searched for in PubMed or NCBI, Google, and Google Scholar. Genes that had been used in experiments and for which experimental results, such as quantitative polymerase chain reaction (qPCR) or western blot (WB) results, were reported were regarded as cited genes, and those without experimental results were regarded as non-cited genes. For example, the glutathione S-transferase, mu 1 (Gstm1) gene had a microarray ranking (its position in the Excel file) of 832 and a microarray fold change of 0.2, but it had no experimental value; thus, it was regarded as a non-cited gene. In contrast, mechanistic target of rapamycin kinase (mTOR) had a microarray ranking of 523 and available experimental data (qPCR and WB); thus, mTOR was considered a cited gene.
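As a minimal sketch of this bookkeeping, a gene can be treated as cited when at least one experimental result is recorded for it. The record layout below is an assumption made for illustration and does not reproduce the structure of the original Excel file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    ranking: int              # position in the ranked microarray list
    experiments: tuple = ()   # reported methods, e.g. ("qPCR", "WB")

    @property
    def is_cited(self) -> bool:
        """A gene is cited if any experimental result was found for it."""
        return len(self.experiments) > 0


# The two examples given in the text.
records = [
    GeneRecord("Gstm1", 832),
    GeneRecord("mTOR", 523, ("qPCR", "WB")),
]

cited = [r.symbol for r in records if r.is_cited]
non_cited = [r.symbol for r in records if not r.is_cited]
print("cited:", cited)
print("non-cited:", non_cited)
```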

Figure 2 Overall distribution of genes in the study.

Gene search for experimental data

For each gene examined in the microarray study, we searched for reported experimental results, including morphology, protein or messenger ribonucleic acid (mRNA) levels, WB, immunohistochemistry (IHC), in situ hybridization (ISH), reverse transcription PCR (RT-PCR), PCR, and other experimental data, and compared them with the fold change value. Where such results were available, we determined the extent to which the experimental data differed from the microarray data and noted this in the Excel file. For example, tnc, with a microarray value of 63.13, is believed to exhibit a 50% increase in expression relative to its value in the experimental study, which included WB analysis (Zhang et al., 2016).

Results

Distribution of up- and down-regulated genes

In this study, upregulated genes were defined as those with high expression values in the microarray study (fold change greater than 1), and downregulated genes were defined as those with low expression values in the microarray study (fold change between 0 and 1). The numbers of up- and downregulated genes were 639 and 377, respectively (Figure 2A). The microarray rankings on the left side of the graph (upregulated genes) indicate genes that are highly upregulated (Figure 2B), and those on the right side of the graph (downregulated genes) indicate genes that are highly downregulated (Figure 2C). In total, n = 1016, and the number of upregulated genes exceeded the number of downregulated genes (Figure 2A).

The distribution of all upregulated genes ranged from 1 to 639 (Figure 2B), and the distribution of all downregulated genes ranged from 640 to 1016 (Figure 2C). The distributions of all up- and downregulated genes are shown in Figure 2D. These graphs show the microarray value versus the percentage fold change (Figure 2B-D) and how the genes are distributed.
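Because the ranked list places all upregulated genes before all downregulated genes, the direction of regulation can be read directly from the ranking position. The short Python sketch below encodes the counts reported above; the function name is hypothetical.

```python
N_UP = 639      # upregulated genes, ranking positions 1-639
N_DOWN = 377    # downregulated genes, ranking positions 640-1016
TOTAL = N_UP + N_DOWN


def direction_from_ranking(position: int) -> str:
    """Return 'up' or 'down' for a position in the ranked list."""
    if not 1 <= position <= TOTAL:
        raise ValueError(f"position must lie between 1 and {TOTAL}")
    return "up" if position <= N_UP else "down"


print(TOTAL)
print(direction_from_ranking(523))   # ranking position of mTOR
print(direction_from_ranking(832))   # ranking position of Gstm1
```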

Classification of types of promising genes

The genes were further classified based on the experiments performed: type 1, most promising genes; type 2, promising genes; and type 3, moderately promising genes. Here, the genes in which the protein levels were determined (WB and IHC) were classified as type 1; those in which the morphology was determined were classified as type 2; and those in which the mRNA levels were determined were classified as type 3 (Figure 3A). In the resulting graph, protein-level checks were the most numerous, indicating that most of these genes were examined, and best confirmed, at the protein level (Figure 3A).
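The three-way classification can be sketched in Python as follows. Two points are assumptions and are not stated in the text: the grouping of qPCR, RT-PCR, PCR, and ISH as mRNA-level methods, and the precedence applied when a gene has more than one kind of evidence (protein first, then morphology, then mRNA).

```python
PROTEIN_METHODS = {"WB", "IHC"}                    # type 1 evidence
MORPHOLOGY_METHODS = {"morphology"}                # type 2 evidence
MRNA_METHODS = {"qPCR", "RT-PCR", "PCR", "ISH"}    # type 3 evidence (assumed grouping)


def promising_type(experiments):
    """Return 1, 2, or 3 for a gene, or None if no evidence is recorded.

    Assumed precedence when several kinds of evidence exist:
    protein level, then morphology, then mRNA level.
    """
    methods = set(experiments)
    if methods & PROTEIN_METHODS:
        return 1
    if methods & MORPHOLOGY_METHODS:
        return 2
    if methods & MRNA_METHODS:
        return 3
    return None


print(promising_type(["qPCR", "WB"]))   # protein evidence present
print(promising_type(["morphology"]))
print(promising_type(["RT-PCR"]))
print(promising_type([]))
```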

The totals for the most promising, promising, and moderately promising genes averaged 14.08, 4.73, and 9.69, respectively (standard deviations of 24.05, 17.30, and 16.98, respectively), making it difficult to define these data as significant (Figure 3B). To confirm type 1, type 2, and type 3 genes and the experimental data, we reviewed 99, 21, and 27 articles, respectively (Figure 3C). We then analyzed the microarray fold change value (i.e., determined how much the microarray fold change differed from the experimental results) and the matching and mismatching genes (where matched genes were defined as those in which the microarray fold change value obtained from the paper showed the same expression in the experimental studies or in real experiments, such as those involving qPCR, WB, and IHC) (Figure 4A).
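One simple reading of this definition is that a gene is matched when the direction implied by its microarray fold change agrees with the direction observed experimentally. The Python sketch below applies that reading to the three genes discussed in the Introduction; the criterion and the encoding of "strongly expressed" as "up" are assumptions, since the exact matching rule is not specified in more detail.

```python
def microarray_direction(fold_change: float) -> str:
    """Direction implied by a unified microarray fold change."""
    if fold_change > 1:
        return "up"
    if fold_change < 1:
        return "down"
    return "unchanged"


def is_matched(fold_change: float, experimental_direction: str) -> bool:
    """Assumed criterion: matched if both sources agree on the direction."""
    return microarray_direction(fold_change) == experimental_direction


# (microarray fold change, experimental direction); "strongly expressed"
# in the experimental study is encoded here as "up".
examples = {
    "tnc": (69.13, "up"),
    "skp2": (1.47, "up"),
    "gap43": (0.25, "up"),
}

for gene, (fold_change, observed) in examples.items():
    label = "matched" if is_matched(fold_change, observed) else "mismatched"
    print(gene, label)
```

Under this reading, tnc and skp2 are matched and gap43 is mismatched, in agreement with the examples given in the Introduction.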

Figure 3 Level of potentiality of genes versus the level of experiments.

Figure 4 Microarray versus real experimental data including matching and mismatching genes.

Comparisons between microarray gene ranking and experimental data

The other graph shows the microarray ranking versus the percentage fold change and indicates the total distributions of up- and downregulated genes (Figure 4A). The experimental data found for the microarray genes are shown as differently colored dots, which indicate the microarray ranking (Figure 4A). The total number of matched genes was 290 and that of mismatched genes was 358 (Figure 4B). The matched genes included more type 1 and type 2 genes than type 3 genes, whereas the opposite results emerged for the mismatched genes (Figure 4C).

Some genes are weakly expressed according to experimental studies but strongly expressed according to microarray studies. Dots that are located closer to the microarray ranking line (blue dots) have a stronger possibility of being matched than those that are located farther from the line (Figure 4E). Some genes exhibit increased expression in experimental studies compared with microarray studies and thus have strong potential in Schwann cell studies. To summarize, the promising genes that were identified and confirmed (Figure 4D) indicate that 27.76% of all up- and downregulated genes (Figure 4E) could be useful for future Schwann cell studies.

Most Schwann cell microarray studies include in vitro and in vivo samples. Uninjured samples were used as the control, and crushed, cut, or other samples were used as injured samples. These samples were then analyzed by microarray techniques, and the fold change was obtained from published papers cited on PubMed. After collecting the microarray data, the data were further compared with experimental data in published papers obtained from PubMed (Figure 5). The Schwann cell microarray analysis of genes included those related to Schwann cells and those that aid in peripheral nerve regeneration and degeneration (Figure 5).

Discussion

Researchers' ability to rearrange microarray datasets increases statistical power to the level needed to detect biological phenomena in studies where logistical considerations restrict sample size and require the sequential hybridization of arrays (Johnson et al., 2007). Because of the many applications of gene expression microarrays, biologists are able to efficiently extract hypotheses that can later be tested experimentally in a laboratory setting. For example, a microarray experiment may compare the gene expression profile of diseased or treated tissue (treatment) with the profile of normal tissue (control) to determine which genes are involved in the disease or are associated with the presence of treatment, providing a better understanding of the disease/gene relationship (Johnson et al., 2007). Big data analysis provides new opportunities to modern society as well as challenges to data scientists. Big data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. Big data also offer new levels of scientific discovery as well as economic value (Fan et al., 2014). Technology is advancing and new developments are generating data more efficiently; some examples include advancements in high-throughput next generation sequencing, microarrays in genomics and transcriptomics, mass spectrometry-based flow cytometry in proteomics, real-time medical imaging, and lab-on-a-chip technologies (Alyass et al., 2015).

This study, which is part of a big data analysis, was conducted under the assumption that knowledge of different types of genes and their characteristics and functions (i.e., whether they are up- or downregulated) will allow for the easy detection of genes that are appropriate for experimental studies on Schwann cells. Additionally, some genes may act differently under different experimental conditions. This method could be a helpful and effective way to detect gene expression in future experimental studies. In summary, some genes are highly expressed in microarray studies but weakly expressed in experimental studies; for such genes, a microarray may not be a good tool for Schwann cell studies because the genes may be differently expressed. Conversely, some genes are weakly expressed in microarray studies and highly expressed in experimental studies; these genes are more promising targets for examination in future Schwann cell studies (Table 1), and for identifying them, microarray analysis could be a good tool.

Figure 5 Schwann cell microarray study.

No study is perfect, and microarray Schwann cell studies have both advantages and disadvantages. For example, genes are arranged under different experimental conditions; therefore, the microarray value could vary under different conditions. The microarray fold change value indicates how much the value matches or mismatches the experimental conditions; thus, it is sometimes difficult to understand because not every published article provides a graph of its results. Additionally, the study could be wrong about the extent to which the genes are matched or mismatched. Apart from these disadvantages, there are many advantages of big data analyses of Schwann cell microarray studies. The current study provides insights into different types of Schwann cell-related genes, which could provide a new way of examining genes to identify target genes that play a key role in Schwann cells (Table 1). For example, a previous study reported the protein levels determined by northern blotting or WB in Schwann cells, but this study showed only the band and not the graphical results. This example shows why, in cases of genes such as fibroblast growth factor 5 (fgf-5), it is difficult to define the relationship between experimental and microarray data with regard to gene expression (Scarlato et al., 2001).

In this analysis of Schwann cell microarray data, all genes were collected randomly. Thus, there is a possibility that the published data or experimental values were lower than the microarray values. In such cases, labeling all of those genes as having no published or experimental data is of no value. One might argue that the microarray approach is appropriate for identifying target genes. However, the fact that there were genes with a good microarray value but no experimental results indicates that microarray data differ from experimental data. Therefore, the use of microarray analysis to identify target genes remains controversial. Apart from this, it is also possible that microarray data and experimental data differ. Thus, microarray analysis may be a good tool.

The inherent aim of this microarray Schwann cell study was not to discover novel genes but to analyze genes from microarray studies and compare them with experimental results. Indeed, questions about whether a gene is a good match to the experimental results have not been examined experimentally but could be a good starting point for identifying genes for further Schwann cell studies. If experimental results were available for 282 of 1016 genes, it would be possible to treat only 27.76% of the genes as targets in Schwann cell experimental studies. Thus, 27.76% of the genes obtained from the microarray analysis could be targets.
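The proportion quoted above follows directly from the two counts. A minimal Python check, with a hypothetical helper name, is:

```python
def share_percent(part: int, total: int, ndigits: int = 2) -> float:
    """Return part / total as a percentage rounded to ndigits decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, ndigits)


# 282 genes with experimental results out of 1016 collected genes.
print(share_percent(282, 1016))  # 27.76
```

The unrounded quotient is approximately 27.756%, which rounds to the 27.76% reported in the Results.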

Table 1 Upregulated and downregulated genes in Schwann cells by microarray analysis

Differential gene expression is used to identify key genes that undergo changes in expression relative to healthy individuals and to patients with other diseases (Gliddon et al., 2018). One popular source of data is the microarray, a biological platform for the examination of gene expression. Analyzing microarrays can be difficult because of the size of the data they yield. Microarray databases are huge sources of genomic data, which, upon proper analysis, could increase our understanding of biology and medicine. Various microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied to classify various types of cancer or to distinguish between cancerous and noncancerous tissue (Hira and Gillies, 2015). According to our analysis of all published genes, only 27.76% of the genes (both up- and downregulated) could aid in future Schwann cell studies.

Although no experimental methods were used in this study, we have gained a deeper understanding of differential gene expression. The method described herein could be used to detect target genes and thus greatly contribute to studies on Schwann cell nerve degeneration and regeneration in the peripheral nervous system.

Author contributions: US and JJ designed this study, interpreted experimental results, defined intellectual contents, performed experiments, wrote the manuscript, and approved the final version of this paper.

Conflicts of interest: The authors report no potential conflict of interests.

Financial support: This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2018R1D1A1B07040282; to JJ) and a grant from Kyung Hee University in 2018 (KHU-20181065; to JJ). The funding bodies played no role in the study design, in the collection, analysis and interpretation of data, in the writing of the paper, and in the decision to submit the paper for publication.

Copyright license agreement: The Copyright License Agreement has been signed by both authors before publication.

Data sharing statement: Datasets analyzed during the current study are available from the corresponding author on reasonable request.

Plagiarism check: Checked twice by iThenticate.

Peer review: Externally peer reviewed.

Open access statement: This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-Non-Commercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

Open peer reviewers: Xin Luo, Duke University Center for Health Policy and Inequalities Research, USA; Tufan Mert, School of Medicine, University of Cukurova, Turkey.

Additional file: Open peer review reports 1 and 2.