Yuan Fang (袁 芳),Sun Jingsong,Zheng Jia,Li Nong
(Institute of Scientific and Technical Information of China,Beijing 100038,P.R.China)
Abstract
Key words:gene editing,bibliometric analysis,CiteSpace,publication
Gene editing is a type of genetic engineering in which DNA is inserted, deleted, modified or replaced in the genome of a living organism. Because it targets insertions to site-specific locations, gene editing is both precise and efficient. Precise gene editing is a powerful tool for studying biological processes[1]. Gene editing is widely used to create model cell lines, engineer metabolic pathways, produce transgenic animals and plants, perform genome-wide functional screens and, most importantly, treat human diseases that are difficult to tackle with traditional medications[2]. The major hurdles for efficient gene editing in therapeutically relevant primary human cells have largely been overcome, as shown in numerous recent preclinical studies, and multiple clinical trials are expected to start within the next couple of years[3]. In recent years, gene editing has been named a “disruptive technology” by many organizations. Gene editing has taken the biomedical science field by storm, initiating rumors about future Nobel Prizes and heating up a fierce patent war, but also making significant scientific impact[4].
The common methods for gene editing use engineered nucleases, of which there are 4 major families: meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector-based nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system[5-8]. Meganucleases, discovered in the late 1980s, are enzymes in the endonuclease family which are characterized by their capacity to recognize and cut large DNA sequences (from 14 to 40 base pairs)[9]. ZFNs and TALENs are chimeric nucleases in which zinc-finger proteins or transcription activator-like (TAL) effector proteins are fused to the DNA cleavage domain of the FokI endonuclease[2]. The CRISPR/Cas9 system relies on a single guide RNA that directs the Cas9 endonuclease to a specific site in the genome[3]. Because the 4 approaches have different features, meganucleases, ZFNs, TALENs and the CRISPR/Cas9 system each have their own fields of application. TALENs and engineered meganucleases were selected by Nature Methods as the 2011 Method of the Year[10]. The CRISPR/Cas9 system was selected by Science as the 2015 Breakthrough of the Year[11].
Bibliometric analysis is a basic but effective way to detect and examine the emergence of a new technology[12]. Cao et al.[13], Liu[14] and Wang et al.[15] have studied the field of gene editing through bibliometrics. However, these studies covered only the most recent 10 years of the development of gene editing. This study examines the entire development process of gene editing through bibliometric analysis, and uses a visualization tool to explore its distribution features and research focus.
A bibliometric study involves the statistical analysis of scientific publications. It adopts quantitative performance indicators to overcome the subjectivity of peer review and expert judgment, and has been used to assess research performance in an increasing number and variety of studies[16]. These studies have laid the groundwork for the basic scientometric analysis used here to evaluate an emerging research domain[17]. Visual analysis of scientific publications is an important branch of information visualization. CiteSpace is a Java-based scientific visualization tool that can be used to detect and visualize abrupt changes, emerging trends, and dynamics in scientific domains[18]. In this paper, we use CiteSpace to describe the development process of gene editing.
The data were retrieved from the Science Citation Index Expanded (SCI-EXPANDED) of the Web of Science Core Collection database. The search string is TS (Topic) = (“gene editing” OR “genome editing” OR “gene edited” OR “genome edited” OR “gene edit$” OR “genome edit$” OR ((“clustered regularly interspaced short palindromic repeats”) OR (“clustered regularly interspaced short palindromic repeat”) OR (“CRISPR$”) OR (“CRISPER” and “cas”) OR (“CRISPER” and “cas9”)) OR (“meganuclease” OR “meganucleases” OR “homing endonuclease$”) OR (((“ZFN” OR “ZFNs”) and (“gene” or “genome” or “nuclease$”)) OR (“zinc fingers”) OR (“zinc finger”)) OR ((“transcription activator-like effector$”) OR (“TAL effector$”) OR ((“TALEN” OR “TALENs”) and (“gene” OR “genome” OR “nuclease$”)) OR (“transcriptional activator-like effector$”))). The document type was set to “ARTICLE”. The selected results were exported to a plain text file with the “record content” set to “full record and cited references”. For the basic distribution analysis, the search returned 31 157 articles as of 2018. For the keywords analysis and co-citation analysis, the search returned 32 625 articles, and the data set was last updated on April 19, 2019. CiteSpace was used in the keywords analysis and co-citation analysis. After data cleaning and preprocessing in CiteSpace, the keywords were also screened manually. In this manual screening, words that are not related to gene editing technology or have no practical meaning were eliminated, for example “family”, “specificity” and “nf kappa b”.
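The keyword screening step described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where screening was done by hand. The stop list contains only the three examples named in the text, and the function name `screen_keywords` and the sample input are hypothetical.

```python
# Terms removed during screening; only these three are named in the text.
EXCLUDED_TERMS = {"family", "specificity", "nf kappa b"}


def screen_keywords(keywords, excluded=EXCLUDED_TERMS):
    """Return the keywords in their original order, dropping excluded terms.

    Matching ignores case and repeated whitespace.
    """
    kept = []
    for kw in keywords:
        normalized = " ".join(kw.lower().split())
        if normalized not in excluded:
            kept.append(kw)
    return kept


# Hypothetical input, for illustration only.
sample = ["CRISPR", "family", "zinc finger protein", "NF Kappa B", "specificity"]
screened = screen_keywords(sample)
print(screened)
```

In practice the excluded list would be much longer and would be built up while inspecting the keyword table.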
2.1.1 The number of annual publications distribution analysis
The number of annual publications is an important index for measuring the development of scientific research, as it reflects, to a certain extent, the changes in knowledge quantity[19]. The distribution of annual publications (Fig.1) therefore gives a preliminary understanding of the development process of gene editing. Because the publication data for 2019 are incomplete, we count only up to 2018. The first gene editing related article was published in 1987, and 31 157 gene editing related articles were published from 1987 to 2018.
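The annual counts behind a figure of this kind can be tallied directly from a Web of Science plain text export. The sketch below assumes the usual tagged layout of that export, with a two-letter field tag at the start of each line, `PY` for the publication year, and `ER` closing each record. The function name `count_by_year` and the four sample records are hypothetical and are not data from this study.

```python
from collections import Counter


def count_by_year(export_text, last_year=2018):
    """Count records per publication year (PY field), up to last_year inclusive."""
    counts = Counter()
    year = None
    for line in export_text.splitlines():
        tag = line[:2]
        if tag == "PY":
            value = line[3:].strip()
            year = int(value) if value.isdigit() else None
        elif tag == "ER":
            # End of record: count it if it has a usable year in range.
            if year is not None and year <= last_year:
                counts[year] += 1
            year = None
    return dict(sorted(counts.items()))


# Hypothetical records in the tagged export layout (not real data).
sample_export = """PT J
TI Hypothetical record one
PY 2012
ER

PT J
TI Hypothetical record two
PY 2018
ER

PT J
TI Hypothetical record three
PY 2018
ER

PT J
TI Hypothetical record four
PY 2019
ER
"""

annual = count_by_year(sample_export)
print(annual)
```

The `last_year` cut-off mirrors the decision to leave out the incomplete 2019 data.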
In 1996, the ZFN was constructed by Kim et al.[20]. One major concern in using ZFNs for clinical applications is their potential off-target activity. ZFNs have nonetheless been applied, for instance, to mutate both alleles of the dihydrofolate reductase gene in Chinese hamster ovary cells to improve production of recombinant protein, or to create human blood cells resistant to HIV by mutating the gene encoding CCR5[21,22]. In 2010, the TALEN was first constructed by Christian et al.[23]. TALENs are used in a similar way to ZFNs. In practice, ZFNs and TALENs take considerable time and cost to design and build. Since the Cas9-sgRNA system was developed as a genome editing system in 2012[24], the number of publications related to gene editing has grown rapidly, with hundreds of related articles published every year. In 2018, 4 284 gene editing related articles were published, the most for any single year from 1987 to 2018. CRISPR/Cas has the advantages of simple operation and low cost. However, the genomic loci that the CRISPR/Cas system can edit are limited by DNA sequence, and off-target effects also limit its application. Therefore, meganucleases, ZFNs and TALENs, each with its own technical advantages, have not been completely replaced by the CRISPR/Cas system. Gene editing technology, especially CRISPR/Cas, has been rated as one of the most important technological breakthroughs in the field of life sciences. Significant advances in gene editing technology are emerging, bringing disruptive changes to life science research and related industries. Governments all over the world attach great importance to the application and development of gene editing and have invested substantial funds to support research on gene editing technology.
Fig.1 Yearly publications of gene editing research output from 1987 to 2018
2.1.2 Journal distribution analysis
Journal analysis guides scholars in selecting platforms for data collection and for publishing their research[19]. A total of 31 157 gene editing related articles were published from 1987 to 2018, and the top 15 academic journals are shown in Table 1. “Journal of Biological Chemistry”, “PLoS One” and “Nucleic Acids Research” have each published more than 1 000 gene editing related articles. “Journal of Biological Chemistry” tops the list with 1 477 publications, indicating that gene editing is well represented in this journal. The impact factors of “Nucleic Acids Research”, “Nature Communications” and “EMBO Journal” are all greater than 10, which suggests that research articles on gene editing have a good chance of being published in high-level journals.
In addition, “PLoS One”, “Proceedings of the National Academy of Sciences”, “Scientific Reports” and “Nature Communications” all belong to the multidisciplinary sciences category, which shows that gene editing related articles are published not only in biological journals but also in multidisciplinary journals. Gene editing technology, as one of the most important breakthroughs in biotechnology in this century, is gradually merging with disciplines such as chemistry, information science, and materials science. It has given rise to a number of new technologies and new applications in medicine, agriculture, industry, etc. This may be why gene editing research is widely published in multidisciplinary journals.
Table 1 The top 15 journals in the field of gene editing
2.1.3 Country distribution analysis
In the field of gene editing, a total of 144 countries or regions had published relevant articles as of 2018, and Fig.2 shows the top 15. The USA published 14 525 gene editing related articles, accounting for 46.619% of the world total, which is far ahead of the rest of the world. This indicates that the USA is the main publishing country for gene editing research. China follows with 5 237 articles, accounting for 16.808% of the world total. Japan ranks 3rd in the world with 3 228 articles, accounting for 10.360% of the world total. Each of the other countries accounts for less than 10% of the world total.
In the USA, institutes such as the Massachusetts Institute of Technology, Harvard University, and the Broad Institute have done a great deal of outstanding work in the field of gene editing technology. In addition, many companies also promote the development of gene editing technology. For example, Sangamo develops clinical therapies based on gene editing technology and Corteva develops gene-edited agricultural products. In China, the Chinese Academy of Sciences has the most outstanding scientific research achievements in the field of gene editing technology. Compared with those in the USA, Chinese companies engaged in gene editing technology are relatively weak.
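The percentage shares quoted above follow directly from the article counts and the world total of 31 157. The short Python check below uses only figures reported in the text. The names `share_percent` and `country_counts` are illustrative.

```python
TOTAL_ARTICLES = 31157  # articles from 1987 to 2018, as reported in the text

# Article counts for the top three countries, as reported in the text.
country_counts = {"USA": 14525, "China": 5237, "Japan": 3228}


def share_percent(count, total=TOTAL_ARTICLES):
    """Return count as a percentage of total, rounded to three decimals."""
    return round(100.0 * count / total, 3)


shares = {country: share_percent(n) for country, n in country_counts.items()}
for country, pct in shares.items():
    print(f"{country}: {pct:.3f}%")
```

The computed values agree with the reported 46.619%, 16.808% and 10.360%.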
Fig.2 Countries or regions of gene editing researches in the world
2.2.1 The time-zone of keywords analysis
By plotting the time zone of keywords, the dynamic development process of gene editing research can be obtained (Fig.3). Gene editing research originated in the late 1980s and early 1990s. Early gene editing research mainly focused on protein-level studies of nucleases. In this period, keywords such as “protein”, “expression” and “crystal structure” emerged. In the middle stage of gene editing development, research mainly focused on the mechanisms of nucleases. “Apoptosis”, “phosphorylation”, “mechanism”, “pathway” and so on emerged in this period. After 2007, “zinc finger nuclease” emerged as the representative gene editing nuclease, and CRISPR has been the focus of gene editing research in the last five years. In recent years, gene editing has also been applied to human cells to treat diseases. “Stem cell”, “disease”, “human cell” and “therapy” emerged in this period.
Fig.3 The time-zone of keywords in the field of gene editing
2.2.2 Co-occurrence network of keywords analysis
Keywords capture the core content of articles and can show at a glance how research topics have advanced. Analysis of the keyword co-occurrence network can reveal the topics and research trends in the field of gene editing. In Fig.4, the network has 170 nodes, each representing a keyword, and 1 332 links. The size of a node represents the citation count of the keyword: the larger the node, the higher the citation count. The citation counts of “expression”, “gene”, “protein”, “zinc finger protein” and “gene expression” are all above 3 000. The links between the nodes represent the co-occurrence relationships of the keywords. The links between “zinc finger protein” and “gene expression” are dense, which shows that these 2 keywords are closely related and are likely to appear in the same article.
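The idea of a keyword co-occurrence network can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reproduction of CiteSpace: a node's weight is taken to be the number of records in which the keyword occurs, a link's weight is the number of records in which two keywords appear together, and no pruning or thresholding is applied. The function name `cooccurrence` and the three sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Return (node_counts, link_counts) from per-article keyword lists."""
    node_counts = Counter()
    link_counts = Counter()
    for keywords in records:
        # Deduplicate and sort so each unordered pair has a single key.
        unique = sorted(set(keywords))
        node_counts.update(unique)
        link_counts.update(combinations(unique, 2))
    return node_counts, link_counts


# Hypothetical keyword lists, one per article.
records = [
    ["zinc finger protein", "gene expression", "protein"],
    ["zinc finger protein", "gene expression"],
    ["crispr", "gene expression"],
]

nodes, links = cooccurrence(records)
print(len(nodes), "nodes,", len(links), "links")
print(links[("gene expression", "zinc finger protein")])
```

Two keywords that share a heavily weighted link, as “zinc finger protein” and “gene expression” do here, are those that tend to appear in the same article.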
Fig.4 The co-occurrence network of keywords in the field of gene editing
Co-citation analysis is one key aspect of bibliometrics, which can identify the influence of authors and publications[25]. The intellectual structure of highly cited publications can be mapped by means of CiteSpace. The “Top N per slice” parameter is set to N=50, which means that the 50 most cited or most frequently occurring items are selected from each time slice. Fig.5 shows the salient intellectual structure of the co-cited publications. In Fig.5, some points on the right are not connected with points on the left, indicating that, among the top 50 documents of each year, they are not co-cited with the other publications in the figure. Most of the points on the left are highly cited publications, which concentrate on CRISPR/Cas9. For example, Jinek et al.[24] first demonstrated that CRISPR/Cas9 could be used as a gene editing system. Cong et al.[26] and Mali et al.[27] each used CRISPR/Cas9 in human cell lines, which greatly promoted the application of CRISPR technology. The points on the right are less cited publications, which cover the less studied gene editing technologies, such as meganucleases.
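The “Top N per slice” selection can be pictured as follows: the items in each time slice are ranked by count and only the first N are kept. The Python sketch below illustrates that idea only. CiteSpace's actual implementation is not described in this paper, and the function name `top_n_per_slice`, the tie-breaking rule and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def top_n_per_slice(items, n=50):
    """Select the n highest-count items in each slice.

    items: iterable of (slice_label, item_id, count) tuples.
    Returns a dict mapping each slice label to a list of item ids.
    """
    by_slice = defaultdict(list)
    for slice_label, item_id, count in items:
        by_slice[slice_label].append((count, item_id))

    selected = {}
    for slice_label, entries in by_slice.items():
        # Highest count first; ties broken by item id (an assumption).
        entries.sort(key=lambda e: (-e[0], e[1]))
        selected[slice_label] = [item_id for _, item_id in entries[:n]]
    return selected


# Hypothetical citation counts per yearly slice.
sample = [
    (2012, "ref_a", 30), (2012, "ref_b", 12), (2012, "ref_c", 45),
    (2013, "ref_d", 80), (2013, "ref_a", 60), (2013, "ref_e", 5),
]

picked = top_n_per_slice(sample, n=2)
print(picked)
```

An item that falls outside the top N in every slice never enters the network, which is one reason some less cited documents appear without links.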
Fig.5 The co-citation network of literatures in the field of gene editing
From the network summary table exported by CiteSpace, the top 10 most cited publications in the co-citation network were selected for specific analysis, as shown in Table 2. The top 10 highly cited publications are all about CRISPR/Cas and are concentrated in 2013. Because research content and publication time are so concentrated, some publications have no co-citation relationship with the top 10 highly cited publications in Fig.5. This also shows that CRISPR/Cas is a hot topic in the field of gene editing. Compared with other methods of gene editing, CRISPR/Cas9 has the advantages of simple operation and low cost. Hence, CRISPR/Cas9 has developed rapidly since 2012, and the highly cited publications concentrate on CRISPR/Cas.
Table 2 The top 10 highly cited literatures in the field of gene editing
In this study, the basic distribution of gene editing research is analyzed, including the distribution of annual publications, the journal distribution and the country distribution. The number of publications related to gene editing is increasing year by year. Since 2011, the number of publications has grown rapidly, with hundreds of related articles published every year. The USA is the main publishing country for gene editing research, accounting for almost half of the world's gene editing related articles. Gene editing, as a “disruptive technology”, has taken the biomedical science field by storm. Both high-level journals and multidisciplinary journals therefore publish gene editing research. For further study, the research hotspots are explored with CiteSpace through keywords analysis and co-citation analysis. Among the 4 major methods of gene editing, ZFNs and the CRISPR/Cas9 system have been studied relatively more than meganucleases and TALENs. The CRISPR/Cas system has been a hot topic in the field of gene editing in the last few years. Gene editing is opening up a world of possibilities for the treatment of genetic diseases.
High Technology Letters, 2020, Issue 1