Research on the Development of Medical Big Data

2018-05-15 07:15 Zhang Ting
China-Arab States Science and Technology Forum (中阿科技论坛, Chinese-English edition), 2018, Issue 3

Zhang Ting

(1. Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020; 2. Institute of Medical Information/Medical Library, Beijing 100010)

Abstract: This study uses bibliometric methods to analyze the development status of the medical big data field and identifies research hotspots and research frontiers through co-word analysis and co-citation analysis. By analyzing the characteristics of medical big data research, it describes the research status and development trend of the field and provides a reference for medical big data research from the perspective of informatics.

Keywords: medical big data; bibliometrics; research hotspot; research frontier; visualization

1 Introduction

With the rise of technologies and applications such as the mobile Internet, the Internet of Things and social networks, the amount of data worldwide has grown rapidly and the era of big data has arrived. Both academia and industry have paid close attention to big data and discussed it in depth. Gartner pointed out in its annual Hype Cycle (technology maturity curve) report that big data has entered an expansion period and will reach the peak of its development in the coming years. Big data is one of the important development directions of information technology in the future. In the medical field, with socio-economic development and progress in medicine and health care, the spectrum of human diseases is changing, the types of diseases are increasing, and etiology, diagnosis and treatment are becoming more complex. To improve human health and understand how diseases occur and develop, researchers need to keep discovering hidden patterns in the vast body of knowledge by fully exploiting medical big data. This will play an important role in improving medical information management, providing theoretical and methodological support for the diagnosis and treatment of diseases, and promoting clinical practice and decision-making.

This study uses bibliometrics to analyze research papers on medical big data in the Web of Science database. It examines the research status and development of the field from four perspectives: the number of papers, the distribution of and cooperation among countries and regions, research hotspots, and research frontiers.

2 Data Sources and Methods

2.1 Data Sources

The data come from the Web of Science database and were retrieved and downloaded in December 2016. The search strategy was a topic search for "Medical big data", with the document type restricted to "Articles" and the language to "English". A total of 2,286 records were obtained.

The data were imported into the Thomson Data Analyzer 3.0 analysis tool (TDA, Thomson Reuters Co., New York, NY, USA) for data cleaning and bibliometric analysis, and visualization was performed with software such as VOSviewer.

2.2 Research Methods

The co-word analysis method counts the number of times pairs of words or noun phrases appear together in the same document and hierarchically clusters the words on this basis. The clusters reveal how closely the words are related and, in turn, the structure and evolution of the disciplines and themes they represent. The more often two terms appear together in the same documents, the closer the relationship between the two topics. Therefore, by counting how often subject terms co-occur in the same document, a co-word network composed of these terms can be formed, and the distance between nodes in the network reflects how closely or loosely the subject content is related. Keywords reflect the core content and theme of a paper, so co-word analysis can reveal the research focus of a field. This study uses co-word analysis to identify research hotspots in the field of medical big data.
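The counting step behind a co-word matrix can be illustrated with a minimal sketch. It is not the TDA procedure used in this study; the function names `cooccurrence_counts` and `to_matrix` and the three sample documents are hypothetical, and lower-casing keywords before counting is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered keyword pair, the documents containing both."""
    pair_counts = Counter()
    for keywords in documents:
        # Assumption: normalize case and deduplicate so that a pair is
        # counted at most once per document.
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def to_matrix(pair_counts, terms):
    """Build a symmetric co-word matrix for the given ordered list of terms."""
    index = {term: i for i, term in enumerate(terms)}
    size = len(terms)
    matrix = [[0] * size for _ in range(size)]
    for (a, b), count in pair_counts.items():
        if a in index and b in index:
            i, j = index[a], index[b]
            matrix[i][j] = count
            matrix[j][i] = count
    return matrix


# Hypothetical keyword lists, one per document.
docs = [
    ["big data", "epidemiology", "mortality"],
    ["Big Data", "epidemiology"],
    ["mortality", "survival"],
]
counts = cooccurrence_counts(docs)
terms = ["big data", "epidemiology", "mortality", "survival"]
matrix = to_matrix(counts, terms)
```

In this toy example "big data" and "epidemiology" co-occur in two documents, so the corresponding matrix cell is 2. A matrix of this shape is the kind of input that clustering and mapping tools work from.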

Two (or more) documents are co-cited when they are cited together by one or more later documents; the cited documents then have a "co-citation" relationship. The strength of this relationship is measured by the number of papers that cite both documents. The greater the co-citation strength, the closer the relationship between the two documents. The co-citation analysis method takes a group of documents that are representative of a subject as the object of analysis. It applies multivariate statistical methods such as cluster analysis and multidimensional scaling, using a computer to reduce the intricate co-citation relationships among these documents to a relatively small number of relationships between groups and to display them in a more intuitive way. On this basis, the structural characteristics of the disciplines, fields and documents represented are analyzed. Citation analysis is an important part of bibliometric analysis and can reflect the research frontier of a field. This study uses co-citation analysis to identify research frontiers in the field of medical big data.
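The definition of co-citation strength can be made concrete with a short sketch. It assumes the reference list of each citing paper is available as a plain list; the identifiers `P1` to `P3` and `A` to `C` and the function names are hypothetical and do not come from the study's data.

```python
from collections import defaultdict


def build_citers(citations):
    """Map each cited document to the set of papers that cite it."""
    citers = defaultdict(set)
    for citing_paper, cited_documents in citations.items():
        for cited in cited_documents:
            citers[cited].add(citing_paper)
    return citers


def cocitation_strength(citers, doc_a, doc_b):
    """Number of papers whose reference lists contain both documents."""
    return len(citers.get(doc_a, set()) & citers.get(doc_b, set()))


# Hypothetical citing papers and their reference lists.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
citers = build_citers(citations)
strength_ab = cocitation_strength(citers, "A", "B")
strength_ac = cocitation_strength(citers, "A", "C")
```

Here documents A and B are cited together by P1 and P2, giving a strength of 2, while A and C share only P1. Computing this value for every pair of cited documents yields a co-citation matrix.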

2.3 Visualization

The co-word matrix and the co-citation matrix are clustered and visualized with VOSviewer. VOSviewer is visualization software that is widely used in various kinds of "co-occurrence" analysis. It can draw co-occurrence maps of citations, keywords and other items, and it has particular strengths in clustering and map drawing. In this study, VOSviewer was used to cluster the co-word matrix and the co-citation matrix in order to identify research hotspots and research frontiers, and the clustering results were visualized to obtain a knowledge map of the medical big data field.

3 Results and Analysis

3.1 Annual Distribution of Papers

The Web of Science database contains 2,286 research papers in the field of medical big data. Overall, the volume of publications shows a steady upward trend. As can be seen from Figure 1, the first paper in the field was published in 1990, and few papers were published in the first few years. After 2000, the field entered a period of steady growth. After 2010, the annual number of publications exceeded 100 and grew markedly, reaching a peak of 453 in 2016. Judging from the annual growth rate, the field has grown steadily in recent years and its development momentum is good.

Figure 1 Annual Distribution of the Number of Papers in the Field of Medical Big Data

3.2 Countries and Regions Distribution

The distribution of medical big data papers in the Web of Science database by country and region shows the spatial characteristics of research output in this field. A total of 112 countries and regions have published papers in this field, and the top 10 by number of publications are shown in Figure 2. The United States ranks first (817 papers), far ahead of the second, the United Kingdom (237 papers). China has the third largest number of publications in the world, with 237 papers, and all other countries have fewer than 200. Four countries/regions have between 100 and 200 publications: Germany (174), Australia (129), Italy (118) and Canada (113). Three countries/regions have fewer than 100 publications: the Netherlands (87), France (76) and Switzerland (73).

Figure 2 Top 10 Countries in the Field of Medical Big Data

Figure 3 shows the annual trend in the number of publications of the top 10 countries. The number of publications from the United States has grown fastest. Before 2010, the gap between the annual publication volume of the United States and that of other countries was small. After 2010, however, the annual publication volume of the United States moved far ahead of other countries and it now holds a clearly dominant position. China's annual publication volume has been slightly higher than that of the United Kingdom since 2014, ranking second in the world. This indicates that China entered a stage of rapid development in the field of medical big data after 2014, although a large gap with the United States remains.

Figure 3 Annual Number of Publications of the Top 10 Countries and Regions in the Field of Medical Big Data

International cooperation in the field of medical big data can be examined through a cooperation network analysis of countries/regions. A co-occurrence matrix was constructed for the top 10 countries by publication volume, and a cooperation network map was drawn with Netdraw (as shown in Figure 4). The thickness of the line between two nodes in the network represents the strength of cooperation between the two countries: the thicker the line, the greater the number of papers the two countries have co-authored. As can be seen from Figure 4, the top 10 countries/regions have co-authored many papers with one another, and each of the other nine countries cooperates most closely with the United States. China also maintains close cooperation with other countries and regions and has more co-authored papers with the United States and Canada.

Figure 4 Cooperation Network of the Top 10 Countries and Regions by Number of Papers in the Field of Medical Big Data

3.3 Research Hotspot

Keywords are the author's distillation of the core research content of an article, so cluster analysis of high-frequency keywords can identify the research hotspots of a field. The 2,286 research papers contained 6,531 author keywords and 6,882 Keywords Plus terms. After the two types of keywords were combined, a total of 10,958 keywords were obtained; the top 30 high-frequency keywords are shown in Table 1. Keywords with only one record were deleted, and the top 1% of the remaining keywords were selected as the analysis object, giving 107 keywords. Co-occurrence analysis of these 107 high-frequency keywords was performed with TDA to generate a 107 × 107 high-frequency word co-occurrence matrix, and the co-word matrix was imported into VOSviewer for cluster analysis and visualization.
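The selection of high-frequency keywords described above (drop keywords that occur once, then keep the top share of the rest) can be sketched as follows. This is an illustration rather than the TDA workflow: the function name, the `top_fraction` and `min_count` parameters, and the choice to round the cut-off up are assumptions. The three counts are taken from Table 1, and "rare term" is a hypothetical singleton.

```python
import math
from collections import Counter


def select_high_frequency(keyword_counts, top_fraction=0.01, min_count=2):
    """Drop keywords below min_count, then keep the top fraction by frequency."""
    kept = [(kw, n) for kw, n in keyword_counts.items() if n >= min_count]
    # Sort by descending frequency; break ties alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    if not kept:
        return []
    # Assumption: round the cut-off up so that at least one keyword remains.
    cutoff = max(1, math.ceil(len(kept) * top_fraction))
    return kept[:cutoff]


keyword_counts = Counter({
    "epidemiology": 101,
    "information": 97,
    "prevalence": 92,
    "rare term": 1,  # hypothetical keyword with a single record
})
# A large fraction is used here only because the toy sample is tiny.
selected = select_high_frequency(keyword_counts, top_fraction=0.5)
```

With this toy input the singleton is discarded first, and the top half of the three remaining keywords (rounded up to two) is kept.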

Table 1 High-frequency Keywords in the Field of Medical Big Data

Keywords | Chinese Meaning | Number
epidemiology | 流行病学 | 101
information | 信息 | 97
prevalence | 流行 | 92
Model | 模型 | 89
Mortality | 死亡 | 89
Quality | 质量 | 85
HEALTH-CARE | 卫生保健 | 76
management | 管理 | 74
outcomes | 结果 | 66
Gene | 基因 | 65
Therapy | 治疗 | 62
women | 妇女 | 62
Breast cancer | 乳腺癌 | 61
RISK-FACTORS | 风险因素 | 61
Survival | 生存 | 59
impact | 影响 | 57
networks | 网络 | 56
meta analysis | meta 分析 | 54
Diagnosis | 诊断 | 52
Obesity | 肥胖 | 51
Quality-of-life | 生活质量 | 51

Figure 5 shows how attention to high-frequency keywords in the field of medical big data has changed over time. The research topics that have attracted attention in recent years include big data, algorithm, healthcare, cloud and next generation sequencing.

Figure 5 Changes in the Attention of High-frequency Keywords over Time in the Field of Medical Big Data

The high-frequency word co-occurrence matrix generated by TDA was imported into VOSviewer for clustering and visualization. Words that were not related to any other word were removed, leaving clustering results for 80 words. Analysis of these results identified 8 research hotspots (as shown in Figure 6): (1) construction of medical information service platforms and network security research in the big data environment; (2) model-based medical image classification algorithms; (3) epidemiology of cardiovascular and cerebrovascular diseases; (4) screening and prediction of breast cancer biomarkers based on gene expression data; (5) development models and countermeasures for health management services; (6) the impact of exercise on the health-related quality of life of obese people; (7) evaluation of surgical outcomes and survival analysis of cancer patients; and (8) outcome evaluation and cost-effectiveness analysis of randomized clinical trials in the treatment of chronic diseases such as diabetes.

Figure 6 Research Hotspots in the Field of Medical Big Data

3.4 Research Frontier

The 2,286 papers in the field of medical big data cover 77,467 cited references. The references published in the most recent 5 years (2012-2016), 17,108 in total, were taken as the object of the research frontier analysis. References cited only once were deleted, leaving 909. The top 1% of these were then selected as the analysis data, giving 91 papers. The 91 papers were imported into VOSviewer for clustering and visualization (Figure 7). A total of 10 research frontiers were obtained (as shown in Table 2) from 87 papers; the other 4 papers were not related to any other paper.

Figure 7 Research Frontiers in the Field of Medical Big Data

Each research frontier consists of a set of documents, each with a publication year, and the average publication year is calculated for each set (as shown in Table 2). The average year indicates how recent each frontier is and shows how the research frontiers of the medical big data field have developed over time. Among the 10 research frontiers, the most recent is medical big data quality assessment based on electronic medical records and analysis of its predictive potential (average year 2013.7). The oldest is data sharing and privacy protection of medical big data (average year 2012.4).

Table 2 Research Frontiers in the Field of Medical Big Data

No. | Research Frontier | Number of Papers | Total Citation Frequency | Average Year
8 | Data sharing and privacy protection of medical big data | 7 | 36 | 2012.4
9 | The application and challenges of medical big data in the fields of genomics, public health and preventive medicine | 6 | 59 | 2012.7
10 | Medical big data quality assessment based on electronic medical records and its potential analysis | 6 | 43 | 2013.7
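The average publication year used to date each frontier is a simple mean over the papers in a cluster. The sketch below shows the calculation under stated assumptions: the cluster names and publication years are hypothetical and were chosen only so that the results resemble the magnitudes in Table 2; they are not the study's data.

```python
from statistics import mean


def average_year_by_cluster(clusters):
    """Return the mean publication year of each cluster, rounded to one decimal."""
    return {name: round(mean(years), 1) for name, years in clusters.items()}


# Hypothetical clusters: publication years of the papers in each frontier.
clusters = {
    "frontier A": [2012, 2012, 2013, 2013, 2012],
    "frontier B": [2013, 2014, 2014],
}
average_years = average_year_by_cluster(clusters)

# The frontier with the highest average year is the most recent one.
newest = max(average_years, key=average_years.get)
oldest = min(average_years, key=average_years.get)
```

Sorting frontiers by this value gives the ordering from oldest to most recent that the text describes.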

4 Conclusion

This study uses bibliometric methods to analyze research in the field of medical big data, describes its development and identifies research hotspots and research frontiers. The results show that the number of publications in the field is increasing year by year, indicating that medical big data is a focus of global attention. Bibliometric analysis reveals the research status and development trend of the field and identifies 8 research hotspots and 10 research frontiers, which can provide a reference for medical big data research.
