JSON-ASR: A lightweight data storage and exchange format for automatic systematic reviews of TCM

2021-05-10
TMR Modern Herbal Medicine, 2021, Issue 2

Ji Xu, Hongyong Deng *

1 Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.

Abstract Objectives: The aim of this study was to investigate and develop a data storage and exchange format for automatic systematic reviews (ASR) of traditional Chinese medicine (TCM). Methods: A lightweight and widely used data format, JavaScript Object Notation (JSON), was introduced. Based on the JSON syntax, we designed a fully described data structure to collect information from TCM clinical trials. Results: A lightweight data format, JSON-ASR, was developed. JSON-ASR uses a plain-text format of key/value pairs and consists of six sections and more than 80 preset pairs. It adopts extensible arrays of structured objects to support trials with multiple groups and multiple outcomes. Conclusion: JSON-ASR is lightweight, flexible, and extensible, which makes it suitable for the complex data of clinical evidence.

Keywords: Data storage and exchange, Automatic systematic reviews, Traditional Chinese medicine, JavaScript object notation

Background

In the past 30 years, evidence-based medicine (EBM) has developed from its establishment to maturity and has had a large impact on health decision-making. However, EBM still has many problems, the most criticized of which are the long production cycle and poor timeliness of classic systematic reviews [1]. Researchers have sought solutions to this problem, such as the automation of systematic reviews (ASR). In 2018, the International Collaboration for the Automation of Systematic Reviews (ICASR) pointed out [2] that without automated methods for reviewing thousands of research articles, including the many published every year, findings might be overlooked when new policies are developed; automated systematic review tools would enable more transparent and timely reviews, maximizing the potential for identifying research findings and translating them into practice. ASR research has grown steadily and now involves many information technologies, data storage and exchange mechanisms, algorithms, and visualization techniques. Among them, an efficient data storage and exchange format is essential, because it allows smooth collaboration among different working groups, system platforms, systematic review software packages, and functional modules of evidence production tools.

In the traditional systematic review process, printed paper forms and/or Microsoft Excel spreadsheets are usually used for data collection and storage. Cochrane has recommended its own data collection form for preparing Cochrane systematic reviews [3], but such forms are not suitable for automatic processing by computers. There are currently no published reports on data storage and exchange standards for ASR. For this study, we consulted clinical trial data resources such as Health Level 7 (HL7) [4] and the Clinical Data Interchange Standards Consortium (CDISC) [5]. We also considered several candidate file formats for ASR data. In terms of maturity and universality, Extensible Markup Language (XML) [6,7] and JavaScript Object Notation (JSON) have clear advantages, and JSON produces smaller files [8].

JSON is a flexible, efficient, and open standard file format for data interchange. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition (December 1999). JSON is a human-readable text format that is completely language-independent but uses conventions familiar to programmers of the C family of languages, including C, Java, JavaScript, Python, and many others [9,10]. These properties make JSON an ideal data-interchange format for ASR.

Methods

JSON Syntax

JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON data. The official filename extension for JSON is “.json”. JSON is built on two structures: a collection of key (name)/value pairs, known as an object, and an ordered list of values, known as an array. An object is an unordered set of key/value pairs that begins with “{” (left brace) and ends with “}” (right brace). Each key is followed by “:” (colon), and the key/value pairs are separated by “,” (comma). In this study, we designed the framework and elements of JSON-ASR according to the characteristics of ASR data while following the JSON syntax.
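To make the two structures concrete, the following minimal Python sketch uses the standard-library `json` module to parse a small object and serialize it back. The record and its keys (`id`, `name`, `editors`) are hypothetical and only illustrate the syntax; they are not taken from the JSON-ASR specification.

```python
import json

# A hypothetical JSON text: an object (braces) holding key/value pairs,
# one of whose values is an ordered list of values (an array, in brackets).
text = '{"id": "E001", "name": "Example trial", "editors": ["A", "B"]}'

record = json.loads(text)                 # a JSON object becomes a Python dict
print(type(record).__name__)              # dict
print(type(record["editors"]).__name__)   # list (element order is preserved)

# Serializing again: keys are double-quoted, each key is followed by ":",
# and pairs are separated by ",".
print(json.dumps(record))
```

Because every mainstream language ships a comparable parser, the same text can be exchanged between tools written in different languages without a custom reader.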

JSON-ASR Sections

A JSON-ASR file is a predefined dataset that describes a clinical evidence object. Different parts of clinical evidence are mapped to different sections of the JSON file. Clinical evidence usually consists of six parts: general information, source of evidence, clinical data, trial design, grouping, and outcomes. Each part corresponds to a sub-object (section) under the main JSON object. For the multi-member sections, namely sources, groups, and outcomes, arrays of objects are used instead of a single object.

Key/Value Pairs

JSON uses key/value pairs to describe entity information. The key is a double-quoted string that names the value. Values can be of several types, including string, number, array, Boolean, and object. A string value must be enclosed in double quotes, whereas a number or Boolean (i.e., true or false) value is written without quotes. An array begins with “[” (left bracket) and ends with “]” (right bracket), and its elements are separated by “,” (comma). JSON structures can be nested: the value of a key can be another object or even an array of objects.
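The value types above can be checked mechanically after parsing. The sketch below is a minimal illustration in Python, assuming the standard `json` module's mapping of JSON types to Python types; the helper `json_type` and the sample keys are hypothetical. It also covers `null`, which JSON permits even though it is not listed above.

```python
import json

def json_type(value):
    """Return the JSON type name of a value produced by json.loads (hypothetical helper)."""
    # bool must be tested before int/float: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if value is None:
        return "null"
    raise TypeError(f"not a JSON value: {value!r}")

# Hypothetical pairs: strings are quoted; numbers and Booleans are not;
# the last value nests an array of objects inside the top-level object.
doc = json.loads('{"title": "Trial A", "year": 2020, "randomized": true, '
                 '"arms": [{"name": "treatment"}, {"name": "control"}]}')

for key, value in doc.items():
    print(key, "->", json_type(value))
```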

Results

Framework of JSON-ASR

JSON-ASR is a plain-text file format that is easy to read and parse. Each JSON-ASR file (*.json) stores the complete data of one clinical evidence object in a nested object structure. The main object of JSON-ASR contains six sections: general, source, clinic, study, group, and outcome (see Figure 1). Each section uses key/value pairs to describe the information within its field. The value of each section key is either an object (general, clinic, and study sections) or an array of objects (source, group, and outcome sections). For example, the general information of a clinical evidence object, such as its unique ID, name, creation date, and editor, can be fully presented in a single object, whereas the group section requires a list of objects to record its grouping data, because a clinical trial has at least two groups, namely, intervention and control.
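The top-level layout described above lends itself to a simple structural check. The following is a minimal sketch, assuming the six section names appear as lowercase top-level keys exactly as listed; `check_framework` and the contents of the example sections are hypothetical and are not part of the published format.

```python
import json

# Section names come from the text; whether each takes an object or an array
# of objects follows the description above. Section contents are hypothetical.
OBJECT_SECTIONS = ("general", "clinic", "study")
ARRAY_SECTIONS = ("source", "group", "outcome")

def check_framework(doc):
    """Return a list of problems with the top-level layout of a JSON-ASR-like dict."""
    problems = []
    for name in OBJECT_SECTIONS:
        if not isinstance(doc.get(name), dict):
            problems.append(f"'{name}' should be an object")
    for name in ARRAY_SECTIONS:
        value = doc.get(name)
        if not (isinstance(value, list) and all(isinstance(v, dict) for v in value)):
            problems.append(f"'{name}' should be an array of objects")
    # A clinical trial has at least two groups (intervention and control).
    if isinstance(doc.get("group"), list) and len(doc["group"]) < 2:
        problems.append("'group' should contain at least two groups")
    return problems

example = json.loads("""{
  "general": {"id": "E001"},
  "source": [{"title": "Example trial report"}],
  "clinic": {},
  "study": {},
  "group": [{"name": "intervention"}, {"name": "control"}],
  "outcome": []
}""")
print(check_framework(example))  # []
```

The check only inspects types and the group count; validating the pairs inside each section would need the full key list from Table 1.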

JSON-ASR allows up to three levels of nesting, for example, “source-authors-first author.” Following this path, the evidence source section clearly identifies the first author who reported the trial, and the path “outcome-treatments-data” can be followed to retrieve the exact outcome value of a given group. This nested tree structure allows JSON-ASR to store and transmit clinical evidence data efficiently.
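Path-based retrieval of this kind can be sketched in a few lines of Python. The record below is hypothetical: the key names (`authors`, `first_author`, `treatments`, `data`) and all values are illustrative guesses, not the published schema. Because source, group, and outcome are arrays, a path also needs list positions alongside object keys.

```python
# Hypothetical nested record; key names and values are illustrative only.
record = {
    "source": [{"authors": {"first_author": "Author A"}}],
    "outcome": [{
        "name": "Effective rate",
        "treatments": [
            {"group": "intervention", "data": "45,50"},
            {"group": "control", "data": "36,50"},
        ],
    }],
}

def get_path(node, path):
    """Follow a sequence of keys (for objects) and indices (for arrays)."""
    for step in path:
        node = node[step]  # dict lookup by key, or list lookup by position
    return node

print(get_path(record, ["source", 0, "authors", "first_author"]))  # Author A
print(get_path(record, ["outcome", 0, "treatments", 1, "data"]))   # 36,50
```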

Key/Value Pairs

JSON-ASR describes data details in the form of key/value pairs; some are required and others are optional. In this paper, the value types are limited to string, number, array, and object (see Table 1). A typical JSON-ASR file contains about 80 to 140 pairs; the actual number depends on the number of sources, groups, and outcomes it contains. Accordingly, a typical JSON-ASR file is approximately 9 kilobytes in size, ranging from 4 to 15 kilobytes.
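Both quantities mentioned here, the number of pairs and the file size, can be measured directly. The sketch below is a hypothetical illustration, not the authors' measurement procedure. It assumes two conventions: section keys are counted as pairs, and size is that of a compact UTF-8 serialization with 1 KB = 1024 bytes. `ensure_ascii=False` keeps Chinese text as UTF-8 rather than `\u` escapes, which matters for TCM records.

```python
import json

def count_pairs(node):
    """Recursively count key/value pairs, including those in objects nested in arrays."""
    if isinstance(node, dict):
        return len(node) + sum(count_pairs(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_pairs(v) for v in node)
    return 0  # strings, numbers, Booleans, and null hold no pairs

def size_kb(node):
    """Size of the compact UTF-8 serialization, in kilobytes (1 KB = 1024 bytes assumed)."""
    text = json.dumps(node, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8")) / 1024

# Hypothetical fragment with two sections.
doc = {"general": {"id": "E001", "editor": "X"},
       "group": [{"name": "intervention"}, {"name": "control"}]}
print(count_pairs(doc))  # 6: two section keys, two general pairs, two group pairs
```

Pretty-printed files with indentation will be somewhat larger than the compact form measured here.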

The general section has six pairs for recording basic information about the evidence, and the source section has eight pairs. Two elements of the source section, journal and authors, take nested objects as values; for example, the journal element holds an object of journal details.

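Similarly, the study object has a risk-of-bias sub-object, which records full details of the seven domains of the Cochrane risk-of-bias tool (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias).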
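A risk-of-bias sub-object of this kind can be checked for completeness. The Python sketch below keeps only a judgement per domain; the real object may store more detail, such as supporting text. The key names and the `check_risk_of_bias` helper are hypothetical. The seven domains and the three judgements (low, high, unclear risk) follow the Cochrane risk-of-bias tool.

```python
# Key names are hypothetical; the seven domains and the three judgements
# (low / high / unclear risk) follow the Cochrane risk-of-bias tool.
ROB_DOMAINS = (
    "random_sequence_generation",
    "allocation_concealment",
    "blinding_of_participants_and_personnel",
    "blinding_of_outcome_assessment",
    "incomplete_outcome_data",
    "selective_reporting",
    "other_bias",
)
JUDGEMENTS = {"low", "high", "unclear"}

def check_risk_of_bias(rob):
    """Return the domains that are missing or carry an unrecognized judgement."""
    return [d for d in ROB_DOMAINS if rob.get(d) not in JUDGEMENTS]

study = {"risk_of_bias": {d: "unclear" for d in ROB_DOMAINS}}
study["risk_of_bias"]["random_sequence_generation"] = "low"
del study["risk_of_bias"]["other_bias"]
print(check_risk_of_bias(study["risk_of_bias"]))  # ['other_bias']
```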

Figure 1. Framework of JSON-ASR

Table 1. Main key/value pairs of JSON-ASR (levels 1 and 2)

In the group section, each group contains an intervention object that records drug, administration, and dosage information.
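One way to produce such nested group entries is to model them as typed records and serialize them. The Python sketch below does this with standard-library dataclasses. The field names mirror the three kinds of information named above, but the exact keys and all example values are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical field names; the text only says the intervention object records
# drug, administration, and dosage information.
@dataclass
class Intervention:
    drug: str
    administration: str
    dosage: str

@dataclass
class Group:
    name: str
    intervention: Intervention

groups = [
    Group("intervention", Intervention("Herbal decoction X", "oral", "150 mL twice daily")),
    Group("control", Intervention("Placebo", "oral", "150 mL twice daily")),
]

# asdict() converts nested dataclasses recursively, so each group becomes an
# object whose "intervention" value is itself an object.
group_section = [asdict(g) for g in groups]
print(json.dumps(group_section[0]))
```

Typed records catch missing fields when a group is created, before anything is written to the JSON file.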

Each outcome always reports test values for two or more groups. Hence, the outcome section uses an array to present the data of all groups.
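The following Python sketch shows how per-group values might sit inside each outcome and be collected by group. The key names (`treatments`, `group`, `type`, `data`) and all numbers are hypothetical and serve only to illustrate the array-within-outcome layout.

```python
# Hypothetical outcome entries; key names and values are made up for illustration.
outcomes = [
    {"name": "Total effective rate", "type": "dichotomous",
     "treatments": [{"group": "intervention", "data": "52,60"},
                    {"group": "control", "data": "41,60"}]},
    {"name": "Symptom score", "type": "continuous",
     "treatments": [{"group": "intervention", "data": "8.2,2.1,60"},
                    {"group": "control", "data": "10.5,2.4,60"}]},
]

def data_by_group(outcome):
    """Map each group's name to its raw data string for one outcome."""
    return {t["group"]: t["data"] for t in outcome["treatments"]}

for outcome in outcomes:
    print(outcome["name"], data_by_group(outcome))
```

Using an array rather than fixed "intervention" and "control" keys lets the same layout hold three-arm or multi-arm trials.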

The syntax of the “data” value depends on the data type. For continuous data, it contains three comma-separated numbers, namely, the mean, standard deviation, and sample size; for dichotomous data, it contains only the number of events and the sample size.
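A consumer of JSON-ASR files has to turn these comma-separated strings back into numbers. The sketch below is a minimal Python parser for the two syntaxes described above. It assumes the data type is supplied separately, here as a hypothetical `kind` argument. The named-tuple field names are also illustrative.

```python
from typing import NamedTuple, Union

class Continuous(NamedTuple):
    mean: float
    sd: float
    n: int

class Dichotomous(NamedTuple):
    events: int
    n: int

def parse_data(data: str, kind: str) -> Union[Continuous, Dichotomous]:
    """Parse a comma-separated "data" value.

    continuous:  "mean,standard deviation,sample size"
    dichotomous: "events,sample size"
    """
    parts = [p.strip() for p in data.split(",")]
    if kind == "continuous":
        if len(parts) != 3:
            raise ValueError(f"expected 3 numbers, got {data!r}")
        return Continuous(float(parts[0]), float(parts[1]), int(parts[2]))
    if kind == "dichotomous":
        if len(parts) != 2:
            raise ValueError(f"expected 2 numbers, got {data!r}")
        events, n = int(parts[0]), int(parts[1])
        if not 0 <= events <= n:
            raise ValueError(f"events must lie between 0 and n, got {data!r}")
        return Dichotomous(events, n)
    raise ValueError(f"unknown data type: {kind!r}")

print(parse_data("8.2, 2.1, 60", "continuous"))  # Continuous(mean=8.2, sd=2.1, n=60)
print(parse_data("52,60", "dichotomous"))        # Dichotomous(events=52, n=60)
```

Rejecting malformed strings at parse time keeps bad values from reaching any later effect-size calculation.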

Discussion

The JSON data format is widely used for data storage and exchange [11,12], which gives JSON-ASR broad compatibility. A JSON-ASR file organizes clinical evidence information as objects and presents it in plain text, which makes it both suitable for computer processing and convenient for human reading. The symbol-based key/value pair system describes the data clearly and minimizes redundant information, keeping JSON-ASR files lightweight. We compared the data sizes of 243 trials stored in different file types. The original PDF files ranged from 26 to 3066 kilobytes (mean 844.26 kilobytes); XML files were smaller, ranging from 9 to 57 kilobytes (mean 14.84 kilobytes); and JSON-ASR files were the smallest, ranging from 4 to 39 kilobytes (mean 9.3 kilobytes). Because the three file types carry almost the same information relevant to systematic reviewing, JSON-ASR offers higher storage and transmission efficiency than the other file types.
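The reported mean sizes can be restated as ratios. The short Python calculation below uses only the three means given above. It compares averages only and says nothing about the per-trial distribution of ratios.

```python
# Mean file sizes (kilobytes) reported above for the 243 trials.
mean_kb = {"PDF": 844.26, "XML": 14.84, "JSON-ASR": 9.3}

for fmt in ("PDF", "XML"):
    ratio = mean_kb[fmt] / mean_kb["JSON-ASR"]         # how many times larger
    reduction = 1 - mean_kb["JSON-ASR"] / mean_kb[fmt]  # relative saving
    print(f"{fmt}: {ratio:.1f}x larger; JSON-ASR is {reduction:.1%} smaller")
```

On these means, a PDF is about 90.8 times the size of the corresponding JSON-ASR file (a 98.9% saving), and an XML file is about 1.6 times the size (a 37.3% saving).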

JSON-ASR predefines about 80 elements (objects and key/value pairs) of clinical evidence, which is sufficient for typical systematic reviews. We also verified this on a TCM-ASR prototype system [13]. Moreover, we believe that JSON-ASR is a highly extensible file format; we therefore recommend extending it as needed in practice, provided that compatibility is preserved. As readers may have noticed, JSON-ASR is designed for, and best suited to, randomized controlled trial (RCT) data, and it is not inherently suitable for all types of clinical studies, such as single-arm trials and diagnostic accuracy studies. Nevertheless, JSON-ASR can still serve as a basic reference format for other types of clinical evidence data.
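One common way to keep extensions compatible is a "tolerant reader". A consumer checks the keys it requires, uses them, and passes any unknown keys through untouched. The Python sketch below illustrates this general technique, not a mechanism defined by JSON-ASR. The required key set, the `read_group` helper, and the `tcm_syndrome` extension key are all hypothetical.

```python
import json

REQUIRED_GROUP_KEYS = {"name", "intervention"}  # hypothetical required keys

def read_group(obj):
    """Split a group object into the keys this consumer knows and any extension keys."""
    missing = REQUIRED_GROUP_KEYS - set(obj)
    if missing:
        raise KeyError(f"missing required keys: {sorted(missing)}")
    known = {k: obj[k] for k in REQUIRED_GROUP_KEYS}
    extra = {k: v for k, v in obj.items() if k not in REQUIRED_GROUP_KEYS}
    return known, extra

# An extended file adds a hypothetical "tcm_syndrome" key; an older consumer
# still reads the required keys and can pass the extension through unchanged.
extended = json.loads('{"name": "intervention", "intervention": {"drug": "X"}, '
                      '"tcm_syndrome": "qi deficiency"}')
known, extra = read_group(extended)
print(sorted(known))  # ['intervention', 'name']
print(extra)          # {'tcm_syndrome': 'qi deficiency'}
```

Extensions that only add keys, without renaming or retyping existing ones, remain readable by consumers written before the extension.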

Conclusion

This paper reports an attempt to establish a storage and exchange format for clinical evidence in automatic systematic reviews. JSON-ASR, a lightweight JSON-based data format, was developed. A JSON-ASR file includes six sections (general, source, clinic, study, group, and outcome), each with key/value pairs that describe the details of the evidence. Its nested objects form a tree structure that fits the complex data of clinical evidence. JSON-ASR files are smaller than the corresponding PDF and XML files, which gives them advantages in storage and transmission. In conclusion, JSON-ASR is a suitable data format for clinical evidence that can effectively promote the development and application of automatic systematic reviews.

Data Availability

The data used to support the findings of this study are available from the corresponding author on request.