Dahui Qin
Molecular Diagnostic Laboratory, Department of Pathology, Moffitt Cancer Center, Tampa 33612-9416, FL, USA
Next-generation sequencing (NGS) is a new technology used for DNA and RNA sequencing and variant/mutation detection. NGS can sequence hundreds to thousands of genes, or a whole genome, in a short period of time. The sequence variants/mutations detected by NGS are widely used for disease diagnosis, prognosis, therapeutic decisions, and patient follow-up. Its capacity for massively parallel sequencing offers new opportunities for personalized precision medicine.
NGS [1-4] is a new technology for DNA and RNA sequencing and variant/mutation detection. It combines the advantages of unique sequencing chemistries, different sequencing matrices, and bioinformatics technology. This combination allows massively parallel sequencing of DNA or RNA sequences of various lengths, or even a whole genome, within a relatively short period of time. It is a revolutionary sequencing technology that followed Sanger sequencing [5].
NGS involves several major steps. For example, DNA NGS involves DNA fragmentation, library preparation, massively parallel sequencing, bioinformatics analysis, and variant/mutation annotation and interpretation.
DNA fragmentation breaks the targeted DNA into many short segments, usually 100-300 bp in length. This can be achieved by mechanical methods, enzymatic digestion [6], or other methods. For example, sonication can be used to break DNA into short segments. The short segments relevant to the targeted DNA sequences are then pulled out using specific complementary probes of different designs [7,8]. This method is usually referred to as a hybridization capture assay. Another method involves polymerase chain reaction (PCR) amplification. In this method, many pairs of primers are used to amplify the targeted DNA segments by PCR, and the PCR products serve as the short segments of targeted DNA. This method is usually called an amplicon assay [9,10]. The DNA segments are then used for library preparation.
Library preparation is the process by which DNA segments are modified so that each DNA sample carries a sample-specific index. The index works like a sample identifier and helps to identify the patient whose DNA was sequenced. This process also adds sequencing adaptors to the DNA segments. These modifications allow the sequencing primers to bind to all the DNA segments and enable the massively parallel sequencing that follows.
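The role of the sample-specific index can be illustrated with a minimal sketch of how pooled reads might be assigned back to samples. The index sequences, sample names, and the convention that the index sits at the start of each read are assumptions made for illustration only; on real instruments the index is typically read separately and handled by the vendor's software.

```python
from collections import defaultdict

# Hypothetical index-to-sample table; real index sequences come from the library kit.
SAMPLE_INDEXES = {"ACGTAC": "patient_A", "TGCATG": "patient_B"}
INDEX_LEN = 6  # assumed index length


def demultiplex(reads, sample_indexes=SAMPLE_INDEXES, index_len=INDEX_LEN):
    """Group reads by the sample index assumed to prefix each read."""
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        # Reads whose index matches no known sample are set aside.
        sample = sample_indexes.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Toy pooled library: two samples plus one read with an unreadable index.
pooled = ["ACGTACGGATTC", "TGCATGCCTTAA", "ACGTACTTTGGA", "NNNNNNAAAA"]
groups = demultiplex(pooled)
for sample, inserts in groups.items():
    print(sample, inserts)
```

Exact matching is used here for simplicity; a production demultiplexer would usually tolerate a small number of mismatches in the index.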
Massively parallel sequencing is performed on an NGS sequencer. The library is loaded onto the sequencing matrix of the sequencer. Different sequencers use different sequencing matrices. For example, Illumina NGS sequencers use flow cells and Ion Torrent NGS sequencers use sequencing chips. The goal, however, is the same: to sequence all the DNA segments in parallel at the same time. The sequence information generated is then analyzed using bioinformatics software.
Bioinformatics analysis is a process involving base calling, read alignment, variant identification, and variant annotation [11]. During this process, the sequence information is compared with a human genome reference sequence to identify any variants/mutations in the targeted sequences. The information from all sequenced segments is pieced together to generate the final sequencing results for the full length of the targeted DNA. The final sequencing results are sent back to the user for interpretation.
The annotation and interpretation processes aim to identify each variant and its possible biological/clinical significance.
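The comparison of aligned reads with a reference sequence can be sketched with a toy example. This is not a real variant caller: it assumes reads are already aligned without gaps, ignores base quality, and uses hypothetical thresholds (`min_depth`, `min_vaf`) chosen only for illustration.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_vaf=0.2):
    """Report single-base differences from the reference.

    aligned_reads is a list of (start, sequence) pairs with 0-based start
    positions and gap-free alignments (a simplifying assumption).
    """
    # Count the bases observed at every reference position (a "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        for base, n in counts.items():
            vaf = n / depth  # variant allele frequency at this position
            if base != reference[pos] and vaf >= min_vaf:
                variants.append((pos, reference[pos], base, depth, vaf))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (4, "TCGTAC"), (3, "TACGTA")]
print(call_variants(reference, reads))
```

In this toy data set, two of the four reads covering position 4 carry a T where the reference has an A, so the sketch reports one variant at a frequency of 0.5.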
One of the advantages of NGS is that it can interrogate many targets at the same time, on the scale of hundreds, thousands, or even millions of targets. This capacity gives NGS huge potential in clinical settings. For example, in cancer patient care, any given tumor may have multiple mutations. If traditional molecular assays are used in such settings, multiple assays may have to be performed for the multiple mutations, and a larger amount of tissue may be needed for these assays. Using NGS technology, these targets can be interrogated in one test. Therefore, less tissue is needed and the results for dozens or hundreds of DNA targets are obtained from a single test. In recent years, scientific research has revealed an increasing number of mutations in different diseases. For example, different mutations have been discovered in different leukemias. Mutations in NPM1, CEBPA, and RUNX1 have been found in acute myeloid leukemia (AML) and are associated with different AML subtypes [12]. In myelodysplastic syndrome, many mutations have been found to be associated with different clinical implications. The genes harboring these mutations include, but are not limited to, TET2, DNMT3A, ASXL1, EZH2, SF3B1, SRSF2, U2AF1, ZRSR2, TP53, STAG2, NRAS, CBL, JAK2, NF1, RUNX1, ETV6, IDH1, IDH2, SETBP1, PHF6, BCOR, STAT3, and PPM1D [13].
In a patient's sample, the leukemic cells bearing the same mutation are believed to originate from one clone. A patient's sample with many mutations may contain more than one clone of leukemic cells. Not only may a leukemic disease have more than one clone of leukemic cells, but the clones may also change during the course of the disease. This phenomenon is referred to as clonal evolution [14-18]. A similar phenomenon has been observed in solid tumors [17,19]. This observation has changed the old one-tumor, one-mutation concept. A solid tumor may have multiple mutations, and these mutations can originate from one clone or from multiple clones. This is called tumor mutation heterogeneity [20]. Therefore, in cancer patient care, multiple gene mutations often need to be tested. Because of clonal evolution, many different gene mutations also need to be tested during follow-up. Moreover, metastatic tumors may carry mutations different from those of the primary tumor [21,22]. Such findings also indicate that multiple mutations need to be covered by diagnostic and follow-up molecular tests. With the advent of immunotherapy, tumor mutation burden has become an important parameter to test [23], which again requires investigating numerous mutations in a tumor sample. Traditional molecular test methods are not well suited to such needs, so NGS technology becomes necessary for these tasks in patient care. In addition, in current medical practice, biopsy specimens are becoming smaller and smaller while more mutation-related information needs to be extracted from them. In many cases, it is impossible for traditional molecular tests to meet such needs. NGS has evolved to meet them. By massively parallel sequencing, NGS technology can test multiple samples and multiple targets at the same time. Therefore, it shortens the turnaround time of molecular tests.
It has become clear that NGS technology is an important tool in personalized precision medicine. It provides information for disease diagnostic classification, selection of therapeutic agents, and prognostic evaluation. However, the application of NGS in clinical settings involves challenges.
For proper clinical application of NGS, a number of measures have to be followed. Guidelines and recommendations regarding the use of NGS technology for clinical tests have been published by the College of American Pathologists (CAP) [24], the Association for Molecular Pathology (AMP) [25], and other agencies [26].
NGS can be performed at different levels. It can be used for whole-genome sequencing. At this level, almost all the nucleotides in the genome, including chromosomal DNA and mitochondrial DNA, are interrogated. Whole-genome sequencing is used more often in research and is less common in clinical settings. When used clinically, it is applied more often to constitutional genetic diseases than to cancer somatic mutations. It is especially useful for the diagnosis of some rare genetic diseases, for example, when a genetic disease is suspected but no specific mutation has been identified by other molecular tests. In such cases, whole-genome sequencing may provide additional information on disease-associated mutations. Whole-genome sequencing is used less frequently for cancer somatic mutations because its average depth is limited. In a given tumor, allelic mutation frequencies may vary, and the percentage of tumor cells may also vary between specimens. Detecting mutations with different allelic frequencies in such settings often needs deep sequencing, which is very challenging for the whole-genome sequencing method.
NGS can also be used for whole-exome sequencing. The entire coding region, that is, all exons of an organism in any cell type, can be sequenced. In humans, this is about 1% of the genome. Whole-exome sequencing is more often used in research.
NGS can also be performed at the transcriptome level. The transcriptome is the entire set of RNA transcripts in a given cell type, including mRNA, rRNA, tRNA, micro-RNA, and other non-coding RNA. In contrast to DNA sequencing, this is called RNA sequencing. Specially designed mRNA sequencing is also often used to detect fusion genes.
The most commonly used NGS assay for cancer patients is targeted panel sequencing, which usually interrogates dozens or hundreds of targeted genes. Such targeted NGS assays are usually designed for a disease or a category of diseases, for example, a panel designed for myeloid leukemia or a panel designed for carcinoma. Compared with whole-genome sequencing, a targeted panel covers only a limited number of targets. Therefore, it allows much greater sequencing depth, which is necessary to detect mutations with different allelic frequencies.
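The trade-off between target size and depth can be made concrete with a back-of-the-envelope calculation: for a fixed sequencing output, the average depth is roughly the number of usable sequenced bases divided by the size of the targeted region. The run yield, panel size, and on-target fractions below are hypothetical values chosen for illustration; the genome size is approximate, and the exome is taken as about 1% of the genome as stated above.

```python
def mean_depth(total_bases, target_size_bp, on_target_fraction=1.0):
    """Average depth = usable sequenced bases / size of the targeted region."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return total_bases * on_target_fraction / target_size_bp


RUN_YIELD_BASES = 100e9        # hypothetical yield of one sequencing run
GENOME_BP = 3.1e9              # approximate size of the human genome
EXOME_BP = 0.01 * GENOME_BP    # about 1% of the genome
PANEL_BP = 0.5e6               # hypothetical targeted panel

# (name, target size, assumed fraction of reads that land on target)
scenarios = [
    ("whole genome", GENOME_BP, 1.0),
    ("whole exome", EXOME_BP, 0.7),
    ("targeted panel", PANEL_BP, 0.7),
]
for name, size_bp, on_target in scenarios:
    depth = mean_depth(RUN_YIELD_BASES, size_bp, on_target)
    print(f"{name}: ~{depth:,.0f}x average depth")
```

In practice the output of a run is shared among many multiplexed samples, so the per-sample depth of a panel is lower than this single-sample figure, but the ordering of the three scenarios is the same.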
The clinical application of NGS testing involves several aspects of work. The first step is to determine which genetic mutations need to be interrogated. This usually depends on the disease or diseases for which the NGS test is designed. Taking cancer patients as an example, the first thing to do is to check the current guidelines for the disease or diseases and identify the targeted gene mutations that the guidelines include as the standard of care. For example, in non-small cell lung cancer, the guidelines indicate that EGFR, KRAS, BRAF, MET, RET, and HER2 mutations, and ALK and ROS1 translocations, are clinically significant and need to be tested [27]. Cancer mutation research is a very active field and new discoveries are made every day. Therefore, the literature needs to be reviewed to identify other mutations that are not yet included in the guidelines but are potentially significant in clinical settings. For example, PIK3CA, NRAS, AKT1, MAP2K1, NTRK, and DDR2 mutations may have clinical significance in a small portion of non-small cell carcinoma patients [28-31], so their inclusion needs to be considered. It is also beneficial for laboratories to communicate with the clinicians of each subspecialty to get their feedback. In this way, a list of mutations to be interrogated can be generated. An NGS assay designed for such a group of gene mutations is usually called an NGS panel.
The next step is to identify an appropriate NGS method for the panel. Different NGS methods are available. For a well-defined gene panel, an amplicon assay [9,32,33] can be used. However, if some targeted genes are difficult to cover with an amplicon assay, a different method, such as a hybridization capture assay [34], can be used. For example, a panel designed for AML patients should include CEBPA mutations and FLT3 mutations. It is well known that the CEBPA gene has a high GC content, which poses a challenge for an amplicon assay [35]. FLT3 mutations usually consist of internal tandem duplications of different sizes. This is also challenging for an amplicon assay, which usually sequences amplicons of a fixed size. In such a case, a hybridization capture assay may work better. Once an NGS method is chosen for a clinical application, one needs to decide what type of specimen will be tested [25]. Will it be fresh tissue or formalin-fixed paraffin-embedded (FFPE) tissue? An NGS assay designed for hematological malignancies is more likely to test fresh tissue samples. On the other hand, an NGS assay designed for solid tumors is more likely to test FFPE samples. FFPE samples are usually provided as unstained slides together with an H&E-stained slide, and they usually require microscopic review and micro/macrodissection before DNA extraction. Sometimes, a panel may be designed for both fresh tissue and FFPE tissue, depending on its application. The next step is to establish the standard operating procedure (SOP). This involves writing up an SOP and performing preliminary tests to fine-tune it. The SOP should include wet lab procedures and dry lab (bioinformatics pipeline) procedures.
After the NGS method is decided, the assay needs to be validated in a CLIA-certified laboratory. The validation covers several aspects of assay performance, including accuracy, precision, reportable range, reference range, analytical sensitivity, and specificity [25]. A validation plan needs to be laid out. The plan usually defines which positive and negative control samples are going to be used. The positive and negative samples can also be mixed at different ratios to produce positive controls with different allelic mutation frequencies. Such controls are used to test the assay sensitivity. Ideally, the positive controls include different types of mutations, for example, single nucleotide mutations, deletions, and insertions. For deletions and insertions, size should be considered. Normally, the larger the deletion or insertion, the more difficult it is for an NGS assay to detect. The validation plan also needs to define how many samples need to be tested. These are usually patient samples that have been tested in another CLIA-certified laboratory. The number of samples needed depends on multiple factors. One of them is the variation in the test results: more variation requires more samples. Generally speaking, guidelines indicate that to be 95% confident of at least 95% reliability, the minimum number of samples needed is 59 [25].
The validation process evaluates analytical sensitivity, specificity, accuracy, precision, limit of detection, sequencing depth, and allelic frequency cutoff.
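The figure of 59 samples is consistent with a standard zero-failure calculation: if all n samples give the expected result, one can claim reliability of at least R with confidence C when R^n is no larger than 1 - C, that is, n is at least ln(1 - C) / ln(R). The sketch below reproduces that arithmetic and also shows the expected allele frequency of a diluted control. The function names are illustrative, and the dilution formula assumes that the negative sample carries none of the variant and that both samples contribute DNA in proportion to the mixing ratio.

```python
import math


def min_samples(confidence=0.95, reliability=0.95):
    """Smallest n with reliability**n <= 1 - confidence (all n samples concordant)."""
    if not (0 < confidence < 1 and 0 < reliability < 1):
        raise ValueError("confidence and reliability must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


def expected_vaf(positive_fraction, positive_vaf):
    """Expected variant allele frequency after mixing positive and negative DNA.

    Assumes the negative sample has no variant alleles and that both samples
    contribute DNA in proportion to positive_fraction (an idealization).
    """
    if not 0 <= positive_fraction <= 1:
        raise ValueError("positive_fraction must be between 0 and 1")
    return positive_fraction * positive_vaf


print(min_samples(0.95, 0.95))            # 59
print(expected_vaf(0.2, 0.5))             # 20% positive DNA at 50% VAF
```

Under the same reasoning, a stricter reliability target raises the required sample count quickly; 95% confidence of at least 99% reliability would need 299 concordant samples.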
Analytical sensitivity is a parameter that reflects the percentage of positive samples that are identified as positive (true positives) by a validated assay. Another relevant parameter is the limit of detection (LOD). The LOD can be further defined as the lower limit of detection (LLOD), which is the lowest variant allele frequency (VAF) of a given mutation at which 95% of the samples with that VAF are reliably detected [25,36]. Normally, the lower the LLOD, the higher the sensitivity. Analytical specificity reflects the percentage of negative samples that are identified as negative (true negatives) by a validated assay. Accuracy reflects the concordance between the detected mutations and the expected mutations. Precision reflects the tendency to obtain the same results among different users and different runs [25,36-38]. Sequencing depth is defined as the number of reads covering a given targeted sequence. In an NGS assay, the sequencing depth for different targets may vary. Therefore, two terms are used to describe sequencing depth: average depth and minimal depth. The average depth reflects the overall sequencing depth of an assay. The minimal depth is the lowest sequencing depth required to obtain reliable results for all the intended targets. The allelic frequency cutoff is obtained during the validation process; it should allow the detection of real variants and the exclusion of false ‘variants’ caused by background noise. AMP has issued a guideline for this validation process [25].
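The relationship between these parameters can be sketched numerically. The first two functions follow the definitions of analytical sensitivity and specificity given above. The third is a simplified binomial model, an assumption introduced here rather than a method taken from the guidelines: it treats each read at a position as independently carrying the variant with probability equal to the VAF and asks how likely it is that at least `min_alt_reads` variant reads are observed at a given depth. Sequencing error and sampling bias are ignored.

```python
from math import comb


def sensitivity(true_pos, false_neg):
    """Fraction of truly positive samples that the assay reports as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative samples that the assay reports as negative."""
    return true_neg / (true_neg + false_pos)


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) under a binomial(depth, vaf) model."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


print(sensitivity(true_pos=95, false_neg=5))
print(specificity(true_neg=98, false_pos=2))

# Hypothetical rule: a variant is called only if 5 or more reads support it.
for depth in (100, 500, 1000):
    p = detection_probability(depth, vaf=0.05, min_alt_reads=5)
    print(f"depth {depth}: P(detect 5% VAF) = {p:.3f}")
```

Under this model, a variant present at 5% VAF is missed in a substantial share of cases at a depth of 100 but is detected almost always at a depth of 500 or more, which is why the minimal depth and the allelic frequency cutoff have to be set together during validation.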
The validation plan is usually carried out in a CLIA-certified laboratory. The samples need to be tested by at least two different technologists and in different runs to evaluate repeatability/reproducibility. The data are analyzed through a defined bioinformatics pipeline. The results of the data analysis are used to define the report reference range and the limit of detection. During validation, the quality control metrics also need to be determined. For example, the quality and quantity of nucleic acid needed for the assay are determined, as are the criteria for library quantitation and qualification. Other quality control metrics, including depth of coverage (average depth and minimal depth), uniformity of coverage, cluster density and alignment passing rate, base call quality scores, mapping quality, duplication rate, and strand bias, are also determined. A discussion of the details of these metrics is beyond the scope of this article. These quality control metrics ensure the quality of the test results [25,36-38]. The quality control metric data should be included in the validation document. The guidelines usually recommend that the documentation include the following components: purpose of the clinical test, acceptable clinical sample types, rationale for the inclusion of specific genes, methodological approach, types and sources of reagents and testing instrumentation, bioinformatics software used for data processing and analysis, detailed step-by-step testing procedure (SOP), validation sample description, optimization and familiarization results, validation results, performance characteristics, assay acceptance and rejection criteria, assay limitations, quality control/assurance metrics, and other information as the medical director deems necessary [25].
The interpretation of NGS testing results is challenging. NGS testing may identify many variants, and their clinical significance varies. For some variants, the clinical significance is well known. For others, it is unknown or uncertain because the variants have not been reported before or have been reported only rarely. How to interpret NGS results is a much-discussed topic. A reasonable approach may consist of several steps. The first step is to follow the current guidelines. For example, in tumor NGS testing, the National Comprehensive Cancer Network (NCCN) guidelines can be used for interpretation. Other relevant guidelines can also be used. The guidelines usually provide a list of genes and mutations that could be clinically significant for diagnosis, prognosis, molecular targeted therapies, and immunotherapies. The second step is to stay updated with current scientific discoveries. Since new discoveries in gene variant detection emerge frequently, it is necessary to review the current literature when evaluating the clinical significance of the variants/mutations detected by the NGS test. There are also other resources, for example, databases that are often freely available on the Internet (http://www.cbioportal.org/, http://www.uniprot.org/, https://www.ncbi.nlm.nih.gov/SNP/, http://evs.gs.washington.edu/EVS/, http://exac.broadinstitute.org/gene/ENSG00000171456, http://cancer.sanger.ac.uk/cosmic/, https://civicdb.org/home, https://www.ncbi.nlm.nih.gov/clinvar/, etc.). These databases contain a huge amount of data on the features and biological/clinical significance of variants/mutations, and they often provide links for inter-database reference. Since no database is perfect, caution is needed when using information from any database.
After an NGS assay goes into service in a laboratory, the quality metrics should be continuously monitored. The test results should also be continuously monitored in the context of the clinical setting, and biannual proficiency tests should be performed to ensure the reliability of the test results.
The clinical application of NGS is not limited to diagnosis. It is also widely used to identify mutation targets for targeted therapy and to identify high-risk populations for certain hereditary cancers. Over the past few years, numerous molecular targeted drugs have been developed and more are to come [39]. For example, the genes associated with melanoma therapy include, but are not limited to, BRAF, KIT, NRAS, NF1, GNAQ, CDK4, MITF, and PD-1 [40]. The genes associated with lung cancer therapy include, but are not limited to, EGFR, KRAS, BRAF, NRAS, PIK3CA, ROS1, MEK, VEGFA, ALK, MET, ERBB2, and ERBB4 [27,41]. Therefore, there is a need to identify different mutation targets for targeted therapy, and NGS assays are well suited to this task. NGS assays also have important applications in hereditary cancer diagnosis and risk assessment. Since the variant allele frequencies of germline variants in hereditary cancers are usually around 50% or 100%, detecting them requires less sequencing depth than detecting somatic mutations in cancers. Therefore, more targeted genes can be included in a panel of a given size. This is especially useful in assessing high-risk populations for hereditary cancers [42,43]. For example, BRCA1, BRCA2, CDH1, PTEN, TP53, STK11, PALB2, ATM, CHEK2, MUTYH, BARD1, MRE11A, NBN, RAD50, RAD51C, RAD51D, and NF1 are used for breast cancer [42].
The clinical application of NGS also includes tests for tumor mutation burden and microsatellite instability, and for variants/mutations in cell-free circulating DNA. Testing cell-free circulating DNA is sometimes called “liquid biopsy” [44-49]. It is known that many solid tumors shed their DNA, which may end up in the bloodstream or other body fluids. To a certain extent, such DNA can serve as a representative sample of the primary tumor. Testing it can generate mutation information similar to that obtained from tissue biopsies, hence the name. The samples for “liquid biopsy” are usually plasma or other body fluids, which are usually easily accessible. This is especially useful for tumors that are difficult or impossible to biopsy. However, “liquid biopsy” is not without challenges. Different tumors shed DNA differently, and a given tumor may shed DNA differently at different stages. Numerous “liquid biopsy” studies have been carried out for different tumors [50-57]. NGS assays have been used for liquid biopsy testing. For NGS assays in “liquid biopsy”, the key issue is test sensitivity. Different methods can be used to improve NGS sensitivity. One of these is to reduce background noise. For example, a molecular barcode can be used to label the original targeted molecules. This label can be used later to eliminate background noise and thereby increase sensitivity. NGS testing of “liquid biopsy” samples can also be used in non-invasive prenatal testing [58-60]. There is no doubt that NGS testing is a powerful and revolutionary technology that makes a great contribution to personalized and precision medicine.
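The molecular barcode approach mentioned above for liquid biopsy can be illustrated with a simplified sketch. Reads that share a barcode are assumed to be copies of the same original molecule, so a base seen in only one copy is likely an amplification or sequencing error, whereas a true variant appears in every copy. The barcodes, reads, and the `min_family_size` threshold are hypothetical, and the reads in each family are assumed to be aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(tagged_reads, min_family_size=2):
    """Collapse reads sharing a molecular barcode into one consensus sequence."""
    families = defaultdict(list)
    for barcode, seq in tagged_reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # a single copy cannot be error-corrected
        # Majority vote at each position across the copies of one molecule.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


tagged = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one copy has an error
    ("GGC", "ATGT"), ("GGC", "ATGT"), ("GGC", "ATGT"),  # variant in every copy
    ("CCA", "ACGA"),                                    # singleton, discarded
]
print(consensus_by_barcode(tagged))
```

In this toy example, the isolated G-to-T change in one copy of the first family is voted out, while the change carried by all copies of the second family is kept as a candidate real variant.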
No potential conflicts of interest are disclosed.