Comparative Genome Analysis of Scutellaria baicalensis and Scutellaria barbata Reveals the Evolution of Active Flavonoid Biosynthesis

2020-03-04 09:21:00ZhichaoXuRanranGaoXiangdongPuRongXuJiyongWangSihaoZhengYanZengJunChenChunnianHeJingyuanSong
Genomics,Proteomics & Bioinformatics 2020年3期

Zhichao Xu,Ranran Gao,Xiangdong Pu,Rong Xu,2,Jiyong Wang,Sihao Zheng,Yan Zeng,Jun Chen,2,Chunnian He,2,Jingyuan Song,2,*

1 Key Lab of Chinese Medicine Resources Conservation,State Administration of Traditional Chinese Medicine of China,Institute of Medicinal Plant Development,Chinese Academy of Medical Sciences &Peking Union Medical College,Beijing 100193,China

2 Engineering Research Center of Chinese Medicine Resource,Ministry of Education,Beijing 100193,China

3 China National Traditional Chinese Medicine Co.,Ltd,Beijing 102600,China

Abstract Scutellaria baicalensis (S.baicalensis) and Scutellaria barbata (S.barbata) are common medicinal plants of the Lamiaceae family.Both produce specific flavonoid compounds,including baicalein,scutellarein,norwogonin,and wogonin,as well as their glycosides,which exhibit antioxidant and antitumor activities.Here,we report chromosome-level genome assemblies of S.baicalensis and S.barbata with quantitative chromosomal variation (2n=18 and 2n=26,respectively).The divergence of S.baicalensis and S.barbata occurred far earlier than previously reported,and a whole-genome duplication (WGD) event was identified.The insertion of long terminal repeat elements after speciation might be responsible for the observed chromosomal expansion and rearrangement.Comparative genome analysis of the congeneric species revealed the species-specific evolution of chrysin and apigenin biosynthetic genes,such as the S.baicalensis-specific tandem duplication of genes encoding phenylalanine ammonia lyase and chalcone synthase,and the S.barbata-specific duplication of genes encoding 4-CoA ligase.In addition,the paralogous duplication,colinearity,and expression diversity of CYP82D subfamily members revealed the functional divergence of genes encoding flavone hydroxylase between S.baicalensis and S.barbata.Analyzing these Scutellaria genomes reveals the common and species-specific evolution of flavone biosynthetic genes.Thus,these findings would facilitate the development of molecular breeding and studies of biosynthesis and regulation of bioactive compounds.

KEYWORDS Scutellaria;Whole-genome duplication;Flavonoid biosynthesis;Tandem duplication;Species-specific evolution

Introduction

Plant-specific flavonoids,including flavones,flavonols,anthocyanins,proanthocyanidins,and isoflavones,play important functions in plants.These functions include flower pigmentation,ultraviolet protection,and symbiotic nitrogen fixation[1-3].Flavonoid metabolites also have biological and pharmacological activities on human health,including antibacterial and antioxidant functions,and the treatment of cancer,inflammatory,and cardiovascular diseases[3].The genusScutellaria,which belongs to the Lamiaceae family,consists of common herbal plants enriched in bioactive flavonoids.Approximately 300-360Scutellariaspecies have the characteristic flower form of upper and lower lips[4,5].Nonetheless,only twoScutellariaspecies,S.baicalensisandS.barbata,are recorded in the Pharmacopoeia of the People’s Republic of China.The roots ofS.baicalensisand dried herbs ofS.barbataare the basis of the traditional Chinese medicine (TCM)Huang QinandBan Zhi Lian,respectively,which have been used as heat-clearing and detoxifying herbs for thousands of years[6].The main biologically active compounds inScutellariaare derivatives of chrysin and apigenin,such as baicalein,scutellarein,and wogonin,as well as their glycosides,which include baicalin,scutellarin,and wogonoside [7-10].The demonstration that baicalin activates carnitine palmitoyltransferase 1 in the treatment of diet-induced obesity and hepatic steatosis[11,12]has generated extensive interest in the potential antilipemic effect of this compound.

Illuminating the chemodiversity and biosynthesis of the active constituents ofScutellariawill provide a foundation for investigating the use ofHuang QinandBan Zhi Lianin TCM,and the production of these natural products via synthetic biology [13].InS.baicalensis,the biosynthetic genes of the root-specific compounds baicalein and norwogonin have been functionally identified,providing an important basis for studies of the biosynthesis and regulation of the natural products [14,15].Recently,thein vivoproduction of baicalein and scutellarein inEscherichia coliandSaccharomyces cerevisiaewas achieved based on the guidance of synthetic biology[16,17].However,the discovery and optimization of biological components are important limitations to the metabolic engineering of these compounds.Salvia miltiorrhiza(Lamiaceae family)genome has provided useful information on secondary metabolism for the rapid functional identification of biosynthetic and regulatory genes [18-23].In contrast,the genomes of theScutellariagenus remain unclear,and the reliance on transcriptome data from short-read sequencing has restricted gene discovery and analyses of genome evolution,including studies of gene family expansion and contraction,evolution of biosynthetic genes,and identification of regulatory elements[24].

Morphological differences are present at the macroscopic level betweenS.baicalensisandS.barbata.Differentiation of these species are characterized mainly by the fleshy rhizome and branched stem ofS.baicalensisand the fibrous root and erect stem ofS.barbata(Figure 1A).The active compounds baicalein,wogonin,and scutellarein are differentially distributed in the roots and aerial parts ofS.baicalensisandS.barbata.Here,we performedde novosequencing and assembly of theS.baicalensisandS.barbatagenomes using a longread strategy and high-through chromosome conformation capture (Hi-C) technology.The chromosome-level genomes ofS.baicalensisandS.barbatarevealed their divergence time,chromosomal rearrangement and expansion,whole-genome duplication (WGD),and evolutionary diversity of flavonoid biosynthesis.The data provided important insights for the molecular assisted breeding of important TCM resources,genome editing,and increased understanding of the molecular mechanisms of the chemodiversity of active compounds.

Results and discussion

High-quality assembly of two Scutellaria genomes

The size of theS.baicalensisgenome was predicted to be 440.2 ± 10 Mb and 441.9 Mb by using the flow cytometry and the 21k-mer distribution analysis (approximately 0.96%heterozygosity),respectively (Figure S1).The genome survey ofS.barbatarevealed a 404.6 Mb genome size and 0.28%heterozygosity via the 21k-mer distribution analysis(Figure S1).Third-generation sequencing platforms,including PacBio and Oxford Nanopore technologies,have been confirmed to have important advantages inde novoassembly and in processing data with complex structural variation due to high heterozygosity and repeat content [25-27].Thus,52.1 Gb Oxford Nanopore technology (ONT) reads (~120×)with an N50 of 16.3 kb fromS.baicalensisand 51.7 Gb single molecule,real-time sequencing(SMRT)reads from the PacBio platform (~130×)with an N50 of 9.8 kb fromS.barbatawere produced to assemble highly contiguous genomes (Table S1).The low-quality long reads were further corrected and trimmed to yield 20.2 Gb ONT reads with an N50 of 35.5 kb fromS.baicalensisand 18.0 Gb SMRT reads with an N50 of 15.3 kb fromS.barbatausing the Canu pipeline.

The contiguous assembly of theS.baicalensisandS.barbatagenomes was performed using the optimized SMARTdenovo and 3× Pilon polishing (50× Illumina reads)packages.ForS.baicalensis,the contig-level genome assembly,which was 377.0 Mb in length with an N50 of 2.1 Mb and a maximum contig length of 9.7 Mb,covered 85.3% of the estimated genome size(Table S2).TheS.baicalensisgenome identified 91.5% of the complete Benchmarking Universal Single-Copy Orthologs (BUSCO) gene models and had an 88.7%DNA mapping rate,suggesting a high-quality genome assembly.ForS.barbata,the contiguous contig assembly of 353.0 Mb with an N50 of 2.5 Mb and a maximum contig of 10.5 Mb covered 87.2% of the predicted genome size(Table S2).TheS.barbatagenome identified 93.0% of the complete BUSCO gene models and had a 95.0% DNA mapping rate.The high-quality genome assemblies ofS.baicalensisandS.barbatashowed the great advantage of single molecule sequencing,with assembly metrics that were far better than those of other reported genomes of Lamiaceae species,i.e.,S.miltiorrhiza[22]andMentha longifolia[28].

Given the assembly continuity,with a contig N50 of over 2 Mb for theS.baicalensisandS.barbatagenomes,Hi-C technology was applied to construct chromosome-level genomes[29].In total,99.8%and 98.8%of the assembledS.baicalensisandS.barbatacontigs were corrected and anchored to 9 and 13 pseudochromosomes (2n=18 forS.baicalensis,2n=26 forS.barbata) using a Hi-C interaction matrix with N50 values of 40.8 Mb and 23.7 Mb,respectively.The strong signal along the diagonal of interactions between proximal regions suggested the high-quality of the Hi-C assemblies for theS.baicalensisandS.barbatagenomes (Figure S2).

TheS.baicalensisgenome comprised 33,414 protein-coding genes and 2833 noncoding RNAs(ncRNA).For theS.barbatagenome,41,697 genes and 1768 ncRNAs were annotated(Table S3).Consistent with the genome assembly quality assessment,orthologs of 93.2% and 94.3% of the eukaryotic BUSCOs were identified in theS.baicalensisandS.barbatagene sets,respectively,suggesting the completeness of the genome annotation (Table S3).The gene-based synteny betweenS.baicalensisandS.barbatashowed chromosome number variation and structural rearrangement (Figure 1B,Figure S3,Table S4).In addition,the alignment at the DNA sequence level also showed the large-scale structural variations between theS.baicalensisandS.barbatagenomes (Figure S4).

Chromosome rearrangements and expansion after speciation

Transposable elements (TEs) accounted for approximately 55.2% (208,004,279) and 53.5% (188,790,851) of theS.baicalensisandS.barbatagenomes,respectively (Tables S5 and S6).Furthermore,57.6% and 59.9% of these TEs were long terminal repeat (LTR) elements,respectively.We identified 1225 and 1654 full-length LTR elements,includingGypsy(342 and 310) andCopia(354 and 618) elements,in theS.baicalensisandS.barbatagenomes,respectively(Table S7).However,there were differences in the insertion time of LTR elements,i.e.,the LTRs (1.41 million years ago;MYA) inS.baicalensisare more ancient than those inS.barbata(0.88 MYA),assuming a mutation rate of μ=1.3×10-8(per bp per year) (Figure S5,Table S7).The recent insertion and activation of LTRs might be key factors in the generation of chromosome rearrangements and expansion ofS.barbata[30,31].The ribosomal RNAs (rRNAs) and simple sequence repeats (SSRs) were further annotated (Tables S8 and S9).A total of 142,951 and 147,705 SSRs were annotated inS.baicalensisandS.barbata,respectively.They will provide useful molecular markers for breeding and genetic diversity studies.

Figure 1 Genome colinearity reveals the chromosome rearrangement in Scutellaria

A genome-wide high-resolution Hi-C interaction analysis ofS.baicalensisandS.barbatawas performed to characterize the architectural features of folded eukaryotic chromatin,including interchromosomal interactions,compendium of chromosomal territories,and A/B compartments [32-34].First,159×and 173×Hi-C sequencing reads were uniquely mapped(49.6% and 59.0%) to theS.baicalensisandS.barbatareference genomes,respectively.Then,84.8 and 113.1 million valid interaction pairs were obtained to construct the matrix of interactions among 100 kb binned genomic regions across all 9S.baicalensischromosomes and 13S.barbatachromosomes.The whole-chromosome interactions ofS.baicalensisindicated that chr5 and chr9 had a closer association than the other chromosome pairs.InS.baicalensis,the chromosome set including chr2,chr3,and chr8 showed enrichment and association with each other,and depletion with other interchromosomal sets,implying that these three chromosomes were mutually closer in space than the other chromosomes(Figure S6).InS.barbata,the chromosomal territories of chr4,chr5,and chr9 displayed mutual interactions and occupied an independent region in the nucleus (Figure S7).

As the secondary major structural unit of chromatin packing inS.baicalensisandS.barbata,the A/B compartments representing open and closed chromatin,respectively,were characterized according to an eigenvector analysis of the genome contact matrix.Similarly,more than half of the assembledS.baicalensisandS.barbatagenomes(53.2%and 52.0%)were identified as the A compartment in the leaf tissue.As expected,the TE density in the A compartment was dramatically lower than that in the B compartment (P<0.001),and the gene number per 100 kb was significantly higher in the A compartment (P<0.001) (Figures S5 and S6),indicating a positive correlation between the A compartment and transcriptional activity or other functional measures [32,34].

Shared WGD events in Lamiaceae

Conserved sequences,including orthologs and paralogs,can be used to deduce evolutionary history based on whole-genome comparisons.Here,orthologous groups of amino acid sequences from 11 angiosperms were identified,yielding a total of 19,479 orthologous groups that covered 291,192 genes.Among these,120,459 genes clustering in 6837 groups were conserved in all examined plants.Computational analysis of gene family evolution (CAFE) showed that 1180 and 1853 gene families were expanded in theS.baicalensisandS.barbatalineages,respectively,while 1599 and 1632 gene families were contracted,respectively (Figure S8,Table S10).Functional exploration ofScutellaria-specific genes indicated that domains related to secondary metabolite biosynthesis,such as transcription factors,cytochrome P450s,and Omethyltransferase were markedly enriched.

In addition,235 single-copy genes were identified in all tested plants.They were used to construct a phylogenetic tree,which indicated that these twoScutellariaspecies were most closely related toS.miltiorrhizawith an estimated divergence time of 41.01 MYA.S.baicalensisandS.barbatawere grouped into one branch,with an estimated divergence time of approximately 13.28 MYA(Figure 2A).The phylogenetic tree also supported the close relationship between Lamiaceae (S.baicalensis,S.barbata,andS.miltiorrhiza) and Pedaliaceae (Sesamum indicum) with a divergence time of approximately 49.90 MYA(Figure 2A)[35].Previous research reported that the divergence time ofS.baicalensisandS.barbatabased on thematK andCHS(chalcone synthase)genes was approximately 3.35 MYA[36].However,a genome-wide analysis identified 8 and 3CHSgenes inS.baicalensisandS.barbata,respectively.The expansion and evolution ofCHSnegatively impacted the estimation of diversification history between theseScutellariaspecies.

Based on sequence homology,17,265 orthologous gene pairs with synteny were identified between theS.baicalensisandS.barbatagenomes.The distribution of synonymous substitution rates(Ks)peaked at approximately 0.16,representing the speciation time ofS.baicalensisandS.barbata(Figure 2B,Table S11).The meanKsvalues of orthologous gene pairs with synteny and the divergence time amongS.baicalensis,S.barbata,S.miltiorrhiza,S.indicum,andVitis vinifera[37]revealed the estimated synonymous substitutions per site per year as 1.30 × 10-8for the test species (Table S11).In total,7812,7168,6984,and 7711 paralogous gene pairs were identified,and the distribution ofKsvalues peaked at approximately 0.87,0.86,1.02,and 0.67 inS.baicalensis,S.barbata,S.miltiorrhiza,andS.indicum,respectively (Figure 2B,Table S11).Based on the phylogenetic analysis,the WGD event occurred prior to the divergence ofS.baicalensis,S.barbata,S.miltiorrhiza,andS.indicum.The divergence time of the Lamiaceae and Pedaliaceae shared WGD event was determined to be approximately 46.24-60.71 MYA (Table S11).The distribution of theKsvalues of paralogous genes showed that no WGD events have occurred since the divergence ofS.miltiorrhiza,S.baicalensis,andS.barbata.Comparison of theS.baicalensisandS.barbatagenomes with an ancestral eudicot karyotype(AEK)genome[38]and with theV.viniferagenome,also supported the structural rearrangement between theS.baicalensisandS.barbatagenomes,and the shared WGD event after whole-genome triplication (WGT)-γ event ofV.vinifera(Figure 2C,Figure S9).The genome syntenic analysis indicated two copies of syntenic blocks from Lamiaceae and Pedaliaceae species per correspondingV.viniferablock,which confirmed the recent WGD event before the divergence ofS.baicalensis,S.barbata,andS.indicum(Figure S10).

Organ-specific localization of bioactive compounds

Baicalein,scutellarein,norwogonin,wogonin,and their glycosides (baicalin,scutellarin,norwogonoside,and wogonoside)are the main bioactive compounds inS.baicalensisandS.barbata.We collected samples from the root,stem,leaf,and flower tissues ofS.baicalensisandS.barbatato detect the accumulation of active compounds.Baicalein,norwogonin,wogonin,baicalin,norwogonoside,and wogonoside accumulated mainly in the roots ofS.baicalensisandS.barbata,while scutellarin was distributed in the aerial parts (stem,leaf,and flower) of these species (Figure 1C,Figure S11,Table S12),providing a potential basis for the co-expression analysis of biosynthetic genes [23].

Figure 2 Shared WGD events of Lamiaceae and Pedaliaceae

Transcriptome analysis of these four tissues fromS.baicalensisandS.barbataincluded calculation of the fragments per kilobase of exon model per million reads mapped(FPKM) values of 39,121 and 47,200 genes,respectively.Among them,31.5% (12,320) and 40.6% (19,153) of the transcripts were not expressed (FPKM <1) in any of the tested tissues.Based onk-means clustering,all the expressed genes fromS.baicalensisandS.barbatawere classified into 48 clusters(Figures S12 and S13).The expression levels of 3421 genes from clusters 8,20,32,33,34,39,and 47 inS.baicalensis,and 3675 genes from clusters 2,4,21,25,27,31,and 40 inS.barbatawere higher in the roots than in the other organs.The biosynthetic genes involved in the synthesis ofScutellariaspecific flavones and glycosides,containing genes encoding chalcone synthase,chalcone isomerase,CYP450s,O-methyltransferase,glycosyltransferase,and glycosyl hydrolases,were enriched,with high expression in the roots ofS.baicalensisandS.barbata(Tables S13 and S14).

Conserved evolution of the chrysin and apigenin biosynthesis in Scutellaria

The main active compounds in the medicinal plantsS.baicalensisandS.barbataare flavonoids.The chrysin biosynthetic genes inS.baicalensisencoding 4-CoA ligase (4CL),CHS,chalcone isomerase(CHI),and flavone synthase(FNSII)have been cloned and functionally identified[14].However,the gene locations,gene numbers,and evolution of this pathway in theS.baicalensisandS.barbatagenomes remain unclear.Here,we identified the same number of chrysin and apigenin biosynthetic genes in theS.baicalensisandS.barbatagenomes and determined the expression levels of these genes,including phenylalanine ammonia lyase (PAL,5 and 4),cinnamate 4-hydroxylase (C4H,3 and 4),4CL(9 and 14),CHS(8 and 3),CHI(1 and 1),andFNSII(3 and 3),in different tissues(Figure 3A and B,Tables S15 and S16).Eighteen orthologous gene pairs were found between theS.baicalensisandS.barbatagenomes,and theKa/Ksvalue(average 0.13)indicated purifying selection on flavone biosynthesis during evolution [39](Figure 3B,Table S17).ThePALandCHSgene numbers inS.baicalensiswere expanded compared to those inS.barbata.Conversely,a duplication event of4CLgenes inS.barbatawas found,suggesting that expansion via tandem duplication might have occurred after the separation of theseScutellariaspecies.TheKsvalues of 18 orthologous gene pairs ofS.baicalensisandS.barbatain the chrysin and apigenin biosynthetic pathways indicated that the specific expansion of theSbaiPAL(SbaiPAL1andSbaiPAL2),SbaiCHS(SbaiCHS2,SbaiCHS3,SbaiCHS4,andSbaiCHS5),andSbar4CL(Sbar4CL1-1,Sbar4CL1-2,Sbar4CL1-3,Sbar4CL1-4,Sbar4CLL9-2,andSbar4CLL9-3) genes had occurred via tandem duplication,after the speciation ofS.baicalensisandS.barbata(Figure 3C,Figure S14,Table S17).

Figure 3 Conserved flavonoid biosynthesis and species-specific gene expansion in Scutellaria

Sbai4CLL7andSbaiCHS1are reportedly related to the biosynthesis of specific 4′-deoxyflavones with cinnamic acid as a substrate inS.baicalensis[14].Compared toS.miltiorrhiza,the4CLL7genes from theScutellariagenus showed gene expansion,and the gene duplication ofSbai4CLL7-1andSbai4CLL7-2occurred before the speciation ofS.baicalensisandS.barbata(Figure S13).Sbai4CLL7-1andSbar4CLL7-1were not expressed in the tested transcriptomes,and the duplication of theScutellaria-specific4CLL7-2allowed the evolution of substrate preferences for the catalysis of cinnamic acid.The initial step and central hub for flavone biosynthesis is the catalysis of CHS.Hence,the expression ofCHSis required for the production of flavonoids,isoflavonoids,and other metabolites in plants [40].Here,we also detected the highest expression levels ofSbaiCHS1andSbarCHS1in all the tested samples.However,a recent expansion ofCHSgenes has occurred inS.baicalensis,and 4 additional paralogs ofSbaiCHS1(Sbai7C107T21) were observed in chr7.Duplications of theSbaiCHS2,SbaiCHS3,SbaiCHS4,andSbaiCHS5genes occurred after the speciation ofS.baicalensisandS.barbata(Figure 3C).The nucleotide and amino acid sequences ofSbaiCHS2andSbaiCHS3were identical,butSbaiCHS5contained a variant K316 deletion.The divergence ofSbaiCHS1andSbarCHS1occurred before the separation ofS.miltiorrhizaand theScutellariaspecies,suggesting a conserved function of CHS in flavone biosynthesis.In addition,the tandemly duplicatedSbaiCHS2-5genes were more highly expressed in the root ofS.baicalensisthan in other tissues(Figure 3B),suggesting that their species-specific evolution might be related to the biosynthesis of flavones and their glycosides,which are enriched in root.

C4H is responsible for the biosynthesis of coumaroyl-CoA,which might be the restrictive precursor of the 4′-hydroxyl group involved in scutellarein biosynthesis.SbaiC4H1andSbarC4H1were highly expressed in the stems,leaves,and flowers ofS.baicalensisandS.barbata(Figure 3B,Figure S14).The high expression levels of these genes were positively correlated with the distribution of scutellarin,which is biosynthesized in the aerial parts ofS.baicalensisandS.barbata(Figure 1C).

SbaiFNSII2 has been reported to catalyze the formation of chrysin inS.baicalensis[14],and the coding gene was highly expressed in the root and stem.Its ortholog,SbarFNSII2,was also highly expressed in the root ofS.barbata.A genome colinearity analysis identified 566 orthologous gene pairs covering a region approximately 6 Mb in length across chr3 ofS.baicalensisand chr13 ofS.barbata,including the tandem duplication ofSbaiFNSII1-SbaiFNSII2andSbarFNSII1-SbarFNSII2.This duplication occurred before the speciation ofS.baicalensisandS.barbata(Figure S14).The majority of theFNSIIregion (approximately 85%) inS.baicalensisandS.barbatawas assigned to the A compartment,indicating high transcriptional activity.The genome synteny of theFNSIIregion betweenS.baicalensisandS.barbatasuggested the conserved evolution of flavone synthase.

Functional divergence of flavone hydroxylase genes between S.baicalensis and S.barbata

CYP450 superfamily members,such as C4H(CYP73A family),FNSII (CYP93B family),flavone 6-hydroxylase (F6H,CYP82D family),and flavone 8-hydroxylase (F8H,CYP82D family),perform key modifications in flavone biosynthesis.SbaiCYP82D1 has been reported to have 6-hydroxylase activity on chrysin and apigenin to produce baicalein and scutellarein,respectively,and SbaiCYP82D2 can catalyze chrysin to norwogonin inS.baicalensis[15](Figure S15).Here,we identified 418 and 398CYP450gene members,and 17 and 24 physical clusters ofCYP450s(5 gene clusters per 500 kb) in theS.baicalensisandS.barbatagenomes,respectively(Figures S16 and S17),suggesting a high frequency ofCYPgene tandem duplication.Among them,18CYP82Dmembers containingSbaiCYP82D1-9andSbarCYP82D1-9were identified in theS.baicalensisandS.barbatagenomes.These genes might be responsible for the hydroxylation of chrysin and apigenin(Table S18).Consistent with a previous report,high expression ofSbaiCYP82D1andSbaiCYP82D2in the root ofS.baicalen-siswas detected,in accordance with the accumulation of baicalein,wogonin,and their glycosides (Figure 4A).However,SbarCYP82D1showed relatively high expression in stems and leaves,andSbarCYP82D2showed extremely low expression in all tissues ofS.barbata,in contrast to the distributions of active flavones,suggesting a potential functional divergence of hydroxylation betweenS.baicalensisandS.barbata.

Three-gene tandem duplications ofSbaiCYP82D1-SbaiCYP82D7-SbaiCYP82D8andSbarCYP82D1-SbarCYP82D6-SbarCYP82D8(physical distance <30 kb)on chr6 ofS.baicalensisandS.barbatawere identified(Figure 4B).According to the 150 kb colinearity analysis,11 orthologous gene pairs,includingCYP82D8fromS.baicalensisandS.barbata,presented conserved evolution.The phylogenetic analysis andKsvalues of orthologous gene pairs indicated that the duplication ofSbarCYP82D8andSbar-CYP82D6occurred after the speciation ofS.barbata(Table S19).However,duplication ofSbaiCYP82D8andSbai-CYP82D7occurred before the divergence ofS.baicalensisandS.barbata(Figure 4D,Figure S18).This contradiction and evolutionary divergence support the following proposed hypothesis,which features three duplications.The first duplication ofCYP82D8produced the newCYP82D1,and the duplication event occurred around WGD event.The second duplication ofCYP82D8generated the newCYP82D7,similar to the tandem duplication ofSbaiCYP82D8-SbaiCYP82D7-SbaiCYP82D1inS.baicalensis.After speciation,the third duplication event ofSbarCYP82D8uniquely occurred in theS.barbatagenome and producedSbarCYP82D6;a recent gene transfer ofSbarCYP82D7via transposon from chr6 to chr3 inS.barbatawas predicted.An adjacent intact LTR/GypsyinSbarCYP82D7was identified,and its insertion time was estimated to be approximately 3.5 MYA.Given the evolution and high expression ofSbarCYP82D6andSbarCYP82D8,we speculate that these two genes might be responsible for the F6H function in chrysin and apigenin synthesisin vivoinS.barbata.

The chromosome location of F8H functional members showed thatSbaiCYP82D2,SbaiCYP82D3,SbaiCYP82D4,SbaiCYP82D5,SbaiCYP82D6,andSbaiCYP82D9were distributed on chr1 ofS.baicalensis,withSbarCYP82D2,SbarCYP82D3,SbarCYP82D4,SbarCYP82D5,andSbarCYP82D9located on chr7 ofS.barbata.The structural rearrangement of large segments between chr1 ofS.baicalensisand chr7 ofS.barbatawas found (Figure 4C,Figure S4).In addition,tandem duplications containing threeCYPgenes(SbaiCYP82D2-SbaiCYP82D3-SbaiCYP82D5andSbar-CYP82D3-SbarCYP82D2-SbarCYP82D4) were identified(Figure 4C).The orthologous gene pairsSbaiCYP82D2-Sbar-CYP82D2andSbaiCYP82D3-SbarCYP82D3presented high identity values of 90.11%and 83.72%,respectively.The duplications ofSbarCYP82D3-SbarCYP82D4,SbaiCYP82D4-SbaiCYP82D5,andSbaiCYP82D6-SbaiCYP82D9occurred after the speciation ofS.baicalensisandS.barbata(Table S19).However,the expression levels ofSbarCYP82D2,SbarCYP82D3,andSbarCYP82D4were low inS.barbata,indicating functional divergence following species-specific duplication events.In contrast,SbarCYP82D5andSbarCYP82D9were highly expressed in the root ofS.barbata,suggesting a potential F8H function in the biosynthesis of norwogonin.

Figure 4 Tandem repeat of flavone hydroxylase genes revealed the divergent evolution

Conclusion

We reported two chromosome-level genomes of the medicinal plantsS.baicalensisandS.barbatathrough the combination of second-generation sequencing (Illumina platform),thirdgeneration sequencing (PacBio and Oxford Nanopore platforms),and Hi-C technologies.This study confirmed and traced the divergence time ofS.baicalensisandS.barbata,which occurred at 13.28 MYA,far earlier than previously reported [36].Comparative genomic analysis revealed similar TE proportions in theS.baicalensisandS.barbatagenomes,while the recent LTR insertion inS.barbatamight be an important factor resulting in chromosomal rearrangement and expansion.A WGD event (approximately 42.64-60.71 MYA)shared amongS.baicalensis,S.barbata,S.miltiorrhiza,andS.indicum.The tandem duplication of paralogs after the speciation ofS.baicalensisandS.barbatamight be the most important contributor to the divergent evolution of flavonoid biosynthetic gene families,such asPAL,4CL,CHS,F6H,andF8H.A determination of the distribution of flavone content and transcriptome analysis supported the functional divergence of flavonoid biosynthetic genes betweenS.baicalensisandS.barbata.The two high-quality genomes reported in this study will enrich genome research in the Lamiaceae and provide important insights for studies of breeding,evolution,chemodiversity,and genome editing.

Materials and methods

Plant materials

S.baicalensisandS.barbataplants were cultivated in the experimental field(40°N and 116°E)of the Institute of Medicinal Plant Development,Chinese Academy of Medical Sciences &Peking Union Medical College,Beijing,China.Four independent tissues (root,stem,leaf,and flower) fromS.baicalensisandS.barbatawere collected in three replicates.These tissues were used separately for the measurement of active compounds and RNA-seq.High-quality DNA extracted from young leaves was used to construct libraries for Illumina,ONT,and Sequel sequencing.

Long-read sequencing and assemblies

The high-molecular-weight (HMW) genomic DNA ofS.baicalensisandS.barbatawas extracted as described for megabase-sized DNA preparation [41].HMW genomic DNA fragments (>20 kb) were selected using BluePippin.Longread libraries were constructed following the protocols for the ONT(https://nanoporetech.com/)and PacBio Sequel platforms (https://www.pacb.com/).The ONT reads ofS.baicalensiswere generated using the ONT GridION X5 platform,and the library ofS.barbatawas sequenced using the Sequel platform.The raw ONT and SMRT reads were filtered via MinKNOW and SMRT Link,respectively.First,Canu(v1.7) was used to correct and trim the long reads from the ONT and Sequel platforms with the default parameters [42].The corrected and trimmed ONT and SMRT reads were assembled using SMARTdenovo (https://github.com/ruanjue/smartdenovo).Finally,Illumina short reads were used to polish the assembled contigs three times using Pilon(v1.22).The quality of the genome assemblies was estimated by a search using BUSCO (v2.0) [43]and by mapping Illumina reads from the DNA and RNA libraries to the assembled genomes.

Chromosome construction using Hi-C

Young leaves fromS.baicalensisandS.barbatawere fixed and crosslinked,and Hi-C libraries were constructed and sequenced using Illumina as previously described [32,33].The short reads were mapped to the assembled genome using BWA [44],and the valid interaction pairs were selected using HiC-Pro[45].The draft assemblies ofS.baicalensisandS.barbatawere anchored to chromosomes (2n=18 and 2n=26,respectively)using LACHESIS with the following parameters:cluster min re sites=62,cluster max link density=2,cluster noninformative ratio=2,order min n res in turn=53,order min n res in shreds=52 [29].

Genome annotation

The RepeatModeler (v1.0.9) package,including RECON and RepeatScout,was used to identify and classify the repeat elements of theS.baicalensisandS.barbatagenomes.The repeat elements were then masked by RepeatMasker (v4.0.6).The long terminal repeat retrotransposons (LTR-RTs) inS.baicalensisandS.barbatawere identified using LTR_Finder(v1.0.6) and LTR_retriever.Twenty-four samples from a total of eight differentS.baicalensisandS.barbatatissues (root,stem,leave,and flower) were subjected to RNA-seq using the Illumina NovaSeq platform.The clean reads fromS.baicalensisandS.barbatawerede novoassembled using Trinity (v2.2.0),and the coding regions in the assembled transcripts were predicted using TransDecoder (v2.1.0).The gene annotation of the maskedS.baicalensisandS.barbatagenomes wasab initiopredicted using the MAKER (v2.31.9)pipeline,integrating the assembled transcripts and protein sequences fromS.baicalensis,S.barbata,andArabidopsis thaliana[46].Noncoding RNAs and miRNAs were annotated by alignment to the Rfam and miRNA databases using INFERNAL (v1.1.2) and BLASTN,respectively.RNA-seq reads from differentS.baicalensisandS.barbatatissues were mapped to the masked genome using HISAT2(v2.0.5),and the different expression levels of the annotated genes were calculated using Cuff links (v2.2.1) [47].

Genome evolution analysis

The full amino acid sequences ofS.baicalensis,S.barbata,and nine other angiosperms were aligned to orthologous groups using OrthoFinder [48].The basal angiospermAmborella trichopodawas chosen as the outgroup.Single-copy genes were used to construct a phylogenetic tree using the RAxML package with PROTGAMMAJTT model and 1000 replicates(v8.1.13).The divergence time among 11 plants was predicted using r8s program based on the estimated divergence timeA.trichopoda-V.vinifera(173-199 MYA) andPopulus trichocarpa-A.thaliana(98-117 MYA).According to the phylogenetic analysis and divergence time,expansion and contraction of the gene families were identified using CAFE(v3.1) [49].The paralogous and orthologous gene pairs fromS.baicalensis,S.barbata,andS.miltiorrhizawere identified,and theKa,Ks,andKa/Ksvalues ofS.baicalensis-S.baicalensis,S.barbata-S.barbata,S.miltiorrhiza-S.miltiorrhiza,S.baicalensis-S.miltiorrhiza,S.baicalensis-S.barbata,andS.barbata-S.miltiorrhizawere calculated using the SynMap2 and DAGchainer method of the CoGE Comparative Genomics Platform (https://genomevolution.org/coge/).The detection of synteny and colinearity amongS.baicalensis,S.barbata,andS.miltiorrhizawas performed using MCscan X(v1.1) [50].The WGT-γ event in core eudicots,WGT event inS.tuberosum,and WGD events inS.indicum,A.thaliana,P.trichocarpa,Oryze sativa,andBrachypodium distachyon[51]were marked in our phylogenetic tree.

Identification of gene families related to flavone biosynthesis

The protein sequences of thePAL,4CL,C4H,CHS,CHI,andFNSIIgene family members inA.thalianawere downloaded from the TAIR database,and those forF6HandF8HinS.baicalensiswere obtained from a previous study [15].These sequences were searched against theS.baicalensisandS.barbataprotein sequences using BLASTP with anEvalue cutoff of 1E-10.The conserved domains of the protein sequences of candidate genes were further searched in the Pfam database using hidden Markov models [52].Full-length protein sequences were used to construct phylogenetic trees using the maximum likelihood method with the Jones-Taylor-Thornton model and 1000 bootstrap replicates[53].A detailed description of some materials and methods used is provided in the supplementary methods and results.

Data availability

The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive[54]at the National Genomics Data Center,Beijing Institute of Genomics,Chinese Academy of Sciences/China National Center for Bioinformation (GSA:CRA001730) that are publicly accessible at http://bigd.big.ac.cn/gsa.The assembled genomes and gene structures were submitted to the Comparative Genomics Platform(CoGe:54175 forS.baicalensisand 54176 forS.barbata)that are publicly accessible at https://genomevolution.org/coge/.The assembled genomes and gene structures have also been deposited in the Genome Warehouse (GWH:GWHAOTO00000000 forS.baicalensisand GWHAOTP00000000 forS.barbata),which are publicly accessible at https://bigd.big.ac.cn/gwh.

CRediT author statement

Zhichao Xu:Formal analysis,Writing-original draft,Writing-review &editing,Visualization,Project administration,Funding acquisition.Ranran Gao:Formal analysis,Writing -original draft,Writing -review &editing,Visualization.Xiangdong Pu:Formal analysis.Rong Xu:Resources.Jiyong Wang:Resources.Sihao Zheng:Resources.Yan Zeng:Resources.Jun Chen:Resources.Chunnian He:Validation.Jingyuan Song:Supervision,Project administration,Funding acquisition.All authors read and approved the final manuscript.

Competing interests

The authors have declared no competing interests.

Acknowledgments

This work was supported by the National Key R&D Program of China(Grant No.2019YFC1711100),the National Natural Science Foundation of China (Grant No.31700264),and the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (CIFMS) (Grant No.2016-I2M-3-016).

Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.gpb.2020.06.002.

ORCID

0000-0003-1753-5602 (Zhichao Xu)

0000-0002-7164-0289 (Ranran Gao)

0000-0002-0892-035X (Xiangdong Pu)

0000-0002-0891-3686 (Rong Xu)

0000-0002-0673-1803 (Jiyong Wang)

0000-0002-0523-7176 (Sihao Zheng)

0000-0003-4128-5236 (Yan Zeng)

0000-0001-7270-7361 (Jun Chen)

0000-0003-3659-7833 (Chunnian He)

0000-0003-2733-0416 (Jingyuan Song)