Wang Zhi-hao, Jin Hui-hui, Chen Qing-shan, and Zhu Rong-sheng
1 College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
2 College of Agriculture, Northeast Agricultural University, Harbin 150030, China
3 College of Science, Northeast Agricultural University, Harbin 150030, China
MicroRNAs (miRNAs) are a class of small non-coding RNAs that act as important post-transcriptional regulators. They are about 19-22 nt long and function by directing the degradation or translational repression of their target genes (Wightman et al., 1993; Reinhart et al., 2000; Bartel, 2009; Voinnet, 2009). The first animal miRNA was discovered in Caenorhabditis elegans in 1993 (Lee, 1993), but it was not until 2002 that the first plant miRNAs were identified, by several American research groups. In 2005, Dezulian et al. (2006) identified 37 miRNAs in soybean. With the development of next-generation sequencing, miRBase version 21 now lists 28 645 miRNAs, 573 of which are from soybean (Griffiths-Jones, 2004).
Many studies have shown that plant miRNAs regulate meristem function, cell division, organ polarity, root growth, and nodulation. Zhu et al. (2011) found that MIR166 is involved in shoot apical meristem development, and Wong et al. (2014) found that MIR166 is associated with resistance to Phytophthora root rot in soybean. In addition, MIR166 is a highly conserved gene family across plant families and is present in 46 species (Griffiths-Jones, 2004), indicating an ancient evolutionary origin. Soybean has experienced three whole-genome multiplication events during its evolution (Severin et al., 2011). The first was a hexaploidy event, or whole-genome triplication (WGT), between 130 and 240 million years ago (Mya) (Jaillon, 2007); the second was the legume whole-genome duplication (WGD, ~58 Mya); and the third was the Glycine WGD (~13 Mya) (Schlueter et al., 2007; Schmutz et al., 2010). In this paper, we studied the origin of the soybean MIR166 gene family in relation to these three genome multiplication events to deepen our understanding of the function and evolution of plant miRNA genes.
Soybean (Glycine max) genome data were downloaded from the Phytozome database (http://www.phytozome.net/) (Goodstein et al., 2012), and basic information on the MIR166 gene family was queried and downloaded from miRBase (Griffiths-Jones, 2004).
Based on the Glycine max v1.0 data, all collinear blocks in the genome were identified with MCScanX (Wang et al., 2012). The gap penalty was set to 1; a complete collinear block had to contain at least five matched gene pairs; the E-value was left at its default of 1e-05; and the maximum number of gaps was set to 25.
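As a hedged illustration of how the settings above could be passed to MCScanX, the sketch below assembles a command line. It assumes the flag names given in the MCScanX manual (-g gap penalty, -s minimum gene pairs per block, -e E-value, -m maximum gaps) and the usual input layout of <prefix>.blast plus <prefix>.gff; the prefix "data/gmax" is hypothetical. MCScanX expresses the gap penalty as a negative score (default -1), so we assume the reported penalty of 1 corresponds to -g -1.

```python
import shlex
from typing import Dict, List, Optional

# Settings reported in the text. MCScanX scores gaps with a negative penalty
# (default -1); we ASSUME the reported "gap penalty of 1" means -g -1.
REPORTED_PARAMS: Dict[str, str] = {
    "-g": "-1",     # gap penalty
    "-s": "5",      # minimum number of gene pairs needed to call a collinear block
    "-e": "1e-05",  # BLAST E-value cutoff (the MCScanX default)
    "-m": "25",     # maximum number of gaps allowed within a block
}


def build_mcscanx_command(prefix: str, params: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the argument list for running MCScanX on <prefix>.blast and <prefix>.gff."""
    params = REPORTED_PARAMS if params is None else params
    cmd = ["MCScanX"]
    for flag, value in params.items():
        cmd.extend([flag, value])
    cmd.append(prefix)
    return cmd


if __name__ == "__main__":
    # "data/gmax" is a hypothetical input prefix; pass the list to subprocess.run to execute it.
    print(shlex.join(build_mcscanx_command("data/gmax")))
```

Returning a list rather than a single string lets `subprocess.run` execute the command without shell-quoting problems.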
A phylogenetic tree of the MIR166 hairpin sequences was constructed with MEGA 6.0 using the neighbor-joining (NJ) method with 1 000 bootstrap replicates.
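MEGA 6.0 performs the NJ reconstruction internally. To show what the method computes, the following is a minimal sketch of Saitou and Nei's neighbor-joining algorithm. It assumes uncorrected p-distances between aligned hairpin sequences; the text does not state which substitution model was selected in MEGA, so this distance choice is our assumption, and bootstrap resampling is not included.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of aligned sites that differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def neighbor_joining(names: List[str], dist: Dict[Pair, float]) -> str:
    """Saitou-Nei neighbor joining on a distance matrix; returns an unrooted Newick tree."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    d: Dict[Pair, float] = {}
    for i, j in combinations(names, 2):
        value = dist[(i, j)] if (i, j) in dist else dist[(j, i)]
        d[(i, j)] = d[(j, i)] = value
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[i] = total distance from node i to every other current node
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # join the pair minimising Q(i, j) = (n - 2) d(i, j) - r[i] - r[j]
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes
    # the last three nodes meet at a single central node
    la = 0.5 * (d[(a, b)] + d[(a, c)] - d[(b, c)])
    lb = 0.5 * (d[(a, b)] + d[(b, c)] - d[(a, c)])
    lc = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


def nj_from_alignment(alignment: Dict[str, str]) -> str:
    """Build p-distances from an aligned {name: sequence} dict and run NJ."""
    names = list(alignment)
    dist = {(i, j): p_distance(alignment[i], alignment[j]) for i, j in combinations(names, 2)}
    return neighbor_joining(names, dist)
```

On an additive distance matrix NJ recovers the true tree and branch lengths, which makes small hand-built matrices a convenient check of the implementation.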
The legume family (Fabaceae) is one of the largest and most diverse plant families and is considered to have originated in the tropics about 65-70 Mya (Lewis, 2005). It is generally divided into three subfamilies, Mimosoideae, Caesalpinioideae, and Papilionoideae (Bertioli et al., 2009), and most agriculturally important species fall within two papilionoid clades that diverged about 50 Mya (Bertioli et al., 2009). Genome duplication has been widespread throughout plant evolutionary history. According to the average Ks values from homologous segment analysis, soybean has undergone three rounds of genome multiplication: the WGT (about 130 Mya), the legume WGD (about 59 Mya), and the Glycine WGD (about 13 Mya). The WGT event, a triplication, occurred before the emergence of the rosids (Nozawa et al., 2010); the legume WGD, a duplication, occurred near the origin of the legumes; and the most recent event, also a duplication, occurred around 13 Mya during the formation of the soybean lineage. As a result of these multiplication events, 75% of soybean genes are present in multiple copies (Voinnet, 2009).
Because miRNAs are non-coding small RNAs, they have no Ks values of their own. We therefore used the mean Ks of the conserved protein-coding genes flanking each miRNA locus as a proxy for the Ks of the miRNA gene, so that the duplication time of a miRNA within a collinear block (Tang et al., 2008) was taken as that indicated by the block's average Ks. Amino acid sequences of the protein-coding genes were aligned with ClustalW (Thompson et al., 2002), collinear blocks were identified with MCScanX (Wang et al., 2012), and Ka and Ks values were then calculated for the collinear gene pairs. Ks values across the soybean genome ranged from 0 to 2.0; blocks with Ks greater than 2.0 were regarded as saturated for synonymous substitutions. The average Ks of each duplicated block was used to estimate its duplication time with the formula T = Ks/(2E) (Bengtsson and Uyenoyama, 1990), where T is the duplication time and E is the rate of synonymous substitution. Fossil evidence and molecular evolution analyses have shown that the substitution rate of soybean is 1/5 that of Arabidopsis thaliana, whose E value is 1.5 × 10⁻⁸ substitutions per synonymous site per year; the clock-like synonymous substitution rate (E) used for soybean was 6.1 × 10⁻⁹ (Yin et al., 2013).
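The dating procedure above can be sketched as two small functions: average the Ks of the gene pairs in a block, then convert with T = Ks/(2E) using the soybean rate quoted in the text. This is a minimal sketch; the text does not say whether saturation was judged on the block average or on individual gene pairs, so applying the 2.0 threshold to the block average is our assumption.

```python
from statistics import mean
from typing import Iterable, Optional

SOYBEAN_E = 6.1e-9   # synonymous substitutions per site per year (value quoted from Yin et al., 2013)
KS_SATURATION = 2.0  # above this, synonymous sites are treated as saturated


def block_ks(pair_ks: Iterable[float]) -> Optional[float]:
    """Average Ks of the protein-coding gene pairs in one collinear block.

    The average is used as the Ks of any miRNA lying in the block. Returns None
    when the block average exceeds the saturation threshold (ASSUMPTION: the
    text does not say whether saturation was judged per block or per gene pair).
    """
    values = list(pair_ks)
    if not values:
        raise ValueError("a collinear block must contain at least one gene pair")
    avg = mean(values)
    return None if avg > KS_SATURATION else avg


def duplication_time_mya(ks: float, rate: float = SOYBEAN_E) -> float:
    """T = Ks / (2E), reported in million years."""
    return ks / (2 * rate) / 1e6


if __name__ == "__main__":
    print(round(duplication_time_mya(1.6198), 1))  # 132.8
```

With the Ks of about 1.62 reported for gma-MIR166e and gma-MIR166q, the formula gives roughly 133 Mya, in line with the ~130 Mya divergence time cited in the Discussion.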
Despite their unusually small size, miRNA gene families appear to have an evolutionary history similar to that of their protein-coding counterparts. In plants, the high similarity among members of a miRNA family suggests that recent expansions arose through tandem and segmental duplication events (Li and Mao, 2007). Gene families can also be classified by Ks value: soybean genes fall into three intervals, corresponding to the recent genome duplication (Ks between 0 and 0.3), the second WGD (Ks between 0.3 and 1.5), and the ancient genome triplication (Ks greater than 1.5) (Nozawa et al., 2012). The triplication event has been related to the divergence of monocot and dicot plants (Lewis, 2005). Plant non-coding genes appear to expand following a model similar to that of coding genes. miRNA genes may originate from inverted duplication of their target genes or from the steady accumulation of mutations in random sequences that eventually form functional non-coding genes (Allen et al., 2004). miRNA gene families expand mainly through whole-genome duplication, large segmental duplication, retrotransposition, and tandem duplication (Zhou et al., 2013), and large segmental duplications can occur in either the forward or the reverse orientation. To determine the type of duplication, we used the following definitions. If different members of the same miRNA family were located in the same or adjacent intergenic regions (Li and Mao, 2007), they were considered products of tandem duplication. If miRNAs were located in the same duplicated block, in which more than 50% of the genes were arranged in a conserved order with a consistent correspondence (Tang et al., 2008), they were considered products of chromosomal segmental duplication; such blocks could lie in different regions or on the same chromosome.
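The Ks intervals above amount to a simple lookup. A minimal sketch follows; the interval limits come from the text (Nozawa et al., 2012), but how values lying exactly on 0.3 or 1.5 were assigned is not stated, so the inclusive upper bounds here are an assumption. The example Ks values are those the paper reports for gma-MIR166r and gma-MIR166e.

```python
def ks_interval(ks: float) -> str:
    """Map a Ks value to the soybean duplication event it most likely reflects.

    Interval limits follow the text (Nozawa et al., 2012); boundary handling
    at exactly 0.3 and 1.5 is an assumption.
    """
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if ks <= 0.3:
        return "recent Glycine WGD"
    if ks <= 1.5:
        return "legume WGD"
    return "ancient triplication (WGT)"


if __name__ == "__main__":
    for name, ks in [("gma-MIR166r", 1.1625), ("gma-MIR166e", 1.6198)]:
        print(name, ks_interval(ks))
```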
The mature gma-MIR166 sequence was located on the 3' arm of the hairpin, and its nucleotide composition was highly conserved: the first two bases at the left (5') end were U and C, the last four bases at the right (3') end were all C, and a highly conserved A occurred two bases from the right terminus. The miRNA* (star) sequence was located on the 5' arm of the hairpin and was less conserved than the mature sequence; in particular, six consecutive nucleotides, GUUGAG, lay beside the 5' terminus of the miRNA* (Fig. 1). These sequence features may reflect evolutionary selection and functional constraints.
Fig. 1 Sequence logo of gma-MIR166 family hairpin sequences
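Letter heights in a sequence logo such as Fig. 1 are commonly computed from per-position information content (the method of Schneider and Stephens, used by tools such as WebLogo). The text does not say which tool drew Fig. 1, so the sketch below is only an illustration: it takes equal-length aligned RNA sequences, ignores gaps, and applies no small-sample correction.

```python
import math
from collections import Counter
from typing import Dict, List

RNA_BASES = "ACGU"


def logo_heights(seqs: List[str]) -> List[Dict[str, float]]:
    """Letter heights (in bits) for each column of an aligned RNA set.

    R_i = log2(4) - H_i, with H_i = -sum_b f(b,i) * log2 f(b,i); the height
    of base b at column i is f(b,i) * R_i. Gaps and non-ACGU symbols are
    ignored, and no small-sample correction is applied.
    """
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need a non-empty set of equal-length aligned sequences")
    heights: List[Dict[str, float]] = []
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in RNA_BASES)
        total = sum(counts.values())
        if total == 0:  # column made only of gaps
            heights.append({})
            continue
        freqs = {base: n / total for base, n in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = math.log2(len(RNA_BASES)) - entropy
        heights.append({base: f * info for base, f in freqs.items()})
    return heights
```

A fully conserved column scores the maximum of 2 bits, which is why the invariant U, C and terminal C positions of the mature sequence stand out as tall single letters.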
We performed a genome-wide collinear block analysis of soybean with MCScanX (Wang et al., 2012) using the Glycine max v1.0 data.
In total, we obtained 1 146 collinear blocks (Tang et al., 2008) and calculated the average Ks value of each block (Nozawa et al., 2012).
Location information for the gma-MIR166 family was obtained from the miRBase database, and the related genome annotation was downloaded from the Ensembl Plants database. Using this information, each miRNA member was mapped to its position on the soybean chromosomes. As shown in Fig. 2, only gma-MIR166e and gma-MIR166q formed a cluster; the other 19 members were scattered across 13 chromosomes. gma-MIR166e and gma-MIR166q were only 148 bp apart and were both located in the same gene, Glyma04g38431, constituting a tandem repeat. Table 1 shows that gma-MIR166e spanned the 5' UTR and CDS regions of this gene, whereas gma-MIR166q was located in its 3' UTR. The basic analysis results for the soybean MIR166 family are shown in Fig. 2 and Table 1. Eleven members of the soybean MIR166 gene family were located within genes: gma-MIR166n and gma-MIR166o were located in introns, gma-MIR166c in a CDS, gma-MIR166b spanned a 5' UTR and CDS, gma-MIR166j spanned a 3' UTR and CDS, and gma-MIR166i spanned an intergenic region and part of the 5' UTR of Glyma02g15860. The other 10 members were located in intergenic regions.
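Placing each miRNA relative to gene features (5' UTR, CDS, intron, 3' UTR, intergenic) reduces to an interval-overlap test. The sketch below assumes the features of a single host gene given as 1-based inclusive coordinates on one chromosome; the feature layout and coordinates are hypothetical, not those of Glyma04g38431. It also includes a helper for the spacing between two loci, the kind of calculation behind the 148 bp distance between gma-MIR166e and gma-MIR166q.

```python
from typing import List, Tuple

# (feature type, start, end), 1-based inclusive coordinates on one chromosome
Feature = Tuple[str, int, int]


def locate_mirna(mir_start: int, mir_end: int, features: List[Feature]) -> List[str]:
    """Feature types of ONE host gene that a miRNA precursor overlaps.

    Adds "intergenic" when the precursor extends past the gene's outer bounds,
    and returns ["intergenic"] when it overlaps no feature at all.
    """
    hits: List[str] = []
    for ftype, start, end in features:
        if mir_start <= end and start <= mir_end and ftype not in hits:  # closed-interval overlap
            hits.append(ftype)
    if not hits:
        return ["intergenic"]
    gene_start = min(s for _, s, _ in features)
    gene_end = max(e for _, _, e in features)
    if mir_start < gene_start or mir_end > gene_end:
        hits.append("intergenic")  # precursor lies partly outside the host gene
    return hits


def gap_between(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Bases separating two loci (0 if they overlap or are adjacent)."""
    (s1, e1), (s2, e2) = sorted([a, b])
    return max(0, s2 - e1 - 1)


if __name__ == "__main__":
    # hypothetical gene model
    gene = [("5'UTR", 100, 199), ("CDS", 200, 399), ("intron", 400, 599),
            ("CDS", 600, 799), ("3'UTR", 800, 999)]
    print(locate_mirna(180, 260, gene))
```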
The Glycine max genome was divided into 1 146 collinear blocks, and gma-MIR166 members were located in 14 of them. As shown in Fig. 2, 13 pairs of gma-MIR166 family members arose from large chromosomal segmental duplications. Four of these pairs were gma-MIR166p and gma-MIR166g, gma-MIR166a and gma-MIR166c, gma-MIR166n and gma-MIR166o, and gma-MIR166o and gma-MIR166b.
Table 1 Distribution and divergence time of the MIR166 gene family
Fig. 2 Distribution of MIR166 genes and their associations based on collinear blocks
These four pairs were forward-direction segmental duplications (Clarke and Lumsden, 1993), whereas the other nine large segmental duplication pairs were in the reverse orientation. The gma-MIR166 family therefore expanded mainly through chromosomal segmental duplication (Eichler et al., 2001), and only gma-MIR166q and gma-MIR166e were inferred to be tandem repeats (Cannon et al., 2004). Their Ks value, approximately 1.62, exceeds the 1.5 threshold, which is generally taken to indicate a relatively ancient origin (Nozawa et al., 2012). Although features of their mature and precursor sequences are generally considered insufficient to distinguish their origin, their tandem relationship was clearly established. Given that miRNA genes originate from inverted duplication of target genes and from the accumulation of mutations in random sequences, we speculated that gma-MIR166n, gma-MIR166r, gma-MIR166o, and gma-MIR166b were derived from the expansion of these two old members. The Ks value of gma-MIR166r was about 1.16, indicating a duplication earlier than those of the other derived members. Because this value falls within the 0.3-1.5 interval, we speculated that this duplication occurred during the second genome duplication, namely the legume WGD (59 Mya) (Axtell and Bowman, 2008).
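The duplication-type rules defined earlier (tandem when members lie in the same or adjacent regions; segmental, forward or reverse, when they share a collinear block) can be written as a small classifier. This is a sketch under stated assumptions: `region_index` is a hypothetical field giving the rank of the host gene or intergenic region along the chromosome, so "adjacent" means indices differing by at most one; the orientation labels "plus" and "minus" are assumed to match those MCScanX writes in its .collinearity block headers; and the requirement that more than 50% of genes keep their order is assumed to be enforced already by MCScanX block detection, so it is not rechecked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MirLocus:
    name: str
    chrom: str
    region_index: int  # HYPOTHETICAL: rank of the host gene / intergenic region along the chromosome


def duplication_type(a: MirLocus, b: MirLocus, block_orientation: Optional[str]) -> str:
    """Classify a paralogous miRNA pair using the rules described in the text.

    block_orientation is the orientation of the collinear block shared by the
    pair ("plus" or "minus", assumed MCScanX labels), or None if none is shared.
    """
    # Tandem: same chromosome, same or adjacent region
    if a.chrom == b.chrom and abs(a.region_index - b.region_index) <= 1:
        return "tandem"
    if block_orientation == "plus":
        return "segmental (forward)"
    if block_orientation == "minus":
        return "segmental (reverse)"
    return "unassigned"
```

Checking the tandem rule first mirrors the text, where gma-MIR166e and gma-MIR166q are called tandem repeats even though they also sit in a shared block.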
The phylogenetic tree of the MIR166 hairpin sequences, reconstructed in MEGA 6.0 by the NJ method with 1 000 bootstrap replicates (Tamura et al., 2013), is shown in Fig. 3.
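The 1 000 bootstrap replicates are generated in MEGA by resampling alignment columns with replacement. A minimal sketch of that resampling step, assuming an equal-length hairpin alignment stored as a dict of name to sequence; building a tree for each replicate and counting how often each clade recurs (the bootstrap support) are not shown.

```python
import random
from typing import Dict, List


def bootstrap_replicates(alignment: Dict[str, str], n_reps: int = 1000,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Resample alignment columns with replacement to build pseudo-alignments.

    Every replicate keeps the original number of columns, and the same
    sampled columns are used for every sequence so that sites stay aligned.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    rng = random.Random(seed)  # fixed seed for reproducibility
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append({name: "".join(seq[c] for c in cols)
                           for name, seq in alignment.items()})
    return replicates
```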
We classified the gma-MIR166 family members into three groups, A, B, and C. The mean Ks was 0.64 for group A, 0.9398 for group B, and 0.55 for group C, and the groups differed significantly. Notably, gma-MIR166e (group A) and gma-MIR166q (group B) both had a Ks value of 1.6198, and gma-MIR166r (group C) had a Ks value of 1.1625. In the tree built from the miRNA hairpin sequences, these three members were the oldest in their respective groups, so we inferred that they were very likely the ancestors of those groups.
Fig. 3 Phylogenetic tree of the soybean gma-MIR166 family constructed with the NJ algorithm
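The group-level reasoning above (mean Ks per group, with the highest-Ks member taken as the oldest and therefore the candidate ancestor) can be summarized in a few lines. A minimal sketch; the input is a hypothetical mapping from member name to (group, Ks), since the full per-member values in Table 1 are not reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def summarize_groups(members: Dict[str, Tuple[str, float]]) -> Dict[str, Tuple[float, str]]:
    """Return {group: (mean Ks, member with the highest Ks)}.

    Following the reasoning in the text, the member with the highest Ks in a
    group is the oldest and hence the candidate ancestor of that group.
    """
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for name, (group, ks) in members.items():
        groups[group].append((ks, name))
    return {
        group: (mean(ks for ks, _ in items), max(items)[1])
        for group, items in groups.items()
    }
```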
We found that gma-MIR166e and gma-MIR166q were the oldest members of this family; they were located in the same block, and their divergence time was around 130 Mya (Nozawa et al., 2012). Collinear block analysis showed that gma-MIR166r, gma-MIR166o, gma-MIR166n, gma-MIR166b, gma-MIR166h, and gma-MIR166j were related directly or indirectly to gma-MIR166e and gma-MIR166q, and we therefore suspected that gma-MIR166e and gma-MIR166q were likely the ancestors of the gma-MIR166 gene family (Li and Mao, 2007). However, some gma-MIR166 family members did not show a strong correspondence with gma-MIR166q and gma-MIR166e, and other explanations are possible: (1) some miRNAs have not yet been discovered; (2) some genes may have been lost during evolution; (3) there may be ancestral genes other than gma-MIR166e and gma-MIR166q. These questions could be clarified with more comprehensive soybean miRNA and genome data.
miRNAs can originate in two ways: inverted duplication of target genes and the accumulation of mutations in random sequences. Plant genome multiplication generally includes large chromosomal segmental duplication and tandem duplication, both of which contribute to miRNA family expansion. By analyzing the expansion pattern of the soybean MIR166 gene family, we found that the family originated before the divergence of monocots and dicots (Humphry et al., 2010). gma-MIR166e and gma-MIR166q were tandem repeats, and the other members arose from their subsequent duplications. The Ks value of gma-MIR166r was approximately 1.16, indicating that its duplication predated those of the other derived members. Family expansion analysis showed that, besides the one tandem pair, there were eight reverse duplications, four pairs of direct (forward) duplications, and four single members without a duplication block, and that the largest duplicated segment was a reverse duplication. Because this family plays an important role in the soybean regulatory network, this study will help uncover the functions of gma-MIR166 and provides a theoretical basis for evolutionary studies of other important plant miRNA gene families.
Wang Zhi-hao and Jin Hui-hui contributed equally to this paper.
Allen E, Xie Z, Gustafson A M, Sung G H, et al. 2004. Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat Genet, 36: 1282-1290.
Axtell M J, Bowman J L. 2008. Evolution of plant microRNAs and their targets. Trends Plant Sci, 13: 343-349.
Bartel D P. 2009. MicroRNAs: target recognition and regulatory functions. Cell, 136: 215-233.
Bengtsson B O, Uyenoyama M K. 1990. Evolution of the segregation ratio: modification of gene conversion and meiotic drive. Theor Popul Biol, 38: 192-218.
Bertioli D J, Moretzsohn M C, Madsen L H, et al. 2009. An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics, 10: 45.
Cannon S B, Mitra A, Baumgarten A, et al. 2004. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol, 4: 10.
Clarke J D, Lumsden A. 1993. Segmental repetition of neuronal phenotype sets in the chick embryo hindbrain. Development, 118: 151-162.
Dezulian T, Remmert M, Palatnik J F, et al. 2006. Identification of plant microRNA homologs. Bioinformatics, 22: 359-360.
Eichler E E, Johnson M E, Alkan C, et al. 2001. Divergent origins and concerted expansion of two segmental duplications on chromosome 16. J Hered, 92: 462-468.
Goodstein D M, Shu S, Howson R, et al. 2012. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res, 40: D1178-1186.
Griffiths-Jones S. 2004. The microRNA Registry. Nucleic Acids Res, 32: D109-111.
Humphry M, Bednarek P, Kemmerling B, et al. 2010. A regulon conserved in monocot and dicot plants defines a functional module in antifungal plant immunity. Proc Natl Acad Sci USA, 107: 21896-21901.
Jaillon O, Aury J M, Noel B, et al. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449: 463-467.
Larkin M A, Blackshields G, Brown N P, et al. 2007. Clustal W and Clustal X version 2.0. Bioinformatics, 23: 2947-2948.
Lee R C, Feinbaum R L, Ambros V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75: 843-854.
Lewis G P. 2005. Legumes of the world. Royal Botanic Gardens, Kew, Richmond, UK.
Li A, Mao L. 2007. Evolution of plant microRNA gene families. Cell Res, 17: 212-218.
Nozawa M, Miura S, Nei M. 2010. Origins and evolution of microRNA genes in Drosophila species. Genome Biol Evol, 2: 180-189.
Nozawa M, Miura S, Nei M. 2012. Origins and evolution of microRNA genes in plant species. Genome Biol Evol, 4: 230-239.
Reinhart B J, Slack F J, Basson M, et al. 2000. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature, 403: 901-906.
Schlueter J A, Lin J Y, Schlueter S D, et al. 2007. Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. BMC Genomics, 8: 330.
Schmutz J, Cannon S B, Schlueter J, et al. 2010. Genome sequence of the palaeopolyploid soybean (vol 463, pg 178, 2010). Nature, 465: 120-120.
Severin A J, Cannon S B, Graham M M, et al. 2011. Changes in twelve homoeologous genomic regions in soybean following three rounds of polyploidy. Plant Cell, 23: 3129-3136.
Tamura K, Stecher G, Peterson D, et al. 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol, 30: 2725-2729.
Tang H, Bowers J E, Wang X, et al. 2008. Synteny and collinearity in plant genomes. Science, 320: 486-488.
Voinnet O. 2009. Origin, biogenesis, and activity of plant microRNAs. Cell, 136: 669-687.
Wang Y, Tang H, Debarry J D, et al. 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res, 40: e49.
Wightman B, Ha I, Ruvkun G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell, 75: 855-862.
Wong J, Gao L, Yang Y, et al. 2014. Roles of small RNAs in soybean defense against Phytophthora sojae infection. Plant J, 79: 928-940.
Yin G, Xu H, Xiao S, et al. 2013. The large soybean (Glycine max) WRKY TF family expanded by segmental duplication events and subsequent divergent selection among subgroups. BMC Plant Biol, 13: 148.
Zhou Z, Wang Z, Li W, et al. 2013. Comprehensive analyses of microRNA gene evolution in paleopolyploid soybean genome. Plant J, 76: 332-344.
Zhu H, Hu F, Wang R, et al. 2011. Arabidopsis argonaute10 specifically sequesters miR166/165 to regulate shoot apical meristem development. Cell, 145: 242-256.
Journal of Northeast Agricultural University (English Edition), 2015, No. 1