Quantitative genetic studies with applications in plant breeding in the omics era

The Crop Journal, 2020, Issue 5

Jiankang Wang, José Crossa, Junyi Gai

a Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China

b International Maize and Wheat Improvement Center (CIMMYT), Carretera Mexico-Veracruz Km. 45, El Batan, Texcoco, Mexico

c Soybean Research Institute, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China

Quantitative genetics is concerned with the inheritance of biological traits showing continuous (or quantitative) phenotypic variation. Quantitative traits are common and have been extensively studied in evolutionary and genetic research and in plant and animal breeding. They are normally controlled by multiple genes with various kinds of genetic effects, and their phenotypes are readily modified by environmental variation. Based on the multifactorial hypothesis of quantitative traits, the classical theories of quantitative genetics (referred to as statistical genetics in its early years) were well established by the 1940s, owing mainly to the contributions of R. A. Fisher, J. B. S. Haldane, and S. Wright. Following the publication of monographs by Mather [1], Kempthorne [2], and Falconer [3], quantitative genetics spread widely in Western countries after 1950 and contributed immensely to the improvement of plants and animals in the 20th century. In China, quantitative genetics began to be taught as a postgraduate course at agricultural colleges and universities from the late 1970s to the early 1980s, thanks in part to the publication of textbooks by Wu [4], Ma [5], and Liu et al. [6].
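As background only, the multifactorial view is usually summarized by the standard textbook decomposition below. It assumes no covariance and no interaction between genotype and environment, and it is not a result reported in this issue.

```latex
\begin{aligned}
P &= G + E, \\
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I, \\
H^2 &= V_G / V_P, \qquad h^2 = V_A / V_P,
\end{aligned}
```

Here P, G, and E are the phenotypic, genotypic, and environmental values. V_A, V_D, and V_I are the additive, dominance, and epistatic (interaction) variances, and H^2 and h^2 are broad-sense and narrow-sense heritability.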

Rapid progress in molecular biology and genomics has had fundamental impacts on theoretical and applied studies of quantitative genetics since the landmark paper of Lander and Botstein [7]. In the omics era, the availability of fine-scale genetic linkage maps and easily accessible genotyping technologies has advanced quantitative genetics theory, together with the intensive use of quantitative trait locus (QTL) mapping and map-based cloning in the genetic study of quantitative traits in plants, animals, and humans. As a consequence, novel molecular breeding methods have been developed, including marker-assisted selection, designed breeding, and genomic selection. Two recent Chinese textbooks by Kong [8] and Wang [9] cover classical and modern theories and breeding applications from both population and quantitative genetics. Two professional books, edited by Gai et al. [10] and Wang et al. [11], describe some of the major achievements in theoretical and applied quantitative genetics research in China during the past 30 years.

From 8 to 10 May 2000, Prof. Huidong Mo, a renowned biometrician and quantitative geneticist in China, organized a meeting on quantitative genetics at Yangzhou University. Two distinguished scientists, Prof. Changxin Wu (China Agricultural University; Academician of the Chinese Academy of Sciences) and Prof. Junyi Gai (Nanjing Agricultural University; Academician of the Chinese Academy of Engineering), attended the meeting. During the meeting, the two professors and other participants proposed that the meeting be held regularly every two years, and named the Yangzhou meeting the first National Symposium on Quantitative Genetics in Plants and Animals. Two years later, as scheduled, Prof. Wu organized the second symposium at the Xiangshan Hotel, Beijing, in conjunction with the Xiangshan Scientific Series Conferences led by the Chinese Academy of Sciences. Thereafter, the symposium series became an important event in the Chinese scientific community. The eighth symposium (http://qgc2019.isbreeding.net/) was held from 26 to 28 August 2019 at the Friendship Hotel in Beijing. Eighteen invited presentations were given on six themes: (1) new and future areas of quantitative genetics; (2) omics-driven quantitative genetics studies; (3) genetic analysis and gene mapping of quantitative traits; (4) whole-genome dissection and prediction of quantitative traits; (5) quantitative genetic theories for molecular breeding; and (6) applications of quantitative genetics. About 120 researchers and graduate students attended the symposium. Distinguished guests Prof. Huqu Zhai (former president of the Chinese Academy of Agricultural Sciences, CAAS), Prof. Laifu Liu (Beijing Normal University), and Prof. Chunming Liu (director general of the Institute of Crop Sciences, CAAS) attended the symposium and gave remarks during the opening ceremony. In support of the symposium, The Crop Journal arranged a special issue titled “Quantitative genetics in the omics era”.
After peer review, 17 articles were finally selected, including one review article on genomic selection [12], five articles on analytical methods and tools for quantitative traits [13–17], six articles on genetic studies of quantitative traits [18–23], and five articles on applications in breeding for quantitative traits [24–28].

Looking ahead, we anticipate that the populations used to dissect the genetic architecture of quantitative traits will become more diverse; genetic data and gene information will become more abundant and varied and will come from more sources and levels; the functions and genetic networks of more quantitative trait genes will be investigated; and demand for genetic studies and breeding applications will become more specialized and ever stronger. Bearing these trends in mind, and based on the articles collected in this issue, we outline below a few research areas that may prove promising in theoretical and applied quantitative genetics in the near future.

1. From bi-parental to multi-parental populations

Genetic study is impossible without the use of one or several populations [11]. Bi-parental segregating populations, such as doubled haploid lines, recombinant inbred lines, backcross populations, and F2 and F3 populations, are derived from two homozygous parents and have been widely used in genetic studies and QTL mapping. The integrated software package QTL IciMapping [29] provides many analysis methods for phenotypic and genotypic data from such populations. In bi-parental populations, loci at which the two parents carry identical genotypes cannot be detected, and the number of recombination events is relatively limited, resulting in low mapping precision. In addition, it is not clear whether an identified QTL has multiple alleles unless it is studied in other, independent populations. To save time in population development and to identify more alleles at a locus, association mapping (also called genome-wide association study, GWAS) has been employed in natural populations or germplasm panels. GWAS relies on population-wide marker-phenotype associations and historical recombination events, and may suffer from unknown population structure and low linkage disequilibrium. As a result, association mapping in plants has so far failed to identify a single major QTL allele that has been of value in public breeding programs [30].
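The marker-phenotype association behind both simple QTL scans and GWAS can be sketched as a single-marker regression. This is a minimal illustration, not the method of IciMapping or any GWAS package. It assumes doubled haploid or recombinant inbred lines coded 0/1 for the two parental genotypes and uses simulated data. The function name single_marker_scan is hypothetical, and real GWAS software also corrects for population structure and kinship. The sketch also shows why loci at which both parents share a genotype cannot be detected: such markers are monomorphic in the progeny and give no test statistic.

```python
import numpy as np


def single_marker_scan(genotypes: np.ndarray, phenotype: np.ndarray) -> np.ndarray:
    """t-statistics of the slope in y = b0 + b1 * x + e, fitted marker by marker.

    genotypes: (n_lines, n_markers) array coded 0/1 for the two parental
    homozygotes. Monomorphic markers get NaN because x has no variance.
    """
    n, m = genotypes.shape
    y = phenotype - phenotype.mean()
    t_stats = np.full(m, np.nan)
    for j in range(m):
        x = genotypes[:, j] - genotypes[:, j].mean()
        sxx = x @ x
        if sxx == 0:  # both parents carried the same allele: nothing to test
            continue
        b1 = (x @ y) / sxx                    # least-squares slope
        resid = y - b1 * x                    # intercept is 0 after centering
        sigma2 = (resid @ resid) / (n - 2)    # residual variance on n - 2 df
        t_stats[j] = b1 / np.sqrt(sigma2 / sxx)
    return t_stats


# Simulated example: 200 lines, 5 markers, a QTL at marker 2, marker 4 monomorphic.
rng = np.random.default_rng(1)
geno = rng.integers(0, 2, size=(200, 5)).astype(float)
geno[:, 4] = 1.0
pheno = 1.0 * geno[:, 2] + rng.normal(0.0, 1.0, 200)
t = single_marker_scan(geno, pheno)
print(np.round(t, 2))
```

Marker 2 should stand out with a large t-statistic, and marker 4 returns NaN because it carries no segregating variation.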

Multi-parental populations have been developed over the past ten years and are being used in genetic studies in several species. In these populations, each locus can harbor multiple alleles, kinship or genetic relationships among the progeny are well defined, and population structure is accordingly known precisely. Greater opportunity for recombination during population development increases mapping accuracy, and abundant genetic variation allows the detection of more genes and alleles. Linkage analysis methods, QTL mapping methods, and associated software packages have been reported for F1 populations derived from two heterozygous parents and from double (or four-way) crosses among four inbred parental lines [31–33], for populations of pure lines derived from a double cross [34,35], and for populations of pure lines derived from an eight-way cross among eight inbred parental lines [34,36]. In this special issue, one paper describes the use of a four-way cross pure-line population to map QTL for oil content in soybean [22]. We anticipate that multi-parental populations will be increasingly developed and used in genetic studies of quantitative traits in the near future, as analysis methods and tools mature. When more parents are involved in population development, the choice of crossing or mating design becomes an issue worthy of investigation [11,32,35,36]. Meanwhile, methods for QTL-by-environment interaction analysis in multi-parental populations are needed. To our knowledge, epistatic mapping methods for multi-parental populations have also not yet been studied.
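A small simulation shows why a multi-parental population carries more alleles per locus than a bi-parental one. The sketch assumes a single locus, Mendelian segregation, no selection, and selfing to complete homozygosity (for example, by single-seed descent) after the double cross (A × B) × (C × D). The founder labels and the function name are hypothetical. Each pure line fixes one of four founder alleles with probability 1/4, whereas a bi-parental recombinant inbred line population segregates for only two.

```python
from collections import Counter

import numpy as np


def four_way_pure_lines(n_lines: int, rng: np.random.Generator) -> list[str]:
    """Founder allele fixed at one locus in pure lines from (A x B) x (C x D)."""
    lines = []
    for _ in range(n_lines):
        from_ab = ("A", "B")[rng.integers(2)]  # gamete of the A x B single-cross F1
        from_cd = ("C", "D")[rng.integers(2)]  # gamete of the C x D single-cross F1
        # The four-way F1 plant carries one allele from each side; selfing to
        # homozygosity fixes either of its two alleles with probability 1/2.
        lines.append(from_ab if rng.random() < 0.5 else from_cd)
    return lines


rng = np.random.default_rng(2020)
counts = Counter(four_way_pure_lines(4000, rng))
print(sorted(counts.items()))  # four founder alleles, each close to 1000 lines
```

Tracking many linked loci would additionally require simulating recombination along chromosomes, which this single-locus sketch leaves out.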

2. From QTL to gene function, gene regulation, and gene network

As mentioned earlier, molecular biology and genomics, together with high-resolution QTL mapping methods, have had fundamental impacts on theoretical and applied quantitative genetics. Genetic architecture refers to the number and genomic locations of the genes that affect a trait, the magnitude of their effects, and the relative contributions of additive, dominance, and epistatic gene effects [37]. Dissecting the genetic architecture of quantitative traits is a long-term, major task in genetics and quantitative genetics. Three articles in this special issue address theoretical aspects of QTL mapping: ordering of high-density molecular markers in linkage map construction [13], improving the time efficiency of GWAS with high-density SNP markers [15], and statistical methods for multi-trait GWAS [16]. Several articles apply QTL mapping to kernel shape and color in durum wheat [18], panicle traits in rice [19], seed flooding tolerance in soybean [20], branch number in soybean [21], seed oil content in soybean [22], yield and plant height in alfalfa [23], and seed glucosinolate content in Brassica napus [27]. In the study of Sobhi et al. [21], one major-effect QTL was further localized to a 116-kb chromosomal region, and a candidate gene in this region was confirmed to control branch number in soybean. Wang et al. [27] identified a sulfotransferase gene with minor but stable effects on plant height traits in Brassica napus.
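To illustrate the components that make up genetic architecture, a two-locus genetic model can be written as follows. This is one common parameterization, and other codings, such as the F2 metric, are also in use.

```latex
G = \mu + a_1 x_1 + d_1 z_1 + a_2 x_2 + d_2 z_2
    + aa\, x_1 x_2 + ad\, x_1 z_2 + da\, z_1 x_2 + dd\, z_1 z_2
```

For locus k, x_k = 1, 0, -1 and z_k = 0, 1, 0 for genotypes QQ, Qq, and qq. The terms a_k and d_k are additive and dominance effects, and aa, ad, da, and dd are the four digenic epistatic effects. In pure-line populations such as recombinant inbred lines or doubled haploids, z_k = 0 for every line, so only mu, a_1, a_2, and aa can be estimated.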

We anticipate that QTL mapping will continue to be a major approach in the genetic analysis of biological and economic traits. We also expect that the causal genes and sequence changes underlying detected QTL will be identified, together with their functions and the biochemical pathways from gene to phenotype. Although this remains a major challenge that will require sustained effort, it is not impossible. The elucidation of the signaling pathway through which GW5 regulates grain width and grain weight in rice illustrates the painstaking effort and long research path from a detected QTL to its function and pathway [38–41]. In 2005, a QTL for grain width was reported [38] to show stable genetic effects across a wide range of environments in two mapping populations, one of recombinant inbred lines and the other of chromosome segment substitution lines. In 2008, the QTL was fine-mapped to a recombination hotspot on rice chromosome 5 [39] and was further isolated and characterized as a major QTL for grain width and grain weight [40]. Weng et al. [40] reported that a 1212-bp deletion was associated with increased grain width in the japonica parent Asominori, compared with the slender grain of the indica parent IR 24. In 2017, the GW5 allele at the locus was reported [41] to act in the brassinosteroid signaling pathway and thereby regulate grain width and grain weight in rice.

In the above example, it took 12 years to go from preliminary QTL mapping to a full understanding of how the QTL, or gene, contributes to the final grain width and grain weight phenotype in rice. The time would be much longer if population development for the initial QTL mapping were counted. Nonetheless, we anticipate that, in the near future, more and more quantitative trait genes will be fine-mapped, isolated, cloned, and functionally analyzed. This information not only strengthens our understanding of the genetics of quantitative traits but also supports the application of new biotechnological approaches, such as genome editing and molecular design, to the targeted improvement of quantitative traits. In addition, minor-effect QTL or genes have occasionally been reported [27,42]. They can be repeatedly detected across environments and populations when a lower inclusion threshold is used, and their estimated genetic effects on phenotypic traits point in the same direction [42]. These minor but stable QTL or genes also deserve further investigation.

3. Increasing prediction accuracy in genomic selection

The concept of genomic selection (GS) was proposed by Meuwissen et al. [43] in 2001, with the aim of using genome-wide, densely distributed DNA markers to increase the efficiency of improving quantitative traits. Lower breeding cost per cycle and higher time efficiency are two major advantages of GS over phenotype-based selection. GS has been applied in two different settings. In the first, we are interested in predicting additive (breeding) values rather than total genetic values, and additive linear models that sum marker effects are sufficient. In the second, we are interested in predicting the complete genetic values of individuals by considering both additive and non-additive (dominance and epistasis) effects, thereby estimating the performance (commercial value) of cultivars. This special issue contains one review article and one methodology article on GS [12,14]. Considering the complex and varied genetic architectures of different quantitative traits, we conclude that new approaches and algorithms are still needed to further increase GS prediction accuracy by accounting for more genetic and environmental factors.
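The first setting, predicting additive breeding values with a linear model that sums marker effects, can be sketched with ridge regression, which is closely related to RR-BLUP. This is a minimal illustration on simulated inbred lines, not the method of any article in this issue. The function name is hypothetical. The shrinkage parameter lam (the ratio of residual to marker-effect variance) is set from the simulated variance components rather than estimated by REML or cross-validation. Accuracy is measured as the correlation between predicted and true genetic values, which is only available in simulation.

```python
import numpy as np


def ridge_gebv(X_train: np.ndarray, y_train: np.ndarray,
               X_new: np.ndarray, lam: float) -> np.ndarray:
    """Genomic estimated breeding values from ridge regression on centered markers."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Z = X_train - x_mean
    # Solve (Z'Z + lam * I) beta = Z'(y - mean) for shrunken additive marker effects.
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_train - y_mean))
    return y_mean + (X_new - x_mean) @ beta


rng = np.random.default_rng(7)
n_train, n_new, p = 500, 200, 400
X = rng.integers(0, 2, size=(n_train + n_new, p)).astype(float)  # inbred lines, 0/1 codes
effects = rng.normal(0.0, 0.1, p)        # many small additive marker effects
g = X @ effects                          # true genetic values
sigma_e2 = g.var()                       # residual variance chosen for heritability ~0.5
y = g + rng.normal(0.0, np.sqrt(sigma_e2), n_train + n_new)

# Train on phenotyped lines; predict lines that are treated as genotyped only.
gebv = ridge_gebv(X[:n_train], y[:n_train], X[n_train:], lam=sigma_e2 / 0.1**2)
accuracy = np.corrcoef(gebv, g[n_train:])[0, 1]
print(round(float(accuracy), 2))
```

The second setting would add dominance or epistatic terms, for example extra marker-product columns or additional kernels, which this additive sketch deliberately omits.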

GS uses genome-wide markers and phenotypic information from one or several phenotyped and genotyped populations to establish an association between genotype and phenotype. It then predicts the phenotypic values of test and/or breeding populations that have only been genotyped with genome-wide markers. Over the last decade, many prediction models have been proposed, differing from one another in their assumptions for estimating breeding values and in computational complexity. Genetic values of breeding lines can be predicted for some environments using an incomplete multi-environment testing scheme. Complexity arises in predicting the values of unobserved lines in specific environments using estimates of genotype-by-environment (GE) interaction. Also important is the high genomic complexity of GE interactions for multiple traits, which requires statistical genetic models that simultaneously exploit genetic correlations among environments, among traits, and between traits and environments [44,45]. In addition, the volume and complexity of GS data demand more interdisciplinary research spanning computer science, machine learning, mathematics, physics, statistics, genetics and quantitative genetics, and bioinformatics. Deep learning algorithms are powerful for modeling nonlinear patterns and could be incorporated into GS to integrate data from different sources and increase prediction accuracy.
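One common way for a model to borrow information across environments is a multi-environment genomic best linear unbiased prediction (GBLUP) model for n lines in J environments. It is given here as a generic sketch, not as the specific models of [44,45].

```latex
\begin{aligned}
y_{ij} &= \mu_j + u_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2_{e_j}), \\
\mathbf{u} &= (\mathbf{u}_1^\top, \ldots, \mathbf{u}_J^\top)^\top
  \sim N\!\left(\mathbf{0},\ \boldsymbol{\Sigma}_E \otimes \mathbf{G}\right)
\end{aligned}
```

Here u_j = (u_1j, ..., u_nj)' holds the genetic values of all lines in environment j. G is the n × n genomic relationship matrix, and Sigma_E is the J × J genetic covariance matrix among environments. The genetic correlation sigma_jj' / sqrt(sigma_jj sigma_j'j') lets phenotypes observed in one environment inform predictions for lines not yet tested in another. A multi-trait version replaces Sigma_E with a covariance over trait-environment combinations, for example Sigma_T ⊗ Sigma_E under an assumption of separability.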

4. Applications of genomic selection in conventional breeding programs

GS was proposed nearly 20 years ago, and since then prediction models have been developed and implemented in various programming languages and platforms. Meanwhile, some private companies have adopted GS in their breeding programs, especially for economically important large animals. However, the application of GS in plant breeding lags behind. Contributing factors may include differences between animals and plants in generation length, population size, selection intensity, and breeding objectives. One major reason, however, may be genotyping cost. Every individual in a newly developed breeding population must be genotyped with high-density markers to predict its breeding value. Populations in animal breeding normally consist of hundreds or thousands of progeny, far fewer than in most plant breeding populations. Taking wheat as an example, every season a breeder makes tens or hundreds of single crosses, grows out the F1 hybrids made in the previous season, grows out the F2 populations (each of 1000–3000 plants) derived from the F1 bulks of the previous season, and so on. Thus, one wheat breeder grows millions of segregating and heterozygous individuals and thousands of advanced lines in a single season. Some breeding programs have two or more seasons in one calendar year. Even though genotyping costs per sample keep falling, the total cost of genotyping all individual plants, and the cost relative to the value of each individual (think of a dairy cow compared with a wheat plant), are still too high for most plant breeding programs.
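Simple arithmetic makes the scale of the problem concrete. The numbers below are illustrative choices within the ranges given in the text (tens or hundreds of crosses, 1000–3000 plants per F2 population, and tens or hundreds of parents), and c stands for an unspecified per-sample genotyping cost.

```latex
\begin{aligned}
N_{F_2} &= 200 \times 2000 = 4 \times 10^{5} \ \text{plants per season at the } F_2 \text{ stage}, \\
C_{F_2} &= N_{F_2}\, c = 4 \times 10^{5}\, c, \qquad C_{\text{parents}} = 100\, c, \\
C_{F_2} / C_{\text{parents}} &= 4 \times 10^{3}.
\end{aligned}
```

Whatever the value of c, genotyping every F2 plant in this scenario costs about four thousand times as much as genotyping 100 parents. That is before the F1, later segregating generations, and advanced lines are counted.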

Fortunately, the set of parents used in plant breeding is rather limited, normally numbering in the tens or hundreds every season. It is therefore acceptable for plant breeders to genotype all parents to be used in making crosses for a new breeding cycle, and phenotypic data for these parental lines are already available from previous breeding cycles. Guo et al. [46] investigated the accuracy of several models in predicting the performance of F1 hybrids between recombinant inbred lines derived from a cross of two elite maize inbred lines. Using GS prediction, the authors identified untested F1 hybrids predicted to have higher grain yield than the original commercial F1 hybrid. Yao et al. [47] evaluated the predictive ability of several GS models in wheat breeding using a set of parents as the training population. By predicting the performance of all possible crosses, they identified and recommended to breeders the optimum crosses for simultaneous improvement of grain quality and yield. Three articles in this special issue describe GS applications in plant breeding. The first focuses on predicting the general combining ability of maize inbred lines using a sparse diallel cross design [24], the second on predicting untested F1 maize hybrids from a limited number of tested hybrids between two heterotic groups [25], and the third on using family information from highly structured populations without pedigree data [26]. Ali et al. [48] used a wheat population to investigate the prediction accuracies of various GS models for yield and yield-related traits under various quality-control scenarios, with missing-genotype imputation, and with GWAS-derived markers. These studies suggest future directions for applying GS in plant breeding.
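The idea of predicting all possible crosses among genotyped parents can be sketched in its simplest additive form. This is not presented as the model used in [47]. Under a purely additive model, the expected mean of inbred progeny from a cross P1 × P2 equals the mid-parent value of the parents' genomic estimated breeding values (GEBVs). The sketch ranks all pairwise crosses on that basis. The parent names and GEBVs are hypothetical, and the sketch ignores the genetic variance within each cross, which methods targeting the best progeny rather than the average also take into account.

```python
from itertools import combinations


def rank_crosses(gebv: dict[str, float], top: int = 3) -> list[tuple[str, str, float]]:
    """Rank all single crosses among parents by predicted progeny mean.

    Under a purely additive model, the expected mean of inbred progeny from
    P1 x P2 is the mid-parent value (GEBV_1 + GEBV_2) / 2.
    """
    crosses = [(p1, p2, (gebv[p1] + gebv[p2]) / 2)
               for p1, p2 in combinations(sorted(gebv), 2)]
    crosses.sort(key=lambda c: c[2], reverse=True)  # best predicted mean first
    return crosses[:top]


parents = {"P1": 1.2, "P2": -0.3, "P3": 0.8, "P4": 0.1, "P5": 1.5}  # hypothetical GEBVs
for p1, p2, mean in rank_crosses(parents):
    print(f"{p1} x {p2}: {mean:.2f}")
```

With n parents there are n(n - 1)/2 single crosses, so even a few hundred genotyped parents yield tens of thousands of candidate crosses that can be screened in silico.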

5. Simulation, prediction, and decision-support tools in genetics and breeding

Many QTL and genes have been reported for various quantitative traits in plants and animals. The challenge for breeders remains how best to use this abundance of information [6]. Simulation approaches can incorporate more realistic genetic models learned from genetic studies, including multiple alleles, pleiotropy, epistasis, and gene-by-environment interaction. They can therefore be used to compare and optimize selection methods under more realistic scenarios [11,28,49]. Several articles in this special issue describe tools for linkage analysis and genetic map construction [13], for genome-wide association studies [15,16], and for phenotypic data analysis [17].

In the omics era, with the advent of big data, we anticipate that simulation, prediction, and decision-support tools will be in high demand in genetics and breeding. In one paper in this issue reporting a breeding simulation tool, the authors use it to compare GS with conventional selection in the presence of epistasis [28]. Such tools will be very helpful for breeders wishing to compare breeding efficiency among selection strategies, to predict cross performance using known gene information, and to investigate the efficient use of identified QTL and GS in conventional breeding [49]. They let breeders explore many what-if crossing and selection scenarios, testing and comparing them rapidly in silico before conducting resource-intensive field experiments.
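A what-if comparison of this kind can be sketched in a few lines. The example below is a hypothetical toy simulation and is unrelated to the tool in [28]. It compares truncation selection of the top 10% of inbred lines on single-plot phenotypes with selection on the mean of four replicated plots. The assumed genetic architecture is additive effects plus additive-by-additive epistasis between random locus pairs. All parameter values and the function name are illustrative assumptions.

```python
import numpy as np


def realized_gain(n_lines, n_loci, n_pairs, n_plots, h2_plot, frac, rng):
    """Mean genetic value of selected lines minus the population mean."""
    X = rng.integers(0, 2, size=(n_lines, n_loci)).astype(float)  # inbred lines, 0/1
    g = X @ rng.normal(0.0, 1.0, n_loci)                 # additive effects
    for _ in range(n_pairs):                             # additive-by-additive epistasis
        i, j = rng.choice(n_loci, size=2, replace=False)
        g += rng.normal(0.0, 1.0) * X[:, i] * X[:, j]
    ve = g.var() * (1.0 - h2_plot) / h2_plot             # plot-level residual variance
    pheno = g + rng.normal(0.0, np.sqrt(ve / n_plots), n_lines)  # mean of n_plots plots
    k = int(frac * n_lines)
    selected = np.argsort(pheno)[-k:]                    # truncation selection of top k
    return g[selected].mean() - g.mean()


rng = np.random.default_rng(42)
mean_gain = {}
for n_plots in (1, 4):
    gains = [realized_gain(500, 100, 20, n_plots, 0.3, 0.1, rng) for _ in range(50)]
    mean_gain[n_plots] = float(np.mean(gains))
    print(f"{n_plots} plot(s): mean realized gain {mean_gain[n_plots]:.2f}")
```

Replacing the plot-mean criterion with marker-based predictions would turn the same skeleton into a comparison between GS and phenotypic selection.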

Acknowledgments

The authors appreciate the financial support from the National Natural Science Foundation of China (31861143003), the Agricultural Science and Technology Innovation Program of CAAS, and the CAAS Talent Program. The authors also wish to extend their sincere thanks to all contributors to the eighth symposium and the special issue.