Data Analytics for Accurate Genomic Predictions Can Accelerate Plant Breeding by Turbocharging Gene Banks

A new study led by an Iowa State University agronomist may help scientists sift through the vast numbers of plant seeds stored in gene banks across the globe to identify those useful to plant breeders trying to develop better varieties.

The effort represents a proof-of-concept experiment that may help plant scientists separate the wheat from the chaff when it comes to selecting the best accessions to breed cultivars with better yield or stress resistance, said Jianming Yu, an associate professor of agronomy and the Pioneer Distinguished Chair in Maize Breeding.

Nature Plants – Genomic prediction contributing to a promising global strategy to turbocharge gene banks

“We think it’s possible to use these predictions to guide our breeding and selection decisions,” said Xiaoqing Yu, a postdoctoral agronomy research associate and the first author of the paper. “We hope it will facilitate better and more precise breeding with the diverse genetic materials.”

The researchers tested a complex set of genetic tools to predict the traits that hundreds of sorghum seed samples would display if cultivated. The team then grew plants from some of those sorghum accessions, the term for plant material collected from various sites, to gauge the accuracy of their genome-based predictions. The accuracy of the team's yield predictions exceeded 70 percent.

In theory, plant breeders can access a virtual ocean of data on germplasm, or the genetic material of plants, from all over the world. The world's 1,750 gene banks hold 7.4 million plant accessions, but only a small percentage of those possess the specific qualities that plant breeders prize when developing new cultivars to meet production needs.

But finding the best accessions among the millions available poses a logistical nightmare for plant scientists, Jianming Yu said.

The publication shows that it is possible, to an extent, to predict the traits those accessions possess from their genetic profiles. Yu said the paper takes a step toward “super charging the engine” of a valuable resource, allowing sorghum breeders to zero in on promising accessions faster and more easily than is currently possible.

“We all agree on the urgency and challenges to effectively mine the natural heritage stored in gene banks,” he said. “But we need to test different strategies and we need to figure out the way.”

The researchers selected a set of 962 sorghum accessions from a U.S. Department of Agriculture database and sequenced them to obtain genome-wide fingerprinting data. They field-tested a training sample of 299 of these accessions and applied an assortment of prediction tools to forecast biomass yield and other traits. The researchers then cultivated 200 of the accessions to check how well their predictions matched reality.
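The paper describes the training sample only as chosen to represent the overall diversity of the reference set. As a hedged illustration, the Python sketch below uses greedy farthest-point (maximin) selection, a common generic way to pick a diverse subset; it is not necessarily the procedure the researchers used. It assumes each accession's SNP genotypes are coded as 0/1/2 allele counts and measures dissimilarity with a simple Manhattan distance. The function names and the toy data are hypothetical.

```python
import random


def genotype_distance(a, b):
    """Manhattan distance between two accessions whose SNPs are coded 0/1/2."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_diverse_training_set(genotypes, k, start=0):
    """Greedy farthest-point (maximin) selection of k accession indices.

    Starts from accession `start`, then repeatedly adds the accession whose
    distance to its nearest already-selected accession is largest.
    """
    n = len(genotypes)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of accessions")
    selected = [start]
    chosen = {start}
    # nearest[i] = distance from accession i to its closest selected accession
    nearest = [genotype_distance(g, genotypes[start]) for g in genotypes]
    while len(selected) < k:
        best = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(best)
        chosen.add(best)
        for i in range(n):
            nearest[i] = min(nearest[i], genotype_distance(genotypes[i], genotypes[best]))
    return selected


# Hypothetical toy reference set: 20 accessions x 50 SNPs.
rng = random.Random(1)
reference = [[rng.choice((0, 1, 2)) for _ in range(50)] for _ in range(20)]
print(select_diverse_training_set(reference, k=6))
```

Each added accession is the one least similar to everything already chosen, so even a small training set tends to span the spread of genotypes rather than cluster around common types.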

Yield predictions had an accuracy of 76 percent, and predictions for other traits, such as plant height, ranged from 67 to 83 percent.
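Genomic prediction studies commonly report "accuracy" as the Pearson correlation between predicted and observed trait values across a validation set, not as the share of predictions that were exactly right. Assuming that convention applies here, the 76 percent figure would correspond to a correlation of about 0.76. A minimal Python sketch of the calculation, using made-up numbers rather than data from the study:

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed trait values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    if sp == 0 or so == 0:
        raise ValueError("correlation is undefined when either series is constant")
    return cov / (sp * so)


# Made-up yields for five validation accessions (not data from the study).
predicted_yield = [10.2, 12.5, 9.8, 14.1, 11.0]
observed_yield = [9.5, 13.0, 10.4, 13.2, 12.1]
print(round(pearson_r(predicted_yield, observed_yield), 2))
```

A value near 1 means predicted and observed values rise and fall together almost linearly across accessions; it says nothing about whether any single prediction hit the observed value exactly.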

“By leveraging genomics and data analytics, we certainly can do a better job,” Jianming Yu said.

The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions was chosen as a reference set by germplasm curators. With high-throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses of prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.
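The abstract mentions cross-validation "with multiple analytical methods" without naming them. The sketch below shows the general idea with just one common method, ridge regression on SNP markers (similar in spirit to ridge-regression BLUP), combined with k-fold cross-validation on simulated data. Everything in it is an assumption for illustration: the 0/1/2 SNP coding, the shrinkage parameter `lam`, five folds, the simulated genotypes and phenotypes, and all function names. It is written in plain Python to stay self-contained; an analysis with hundreds of thousands of SNPs would use dedicated statistical software.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Fit marker effects by ridge regression (rows of X = accessions, columns = SNPs).

    SNP columns and phenotypes are centred on the training means, so the
    intercept is the training mean of y. Returns (snp_means, y_mean, effects).
    """
    n, p = len(X), len(X[0])
    snp_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - snp_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Penalized normal equations: (Xc'Xc + lam * I) effects = Xc'yc
    lhs = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return snp_means, y_mean, solve(lhs, rhs)


def ridge_predict(model, X):
    """Predict phenotypes for new accessions from their SNP genotypes."""
    snp_means, y_mean, effects = model
    return [y_mean + sum((g - m) * e for g, m, e in zip(row, snp_means, effects)) for row in X]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def cross_validated_accuracy(X, y, lam=1.0, k=5, seed=0):
    """k-fold cross-validation: predict each held-out fold from the other folds,
    then correlate all out-of-fold predictions with the observed values."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    predicted = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i, value in zip(test, ridge_predict(model, [X[i] for i in test])):
            predicted[i] = value
    return pearson_r(predicted, y)


# Hypothetical simulated data: 60 accessions x 30 SNPs coded 0/1/2,
# phenotype = additive SNP effects + random noise.
rng = random.Random(42)
X = [[rng.choice((0, 1, 2)) for _ in range(30)] for _ in range(60)]
true_effects = [rng.gauss(0, 1) for _ in range(30)]
y = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0, 2) for row in X]
print(round(cross_validated_accuracy(X, y, lam=5.0), 2))
```

Here accuracy is the correlation between out-of-fold predictions and observed values, pooled over all folds; averaging the per-fold correlations is an equally common alternative.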

SOURCES: Iowa State University, Nature Plants