Experimental approaches

Activity 1: Investigating abiotic stress resistance in cultivated sunflower

To assess the genetic and physiological basis of abiotic stress resistance in cultivated sunflower, we will phenotype stress resistance and related traits in separate, large-scale screens for drought, flooding, salt, and low nutrient stress, followed by GWAS to determine the genetic basis of variation in those traits. We will then select lines that exhibit notable differences in resistance to each stress for further characterization in a series of additional experiments designed to better understand the mechanistic basis of the stress resistance under multiple stress levels relevant to field conditions.

Activity 1.1: Phenotyping in large-scale screens

We will perform large-scale screens of cultivated sunflower (SAM population, 288 lines) for stress resistance and associated traits in separate experiments for drought, flooding, salt, and low nutrients. As far as possible, replicate screens will be conducted in the field and greenhouse (or growth room) for each stress. Greenhouse and field studies are complementary. Greenhouse conditions reduce environmental variance and allow greater precision in stress levels applied, whereas field conditions provide a more robust link between stress resistance and yield under more realistic agricultural conditions. For each screen (drought, flooding, salt, or low nutrients), we will implement two treatments (control and stressed). The intensity and timing of the stress will be chosen based on the results of preliminary studies that identify a level of stress that will reduce growth and/or yield by an average of ~30% and provides the ability to distinguish performance among lines. For each screen, the relative stress resistance of each line will be characterized as Geometric Mean Productivity (GMP; sqrt(YCO* YST)), where YCO is yield or biomass in the control (i.e., non-stressed) treatment, and YST is yield or biomass in the stressed treatment33. For each screen, the mean trait value of each line in each treatment, and stress resistance calculated across treatments, will be used in the association mapping (below). Additional analyses such as regressions, principle components analyses, and structural equation modeling will be used to assess the relationships among traits and the relationships of traits to stress resistance for each screen.

Drought stress

The SAM population will be assessed for drought resistance in field-based evaluations in years 1 and 2 at the University of California Desert Research & Extension Center (DREC). DREC is an experimental farm that provides normal cultural practices such as land preparation, planting, irrigation, fertilization, cultivation, and pest control. The average precipitation at DREC of 15 mm from April to August is well below the 600-1000 mm generally required by sunflower10. This dry climate, combined with our extensive experience with irrigation management and soil moisture monitoring, ensures that we can produce the necessary conditions needed to assess the differential responses of lines within the population. This experiment will have 2 watering treatments (well watered, mild drought), 288 lines, and 4 replicate plots per line. The experimental design will be a split plot design, where replications are in complete blocks nested within the drought stress and control treatments, respectively. Within each replication in flood and control, we will have a 17×17 partially balanced lattice design, such that 17 entries will be within each incomplete block and some permutations of the lattice will not be used. Plots will be two rows, 6.2 m in length with 1.5 m alleys between plots, planted at 150% of intended population density. Row width will be 0.75 m. Dates of emergence will be recorded. After the V2 stage is reached on 100% of the plots97, plants will be thinned to a consistent stand of approximately 58 plants/plot (62,500 plants ha-1). In each replicate and treatment, extra plots will be added for a subset of 20 lines that represent the diversity in the SAM population for destructive plant sampling to validate sensor measurements. We will plant at the beginning of April and impose a mild drought from 30 days after planting onward. Based on prior experience, we anticipate that an appropriate level of drought will be achieved by irrigating at 60% of potential evapotranspiration, but this will be confirmed by preliminary trials. Irrigation will be applied within plots so that a high-clearance tractor for phenotyping can be driven to take data at any time. Irrigation will be controlled with gated pipes, and water meters will record irrigation amounts. Plants will be phenotyped with high-throughput, sensor-based approaches as well as traditional methods (see below). At physiological maturity, all plots will be harvested. Several representative heads from each line and replicate plot will be used to estimate yield, achene size estimated as 100-achene weight, and seed oil content98,99. Yield will be used as the integrated measure of performance for estimating stress resistance, calculated as GMP.

For the field trials at DREC, sensor-based HTP approaches will be used to evaluate leaf and canopy traits using a tractor-mounted array of sensors developed by engineers from the Precision Agriculture Program at the University of Arizona57. The sensors include Pulsar dB3 ultrasonic transducers and Blackbox 130D level controllers to measure canopy height; Apogee SI-121 infra-red radiometers of 36 degree view angle to measure canopy temperature to determine CWSI; and active, multispectral radiometers (model ACS-470; Holland Scientific) for measurement of canopy light reflectance in three bands (670, 720, and 820 nm) for determination of NDVI and CWSI. Additional instrumentation in the field platform includes two Campbell Scientific CR-3000 data loggers for data acquisition, and a Hemisphere GPS ultraprecise real-time kinematic receiver model A320 for data geo-referencing. Starting from three weeks after germination, sensor measurements will be taken weekly to estimate canopy height, temperature, NDVI, CWSI, and LAI for each line12,52,57. Settings for platform travel speed and instrumentation sampling frequency will be selected to achieve a spatial resolution of at least half the in-row plant spacing, currently planned to be 20 cm. Bi-weekly destructive plant sampling from the sensor-validation plots will be used to measure plant height, leaf area, and biomass to characterize plant growth and validate their relationships with sensor measurements.

Data from destructive sampling and sensor measurements will then be used to estimate integrated traits such as stress degree-days, leaf area duration, and the rate of height increase. The traits estimated from sensors will be complemented by visual assessment of dates of phenology (emergence, bud formation, anthesis, and maturity), traditional phenotyping on individual plants for leaf traits, and a rapid visual characterization of root architecture100,101. After new leaves have been produced for drought-stressed plants, and parallel to the phenomics, one representative leaf (recently fully expanded) per line, treatment, and replicate plot will be destructively sampled for leaf traits related to plant vigour and growth strategies, including LMA, leaf thickness, integrated photosynthetic water-use efficiency estimated from leaf carbon isotope ratio (δ13C), and leaf N (CHN analysis), leaf area and shape ImageJ)14,16,18,21,50,51,102-105. Due to cost, the leaves will be bulked by line and treatment for δ 13C and N.

The evaluation of drought resistance at DREC will be complemented by evaluation of seedling drought resistance in growth rooms at University of Georgia (UGA) for all 288 lines with 3 biological replicates per line and 2 treatments (control and stressed). This custom protocol, developed jointly by personnel in the Burke and Donovan labs as a modification of other systems using solid substrates and PEG106,107, has been found to produce a differential drought response across sunflower lines. Newly germinated seedlings will be individually transplanted into 50 mL Falcon tubes of sand saturated with either water (control) or PEG 6000 solution inducing a mild osmotic stress of -0.25 MPa (determined using a vapour pressure osmometer, Wescor). Treatments will be applied for 10 days (sufficient time to produce new leaves), after which seedlings will be harvested and assessed for biomass allocation (roots, stems, and leaves), number of root laterals, stem height and diameter, number of true leaves, and leaf N and δ13C (bulked by line and treatment). Biomass will be used as the integrated measure of performance for calculating stress resistance.

Flooding, salt, and low nutrient stress

Field-based evaluations of the SAM population for flooding, salt, and low nutrient stress (Table 1) will be conducted in years 1 and 2 in Fargo (North Dakota, USA), Indian Head (Saskatchewan, Canada), and NaSARRI (Serere District, Uganda), respectively. We will employ the same basic experimental design, planting, and thinning strategy as that detailed above for the drought tolerance screens at DREC: two treatments (control, stress), 288 lines, 4 replicate plots per line, split block design, thinning at the V2 stage, and a final density of 58 plants/plot. During year one we will employ conventional phenotyping approaches, while in year 2 we will build on our experience at DREC to extend the HTP approaches to the Fargo and Indian Head sites. For flooding stress, co-PI Hulke has identified several flat, homogenous fields near Fargo that can be irrigated by sprinkler. Following thinning at the V2 stage, water will be added to the stress treatment until the soil is fully saturated. Growth stage at the onset of stress will be recorded for each plot as described by Schneiter and Miller97. The stress treatment will be maintained for two weeks, and the following traits will be measured using conventional methods: percent survivorship, extent of adventitious root growth, height of growth, leaf traits (same as measured for drought tolerance, listed above), visual assessment of dates of phenology (anthesis and maturity), as well as yield and oil traits. In addition, plants will be assessed for malate accumulation: malate concentrations are very low in flooded sunflowers from dry habitats due to cessation of respiratory mitochondrial activity48. For salt stress, we will target fields near Indian Head for which soil salinity has been previously shown to reduce sunflower yields (B. May, pers. comm.), as well as similar nearby fields that lack soil salinity problems. Plants will be screened for the same set of traits targeted for flooding tolerance (above), except that we also will examine leaf ion characteristics (Na, K, Mg, Ca). For both Fargo and Indian Head, yield will be directly determined on the entire plot using a Kincaid 8XP small plot combine (Kincaid Mfg., Haven, KS, USA) with a Harvestmaster High-Capacity Grain Gage (Juniper Systems, Logan, UT, USA), which measures total harvest weight and grain moisture using onboard probes, and records data directly on a laptop or handheld device. Oil content will be determined by the Nuclear Magnetic Resonance (NMR) method using an Oxford MQC (Oxford Instruments, Abingdon, Oxfordshire, UK) calibrated with appropriate standards to determine oil content on a 10% moisture basis with low error. The screens for low nutrient stress will be conducted on sandy soils near National Semi-Arid Resources Research Institute (NaSARRI) in Uganda, where nutrient stress is known to limit sunflower production. Here the stress treatment will have no nutrients added, whereas for the control we will add a standard top dressing of fertilizer, equivalent to 90 kg N per ha. We will assay largely the same set of traits targeted for the flooding and salt treatments, except for adventitious roots. Also, we will analyze leaf N (bulked by line and treatments) instead of leaf ion concentrations. Yield and oil will be determined in a manner similar to the drought tolerance studies [There isn’t a period here in the PDF. Is it an omission or did the rest of the sentence get cut off?]

The SAM population will also be assessed for flooding, salt, and low nutrient stress resistance in in the greenhouse in years 1 and 2 (Table 1), with biomass as the integrated measure of performance for calculating stress resistance. For salt and low nutrient stress, there will be two screens, one at the seedling stage (through first full leaf), and one at the late vegetative stage, whereas for flooding stress there will be a seedling screen only since flooding stress typically only occurs early in the growing season. For each stress and developmental stage, the SAM lines will be exposed to two treatments (control and stressed) in a randomized split-plot design with a minimum of three replicates per line. For the flooding treatment a fourth replicate will be added to allow destructive sampling. Flooding stress will be imposed by propagating seedlings in fully saturated soils, whereas the salt stress level will be 75 mM NaCl, which, based on preliminary studies, results in differential growth for cultivated H. annuus with little lethality108,109. For the low nutrient stress, preliminary studies with a subset of SAM lines will be used to determine the level of fertilizer for the low nutrient stress treatment. Traits to be measured on all plants in all of the screens include seed germination and survival, relative height growth, leaf characteristics of a recently expanded leaf (leaf chlorophyll (SPAD-502 meter, Konica Minolta), NDVI (PS-300 spectroradiometer, Apogee Inst.), area, shape, LMA, toughness (penetrometer), and N (bulked by line and treatment), and biomass allocation at harvest (roots, stems, leaves). In addition, plants will be assessed for advantageous root growth and malate accumulation in the flooding resistance screen. For the salt stress and low nutrient screens, plants will be assessed additionally for leaf ion concentrations (Na, K, Mg, Ca)110,111 and leaf N resorption proficiency and efficiency42,104.

aDREC = California’s Desert Research & Extension Center.
bTwo screens will be performed, one at the seedling stage and one at the late vegetative stage.
cNaSARRI = National Semi-Arid Resources Research Institute, Uganda.

Activity 1.2: Genome-wide association studies of abiotic stress resistance and related traits

Following the phenotypic characterization of the 288 cultivated lines in the SAM population, we will use GWAS to identify alleles associated with variation in abiotic stress resistance and related traits in sunflower, and to determine their potential suitability for use in marker-assisted and/or genomic selection. This phase of our research will make use of the phenotypic data collected in our large-scale field and greenhouse studies of drought, salt, and low nutrient stress (above), as well as genotypic information extracted from available WGS re-sequencing data (details below). We favour this approach for developing a more complete understanding of the genetic basis of intraspecific variation in abiotic stress resistance, because association mapping offers far greater resolution than traditional QTL-based approaches, and also allows us to simultaneously investigate the effects of multiple alleles per locus across a range of genetic backgrounds. The actual resolution afforded by association mapping ultimately depends on the structure of LD in the population of interest, which itself can be influenced by a number of genetic and nongenetic factors112. In sunflower, LD is known to decline quite rapidly across much of the genome94,113,114. As such, we anticipate being able to map functional variation with a high level of precision, down to the level of one or a few genes in many cases (e.g., Fig. 2). Of course, this rapid decline in LD means that we will need very high marker density. As part of the previous Genome Canada project, we re-sequenced all individuals within the focal population to a minimum of 8-10x depth. Using custom bioinformatics pipelines developed in collaboration with the software company SAP, we extracted 4.1 million SNPs with a minor allele frequency of >10% (see study system), which can be used for association mapping. Detection of structural variants (SVs) and copy number variants (CNVs), which will also be employed for association mapping, is underway. The most time-consuming algorithms in the processing pipeline, from raw reads to variant calling, are implemented in SAP HANA’s in-memory engine and optimized for performance (Fig. 3). After reads are trimmed and cleaned, they are aligned against the sunflower reference genome. A first set of variant candidates is called using a maximumposterior- probability-genotyping algorithm based on the alleles found in the reference genome and the reads overlapping the variant candidates. To remove false positives due to false alignment calls, the reads are fed into theGATK indel realigner. Using only reads that overlap the variant candidates reduces the realignment time dramatically. Afterwards, the variants are called based on the realigned reads and finally processed using Beagle115,116 to impute missing calls and to phase haplotypes. Our long-term vision is to eliminate third party tools in the pipeline and replace them by native functionality to allow consistent and high- performance data processing in an end-toend pipeline. Our traits of interest will include drought, flooding, salt, and low nutrient resistance and related traits phenotyped in the large-scale phenotypic screens under both field and greenhouse/growth room conditions described above (Table 1). These analyses will be relatively straightforward, though care will be taken to minimize false positives. The primary concern in this regard relates to population structure, which can produce marker/trait correlations in the absence of physical linkage, thereby resulting in spurious associations117,118. Statistical approaches have, however, been developed to minimize this problem by adjusting for both population structure and relatedness amongst individuals (i.e., kinship) based on genotypic data from a collection of ‘background’ markers119-121. As currently envisioned, our analyses will make use of both individual SNPs and CNVs, as well as multi-locus haplotypes (the latter have the potential to increase power and reveal novel associations122-124) and will be performed using the R packages GWASTools125 for the association model, and SNPRelate126 to calculate population stratification and relatedness among accessions. We will use both linear regression and likelihood ratio association tests with the additive gene action model to detect regions of interest in the genome. In the regression model, population stratification (eigenvectors) and relatedness (identity by state) will be included separately as covariates to reduce the inflation in p-values which can lead to spurious associations. To choose the best model, we will calculate the inflation factor (lambda) for each analysis and compare results to ensure that we use the best model in each analysis. To reduce false positive calls due to multiple comparisons in the association tests, all p-values will be adjusted and corrected accordingly. Our current implementation of association analysis performs >50 times faster compared with other tools tested. This will facilitate an iterative analytical process to make certain the best results are reached. We will, however, reevaluate the available options on an ongoing basis to ensure that we are using the most appropriate and powerful analytical approaches. In addition to the use of existing tools, we will develop genome-wide association mapping algorithms to incorporate genotype x environment interactions, which are a key component of the present proposal. This will give us the option of including phenotypic data from multiple sites and from both stress and control treatments into a single model. These algorithms will be implemented in the SAP HANA database to gain both analytical flexibility and speed. Beyond the identification of genes/genomic regions underlying variation in abiotic stress resistance, we will investigate the occurrence and nature of genetic correlations between traits.

From a breeding perspective, the goal will be to identify abiotic stress resistance alleles that lack severe antagonistic correlations. Such correlations could result in unacceptable trade-offs with resistance to other stresses or performance under ideal conditions.

Activity 1.3: Detailed ecophysiological characterization

In order to better understand the mechanistic basis of variation in stress resistance, subsets of cultivated lines that have been ranked for resistance to each stress (based on results from Activity 1.1) will undergo a further series of experiments including detailed ecophysiological characterization, characterization with an automated phenotyping system, and transcriptomic analyses. For ecophysiological characterization, we will carry out a series of greenhouse and growth chamber studies at UGA (drought, salt, and low nutrient stress) and at UBC (flooding stress) using multiple phenological stages and multiple levels of stress. For each stress, the most and least resistant lines (15 of each) will be replicated within treatments using experimental designs appropriate for the stress applications (i.e., randomized complete block or split plot design). For drought stress resistance, we have the capacity to control soil moisture with several novel methods, including controlled irrigation to hold soil moisture at predetermined levels with an automated irrigation system that utilizes a soil moisture sensor in each of 200 pots127, and tall pots (120 cm tall x 10 cm dia.) made of clear plastic (permitting root observations) outfitted with multiple irrigators that are removed from the top downwards to produce a declining soil moisture front in a drought treatment. For each stress (drought, flooding salt, low nutrients), the divergent stress resistance lines will be characterized for a core set of traits including relative growth rate, gas exchange (photosynthesis, stomatal conductance, instantaneous water-use efficiency), leaf lifetime, and specific root length41,72,128-130. Additional traits that are specific to each stress will be measured. Traits specific to drought stress include stem hydraulic efficiency and safety, leaf venation, root growth rates and rooting depth rate under well-watered conditions, the ability to respond to soil drying with greater allocation to roots, osmotic adjustment, turgor loss point, and maintenance of leaf N and green leaf area (i.e. stay-green)14,23,39,131-135. Additional flood stress traits include adventitious rooting, and malate accumulation46-48. Additional salt stress traits include leaf succulence, and the ability to exclude or sequester Na ions in root, stems, and leaves111,136. Additional low nutrient stress traits include relative growth rate, nutrient uptake rates, and the ability to respond to low nutrient conditions with greater allocation to roots104,137.

Univariate and multivariate approaches will be used to assess the relationships among traits and the relationship of traits to stress resistance, in order to provide insight to the mechanistic basis of stress resistance at a scale intermediate between genes and phenotypes assessed in largescale screens.

Activity 1.4: Phenotyping with the Heliaphen platform

A subset of the SAM lines phenotyped using the tractor-mounted HTP in DREC field studies (years 1 and 2) will be characterized using Heliaphen’s automated phenotyping platform (demonstration onYouTube), during years 3 and 4 of the project. We will target 15 lines of each type (drought resistant and susceptible based on results from Activity 1.1) with 2 treatments (stressed and control) in 3 replicates (180 individuals/year). This aspect of our work provides characterization of physiological differences of the divergent lines and provides an opportunity for collaboration between two different research groups at the forefront of developing sensor-based HTP approaches. Plants will be grown outdoors by our collaborators at INRA 138 in Toulouse, France during the sunflower growing season in 15L pots equipped with individual pot shelters. A 7,000 sq ft outdoor platform provides homogeneous environmental conditions for the plants to be tested. Throughout an experiment, environmental conditions are automatically recorded (temperature, wind, precipitation, evaporative demand) and a robot waters plants individually according to a pre-defined drought scenario. This scenario will be chosen to reproduce the stress observed in the DREC field experiments. Pot weight is measured automatically and plant water status (i.e., the fraction of transpirable soil water) is calculated on a daily basis. The Heliaphen platform also phenotypes the plants. Plant growth is assessed daily using ultrasonic sensors (plant height), laser scanner (stem diameter), and visible and nearinfrared imaging associated to image segmentation (leaf area). These measurements will be compared to the plant height measurements and vegetative indices measured with the field HTP system at DREC. Together with pot weights, leaf area estimations allow approximation of the transpiration rates of the different lines in various water stress conditions. These estimations will be confirmed by regular “manual” measurements of leaf area to avoid problems of organ orientation. Finally, the transpiration rates of the different lines in each treatment will be determined and compared to the data acquired through canopy temperature using the field HTP system.

Activity 1.5: Transcriptomic analyses

In addition to the extensive ecophysiological characterization described above, we will compare the transcriptional responses of the lines that are most and least resistant to each stress. Our main goals are to: (1) identify genes that are differentially expressed in response to the four stresses; (2) determine overall patterns in the number, direction, and timing of gene expression changes; (3) develop gene co-expression networks for each stress; and (4) assess the extent of overlap in co-expression networks between stresses. The evaluation of transcriptional responses will be conducted in growth rooms at UGA (for drought, salt, and low nutrient stress) and UBC (flooding stress) using four lines from the SAM population (two resistant and two susceptible), three biological replicates per line, four stresses (drought, salt, nutrient, flood), two treatments (control and stressed), three time points, and two tissues (576 samples total). The scale of the experiment necessitates that transcriptional responses be measured at the juvenile stage only. Also, note that the same four lines are unlikely to be chosen for analysis across the various stress treatments, and that the relevant stresses will be applied as described previously. Stress treatments will be applied starting at the four-leaf stage and will continue for 10 days. Whole seedlings will be harvested 24 hours after stress application to monitor expression changes due to acute stress, after 10 days to identify chronically expressed transcripts, and five days after recovery to assess whether stress-induced expression changes are fixed or plastic. Note that a pre-stress time point is not required because we will assay stressed and non-stressed individuals at each time point. All samples will be harvested at the same time of day to minimize the confounding effects of circadian variation. We will generate strand-specific RNAseq libraries139 from leaf and root tissue from each of the 576 seedlings and sequence them on an Illumina HiSeq (75 bp reads, v4 chemistry, 70-80x depth). There are multiple, rapidly maturing packages for analyzing RNAseq data. We currently use the Bioconductor package DESeq140 for identifying expression level differences141. Gene coexpression network analysis will be employed to identify co-expression modules that are associated with specific stress responses, as well as modules that respond to multiple stresses142,143. This will provide insight into gene regulatory networks144 and potentially reveal other genes that influence responses to stress145. The availability of these co-expression networks will also provide a framework for the interpretation of results from other aspects of the proposal. This will include insights into the putative functional modules within which our candidate stress resistance genes reside and their network positions and degree of connectedness to other members of a given module.

Activity 2: Population genomic analyses of stress adaptation in the wild

To address major evolutionary questions, and to further dissect genomic regions underlying abiotic stress resistance, we will search for associations between genotypic variation (SNPs and haplotypes) and important ecological variables (climate and soil characteristics) in natural populations of wild species contributing to our multi-species mapping population (see below). The challenge with this approach is distinguishing locus-environment correlations caused by selection from correlations arising from neutral processes (e.g., unrecognized population structure) and/or small sample sizes58. Simulations indicate that neutral correlations between genotypic and ecological variation arising from population structure can be minimized by sampling from geographically proximate, but ecologically divergent populations. False positives can be further reduced by separate analyses of environmental correlations in multiple species, since the same locus is unlikely to be repeatedly associated with a given environmental variable due to chance alone.

Activity 2.1: Collections & soil characterization

We will obtain soil samples and seeds from 50 populations across the full range of each of three extremophile species: H. annuus (tolerance to drought, flooding, and salt stress), H. argophyllus (drought, flooding, salt, and low-nutrient stress), and H. petiolaris (drought and low-nutrient stress). These three species are diploid annuals, with an outcrossing mating system. Helianthus annuus and H. petiolaris have broadly overlapping distributions across much of central and western North America. In contrast, H. argophyllus has a more linear geographic distribution, occurring for several hundred km along the Texas coastal plain at both coastline and inland sites. As far as possible, we will use the paired population sampling strategy described above to reduce neutral correlations caused by population structure. This will be straightforward for salt, low nutrient, or flooding stress because populations of the same species can be geographically proximal, but occur in soils that differ strongly in salinity, organic matter, and nitrogen availability (or occur at sites that are prone to spring flooding or not). Paired population sampling is more challenging for drought, but for the two widespread species (H. annuus and H. petiolaris) we will exploit both longitudinal and latitudinal precipitation gradients across the central and western USA to disrupt neutral correlations. Our sampling approach will build on the extensive seed collections and soil analyses previously made by the PIs, with the goal of filling gaps in our earlier collections. Note that samples from all populations, along with relevant collection information, will be deposited into the USDA sunflower germplasm collection. As detailed in her letter of collaboration, Dr. Laura Marek has agreed to oversee the distribution and long-term maintenance of these materials.

Activity 2.2: Genotyping of wild populations

For genotypic characterization, we will conduct WGS re-sequencing of normalized libraries147 from 1,500 genotypes (50 populations x 10 individuals per population x 3 species) to circa 6x depth. These sample sizes will allow us to detect essentially all strongly selected loci (s=0.1) and 20-90% of weakly selected loci (s=0.005), depending on demographic history146. Sequencing from normalized libraries represents a cost-effective strategy for comprehensively sampling the gene space in sunflower, as well as providing sufficient SNPs for haplotype reconstruction. In sunflower, for example, the genome is reduced from 3.6 Gb to circa 1 Gb of gene space. Such an approach is cost-competitive with a large SNP genotyping array, while avoiding ascertainment biases that are inherent to genotyping array methods, which can lead to erroneous conclusions concerning demographic history and natural selection (e.g. Lachance and Tishkoff148). The pipeline we previously developed with SAP will enable us to process the data from raw reads to SNP calls efficiently and accurately and will permit filtering of raw SNP calls to account for the minimum minor allele frequency, heterozygosity, and minimum depth of coverage to reduce the error rate in the data.

Activity 2.3: Analyses of population genomic data

To identify locally adapted alleles, we will search for correlations between SNPs and/or haplotypes and spatially explicit environmental variables, including soil fertility data from each site and climatic variables from the WorldClim database. Analyses will be performed for each species independently using a Bayesian statistical method implemented in the program Bayenv, which controls for the neutral correlations due to population structure. This program uses a null model for how allele frequencies co-vary among populations at putatively neutral loci and then identifies unusual correlations between allele frequency and ecological variables given this null model. In addition, we will search for loci with unusually large differences in allele frequency between populations using outlier detection methods, as such outlier loci may contribute to adaptive differences. Population structure can also result in false positives in outlier tests. To reduce the false positive rate, we will compare our simulated FST distribution to the empirical FST distribution to determine if the simulated mean and variance of FST is similar to the 50% quartiles of the empirical FST distribution. As our main interest is in loci with major effects, we will focus on consecutive outliers (using sliding window or HMM). However, the most compelling evidence of selection will be the discovery of the same unusual environmental correlations and/or outlier loci in multiple species. The results from these analyses will allow us to ask fundamental biological questions, including the extent of overlap in genes under selection for various stresses, the extent of gene re-use during adaptation in the different species, the types of genes under selection (regulatory vs. structural), and the positions of these genes within regulatory networks. We also will be able to assess the role of hybridization in adaptation at the gene level by asking if the same haplotype is under selection in multiple species; this would suggest a role for hybridization/introgression as opposed to parallel responses from standing variation or new mutations. Extensive range overlap and frequent hybridization between H. annuus and H. petiolaris make them ideal for addressing this question. The population genomic data and association/QTL mapping data from the SAM and MAGIC populations (below) are highly complementary. Because of low LD in the natural populations, we should be able to identify the specific gene (and site in some instances) that is targeted by selection. On the other hand, the phenotypic effects of this variation may be unclear. With association mapping/QTL data, we will have confidence in the phenotypic effects of a given locus, but it may contain multiple genes and SNPs. A combination of the two data sets will allow us to more confidently and efficiently link abiotic stress resistance to causal variants. An example of the potential power of this approach comes from studies of alcohol dehydrogenase (ADH). Allozyme studies showed that an allele of ADH was associated with flooded habitats, and this locus has repeatedly appeared as a top outlier in our transcriptome scans.

Activity 3: Development and characterization of multi-species mapping populations

To facilitate the efficient genetic analysis of complex trait variation in Helianthus and to produce materials containing exotic alleles that can be readily deployed in breeding programs, we will develop modified Multiparent Advanced Generation Inter-Cross (MAGIC) mapping populations156 that include both cultivated and wild sunflower donors. Such populations combine the high power to detect QTLs offered by biparental populations with the ability to assay a broader spectrum of diversity and improved resolution afforded by association mapping.

Activity 3.1: Population development

Because sunflower is a hybrid crop, we will develop separate male and female populations. For each population, our crossing design will involve four elite breeding lines (E) and four wild (W) donors. Each wild donor will be successively combined with three elite lines to produce four different intermated populations (IM1-4; see Figure 4, below) that will ultimately be intercrossed in all pairwise combinations (with a target of seed from 50 crosses from each of the six pairs, IM1 x IM2, IM1 x IM3, […] IM3 x IM4; 300 crosses total). We will then develop 600 RILs with equal representation from each of the six intercross pairs via multiple rounds of self-pollination. The elite donors were chosen in consultation with our public and private partners and were selected to represent a broad cross-section of cultivated sunflower diversity, including key resistance traits and both oil and confectionary varieties (Table 2). The wild donors, which have likewise been identified in consultation with our partners, include drought, salt, flooding, and dune (low nutrient) adapted populations. Initial crosses were made at UBC during the summer and fall of 2014, prior to the initiation of funding and IM1xF1 lines are currently being produced.

Activity 3.2: Genotyping & Phenotyping of MAGIC populations

As currently envisioned, normalized libraries of the 16 parental genotypes will be sequenced to circa 15x depth using the Illumina Hi-Seq platform. The 600 RILs from each population will be characterized using a genotyping by sequencing (GBS) approach157 we have optimized for sunflower. GBS is inexpensive, and based on our experience with wild and cultivated sunflower, should deliver >50,000 SNPs, but levels of missing data can be high. Missing markers can, however, be efficiently and accurately imputed in multi-parent mapping populations158. We will re-visit our genotyping strategy when the populations are ready for analysis and will use the most suitable and cost-effective methods available at that time. Due to the size and the timing of the availability of the MAGIC populations, evaluation of both populations under both field and greenhouse conditions is beyond the scope of the present proposal, though such work will be a priority going forward. Instead, we will grow and evaluate the female population under field conditions using the same general design and phenotyping methods (traditional as well as HTP) outlined above for the SAM population: two treatments (control, stress), 600 lines, 2 replicate plots per line, and an incomplete block design. The incomplete block design will be nested in two tiers to eliminate spurious variation arising from the large land area within each replicated complete block: each full-sib population will be a 10×10 simple lattice design nested within a sets-in-reps design among full-sib populations. Such an incomplete block design allows us to eliminate spurious “noise” variation due to unknown field trends, and allows us to measure, compare, and map within full-sib families with optimal precision. Note that we have reduced replication relative to the SAM population (from 4 to 2 replicates) because of the larger size of the MAGIC population, and we will conduct power analysis on the SAM population data to verify that this is sensible a posteriori and make adjustments as needed. The drought stress trials will be conducted in year 3 at DREC, whereas the evaluation of flooding, salt, and low nutrient stress will be carried out at Fargo, ND, Indian Head, SK, and NaSARRI, Uganda, respectively. Map construction and QTL detection will employ R/mpMap: a computational platform for the genetic analysis of multiparent recombinant inbred lines159. This work will thus result in the production and baseline characterization (including the mapping of QTLs underlying a variety of agriculturally important traits) of a highly valuable germplasm resource for the sunflower research and breeding community. This next generation germplasm resource and the accompanying genotypic information can be directly employed by our end users to efficiently move desired traits and alleles into their breeding programs through marker-assisted and/or genomic selection.

Activity 3.3: Detailed ecophysiological characterization

Though wholesale, population-level stress testing will necessarily be left to the future, we will use data from the field experiments to rank the lines for each focal stress. For each stress, the 25 most resistant MAGIC lines and the 25 least resistant MAGIC lines will be subjected to indepth phenotypic characterization under control and stressed conditions in year 4. Stress resistance will be tested using the treatments (control, drought, flooding, and low nutrients), experimental design, and trait measures as described for the greenhouse / growth chamber evaluation of the SAM lines (Table 1). A subset of 30 putatively drought resistant/susceptible MAGIC lines (15 of each) will also be characterized by our collaborators at INRA in year 4. This work will follow the same general methods described above for the Heliaphen-based evaluation of SAM lines. Note that all lines from this population will be made freely available via deposition in the USDA sunflower germplasm collection as soon as sufficient seed is available. As detailed in her letter of collaboration, Dr. Laura Marek has agreed to oversee the distribution and long-term maintenance of this population.

Activity 4: Functional validation of candidate stress resistance genes

Building on information from Activities 1-3, we will target a subset of genes that appear to have large impacts on drought, salt, or low nutrient resistance for functional validation. In the short run, this information will confirm the agronomic value of the alleles that are likely to be the focus of marker-assisted breeding efforts and ensure that appropriate markers are deployed for marker-assisted selection. In the longer run, this information will allow us to potentially increase trait efficacy using genome editing approaches and potentially extend our discoveries to other oilcrops (see below).

Activity 4.1: Functional analyses of candidate genes in sunflower

We will prioritize circa 15 genes that: (1) exhibit strong associations with resistance to one or more abiotic stresses; (2) show unusually strong correlations with the relevant environmental parameter(s) in natural populations of two or more wild species; (3) occur in regions of the genome with low LD, where trait associations can be confidently mapped to individual genes; (4) display notable differences in sequence or gene expression; and/or (5) exhibit the signature of divergent natural selection. Note that for the first several quarters, we will analyze candidate genes identified in previous projects, such as an allele of alcohol dehydrogenase that was previously shown to be associated with tolerance to flooding (see Activity 2.3). The Rieseberg lab has developed an effective protocol for transformation and RNAi in sunflower. Stable transgenics with RNAi constructs designed to silence selected candidate genes will be generated using an actin-2 promoter, or the candidate gene’s own promoter, since we and others have found that the standard 35S promoter is not well tolerated in sunflower. T1 plants will be assayed for silencing of the YFP reporter gene and their phenotypes recorded. Phenotypic modifications correlated with the segregation of the T-DNA and the silencing phenotype will be confirmed in the T2 generation. To minimize faulty inferences due to off-target gene silencing, we will test alleles for complementation in homozygous knock-out lines in Arabidopsis whenever possible, and employ genome editing approaches to locate causal mutations70,160. The latter will mainly be conducted in Aradidopsis because of its many experimental advantages. However, Rieseberg is currently collaborating with a seed company to optimize the CRISPR/Cas9 system for sunflower with the goal of enhancing the efficacy of a crop protection trait. Thus, it should be possible to employ such an approach in sunflower for the proposed project as well, if necessary.

Activity 4.2: Genome editing in soybean

For mutations displaying expected effects in Arabidopsis, we will introduce them into other crops as time and technology permits. Our initial focus will be on soybean since it is the world’s leading oilseed crop, has a similar climate suitability envelope in Canada, and transformation efficiencies are high. The CRISPR/Cas9 system is particularly promising for the proposed work since a variety of mutations/alleles can be made. Also, the mutations are permanent, so the transgene need not be retained after the cleavage event161, and collaborator Parrott’s lab brings experience with the CRISPR/Cas9 system in soybean162. Parrott’s group has developed a number of CRISPR family vectors for soybean that can be easily modified for use in the present project (reference). A milestone will be the introduction of 3-4 of the most promising mutations into soybean homologs.