How association genetics can find genes that help Joshua trees beat the heat
This post is by JTGP collaborator Jeremy Yoder, an Assistant Professor in the Department of Biology at California State University, Northridge, who studies ecological and evolutionary genetics.
Since we first launched the Joshua Tree Genome Project, we’ve told you that one big reason we want to sequence a Joshua tree genome is to find genes that are important for adaptation to climate, which could help us make sure that Joshua trees survive and thrive in a climate-changed future. But we haven’t discussed in detail how we’ll find those climate-adapted genes. The key to that part of the project is association genetics, a method for sifting through the genome to find the parts that contribute to traits we care about.
To understand how association genetics works, first consider a case in which we already know of a gene that might be important for a particular Joshua tree trait, like height or flower shape or physiological performance — or even just growing in places that are hotter or cooler. To figure out whether different variants in the sequence of our “candidate gene” are related to differences in the trait, we could measure the trait in many trees and then sequence that gene in all of those trees. We would test the hypothesis that the gene shapes the trait by comparing the trait values of trees carrying different variants of the gene sequence. Figure 1 gives an example of what this might look like in a hypothetical case, with tree phenotypes plotted against diploid genotypes at our candidate gene — “homozygous” trees carrying two copies of the G variant have higher phenotype values than homozygous trees carrying two copies of the A variant, and “heterozygous” trees with one copy of each have intermediate phenotypes. In this case, we’d probably conclude that the candidate gene has some effect on the phenotype we decided to measure, because different variants of the gene are associated with significantly different phenotype values.
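For readers who like to see the arithmetic, here is a minimal sketch of that candidate-gene test in Python. It is an illustration under assumptions, not the project's actual analysis: the nine trees and their phenotype values are invented, genotypes are coded as the number of G copies each tree carries (0, 1, or 2), and a simple permutation test stands in for whatever statistical test a real study would use.

```python
import random

def slope(genotypes, phenotypes):
    """Least-squares slope of phenotype on the number of G copies per tree."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx

def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=42):
    """Share of random phenotype shuffles with a slope at least as steep as observed."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, phenotypes))
    shuffled = list(phenotypes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as steep".
        if abs(slope(genotypes, shuffled)) >= observed - 1e-12:
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)

# Invented data: 0 = A/A, 1 = A/G, 2 = G/G (hypothetical trees, not real measurements).
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
phenotypes = [1.0, 1.2, 0.9, 2.1, 1.8, 2.0, 3.1, 2.9, 3.0]

effect_per_g_copy = slope(genotypes, phenotypes)
p_value = permutation_p_value(genotypes, phenotypes)
print(f"phenotype change per G copy: {effect_per_g_copy:.2f}, p = {p_value:.4f}")
```

The slope estimates how much the phenotype changes with each additional G copy, and the permutation p-value asks how often shuffling phenotypes among trees would produce a relationship that strong by chance.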
But what about when we don’t have a candidate gene in mind? This is a much more common situation, especially when we’re studying organisms like Joshua tree, which haven’t had much in-depth genetic analysis yet. What we can do then is conduct the same kind of test at many places across the whole genome, and the more places, the better. Genome-wide association (GWA) studies work by collecting DNA sequence data from thousands or millions of variable loci in the genome, and comparing the variants at each of those loci to the phenotypes of the individuals carrying them. The handful of loci that show the strongest associations with the phenotype are the ones most likely to be within or close to genes that contribute to the measured phenotype differences.
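The genome-wide version is the same test in a loop. The sketch below simulates it under loudly stated assumptions: 200 hypothetical trees, 50 hypothetical loci, one locus (number 7) built to affect the phenotype, an approximate p-value from the Fisher z-transform, and a Bonferroni correction for testing many loci. Real GWA software uses more careful models, so treat this only as a picture of the logic.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either variable has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (a large-sample approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Simulated data: every number here is an assumption made for illustration.
rng = random.Random(0)
n_trees, n_loci, causal_locus = 200, 50, 7

# Each genotype is the count of one variant (0, 1, or 2) at each locus.
genotypes = [[rng.choice((0, 1)) + rng.choice((0, 1)) for _ in range(n_loci)]
             for _ in range(n_trees)]
# Only the causal locus influences the phenotype; the rest is random noise.
phenotypes = [tree[causal_locus] * 1.0 + rng.gauss(0, 0.5) for tree in genotypes]

# Run the same association test at every locus, then rank loci by p-value.
results = []
for locus in range(n_loci):
    column = [tree[locus] for tree in genotypes]
    r = pearson_r(column, phenotypes)
    results.append((approx_p_value(r, n_trees), locus))
results.sort()

# Bonferroni correction: divide the significance cutoff by the number of tests.
bonferroni_threshold = 0.05 / n_loci
significant = [locus for p, locus in results if p < bonferroni_threshold]
top_locus = results[0][1]
print("strongest association at locus", top_locus)
```

Because so many loci are tested at once, the cutoff for calling an association significant has to be much stricter than for a single candidate gene, which is what the Bonferroni step does here.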
Modern genome sequencing makes it easier than ever to collect the genetic data necessary for a GWA project, but it’s still not simple to do one rigorously. First, you can’t just sequence and measure a bunch of Joshua trees and perform the kind of simple test I cartooned in Figure 1. In natural populations, we have to contend with a phenomenon called isolation-by-distance (IBD): trees mostly exchange genes with their neighbors, so populations that are farther apart tend to differ at more points in the genome, whether or not those differences affect any trait. When you test millions of places in the genome, you’ll likely find some differences that have everything to do with IBD and nothing to do with the phenotype, so it’s necessary to use a statistical test that accounts for IBD. Second, even when you account for confounding population genetic effects, an association test is fundamentally correlational. It doesn’t directly demonstrate that the different genetic variants at an associated site actually create the phenotype differences you’ve measured.
So a single GWA study needs to be connected to other results, from different kinds of experiments, to confirm that genes showing associations with a phenotype in one context show associations, or even direct effects, in other conditions. Ecological geneticists call this process of comparing different kinds of evidence for a gene’s effects “triangulation”.
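To see why the first of those complications matters, here is a toy example of a spurious association created by population differences alone. Everything in it is invented: two hypothetical populations, one locus whose variant frequencies differ between them, and a phenotype that differs between them for unrelated reasons. The correction shown, centering genotypes and phenotypes within each population before testing, is a deliberately simple stand-in for the mixed models or principal-component covariates that real GWA analyses typically use.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Invented data: population B carries more copies of the variant AND has
# higher phenotype values, but within each population the two are unrelated.
populations = ["A"] * 6 + ["B"] * 6
genotypes = [0, 0, 1, 0, 1, 0, 2, 2, 1, 2, 1, 2]
phenotypes = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 3.0, 3.1, 2.9, 3.2, 3.0, 2.8]

# Naive test: ignores which population each tree came from.
naive_r = pearson_r(genotypes, phenotypes)

# Adjusted test: removes between-population differences first.
adjusted_r = pearson_r(
    center_within_groups(genotypes, populations),
    center_within_groups(phenotypes, populations),
)
print(f"naive r = {naive_r:.2f}, adjusted r = {adjusted_r:.2f}")
```

The naive test reports a strong association, and the adjusted test reports essentially none, because all of the apparent signal came from differences between the two populations rather than from the locus itself.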
Finally, to be as useful as possible, a GWA study needs a reference genome to provide context to its results — whether associated loci lie in genes, and what those genes might do. It’s possible to do association testing without a reference, collecting sequence data in such a way that you know individuals’ genotypes at many loci, but don’t know where those loci are with respect to each other in the genome. Sometimes you can still use this approach to determine that an associated locus is similar to a stretch of genetic sequence known to be a particular kind of gene in another, closely related species. More often, though, GWA without a genome results in a list of associated loci about which very little is known beyond the fact that they’re associated with the phenotype you measured.
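As a rough picture of what that context looks like in practice, the sketch below checks whether an associated locus falls inside or near an annotated gene. The gene names, coordinates, and the 5,000-base-pair "nearby" window are all hypothetical placeholders, not Joshua tree annotations.

```python
def annotate_locus(chrom, pos, genes, window=5000):
    """List annotated genes that contain the locus or lie within `window` bp of it."""
    hits = []
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if gene["start"] <= pos <= gene["end"]:
            hits.append((gene["name"], "within"))
        elif gene["start"] - window <= pos <= gene["end"] + window:
            hits.append((gene["name"], "nearby"))
    return hits

# Hypothetical annotation: names and coordinates are invented for illustration.
genes = [
    {"name": "geneA", "chrom": "chr1", "start": 1000, "end": 2000},
    {"name": "geneB", "chrom": "chr1", "start": 10000, "end": 12000},
    {"name": "geneC", "chrom": "chr2", "start": 500, "end": 900},
]

inside_hit = annotate_locus("chr1", 1500, genes)
nearby_hit = annotate_locus("chr1", 8000, genes)
no_hit = annotate_locus("chr2", 50000, genes)
print(inside_hit, nearby_hit, no_hit)
```

Without a reference genome there is no annotation table to look things up in, which is why associated loci found without one so often remain anonymous.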
Building up the genomic resources and experimental knowledge base required to support good GWA can take decades, but the Joshua Tree Genome Project’s collaborators bring together the range of expertise necessary to do it in the course of a four-year NSF-sponsored project. We’ve carefully planned our sampling design and statistical analysis to control for confounding population genetic effects. We’ll perform controlled experiments in Joshua tree physiology and gene expression to help “triangulate” the importance of climate- and growth-associated loci. And, first and foremost, we’re building a carefully annotated reference genome to provide context for GWA results. It’s going to be a lot of work, but it’s what we need to do to confidently identify the genes that help Joshua tree cope with extreme climates.