An integrative approach to prioritize candidate causal genes for complex traits in cattle

Abstract
Genome-wide association studies (GWAS) have identified many quantitative trait loci (QTL) associated with complex traits. Most lie in non-coding regions, which makes it difficult to pinpoint the causal variants and their target genes. Three types of evidence can help identify the gene through which a QTL acts: (1) proximity to the most significant GWAS variant, (2) correlation of gene expression with the trait, and (3) the gene’s physiological role in the trait. However, it remains uncertain how often these methods identify the correct genes. Here, we test these methods on a comparatively simple set of traits: the concentrations of polar lipids in milk. We conducted single-trait GWAS for ~14 million imputed variants and 56 individual milk polar lipid (PL) phenotypes in 336 cows. A multi-trait meta-analysis of the GWAS identified 10,063 significant SNPs at FDR≤10% (P≤7.15E-5). Transcriptome data from blood (~12.5K genes, 143 cows) and mammary tissue (~12.2K genes, 169 cows) were analyzed using the genetic score omics regression (GSOR) method, which links observed gene expression to genetically predicted phenotypes and was used here to find associations between gene expression and the 56 PL phenotypes. GSOR identified 2,186 genes in blood and 1,404 in mammary tissue associated with at least one PL phenotype (FDR≤1%). To test for overlap between GSOR-identified genes and GWAS signals, we partitioned the genome into non-overlapping 100 Kb windows. The overlap was significant: GSOR-significant genes were more likely to be located within 100 Kb windows that include GWAS signals than within windows that do not (P=0.01; odds ratio=1.47). These windows included 70 significant genes expressed in mammary tissue and 95 in blood. Compared to all expressed genes in each tissue, these genes were enriched for lipid metabolism gene ontology (GO) terms: seven of the 70 significant mammary transcriptome genes (P<0.01; odds ratio=3.98) and five of the 95 significant blood genes (P<0.10; odds ratio=2.24) were annotated with lipid metabolism GO terms. The candidate causal genes include DGAT1, ACSM5, SERINC5, ABHD3, CYP2U1, PIGL, ARV1, SMPD5, and NPC2, with some overlap between the two tissues. The overlap between the GWAS, GSOR, and GO analyses suggests that, used together, these methods are more likely to identify the genes mediating QTL, though their power remains limited, as reflected by the modest odds ratios. Larger sample sizes would enhance the power of these analyses, but issues such as linkage disequilibrium would remain.
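The window-based overlap test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract does not say how genes were assigned to windows or which test produced the reported P-values, so the single-coordinate gene assignment and the one-sided Fisher's exact test below are assumptions. All names (`window_id`, `overlap_table`, `fisher_one_sided`) and all positions and gene labels are hypothetical toy values.

```python
from math import comb

WINDOW_BP = 100_000  # 100 Kb non-overlapping windows, as in the abstract


def window_id(chrom, pos, size=WINDOW_BP):
    """Map a base-pair position to its (chromosome, window index) pair."""
    return (chrom, pos // size)


def overlap_table(gwas_snps, gene_positions, significant_genes):
    """Build a 2x2 table of genes: expression-significant (yes/no) by
    located in a window containing a GWAS signal (yes/no).

    Assumption: each gene is represented by a single coordinate.
    Returns (a, b, c, d) = (sig & in, sig & out, non-sig & in, non-sig & out).
    """
    gwas_windows = {window_id(c, p) for c, p in gwas_snps}
    a = b = c = d = 0
    for gene, (chrom, pos) in gene_positions.items():
        in_gwas = window_id(chrom, pos) in gwas_windows
        sig = gene in significant_genes
        if sig and in_gwas:
            a += 1
        elif sig:
            b += 1
        elif in_gwas:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table (undefined if b or c is zero)."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P-value for enrichment (overlap >= a),
    computed from the hypergeometric distribution."""
    n = a + b + c + d
    row1 = a + b  # expression-significant genes
    col1 = a + c  # genes in GWAS windows
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical toy data: two GWAS SNPs and six genes.
gwas_snps = [("1", 150_000), ("1", 420_000)]
gene_positions = {
    "G1": ("1", 120_000), "G2": ("1", 450_000), "G3": ("1", 250_000),
    "G4": ("1", 310_000), "G5": ("1", 199_999), "G6": ("2", 150_000),
}
significant_genes = {"G1", "G2", "G3"}

table = overlap_table(gwas_snps, gene_positions, significant_genes)
print(table, odds_ratio(*table), fisher_one_sided(*table))
```

The same 2x2 construction applies to the GO enrichment step, with "annotated with a lipid metabolism GO term" replacing "located in a GWAS window" and all expressed genes in the tissue as the background set.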
Citation
Ghoreishifar M, Macleod IM, Chamberlain AJ, Liu Z, Lopdell TJ, Littlejohn MD, Xiang R, Pryce JE, Goddard ME. (2025). An integrative approach to prioritize candidate causal genes for complex traits in cattle. PLOS Genetics. 21(5). May.