Journal Articles

Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915

  • Item
    pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.
    (Oxford University Press, 2024-10-07) Kolisnik T; Keshavarz-Rahaghi F; Purcell RV; Smith ANH; Silander OK
    Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
  • Item
    Genomic selection shows improved expected genetic gain over phenotypic selection of agronomic traits in allotetraploid white clover.
    (Springer Nature, 2025-01-23) Ehoche OG; Arojju SK; Jahufer MZZ; Jauregui R; Larking AC; Cousins G; Tate JA; Lockhart PJ; Griffiths AG
    Genomic selection using white clover multi-year-multi-site data showed predicted genetic gains through integrating among-half-sibling-family phenotypic selection and within-family genomic selection were up to 89% greater than half-sibling-family phenotypic selection alone. Genomic selection, an effective breeding tool used widely in plants and animals for improving low-heritability traits, has only recently been applied to forages. We explored the feasibility of implementing genomic selection in white clover (Trifolium repens L.), a key forage legume which has shown limited genetic improvement in dry matter yield (DMY) and persistence traits. We used data from a training population comprising 200 half-sibling (HS) families evaluated in a cattle-grazed field trial across three years and two locations. Combining phenotype and genotyping-by-sequencing (GBS) data, we assessed different two-stage genomic prediction models, including KGD-GBLUP developed for low-depth GBS data, on DMY, growth score, leaf size and stolon traits. Predictive abilities were similar among the models, ranging from -0.17 to 0.44 across traits, and remained stable for most traits when reducing model input to 100-120 HS families and 5500 markers, suggesting genomic selection is viable with fewer resources. Incorporating a correlated trait with a primary trait in multi-trait prediction models increased predictive ability by 28-124%. Deterministic modelling showed integrating among-HS-family phenotypic selection and within-family genomic selection at different selection pressures estimated up to 89% DMY genetic gain compared to phenotypic selection alone, despite a modest predictive ability of 0.3. This study demonstrates the potential benefits of combining genomic and phenotypic selection to boost genetic gains in white clover. 
    Using cost-effective GBS paired with a prediction model optimized for low read-depth data, the approach can achieve prediction accuracies comparable to traditional models, providing a viable path for implementing genomic selection in white clover.
  • Item
    A protocol combining breath testing and ex vivo fermentations to study the human gut microbiome
    (Elsevier Inc, 2021-03-19) Payling L; Roy NC; Fraser K; Loveday SM; Sims IM; Janssen PH; Hill SJ; Raymond LG; McNabb WC
    This protocol describes the application of breath testing and ex vivo fermentations to study the association between breath methane and the composition and functionality of the gut microbiome. The protocol provides a useful systems biology approach for studying the gut microbiome in humans, which combines standardized methods in human breath testing and fecal sampling. The model described is accessible and easy to repeat, but its relative simplicity means that it can deviate from human physiological conditions.
  • Item
    Whole-genome resequencing of the native sheep provides insights into the microevolution and identifies genes associated with reproduction traits
    (BioMed Central Ltd, 2023-07-11) Zhu M; Yang Y; Yang H; Zhao Z; Zhang H; Blair HT; Zheng W; Wang M; Fang C; Yu Q; Zhou H; Qi H
    BACKGROUND: Over long periods of natural and artificial selection, sheep genomes undergo numerous gene losses, gains and mutations that generate genomic variability among breeds of the same species. However, the microevolution of native sheep in northwest China remains elusive. Our aim was to compare the genomes and relevant reproductive traits of four sheep breeds from different climatic environments, to unveil the selection challenges that this species copes with, and the microevolutionary differences in sheep genomes. Here, we resequenced the genomes of four representative sheep breeds in northwest China, including Kazakh sheep and Duolang sheep (native breeds) and Hu sheep and Suffolk sheep (exotic breeds), with different reproductive characteristics. RESULTS: We found that these four breeds experienced a similar expansion from ~10,000 to 1,000,000 years ago. In the past 10,000 years, the selection intensity of the four breeds was inconsistent, resulting in differences in reproductive traits. We explored the sheep variome and selection signatures using FST and θπ. We detected genomic regions containing genes associated with different reproductive traits that may be potential targets for breeding and selection. Furthermore, we found non-synonymous mutations in a set of plausible candidate genes and significant differences in their allele frequency distributions across breeds with different reproductive characteristics. We identified PAK1, CYP19A1 and PER1 as likely causal genes for seasonal reproduction in native sheep through qPCR, Western blot and ELISA analyses. Also, the haplotype frequencies of three tested gene regions related to reproduction were significantly different among the four sheep breeds. CONCLUSIONS: Our results provide insights into the microevolution of native sheep and valuable genomic information for identifying genes associated with important reproductive traits in sheep.
  • Item
    Genomic insights into the physiology of Quinella, an iconic uncultured rumen bacterium.
    (Nature Portfolio, 2022-10-20) Kumar S; Altermann E; Leahy SC; Jauregui R; Jonker A; Henderson G; Kittelmann S; Attwood GT; Kamke J; Waters SM; Patchett ML; Janssen PH
    Quinella is a genus of iconic rumen bacteria first reported in 1913. There are no cultures of these bacteria, and information on their physiology is scarce and contradictory. Increased abundance of Quinella was previously found in the rumens of some sheep that emit low amounts of methane (CH4) relative to their feed intake, but whether Quinella contributes to low CH4 emissions is not known. Here, we concentrate Quinella cells from sheep rumen contents, extract and sequence DNA, and reconstruct Quinella genomes that are >90% complete with as little as 0.20% contamination. Bioinformatic analyses of the encoded proteins indicate that lactate and propionate formation are major fermentation pathways. The presence of a gene encoding a potential uptake hydrogenase suggests that Quinella might be able to use free hydrogen (H2). None of the inferred metabolic pathways is predicted to produce H2, a major precursor of CH4, which is consistent with the lower CH4 emissions from those sheep with high abundances of this bacterium.
  • Item
    Genomic and clinical characteristics of campylobacteriosis in Australia.
    (Microbiology Society, 2024-01) Cribb DM; Moffatt CRM; Wallace RL; McLure AT; Bulach D; Jennison AV; French N; Valcanis M; Glass K; Kirk MD
    Campylobacter spp. are a common cause of bacterial gastroenteritis in Australia, primarily acquired from contaminated meat. We investigated the relationship between genomic virulence characteristics and the severity of campylobacteriosis, hospitalisation, and other host factors. We recruited 571 campylobacteriosis cases from three Australian states and territories (2018-2019). We collected demographic, health status, risk factors, and self-reported disease data. We whole genome sequenced 422 C. jejuni and 84 C. coli case isolates along with 616 retail meat isolates. We classified case illness severity using a modified Vesikari scoring system, performed phylogenomic analysis, and explored risk factors for hospitalisation and illness severity. On average, cases experienced a 7.5 day diarrhoeal illness with additional symptoms including stomach cramps (87.1 %), fever (75.6 %), and nausea (72.0 %). Cases aged ≥75 years had milder symptoms, lower Vesikari scores, and higher odds of hospitalisation compared to younger cases. Chronic gastrointestinal illnesses also increased odds of hospitalisation. We observed significant diversity among isolates, with 65 C. jejuni and 21 C. coli sequence types. Antimicrobial resistance genes were detected in 20.4 % of isolates, but multidrug resistance was rare (0.04 %). Key virulence genes such as cdtABC (C. jejuni) and cadF were prevalent (>90 % presence) but did not correlate with disease severity or hospitalisation. However, certain genes (e.g. fliK, Cj1136, and Cj1138) appeared to distinguish human C. jejuni cases from food source isolates. Campylobacteriosis generally presents similarly across cases, though some are more severe. Genotypic virulence factors identified in the literature to-date do not predict disease severity but may differentiate human C. jejuni cases from food source isolates. Host factors like age and comorbidities have a greater influence on health outcomes than virulence factors.
  • Item
    High-resolution genomic analysis to investigate the impact of the invasive brushtail possum (Trichosurus vulpecula) and other wildlife on microbial water quality assessments.
    (Public Library of Science (PLoS), 2024-01-18) Moinet M; Rogers L; Biggs P; Marshall J; Muirhead R; Devane M; Stott R; Cookson A; Adenyo C
    Escherichia coli are routine indicators of fecal contamination in water quality assessments. Contrary to livestock and human activities, brushtail possums (Trichosurus vulpecula), common invasive marsupials in Aotearoa/New Zealand, have not been thoroughly studied as a source of fecal contamination in freshwater. To investigate their potential role, Escherichia spp. isolates (n = 420) were recovered from possum gut contents and feces and were compared to those from water, soil, sediment, and periphyton samples, and from birds and other introduced mammals collected within the Mākirikiri Reserve, Dannevirke. Isolates were characterized using E. coli-specific real-time PCR targeting the uidA gene, Sanger sequencing of a partial gnd PCR product to generate a gnd sequence type (gST), and for 101 isolates, whole genome sequencing. Escherichia populations from 106 animal and environmental sample enrichments were analyzed using gnd metabarcoding. The alpha diversity of Escherichia gSTs was significantly lower in possums and animals compared with aquatic environmental samples, and some gSTs were shared between sample types, e.g., gST535 (in 85% of samples) and gST258 (71%). Forty percent of isolates gnd-typed and 75% of reads obtained by metabarcoding had gSTs shared between possums, other animals, and the environment. Core-genome single nucleotide polymorphism (SNP) analysis showed limited variation between several animal and environmental isolates (<10 SNPs). Our data show at an unprecedented scale that Escherichia clones are shared between possums, other wildlife, water, and the wider environment. These findings support the potential role of possums as contributors to fecal contamination in Aotearoa/New Zealand freshwater. Our study deepens the current knowledge of Escherichia populations in under-sampled wildlife. 
    It presents a successful application of high-resolution genomic methods for fecal source tracking, thereby broadening the analytical toolbox available to water quality managers. Phylogenetic analysis of isolates and profiling of Escherichia populations provided useful information on the source(s) of fecal contamination and suggest that comprehensive invasive species management strategies may assist in restoring not only ecosystem health but also water health where microbial water quality is compromised.
  • Item
    RADseq-based population genomic analysis and environmental adaptation of rare and endangered recretohalophyte Reaumuria trigyna.
    (John Wiley and Sons, Inc., 2024-03-01) Dang Z; Li J; Liu Y; Song M; Lockhart PJ; Tian Y; Niu M; Wang Q; Varshney R
    Genetic diversity reflects the survival potential, history, and population dynamics of an organism. It underlies the adaptive potential of populations and their response to environmental change. Reaumuria trigyna is an endemic species in the Eastern Alxa and West Ordos desert regions in China. The species has been considered a good candidate to explore the unique survival strategies of plants that inhabit this area. In this study, we performed population genomic analyses based on restriction-site associated DNA sequencing to understand the genetic diversity, population genetic structure, and differentiation of the species. Analyses of 92,719 high-quality single-nucleotide polymorphisms (SNPs) indicated that overall genetic diversity of R. trigyna was low (HO = 0.249 and HE = 0.208). No significant genetic differentiation was observed among the investigated populations. However, a subtle population genetic structure was detected. We suggest that this might be explained by adaptive diversification reinforced by the geographical isolation of populations. Overall, 3513 outlier SNPs were located in 243 gene-coding sequences in the R. trigyna transcriptome. Potential sites under diversifying selection occurred in genes (e.g., AP2/EREBP, E3 ubiquitin-protein ligase, FLS, and 4CL) related to phytohormone regulation and synthesis of secondary metabolites which have roles in adaptation of species. Our genetic analyses provide scientific criteria for evaluating the evolutionary capacity of R. trigyna and the discovery of unique adaptions. Our findings extend knowledge of refugia, environmental adaption, and evolution of germplasm resources that survive in the Ordos area.
  • Item
    Comparative genome identification of accessory genes associated with strong biofilm formation in Vibrio parahaemolyticus.
    (Elsevier B.V., 2023-04-01) Wang D; Fletcher GC; Gagic D; On SLW; Palmer JS; Flint SH
    Vibrio parahaemolyticus biofilms on the seafood processing plant surfaces are a potential source of seafood contamination and subsequent food poisoning. Strains differ in their ability to form biofilm, but little is known about the genetic characteristics responsible for biofilm development. In this study, pangenome and comparative genome analysis of V. parahaemolyticus strains reveals genetic attributes and gene repertoire that contribute to robust biofilm formation. The study identified 136 accessory genes that were exclusively present in strong biofilm forming strains and these were functionally assigned to the Gene Ontology (GO) pathways of cellulose biosynthesis, rhamnose metabolic and catabolic processes, UDP-glucose processes and O antigen biosynthesis (p < 0.05). Strategies of CRISPR-Cas defence and MSHA pilus-led attachment were implicated via Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation. Higher levels of horizontal gene transfer (HGT) were inferred to confer more putatively novel properties on biofilm-forming V. parahaemolyticus. Furthermore, cellulose biosynthesis, a neglected potential virulence factor, was identified as being acquired from within the order Vibrionales. The cellulose synthase operons in V. parahaemolyticus were examined for their prevalence (22/138, 15.94 %) and were found to consist of the genes bcsG, bcsE, bcsQ, bcsA, bcsB, bcsZ, bcsC. This study provides insights into robust biofilm formation of V. parahaemolyticus at the genomic level and facilitates: identification of key attributes for robust biofilm formation, elucidation of biofilm formation mechanisms and development of potential targets for novel control strategies of persistent V. parahaemolyticus.
  • Item
    Genomic architecture of resistance to latania scale (H. lataniae) in kiwifruit (A. chinensis var. chinensis)
    (BioMed Central Ltd, 2023-10-31) Flay C; Tahir J; Hilario E; Fraser L; Stannard K; Symonds V; Datson P
    BACKGROUND: Latania scale (Hemiberlesia lataniae Signoret) is an armoured scale insect known to cause damage to kiwifruit plants and fruit, which ultimately reduces crop values and creates post-harvest export and quarantine issues. Resistance to H. lataniae does exist in some commercial cultivars of kiwifruit. However, some of the commercial cultivars bred in New Zealand have not inherited alleles for resistance to H. lataniae carried by their parents. To elucidate the architecture of resistance in the parents and develop molecular markers to assist breeding, these experiments analysed the inheritance of resistance to H. lataniae from families related to commercial cultivars. RESULTS: The first experiment used rtGBS data to identify a 15.97 Mb genomic region of interest for resistance to H. lataniae, spanning 3.23 to 19.20 Mb on chromosome 10. A larger population was then QTL mapped, which confirmed the region of interest as the sole locus contributing to H. lataniae resistance. InDel markers mapping the region of low recombination under the QTL peak further narrowed the region associated with H. lataniae resistance to a 5.73 Mb region. CONCLUSIONS: The kiwifruit populations and genomic methods used in this study identify the same non-recombinant region of chromosome 10 which confers resistance of A. chinensis var. chinensis to H. lataniae. The markers developed to target the H. lataniae resistance loci will reduce the amount of costly and time-consuming phenotyping required for breeding H. lataniae scale resistance into new kiwifruit cultivars.
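The pyRforest abstract in the first item above mentions a rank-based permutation method for estimating P-values for individual features of a scikit-learn "RandomForestClassifier". The abstract does not spell out the procedure, so the following is only a generic sketch of one way such a test could look, not the package's actual implementation. The function name `rank_permutation_pvalues`, its parameters, and the synthetic data are all hypothetical. The idea assumed here: fit the forest on the real labels, sort the feature importances from largest to smallest, then refit on shuffled labels and compare the observed importance at each rank with the null importances at the same rank.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_permutation_pvalues(X, y, n_permutations=20, n_estimators=50, seed=0):
    """Hypothetical helper: rank-wise permutation P-values for RF importances.

    Returns (order, observed_sorted, pvals_by_rank), where order[k] is the
    index of the feature holding rank k (0 = most important).
    """
    rng = np.random.default_rng(seed)

    def importances(labels):
        # Fit a fresh forest and return its impurity-based importances.
        model = RandomForestClassifier(
            n_estimators=n_estimators, random_state=seed, n_jobs=1
        )
        model.fit(X, labels)
        return model.feature_importances_

    observed = importances(y)
    order = np.argsort(observed)[::-1]      # feature indices, best first
    observed_sorted = observed[order]

    # Null distribution: importances under shuffled labels, sorted per run
    # so that column k holds the rank-k importance of each permutation.
    null_sorted = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        null_sorted[i] = np.sort(importances(rng.permutation(y)))[::-1]

    # Fraction of permutations whose rank-k importance reaches the observed
    # rank-k importance, with the usual +1 correction to avoid zero P-values.
    exceed = (null_sorted >= observed_sorted).sum(axis=0)
    pvals_by_rank = (exceed + 1) / (n_permutations + 1)
    return order, observed_sorted, pvals_by_rank


if __name__ == "__main__":
    # Synthetic example (an assumption for illustration): feature 0 alone
    # determines the class, the other nine features are noise.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] > 0).astype(int)
    order, obs, pvals = rank_permutation_pvalues(X, y)
    for rank, (feat, imp, p) in enumerate(zip(order, obs, pvals)):
        print(f"rank {rank}: feature {feat}, importance {imp:.3f}, p = {p:.3f}")
```

With only 20 permutations the smallest attainable P-value is 1/21, so a real analysis would use far more permutations; the small default here just keeps the sketch quick to run.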