Journal Articles
Permanent URI for this collection: https://mro.massey.ac.nz/handle/10179/7915
4 results
Search Results
Item
Survey of functional Mendelian variants in New Zealand Huntaway and Heading dog breeds (John Wiley and Sons Ltd on behalf of Stichting International Foundation for Animal Genetics, 2025-10-01)
Smith F; Lopdell T; Stephen M; Henry M; Dittmer K; Hunt H; Sneddon N; Williams L; Rolfe J; Garrick D; Littlejohn MD
New Zealand (NZ) Huntaway and Heading dogs are working breeds that play active roles on farms across NZ. While these breeds are common in NZ, they are not well known elsewhere, and little is understood about their genetic make-up. Here, we used whole genome sequencing to provide a comprehensive genomic view of 249 working dogs. As a first use of this resource, we report the allele frequencies of provisionally functional variants aggregated from the Online Mendelian Inheritance in Animals (OMIA) database. Of 435 “probably causal” variants, 27 segregated in our sample. Notable examples of disease variants potentially actionable for selection include those in the CUBN, CLN8, SGSH, SOD1, VWF, and VPS13B genes. These findings will enable genetic testing and selection opportunities to help improve the health and performance of future generations of these unique breeds.

Item
pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R. (Oxford University Press, 2024-10-07)
Kolisnik T; Keshavarz-Rahaghi F; Purcell RV; Smith ANH; Silander OK
Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization.
Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.

Item
Screening and Identification of Muscle-Specific Candidate Genes via Mouse Microarray Data Analysis. (Frontiers Media S.A., 2021-12-13)
Raza SHA; Liang C; Guohua W; Pant SD; Mohammedsaleh ZM; Shater AF; Alotaibi MA; Khan R; Schreurs N; Cheng G; Mei C; Zan L; Ibelli AMG
Muscle tissue is involved in every stage of life and plays roles in biological processes. For example, the circulatory system relies on heart muscle to transport blood to all parts of the body, and movement depends on skeletal muscle. However, the process of muscle development and its regulatory mechanisms are not yet clear.
In this study, we used bioinformatics techniques to identify differentially expressed genes specifically expressed in multiple muscle tissues of mice as potential candidate genes for studying the regulatory mechanisms of muscle development. Mouse tissue microarray data from 18 tissue samples were selected from the GEO database for analysis, with muscle tissue as the treatment group and the other 17 tissues as the control group. Comparing gene expression in muscle tissue with that in the other 17 tissues identified 272 differential genes with highly specific expression in muscle tissue, including 260 up-regulated genes and 12 down-regulated genes. The genes were associated with the myofibril, contractile fibers, sarcomere, cytoskeletal protein binding, and actin binding. KEGG pathway analysis showed that the differentially expressed genes in muscle tissue were mainly concentrated in pathways for AMPK signaling, cGMP-PKG signaling, calcium signaling, glycolysis, and arginine and proline metabolism. A protein-protein interaction (PPI) network was constructed for the selected differential genes, and the MCODE module was used for modular analysis. Five modules with a score > 3.0 were selected. Cytoscape was then used to analyze the tissue specificity of the differential genes; genes with high degree scores were collected, and some common genes were selected for quantitative PCR verification.
In conclusion, we screened a set of differentially expressed genes specific to mouse muscle, providing potential candidate genes for studying the important mechanisms of muscle development.

Item
Building a global genomics observatory: Using GEOME (the Genomic Observatories Metadatabase) to expedite and improve deposition and retrieval of genetic data and metadata for biodiversity research. (2020-11)
Riginos C; Crandall ED; Liggins L; Gaither MR; Ewing RB; Meyer C; Andrews KR; Euclide PT; Titus BM; Therkildsen NO; Salces-Castellano A; Stewart LC; Toonen RJ; Deck J
Genetic data represent a relatively new frontier for our understanding of global biodiversity. Ideally, such data should include both organismal DNA-based genotypes and the ecological context where the organisms were sampled. Yet most tools and standards for data deposition focus exclusively on either genetic or ecological attributes. The Genomic Observatories Metadatabase (GEOME: geome-db.org) provides an intuitive solution for maintaining links between genetic data sets stored by the International Nucleotide Sequence Database Collaboration (INSDC) and their associated ecological metadata. GEOME facilitates the deposition of raw genetic data to the INSDC's Sequence Read Archive (SRA) while maintaining persistent links to standards-compliant ecological metadata held in the GEOME database. This approach facilitates findable, accessible, interoperable and reusable data archival practices. Moreover, GEOME enables data management solutions for large collaborative groups and expedites batch retrieval of genetic data from the SRA. The article that follows describes how GEOME can enable genuinely open data workflows for researchers in the field of molecular ecology.
