pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.

dc.citation.volume: 24
dc.contributor.author: Kolisnik T
dc.contributor.author: Keshavarz-Rahaghi F
dc.contributor.author: Purcell RV
dc.contributor.author: Smith ANH
dc.contributor.author: Silander OK
dc.coverage.spatial: England
dc.date.accessioned: 2025-02-13T20:42:56Z
dc.date.available: 2025-02-13T20:42:56Z
dc.date.issued: 2024-10-07
dc.description.abstract: Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
dc.description.confidential: false
dc.edition.edition: 2025
dc.format.pagination: elae038-
dc.identifier.author-url: https://www.ncbi.nlm.nih.gov/pubmed/39373492
dc.identifier.citation: Kolisnik T, Keshavarz-Rahaghi F, Purcell RV, Smith ANH, Silander OK. (2025). pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R. Brief Funct Genomics. 24. (pp. elae038-).
dc.identifier.doi: 10.1093/bfgp/elae038
dc.identifier.eissn: 2041-2657
dc.identifier.elements-type: journal-article
dc.identifier.issn: 2041-2649
dc.identifier.number: elae038
dc.identifier.pii: 7814658
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/72493
dc.language: eng
dc.publisher: Oxford University Press
dc.publisher.uri: https://academic.oup.com/bfg/article/doi/10.1093/bfgp/elae038/7814658
dc.relation.isPartOf: Brief Funct Genomics
dc.rights: (c) 2024 The Author/s
dc.rights: CC BY-NC 4.0
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: bioinformatics
dc.subject: biomarker identification
dc.subject: genomic data analysis
dc.subject: machine learning
dc.subject: random forest
dc.subject: Genomics
dc.subject: Software
dc.subject: Algorithms
dc.subject: Data Analysis
dc.subject: Humans
dc.subject: Computational Biology
dc.subject: Random Forest
dc.title: pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.
dc.type: Journal article
pubs.elements-id: 499492
pubs.organisational-group: Other

Files

Original bundle

499492 PDF.pdf (816.43 KB, Adobe Portable Document Format). Description: Evidence
supplementary_table_1_elae038.docx (1.95 MB, Microsoft Word XML). Description: Evidence
supplementary_table_2_elae038.docx (1.95 MB, Microsoft Word XML). Description: Evidence
supplementary_table_3_elae038.docx (1.95 MB, Microsoft Word XML). Description: Evidence
supplementary_table_4_elae038.xlsx (93.64 KB, Microsoft Excel). Description: Evidence

License bundle

license.txt (9.22 KB, Plain Text)
