Massey Documents by Type
Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294
10 results
Search Results

Item Bioinformatics role of the WGCNA analysis and co-expression network identifies of prognostic marker in lung cancer (Elsevier B.V. on behalf of King Saud University, 2022-05-01) Chengcheng L; Raza SHA; Shengchen Y; Mohammedsaleh ZM; Shater AF; Saleh FM; Alamoudi MO; Aloufi BH; Mohajja Alshammari A; Schreurs NM; Zan L
Lung cancer is the most talked about cancer in the world. It is also one of the cancers that currently has a high mortality rate. The aim of our research is to find more effective therapeutic targets and prognostic markers for human lung cancer. First, we downloaded gene expression data from the GEO database. We performed weighted co-expression network analysis on the selected genes, then constructed scale-free networks and topological overlap matrices, and performed module correlation analysis with the cancer group. We screened the 200 genes with the highest correlation in the cyan module for functional enrichment analysis and protein interaction network construction, and found that most of them were involved in cell division, tumor necrosis factor-mediated signaling pathways, cellular redox homeostasis, reactive oxygen species biosynthesis, and other processes, and were related to the cell cycle, apoptosis, the HIF-1 signaling pathway, the p53 signaling pathway, the NF-κB signaling pathway, and several cancer disease pathways. Finally, we used the GEPIA website data to perform survival analysis on some of the genes with GS > 0.6 in the cyan module. CBX3, AHCY, MRPL12, TPGB, TUBG1, KIF11, LRRC59, MRPL17, TMEM106B, ZWINT, TRIP13, and HMMR were identified as important prognostic factors for lung cancer patients. In summary, we identified 12 mRNAs associated with lung cancer prognosis.
Our study contributes to a deeper understanding of the molecular mechanisms of lung cancer and provides new insights into drug use and prognosis.

Item Integrative analysis identifies two molecular and clinical subsets in Luminal B breast cancer (Elsevier Inc, 2023-09-15) Wang H; Liu B; Long J; Yu J; Ji X; Li J; Zhu N; Zhuang X; Li L; Chen Y; Liu Z; Wang S; Zhao S
Comprehensive multiplatform analysis of Luminal B breast cancer (LBBC) specimens identifies two molecularly distinct, clinically relevant subtypes: Cluster A associated with cell cycle and metabolic signaling and Cluster B with predominant epithelial mesenchymal transition (EMT) and immune response pathways. Whole-exome sequencing identified significantly mutated genes including TP53, PIK3CA, ERBB2, and GATA3 with recurrent somatic mutations. Alterations in DNA methylation or transcriptomic regulation in genes (FN1, ESR1, CCND1, and YAP1) result in tumor microenvironment reprogramming. Integrated analysis revealed enriched biological pathways and unexplored druggable targets (cancer-testis antigens, metabolic enzymes, kinases, and transcription regulators). A systematic comparison between mRNA and protein displayed emerging expression patterns of key therapeutic targets (CD274, YAP1, AKT1, and CDH1). A potential ceRNA network was developed with a significantly different prognosis between the two subtypes. This integrated analysis reveals a complex molecular landscape of LBBC and provides the utility of targets and signaling pathways for precision medicine.

Item DeepCAC: a deep learning approach on DNA transcription factors classification based on multi-head self-attention and concatenate convolutional neural network (BioMed Central Ltd, 2023-09-18) Zhang J; Liu B; Wu J; Wang Z; Li J
Understanding gene expression processes necessitates the accurate classification and identification of transcription factors, which is supported by high-throughput sequencing technologies.
However, these techniques suffer from inherent limitations such as time consumption and high costs. To address these challenges, the field of bioinformatics has increasingly turned to deep learning technologies for analyzing gene sequences. Nevertheless, the pursuit of improved experimental results has led to the inclusion of numerous complex analysis function modules, resulting in models with a growing number of parameters. To overcome these limitations, we propose a novel approach for analyzing DNA transcription factor sequences, named DeepCAC. This method leverages deep convolutional neural networks with a multi-head self-attention mechanism. By employing convolutional neural networks, it can effectively capture local hidden features in the sequences. Simultaneously, the multi-head self-attention mechanism enhances the identification of hidden features with long-distance dependencies. This approach reduces the overall number of parameters in the model while harnessing the computational power of sequence data from multi-head self-attention. Through training with labeled data, experiments demonstrate that this approach significantly improves performance while requiring fewer parameters compared to existing methods. Additionally, the effectiveness of our approach is validated in accurately predicting DNA transcription factor sequences.

Item Abundant dsRNA picobirnaviruses show little geographic or host association in terrestrial systems. (Elsevier, 2023-08) Knox MA; Wierenga J; Biggs PJ; Gedye K; Almeida V; Hall R; Kalema-Zikusoka G; Rubanga S; Ngabirano A; Valdivia-Granda W; Hayman DTS
Picobirnaviruses are double-stranded RNA viruses known from a wide range of host species and locations but with unknown pathogenicity and host relationships.
Here, we examined the diversity of picobirnaviruses from cattle and gorillas within and around Bwindi Impenetrable Forest National Park (BIFNP), Uganda, where wild and domesticated animals and humans live in relatively close contact. We used metagenomic sequencing with bioinformatic analyses to examine genetic diversity. We compared our findings to global Picobirnavirus diversity using clustering-based analyses. Picobirnavirus diversity at Bwindi was high, with 14 near-complete RdRp and 15 capsid protein sequences, and 497 new partial viral sequences recovered from 44 gorilla samples and 664 from 16 cattle samples. Sequences were distributed throughout a phylogenetic tree of globally derived picobirnaviruses. The relationship between Picobirnavirus diversity and host taxonomy follows a similar pattern to the global dataset, generally lacking pattern with either host or geography.

Item A multi-label classification model for full slice brain computerised tomography image (BioMed Central Ltd, 2020-11-18) Li J; Fu G; Chen Y; Li P; Liu B; Pei Y; Feng H
BACKGROUND: Screening of brain computerised tomography (CT) images is a primary method currently used for initial detection of patients with brain trauma or other conditions. In recent years, deep learning techniques have shown remarkable advantages in clinical practice. Researchers have attempted to use deep learning methods to detect brain diseases from CT images. Methods often used to detect diseases choose images with visible lesions from full-slice brain CT scans, which need to be labelled by doctors. This is an inaccurate method because doctors detect brain disease from a full sequence scan of CT images and one patient may have multiple concurrent conditions in practice. The method cannot take into account the dependencies between the slices and the causal relationships among various brain diseases. Moreover, labelling images slice by slice costs much time and expense.
Detecting multiple diseases from full slice brain CT images is, therefore, an important research subject with practical implications. RESULTS: In this paper, we propose a model called the slice dependencies learning model (SDLM). It learns image features from a series of variable length brain CT images and slice dependencies between different slices in a set of images to predict abnormalities. The model only requires labels for the diseases reflected in the full-slice brain scan. We use the CQ500 dataset to evaluate our proposed model, which contains 1194 full sets of CT scans from a total of 491 subjects. Each set of data from one subject contains scans with one to eight different slice thicknesses and various diseases that are captured in a range of 30 to 396 slices in a set. The evaluation results show that the precision is 67.57%, the recall is 61.04%, the F1 score is 0.6412, and the area under the receiver operating characteristic curve (AUC) is 0.8934. CONCLUSION: The proposed model is a new architecture that uses a full-slice brain CT scan for multi-label classification, unlike the traditional methods which only classify the brain images at the slice level. It has great potential for application to multi-label detection problems, especially with regard to brain CT images.

Item Investigation of low resolution point clouds for illumination correction in pushbroom hyperspectral images : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Mechatronics at Massey University, Turitea Campus, Palmerston North, New Zealand (Massey University, 2018) Haarhoff, William B.
Global food demand is predicted to double between 2015 and 2050. Current agricultural production is unable to facilitate this growth. Consequently, plant breeding must be accelerated to breed improved cultivars that can meet this demand.
While technologies such as genomics are suitable for accelerating plant breeding, phenotyping lags behind and is currently considered the bottleneck. Consequently, imaging and remote sensing technologies are being used to provide quantitative, reliable phenotype information. One such technology, hyperspectral imaging, can provide physiological, biophysical, and biochemical phenotypic information. While hyperspectral imaging has reached a substantial level of maturity in aerial and satellite based remote sensing applications, it is still underdeveloped in the close-range lab-based phenotyping scenario. In particular, the effects of illumination and complex plant geometry on the measured signal are even more pronounced in close-range hyperspectral imaging. Methods for correction of illumination/geometry effects developed for aerial and satellite-based imaging are unsuitable for close-range hyperspectral imaging. Recently there has been an interest in fusing hyperspectral images with point clouds captured by 3D imaging devices to provide more comprehensive high dimensional phenotype information. However, one study focusses on the possibility of using 3D geometry of the plant to correct for the effects of illumination in hyperspectral images. This study investigates the use of low resolution point clouds captured with low cost devices for use in illumination modelling and correction of hyperspectral images acquired in a close-range lab-based scenario.

Item Bio-mirrors and networking security : for the partial fulfilment of Masters of Information Sciences, Information Systems major, 2006 (Massey University, 2006) Mubayiwa, Douglas
Bioinformatics databanks have been the source of data to bioscience researchers over the years. They need this information especially in the analysis of raw data. When this data is needed, it has to be readily available. This thesis seeks to address the current problems of unavailable data at a critical time.
Continued retrieval of data from far away sites is expensive in both time and network resources. Care must also be taken to secure this data; otherwise, by the time it reaches the researcher, it will be useless. In response to this problem, this thesis describes a way to move data securely so that the necessary data is stored nearest to whoever requires it. A proposed initial prototype has been implemented with capacity to grow. The overall architecture of the system, the prototype and other related issues are also discussed in this thesis.

Item Exploring biological sequence space : selected problems in sequence analysis and phylogenetics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computational Biology at Massey University (Massey University, 2012) McComish, Bennet James
As the volume and complexity of available sequence data continue to grow at an exponential rate, the need for new sequence analysis techniques becomes more urgent, as does the need to test and to extend the existing techniques. These include, among others, techniques for assembling raw sequence data into usable genomic sequences; for using these sequences to investigate the evolutionary history of genes and species; and for examining the mechanisms by which sequences change over evolutionary time scales. This thesis comprises three projects within the field of sequence analysis. It is shown that organelle genome DNA sequences can be assembled de novo using short Illumina reads from a mixture of samples, and deconvoluted bioinformatically, without the added cost of indexing the individual samples. In the course of this work, a novel sequence element is described that probably could not have been detected with traditional sequencing techniques. The problem of multiple optima of likelihood on phylogenetic trees is examined using biological data.
While the prevalence of multiple optima varies widely with real data, trees with multiple optima occur less often among the best trees. Overall, the results provide reassurance that the value of maximum likelihood as a tree selection criterion is not often compromised by the presence of multiple local optima on a single tree. Fundamental mechanisms of mutation are investigated by estimating nucleotide substitution rate matrices for edges of phylogenetic trees. Several large alignments are examined, and the results suggest that the situation may be more complex than we had anticipated. It is likely that genome scale alignments will have to be used to further elucidate this question.

Item Extracting and exploiting signals in genetic sequences : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics at Massey University (Massey University, 2011) White, Walton Timothy James
As DNA databases continue to grow at an exponential rate, the need for more efficient solutions to basic problems in computational biology grows ever more pressing. These problems range from the principal questions driving evolutionary science: How can we accurately infer the history of genes, individuals and species? How can we separate the signal from the noise in our data? How can we visualise that signal? - to the purely practical: How can we efficiently store all this data? With these goals in mind, this thesis mounts a computational combination attack on a variety of topics in bioinformatics and phylogenetics: A program is designed and implemented for solving the Maximum Parsimony problem - in essence, finding phylogenetic trees having the fewest mutations.
This program generally outperforms existing highly optimised programs when using a single CPU, and unlike these earlier programs, offers highly efficient parallelisation across multiple CPUs for further speedup. A program is designed and implemented for compressing databases of DNA sequences. This program outperforms general-purpose compression by taking advantage of the special "treelike" structure of DNA databases, using a novel data structure, the "leaky move-to-front hashtable", to achieve speed gains. A data visualisation technique is introduced that concisely summarises the "treelikeness" of phylogenetic datasets on a ternary plot. Each dataset is represented by a single point, allowing multiple datasets, or multiple treatments of a dataset, to be displayed on a single diagram. We demonstrate problems with a standard phylogenetic analysis methodology in which a single tree is assumed a priori. We argue for a shift towards network methods that can in principle reject the hypothesis of a single tree. Motivated by a phylogenetic problem, a fast new algorithm is developed for finding the mode(s) of a multinomial distribution, and an exact analysis of its complexity is given.

Item DeepPN: a deep parallel neural network based on convolutional neural network and graph convolutional network for predicting RNA-protein binding sites. (2022-06-29) Zhang J; Liu B; Wang Z; Lehnert K; Gahegan M
BACKGROUND: Addressing the laborious nature of traditional biological experiments by using an efficient computational approach to analyze RNA-binding protein (RBP) binding sites has always been a challenging task. RBPs play a vital role in post-transcriptional control. Identification of RBP binding sites is a key step for dissecting the essential mechanism of gene regulation by controlling splicing, stability, localization and translation. Traditional methods for detecting RBP binding sites are time-consuming and computationally intensive.
Recently, computational methods have been incorporated into RBP research. Nevertheless, many of them rely not only on RNA sequence data but also on additional data, for example RNA secondary structure data, to improve prediction performance, which requires preparatory work to build a learnable representation of the structural data. RESULTS: To reduce the dependency on such preparatory work, in this paper we introduce DeepPN, a deep parallel neural network that is constructed with a convolutional neural network (CNN) and a graph convolutional network (GCN) for detecting RBP binding sites. It includes a two-layer CNN and GCN in parallel to extract the hidden features, followed by a fully connected layer to make the prediction. DeepPN discriminates RBP binding sites using a learnable representation of RNA sequences, which uses only the sequence data without other data such as the secondary or tertiary structure data of RNA. DeepPN is evaluated on 24 datasets of RBP binding sites against other state-of-the-art methods. The results show that the performance of DeepPN is comparable to the published methods. CONCLUSION: The experimental results show that DeepPN can effectively capture potential hidden features in RBPs and use these features for effective prediction of binding sites.
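
The sequence-only input described in the DeepPN abstract can be illustrated with a minimal sketch. This is not the DeepPN implementation: the names `one_hot`, `conv1d`, and `motif_kernel`, the toy sequence, and the hand-set "UGU" filter are all assumptions made for illustration. It only shows how a single convolutional filter sliding over a one-hot encoded RNA sequence responds to a local pattern, which is the kind of local feature a CNN branch could learn from sequence data alone.

```python
# Illustrative sketch only; names and the toy filter are hypothetical,
# not taken from DeepPN.
RNA_ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA string as a list of 4-element rows ordered A, C, G, U."""
    rows = []
    for base in seq.upper():
        row = [0.0] * len(RNA_ALPHABET)
        idx = RNA_ALPHABET.find(base)
        if idx >= 0:  # unknown symbols such as N stay all-zero
            row[idx] = 1.0
        rows.append(row)
    return rows


def conv1d(encoded, kernel):
    """Slide one filter over the sequence (no padding, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(RNA_ALPHABET)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


# Assumed toy filter: the one-hot pattern of "UGU", so the score at each
# position counts how many bases of the window match that pattern.
motif_kernel = one_hot("UGU")
scores = conv1d(one_hot("ACUGUAC"), motif_kernel)
best_position = max(range(len(scores)), key=scores.__getitem__)
```

In a trained network the filter weights would be learned from labeled binding-site data rather than set by hand, and many filters would run in parallel before the fully connected layer.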
