Exploring biological sequence space : selected problems in sequence analysis and phylogenetics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computational Biology at Massey University

Date
2012
Publisher
Massey University
Rights
The Author
Abstract
As the volume and complexity of available sequence data continue to grow at an exponential rate, the need for new sequence analysis techniques becomes more urgent, as does the need to test and extend existing ones. These include, among others, techniques for assembling raw sequence data into usable genomic sequences; for using these sequences to investigate the evolutionary history of genes and species; and for examining the mechanisms by which sequences change over evolutionary time scales. This thesis comprises three projects within the field of sequence analysis. First, it is shown that organelle genome DNA sequences can be assembled de novo from short Illumina reads generated from a mixture of samples, and then deconvoluted bioinformatically, without the added cost of indexing the individual samples. In the course of this work, a novel sequence element is described that probably could not have been detected with traditional sequencing techniques. Second, the problem of multiple optima of likelihood on phylogenetic trees is examined using biological data. While the prevalence of multiple optima varies widely among real data sets, trees with multiple optima occur less often among the best trees. Overall, the results provide reassurance that the value of maximum likelihood as a tree selection criterion is not often compromised by the presence of multiple local optima on a single tree. Third, fundamental mechanisms of mutation are investigated by estimating nucleotide substitution rate matrices for individual edges of phylogenetic trees. Several large alignments are examined, and the results suggest that the situation may be more complex than anticipated. Genome-scale alignments will likely be needed to elucidate this question further.
Keywords
Nucleotide sequence, Phylogeny, Bioinformatics, Methodology, Data processing