Exploring biological sequence space: selected problems in sequence analysis and phylogenetics. A thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computational Biology at Massey University
As the volume and complexity of available sequence data continue to grow at an
exponential rate, the need for new sequence analysis techniques becomes more urgent,
as does the need to test and to extend the existing techniques. These include,
among others, techniques for assembling raw sequence data into usable genomic sequences;
for using these sequences to investigate the evolutionary history of genes
and species; and for examining the mechanisms by which sequences change over
evolutionary time scales. This thesis comprises three projects within the field of sequence analysis and phylogenetics.
It is shown that organelle genome DNA sequences can be assembled de novo
using short Illumina reads from a mixture of samples, and deconvoluted bioinformatically,
without the added cost of indexing the individual samples. In the
course of this work, a novel sequence element is described that probably could
not have been detected with traditional sequencing techniques.
The problem of multiple optima of likelihood on phylogenetic trees is examined
using biological data. While the prevalence of multiple optima varies widely
among real data sets, trees with multiple optima occur less often among the best (highest-likelihood) trees.
Overall, the results provide reassurance that the value of maximum likelihood
as a tree selection criterion is not often compromised by the presence of multiple
local optima on a single tree.
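
One generic way to check whether the likelihood surface of a single, fixed tree has more than one local optimum is a multi-start search: optimize the branch lengths from many random starting points and compare the points where the searches end. The sketch below assumes a Jukes-Cantor (JC69) model, a three-taxon star tree, made-up site-pattern counts, and coordinate ascent with golden-section line searches; all of these are illustrative assumptions, not the procedure or data used in this thesis.

```python
import math
import random

# Hypothetical toy data: counts of site patterns (taxon1, taxon2, taxon3),
# with nucleotides coded 0..3. Illustrative only, not data from the thesis.
PATTERNS = {(0, 0, 0): 60, (0, 0, 1): 12, (0, 1, 0): 10, (1, 0, 0): 9, (0, 1, 2): 4}


def jc_prob(same, t):
    """JC69 probability of the same base (same=True) or one specific other base."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def log_likelihood(ts, patterns=PATTERNS):
    """Log-likelihood of a fixed 3-taxon star tree with branch lengths ts."""
    total = 0.0
    for (x, y, z), count in patterns.items():
        # Sum over the unobserved base at the central node (uniform base frequencies).
        site = sum(
            0.25 * jc_prob(r == x, ts[0]) * jc_prob(r == y, ts[1]) * jc_prob(r == z, ts[2])
            for r in range(4)
        )
        total += count * math.log(site)
    return total


def golden_max(f, lo, hi, tol=1e-6):
    """Maximize a 1-D function on [lo, hi] by golden-section search (assumes unimodality)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def coordinate_ascent(start, lo=1e-6, hi=5.0, max_rounds=50):
    """Climb to a local optimum by maximizing one branch length at a time."""
    ts = list(start)
    for _ in range(max_rounds):
        before = log_likelihood(ts)
        for i in range(len(ts)):
            # Likelihood along branch i with the other branch lengths held fixed.
            ts[i] = golden_max(lambda t: log_likelihood(ts[:i] + [t] + ts[i + 1:]), lo, hi)
        if log_likelihood(ts) - before < 1e-9:
            break
    return ts, log_likelihood(ts)


def find_optima(n_starts=10, seed=1, digits=2):
    """Run coordinate ascent from random starts; group end points by rounding."""
    rng = random.Random(seed)
    optima = {}
    for _ in range(n_starts):
        start = [rng.uniform(0.01, 2.0) for _ in range(3)]
        ts, ll = coordinate_ascent(start)
        key = tuple(round(t, digits) for t in ts)  # crude clustering of end points
        optima.setdefault(key, ll)
    return optima
```

For example, `for key, ll in sorted(find_optima().items()): print(key, round(ll, 3))` lists each distinct end point with its log-likelihood. Rounding is a crude way to group end points: a single flat optimum can be split into several keys, so any apparent multiple optima should be re-checked with tighter tolerances before being accepted.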
Fundamental mechanisms of mutation are investigated by estimating nucleotide
substitution rate matrices for edges of phylogenetic trees. Several large alignments
are examined, and the results suggest that the situation may be more
complex than we had anticipated. Genome-scale alignments will probably be needed
to resolve this question further.
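
A substitution rate matrix Q for an edge gives the instantaneous rates of change between nucleotides on that edge; the transition probabilities along an edge of length t then follow from P(t) = exp(Qt). The sketch below only illustrates this forward relationship, which an edge-by-edge estimation procedure would fit to data. It assumes an unrestricted model (12 free off-diagonal rates, no reversibility constraint) with made-up rates, and computes the matrix exponential by hand (scaling and squaring with a Taylor series) to stay within the standard library; it is not the estimation method used in the thesis.

```python
import math

BASES = "ACGT"

# Hypothetical rates for a single edge (unrestricted model); illustrative only.
EDGE_RATES = {
    ("A", "C"): 0.1, ("A", "G"): 0.4, ("A", "T"): 0.1,
    ("C", "A"): 0.1, ("C", "G"): 0.1, ("C", "T"): 0.5,
    ("G", "A"): 0.5, ("G", "C"): 0.1, ("G", "T"): 0.1,
    ("T", "A"): 0.1, ("T", "C"): 0.4, ("T", "G"): 0.1,
}


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    # Scale so the matrix norm is at most about 0.5 before summing the series.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(m) = exp(m / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


def rate_matrix(rates):
    """Build Q from off-diagonal rates; diagonal entries make each row sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for (a, b), r in rates.items():
        q[BASES.index(a)][BASES.index(b)] = r
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def transition_matrix(q, t):
    """P(t) = exp(Q t): probability of base j at the end of an edge given base i at its start."""
    return mat_exp([[x * t for x in row] for row in q])
```

For example, `transition_matrix(rate_matrix(EDGE_RATES), 0.2)` returns a 4 x 4 matrix whose rows each sum to 1. Giving each edge its own rate dictionary is what allows the substitution process to differ from edge to edge.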