Extracting and exploiting signals in genetic sequences : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics at Massey University

dc.contributor.authorWhite, Walton Timothy James
dc.date.accessioned2012-05-13T23:21:49Z
dc.date.available2012-05-13T23:21:49Z
dc.date.issued2011
dc.description.abstractAs DNA databases continue to grow at an exponential rate, the need for more efficient solutions to basic problems in computational biology grows ever more pressing. These problems range from the principal questions driving evolutionary science: How can we accurately infer the history of genes, individuals and species? How can we separate the signal from the noise in our data? How can we visualise that signal? - to the purely practical: How can we efficiently store all this data? With these goals in mind, this thesis mounts a computational combination attack on a variety of topics in bioinformatics and phylogenetics: A program is designed and implemented for solving the Maximum Parsimony problem - in essence, finding phylogenetic trees having the fewest mutations. This program generally outperforms existing highly optimised programs when using a single CPU, and unlike these earlier programs, offers highly efficient parallelisation across multiple CPUs for further speedup.; A program is designed and implemented for compressing databases of DNA sequences. This program outperforms general-purpose compression by taking advantage of the special "treelike" structure of DNA databases, using a novel data structure, the "leaky move-to-front hashtable", to achieve speed gains.; A data visualisation technique is introduced that concisely summarises the "treelikeness" of phylogenetic datasets on a ternary plot. Each dataset is represented by a single point, allowing multiple datasets, or multiple treatments of a dataset, to be displayed on a single diagram.; We demonstrate problems with a standard phylogenetic analysis methodology in which a single tree is assumed a priori. We argue for a shift towards network methods that can in principle reject the hypothesis of a single tree.; Motivated by a phylogenetic problem, a fast new algorithm is developed for finding the mode(s) of a multinomial distribution, and an exact analysis of its complexity is given.en
dc.identifier.urihttp://hdl.handle.net/10179/3346
dc.language.isoenen
dc.publisherMassey Universityen_US
dc.rightsThe Authoren_US
dc.subjectDNA databasesen
dc.subjectComputational biologyen
dc.subjectBioinformaticsen
dc.subjectPhylogeneticsen
dc.subjectComputer program designen
dc.subjectComputer algorithmsen
dc.subjectDatabase compressionen
dc.titleExtracting and exploiting signals in genetic sequences : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics at Massey Universityen
dc.typeThesisen
massey.contributor.authorWhite, Walton Timothy James Whiteen
thesis.degree.disciplineMathematicsen
thesis.degree.grantorMassey Universityen
thesis.degree.levelDoctoralen
thesis.degree.nameDoctor of Philosophy (Ph.D.)en
Files
Original bundle
Now showing 1 - 2 of 2
Loading...
Thumbnail Image
Name:
01_front.pdf
Size:
61.39 KB
Format:
Adobe Portable Document Format
Description:
Loading...
Thumbnail Image
Name:
02_whole.pdf
Size:
1.71 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
804 B
Format:
Item-specific license agreed upon to submission
Description: