Factors affecting the performance of phylogenetic methods : a thesis presented in partial fulfilment of the requirements for the degree of Ph.D. in Mathematics at Massey University

Charleston, Michael A.

dc.contributor.author	Charleston, Michael A.
dc.date.accessioned	2012-12-20T20:16:34Z
dc.date.available	2012-12-20T20:16:34Z
dc.date.issued	1994
dc.description.abstract	This thesis comprises several computer simulation experiments in which the performance of a selection of phylogenetic methods was assessed. Data were generated according to a known model and used as input for the phylogenetic methods. Some new methods were introduced, and their performance was compared with that of extant methods. Performance was judged by several criteria: accuracy, consistency, efficiency, falsifiability and robustness. The experiments were designed to be biologically relevant, and yet computationally tractable. Hence the models of evolution used were simple, to allow a wide range of parameters to be tested for their effects within the bounds of available computing resources. The experiments were divided into two main types, the "small n" with up to 10 taxa, and the "large n" with 10 to 30 taxa. Parameters which were allowed to vary in the "small n" case included number of taxa (n), sequence length, tree topology, edge length probability distribution, and purity of data. In the "large n" case, number of taxa, sequence length, and edge length probability distribution were varied. The simulation experiments show that the accuracy of phylogenetic methods decreases with increasing n, and that the mean number of internal edges of the generating tree which are incorrectly inferred increases at least linearly with n. The rate at which the sequence length must increase with n, to retain a fixed confidence in the inferred tree, is shown to be at least linear in n. All the methods are about equally susceptible to sampling error, which is exacerbated by the generating tree having very short or very long internal edges, and by finite sequence length. All the methods are susceptible to random error such as sequencing error, but provided such error is small, the effect is not great.
One type of method, using edge lengths inferred by the Hadamard conjugation process, is shown to be much more robust to impure data and to sequencing error than are the other methods. With n ≥ 10 only the fastest methods were used. Increasing n again decreased the accuracy of the methods. Varying the "molecular-clockness" of the generating tree was shown to have a much greater effect upon those methods that are inconsistent when the data do not satisfy the molecular clock hypothesis. All the methods used are described algorithmically, and their computational complexity is discussed. New proofs are provided of the consistency or inconsistency of several methods with the models of evolution used. A notation is introduced to characterize all tree topologies, and is used throughout this thesis. Pseudocode is provided for all the major algorithms used in the simulation experiments.	en
dc.identifier.uri	http://hdl.handle.net/10179/4118
dc.language.iso	en	en
dc.publisher	Massey University	en_US
dc.rights	The Author	en_US
dc.subject	Phylogenetic methods	en
dc.subject	Evolution models	en
dc.title	Factors affecting the performance of phylogenetic methods : a thesis presented in partial fulfilment of the requirements for the degree of Ph.D. in Mathematics at Massey University	en
dc.type	Thesis	en
massey.contributor.author	Charleston, Michael A.	en
thesis.degree.discipline	Mathematics	en
thesis.degree.grantor	Massey University	en
thesis.degree.level	Doctoral	en
thesis.degree.name	Doctor of Philosophy (Ph.D.)	en

Files

Original bundle

Name:: 01_front.pdf
Size:: 1.2 MB
Format:: Adobe Portable Document Format

Name:: 02_whole.pdf
Size:: 5.42 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 804 B
Format:: Item-specific license agreed upon to submission

Collections

Theses and Dissertations