The increasing availability of molecular data means that phylogenetic studies nowadays
often use data sets that combine a large number of loci for many different species. This
leads to a trade-off. On the one hand, more complex models are preferred because they
account for heterogeneity in evolutionary processes. On the other hand, simple models
are favoured because they can answer the biological questions of interest, are easy to
interpret, and can be computed in reasonable time. This thesis focuses on four cases of
phylogenetic analysis which arise from this conflict.
- It is shown that edge weight estimates can be non-identifiable if the data are
simulated under a mixture model. Even if the underlying process is known,
estimation and interpretation may be difficult due to the high variance of the
estimates of the parameters of interest.
- Partition models are commonly used to account for heterogeneity in data sets.
Novel methods are presented here which allow grouping of genes under similar
evolutionary constraints. A data set containing 14 chloroplast genes from
19 anciently diverged species is used to find groups of co-evolving genes. The
prospects and limitations of such methods are discussed.
- Penalised likelihood estimation is a useful tool for improving the performance of
models and allowing for variable selection. A novel approach is presented that uses
pairwise dissimilarities to visualise the data as a network. It is further shown how
penalised likelihood can be used to decrease the variance of parameter estimates
for mixture and partition models, allowing a more reliable analysis. Estimates
for the variance and the expected number of parameters of penalised likelihood
estimates are derived.
- Tree shape statistics are used to describe speciation events in macroevolution. A
new tree shape statistic is introduced, and the biases that different clustering methods
induce in tree shape statistics are discussed.