Beyond BLASTing: ribonucleoprotein evolution via structural prediction and ancestral sequence reconstruction

Similarity in primary DNA and protein sequence (primary homology) has long been used to infer relationships between sequences. However, gene sequence, and therefore protein sequence, changes over time. In evolutionary biology that time can span millions of years, and related sequences may diverge until they are unrecognisable by primary homology. This is demonstrated most clearly in chapter 4a (figure 10). Conversely, the number of folds that proteins can adopt is constrained by the interactions between residues, so the set of possible folds is finite. As a result, similar folds may arise through convergence from evolutionarily unrelated DNA sequences. This thesis develops a workflow that extracts more information from the primary protein sequence than is usually used, and identifies additional evidence that can support or refute the placement of a protein sequence within the family in question. Two quite different proteins are used to demonstrate the flexibility of the workflow: the Major Vault Protein, whose monomers make up the enigmatic vault particle, and the Argonaute family of proteins (AGO and PIWI), which appear to play a major role in silencing parasitic nucleic acids and controlling endogenous gene expression. The method relies principally on prediction of three-dimensional structure. This requires at least a partially solved crystal structure, but once one exists the method should be suitable for any protein. Whole-genome sequencing is now routine, but annotation of the resulting sequence lags behind for lack of skilled personnel. Automated pipelines annotate close homologs well, but more effort is needed to correctly annotate the exponentially growing databank of uncharacterised (and wrongly characterised) proteins. Lastly, in deference to budding biologists the world over, I have tried to find free, stable software that can be run on an ordinary personal computer by a researcher with minimal computer literacy to help with this task.
Keywords: Nucleoproteins, Evolution, Nucleotide sequence, Evolutionary genetics