Lost in the RNA world : non-coding RNA and the spliceosome in the eukaryotic ancestor : a thesis presented in partial fulfilment of the requirements for the degree of PhD in Bioinformatics at Massey University, Palmerston North, New Zealand
The "RNA world" refers to a time before DNA and proteins, when RNA served as both the genetic store and the catalytic agent of life; it also refers to today's world, in which non-coding RNA (ncRNA, RNA that does not code for proteins) is central to cellular metabolism. In eukaryotes, non-coding regions (introns) are spliced out of protein-coding mRNAs by the spliceosome, a massive complex composed of five ncRNAs and about 200 proteins. This study examines the nature of the spliceosome and other non-coding RNAs in the last common ancestor of eukaryotes, called here the eukaryotic ancestor. By looking at the differences between ncRNAs from diverse eukaryotic lineages, it may be possible to infer aspects of the eukaryotic ancestor's RNA systems. Comparing ncRNAs and ncRNA-associated proteins required an evaluation of the available software for searching newly available basal eukaryotic genomes (such as Giardia lamblia and Plasmodium falciparum). ncRNAs are often not found by software based on sequence similarity, so specialist ncRNA-search software packages were evaluated for their ability to find ncRNAs. One such program is RNAmotif, which was further developed during this study (with the help of its principal programmer) and which proved successful in recovering ncRNAs from basal eukaryotic genomes. In a similar manner, sequence-based search techniques may also fail to recover proteins from distantly related genomes. A new protein-finding technique called "Ancestral Sequence Reconstruction" (ASR) was developed in this thesis to aid in finding proteins that have diverged greatly between distantly related eukaryotic species. A large amount of data was collected to investigate aspects of the eukaryotic ancestor, highlighting data management issues in this post-genomic era. Two databases, P-MRPbase and SpliceSite, were created to manage the sequence, annotation and results data from this project.
Examination of the distribution of spliceosomal components and splicing mechanisms indicates that a spliceosome was not only present in the eukaryotic ancestor but also contained many of the components found in today's eukaryotes. Splicing in the eukaryotic ancestor may have used several mechanisms and may already have formed links with other cellular processes such as transcription and capping. Far from being a simple organism, the last common ancestor of living eukaryotes shows signs of the molecular complexity seen today.