Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

The origins and evolution of prokaryotes and eukaryotes.

A thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Molecular BioSciences at Massey University

Anthony Masamu Poole
2001

Contents

Papers and manuscripts included in this thesis.
Related papers not included in this thesis.
Acknowledgements.
Introduction.
Paper 1: Relics from the RNA world.
Paper 2: The path from the RNA world.
Paper 3: RNA evolution: separating the new from the old.
Paper 4: Early evolution: prokaryotes, the new kids on the block.
Paper 5: The nature of the Last Universal Common Ancestor.
Paper 6: The origin of the nuclear envelope and the origin of the eukaryote cell.
Paper 7: Prokaryote and eukaryote evolvability.
Future work.
Appendix: Does endosymbiosis explain the origin of the nucleus?

Papers and manuscripts included in this thesis:

1. Jeffares DC, Poole AM & Penny D. Relics from the RNA world. J Mol Evol 46, 18-36 (1998). Reprinted with permission from Springer-Verlag New York Inc.
2. Poole AM, Jeffares DC & Penny D. The path from the RNA world. J Mol Evol 46, 1-17 (1998). Reprinted with permission from Springer-Verlag New York Inc.
3. RNA evolution: separating the new from the old. (manuscript).
4. Poole A, Jeffares D, Penny D. Early evolution: prokaryotes, the new kids on the block. Bioessays 21, 880-889 (1999). Reprinted by permission of Wiley-Liss, Inc., a subsidiary of John Wiley & Sons, Inc.
5. Penny D & Poole A. The nature of the Last Universal Common Ancestor. Curr Opin Genet Dev 9, 672-677 (1999). Reprinted with permission from Elsevier Science.
6. The origin of the nuclear envelope and the origin of the eukaryote cell. (manuscript).
7. Poole AM, Phillips MJ & Penny D. Prokaryote and eukaryote evolvability. Biosystems (submitted).
8. Appendix: Poole A & Penny D. Does endosymbiosis explain the origin of the nucleus? Nature Cell Biol 3, E173. [Letter]

Related papers not included in this thesis:

• Jeffares DC, Poole AM & Penny D. Pre-rRNA processing and the path from the RNA world. Trends Biochem Sci 20, 298-299 (1995). [Letter]
• Poole A, Penny D & Sjöberg B-M. Methyl-RNA: an evolutionary bridge between RNA and DNA? Chem Biol 7, R207-R216 (2000).
• Poole A, Penny D & Sjöberg B-M. Confounded cytosine! Tinkering and the evolution of DNA. Nature Reviews Mol Cell Biol 2, 147-151 (2001).
• Poole AM, Logan DT & Sjöberg B-M. The evolution of the ribonucleotide reductases: much ado about oxygen. J Mol Evol (accepted).

Acknowledgements.

Dad, thanks for teaching me how to think. If I can learn to be half as inquisitive and half as good a thinker as you, I'll be pretty chuffed.

Mum, you taught me how to push myself when it is so easy to content oneself with the bare minimum. (Just as well with David as a supervisor!) I remember your reaction when I got 98.5% in a chemistry test at school. Rather than being happy that I scored so well, you asked me why I didn't get 100%!

Angelik and Emma, you deserve a big mention because you challenged me to become an efficient worker so that I wouldn't forget to have a life outside science. (Nevertheless, I've been in front of the computer and ignoring you for weeks now.) You also made me happy even after the crappiest of days at work and helped me to believe that what I was doing wasn't a complete waste of time. Angelik, thanks also for showing me Sweden. It certainly is good to have a secret code here in New Zealand too! I love you both more than you can know.

Big thanks also to Alan & Dennis for putting up with me and all the antics that caused you earache!

Annette!
Thanks for all the letters, phone calls, great discussions and advice over the years, and for showing me Denmark!

David, your fourth year course is to blame for you having to deal with me on and off over the last 7 years! You have always surprised me by giving me what I thought was an unreasonable amount of work, only for me to find that I could cope with the extra load (though at times I was swearing under my breath!), which has meant that I've learnt a lot of new skills that I otherwise wouldn't have made the effort to acquaint myself with. Oh, and your infinite store of anecdotes still continues to amaze me!

Dan, the clever one in our twosome. I have to admit that many of the clever bits in this thesis are your fault. Pink Giraffes on the Pink Serengeti! Matt, cheers for your ESNDs (Paper 7), another good idea in this thesis that I can't take credit for! I'm still waiting for the opportunity to strap torches to the backs of your knees to see if I can reset your biological (un)clock. It was the best of times flatting with you both, even if Matt is a filthy b'stard in the kitchen and Dan drank all my whisky!

Thanks Monbusho for the two years I spent in Japan, prior to beginning my PhuD, where I had the opportunity to broaden my mind, and do what I wanted when I felt like it! Thanks to Hideo Oka for helping me get into the University of Tokyo and Yoshiki Hotta, under whom I worked at Todai. When I got bored with what I was doing in his lab, Hotta sensei was kind enough to let me instead work on the RNA world problem! Very few people, if any, would give you such an opportunity.

Thanks to Matt Ridley, who thought our ideas were worth telling the world about, and who is responsible for much of the attention that our ideas have been given. It was an incredible thrill to be reading his book The Origins of Virtue at the time he contacted us; indeed an eerie coincidence!
I also wish to thank Britt-Marie Sjöberg, who accepted me into her group when Angelik and I had to move to Sweden. While the work I did with her is not included in this thesis, she has been a fantastic person to work with. Thanks to two years in her group, my knowledge of biochemistry is a bit more respectable than before! The stupid immigration laws in place in New Zealand at the time when I left for Sweden actually did me a favour...

Also, many thanks to Trish, who will be helping me bind this beggar before I get on the plane on Tuesday!

Thesis survival kit: Powerbook G3, Blur 13, Komeda Pop på svenska, Björk Debut & Homogenic, the Cardigans Emmerdale, Ryuichi Sakamoto 1996, Pearl Jam Yield.

Finally finally, thanks Dad for introducing me to The Hitch-Hikers Guide to the Galaxy and Homer's Odyssey all those years ago!

Ant.

Introduction.

Candidate's note.

This thesis is a collection of papers, either published, submitted, or in preparation for submission to international journals. Each chapter is a paper with an introduction, and can be read as a stand-alone paper. The purpose of the thesis introduction is to give an overview of the motivation for the work. It also reviews other approaches being taken, in particular with respect to establishing the evolutionary relationships between the three domains of life: archaea, bacteria and eukarya.

Problems with the accepted scenario for the origin of life.

For most biologists, the big picture regarding the origin and evolution of prokaryotes and eukaryotes is not at issue, and recent evidence only serves to back up the intuitively obvious: complex eukaryotes evolved from simpler prokaryotic ancestors. In the standard account, prokaryotes predated eukaryotes by at least 800 million years, as evidenced by cyanobacterial microfossils dating back 3.5 billion years [e.g. Schopf & Packer 1987, Walsh 1992]. (The finding of molecular markers of eukaryote metabolism by Brocks et al.
[1999] has pushed back the emergence of the earliest eukaryotes from 2.1 billion years to 2.7 billion years.) Establishing the root of the tree of life has shown that prokaryotes in fact consist of two domains, the archaea and bacteria, that the Last Universal Common Ancestor (LUCA) of all extant life lived at extremely high temperatures, and that the eukaryotes emerged from the archaea [Woese & Fox 1977, Woese 1987, Woese et al. 1990]. Prior to the emergence of cyanobacteria, life arose from prebiotic conditions on the early earth and, at some stage, possessed an RNA-rich metabolism. This period, dubbed the RNA world [Gilbert 1986, Benner et al. 1989], predated both the emergence of genetically encoded proteins and of DNA as the genetic storage molecule.

The standard picture is therefore that, after the period of heavy bombardment that is suggested to have vapourised the oceans on Earth perhaps as recently as 3.8 billion years ago [reviewed in Nisbet & Sleep 2001], life emerged, passed through an RNA world period and a thermophilic prokaryote LUCA, and developed into cyanobacteria in an astonishingly short period of time, perhaps 300 million years [Lazcano & Miller 1994]. Indeed, life may have arisen in an even shorter timeframe than this. Among the oldest rocks are those from the Isua belt of Southwest Greenland, which arguably date back around 3.85 billion years. Enrichment of the 13C isotope of carbon in these rocks has been argued to betray evidence of biological carbon fixation [Mojzsis et al. 1996].

A closer look at any one of these 'established facts', as with any area in science, suggests that none is as clear-cut as various popular science commentaries suggest. For instance, the earliest stromatolites do not contain microfossils, and may have an abiological origin [Lowe 1994, Grotzinger & Rothman 1996], unlike those inhabited by modern cyanobacteria.
The dating of the Isua belt is controversial, as is the argument that the enrichment of 13C found in rock samples from the belt is indicative of life [reviewed in Nisbet & Sleep 2001]. Furthermore, a hot earth rules out the possibility of an RNA world, given the instability of both single-stranded RNA [Forterre 1995a] and of the bases which make up RNA, particularly cytosine [Miller & Bada 1988, Levy & Miller 1998, Shapiro 1999]. The suggestions of a faint early sun and a 'snowball earth' [Bada et al. 1994, Nisbet & Sleep 2001] potentially fit better with an RNA-rich period in the origin of life [Moulton et al. 2000], yet perhaps one of the few points on which prebiotic researchers agree is that life could not have begun with RNA [e.g. Joyce & Orgel 1999, Nelson et al. 2000]; there must have been earlier phases.

Another issue is the reliability of microfossil classification. Biologists seem to take the finding of 3.5 billion year old cyanobacteria as fact, yet forget that until the early work of Woese & Fox [1977], prokaryotes were treated as a single group. The primary domains archaea and bacteria were indistinguishable morphologically and were initially characterised solely on the basis of phylogenetic grouping from sequence motifs. Another concern is that modern cyanobacteria carry out oxygenic photosynthesis, yet the evolution of atmospheric oxygen probably did not occur until around 2.5-2.2 billion years ago [Ohmoto 1996, Summons et al. 1999]. Furthermore, it is not even always possible to distinguish prokaryote from eukaryote on the basis of morphology. Microbial symbionts in the gut of surgeonfish were first characterised as eukaryotic protists [Fishelson et al. 1985], and it was not until rRNA sequences were obtained that it was possible to establish unequivocally that these large symbionts were in fact prokaryotes [Angert et al. 1993].

Finally, a hyperthermophilic LUCA is also at issue.
Early work on reverse gyrase by Forterre [1995a] suggested that hyperthermophiles were not ancestral to mesophiles and, more recently, reconstruction of ancestral GC content has suggested that the LUCA was mesophilic [Galtier et al. 1999]. While the domains archaea, bacteria and eukarya are now generally accepted, it has become clear that horizontal transfer of genes between these lineages has probably occurred at significant levels, so simple phylogenetic reconstruction from a single gene may not be an accurate reflection of the evolution of the three domains [e.g. Martin 1999]. Moreover, the finding that microsporidia have been incorrectly placed as deep diverging eukaryotes [reviewed by Keeling & McFadden 1998] has served as a reminder that there are fundamental phylogenetic problems that have yet to be resolved in the reconstruction of deep divergences [e.g. Lockhart et al. 1996, Forterre 1997b, Philippe & Laurent 1998].

Indeed, as argued by Forterre [1995a,b, 1997a,b], we should not only be cautious about the claim that the LUCA was a hyperthermophile; it has never actually been established that prokaryotes preceded eukaryotes in evolution. The evidence is at best circumstantial, and conclusions are accepted largely on the basis of the widespread assumption that this must be correct.

Right or wrong, the assumption that, owing to their greater complexity, eukaryotes have evolved from prokaryotes, definitely holds sway. Consequently, the approach that many biologists take to the origin of a particular structure is to use the diversity of modern structures to try to build a picture of how the structure gradually became more complex, that is, a succession of forms from the modern prokaryotic apparatus to the modern eukaryotic apparatus. This is flawed for several reasons.
First, the assumption is made that whatever is prokaryotic must be ancient, and second, that there has been negligible change in the prokaryotic form since its advent. That the biological community can accept extensive horizontal transfer between prokaryotic organisms and extensive adaptation by prokaryotes to a wide range of dissimilar niches, at the same time as arguing that all prokaryotic structures are effectively living fossils, is amazing!

Perhaps the most disturbing consequence of accepting a priori that prokaryotes predate eukaryotes is that the evolution of complex biological phenomena is approached as a purely descriptive problem. The direction of evolution is already known: simple to complex, and prokaryote to eukaryote. However, there is no inherent reason under Darwinian evolution that evolution should proceed from simple to complex [Szathmary & Maynard Smith 1995]. Simplification may equally occur, as is evident in many examples of parasite evolution [e.g. Andersson & Kurland 1998, Grbic 2000, Wren 2000]. More problematically, with the solution implicit in the assumption, selection pressures are usually not given in trying to explain the origins of a structure. Rather, the emphasis is on explaining the diversification/complexification of that structure, perhaps with natural selection as an afterthought [Paper 6]. This problem is in some respects parallel to the problem in developmental biology of always applying adaptationist reasoning in describing the evolution of structures: it is widely assumed that every observable trait must have a function, but this is unlikely to be the case [Gould & Lewontin 1979, Gibson 2000, Paper 7].

Given that reductive processes are as much a feature of evolutionary change as is complexification (as exemplified by parasite evolution), I have avoided making the assumption that prokaryotes are ancestral simply because they appear simpler.
Instead, I have examined a range of data relevant to extant prokaryotes and eukaryotes to establish the nature of the processes underlying evolution in these groups [Paper 7]. I have also examined how the origin of the three domains fits with the RNA world period in the evolution of life [Papers 1-4]. My conclusion, and the main point of this thesis, is that the prokaryote lineages appear to have undergone reductive evolution, whereas the beginnings of eukaryote complexity may date back to early inefficient metabolic, genetic and cellular systems [Papers 2, 4-6]. Thus prokaryotes are simple because they are streamlined, while eukaryotes are perhaps complex by historical accident [Paper 7].

The tree of life & the LUCA.

The tree of life as it currently stands aims to describe the evolutionary relationships between all organisms on Earth, but also to provide, by extrapolation, insights into the likely nature of the Last Universal Common Ancestor (LUCA). The crowning achievement was the tree of life from small subunit rRNA sequences [Woese & Fox 1977], which established the relationships between representatives of a wide spread of organisms. The work resulted in the discovery of the archaebacteria, later renamed archaea [Woese et al. 1990], which are as distinct as eubacteria and eukaryotes. This was a major improvement for understanding the relationships between prokaryotes (and also single-celled eukaryotes), since many species appeared very similar in terms of morphology and ultrastructure. Subsequently, attempts were made to root the tree of life using paralogous gene sets (gene pairs which had a common origin, and which were expected to have undergone a duplication and divergence from a single original gene prior to the emergence of the three domains) [Gogarten et al. 1989, Iwabe et al. 1989].
The overall aim of this was two-fold: to build a phylogeny describing the relationships of all organisms on the planet, and to determine which of the three domains is most like the Last Universal Common Ancestor (LUCA). The pursuit of a tree of life has been plagued with difficulties: the problem of long-branch attraction [Hendy & Penny 1989, Philippe & Laurent 1998, Forterre & Philippe 1999], finding suitable genes for rooting the tree [Lopez et al. 1999], the need to improve on the rates across sites assumption [Lopez et al. 1999, Brinkmann & Philippe 1999], horizontal transfer [Teichmann & Mitchison 1999, Martin 1999, Doolittle 1999], and weaknesses and conflicts between individual gene data [e.g. Baldauf et al. 2000]. Nevertheless, there is still confidence that the correct tree can eventually be recovered.

The controversy over the tree of life and difficulties with the dataset and methods used is not an issue that I consider in this thesis. Numerous articles in the literature discuss this issue [e.g. Doolittle 1999, Snel et al. 1999, Brinkmann & Philippe 1999, Teichmann & Mitchison 1999, Stiller & Hall 1999, Forterre & Philippe 1999, Philippe & Forterre 1999, Baldauf et al. 2000, Penny et al. 2001]. Instead, I will consider the problems inherent in using the tree for inferring the nature of the LUCA.

Reconstructing the tree of life is central to understanding evolutionary relationships between all organisms on Earth. Continuing attempts should be made, despite the problems inherent in recovering phylogenetic relationships for such deep divergences [Penny et al. 2001]. I shall suggest however that, even if the correct tree were recovered, it would be largely uninformative for gaining an insight into the nature of the LUCA. It is my aim to describe exactly how the tree could be useful, and what the caveats and limitations of using the tree for evolutionary inference are.
Important to that discussion is the issue of how horizontal transfer affects the tree, and whether the effect is so great that the tree becomes unresolvable, as has been suggested by Woese [1998].

Attempts have been made to overlay characters onto the tree (such as thermophily) in order to examine the LUCA. However, there has been little consideration of the compatibility with earlier scenarios for the origins of life, based on physicochemical data. For instance, if the LUCA was thermophilic [Woese 1987], given the thermolability of RNA [Forterre 1995a, Papers 2&4], it is difficult to explain how presumed relics from the RNA world have been retained. Indeed, establishing the position of the root cannot provide an answer to the question of the nature of the LUCA; it is virtually uninformative from this viewpoint [Forterre 1997b, Paper 5].

An approach I take in this thesis is to consider the diversity of RNA in modern organisms. Taking the model for the RNA world, a physicochemical approach to understanding the replacement of RNA by protein in evolution is possible. By recognising the properties of RNA, it is possible to identify niches where RNA would be expected to be lost, or at the very least reduced severely in its use. The link with the RNA world, plus the adherence to the properties of RNA, enabled me to take a model for the RNA world [Paper 1] and apply it to the problem of the nature of the LUCA [Papers 2&4]. This was done by examining the phylogenetic distribution of putative RNA world relics. Furthermore, the properties of RNA meant it was possible to examine the problem of polyphyletic gene loss for the RNA dataset, which gives a marked improvement over application of simple parsimony [Paper 5].

The minimal genome concept and reconstruction of the LUCA.
An active area of current research is the attempt to derive a minimal genome, that is, the smallest gene set required for a functional cell [Mushegian & Koonin 1996, Mushegian 1999, Hutchison III et al. 1999]. Initially, it was considered that this approach would provide a useful means of examining the likely genomic make-up of the LUCA [Mushegian & Koonin 1996], though it is now being acknowledged that a minimal genome and the LUCA are not one and the same thing [Mushegian 1999, Paper 5]. A minimal genome is defined by the nature of its environment, and hence will differ depending on the genomes compared. In their initial work, Mushegian & Koonin [1996] compared the genomes of Haemophilus influenzae and Mycoplasma genitalium (at the time, the only two genomes available for analysis). Their reconstruction produced a minimal genome of 256 genes that could be argued to be both necessary and sufficient for the function of a modern cell.

This minimal gene set was criticised by Becerra et al. [1997] because it led Mushegian & Koonin [1996] to argue that the LUCA had an RNA genome! Mycoplasmas are parasitic, and the alternative explanation for the lack of de novo deoxyribonucleotide synthesis is that they obtain these from their host. This is a likely example of loss resulting from intracellular parasitism, and highlights the shortcomings of a minimal gene set as an approximation of the LUCA.

Leipe et al. [1999] contend that the LUCA had a genome consisting of both RNA and DNA, since their genomic analysis suggests that the bacterial DNA replication machinery is unrelated to the archaeal and eukaryal machinery. The coding capacity of RNA is so low that it is unlikely that an organism as complex as the LUCA had an RNA genome. Likewise, the ubiquity and common origin of ribonucleotide reductases argues against this [Poole et al. 2000].
Forterre [1999] has also pointed out that other DNA replication proteins share a common origin, and that anomalies in the others may be a result of non-orthologous gene displacements.

Following on from their systematic construction of a modern day minimal gene set, Mushegian & Koonin [1996] suggested how this gene set could be reduced to a set that would provide a model of a simpler ancestral cell:

i. Examine pathways requiring complex cofactors and eliminate those of them that can be bypassed without the use of the cofactors.
ii. Eliminate the remaining regulatory genes.
iii. Delineate paralogs and replace at least the most highly conserved families with a single, presumably multifunctional "founder."
iv. Apply the parsimony principle: those systems and genes that are not found in both bacteria and eukaryotes or both bacteria and archaea are unlikely to come from a primitive cell.

They also suggest: 'It has to be kept in mind that not only reduction but also certain additions to the minimal gene set are likely to be required to produce a realistic model of a primitive cell. The most important of such additions may be a simple system for photo- or chemoautotrophy'.

Points i-iii are simplifications for which the only basis is the notion that the direction of evolution was always from simple to complex. There is no inherent requirement that organisms will tend towards greater complexity during evolution [Szathmary & Maynard Smith 1995]. Indeed it has been argued that prokaryotes arose through a process of reductive evolution, with aspects of eukaryote genome architecture and RNA processing being more indicative of the make-up of the LUCA than those found in prokaryotic organisms [Forterre 1995a, Glansdorff 2000, Papers 2, 4 & 5].

Table: Difficulties with using distribution to establish whether a gene was a feature of the LUCA.

        Bacteria  Eukaryotes  Archaea  HT(a)  RNA world relic  In LUCA?
Gene 1  ✓         ✓           ✓        X                       YES (ubiquitous, no HT)
Gene 2                                                         YES (predates LUCA)
Gene 3                                                         UNCERTAIN (unplaceable if extensive HT)
Gene 4  ✓         X           X        X      X                UNCERTAIN(b)
Gene 5  X         ✓           ✓        X      X                UNCERTAIN(c)

(a) HT: Horizontal transfer.
(b) If eukaryotes and archaea are monophyletic, Gene 4 could either be argued to be a feature of the LUCA (with a single loss prior to the archaea-eukaryote divergence), or to have arisen in the bacterial lineage after it split from archaea-eukaryotes. If bacteria and archaea are monophyletic, Gene 4 could be a feature of the LUCA with two independent losses (once from archaea and once from eukaryotes), or may have arisen specifically in the bacterial lineage, after it split from archaea.
(c) If eukaryotes and archaea are monophyletic, it is as likely that Gene 5 arose in the common ancestor of these two groups as it is that it was a feature of the LUCA. If bacteria and archaea are monophyletic, parsimony would suggest the gene was a feature of the LUCA, with loss from bacteria.

Finally, reductive evolution is a hallmark of the mycoplasmas [Fraser et al. 1995], and such reductive evolution may be a hallmark of the parasitic lifestyle of the organism [Andersson & Kurland 1998, Paper 7]. An example is the different degrees of degradation of the S-adenosylmethionine synthetase gene in 8 species of Rickettsia [Andersson & Andersson 1999], which are obligate intracellular parasites. Thus the minimal genome concept may better represent the minimal parasitic/obligate intracellular symbiont genome; further reduction would produce an even more extremely minimal parasitic genome, not an approximation of the LUCA.

Their final comment goes some way towards mitigating points i-iii. However, this reduces the worth of the minimal genome approach to understanding the LUCA, since one may add or remove anything, without a specified framework that enables additions or removals to be evaluated.
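The reasoning in table footnotes (b) and (c) can be checked by counting events on a rooted three-domain tree. The following is a minimal sketch, not a method taken from the thesis: it assumes gains and losses are weighted equally, that there is no horizontal transfer, and that only the two rootings named in the footnotes are considered. The names `TOPOLOGIES`, `min_events` and `compare` are hypothetical and introduced only for illustration.

```python
# Rooted three-domain topologies named in the table footnotes.
# Each entry is (outgroup, (sister_1, sister_2)); the keys are illustrative labels.
TOPOLOGIES = {
    "archaea+eukaryotes": ("bacteria", ("archaea", "eukaryotes")),
    "archaea+bacteria": ("eukaryotes", ("archaea", "bacteria")),
}


def min_events(tree, presence, in_luca):
    """Fewest gains plus losses needed to explain `presence`,
    given whether the gene was present at the root (the LUCA)."""
    outgroup, (sister_1, sister_2) = tree
    costs = []
    # Try both states for the common ancestor of the two sister domains.
    for ancestor in (True, False):
        cost = int(ancestor != in_luca)              # change on root -> ancestor branch
        cost += int(presence[outgroup] != in_luca)   # change on root -> outgroup branch
        cost += int(presence[sister_1] != ancestor)  # change on ancestor -> sister_1
        cost += int(presence[sister_2] != ancestor)  # change on ancestor -> sister_2
        costs.append(cost)
    return min(costs)


def compare(presence):
    """For each topology: (events if the gene was in the LUCA, events if it was not)."""
    return {
        name: (min_events(tree, presence, True), min_events(tree, presence, False))
        for name, tree in TOPOLOGIES.items()
    }


# Presence/absence patterns of Gene 4 and Gene 5 as given in the table.
gene_4 = {"bacteria": True, "archaea": False, "eukaryotes": False}
gene_5 = {"bacteria": False, "archaea": True, "eukaryotes": True}

for label, gene in (("Gene 4", gene_4), ("Gene 5", gene_5)):
    print(label, compare(gene))
```

Under these assumptions, a tie between the two counts corresponds to the "either could be argued" cases in the footnotes, and an unequal pair corresponds to the cases where simple parsimony prefers one explanation. The sketch says nothing about which rooting is correct, which is the point made in the text.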
The RNA world model suggests that many RNA processing pathways absent from prokaryotes should be included in any reconstruction of the make-up of the LUCA [Papers 2, 4&5]. The likelihood then is that the LUCA was not 'minimal' as mycoplasmas or other obligate intracellular parasites are. Importantly, paralogous genes (point iii) are expected to have been a feature of the LUCA, and these have figured in attempts to root the tree of life [see Forterre & Philippe 1999, Glansdorff 2000, for review]. While paralogous genes have originated from a single "founder", the duplications that gave rise to some paralogues will have occurred prior to the emergence of the three domains of life. More generally, throwing away paralogues may mean that a minimal gene set underestimates the level of complexity of the LUCA. The problem with which we are faced is then, given a minimal gene set as a starting point, how to decide what features should be removed, and what should be added.

Finally, point iv is that simple parsimony is a useful tool for reconstructing the LUCA. Given the three domains, archaea, bacteria and eukaryotes, the presence of a trait in two of the three is not in itself strong evidence for the presence of that trait in the LUCA. If agreement on the topology of the tree, and hence the position of the root, can be reached, this may guide the use of parsimony in tracing genes back to the LUCA [Forterre 1997a, Papers 3 & 5]. Rigid application of parsimony however may wrongly exclude genes that can be traced back to the LUCA on other grounds, or exclude genes for which no other evidence of their ancestry is evident.

Building on the minimal genome.

In terms of reconstruction of the LUCA, the minimal genome concept should not be abandoned, but its limitations should be noted.
It may help to take the minimal genome concept as a starting point, as it provides a powerful way of sorting through a large number of traits to establish which can possibly be traced back to the LUCA. Certainly, the conceptual difficulty of reconstructing the RNA world [Papers 1, 3&4] is similar in this regard, but the nature and size of the dataset make it easier to distinguish, ad hoc, putative RNA world relics from RNAs that have evolved more recently [Paper 3].

Based on Mushegian & Koonin's [1996] original proposal, along with current attempts to reconstruct the LUCA using a model for the RNA world [Papers 2, 4 & 5], I suggest the following amendment, where I remove and replace criteria i-iii, amend iv and effectively expand their final point on additions to include the RNA world data (see table). This provides a tentative method for how to go about reinserting some traits into a minimal gene set to improve the reconstruction of the LUCA:

1. Inclusion of synthetic pathways for pyridine nucleotide cofactors because these are likely RNA world relics, though not necessarily of pathways requiring these cofactors. Rather, it is the generic reaction chemistries that should be considered ancestral.
2. Inclusion of putative RNA world relics, even where these are not universal in distribution.
3. Reintroduce paralogues in those cases where these clearly diverged prior to the divergence of the LUCA into the three domains.
4. Apply simple parsimony with caution: under certain circumstances, it is weak or misleading (see table). Current disagreements on the position of the root (and therefore the relationships between the three domains) make it difficult to use this in examining possible polyphyletic losses or gains.
5. The ability to describe a large number of traits as ancestral or derived on the basis of a single selection pressure should permit reconsideration of some datasets which may not otherwise be included in the minimal genome.
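One way to read criteria 2-4 together with the "In LUCA?" column of the table is as a simple decision rule. The sketch below is an illustrative assumption, not the procedure used in the thesis: the `Gene` record, its field names, the example genes and the order in which the checks are applied are all hypothetical. Criterion 1 is treated here as a special case of an RNA world relic, and criterion 5 is not modelled.

```python
from dataclasses import dataclass

DOMAINS = frozenset({"bacteria", "archaea", "eukaryotes"})


@dataclass
class Gene:
    """Hypothetical record for one gene or trait."""
    name: str
    present_in: frozenset            # domains in which the gene is found
    extensive_ht: bool = False       # evidence of extensive horizontal transfer
    rna_world_relic: bool = False    # putative RNA world relic (criteria 1 and 2)
    pre_luca_paralogue: bool = False # duplication predates the three domains (criterion 3)


def in_luca(gene):
    """Return 'YES' or 'UNCERTAIN' for presence in the LUCA."""
    # Criteria 1-2: relics are included even if not universal in distribution.
    if gene.rna_world_relic:
        return "YES"
    # Criterion 3: paralogues that diverged before the three domains are reinstated.
    if gene.pre_luca_paralogue:
        return "YES"
    # Extensive horizontal transfer makes the distribution unplaceable.
    if gene.extensive_ht:
        return "UNCERTAIN"
    # Ubiquitous genes with no transfer are traced back to the LUCA.
    if gene.present_in == DOMAINS:
        return "YES"
    # Criterion 4: a partial distribution alone is weak evidence either way.
    return "UNCERTAIN"


# Hypothetical examples, not data from the thesis.
examples = [
    Gene("ubiquitous gene", DOMAINS),
    Gene("relic RNA", frozenset({"eukaryotes"}), rna_world_relic=True),
    Gene("transferred gene", DOMAINS, extensive_ht=True),
    Gene("bacteria-only gene", frozenset({"bacteria"})),
]
for gene in examples:
    print(gene.name, in_luca(gene))
```

The rule deliberately never returns a definite "NO": in keeping with criterion 4, a limited distribution is treated as insufficient grounds for exclusion until the tree topology and the extent of transfer are better established.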
The problem of horizontal transfer.

Much has been made of the question of horizontal transfer in the three lineages. It is still debated how extensive this is: some authors have argued for massive unbridled horizontal transfer events [Woese 1998, Doolittle 1998], some have argued that there are detectable patterns to the process [e.g. Jain et al. 1999, Lan & Reeves 2000, Paper 7], and some have suggested there is very little transfer at all [Snel et al. 1999]. The other issue is whether this transfer is extensive and ongoing [Ochman et al. 2000, Lan & Reeves] or whether it was extensive and has possibly slowed [Woese 1998].

The need for caution is obvious: horizontal transfer of genes will blur the ability to trace a given gene back to the LUCA, meaning that until it is possible to recognise even ancient horizontal transfer events, it will pay to be judicious with the application of parsimony. This may mean in effect that careful studies of the distributions of various genes within the diversity of life will be essential, and furthermore, that it will be crucial to develop ever more sensitive ways of recognising potential cases of transfer. Again, the tree of life will be a useful tool here, as limited distribution of a gene within one domain may provide a means of homing in on potential transfer events. Nevertheless, like the simple parsimony approach to the three domains, this will require that we have reconstructed the correct tree if it is to be of any use.

A clear example of how the difficulties of tree topology and possible horizontal transfer weaken the ability of theory to examine events in early evolution is that of the 'respiration early' hypothesis [Castresana & Saraste 1995, Castresana & Moreira 1999]. Here the authors acknowledge that their argument rests on the assumption that the position of the root is correct, and that horizontal transfer has had no impact on the traits they examine.
The hypothesis is inherently testable, but the prerequisite for testing it is that tree topology can be established, and that the impact of horizontal transfer can be evaluated. If one takes the extreme view of Woese [1998, 2000], it is not possible to test any such hypotheses, and the result is a situation whereby competing theories are evaluated on intuition or popularity, not on hypothesis testing. Current evidence argues that while genes involved in metabolic processes may transfer extensively, those involved in informational processes [sensu Rivera et al. 1998] tend not to be transferred very frequently, and some may not transfer at all [Jain et al. 1999]. It is thus a crucial goal of genomics to determine how frequent horizontal transfer is, between which types of organisms it tends to occur, and whether it applies to all genes [Martin 1999, Lan & Reeves 2000]. The ultimate goal is to construct a network describing genomic evolution, with those components of the genome that are subject to horizontal transfer overlain on a tree that describes organismal relationships, as determined by vertical transmission [Martin 1999]. Horizontal transfers have been suggested to contribute strongly to speciation events [de la Cruz & Davies 2000, Lawrence 1999], though currently there is no reason to suggest that these are more frequent than speciation by descent, particularly when one considers that there can be large intraspecies genome differences in prokaryotes [Lan & Reeves 2000]. Indeed, as Lan & Reeves [2000] point out, applying the species concept to prokaryotes will require a very different approach to the framework used for sexual organisms. In multicellular eukaryotes, where extensive cell specialisation makes transfers less likely than in single-celled organisms, speciation through horizontal transfer is likely to be rare [Paper 7].
However, in both unicellular and multicellular eukaryotes, there are strong indications that many genes have been transferred from organelles to the nucleus [Martin et al. 1998, McFadden 1999, Berg & Kurland 2000]. A tree of genomes is most likely to be part tree, part network and would indicate organismal relationships in terms of descent by modification, and gene relationships in terms of mode of transition. Some regions of the tree may have limited network structure, some may have extensive network structure, with tree branches being highly unreliable [Martin 1999]. Given known difficulties with phylogenetic analyses for deep divergences [Lockhart et al. 1996, Philippe & Laurent 1998, Lockhart et al. 1998, Philippe & Forterre 1999, Penny et al. 2001], how can cases of transfer be distinguished from problems of phylogenetic reconstruction? There are two aspects. One is to determine the nature and extent of horizontal transfer, which should be approached as a biological problem. What is the evolutionary basis for horizontal transfer between organisms, and what patterns emerge? Does transfer occur non-specifically given proximity between two organisms, or is transfer dependent on selection? Some aspects of horizontal transfer are considered in Papers 6 and 7. In Paper 7, I consider horizontal transfer from the viewpoint of organismal evolvability, and argue that extensive horizontal transfer has a selective component. The other aspect, which is not considered in any depth in this thesis, is how cryptic transfers can mislead phylogenetic reconstructions [Teichmann & Mitchison 1999, Philippe et al. 1999], together with the bioinformatic [Lawrence & Ochman 1998, Nelson et al. 1999, Ochman et al. 2000] and experimental [reviewed in Lan & Reeves 2000] approaches for establishing patterns of transfer. Given the correct tree, some transfer events may in principle be identifiable, and so should traits dating back to the LUCA [Paper 5].
A trait that is found on both sides of the root can be best explained as loss in one of the three domains, and hence the most parsimonious explanation is a strong one. A trait that appears in two of the three domains, but where the two domains containing this trait group together (i.e. are monophyletic), is uninformative, and parsimony is not sufficient. Without further knowledge, it is not clear if the trait is ancestral or derived since the grouping of the two domains means the tree is reduced to a 'V' shape (Figure), with the two domains that form a monophyly being represented by a single branch. Nevertheless, the topology makes the application of parsimony weak, and it is also important to note that independent losses are much more likely than independent gains [Forterre 1997a]. In reconstructing the LUCA, it should be possible to examine whether there are other arguments for the inclusion of a particular gene, even if it has undergone horizontal transfer. Since function is of greater importance than whether there has been horizontal transfer, there may be cases where, say, a metabolic pathway can be included in the LUCA, even though one or more of the genes has been shown to have undergone horizontal transfer. For instance, numerous arguments have been made for an early origin for the TCA cycle [Wachtershauser 1992, Morowitz et al. 2000], so this may be a good candidate for inclusion on the basis of function as opposed to inclusion on the basis of presence in the minimal genome dataset. In Paper 3 a similar approach is taken in distinguishing between the ultimate origin of an RNA, and recent recruitment to new function (proximate origin).

[Figure panels: a rooted tree of lineages A, B and C, annotated "more like A or B, or in between?" and "in between?"; ten numbered trees (1-10) with tips labelled E, A and B.]

Figure: The topology of the tree of life is uninformative as to the nature of the organism at the root. Above: If topology alone is considered, it is not possible to establish whether the organism at the root is most like lineage C, or A+B, or in between these. Indeed, the same holds for the A-B monophyly. Overlaying characters on this tree to establish the nature of the root is likewise problematic, especially since horizontal transfer may mislead such analyses. Right: A shared character in all possible combinations, overlain on trees either rooted by bacteria or eukaryotes. Blue = presence, Grey = absence. For 2, 5, 9 & 10, independent gains are unlikely. All other trees are equally parsimonious for each shared character combination. If blue denotes loss, then these trees are still favoured as independent losses are easier to explain than independent gains. E.g. for 5 & 6, if grey is an RNA world relic, one vs two independent losses could only be evaluated by knowing the position of the root [Paper 5]. Trees 9 & 10 could be explained by mitochondrion to nucleus gene transfer. Extended from Forterre [1997a].

Using the tree for reconstructing LUCA

Broadly, the problems faced in reconstructing the tree of life are two-fold: current phylogenetic techniques are not able to recover the correct tree with any certainty, and horizontal transfers may further complicate reconstruction [Paper 5].
Suppose that, even with extensive horizontal transfer, the three domains (archaea, bacteria and eukaryotes) can be shown to hold, so that a low-resolution tree of life is recoverable, and that this tree can be rooted using various tricks such as using a paralogous gene as an outgroup (building separate unrooted trees from two genes that duplicated before the divergence of the three domains in order to root one tree with the other) [e.g. Gogarten et al. 1989, Iwabe et al. 1989, Brinkmann & Philippe 1999]. Can we then use the tree to obtain information on the root? The fundamental problem with the tree as it currently stands (technical difficulties in reconstructing relationships aside) is that, at its lowest resolution, it attempts to describe the relationships between three monophyletic groups: archaea, bacteria and eukaryotes. Wherever the root is placed, it is difficult to infer much about the evolutionary relationships between groups of organisms (even when characters are overlain - see figure), though a rooted three-pronged tree can in principle establish whether two of those groups come together as a monophyletic group. Rooting the tree in the phylogenetic sense is an important means by which to examine the monophyly of the prokaryotes [Brinkmann & Philippe 1999]. What it absolutely cannot do, however, is establish the nature of the LUCA. The outgroup is often argued to indicate which lineage is most likely to resemble the organism at the root, but this is incorrect (Figure). The structure of the tree is uninformative, and importantly, phylogenetic trees do not in themselves describe a process of evolutionary change. Their utility comes when, given the correct tree, various characters or traits can be overlaid upon the tree, giving a more complete picture of evolution. A recent example is the use of both fossils and molecular sequence data in reconstruction of the evolution of echolocation in bats [Springer et al. 2001].
The topology problem in the tree of life is fairly straightforward (Figure). The process of inference from phylogenetic trees has been to argue that the deepest-diverging groups in the branch that leads to the root provide insight into the nature of the LUCA. This has led to the widely-accepted proposal that the LUCA was hyperthermophilic and much like modern bacteria [e.g. Woese 1987]. Without considering the phylogenetic arguments for and against this proposal, let us first consider the implication of a split in the tree defining two domains (Figure). If domains A and B are shown to be related in the tree with the exclusion of group C, what can we infer about the common ancestor of A and B? Was it more like A, more like B, or did it have traits characteristic of both, some of which they still share in common? Or was it still like C? Considering the whole tree results in the same problem - it is not possible to decide if organisms that constitute 'outgroup' C in general, and deep-branching members of group C in particular, are more representative of the organism at the root. The branch that leads to the 'monophyletic' grouping of A and B could potentially provide just as much information on the nature of the organism at the root of the tree. If one of these three has maintained most metabolic traits of the common ancestor, it is not clear from the pattern of divergence given by the tree which of these three this is. When the ancestors of A, B and C diverged, it could have been that C underwent a series of reductions, whereby many ancestral traits were lost in the evolution of this domain, so that, even though the other two groups diverged from each other more recently, one or both may have retained more ancestral traits than has C.
Alternatively, it could be the opposite. Rooting of a tree with three groups (Figure) implies that A and B are monophyletic, and hence the tree could be represented in simplified form with two branches, with A and B together constituting one domain. Which group then is most similar to the organism at the root - the AB monophyly or C? No such information can be recovered simply by looking at branching patterns on a tree. The tree clearly gives us important information on evolutionary splits between major lineages, but it offers no information on which traits can be traced back to the ancestor of all three groups. That said, evolutionary inference based on the tree of life has not relied solely on the topology - the standard interpretation is that thermophily appears in the deepest branches of both archaeal and bacterial domains, leading to the contention that the LUCA was a hyperthermophile [Woese 1987]. Given that rooting the tree supported the grouping together of archaea and eukaryotes to the exclusion of bacteria, this was a correct conclusion, assuming the relationships between the three domains were correctly recovered, and assuming that hyperthermophily evolved only once. If so, then, given that hyperthermophily is recovered in both branches of the tree (i.e. it traverses the root), this argues that this is the ancestral state (Tree 7 in figure). The bacterial rooting is subject to continued scrutiny as phylogenetic methods improve, and the hypothesis that the LUCA was a hyperthermophile is likewise testable. Indeed, there have been several criticisms of both the rooting of the tree and the conclusion that the LUCA was a hyperthermophile. The competing hypothesis is that the bacterial rooting is a consequence of long branch attraction [Brinkmann & Philippe 1999, Lopez et al. 1999, Forterre & Philippe 1999].
An examination of the phylogenetic distribution of putative RNA world relics [Papers 2 & 4], gyrases and topoisomerases [Forterre 1995a], ancestral GC content [Galtier et al. 1999] and low stability of RNA at high temperature [Moulton et al. 2000] argues that the LUCA was mesophilic. These independent approaches argue that eukaryotes have retained a number of ancestral features that date back to the LUCA, while archaea and bacteria have lost these. Furthermore, the stability of hyperthermophily as a character has also been questioned, with several reports that hyperthermophilic traits common to both bacteria and archaea have undergone horizontal transfer [Nelson et al. 1999, Aravind et al. 1998, Forterre et al. 2000], while other traits, such as the lipid composition of hyperthermophile membranes [reviewed in Daniel & Cowan 2000], suggest hyperthermophily has evolved twice independently [Forterre 1996].

The tree of life displays the evolutionary relationships between extant organisms as patterns of divergence on a tree. All living organisms are thus billions of years removed from the LUCA, such that the deep branches do not necessarily represent 'living fossils', only the pattern of evolutionary divergence. Indeed, indications from current tree building methods are that it is the fastest-evolving lineages that are most likely to take basal positions because most current tree reconstruction methods tend to provide a measure of evolutionary distance which is affected by rate of evolutionary change [Philippe & Laurent 1998, Stiller & Hall 1999, Brinkmann & Philippe 1999]. The pattern of evolutionary divergence is not recovered because it has not been possible to build trees that correctly take into account rate variation between lineages. Brinkmann & Philippe [1999] have been able to demonstrate how Long Branch Attraction [Hendy & Penny 1989] affects the overall topology of the tree, using an implementation [Lopez et al.
1999] of the covarion model [Fitch & Markowitz 1970, Fitch 1971] to separate out fast-evolving and slower-evolving sites. With the fast-evolving sites, which will tend to become saturated, archaea and eukaryotes group together, but taking the slower-evolving sites returns a tree where the root is in the eukaryote branch, and the prokaryotes are monophyletic. If correct, the tree severely weakens the conclusion that the LUCA was a hyperthermophile, as this trait is now found in one branch only: the monophyletic prokaryotes (see trees 6 & 8 in figure).

Phylogenomics.

Nevertheless, the problem remains. Given the alternative trees, Brinkmann & Philippe's [1999] bacteria-archaea monophyly or the eukaryote-archaea monophyly [Woese et al. 1990, Iwabe et al. 1989, Gogarten et al. 1989], which is right? One alternative has been to move away from single genes and attempt to use whole genomes in phylogenetic analyses [e.g. Sicheritz-Ponten & Andersson 2001]. Genomics (unlike conventional phylogenetic analyses of one gene conserved across all organisms in the study) promises to allow us to compare all genes in a group of organisms. This is achieved in two ways. The simplest is counting the number of genes that are shared. Relatedness is based on the number of genes in common with other species in the study [Snel et al. 1999]. The other is carrying out a global phylogenetic analysis of genes that are shared in order to try and build a composite tree using sequence data. A more modest and potentially very powerful approach is a composite tree, where genes which have individually been shown to be informative in reconstructing distant phylogenetic relationships are used to produce a combined dataset. A recent analysis of the phylogeny of eukaryotes is one such example [Baldauf et al. 2000]. Nevertheless, these approaches are not necessarily expected to provide significant improvements to single-gene trees.
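The simplest of the genome-level approaches described above, counting shared genes, can be sketched in a few lines. This is only an illustration under stated assumptions: genomes are represented as sets of hypothetical gene-family labels, and dividing by the size of the smaller genome is one possible normalisation assumed here, not a detail taken from the text. The second comparison shows how a single acquired gene raises the apparent relatedness of two genomes.

```python
def gene_content_similarity(genome_a, genome_b):
    """Fraction of the smaller genome's gene families found in both genomes."""
    if not genome_a or not genome_b:
        raise ValueError("both genomes must contain at least one gene family")
    shared = len(genome_a & genome_b)
    return shared / min(len(genome_a), len(genome_b))


# Hypothetical gene-family labels, for illustration only.
genome_x = {"g1", "g2", "g3", "g4"}
genome_y = {"g1", "g2", "g5", "g6"}

before = gene_content_similarity(genome_x, genome_y)

# Suppose genome Y acquires family g3 by horizontal transfer.
genome_y_after_transfer = genome_y | {"g3"}
after = gene_content_similarity(genome_x, genome_y_after_transfer)

print(f"similarity before transfer: {before:.2f}")
print(f"similarity after transfer:  {after:.2f}")
```

A presence/absence measure of this kind cannot by itself tell shared ancestry from acquisition, which is why the question of transfer has to be addressed separately.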
A consensus tree over all, or for each of, the three domains, where there is general agreement for several different genes, all of which contain sufficient phylogenetic information from which to build a tree, is not yet achievable. Protein and RNA trees give conflicting results [Forterre 1997b, Philippe & Forterre 1999]. At worst, large-scale 'phylogenomic' analysis simply amounts to adding more data without attempting to address limitations of models in current tree-building algorithms [Lopez et al. 1999, Penny et al. 2001] which require each site always to evolve at the same rate. Furthermore, it is not clear how genome-level comparisons will be able to deal with the problem of horizontal transfer. Snel et al. [1999] used gene presence and absence in 13 genomes as a phylogenetic character, claiming that their analysis supports the 16S rRNA tree and that horizontal transfer was not extensive. However, such an analysis might miss orthologous gene replacements as well as independent gains and losses through horizontal transfer. If it is assumed that the problem of horizontal transfer is real, and that those genes which do transfer can be distinguished from those that do not, should the former be eliminated from reconstructions of the LUCA? These cannot be reliably traced back to the LUCA, unless independent criteria for their inclusion can be used (see table). From the subset that are primarily transmitted vertically, which are ancestral traits, and which are derived? That is, which were present in the LUCA, and which arose later? The difficulty here is that there is no good methodology for deciding this. One could use parsimony, such that where two of the three have a trait it is ancestral, and where two of the three lack it, it is derived. Parsimony as a rule is fraught with problems, especially where one applies it to three groups, as it could easily lead to artificial groupings of ancestral and derived traits [Forterre 1997a, figure].
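To make the weakness of the two-of-three rule concrete, the following minimal sketch compares it with a count of events on a rooted tree. Several things here are assumptions made for illustration rather than anything established above: the rooting ((A, B), C) is taken as given, horizontal transfer is ignored, each scenario permits only one kind of event (losses if the trait was in the LUCA, gains if it was not), and gains and losses are weighted equally even though independent losses are argued to be more likely than independent gains. All function names are hypothetical.

```python
from itertools import product

# Assumed rooted topology: ((A, B), C).
SISTERS = ("A", "B")  # the two domains that group together
OUTGROUP = "C"        # the domain on the other side of the root


def naive_majority_rule(present):
    """Two-of-three rule: ancestral if at least two domains have the trait."""
    return "ancestral" if sum(present.values()) >= 2 else "derived"


def losses_if_ancestral(present):
    """Minimum losses if the trait was present at the root (no regains)."""
    losses = 0 if present[OUTGROUP] else 1
    # Any absence within the (A, B) clade costs one loss: on a single
    # terminal branch, or on the shared stem if both sisters lack the trait.
    if not all(present[d] for d in SISTERS):
        losses += 1
    return losses


def gains_if_derived(present):
    """Minimum gains if the trait was absent at the root (no losses)."""
    gains = 1 if present[OUTGROUP] else 0
    # Any presence within the (A, B) clade costs one gain: on a single
    # terminal branch, or on the shared stem if both sisters have the trait.
    if any(present[d] for d in SISTERS):
        gains += 1
    return gains


def tree_aware_call(present):
    """Compare the two scenarios; equal counts leave the question open."""
    losses = losses_if_ancestral(present)
    gains = gains_if_derived(present)
    if losses < gains:
        return "ancestral"
    if gains < losses:
        return "derived"
    return "undecided"


for a, b, c in product([True, False], repeat=3):
    if not (a or b or c):
        continue  # a trait absent everywhere is not under discussion
    pattern = {"A": a, "B": b, "C": c}
    found_in = "".join(d for d in "ABC" if pattern[d])
    print(f"trait in {found_in:<3} naive={naive_majority_rule(pattern):<9} "
          f"tree-aware={tree_aware_call(pattern)}")
```

Under these assumptions the two rules disagree for a trait found in A and B but not C, and for a trait found in C only: the majority rule gives a confident answer in both cases, whereas on the rooted tree one loss and one gain are equally parsimonious.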
Gene loss versus the origin of novel genes cannot be inferred without some evolutionary precedent, and parsimony is insufficient in the three-domain problem [discussed in Paper 3 for the origin of snoRNAs]. Nor, as we have seen, does the tree give such precedent (e.g. if it is in C it is ancestral, if it is in A and B but not C, it is derived), so this must be established through other lines of inquiry.

Non-phylogenetic approaches.

Using a genomic approach, many traits are simply not amenable to analysis, either because of horizontal transfer, or because traits which are not ubiquitous in distribution cannot always be reliably argued to date back to the LUCA on the basis of parsimony alone (Table). With current methods, those that turn out to have been subject to extensive horizontal transfer may not be reliably examined in the context of the LUCA problem, though cases where transfer turns out to be only very limited might be expected to be. Since the reconstruction of the LUCA depends most on rebuilding a rough picture of metabolism before the emergence of the three domains, it is not necessary to use phylogenetic-based approaches in justifications for the antiquity of a given trait. While less ambitious than the minimal gene set [Mushegian & Koonin 1996], an alternative is to try to identify ancient metabolic traits, even if they are limited in distribution. In this thesis, I have attempted to do just that. A means of examining some aspects of extant metabolism is the application of the RNA world theory to the problem, in the first instance to identify RNA species which are likely to be ancient [Papers 1 & 3], and subsequently, to explain the asymmetric distribution of these in modern species based on known principles. Since the tree gives us very limited information on the likely nature of the LUCA, owing to the rooting problem, an alternative that examines this is essential.
While the notion of an RNA world may or may not represent an intermediate in the evolution of life, currently there is no real alternative for understanding the origins of proteins and DNA. Certainly, it seems highly likely that RNA played a more prominent role in metabolism than it currently does, and not only is there a good physicochemical and biochemical basis for expecting RNA would be replaced over time by proteins and DNA, a number of RNAs, such as rRNA, tRNA, srpRNA and RNase P, are found to be ubiquitous [Papers 1 & 3]. The biggest problem is trying to identify candidate relics and, although criteria have been put forth that aid in distinguishing between relic RNAs and recent additions to metabolism, the approach is necessarily ad hoc [Papers 1, 3 & 4]. Importantly, it is not an absolute requirement for candidate RNA relics to be ubiquitous, and this offers an improvement over parsimony, and abrogates the need for the correct tree in evaluating aspects of the nature of the LUCA.

Expanding LUCA: how easy or hard is identification of ancient metabolic traits?

Some ancient metabolic traits can be identified if they are ubiquitous and have been demonstrated not to have been subject to horizontal transfer. This is in itself likely to pose a difficult technical problem, as horizontal transfer would make it impossible to judge on distribution alone whether or not the trait was ancient. Those traits that are not ubiquitous represent an equally formidable problem. How can such ancient traits be identified from a tree based on a single gene, or from a tree based on comparisons of genome content (where presence/absence of a gene is a character), or a composite tree where several ubiquitous genes give the same tree? Again, one could apply parsimony. However, a tree cannot be used to infer evolutionary pressures that account for changes along a branch, because the branching pattern alone cannot identify such pressures [Forterre 1997a].
It may however point us in the right direction, provided the topology problem is taken into account. For instance, if we are able to unambiguously determine the relationships between the archaea, bacteria and eukaryotes, the monophyly of two, for example archaea and bacteria, can greatly improve the usefulness of the parsimony rule in certain situations. For instance, given the tree in the Figure, if a gene known not to have been subject to horizontal transfer is found in organisms in groups A and C, but not B, and if the grouping (AB)C is correct, we can argue from parsimony that the trait was lost from group B, and that it can be traced back to the LUCA. If the trait were in C only, or in A and B but not C, parsimony cannot be used, so the tree cannot be used to determine whether the trait dates back to the LUCA.

In concluding the introduction, the main point I will be arguing with regard to reconstructing the LUCA is that the framework of the RNA world hypothesis provides one way of establishing some events in early evolution, and with greater certainty than searching for patterns in genomic data. This approach provides hard data on the metabolic make-up of the LUCA, and leads to testable hypotheses (described in the section on future work). However, it cannot replace phylogenetic approaches for classifying taxa. It cannot even examine the question of the monophyly of the prokaryotes. Indeed, as described in Paper 5, if eukaryotes and archaea do turn out to be monophyletic, this does not affect the conclusion that the LUCA possessed some eukaryote-like features. Rather, it highlights how uninformative the root is - contrary to the interpretation that many non-phylogeneticists have, the outgroup is not indicative of the LUCA, and the direction of evolutionary change cannot be inferred solely from the topology. What the approach in this thesis does allow is a hypothesis-driven approach to understanding eukaryote and prokaryote evolution.
It provides continuity between the RNA world, the LUCA, and the subsequent divergence of the three domains. Furthermore, it makes a significant shift away from the preconception that prokaryotes predate eukaryotes by establishing important factors that influence evolution in extant prokaryotes and eukaryotes [Paper 7]. This provides an insight into evolutionary processes and establishes how the process of natural selection has operated in the evolution of prokaryotes and eukaryotes. Such insight cannot be established through phylogenetic analyses or comparative genomics alone.

References.

Andersson JO, Andersson SGE: Genome degradation is an ongoing process in Rickettsia. Mol Biol Evol 1999, 16, 1178-1191.
Andersson SGE, Kurland CG: Reductive evolution in resident genomes. Trends Microbiol 1998, 6, 263-268.
Angert ER, Clements KD, Pace NR: The largest bacterium. Nature 1993, 362, 239-241.
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 1998, 14, 442-444.
Bada JL, Bigham C, Miller SL: Impact melting of a frozen ocean on the early earth and the implication for the origin of life. Proc Natl Acad Sci USA 1994, 91, 1248-1250.
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 2000, 290, 972-977.
Benner SA, Ellington AD, Tauer A: Modern metabolism as a palimpsest of the RNA world. Proc Natl Acad Sci USA 1989, 86, 7054-7058.
Berg OG, Kurland CG: Why mitochondrial genes are most often found in nuclei. Mol Biol Evol 2000, 17, 951-961.
Becerra A, Islas S, Leguina JI, Silva E, Lazcano A: Polyphyletic gene losses can bias backtrack characterizations of the cenancestor. J Mol Evol 1997, 45, 115-117.
Brinkmann H, Philippe H: Archaea sister-group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol 1999, 16, 817-825.
Brocks JJ, Logan GA, Buick R, Summons RE: Archaean molecular fossils and the early rise of eukaryotes. Science 1999, 285, 1033-1036.
Castresana J, Moreira D: Respiratory chains in the Last Common Ancestor of living organisms. J Mol Evol 1999, 49, 453-460.
Castresana J, Saraste M: Evolution of energetic metabolism: the respiration-early hypothesis. Trends Biochem Sci 1995, 20, 443-448.
Daniel RM, Cowan DA: Biomolecular stability and life at high temperatures. Cell Mol Life Sci 2000, 57, 250-264.
de la Cruz F, Davies J: Horizontal gene transfer and the origin of species: lessons from bacteria. Trends Microbiol 2000, 8, 128-133.
Doolittle WF: You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet 1998, 14, 307-311.
Doolittle WF: Phylogenetic classification and the universal tree. Science 1999, 284, 2124-2128.
Fishelson L, Montgomery WL, Myrberg Jr AA: A unique symbiosis in the gut of tropical herbivorous surgeonfish (Acanthuridae: Teleostei) from the Red Sea. Science 1985, 229, 49-51.
Fitch WM: Rate of change of concomitantly variable codons. J Mol Evol 1971, 1, 84-96.
Fitch WM, Markowitz E: An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Gen 1970, 4, 579-593.
Forterre P: Thermoreduction, a hypothesis for the origin of prokaryotes. CR Acad Sci III 1995a, 318, 415-422.
Forterre P: Looking for the most "primitive" organism(s) on Earth today: the state of the art. Planet Space Sci 1995b, 43, 167-177.
Forterre P: Archaea: what can we learn from their sequences? Curr Opin Genet Dev 1997a, 7, 764-770.
Forterre P: Protein versus rRNA: problems in rooting the universal tree of life. ASM News 1997b, 63, 89-95.
Forterre P: Displacement of cellular proteins by cellular analogues from plasmids or viruses could explain puzzling phylogenies of many DNA informational proteins. Mol Microbiol 1999, 33, 457-465.
Forterre P, Bouthier De La Tour C, Philippe H, Duguet M: Reverse gyrase from hyperthermophiles: probable transfer of a thermoadaptation trait from archaea to bacteria. Trends Genet 2000, 16, 152-154.
Forterre P, Philippe H: Where is the root of the universal tree of life? BioEssays 1999, 21, 871-879.
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, Fritchman JL, Weidman JF, Small KV, Sandusky M, Fuhrmann J, Nguyen D, Utterback TR, Saudek DM, Phillips CA, Merrick JM, Tomb J-F, Dougherty BA, Bott KF, Hu P-C, Lucier TS, Peterson SN, Smith HO, Hutchison III CA, Venter JC: The minimal gene complement of Mycoplasma genitalium. Science 1995, 270, 397-403.
Galtier N, Tourasse N, Gouy M: A nonhyperthermophilic common ancestor to extant life forms. Science 1999, 283, 220-221.
Gibson G: Evolution: Hox genes and the cellared wine principle. Curr Biol 2000, 10, R452-R455.
Gilbert W: The RNA world. Nature 1986, 319, 618.
Glansdorff N: About the last common ancestor, the universal life-tree and lateral gene transfer: a reappraisal. Mol Microbiol 2000, 38, 177-185.
Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF, Poole RJ, Date T, Oshima T, Konishi J, Denda K, Yoshida M: Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci USA 1989, 86, 6661-6665.
Gould SJ, Lewontin RC: The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist program. Proc R Soc Lond B 1979, 205, 581-598.
Grbic M: "Alien" wasps and evolution of development. BioEssays 2000, 22, 920-932.
Grotzinger JP, Rothman DH: An abiotic model for stromatolite morphogenesis. Nature 1996, 383, 423-425.
Hendy MD, Penny D: A framework for the quantitative study of evolutionary trees. Syst Zool 1989, 38, 297-309.
Hutchison III CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO, Venter JC: Global transposon mutagenesis and a minimal mycoplasma genome. Science 1999, 286, 2165-2169.
Iwabe N, Kuma K-I, Hasegawa M, Osawa S, Miyata T: Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci USA 1989, 86, 9355-9359.
Jain R, Rivera MC, Lake JA: Horizontal transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci USA 1999, 96, 3801-3806.
Joyce GF, Orgel LE: Prospects for understanding the origin of the RNA world. In: Gesteland RF, Cech TR, Atkins JF eds. The RNA World. 2nd ed. Cold Spring Harbor Laboratory Press, New York, 1999, p49-77.
Keeling PJ, McFadden GI: Origins of microsporidia. Trends Microbiol 1998, 6, 19-23.
Lan R, Reeves PR: Intra-species variation in bacterial genomes: the need for a species genome concept. Trends Microbiol 2000, 8, 396-401.
Lawrence J: Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev 1999, 9, 642-648.
Lawrence JG, Ochman H: Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci USA 1998, 95, 9413-9417.
Lazcano A, Miller SL: How long did it take for life to begin and evolve to cyanobacteria? J Mol Evol 1994, 39, 546-554.
Leipe DD, Aravind L, Koonin EV: Did DNA replication evolve twice independently? Nucleic Acids Res 1999, 27, 3389-3401.
Levy M, Miller SL: The stability of the RNA bases: implications for the origin of life. Proc Natl Acad Sci USA 1998, 95, 7933-7938.
Lockhart PJ, Larkum AWD, Steel MA, Waddell PJ, Penny D: Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc Natl Acad Sci USA 1996, 93, 1930-1934.
Lockhart PJ, Steel MA, Barbrook AC, Huson DH, Howe CJ: A covariotide model describes the evolution of oxygenic photosynthesis. Mol Biol Evol 1998, 15, 1183-1188.
Lopez P, Forterre P, Philippe H: The root of the tree of life in light of the covarion model. J Mol Evol 1999, 49, 496-508.
Lowe DR: Abiological origin of described stromatolites older than 3.2 Ga. Geology 1994, 22, 387-390.
Martin W: Mosaic bacterial chromosomes: a challenge en route to a tree of genomes. BioEssays 1999, 21, 99-104.
Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV: Gene transfer to the nucleus and the evolution of chloroplasts. Nature 1998, 393, 162-165.
McFadden GI: Endosymbiosis and evolution of the plant cell. Curr Opin Plant Biol 1999, 2, 513-519.
Miller SL, Bada JL: Submarine hot springs and the origin of life. Nature 1998, 334, 609-611.
Mojzsis SJ, Arrhenius G, McKeegan KD, Harrison TM, Nutman AP, Friend CR: Evidence for life on Earth 3800 million years ago. Nature 1996, 384, 55-59. [Erratum: Nature 1997, 386, 665]
Morowitz HJ, Kostelnik JD, Yang J, Cody GD: The origin of intermediary metabolism. Proc Natl Acad Sci USA 2000, 97, 7704-7708.
Moulton V, Gardner PP, Pointon RF, Creamer LK, Jameson GB, Penny D: RNA folding argues against a hot-start origin of life. J Mol Evol 2000, 51, 416-421.
Mushegian A: The minimal genome concept. Curr Opin Genet Dev 1999, 9, 709-714.
Mushegian AR, Koonin EV: A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA 1996, 93, 10268-10273.
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O, Salzberg SL, Smith HO, Venter JC, Fraser CM: Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 1999, 399, 323-329.
Nelson KE, Levy M, Miller SL: Peptide nucleic acids rather than RNA may have been the first genetic molecule. Proc Natl Acad Sci USA 2000, 97, 3868-3871.
Nisbet EG, Sleep NH: The habitat and nature of early life. Nature 2001, 409, 1083-1091.
Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405, 299-304.
Ohmoto H: Evidence in pre-2.2 Ga paleosols for the early evolution of atmospheric oxygen and terrestrial biota. Geology 1996, 24(12), 1135-1139.
Penny D, Foulds LR, Hendy MD: Testing the theory of evolution by comparing phylogenetic trees constructed from five different protein sequences. Nature 1982, 297, 197-200.
Penny D, McComish BJ, Charleston MA, Hendy MD: Mathematical elegance with biochemical realism: the covarion model of molecular evolution. J Mol Evol 2001, (in press).
Philippe H, Forterre P: The rooting of the tree of life is not reliable. J Mol Evol 1999, 49, 509-523.
Philippe H, Laurent J: How good are deep phylogenetic trees? Curr Opin Genet Dev 1998, 8, 616-623.
Philippe H, Budin K, Moreira D: Horizontal transfers confuse the prokaryotic phylogeny based on the HSP70 protein family. Mol Microbiol 1999, 31, 1007-1009.
Poole A, Penny D, Sjoberg B-M: Methyl-RNA: an evolutionary bridge between RNA and DNA? Chem Biol 2000, 7, R207-R216.
Schopf JW, Packer BM: Early Archean (3.3 billion to 3.5 billion year old) microfossils from Warrawoona Group, Australia. Science 1987, 237, 70-73.
Shapiro R: Prebiotic cytosine synthesis: a critical analysis and implications for the origin of life. Proc Natl Acad Sci USA 1999, 96, 4396-4401.
Sicheritz-Ponten T, Andersson SGE: A phylogenomic approach to microbial evolution. Nucleic Acids Res 2001, 29, 545-552.
Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nat Genet 1999, 21, 108-110.
Springer MS, Teeling EC, Madsen O, Stanhope MJ, de Jong WW: Integrated fossil and molecular data reconstruct bat echolocation. Proc Natl Acad Sci USA 2001, 98, 6241-6246.
Stiller JW, Hall BD: Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol Biol Evol 1999, 16, 1270-1279.
Summons RE, Jahnke LL, Hope JM, Logan GA: 2-methylhopanoids as biomarkers for cyanobacterial oxygenic photosynthesis. Nature 1999, 400, 554-557.
Szathmary E, Maynard Smith J: The major evolutionary transitions. Nature 1995, 374, 227-232.
Teichmann SA, Mitchison G: Is there a phylogenetic signal in prokaryote proteins? J Mol Evol 1999, 49, 98-107.
Wachtershauser G: Groundworks for an evolutionary biochemistry: the iron-sulphur world. Prog Biophys Mol Biol 1992, 58, 85-201.
Walsh MM: Microfossils and possible microfossils from the Early Archean Onverwacht Group, Barberton Mountain Land, South Africa. Precambrian Res 1992, 54, 271-293.
Woese CR: Bacterial evolution. Microbiol Rev 1987, 51, 221-271.
Woese CR: The universal ancestor. Proc Natl Acad Sci USA 1998, 95, 6854-6859.
Woese CR: Interpreting the universal phylogenetic tree. Proc Natl Acad Sci USA 2000, 97, 8392-8396.
Woese CR, Fox GE: Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA 1977, 74, 5088-5090.
Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci USA 1990, 87, 4576-4579.
Wren BW: Microbial genome analysis: insights into virulence, host adaptation and evolution. Nat Rev Genet 2000, 1, 30-39.

Jeffares DC, Poole AM & Penny D. Relics from the RNA world. Journal of Molecular Evolution 46, 18-36 (1998).
Paper 1
Reprinted with permission from Springer-Verlag New York Inc.

Poole AM, Jeffares DC & Penny D. The path from the RNA world. Journal of Molecular Evolution 46, 1-17 (1998).
Paper 2
Reprinted with permission from Springer-Verlag New York Inc.

RNA evolution: separating the new from the old. Manuscript.
Paper 3

RNA evolution: separating the new from the old.

Abstract.
The existence of an RNA world, an RNA-rich period in the early evolution of life, is widely accepted, as is the idea that many cellular RNAs can be traced back to this period. However, while some RNAs may derive from the very earliest stages of life, others have arisen comparatively recently in evolution. A further difficulty is that some RNAs may have arisen early in evolution but have since changed their role. It is therefore useful to distinguish between the 'ultimate' origin of an RNA and a 'proximate' origin, where it evolved into its present function. A number of RNAs have not been unequivocally placed as 'new' or 'old', including group I & II introns, snRNAs, tmRNA and snoRNAs. In this article, we examine how RNA world 'relics' might be distinguished from RNAs with a more recent origin, why there are problems or controversies in establishing the evolutionary origins of some RNAs, and whether it is possible to resolve these.

Introduction.
In eukaryotes it is well established that RNA is central to a number of molecular processes, including protein synthesis, mRNA editing and splicing, rRNA and tRNA processing and telomere replication. Some of these RNAs are also found in archaea and eubacteria, though in general it appears that RNA plays a less prominent role in metabolism in these organisms (Wassarman et al., 1999). Indeed, this differential use of RNA is claimed to be a fundamental one, and may be the basis for very different evolutionary mechanisms employed in the diversification of prokaryotes and eukaryotes (Herbert & Rich, 1999a,b).

It is generally accepted that many RNAs are evolutionarily very ancient.
The RNA world hypothesis (Gilbert, 1986) is that, prior to the advent of genetically-encoded proteins and DNA, RNA was both genetic material and major biological catalyst. With the advent of protein synthesis, and later, ribonucleotide reduction, RNA is believed to have gradually lost its central role as catalyst and information storage molecule. Those few RNAs that remain in modern metabolism are widely considered to be 'relics' from the RNA world period (Benner et al., 1989; Jeffares et al., 1998). However, with the number of novel RNAs growing, it is clear that many RNAs may have arisen more recently in evolution to fulfill specific functions and do not date back to the RNA world period (Eddy, 1999).

In this article, we briefly review the current state of the RNA world hypothesis insofar as it allows us to distinguish between RNAs that are likely to be ancient in origin and those which are more recent. We define 'ancient' as prior to the emergence of the three domains, archaea, bacteria and eukaryotes, that is pre-Last Universal Common Ancestor (pre-LUCA), and 'recent' as post-LUCA. A broad survey of RNAs that are probably recent innovations suggests that RNA is a potent source of novel function in eukaryotes. In addition, we will focus on those RNAs where the evolutionary origins are a current source of controversy. Central to this problem is the need to distinguish between 'ultimate' origins and 'proximate' origins, thereby providing a distinction between the origin of a given RNA and the role it currently plays in modern metabolism. This distinction is in effect the same as that drawn by the terms paralogous and orthologous in descriptions of the evolutionary history of gene families. Orthologous genes have arisen from a single ancestral gene through duplication and divergence and have maintained the same function over time.
Paralogous genes have also arisen from a single ancestral gene through duplication and divergence but now perform different functions. Examples of orthologous RNA genes are the RNase P genes from E. coli and yeast. An example of paralogous RNA genes is the pair RNase P and RNase MRP. In this paper, we are particularly interested in the latter case. Where two related RNAs perform different functions, what is the ultimate origin of this family of RNA? What is old and what is new?

We have previously suggested several criteria as an aid for drawing the line between relic RNAs and recently-evolved RNAs (Poole et al., 1999). These are:
1. That the RNA is ubiquitous in distribution.
2. That the RNA is central to metabolism.
3. Whether proteins perform the function equally well in other organisms.
4. That the RNA is catalytic [footnote 1].

These criteria are helpful, but are not necessarily sufficient to give a reliable indication of the likely status for every RNA. Criterion 1 is the strongest argument for the RNA world ancestry of a given RNA, and one can assign relic status to a number of RNAs on this criterion alone. Obvious examples are tRNA, rRNA, RNase P and srpRNA (4.5S in bacteria, 7S in eukaryotes & archaea) (Jeffares et al., 1998). In the case of criterion 2, where an RNA is not ubiquitous, one may argue for an RNA world origin on functional grounds. In this manner, Maizels and Weiner (1999) have argued for the antiquity of telomerase function, an argument further supported by the view that the circularisation of chromosomes in prokaryotes is a derived trait, arising under strong selection pressure, and was thus not present in the RNA or RNP (ribonucleoprotein) worlds (Forterre, 1995). In spite of the example of telomerase, arguing just from criterion 2 is difficult, since it is a matter of opinion as to what is central to metabolism.

[Footnote 1: The term catalytic RNA is used either in a chemical sense or a functional sense. In the chemical sense, a catalytic RNA is one which can catalyse a chemical reaction without the aid of protein, that is, the RNA is necessary and sufficient for catalysis. In a functional sense, an RNA which is necessary but not sufficient for catalysis is still a catalytic RNA. Bacterial RNase P RNA is catalytic in both senses, but human RNase P RNA is only catalytic in the functional sense.]

The third criterion is of fundamental importance, and stems primarily from the argument that proteins are in general better catalysts than RNA (Jeffares et al., 1998; Poole et al., 1999). This suggests that, given that the general trend is replacement of catalytic RNA with protein during evolution, in cases where in one lineage a protein performs a function identical to that of RNA in another lineage, the RNA is ancestral. However, certain functions may simply be better suited to RNA (a point to which we shall return), and hence, not all RNAs should be placed automatically in the RNA world (Eddy, 1999). By itself, criterion 3 may be insufficient, but it is an important consideration, particularly where a function is argued to be central to metabolism. We consider several examples where criteria 2 and 3, combined, are important in assigning putative relic status.

Criterion 4 is more complex than it appears, which may be somewhat surprising, given the importance that catalytic RNA studies have played in the development of the RNA world hypothesis. Distinguishing between functional and chemical definitions for catalysis is helpful, however. We will argue here that all RNAs defined as functionally catalytic but very few RNAs defined as chemically catalytic are direct descendants from the RNA world (see Table), though the latter are nevertheless important exemplars of RNA world complexity.

RNA as a source of novel function
As the RNA universe expands, it is becoming clear that RNA is more than just a relic from early evolution.
'New' RNAs in many cases can be readily picked out simply because the role they play is highly specialised and their phylogenetic distribution is very limited, indicating recent origins. It seems likely that the growing list of newly discovered RNAs (Table) is but the tip of the iceberg, especially given that current genomic search strategies (e.g. BLAST) do not perform well for RNA families, which in general retain very little primary sequence information (e.g. Ganot et al., 1997a; Lowe & Eddy, 1999; Collins et al., 2000). Likewise, large-scale identification techniques such as those possible with EST databases are biased against detection of noncoding RNAs (Eddy, 1999, though see Hüttenhofer et al., 2001). Recent reviews (Eddy, 1999; Wassarman et al., 1999; Erdmann et al., 2001) cover many of the developments in RNA identification (for summary and relevant references from the literature, see Table), so we limit ourselves to a number of examples where it might be argued that RNA is inherently better suited to certain roles than protein. Furthermore, we consider briefly how RNA impacts on the evolvability of organisms.

RNA editing in kinetoplastids of trypanosomes.
RNA editing, whereby the sequence of a transcript is changed prior to translation, is widespread, and occurs via widely different mechanisms. The mechanisms appear unrelated and have limited distribution (Smith et al., 1997). RNA editing is particularly prevalent in organelles, and the best explanation for this is that editing is a response to mutational pressures from the operation of Muller's Ratchet in organellar genomes (Borner et al., 1997). Muller's Ratchet is the slow accumulation of slightly deleterious mutations in the absence of recombination (reviewed in Andersson & Kurland, 1998; Blanchard & Lynch, 2000).
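
The definition of Muller's Ratchet just given can be illustrated with a toy simulation. What follows is only a minimal sketch under assumed conditions, not a model taken from the cited reviews: an asexual population of fixed size, multiplicative fitness, Poisson-distributed new mutations, and no recombination or back mutation. The function names and all parameter values (population size, mutation rate, selection coefficient) are hypothetical and chosen purely for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_ratchet(pop_size=200, mutation_rate=0.5, selection=0.02,
                     generations=300, seed=1):
    """Return the mutation load of the least-loaded class in each generation.

    Assumed toy model: each genome carries k slightly deleterious mutations
    and has fitness (1 - selection) ** k. Offspring copy one parent (no
    recombination) and gain a Poisson number of new mutations.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size          # everyone starts mutation-free
    least_loaded = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to their fitness.
        weights = [(1.0 - selection) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Mutation: loads can only stay the same or increase.
        loads = [k + poisson(rng, mutation_rate) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


if __name__ == "__main__":
    trajectory = simulate_ratchet()
    print("least-loaded class at start:", trajectory[0])
    print("least-loaded class at end:  ", trajectory[-1])
```

In this sketch the load of the least-loaded class can never fall, because a lost class cannot be regenerated without recombination or back mutation; each increase corresponds to one 'click' of the ratchet.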
The largest number of editing events observed in a single organelle is in kinetoplastids of trypanosomes, where uridine insertion and deletion occur in about 12 of 18 mRNA transcripts, creating start codons, frameshift corrections, and even entire open reading frames (Estevez & Simpson, 1999). As well as being the most extensive form of transcript editing, it is also the only form where RNA guides are involved. The information for transcript editing is housed on separate minicircles in the form of guide RNA genes. Depending on the organism (see Simpson et al., 2000) there are approximately 50 maxicircles which house the mitochondrial genes, and >1000 guide RNA coding minicircles. Given that editing in general (Borner et al., 1997), the breaking of a single chromosome into several smaller pieces (Reanney, 1986) and mutational buffering through the presence of multiple copies are all expected to slow the loss of genetic information through Muller's Ratchet, and given the limited phylogenetic distribution of guide RNA-mediated uridine insertion/deletion editing (Simpson et al., 2000), this form of editing is extremely unlikely to date back to the RNA world.

Covello and Gray (1993) have introduced a three-step model for the evolution of RNA editing in general, and kinetoplastid RNA editing, the latter having been extended by Stoltzfus (1999). In kinetoplastid editing (and editing in general) it is not necessary for there to be a selective advantage for fixation of editing. It may simply arise through suitable preconditions. Stoltzfus (1999) points out that recruitment of the editing machinery can be explained by tinkering, since it involves enzymes that are known in other functions. Furthermore, multiple genome copies will slow Muller's Ratchet, and redundancy can result in the accumulation and tolerance of variance between copies.
Thus the emergence of a mutation (one that would be neutral, slightly deleterious or lethal with only a single copy of the genome) in one copy of a given gene will always be neutral. Likewise, expression of an antisense transcript from another unaltered copy of the gene, which can bind to the mRNA produced from the mutant gene copy, has no fitness effect. Such potential precursors may arise and subsequently disappear through drift, and the same is expected for an interaction that is edited by chance. While the genotypes may differ, the phenotype for edited and unedited versions is identical, and under a neutral or even slightly deleterious model (i.e. Muller's Ratchet), both can become fixed. As fixation at more sites occurs, while variation in the position of editing will be stochastic (for editing events where the change is neutral), the probability that all revert through back mutation is extremely low. Moreover, at functionally important sites, editing becomes maintained by natural selection (Covello & Gray, 1993). This is because some editing events have become essential for production of the protein product. Loss of a key editing enzyme, which would affect all edited sites, would thus be lethal and selected against.

Strong evidence for the continuing role of neutral processes and drift in guide RNA-mediated editing includes the presence of multiple copies of both minicircles and maxicircles and large size variation for both minicircles and maxicircles across a range of organisms, large variability in minicircle copy number within strains over time and between species, presence of guide RNA genes on both minicircles and maxicircles, and existence of variant guide RNAs with mismatches in the guide regions (Simpson et al., 2000). In summary, the suggestion that the effect of Muller's Ratchet on organellar genomes resulted in the independent evolution of unrelated forms of RNA editing in eukaryotic organelles (Borner et al.
, 1997) provides a strong precedent for considering uridine insertion/deletion editing to be a recently-evolved trait, and not an RNA world relic. It also underpins the evolutionary utility of RNA: where a class of RNA is limited in phylogenetic distribution and acts as a guide, it may be a recently-evolved trait.

RNA as a 'riboregulator'.
Riboregulators are RNAs that act to regulate gene expression, usually through base-pairing, and, as such, are expected to evolve readily. A number of well-understood examples are known, and a long list of possible examples is currently under investigation (Erdmann et al., 2001). A number of these RNAs are included in Table 1, and an exciting finding is that 'riboregulation' is not limited to mRNA binding (as with lin-4 and let-7 antisense RNAs from C. elegans). It may also occur through other processes, such as RNA-protein interactions, as exemplified by CsrB RNA inhibition of CsrA protein activity in E. coli (Romeo, 1998), and meiRNA interaction with mei2 protein in regulation of meiosis in S. pombe (Watanabe & Yamamoto, 1994).

Another exciting prospect for the 'modern RNA world' is that unrelated RNAs have appeared in nearly identical functions, where either these functions are known to have evolved more than once, or where the evolutionary origins of the recruited RNAs can be discerned. For instance, BC1 and BC200 are RNAs with similar functions, the former having been identified in rodents (Muslimov et al., 1998), the latter being found in primates (Skryabin et al., 1998). Both appear to have a role in translation regulation in dendrites, and both apparently bind the same protein (Kremerskothen et al., 1998; Brosius, 1999). While convergence of function has yet to be conclusively demonstrated, their evolutionary origins are clear; BC1 appears to have been recruited from tRNA(Ala), while BC200 was originally an Alu element, a type of transposable element derived from eukaryotic srpRNA (Brosius, 1999).
Given that searches have so far not yielded other such functionally analogous RNAs within mammals, yet the proteins known to make up the BC1/BC200 RNPs are conserved (Brosius, 1999), it will be interesting to see if there is evidence for non-orthologous replacement by one/both RNAs. Is RNA inherently better suited to certain functions, being selected for over and over again for the same class of function? To this question we shall return.

RNAs in dosage compensation.
An even more dramatic example of functional convergence is emerging in studies of dosage compensation. In organisms with sex chromosomes, the number of sex chromosomes is unequal between the sexes. In Drosophila and mammals, males are XY, and females are XX. The unequal number of Xs means that gene expression from the X differs between the sexes, and there are mechanisms which compensate for this. In Drosophila, dosage is turned up in males, making expression from their single X equivalent to the two X chromosomes in females. In mammals, one X is inactivated in females, so expression is halved, making it equivalent to the single X carried by males. Furthermore, C. elegans takes a third strategy; expression from both copies of the X in hermaphrodites is halved relative to males (which are XO). Given multiple solutions to this problem, it is clear that mechanisms for dosage compensation have evolved more than once (Pannuti & Lucchesi, 2000; Marin et al., 2000). In mammals and flies not only are the mechanisms of dosage compensation unrelated, they both make use of RNA for marking the X for either inactivation or upregulation, respectively (Kelley & Kuroda, 2000).
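
The arithmetic behind the three dosage compensation strategies described above can be made explicit with a short worked example. This is an illustrative sketch only: expression is given in arbitrary relative units, with one unit per uncompensated X copy assumed, and the dictionary layout and function name are hypothetical.

```python
# Each entry: (number of X copies, copies inactivated, per-copy expression rate).
# Rates are relative to one uncompensated X copy (an assumed unit of 1.0).
STRATEGIES = {
    "Drosophila": {
        "female": (2, 0, 1.0),
        "male": (1, 0, 2.0),           # single male X is upregulated
    },
    "mammals": {
        "female": (2, 1, 1.0),         # one female X is inactivated
        "male": (1, 0, 1.0),
    },
    "C. elegans": {
        "hermaphrodite": (2, 0, 0.5),  # both X copies are halved
        "male": (1, 0, 1.0),
    },
}


def x_linked_output(x_copies, inactivated, per_copy_rate):
    """Total X-linked expression from the active X copies."""
    return (x_copies - inactivated) * per_copy_rate


if __name__ == "__main__":
    for organism, sexes in STRATEGIES.items():
        outputs = {sex: x_linked_output(*spec) for sex, spec in sexes.items()}
        print(organism, outputs)
```

Under these assumptions each strategy equalises X-linked output between the sexes, even though the route taken differs in each lineage.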
The RNAs (roX1 & roX2 in Drosophila, and Xist, which is regulated by an antisense RNA, Tsix, in human) are unrelated, yet provide an analogous function: in both systems, RNA is thought to facilitate interaction at numerous points along the length of the target X chromosome, and the RNA genes are themselves to be found on the X chromosome. Importantly, the systems must operate via different mechanisms; in mammals, only one female X is inactivated, and it is therefore unsurprising to find that the mode of inactivation is via some mechanism that occurs exclusively in cis. In flies, there is no such requirement, as might be expected, given that dosage compensation is through upregulation of the single male X. While it is still unclear how RNA is involved in these systems, it is intriguing that RNA has apparently been independently recruited to an analogous function on separate occasions. How does dosage compensation in C. elegans operate? Does this likewise require RNA, and indeed, in other organisms such as birds and reptiles, where sex chromosomes are different again, is dosage compensation also an RNA-dependent process?

Unclear origins of tmRNA
In bacteria, it is well established that release from ribosomal stalling on damaged mRNA is an RNA-mediated process. tmRNA, so called because of its dual role as tRNA and mRNA, allows a stalled ribosome to be uncoupled from the mRNA upon which it is stalled by virtue of the tRNA moiety of tmRNA, which is charged with alanine. The tRNA moiety accesses the A site of the ribosome and the alanine with which it is charged is then added to the partially-synthesised peptide. Next, the ribosome switches template by virtue of a conformational change in the tmRNA, and the ribosome uses the tmRNA as a template. The tmRNA encodes a string of alanines, of length 10, that labels the damaged peptide for degradation, and the ribosome is released (Keiler et al., 1996).
So far, this process has only been identified in bacteria, where it appears ubiquitous (Keiler et al., 1999). Given the dual role of the tmRNA as both tRNA and mRNA, it might be considered a candidate for the RNA world. Indeed Maizels and Weiner (1999) have speculated that such an RNA could have been the RNA world counterpart of initiator tRNA in contemporary translation. However, it is equally likely that this is a recent innovation (i.e. post-LUCA) specific to the bacterial lineage. In eukaryotes, only mRNAs that possess a 5' cap structure and poly A tail pass a prerequisite quality control check before translation (Ibba & Soll, 1999). Damaged mRNAs are degraded via a nonsense-mediated decay pathway (Culbertson, 1999), reducing the production of truncated proteins during translation. There is clearly selection for release of stalled ribosomes and tagging of damaged peptide for protein degradation in a sophisticated protein synthetic machinery, and a scenario for RNA world origins such as that suggested by Maizels and Weiner (1999) is difficult to test. What will be tractable is extending the search for tmRNA to eukaryotes and archaea. Indeed, even with quality control in eukaryote translation, mRNA may occasionally be damaged during translation, so it is possible that eukaryotes possess tmRNA. A more extensive search will thus aid in establishing whether tmRNA may have been a feature of the LUCA. Certainly, given the ubiquity of the cellular protein degradation apparatus, the proteasome (Baumeister et al., 1998; Bouzat et al., 2000), and the fact that search strategies for tmRNA identification have not yet been fully applied to eukaryotes and archaea, it will be interesting to see if stalled ribosome release occurs via a similar mechanism in these lineages.

Many naturally-occurring catalytic RNAs are not RNA world relics.
As we have already seen, not all criteria need necessarily apply for an RNA to be designated a relic, and for all but the first, the application of the criterion may not in itself provide sufficient information for the status of relic to be assigned. Criterion 4 is whether or not an RNA is catalytic. The RNA world hypothesis states that RNA catalysts pre-dated proteins in the evolution of catalysis, and the idea has been extended to a two-step transition, RNA→RNP→protein, that more accurately explains the process by which an RNA is replaced by a catalytic protein, and identifies catalytic perfection as central to understanding why any ribozymes remain at all (Jeffares et al., 1998; Poole et al., 1999).

The term catalytic RNA is most often used in a chemical sense, that is, a naked RNA that is capable of catalysis without cognate proteins. This definition excludes the peptidyl transferase activity of large subunit ribosomal RNA, eukaryotic RNase P, and spliceosomal snRNA. All are nevertheless putative RNA world relics, and in all cases, the RNA component is absolutely required for catalysis (Noller et al., 1992; Muth et al., 2000; Nissen et al., 2000; Kirsebom & Altman, 1999; Yean et al., 2000; Nilsen, 2000).

Surprisingly, the sole case where a catalytic RNA (in the chemical sense of being necessary and sufficient to carry out catalysis) can unequivocally be placed in the RNA world is that of RNase P. This has been found in all organisms examined to date, and is universally required for tRNA maturation. Bacterial RNase P has several additional substrates, including srpRNA (4.5S RNA) and tmRNA precursors (Kirsebom & Altman, 1999), and hence can be claimed under criteria 1 and 2 also. The related RNase MRP, which is involved in pre-rRNA processing in eukaryotes, is more limited in distribution, and its evolutionary origins are less clear.
In considering a possible RNA world origin for RNase MRP, perhaps the most important piece of evidence is the position at which RNase MRP cleaves pre-rRNA in eukaryotes (Morrissey & Tollervey, 1995; Venema & Tollervey, 2000): the A3 site in eukaryotic pre-rRNA is at an equivalent position to a tRNA found in archaeal and bacterial pre-rRNAs, and Morrissey and Tollervey (1995) have argued that the tRNA has been lost from the eukaryote pre-rRNA, while cleavage at this site has been maintained. Furthermore, that RNase P is ubiquitous while RNase MRP has only been found in eukaryotes suggests that MRP is derived from P by duplication and divergence, and bolsters the claim that the original state was tRNA processing from within pre-rRNA. While MRP may post-date the LUCA, its function in pre-rRNA processing is effectively one and the same as that of P in prokaryotic pre-rRNA processing.

As far as the additional substrates of bacterial RNase P are concerned, it is currently hard to establish the antiquity of these. While srpRNA is ubiquitous, the eukaryotic and archaeal versions (7S RNAs) are not known to be processed by RNase P, and tmRNA is only known in bacteria, and, as described above, its status as an RNA world relic is uncertain. Certainly there is a precedent for post-RNA world functional diversification, as E. coli RNase P is also known to process phage RNAs and the polycistronic his operon mRNA (Altman & Kirsebom, 1999).

Another example which may clarify the discussion is the finding that there are two spliceosomes in metazoans (Tarn & Steitz, 1997; Burge et al., 1999). Both have the same origin, but the minor variant arguably arose more recently, through duplication and divergence. The function of both is identical (both excise introns from pre-mRNA, though the class of introns recognised is different), but one probably has a more recent origin (Burge et al.
, 1999) so in the strictest sense is not a relic, even though splicing in general arguably originated in the RNA world (see next section). In the case of RNases P and MRP, a more recent duplication and divergence event for these is possible, assuming RNase P carried out both functions initially (Morrissey & Tollervey, 1995). These examples serve to point out that in some cases, it may be difficult to separate the ultimate origin from the proximate origin. This is similar to the problem of trying to establish the ultimate origin of a family of proteins which carry out a range of functions. Where the function of an RNA has remained essentially unchanged since the RNA world, it is possible to identify the ultimate origin. In the case of MRP, the function it carries out is arguably ancient, but the origin of MRP itself cannot be unequivocally linked with this function; hence, it is unclear whether it should be assigned relic status. Morrissey and Tollervey's (1995) model best fits the data, though other scenarios can be envisaged (Collins et al., 2000).

Other naturally-occurring ribozymes, including the hammerhead, hairpin, hepatitis delta virus and Neurospora VS ribozymes (Table; Symons, 1997; Carola & Eckstein, 1999) are examples of recently-evolved catalytic RNAs, since these are used in novel strategies for viral or plasmid (Neurospora VS ribozyme and salamander hammerhead-like RNA) genome replication. It has been argued recently that all these ribozymes have a common origin (Harris & Elder, 2000), but even if this is the case, this does not require that they originated in the RNA world. That said, these ribozymes demonstrate a potential mechanism for genome replication, as well as contributing to the reconstruction of a putative RNA world. The HDV ribozyme is a particularly salient example, since it has been shown to carry out self-cleavage through general acid-base catalysis (Perrotta et al., 1999; Nakano et al.
, 2000), as opposed to metal ion catalysis (Westhof, 1999). Likewise, the hairpin ribozyme may also make use of general acid-base catalysis (Rupert & Ferré-D'Amaré, 2001), and excitingly, this is also the case for the peptidyl transferase subunit of the ribosome (Muth et al., 2000). The similarity to the catalytic reaction carried out by peptidyl transferase certainly establishes the relevance of these viral RNAs to catalysis in the RNA world, but also raises the point that ribozymes could have arisen multiple times in evolution with similar chemistry.

mRNA splicing and self-splicing introns.

A less clear case is presented by the group I and II self-splicing introns (Table). Broadly, the phylogenetic distribution of these two ribozymes is bacteria and eukaryotic organelles (see Figure 4 in Lykke-Andersen et al., 1997; Cech & Golden, 1999). Group I introns make use of the 3'-OH of free guanosine as nucleophile in the first step of splicing, while in group II introns the nucleophile is provided in cis, and consequently this is a 2'-OH group. Splicing in both cases is via a two-step transesterification. The spliceosome, a large ribonucleoprotein complex responsible for splicing introns out of eukaryotic nuclear pre-mRNA, also makes use of an internal 2'-OH for the first transesterification. At the core of the spliceosome are 5 small nuclear RNAs (snRNAs): U1, U2, U4, U5 and U6. A common origin of group II introns and the spliceosome has been suggested by numerous authors (e.g. Sharp, 1985, 1991, 1994; Cech, 1986; Copertino & Hallick, 1993; Stoltzfus, 1999). This possibility revolves around the idea that a group II intron evolved into a 5-piece RNA complex. This idea is gaining ground, with similarities in the chemical mechanism of cleavage, structurally analogous regions and ligation by a two-step transesterification (Sharp, 1985; Cech, 1986; Chanfreau & Jacquier, 1994; Sontheimer et al., 1999; Gordon et al., 2000; Boudvillain et al., 2000; Yean et al.
2000). Strikingly, Hetzer et al. (1997) removed the ID3 subdomain of a group II intron, which reduced exon anchoring during ligation, and were able to reconstitute this by supplying U5 snRNA in trans. In addition to the direct comparisons between canonical group II and spliceosomal splicing, the feasibility of a common origin has been given support from a number of sources. Formation of group II intron structure from three separate transcripts has been observed in Chlamydomonas reinhardtii chloroplasts (Goldschmidt-Clermont et al., 1991), demonstrating that trans-splicing can arise from cis-splicing, and that the proposal of fragmentation of a single functional RNA (as envisaged for the evolution of the spliceosome) is not without precedent. Group III introns, degenerate group II introns found as 'twintrons' (an intron within an intron) in Euglena chloroplast DNA, lack much of the canonical structure of group II introns, and probably require additional functions in trans for splicing (Copertino & Hallick, 1993). Again, this has been considered as support for the possibility that the five snRNAs could have arisen from a single precursor. Moreover, Copertino et al. (1994) have described a group III twintron which excises via a lariat intermediate, analogous to the formation of a lariat in the excised spliceosomal introns.

With so much circumstantial evidence, it seems likely that the spliceosomal RNAs and group II introns have a common origin. However, such similarities may either reflect a common ancestry or be a result of convergence owing to 'chemical determinism' (Weiner, 1993). Given that splicing always begins by nucleophilic attack on the phosphate-sugar backbone by a hydroxyl group on ribose, the different strategies used by group I and II introns (3'-OH of GTP supplied in trans versus 2'-OH of adenosine supplied in cis) might be the only two possible ways of initiating this reaction.
That the spliceosome makes use of the same mechanism as group II introns could therefore be a consequence of 'chemical determinism' (and therefore convergence), not common origin (Weiner, 1993). Indeed, in all three cases, splicing is carried out through two transesterifications. Chemical similarities and functional parallels provide an inroad into understanding the evolution of splicing, but given Weiner's (1993) point, they are not particularly informative in terms of distinguishing between convergence and divergence. Structural studies may help shed light on this question, in much the same way as they have resolved the question of whether the different classes of ribonucleotide reductase are convergent or divergent (Logan et al., 1999). If it is nevertheless concluded that the similarities between group II introns and pre-mRNA splicing are sufficient to rule out convergence (that there are several examples of alternative cleavage reactions available to RNA [see Westhof, 1999] in addition to those in group I and II introns might suggest this), how is the direction of evolution established? It is as conceivable that group II introns are derived from the snRNAs through fusion and reductive evolution as that snRNAs evolved from a group II intron.

In examining the evolutionary origins of splicing, there are two major questions:
• Does splicing date back to the RNA world?
• Did group II introns give rise to the snRNAs of the eukaryotic spliceosome, or vice versa?

The short answer to the first question is that an RNA world origin for splicing is likely, but the argument is over whether such splicing was group II-like, spliceosome-like, or both. In addressing the second question, it is assumed that group II and pre-mRNA splicing are related by descent. We begin with an overview of the first question, specifically with respect to the intron-exon structure of eukaryotic nuclear genes, since this has been the source of greatest controversy.
Eukaryotic pre-mRNA splicing has been argued to be an ancient process from which protein diversification by exon shuffling could have subsequently arisen (see Gilbert, 1978; Doolittle, 1978; Blake, 1978). It was argued that through the presence of splicing, discrete protein modules could have been mixed and matched, producing protein diversity from functional building blocks by 'exon shuffling'. Indeed, shuffling is seen to some extent, in the form of processes such as alternative splicing, where an mRNA can be spliced in different ways to yield different products (reviewed by Graveley, 2001). The implication of the 'introns-early' hypothesis for the origin of introns is that the eukaryote splicing apparatus and the intron-exon structure of genes arose very early in evolution, and were subsequently lost from prokaryote genomes.

This explanation, while potentially explaining a role for splicing in protein diversification through exon shuffling, runs into two problems. First, it does not actually explain intron origins; rather, it explains only a possible role for introns in exon shuffling, after the advent of an intron-exon gene structure. Exon shuffling as an explanation for the origin of the intron-exon structure of genes implies that introns arose in order to shuffle exons. That is, it implies evolutionary forethought (Blake, 1978; Doolittle, 1978). A consequence of the origin of introns might be exon shuffling, but that separates the origin of introns from the emergence of exon shuffling. Second, the specific prediction of exon shuffling is that in at least some cases, the intron-exon structure of a gene should reflect the existence of discrete functional protein modules. Overall, the data are not strong, and even if there are cases of ancient exon shuffling, it may not be possible to detect these if intron sliding (for which there is no support [Stoltzfus et al., 1997]) is permitted (Rzhetsky et al., 1997).
Indeed, the data accumulated to date (see Logsdon, 1998; Wolf et al., 2000) are most compatible with the alternative theory, 'introns-late', that the 5 snRNAs of the spliceosome arose from group II introns which originated in the bacterial lineage as selfish elements, and that introns represent insertion of selfish genetic elements. Under 'introns-late', group II introns entered the eukaryote genome via the mitochondrion (members of the α-proteobacteria, which, among extant bacteria, share the most recent common ancestor with mitochondria, have been shown to possess group II introns), and this is known as the 'mitochondrial seed' hypothesis (Cavalier-Smith, 1991; Logsdon, 1998).

Importantly, phylogenetic evidence suggests that all extant amitochondrial eukaryotes once possessed mitochondria (or hydrogenosomes, which share a common origin with mitochondria; see Embley & Hirt, 1998; Rotte et al., 2000). This can be taken as evidence to support the scenario described by Logsdon (1998), since all modern eukaryotes arose from an ancestral cell which harboured an endosymbiont. Hence the advent of splicing specifically in eukaryotes could be explained by endosymbiont to host transfer of a group II intron (this direction of transfer is well supported by independent evidence [Blanchard & Lynch, 2000]), followed by complexification to form the modern spliceosome.

Introns are in fact found in all three domains. Archaeal introns are not self-splicing, but are positionally conserved with eukaryotic tRNA introns, and both make use of a conserved LAGLIDADG endoribonuclease in the cleavage and ligation reaction (Lykke-Andersen et al., 1997; Trotta & Abelson, 1999). Group I introns are found in bacteria and both the eukaryote nucleus and organelles (Lykke-Andersen et al., 1997; Cech & Golden, 1999), while group II introns are found in bacteria and eukaryote organelles (mitochondria and chloroplasts) (Logsdon, 1998).
However, it is hard to argue for a common origin for the three types of intron (group I, group II/spliceosomal, tRNA), so on phylogenetic grounds, introns may have arisen more than once, and do not clearly date back to the RNA world. A common origin is not impossible, just not readily testable, given current data.

While many consider the introns early-late debate to be largely over, there are nevertheless shortcomings in the introns-late scenario. Furthermore, alternatives exist to exon shuffling as an explanation for the origin of introns and the spliceosomal RNAs in the RNA world. While there are continued arguments for the validity of exon shuffling (de Souza et al., 1998), we think the evidence does not favour this scenario (see Logsdon, 1998). That modern eukaryotes are all likely to have descended from a mitochondrion-bearing ancestor adds weight to the suggestion that the spliceosome arose specifically within that lineage subsequent to transfer of mitochondrial group II introns to the nucleus². However, a serious problem for this account is that, because the model does not involve a selective advantage for the emergence of splicing, it is hard to understand how a group II intron became fragmented into five pieces and associated with a large number of conserved proteins. There is nothing wrong with not invoking a selective pressure in the evolution of complex structures. As described above, this has provided valuable insight into the evolution of kinetoplastid editing.

² For simplicity, we imply the host was a eukaryote with a nucleus, and the endosymbiont was a mitochondrion. The nature of the endosymbiont and host are currently the subject of intense debate (Andersson & Kurland, 1999; Rotte et al., 2000), but we note that on current data, it is simplest to describe the endosymbiont as mitochondrial, since it is in these organelles that group II introns have been identified (Logsdon, 1998).
An additional problem with this scenario is that it relies on inference. It cannot be directly tested using phylogenetic analyses in the same way as other mitochondrion to nucleus transfers (reviewed in Embley & Hirt, 1998; Philippe et al., 2000). This is because both the sequence and the structure of group II introns and spliceosomal RNAs are too divergent for either to be used for phylogenetic reconstruction of their histories. Assuming group II and spliceosomal RNAs have a common origin, it is not possible to distinguish between a common origin in LUCA and transfer from mitochondrion to nucleus on the current dataset (Figure 1).

The model advocated by Logsdon (1998) requires transfer of non-fragmented group II introns to the nucleus (no examples of fragmented group II introns in mitochondria have been described), where these then insert into the host DNA and excise during mRNA expression. Then, over time, the mechanism shifts from cis splicing to trans splicing by a complex of 5 RNAs. The first point is uncontroversial given that group II intron mobility is known (though no examples of nuclear group II introns are known) to be mediated via an intron-encoded reverse transcriptase (Lambowitz et al., 1999). The second is harder to explain. The fragmentation process was either extremely fast, predating divergence of the major eukaryote lineages, or there was selection for the modern spliceosome over other versions, or, least likely, the modern 5-piece spliceosome was fixed through drift. No suggestions have been made regarding the second two possibilities, and the third is becoming more problematic since the previous consensus on eukaryote phylogenetics based on rRNA phylogeny (Sogin, 1991) has been challenged by the finding that microsporidia are not deep-diverging eukaryotes as per the rRNA trees, but rather are a sister group of fungi (reviewed in Keeling & McFadden, 1998).
The emergence of the modern splicing apparatus must predate the diversification of eukaryotes, but is also constrained by the endosymbiosis event. In the absence of apparent selection for the origin of the spliceosome under introns-late (Stoltzfus, 1999), there ought to be spliceosomes intermediate to the 5-piece spliceosome.

A further point is that chromosome (Backert et al., 1997; Watanabe et al., 1999; Zhang et al., 1999), gene (Estevez & Simpson, 1999) and RNA gene (Keiler et al., 2000) fragmentation are all found in mitochondria and chloroplasts. A similar architecture is seen in RNA viruses, and this has been argued to be a means of slowing the accumulation of slightly deleterious mutations arising via Muller's Ratchet (Reanney, 1986). Hence, while fragmentation might be a predicted consequence of an organellar location for group II introns (no fragmented introns have been documented in free-living bacteria), it is not expected for genes located in the nucleus, given that the ratchet does not operate at the same levels as in organellar genomes (Blanchard & Lynch, 2000).

Currently there is limited information on the nature of splicing in protists. Spliceosomal introns and all five snRNAs have been identified in Euglena gracilis (Breckenridge et al., 1999, and references therein), Trypanosoma brucei and T. cruzi (Mair et al., 2000, and references therein). The Giardia lamblia genome project (McArthur et al., 2000) is underway, and it will be interesting to see whether splicing occurs and whether snRNAs are present. Given the Trypanosoma and Euglena examples, it would be a surprise to find any protists without 5 snRNAs (unless only trans-splicing is present, in which case U1 may be expected to be absent; see Breckenridge et al., 1999; Mair et al., 2000). This suggests it is at least feasible that, prior to the endosymbiosis event that gave rise to the mitochondrion, proto-eukaryotes possessed splicing.
Insertion of 'selfish' elements into genomes also deserves consideration. Insertion is not a widespread feature of prokaryotic genomes, while in eukaryotes it varies from almost none to extreme. In extant bacteria there is good evidence for loss of any sequence that is not under immediate selection, including periodically-selected functions (reviewed in Poole et al., 2001). In bacteria the rate of genome replication is likely to be limited by a single origin of replication, and with fast response times being crucial to proliferation upon detection of an energy source, there is strong selection for sequence loss in the absence of direct selection for the sequence. In general, eukaryotes do not compete via fast reaction times, though this may be more prevalent among 'simple' eukaryotes (see Poole et al., 2001). Without such competition, there is no inherent selective disadvantage to selfish element insertion if the only consequence is an increase in genome size. With these differences, it is clear that bacterial genomes have not simply remained in some 'primitive' status quo with eukaryotes having diversified through complexification. With a precedent for loss in bacteria, it is as likely that group II introns represent the remnants of eukaryotic mRNA splicing (surviving as selfish elements through intron mobility) as the standard view that splicing has complexified in eukaryotes. Equally, if group II introns did enter eukaryote nuclear genes via the mitochondrion, invasion and proliferation is expected.

In examining the case for the spliceosome and mRNA introns in the RNA world, there are two major questions. First, what role might splicing have played in an RNA world, and second, is there any evidence for an RNA world origin? As described above, the exon shuffling theory does not explain the origin of introns, nor is it well supported in specific and genome-wide analyses. Nevertheless, this does not preclude an RNA world origin for introns.
An RNA world origin is not incompatible with the majority of introns being inserted during eukaryote evolution, and it does not require that putatively ancient introns adhere to the exon shuffling theory.

Two roles for splicing in the RNA world have been suggested. First, splicing might have been a mechanism for recombination as a buffer against accumulation of deleterious mutation (Reanney, 1984; Darnell & Doolittle, 1986; Jeffares et al., 1998). Again, this role would be separate from the origin of an intron-exon structure.

Second, an explanation for the origin of splicing comes from examining the origin of chromosomes (Maynard Smith & Szathmary, 1993; Szathmary & Maynard Smith, 1993). At a very early stage in the evolution of the cell, genes would not have been maintained on chromosomes. The advantages of chromosomes are that, upon cell division, both daughter cells are guaranteed to receive a copy of all genes, and the spread of selfish genes that replicate faster than the other genes is limited (Maynard Smith & Szathmary, 1993). In the early RNA world, where gene and product were one and the same, the advent of the chromosome would have been a step toward the separation of phenotype and genotype. Either transcription would have to become separated from replication (see Maizels & Weiner, 1999), or the whole chromosome would be transcribed and subsequently cut up to produce functional products (that is, the chromosome and transcript are not distinguishable, unless all functional RNAs are on the same strand). Both these alternatives are likely, though the latter probably predated the former as a means of expressing RNA genes (Poole et al., 1998; 1999).

The emergence of physical linkage of genes on chromosomes in an RNA world provides a selection for splicing in the RNA world but does not explain the origins of the intron-exon structure of genes, nor whether group II introns predate the spliceosome.
The emergence of an intron-exon structure may simply have been a consequence of the absence of selection against the emergence of linker regions as a result of low replication fidelity. The presence of additional nucleotides at the 5' and/or 3' end might not have affected function appreciably, though there is no inherent reason for splicing to have been an inaccurate process. If it did cleave at specific sites, insertions between RNA genes resulting from low copying fidelity would not be selectively disadvantageous.

There is however a strong argument that splicing from a transcript/chromosome could not have been carried out by group II introns in the RNA world. Consider a chromosome with 5 RNA genes on it, and with group II introns between the RNA genes. Upon self-splicing of the group II introns out of the transcript copy, the 5 genes would still be unprocessed; only the group II introns will have been released from the transcript. Gilbert and de Souza (1999) have suggested that group II introns interrupted RNA genes, with splicing yielding a functional RNA. They also suggest that, with recombination, this architecture would enable RNA domain shuffling; that is, exon shuffling for RNA instead of proteins. There are examples of RNAs with introns (e.g. U3 snoRNA, U5?), but it is not possible to establish whether these date back to the RNA world, or represent recent insertions. More problematically, the scenario proposed by Gilbert and de Souza (1999) requires a one gene, one chromosome model, with group II introns fulfilling a solely 'selfish' role. 'Selfish' elements are likely to be an emergent feature of any replicative system. However, for chromosomes to evolve, splicing in trans is required in order to express functional RNAs from a precursor transcript. Group II introns would not have provided this function, since they self-excise then splice together the two exons!
Furthermore, the propensity for self-splicing introns to insert into a sequence is not a property of the RNA, but of the associated proteins (Lambowitz et al., 1999). Without a mechanism for insertion, there would be a tendency for 'selfish' self-splicing introns to be lost, since the processed chromosome would function equally well without these. In fact, without insertion, it is difficult to see how these introns could be parasitic on early RNA genomes. Hence, self-splicing introns, if they date back to the RNA world, would have had to insert themselves as well as excise themselves. Given that modern group I and II introns only do the latter, it is as likely that these post-date the RNA world, arising subsequent to DNA endoribonucleases and reverse transcriptases and associated factors required for insertion (Lambowitz et al., 1999). If tRNA introns date back to the RNA world, they have lost both splicing and insertional functions (Trotta & Abelson, 1999).

For expression of several functional RNAs from a single transcript RNA/chromosome (and assuming that these functional RNAs were not all self-splicing), what is needed is the reverse of modern-day splicing (where the junk is cut out and the coding regions are spliced together). That is, in an RNA world, modern-day introns would have been the coding genes, and modern-day exons would have been the junk (Figure 2). The brief description of the origin of chromosomes given above is not a new one, but the finding of the exact same structure in modern genomes has rekindled the argument that the intron-exon structure of genes dates back to the RNA world (Poole et al., 1998, 1999). Several eukaryotic genes are now known where the introns code for functional RNAs (small nucleolar RNAs, snoRNAs), the exons being non-coding (Tycowski et al., 1996a; Bortolin & Kiss, 1998; Pelczar & Filipowicz, 1998; Smith & Steitz, 1998).
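The difference between the two processing routes discussed above can be made concrete with a toy model. The following Python sketch is purely illustrative: the transcript string, the convention that uppercase runs stand for functional RNA genes and lowercase runs for spacers, and the function names are all assumptions introduced here, not part of any published model. It contrasts self-excision of the spacers (group II-like, leaving the genes fused) with excision of the genes themselves (leaving the spacers ligated).

```python
from itertools import groupby

def runs(transcript):
    """Split a toy transcript into (is_gene, sequence) runs.

    Assumed convention: uppercase = functional RNA gene, lowercase = spacer.
    """
    return [(is_gene, "".join(chars))
            for is_gene, chars in groupby(transcript, key=str.isupper)]

def self_splicing_spacers(transcript):
    """Group II-like route: spacers excise themselves, flanking genes are ligated."""
    parts = runs(transcript)
    released = [seq for is_gene, seq in parts if not is_gene]
    ligated = "".join(seq for is_gene, seq in parts if is_gene)
    return released, ligated

def excise_genes(transcript):
    """Reverse route: genes are cut out, the leftover spacers are ligated."""
    parts = runs(transcript)
    released = [seq for is_gene, seq in parts if is_gene]
    ligated = "".join(seq for is_gene, seq in parts if not is_gene)
    return released, ligated

# Hypothetical transcript with three RNA genes separated by two spacers.
transcript = "GENEAxxxGENEByyyGENEC"
print(self_splicing_spacers(transcript))  # spacers released, genes remain fused
print(excise_genes(transcript))           # genes released, spacers joined
```

In the first route only the spacers are freed and the genes stay joined in a single unprocessed molecule; in the second, each gene is released individually, which is the outcome needed to express several RNAs from one transcript.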
In snoRNA expression in these genes, the snoRNA-containing introns are spliced out and the noncoding exons are spliced together. Gene expression from chromosomes would have been identical in the RNA world (Figure 2).

Excitingly, the production of a junk RNA from a series of non-coding exons could also solve the problem of where mRNA came from (Poole et al., 1999). In a tightly-packed genome of RNA genes, there would have been no raw material for the ribosome to act upon. However, if RNAs were excised from precursor transcripts, with the junk being spliced together, this could have provided the raw material from which protein genes arose (Figure 2). Under this model, there would be no correlation between exons and protein modules, since the proto-exons would have been continuous structures, not modular as per the exon shuffling theory.

A good number of snoRNAs are intron-encoded, with almost all vertebrate snoRNAs being intronic, and moreover, these are found in genes for ribosomal and nucleolar proteins (Weinstein & Steitz, 1999). The latter group are of particular interest, since models for the origin of protein synthesis involve a positive feedback loop: proteins stabilise and increase the accuracy of the ribosome, which makes proteins more accurately, and these further enhance the accuracy of the ribosome (see Poole et al., 1999, and references therein). It has been variously argued that this is an ancient system (Poole et al., 1998; 1999), and that snoRNAs arose recently by diversification (Morrissey & Tollervey, 1995; Lafontaine & Tollervey, 1998). Many snoRNAs have now been identified, and almost all are involved in rRNA processing, being essential for 2'-O-ribose methylations, pseudouridylations or precursor rRNA cleavage (reviewed by Weinstein & Steitz, 1999). Pre-rRNA processing can certainly be argued to be central to metabolism since it is processing of a ubiquitous RNA, as with processing of tRNA by RNase P.
Nevertheless, establishing the antiquity of snoRNAs is not straightforward. Both hypotheses have their merits, and are not necessarily incompatible in all respects (Poole et al., 2000). We shall consider this debate further, and try to establish an approach that could resolve the issue.

snoRNAs

SnoRNAs are involved in extensive processing of eukaryotic rRNA (Smith & Steitz, 1997; Weinstein & Steitz, 1999), and some process spliceosomal RNAs (Tycowski et al., 1998; Jady & Kiss, 2001). Two families have been characterised, C/D and H/ACA, on the basis of sequence elements. The C/D family guides 2'-O-methylation of ribose, and in yeast 51 of 55 rRNA methylations have been shown to be snoRNA-guided (Lowe & Eddy, 1999). The H/ACA family snoRNAs guide isomerisation of uridine to form pseudouridine. In yeast, based on the number of pseudouridylations of rRNA (Ofengand & Fournier, 1998), the number of H/ACA snoRNAs is predicted to be comparable to the number of C/D snoRNAs. In humans, this number is expected to be near 100 for each family, again on the basis of the number of modifications made to the rRNA (Smith & Steitz, 1997). Members of each class are also involved in cleavage of pre-rRNA during rRNA maturation (reviewed in Smith & Steitz, 1997). Recently, a 'chimeric' snoRNA, which guides both pseudouridylation and methylation on snRNA U5, has been characterised (Jady & Kiss, 2001). However, with the exception of this snoRNA, all other snoRNAs fall neatly into the two families, C/D and H/ACA.

The distribution of snoRNAs varies across the three domains. Eukaryotes contain both C/D and H/ACA family snoRNAs, involved in 2'-O-methylation and pseudouridylation, and representatives of both families participate in pre-rRNA cleavage (reviewed in Morrissey & Tollervey, 1995; Smith & Steitz, 1997; Lafontaine & Tollervey, 1998; Smith & Steitz, 1999).
Bacteria are not expected to possess snoRNA-like RNAs, having a limited number of 2'-O-methylations and pseudouridylations, all of which are produced by protein enzymes in bacteria studied to date (Bachellerie & Cavaillé, 1998; Ofengand & Fournier, 1998). Cleavage of pre-rRNA in bacteria is likewise carried out by proteins (Morrissey & Tollervey, 1995).

A more complex picture has emerged in archaea. Both the crenarchaea and euryarchaea possess extensive 2'-O-methylation of rRNA, guided by a family of small RNAs homologous to eukaryotic C/D snoRNAs (Gaspin et al., 2000; Omer et al., 2000). However, the number of pseudouridylations in archaeal rRNA is low, as in bacteria (Lafontaine & Tollervey, 1998). No homologues of H/ACA snoRNA-associated proteins have been identified, suggesting that the pseudouridylation apparatus may be protein-mediated as in bacteria (Lafontaine & Tollervey, 1998; Charette & Gray, 2000). Less is known about the pre-rRNA processing events involving cleavage in archaea. Evidence to date suggests this aspect of pre-rRNA processing does not involve snoRNA-like RNAs, but one or more novel endonucleases (Russell et al., 1999). However, an in-cis snoRNA U3-like function (U3 functions in pre-rRNA cleavage in eukaryotes [see Smith & Steitz, 1997]) for sequences within the 5' external transcribed spacer (5'ETS) of pre-rRNA has been suggested for both archaea and bacteria (Dennis et al., 1997), and homologues of the snoRNA U3-associated protein IMP4 have been identified in archaea (Mayer et al., 2001). If snoRNA-mediated cleavage of pre-rRNA is not demonstrated in archaea, the existence of proteins homologous to the eukaryotic snoRNP-based processing system, and the existence of C/D family homologues for pre-rRNA 2'-O-methylation, might be best interpreted as loss from archaea, especially given that some of the eukaryotic snoRNAs involved in cleavage are C/D family members. Furthermore, if Dennis et al.
(1997) are correct in their suggestion of an in-cis U3-like function for the 5'ETS, this may suggest that the snoRNA system for cleavage is, in some form, ancestral, as suggested by Jeffares et al. (1998). With the paucity of information currently available for archaeal pre-rRNA cleavage events, it is not possible to establish whether it is more like the eukaryote or bacterial pathway, or indeed, whether it is unique to the archaeal domain.

We have previously argued that both families of snoRNAs date back to the RNA world (Jeffares et al., 1998; Poole et al., 1999), while Tollervey and colleagues have argued for more recent origins, with the C/D family arising in the ancestor of eukaryotes and archaea and the H/ACA family perhaps arising in the eukaryotes, after divergence from the two prokaryotic lineages. Which scenario is correct, and how does one establish this? There are several aspects to the snoRNA problem:
• Consideration of the phylogenetic distribution of C/D and H/ACA family snoRNAs, as outlined above.
• Problems with the rooting of the tree of life, and how this may influence conclusions.
• Selection.
• That an RNA world origin for snoRNAs does not preclude recent diversification.

It is necessary to consider all aspects in any theory that attempts to account for the origin, evolution and modern distribution of snoRNAs. We shall review relevant aspects of the tree of life problem, and present a theory for the origin of snoRNAs that accounts for all the data.

Currently the interrelationships between the three domains are still in dispute, with the widely accepted monophyly of archaea and eukaryotes (Figure 3a; Iwabe et al., 1989; Gogarten et al., 1989; Woese et al., 1990) having been challenged in the light of new techniques, which suggest that the bacteria appear more divergent because of 'long-branch attraction' (Brinkmann & Philippe, 1999; Lopez et al.
, 1999), wherein a faster rate of evolution incorrectly groups the two slower-evolving groups (archaea and eukaryotes). Removing the 'long-branch attraction' artefact places the two prokaryotic groups together, with the root falling on the eukaryote branch (Figure 3b). The traditional tree suggests that the snoRNAs arose in the common ancestor of the archaea and eukaryotes, and may or may not have been present in the Last Universal Common Ancestor (LUCA), the latter point depending on whether the bacterial rRNA processing system is ancestral or derived (Figure 3a). The newly-proposed tree places the snoRNAs in the LUCA (assuming the distribution is not a result of horizontal transfer), as they are represented in both major branches of the tree, so parsimony can be applied to argue that bacteria almost certainly lost these (Figure 3b).

Since the position of the root of the tree of life is not known with any certainty, it is difficult to establish the origin of a feature based on its distribution across the three domains. Even if the root is established, it is difficult to use this information to establish the nature of the LUCA. A feature found on both sides of the root can be argued to be present in the LUCA, assuming no horizontal transfer or convergent evolution. A feature which is present in only one lineage, e.g. H/ACA snoRNAs in eukaryotes, must be treated slightly differently however. Multiple losses are far more likely than multiple gains (as exemplified by multiple independent losses of primary synthetic pathways in parasitic and endosymbiotic bacteria [Andersson & Andersson, 1999]). Hence, if H/ACA snoRNAs are not found in archaea or bacteria, this does not rule out the possibility that they were a feature of the LUCA (Forterre, 1997; Penny & Poole, 1999). As the tree describes the relationships between three monophyletic lineages, any argument from parsimony should be treated with caution.
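How strongly the inferred state of the LUCA depends on the rooting can be illustrated with a small calculation. The Python sketch below applies the bottom-up pass of Fitch parsimony to a presence/absence character on the two rooted three-domain trees discussed above. The nested-tuple tree encoding, the variable names and the 1/0 coding are assumptions made for illustration, and gains and losses are weighted equally, which is itself a simplification given the argument above that losses are more likely than gains.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree   : a leaf name (str) or a (left, right) tuple (assumed encoding)
    states : dict mapping leaf name -> 1 (present) or 0 (absent)
    Returns (candidate root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

# Bacterial rooting (Figure 3a) and eukaryote rooting (Figure 3b).
tree_3a = ("Bacteria", ("Archaea", "Eukaryotes"))
tree_3b = (("Bacteria", "Archaea"), "Eukaryotes")

# C/D family: present in archaea and eukaryotes. H/ACA: eukaryotes only.
cd = {"Bacteria": 0, "Archaea": 1, "Eukaryotes": 1}
haca = {"Bacteria": 0, "Archaea": 0, "Eukaryotes": 1}

for name, tree in (("3a", tree_3a), ("3b", tree_3b)):
    print(name, "C/D:", fitch(tree, cd), "H/ACA:", fitch(tree, haca))
```

Under these assumptions the C/D character is ambiguous at the root of the bacterial-rooted tree but present at the root of the eukaryote-rooted tree, while the H/ACA character is absent at the root under the bacterial rooting and ambiguous under the eukaryote rooting. The same data therefore give different answers depending on a root position that is not known with certainty.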
More importantly, even with horizontal transfer excluded (as far as we are aware, there is no evidence for horizontal transfer of snoRNAs or associated proteins), the uncertainty of the topology of the tree of life makes it uninformative (Forterre, 1997; Penny & Poole, 1999). The problems of using the tree in establishing the evolution of the snoRNAs call into question the robustness of Tollervey and colleagues' conclusions (Morrissey & Tollervey, 1995; Lafontaine & Tollervey, 1998) because their scenario for the origin of the snoRNAs is based on two assumptions: that the bacterial rooting of the tree of life is correct; and that the corollary of the placement of the bacterial lineage as the outgroup is that bacterial features are ancestral and those shared by archaea and eukaryotes are derived. It is currently unclear whether the bacterial rooting is the correct one, but placement of the root in the bacterial lineage does not imply that bacterial traits are ancestral, or that shared archaeal-eukaryote traits arose post-LUCA (Forterre, 1997). This latter point does not in itself invalidate the evolutionary scheme described in Tollervey and colleagues' papers, but it does cast doubt on it.

The case for snoRNAs as RNA relics.

As the tree of life cannot be used to establish the antiquity of snoRNAs, it is necessary to establish an alternative approach to examining the origin of snoRNAs. One way to do this is to establish whether there is a role for methylation and pseudouridylation in the RNA world. Both types of modification are ubiquitous, so can be argued to date back to the RNA world (Martínez Giménez et al., 1998; Cermakian & Cedergren, 1998). This suggestion is relatively uncontroversial since it is based on the ubiquity of these modifications, and on arguments for their utility prior to the emergence of protein synthesis.
Pseudouridylation might have originally been selected for the increased H-bonding that is possible compared with uridine (see Ofengand & Fournier, 1998; Charette & Gray, 2000). It might therefore be important in the specification of tertiary structure, or a folding pathway. 2'-O-methylation alters the 2'-OH moiety of ribose, and this could have two roles. First, this modification eliminates the reactivity of the 2'-OH, so 2'-O-methylated ribose cannot be involved in catalytic reactions. Second, the addition of a methyl group will restrict the potential for hydrogen bonding at that position. Hence, 2'-O-methylation would prevent cross-reactivity or unwanted self-cleavage and, furthermore, by influencing hydrogen bonding, might specify or favour a particular folding pathway (Bachellerie & Cavaillé, 1998; Poole et al., 2000). 2'-O-methylation is expected to be possible without protein, consistent with a possible RNA world origin for this modification (Poole et al., 2000), though it is less clear whether pseudouridylation could be catalysed by RNA. In both cases, this could be established through in vitro selection experiments. A final point is that cleavage reactions analogous to those in pre-rRNA processing are known for RNA, an example being that carried out by RNases P and MRP.

The theory proposed by Tollervey and colleagues (Morrissey & Tollervey, 1995; Lafontaine & Tollervey, 1998) would require that these modifications were present in the RNA world in limited numbers (or perhaps even absent altogether), with the snoRNA apparatus only arising post-LUCA. If this argument is accepted, an explanation must be given for the very limited use of these functional groups in the RNA world and the LUCA, with emergence of high levels of rRNA methylation in archaea, and both methylation and pseudouridylation in eukaryotes. It also must explain the utility of such rRNA modifications specifically in these two groups, and not bacteria.
The alternative is that modification of rRNA dates back to the RNA world, and that it was snoRNA mediated (Poole et al., 1998, 1999). Protein-RNA interactions subsequently replaced the role of such modifications in folding, and in silencing sites of potential catalytic activity (Poole et al., 2000). Detailed structural information of the bacterial ribosome is now available (Muth et al., 2000; Nissen et al., 2000; Yusupov et al., 2001), and eventually it may become possible, through comparative structures, to establish whether eukaryotic modifications serve an equivalent function to RNA-protein interactions.

If it is assumed that pseudouridylation and 2'-O-methylation date back to the RNA world, were relatively extensive, and that modification was either mediated or catalysed by snoRNA, an explanation for the complete absence of snoRNAs from bacteria, and of H/ACA snoRNAs from archaea, must also be given. The bacterial rooting of the tree of life, and the position of thermophiles at the base of both the archaeal and bacterial domains, has been taken as evidence to support a thermophilic LUCA (Woese, 1987). However, single-stranded RNA is unstable at high temperatures, and a strong counter-argument, explaining the reduction in RNA processing and in putative RNA relics in prokaryotes, is that either the ancestor of prokaryotes was a thermophile, or that thermophily arose twice (Forterre, 1995; Poole et al., 1998, 1999). In both scenarios, eukaryotes would never have undergone a period of adaptation to high temperatures, and the LUCA would have been a mesophile (Forterre, 1995; Poole et al., 1998, 1999). In addition to the expectation that RNA processing would be reduced during adaptation to high temperatures, circular chromosomes may also be an adaptation to high temperature, solving the problem of 'frayed ends' (Marguet & Forterre, 1994; Poole et al.
, 1999) and also supporting the argument that linear chromosomes and telomerase RNA are the ancestral state (Maizels & Weiner, 1999; Poole et al., 1999). Independent evidence that the LUCA was mesophilic comes from reconstruction of the ancestral GC content by comparing archaeal, bacterial and eukaryote genomes (Galtier et al., 1999). Even when mesophiles were removed from the dataset the conclusion reached was the same (Galtier et al., 1999). Finally, three independent reports have now suggested that traits contributing to hyperthermophily may have been subject to horizontal transfer (Aravind et al., 1998; Nelson et al., 1999; Forterre et al., 2000).

Neither scenario can readily explain the snoRNA data, however. In addition to the roles for 2'-O-methylation described above, it has also been shown that this type of modification serves to stabilise RNA, and that the extent of modification is positively correlated with growth temperature in thermophilic archaea (Noon et al., 1998). If the LUCA were a thermophile, there ought to have been selection for extensive methylation in all groups, yet single-stranded RNA should not be favoured since it is thermolabile (Forterre, 1995). SnoRNA-mediated 2'-O-methylation is found in archaea and eukaryotes, but not in bacteria, whereas a thermophilic common origin for all three domains would predict that all three would have extensive methylation, and, if anything, eukaryotes would be the strongest candidates to have lost these. Likewise, a thermophilic ancestor for prokaryotes does not readily explain the presence of extensive methylation in archaea, and near absence in bacteria. However, it can potentially explain the loss of pseudouridylation in both lineages, since there is no obvious role for this type of modification in RNA thermostability. Nevertheless, given the inconsistency with the 2'-O-methylation data, this is too simplistic an explanation.
As opposed to the scenario given by Lafontaine & Tollervey (1998), where C/D family snoRNAs emerged in the archaeal-eukaryote lineage, and H/ACA snoRNAs emerged in eukaryotes after divergence from archaea, we favour the following possibility. Given a likely RNA world role for both pseudouridylation and 2'-O-methylation, the bacterial site-specific protein system for modification is most likely to be derived. The simplest explanation for snoRNAs is therefore that they date back to the RNA world, and hence that these were a feature of the LUCA (Poole et al., 1998, 1999). The presence of C/D family snoRNA-like sRNAs in archaea (Gaspin et al., 2000; Omer et al., 2000) and their absence in bacteria, and the absence of H/ACA snoRNAs from both, can be explained by the loss of snoRNAs from the bacterial lineage prior to thermoadaptation, while snoRNAs were present in the ancestors of archaea prior to thermoadaptation.

In adaptation to high temperatures in general, there will be the tendency to minimise use of single-stranded RNA, owing to its instability at high temperatures, and hence RNA processing is expected to have been reduced in lineages which underwent a period of thermoadaptation. For RNA to nevertheless be maintained, there must be counter-selection for RNA protection. We suggest that in the archaea, H/ACA snoRNAs were lost since there was selection for reduction of RNA processing, with one consequence being that extensive pseudouridylation was replaced by protein-RNA interactions. In the case of C/D snoRNAs, there was still selection for reduction of RNA processing, but 2'-O-methylation was selectively advantageous since it imparted greater stability on the modified RNAs. Consequently, this pathway of RNA processing was retained, though there was selection for reduction in size of C/D family snoRNAs, regularity in structure, and for maximal modification from minimal numbers of RNAs (see Omer et al.
, 2000), so those which performed two modifications were selected over those that directed just one modification. In the case of bacteria, we suggest that snoRNA-mediated modifications had been lost prior to thermoadaptation, and that these had been replaced by RNA-protein interactions. The selection we have proposed for loss of RNA processing is response time in organisms competing for limited resources that fluctuate in availability (Poole et al., 1998; Poole et al., 1999). In bacteria, a fast response time is required in order to act upon detection of a nutrient source. Action requires gene expression and subsequent utilisation of that source, and the faster this is achieved, the more progeny are produced (Carlile, 1982). Fast gene expression requires fast protein synthesis, and it is notable that in bacteria, translation begins before transcription is complete, and that ribosome assembly requires fewer steps than in eukaryotes, since there is relatively little processing of the rRNA. In eukaryotes, ribosome assembly takes much longer, and gene expression requires many processing steps, as well as export from the nucleus (see Poole et al., 1998). We therefore suggest that competition drove the streamlining of the RNA processing apparatus in the ancestors of bacteria, prior to thermoadaptation. Consequently, when bacterial lineages colonised high temperature environments, RNA-protein interactions in the ribosome provided thermostability.

In eukaryotes we favour the scenario put forth by Lafontaine and Tollervey (1998), who argue that duplication and divergence conceivably resulted in expansion of the modification snoRNAs in this lineage. Duplication and divergence is more likely to lead to new function in eukaryotes than in archaea or bacteria since in the latter two groups, the rate of genome replication is under selection.
Successful individuals are not only those that respond to a new nutrient, but those that can divide the fastest (see Poole et al., 2001). Duplication events in eukaryotes are not in themselves selectively disadvantageous, and could lead to the emergence of two snoRNAs from a single ancestral snoRNA which carried out two modifications. Once this had occurred, there would be a low probability that reversion could have restored the original state. While a few eukaryote snoRNAs can mediate two modifications, the majority carry out just a single modification (Kiss-László et al., 1996; Tycowski et al., 1996b; Ni et al., 1997; Ganot et al., 1997b; Lowe & Eddy, 1999).

Duplication and divergence would also have resulted in potential for expansion of the role of snoRNAs. As has been recently documented (Cavaillé et al., 2000), some snoRNAs in mouse and human are expressed specifically in the brain, and are targeted to mRNA, possibly playing a role in the regulation of editing which produces alternative gene products. These brain-specific snoRNAs (Cavaillé et al., 2000) provide a clear example of RNAs with different proximate and ultimate origins. Even if snoRNAs are a recent development (i.e. post-LUCA), it is possible to establish the ultimate (original) function as being in rRNA processing, as this is conserved between archaea and eukaryotes.

Given that the ancestral state would be two modifications per snoRNA, this would have been maintained, or selected for, in the C/D box s(no)RNAs of archaea, owing to the thermolability of RNA, whereas loss of this organisation might be an expected outcome of duplication and divergence. As for an explanation for the ancestral state being two modifications and not one, this is unclear, and indeed one evolutionary explanation may simply be that this is what emerged.
An alternative possibility is that in the RNA world, two modifications (as is presumably the ancestral state for both C/D and H/ACA snoRNAs) may have represented the optimal number of modifications by a single RNA, given low coding capacity.

Conclusions.

The evidence we review here argues that new RNAs do evolve de novo, that this process is ongoing, and that it is central to the evolution of new cellular functions. Likewise, new RNA functions can arise through duplication and divergence. Nevertheless, it is still possible to distinguish between RNAs which arose very early in evolution and those which have a relatively recent origin. This distinction is not necessarily on the basis of function alone, and the necessarily ad hoc nature of this classification results in some RNAs being harder to place. However, on current evidence, and consistent with the RNA world theory (Jeffares et al., 1998), we conclude that newly-evolved RNAs do not appear to displace proteins, whereas proteins have probably replaced RNAs on many occasions during evolution.

A question of central evolutionary importance is whether, as argued by Eddy (1999), RNA may be inherently better suited to certain roles than are proteins. RNA can readily form complementary base pairs, making it effective in regulation of gene expression and guide-mediated site-specific modification; moreover, such functions may arise readily, for instance, through duplication and expression of an antisense RNA from the duplication. While this is a reasonable suggestion, protein families have also evolved diverse specific RNA binding function. A good example is the large number of restriction-modification systems, where pairs of evolutionarily unrelated endonucleases and methylases recognise the same sequence. A common origin for a range of restriction endonucleases (Jeltsch et al., 1995; Bujnicki, 2000) demonstrates that extensive diversification is possible from a single protein.
Indeed, arguing that RNA is inherently better than protein runs counter to the process by which new functions evolve. There is no requirement that the molecule that becomes selected for that function is the 'best' possible for that role, and this is exactly the point of Jacob's (1977) analogy of evolution as a tinkerer, not an engineer: selection merely requires that a function confers an advantage. It does not require that the best possible molecule be the only molecule that can come under selection.

It is not clear that RNA is inherently better than protein, even if this apparently makes intuitive sense. RNA may be more readily recruited into functions where base recognition is required, perhaps suggesting that potential antisense molecules are readily generated in cells. Proteins are able to recognise specific sequences of considerable length, and regulate gene expression through nucleic acid binding. Hence, there is not the same clear picture as for the evolution of catalysis (Jeffares et al., 1998). Notably, even with the evolution of catalysis, it is possible that some RNAs may never be replaced by proteins if the only criterion is catalytic efficiency, since it is possible for ribozymes to reach catalytic perfection. Selection for a faster chemical step in catalysis will only occur when substrate diffusion is not the rate-limiting step in the reaction; the larger the substrate, the slower it diffuses (Jeffares et al., 1998).

Arguments such as Eddy's (1999) lump the propensity for recruitment together with the propensity for function. In a hypothetical situation where only protein was available, no amount of tinkering would result in an RNA being selected for a given function (even though it might be better than protein) simply because there is no RNA for selection to act on. We therefore suggest that the recruitment of either RNA or protein into new function depends on what is available, not what is best.
For catalysis, where there is selection for evolution towards catalytic perfection, protein may replace RNA if an RNA cannot reach rates of catalysis where diffusion becomes the rate-limiting step, but not for a ribozyme where substrate diffusion is rate limiting (Jeffares et al., 1998). For site-specific recognition, we suggest that recruitment of RNA or protein has more to do with what is available, and that there is no evidence supporting the possibility that RNA is inherently better than protein in this role. In general, the propensity for RNA to be selected over protein in a sequence-recognition role will depend on the initial 'environment', not the inherent properties of the molecule. Where this may break down is at high temperature, where RNA will be selected against.

The snoRNAs constitute the only case where it is argued that RNA could have displaced proteins (Lafontaine & Tollervey, 1998), and, at least in respect to their role as guides for post-transcriptional modification, this is not unreasonable. The alternative scenario, that snoRNAs pre-date protein enzymes, is also feasible (Poole et al., 1999). For a resolution of this issue, two questions must be addressed. First, what is the biological function of 2'-O-methylated ribose and pseudouridine, the products of snoRNA-mediated modification? Second, in the context of the two theories, what selection pressures could account for the diversification of these in eukaryotes (and archaea) or the reduction of these in bacteria? Elsewhere, we have offered a selection pressure for the loss of modifications in bacteria (Poole et al., 1999). In contrast, an argument for the diversification of snoRNA-mediated modifications in eukaryotes based on selection has yet to be proposed.

Several exciting developments with respect to the evolutionary origins of snoRNAs and snRNAs are coming from the examination of the protein constituents of the RNPs.
For example, the C/D family snoRNPs and U4 snRNP possess a common core protein that binds to an equivalent motif in both C/D family snoRNAs and U4 snRNA (Watkins et al., 2000; Peculis, 2000). As per the problems with establishing the evolutionary relationships between snRNAs and group II introns, it is not possible to tell whether this similarity is due to convergence or divergence from a common ancestor. Likewise, the common H/ACA motifs shared by telomerase RNA and H/ACA box snoRNAs could be divergent or convergent (Mitchell et al., 1999), as could the demonstration that both associate with the same set of core proteins (Pogacic et al., 2000; Dez et al., 2001). With respect to snRNA origins, it is interesting to note that Sm proteins have now been detected in archaea (Salgado-Garrido et al., 1999). Sm proteins are part of the spliceosome, but have recently been shown to be involved in mRNA degradation (Bouveret et al., 2000). The function of Sm proteins in archaea is unknown (Salgado-Garrido et al., 1999), as is the pathway of RNA degradation in this domain.

In conclusion, information on phylogenetic distribution, together with metabolic context, may provide an important test for resolving problematic data sets, such as the snoRNA data set. This is essential primarily because there is no clear way of objectively evaluating the two theories as they currently stand. A major hurdle that needs to be overcome before this approach can be reliably applied is for phylogenetics to unambiguously establish the relationships of the three domains archaea, bacteria and eukaryotes. Finally, it will also be important to test the evolutionary relationship of C/D family snoRNAs in eukaryotes and sRNAs from archaea. It is difficult to predict whether it will be possible to establish if these are related by descent, or are convergent.
However, the task ought to be simpler than demonstrating relationships between functionally unrelated RNAs such as H/ACA snoRNAs and telomerase RNA, U4 snRNA and C/D snoRNAs, or group II introns and the spliceosomal RNAs.

References.

Altman S, Kirsebom L. 1999. Ribonuclease P. In: Gesteland R, Cech T, Atkins J, eds. The RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press. pp 351-380.
Altuvia S, Zhang A, Argaman L, Tiwari A, Storz G. 1998. The Escherichia coli OxyS regulatory RNA represses fhlA translation by blocking ribosome binding. EMBO J 17: 6069-6075.
Andersson JO, Andersson SGE. 1999. Insights into the evolutionary process of genome degradation. Curr Opin Genet Dev 9: 664-671.
Andersson SGE, Kurland CG. 1998. Reductive evolution of resident genomes. Trends Genet 6: 263-268.
Andersson SGE, Kurland CG. 1999. Origins of mitochondria and hydrogenosomes. Curr Opin Microbiol 2: 535-541.
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV. 1998. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 14: 442-444.
Bachellerie J-P, Cavaillé J. 1998. Small nucleolar RNAs guide the ribose methylations of eukaryotic rRNAs. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, DC: ASM Press. pp 255-272.
Backert S, Nielsen BL, Börner T. 1997. The mystery of the rings: structure and replication of mitochondrial genomes from higher plants. Trends Plant Sci 2: 477-483.
Baumeister W, Walz J, Zühl F, Seemüller E. 1998. The proteasome: Paradigm of a self-compartmentalizing protease. Cell 92: 367-380.
Been MD, Wickham GS. 1997. Self-cleaving ribozymes of hepatitis delta virus RNA. Eur J Biochem 247: 741-753.
Benner SA, Ellington AD, Tauer A. 1989. Modern metabolism as a palimpsest of the RNA world. Proc Natl Acad Sci USA 86: 7054-7058.
Blake CCF. 1978. Do genes-in-pieces imply proteins-in-pieces? Nature 273: 267.
Blanchard JL, Lynch M. 2000.
Organellar genes: why do they end up in the nucleus? Trends Genet 16: 315-320.
Börner GV, Yokobori S-I, Mörl M, Dörner M, Pääbo S. 1997. RNA editing in metazoan mitochondria: staying fit without sex. FEBS Lett 409: 320-324.
Bortolin ML, Kiss T. 1998. Human U19 intron-encoded snoRNA is processed from a long primary transcript that possesses little potential for protein coding. RNA 4: 445-454.
Boudvillain M, de Lencastre A, Pyle AM. 2000. A tertiary interaction that links active-site domains to the 5' splice site of a group II intron. Nature 406: 315-318.
Bouveret E, Rigaut G, Shevchenko A, Wilm M, Seraphin B. 2000. A Sm-like protein complex that participates in mRNA degradation. EMBO J 19: 1661-1671.
Bouzat JL, McNeil LK, Robertson HM, Solter LF, Nixon JE, Beever JE, Gaskins HR, Olsen G, Subramaniam S, Sogin ML, Lewin HA. 2000. Phylogenomic Analysis of the α Proteasome Gene Family from Early-Diverging Eukaryotes. J Mol Evol 51: 532-543.
Breckenridge DG, Watanabe Y, Greenwood SJ, Gray MW, Schnare MN. 1999. U1 small nuclear RNA and spliceosomal introns in Euglena gracilis. Proc Natl Acad Sci USA 96: 852-856.
Brinkmann H, Philippe H. 1999. Archaea sister-group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol 16: 817-825.
Brosius J. 1999. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238: 115-134.
Brown JW, Haas ES, Pace NR. 1993. Characterization of ribonuclease P RNAs from thermophilic bacteria. Nucleic Acids Res 21: 671-679.
Bujnicki JM. 2000. Phylogeny of the restriction endonuclease-like superfamily inferred from comparison of protein structures. J Mol Evol 50: 39-44.
Cavalier-Smith T. 1991. Intron phylogeny: a new hypothesis. Trends Genet 7: 145-148.
Cech TR. 1986. The generality of self-splicing RNA: Relationship to nuclear RNA splicing. Cell 44: 207-210.
Cermakian N, Cedergren R. 1998.
Modified nucleotides always were: an evolutionary model. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, DC: ASM Press. pp 535-541.
Chanfreau G, Jacquier A. 1994. Catalytic site components common to both splicing steps of a group II intron. Science 266: 1383-1387.
Charette M, Gray MW. 2000. Pseudouridine in RNA: What, Where, How, and Why. IUBMB Life 49: 341-351.
Collins LJ, Moulton V, Penny D. 2000. Use of RNA secondary structure for studying the evolution of RNase P and RNase MRP. J Mol Evol 51: 194-204.
Copertino DW, Hall ET, Van Hook FW, Jenkins KP, Hallick RB. 1994. A group III twintron encoding a maturase-like gene excises through lariat intermediates. Nucleic Acids Res 22: 1029-1036.
Copertino DW, Hallick RB. 1993. Group II and group III introns of twintrons: potential relationships with nuclear pre-mRNA introns. Trends Biochem Sci 18: 467-471.
Covello PS, Gray MW. 1993. On the evolution of RNA editing. Trends Genet 9: 265-268.
Culbertson MR. 1999. RNA surveillance: unforeseen consequences for gene expression, inherited genetic disorders and cancer. Trends Genet 15: 74-80.
Darnell JE, Doolittle WF. 1986. Speculations on the early course of evolution. Proc Natl Acad Sci USA 83: 1271-1275.
de Souza SJ, Long M, Klein RJ, Roy S, Lin S, Gilbert W. 1998. Toward a resolution of the introns early/late debate: Only phase zero introns are correlated with the structure of ancient proteins. Proc Natl Acad Sci USA 95: 5094-5099.
Delihas N. 1995. Regulation of gene expression by trans-encoded antisense RNAs. Mol Microbiol 15: 411-414.
Dennis PP, Russell AG, Moniz De Sa M. 1997. Formation of the 5' end pseudoknot in small subunit ribosomal RNA: involvement of U3-like sequences. RNA 3: 337-343.
Dez C, Henras A, Faucon B, Lafontaine D, Caizergues-Ferrer M, Henry Y. 2001.
Stable expression in yeast of the mature form of human telomerase RNA depends on its association with the box H/ACA small nucleolar RNP proteins Cbf5p, Nhp2p and Nop10p. Nucleic Acids Res 29: 598-603.
Doolittle WF. 1978. Genes in pieces: were they ever together? Nature 272: 581-582.
Eddy SR. 1999. Non coding RNA genes. Curr Opin Genet Dev 9: 695-699.
Embley TM, Hirt RP. 1998. Early branching eukaryotes? Curr Opin Genet Dev 8: 624-629.
Erdmann VA, Barciszewska MZ, Szymanski M, Hochberg A, de Groot N, Barciszewski J. 2001. The non-coding RNAs as riboregulators. Nucleic Acids Res 29: 189-193.
Estevez AM, Simpson L. 1999. Uridine insertion/deletion editing in trypanosome mitochondria - a review. Gene 240: 247-260.
Forterre P. 1995. Thermoreduction, a hypothesis for the origin of prokaryotes. CR Acad Sci Paris III 318: 415-422.
Forterre P. 1996. A hot topic: the origin of hyperthermophiles. Cell 85: 789-792.
Forterre P. 1997. Archaea: what can we learn from their sequences? Curr Opin Genet Dev 7: 764-770.
Forterre P, Bouthier De La Tour C, Philippe H, Duguet M. 2000. Reverse gyrase from hyperthermophiles: probable transfer of a thermoadaptation trait from archaea to bacteria. Trends Genet 16: 152-154.
Franke A, Baker BS. 1999. The roX1 and roX2 RNAs are essential components of the compensasome, which mediates dosage compensation in Drosophila. Mol Cell 4: 117-122.
Fung PA, Gaertig J, Gorovsky MA, Hallberg RL. 1995. Requirement of a small cytoplasmic RNA for the establishment of thermotolerance. Science 268: 1036-1039.
Galtier N, Tourasse N, Gouy M. 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283: 220-221.
Ganot P, Caizergues-Ferrer M, Kiss T. 1997a. The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation. Genes Dev 11: 941-956.
Garrett TA, Pabon-Pena LM, Gokaldas N, Epstein LM. 1996.
Novel requirements in peripheral structures of the extended satellite 2 hammerhead. RNA 2: 699-706.
Gaspin C, Cavaillé J, Erauso G, Bachellerie J-P. 2000. Archaeal homologs of eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus genomes. J Mol Biol 297: 895-906. (Erratum in J Mol Biol 300: 1017-1018.)
Gilbert W. 1978. Why genes in pieces? Nature 271: 501.
Gilbert W. 1986. The RNA world. Nature 319: 618.
Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF, Poole RJ, Date T, Oshima T, Konishi J, Denda K, Yoshida M. 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci USA 86: 6661-6665.
Goldschmidt-Clermont M, Choquet Y, Girard-Bascou J, Michel F, Schirmer-Rahire M, Rochaix JD. 1991. A small chloroplast RNA may be required for trans-splicing in Chlamydomonas reinhardtii. Cell 65: 135-143.
Gordon PM, Sontheimer EJ, Piccirilli JA. 2000. Metal ion catalysis during the exon-ligation step of nuclear pre-mRNA splicing: Extending the parallels between the spliceosome and group II introns. RNA 6: 199-205.
Graveley BR. 2001. Alternative splicing: increasing diversity in the proteomic world. Trends Genet 17: 100-107.
Harris RJ, Elder D. 2000. Ribozyme relationships: the hammerhead, hepatitis delta, and hairpin ribozymes have a common origin. J Mol Evol 51: 182-4.
Herbert A, Rich A. 1999a. RNA processing in evolution. The logic of soft-wired genomes. Ann N Y Acad Sci 870: 119-132.
Herbert A, Rich A. 1999b. RNA processing and the evolution of eukaryotes. Nat Genet 3: 265-269.
Hetzer M, Wurzer G, Schweyen RJ, Mueller MW. 1997. Trans-activation of group II intron splicing by nuclear U5 snRNA. Nature 386: 417-420.
Hüttenhofer A, Kiefmann M, Meier-Ewert S, O'Brien J, Lehrach H, Bachellerie J-P, Brosius J. 2001. RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse.
EMBO J 20: 2943-2953.
Iwabe N, Kuma K-I, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci USA 86: 9355-9359.
Jeffares DC, Poole AM, Penny D. 1995. Pre-rRNA processing and the RNA world. Trends Biochem Sci 20: 298-299.
Jeffares DC, Poole AM, Penny D. 1998. Relics from the RNA world. J Mol Evol 46: 18-36.
Jeltsch A, Kröger M, Pingoud A. 1995. Evidence for an evolutionary relationship among type-II restriction endonucleases. Gene 160: 7-16.
Keeling PJ, McFadden GI. 1998. Origins of microsporidia. Trends Microbiol 6: 19-23.
Keiler K, Waller P, Sauer R. 1996. Role of a peptide tagging system in degradation of proteins synthesized from damaged messenger RNA. Science 271: 990-993.
Keiler KC, Shapiro L, Williams KP. 2000. tmRNAs that encode proteolysis-inducing tags are found in all known bacterial genomes: A two-piece tmRNA functions in Caulobacter. Proc Natl Acad Sci USA 97: 7778-7783.
Kelley RL, Kuroda MI. 2000. The role of chromosomal RNAs in marking the X for dosage compensation. Curr Opin Genet Dev 10: 555-61.
Kiss-László Z, Henry Y, Bachellerie J-P, Caizergues-Ferrer M, Kiss T. 1996. Site-specific ribose methylation of preribosomal RNA: a novel function for small nucleolar RNAs. Cell 85: 1077-1088.
Kremerskothen J, Nettermann M, op de Bekke A, Bachmann M, Brosius J. 1998. Identification of human autoantigen La/SS-B as BC1/BC200 RNA-binding protein. DNA Cell Biol 17: 751-759.
Lafontaine DLJ, Tollervey D. 1998. Birth of the snoRNPs: the evolution of the modification-guide snoRNAs. Trends Biochem Sci 23: 383-388.
Lambowitz AM, Caprara MG, Zimmerly S, Perlman PS. 1999. Group I and group II ribozymes as RNPs: clues to the past and guides to the future. In: Gesteland R, Cech T, Atkins J, eds. The RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press. pp 451-485.
Lease R, Belfort M. 2000.
Riboregulation by DsrA RNA: trans-actions for global economy. Mol Microbiol 38: 667-672.
Lee JT, Davidow LS, Warshawsky D. 1999. Tsix, a gene antisense to Xist at the X-inactivation centre. Nat Genet 21: 400-404.
Lee JT, Jaenisch R. 1997. The (epi)genetic control of mammalian X-chromosome inactivation. Curr Opin Genet Dev 7: 274-280.
Lee RC, Feinbaum RL, Ambros V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-854.
Logan DT, Andersson J, Sjoberg B-M, Nordlund P. 1999. A glycyl radical site in the crystal structure of a class III ribonucleotide reductase. Science 283: 1499-1504.
Logsdon JM Jr. 1998. The recent origin of spliceosomal introns revisited. Curr Opin Genet Dev 8: 637-648.
Lopez P, Forterre P, Philippe H. 1999. The root of the tree of life in light of the covarion model. J Mol Evol 49: 496-508.
Lowe TM, Eddy SR. 1999. A computational screen for methylation guide snoRNAs in yeast. Science 283: 1168-1171.
Lykke-Andersen J, Aagaard C, Semionenkov M, Garrett RA. 1997. Archaeal introns: splicing, intercellular mobility and evolution. Trends Biochem Sci 22: 326-331.
Mair G, Shi H, Li H, Djikeng A, Aviles HO, Bishop JR, Falcone FH, Gavrilescu C, Montgomery JL, Santori MI, Stern LS, Wang Z, Ullu E, Tschudi C. 2000. A new twist in trypanosome metabolism: cis-splicing of pre-mRNA. RNA 6: 163-169.
Maizels N, Weiner AM. 1999. The genomic tag hypothesis: what molecular fossils tell us about the evolution of tRNA. In: Gesteland R, Cech T, Atkins J, eds. The RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press. pp 79-111.
Marguet E, Forterre P. 1994. DNA stability at temperatures typical for thermophiles. Nucleic Acids Res 22: 1681-1686.
Marin I, Siegal ML, Baker BS. 2000. The evolution of dosage compensation mechanisms. BioEssays 22: 1106-1114.
Martínez Gimenez JA, Saez GT, Seisdedos RT. 1998.
On the function of modified nucleosides in the RNA world. J Theor Biol 194: 485-490.
Mayer C, Suck D, Poch O. 2001. The archaeal homolog of the Imp4 protein, a eukaryotic U3 snoRNP component. Trends Biochem Sci 26: 143-144.
McArthur AG, Morrison HG, Nixon JE, Passamaneck NQ, Kim U, Hinkle G, Crocker MK, Holder ME, Farr R, Reich CI, Olsen GE, Aley SB, Adam RD, Gillin FD, Sogin ML. 2000. The Giardia genome project database. FEMS Microbiol Lett 189: 271-273.
Mitchell JR, Cheng J, Collins K. 1999. A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3' end. Mol Cell Biol 19: 567-576.
Morrissey JP, Tollervey D. 1995. Birth of the snoRNPs: the evolution of RNase MRP and the eukaryotic pre-rRNA-processing system. Trends Biochem Sci 20: 78-82.
Moss E, Lee R, Ambros V. 1997. The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88: 637-646.
Müller B, Schümperli D. 1997. The U7 snRNP and the hairpin binding protein: key players in histone mRNA metabolism. Semin Cell Dev Biol 8: 567-576.
Muslimov IA, Banker G, Brosius J, Tiedge H. 1998. Activity-dependent regulation of dendritic BC1 RNA in hippocampal neurons in culture. J Cell Biol 141: 1601-1611.
Muth GW, Ortoleva-Donnelly L, Strobel SA. 2000. A single adenosine with a neutral pKa in the ribosomal peptidyl transferase center. Science 289: 947-950.
Nakano S-I, Chadalavada DM, Bevilacqua PC. 2000. General acid-base catalysis in the mechanism of a hepatitis delta virus ribozyme. Science 287: 1493-1497.
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DR, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O, Salzberg SL, Smith HO, Venter JC, Fraser CM. 1999.
Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399: 323-329.
Nilsen TW. 2000. RNA splicing: The case for an RNA enzyme. Nature 408: 782-783.
Nissen P, Hansen J, Ban N, Moore PB, Steitz TA. 2000. The structural basis of ribosome activity in peptide bond synthesis. Science 289: 920-930.
Noller HF, Hoffarth V, Zimniak L. 1992. Unusual resistance of peptidyl transferase to protein extraction procedures. Science 256: 1416-1419.
Noon KE, Bruenger E, McCloskey JA. 1998. Post-transcriptional modifications in 16S and 23S rRNAs of the archaeal hyperthermophile Sulfolobus solfataricus. J Bacteriol 180: 2883-2888.
Ofengand J, Fournier MJ. 1998. The pseudouridine residues of rRNA: number, location, biosynthesis, and function. In: Grosjean H, Benne R, eds. Modification and Editing of RNA. Washington, DC: ASM Press. pp 229-253.
Omer AD, Lowe TM, Russell AG, Ebhardt H, Eddy SR, Dennis PP. 2000. Homologs of small nucleolar RNAs in Archaea. Science 288: 517-522.
Pannuti A, Lucchesi JC. 2000. Recycling to remodel: evolution of dosage compensation complexes. Curr Opin Genet Dev 10: 644-650.
Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B, Hayward DC, Ball EE, Degnan B, Muller P, Spring J, Srinivasan A, Fishman M, Finnerty J, Corbo J, Levine M, Leahy P, Davidson E, Ruvkun G. 2000. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408: 86-89.
Pelczar P, Filipowicz W. 1998. The host gene for intronic U17 small nucleolar RNAs in mammals has no protein-coding potential and is a member of the 5'-terminal oligopyrimidine gene family. Mol Cell Biol 18: 4509-4518.
Penny D, Poole A. 1999. The nature of the Last Universal Common Ancestor. Curr Opin Genet Dev 9: 672-677.
Perrotta AT, Shih I-H, Been MD. 1999. Imidazole rescue of a cytosine mutation in a self-cleaving ribozyme. Science 286: 123-126.
Philippe H, Germot A, Moreira D. 2000. The new phylogeny of eukaryotes. Curr Opin Genet Dev 10: 596-601.
Pogacic V, Dragon F, Filipowicz W. 2000. Human H/ACA small nucleolar RNPs and telomerase share evolutionarily conserved proteins NHP2 and NOP10. Mol Cell Biol 20: 9028-9040.
Poole A, Jeffares D, Penny D. 1999. Early evolution: prokaryotes, the new kids on the block. Bioessays 21: 880-889.
Poole A, Penny D, Sjoberg B-M. 2000. Methyl-RNA: an evolutionary bridge between RNA and DNA? Chem Biol 7: R207-R216.
Poole AM, Jeffares DC, Penny D. 1998. The path from the RNA world. J Mol Evol 46: 1-17.
Poole AM, Phillips MJ, Penny D. 2001. Prokaryote and eukaryote evolvability. Biosystems, submitted.
Rastogi T, Beattie TL, Olive JE, Collins RA. 1996. A long-range pseudoknot is required for activity of the Neurospora VS ribozyme. EMBO J 15: 2820-2825.
Reanney DC. 1984. RNA splicing as an error-screening mechanism. J Theor Biol 110: 315-321.
Reanney DC. 1986. Genetic error and genome design. Trends Genet 2: 41-46.
Romeo T. 1998. Global regulation by the small RNA-binding protein CsrA and the non-coding RNA molecule CsrB. Mol Microbiol 29: 1321-1330.
Rotte C, Henze K, Muller M, Martin W. 2000. Origins of hydrogenosomes and mitochondria. Curr Opin Microbiol 3: 481-486.
Rupert PB, Ferre-D'Amare AR. 2001. Crystal structure of a hairpin ribozyme-inhibitor complex with implications for catalysis. Nature 410: 780-786.
Rzhetsky A, Ayala FJ, Hsu LC, Chang C, Yoshida A. 1997. Exon/intron structure of aldehyde dehydrogenase genes supports the 'introns-late' theory. Proc Natl Acad Sci USA 94: 6820-6825.
Salgado-Garrido J, Bragado-Nilsson E, Kandels-Lewis S, Seraphin B. 1999. Sm and Sm-like proteins assemble in two related complexes of deep evolutionary origin. EMBO J 18: 3451-3462.
Saville BJ, Collins RA. 1991. RNA-mediated ligation of self-cleavage products of a Neurospora mitochondrial plasmid transcript.
Proc Natl Acad Sci USA 88: 8826-8830.
Sharp PA. 1985. On the origin of RNA splicing and introns. Cell 42: 397-400.
Sharp PA. 1991. "Five easy pieces". Science 254: 663.
Sharp PA. 1994. Split genes and RNA splicing. Cell 77: 805-815.
Simpson L, Thiemann OH, Savill NJ, Alfonzo JD, Maslov DA. 2000. Evolution of RNA editing in trypanosome mitochondria. Proc Natl Acad Sci USA 97: 6986-6993.
Skryabin BV, Kremerskothen J, Vassilacopoulou D, Disotell TR, Kapitonov VV, Jurka J, Brosius J. 1998. The BC200 RNA gene and its neural expression are conserved in Anthropoidea (Primates). J Mol Evol 47: 677-685.
Smit AFA. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9: 657-663.
Smith HC, Gott JM, Hanson MR. 1997. RNA 3: 1105-1123.
Smith CM, Steitz JA. 1997. Sno storm in the nucleolus: new roles for myriad small RNPs. Cell 89: 669-672.
Smith CM, Steitz JA. 1998. Classification of gas5 as a multi-small-nucleolar RNA (snoRNA) host gene and a member of the 5'-terminal oligopyrimidine gene family reveals common features of snoRNA host genes. Mol Cell Biol 18: 6897-6909.
Sontheimer EJ, Gordon PM, Piccirilli JA. 1999. Metal ion catalysis during group II intron self-splicing: parallels with the spliceosome. Genes Dev 13: 1729-1741.
Stoltzfus A. 1999. On the possibility of constructive neutral evolution. J Mol Evol 49: 169-181.
Stoltzfus A, Logsdon JM Jr, Palmer JD, Doolittle WF. 1997. Intron 'sliding' and the diversity of intron positions. Proc Natl Acad Sci USA 94: 10739-10744.
Symons RH. 1997. Plant pathogenic RNAs and RNA catalysis. Nucleic Acids Res 25: 2683-2689.
Tarn W-Y, Steitz JA. 1997. Pre-mRNA splicing: the discovery of a new spliceosome doubles the challenge. Trends Biochem Sci 22: 132-137.
Trotta CR, Abelson J. 1999. tRNA splicing: an RNA world add-on or an ancient reaction? In: Gesteland R, Cech T, Atkins J, eds. The RNA World, 2nd Ed.
New York: Cold Spring Harbor Laboratory Press. pp 561-584.
Tycowski KT, Shu MD, Steitz JA. 1996a. A mammalian gene with introns instead of exons generating stable RNA products. Nature 379: 464-466.
Tycowski KT, Smith CM, Shu M-D, Steitz JA. 1996b. A small nucleolar RNA requirement for site-specific ribose methylation of rRNA in Xenopus. Proc Natl Acad Sci USA 93: 14480-14485.
Tycowski KT, You Z-H, Graham PJ, Steitz JA. 1998. Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol Cell 2: 629-638.
Wassarman KM, Storz G. 2000. 6S RNA regulates E. coli RNA polymerase activity. Cell 101: 613-623.
Wassarman KM, Zhang A, Storz G. 1999. Small RNAs in Escherichia coli. Trends Microbiol 7: 37-45.
Watanabe KI, Bessho Y, Kawasaki M, Hori H. 1999. Mitochondrial genes are found on minicircle DNA molecules in the mesozoan animal Dicyema. J Mol Biol 286: 645-650.
Watanabe Y, Yamamoto M. 1994. S. pombe mei2+ encodes an RNA-binding protein essential for premeiotic DNA synthesis and meiosis I, which cooperates with a novel RNA species meiRNA. Cell 78: 487-498.
Watkins NJ, Segault V, Charpentier B, Nottrott S, Fabrizio P, Bachi A, Wilm M, Rosbash M, Branlant C, Lührmann R. 2000. A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 103: 457-466.
Weiner AM. 1993. mRNA splicing and autocatalytic introns: distant cousins or the products of chemical determinism? Cell 72: 161-164.
Weinstein L, Steitz JA. 1999. Guided tours: from precursor snoRNA to functional snoRNP. Curr Opin Cell Biol 11: 378-384.
Westhof E. 1999. Chemical diversity in RNA cleavage. Science 286: 61-62.
Wightman B, Ha I, Ruvkun G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855-862.
Woese CR. 1987. Bacterial evolution. Microbiol Rev 51: 221-271.
Woese CR, Kandler O, Wheelis ML. 1990.
Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci USA 87: 4576-4579.
Wolf YI, Kondrashov FA, Koonin EV. 2000. No footprints of primordial introns in a eukaryotic genome. Trends Genet 16: 333-334.
Yean S-L, Wuenschell G, Termini J, Lin R-J. 2000. Metal-ion coordination by U6 small nuclear RNA contributes to catalysis in the spliceosome. Nature 408: 881-884.
Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JH, Noller HF. 2001. Crystal structure of the ribosome at 5.5 Å resolution. Science 292: 883-896.
Zhang A, Altuvia S, Tiwari A, Argaman L, Hengge-Aronis R, Storz G. 1998a. The OxyS regulatory RNA represses rpoS translation and binds the Hfq (HF-I) protein. EMBO J 17: 6061-6068.
Zhang F, Lemieux S, Wu X, St.-Arnaud D, McMurray C, Major F, Anderson D. 1998b. Function of hexameric RNA in packaging of bacteriophage φ29 DNA in vitro. Mol Cell 2: 141-147.
Zhang Z, Green BR, Cavalier-Smith T. 1999. Single gene circles in dinoflagellate chloroplast genomes. Nature 400: 155-159.

Figure legends.

Figure 1. Problems for inferring ancestry of group II introns and spliceosomal RNAs from the tree of life. In a and c, the bacterial rooting is shown; in b and d, the eukaryote rooting is shown. Blue dots represent group II introns and spliceosomal RNAs; grey dots denote absence of these. The position of the root does not allow evaluation of the different trees with simple parsimony, since trees a and b show origin of spliceosomal RNAs through 'seeding' from the mitochondrion. Consequently all four trees are equally likely. Testing the 'seed' hypothesis by examining the bacterial distribution of group II introns will be inconclusive for two reasons. First, group II introns are mobile, and second, limited distribution can equally be explained by polyphyletic losses.
Likewise, ubiquity of group II introns in bacteria cannot be taken as support for a common ancestor of group II introns and spliceosomal RNAs in the LUCA, since the former are mobile elements. Finding group II introns in archaea can also be ambiguously interpreted.

Figure 2. Introns first hypothesis. The final step in the origin of genetically-encoded protein synthesis is presumed to be the origin of mRNA. We propose that the non-coding 'transcripts', produced as a by-product in the processing of precursor transcripts containing functional RNAs (such as snoRNAs), were the source of the first genetically-encoded proteins. These were utilised by the proto-ribosome to stabilise the interaction between two charged tRNAs during non-genetically-encoded peptide synthesis. As primary sequence structure appears unimportant for non-specific RNA-binding, we propose that the first proteins produced in this manner were not catalytic, and could retain function despite a high mutation rate in the genomic sequence. Hence, we postulate that it was by virtue of the coupling of cleavage and ligation (a transesterification) in the proto-spliceosome that the first genetically-encoded proteins arose.

Figure 3. SnoRNAs in the LUCA? The suggestion that snoRNAs date back to the RNA world may be independently examined depending on the placing of the root of the tree of life. Currently the position of the root is unresolved, with bacterial and eukaryote rootings being considered as possibilities. A. If the bacterial rooting is correct, it is not possible to establish from the tree alone if the LUCA possessed snoRNAs. B. If the eukaryote rooting is correct, the most parsimonious explanation is that the LUCA contained snoRNAs, since these are then found on both sides of the root. The position of the root is in dispute, and since the rooting drastically affects the utility of the tree, it is difficult to use phylogenetic distribution to resolve the debate.
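The parsimony argument in the Figure 3 legend can be made concrete with a small sketch. The code below is an illustration only, not a method used in the thesis: it applies Fitch parsimony to a single presence/absence character (snoRNAs scored as present in Archaea and Eukaryotes, absent in Bacteria) on the two candidate rootings. The tree encoding, the 0/1 scoring and the function name `fitch` are assumptions introduced for this example.

```python
# Fitch parsimony for one binary character on a rooted binary tree.
# A tree is either a tip label (str) or a pair (left_subtree, right_subtree).

def fitch(tree, states):
    """Return (set of equally parsimonious states at this node,
    minimum number of state changes in the subtree below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

# Assumed scoring: 1 = snoRNAs present, 0 = absent.
STATES = {"A": 1, "E": 1, "B": 0}

# Bacterial rooting: Bacteria branch first, Archaea and Eukaryotes are sisters.
BACTERIAL_ROOTING = (("A", "E"), "B")
# Eukaryote rooting: Eukaryotes branch first, Archaea and Bacteria are sisters.
EUKARYOTE_ROOTING = (("A", "B"), "E")

for name, tree in [("bacterial", BACTERIAL_ROOTING),
                   ("eukaryote", EUKARYOTE_ROOTING)]:
    root_states, changes = fitch(tree, STATES)
    print(f"{name} rooting: root states {sorted(root_states)}, changes {changes}")
```

Under these assumptions the bacterial rooting leaves the root state ambiguous (presence and absence each cost one change), whereas the eukaryote rooting favours presence at the root, which matches the reasoning in panels A and B of the legend.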
Until a consensus is reached, biochemical arguments have to be relied upon (see text).

[Figure 1: four trees, panels a to d, each with tips labelled E, A and B.]

[Figure 2: RNA genome; transcript; functional snoRNAs liberated; non-coding regions; non-functional 'transcript' released; 'transcript' utilised as stabilising template in peptide synthesis.]

[Figure 3: A. Common ancestor of Archaea and Eukaryotes had snoRNAs; tree alone cannot determine if LUCA possessed snoRNAs. B. snoRNAs lost from bacteria; RNA world origin for snoRNAs.]

Table 1. Candidate post-RNA world RNAs.

| RNA | Distribution | Function | Comments | References |
|---|---|---|---|---|
| roX1 & roX2 | D. melanogaster | Dosage compensation | roX1 & roX2 are unrelated, and neither is related to Xist or Tsix. | Franke & Baker, 1999. |
| Xist & Tsix | Mammals | Dosage compensation | Tsix is an antisense regulator of Xist. | Lee et al., 1999; Kelley & Kuroda, 2000. |
| BC200 | Primates | Translation regulation in dendrites | BC1 and BC200 are unrelated, but may serve analogous roles. Both bind a protein homologous between Primates and Rodents. | Skryabin et al., 1998. |
| BC1 | Rodents | Translation regulation in dendrites | See BC200. | Muslimov et al., 1998; Kremerskothen et al., 1998. |
| lin-4 | C. elegans, C. briggsae | Antisense regulator of lin-14 and lin-28. | | Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997. |
| let-7 | Bilaterian animals | Antisense regulator of lin-41, probably in late temporal transitions in development. | | Pasquinelli et al., 2000. |
| OxyS RNA | E. coli | Oxidative stress-induced antisense global inhibitor of translation initiation. | | Altuvia et al., 1998; Zhang et al., 1998a. |
| DsrA RNA | E. coli | Antisense regulator of translation initiation of global regulators H-NS and RpoS. | Inhibits H-NS translation, but stimulates RpoS translation, acting through RNA-RNA interactions. | Lease & Belfort, 2000. |
| MicF RNA | Gram-negative bacteria | Activator of translation initiation of OmpF | | Delihas, 1995. |
| DicF RNA | E. coli | Antisense regulator in cell division. | | Delihas, 1995. |
| meiRNA | Schizosaccharomyces pombe | Regulation of meiosis | | Watanabe & Yamamoto, 1994; Ohno & Mattaj, 1999. |
| tmRNA | Bacteria | Ribosome/mRNA/protein release | | Keiler et al., 1996; Keiler et al., 2000. |
| CsrB RNA | E. coli, Erwinia carotovora | Binds and inhibits CsrA global regulatory protein | | Romeo, 1998. |
| G8 RNA | Tetrahymena thermophila | Establishment of thermotolerance. | | Fung et al., 1995. |
| 6S RNA | E. coli | Modulation of RNA polymerase activity | | Wassarman & Storz, 2000. |
| gRNAs | Kinetoplastids of trypanosomes | Editing of mRNA transcripts | RNA editing by guide RNA argued to be ancient, but is most probably an adaptation to Muller's ratchet (see text). | Estevez & Simpson, 1999; Simpson et al., 2000. |
| Bacteriophage φ29 RNA | Bacteriophage φ29 | RNA hexamer required for DNA packaging | | Zhang et al., 1998b. |
| Hammerhead ribozymes | Plant pathogenic RNAs; salamander nuclear DNA | Genome replication; transcript processing | | Symons, 1997; Garrett et al., 1996. |
| Hairpin ribozyme | Plant pathogenic RNAs | Genome replication | | Symons, 1997. |
| Hepatitis delta virus ribozyme | Hepatitis delta virus | Viral genome replication | | Been & Wickham, 1997. |
| Neurospora VS ribozyme | Neurospora | Transcript processing in mitochondrial DNA plasmid | | Saville & Collins, 1991; Rastogi et al., 1996. |
| U7 snRNA | Metazoa | Histone pre-mRNA processing | While histones are found in Archaea, the limited distribution of U7 suggests it arose in eukaryotes, though more data are needed. | Müller & Schümperli, 1997. |
| Group I introns | Eukaryotic organelles & nucleus, Phage, Bacteria | Mobile selfish element | Catalysis is via 3' OH of guanosine, supplied in trans, a mechanism distinct from group II/spliceosomal catalysis. | Cech & Golden, 1999; Lykke-Andersen et al., 1997. |
| Group II introns | Phage, Eukaryote organelles, Bacteria | Mobile selfish element | Argued to be either evolutionarily related to the spliceosome or evolved recently de novo (see text). | Logsdon, 1998; Cech & Golden, 1999. |
| Diversity of C/D & H/ACA snoRNAs | Eukaryote nucleolus | Cleavage, methylation & pseudouridylation of rRNA, and probably other RNAs. | C/D box family are found in Archaea also. Opinion is divided on whether these are RNA world relics (see text). | Weinstein & Steitz, 1999; Omer et al., 2000. |
| Diversity of C/D & H/ACA snoRNAs (continued) | | Some C/D snoRNAs appear to be involved in regulation of brain-specific gene expression. | These snoRNAs are most probably a recent innovation. | Cavaille et al., 2000. |

Paper 4
Poole A, Jeffares D, Penny D. Early evolution: prokaryotes, the new kids on the block. Bioessays 21, 880-889 (1999).
Reprinted by permission of Wiley-Liss, Inc., a subsidiary of John Wiley & Sons, Inc.

Paper 5
Penny D & Poole A. The nature of the Last Universal Common Ancestor. Current Opinion in Genetics & Development 9, 672-677 (1999).
Reprinted with permission from Elsevier Science.

[Figure 3: transcribed pre-RNA (pre-tRNA, pre-rRNA, pre-mRNA); RNA processing; mature RNA; ribosome.]

Figure 3. The RNA processing pattern in eukaryotes reflects that of the LUCA. An examination of RNAs involved in translation reveals a striking pattern. Precursor RNAs are processed by RNPs (ribonucleoproteins: RNA plus cognate protein) to yield mature RNAs. Furthermore, RNPs process other RNPs: snoRNAs are released by snRNAs, the RNA component of the splicing machinery, and are in turn crucial for rRNA processing. In prokaryotes, some of these RNAs have been lost (shaded region), and indeed, in the case of pre-mRNA, the processing step has been lost completely. Eukaryotes have retained a more complete record of the supposed RNA-world processing pathway than have prokaryotes.

of life between eukarya and archaea-bacteria is consistent with the conclusion that the genome architecture of the LUCA more closely resembled that of eukarya.

Thermoreduction and prokaryote origins

In postulating the nature of the LUCA, it is essential to consider the selective forces that would give rise to either prokaryotes or eukaryotes. Two selective forces that reinforce each other have been proposed by which prokaryotes could have evolved from an ancestor containing a eukaryote-like genome: thermoreduction and r-selection [20••,21••,24].
r-selected organisms are fast-growing, competing for nutrient sources which fluctuate greatly in abundance. Yeast is r-selected when compared to an oak tree, which grows slowly, has a slow generation time and a fairly constant nutrient source (and is thus K-selected), and prokaryotes are r-selected relative to eukaryotes. r-selection generally results in extremely fast and efficient use of resources, because limited availability produces strong competition for these. At the molecular level, the result is that enzymes that affect metabolite utilisation and organismal growth rate will be driven toward perfection at a faster rate than in organisms not under r-selection. Thus, r-selection may at least account partially for the observed replacement of RNA enzymes by protein in the prokaryote lineages [20••,21••].

The thermoreduction hypothesis [24] is that prokaryotes arose from mesophiles by adaptation, via the loss of thermolabile traits, to high-temperature environments. This explains the loss of the ssRNA processing pathways (Figure 3) dating back to the RNA-world. Single-stranded RNA is heat labile, and would have been the Achilles' heel of early thermophiles. Accelerating ssRNA processing (mRNA, tRNA and rRNA) from hours (eukaryotes) to minutes (prokaryotes) would increase the viability of an organism at high temperatures. This loss of pre-mRNA processing, as well as the replacement of snoRNA-mediated rRNA processing with a protein enzyme system, would have been important steps in the evolution of thermophily. Unlike RNA, proteins are capable of extreme thermost