Copyright is owned by the Author of the thesis.  Permission is given for 
a copy to be downloaded by an individual for the purpose of research and 
private study only.  The thesis may not be reproduced elsewhere without 
the permission of the Author. 
 
The origins and evolution of prokaryotes 
and eukaryotes. 
A thesis presented in partial fulfilment of the 
requirements for the degree of 
Doctor of Philosophy 
in Molecular BioSciences 
at Massey University 
Anthony Masamu Poole 
2001 

Contents 
Papers and manuscripts included in this thesis. 
Related papers not included in this thesis. 
Acknowledgements. 
Introduction. 
Paper 1: Relics from the RNA world.
Paper 2: The path from the RNA world.
Paper 3: RNA evolution: separating the new from the old.
Paper 4: Early evolution: prokaryotes, the new kids on the block.
Paper 5: The nature of the Last Universal Common Ancestor.
Paper 6: The origin of the nuclear envelope and the origin of the eukaryote cell.
Paper 7: Prokaryote and eukaryote evolvability.
Future work.
Appendix: Does endosymbiosis explain the origin of the nucleus? 
Papers and manuscripts included in this thesis: 
1. Jeffares DC, Poole AM & Penny D. Relics from the RNA world. J Mol Evol 46,
18-36 (1998). Reprinted with permission from Springer-Verlag New York Inc.
2. Poole AM, Jeffares DC & Penny D. The path from the RNA world. J Mol Evol
46, 1-17 (1998). Reprinted with permission from Springer-Verlag New York Inc.
3. RNA evolution: separating the new from the old. (manuscript).
4. Poole A, Jeffares D, Penny D. Early evolution: prokaryotes, the new kids on the
block. Bioessays 21, 880-889 (1999). Reprinted by permission of Wiley-Liss,
Inc., a subsidiary of John Wiley & Sons, Inc.
5. Penny D & Poole A. The nature of the Last Universal Common Ancestor. Curr
Opin Genet Dev 9, 672-677 (1999). Reprinted with permission from Elsevier
Science.
6. The origin of the nuclear envelope and the origin of the eukaryote cell.
(manuscript).
7. Poole AM, Phillips MJ & Penny D. Prokaryote and eukaryote evolvability.
Biosystems (submitted).
8. Appendix: Poole A & Penny D. Does endosymbiosis explain the origin of the
nucleus? Nature Cell Biol 3, E173. [Letter]
Related papers not included in this thesis: 
• Jeffares DC, Poole AM & Penny D. Pre-rRNA processing and the path from the
RNA world. Trends Biochem Sci 20, 298-299 (1995). [Letter]
• Poole A, Penny D & Sjöberg B-M. Methyl-RNA: an evolutionary bridge between
RNA and DNA? Chem Biol 7, R207-R216 (2000).
• Poole A, Penny D & Sjöberg B-M. Confounded cytosine! Tinkering and the
evolution of DNA. Nature Reviews Mol Cell Biol 2, 147-151 (2001).
• Poole AM, Logan DT & Sjöberg B-M. The evolution of the ribonucleotide
reductases: much ado about oxygen. J Mol Evol (accepted).
Acknowledgements. 
Dad, thanks for teaching me how to think. If I can learn to be half as inquisitive and half as
good a thinker as you, I'll be pretty chuffed. Mum, you taught me how to push myself when it is so 
easy to content oneself with the bare minimum. (Just as well with David as a supervisor!) I remember 
your reaction when I got 98.5% in a chemistry test at school. Rather than being happy that I scored so 
well, you asked me why I didn't get 100%! 
Angelik and Emma, you deserve a big mention because you challenged me to become an 
efficient worker so that I wouldn't forget to have a life outside science. (Nevertheless, I've been in front 
of the computer and ignoring you for weeks now.) You also made me happy even after the crappiest of 
days at work and helped me to believe that what I was doing wasn't a complete waste of time. Angelik, 
tack också för att du visade mig Sverige. Visst är det bra att ha en hemlig kod här i Nya Zeeland också!
I love you both more than you can know. 
Big thanks also to Alan & Dennis for putting up with me and all the antics that caused you 
earache! Annette! Thanks for all the letters, phone calls, great discussions and advice over the years, 
and for showing me Denmark! 
David, your fourth year course is to blame for you having to deal with me on and off over the 
last 7 years! You have always surprised me by giving me what I thought was an unreasonable amount 
of work, only for me to find that I could cope with the extra load (though at times I was swearing under 
my breath!), which has meant that I've learnt a lot of new skills that I otherwise wouldn't have made the 
effort to acquaint myself with. Oh, and your infinite store of anecdotes still continues to amaze me! 
Dan, the clever one in our twosome. I have to admit that many of the clever bits in this thesis 
are your fault. Pink Giraffes on the Pink Serengeti! Matt, cheers for your ESNDs (Paper 7), another
good idea in this thesis that I can't take credit for! I'm still waiting for the opportunity to strap torches to
the backs of your knees to see if I can reset your biological (un)clock. Was the best of times flatting 
with you both, even if Matt is a filthy b'stard in the kitchen and Dan drank all my whisky! 
Thanks Monbusho for the two years I spent in Japan, prior to beginning my PhuD, where I 
had the opportunity to broaden my mind, and do what I wanted when I felt like it! Thanks to Hideo Oka 
for helping me get into the University of Tokyo and Yoshiki Hotta, under whom I worked at Todai. 
When I got bored with what I was doing in his lab, Hotta sensei was kind enough to let me instead 
work on the RNA world problem! Very few people, if any, would give you such an opportunity. 
Thanks to Matt Ridley, who thought our ideas were worth telling the world about, and who is 
responsible for much of the attention that our ideas have been given. It was an incredible thrill to be 
reading his book The Origins of Virtue at the time he contacted us - indeed an eerie coincidence!
I also wish to thank Britt-Marie Sjöberg, who accepted me into her group when Angelik and I
had to move to Sweden. While the work I did with her is not included in this thesis, she has been a 
fantastic person to work with. Thanks to two years in her group, my knowledge of biochemistry is a bit 
more respectable than before! The stupid immigration laws in place in New Zealand at the time when I 
left for Sweden actually did me a favour...
Also, many thanks to Trish, who will be helping me bind this beggar before I get on the plane 
on Tuesday! 
Thesis survival kit: Powerbook G3, Blur 13, Komeda Pop på svenska, Björk Debut &
Homogenic, the Cardigans Emmerdale, Ryuichi Sakamoto 1996, Pearl Jam Yield.
Finally finally, thanks Dad for introducing me to The Hitch-Hikers Guide to the Galaxy and 
Homer's Odyssey all those years ago! 
Ant. 

Introduction. 
Candidate's note. 
This thesis is a collection of papers, either published, submitted, or in preparation for
submission to international journals. Each chapter is a paper with its own introduction
and can be read as a stand-alone paper. The purpose of the thesis introduction is to
give an overview of the motivation for the work. It also reviews other approaches
being taken, in particular with respect to establishing the evolutionary relationships between
the three domains of life: archaea, bacteria and eukarya.
Problems with the accepted scenario for the origin of life. 
For most biologists, the big picture regarding the origin and evolution of
prokaryotes and eukaryotes is not at issue, and recent evidence only serves to back up
the intuitively obvious: complex eukaryotes evolved from simpler prokaryotic
ancestors. In the standard account, prokaryotes predated eukaryotes by at least 800
million years, as evidenced by cyanobacterial microfossils dating back 3.5 billion
years [e.g. Schopf & Packer 1987, Walsh 1992]. (The finding of molecular markers of
eukaryote metabolism by Brocks et al. [1999] has pushed back the emergence of the
earliest eukaryotes from 2.1 billion years to 2.7 billion years.) Establishing the root of
the tree of life has shown that prokaryotes in fact consist of two domains, the archaea
and bacteria, that the Last Universal Common Ancestor (LUCA) of all extant life
lived at extremely high temperatures and that the eukaryotes emerged from the
archaea [Woese & Fox 1977, Woese 1987, Woese et al. 1990]. Prior to the emergence
of cyanobacteria, life arose from prebiotic conditions on the early earth and, at some
stage, possessed an RNA-rich metabolism. This period, dubbed the RNA world
[Gilbert 1986, Benner et al. 1989], predated both the emergence of genetically-encoded
proteins and of DNA as the genetic storage molecule.
The standard picture is therefore that, after the period of heavy bombardment
that is suggested to have vaporised the oceans on Earth perhaps as recently as 3.8
billion years ago [reviewed in Nisbet & Sleep 2001], life emerged, went through an
RNA world period and a thermophilic prokaryote LUCA, and developed into
cyanobacteria in an astonishingly short period of time - perhaps 300 million years
[Lazcano & Miller 1994]. Indeed, life may have arisen in an even shorter timeframe
than this. Among the oldest rocks are those from the Isua belt of Southwest
Greenland, which arguably date back around 3.85 billion years. Enrichment of the 13C
isotope of carbon in these rocks has been argued to betray evidence of biological
carbon fixation [Mojzsis et al. 1996].
A closer look at any one of these 'established facts', as with any area in
science, suggests that none are as clear-cut as various popular science commentaries
suggest. For instance, the earliest stromatolites do not contain microfossils, and may
have an abiological origin [Lowe 1994, Grotzinger & Rothman 1996], unlike those
inhabited by modern cyanobacteria. The dating of the Isua belt is controversial, as is
the argument that the enrichment of 13C found in rock samples from the belt is
indicative of life [reviewed in Nisbet & Sleep 2001]. Furthermore, a hot earth rules
out the possibility of an RNA world, given the instability of both single-stranded
RNA [Forterre 1995a], and of the bases which make up RNA, particularly cytosine
[Miller & Bada 1988, Levy & Miller 1998, Shapiro 1999]. The suggestions of a faint
early sun and a 'snowball earth' [Bada et al. 1994, Nisbet & Sleep 2001] potentially fit
better with an RNA-rich period in the origin of life [Moulton et al. 2000], yet perhaps
one of the few points on which prebiotic researchers agree is that life could not have
begun with RNA [e.g. Joyce & Orgel 1999, Nelson et al. 2000]; there must have been
earlier phases.
Another issue is the reliability of microfossil classification. Biologists seem to
take the finding of 3.5 billion year old cyanobacteria as fact, yet forget that until the
early work of Woese & Fox [1977], there were only prokaryotes. The primary
domains archaea and bacteria were indistinguishable morphologically and were
initially characterised solely on the basis of phylogenetic grouping from sequence
motifs. Another concern is that modern cyanobacteria carry out oxygenic
photosynthesis, yet the evolution of atmospheric oxygen probably did not occur until
around 2.5-2.2 billion years ago [Ohmoto 1996, Summons et al. 1999]. Furthermore,
it is not even always possible to distinguish prokaryote from eukaryote on the basis of
morphology. Microbial symbionts in the gut of surgeonfish were first characterised as
eukaryotic protists [Fishelson et al. 1985], and it was not until rRNA sequences were
obtained that it was possible to establish unequivocally that these large symbionts
were in fact prokaryotes [Angert et al. 1993].
Finally, a hyperthermophilic LUCA is also at issue. Early work on reverse
gyrase by Forterre [1995a] suggested that hyperthermophiles were not ancestral to
mesophiles, and more recently, reconstruction of ancestral GC content suggests the
LUCA was mesophilic [Galtier et al. 1999]. While the domains archaea, bacteria and
eukarya are now generally accepted, it has become clear that horizontal transfer of
genes between these lineages has probably occurred at significant levels, so simple
phylogenetic reconstruction from a single gene may not be an accurate reflection of
the evolution of the three domains [e.g. Martin 1999]. Moreover, the finding that
microsporidia have been incorrectly placed as deep diverging eukaryotes [reviewed
by Keeling & McFadden 1998] has served as a reminder that there are fundamental
phylogenetic problems that have yet to be resolved in the reconstruction of deep
divergences [e.g. Lockhart et al. 1996, Forterre 1997b, Philippe & Laurent 1998].
Indeed, as argued by Forterre [1995a,b, 1997a,b], we should not only be cautious
about the claim that the LUCA was a hyperthermophile, but moreover, it has never
actually been established that prokaryotes preceded eukaryotes in evolution. The 
evidence is at best circumstantial, and conclusions are accepted largely on the basis of 
the widespread assumption that this must be correct. 
Right or wrong, the assumption that, owing to their greater complexity,
eukaryotes have evolved from prokaryotes, definitely holds sway. Consequently, the
approach that many biologists take to the origin of a particular structure
is to use the diversity of modern structures to try and build a picture of how the
structure gradually became more complex, that is, a succession of forms from the
modern prokaryotic apparatus to the modern eukaryotic apparatus. This is flawed for
several reasons. First, the assumption is made that whatever is prokaryotic must be
ancient, and second, that there has been negligible change in the prokaryotic form
since its advent. That the biological community can accept extensive horizontal transfer
between prokaryotic organisms and extensive adaptation by prokaryotes to a wide
range of dissimilar niches, at the same time as arguing that all prokaryotic structures
are effectively living fossils, is amazing! Perhaps the most disturbing consequence of
accepting a priori that prokaryotes predate eukaryotes is that the evolution of complex
biological phenomena is approached as a purely descriptive problem. The direction of
evolution is already known: simple to complex, and prokaryote to eukaryote.
However, there is no inherent reason under Darwinian evolution that evolution
should proceed from simple to complex [Szathmary & Maynard Smith 1995] -
simplification may equally occur, as is evident in many examples of parasite
evolution [e.g. Andersson & Kurland 1998, Grbic 2000, Wren 2000]. More
problematically, with the solution implicit in the assumption, selection pressures are
usually not given in trying to explain the origins of a structure; rather, the emphasis is
on explaining the diversification/complexification of that structure, perhaps with
natural selection as an afterthought [Paper 6]. This problem is in some respects
parallel to the problem in developmental biology of always applying adaptationist
reasoning in describing the evolution of structures: it is widely assumed that every
observable trait must have a function, but this is unlikely to be the case [Gould &
Lewontin 1979, Gibson 2000, Paper 7].
Given that reductive processes are as much a feature of evolutionary change as 
is complexification (as exemplified by parasite evolution), I have avoided making the 
assumption that prokaryotes are ancestral simply because they appear simpler. 
Instead, I have examined a range of data relevant to extant prokaryotes and eukaryotes 
to establish the nature of the processes underlying evolution in these groups [Paper 7]. 
I have also examined how the origin of the three domains fits with the RNA world 
period in the evolution of life [Papers 1-4]. My conclusion, and the main point of this 
thesis, is that the prokaryote lineages appear to have undergone reductive evolution, 
whereas the beginnings of eukaryote complexity may date back to early inefficient
metabolic, genetic and cellular systems [Papers 2, 4-6]. Thus prokaryotes are simple
because they are streamlined, while eukaryotes are perhaps complex by historical 
accident [Paper 7]. 
The tree of life & the LUCA. 
The tree of life as it currently stands aims to describe the evolutionary 
relationships between all organisms on Earth, but also to provide, by extrapolation, 
insights into the likely nature of the Last Universal Common Ancestor (LUCA). The 
crowning achievement was the tree of life from small subunit rRNA sequences
[Woese & Fox 1977], which established the relationships between representatives of a
wide spread of organisms. The work resulted in the discovery of the archaebacteria,
later renamed archaea [Woese et al. 1990], which are as distinct from eubacteria and
eukaryotes as these are from each other. This was a major improvement for understanding
the relationships between prokaryotes (and also single-celled eukaryotes) since many
species appeared very similar in terms of morphology and ultrastructure. Subsequently,
attempts were made to root the tree of life using paralogous gene sets (gene pairs which
had a common origin, and which were expected to have undergone a duplication and
divergence from a single original gene prior to the emergence of the three domains)
[Gogarten et al. 1989, Iwabe et al. 1989]. The overall aim of this was two-fold: to
build a phylogeny describing the relationships of all organisms on the planet, and to
determine which of the three domains is most like the Last Universal Common
Ancestor (LUCA). While the pursuit of a tree of life has been plagued with
difficulties such as the problem of long-branch attraction [Hendy & Penny 1989,
Philippe & Laurent 1998, Forterre & Philippe 1999], finding suitable genes for
rooting the tree [Lopez et al. 1999], the need to improve on the rates across sites
assumption [Lopez et al. 1999, Brinkmann & Philippe 1999], horizontal transfer
[Teichmann & Mitchison 1999, Martin 1999, Doolittle 1999], and weaknesses and
conflicts between individual gene data [e.g. Baldauf et al. 2000], there is still
confidence that the correct tree can eventually be recovered.
The controversy over the tree of life and difficulties with the dataset and 
methods used is not an issue that I consider in this thesis. Numerous articles in the
literature discuss this issue [e.g. Doolittle 1999, Snel et al. 1999, Brinkmann &
Philippe 1999, Teichmann & Mitchison 1999, Stiller & Hall 1999, Forterre &
Philippe 1999, Philippe & Forterre 1999, Baldauf et al. 2000, Penny et al. 2001].
Instead, I will consider the problems inherent in using the tree for inferring the
nature of the LUCA. Reconstructing the tree of life is central to understanding
evolutionary relationships between all organisms on Earth. Continuing attempts
should be made, despite the problems inherent with recovering phylogenetic
relationships for such deep divergences [Penny et al. 2001]. I shall suggest however
that, even if the correct tree were recovered, it would be largely uninformative for 
gaining an insight into the nature of the LUCA. It is my aim to describe exactly how 
the tree could be useful, and what the caveats and limitations of using the tree for 
evolutionary inference are. Important to that discussion is the issue of how horizontal 
transfer affects the tree, and whether the effect is so great that the tree becomes 
unresolvable, as has been suggested by Woese [1998].
Attempts have been made to overlay characters onto the tree (such as 
thermophily) in order to examine the LUCA. However, there has been little 
consideration of the compatibility with earlier scenarios for the origins of life, based 
on physicochemical data. For instance, if the LUCA was thermophilic [Woese 1987],
given the thermolability of RNA [Forterre 1995a, Papers 2 & 4], it is difficult to explain
how presumed relics from the RNA world have been retained. Indeed, establishing the
position of the root cannot provide an answer to the question of the nature of the
LUCA; it is virtually uninformative from this viewpoint [Forterre 1997b, Paper 5].
An approach I take in this thesis is to consider the diversity of RNA in modern
organisms. Taking the model for the RNA world, a physicochemical approach to 
understanding the replacement of RNA by protein in evolution is possible. By 
recognising the properties of RNA, it is possible to identify niches where RNA would 
be expected to be lost, or at the very least reduced severely in its use. The link with 
the RNA world, plus the adherence to the properties of RNA enabled me to take a 
model for the RNA world [Paper 1] and apply it to the problem of the nature of the
LUCA [Papers 2 & 4]. This was done by examining the phylogenetic distribution of
putative RNA world relics. Furthermore, the properties of RNA meant it was possible 
to examine the problem of polyphyletic gene loss for the RNA dataset, which gives a 
marked improvement over application of simple parsimony [Paper 5]. 
The minimal genome concept and reconstruction of the LUCA. 
An active area of current research is the attempt to derive a minimal
genome, that is, the smallest gene set required for a functional cell [Mushegian &
Koonin 1996, Mushegian 1999, Hutchison III et al. 1999]. Initially, it was considered
that this approach would provide a useful means of examining the likely genomic
make-up of the LUCA [Mushegian & Koonin 1996], though it is now being
acknowledged that a minimal genome and the LUCA are not one and the same thing
[Mushegian 1999, Paper 5].
A minimal genome is defined by the nature of its environment, and hence will 
differ depending on the genomes compared. In their initial work, Mushegian &
Koonin [1996] compared the genomes of Haemophilus influenzae and Mycoplasma
genitalium (at the time, the only two genomes available for analysis). Their
reconstruction produced a minimal genome of 256 genes that could be argued to be
both necessary and sufficient for the function of a modern cell. This minimal gene set
was criticised by Becerra et al. [1997] because it led Mushegian & Koonin [1996] to
argue that the LUCA had an RNA genome! Mycoplasmas are parasitic and the
alternative explanation for the lack of de novo deoxyribonucleotide synthesis is that 
they obtain these from their host. This is a likely example of loss resulting from 
intracellular parasitism, and highlights the shortcomings of a minimal gene set as an 
approximation of the LUCA. 
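
The dependence of a minimal gene set on the genomes compared can be illustrated with a small sketch. It is a deliberately simplified reading, assuming that the shared gene set is just the intersection of gene families across the genomes compared; the genome contents and gene-family labels below are hypothetical and are not taken from Mushegian & Koonin [1996].

```python
# Toy illustration only: genome contents and gene-family labels are hypothetical.

def shared_gene_set(genomes):
    """Return the gene families present in every genome in the comparison."""
    return set.intersection(*genomes.values())


free_living = {
    "translation", "transcription", "dna_replication",
    "dntp_synthesis", "cofactor_synthesis",
}
second_free_living = {
    "translation", "transcription", "dna_replication",
    "dntp_synthesis", "motility",
}
# A parasite that obtains deoxyribonucleotides from its host (assumed here).
parasite = {"translation", "transcription", "dna_replication"}

core_without_parasite = shared_gene_set(
    {"free_living": free_living, "second_free_living": second_free_living}
)
core_with_parasite = shared_gene_set(
    {
        "free_living": free_living,
        "second_free_living": second_free_living,
        "parasite": parasite,
    }
)

# Families that drop out of the shared set once the parasite is included.
dropped = core_without_parasite - core_with_parasite
print(sorted(dropped))
```

In this toy case the deoxyribonucleotide synthesis family falls out of the shared set only because a host-dependent genome was included, which is the kind of loss that makes a minimal gene set a poor proxy for the LUCA.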
Leipe et al. [1999] contend that the LUCA had a genome consisting of both
RNA and DNA, since their genomic analysis suggests that the bacterial DNA
replication machinery is unrelated to the archaeal and eukaryal machinery. However,
the coding capacity of RNA is so low that it is unlikely that an organism as complex as
the LUCA had an RNA genome. Likewise, the ubiquity and common origin of
ribonucleotide reductases argue against this [Poole et al. 2000]. Forterre [1999] has
also pointed out that other DNA replication proteins share a common origin, and that
anomalies in the others may be a result of non-orthologous gene displacements.
Following on from their systematic construction of a modern day minimal
gene set, Mushegian & Koonin [1996] suggested how this gene set could be reduced
to a set that would provide a model of a simpler ancestral cell:
i. Examine pathways requiring complex cofactors and eliminate those of them that
can be bypassed without the use of the cofactors.
ii. Eliminate the remaining regulatory genes.
iii. Delineate paralogs and replace at least the most highly conserved families with
a single, presumably multifunctional "founder."
iv. Apply the parsimony principle: those systems and genes that are not found in
both bacteria and eukaryotes or both bacteria and archaea are unlikely to come
from a primitive cell.
They also suggest: 'It has to be kept in mind that not only reduction but also certain
additions to the minimal gene set are likely to be required to produce a realistic model of
a primitive cell. The most important of such additions may be a simple system for
photo- or chemoautotrophy'.
Points i-iii are simplifications for which the only basis is the notion that the
direction of evolution was always from simple to complex. There is no inherent
requirement that organisms will tend towards greater complexity during evolution
[Szathmary & Maynard Smith 1995]. Indeed it has been argued that prokaryotes arose
through a process of reductive evolution, with aspects of eukaryote genome
architecture and RNA processing being more indicative of the make-up of the LUCA
than those found in prokaryotic organisms [Forterre 1995a, Glansdorff 2000, Papers
2, 4 & 5]. Finally, reductive evolution is a hallmark of the mycoplasmas
[Fraser et al. 1995] and such reductive evolution may be a hallmark of the parasitic
lifestyle of the organism [Andersson & Kurland 1998, Paper 7]. An example is the
different degrees of degradation of the S-adenosylmethionine synthetase gene in 8
species of Rickettsia [Andersson & Andersson 1999], which are obligate intracellular
parasites. Thus the minimal genome concept may better represent the minimal
parasitic/obligate intracellular symbiont genome; further reduction would produce an
even more extremely minimal parasitic genome, not an approximation of the LUCA.

Table: Difficulties with using distribution to establish whether a gene was a
feature of the LUCA.

                                            Bacteria  Eukaryotes  Archaea  HT(a)  RNA world relic  In LUCA?
Gene 1 (ubiquitous, no HT)                  ✓         ✓           ✓        ✗                       YES
Gene 2 (predates LUCA)                                                                             YES
Gene 3 (unplaceable if extensive HT)                                                               UNCERTAIN
Gene 4                                      ✓         ✗           ✗        ✗      ✗                UNCERTAIN(b)
Gene 5                                      ✗         ✓           ✓        ✗      ✗                UNCERTAIN(c)

(a) HT: Horizontal transfer.
(b) If eukaryotes and archaea are monophyletic, Gene 4 could either be argued to be a
feature of the LUCA (with a single loss prior to the archaea-eukaryote divergence), or
to have arisen in the bacterial lineage after it split from archaea-eukaryotes. If bacteria
and archaea are monophyletic, Gene 4 could be a feature of the LUCA with two
independent losses (once from archaea and once from eukaryotes), or may have arisen
specifically in the bacterial lineage, after it split from archaea.
(c) If eukaryotes and archaea are monophyletic, it is as likely that Gene 5 arose in the
common ancestor of these two groups as it is that it was a feature of the LUCA. If
bacteria and archaea are monophyletic, parsimony would suggest the gene was a
feature of the LUCA, with loss from bacteria.
Militating against points i-iii is their final comment. However, this reduces
the worth of the minimal genome approach to understanding the LUCA, since one
may add or remove anything, without a specified framework that enables additions or
removals to be evaluated. The RNA world model suggests that many RNA processing
pathways absent from prokaryotes should be included in any reconstruction of the
make-up of the LUCA [Papers 2, 4 & 5].
The likelihood then is that the LUCA was not 'minimal' as mycoplasmas or 
other obligate intracellular parasites are. Importantly, paralogous genes (point iii) are 
expected to have been a feature of the LUCA, and these have figured in attempts to 
root the tree of life [see Forterre & Philippe 1999, Glansdorff 2000, for review].
While paralogous genes have originated from a single "founder", the duplications that 
gave rise to some paralogues will have occurred prior to the emergence of the three 
domains of life. More generally, throwing away paralogues may mean that a minimal 
gene set could be underestimating the level of complexity of the LUCA. The problem 
with which we are faced is then, given a minimal gene set as a starting point, how to 
decide what features should be removed, and what should be added? 
Finally, point iv is that simple parsimony is a useful tool for reconstructing the
LUCA. Given the three domains, archaea, bacteria and eukaryotes, the presence of a
trait in two of the three is not in itself strong evidence for the presence of that trait in
the LUCA. If agreement on the topology of the tree, and hence the position of the
root, can be reached, this may guide the use of parsimony in tracing genes back to the
LUCA [Forterre 1997a, Papers 3 & 5]. Rigid application of parsimony however may
wrongly exclude genes that can be traced back to the LUCA on other grounds, or 
exclude genes for which no other evidence of their ancestry is evident. 
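
The sensitivity of simple parsimony to the assumed topology can be made concrete with a short sketch. This is an illustrative simplification, not a method used in the papers: it assumes a rooted three-domain tree in which two domains are sister groups, and it compares only two extreme scenarios (gene present in the LUCA with losses only, or gene absent from the LUCA with gains only). The function name and data layout are hypothetical.

```python
def min_events(presence, sisters, outgroup):
    """Count minimal events for a gene on a rooted three-domain tree.

    presence: dict mapping domain name -> True/False (gene present?)
    sisters:  the two domains assumed to share a common ancestor
    outgroup: the remaining domain
    Returns (losses, gains): minimal losses if the gene was in the LUCA,
    and minimal gains if it was not.
    """
    x, y = sisters

    # Scenario 1: gene present in the LUCA, explained by losses only.
    if not presence[x] and not presence[y]:
        losses = 1  # one loss in the common ancestor of the sister domains
    else:
        losses = int(not presence[x]) + int(not presence[y])
    losses += int(not presence[outgroup])

    # Scenario 2: gene absent from the LUCA, explained by gains only.
    if presence[x] and presence[y]:
        gains = 1  # one gain in the common ancestor of the sister domains
    else:
        gains = int(presence[x]) + int(presence[y])
    gains += int(presence[outgroup])

    return losses, gains


# Distribution patterns corresponding to Gene 4 and Gene 5 in the table.
gene4 = {"bacteria": True, "archaea": False, "eukaryotes": False}
gene5 = {"bacteria": False, "archaea": True, "eukaryotes": True}

for name, gene in (("Gene 4", gene4), ("Gene 5", gene5)):
    print(name, "archaea+eukaryotes sister:",
          min_events(gene, ("archaea", "eukaryotes"), "bacteria"))
    print(name, "bacteria+archaea sister:",
          min_events(gene, ("bacteria", "archaea"), "eukaryotes"))
```

Under the archaea-eukaryote grouping both genes give one loss against one gain, so parsimony cannot decide; under the bacteria-archaea grouping the same distributions give two losses against one gain for Gene 4 and one loss against two gains for Gene 5, in line with the table footnotes.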
Building on the minimal genome. 
In terms of reconstruction of the LUCA, the minimal genome concept should 
not be abandoned, but its limitations should be noted. It may help to take the minimal 
genome concept as a starting point, as it provides a powerful way of sorting through a 
large number of traits to establish which can possibly be traced back to the LUCA. 
Certainly, the conceptual difficulty of reconstructing the RNA world [Papers 1, 3 & 4]
is similar in this regard, but the nature and size of the dataset makes it easier to 
distinguish, ad hoc, putative RNA world relics from RNAs that have evolved more 
recently [Paper 3]. Based on Mushegian & Koonin's [1996] original proposal, along
with current attempts to reconstruct the LUCA using a model for the RNA world 
[Papers 2, 4 & 5], I suggest the following amendment, where I remove and replace 
criteria i-iii, amend iv and effectively expand their final point on additions to include 
the RNA world data (see table). This provides a tentative method for how to go about 
reinserting some traits into a minimal gene set to improve the reconstruction of the 
LUCA: 
1. Inclusion of synthetic pathways for pyridine nucleotide cofactors because these
are likely RNA world relics, though not necessarily of pathways requiring these
cofactors. Rather, it is the generic reaction chemistries that should be considered
ancestral.
2. Inclusion of putative RNA world relics, even where these are not universal in
distribution.
3. Reintroduce paralogues in those cases where these clearly diverged prior to the
divergence of the LUCA into the three domains.
4. Apply simple parsimony with caution: under certain circumstances, it is weak or
misleading (see table). Current disagreements on the position of the root (and
therefore the relationships between the three domains) make it difficult to use this
in examining possible polyphyletic losses or gains.
5. The ability to describe a large number of traits as ancestral or derived on the basis
of a single selection pressure should permit reconsideration of some datasets
which may not otherwise be included in the minimal genome.
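
As a rough sketch of how criteria 2 and 4 might be combined when sorting through candidate genes, the following encodes one possible decision rule. It is an assumption-laden illustration rather than the procedure used in the papers: the record fields, the order in which the rules are applied, and the treatment of an RNA world relic as sufficient grounds for inclusion are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

DOMAINS = ("bacteria", "eukaryotes", "archaea")


@dataclass
class GeneRecord:
    """Hypothetical summary of the evidence available for one gene."""
    name: str
    present_in: frozenset          # domains in which the gene is found
    suspected_ht: bool = False     # evidence of extensive horizontal transfer
    rna_world_relic: bool = False  # putative RNA world relic


def luca_call(gene):
    """Return 'YES' or 'UNCERTAIN' for presence in the LUCA (illustrative rule)."""
    # Criterion 2 (assumed reading): relics are included even if not universal.
    if gene.rna_world_relic:
        return "YES"
    # Extensive horizontal transfer makes the gene unplaceable.
    if gene.suspected_ht:
        return "UNCERTAIN"
    # Ubiquitous genes without horizontal transfer are traced to the LUCA.
    if gene.present_in == frozenset(DOMAINS):
        return "YES"
    # Criterion 4: a partial distribution alone is weak evidence either way.
    return "UNCERTAIN"


examples = [
    GeneRecord("ubiquitous gene", frozenset(DOMAINS)),
    GeneRecord("relic RNA", frozenset({"eukaryotes"}), rna_world_relic=True),
    GeneRecord("transferred gene", frozenset(DOMAINS), suspected_ht=True),
    GeneRecord("bacterial gene", frozenset({"bacteria"})),
]
calls = {gene.name: luca_call(gene) for gene in examples}
print(calls)
```

The point of the sketch is only that distribution is consulted last: independent evidence (relic status, detectable transfer) is weighed before parsimony on presence and absence is allowed to decide anything.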
The problem of horizontal transfer. 
Much has been made of the question of horizontal transfer in the three
lineages. It is still debated how extensive this is - some authors have argued for
massive unbridled horizontal transfer events [Woese 1998, Doolittle 1998], some
have argued that there are detectable patterns to the process [e.g. Jain et al. 1999, Lan
& Reeves 2000, Paper 7], and some have suggested there is very little transfer at all
[Snel et al. 1999]. The other issue is whether this transfer is extensive and ongoing
[Ochman et al. 2000, Lan & Reeves 2000] or whether it was extensive and has possibly
slowed [Woese 1998]. The need for caution is obvious: horizontal transfer of genes
will blur the ability to trace a given gene back to the LUCA, meaning that until it is 
possible to recognise even ancient horizontal transfer events, it will pay to be 
judicious with the application of parsimony. This may mean in effect that careful 
studies of the distributions of various genes within the diversity of life will be 
essential, and furthermore, that it will be crucial to develop ever more sensitive ways 
of recognising potential cases of transfer. Again, the tree of life will be a useful tool 
here, as limited distribution of a gene within one domain may provide a means of 
homing in on potential transfer events. Nevertheless, like the simple parsimony 
approach to the three domains, this will require that we have reconstructed the correct 
tree if it is to be of any use. 
A clear example of how the difficulties of tree topology and possible
horizontal transfer weaken the capacity of theory to examine events in early
evolution is that of the 'respiration early' hypothesis [Castresana & Saraste 1995,
Castresana & Moreira 1999]. Here the authors acknowledge that their argument rests
on the assumption that the position of the root is correct, and that horizontal transfer
has had no impact on the traits they examine. The hypothesis is inherently testable,
but the prerequisite for testing it is that tree topology can be established, and that the
impact of horizontal transfer can be evaluated. If one takes the extreme view of
Woese [1998, 2000], it is not possible to test any such hypotheses, and the result is a
situation whereby competing theories are evaluated on intuition or popularity, not on
hypothesis testing.
Current evidence argues that while genes involved in metabolic processes may 
transfer extensively, those involved in informational processes [sensu Rivera et al. 
1998] tend not to be transferred very frequently, and some may not transfer at all [Jain 
et al. 1999]. It is thus a crucial goal of genomics to determine how frequent horizontal
transfer is, between which types of organisms it tends to occur, and whether it applies 
to all genes [Martin 1999, Lan & Reeves 2000]. The ultimate goal is to construct a 
network describing genomic evolution, with those components of the genome that are 
subject to horizontal transfer overlain on a tree that describes organismal 
relationships, as determined by vertical transmission [Martin 1999]. Horizontal 
transfers have been suggested to contribute strongly to speciation events [de la Cruz 
& Davies 2000, Lawrence 1999], though currently there is no reason to suggest that 
these are more frequent than speciation by descent, particularly when one considers 
that there can be large intraspecies genome differences in prokaryotes [Lan & Reeves 
2000]. Indeed, as Lan & Reeves [2000] point out, applying the species concept to 
prokaryotes will require a very different approach to the framework used for sexual 
organisms. In multicellular eukaryotes, where extensive cell specialisation makes 
transfers less likely than in single-celled organisms, speciation through horizontal 
transfer is likely to be rare [Paper 7]. However, in both unicellular and multicellular 
eukaryotes, there are strong indications that many genes have been transferred from 
organelles to the nucleus [Martin et al. 1998, McFadden 1999, Berg & Kurland 2000].
A tree of genomes is most likely to be part tree, part network, and would
indicate organismal relationships in terms of descent with modification, and gene
relationships in terms of mode of transmission. Some regions of the tree may have
limited network structure; others may have extensive network structure, with tree
branches being highly unreliable [Martin 1999].
Given known difficulties with phylogenetic analyses for deep divergences
[Lockhart et al. 1996, Philippe & Laurent 1998, Lockhart et al. 1998, Philippe &
Forterre 1999, Penny et al. 2001], how can cases of transfer be distinguished from
problems of phylogenetic reconstruction? There are two aspects. One is to determine 
the nature and extent of horizontal transfer, and should be approached as a biological 
problem. What is the evolutionary basis for horizontal transfer between organisms, 
and what patterns emerge? Does transfer occur non-specifically given proximity 
between two organisms, or is transfer dependent on selection? Some aspects of 
horizontal transfer are considered in papers 6 and 7. In paper 7, I consider horizontal 
transfer from the viewpoint of organismal evolvability, and argue that extensive 
horizontal transfer has a selective component. The other aspect, which is not
considered in any depth in this thesis, covers how cryptic transfers can mislead
phylogenetic reconstructions [Teichmann & Mitchison 1999, Philippe et al. 1999],
together with the bioinformatic [Lawrence & Ochman 1998, Nelson et al. 1999, Ochman
et al. 2000] and experimental [reviewed in Lan & Reeves 2000] approaches for
establishing patterns of transfer.
Given the correct tree, some transfer events may in principle be identifiable, 
and so should traits dating back to the LUCA [Paper 5]. A trait that is found on both 
sides of the root can be best explained as loss in one of the three domains, and hence 
the most parsimonious explanation is a strong one. A trait that appears in two of the 
three domains, but where the two domains containing this trait group together (i.e. are 
monophyletic), is uninformative, and parsimony is not sufficient. Without further 
knowledge, it is not clear if the trait is ancestral or derived since the grouping of the 
two domains means the tree is reduced to a 'V' shape (Figure), with the two domains 
that form a monophyly being represented by a single branch. Nevertheless, the 
topology makes the application of parsimony weak, and it is also important to note 
that independent losses are much more likely than independent gains [Forterre 1997a].
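The asymmetry between losses and gains can be made explicit by giving the two kinds of change different costs. The following Python sketch is an illustration only: the function name `root_costs` and the weights `gain` and `loss` are assumptions introduced here, not values taken from Forterre [1997a] or estimated from data. It scores a single presence/absence trait on the rooted tree ((A, B), C) and returns the cheapest total cost for each possible state at the root.

```python
def root_costs(in_a, in_b, in_c, gain=2.0, loss=1.0):
    """Cheapest total cost of the trait's history for each root state.

    The tree is fixed as ((A, B), C). `gain` and `loss` are illustrative
    weights (assumptions), not measured rates.
    Returns {False: cost if absent at root, True: cost if present at root}.
    """

    def step(parent, child):
        # Cost of one branch: nothing if unchanged, otherwise a gain or a loss.
        if parent == child:
            return 0.0
        return gain if child else loss

    costs = {}
    for root in (False, True):
        # Try both states for the ancestor of A and B and keep the cheaper one.
        costs[root] = min(
            step(root, ab) + step(ab, in_a) + step(ab, in_b) + step(root, in_c)
            for ab in (False, True)
        )
    return costs


if __name__ == "__main__":
    # Trait present in A and B, absent from C.
    print("equal weights:", root_costs(True, True, False, gain=1.0, loss=1.0))
    print("gain costlier:", root_costs(True, True, False, gain=2.0, loss=1.0))
```

With equal weights, a trait present in A and B but absent from C costs one change whether or not the root had it, which is the uninformative case described above. Once a gain is assumed to cost more than a loss, presence at the root becomes the cheaper explanation, so the conclusion depends entirely on the chosen weights.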
In reconstructing the LUCA, it should be possible to examine whether there 
are other arguments for the inclusion of a particular gene, even if it has undergone 
horizontal transfer. Since function is of greater importance than whether there has 
been horizontal transfer, there may be cases where, say, a metabolic pathway can be 
included in the LUCA, even though one or more of the genes has been shown to have 
undergone horizontal transfer. For instance, numerous arguments have been made for 
an early origin for the TCA cycle [Wächtershäuser 1992, Morowitz et al. 2000], so
this may be a good candidate for inclusion on the basis of function as opposed to
inclusion on the basis of presence in the minimal genome dataset. In Paper 3 a similar
approach is taken in distinguishing between the ultimate origin of an RNA, and recent
recruitment to new function (proximate origin).
[Figure panels not reproduced. Above: a rooted tree of lineages A, B and C, annotated "more like A or B, or in between?" and "in between?". Right: trees 1-10 with tips labelled E, A and B.]
Figure: The topology of the tree of life is uninformative as to the nature of the
organism at the root. Above: If topology alone is considered, it is not possible to
establish whether the organism at the root is most like lineage C, or A+B, or in
between these. Indeed, the same holds for the A-B monophyly. Overlaying characters
on this tree to establish the nature of the root is likewise problematic, especially
since horizontal transfer may mislead such analyses. Right: A shared character in all
possible combinations, overlain on trees either rooted by bacteria or eukaryotes.
Blue = presence, Grey = absence. For 2, 5, 9 & 10, independent gains are unlikely.
All other trees are equally parsimonious for each shared character combination. If
blue denotes loss, then these trees are still favoured as independent losses are easier
to explain than independent gains. E.g. for 5 & 6, if grey is an RNA world relic, one
vs two independent losses could only be evaluated by knowing the position of the
root [Paper 5]. Trees 9 & 10 could be explained by mitochondrion to nucleus gene
transfer. Extended from Forterre [1997a].
Using the tree for reconstructing LUCA 
Broadly, the problems faced in reconstructing the tree of life are two-fold:
current phylogenetic techniques are not able to recover the correct tree with any
certainty, and horizontal transfers may further complicate reconstruction [Paper 5].
Suppose that, even with extensive horizontal transfer, the three domains (archaea,
bacteria and eukaryotes) can be shown to hold, so that a low-resolution tree of life is
recoverable, and that this tree can be rooted using tricks such as a paralogous gene as an
outgroup (building separate unrooted trees from two genes that duplicated before the
divergence of the three domains in order to root one tree with the other) [e.g.
Gogarten et al. 1989, Iwabe et al. 1989, Brinkmann & Philippe 1999]. Can we then use
the tree to obtain information on the root?
The fundamental problem with the tree as it currently stands (technical 
difficulties in reconstructing relationships aside) is that, at its lowest resolution, it 
attempts to describe the relationships between three monophyletic groups: archaea, 
bacteria and eukaryotes. Wherever the root is placed, it is difficult to infer much about
the evolutionary relationships between groups of organisms (even when characters are
overlain; see Figure), although a rooted three-pronged tree can in principle establish
whether two of those groups come together as a monophyletic group. Rooting the tree
in the phylogenetic sense is an important means by which to examine the monophyly
of the prokaryotes [Brinkmann & Philippe 1999]. What it absolutely cannot do,
however, is establish the nature of the LUCA. The outgroup is often argued to
indicate which lineage is most likely to resemble the organism at the root, but this is
incorrect (Figure). The structure of the tree is uninformative, and importantly,
phylogenetic trees do not in themselves describe a process of evolutionary change. 
Their utility comes when, given the correct tree, various characters or traits can be 
overlaid upon the tree, giving a more complete picture of evolution. A recent example 
is the use of both fossils and molecular sequence data in reconstruction of the 
evolution of echolocation in bats [Springer et al. 2001]. 
The topology problem in the tree of life is fairly straightforward (Figure). The
process of inference from phylogenetic trees has been to argue that the
deepest-diverging groups in the branch that leads to the root provide insight into the
nature of the LUCA. This has led to the widely-accepted proposal that the LUCA was
hyperthermophilic and much like modern bacteria [e.g. Woese 1987].
Without considering the phylogenetic arguments for and against this proposal,
let us first consider the implication of a split in the tree defining two domains
(Figure). If domains A and B are shown to be related in the tree to the exclusion of
group C, what can we infer about the common ancestor of A and B? Was it more like
A, more like B, or did it have traits characteristic of both, some of which they still
share in common? Or was it still like C? Considering the whole tree results in the
same problem: it is not possible to decide if organisms that constitute 'outgroup' C in
general, and deep-branching members of group C in particular, are more representative
of the organism at the root. The branch that leads to the 'monophyletic' grouping of A
and B could potentially provide just as much information on the nature of the 
organism at the root of the tree. If one of these three has maintained most metabolic 
traits of the common ancestor, it is not clear from the pattern of divergence given by 
the tree which of these three this is. When the ancestors of A, B and C diverged, it 
could have been that C underwent a series of reductions, whereby many ancestral 
traits were lost in the evolution of this domain, so that, even though the other two 
groups diverged from each other more recently, one or both may have retained more 
ancestral traits than has C. Alternatively, it could be the opposite!
Rooting of a tree with three groups (Figure) implies that A and B are 
monophyletic, and hence the tree could be represented in simplified form with two 
branches, with A and B together constituting one domain. Which group then is most
similar to the organism at the root: the AB monophyly or C? No such information
can be recovered simply by looking at branching patterns on a tree. 
The tree clearly gives us important information on evolutionary splits between 
major lineages, but it offers no information on which traits can be traced back to the 
ancestor of all three groups. That said, evolutionary inference based on the tree of life 
has not relied solely on the topology - the standard interpretation is that thermophily 
appears in the deepest branches of both archaeal and bacterial domains, leading to the 
contention that the LUCA was a hyperthermophile [Woese 1987]. Given that rooting
the tree supported the grouping together of archaea and eukaryotes to the exclusion of 
bacteria, this was a correct conclusion, assuming the relationships between the three 
domains were correctly recovered, and assuming that hyperthermophily evolved only 
once. If so, then, given hyperthermophily is recovered in both branches of the tree 
(i.e. it traverses the root), this argues that this is the ancestral state (Tree 7 in figure). 
The bacterial rooting is subject to continued scrutiny as phylogenetic methods 
improve, and the hypothesis that the LUCA was a hyperthermophile is likewise 
testable. Indeed, there have been several criticisms of both the rooting of the tree and
the conclusion that the LUCA was a hyperthermophile. The competing hypothesis is
that the bacterial rooting is a consequence of long branch attraction [Brinkmann &
Philippe 1999, Lopez et al. 1999, Forterre & Philippe 1999]. An examination of the
phylogenetic distribution of putative RNA world relics [Papers 2 & 4], gyrases and
topoisomerases [Forterre 1995a], ancestral GC content [Galtier et al. 1999] and low
stability of RNA at high temperature [Moulton et al. 2000] argues that the LUCA was
mesophilic. These independent approaches argue that eukaryotes have retained a
number of ancestral features that date back to the LUCA, while archaea and bacteria
have lost these. Furthermore, the stability of hyperthermophily as a character has also
been questioned, with several reports that hyperthermophilic traits common to both
bacteria and archaea have undergone horizontal transfer [Nelson et al. 1999,
Aravind et al. 1998, Forterre et al. 2000], while other traits, such as the lipid
composition of hyperthermophile membranes [reviewed in Daniel & Cowan 2000],
suggest hyperthermophily has evolved twice independently [Forterre 1996].
The tree of life displays the evolutionary relationships between extant 
organisms as patterns of divergence on a tree. All living organisms are thus billions of 
years removed from the LUCA, such that the deep branches do not necessarily 
represent 'living fossils', only the pattern of evolutionary divergence. Indeed, 
indications from current tree building methods are that it is the fastest-evolving 
lineages that are most likely to take basal positions because most current tree 
reconstruction methods tend to provide a measure of evolutionary distance which is 
affected by rate of evolutionary change [Laurent & Philippe 1998, Stiller & Hall
1999, Brinkmann & Philippe 1999]. The pattern of evolutionary divergence is not
recovered because it has not been possible to build trees that correctly take into
account rate variation between lineages. Brinkmann & Philippe [1999] have been able
to demonstrate how long branch attraction [Hendy & Penny 1989] affects the
overall topology of the tree, using an implementation [Lopez et al. 1999] of the
covarion model [Fitch & Markowitz 1970, Fitch 1971] to separate out fast-evolving
and slower-evolving sites. With the fast-evolving sites, which will tend to become 
saturated, archaea and eukaryotes group together, but taking the slower-evolving sites 
returns a tree where the root is in the eukaryote branch, and the prokaryotes are 
monophyletic. If correct, the tree severely weakens the conclusion that the LUCA was 
a hyperthermophile, as this trait is now found in one branch only: the monophyletic 
prokaryotes (see trees 6 & 8 in figure). 
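The effect of discarding fast-evolving sites can be explored with a much cruder stand-in for the covarion-based procedure. The sketch below does not implement the method of Lopez et al. [1999]; it simply counts the distinct residues observed in each alignment column and splits the columns at an arbitrary threshold. The function names, the threshold `max_states_slow` and the toy alignment are all hypothetical.

```python
def split_sites_by_variability(alignment, max_states_slow=2):
    """Split column indices into (slow, fast) by number of distinct residues.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Gap characters '-' are ignored when counting residues.
    This is a crude proxy for site rate, not a covarion model.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    slow, fast = [], []
    for i in range(length):
        states = {s[i] for s in seqs} - {"-"}
        (slow if len(states) <= max_states_slow else fast).append(i)
    return slow, fast


def extract_columns(alignment, columns):
    """Return a new alignment containing only the given column indices."""
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"t1": "AKLM", "t2": "AKIV", "t3": "ARLT", "t4": "AKLS"}
    slow, fast = split_sites_by_variability(toy)
    print("slow columns:", slow, extract_columns(toy, slow))
    print("fast columns:", fast, extract_columns(toy, fast))
```

Trees would then be built separately from the two column sets with whatever tree-building program is in use, and their topologies compared.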
Phylogenomics.
Nevertheless, the problem remains. Given the alternative trees, Brinkmann &
Philippe's [1999] bacteria-archaea monophyly or the eukaryote-archaea monophyly
[Woese et al. 1990, Iwabe et al. 1989, Gogarten et al. 1989], which is right? One
alternative has been to move away from single genes and attempt to use whole
genomes in phylogenetic analyses [e.g. Sicheritz-Ponten & Andersson 2001].
Genomics (unlike conventional phylogenetic analyses of one gene conserved
across all organisms in the study) promises to allow us to compare all genes in a
group of organisms. This is achieved in two ways. The simplest is counting the
number of genes that are shared: relatedness is based on the number of genes in
common with other species in the study [Snel et al. 1999]. The other is carrying out a
global phylogenetic analysis of genes that are shared in order to try to build a
composite tree using sequence data. A more modest and potentially very powerful
approach is a composite tree in which genes that have individually been shown to be
informative in reconstructing distant phylogenetic relationships are used to produce a
combined dataset. A recent analysis of the phylogeny of eukaryotes is one such
example [Baldauf et al. 2000].
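The simplest of these measures can be sketched in a few lines of Python. Genomes are represented as sets of gene-family identifiers, and similarity is the number of shared families divided by the size of the smaller set. That normalisation, the function names and the toy identifiers are assumptions made for illustration; deciding which genes count as the same family is a separate problem that the sketch does not address.

```python
def shared_gene_similarity(genome_a, genome_b):
    """Shared gene families as a fraction of the smaller genome.

    Each genome is an iterable of gene-family identifiers. Dividing by the
    smaller set is an assumed normalisation so that genomes of different
    sizes can be compared.
    """
    a, b = set(genome_a), set(genome_b)
    if not a or not b:
        raise ValueError("both genomes need at least one gene family")
    return len(a & b) / min(len(a), len(b))


def gene_content_distances(genomes):
    """Pairwise distances (1 - similarity) keyed by (name, name)."""
    names = sorted(genomes)
    return {
        (x, y): 1.0 - shared_gene_similarity(genomes[x], genomes[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Hypothetical genomes with made-up gene-family identifiers.
    toy = {
        "X": {"g1", "g2", "g3", "g4"},
        "Y": {"g1", "g2", "g5"},
        "Z": {"g6", "g7"},
    }
    for pair, dist in sorted(gene_content_distances(toy).items()):
        print(pair, round(dist, 3))
```

A distance matrix of this kind could be passed to any distance-based tree-building method. Note that a gene acquired by horizontal transfer and a gene inherited vertically look identical to this count.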
Nevertheless, these approaches are not necessarily expected to provide
significant improvements over single-gene trees. A consensus tree over all, or for each
of, the three domains, where there is general agreement for several different genes, all
of which contain sufficient phylogenetic information from which to build a tree, is not
yet achievable. Protein and RNA trees give conflicting results [Forterre 1997b,
Philippe & Forterre 1999]. At worst, large-scale 'phylogenomic' analysis simply
amounts to adding more data without attempting to address limitations of models in
current tree-building algorithms [Lopez et al. 1999, Penny et al. 2001], which require
each site always to evolve at the same rate. Furthermore, it is not clear how
genome-level comparisons will be able to deal with the problem of horizontal transfer.
Snel et al. [1999] used gene presence and absence in 13 genomes as a phylogenetic
character, claiming that their analysis supports the 16S rRNA tree and that horizontal
transfer was not extensive. However, such an analysis might miss orthologous gene
replacements as well as independent gains and losses through horizontal transfer.
If it is assumed that the problem of horizontal transfer is real, and that those
genes which do transfer can be distinguished from those that do not, should the former
be eliminated from reconstructions of the LUCA? These cannot be reliably traced
back to the LUCA, unless independent criteria for their inclusion can be used (see
table). From the subset that are primarily transmitted vertically, which are ancestral
traits, and which are derived? That is, which were present in the LUCA, and which
arose later? The difficulty here is that there is no good methodology for deciding this.
One could use parsimony, such that where two of the three have a trait it is ancestral,
and where two of the three lack it, it is derived. Parsimony as a rule is fraught with
problems, especially where one applies it to three groups, as it could easily lead to
artificial groupings of ancestral and derived traits [Forterre 1997a, figure]. Gene loss
versus the origin of novel genes cannot be inferred without some evolutionary
precedent, and parsimony is insufficient in the three-domain problem [discussed in
Paper 3 for the origin of snoRNAs]. Nor, as we have seen, does the tree give such
precedent (e.g. if it is in C it is ancestral, if it is in A and B but not C, it is derived),
so this must be established through other lines of inquiry.
Non-phylogenetic approaches. 
Using a genomic approach, many traits are simply not amenable to analysis, 
either because of horizontal transfer, or because traits which are not ubiquitous in 
distribution cannot always be reliably argued to date back to the LUCA on the basis 
of parsimony alone (Table). With current methods, those that turn out to have been 
subject to extensive horizontal transfer may not be reliably examined in the context of 
the LUCA problem, though cases where transfer turns out to be only very limited 
might be expected to be. 
Since the reconstruction of the LUCA depends most on rebuilding a rough 
picture of metabolism before the emergence of the three domains, it is not necessary 
to use phylogenetic-based approaches in justifications for the antiquity of a given 
trait. While less ambitious than the minimal gene set [Mushegian & Koonin 1996], an
alternative is to try to identify ancient metabolic traits, even if they are limited in 
distribution. 
In this thesis, I have attempted to do just that. A means of examining some 
aspects of extant metabolism is the application of the RNA world theory to the 
problem, in the first instance to identify RNA species which are likely to be ancient 
[Papers 1 & 3], and subsequently, to explain the asymmetric distribution of these in
modern species based on known principles. Since the tree gives us very limited 
information on the likely nature of the LUCA, owing to the rooting problem, an 
alternative that examines this is essential. 
While the notion of an RNA world may or may not represent an intermediate 
in the evolution of life, currently there is no real alternative for understanding the 
origins of proteins and DNA. Certainly, it seems highly likely that RNA played a
more prominent role in metabolism than it currently does. Not only is there a good
physicochemical and biochemical basis for expecting that RNA would be replaced over
time by proteins and DNA, but a number of RNAs, such as rRNA, tRNA, srpRNA and
RNase P, are found to be ubiquitous [Papers 1 & 3]. The biggest problem is trying to
identify candidate relics and, although criteria have been put forth that aid in
distinguishing between relic RNAs and recent additions to metabolism, the approach
is necessarily ad hoc [Papers 1, 3 & 4]. Importantly, it is not an absolute requirement
for candidate RNA relics to be ubiquitous, and this offers an improvement over
parsimony, and abrogates the need for the correct tree in evaluating aspects of the
nature of the LUCA.
Expanding LUCA: how easy or hard is identification of ancient metabolic traits? 
Some ancient metabolic traits can be identified if they are ubiquitous and have 
been demonstrated not to have been subject to horizontal transfer. This is in itself 
likely to pose a difficult technical problem, as horizontal transfer would make it 
impossible to judge on distribution alone whether or not the trait was ancient. 
Those traits that are not ubiquitous represent an equally formidable problem. 
How can such ancient traits be identified from a tree based on a single gene, from
a tree based on comparisons of genome content (where presence/absence of a gene is
a character), or from a composite tree where several ubiquitous genes give the same
tree? Again, one could apply parsimony. However, a tree cannot be used to infer
evolutionary pressures that account for changes along a branch, because the branching
pattern alone cannot identify such pressures [Forterre 1997a]. It may however point us
in the right direction, provided the topology problem is taken into account. If
we are able to unambiguously determine the relationships between the
archaea, bacteria and eukaryotes, the monophyly of two, for example archaea and
bacteria, can greatly improve the usefulness of the parsimony rule in certain
situations. For instance, given the tree in the Figure, if a gene known not to have been 
subject to horizontal transfer is found in organisms in groups A and C, but not B, and
if the grouping (AB)C is correct, we can argue from parsimony that the trait was lost 
from group B, and that it can be traced back to the LUCA. If the trait were in C only, 
or in A and B but not C, parsimony cannot be used, so the tree cannot be used to 
determine whether the trait dates back to the LUCA. 
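The three cases in the preceding paragraph can be checked mechanically with Fitch parsimony on the rooted tree ((A, B), C). The Python sketch below is a minimal illustration with hypothetical function names; it assumes the topology is correct and that the trait has not been horizontally transferred, as the text requires.

```python
def fitch_root_states(in_a, in_b, in_c):
    """Equally parsimonious root states for a trait on ((A, B), C).

    Each argument is True if the trait is present in that group.
    Returns a set containing True, False, or both.
    """
    a, b, c = {in_a}, {in_b}, {in_c}
    # Fitch rule: take the intersection if it is non-empty, otherwise the union.
    ab = (a & b) or (a | b)
    return (ab & c) or (ab | c)


def interpret(in_a, in_b, in_c):
    """Translate the root state set into a plain-language verdict."""
    states = fitch_root_states(in_a, in_b, in_c)
    if states == {True}:
        return "present at root"
    if states == {False}:
        return "absent at root"
    return "ambiguous"


if __name__ == "__main__":
    cases = {
        "A and C, not B": (True, False, True),
        "A and B, not C": (True, True, False),
        "C only": (False, False, True),
    }
    for label, pattern in cases.items():
        print(label, "->", interpret(*pattern))
```

A trait in A and C but not B resolves to presence at the root, while a trait in C only, or in A and B only, leaves the root state ambiguous.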
In concluding the introduction, the main point I will be arguing with regard to 
reconstructing the LUCA is that the framework of the RNA world hypothesis 
provides one way of establishing some events in early evolution, and with greater 
certainty than searching for patterns in genomic data. This approach provides hard 
data on the metabolic make-up of the LUCA, and leads to testable hypotheses 
(described in the section on future work). However it cannot replace phylogenetic 
approaches for classifying taxa. It cannot even examine the question of the 
monophyly of the prokaryotes. Indeed, as described in Paper 5, if eukaryotes and 
archaea do turn out to be monophyletic, this does not affect the conclusion that the 
LUCA possessed some eukaryote-like features. Rather, it highlights how 
uninformative the root is: contrary to the interpretation that many
non-phylogeneticists have, the outgroup is not indicative of the LUCA, and the direction
of evolutionary change cannot be inferred solely from the topology.
What the approach in this thesis does allow is a hypothesis-driven approach to 
understanding eukaryote and prokaryote evolution. It provides continuity between the 
RNA world, the LUCA, and the subsequent divergence of the three domains. 
Furthermore, it makes a significant shift away from the preconception that 
prokaryotes predate eukaryotes by establishing important factors that influence 
evolution in extant prokaryotes and eukaryotes [Paper 7]. This provides an insight 
into evolutionary processes and establishes how the process of natural selection has 
operated in the evolution of prokaryotes and eukaryotes. Such insight cannot be 
established through phylogenetic analyses or comparative genomics alone. 
References. 
Andersson JO, Andersson SGE: Genome degradation is an ongoing process in
Rickettsia. Mol Biol Evol 1999, 16, 1178-1191.
Andersson SGE, Kurland CG: Reductive evolution in resident genomes. Trends
Microbiol 1998, 6, 263-268.
Angert ER, Clements KD, Pace NR: The largest bacterium. Nature 1993, 362,
239-241.
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for massive
gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet
1998, 14, 442-444.
Bada JL, Bigham C, Miller SL: Impact melting of a frozen ocean on the early earth
and the implication for the origin of life. Proc Natl Acad Sci USA 1994, 91,
1248-1250.
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom-level phylogeny of
eukaryotes based on combined protein data. Science 2000, 290, 972-977.
Benner SA, Ellington AD, Tauer A: Modern metabolism as a palimpsest of the RNA
world. Proc Natl Acad Sci USA 1989, 86, 7054-7058.
Berg OG, Kurland CG: Why mitochondrial genes are most often found in nuclei. Mol
Biol Evol 2000, 17, 951-961.
Becerra A, Islas S, Leguina JI, Silva E, Lazcano A: Polyphyletic gene losses can bias
backtrack characterizations of the cenancestor. J Mol Evol 1997, 45, 115-117.
Brinkmann H, Philippe H: Archaea sister-group of Bacteria? Indications from tree
reconstruction artifacts in ancient phylogenies. Mol Biol Evol 1999, 16, 817-825.
Brocks JJ, Logan GA, Buick R, Summons RE: Archaean molecular fossils and the
early rise of eukaryotes. Science 1999, 285, 1033-1036.
Castresana J, Moreira D: Respiratory chains in the Last Common Ancestor of living
organisms. J Mol Evol 1999, 49, 453-460.
Castresana J, Saraste M: Evolution of energetic metabolism: the respiration-early
hypothesis. Trends Biochem Sci 1995, 20, 443-448.
Daniel RM, Cowan DA: Biomolecular stability and life at high temperatures. Cell
Mol Life Sci 2000, 57, 250-264.
de la Cruz F, Davies J: Horizontal gene transfer and the origin of species: lessons
from bacteria. Trends Microbiol 2000, 8, 128-133.
Doolittle WF: You are what you eat: a gene transfer ratchet could account for
bacterial genes in eukaryotic nuclear genomes. Trends Genet 1998, 14, 307-311.
Doolittle WF: Phylogenetic classification and the universal tree. Science 1999, 284,
2124-2128.
Fishelson L, Montgomery WL, Myrberg Jr AA: A unique symbiosis in the gut of
tropical herbivorous surgeonfish (Acanthuridae: Teleostei) from the Red Sea.
Science 1985, 229, 49-51.
Fitch WM: Rate of change of concomitantly variable codons. J Mol Evol 1971, 1,
84-96.
Fitch WM, Markowitz E: An improved method for determining codon variability in a
gene and its application to the rate of fixation of mutations in evolution. Biochem
Gen 1970, 4, 579-593.
Forterre P: Thermoreduction, a hypothesis for the origin of prokaryotes. CR Acad Sci
III 1995a, 318, 415-422.
Forterre P: Looking for the most "primitive" organism(s) on Earth today: the state of
the art. Planet Space Sci 1995b, 43, 167-177.
Forterre P: Archaea: what can we learn from their sequences? Curr Opin Genet Dev
1997a, 7, 764-770.
Forterre P: Protein versus rRNA: problems in rooting the universal tree of life. ASM
News 1997b, 63, 89-95.
Forterre P: Displacement of cellular proteins by cellular analogues from plasmids or
viruses could explain puzzling phylogenies of many DNA informational proteins.
Mol Microbiol 1999, 33, 457-465.
Forterre P, Bouthier De La Tour C, Philippe H, Duguet M: Reverse gyrase from
hyperthermophiles: probable transfer of a thermoadaptation trait from archaea to
bacteria. Trends Genet 2000, 16, 152-154.
Forterre P, Philippe H: Where is the root of the universal tree of life? BioEssays 1999,
21, 871-879.
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult
CJ, Kerlavage AR, Sutton G, Kelley JM, Fritchman JL, Weidman JF, Small KV,
Sandusky M, Fuhrmann J, Nguyen D, Utterback TR, Saudek DM, Phillips CA,
Merrick JM, Tomb J-F, Dougherty BA, Bott KF, Hu P-C, Lucier TS, Peterson SN,
Smith HO, Hutchison III CA, Venter JC: The minimal gene complement of
Mycoplasma genitalium. Science 1995, 270, 397-403.
Galtier N, Tourasse N, Gouy M: A nonhyperthermophilic common ancestor to extant
life forms. Science 1999, 283, 220-221.
Gibson G: Evolution: Hox genes and the cellared wine principle. Curr Biol 2000, 10,
R452-R455.
Gilbert W: The RNA world. Nature 1986, 319, 618.
Glansdorff N: About the last common ancestor, the universal life-tree and lateral gene
transfer: a reappraisal. Mol Microbiol 2000, 38, 177-185.
Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF,
Poole RJ, Date T, Oshima T, Konishi J, Denda K, Yoshida M: Evolution of the
vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci
USA 1989, 86, 6661-6665.
Gould SJ, Lewontin RC: The spandrels of San Marco and the Panglossian paradigm:
a critique of the adaptationist program. Proc R Soc Lond B 1979, 205, 581-598.
Grbic M: "Alien" wasps and evolution of development. BioEssays 2000, 22, 920-932.
Grotzinger JP, Rothman DH: An abiotic model for stromatolite morphogenesis.
Nature 1996, 383, 423-425.
Hendy MD, Penny D: A framework for the quantitative study of evolutionary trees.
Syst Zool 1989, 38, 297-309.
Hutchison III CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO,
Venter JC: Global transposon mutagenesis and a minimal mycoplasma genome.
Science 1999, 286, 2165-2169.
Iwabe N, Kuma K-I, Hasegawa M, Osawa S, Miyata T: Evolutionary relationship of
archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of
duplicated genes. Proc Natl Acad Sci USA 1989, 86, 9355-9359.
Jain R, Rivera MC, Lake JA: Horizontal transfer among genomes: the complexity
hypothesis. Proc Natl Acad Sci USA 1999, 96, 3801-3806.
Joyce GF, Orgel LE: Prospects for understanding the origin of the RNA world. In:
Gesteland RF, Cech TR, Atkins JF eds. The RNA World. 2nd ed. Cold Spring
Harbor Laboratory Press, New York, 1999, p49-77.
Keeling PJ, McFadden GI: Origins of microsporidia. Trends Microbiol 1998, 6, 19-23.
Lan R, Reeves PR: Intra-species variation in bacterial genomes: the need for a species
genome concept. Trends Microbiol 2000, 8, 396-401.
Lawrence J: Selfish operons: the evolutionary impact of gene clustering in
prokaryotes and eukaryotes. Curr Opin Genet Dev 1999, 9, 642-648.
Lawrence JG, Ochman H: Molecular archaeology of the Escherichia coli genome.
Proc Natl Acad Sci USA 1998, 95, 9413-9417.
Lazcano A, Miller SL: How long did it take for life to begin and evolve to
cyanobacteria? J Mol Evol 1994, 39, 546-554.
Leipe DD, Aravind L, Koonin EV: Did DNA replication evolve twice independently?
Nucleic Acids Res 1999, 27, 3389-3401.
Levy M, Miller SL: The stability of the RNA bases: implications for the origin of life.
Proc Natl Acad Sci USA 1998, 95, 7933-7938.
Lockhart PJ, Larkum AWD, Steel MA, Waddell PJ, Penny D: Evolution of
chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence
analysis. Proc Natl Acad Sci USA 1996, 93, 1930-1934.
Lockhart PJ, Steel MA, Barbrook AC, Huson DH, Howe CJ: A covariotide model
describes the evolution of oxygenic photosynthesis. Mol Biol Evol 1998, 15,
1183-1188.
Lopez P, Forterre P, Philippe H: The root of the tree of life in light of the covarion
model. J Mol Evol 1999, 49, 496-508.
Lowe DR: Abiological origin of described stromatolites older than 3.2 Ga. Geology
1994, 22, 387-390.
Martin W: Mosaic bacterial chromosomes: a challenge en route to a tree of genomes.
BioEssays 1999, 21, 99-104.
Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV: Gene
transfer to the nucleus and the evolution of chloroplasts. Nature 1998, 393, 162-165.
McFadden GI: Endosymbiosis and evolution of the plant cell. Curr Opin Plant Biol
1999, 2, 513-519.
Miller SL, Bada JL: Submarine hot springs and the origin of life. Nature 1988, 334,
609-611.
Mojzsis SJ, Arrhenius G, McKeegan KD, Harrison TM, Nutman AP, Friend CR:
Evidence for life on Earth 3800 million years ago. Nature 1996, 384, 55-59.
[Erratum: Nature 1997, 386, 665]
Morowitz HJ, Kostelnik JD, Yang J, Cody GD: The origin of intermediary
metabolism. Proc Natl Acad Sci USA 2000, 97, 7704-7708.
Moulton V, Gardner PP, Pointon RF, Creamer LK, Jameson GB, Penny D: RNA
folding argues against a hot-start origin of life. J Mol Evol 2000, 51, 416-421.
Mushegian A: The minimal genome concept. Curr Opin Genet Dev 1999, 9, 709-714.
Mushegian AR, Koonin EV: A minimal gene set for cellular life derived by
comparison of complete bacterial genomes. Proc Natl Acad Sci USA 1996, 93,
10268-10273.
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK,
Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA,
Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA,
Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O,
Salzberg SL, Smith HO, Venter JC, Fraser CM: Evidence for lateral gene transfer
between Archaea and Bacteria from genome sequence of Thermotoga maritima.
Nature 1999, 399, 323-329.
Nelson KE, Levy M, Miller SL: Peptide nucleic acids rather than RNA may have
been the first genetic molecule. Proc Natl Acad Sci USA 2000, 97, 3868-3871.
Nisbet EG, Sleep NH: The habitat and nature of early life. Nature 2001, 409,
1083-1091.
Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of
bacterial innovation. Nature 2000, 405, 299-304.
Ohmoto H: Evidence in pre-2.2 Ga paleosols for the early evolution of atmospheric
oxygen and terrestrial biota. Geology 1996, 24, 1135-1139.
Penny D, Foulds LR, Hendy MD: Testing the theory of evolution by comparing
phylogenetic trees constructed from five different protein sequences. Nature 1982,
297, 197-200.
Penny D, McComish BJ, Charleston MA, Hendy MD: Mathematical elegance with
biochemical realism: the covarion model of molecular evolution. J Mol Evol 2001,
(in press).
Philippe H, Forterre P: The rooting of the tree of life is not reliable. J Mol Evol 1999,
49, 509-523.
Philippe H, Laurent J: How good are deep phylogenetic trees? Curr Opin Genet Dev
1998, 8, 616-623.
Philippe H, Budin K, Moreira D: Horizontal transfers confuse the prokaryotic
phylogeny based on the HSP70 protein family. Mol Microbiol 1999, 31, 1007-1009.
Poole A, Penny D, Sjoberg B-M: Methyl-RNA: an evolutionary bridge between RNA
and DNA? Chem Biol 2000, 7, R207-R216.
Schopf JW, Packer BM: Early Archean (3.3 billion to 3.5 billion year old)
microfossils from Warrawoona Group, Australia. Science 1987, 237, 70-73.
Shapiro R: Prebiotic cytosine synthesis: a critical analysis and implications for the
origin of life. Proc Natl Acad Sci USA 1999, 96, 4396-4401.
Sicheritz-Ponten T, Andersson SGE: A phylogenomic approach to microbial
evolution. Nucleic Acids Res 2001, 29, 545-552.
Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nat Genet
1999, 21, 108-110.
Springer MS, Teeling EC, Madsen O, Stanhope MJ, de Jong WW: Integrated fossil
and molecular data reconstruct bat echolocation. Proc Natl Acad Sci USA 2001, 98,
6241-6246.
Stiller JW, Hall BD: Long-branch attraction and the rDNA model of early eukaryotic
evolution. Mol Biol Evol 1999, 16, 1270-1279.
Summons RE, Jahnke LL, Hope JM, Logan GA: 2-methylhopanoids as biomarkers
for cyanobacterial oxygenic photosynthesis. Nature 1999, 400, 554-557.
Szathmary E, Maynard Smith J: The major evolutionary transitions. Nature 1995,
374, 227-232.
Teichmann SA, Mitchison G: Is there a phylogenetic signal in prokaryote proteins? J
Mol Evol 1999, 49, 98-107.
Wachtershauser G: Groundworks for an evolutionary biochemistry: the iron-sulphur
world. Prog Biophys Mol Biol 1992, 58, 85-201.
Walsh MM: Microfossils and possible microfossils from the Early Archean
Onverwacht Group, Barberton Mountain Land, South Africa. Precambrian Res
1992, 54, 271-293.
Woese CR: Bacterial evolution. Microbiol Rev 1987, 51, 221-271.
Woese CR: The universal ancestor. Proc Natl Acad Sci USA 1998, 95, 6854-6859.
Woese CR: Interpreting the universal phylogenetic tree. Proc Natl Acad Sci USA
2000, 97, 8392-8396.
Woese CR, Fox GE: Phylogenetic structure of the prokaryotic domain: the primary
kingdoms. Proc Natl Acad Sci USA 1977, 74, 5088-5090.
Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms:
proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci USA
1990, 87, 4576-4579.
Wren BW: Microbial genome analysis: insights into virulence, host adaptation and
evolution. Nat Rev Genet 2000, 1, 30-39.
Jeffares DC, Poole AM & Penny D.
Relics from the RNA world.
Journal of Molecular Evolution 46, 18-36 (1998).
Paper 1 
Reprinted with permission from Springer-Verlag New York Inc. 

Poole AM, Jeffares DC & Penny D.
The path from the RNA world.
Journal of Molecular Evolution 46, 1-17 (1998).
Paper 2 
Reprinted with permission from Springer-Verlag New York Inc. 

RNA evolution: separating the new from the old.
Manuscript. 
Paper 3 

RNA evolution: separating the new from the old. 
Abstract. 
The existence of an RNA world, an RNA-rich period in the early evolution of 
life, is widely accepted, as is the idea that many cellular RNAs can be traced back to 
this period. However, while some RNAs may derive from the very earliest stages of 
life, others have arisen comparatively recently in evolution. A further difficulty is that
some RNAs may have arisen early in evolution, but may have changed their role 
during evolution. It is therefore useful to distinguish between the 'ultimate' origin of 
an RNA and a 'proximate' origin, where it evolved into its present function. A number 
of RNAs have not been unequivocally placed as 'new' or 'old', including group I & II 
introns, snRNAs, tmRNA and snoRNAs. In this article, we examine how RNA world 
'relics' might be distinguished from RNAs with a more recent origin, why there are 
problems or controversies in establishing the evolutionary origins of some RNAs, and 
whether it is possible to resolve these. 
Introduction. 
In eukaryotes it is well-established that RNA is central to a number of 
molecular processes, including protein synthesis, mRNA editing and splicing, rRNA 
and tRNA processing and telomere replication. Some of these RNAs are also found in 
archaea and eubacteria, though in general it appears that RNA plays a less prominent 
role in metabolism in these organisms (Wassarman et al., 1999). Indeed, this
differential use of RNA is claimed to be a fundamental one, and may be the basis for
very different evolutionary mechanisms employed in the diversification of
prokaryotes and eukaryotes (Herbert & Rich, 1999a,b).
It is generally accepted that many RNAs are evolutionarily very ancient. The 
RNA world hypothesis (Gilbert, 1986) is that, prior to the advent of
genetically-encoded proteins and DNA, RNA was both genetic material and major
biological catalyst. With the advent of protein synthesis, and later, ribonucleotide
reduction, RNA is believed to have gradually lost its central role as catalyst and
information storage molecule. Those few RNAs that remain in modern metabolism are
widely considered to be 'relics' from the RNA world period (Benner et al., 1989;
Jeffares et al., 1998). However, with the number of novel RNAs growing, it is clear
that many RNAs may have arisen more recently in evolution to fulfill specific
functions and do not date back to the RNA world period (Eddy, 1999).
In this article, we briefly review the current state of the RNA world hypothesis 
insofar as it allows us to distinguish between RNAs that are likely to be ancient in 
origin and those which are more recent. We define 'ancient' as prior to the emergence 
of the three domains, archaea, bacteria and eukaryotes, that is pre-Last Universal 
Common Ancestor (pre-LUCA), and 'recent' as post-LUCA. A broad survey of RNAs 
that are probably recent innovations suggests that RNA is a potent source of novel 
function in eukaryotes. In addition, we will focus on those RNAs where the 
evolutionary origins are a current source of controversy. 
Central to this problem is the need to distinguish between 'ultimate' origins 
and 'proximate' origins, thereby providing a distinction between the origin of a given 
RNA and the role it currently plays in modern metabolism. This distinction is in effect
the same as the use of the terms paralogous and orthologous in descriptions of the
evolutionary history of gene families. Orthologous genes have arisen from a single
ancestral gene through duplication and divergence and have maintained the same
function over time. Paralogous genes have also arisen from a single ancestral gene
through duplication and divergence but now perform different functions. An example
of orthologous RNA genes is the RNase P genes from E. coli and yeast. An example of
paralogous RNA genes is RNase P and RNase MRP. In this paper, we are particularly
interested in the latter case. Where two related RNAs perform different functions,
what is the ultimate origin of this family of RNA?
What is old and what is new? 
We have previously suggested several criteria as an aid for drawing the line 
between relic RNAs and recently-evolved RNAs (Poole et al., 1999). These are:
1. That the RNA is ubiquitous in distribution.
2. That the RNA is central to metabolism.
3. Whether proteins perform the function equally well in other organisms.
4. That the RNA is catalytic¹.
These criteria are helpful, but are not necessarily sufficient to give a reliable 
indication of the likely status for every RNA. Criterion 1 is the strongest argument for 
the RNA world ancestry of a given RNA, and one can assign relic status to a number 
of RNAs, on this criterion alone. Obvious examples are tRNA, rRNA, RNase P and 
srpRNA (4.5S in bacteria, 7S in eukaryotes & archaea) (Jeffares et al., 1998). In the
case of criterion 2, where an RNA is not ubiquitous, one may argue for an RNA world
origin on functional grounds. In this manner, Maizels and Weiner (1999) have argued
for the antiquity of telomerase function. This is further supported by the argument
that circularisation of chromosomes in prokaryotes is a derived trait, arising under
strong selection pressure, and thus was not present in the RNA or RNP
(ribonucleoprotein) worlds (Forterre, 1995). In spite of the example of telomerase,
arguing just from criterion 2 is difficult, since it is a matter of opinion as to what is
central to metabolism.
1 The term catalytic RNA is used either in a chemical sense or a functional sense. In the chemical 
sense, a catalytic RNA is one which can catalyse a chemical reaction without the aid of protein, that is, 
the RNA is necessary and sufficient for catalysis. In a functional sense, an RNA which is necessary but 
not sufficient for catalysis is still a catalytic RNA. Bacterial RNase P RNA is catalytic in both senses, 
but human RNase P RNA is only catalytic in the functional sense. 
The third criterion is of fundamental importance, and stems primarily from the 
argument that proteins are in general better catalysts than RNA (Jeffares et al., 1998;
Poole et al., 1999). This suggests that, given that the general trend is replacement of
catalytic RNA with protein during evolution, in cases where in one lineage a protein
performs a function identical to that of RNA in another lineage, the RNA is ancestral.
However, certain functions may simply be better-suited to RNA (a point to which we
shall return), and hence, not all RNAs should be placed automatically in the RNA
world (Eddy, 1999). By itself, criterion 3 may be insufficient, but it is an important
consideration, particularly where a function is argued to be central to metabolism. We 
consider several examples where criteria 2 and 3, combined, are important in 
assigning putative relic status. 
Criterion 4 is more complex than it appears, which may be somewhat 
surprising, given the importance that catalytic RNA studies have played in the 
development of the RNA world hypothesis. Distinguishing between functional and
chemical definitions of catalysis is helpful, however. We will argue here that all
RNAs defined as functionally catalytic, but very few RNAs defined as chemically
catalytic, are direct descendants from the RNA world (see Table), though the latter are
nevertheless important exemplars of RNA world complexity.
RNA as a source of novel function 
As the RNA universe expands, it is becoming clear that RNA is more than just
a relic from early evolution. 'New' RNAs in many cases can be readily picked out
simply because the role they play is highly specialised and their phylogenetic
distribution is very limited, indicating recent origins. It seems likely that the growing
list of newly discovered RNAs (Table) is but the tip of the iceberg, especially given
that current genomic search strategies (e.g. BLAST) do not perform well for RNA
families, which in general retain very little primary sequence information (e.g. Ganot
et al., 1997a; Lowe & Eddy, 1999; Collins et al., 2000). Likewise, large-scale
identification techniques such as those possible with EST databases are biased against
detection of noncoding RNAs (Eddy, 1999, though see Hüttenhofer et al., 2001).
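The point that an RNA family can retain its structure while losing primary sequence similarity can be illustrated with a toy example. The following Python sketch is not taken from any of the cited search tools: the two hairpin sequences are invented for illustration, the function names are hypothetical, and G-U wobble pairs are assumed to count as valid pairs. It compares per-position identity, the signal a sequence-based search relies on, with conservation of base-pairing in the stem.

```python
# Toy illustration (invented sequences): structure conserved, sequence not.
# Watson-Crick pairs plus G-U wobble pairs are assumed to be valid.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def stem_is_paired(seq, stem_len):
    """True if the first stem_len bases pair with the last stem_len bases in reverse order."""
    return all((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))


# Two hypothetical hairpins: 5-bp stem, 4-nt loop, with every stem pair swapped
# by a compensatory change on both strands.
hairpin_a = "GGCAC" + "UUCG" + "GUGCC"
hairpin_b = "CAGUG" + "UUCG" + "CACUG"

print("identity:", round(identity(hairpin_a, hairpin_b), 2))
print("stem a paired:", stem_is_paired(hairpin_a, 5))
print("stem b paired:", stem_is_paired(hairpin_b, 5))
```

In this invented pair only the four loop positions are identical, yet every stem position still pairs, because each change on one strand is matched by a compensatory change on the other.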
Recent reviews (Eddy, 1999; Wassarman et al., 1999; Erdmann et al., 2001)
cover many of the developments in RNA identification (for summary and relevant
references from the literature, see Table), so we limit ourselves to a number of
examples where it might be argued that RNA is inherently better suited to certain 
roles than protein. Furthermore, we consider briefly how RNA impacts on the 
evolvability of organisms. 
RNA editing in kinetoplasts of trypanosomes.
RNA editing, whereby the sequence of a transcript is changed prior to 
translation, is widespread, and occurs via widely different mechanisms. The 
mechanisms appear unrelated and have limited distribution (Smith et al., 1997). RNA
editing is particularly prevalent in organelles, and the best explanation for this is that
editing is a response to mutational pressures from the operation of Muller's Ratchet in
organellar genomes (Borner et al., 1997). Muller's Ratchet is the slow accumulation of
slightly deleterious mutations in the absence of recombination (reviewed in
Andersson & Kurland, 1998; Blanchard & Lynch, 2000). The largest number of
editing events observed in a single organelle is in kinetoplasts of trypanosomes,
where uridine insertion and deletion occurs in about 12 of 18 mRNA transcripts,
creating start codons, frameshift corrections, and even entire open reading frames
(Estevez & Simpson, 1999). As well as being the most extensive form of transcript
editing, it is also the only form where RNA guides are involved.
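Muller's Ratchet, as defined above, can be made concrete with a small simulation. The Python sketch below is an illustrative toy model, not a model used in the cited reviews: the population size, mutation rate and selection coefficient are arbitrary assumptions, the function name is hypothetical, and each genome is reduced to a count of slightly deleterious mutations. With no recombination and no back mutation, the mutation count of the least-loaded genome in the population can never decrease, which is the ratchet.

```python
import random


def mullers_ratchet(pop_size=50, generations=200, mut_rate=0.3,
                    sel_coeff=0.02, seed=1):
    """Track the mutation count of the least-loaded genome in an asexual population.

    All parameter values are arbitrary assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size          # deleterious mutations carried by each genome
    least_loaded = []
    for _ in range(generations):
        # Selection: each mutation multiplies fitness by (1 - sel_coeff).
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Clonal reproduction: offspring keep every parental mutation and
        # may gain one more; there is no recombination or back mutation.
        loads = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


history = mullers_ratchet()
print("least-loaded class, first and last generation:", history[0], history[-1])
```

Because offspring inherit every mutation carried by their parent, the returned series is non-decreasing for any parameter choice; how quickly it rises depends on the assumed parameters.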
The information for transcript editing is housed on separate minicircles in the 
form of guide RNA genes. Depending on the organism (see Simpson et al., 2000)
there are approximately 50 maxicircles, which house the mitochondrial genes, and
>1000 guide RNA coding minicircles. Editing in general (Borner et al., 1997), the
breaking of a single chromosome into several smaller pieces (Reanney, 1986) and
mutational buffering through the presence of multiple copies are all expected to slow
the loss of genetic information through Muller's Ratchet. Given this, and given the
limited phylogenetic distribution of guide RNA-mediated uridine insertion/deletion
editing (Simpson et al., 2000), this form of editing is extremely unlikely to date back
to the RNA world.
Covello and Gray (1993) have introduced a three-step model for the evolution
of RNA editing in general, and of kinetoplastid RNA editing, the latter having been
extended by Stoltzfus (1999). In kinetoplastid editing (and editing in general) it is not
necessary for there to be a selective advantage for fixation of editing. It may simply
arise through suitable preconditions. Stoltzfus (1999) points out that recruitment of the
editing machinery can be explained by tinkering, since it involves enzymes that are
known in other functions. Furthermore, multiple genome copies will slow Muller's
Ratchet, and redundancy can result in the accumulation and tolerance of variance
between copies. Thus the emergence of a mutation (that can be neutral, slightly
deleterious or lethal with only a single copy of the genome) in one copy of a given
gene will always be neutral. Likewise, expression of an antisense transcript from
another unaltered copy of the gene, which can bind to the mRNA produced from the
mutant gene copy, has no fitness effect. Such potential precursors may arise and
subsequently disappear through drift, and the same is expected for an interaction that
is edited by chance. While the genotypes may differ, the phenotype for edited and
unedited versions is identical, and under a neutral or even slightly deleterious model
(i.e. Muller's Ratchet), both can become fixed.
As fixation at more sites occurs, while variation in the position of editing will 
be stochastic (for editing events where the change is neutral), the probability that all 
revert through back mutation is extremely low. Moreover, at functionally important 
sites, editing becomes maintained by natural selection (Covello & Gray, 1993). This is
because some editing events have become essential for production of the protein 
product. Loss of a key editing enzyme, which would affect all edited sites, would thus 
be lethal and selected against. 
Strong evidence for the continuing role of neutral processes and drift in guide 
RNA-mediated editing includes the presence of multiple copies of both minicircles 
and maxicircles and large size variation for both minicircles and maxicircles across a 
range of organisms, large variability in minicircle copy number within strains over 
time and between species, presence of guide RNA genes on both minicircles and 
maxicircles, and existence of variant guide RNAs with mismatches in the guide 
regions (Simpson et al., 2000).
In summary, the suggestion that the effect of Muller's Ratchet on organellar 
genomes resulted in the independent evolution of unrelated forms of RNA editing in 
eukaryotic organelles (Borner et al., 1997) provides a strong precedent for
considering uridine insertion/deletion editing to be a recently-evolved trait, and not an
RNA world relic. It also underpins the evolutionary utility of RNA: where a class of
RNA is limited in phylogenetic distribution and acts as a guide, it may be a
recently-evolved trait.
RNA as a 'riboregulator'. 
Riboregulators are RNAs that act to regulate gene expression, usually through 
base-pairing, and, as such, are expected to evolve readily. A number of
well-understood examples are known, and a long list of possible candidates is
currently under investigation (Erdmann et al., 2001). A number of these RNAs are
included in Table 1, and an exciting finding is that 'riboregulation' is not limited to
mRNA binding (as with lin-4 and let-7 antisense RNAs from C. elegans). It may also
occur through other processes, such as RNA-protein interactions, as exemplified by
CsrB RNA inhibition of CsrA protein activity in E. coli (Romeo, 1998), and meiRNA
interaction with mei2 protein in regulation of meiosis in S. pombe (Watanabe &
Yamamoto, 1994).
Another exciting prospect for the 'modern RNA world' is that unrelated RNAs 
have appeared in nearly identical functions, where either these functions are known to 
have evolved more than once, or where the evolutionary origins of the recruited 
RNAs can be discerned. For instance, BC1 and BC200 are RNAs with similar
functions, the former having been identified in rodents (Muslimov et al., 1998), the
latter being found in primates (Skryabin et al., 1998). Both appear to have a role in
translation regulation in dendrites, and both apparently bind the same protein
(Kremerskothen et al., 1998; Brosius, 1999). While convergence of function has yet to
be conclusively demonstrated, their evolutionary origins are clear; BC1 appears to
have been recruited from tRNA(Ala), while BC200 was originally an Alu element, a
type of transposable element derived from eukaryotic srpRNA (Brosius, 1999). Given
that searches have so far not yielded other such functionally analogous RNAs within
mammals, yet the proteins known to make up the BC1/BC200 complexes are
conserved (Brosius, 1999), it will be interesting to see if there is evidence for
non-orthologous replacement by one/both RNAs. Is RNA inherently better suited to certain
functions, being selected for over and over again for the same class of function? To 
this question we shall return. 
RNAs in dosage compensation. 
An even more dramatic example of functional convergence is emerging in 
studies of dosage compensation. In organisms with sex chromosomes, the number of 
sex chromosomes is unequal between the sexes. In Drosophila and mammals, males
are XY, and females are XX. The unequal number of Xs means that gene expression
from the X differs between the sexes, and there are mechanisms which compensate
for this. In Drosophila, dosage is turned up in males, making expression from their
single X equivalent to the two X chromosomes in females. In mammals, one X is
inactivated in females, so expression is halved, making it equivalent to the single X
carried by males. Furthermore, C. elegans takes a third strategy; expression from both
copies of the X in hermaphrodites is halved relative to males (which are XO). Given
multiple solutions to this problem, it is clear that mechanisms for dosage
compensation have evolved more than once (Pannuti & Lucchesi, 2000; Marin et al.,
2000).
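The arithmetic behind the three strategies can be laid out explicitly. In the Python sketch below, expression per active X copy is given in arbitrary units relative to an uncompensated X, and the numbers simply restate the text (two-fold upregulation of the single male X, inactivation of one female X, halving of both hermaphrodite copies). The data layout and names are assumptions made for illustration only.

```python
# (number of active X copies, relative expression per active copy) for each sex.
# Values restate the three strategies described in the text; units are arbitrary.
STRATEGIES = {
    "Drosophila": {"female": (2, 1.0), "male": (1, 2.0)},          # male X turned up
    "mammals": {"female": (1, 1.0), "male": (1, 1.0)},             # one female X inactivated
    "C. elegans": {"hermaphrodite": (2, 0.5), "male": (1, 1.0)},   # both copies halved
}


def total_x_expression(active_copies, per_copy):
    """Total X-linked expression: active copies times expression per copy."""
    return active_copies * per_copy


def compensated_totals(strategy):
    """Total X-linked expression for each sex under one strategy."""
    return {sex: total_x_expression(*value) for sex, value in strategy.items()}


for organism, strategy in STRATEGIES.items():
    print(organism, compensated_totals(strategy))
```

In each case the two sexes end up with the same total, although the route taken (raising one sex, or lowering the other in one of two ways) differs.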
In mammals and flies not only are the mechanisms of dosage compensation 
unrelated, they both make use of RNA for marking the X for either inactivation or 
upregulation, respectively (Kelley & Kuroda, 2000). The RNAs (roX1 & roX2 in
Drosophila, and Xist, which is regulated by an antisense RNA, Tsix, in human) are
unrelated, yet provide an analogous function: in both systems, RNA is thought to
facilitate interaction at numerous points along the length of the target X chromosome, 
and the RNA genes are themselves to be found on the X chromosome. Importantly, 
the systems must operate via different mechanisms; in mammals, only one female X 
is inactivated, and it is therefore unsurprising to find that the mode of inactivation is 
via some mechanism that occurs exclusively in cis. In flies, there is no such 
requirement, as might be expected, given that dosage compensation is through 
upregulation of the single male X. 
While it is still unclear how RNA is involved in these systems, it is intriguing 
that RNA has apparently been independently recruited to an analogous function on 
separate occasions. How does dosage compensation in C. elegans operate? Does this
likewise require RNA, and indeed, in other organisms such as birds and reptiles,
where sex chromosomes are different again, is dosage compensation also an
RNA-dependent process?
Unclear origins of tmRNA 
In bacteria, it is well established that release from ribosomal stalling on 
damaged mRNA is an RNA-mediated process. tmRNA, so called because of its dual 
role as tRNA and mRNA, allows a stalled ribosome to be uncoupled from the mRNA 
upon which it is stalled by virtue of the tRNA moiety of tmRNA, which is charged 
with alanine. The tRNA moiety accesses the A site of the ribosome and the alanine 
with which it is charged is then added to the partially-synthesised peptide. Next, the 
ribosome switches template by virtue of a conformational change in the tmRNA, and 
the ribosome uses the tmRNA as a template. The tmRNA encodes a short peptide tag,
10 residues in length, that labels the damaged peptide for degradation, and the
ribosome is released (Keiler et al., 1996).
So far, this process has only been identified in bacteria, where it appears
ubiquitous (Keiler et al., 1999). Given the dual role of the tmRNA as both tRNA and
mRNA, it might be considered a candidate for the RNA world. Indeed Maizels and
Weiner (1999) have speculated that such an RNA could have been the RNA world
counterpart of initiator tRNA in contemporary translation. However, it is equally
likely that this is a recent innovation (i.e. post-LUCA) specific to the bacterial lineage.
In eukaryotes, only mRNAs that possess a 5' cap structure and poly A tail pass a
prerequisite quality control check before translation (Ibba & Soll, 1999). Damaged
mRNAs are degraded via a nonsense-mediated decay pathway (Culbertson, 1999),
reducing the production of truncated proteins during translation. 
There is clearly selection for release of stalled ribosomes and tagging of 
damaged peptide for protein degradation in a sophisticated protein synthetic 
machinery, and a scenario for RNA world origins such as that suggested by Maizels 
and Weiner (1999) is difficult to test. What will be tractable is extending the search
for tmRNA to eukaryotes and archaea. Indeed, even with quality control in eukaryote 
translation, mRNA may occasionally be damaged during translation, so it is possible 
that eukaryotes possess tmRNA. A more extensive search will thus aid in establishing 
whether tmRNA may have been a feature of the LUCA. Certainly, given the ubiquity 
of the cellular protein degradation apparatus, the proteasome (Baumeister et al., 1998;
Bouzat et al., 2000), and the fact that search strategies for tmRNA identification have
not yet been fully applied to eukaryotes and archaea, it will be interesting to see if 
stalled ribosome release occurs via a similar mechanism in these lineages. 
Many naturally-occurring catalytic RNAs are not RNA world relics. 
As we have already seen, not all criteria need necessarily apply for an RNA to 
be designated a relic, and for all but the first, the application of the criterion may not 
in itself provide sufficient information for the status of relic to be assigned. Criterion 
4 is whether or not an RNA is catalytic. The RNA world hypothesis states that RNA 
catalysts pre-dated proteins in the evolution of catalysis, and the idea has been 
extended to a two-step transition, RNA → RNP → protein, that more accurately
explains the process by which an RNA is replaced by a catalytic protein, and
identifies catalytic perfection as central to understanding why any ribozymes
remain at all (Jeffares et al., 1998; Poole et al., 1999).
The term catalytic RNA is most often used in a chemical sense, that is, a 
naked RNA that is capable of catalysis without cognate proteins. This definition 
excludes the peptidyl transferase activity of large subunit ribosomal RNA, eukaryotic 
RNase P, and spliceosomal snRNA. All are nevertheless putative RNA world relics,
and in all cases, the RNA component is absolutely required for catalysis (Noller et al.,
1992; Muth et al., 2000; Nissen et al., 2000; Kirsebom & Altman, 1999; Yean et al.,
2000; Nilsen, 2000).
Surprisingly, the sole case where a catalytic RNA (in the chemical sense of 
being necessary and sufficient to carry out catalysis) can unequivocally be placed in 
the RNA world is that of RNase P. This has been found in all organisms examined to 
date, and is universally required for tRNA maturation. Bacterial RNase P has several 
additional substrates, including srpRNA (4.5S RNA) and tmRNA precursors
(Kirsebom & Altman, 1999), and hence can be claimed under criteria 1 and 2 also.
The related RNase MRP, which is involved in pre-rRNA processing in eukaryotes, is
more limited in distribution, and its evolutionary origins are less clear. In considering
a possible RNA world origin for RNase MRP, perhaps the most important piece of
evidence is the position at which RNase MRP cleaves pre-rRNA in eukaryotes
(Morrissey & Tollervey, 1995; Venema & Tollervey, 2000). The A3 site in
eukaryotic pre-rRNA is at an equivalent position to a tRNA found in archaeal and
bacterial pre-rRNAs, and Morrissey and Tollervey (1995) have argued that the tRNA
has been lost from the eukaryote pre-rRNA, while cleavage at this site has been
maintained. Furthermore, that RNase P is ubiquitous while RNase MRP has only been
found in eukaryotes suggests that MRP is derived from P by duplication and
divergence, and bolsters the claim that the original state was tRNA processing from
within pre-rRNA. While MRP may post-date the LUCA, its function in pre-rRNA
processing is effectively one and the same as that of P in prokaryotic pre-rRNA processing.
As far as the additional substrates of bacterial RNase P are concerned, it is 
currently hard to establish the antiquity of these. While srpRNA is ubiquitous, the
eukaryote and archaeal srpRNAs (7S RNAs) are not known to be processed
by RNase P, and tmRNA is only known in bacteria, and, as described above, its status
as an RNA world relic is uncertain. Certainly there is a precedent for post-RNA world
functional diversification, as E. coli RNase P is also known to process phage RNAs
and the polycistronic his operon mRNA (Altman & Kirsebom, 1999).
Another example which may clarify the discussion is the finding that there are 
two spliceosomes in metazoans (Tarn & Steitz, 1997; Burge et al., 1999). Both have
the same origin, but the minor variant arguably arose more recently, through
duplication and divergence. The function of both is identical (both excise introns from
pre-mRNA, though the class of introns recognised is different), but one probably has
a more recent origin (Burge et al., 1999) so in the strictest sense is not a relic, even
though splicing in general arguably originated in the RNA world (see next section). In
the case of RNases P and MRP, a more recent duplication and divergence event for
these is possible, assuming RNase P carried out both functions initially (Morrissey &
Tollervey, 1995).
These examples serve to point out that in some cases, it may be difficult to
separate the ultimate origin from the proximate origin. This is similar to the problem 
of trying to establish the ultimate origin of a family of proteins which carry out a 
range of functions. Where the function of an RNA has remained essentially 
unchanged since the RNA world, it is possible to identify the ultimate origin. In the 
case of MRP, the function it carries out is arguably ancient, but the origin of MRP 
itself cannot be unequivocally linked with this function, hence, it is unclear whether it 
should be assigned relic status. Morrissey and Tollervey's (1995) model best fits the
data, though other scenarios can be envisaged (Collins et al., 2000).
Other naturally-occurring ribozymes, including the hammerhead, hairpin, 
hepatitis delta virus and Neurospora VS ribozymes (Table; Symons, 1997; Carola &
Eckstein, 1999) are examples of recently-evolved catalytic RNAs, since these are
used in novel strategies for viral or plasmid (Neurospora VS ribozyme and 
Salamander hammerhead-like RNA) genome replication. It has been argued recently 
that all these ribozymes have a common origin (Harris & Elder, 2000), but even if this 
is the case, this does not require that they originated in the RNA world. That said, 
these ribozymes demonstrate a potential mechanism for genome replication, as well as 
contributing to the reconstruction of a putative RNA world. The HDV ribozyme is a 
particularly salient example, since it has been shown to carry out self-cleavage 
through general acid-base catalysis (Perrotta et al., 1999; Nakano et al., 2000), as
opposed to metal ion catalysis (Westhof, 1999). Likewise, the hairpin ribozyme may
also make use of general acid-base catalysis (Rupert & Ferré-D'Amaré, 2001), and
excitingly, this is also the case for the peptidyl transferase subunit of the ribosome
(Muth et al., 2000). The similarity to the catalytic reaction carried out by peptidyl
transferase certainly establishes the relevance of these viral RNAs to catalysis in the 
RNA world, but also raises the point that ribozymes could have arisen multiple times 
in evolution with similar chemistry. 
mRNA splicing and self-splicing introns. 
A less clear case is presented by the group I and II self-splicing introns 
(Table). Broadly, the phylogenetic distribution of these two ribozymes is bacteria and 
eukaryotic organelles (see Figure 4 in Lykke-Andersen et al., 1997; Cech & Golden,
1999). Group I introns make use of the 3'-OH of free guanosine as nucleophile in the
first step of splicing, while in group II introns, the nucleophile is provided in cis, and
consequently, this is a 2'-OH group. Splicing in both cases is via a two-step
transesterification. The spliceosome, a large ribonucleoprotein complex responsible
for splicing out of introns from eukaryotic nuclear pre-mRNA, also makes use of an
internal 2'-OH for the first transesterification. At the core of the spliceosome are 5
small nuclear RNAs (snRNAs): U1, U2, U4, U5 and U6.
A common origin of group II introns and the spliceosome has been suggested
by numerous authors (e.g. Sharp, 1985, 1991, 1994; Cech, 1986; Copertino & Hallick,
1993; Stoltzfus 1999). This possibility revolves around the idea that a group II intron
evolved into a 5-piece RNA complex. This idea is gaining ground, with similarities in
chemical mechanism of cleavage, structurally analogous regions and ligation by a
two-step transesterification (Sharp, 1985; Cech, 1986; Chanfreau & Jacquier 1994;
Sontheimer et al., 1999; Gordon et al. 2000; Boudvillain et al. 2000; Yean et al.
2000). Strikingly, Hetzer et al. (1997) removed the ID3 subdomain of a group II
intron, which reduced exon anchoring during ligation, and were able to reconstitute
this by supplying U5 snRNA in trans. In addition to the direct comparisons between
canonical group II and spliceosomal splicing, the feasibility of a common origin has
been given support from a number of sources. Formation of group II intron structure
from three separate transcripts has been observed in Chlamydomonas reinhardtii
chloroplasts (Goldschmidt-Clermont et al. 1991), demonstrating that trans-splicing
can arise from cis-splicing, and that the proposal of fragmentation of a single 
functional RNA (as envisaged for the evolution of the spliceosome) is not without 
precedent. Group III introns, degenerate group II introns found as 'twintrons' (an 
intron within an intron) in Euglena chloroplast DNA, lack much of the canonical 
structure of group II introns, and probably require additional functions in trans for 
splicing (Copertino & Hallick 1993). Again, this has been considered as support for
the possibility that the five snRNAs could have arisen from a single precursor.
Moreover, Copertino et al. (1994) have described a group III twintron which excises
via a lariat intermediate, analogous to the formation of a lariat in the excised 
spliceosomal introns. 
With so much circumstantial evidence, it seems likely that the spliceosomal 
RNAs and group II introns have a common origin. However, such similarities may
either reflect a common ancestry or they might be a result of convergence owing to
'chemical determinism' (Weiner, 1993). Given that splicing always begins by 
nucleophilic attack of the phosphate-sugar backbone by a hydroxyl group on ribose, 
the different strategies used by group I and II introns (3'-OH of GTP supplied in trans 
versus 2'-OH of adenosine supplied in cis) might be the only two possible ways of 
initiating this reaction. That the spliceosome makes use of the same mechanism as 
group II introns could therefore be a consequence of 'chemical determinism' (and 
therefore convergence), not common origin (Weiner 1993). Indeed, in all three cases,
splicing is carried out through two transesterifications. Chemical similarities and
functional parallels provide an inroad into understanding the evolution of splicing, but
given Weiner's (1993) point, they are not particularly informative in terms of
distinguishing between convergence and divergence. Structural studies may help shed
light on this question, in much the same way as they have resolved the question of
whether the different classes of ribonucleotide reductase are convergent or divergent
(Logan et al., 1999).
If it is nevertheless concluded that the similarities between group II introns 
and pre-mRNA splicing are sufficient to rule out convergence (that there are several
examples of alternative cleavage reactions available to RNA (see Westhof, 1999) in
addition to those in group I and II introns might suggest this), how is the direction of
evolution established? It is as conceivable that group II introns are derived from the
snRNAs through fusion and reductive evolution as it is that the snRNAs
evolved from a group II intron.
In examining the evolutionary origins of splicing, there are two major 
questions:
• Does splicing date back to the RNA world?
• Did group II introns give rise to the snRNAs of the eukaryotic spliceosome,
or vice versa?
The short answer to the first question is that an RNA world origin for splicing is likely,
but the argument is over whether such splicing was group II-like, spliceosome-like, or 
both. In addressing the second question, it is assumed that group II and pre-mRNA 
splicing are related by descent. We begin with an overview of the first question, 
specifically with respect to the intron-exon structure of eukaryotic nuclear genes, 
since this has been the source of greatest controversy. 
Eukaryotic pre-mRNA splicing has been argued to be an ancient process from 
which protein diversification by exon shuffling could have subsequently arisen (see
Gilbert, 1978; Doolittle, 1978; Blake, 1978). It was argued that through the presence
of splicing, discrete protein modules could have been mixed and matched, producing
protein diversity from functional building blocks by 'exon shuffling'. Indeed,
shuffling is seen to some extent, in the form of processes such as alternative splicing,
where an mRNA can be spliced in different ways to yield different products
(reviewed by Graveley 2001). The implication of the 'introns-early' hypothesis for the
origin of introns is that the eukaryote splicing apparatus and the intron-exon structure 
of genes arose very early in evolution, and were subsequently lost from prokaryote 
genomes. This explanation, while potentially explaining a role for splicing in protein 
diversification through exon shuffling, runs into two problems. First, it does not 
actually explain intron origins, rather, only a possible role for these in exon shuffling, 
after the advent of an intron-exon gene structure. Exon shuffling as an explanation for 
the origin of the intron-exon structure of genes implies that introns arose in order to 
shuffle exons. That is, it implies evolutionary forethought (Blake 1978; Doolittle,
1978). A consequence of the origin of introns might be exon shuffling, but that
separates the origin of introns from the emergence of exon shuffling. 
Second, the specific prediction of exon shuffling is that in at least some cases, 
the intron-exon structure of a gene should reflect the existence of discrete functional 
protein modules. Overall, the data are not strong, and even if there are cases of ancient 
exon shuffling, it may not be possible to detect these if intron sliding (for which there 
is no support [Stoltzfus et al. 1997]) is permitted (Rzhetsky et al. 1997). Indeed, the
data accumulated to date (see Logsdon 1998; Wolf et al. 2000) are most compatible
with the alternative theory, 'introns-late', that the 5 snRNAs of the spliceosome arose
from group II introns which originated in the bacterial lineage as selfish elements, and
that introns represent insertion of selfish genetic elements. Under 'introns-late', group
II introns entered the eukaryote genome via the mitochondrion (members of the
α-proteobacteria, which, among extant bacteria, share the most recent common ancestor
with mitochondria, have been shown to possess group II introns), and this is known as
the 'mitochondrial seed' hypothesis (Cavalier-Smith, 1991; Logsdon, 1998).
Importantly, phylogenetic evidence suggests that all extant amitochondrial 
eukaryotes once possessed mitochondria (or hydrogenosomes, which share a common 
origin with mitochondria - see Embley & Hirt, 1998; Rotte et al., 2000). This can be
taken as evidence to support the scenario described by Logsdon (1998), since all
modern eukaryotes arose from an ancestral cell which harboured an endosymbiont.
Hence the advent of splicing specifically in eukaryotes could be explained by
endosymbiont to host transfer of a group II intron (this direction of transfer is well
supported by independent evidence [Blanchard & Lynch, 2000]), followed by
complexification to form the modern spliceosome. 
Introns are in fact found in all three domains. Archaeal introns are not
self-splicing, but are positionally conserved with eukaryotic tRNA introns, and both make
use of a conserved LAGLIDADG endoribonuclease in the cleavage and ligation
reaction (Lykke-Andersen et al., 1997; Trotta & Abelson, 1999). Group I introns are
found in bacteria and both the eukaryote nucleus and organelles (Lykke-Andersen et
al., 1997; Cech & Golden, 1999), while group II introns are found in bacteria and
eukaryote organelles (mitochondria and chloroplasts) (Logsdon, 1998). However, it is
hard to argue for a common origin for the three types of intron (group I,
group II/spliceosomal, tRNA), so on phylogenetics, introns may have arisen more than
once, and do not clearly date back to the RNA world. A common origin is not 
impossible, just not readily testable, given current data. 
While many consider the introns early-late debate to be largely over, there are 
nevertheless shortcomings in the introns-late scenario. Furthermore, alternatives exist
to exon-shuffling as an explanation for the origin of introns and the spliceosomal
RNAs in the RNA world. While there are continued arguments for the validity of
exon shuffling (de Souza et al., 1998), we think the evidence does not favour this
scenario (see Logsdon, 1998). 
That modern eukaryotes are all likely to have descended from a 
mitochondrion-bearing ancestor adds weight to the suggestion that the spliceosome 
arose specifically within that lineage subsequent to transfer of mitochondrial group II 
introns to the nucleus². However, a serious problem for this account is that, because
the model does not involve a selective advantage for the emergence of splicing, it is
hard to understand how a group II intron became fragmented into five pieces, and
associated with a large number of conserved proteins. There is nothing inherently wrong
with not invoking a selective pressure in the evolution of complex structures. As described
above, this has provided valuable insight into the evolution of kinetoplastid editing.
² For simplicity, we imply the host was a eukaryote with a nucleus, and the endosymbiont was a
mitochondrion. The nature of the endosymbiont and host are currently the subject of intense debate
(Andersson & Kurland, 1999; Rotte et al. 2000), but we note that on current data, it is simplest to
describe the endosymbiont as mitochondrial, since it is in these organelles that group II introns have
been identified (Logsdon, 1998).
An additional problem with this scenario is  that it relies on inference. It cannot 
be directly tested using phylogenetic analyses in the same way as other mitochondrial 
to nucleus transfers (reviewed in Embley & Hirt, 1998; Philippe et al. 2000). This is
because both sequence and structure of group II introns and spliceosomal RNAs are
too divergent to be able to use either of these for phylogenetic reconstruction of their
histories. Assuming group II and spliceosomal RNAs have a common origin, it is not
possible to distinguish between a common origin in LUCA and transfer from
mitochondrion to nucleus on the current dataset (Figure 1).
The model advocated by Logsdon (1998) requires transfer of non-fragmented
group II introns to the nucleus (no examples of fragmented group II introns in
mitochondria have been described) where these then insert into the host DNA, and
excise during mRNA expression. Then, over time, the mechanism shifts from cis
splicing to trans splicing by a complex of 5 RNAs. The first point is uncontroversial
given that group II intron mobility is known to be mediated via an intron-encoded
reverse transcriptase (Lambowitz et al., 1999), though no examples of nuclear group II
introns are known. The second is harder to explain. The fragmentation process
was either extremely fast, predating divergence of the major eukaryote lineages, or
there was selection for the modern spliceosome over other versions, or, least likely, the
modern 5-piece spliceosome was fixed through drift.
No suggestions have been made regarding the second two possibilities, and the 
third is becoming more problematic since the previous consensus on eukaryote 
phylogenetics based on rRNA phylogeny (Sogin, 1991) has been challenged by the
finding that microsporidia are not deep-diverging eukaryotes as per the rRNA trees,
but rather are a sister group of fungi (reviewed in Keeling & McFadden, 1998). The
emergence of the modern splicing apparatus must predate the diversification of
eukaryotes, but is also constrained by the endosymbiosis event. In the absence of
apparent selection for a late origin of the spliceosome (Stoltzfus, 1999), there ought
to be spliceosomes intermediate to the 5-piece spliceosome.
A further point is that chromosome (Backert et al., 1997; Watanabe et al.,
1999; Zhang et al., 1999), gene (Estevez & Simpson, 1999) and RNA gene (Keiler et
al., 2000) fragmentation are all found in mitochondria and chloroplasts. A similar
architecture is seen in RNA viruses, and this has been argued to be a means of
slowing the accumulation of slightly deleterious mutations arising via Muller's
Ratchet (Reanney, 1986). Hence, while fragmentation might be a predicted
consequence of an organellar location for group II introns (no fragmented introns 
have been documented in free-living bacteria), it is not expected for genes located in 
the nucleus, given that the ratchet does not operate at the same levels as in organellar 
genomes (Blanchard & Lynch, 2000). 
Currently there is limited information on the nature of splicing in protists. 
Spliceosomal introns and all five snRNAs have been identified in Euglena gracilis 
(Breckenridge et al. 1999, and references therein), Trypanosoma brucei and T. cruzi 
(Mair et al. 2000, and references therein). The Giardia lamblia genome project
(McArthur et al. 2000) is underway, and it will be interesting to see whether splicing 
occurs and whether snRNAs are present. Given the Trypanosoma and Euglena 
examples, it would be a surprise to find any protists without 5 snRNAs (unless only 
trans-splicing is present in which case U1 may be expected to be absent - see
Breckenridge et al., 1999; Mair et al., 2000). This suggests it is at least feasible that,
prior to the endosymbiosis event that gave rise to the mitochondrion, proto-eukaryotes 
possessed splicing. 
Insertion of 'selfish' elements into genomes also deserves consideration. 
Insertion is not a widespread feature of prokaryotic genomes, while it varies from 
almost none, to extreme in eukaryotes. In extant bacteria there is good evidence for 
loss of any sequence that is not under immediate selection, including periodically-selected
functions (reviewed in Poole et al., 2001). In bacteria the rate of genome
replication is likely to be limited by a single origin of replication, and with fast 
response times being crucial to proliferation upon detection of an energy source, there 
is strong selection for sequence loss in the absence of direct selection for the 
sequence. In general, eukaryotes do not compete via fast reaction times, though this 
may be more prevalent among 'simple' eukaryotes (see Poole et al. 2001). Without
such competition, there is no inherent selective disadvantage to selfish element 
insertion if the only consequence is an increase in genome size. With these 
differences, it is clear that bacterial genomes have not simply remained in some 
'primitive' status quo with eukaryotes having diversified through complexification. 
With a precedent for loss in bacteria, it is as likely that group II introns represent the 
remnants of eukaryotic mRNA splicing (surviving as selfish elements through intron 
mobility) as the standard view that splicing has complexified in eukaryotes. Equally, 
if group II introns did enter eukaryote nuclear genes via the mitochondrion, invasion 
and proliferation is expected. 
In examining the case for the spliceosome and mRNA introns in the RNA 
world, there are two major questions. First, what role might splicing have played in an 
RNA world, and second, is there any evidence for an RNA world origin? As 
described above, the exon shuffling theory does not explain the origin of introns, and 
nor is it well supported in specific and genome-wide analyses. Nevertheless, this does 
not preclude an RNA world origin for introns. An RNA world origin is not 
incompatible with the majority of introns being inserted during eukaryote evolution, 
and it does not require that putatively ancient introns adhere to the exon shuffling 
theory. 
Two roles for splicing in the RNA world have been suggested. First, splicing 
might have been a mechanism for recombination as a buffer against accumulation of 
deleterious mutation (Reanney 1984; Darnell & Doolittle, 1986; Jeffares et al., 1998).
Again, this role would be separate from the origin of an intron-exon structure. An 
explanation for the origin of splicing comes from examining the origin of 
chromosomes (Maynard Smith & Szathmary, 1993; Szathmary & Maynard Smith,
1993). At a very early stage in the evolution of the cell, genes would not have been
maintained on chromosomes. The advantages of chromosomes are that, upon cell 
division, both daughter cells are guaranteed to receive a copy of all genes, and the 
spread of selfish genes that replicate faster than the other genes is limited (Maynard 
Smith & Szathmary, 1993).
In the early RNA world, where gene and product were one and the same, the
advent of the chromosome would have been a step toward the separation of phenotype and
genotype. Either transcription would have to become separated from replication (see 
Maizels & Weiner, 1999), or the whole chromosome would be transcribed and 
subsequently cut up to produce functional products (that is the chromosome and 
transcript are not distinguishable, unless all functional RNAs are on the same strand). 
Both these alternatives are likely, though the latter probably predated the former as a 
means of expressing RNA genes (Poole et al., 1998; 1999).
The emergence of physical linkage of genes on chromosomes in an RNA 
world provides a selection for splicing in the RNA world but does not explain the 
origins of the intron-exon structure of genes, nor whether group II introns predate the 
spliceosome. The emergence of an intron-exon structure may have simply been a 
consequence of absence of selection against the emergence of linker regions as a 
result of low replication fidelity. The presence of additional nucleotides at the 5'
and/or 3' end might not have affected function appreciably, though there is no
inherent reason for splicing to have been an inaccurate process. If it did cleave at
specific sites, insertions between RNA genes resulting from low copying fidelity 
would not be selectively disadvantageous. 
There is however a strong argument that splicing from a 
transcript/chromosome could not have been carried out by group II introns in the 
RNA world. Consider a chromosome with 5 RNA genes on it, and with group II 
introns between the RNA genes. Upon self-splicing of the group II introns out of the
transcript copy, the 5 genes would still be unprocessed; only the group II introns will
have been released from the transcript. Gilbert and de Souza (1999) have suggested
that group II introns interrupted RNA genes, with splicing yielding a functional RNA.
They also suggest that, with recombination, this architecture would enable RNA 
domain shuffling; that is, exon shuffling for RNA instead of proteins. There are 
examples of RNAs with introns (e.g. U3 snoRNA, U5?), but it is not possible to
establish whether these date back to the RNA world, or represent recent insertions. 
More problematically, the scenario proposed by Gilbert and de Souza (1999)
requires a one gene, one chromosome model, with group II introns fulfilling a solely 
'selfish' role. 'Selfish' elements are likely to be an emergent feature of any replicative 
system. However, for chromosomes to evolve, splicing in trans is required in order to 
express functional RNAs from a precursor transcript. Group II introns would not have 
provided this function, since they self-excise then splice together the two exons!
Furthermore, the propensity for self-splicing introns to insert into a sequence is not a
property of the RNA, but of the associated proteins (Lambowitz et al., 1999). Without
a mechanism for insertion, there would be a tendency for 'selfish' self-splicing introns 
to be lost, since the processed chromosome would function equally well without 
these. In fact, without insertion, it is difficult to see how these introns could be 
parasitic on early RNA genomes. Hence, self-splicing introns, if they date back to the 
RNA world, would have had to insert themselves as well as excise themselves. Given
that modern group I and II introns only do the latter, it is as likely that these post-date
the RNA world, arising subsequent to DNA endoribonucleases and reverse
transcriptases and associated factors required for insertion (Lambowitz et al., 1999). If
tRNA introns date back to the RNA world, they have lost both splicing and insertional 
functions (Trotta and Abelson, 1 999). 
For expression of several functional RNAs from a single transcript 
RNA/chromosome (and assuming that these functional RNAs were not all
self-splicing), what is needed is the reverse of modern day splicing (where the junk is cut
out and the coding regions are spliced together). That is, in an RNA world,
modern-day introns would have been the coding genes, and modern-day exons would have
been the junk (Figure 2).
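The contrast drawn above can be sketched as a toy model. This is an illustration only, not a description of any real splicing machinery: the precursor transcript is treated as an ordered list of labelled segments, and the segment kinds "gene" and "spacer", together with the function names `splice` and `make_transcript`, are hypothetical names introduced here. Excising the spacers (as self-splicing introns placed between genes would) leaves the genes ligated into a single unprocessed molecule, whereas excising the genes releases each functional RNA and ligates the spacers into one 'junk' product.

```python
from typing import List, Tuple

# Each segment is (kind, label). The kinds "gene" and "spacer" are labels
# invented for this illustration; they are not terms from the source text.
Segment = Tuple[str, str]


def make_transcript(n_genes: int) -> List[Segment]:
    """Build a precursor of the form gene1, spacer1, gene2, ..., gene_n."""
    segments: List[Segment] = []
    for i in range(1, n_genes + 1):
        segments.append(("gene", f"gene{i}"))
        if i < n_genes:
            segments.append(("spacer", f"spacer{i}"))
    return segments


def splice(transcript: List[Segment], excised_kind: str) -> Tuple[List[str], str]:
    """Excise every segment of `excised_kind` and ligate the rest in order.

    Returns the list of released (excised) RNAs and the single ligated product.
    """
    released = [label for kind, label in transcript if kind == excised_kind]
    ligated = "-".join(label for kind, label in transcript if kind != excised_kind)
    return released, ligated


precursor = make_transcript(5)

# Self-splicing spacers: only the spacers are released; the five genes
# remain joined in one molecule, i.e. still unprocessed.
cis_released, cis_product = splice(precursor, "spacer")

# The reverse arrangement: the genes are excised as separate functional
# RNAs and the spacers are ligated into a single non-coding product.
rev_released, rev_product = splice(precursor, "gene")
```

In this sketch `rev_product` plays the part of the spliced-together junk, while `cis_product` shows why excision of elements lying between genes does not by itself yield separate functional RNAs.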
The brief description of the origin of chromosomes given above is not a new 
one, but the finding of the exact same structure in modern genomes has rekindled the
argument that the intron-exon structure of genes dates back to the RNA world (Poole
et al. 1998, 1999). Several eukaryotic genes are now known where the introns code
for functional RNAs (small nucleolar RNAs, snoRNAs), the exons being non-coding
(Tycowski et al., 1996a; Bortolin & Kiss, 1998; Pelczar & Filipowicz, 1998; Smith &
Steitz, 1998). In snoRNA expression in these genes, the snoRNA-containing introns
are spliced out and the noncoding exons are spliced together. Gene expression from
chromosomes would have been identical in the RNA world (Figure 2).
Excitingly, the production of a junk RNA from a series of non-coding exons 
could also solve the problem of where mRNA came from (Poole et al., 1999). In a
tightly-packed genome of RNA genes, there would have been no raw material for the 
ribosome to act upon. However, if RNAs were excised from precursor transcripts, 
with the junk being spliced together, this could have provided the raw material from 
which protein genes arose (Figure 2). Under this model, there would be no correlation 
between exons and protein modules, since the proto-exons would have been 
continuous structures, not modular as per the exon shuffling theory. 
A good number of snoRNAs are intron-encoded, with almost all vertebrate 
snoRNAs being intronic, and moreover, these are found in the genes for ribosomal and
nucleolar proteins (Weinstein & Steitz 1999). The latter group are of particular interest, since
models for the origin of protein synthesis involve a positive feedback loop: proteins
stabilise and increase the accuracy of the ribosome, which makes proteins more
accurately, and these further enhance the accuracy of the ribosome (see Poole et al.
1999, and references therein).
It has been variously argued that this is an ancient system (Poole et al., 1998;
1999), and that snoRNAs arose recently by diversification (Morrissey & Tollervey,
1995; Lafontaine & Tollervey, 1998). Many snoRNAs have now been identified, and
almost all are involved in rRNA processing, being essential for 2'-O-ribose
methylations, pseudouridylations or precursor rRNA cleavage (reviewed by
Weinstein & Steitz, 1999). Pre-rRNA processing can certainly be argued to be central
to metabolism since it is processing of a ubiquitous RNA, as with processing of
tRNA by RNase P. Nevertheless, establishing the antiquity of snoRNAs is not
straightforward. Both hypotheses have their merits, and are not necessarily
incompatible in all respects (Poole et al., 2000). This debate we shall consider further,
and try to establish an approach that could resolve this issue. 
snoRNAs 
SnoRNAs are involved in extensive processing of eukaryotic rRNA (Smith &
Steitz, 1997; Weinstein & Steitz, 1999), and some process spliceosomal RNAs
(Tycowski et al., 1998; Jady & Kiss, 2001). Two families have been characterised,
C/D and H/ACA, on the basis of sequence elements. The C/D family guides
2'-O-methylation of ribose, and in yeast 51 of 55 rRNA methylations have been shown to
be snoRNA-guided (Lowe & Eddy, 1999). The H/ACA family snoRNAs guide
isomerisation of uridine to form pseudouridine. In yeast, based on the number of
pseudouridylations of rRNA (Ofengand & Fournier, 1998) the number of H/ACA
snoRNAs is predicted to be comparable to C/D snoRNAs. In humans, this number is
expected to be near 100 for each family, again on the basis of the number of
modifications made to the rRNA (Smith & Steitz, 1997). Members of each class are
also involved in cleavage of pre-rRNA during rRNA maturation (reviewed in Smith &
Steitz, 1997). Recently, a 'chimeric' snoRNA, which guides both pseudouridylation
and methylation on snRNA U5, has been characterised (Jady & Kiss, 2001).
However, with the exception of this snoRNA, all other snoRNAs fall neatly into the
two families, C/D and H/ACA.
The distribution of snoRNAs varies across the three domains. Eukaryotes contain
both C/D and H/ACA family snoRNAs, involved in 2'-O-methylation and
pseudouridylation, and representatives of both families participate in pre-rRNA
cleavage (reviewed in Morrissey & Tollervey, 1995; Smith & Steitz, 1997; Lafontaine
& Tollervey 1998; Smith & Steitz, 1999). Bacteria are not expected to possess
snoRNA-like RNAs, having a limited number of 2'-O-methylations and
pseudouridylations, all of which are produced by protein enzymes in bacteria studied
to date (Bachellerie & Cavaillé, 1998; Ofengand & Fournier, 1998). Cleavage of
pre-rRNA in bacteria is likewise carried out by proteins (Morrissey & Tollervey, 1995).
A more complex picture has emerged in archaea. Both the crenarchaea and
euryarchaea possess extensive 2'-O-methylation of rRNA, guided by a family of small
RNAs homologous to eukaryotic C/D snoRNAs (Gaspin et al., 2000; Omer et al.,
2000). However, the number of pseudouridylations in archaeal rRNA is low, as per
bacteria (Lafontaine & Tollervey, 1998). No homologues of H/ACA
snoRNA-associated proteins have been identified, suggesting that the pseudouridylation
apparatus may be protein-mediated, as in bacteria (Lafontaine & Tollervey, 1998;
Charette & Gray, 2000). Less is known about the pre-rRNA processing events 
involving cleavage in archaea. Evidence to date suggests this aspect of pre-rRNA
processing does not involve snoRNA-like RNAs, but one or more novel
endonucleases (Russell et al., 1999). However, an in-cis snoRNA U3-like function
(U3 functions in pre-rRNA cleavage in eukaryotes [see Smith & Steitz, 1997]) for
sequences within the 5' external transcribed spacer of pre-rRNA has been suggested
for both archaea and bacteria (Dennis et al., 1997), and homologues of the snoRNA
U3-associated protein IMP4 have been identified in archaea (Mayer et al., 2001). If
snoRNA-mediated cleavage of pre-rRNA is not demonstrated in archaea, the
existence of proteins homologous to the eukaryotic snoRNP-based processing system,
and the existence of C/D family homologues for pre-rRNA 2'-O-methylation, might
be best interpreted as loss from archaea, especially given that some of the eukaryotic
snoRNAs involved in cleavage are C/D family members. Furthermore, if Dennis et al.
(1997) are correct in their suggestion of an in-cis U3-like function for the 5'ETS, this
may suggest that the snoRNA system for cleavage is, in some form, ancestral, as
suggested by Jeffares et al. (1998). With the paucity of information currently available
for archaeal pre-rRNA cleavage events, it is not possible to establish whether it is 
more like the eukaryote or bacterial pathway, or indeed, whether it is unique to the 
archaeal domain. 
We have previously argued that both families of snoRNAs date back to the 
RNA world (Jeffares et al. 1998; Poole et al. 1999), while Tollervey and colleagues
have argued for more recent origins, with the C/D family arising in the ancestor of
eukaryotes and archaea and the H/ACA family perhaps arising in the eukaryotes, after
divergence from the two prokaryotic lineages. Which scenario is correct, and how
does one establish this? There are several aspects to the snoRNA problem:
• Consideration of the phylogenetic distribution of C/D and H/ACA family
snoRNAs, as outlined above. 
• Problems with the rooting of the tree of life, and how this may influence 
conclusions. 
• Selection. 
• That an RNA world origin for snoRNAs does not preclude recent diversification. 
It is necessary to consider all aspects in any theory that attempts to account for the 
origin, evolution and modern distribution of snoRNAs. We shall review relevant 
aspects of the tree of life problem, and present a theory for the origin of snoRNAs that 
accounts for all the data. 
Currently the interrelationships between the three domains are still in dispute,
with the widely accepted monophyly of archaea and eukaryotes (Figure 3a, Iwabe et
al., 1989; Gogarten et al., 1989; Woese et al., 1990) having been challenged in the
light of new techniques, which suggest that the bacteria appear more divergent
because of 'long-branch attraction' (Brinkmann & Philippe, 1999; Lopez et al., 1999),
wherein a faster rate of evolution incorrectly groups the two slower-evolving groups
(archaea and eukaryotes). Removing the 'long-branch attraction' artefact places the
two prokaryotic groups together, with the root falling on the eukaryote branch (Figure
3b). The traditional tree suggests that the snoRNAs arose in the common ancestor of
the archaea and eukaryotes, and may or may not have been present in the Last
Universal Common Ancestor (LUCA), the latter point depending on whether the
bacterial rRNA processing system is ancestral or derived (Figure 3a). The
newly-proposed tree places the snoRNAs in the LUCA (assuming the distribution is not a
result of horizontal transfer), as they are represented in both major branches of the
tree, so parsimony can be applied to argue that bacteria almost certainly lost these
(Figure 3b).
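The effect of the two rootings on what parsimony can say about the LUCA can be illustrated with a minimal sketch. It applies Fitch parsimony to a three-taxon tree, scoring presence (1) or absence (0) of C/D snoRNA-like RNAs as described in the text. The function name, the tuple encoding of the trees, and the equal weighting of gains and losses are assumptions made for illustration only; they are not part of the original analysis.

```python
def fitch(tree, tip_states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples.

    Returns (possible_states_at_root, minimum_number_of_changes).
    """
    if isinstance(tree, str):          # a tip: its state is observed
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:                         # children agree: no extra change needed
        return shared, left_changes + right_changes
    # children disagree: either state is possible, at the cost of one change
    return left_set | right_set, left_changes + right_changes + 1


# 1 = C/D snoRNA-like RNAs present, 0 = absent (distribution as given in the text)
cd_snornas = {"Bacteria": 0, "Archaea": 1, "Eukaryotes": 1}

# Figure 3a-style topology (assumed encoding): root on the bacterial branch
tree_a = ("Bacteria", ("Archaea", "Eukaryotes"))
# Figure 3b-style topology (assumed encoding): root on the eukaryote branch
tree_b = ("Eukaryotes", ("Archaea", "Bacteria"))

root_a, changes_a = fitch(tree_a, cd_snornas)
root_b, changes_b = fitch(tree_b, cd_snornas)
print("bacterial rooting:", sorted(root_a), changes_a)   # [0, 1] 1
print("eukaryote rooting:", sorted(root_b), changes_b)   # [1] 1
```

Under the bacterial rooting the root state is ambiguous (present or absent), whereas under the eukaryote rooting the only most-parsimonious root state is presence, implying a single loss in bacteria. Because the sketch weights gains and losses equally, it ignores the possibility that losses are more likely than gains.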
Since the position of the root of the tree of life is not known with any 
certainty, it is difficult to establish the origin of a feature based on its distribution 
across the three domains. Even if the root is established, it is difficult to use this 
information to establish the nature of the LUCA. A feature found on both sides of the 
root can be argued to be present in the LUCA, assuming no horizontal transfer or 
convergent evolution. A feature which is present in only one lineage, e.g. H/ACA
snoRNAs in eukaryotes, must be treated slightly differently, however. Multiple losses
are far more likely than multiple gains (as exemplified by multiple independent losses
of primary synthetic pathways in parasitic and endosymbiotic bacteria [Andersson &
Andersson, 1999]). Hence, if H/ACA snoRNAs are not found in archaea or bacteria,
this does not rule out the possibility that they were a feature of the LUCA (Forterre, 1997;
Penny & Poole, 1999).
As the tree describes the relationships between three monophyletic lineages, 
any argument from parsimony should be treated with caution. More importantly, even 
with horizontal transfer excluded (as far as we are aware, there is no evidence for 
horizontal transfer of snoRNAs or associated proteins), the uncertainty of the 
topology of the tree of life makes it uninformative (Forterre, 1997; Penny & Poole,
1999).
The problems of using the tree in establishing the evolution of the snoRNAs
call into question the robustness of Tollervey and colleagues' conclusions (Morrissey
& Tollervey, 1995; Lafontaine & Tollervey, 1998) because their scenario for the
origin of the snoRNAs is based on two assumptions: that the bacterial rooting of the
tree of life is correct; and that the corollary of the placement of the bacterial lineage as
the outgroup is that bacterial features are ancestral and those shared by archaea and
eukaryotes are derived. It is currently unclear whether the bacterial rooting is the
correct one, but placement of the root in the bacterial lineage does not imply that
bacterial traits are ancestral, or that shared archaeal-eukaryote traits arose post-LUCA
(Forterre, 1997). This latter point does not in itself invalidate the evolutionary scheme
described in Tollervey and colleagues' papers, but it does cast doubt on it.
The case for snoRNAs as RNA relics. 
As the tree of life cannot be used to establish the antiquity of snoRNAs, it is 
necessary to establish an alternative approach to examining the origin of snoRNAs. 
One way to do this is to establish whether there is a role for methylation and 
pseudouridylation in the RNA world. Both types of modification are ubiquitous, so 
can be argued to date back to the RNA world (Martínez Giménez et al., 1998;
Cermakian & Cedergren, 1998). This suggestion is relatively uncontroversial since it
is based on the ubiquity of these modifications, and on arguments for their utility prior
to the emergence of protein synthesis. Pseudouridylation might have originally been
selected for the increased H-bonding that is possible compared with uridine (see
Ofengand & Fournier, 1998; Charette & Gray, 2000). It might therefore be important
in the specification of tertiary structure, or a folding pathway. 2'-O-methylation alters
the 2'-OH moiety of ribose, and this could have two roles. First, this modification
eliminates the reactivity of the 2'-OH, so 2'-O-methylated ribose cannot be involved in
catalytic reactions. Second, the addition of a methyl group will restrict the potential
for hydrogen bonding at that position. Hence, 2'-O-methylation would prevent
cross-reactivity or unwanted self-cleavage and, furthermore, by influencing hydrogen
bonding, might specify or favour a particular folding pathway (Bachellerie & Cavaillé, 1998;
Poole et al., 2000). 2'-O-methylation is expected to be possible without protein,
consistent with a possible RNA world origin for this modification (Poole et al., 2000),
though it is less clear whether pseudouridylation could be catalysed by RNA. In both
cases, this could be established through in vitro selection experiments. A final point is
that cleavage reactions analogous to those in pre-rRNA processing are known for
RNA, an example being that carried out by RNases P and MRP.
The theory proposed by Tollervey and colleagues (Morrissey & Tollervey,
1995; Lafontaine & Tollervey, 1998) would require that these modifications were
present in the RNA world in limited numbers (or perhaps even absent altogether),
with the snoRNA apparatus only arising post-LUCA. If this argument is accepted, an
explanation must be given for the very limited use of these functional groups in the
RNA world and the LUCA, with emergence of high levels of rRNA methylation in
archaea, and both methylation and pseudouridylation in eukaryotes. It also must
explain the utility of such rRNA modifications specifically in these two groups, and
not bacteria. The alternative is that modification of rRNA dates back to the RNA
world, and that it was snoRNA mediated (Poole et al., 1998, 1999). Protein-RNA
interactions subsequently replaced the role of such modifications in folding, and in
silencing sites of potential catalytic activity (Poole et al., 2000). Detailed structural
information of the bacterial ribosome is now available (Muth et al., 2000; Nilssen et
al., 2000; Yusupov et al., 2001), and eventually it may become possible, through
comparative structures, to establish whether eukaryotic modifications serve an
equivalent function to RNA-protein interactions.
If it is assumed that pseudouridylation and 2'-O-methylation date back to the
RNA world, were relatively extensive, and that modification was either mediated or
catalysed by snoRNA, an explanation for the complete absence of snoRNAs from
bacteria, and of H/ACA snoRNAs from archaea, must also be given.
The bacterial rooting of the tree of life, and the position of thermophiles at the
base of both the archaeal and bacterial domains, have been taken as evidence to support
a thermophilic LUCA (Woese, 1987). However, single-stranded RNA is unstable at
high temperatures, and a strong counter-argument, which accounts for the reduction in
RNA processing, and in putative RNA relics, in prokaryotes, is that either the ancestor of
prokaryotes was a thermophile, or that thermophily arose twice (Forterre, 1995;
Poole et al., 1998, 1999). In both scenarios, eukaryotes would never have undergone a
period of adaptation to high temperatures, and the LUCA would have been a
mesophile (Forterre, 1995; Poole et al., 1998, 1999). In addition to the expectation
that RNA processing would be reduced during adaptation to high temperatures,
circular chromosomes may also be an adaptation to high temperature, solving the
problem of 'frayed ends' (Marguet & Forterre, 1994; Poole et al., 1999) and also
supporting the argument that linear chromosomes and telomerase RNA are the
ancestral state (Maizels & Weiner, 1999; Poole et al., 1999). Independent evidence
that the LUCA was mesophilic comes from reconstruction of the ancestral GC content
by comparing archaeal, bacterial and eukaryote genomes (Galtier et al., 1999). Even
when mesophiles were removed from the dataset the conclusion reached was the same
(Galtier et al., 1999). Finally, three independent reports have now suggested that traits
contributing to hyperthermophily may have been subject to horizontal transfer
(Aravind et al., 1998; Nelson et al., 1999; Forterre et al., 2000).
Neither scenario can readily explain the snoRNA data, however. In addition to
the roles for 2'-O-methylation described above, it has also been shown that this type
of modification serves to stabilise RNA, and that the extent of modification is
positively correlated with growth temperature in thermophilic archaea (Noon et al.,
1998). If the LUCA were a thermophile, there ought to have been selection for
extensive methylation in all groups, yet single-stranded RNA should not be favoured
since it is thermolabile (Forterre, 1995). SnoRNA-mediated 2'-O-methylation is found
in archaea and eukaryotes, but not in bacteria, whereas a thermophilic common origin 
for all three domains would predict that all three would have extensive methylation, 
and, if anything, eukaryotes would be the strongest candidates to have lost these. 
Likewise, a thermophilic ancestor for prokaryotes does not readily explain the 
presence of extensive methylation in archaea, and near absence in bacteria. However 
it can potentially explain the loss of pseudouridylation in both lineages, since there is 
no obvious role for this type of modification in RNA thermostability. Nevertheless, 
given the inconsistency with the 2'-O-methylation data, this is too simplistic an 
explanation. 
As opposed to the scenario given by Lafontaine & Tollervey (1998), where
C/D family snoRNAs emerged in the archaeal-eukaryote lineage, and H/ACA
snoRNAs emerged in eukaryotes after divergence from archaea, we favour the
following possibility.
Given a likely RNA world role for both pseudouridylation and
2'-O-methylation, the bacterial site-specific protein system for modification is most likely
to be derived. The simplest explanation for snoRNAs is therefore that they date back
to the RNA world, and hence that these were a feature of the LUCA (Poole et al.,
1998, 1999). The presence of C/D family snoRNA-like sRNAs in archaea (Gaspin et
al., 2000; Omer et al., 2000) and their absence in bacteria, and absence of H/ACA
snoRNAs from both can be explained by the loss of snoRNAs from the bacterial
lineage prior to thermoadaptation, while snoRNAs were present in the ancestors of
archaea prior to thermoadaptation. In adaptation to high temperatures in general, there
will be the tendency to minimise use of single-stranded RNA, owing to its instability
at high temperatures, and hence RNA processing is expected to have been reduced in
lineages which underwent a period of thermoadaptation. For RNA to nevertheless be
maintained, there must be counter-selection for RNA protection.
We suggest that in the archaea, H/ACA snoRNAs were lost since there was
selection for reduction of RNA processing, with one consequence being that extensive
pseudouridylation was replaced by protein-RNA interactions. In the case of C/D
snoRNAs, there was still selection for reduction of RNA processing, but
2'-O-methylation was selectively advantageous since it imparted greater stability on the
modified RNAs. Consequently, this pathway of RNA processing was retained, though
there was selection for reduction in size of C/D family snoRNAs, regularity in
structure, and for maximal modification from minimal numbers of RNAs (see Omer
et al., 2000), so those which performed two modifications were selected over those
that directed just one modification.
In the case of bacteria, we suggest that snoRNA-mediated modifications had
been lost prior to thermoadaptation, and that these had been replaced by RNA-protein
interactions. The selection pressure we have proposed for loss of RNA processing is
response time in organisms competing for limited resources that fluctuate in availability
(Poole et al., 1998; Poole et al., 1999). In bacteria, a fast response time is required in order to
act upon detection of a nutrient source. Action requires gene expression and
subsequent utilisation of that source, and the faster this is achieved, the more progeny
are produced (Carlile, 1982). Fast gene expression requires fast protein synthesis,
and it is notable that in bacteria, translation begins before transcription is complete,
and that ribosome assembly requires fewer steps than in eukaryotes, since there is
relatively little processing of the rRNA. In eukaryotes, ribosome assembly takes much
longer, and gene expression requires many processing steps, as well as export from
the nucleus (see Poole et al., 1998). We therefore suggest that competition drove the
streamlining of the RNA processing apparatus in the ancestors of bacteria, prior to
thermoadaptation. Consequently, when bacterial lineages colonised high temperature
environments, RNA-protein interactions in the ribosome provided thermostability.
In eukaryotes we favour the scenario put forth by Lafontaine and Tollervey
(1998), who argue that duplication and divergence conceivably resulted in expansion of
the modification snoRNAs in this lineage. Duplication and divergence is more likely
to lead to new function in eukaryotes than in archaea or bacteria since in the latter two
groups, the rate of genome replication is under selection. Successful individuals are
not only those that respond to a new nutrient, but those that can divide the fastest (see
Poole et al., 2001). Duplication events in eukaryotes are not in themselves selectively
disadvantageous, and could lead to the emergence of two snoRNAs from a single
ancestral snoRNA which carried out two modifications. Once this had occurred, there
would be a low probability that reversion could have restored the original state. While
a few eukaryote snoRNAs can mediate two modifications, the majority carry out just
a single modification (Kiss-László et al., 1996; Tycowski et al., 1996b; Ni et al.,
1997; Ganot et al., 1997b; Lowe & Eddy, 1999).
Duplication and divergence would also have resulted in potential for
expansion of the role of snoRNAs. As has been recently documented (Cavaillé et al.,
2000), some snoRNAs in mouse and human are expressed specifically in the brain,
and are targeted to mRNA, possibly playing a role in the regulation of editing which
produces alternative gene products. These brain-specific snoRNAs (Cavaillé et al.,
2000) provide a clear example of RNAs with different proximate and ultimate origins.
Even if snoRNAs are a recent development (i.e. post-LUCA), it is possible to
establish the ultimate (original) function as being in rRNA processing, as this is
conserved between archaea and eukaryotes.
Given that the ancestral state would be two modifications per snoRNA, this
would have been maintained, or selected for, in the C/D box s(no)RNAs of archaea,
owing to the thermolability of RNA, whereas loss of this organisation might be an
expected outcome of duplication and divergence. Why the ancestral state should be
two modifications and not one is unclear, and indeed one
evolutionary explanation may simply be that this is what emerged. An alternative
possibility is that in the RNA world, two modifications (as is presumably the ancestral
state for both C/D and H/ACA snoRNAs) may have represented the optimal number
of modifications by a single RNA, given low coding capacity.
Conclusions. 
The evidence we review here argues that new RNAs do evolve de novo, that 
this process is ongoing, and central to evolution of new cellular functions. Likewise, 
new RNA functions can arise through duplication and divergence. Nevertheless, it is 
still possible to distinguish between RNAs which arose very early in evolution and 
those which have a relatively recent origin. This distinction is not necessarily on the 
basis of function alone, and the necessarily ad hoc nature of this classification results 
in some RNAs being harder to place. However, on current evidence, and consistent 
with the RNA world theory (Jeffares et al., 1998), we conclude that newly-evolved
RNAs do not appear to displace proteins, whereas proteins have probably replaced 
RNAs on many occasions during evolution. 
A question of central evolutionary importance is whether, as argued by Eddy
(1999), RNA may be inherently better suited to certain roles than are proteins. RNA
can readily form complementary base pairs, making it effective in regulation of gene
expression and guide-mediated site-specific modification. Moreover, such functions
may arise readily, for instance, through duplication and expression of an antisense
RNA from the duplication. While this is a reasonable suggestion, protein families have also
evolved diverse, specific RNA-binding functions. A good example is the large number
of restriction-modification systems, where pairs of evolutionarily unrelated
endonucleases and methylases recognise the same sequence. A common origin for a
range of restriction endonucleases (Jeltsch et al. 1995; Bujnicki, 2000) demonstrates
that extensive diversification is possible from a single protein.
Indeed, arguing that RNA is inherently better than protein runs counter to the 
process by which new functions evolve. There is no requirement that the molecule 
that becomes selected for that function is the 'best' possible for that role, and this is
exactly the point of Jacob's (1977) analogy of evolution as a tinkerer, not an
engineer: selection merely requires that a function confers an advantage. It does not
require that the best possible molecule is the only molecule that can come under
selection.
It is not clear that RNA is inherently better than protein, even if this apparently
makes intuitive sense. RNA may be more readily recruited into functions where base
recognition is required, perhaps suggesting that potential antisense molecules are
readily generated in cells. Proteins are able to recognise specific sequences of
considerable length, and regulate gene expression through nucleic acid binding.
Hence, there is not the same clear picture as for the evolution of catalysis (Jeffares et
al. 1998). Notably, even with the evolution of catalysis, it is possible that some RNAs
may never be replaced by proteins if the only criterion is catalytic efficiency, since it
is possible for ribozymes to reach catalytic perfection. Selection for a faster chemical
step in catalysis will only occur when substrate diffusion is not the rate-limiting step
in the reaction; the larger the substrate, the slower it diffuses (Jeffares et al. 1998).
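The point that larger substrates diffuse more slowly can be made concrete with a small sketch based on the Stokes-Einstein relation, D = kT / (6πηr), which estimates the diffusion coefficient of a sphere of hydrodynamic radius r in a fluid of viscosity η. The spherical approximation, the viscosity of water at 25 °C, and the two radii are illustrative assumptions, not values taken from the thesis.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_coefficient(radius_m, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein estimate of D (m^2/s) for a sphere of the given radius."""
    return K_B * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


# Hypothetical radii, chosen only to contrast a small and a large substrate
small = diffusion_coefficient(0.5e-9)  # small-molecule-sized substrate
large = diffusion_coefficient(5.0e-9)  # large RNA-sized substrate

print(f"small: {small:.2e} m^2/s, large: {large:.2e} m^2/s")
print(f"ratio small/large: {small / large:.1f}")
```

Because D is inversely proportional to r in this approximation, a tenfold larger substrate diffuses ten times more slowly, so the diffusion limit is reached at a lower catalytic rate for large substrates.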
Arguments such as Eddy's (1999) lump the propensity for recruitment together
with the propensity for function. In a hypothetical situation where only protein was 
available, no amount of tinkering would result in an RNA being selected for a given 
function (even though it might be better than protein) simply because there is no RNA 
for selection to act on. 
We therefore suggest that the recruitment of either RNA or protein into new
function depends on what is available, not what is best. For catalysis, where there is
selection for evolution towards catalytic perfection, protein may replace RNA if an
RNA cannot reach rates of catalysis where diffusion becomes the rate-limiting step,
but not for a ribozyme where substrate diffusion is rate limiting (Jeffares et al. 1998).
For site-specific recognition, we suggest that recruitment of RNA or protein has more
to do with what is available, and that there is no evidence supporting the possibility
that RNA is inherently better than protein in this role. In general, the propensity for
RNA to be selected over protein in a sequence-recognition role will depend on the
initial 'environment' not the inherent properties of the molecule. Where this may break 
down is at high temperature, where RNA will be selected against. 
The snoRNAs constitute the only case where it is argued that RNA could have
displaced proteins (Lafontaine & Tollervey, 1998), and, at least in respect to their role
as guides for post-transcriptional modification, this is not unreasonable. The
alternative scenario, that snoRNAs pre-date protein enzymes, is also feasible (Poole et
al., 1999). For a resolution of this issue, two questions must be addressed. First, what
is the biological function of 2'-O-methylated ribose and pseudouridine, the products
of snoRNA-mediated modification? Second, in the context of the two theories, what
selection pressures could account for the diversification of these in eukaryotes (and
archaea) or the reduction of these in bacteria? Elsewhere, we have offered a selection
pressure for the loss of modifications in bacteria (Poole et al., 1999). In contrast, an
argument for the diversification of snoRNA-mediated modifications in eukaryotes
based on selection has yet to be proposed.
Several exciting developments with respect to the evolutionary origins of
snoRNAs and snRNAs are coming from the examination of the protein constituents of
the RNPs. For example, the C/D family snoRNPs and U4 snRNP possess a common
core protein that binds to an equivalent motif in both C/D family snoRNAs and U4
snRNA (Watkins et al., 2000; Peculis, 2000). As with the problems of establishing
the evolutionary relationships between snRNAs and group II introns, it is not possible
to tell whether this similarity is due to convergence or divergence from a common
ancestor. Likewise, the common H/ACA motifs shared by telomerase RNA and
H/ACA box snoRNAs could be divergent or convergent (Mitchell et al., 1999), as
could the demonstration that both associate with the same set of core proteins
(Pogacic et al. 2000; Dez et al., 2001). With respect to snRNA origins, it is interesting
to note that Sm proteins have now been detected in archaea (Salgado-Garrido et al.,
1999). Sm proteins are part of the spliceosome, but have recently been shown to be
involved in mRNA degradation (Bouveret et al., 2000). The function of Sm proteins
in archaea is unknown (Salgado-Garrido et al., 1999), as is the pathway of RNA
degradation in this domain.
In conclusion, information on phylogenetic distribution, together with 
metabolic context may provide an important test for resolving problematic data sets, 
such as the snoRNA data set. This is essential primarily because there is no clear way 
of objectively evaluating the two theories as they currently stand. A major hurdle that 
needs to be overcome before this approach can be reliably applied is for phylogenetics 
to unambiguously establish the relationships of the three domains archaea, bacteria 
and eukaryotes. Finally, it will also be important to test the evolutionary relationship 
of C/D family snoRNAs in eukaryotes and sRNAs from archaea. It is difficult to
predict whether it will be possible to establish if these are related by descent, or are
convergent. However, the task ought to be simpler than demonstrating relationships
between functionally unrelated RNAs such as H/ACA snoRNAs and telomerase
RNA, U4 snRNA and C/D snoRNAs, or group II introns and the spliceosomal RNAs.
References. 
Altman S, Kirsebom L. 1999. Ribonuclease P. In: Gesteland R, Cech T, Atkins J, eds.
The RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press. pp 351-380.
Altuvia S, Zhang A, Argaman L, Tiwari A, Storz G. 1998. The Escherichia coli OxyS
regulatory RNA represses fhlA translation by blocking ribosome binding. EMBO J
17: 6069-6075.
Andersson JO, Andersson SGE. 1999. Insights into the evolutionary process of
genome degradation. Curr Opin Genet Dev 9: 664-671.
Andersson SGE, Kurland CG. 1998. Reductive evolution of resident genomes. Trends
Genet 6: 263-268.
Andersson SGE, Kurland CG. 1999. Origins of mitochondria and hydrogenosomes.
Curr Opin Microbiol 2: 535-541.
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV. 1998. Evidence for
massive gene exchange between archaeal and bacterial hyperthermophiles. Trends
Genet 14: 442-444.
Bachellerie J-P, Cavaillé J. 1998. Small nucleolar RNAs guide the ribose
methylations of eukaryotic rRNAs. In: Grosjean H, Benne R, eds. Modification and
Editing of RNA. Washington, DC: ASM Press. pp 255-272.
Backert S, Nielsen BL, Börner T. 1997. The mystery of the rings: structure and
replication of mitochondrial genomes from higher plants. Trends Plant Sci 2: 477-483.
Baumeister W, Walz J, Zühl F, Seemüller E. 1998. The proteasome: Paradigm of a
self-compartmentalizing protease. Cell 92: 367-380.
Been MD, Wickham GS. 1997. Self-cleaving ribozymes of hepatitis delta virus RNA.
Eur J Biochem 247: 741-753.
Benner SA, Ellington AD, Tauer A. 1989. Modern metabolism as a palimpsest of the
RNA world. Proc Natl Acad Sci USA 86: 7054-7058.
Blake CCF. 1978. Do genes-in-pieces imply proteins-in-pieces? Nature 273: 267.
Blanchard JL, Lynch M. 2000. Organellar genes: why do they end up in the nucleus?
Trends Genet 16: 315-320.
Börner GV, Yokobori S-I, Mörl M, Dörner M, Pääbo S. 1997. RNA editing in
metazoan mitochondria: staying fit without sex. FEBS Lett 409: 320-324.
Bortolin ML, Kiss T. 1998. Human U19 intron-encoded snoRNA is processed from a
long primary transcript that possesses little potential for protein coding. RNA 4: 445-454.
Boudvillain M, de Lencastre A, Pyle AM. 2000. A tertiary interaction that links
active-site domains to the 5' splice site of a group II intron. Nature 406: 315-318.
Bouveret E, Rigaut G, Shevchenko A, Wilm M, Séraphin B. 2000. A Sm-like protein
complex that participates in mRNA degradation. EMBO J 19: 1661-1671.
Bouzat JL, McNeil LK, Robertson HM, Solter LF, Nixon JE, Beever JE, Gaskins HR,
Olsen G, Subramaniam S, Sogin ML, Lewin HA. 2000. Phylogenomic Analysis of
the α Proteasome Gene Family from Early-Diverging Eukaryotes. J Mol Evol 51:
532-543.
Breckenridge DG, Watanabe Y, Greenwood SJ, Gray MW, Schnare MN. 1999. U1
small nuclear RNA and spliceosomal introns in Euglena gracilis. Proc Natl Acad Sci
USA 96: 852-856.
Brinkmann H, Philippe H. 1999. Archaea sister-group of Bacteria? Indications from
tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol 16: 817-825.
Brosius J. 1999. RNAs from all categories generate retrosequences that may be
exapted as novel genes or regulatory elements. Gene 238: 115-134.
Brown JW, Haas ES, Pace NR. 1993. Characterization of ribonuclease P RNAs from
thermophilic bacteria. Nucleic Acids Res 21: 671-679.
Bujnicki JM. 2000. Phylogeny of the restriction endonuclease-like superfamily
inferred from comparison of protein structures. J Mol Evol 50: 39-44.
Cavalier-Smith T. 1991. Intron phylogeny: a new hypothesis. Trends Genet 7: 145-148.
Cech TR. 1986. The generality of self-splicing RNA: Relationship to nuclear RNA
splicing. Cell 44: 207-210.
Cermakian N, Cedergren R. 1998. Modified nucleotides always were: an evolutionary
model. In: Grosjean H, Benne R, eds. Modification and Editing of RNA.
Washington, DC: ASM Press. pp 535-541.
Chanfreau G, Jacquier A. 1994. Catalytic site components common to both splicing
steps of a group II intron. Science 266: 1383-1387.
Charette M, Gray MW. 2000. Pseudouridine in RNA: What, Where, How, and Why.
IUBMB Life 49: 341-351.
Collins LJ, Moulton V, Penny D. 2000. Use of RNA secondary structure for studying
the evolution of RNase P and RNase MRP. J Mol Evol 51: 194-204.
Copertino DW, Hall ET, Van Hook FW, Jenkins KP, Hallick RB. 1994. A group III
twintron encoding a maturase-like gene excises through lariat intermediates. Nucleic
Acids Res 22: 1029-1036.
Copertino DW, Hallick RB. 1993. Group II and group III introns of twintrons:
potential relationships with nuclear pre-mRNA introns. Trends Biochem Sci 18:
467-471.
Covello PS, Gray MW. 1993. On the evolution of RNA editing. Trends Genet 9: 265-268.
Culbertson MR. 1999. RNA surveillance: unforeseen consequences for gene
expression, inherited genetic disorders and cancer. Trends Genet 15: 74-80.
Darnell JE, Doolittle WF. 1986. Speculations on the early course of evolution. Proc
Natl Acad Sci USA 83: 1271-1275.
de Souza SJ, Long M, Klein RJ, Roy S, Lin S, Gilbert W. 1998. Toward a resolution
of the introns early/late debate: Only phase zero introns are correlated with the
structure of ancient proteins. Proc Natl Acad Sci USA 95: 5094-5099.
Delihas N. 1995. Regulation of gene expression by trans-encoded antisense RNAs.
Mol Microbiol 15: 411-414.
Dennis PP, Russell AG, Moniz De Sa M. 1997. Formation of the 5' end pseudoknot in
small subunit ribosomal RNA: involvement of U3-like sequences. RNA 3: 337-343.
Dez C, Henras A, Faucon B, Lafontaine D, Caizergues-Ferrer M, Henry Y. 2001.
Stable expression in yeast of the mature form of human telomerase RNA depends on
its association with the box H/ACA small nucleolar RNP proteins Cbf5p, Nhp2p and
Nop10p. Nucleic Acids Res 29: 598-603.
Doolittle WF. 1978. Genes in pieces: were they ever together? Nature 272: 581-582.
Eddy SR. 1999. Non coding RNA genes. Curr Opin Genet Dev 9: 695-699.
Embley TM, Hirt RP. 1998. Early branching eukaryotes? Curr Opin Genet Dev 8:
624-629.
Erdmann VA, Barciszewska MZ, Szymanski M, Hochberg A, de Groot N,
Barciszewski J. 2001. The non-coding RNAs as riboregulators. Nucleic Acids Res
29: 189-193.
Estevez AM, Simpson L. 1999. Uridine insertion/deletion editing in trypanosome
mitochondria-a review. Gene 240: 247-260.
Forterre P. 1995. Thermoreduction, a hypothesis for the origin of prokaryotes. CR
Acad Sci Paris III 318: 415-422.
Forterre P. 1996. A hot topic: the origin of hyperthermophiles. Cell 85: 789-792.
Forterre P. 1997. Archaea: what can we learn from their sequences? Curr Opin
Genet Dev 7: 764-770.
Forterre P, Bouthier De La Tour C, Philippe H, Duguet M. 2000. Reverse gyrase from
hyperthermophiles: probable transfer of a thermoadaptation trait from archaea to
bacteria. Trends Genet 16: 152-154.
Franke A, Baker BS. 1999. The rox1 and rox2 RNAs are essential components of the
compensasome, which mediates dosage compensation in Drosophila. Mol Cell 4:
117-122.
Fung PA, Gaertig J, Gorovsky MA, Hallberg RL. 1995. Requirement of a small
cytoplasmic RNA for the establishment of thermotolerance. Science 268:
1036-1039.
Galtier N, Tourasse N, Gouy M. 1999. A nonhyperthermophilic common ancestor to
extant life forms. Science 283: 220-221.
Ganot P, Caizergues-Ferrer M, Kiss T. 1997a. The family of box ACA small
nucleolar RNAs is defined by an evolutionarily conserved secondary structure and
ubiquitous sequence elements essential for RNA accumulation. Genes Dev 11: 941-956.
Garrett TA, Pabon-Pena LM, Gokaldas N, Epstein LM. 1996. Novel requirements in
peripheral structures of the extended satellite 2 hammerhead. RNA 2: 699-706.
Gaspin C, Cavaille J, Erauso G, Bachellerie J-P. 2000. Archaeal homologs of
eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus
genomes. J Mol Biol 297: 895-906. (Erratum in J Mol Biol 300: 1017-1018.)
Gilbert W. 1978. Why genes in pieces? Nature 271: 501.
Gilbert W. 1986. The RNA world. Nature 319: 618.
Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF,
Poole RJ, Date T, Oshima T, Konishi J, Denda K, Yoshida M. 1989. Evolution of
the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad
Sci USA 86: 6661-6665.
Goldschmidt-Clermont M, Choquet Y, Girard-Bascou J, Michel F, Schirmer-Rahire
M, Rochaix JD. 1991. A small chloroplast RNA may be required for trans-splicing
in Chlamydomonas reinhardtii. Cell 65: 135-143.
Gordon PM, Sontheimer EJ, Piccirilli JA. 2000. Metal ion catalysis during the
exon-ligation step of nuclear pre-mRNA splicing: Extending the parallels between the
spliceosome and group II introns. RNA 6: 199-205.
Graveley BR. 2001. Alternative splicing: increasing diversity in the proteomic world.
Trends Genet 17: 100-107.
Harris RJ, Elder D. 2000. Ribozyme relationships: the hammerhead, hepatitis delta,
and hairpin ribozymes have a common origin. J Mol Evol 51: 182-184.
Herbert A, Rich A. 1999a. RNA processing in evolution. The logic of soft-wired
genomes. Ann N Y Acad Sci 870: 119-132.
Herbert A, Rich A. 1999b. RNA processing and the evolution of eukaryotes. Nat
Genet 3: 265-269.
Hetzer M, Wurzer G, Schweyen RJ, Mueller MW. 1997. Trans-activation of group II
intron splicing by nuclear U5 snRNA. Nature 386: 417-420.
Hüttenhofer A, Kiefmann M, Meier-Ewert S, O'Brien J, Lehrach H, Bachellerie J-P,
Brosius J. 2001. RNomics: an experimental approach that identifies 201 candidates
for novel, small, non-messenger RNAs in mouse. EMBO J 20: 2943-2953.
Iwabe N, Kuma K-I, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary
relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic
trees of duplicated genes. Proc Natl Acad Sci USA 86: 9355-9359.
Jeffares DC, Poole AM, Penny D. 1995. Pre-rRNA processing and the RNA world.
Trends Biochem Sci 20: 298-299.
Jeffares DC, Poole AM, Penny D. 1998. Relics from the RNA world. J Mol Evol 46:
18-36.
Jeltsch A, Kroger M, Pingoud A. 1995. Evidence for an evolutionary relationship
among type-II restriction endonucleases. Gene 160: 7-16.
Keeling PJ, McFadden GI. 1998. Origins of microsporidia. Trends Microbiol 6:
19-23.
Keiler K, Waller P, Sauer R. 1996. Role of a peptide tagging system in degradation of
proteins synthesized from damaged messenger RNA. Science 271: 990-993.
Keiler KC, Shapiro L, Williams KP. 2000. tmRNAs that encode proteolysis-inducing
tags are found in all known bacterial genomes: A two-piece tmRNA functions in
Caulobacter. Proc Natl Acad Sci USA 97: 7778-7783.
Kelley RL, Kuroda MI. 2000. The role of chromosomal RNAs in marking the X for
dosage compensation. Curr Opin Genet Dev 10: 555-561.
Kiss-László Z, Henry Y, Bachellerie J-P, Caizergues-Ferrer M, Kiss T. 1996.
Site-specific ribose methylation of preribosomal RNA: a novel function for small
nucleolar RNAs. Cell 85: 1077-1088.
Kremerskothen J, Nettermann M, op de Bekke A, Bachmann M, Brosius J. 1998.
Identification of human autoantigen La/SS-B as BC1/BC200 RNA-binding protein.
DNA Cell Biol 17: 751-759.
Lafontaine DLJ, Tollervey D. 1998. Birth of the snoRNPs: the evolution of the
modification-guide snoRNAs. Trends Biochem Sci 23: 383-388.
Lambowitz AM, Caprara MG, Zimmerly S, Perlman PS. 1999. Group I and group II
ribozymes as RNPs: clues to the past and guides to the future. In: Gesteland R, Cech
T, Atkins J, eds. The RNA World, 2nd Ed. New York: Cold Spring Harbor
Laboratory Press. pp 451-485.
Lease R, Belfort M. 2000. Riboregulation by DsrA RNA: trans-actions for global
economy. Mol Microbiol 38: 667-672.
Lee JT, Davidow LS, Warshawsky D. 1999. Tsix, a gene antisense to Xist at the
X-inactivation centre. Nat Genet 21: 400-404.
Lee JT, Jaenisch R. 1997. The (epi)genetic control of mammalian X-chromosome
inactivation. Curr Opin Genet Dev 7: 274-280.
Lee RC, Feinbaum RL, Ambros V. 1993. The C. elegans heterochronic gene lin-4
encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-854.
Logan DT, Andersson J, Sjoberg B-M, Nordlund P. 1999. A glycyl radical site in the
crystal structure of a class III ribonucleotide reductase. Science 283: 1499-1504.
Logsdon JM Jr. 1998. The recent origin of spliceosomal introns revisited. Curr Opin
Genet Dev 8: 637-648.
Lopez P, Forterre P, Philippe H. 1999. The root of the tree of life in light of the
covarion model. J Mol Evol 49: 496-508.
Lowe TM, Eddy SR. 1999. A computational screen for methylation guide snoRNAs
in yeast. Science 283: 1168-1171.
Lykke-Andersen J, Aagaard C, Semionenkov M, Garrett RA. 1997. Archaeal introns:
splicing, intercellular mobility and evolution. Trends Biochem Sci 22: 326-331.
Mair G, Shi H, Li H, Djikeng A, Aviles HO, Bishop JR, Falcone FH, Gavrilescu C,
Montgomery JL, Santori MI, Stern LS, Wang Z, Ullu E, Tschudi C. 2000. A new
twist in trypanosome metabolism: cis-splicing of pre-mRNA. RNA 6: 163-169.
Maizels N, Weiner AM. 1999. The genomic tag hypothesis: what molecular fossils
tell us about the evolution of tRNA. In: Gesteland R, Cech T, Atkins J, eds. The
RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press. pp 79-111.
Marguet E, Forterre P. 1994. DNA stability at temperatures typical for thermophiles.
Nucleic Acids Res 22: 1681-1686.
Marin I, Siegal ML, Baker BS. 2000. The evolution of dosage compensation
mechanisms. BioEssays 22: 1106-1114.
Martínez Giménez JA, Saez GT, Seisdedos RT. 1998. On the function of modified
nucleosides in the RNA world. J Theor Biol 194: 485-490.
Mayer C, Suck D, Poch O. 2001. The archaeal homolog of the Imp4 protein, a
eukaryotic U3 snoRNP component. Trends Biochem Sci 26: 143-144.
McArthur AG, Morrison HG, Nixon JE, Passamaneck NQ, Kim U, Hinkle G, Crocker
MK, Holder ME, Farr R, Reich CI, Olsen GE, Aley SB, Adam RD, Gillin FD, Sogin
ML. 2000. The Giardia genome project database. FEMS Microbiol Lett 189:
271-273.
Mitchell JR, Cheng J, Collins K. 1999. A box H/ACA small nucleolar RNA-like
domain at the human telomerase RNA 3' end. Mol Cell Biol 19: 567-576.
Morrissey JP, Tollervey D. 1995. Birth of the snoRNPs: the evolution of RNase MRP
and the eukaryotic pre-rRNA-processing system. Trends Biochem Sci 20: 78-82.
Moss E, Lee R, Ambros V. 1997. The cold shock domain protein LIN-28 controls
developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88:
637-646.
Muller B, Schümperli D. 1997. The U7 snRNP and the hairpin binding protein: key
players in histone mRNA metabolism. Semin Cell Dev Biol 8: 567-576.
Muslimov IA, Banker G, Brosius J, Tiedge H. 1998. Activity-dependent regulation of
dendritic BC1 RNA in hippocampal neurons in culture. J Cell Biol 141: 1601-1611.
Muth GW, Ortoleva-Donnelly L, Strobel SA. 2000. A single adenosine with a neutral
pKa in the ribosomal peptidyl transferase center. Science 289: 947-950.
Nakano S-I, Chadalavada DM, Bevilacqua PC. 2000. General acid-base catalysis in
the mechanism of a hepatitis delta virus ribozyme. Science 287: 1493-1497.
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DR, Hickey EK,
Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA,
Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA,
Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, White O,
Salzberg SL, Smith HO, Venter JC, Fraser CM. 1999. Evidence for lateral gene
transfer between Archaea and Bacteria from genome sequence of Thermotoga
maritima. Nature 399: 323-329.
Nilsen TW. 2000. RNA splicing: The case for an RNA enzyme. Nature 408: 782-783.
Nissen P, Hansen J, Ban N, Moore PB, Steitz TA. 2000. The structural basis of
ribosome activity in peptide bond synthesis. Science 289: 920-930.
Noller HF, Hoffarth V, Zimniak L. 1992. Unusual resistance of peptidyl transferase to
protein extraction procedures. Science 256: 1416-1419.
Noon KE, Bruenger E, McCloskey JA. 1998. Post-transcriptional modifications in
16S and 23S rRNAs of the archaeal hyperthermophile Sulfolobus solfataricus. J
Bacteriol 180: 2883-2888.
Ofengand J, Fournier MJ. 1998. The pseudouridine residues of rRNA: number,
location, biosynthesis, and function. In: Grosjean H, Benne R, eds. Modification and
Editing of RNA. Washington, DC: ASM Press. pp 229-253.
Omer AD, Lowe TM, Russell AG, Ebhardt H, Eddy SR, Dennis PP. 2000. Homologs
of small nucleolar RNAs in Archaea. Science 288: 517-522.
Pannuti A, Lucchesi JC. 2000. Recycling to remodel: evolution of dosage
compensation complexes. Curr Opin Genet Dev 10: 644-650.
Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B,
Hayward DC, Ball EE, Degnan B, Muller P, Spring J, Srinivasan A, Fishman M,
Finnerty J, Corbo J, Levine M, Leahy P, Davidson E, Ruvkun G. 2000.
Conservation of the sequence and temporal expression of let-7 heterochronic
regulatory RNA. Nature 408: 86-89.
Pelczar P, Filipowicz W. 1998. The host gene for intronic U17 small nucleolar RNAs
in mammals has no protein-coding potential and is a member of the 5'-terminal
oligopyrimidine gene family. Mol Cell Biol 18: 4509-4518.
Penny D, Poole A. 1999. The nature of the Last Universal Common Ancestor. Curr
Opin Genet Dev 9: 672-677.
Perrotta AT, Shih I-H, Been MD. 1999. Imidazole rescue of a cytosine mutation in a
self-cleaving ribozyme. Science 286: 123-126.
Philippe H, Germot A, Moreira D. 2000. The new phylogeny of eukaryotes. Curr
Opin Genet Dev 10: 596-601.
Pogacic V, Dragon F, Filipowicz W. 2000. Human H/ACA small nucleolar RNPs and
telomerase share evolutionarily conserved proteins NHP2 and NOP10. Mol Cell
Biol 20: 9028-9040.
Poole A, Jeffares D, Penny D. 1999. Early evolution: prokaryotes, the new kids on the
block. Bioessays 21: 880-889.
Poole A, Penny D, Sjoberg B-M. 2000. Methyl-RNA: an evolutionary bridge between
RNA and DNA? Chem Biol 7: R207-R216.
Poole AM, Jeffares DC, Penny D. 1998. The path from the RNA world. J Mol Evol
46: 1-17.
Poole AM, Phillips MJ, Penny D. 2001. Prokaryote and eukaryote evolvability.
Biosystems, submitted.
Rastogi T, Beattie TL, Olive JE, Collins RA. 1996. A long-range pseudoknot is
required for activity of the Neurospora VS ribozyme. EMBO J 15: 2820-2825.
Reanney DC. 1984. RNA splicing as an error-screening mechanism. J Theor Biol
110: 315-321.
Reanney DC. 1986. Genetic error and genome design. Trends Genet 2: 41-46.
Romeo T. 1998. Global regulation by the small RNA-binding protein CsrA and the
non-coding RNA molecule CsrB. Mol Microbiol 29: 1321-1330.
Rotte C, Henze K, Muller M, Martin W. 2000. Origins of hydrogenosomes and
mitochondria. Curr Opin Microbiol 3: 481-486.
Rupert PB, Ferre-D'Amare AR. 2001. Crystal structure of a hairpin
ribozyme-inhibitor complex with implications for catalysis. Nature 410: 780-786.
Rzhetsky A, Ayala FJ, Hsu LC, Chang C, Yoshida A. 1997. Exon/intron structure of
aldehyde dehydrogenase genes supports the 'introns-late' theory. Proc Natl Acad Sci
USA 94: 6820-6825.
Salgado-Garrido J, Bragado-Nilsson E, Kandels-Lewis S, Seraphin B. 1999. Sm and
Sm-like proteins assemble in two related complexes of deep evolutionary origin.
EMBO J 18: 3451-3462.
Saville BJ, Collins RA. 1991. RNA-Mediated Ligation of Self-Cleavage Products of a
Neurospora Mitochondrial Plasmid Transcript. Proc Natl Acad Sci USA 88:
8826-8830.
Sharp PA. 1985. On the origin of RNA splicing and introns. Cell 42: 397-400.
Sharp PA. 1991. "Five easy pieces". Science 254: 663.
Sharp PA. 1994. Split genes and RNA splicing. Cell 77: 805-815.
Simpson L, Thiemann OH, Savill NJ, Alfonzo JD, Maslov DA. 2000. Evolution of
RNA editing in trypanosome mitochondria. Proc Natl Acad Sci USA 97: 6986-6993.
Skryabin BV, Kremerskothen J, Vassilacopoulou D, Disotell TR, Kapitonov VV,
Jurka J, Brosius J. 1998. The BC200 RNA gene and its neural expression are
conserved in Anthropoidea (Primates). J Mol Evol 47: 677-685.
Smit AFA. 1999. Interspersed repeats and other mementos of transposable elements
in mammalian genomes. Curr Opin Genet Dev 9: 657-663.
Smith HC, Gott JM, Hanson MR. 1997. RNA 3: 1105-1123.
Smith CM, Steitz JA. 1997. Sno storm in the nucleolus: new roles for myriad small
RNPs. Cell 89: 669-672.
Smith CM, Steitz JA. 1998. Classification of gas5 as a multi-small-nucleolar RNA
(snoRNA) host gene and a member of the 5'-terminal oligopyrimidine gene family
reveals common features of snoRNA host genes. Mol Cell Biol 18: 6897-6909.
Sontheimer EJ, Gordon PM, Piccirilli JA. 1999. Metal ion catalysis during group II
intron self-splicing: parallels with the spliceosome. Genes Dev 13: 1729-1741.
Stoltzfus A. 1999. On the possibility of constructive neutral evolution. J Mol Evol 49:
169-181.
Stoltzfus A, Logsdon JM Jr, Palmer JD, Doolittle WF. 1997. Intron 'sliding' and the
diversity of intron positions. Proc Natl Acad Sci USA 94: 10739-10744.
Symons RH. 1997. Plant pathogenic RNAs and RNA catalysis. Nucleic Acids Res 25:
2683-2689.
Tarn W-Y, Steitz JA. 1997. Pre-mRNA splicing: the discovery of a new spliceosome
doubles the challenge. Trends Biochem Sci 22: 132-137.
Trotta CR, Abelson J. 1999. tRNA splicing: an RNA world add-on or an ancient
reaction? In: Gesteland R, Cech T, Atkins J, eds. The RNA World, 2nd Ed. New
York: Cold Spring Harbor Laboratory Press. pp 561-584.
Tycowski KT, Shu MD, Steitz JA. 1996a. A mammalian gene with introns instead of
exons generating stable RNA products. Nature 379: 464-466.
Tycowski KT, Smith CM, Shu M-D, Steitz JA. 1996b. A small nucleolar RNA
requirement for site-specific ribose methylation of rRNA in Xenopus. Proc Natl Acad
Sci USA 93: 14480-14485.
Tycowski KT, You Z-H, Graham PJ, Steitz JA. 1998. Modification of U6
spliceosomal RNA is guided by other small RNAs. Mol Cell 2: 629-638.
Wassarman KM, Storz G. 2000. 6S RNA regulates E. coli RNA polymerase activity.
Cell 101: 613-623.
Wassarman KM, Zhang A, Storz G. 1999. Small RNAs in Escherichia coli. Trends
Microbiol 7: 37-45.
Watanabe KI, Bessho Y, Kawasaki M, Hori H. 1999. Mitochondrial genes are found
on minicircle DNA molecules in the mesozoan animal Dicyema. J Mol Biol 286:
645-650.
Watanabe Y, Yamamoto M. 1994. S. pombe mei2+ encodes an RNA-binding
protein essential for premeiotic DNA synthesis and meiosis I, which cooperates with
a novel RNA species meiRNA. Cell 78: 487-498.
Watkins NJ, Segault V, Charpentier B, Nottrott S, Fabrizio P, Bachi A, Wilm M,
Rosbash M, Branlant C, Lührmann R. 2000. A common core RNP structure shared
between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell
103: 457-466.
Weiner AM. 1993. mRNA splicing and autocatalytic introns: distant cousins or the
products of chemical determinism? Cell 72: 161-164.
Weinstein L, Steitz JA. 1999. Guided tours: from precursor snoRNA to functional
snoRNP. Curr Opin Cell Biol 11: 378-384.
Westhof E. 1999. Chemical diversity in RNA cleavage. Science 286: 61-62.
Wightman B, Ha I, Ruvkun G. 1993. Posttranscriptional regulation of the
heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C.
elegans. Cell 75: 855-862.
Woese CR. 1987. Bacterial evolution. Microbiol Rev 51: 221-271.
Woese CR, Kandler O, Wheelis ML. 1990. Towards a natural system of organisms:
proposal for the domains Archaea, Bacteria, and Eukarya. Proc Natl Acad Sci USA
87: 4576-4579.
Wolf YI, Kondrashov FA, Koonin EV. 2000. No footprints of primordial introns in a
eukaryotic genome. Trends Genet 16: 333-334.
Yean S-L, Wuenschell G, Termini J, Lin R-J. 2000. Metal-ion coordination by U6
small nuclear RNA contributes to catalysis in the spliceosome. Nature 408:
881-884.
Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JH, Noller
HF. 2001. Crystal structure of the ribosome at 5.5 Å resolution. Science 292:
883-896.
Zhang A, Altuvia S, Tiwari A, Argaman L, Hengge-Aronis R, Storz G. 1998a. The
OxyS regulatory RNA represses rpoS translation and binds the Hfq (HF-I) protein.
EMBO J 17: 6061-6068.
Zhang F, Lemieux S, Wu X, St.-Arnaud D, McMurray C, Major F, Anderson D.
1998b. Function of hexameric RNA in packaging of bacteriophage φ29 DNA in
vitro. Mol Cell 2: 141-147.
Zhang Z, Green BR, Cavalier-Smith T. 1999. Single gene circles in dinoflagellate
chloroplast genomes. Nature 400: 155-159.
Figure legends. 
Figure 1. Problems for inferring ancestry of group II introns and spliceosomal
RNAs from the tree of life.
In a and c, the bacterial rooting is shown; in b and d, the eukaryote rooting is shown.
Blue dots represent group II introns and spliceosomal RNAs; grey dots denote
absence of these. The position of the root does not allow evaluation of the different
trees with simple parsimony since trees a and b show origin of spliceosomal RNAs
through 'seeding' from the mitochondrion. Consequently all four trees are equally
likely. Testing the 'seed' hypothesis by examining the bacterial distribution of group II
introns will be inconclusive for two reasons. First, group II introns are mobile, and
second, limited distribution can equally be explained by polyphyletic losses.
Likewise, ubiquity of group II introns in bacteria cannot be taken as support for a
common ancestor of group II introns and spliceosomal RNAs in the LUCA, since the
former are mobile elements. Finding group II introns in archaea would likewise be
open to ambiguous interpretation.
Figure 2. Introns first hypothesis.
The final step in the origin of genetically-encoded protein synthesis is presumed to be
the origin of mRNA. We propose that the non-coding 'transcripts', produced as a
by-product in the processing of precursor transcripts containing functional RNAs (such
as snoRNAs), were the source of the first genetically-encoded proteins. These
transcripts were utilised by the proto-ribosome to stabilise the interaction between two
charged tRNAs during non-genetically-encoded peptide synthesis. As primary sequence
structure appears unimportant for non-specific RNA-binding, we propose that the first
proteins produced in this manner were not catalytic, and could retain function despite
a high mutation rate in the genomic sequence. Hence, we postulate that it was by
virtue of the coupling of cleavage and ligation (a transesterification) in the
proto-spliceosome that the first genetically-encoded proteins arose.
Figure 3. SnoRNAs in the LUCA?
The suggestion that snoRNAs date back to the RNA world may be independently
examined depending on the placing of the root of the tree of life. Currently the position
of the root is unresolved, with bacterial and eukaryote rootings being considered as
possibilities. A. If the bacterial rooting is correct, it is not possible to establish from
the tree alone if the LUCA possessed snoRNAs. B. If the eukaryote rooting is
correct, the most parsimonious explanation is that the LUCA contained snoRNAs,
since these are then found on both sides of the root. The position of the root is in
dispute, and since the rooting drastically affects the utility of the tree, it is difficult to
use phylogenetic distribution to resolve the debate. Until a consensus is reached,
biochemical arguments have to be relied upon (see text).
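The parsimony argument in the Figure 3 legend can be checked with a small calculation. The following is a minimal sketch, not part of the original analysis: it assumes a three-taxon tree (E, A, B), codes snoRNA presence as 1 and absence as 0, and applies the bottom-up pass of Fitch parsimony. The function name `fitch` and the tuple encoding of the trees are illustrative choices only.

```python
def fitch(tree, tips):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: a tip name (str) or a pair (left_subtree, right_subtree).
    tips: dict mapping tip name -> observed state (1 = snoRNAs present).
    Returns (set of equally parsimonious states at this node,
             minimum number of state changes below this node).
    """
    if isinstance(tree, str):
        return {tips[tree]}, 0
    left_states, left_changes = fitch(tree[0], tips)
    right_states, right_changes = fitch(tree[1], tips)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Assumed coding: snoRNAs present in eukaryotes (E) and archaea (A),
# absent from bacteria (B).
tips = {"E": 1, "A": 1, "B": 0}

bacterial_rooting = ("B", ("A", "E"))   # root between B and (A, E)
eukaryote_rooting = ("E", ("A", "B"))   # root between E and (A, B)

for name, tree in [("bacterial", bacterial_rooting),
                   ("eukaryote", eukaryote_rooting)]:
    states, changes = fitch(tree, tips)
    print(name, "rooting: root states", sorted(states), "changes", changes)
```

Under these assumptions both rootings need a single change, but only the eukaryote rooting fixes the root state as "present"; the bacterial rooting leaves presence and absence at the root equally parsimonious, which is the ambiguity the legend describes.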
[Figure 1: four tree panels (a, b, c, d), each with tips E, A and B.]
[Figure 2: flow diagram with the labels: RNA genome; transcript; functional snoRNAs liberated; non-coding regions; non-functional 'transcript' released; 'transcript' utilised as stabilising template in peptide synthesis.]
[Figure 3: two trees with tips A, E and B, each rooted at the LUCA. A. Common ancestor of Archaea and Eukaryotes had snoRNAs; tree alone cannot determine if LUCA possessed snoRNAs. B. snoRNAs lost from bacteria; RNA world origin for snoRNAs.]
Table 1. Candidate post-RNA world RNAs.
RNA | Distribution | Function | Comments | References
roX1 & roX2 | D. melanogaster | Dosage compensation | roX1 & roX2 are unrelated, and neither are related to Xist or Tsix. | Franke & Baker, 1999.
Xist & Tsix | Mammals | Dosage compensation | Tsix is an antisense regulator of Xist. | Lee et al., 1999; Kelley & Kuroda, 2000.
BC200 | Primates | Translation regulation in dendrites | BC1 and BC200 are unrelated, but may serve analogous roles. Both bind a protein homologous between Primates and Rodents. | Skryabin et al., 1998.
BC1 | Rodents | Translation regulation in dendrites | As for BC200. | Muslimov et al., 1998; Kremerskothen et al., 1998.
lin-4 | C. elegans, C. briggsae | Antisense regulator of lin-14 and lin-28. | | Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997.
let-7 | Bilaterian animals | Antisense regulator of lin-41, probably in late temporal transitions in development. | | Pasquinelli et al., 2000.
OxyS RNA | E. coli | Oxidative stress-induced antisense global inhibitor of translation initiation. | | Altuvia et al., 1998; Zhang et al., 1998a.
DsrA RNA | E. coli | Antisense regulator of translation initiation of global regulators H-NS and RpoS. | Inhibits H-NS translation, but stimulates RpoS translation, acting through RNA-RNA interactions. | Lease & Belfort, 2000.
MicF RNA | Gram-negative bacteria | Activator of translation initiation of OmpF. | | Delihas, 1995.
DicF RNA | E. coli | Antisense regulator in cell division. | | Delihas, 1995.
meiRNA | Schizosaccharomyces pombe | Regulation of meiosis. | | Watanabe & Yamamoto, 1994; Ohno & Mattaj, 1999.
tmRNA | Bacteria | Ribosome/mRNA/protein release. | | Keiler et al., 1996; Keiler et al., 2000.
CsrB RNA | E. coli, Erwinia carotovora | Binds and inhibits CsrA global regulatory protein. | | Romeo, 1998.
G8 RNA | Tetrahymena thermophila | Establishment of thermotolerance. | | Fung et al., 1995.
6S RNA | E. coli | Modulation of RNA polymerase activity. | | Wassarman & Storz, 2000.
gRNAs | Kinetoplastids of trypanosomes | Editing of mRNA transcripts. | RNA editing by guide RNA argued to be ancient, but is most probably an adaptation to Muller's ratchet (see text). | Estevez & Simpson, 1999; Simpson et al., 2000.
Bacteriophage φ29 RNA | Bacteriophage φ29 | RNA hexamer required for DNA packaging. | | Zhang et al., 1998b.
Hammerhead ribozymes | Plant pathogenic RNAs; Salamander nuclear DNA | Genome replication; Transcript processing. | | Symons, 1997; Garrett et al., 1996.
Hairpin ribozyme | Plant pathogenic RNAs | Genome replication. | | Symons, 1997.
Hepatitis delta virus ribozyme | Hepatitis delta virus | Viral genome replication. | | Been & Wickham, 1997.
Neurospora VS ribozyme | Neurospora | Transcript processing in mitochondrial DNA plasmid. | | Saville et al., 1991; Rastogi et al., 1996.
U7 snRNA | Metazoa | Histone pre-mRNA processing. | While histones are found in Archaea, the limited distribution of U7 suggests it arose in eukaryotes, though more data are needed. | Muller & Schumperli, 1997.
Group I introns | Eukaryotic organelles & nucleus, Phage, Bacteria | Mobile selfish element. | Catalysis is via 3' OH of guanosine, supplied in trans, a mechanism distinct from group II/spliceosomal catalysis. | Cech & Golden, 1999; Lykke-Andersen et al., 1997.
Group II introns | Phage, Eukaryote organelles, Bacteria | Mobile selfish element. | Argued to be either evolutionarily related to the spliceosome or evolved recently de novo (see text). | Logsdon, 1998; Cech & Golden, 1999.
Diversity of C/D & H/ACA snoRNAs | Eukaryote nucleolus | Cleavage, methylation & pseudouridylation of rRNA, and probably other RNAs. | C/D box family are found in Archaea also. Opinion is divided on whether these are RNA world relics (see text). | Weinstein & Steitz, 1999; Omer et al., 2000.
 | | Some C/D snoRNAs appear to be involved in regulation of brain-specific gene expression. | These snoRNAs are most probably a recent innovation. | Cavaille et al., 2000.
Poole A, Jeffares D, Penny D.
Early evolution: prokaryotes, the new kids on the block.
Bioessays 21, 880-889 (1999).
Paper 4 
Reprinted by permission of Wiley-Liss, Inc., a subsidiary of John Wiley 
& Sons, Inc. 

Paper 5
Penny D & Poole A. 
The nature of the Last Universal Common Ancestor. 
Current Opinion in Genetics & Development 9, 672-677 (1999).
Reprinted with permission from Elsevier Science. 

674 Genomes and evolution 
Figure 3 
[Diagram: transcribed pre-RNA (pre-tRNA, pre-rRNA, pre-mRNA); RNA processing; mature RNA; ribosome.]
Current Opinion in Genetics & Development 
The RNA processing pattern in eukaryotes 
reflects that of the LUCA. An examination of 
RNAs involved in translation reveals a striking 
pattern. Precursor RNAs are processed by 
RNPs (ribonucleoproteins: RNA plus cognate
protein) to yield mature RNAs. Furthermore,
RNPs process other RNPs: snoRNAs are
released by snRNAs, the RNA component of
the splicing machinery, which in turn are 
crucial for rRNA processing. In prokaryotes, 
some of these RNAs have been lost (shaded 
region), and indeed, in the case of pre-mRNA, 
the processing step has been lost completely. 
Eukaryotes have retained a more complete 
record of the supposed RNA-world 
processing pathway than have prokaryotes. 
of life between eukarya and archaea-bacteria is consistent
with the conclusion that the genome architecture of the
LUCA more closely resembled that of eukarya.
Thermoreduction and prokaryote origins 
In postulating the nature of the LUCA, it is essential to
consider the selective forces that would give rise to either
prokaryotes or eukaryotes. Two selective forces that reinforce
each other have been proposed by which prokaryotes
could have evolved from an ancestor containing a
eukaryote-like genome: thermoreduction and r-selection
[20••,21••,24]. r-selected organisms are fast-growing, competing
for nutrient sources which fluctuate greatly in
abundance. Yeast is r-selected when compared to an oak
tree, which grows slowly, has a slow generation time and a
fairly constant nutrient source (and is thus K-selected), and
prokaryotes are r-selected relative to eukaryotes. r-selection
generally results in extremely fast and efficient use of
resources, because limited availability produces strong
competition for these. At the molecular level, the result is
that enzymes that affect metabolite utilisation and organismal
growth rate will be driven toward perfection at a faster
rate than in organisms not under r-selection. Thus, r-selection
may at least partially account for the observed
replacement of RNA enzymes by protein in the prokaryote
lineages [20••,21••].
The thermoreduction hypothesis [24] is that prokaryotes
arose from mesophiles by adaptation, via the loss of thermolabile
traits, to high-temperature environments. This
explains the loss of the ssRNA processing pathways
(Figure 3) dating back to the RNA-world. Single-stranded
RNA is heat labile, and would have been the Achilles' heel
of early thermophiles. Accelerating ssRNA processing
(mRNA, tRNA and rRNA) from hours (eukaryotes) to minutes
(prokaryotes) would increase the viability of an
organism at high temperatures. This loss of pre-mRNA processing,
as well as the replacement of snoRNA-mediated
rRNA processing with a protein enzyme system, would
have been important steps in the evolution of thermophily.
Unlike RNA, proteins are capable of extreme thermostability
[25]. Furthermore, circular chromosomes are more
thermostable than linear ones [26].
Other important molecules, such as glutamine [27] and
carbamoyl phosphate [28], are also thermolabile.
Glutamine is a protein amino acid and major nitrogen
donor whereas carbamoyl phosphate is a crucial intermediate
in the formation of pyrimidines and arginine. Pathways
where carbamoyl phosphate and/or glutamine are used
may have been affected by thermoreduction. For instance,
in the hyperthermophilic archaeon Pyrococcus furiosus, carbamoyl
phosphate is used immediately after synthesis by
metabolite channelling, and has ammonia rather than glutamine
as amino donor [28]. A second example of metabolite
channelling is mischarging of glutaminyl-tRNA with
glutamate, thereby making glutamine synthesis the final
step before incorporation into protein; this is widespread
within the prokaryotes but absent from eukaryotes [20••].
Although the area requires more investigation, the distribution
of these traits in archaea and bacteria is predicted by
the thermoreduction hypothesis.
Another dataset consistent with the LUCA being mesophilic
comes from reconstructions of ancestral GC content. Galtier
et al. [29••] have estimated its GC content and find it much
lower than that characteristic of thermophiles. Moreover, a
comparable result was obtained using only the thermophiles
in their dataset. All work involving ancient sequence comparisons
needs to be rigorously scrutinised but, in light of all the
above data, the result is compelling nonetheless. In addition,
that nucleotides themselves are unstable at high temperatures
[30•] is consistent with a more mesophilic origin of life.
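As a rough illustration of the quantity being compared, GC content is simply the fraction of G and C bases in a sequence. The sketch below is not the ancestral reconstruction performed by Galtier et al., which infers base composition at internal nodes of a phylogeny; it only computes GC content for sequences supplied by the user. The function name and the toy sequences are hypothetical and are not taken from any dataset discussed here.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so gaps and N's do not dilute the ratio.
    counted = [b for b in seq if b in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in counted if b in "GC")
    return gc / len(counted)


# Hypothetical toy fragments, for illustration only.
examples = {
    "gc_rich_fragment": "GGCGCCGGAUGC",
    "gc_poor_fragment": "AUUAGCAUUAAU",
}
for name, seq in examples.items():
    print(name, round(gc_content(seq), 2))
```

Comparing such values between extant lineages says nothing by itself about the ancestor; the inference in the text depends on a model of how base composition changes along the tree.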
Overall, the thermoreduction hypothesis predicts a mesophilic
LUCA with a genome and RNA-processing system more
characteristic of eukarya. The power of the thermoreduction
hypothesis is that it predicts a range of phenomena, rather than
relying on ad hoc explanations of individual phenomena. Fossil
dates do not contradict this picture because rocks from 2700
Mya appear to have retained organic molecules characteristic of both
prokaryotes and eukaryotes [31••].
The nature of the last universal common ancestor Penny and Poole 675
Figure 4
(a) Fusion (b) Bacterial rooting (c) 3-domains (d) Eukarya rooting
Fitting the data to the trees. Given our current
understanding, several alternative trees could
fit the data without altering either main
conclusion. These are that the eukarya retain
the greatest amount of biochemical similarity
to the LUCA and that the prokaryotes have
been through a period of reductive evolution,
mainly through evolving to life at high
temperatures. Some possible trees are as
follows (episodes of thermoreduction and the
origin of mitochondria are indicated). (a) The
origin of eukarya (E) by fusion of a bacterium
(B) and an archaeon (A) fits the informational (I)
and operational (O) gene distribution but is
hard to fit to all the data. It does not explain the
origin of the nuclear membrane, however,
which is assembled and disassembled during
cell division, quite unlike organellar
membranes (see [21••]). (b) Rooting the tree
in the bacterial branch fits the data provided
the biochemistry of the LUCA is understood
to be more closely similar to that of modern
eukaryotes than that of eubacteria. A bacterial
rooting would require that the archaea and
bacteria arose independently via r-selection
and thermoreduction. (c) The classic 3-domain
tree can also fit the data, provided the
greater divergence of bacterial informational
genes can be ascribed to higher rates of
evolution. There would be transfer of
operational genes back into eukarya through
endosymbiosis. (d) The tree where the root is
on the eukarya branch is perhaps the simplest
with respect to the biochemical data. It is
consistent with all the other data, provided (as
for [c]) that the bacterial informational genes
are indeed evolving at a faster rate.
Current Opinion in Genetics & Development
Integrating data from genomes
Although data gleaned from biochemical approaches
allow tentative reconstruction of the 'bare bones' LUCA,
whole genomes will ultimately uncover much more information.
Genomics allows metabolic traits to be compared
through the presence or absence of genes, and by
sequence comparisons. However, simple comparison of
the presence or absence of homologous genes does not
take into account the problems of gene loss or acquisition
by horizontal transfer. Initial reconstruction of the 'minimal
gene set' [32] highlights this caveat: it was criticised
because it resulted in exclusion of de novo pathways for
deoxyribonucleotide synthesis, leading the authors to conclude
that the LUCA had an RNA genome [33].
There is a difference between reconstructing the minimal
gene set for cellular life, and the set of genes which the LUCA
had. Greater caution is required when examining all three
domains, as eukaryotes received prokaryotic genes subsequent
to the endosymbioses of mitochondria and chloroplasts
[34••,35••]. Replacement of unrelated, distantly related, or
paralogous genes by functional counterparts is 'non-orthologous
displacement' [36] and may be central to understanding
how the existing distribution of genes has arisen.
If we expect a eukaryote-like genome for LUCA as a starting
point, how does this then fit with the data on operational and
informational genes (Figure 1)? It is necessary to identify the
direction of transfer. The complexity hypothesis [3••] places
limits on gene transfer, such that we expect the transfer of
mostly the operational genes in explaining the apparent
chimerism. It has been suggested that acquisition of prokaryotic
operational genes by eukaryotes results from their diet
[37••]. There is no apparent selective advantage to such
uptake, however, even though the mechanism might contribute
to gene acquisition.
Another possibility is that the eukarya received the largest
number of bacterial operational genes from the mitochondrion
[38••]. Two established evolutionary mechanisms
together favour this and are compatible with a eukaryal
root: the increased rate of evolution toward catalytic perfection
under r-selection [19••,21••], and Muller's ratchet.
676 Genomes and evolution
Muller's ratchet is the term given to the continual accumulation
of slightly deleterious mutations in lineages
lacking recombination. It has been shown that Muller's
ratchet is active in organelles [39] and that it drives the
gene loss there (and also from obligate intracellular parasites)
[34••,35••]. Most importantly, relocation of organellar
genes to the nucleus benefits both host and symbiont. If
the action of Muller's ratchet on the organelle drives gene
loss, this can compromise the host-endosymbiont relationship
and thus there is selection to relocate useful genes to
the nucleus, where mutation rate is lower. The majority of
endosymbiont genes were not expected to fit this category,
however, and it was assumed these were lost over time,
since equivalent functions already resided in the nucleus;
but the simplest explanation of the evidence is that many
were transferred [38••].
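The ratchet mechanism described above can be illustrated with a minimal simulation sketch. It assumes a fixed-size clonal population (Wright-Fisher sampling), no recombination, no back mutation, and a multiplicative fitness cost per mutation; the function name and all parameter values (pop_size, mut_rate, cost and so on) are hypothetical choices for illustration and are not estimates from organelle data. Under these assumptions the smallest mutation count present in the population can never decrease, which is the 'click' of the ratchet.

```python
import random


def simulate_ratchet(pop_size=200, generations=300, mut_rate=0.3,
                     cost=0.02, seed=1):
    """Track the least-loaded class in an asexual population.

    Each individual is represented only by its count of slightly
    deleterious mutations. Returns the smallest count present in the
    population after each generation.
    """
    rng = random.Random(seed)
    population = [0] * pop_size  # everyone starts mutation-free
    least_loaded = []
    for _ in range(generations):
        # Fitness declines multiplicatively with each mutation carried.
        weights = [(1.0 - cost) ** k for k in population]
        # Clonal reproduction: each offspring copies one parent (no recombination).
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Each offspring gains one new mutation with probability mut_rate;
        # there is no back mutation, so counts never go down.
        population = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        least_loaded.append(min(population))
    return least_loaded


history = simulate_ratchet()
print("least-loaded class at start and end:", history[0], history[-1])
```

Allowing recombination between individuals would permit mutation-free genomes to be regenerated, which is why the argument in the text applies to organelles and obligate intracellular parasites rather than to recombining nuclear genes.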
Figure 4 illustrates that the bioinformatic data, the RNA
relic data, plus the evolutionary mechanisms that gave rise
to the three domains can still fit several trees. Thus even
with the nature of the LUCA the branching order of the
universal tree is not yet sufficiently informative to resolve
all the issues. This is because each domain is a monophyletic
group, so the basal branches of the tree (dividing
the domains) can only take on a very limited number of
trees. Hence, the metabolic data set cannot be used as an
unambiguous outgroup for rooting the tree.
Conclusions
An interesting picture of the LUCA is emerging. It was a
fully DNA and protein-based organism with extensive processing
of RNA transcripts by RNPs (Figure 3). It had an
extensive set of proteins for DNA, RNA and protein synthesis,
DNA repair, recombination, control systems for
regulation of genes and cell division, chaperone proteins,
and probably lacked operons. Biochemistry favours a
mesophilic LUCA with eukaryote-like RNA processing,
though it is still possible to fit the data to several different
trees (Figure 4). A eukaryote-like LUCA is not a new idea
and can be traced back to Reanney [40].
Details of energy source(s) are unclear, partly because
operational genes apparently undergo frequent horizontal
transfers. Comparative genomics promises a clearer picture
but apparent intermingling of lineages via horizontal transfer
is a major obstacle [38••]. Increasingly, models need to
fit our understanding of evolutionary theory and population
genetics; it is essential to have plausible mechanisms
and selective forces. The extent and direction of horizontal
gene transfer needs accurate estimates before
concluding the theory of descent does not hold for the earliest
divergences [8,42,43]. Nevertheless, it is unclear
whether the LUCA was a single 'species' or whether there
was extensive horizontal transfer between divergent life
forms. An outstanding issue is the origin of nuclear/cytoplasmic
compartmentation as the concentration of RNA
relics within the nucleus suggests this organelle is more
ancient than previously supposed.
Acknowledgements 
We thank H Philippe, P Forterre, H Brinkmann and P Lopez for sending
manuscripts prior to publication.
References and recommended reading 
Papers of  particular interest, published within the annual period of review, 
have been highlighted as: 
• of special interest 
•• of outstanding interest 
1. Doolittle WF: Phylogenetic classification and the universal tree.
•• Science 1999, 284:2124-2128.
A recent review of developments in the fields of phylogenetics and bioinformatics
as applied to the question of the root of the tree of life. An important
aspect is the discussion of horizontal transfer, how this could affect the search
for the root, and the issue of whether informational genes could potentially
transfer between lineages as readily as operational genes are suggested to.
2. Woese CR, Kandler O, Wheelis ML: Towards a natural system of
organisms: proposal for the domains Archaea, Bacteria, and
Eukarya. Proc Natl Acad Sci USA 1990, 87:4576-4579.
3. Jain R, Rivera MC, Lake JA: Horizontal transfer among genomes:
the complexity hypothesis. Proc Natl Acad Sci USA 1999,
96:3801-3806.
The authors argue for extensive gene transfer between prokaryotes during
evolution and that it is genes of the operational class that are transferred
most frequently. They suggest that transfer of informational genes is hindered
by the many intermolecular interactions in which these macromolecules
are involved. Informational genes include those for transcription,
translation, replication, and GTPases. Operational genes are those for nearly
all of metabolism, including regulation.
4. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH,
• Hickey EK, Peterson JD, Nelson WC, Ketchum KA et al.: Evidence for
lateral gene transfer between Archaea and Bacteria from genome
sequence of Thermotoga maritima. Nature 1999, 399:323-329.
Another whole microbial genome sequence from TIGR. This paper considers
especially the issue of horizontal gene transfer, concluding on the basis
of conserved gene order that some horizontal transfer occurs between
eubacteria and archaea.
5. Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for
• massive gene exchange between archaeal and bacterial
hyperthermophiles. Trends Genet 1998, 14:442-444.
Describes evidence that horizontal transfer from hyperthermophilic archaea
to hyperthermophilic bacteria occurs more readily than to mesophilic bacteria.
The authors conclude that this transfer may have been the defining event
in the origin of hyperthermophilic bacteria.
6. Rivera MC, Jain R, Moore JE, Lake JA: Genomic evidence for two
functionally distinct gene classes. Proc Natl Acad Sci USA 1998,
95:6239-6244.
Using whole-genome data, the authors class genes as either operational or
informational on the basis of function and demonstrate that the operational
gene sets of bacteria and eukaryotes are more closely related than that of
the archaea, whereas the archaea-eukaryote grouping holds for the informational
gene set.
7. Snel B, Bork P, Huynen MA: Genome phylogeny based on gene
•• content. Nat Genet 1999, 21:108-110.
The authors use the gene content of 13 completely sequenced genomes for
reconstructing the tree of life and rooting it. Unlike sequence-based phylogenies,
the tree is built by examining similarities and differences in gene content,
so that the presence or absence of a gene is counted as a character.
The authors conclude that massive horizontal transfer events between distant
groups is not supported by their results, and that their data largely support
the 16S rRNA tree topology for the 13 genomes.
8. Woese CR: The last universal common ancestor. Proc Natl Acad
Sci USA 1998, 95:6854-6859.
9. Iwabe N, Kuma K-I, Hasegawa M, Osawa S, Miyata T: Evolutionary
relationship of archaebacteria, eubacteria, and eukaryotes
inferred from phylogenetic trees of duplicated genes. Proc Natl
Acad Sci USA 1989, 86:9355-9359.
10. Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ,
Manolson MF, Poole RJ, Date T, Oshima T et al.: Evolution of the
vacuolar H+-ATPase: implications for the origin of eukaryotes.
Proc Natl Acad Sci USA 1989, 86:6661-6665.
11. Brinkmann H, Philippe H: Archaea sister-group of Bacteria?
•• Indications from tree reconstruction artifacts in ancient
phylogenies. Mol Biol Evol 1999, 16:817-825.
Uses a new method which applies the 'covarion' model to building a tree of life
from signal recognition particle proteins. The authors conclude that the root is
in the eukaryote branch and that earlier trees built with these sequences placing
eubacteria as basal were as a result of long-branch attraction.
12. Lopez P, Forterre P, Philippe H: The root of the tree of life in light of
•• the covarion model. J Mol Evol 1999, 49:496-508.
A description of a new method for applying the 'covarion' model to tree building
that allows sites to alter their rate of evolution as secondary and tertiary structure
evolves. Applied to the rooting of the tree of life, and with elongation factors, the
authors conclude that the eubacteria are evolving at a higher rate than either
archaea or eukaryotes, accounting for their basal position in earlier trees.
13. Penny D, Foulds LR, Hendy MD: Testing the theory of evolution by
comparing phylogenetic trees constructed from five different
protein sequences. Nature 1982, 297:197-200.
14. Philippe H, Laurent J: How good are deep phylogenetic trees? Curr
•• Opin Genet Dev 1998, 8:616-623.
A brief appraisal of the problems associated with building phylogenetic trees for
deep divergences. Problems that are often ignored such as differences in evolutionary
rate, sequence saturation, and fast-evolving lineages are addressed,
and limitations of existing methods as well as possible solutions are described.
15. Teichmann SA, Mitchison G: Is there a phylogenetic signal in
prokaryote proteins? J Mol Evol 1999, 49:98-107.
Using a data set of 32 proteins, it is shown that one gene which has undergone
horizontal transfer can heavily influence the construction of a phylogenetic
tree, even for a data set of 32 proteins. Upon removal of the offending
gene, the remainder of the data set contained little information.
16. Forterre P, Philippe H: Where is the root of the universal tree of
•• life? Bioessays 1999, 21:871-879.
An overview of problems associated with rooting the tree of life, along with
arguments favouring prokaryotes being derived from a mesophilic ancestor
that was eukaryote-like in many respects. The paper also reviews details of
how eukaryotes and prokaryotes have arisen from such an ancestor.
17. Lockhart PJ, Steel MA, Barbrook AC, Huson DH, Howe CJ:
A covariotide model describes the evolution of oxygenic
photosynthesis. Mol Biol Evol 1998, 15:1183-1188.
A mathematical test is introduced that can detect some cases where sites in
a sequence are evolving under different constraints in different parts of the
tree. It is well known in structural biology that two- and three-dimensional
structures of macromolecules evolve over time (as predicted under W Fitch's
covarion model); however, standard tree-building methods assume a site is
always under the same constraints.
18. Benner SA, Ellington AD, Tauer A: Modern metabolism as a
palimpsest of the RNA world. Proc Natl Acad Sci USA 1989,
86:7054-7058.
19. Jeffares DC, Poole AM, Penny D: Relics from the RNA world. J Mol
Evol 1998, 46:18-36.
An updated model of the RNA-world, describing arguments, both old and new,
for the placing of various RNAs in the RNA-world, what gaps there are in our
present understanding of this period, plus a review of ancient genome architecture
from the viewpoint of information theory. The paper also describes a
novel way of viewing the evolutionary transition from RNA to protein catalysts
and explains why some RNAs have persisted while others have not.
20. Poole AM, Jeffares DC, Penny D: The path from the RNA world.
•• J Mol Evol 1998, 46:1-17.
Here we attempted to establish what was known about the evolutionary transitions
in going from an RNA-world to the emergence of the three domains
of life. Included is a discussion of the origins of protein synthesis, the first
proteins, messenger RNA, as well as aspects of the origins of DNA. Notably,
we put forth a new hypothesis on the origin of introns, which we call 'introns-first'.
Also discussed is the validity of using RNA-world relics for 'rooting' the
tree of life. We conclude that the data we assemble are incompatible with a
prokaryote-like Last Universal Common Ancestor.
21. Poole AM, Jeffares DC, Penny D: Early evolution: prokaryotes, the
new kids on the block. Bioessays 1999, 21:880-889.
We argue for re-evaluating the nature of the Last Universal Common
Ancestor. Emphasises the importance of continuity of function in evolution,
and suggests that our understanding of the RNA-world and the Last
Universal Common Ancestor should be mutually compatible. It proposes a
feedback process (the Darwin-Eigen cycle) where improved accuracy of
replication permits a larger genome size, which permits coding for more features,
which permit more accurate replication.
22. Reanney DC: Genetic error and genome design. Trends Genet
1986, 2:41-46.
23. Reanney DC: Genetic error and genome design. Cold Spring Harb
Symp Quant Biol 1987, 52:751-757.
24. Forterre P: Thermoreduction, a hypothesis for the origin of
prokaryotes. CR Acad Sci Paris III 1995, 318:415-422.
25. Hiller R, Zhou ZH, Adams MW, Englander SW: Stability and
dynamics in a hyperthermophilic protein with melting
temperature close to 200 degrees C. Proc Natl Acad Sci USA
1997, 94:11329-11332.
26. Marguet E, Forterre P: DNA stability at temperatures typical for
thermophiles. Nucleic Acids Res 1994, 22:1681-1686.
27. Greenstein JP, Winitz M: Glutamic acid and glutamine. In Chemistry
of the Amino Acids. New York: John Wiley and Sons; 1961:1929-1954.
28. Legrain C, Demarez M, Glansdorff N, Piérard A: Ammonia-dependent
synthesis and metabolic channelling of carbamoyl
phosphate in the hyperthermophilic archaeon Pyrococcus furiosus.
Microbiology 1995, 141:1093-1099.
29. Galtier N, Tourasse N, Gouy M: A nonhyperthermophilic common
ancestor to extant life forms. Science 1999, 283:220-221.
The authors compare the GC content of modern organisms in order to understand
more on the nature of the LUCA and conclude that the ancestral GC
content was too low for it to have been hyperthermophilic. A similar ancestral
GC content was found using only the thermophilic organisms in the dataset.
30. Levy M, Miller SL: The stability of the RNA bases: implications for
the origin of life. Proc Natl Acad Sci USA 1998, 95:7933-7938.
The half-lives of RNA bases at temperatures characteristic of hyperthermophiles
are shown to be too short for bases to accumulate in a prebiotic world. The
authors conclude that life must originate at low temperatures or that theories
for the high temperature origin of life must exclude the four bases in RNA.
31. Brocks JJ, Logan GA, Buick R, Summons RE: Archean molecular
fossils and the early rise of eukaryotes. Science 1999,
285:1033-1036.
Identification of molecular biomarkers in 2700-million-year-old Archaean shales
in Australia argues for the presence of photosynthetic organisms hundreds of
millions of years before the atmosphere became oxidising. Perhaps more strikingly,
the research also points to the presence of eukaryotes at this time, pushing
back the earliest identification of these organisms by 600 million years.
32. Mushegian AR, Koonin EV: A minimal gene set for cellular life
derived by comparison of complete bacterial genomes. Proc Natl
Acad Sci USA 1996, 93:10268-10273.
33. Becerra A, Islas S, Leguina JI, Silva E, Lazcano A: Polyphyletic gene
losses can bias backtrack characterizations of the cenancestor. J
Mol Evol 1997, 45:115-117 [Mushegian AR, Koonin EV: Response.
J Mol Evol 1997, 45:117-118.]
34. Andersson SGE, Zomorodipour A, Andersson JO, Sicheritz-Ponten T,
Alsmark UCM, Podowski RM, Näslund AK, Eriksson A-S, Winkler HH,
Kurland CG: The genome sequence of Rickettsia prowazekii and
the origin of mitochondria. Nature 1998, 396:133-140.
The first complete genome for an α proteobacterium and thereby among the
closest ancestors to mitochondria. The work demonstrates the effects of
Muller's ratchet (accumulation of deleterious alleles in the absence of recombination)
on the evolution of intracellular obligate microbes.
35. Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M,
•• Kowallik KV: Gene transfer to the nucleus and the evolution of
chloroplasts. Nature 1998, 393:162-165.
Using whole genomes, mostly of chloroplasts, the paper demonstrates patterns
of gene loss and of gene transfer to the nucleus. They find independent
gene losses in multiple lineages and identify a large set (44) of
chloroplast genes which had transferred from chloroplast to nucleus. It is
concluded that gene loss and transfer in organelles is best explained in
terms of Muller's ratchet.
36. Koonin EV, Mushegian AR, Bork P: Non-orthologous gene
replacement. Trends Genet 1996, 12:334-336.
37. Doolittle WF: You are what you eat: a gene transfer ratchet could
account for bacterial genes in eukaryotic nuclear genomes.
Trends Genet 1998, 14:307-311.
The author proposes a mechanism for gene transfer from bacteria to the
eukaryote nucleus. The model attempts to account for the apparent chimeric
makeup of the nuclear genome and is a plausible mechanism by which
eukaryotes could acquire genetic information from the diet.
38. Martin W: Mosaic bacterial chromosomes: a challenge en route to
•• a tree of genomes. Bioessays 1999, 21:99-104.
A broad review of current knowledge on horizontal transfer. The author
describes some interesting consequences of horizontal transfer with respect
to phylogenetic analyses. Additionally, the likely differences between prokaryote-prokaryote
transfer and prokaryote-eukaryote transfer are discussed.
39. Lynch M: Mutation accumulation in nuclear, organelle, and
prokaryotic transfer RNA genes. Mol Biol Evol 1997, 14:914-925.
40. Reanney DC: On the origin of prokaryotes. J Theor Biol 1974,
48:243-251.
41. Pennisi E: Genome data shake the tree of life. Science 1998,
280:672-673.
42. Pennisi E: Is it time to uproot the tree of life? Science 1999,
284:1305-1307.
Paper 6 
The origin of the nuclear envelope and the origin of the eukaryote cell. 
Manuscript. 

The origin of the nuclear envelope and the origin of the eukaryote cell. 
Summary. 
Establishing the origin of the nucleus is central to understanding the evolution of the 
eukaryotic cell. One feature of virtually all discussions of nuclear origins to date is the 
lack of discussion of the nuclear envelope. Here I attempt to ask how such a unique 
membrane structure could have arisen in evolution, when this occurred, what the 
selection pressure might have been, and why it is not found in prokaryotes. 
Introduction. 
Ever since the first descriptions of prokaryote and eukaryote cell structure, 
researchers have sought an explanation for the origins of the nucleus. While progress 
in understanding the molecular details of nuclear function is moving at a fast pace
(Olson et al., 2000; Wente, 2000; Lewis and Tollervey, 2000), progress on nuclear
origins is slow. Scenarios for the evolution of the nucleus include an endosymbiotic
origin (e.g. Lake and Rivera, 1994), autogenous origins in eukaryotes (e.g.
Cavalier-Smith, 1988), and emergence subsequent to a fusion between a eubacterium and an
archaeon (e.g. Gupta and Golding, 1996; Martin and Müller, 1998; Moreira and
López-García, 1998; Margulis et al., 2000).
With the explosion of new comparative data and, in particular, the finding that 
a number of nuclear features (e.g. histones & small nucleolar RNAs) are also present 
in representatives of the archaea, the relationship between the three domains is again 
becoming clouded. Forterre (1997) has made the important point that, given the lack
of consensus on the relationships between the three domains, it is problematic to
assume the direction of evolution on the basis of shared archaeal-eukaryote
characters. These could be readily explained either as dating back to the Last
Universal Common Ancestor (LUCA) (having been lost from bacteria), or by 
emergence in the common ancestor of archaea and eukaryotes, after their divergence 
from bacteria. 
Indeed, the rapidly growing data from whole genomes have established that
eukaryote genomes contain genes whose closest counterparts are in the eubacteria,
and genes whose closest counterparts are archaeal (Ribeiro and Golding, 1998; Rivera
et al., 1998; Horiike et al., 2001). Horiike et al. (2001) provide the most intuitive
description of the pattern: eukaryotic genes which appear most closely related to
bacterial genes function in the cytoplasm, while those apparently related to archaeal
sequences generally function in the nucleus. However, as per Forterre's caveat that
single traits can fit more than one scenario when the relationships between the three
domains are not known (Forterre, 1997), the same applies for the genome data. This
pattern could be interpreted in several ways with respect to the relationships between
archaea, eubacteria and eukaryotes, with interpretation being further complicated by
differing accounts of the degree of horizontal transfer between the three domains
(Martin, 1999a; Penny and Poole, 1999, for review).
Inherent in discussion of the problem of nuclear origins is the assumption that
(chloroplasts, hydrogenosomes and mitochondria aside) the greater complexity of the
eukaryote cell has evolved from a simpler prokaryotic cell ultrastructure. This has
seemed reasonable, and indeed is implicit in almost all discussions of the evolution of
prokaryotes and eukaryotes (see Forterre and Philippe, 1999, for a critique), so the
debate has largely centred on how the nucleus arose in eukaryotes after their
divergence from prokaryotes. While it seems intuitive that complex eukaryotic
organisms, and with them, the nucleus, must have evolved from simpler prokaryotic
organisms, evolution does not necessarily result in complexification with time.
Reductive evolution is generally accepted in endosymbiosis and parasitology (Fraser
et al., 1995; Razin et al., 1998; Andersson and Andersson, 1999a), but the idea that
the prokaryote lineages have as a whole undergone a process of reductive evolution
(Reanney, 1974; Forterre, 1995a; Poole et al., 1998) has been less popular. However,
a number of groups are finding that a broad range of biochemical, biophysical,
phylogenetic and genetic data are more compatible with this scenario than with the
traditionally-accepted prokaryote to eukaryote transition (Forterre and Philippe, 1999;
Poole et al., 1999; Penny and Poole, 1999; Glansdorff, 2000).
Indeed, a previously unconsidered dataset is the concentration of putative
RNA world relics in the eukaryote nucleus (Poole et al., 1998, 1999). The
identification of snoRNAs in archaea (Omer et al., 2000; Gaspin et al., 2000) means
there are more such RNA world traits in the archaea than in eubacteria. Both archaea
and bacteria nevertheless appear to have undergone reductive evolution, losing a
number of these traits (Poole et al., 1999; Penny and Poole, 1999). An archaeal origin
of the nucleus, or the host of the mitochondrion/hydrogenosome (Gupta and Golding,
1996; Martin and Müller, 1998; Moreira and López-García, 1998; López-García and
Moreira, 1999), does not explain the likely RNA world origin of a number of traits
(though see Sogin et al., 1996, for a more inclusive model) (Table 1). Selection
pressures for the loss of putative RNA relics in the bacteria and archaea (either once
or twice) have been described (Forterre, 1995a; Poole et al., 1999). Since most fusion
scenarios cannot explain the observation that the greatest diversity of RNA relics is
in eukaryotes, they are problematic (Sogin et al., 1996; Penny & Poole, 1999).
In this paper, I concentrate on what I consider to be the three most problematic 
issues surrounding the origins of the nucleus: 
1. The selection pressure that drove the evolution of the nucleus.
2. The nature of the organism in which this developed.
3. Whether or not the nucleus arose prior to organellar endosymbioses.
Current theories fall far short of explaining the entire range of genetic, structural and
biochemical data, and in my mind, this is symptomatic of many discussions of early
evolution. The prokaryote dogma is a large part of the problem (see Forterre, 1995b;
Forterre & Philippe, 1999, for detailed discussion). This is simply that eukaryotes
evolved from prokaryote-like ancestors, and is underpinned by the identification of
3.5-billion-year-old microfossils classified as cyanobacteria (Schopf & Packer, 1987).
The existence of stromatolites as far back as this has likewise been taken to suggest
the existence of cyanobacteria (Walsh, 1992), since modern stromatolites are formed
by cyanobacteria. However, the earliest stromatolites lack microfossils, and abiotic
processes can explain their formation (Lowe, 1994; Grotzinger & Rothman, 1996).
Furthermore, in recent phylogenetic reconstructions, modern cyanobacteria appear as
a derived group (Lockhart et al., 1998).
There are also difficulties in establishing taxonomy from ultrastructure. The
best known example is the identification of the domain archaea which required
sequence comparisons (Woese and Fox, 1977), but there have also been difficulties in
distinguishing some protists and bacteria on morphology alone. Epulopiscium
fishelsoni, a symbiont living in the gut of surgeonfishes, was originally thought to be a
protist (Fishelson et al., 1985; Montgomery & Pollak, 1988). Electron microscopy
suggested that, despite massive cell size, these symbionts might in fact be prokaryotes
(Clements and Bullivant, 1991), but it was only with phylogenetic analysis that this
could be confirmed (Angert et al., 1993).
The problem of the nucleus is, without exception, framed within the
assumption of prokaryote ancestry. Hence, when it is asked, 'What is the origin of the
nucleus?' the question really is, 'Given we know that eukaryotes are derived, how did
the nucleus arise specifically in this lineage after its split from bacteria and
archaea?' It is worth pointing out that many theories do not even get this far,
providing nothing more than a description of which bits could have evolved after
which other bits to give the modern nucleus! All biochemical data that show any
relationship between archaea and eukaryotes tend to be considered in this light, archaeal
histones (Pereira & Reeve, 1998) and snoRNA homologues (Omer et al., 2000) being
two such examples.
The prokaryote dogma may be correct, or it may be incorrect (its validity has
been challenged, but is hardly debated), but the problem lies with the application of
the assumption in general. By making this assumption, the question is answered
before the data are even looked at. Thus only one scenario can ever be considered
and, while details may differ, there is only one possible conclusion!
I argue that there are strong grounds on which to challenge the prokaryote dogma, and
that, without questioning its validity as a central tenet of early evolution, it is
impossible to make progress in understanding cellular evolution. Thus, in this paper I
aim to reexamine the question of the origin of the nucleus without first requiring that, 
as a corollary of the prokaryote dogma, the nucleus must have arisen in the eukaryote 
lineage and is a derived trait. Indeed, because all discussions on the origin of the 
nucleus that I am aware of assume that it arose specifically in the eukaryote lineage, I 
will take the other end of the spectrum: that the nuclear envelope predates the LUCA 
and arose concurrent with the first cells. It may eventually be possible to reject this 
extreme position, but in the meantime it is interesting to see where the argument 
leads. 
The proposal is based in part on a previous finding that the largest collection 
of putative RNA world relics is found in the eukaryote nucleus, suggesting it is 
perhaps a more ancient structure than previously supposed (Poole et al., 1998, 1999;
Penny and Poole, 1999). I further suggest that a double membrane structure (i.e.
nuclear and cytoplasmic) could have served as a buffer to osmotic pressure in the cell 
prior to the advent of sophisticated gates, channels and pumps for osmoregulation. 
Moreover, I argue that the unique structure of the nuclear envelope may hold 
the key to how the presumed transition from surface chemistry to the first cells 
occurred, as simple pores of either protein or RNA are possible without the 
requirement that these traverse the lipid bilayer. I subsequently argue that, in the 
absence of selection pressure to remove the nuclear membrane, this was never lost 
from eukaryotes, while in prokaryotes it was selectively advantageous to have 
coupled transcription and translation. 
The ideas discussed are speculative, and may well be incorrect. However, if 
the paper serves to drive debate on early evolutionary events away from discussion of 
narrow datasets and preconceptions, it will have served an important purpose. Work 
to date tends to focus on phylogenetic patterns and gene distributions, or the study of
candidate 'living fossil' organisms or groups. There is a paucity of discussion on the
structure of the nuclear envelope as compared with other cellular membrane structures
and almost no discussion on selection for its origin and evolution exclusively in
eukaryotes (Martin 1999b; Poole and Penny, 2001). Ignoring the unique structure of
the nuclear envelope in proposing a theory is unforgivable, yet a number of authors do
this when proposing that the nucleus was an endosymbiont (Rivera and Lake 1994;
Gupta and Golding, 1996; Horiike et al., 2001). These scenarios explain only a single
dataset: that the eukaryote genome appears to be chimeric, individual genes being
either most closely related to bacterial genes or archaeal ones. In itself, this is a salient
observation based initially on confusing gene relationships (Gupta and Singh, 1994)
and later, from larger genomic analyses (e.g. Rivera et al., 1998; Ribeiro & Golding,
1998; Horiike et al., 2001). However, in concluding from these data that the nucleus
was an endosymbiont, crucial differences between nuclear structure and function and 
organelles of clear endosymbiont origin (mitochondria, hydrogenosomes, 
chloroplasts) are either overlooked or ignored. 
Likewise, the gap between prebiotic chemists, who are largely in favour of 
surface chemistry as a crucial step in the origin of life, and molecular biologists, who
expect that the earliest life forms were cellular, is large. A cell with an almost 
impermeable lipid membrane is not a likely intermediate between these two presumed 
stages. How the first cells might have regulated their intracellular environment 
relative to the external environment is largely unexplored. 
The problem with purely descriptive explanations for the origin of the nucleus. 
Most current theories on the origin of the nucleus attempt to address the 
growing evidence (Gupta and Golding 1996; Ribeiro and Golding, 1998; Rivera et al.,
1998; Horiike et al. 2001) that eukaryote genomes represent a mixture of archaeal-
and eubacterial-like genes. However, all lack a crucial component: no clear selection
pressures are given for the origin of this structure. This is in stark contrast to research
into the origins of mitochondria, hydrogenosomes and chloroplasts, where there is
general agreement, and good experimental support (Andersson et al., 1998; Martin et
al., 1998; Gray et al., 1999; McFadden, 1999) to show that these organelles arose by
endosymbiosis from free-living bacteria. Current theories in that field attempt to
identify a selection pressure for the initial symbiosis, as well as the process of gene
loss from organelles and gene transfer to the nucleus (e.g. Martin and Müller, 1998;
Martin et al., 1998; Moreira and López-García, 1998; Andersson and Kurland, 1999;
Andersson and Andersson, 1999a,b; Blanchard and Lynch, 2000).
What connects the question of the origin of endosymbiotic organelles and the
origin of the nucleus has been the difficulty in establishing whether, prior to the origin
of the mitochondrion, amitochondriate eukaryotes existed at all (Sogin, 1997; Embley
& Hirt, 1998; Martin and Müller, 1998; Philippe et al., 2000a). As pointed out by
Martin, "for organelles to take up residence in a cytoplasm, there had to be a host"
(Martin, 1999b). If all the DNA-containing organelles of the cell arose through
endosymbiosis, and the nucleus was the first to arise this way, what happened to the
genetic material of the original host? Vellai et al. (1998) have noted that for an
endosymbiotic event to occur, there must have been some mechanism of
phagocytosing the endosymbiont, and, so far, only eukaryota have been demonstrated
to be capable of this. Indeed, if it is contended that a prokaryotic organism was the
original host, it seems odd that phagocytosis is no longer a feature of extant
prokaryote lineages!
Two broad variant theories for the origin of the nucleus through 
endosymbiosis have been suggested, those where the endosymbiont is an archaeon, 
and those where the host is an archaeon. The first, that the nucleus was an 
endosymbiont archaeon that took over the host cell (Rivera & Lake, 1994; Horiike et
al. 2001), not only fails to explain how an archaeal cell membrane could have become
the nuclear envelope, it also requires that the endosymbiont gained genes from the
host (Poole and Penny, 2001). In addition, it requires that the endosymbiont changed
its lipid composition, from ether-linked lipids to the phospholipids found in the 
nuclear envelope. It also requires a change in structure from a simple lipid bilayer to a 
structure where inner and outer nuclear membranes are continuous. The outer 
membrane is also continuous with the endoplasmic reticulum, forming a continuous 
lumen. Furthermore, the nuclear pores do not traverse the lipid bilayer as such, but are
instead formed at regions where the inner and outer nuclear membranes meet. 
What was the selection pressure that drove this event? How can the lack of
similarity between the nuclear membrane structure (an envelope with pores) and
nuclear chromosomes on the one hand, and those of prokaryotes, chloroplasts and
mitochondria on the other, be accounted for? How is it possible to account for the
disappearance, and later
reformation, of the nuclear envelope at cell division (meiosis and mitosis) in some 
eukaryotic groups? Other organelles of endosymbiotic origin, regardless of placement 
on the eukaryote tree, do not undergo this process. While 'closed' mitosis, where the 
nuclear envelope remains intact throughout, is known in various protists, algae and 
fungi, it is not clear whether this is ancestral or derived. 
Furthermore, the structure of the nuclear envelope bears no resemblance to 
any biological membranes in archaea and bacteria. The nuclear envelope is unlike the 
membrane structure of any prokaryote, consisting of a flattened continuous lipid 
bilayer with nuclear pores allowing free diffusion of molecules 20-40 kDa in size
across the envelope (Wente, 2000; Allen et al., 2000). Engulfment to form
chloroplasts and mitochondria has not produced such structures, and both the
membrane and most porins are of Gram-negative bacterial 'origin' (Cavalier-Smith
2000; Flügge 2000; Soll et al. 2000). Nor has a structure equivalent to the nuclear
envelope appeared in cases of secondary endosymbioses where one eukaryotic cell
engulfs another. An exception is the nucleomorph, which is clearly a relic of the
nucleus of the eukaryotic endosymbiont (Gilson et al., 1997; Cavalier-Smith, 2000;
Douglas et al. 2001).
If the nucleus is archaeal in origin (Lake, 1994; Gupta and Golding, 1996;
Moreira and López-García, 1998; Horiike et al. 2001), then these issues are
unexplained. The worst oversight here is that it requires that the endosymbiont gained
genes from the host, with the latter presumably losing all its genes, including a
significant proportion to the endosymbiont (Poole & Penny, 2001). This is
inconsistent with all documented cases of endosymbiosis and intracellular parasitism
by prokaryotes and eukaryotes (Andersson et al. 1998; Moran & Baumann 2000;
Wren 2001; Keeling & McFadden 1998; Douglas et al., 2001). Upon entering a
symbiotic or parasitic relationship with the host, the endosymbiont, by utilising host
metabolites, over time loses the capacity to synthesise these metabolites. This pattern
has been clearly established through numerous whole genome studies (Fraser et al.
1995, 1997, 1998; Himmelreich et al. 1996; Andersson & Andersson 1999a,b;
Kalman et al. 1999; Cole et al. 2001). In time, this irreversible process presumably
results in host dependence, with the endosymbiont becoming obligate.
Mitochondria, chloroplasts and hydrogenosomes, which are of endosymbiotic 
origin, have suffered this fate (Blanchard & Lynch 2000; Martin et al. 1998), with
hydrogenosomes having completely lost their genome in all but a few cases 
(Akhmanova et al. 1998). The intracellular lifestyle places endosymbionts and 
parasites under mutational pressure, particularly in an obligate intracellular lifestyle. 
This is best explained as being due to Muller's ratchet, the gradual accumulation of 
slightly deleterious mutations in asexual organisms with small population size. 
Muller's ratchet has been shown to affect free-living bacteria (Andersson & Hughes
1996), endosymbionts such as Buchnera (Moran 1996; Moran & Baumann 2000),
intracellular parasites such as the Rickettsiae and Chlamydiae (Andersson &
Andersson 1999b; Kalman et al. 1999) as well as organellar genomes (Berg &
Kurland 2000; Blanchard & Lynch 2000).
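The ratchet mechanism described above can be illustrated with a minimal simulation. The following is a toy sketch only, not a model taken from any of the studies cited: it assumes a Wright-Fisher population of fixed size, strictly asexual reproduction, no back mutation, multiplicative fitness, and at most one new slightly deleterious mutation per individual per generation. The function name and all parameter values are hypothetical.

```python
import random


def mullers_ratchet(pop_size, generations, mut_rate, sel_coeff, seed=0):
    """Toy asexual Wright-Fisher model (illustrative assumptions only).

    Returns the mutation count of the least-loaded individual in each
    generation. Without recombination or back mutation this minimum can
    only stay the same or increase: each 'click' of the ratchet is the
    loss of the current least-loaded class.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each individual
    min_load_history = []
    for _ in range(generations):
        # Multiplicative fitness: each mutation reduces fitness by sel_coeff.
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        # Offspring choose a parent in proportion to parental fitness.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Each offspring inherits the parental load and may gain one mutation.
        loads = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


history = mullers_ratchet(pop_size=50, generations=200,
                          mut_rate=0.3, sel_coeff=0.01, seed=1)
print("least-loaded class at start and end:", history[0], history[-1])
```

Because every offspring carries at least as many mutations as its parent, the least-loaded class cannot be regenerated once it has been lost by drift. Under these assumptions, small population size makes such losses more frequent, which is the situation argued above for obligate intracellular lineages.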
The hydrogen hypothesis (Martin and Müller, 1998) does not fall foul of the
above criticisms as it instead suggests the host was an archaeon, the endosymbiont
was the forerunner to mitochondria and hydrogenosomes, and that the nucleus
evolved subsequent to endosymbiosis (Martin and Müller, 1998; Martin, 1999b).
Other chimeric theories where the host is an archaeon exist, but most are largely 
descriptive and do not attempt to establish selection pressures for the origin of the 
nucleus. For reviews of the various chimeric hypotheses, see Gupta and Golding 
(1996), Katz (1998), López-García and Moreira (1999) and Margulis et al. (2000).
The hydrogen hypothesis (Martin and Müller, 1998) and the related but
independently conceived syntrophy hypothesis (Moreira and López-García, 1998) are
perhaps the most interesting. Both provide a detailed and feasible scenario for the
origin of the eukaryote cell, ultimately by fusion between an archaeon and a
bacterium (two bacteria in the case of syntrophy). They do not fall foul of any of the
criticisms levelled at competing theories. Similarities and differences between the
hydrogen and syntrophy hypotheses have been discussed elsewhere (López-García
and Moreira, 1999) and I do not cover these in depth here. Suffice it to say both
represent plausible scenarios for the metabolic basis for the establishment of 
symbiosis, as opposed to simply suggesting that the symbiont gave away ATP. 
However, the question I consider is the nature of the host, as opposed to the nature of 
the initial interaction. For simplicity, I shall consider the simpler of the two scenarios, 
where there is only a single symbiont (the hydrogen hypothesis).
In the hydrogen hypothesis, endosymbiosis occurs, though not as an initial
step. In this scenario, the endosymbiont is the ancestor of both hydrogenosomes and
mitochondria. However, it does not explicitly describe the origins of the nucleus,
other than to say that the possession of numerous traits common to both archaea and
eukaryotes makes it feasible to suggest the nucleus arose after the endosymbiosis that
spawned hydrogenosomes and mitochondria. In a separate paper, Martin (1999b) does,
however, discuss nuclear origins under the hydrogen hypothesis. To this I shall return.
A third class of theory for the origin of eukaryotes avoids the problem by 
invoking a proto-eukaryotic host (possibly with a nucleus), thereby explaining the 
chimeric origin of nuclear genes. This is the traditional formulation of the
endosymbiont hypothesis (as revived by Margulis, 1970), which has been most
extensively developed by Cavalier-Smith. He proposed that extant amitochondriate
protists, which he named the Archaezoa (Cavalier-Smith, 1983, 1987, 1988), were the
ancestors of mitochondriate eukaryotes, predating endosymbiosis in the eukaryote
lineage. While the member composition of the Archaezoa has been variable (see
Table 2 in Patterson, 1999), it is now widely thought that the Archaezoa may all be
secondarily amitochondriate (reviewed by Keeling, 1998; Embley and Hirt, 1998).
However, that these extant protists are not the 'missing link' in the evolution of the
eukaryote cell does not necessarily mean that the host could not have been a
proto-eukaryote.
The wealth of data on mitochondria and hydrogenosomes suggests
endosymbiosis of an ancient facultative α-proteobacterium best accounts for a single
origin for these organelles (Rotte et al. 2000). While the details are strongly debated
(Andersson & Kurland 1999; Rotte et al. 2000), the issue of the origins of the nucleus
tends to take a back seat (though see Martin, 1999b). Indeed, as has been debated
recently (Biagini & Bernard 1999; Martin 1999c) there is much difficulty in
establishing the nature of the host. Was it an archaeon, with the nucleus arising only
after the initial endosymbiosis, or an amitochondrial proto-eukaryote with a nucleus?
Did the nucleus arise after mitochondria/hydrogenosomes?
Aside from genomic data suggesting that the eukaryote nucleus has a chimeric 
gene composition, little has been said about the origins of the nucleus. There has been 
no systematic attempt to establish whether the host was 'eukaryotic' with a nucleus, or 
whether it was an archaeon (with the nucleus being a late development). Nor has there
been much attempt to suggest plausible selection pressures for its origin under either 
one of these scenarios. 
In the current context, the debate between proponents of the 'ox-tox'
hypothesis (that the original interaction between proto-eukaryote host and ancestors of
mitochondria was based on oxygen detoxification of the host by the symbiont
[Andersson and Kurland, 1999]) and the hydrogen hypothesis is interesting. While the
details differ, the common feature is that all agree on a common origin for
mitochondria and hydrogenosomes (Andersson and Kurland, 1999; Rotte et al., 2000).
However, neither theory addresses the origin of the nuclear envelope. The 'ox-tox'
hypothesis envisages a proto-eukaryotic host that may or may not possess a nucleus
(Andersson and Kurland, 1999), while the hydrogen hypothesis argues for an archaeal
host, so requires the nucleus to have arisen after the endosymbiosis that gave rise to
mitochondria and hydrogenosomes (Martin, 1999b).
The greater potential for oxidative damage in the mitochondrion (and 
chloroplast) is probably one pressure for many (though not all) genes to be relocated 
to the nucleus (Allen and Raven, 1996; Race et al., 1999). 'Ox-tox' is potentially
compatible with this, requiring that strictly anaerobic eukaryotes arose from aerobic
ancestors. A theory put forth by Vellai et al. (1998) is intermediate in that it proposes
an archaeal host, but an aerobic basis for endosymbiosis. 
One argument for the origin of the nucleus is that it served to protect host 
DNA from oxidative damage resulting from leakage of reactive oxygen species from 
the mitochondrion (see Li, 1999). This theory is interesting, being based on observed
differences in oxidative damage in the nucleus and mitochondria (Richter et al., 1988;
Ljungman and Hanawalt, 1992). However, the nuclear envelope allows free diffusion
of small molecules up to ~40 kDa, so is unlikely to represent a barrier to oxygen
radicals. Furthermore, reactive oxygen species are dealt with by superoxide
dismutases, catalases and glutathione peroxidases, not compartmentation (McCord,
2000). Nor does oxidative damage explain the absence of a nucleus-like structure in
aerobic prokaryotes, or suggest how a nuclear envelope might protect an anaerobe
against oxygen and reactive oxygen species. Finally, if the ancestral endosymbiosis
was based on an anaerobic symbiosis (Martin and Müller, 1998; Moreira and
López-García, 1998) this theory cannot readily account for the origin of the nucleus in
anaerobic eukaryotes, though in all current theories, the endosymbiont is considered
to be facultatively aerobic.
More importantly, the 'Ox-tox' hypothesis, while less developed than the 
hydrogen hypothesis, permits the origin of the nucleus to be either prior to 
endosymbiosis, or to post-date it. The hydrogen hypothesis, in arguing for an archaeal 
host, requires that the nucleus (and a number of other eukaryote-specific traits) arose 
after hydrogenosomes and mitochondria. It does not suggest what selection pressures 
might account for the origin of the nuclear envelope, endomembrane system, and 
other features that separate archaea from eukaryotes. 
Martin (1999b) has argued for a fortuitous emergence of the eukaryote
endomembrane system using the symbiosis described by the hydrogen hypothesis as a
starting point. I will argue that this hypothesis, in requiring an archaeal host, does not
explain many aspects of modern eukaryote cells. However, this does not mean that I
think the hydrogen hypothesis should be rejected outright. In terms of establishing a 
biochemical basis for the symbiotic interaction that gave rise to mitochondria and 
hydrogenosomes, it is not only plausible, but provides in many respects a substantial 
improvement over previous theories. At issue here is the nature of the host, not the 
nature of the symbiosis that gave rise to mitochondria and hydrogenosomes. 
One argument that has been made in favour of the possibility that the host was 
an archaeon is that the amitochondriate group of eukaryotes, the Archaezoa, are 
probably all secondarily amitochondriate, suggesting all extant eukaryote lineages 
once harboured mitochondria (Keeling, 1998; Embley and Hirt, 1998). This has led to
the suggestion that the origin of mitochondria and hydrogenosomes is concurrent with
the origin of the eukaryote cell (Martin & Müller, 1998; Martin, 1999b). This
argument is as problematic as the former assumption that the ancestral state for
eukaryotes was nucleate but amitochondrial (Cavalier-Smith, 1983, 1987), yet is
presently being strongly argued for because of the absence of any evidence for the
Archaezoa being genuinely amitochondriate as opposed to secondarily so (e.g.
Martin, 1999b).
In the same way as there may be no bona fide Archaezoa, there are no 
anucleate eukaryotes/archaea which harbour mitochondria/hydrogenosomes or
endosymbionts. The hydrogen hypothesis (Martin & Müller, 1998) points to modern-day
examples of symbioses between archaea and bacteria much like those argued in
that hypothesis to provide the basis for the interaction that ultimately led to the
α-proteobacterial symbiont becoming an intracellular organelle
(mitochondria/hydrogenosomes). But what selection pressures might have led to all
these subsequent eukaryote-specific traits? 
No intermediates between the modern examples of archaeal-bacterial
symbioses and modern eukaryotes with hydrogenosomes/mitochondria have been
identified. Obvious examples would be phagocytic archaea, archaea with linear
chromosomes maintained by telomerase with multiple origins of replication, or
'eukaryotes' or archaea with intracellularly located hydrogenosomes/mitochondria but
no nucleus. Both Archaezoan and hydrogen hypotheses demand that ancestral forms
went extinct, presumably through competition. Hence, arguing that the absence of one
presumed ancestral form supports the alternative hypothesis is not only incorrect, it is
moot!
In an important sense, arguing for a nucleate eukaryotic host is easier than
arguing for an archaeal host. One has to accept that no modern examples exist, but it
permits the host to be endophagocytic, and does not require that a range of
eukaryote-specific features (linear chromosomes with telomeres and multiple origins of
replication, the nuclear envelope and nuclear pore complex, endoplasmic reticulum,
Golgi) all evolved subsequent to endosymbiosis.
If current speculations of a eukaryote 'big bang' (Philippe et al., 2000a,b) are
supported, this could be argued to account for the extinction of earlier forms, and
could in principle fit with either a proto-eukaryote or archaeal host. On current
evidence of formerly deep diverging amitochondrial eukaryotes being derived
(Keeling, 1998), it could either be argued that the endosymbiosis event resulted in
extinction of all proto-eukaryotic lineages, or that the advent of the nucleus and other 
eukaryote-specific features, subsequent to endosymbiosis, resulted in the extinction of 
intermediate forms. The general agreement that hydrogenosomes and mitochondria 
are of a common endosymbiotic origin, as well as organelle to nucleus transfer of 
genes not strictly required for the function and maintenance of the endosymbiont (e.g. 
glycolysis) tentatively suggests the former. 
Although the 'big bang' hypothesis is far from accepted, it does lend some
weight to the point that both nucleus-first and endosymbiont-first theories require
extinction of intermediate forms, and this point bears further inspection. Since the
endosymbiont-first theory has been detailed elsewhere (Martin & Müller, 1998;
Martin, 1999b), I will limit discussion to two issues. The first is whether there is an
evolutionary precedent for the extinction of intermediate forms, and the second is 
whether assuming the derivation of eukaryotic nuclear traits from archaeal traits is 
reasonable. 
Extinction of intermediate forms? 
Currently, there are no known intermediate forms between archaea and
modern eukaryotes that might favour the idea that eukaryote features arose
subsequent to the endosymbiosis event. Nor are there any bona fide Archaezoa to
support the idea that the host was nucleate. Two possibilities are immediately
obvious: 1. That the limited sampling of eukaryote and archaeal diversity is such that
intermediate forms have simply not been found (Embley & Hirt, 1998; Keeling, 1998).
2. That intermediate forms have been outcompeted by the ancestors of the extant
lineage, which would potentially account for the eukaryote 'big bang' suggested from
phylogeny (Philippe et al. 2000a,b). Until intermediate forms are identified, neither
theory fares better than the other and the debate cannot be readily resolved on this
point alone.
Nevertheless, one can speculate on the feasibility of an across-the-board
extinction of intermediate forms. If all Archaezoa turn out to be secondarily 
amitochondriate, a revised Archaezoan hypothesis would require that ancestrally 
nucleate forms were outcompeted across all environments by nucleate eukaryotes 
carrying a facultatively aerobic endosymbiont. The hydrogen hypothesis is slightly 
trickier. The initial formulation (Martin & Müller, 1998) does not address the origin
of eukaryote-specific traits in detail, and does not involve a symbiont in an 
intracellular location. Instead, a symbiosis event similar to modern symbioses 
between archaea and bacteria is argued as the initial state. Nevertheless, given the 
intracellular location of mitochondria and hydrogenosomes in extant eukaryotes, 
endosymbiosis must have ultimately ensued. In a subsequent paper by Martin 
(1999b), the origin of the endomembrane system is argued to be a consequence of the
relocation of symbiont genes for lipid synthesis to the host chromosomes. The 
wording is ambiguous with respect to whether the symbiont was intracellular by this 
time. However, the statement that, 'Gene transfer from the symbiont's genome to the 
cytosolic chromosomes of the host could have genetically cemented two prokaryotes 
into a single, biochemically compartmented, but nucleus-lacking common ancestor of 
eukaryotes' suggests this. Gene decay and symbiont to host gene transfer are features 
of endosymbionts and obligate intracellular parasites (Andersson & Andersson, 
1999a,b; Moran & Baumann, 2000). Given such a location for hydrogenosomes and
mitochondria, it makes most sense that, by this time, the symbiont in Martin's 
scenario is intracellularly located. 
Considering the hydrogen hypothesis first, if biological competition was 
responsible for extinction of intermediate forms, extinction must occur as a 
consequence of the evolution of eukaryote-specific features subsequent to 
endosymbiosis. These eukaryote-specific features would need to be selectively 
advantageous in all ancestral eukaryote environments, displacing existing forms, as
well as being maintained in the subsequent colonisation of aerobic environments. The 
theory must explain the ubiquity of eukaryote features such as linear chromosomes 
with telomeres and multiple origins of replication, an endomembrane system 
consisting of nuclear envelope, endoplasmic reticulum and Golgi, and a cytoskeleton
(Table 1). Either the final feature to appear was so superior as to outcompete ancestral
forms (with the other features being fixed), or these cell structural features in 
combination were. There is difficulty even coming up with a selection pressure for the 
emergence of such features, let alone establishing how such features could come to 
define the eukaryote cell architecture. What is it about the endomembrane system that 
makes it so superior to an anuclear host with an endosymbiont? 
The other possibility is a modified version of the traditional argument, that the 
endosymbiont took up residence in a proto-eukaryotic host which, other than the lack 
of hydrogenosomes or mitochondria, was structurally similar to modern eukaryotic
cells. The host would have already been separated from the archaeal lineage, and was
phagocytic. The ancestral cell would have endophagocytosed the ancestral
α-proteobacterium, and the nature of the interaction could still have been initially
anaerobic, as per the hydrogen hypothesis, with the endosymbiont being facultatively
anaerobic and the proto-eukaryotic host being an anaerobe. I will discuss the points in
favour of a proto-eukaryotic host in the next section. 
In the absence of intermediate forms, this scenario, as with the hydrogen 
hypothesis, would require extinction of intermediate forms, though in this case, there 
is only one form, proto-eukaryote lineages without an endosymbiont. Thus, the 
presence of an anaerobic endosymbiont in one lineage would have to be argued to be 
sufficient to outcompete all other proto-eukaryote lineages in all environments, and 
account for the colonisation of aerobic environments by its descendants.
In order to consider this in depth, I shall introduce the concept of 
Evolutionarily-Stable Niche-Discontinuity (ESND) (Poole et al., 2001; M.J. Phillips,
in prep.). Put simply, the ESND concept describes limits on potential evolvability as a
result of within-species competition between individuals, and the existence of a valley
of low fitness between two niches. An individual that displays a trait which shifts it
away from its (original) niche toward a second, occupied niche will be selected
against within its own niche. It will still be too far away from the second niche to be
able to compete successfully within the latter. The dual requirement of gradual
changes across multiple traits, coupled with specialisation within a niche, thus results
in a discontinuity, and inhabitants of one niche cannot reach another (occupied) niche.
An example given in Poole et al. (2001) is that of cats, which are fast-burst strike
predators, and dogs, which are endurance predators.
ESNDs are predicted to exist between eukaryotes and prokaryotes, the latter in
general being r-selected relative to the former (Poole et al., 2001). Two important
aspects of prokaryotes and eukaryotes (when viewed not as phylogenetic groups, but
evolutionary strategies) are evident. First, prokaryotes are able to respond quickly to
the presence of a new nutrient by virtue of transcription and translation being coupled.
Thus, before the transcript has been completely synthesised, translation of the protein
it encodes has begun. In eukaryotes, the transcript is synthesised, capped,
polyadenylated, spliced and then exported from the nucleus before it is translated.
Secondly, prokaryote genome size is at a premium. There are limits to the rate at
which a circular chromosome with a single origin of replication can be copied, and this
is the rate-limiting step during exponential growth in E. coli (Poole et al., 2001, and
references therein). With such strong selection on genome size in prokaryotes, only a
single origin of replication per chromosome¹, and selection for fast response times, it
is likely that there is an ESND between r-selected eukaryotes and prokaryotes. The
former group possesses multiple origins per chromosome, and response time is
limited by physical separation of transcription and translation provided by the
nucleus. The number of changes required for eukaryotes to become established in
niches currently inhabited by prokaryotes (or vice versa) is too great, given
intermediate low fitness. ESNDs may break down when organisms that inhabit similar
niches and have never been in contact (e.g. because of geographical isolation) are
brought into contact, or in organisms where horizontal gene transfer is possible (Poole
et al., 2001).
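The constraint that a single origin of replication places on copying time can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes bidirectional replication, evenly spaced origins that all fire at once, and round figures (a 4.6 Mbp chromosome and a fork speed of 1000 bp per second, roughly of the order usually quoted for E. coli). These values and the function name are assumptions introduced here, not figures from the works cited.

```python
def replication_time_minutes(genome_bp, fork_rate_bp_per_s, n_origins=1):
    """Minimum time to copy one chromosome, in minutes.

    Assumes each origin launches two forks that travel in opposite
    directions at a constant rate, and that origins are evenly spaced
    and fire simultaneously (a deliberate simplification).
    """
    if n_origins < 1:
        raise ValueError("at least one origin of replication is required")
    forks = 2 * n_origins
    seconds = genome_bp / (forks * fork_rate_bp_per_s)
    return seconds / 60.0


# Illustrative round figures only (assumed, not taken from the text).
single = replication_time_minutes(4_600_000, 1000, n_origins=1)
multi = replication_time_minutes(4_600_000, 1000, n_origins=10)
print(f"one origin: {single:.1f} min; ten origins: {multi:.1f} min")
```

Under these assumptions copying time scales linearly with genome size and inversely with the number of origins, so with a single origin any increase in genome size translates directly into a longer minimum replication time.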
With regard to the possibility of replacement of ancestral nucleate eukaryotes 
by the lineage that possessed an endosymbiont, the ESND concept provides a useful 
way of looking at how across-the-board displacements could have occurred. First of
all, one of the consequences of selection for fast response times and subsequent
exponential growth in prokaryotes is that enzymes will tend to evolve towards
catalytic perfection at a faster rate than in eukaryotes (Jeffares et al., 1998; Poole et
al., 1999, 2001). Catalytic perfection is achieved when the rate-limiting step in a
reaction is the diffusion of substrate to the active site (Albery and Knowles, 1976).
This may account for the observation that more than just endosymbiont-specific
genes have been transferred to the eukaryote nucleus (Berg & Kurland, 2000;
Blanchard & Lynch, 2000). Notably, genes for glycolysis have been argued to be of
endosymbiotic origin (e.g. Martin et al., 1993; Keeling & Doolittle, 1997; Henze et
al., 1998; Liaud et al., 2000), perhaps consistent with the possibility that ancestral
prokaryotic metabolic genes were superior in terms of catalytic efficiency to those of
the host. This might likewise account for the chimeric genome of eukaryotes, where
most genes of probable bacterial origin are 'cytoplasmic' (i.e. involved in metabolism
in the eukaryote cytoplasm, sensu Horiike et al., 2001, see also Rivera et al., 1998).
Selection for relocation of beneficial endosymbiont genes to the host can be argued on
the basis of Muller's Ratchet (Blanchard and Lynch, 2000; Berg and Kurland, 2000),
and furthermore, replacement of eukaryote genes by endosymbiont orthologues
(non-orthologous gene replacement, sensu Koonin et al., 1996) might be argued given the
predicted catalytic superiority of endosymbiont metabolic enzymes. One could argue
for other sources for the bacterial genes, but the simplest, most parsimonious, and
most obvious source of the bulk of bacterial genes is the endosymbiont.
Returning to the question of how a biologically driven 'mass extinction' of
nucleate eukaryotes by endosymbiont-harbouring relatives might have occurred,
endosymbiosis would have provided two selectively advantageous and immediately
realised traits that might result in ESND breakdown and therefore extinctions. First,
the established endosymbiont provided the host with ATP², and second, I argue that it
had a large number of orthologous enzymes that were catalytically superior to the host
complement. In addition, the symbiosis presumably allowed previously anaerobic cells to
diversify into aerobic and facultatively aerobic niches, which were inaccessible to
their ancestors.
¹ Putative origins of replication have been identified in Pyrococcus abyssi (Myllykallio et al., 2000),
Pyrococcus horikoshii, Methanobacterium thermoautotrophicum (Lopez et al., 1999) and Thermotoga
maritima (Lopez et al., 2000). This work suggests archaeal replication is analogous to bacterial
replication, being bidirectional, with a single origin per chromosome.
I will skip the establishment of symbiosis, since this has been covered by other
authors (Martin and Müller, 1998; Andersson and Kurland, 1999; Rotte et al., 2000).
Instead, I will concentrate on loss of redundant genes versus transfer of genes to the
nucleus. Selection within the endosymbiont-containing eukaryotes for the fittest variants
may have resulted in a significant proportion of metabolic pathway
orthologues being transferred to the nucleus, though some are expected to be lost
owing to redundancy, as is seen in contemporary endosymbionts.
There may be an important difference between modern examples and the
initial endosymbiosis, however, and it might be predicted that the outcome would have
been different depending upon whether the host is presumed to be archaeal or
proto-eukaryotic. As described above, relative to extant eukaryotes, extant prokaryotes are
r-selected. I predict that the catalytic efficiency of archaeal and bacterial proteins carrying
out identical reactions will be comparable. Assuming that this is true, genome fusion
between an archaeal host and a bacterial endosymbiont ought to reveal a chimeric
origin for metabolic pathways. This is not the case,
however, with evidence to date (Ribeiro & Golding, 1998; Rivera et al., 1998; Horiike
et al., 2001) suggesting host ancestral metabolic pathways have been replaced by
endosymbiont pathways.
Modern eukaryotes are K-selected relative to bacteria, and if this niche
discontinuity is an ancestral one, there would have been stronger selection in the
latter for evolution of catalysis towards catalytic perfection. In metabolic pathways,
where substrates, intermediates and products are usually similar, or can have a similar
outcome (e.g. generation of ATP), the pathway is not as important as the outcome,
since it is the products that are utilised. I therefore suggest that with the redundancy of
orthologous metabolic pathways in the initial endosymbiosis, the endosymbiont
pathways would have prevailed, being faster and more efficient. Furthermore,
equivalent (analogous) pathways would be displaced from the host repertoire in
favour of the endosymbiont pathways.
So, even though there would be a significant degree of loss through
redundancy (as is seen in contemporary endosymbionts), those genes that conferred
an advantage under endosymbiosis would tend to be maintained in the population. If
these are orthologues or analogues of nuclear genes, the end result is selection for
these over the nuclear genes, the latter being lost, and the former being ultimately
transferred to the nucleus, as a result of the operation of Muller's Ratchet, and perhaps
also oxidative DNA damage (Allen and Raven, 1996). However, moving to the
nucleus has the disadvantage that gene expression is slowed, meaning slower
response times. Some products must be targeted to the endosymbiont, but transfer to
the nucleus would alleviate the mutational pressure of being located in the
endosymbiont.
² I am not describing how the initial endosymbiosis was established, but rather how, subsequent to the
development of the contemporary situation, the endosymbiont provided the host with energy. Both
hydrogen and ox-tox hypotheses point out that the initial symbiosis was probably based on different
interactions (Martin and Müller, 1998; Moreira and López-García, 1998; Andersson and Kurland,
1999).
The limitations of eukaryotic gene expression³ would have meant that the
organism could never have competed with prokaryote ancestors of the endosymbiont, 
but could however have resulted in extinction of proto-eukaryotes without an 
endosymbiont, within-population selection favouring those individuals that made use 
of the endosymbiont genes. I suggest that endosymbiosis caused a breakdown of 
ESNDs in eukaryotes effectively because of horizontal gene transfer. 
This model is speculative, but in the next section I argue that there is a strong 
case for a proto-eukaryote host, based on the unique characteristics of eukaryote 
genome architecture, and the presence of putative RNA world relics in the nucleus,
many of these having been lost from prokaryotes.
RNA relics in the nucleus. 
If the nucleus is argued to have been present in the host that endosymbiosed the
ancient α-proteobacterium that gave rise to hydrogenosomes and mitochondria, then
when did the nucleus arise? The standard argument is that it evolved in the eukaryotic
branch subsequent to the split from archaea. In both pre- and post-endosymbiotic
scenarios for the evolution of the nucleus, a key point is the presence in archaea of
genes that contribute to eukaryote-specific traits (e.g. Martin & Müller, 1998; Moreira
and López-García, 1998). This is used to suggest that the evolutionary building blocks for
the emergence of eukaryote-specific features evolved in the archaeal-like common
ancestor of archaea and eukaryotes. It is equally feasible to argue that the presence of
these genes in archaea, while suggesting a more recent common ancestry between
archaea and eukaryotes than either with bacteria, is evidence for a eukaryote-like
common ancestor, and loss of specific structures in archaea through reductive
evolution!
One feature of the nucleus which might favour the latter possibility is that the
nucleus is the site of the RNA processing events that produce mature functional
RNAs and mRNAs (Lewis & Tollervey, 2000). Most of these processing events
require functional RNAs that have been argued to be of RNA world origin, and a
number of these are present only in eukaryotes (Poole et al., 1999). Any theory for the
origin of the nucleus must consider the concentration of relic RNAs within this
eukaryotic organelle, and the smaller numbers of RNA relics in prokaryote lineages
(Penny & Poole, 1999). I shall briefly review the distribution of RNA world relics
before examining a possible scenario for the origin of the nuclear envelope prior to
the emergence of eukaryotes, archaea and bacteria.
³ It has been argued that individual eukaryote 'informational' genes (sensu Rivera et al., 1998) would
not have been so readily replaced because of their involvement in large multimeric complexes with
many interactions, as per the ribosome (Jain et al., 1999). Another explanation would be that if the host
had a different genome architecture, i.e., much like that of modern eukaryotes, replacement could not
occur without a fundamental change in architecture.
The concept of an RNA world is now well established, and enjoys a prominent
position in origin of life studies, being pursued both through the identification of
putative relics (Jeffares et al., 1998; Poole et al., 1999) and through in vitro selection
studies (Yarus, 1999). While it is not clear whether an RNA-only world existed sensu
stricto, it is certainly clear that there was an earlier period in the evolution of life
where RNA played a more prominent role in cellular processes than now. At present,
the main difficulty is that work on this problem has become separated from work on later
periods in early evolution (Poole et al., 1999), with the question of the nature of the
last universal common ancestor (LUCA) now being largely the domain of
phylogenetics (Doolittle, 1999).
The RNA world model leads to the finding that the greatest diversity of
putative RNA relics, as well as probable ancestral genome architecture, is
concentrated in the nucleus of modern eukaryotes (Poole et al., 1998, 1999). Since
this is the subject of other recent articles, I refer the reader to these for detail (Poole et
al., 1998, 1999; Penny and Poole, 1999), and limit discussion here to a brief overview
of the main points.
Several lines of argument suggest that features of the prokaryote lineages are
derived, and that processes that have been considered ancient owing to their apparent
simplicity may have evolved from a more complex (inefficient) precursor through
reductive evolution (Reanney, 1974; Darnell and Doolittle, 1986; Forterre, 1995a;
Poole et al., 1999). In extant eukaryotic cells, there exists a general processing pattern,
where pre-tRNA, pre-rRNA and pre-mRNA are transcribed, processed to produce
mature rRNA, tRNA and mRNA, exported from the nucleus, and then become
involved in translation in the cytoplasm (Poole et al., 1999; Penny and Poole, 1999).
The processing of all three occurs via ribonucleoprotein complexes, with tRNA being
processed by the ubiquitous RNase P⁴, which is a strong RNA world candidate with
the RNA alone being sufficient for catalysis in some organisms (Altman and
Kirsebom, 1999; Pannucci et al., 1999). Both spliceosomal snRNAs (Darnell and
Doolittle, 1986; Gilbert & de Souza, 1999), and the snoRNAs involved in rRNA
processing (Poole et al., 1998, 1999, 2000) have been argued to date back to the RNA
world. An origin for tRNA (Maizels and Weiner, 1999), rRNA (Noller, 1999; Poole
et al., 1999) and mRNA (the 'introns first' hypothesis - see Poole et al., 1998, 1999)
prior to the evolution of protein synthesis is generally accepted. Strong arguments
have also been made for the origin of telomerase and telomeres in the RNA world
(Maizels and Weiner, 1999). Loss of a number of RNA traits, and reduction in RNA
processing in prokaryotes, is considered more consistent with the RNA world
hypothesis (Forterre, 1995a; Poole et al., 1999; Penny and Poole, 1999). As yet, no
examples of RNA replacing protein have been found (Poole et al., 1999; A. Poole and
D. Penny, in preparation).
⁴ Bacterial RNase P is also involved in processing of rRNA, 4.5S RNA (srpRNA) and tmRNA (Altman
& Kirsebom, 1999). Eukaryotic RNase MRP, often considered a snoRNA, is specific for rRNA
processing, carrying out the equivalent cleavage to that of bacterial RNase P on rRNA. On function
(Venema & Tollervey, 1999) and phylogeny (Collins et al., 2000), RNase P and MRP appear to
have a common origin.
Likewise, an argument for major features of the eukaryotic genome
architecture (introns, multiple origins of replication, redundancy, linear
chromosomes) being ancestral is well-developed (Poole et al., 1999), and supported
by theoretical studies on the evolution of early genetic systems (Eigen and Schuster,
1979; Koch, 1984; Scheuring, 2000). Furthermore, the hypothesis that prokaryotes
(but not eukaryotes) underwent a period of thermoadaptation from a mesophilic
ancestor (Forterre, 1995a; Galtier et al., 1999) is consistent with circular genomes
being found only in these lineages. (The only apparent selection pressure for circular
genome architecture is its greater thermostability when compared with linear DNA
[Marguet and Forterre, 1994]). This is consistent with the argument that eukaryotic
telomerase RNA has its roots in the RNA world (Poole et al., 1999).
Several independent datasets thus point to the prokaryotes as being derived
from a LUCA that had a number of features now found only in modern eukaryotes
(Penny and Poole, 1999; Glansdorff, 2000). The concentration of putative RNA world
relics in the eukaryote nucleus means that assumptions as to its origins should be
reevaluated. The evolution of the nucleus needs to be considered in selective terms,
and the assumption that it arose in the eukaryotes after they split from the prokaryote
lineages must be relaxed. The absence of the nucleus in prokaryote lineages might
equally be a result of adaptive processes (through reduction), so selection scenarios
for both gain and loss of such a structure need to be considered if progress is to be
made on this problem. In the following sections, I will outline possible selection
pressures for the origin of the nuclear envelope, and for its later loss in the lineages
that ultimately gave rise to modern prokaryotes.
Nuclear Envelope-like structure for the first cells? 
The RNA world theory represents the most ancient period in the evolution of
life that can be reached using the 'top-down' approach, that is, working from extant
biochemistry back towards the origin of life. This period is still far removed from the
first steps toward life which have been established via the 'bottom-up' approach taken
by prebiotic chemists. As Joyce and Orgel (1999) have pointed out, the 'Molecular
Biologist's dream' is the 'Prebiotic Chemist's nightmare', with the latter group
favouring one or more alternative genetic systems as intermediates between the origin
of life, and the emergence of RNA (Joyce & Orgel, 1999; Maurel & Decout, 1999;
Shapiro, 1999; Nelson et al., 2000).
Added to the problems facing prebiotic chemists trying to understand the
origin of life and later emergence of an RNA world is that of metabolism. RNA relics
shed almost no light on essential biosyntheses, or possible energy sources for the first
living entities (Jeffares et al., 1998). It is currently considered more likely that such
prebiotic processes were carried out on two-dimensional surfaces than in a prebiotic
soup. Surface chemistry avoids the problems of low concentration of precursors and
hydrolysis by water that are expected in a prebiotic soup (Wächtershäuser, 1990, 1992;
Maurel & Decout, 1999).
Whether or not life began on surfaces, it at some point became cellular, and
this is a major problem for origin of life scenarios. It is not clear whether, by the time
RNA arose, life had become cellular; there is no evidence for or against this
possibility. However, a major problem with cellularisation is that a simple lipid
bilayer closed in on itself is largely impermeable, and does not seem to be a likely
intermediate in the evolution of modern cells. The advantage of cellular
compartmentation lies in the concentration of substrates, products, and a genetic
apparatus, but a cell must also be 'leaky'. That is, it must allow waste out, and nutrients in.
Leakiness in the broadest sense has the disadvantage of leaving the cell
completely at the mercy of the surrounding environment, so that a change in
osmolarity can potentially pop the cell. Modern cells have a sophisticated system of
pumps, gates and channels for regulating the concentration of protons and various
ions within the cell, which provides an effective way to buffer the cell from changes
in the external environment, and to allow nutrients in, and waste out.
A potential link between surface metabolism and cells is the semicell
(Wächtershäuser, 1992; Maynard Smith and Szathmáry, 1995), which may help bridge
the gap between the two. A semicell would allow its contents to
interact with its environment (the surface) without the requirement of a leaky
membrane. However, this model must assume that the nutrients at the surface are
replenished through diffusion, as it is not clear how the semicell could have divided,
and without movement, it would simply use up all the resources at a given site.
This issue aside, in moving from a hypothetical semicell (or some equivalent
structure) to a cell there is nevertheless the requirement for a 'leaky cell' stage in the
origin of the cell, that is, a cell that could interact with the external environment. A
cell with pores may have been a necessary precursor to the modern cell.
Membrane pores are formed from proteins that traverse the lipid bilayer,
requiring a hydrophobic region that traverses the bilayer, and a hydrophilic centre,
through which ions and small molecules can pass, as well as hydrophilic extremities,
on either side of the bilayer. Gram-negative bacteria have two membranes, separated
by a periplasmic space. The inner membrane contains most of the well-known pumps
and transporters, while the outer membrane is best described as leaky, owing to the
presence of porins, such as OmpF. As homotrimers, these form pores which allow
hydrophilic molecules up to 600 Da to diffuse freely between the extracellular
environment and the periplasmic space, though there are also those which facilitate
uptake of specific metabolites (Koebnik et al., 2000). The minimum requirement for a
transmembrane protein is an outer hydrophobic surface and an inner hydrophilic
surface, and for a pore to be opened up. OmpF from E. coli, just such a protein, is
composed of 16 β-strands, producing a hydrophilic channel through the outer
membrane (Cowan et al., 1992). However, the nuclear pore complex suggests there is
a simpler alternative to a transmembrane pore.
The nuclear envelope is unique among biological membranes. It is a double
membrane structure, as is found in Gram-negative bacteria and organelles, but the
difference is that the inner and outer membranes are continuous. Nuclear pores do not
traverse the lipid bilayer in the conventional sense (Goldberg and Allen, 1995). The
nuclear pore complex is made up of around 30 different nucleoporins in yeast and
around 50 in vertebrates, has a complex stoichiometry, and allows free diffusion of
molecules up to ~40 kDa (Keminer and Peters, 1999; Allen et al., 2000; Rout et al.,
2000; Shulga et al., 2000). It is anchored to the surrounding nuclear envelope by a
small number of transmembrane proteins (three different types in yeast; Rout et al.,
2000), but the pore itself does not punch through the lipid bilayer (Rout et al., 2000;
Wente, 2000).
My suggestion is that a lipid bilayer arrangement like that seen in the extant 
nuclear envelope may be a better candidate for the first cell membrane, since, in 
principle, it permits very simple pores. This is because the interaction between protein 
forming the pore and the membrane does not require hydrophobic and hydrophilic 
regions. While the integral membrane proteins of the nuclear pore complex are central 
to the modern structure, and no doubt produce a much more stable pore-membrane 
interaction, a pore could in principle be constructed without such proteins, relying 
only on hydrophilic interactions with the polar head groups at the lipid-solvent 
interface. Thus, with such a membrane architecture, a pore is possible through much
simpler chemical interactions than with transmembrane proteins.
Osmotic buffering. 
The structure of the nuclear envelope and the general architecture of the nuclear 
pore complex provide insights into the possible structure of the first pore-containing 
leaky cells, without the requirement for transmembrane proteins. It does not however 
explain the early origin of the nucleus, only the possible utility of a simplified version 
of the nuclear envelope architecture. 
A huge improvement on the leaky pore-containing cell, that would not require the
advent of complex ion transporters or other complex proteins for osmoregulation,
would be to have a second leaky membrane outside the first. This would provide a
buffer region between the cell core and the environment, thereby serving to reduce the
effect of small changes in osmotic pressure. For a roughly spherical cell, as the radius
increases linearly, the volume increases as the cube of the radius. Thus, the
'cytoplasm' has the potential to occupy a large volume, acting as a buffer region to the
cell core, even though some of the volume enclosed by the outer membrane is taken up
by the nucleus.
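The cube-law point above can be made concrete with a short calculation. This is a minimal sketch assuming an idealised geometry of two concentric spheres (an inner core and an outer membrane), which is a simplification introduced here for illustration; the function name and example radii are hypothetical.

```python
def buffer_fraction(core_radius, outer_radius):
    """Fraction of the total cell volume lying between core and outer membrane.

    Both compartments are treated as concentric spheres, so the 4/3*pi
    factors cancel and only the ratio of the radii matters.
    """
    if not 0 < core_radius <= outer_radius:
        raise ValueError("require 0 < core_radius <= outer_radius")
    return 1.0 - (core_radius / outer_radius) ** 3


# Hypothetical radii in arbitrary units: outer membrane at 1.5x, 2x and 3x
# the radius of the core.
for ratio in (1.5, 2.0, 3.0):
    share = buffer_fraction(1.0, ratio)
    print(f"outer/core radius = {ratio}: buffer is {share:.1%} of cell volume")
```

With the outer membrane at only twice the core radius, the buffer region already accounts for seven-eighths of the enclosed volume, so a modest increase in radius gives a disproportionately large buffer.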
Leakiness could likewise be regulated by the level of pore protein or the amount 
of lipid produced. The fewer pores and the larger the outer membrane, the less 
susceptible the cell is to osmotic pressure. Such regulation requires a mechanism for 
sensing pore density within the membrane, and I do not consider this likely in the 
earliest cells. 
The buffered region resulting from a second, outer, pore-containing membrane 
might also provide the cell core with a nutrient-containing region, but these nutrients 
could only be utilised through diffusion into the core, and it is not obvious that all 
available nutrients would diffuse into the core. The presence of this proximal source 
of nutrients might therefore drive the evolution of protein or RNA transport out of the 
core, thereby allowing nutrients in the buffer region to be metabolised. One point that 
is worth raising is that, in addition to the effect of concentration gradients in
determining the direction of diffusion, the rate of diffusion is inversely proportional to
the square root of the molecular weight of the molecule (Graham's Law of diffusion). 
Thus, with large molecules and a shallow gradient, diffusion will be much slower than 
for small molecules and a steep gradient. 
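To give a feel for the size of this effect, the inverse-square-root relationship stated above can be evaluated directly. The sketch below takes that relationship at face value and compares molecular weight only; it does not model gradients, pore size or solvent effects. The function name is hypothetical, the ATP value is an approximate molecular weight, and the 50 kDa figure is an arbitrary stand-in for a large macromolecule.

```python
import math


def relative_diffusion_rate(mw_a, mw_b):
    """Diffusion rate of molecule A relative to molecule B.

    Uses only the rule that rate is proportional to 1/sqrt(molecular weight),
    so rate_A / rate_B = sqrt(mw_B / mw_A).
    """
    if mw_a <= 0 or mw_b <= 0:
        raise ValueError("molecular weights must be positive")
    return math.sqrt(mw_b / mw_a)


ATP_DA = 507.0          # approximate molecular weight of ATP, in Da
LARGE_DA = 50_000.0     # hypothetical 50 kDa macromolecule (assumed value)

ratio = relative_diffusion_rate(ATP_DA, LARGE_DA)
print(f"ATP is predicted to diffuse about {ratio:.1f} times faster")
```

On this rule alone, a hundredfold difference in molecular weight translates into roughly a tenfold difference in rate, which is why storing energy in a small molecule would favour faster movement between compartments.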
With use of metabolites such as ATP (in energy storage and nucleic acid
synthesis) in the cell core, production of further ATP in the outer region would
presumably result in diffusion into the core, where concentration is lower due to use.
Furthermore, following Graham's Law, breakdown of large-sized nutrients with
storage of the energy in a small molecule would result in faster diffusion because of
size. That said, a higher concentration in the buffer, with lower concentrations in both
the core and the external milieu, would also result in metabolite loss from the leaky
cell into the surrounding milieu. The development of better regulation of diffusion at
the outer membrane would therefore be selectively advantageous not only because of
improved buffering against fluctuating osmotic pressure, but also because of reduced
nutrient/energy loss.
Transport of enzymes from the core to the buffer would presumably require 
transport of a diverse range of enzymes of varying sizes. While some would be 
effectively transported, others may not be transported at all. However, under general
selection for transport of proteins and RNA into the buffer, if the components of the 
translation apparatus became transported, this would create a situation equivalent to 
the transport of all proteins to the site of metabolism. It would however require 
mRNA transport, and also create the opposite problem, in that proteins required in the 
core would need to be transported from the site of synthesis back into the core. 
Such transport would probably not be 100% efficient, such that the translation
apparatus would be present in both compartments, as would proteins and RNAs, some
of which would be produced in the compartment where they were utilised, and others
of which would be superfluous to the functioning of the compartment. Such doubling up,
resulting from inefficient mRNA and protein transport, would set the stage for
selection of successively more directional and specific RNA and protein transport
pathways, as are seen in modern eukaryotic cells (Nakielny & Dreyfuss, 1999).
Selection for translation in what is now the cytoplasm was presumably stronger than
selection for nuclear translation.
The above discussion is speculative, but important in that it suggests a selection 
pressure for the localisation of translation in the cytoplasm. Viewing the cytoplasm as 
a buffer also suggests an important relationship between diffusion and active transport 
of proteins and RNA, the latter being selected for in that metabolism in the cytoplasm 
can potentially produce an artificial gradient, directing nutrients to the nucleus. It also 
implicitly involves proteins, and the possible relationship of such a scenario to the 
RNA world has not been examined. Before I do so, I shall first discuss the difference 
between the cytoplasmic and nuclear membranes. 
The cytoplasmic membrane. 
In arguing for the possibility that the nuclear envelope represents a relic of an 
ancient strategy for cellularisation, and that cells may have evolved a
nucleus-cytoplasm form of compartmentation very early in the evolution of life, it is also
necessary to address the nature of the extant cytoplasmic membrane. If I am to argue 
that the structure of the nuclear envelope was common to both the ancestral nuclear 
and cytoplasmic membranes, the theory must also explain how the cytoplasmic 
membrane of modern eukaryotes arose, which, structurally, is a conventional lipid 
bilayer, and not an envelope. 
As protein synthesis became more accurate, hydrophobic transmembrane 
proteins would have become possible, and pores would be selected against in the 
outer cytoplasmic envelope, since these are leaky. Furthermore, I assume that an 
envelope is more prone to disruption, since the interactions between pore and the 
polar groups at the membrane surface can be disrupted by competing interactions with 
other molecules. In contrast, hydrophobic interactions are not easily disrupted in 
aqueous solution, so a single lipid bilayer with hydrophobic transmembrane proteins 
is expected to be a more robust architecture for the outer cell membrane. Indeed, the 
presence of anchoring proteins in the modern nuclear pore complex suggests this 
provides a stronger interaction between pore and membrane, reducing the possibility 
of structural disintegration at the interface between pore and bilayer. Such an
arrangement could potentially have arisen early, and from this may have come the
transmembrane proteins of the cytoplasmic membrane, though there is no evidence to
support such conjecture.
What is harder to explain is the persistence of an envelope structure in the
nucleus. And indeed, this is as problematic whether one argues for a late origin, as per
the standard model, or for an early origin, as is being considered here. The continuity
of the nuclear envelope with the endoplasmic reticulum is one possible issue to
examine. Is the endoplasmic reticulum only possible because of the nuclear envelope,
or vice versa? Gupta & Golding (1996) have drawn a schematic that suggests the two
membrane systems arose from the invaginations of the host, with the endosymbiont
eventually losing its membrane. Their imaginative picture goes some way toward
describing how the two membranes could form from an endosymbiotic event, but
does not explain the origin of the nuclear pore complex, nor the function of the
endoplasmic reticulum.
The origin of the endoplasmic reticulum is as problematic as the origin of the 
nuclear envelope, and I do not address this question here. With regard to the 
persistence of a nuclear envelope structure in eukaryotes, this is likewise difficult to 
establish. The above suggestion that there was selection for a robust outer
(cytoplasmic) membrane, owing to its susceptibility to disruption, would not necessarily
apply to the nucleus. Under the model described in this paper, the reason for the
persistence of the nuclear envelope in the eukaryotic cell is not clear. In light of this,
the best option is to apply
the neutral theory (Stoltzfus, 1999), and argue that there was simply no selection to
remove this structure, and over time, it would have become essential simply because 
other functions revolved around its presence. 
An RNA cell? 
While the minimal requirements for the formation of a cell are hard to ascertain
(Szostak et al., 2001), one major issue is whether a pre-protein cell is at all feasible. It
is not likely that RNA could form pores that traverse the lipid bilayer, since this
requires both hydrophobic and hydrophilic moieties. Highly modified nucleosides
have nevertheless been shown to have pore-forming capabilities via a G-quartet
structure (Forman et al., 2000), but there is no evidence such modifications were part
of the RNA world repertoire, and while these studies do not make use of RNA
derivatives, this could probably be achieved.
More tantalisingly, Khvorova et al. (1999) carried out in vitro selection
experiments to screen for RNAs that bound phospholipid membranes, and
subsequently examined their ability to alter phospholipid membrane permeability.
Their experiments revealed that RNAs can increase the ion permeability of
phospholipid bilayers. While RNA is predicted to readily interact with the polar head
group and glycerol phosphate moieties of phospholipids, it is less obvious that RNA
could traverse the bilayer. However, interactions between RNA and hydrocarbons, in
the form of the side chains of valine and isoleucine, have previously been
demonstrated (Majerfeld & Yarus, 1994; 1998). These interactions were mediated by
specific hydrophobic pockets within the RNAs, thereby adding hydrophobic
chemistry to the list of RNA chemistries.
It is known that 2'-O-ribose methylation is found in all three domains of life, and
likely dates back to the RNA world (Poole et al., 2000). Complete 2'-O-ribose
methylation of double-stranded RNA produces a hydrophobic cushion in the deep
groove of the helix (Popenda et al., 1997; Adamiak et al., 1997). Ribose methylation
could therefore be a potential means of producing RNAs with hydrophobic moieties,
and might be one direction to take in building upon Majerfeld & Yarus's work on
RNA hydrophobicity. Nevertheless, it remains unclear whether such interactions
could be sufficient to form a transmembrane pore, and I favour the possibility of a
non-hydrophobic, non-transmembrane RNA pore.
Given that a membrane structure like the nuclear envelope in principle permits 
pores without the requirement for these to traverse the lipid bilayer, the naturally­
occurring G-quartet structure taken on by RNAs such as telomerase RNA 
(Williamson et al., 1989) may be the strongest link to a cellular RNA world. When 
stacked, G-quartets produce a pore-like structure that has even been shown to permit 
ion diffusion (Gilbert and Feigon, 1 999; Hud et al. ,  1 999). The pore does not need to 
be formed solely from G residues, and can include Us. The structure is simple, can 
self-assemble, and should be capable of interacting with the surface of a lipid bilayer. 
Importantly, the plausibility of a nuclear envelope-like membrane with RNA pores
can be tested in vitro. 
Loss of the Nucleus from prokaryotes? 
In arguing for the antiquity of the overall arrangement of the eukaryote cell, 
that is, a nuclear envelope and a cytoplasmic membrane, the absence of the nuclear 
envelope from prokaryotes must also be accounted for. In eukaryotes, transcription 
and RNA processing occur in the nucleus (Lewis and Tollervey, 2000), while 
translation occurs in the cytoplasm. If this division of processes is ancestral, an 
argument for the loss of this structure is possible under Forterre's thermoreduction 
hypothesis (Forterre, 1995a) and/or under r selection.
The general argument for loss of eukaryote structures or processes has been 
developed extensively elsewhere (Forterre, 1995a; Poole et al., 1998, 1999, 2001), so
I will only provide a brief treatment here. Relative to eukaryotes, prokaryotes can be
considered r-selected (Carlile, 1982; Poole et al., 1999, 2001). In short, prokaryotes
display a fast response to the presence of a nutrient, this response involving activation 
of gene expression for metabolising the nutrient, and subsequent entry into 
exponential growth. Nutrient availability fluctuates in the environment, so there is 
selection for fast metabolism upon detection, and fast doubling times: those
organisms that proliferate fastest will tend to win out over slower competitors. One
consequence of such competition is that increases in the rate of gene expression are
expected to be selectively advantageous (Poole et al., 2001). If the ancestor of modern
prokaryotes expressed genes in a similar way to modern eukaryotes, the expectation 
would be that there would be strong selection for loss of the nuclear envelope. In 
modern eukaryotes, a transcript is synthesised, spliced, capped and polyadenylated, 
undergoes a quality control check for damage (Ibba and Söll, 1999), and translation
occurs after transport across the nuclear membrane. In bacteria, translation begins 
while the transcript is still being synthesised, processing events are minimal, and 
quality control is skipped altogether, with damaged mRNAs being translated anyway. 
mRNA damage causes a ribosome to stall, and this is released via a specific 
mechanism involving tmRNA (Ibba and Söll, 1999). Under r selection, there would
be selection for faster rates of gene expression (Poole et al., 1998, 1999). If the
nucleus were ancestral, loss would have been advantageous in that it would have 
removed a step in the gene expression pathway, speeding the response time, and 
therefore enabling faster protein production. This would have been important not only 
for speeding response times, but would also have permitted faster cell division, 
assuming protein synthesis was a rate-limiting step in this process. 
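The competitive argument above can be illustrated with a minimal numerical sketch. It assumes simple exponential growth with no resource limitation, and the function name and doubling times are hypothetical values chosen for illustration only, not measurements from this work. Under these assumptions the ratio of a faster to a slower lineage grows as exp((r_fast - r_slow) t), where each rate r is ln 2 divided by the doubling time.

```python
import math


def fast_fraction(doubling_fast_min, doubling_slow_min, elapsed_min,
                  initial_fraction=0.5):
    """Fraction of a mixed population made up of the faster lineage.

    Assumes unconstrained exponential growth for both lineages
    (an illustrative simplification, not a model from the thesis).
    initial_fraction must lie strictly between 0 and 1.
    """
    r_fast = math.log(2) / doubling_fast_min  # growth rate per minute
    r_slow = math.log(2) / doubling_slow_min
    # The log of the fast:slow ratio increases linearly with time.
    log_ratio = (math.log(initial_fraction / (1 - initial_fraction))
                 + (r_fast - r_slow) * elapsed_min)
    return 1 / (1 + math.exp(-log_ratio))


if __name__ == "__main__":
    # Hypothetical doubling times of 20 and 30 minutes, equal starting numbers.
    for hours in (0, 2, 5, 10):
        print(hours, round(fast_fraction(20, 30, hours * 60), 4))
```

With these illustrative numbers the faster lineage completes 30 doublings in ten hours against 20 for the slower one, so it outnumbers its competitor by 2^10 to 1. Even a modest shortening of the gene expression pathway would therefore be strongly favoured if it translated into a shorter doubling time.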
The thermoreduction hypothesis (Forterre, 1995a) is that prokaryote lineages
underwent a period of adaptation to high temperature environments which resulted in
reduction in thermolabile traits. Single-stranded RNA is known to be thermolabile
(see Forterre, 1995a), so the loss of an ancestral nuclear envelope can also be argued
from the viewpoint that this would shorten the time between mRNA synthesis and 
protein synthesis, and thus reduce the chance of transcript thermodegradation. 
Thermoreduction and r selection can both account for the reduction of RNA world
relics from prokaryotes, and the two cannot be distinguished, either on the reduction
of relic RNAs or on the possible loss of the nuclear envelope from prokaryote lineages
(through reductive evolution). Again, what is important here is not whether this scenario is
correct but that the extreme position can be argued, and that this is consistent with 
other explanations for the origin and evolution of eukaryotes and prokaryotes. 
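The thermodegradation argument can be sketched in the same spirit. The sketch assumes that transcript decay is first order, so that the intact fraction halves with each half-life. That kinetic form, the function name and the numerical values are all assumptions made for illustration and are not taken from Forterre (1995a).

```python
def intact_fraction(delay_s, half_life_s):
    """Fraction of transcripts still intact after delay_s seconds.

    Assumes first-order decay with the given half-life
    (a hypothetical simplification used only for illustration).
    """
    return 0.5 ** (delay_s / half_life_s)


if __name__ == "__main__":
    # Hypothetical half-life of 60 s at elevated temperature.
    for delay in (0, 30, 60, 120):
        print(delay, intact_fraction(delay, 60))
```

Under this assumption, any step that shortens the interval between synthesis and translation of a transcript raises the fraction that is translated intact, and the benefit grows as the half-life shortens with temperature.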
Conclusions & outstanding problems. 
In this paper, I have examined a number of issues regarding the origin of the 
eukaryote nucleus. In addressing the areas that are currently problematic, I have
advanced an extreme viewpoint. This can be summarised as four separate 
conclusions: 
1. That the RNA world dataset suggests proto-eukaryotes were the host of the
endosymbiont that gave rise to hydrogenosomes and mitochondria.
2. That, given evidence that the prokaryote lineages have arisen through reductive
evolution, the LUCA may have had a nucleus.
3. That an argument can be made that the origin of the nucleus is concurrent with
the origin of the first cells, eliminating the problem of low permeability of lipid bilayers,
and at the same time, minimising the problems for an early leaky cell.
4. That pores similar to those formed by the modern nuclear pore complex could have
predated transmembrane pores, and that this architecture might even be feasible for an 
RNA world. 
That the conclusions can be treated separately is an important point. For 
instance, it may be accepted that the host for the forerunner to mitochondria and 
hydrogenosomes was nucleate, as per conclusion 1, without requiring that the nucleus
was a feature of the LUCA. Similarly, conclusions 3 and 4 may be of interest to 
understanding the origin of the first cells without requiring that conclusions 1 and 2 
are correct. 
The hypothesis I have advanced is necessarily speculative and, if previous 
attempts at this question are anything to go by, probably wrong. However, the value
of taking an evolutionary approach to understanding modern cell structure and 
function is that it ties together a wide range of experimental data and observation. 
Evolutionary hypotheses allow an explanation of unrelated phenomena within the 
context of a theory. An evolutionary approach also provides a framework for asking 
new questions that would otherwise not get asked. While evolutionary hypotheses are 
often not amenable to simple tests to prove or disprove them (there is no 'killer 
experiment' as Maizels and Weiner (1999) have put it), they can nevertheless advance
knowledge by providing a framework for understanding existing and subsequent
experimental results, and may even provide a novel way of choosing subsequent
experiments. The following points serve to illustrate that the same data often fit more
than one hypothesis, so it is worth being cautious. 
Reliance on the presence or absence of a trait in order to determine 'modern' and 
'ancestral', and therefore how these 'ancestral' creatures became 'modern' is nonsense 
if the theory has already assumed the direction of evolution. A good evolutionary 
theory should aim to identify potential selection pressures that can account for 
change, and hence might then help us establish in which direction evolution went. 
The hypothesis I present is worthwhile because it identifies a selection 
pressure for the origin of the nucleus-cytoplasm organisation of eukaryotic cells. This 
represents a significant departure from most treatments of the problem, which are 
largely descriptive in nature. Furthermore, having argued that the nucleus predates the 
divergence of eukaryotes, archaea and bacteria, I identify a clear selection for the loss 
of nuclear structure in the latter two groups. The theory also provides an explanation 
of the differences between eukaryotic and prokaryotic ultrastructure that is not at odds 
with the identification of the eukaryotic nucleus as the source of greatest diversity in 
RNA world relics. The 'nucleus first' hypothesis is consistent with both the 
thermoreduction hypothesis as an explanation for the origin of circular genomes in 
prokaryotes as an adaptation to high temperature, and the loss of RNA world relics 
from prokaryotes as a derived trait (Forterre, 1995a; Poole et al., 1999). The similarity
of the genome organisation of modern eukaryotes (linear with multiple chromosomes,
each with multiple origins of replication, and not haploid) with what is predicted from
theory to be a low-fidelity genome architecture able to withstand high error rates also
argues that this organisation never arose from a prokaryotic ancestor with a circular
genome (Poole et al., 1998; 1999). This, together with the RNA relic dataset, strongly
suggests that the nucleus was not an endosymbiont and cannot be readily understood 
as having an archaeal origin. Rather, the nucleus is currently best viewed as a 
candidate for the most ancient 'living fossil' of early evolution. The prokaryotic 
lineages are considered to have lost the nucleus in response to selection to reduce the 
time it takes to synthesise a protein, as well as to reduce the risk of mRNA 
degradation at elevated temperatures (Poole et al., 1998).
What constitutes a useful theory is the ability to continue to explain the 
existing data as well as new data that come to hand, where other theories fail to. In the 
current case, I have argued that this theory best explains the available data, without 
justifying the direction of evolution based on a transition from simple to complex, but 
rather in terms of selection. The theory may be replaced in time, but it succeeds over 
current theories because it is better able to explain the available data in an 
evolutionary context. The possibility of a leaky nuclear envelope with G-quartet 
structures acting as pores is experimentally testable. Crucially, the hypothesis 
describes a selection pressure for the origin of a nucleus-like organisation early in the 
evolution of the cell, and is the first serious attempt to address the structure of the
nuclear envelope in the context of the origins of this structure. Whether pores were 
provided by G-quartets or not, I suggest that the continuous double membrane system 
is a likely structure for an early cell membrane since it does not require amphipathic 
membrane-spanning proteins for pore production. Channel-shaped or leaky basic 
proteins are all that is required in this role as pore-forming proteins only ever come 
into contact with the polar head groups of the membrane surface. 
A final question that should be addressed is the function of additional 
membranes in organisms such as members of the Pirellula (Fuerst and Webb, 1991;
Lindsay et al., 1997) and the Gram-negative bacteria (Gupta, 1998). In the case of the
Pirellula, it will be important to establish what the role of the nucleus-like
compartmentation is, and whether there are any similarities with the eukaryote
nucleus over and above the two-membrane structure seen in P. marina and P. staleyi
(Lindsay et al., 1997). This may open up the possibility for detailed understanding of
the structure, function and evolution of such ultrastructure, and progress with these 
organisms may also shed new light on the question of the selection pressure that gave 
rise to the nucleus. 
Postscript. 
This work is very much a work in progress. The argument that the origin of 
the nuclear envelope is concurrent with the first cells is particularly interesting in light 
of Blobel's (1980) concept of an inside-out cell, which has been extended by
Cavalier-Smith (1987). The latter author has argued that the first cells were Gram-negative
bacteria, with inside-out cells containing cell wall material in their lumen, allowing 
them to bend. Eventually, these would have bent round on themselves, to form double 
membrane-bound cells (Cavalier-Smith, 1987). This looks good on paper, but there
are two difficulties. First, what selection is there for bending in the first place?
Second, and of greater interest, how would a cell that closed in on itself have interacted
with its environment? A double membrane with a cell wall between the two
membranes would have been a particularly impermeable structure!
Blobel's idea is nevertheless interesting in light of the recent suggestions that 
many problems for the RNA world could in principle be solved by a 'lipid world' 
(Luisi et al., 1999; Segre and Lancet, 2000; Segre et al., 2001). Excitingly, the
biosynthesis of phospholipids involves activated precursors consisting of a nucleotide
moiety and a hydrophobic moiety (e.g. CDP-diacylglycerol in phosphatidyl serine and
phosphatidyl inositol synthesis, CDP-choline in phosphatidyl choline synthesis,
CDP-ethanolamine in phosphatidyl ethanolamine synthesis, and addition of sugar residues
such as UDP-glucose, UDP-galactose). Not only are these molecules potentially
synthesisable under prebiotic conditions (Rao et al., 1982, 1987; Mar et al., 1987),
they provide a possible link to the RNA world since nucleotide cofactors are arguably
relics of the RNA world (White, 1976). The possibility that genetic information was
initially encoded by heterogeneous lipid vesicles (Segre and Lancet, 2000) is in itself
interesting, with progress on autocatalytic self-replicating micelles and vesicles
central to the feasibility of a lipid world (Bachmann et al., 1992; Veronese and Luisi,
1998).
What is exciting for the origin of the RNA world and the link between surface 
and cell metabolism is that hypothesised genetic take-over by RNA does not require 
invoking intermediate steps for which there are no identifiable relics (e.g. clay 
surfaces or PNA). Phosphate chemistry is common to both phospholipids and RNA 
and phosphate chemistry is considered to have played a central role in the emergence 
of life (Westheimer, 1987; Baltscheffsky, 1997). Moreover, not only can
phosphatidylnucleosides self-assemble to form vesicles, they can in principle permit
Watson-Crick base pairing via the lipid head groups possessing nucleosides (Berti et
al., 1998). Not only do the head groups permit a link between lipid and RNA worlds,
lipids are able to carry out catalyses (see Segre et al., 2001), and the surface of a lipid
bilayer would provide a two-dimensional surface for sequestering molecules, as per 
other surface scenarios. 
The inside-out cell might thus have initially provided a surface on which to 
carry out catalysis. Cooperation between such cells would lead to aggregations of 
inside-out cells, and these would form the basis of a 'pseudocell', where the 
'cytoplasm' represents an inner compartment which resembles the nucleus (minus 
nuclear pores) in structure. This can explain the transition from surface to cell without 
invoking a semicell, and would also account for the unique structure of the nuclear 
envelope. A leaky membrane would not initially be at issue, as surface chemistry 
would initially dominate, and pores could form without the requirement for spanning 
a membrane. I plan to develop this idea (and those described in the latter sections of 
the paper) more thoroughly, having represented only the broad concepts here. 
References. 
Adamiak, D.A., Milecki, J., Popenda, M., Adamiak, R.W., Dauter, Z. and
Rypniewski, W.R. (1997) Crystal structure of 2'-O-Me(CGCGCG)2, an RNA
duplex at 1.30 Å resolution. Hydration pattern of 2'-O-methylated RNA. Nucleic
Acids Res 25, 4599-4607.
Akhmanova, A., Voncken, F., van Alen, T., van Hoek, A., Boxma, B., Vogels, G.,
Veenhuis, M. and Hackstein, J.H.P. (1998) A hydrogenosome with a genome.
Nature, 396, 527-528.
Albery, W.J. and Knowles, J.R. (1976) Evolution of enzyme function and the
development of enzyme efficiency. Biochemistry 15, 5631-5640.
Allen, J.F. and Raven, J.A. (1996) Free-radical-induced mutation vs redox regulation:
costs and benefits of genes in organelles. J. Mol. Evol. 42, 482-492.
Allen, T.D., Cronshaw, J.M., Bagley, S., Kiseleva, E. and Goldberg, M.W. (2000)
The nuclear pore complex: mediator of translocation between nucleus and
cytoplasm. J. Cell. Sci., 113, 1651-1659.
Altman, S. and Kirsebom, L.A. (1999) Ribonuclease P. In: The RNA World, 2nd Edn.
Gesteland, R.F., Cech, T.R. and Atkins, J.F., eds. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York. pp. 351-380.
Andersson, D.I. and Hughes, D. (1996) Muller's ratchet decreases fitness of a
DNA-based microbe. Proc. Natl. Acad. Sci. USA, 93, 906-907.
Andersson, J.O. and Andersson, S.G.E. (1999a) Insights into the evolutionary process
of genome degradation. Curr. Opin. Genet. Dev., 9, 664-671.
Andersson, J.O. and Andersson, S.G.E. (1999b) Genome degradation is an ongoing
process in Rickettsia. Mol. Biol. Evol., 16, 1178-1191.
Andersson, S.G.E. and Kurland, C.G. (1999) Origins of mitochondria and
hydrogenosomes. Curr. Opin. Microbiol., 2, 535-541.
Andersson, S.G.E., Zomorodipour, A., Andersson, J.O., Sicheritz-Ponten, T.,
Alsmark, U.C.M., Podowski, R.M., Naslund, A.K., Eriksson, A.-S., Winkler, H.H.
and Kurland, C.G. (1998) The genome sequence of Rickettsia prowazekii and the
origin of mitochondria. Nature, 396, 133-140.
Angert, E.R., Clements, K.D. and Pace, N.R. (1993) The largest bacterium. Nature,
362, 239-241.
Bachmann, P.A., Luisi, P.L. and Lang, J. (1992) Autocatalytic self-replicating micelles
as models for prebiotic structures. Nature, 357, 57-59.
Baltscheffsky, H. (1997) Major "anastrophes" in the origin and early evolution of
biological energy conversion. J. Theor. Biol., 187, 495-501.
Berg, O.G. and Kurland, C.G. (2000) Why mitochondrial genes are most often found
in nuclei. Mol. Biol. Evol., 17, 951-961.
Berti, D., Baglioni, P., Bonaccio, S., Barsacchi-Bo, G. and Luisi, P.L. (1998) Base
complementarity and nucleoside recognition in phosphatidylnucleoside vesicles. J.
Phys. Chem. B, 102, 303-308.
Biagini, G.A. and Bernard, C. (1999) Primitive anaerobic protozoa: a false concept?
Mol. Microbiol., 146, 1019-1020.
Blanchard, J.L. and Lynch, M. (2000) Organellar genes: why do they end up in the
nucleus? Trends Genet., 16, 315-320.
Blobel, G. (1980) Intracellular protein topogenesis. Proc. Natl. Acad. Sci. USA, 77,
1496-1500.
Carlile, M.J. (1982) Prokaryotes and eukaryotes: strategies and successes. Trends
Biochem. Sci., 7, 128-130.
Cavalier-Smith, T. (1983) A six-kingdom classification and a unified phylogeny.
Endocytobiol., 2, 1027-1034.
Cavalier-Smith, T. (1987) The origin of cells: a symbiosis between genes, catalysts,
and membranes. Cold Spring Harb. Symp. Quant. Biol., 52, 805-824.
Cavalier-Smith, T. (1988) Origin of the cell nucleus. BioEssays 9, 72-78.
Cavalier-Smith, T. (2000) Membrane heredity and early chloroplast evolution. Trends
Plant Sci., 5, 174-182.
Clements, K.D. and Bullivant, S. (1991) An unusual symbiont from the gut of
surgeonfishes may be the largest known prokaryote. J. Bact., 173, 5359-5362.
Cole, S.T., Eiglmeier, K., Parkhill, J., James, K.D., Thomson, N.R., Wheeler, P.R.,
Honore, N., Garnier, T., Churcher, C., Harris, D., Mungall, K., Basham, D.,
Brown, D., Chillingworth, T., Connor, R., Davies, R.M., Devlin, K., Duthoy, S.,
Feltwell, T., Fraser, A., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Lacroix,
C., Maclean, J., Moule, S., Murphy, L., Oliver, K., Quail, M.A., Rajandream, M.-A.,
Rutherford, K.M., Rutter, S., Seeger, K., Simon, S., Simmonds, M., Skelton, J.,
Squares, R., Squares, S., Stevens, K., Taylor, K., Whitehead, S., Woodward, J.R.,
Barrell, B.G. (2001) Massive gene decay in the leprosy Bacillus. Nature 409,
1007-1011.
Collins, L.J., Moulton, V. and Penny, D. (2000) Use of RNA secondary structure for
studying the evolution of RNase P and RNase MRP. J. Mol. Evol., 51, 194-204.
Cowan, S.W., Schirmer, T., Rummel, G., Steiert, M., Ghosh, R., Pauptit, R.A. et al.
(1992) Crystal structures explain functional properties of two E. coli porins.
Nature, 358, 727-733.
Darnell, J.E. and Doolittle, W.F. (1986) Speculations on the early course of evolution.
Proc. Natl. Acad. Sci. USA, 83, 1271-1275.
Doolittle, W.F. (1999) Phylogenetic classification and the universal tree. Science, 284,
2124-2128.
Douglas, S., Zauner, S., Fraunholz, M., Beaton, M., Penny, S., Deng, L.-T., Wu, X.,
Reith, M., Cavalier-Smith, T. and Maier, U.-G. (2001) The highly reduced genome
of an enslaved algal nucleus. Nature 410, 1091-1096.
Eigen, M. and Schuster, P. (1979) The hypercycle: a principle of natural
self-organization. Springer-Verlag, Berlin.
Embley, T.M. and Hirt, R.P. (1998) Early branching eukaryotes? Curr. Opin. Genet.
Dev., 8, 624-629.
Fishelson, L., Montgomery, W.L. and Myrberg, Jr., A.A. (1985) A unique symbiosis
in the gut of tropical herbivorous surgeonfish (Acanthuridae: Teleostei) from the
Red Sea. Science, 229, 49-51.
Flügge, U.-I. (2000) Transport in and out of plastids: does the outer envelope
membrane control the flow? Trends Plant Sci., 5, 135-137.
Forman, S.L., Fettinger, J.C., Pieraccini, S., Gottarelli, G., Davis, J.T. (2000) Toward
artificial ion channels: a lipophilic G-quadruplex. J. Am. Chem. Soc. 122,
4060-4067.
Forterre, P. (1995a) Thermoreduction, a hypothesis for the origin of prokaryotes. C.R.
Acad. Sci. Paris III, 318, 415-422.
Forterre, P. (1995b) Looking for the most "primitive" organism(s) on Earth today: the
state of the art. Planet. Space Sci., 43, 167-177.
Forterre, P. (1997) Archaea: what can we learn from their sequences? Curr. Opin.
Genet. Dev., 7, 764-770.
Forterre, P. and Philippe, H. (1999) Where is the root of the universal tree of life?
Bioessays, 21, 871-879.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., Fleischmann,
R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley, J.M., Fritchman, J.L.,
Weidman, J.F., Small, K.V., Sandusky, M., Fuhrmann, J., Nguyen, D., Utterback,
T.R., Saudek, D.M., Phillips, C.A., Merrick, J.M., Tomb, J.-F., Dougherty, B.A.,
Bott, K.F., Hu, P.-C., Lucier, T.S., Peterson, S.N., Smith, H.O., Hutchison III, C.A.
and Venter, J.C. (1995) The minimal gene complement of Mycoplasma genitalium.
Science, 270, 397-403.
Fraser, C.M., et al. (1997) Genomic sequence of a Lyme disease spirochaete, Borrelia
burgdorferi. Nature 390, 580-586.
Fraser, C.M., et al. (1998) Complete genome sequence of Treponema pallidum, the syphilis
spirochaete. Science 281, 375-388.
Fuerst, J.A. and Webb, R.I. (1991) Membrane-bounded nucleoid in the eubacterium
Gemmata obscuriglobus. Proc. Natl. Acad. Sci. USA, 88, 8184-8188.
Galtier, N., Tourasse, N. and Gouy, M. (1999) A nonhyperthermophilic common
ancestor to extant life forms. Science 283, 220-221.
Gaspin, C., Cavaille, J., Erauso, G. and Bachellerie, J.P. (2000) Archaeal homologs of
eukaryotic methylation guide small nucleolar RNAs: lessons from the Pyrococcus
genomes. J. Mol. Biol., 297, 895-906.
Gilbert, W. and de Souza, S.J. (1999) Introns and the RNA world. In: The RNA
World, 2nd Edn. Gesteland, R.F., Cech, T.R. and Atkins, J.F., eds. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York. pp. 221-231.
Gilbert, D.E. and Feigon, J. (1999) Multistranded DNA structures. Curr. Opin. Struct.
Biol., 9, 305-314.
Gilson, P.R., Maier, U.-G. and McFadden, G.I. (1997) Size isn't everything: lessons in
genetic miniaturisation from nucleomorphs. Curr. Opin. Genet. Dev., 7, 800-806.
Glansdorff, N. (2000) About the last common ancestor, the universal life-tree and
lateral gene transfer: a reappraisal. Mol. Microbiol., 38, 177-185.
Goldberg, M.W. and Allen, T.D. (1995) Structural and functional organization of the
nuclear envelope. Curr. Opin. Cell Biol., 7, 301-309.
Gray, M.W., Burger, G. and Lang, B.F. (1999) Mitochondrial evolution. Science, 283,
1476-1481.
Grotzinger, J.P. and Rothman, D.H. (1996) An abiotic model for stromatolite
morphogenesis. Nature, 383, 423-425.
Gupta, R.S. (1998) What are archaebacteria: life's third domain or monoderm
prokaryotes related to Gram-positive bacteria? A new proposal for the
classification of prokaryotic organisms. Mol. Microbiol., 29, 695-707.
Gupta, R.S. and Golding, G.B. (1996) The origin of the eukaryotic cell. Trends
Biochem Sci. 21, 166-171.
Gupta, R.S. and Singh, B. (1994) Phylogenetic analysis of 70kD heat shock protein
sequences suggests a chimaeric origin for the eukaryotic nucleus. Curr. Biol., 4,
1104-1114.
Henze, K., Morrison, H.G., Sogin, M.L. and Müller, M. (1998) Sequence and
phylogenetic position of a class II aldolase gene in the amitochondriate protist,
Giardia lamblia. Gene, 222, 163-168.
Himmelreich, R., et al. (1996) Complete genome sequence analysis of the genome of the
bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 3, 109-136.
Horiike, T., Hamada, K., Kanaya, S. and Shinozawa, T. (2001) Origin of eukaryotic
cell nuclei by symbiosis of Archaea and Bacteria is revealed by homology-hit
analysis. Nat. Cell Biol. 3, 210-214.
Hud, N.V., Schultze, P., Sklenar, V. and Feigon, J. (1999) Binding sites and dynamics
of ammonium ions in a telomere repeat DNA quadruplex. J. Mol. Biol., 285,
233-243.
Ibba, M. and Söll, D. (1999) Quality control mechanisms during translation. Science,
286, 1893-1897.
Jain, R., Rivera, M.C. and Lake, J.A. (1999) Horizontal transfer among genomes: the
complexity hypothesis. Proc. Natl. Acad. Sci. USA, 96, 3801-3806.
Jeffares, D.C., Poole, A.M. and Penny, D. (1998) Relics from the RNA world. J. Mol.
Evol., 46, 18-36.
Joyce, G.F. and Orgel, L.E. (1999) Prospects for understanding the origin of the RNA
world. In: The RNA World, 2nd Edn. Gesteland, R.F., Cech, T.R. and Atkins, J.F.,
eds. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
pp. 49-77.
Kalman, S., Mitchell, W., Marathe, R., Lammel, C., Fan, J., Hyman, R.W., Olinger, L.,
Grimwood, J., Davis, R.W., Stephens, R.S. (1999) Comparative genomes of Chlamydia
pneumoniae and C. trachomatis. Nat. Genet., 21, 385-389.
Katz, L.A. (1998) Changing perspectives on the origin of eukaryotes. Trends Ecol. Evol., 13,
493-497.
Keeling, P.J. (1998) A kingdom's progress: archaezoa and the origin of eukaryotes.
Bioessays, 20, 87-95.
Keeling, P.J. and Doolittle, W.F. (1997) Evidence that eukaryotic triosephosphate
isomerase is of alpha-proteobacterial origin. Proc. Natl. Acad. Sci. USA, 94,
1270-1275.
Keeling, P.J. and McFadden, G.I. (1998) Origins of microsporidia. Trends Microbiol.,
6, 19-23.
Keminer, O. and Peters, R. (1999) Permeability of single nuclear pores. Biophysical
J., 77, 217-228.
Khvorova, A., Kwak, Y.-G., Tamkun, M., Majerfeld, I. and Yarus, M. (1999) RNAs
that bind and change the permeability of phospholipid membranes. Proc. Natl.
Acad. Sci. USA, 96, 10649-10654.
Koebnik, R., Locher, K.P. and Van Gelder, P. (2000) Structure and function of
bacterial outer membrane proteins: barrels in a nutshell. Mol. Microbiol., 37,
239-253.
Koch, A.L. (1984) Evolution vs the Number of Gene Copies Per Primitive Cell. J.
Mol. Evol. 20, 71-76.
Koonin, E.V., Mushegian, A.R. and Bork, P. (1996) Non-orthologous gene
replacement. Trends Genet., 12, 334-336.
Lake, J.A. and Rivera, M.C. (1994) Was the nucleus the first endosymbiont? Proc.
Natl. Acad. Sci. USA, 91, 2880-2881.
Lewis, J.D. and Tollervey, D. (2000) Like attracts like: getting RNA processing
together in the nucleus. Science, 288, 1385-1389.
Li, Y.-L. (1999) The primitive nucleus model and the origin of the cell nucleus.
Endocyt. Cell Res., 13, 1-86.
Liaud, M.-F., Lichtlé, C., Apt, K., Martin, W. and Cerff, R. (2000)
Compartment-specific isoforms of TPI and GAPDH are imported into diatom mitochondria as a
fusion protein: evidence in favor of a mitochondrial origin of the eukaryotic
glycolytic pathway. Mol. Biol. Evol., 17, 213-223.
Lindsay, M.R., Webb, R.I. and Fuerst, J.A. (1997) Pirellulosomes: a new type of
membrane-bounded cell compartment in planctomycete bacteria of the genus
Pirellula. Microbiology, 143, 739-748.
Ljungman, M. and Hanawalt, P.C. (1992) Efficient protection against oxidative DNA
damage in chromatin. Mol. Carcinog., 5, 264-269.
Lockhart, P.J., Steel, M.A., Barbrook, A.C., Huson, D.H., and Howe, C.J. (1998) A
covariotide model describes the evolution of oxygenic photosynthesis. Mol. Biol.
Evol. 15, 1183-1188.
Lopez, P., Forterre, P., le Guyader, H. and Philippe, H. (2000) Origin of replication of
Thermotoga maritima. Trends Genet., 16, 59-60.
Lopez, P., Philippe, H., Myllykallio, H. and Forterre, P. (1999) Identification of
putative chromosomal origins of replication in Archaea. Mol. Microbiol., 32,
883-886.
López-García, P. and Moreira, D. (1999) Metabolic symbiosis at the origin of
eukaryotes. Trends Biochem. Sci., 24, 88-93.
Lowe, D.R. (1994) Abiological origin of described stromatolites older than 3.2 Ga.
Geology 22, 387-390.
Luisi, P.L., Walde, P. and Oberholzer, T. (1999) Lipid Vesicles as Possible
Intermediates in the Origin of Life. Curr. Opin. Colloid Interface Sci., 4, 33-39.
Maizels, N. and Weiner, A.M. (1999) The genomic tag hypothesis: what molecular
fossils tell us about the evolution of tRNA. In: The RNA World, 2nd Edn.
Gesteland, R.F., Cech, T.R. and Atkins, J.F., eds. Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York. pp. 79-111.
Majerfeld, I. and Yarus, M. (1994) An RNA pocket for an aliphatic hydrophobe. Nat.
Struct. Biol., 1, 287-292.
Majerfeld, I. and Yarus, M. (1998) Isoleucine:RNA sites with associated coding
sequences. RNA, 4, 471-478.
Marguet, E. and Forterre, P. (1994) DNA stability at temperatures typical for
thermophiles. Nucleic Acids Res., 22, 1681-1686.
Margulis, L. (1970) Origin of eukaryotic cells. Yale University Press, New Haven.
Margulis, L., Dolan, M.F. and Guerrero, R. (2000) The chimeric eukaryote: Origin of
the nucleus from the karyomastigont in amitochondriate protists. Proc. Natl. Acad.
Sci. USA, 97, 6954-6959.
Martin, W., Brinkmann, H., Savona, C. and Cerff, R. (1993) Evidence for a chimeric
nature of nuclear genomes: eubacterial origin of eukaryotic
glyceraldehyde-3-phosphate dehydrogenase genes. Proc. Natl. Acad. Sci. USA, 90, 8692-8696.
Martin, W. and Müller, M. (1998) The hydrogen hypothesis for the first eukaryote.
Nature, 392, 37-41.
Martin, W. (1999a) Mosaic bacterial chromosomes: a challenge en route to a tree of
genomes. Bioessays, 21, 99-104.
Martin, W. (1999b) A briefly argued case that mitochondria and plastids are
descendents of endosymbionts, but that the nuclear compartment is not. Proc. R.
Soc. Lond. B, 266, 1387-1395.
Martin, W. (1999c) Primitive anaerobic protozoa: the wrong host for mitochondria
and hydrogenosomes? Mol. Microbiol. 146, 1021-1022.
Martin, W., Stoebe, B., Goremykin, V., Hansmann, S., Hasegawa, M. and Kowallik,
K.V. (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature,
393, 162-165.
Mar, A., Dworkin, J. and Oró, J. (1987) Non-enzymatic synthesis of the coenzymes,
uridine diphosphate glucose and cytidine diphosphate choline, and other
phosphorylated metabolic intermediates. Origins Life Evol. Biosph., 17, 307-319.
Maurel, M.-C. and Decout, J.-L. (1999) Origins of life: molecular foundations and
new approaches. Tetrahedron, 55, 3141-3182.
Maynard Smith, J. and Szathmary, E. (1995) The major transitions in evolution. W.H.
Freeman, Oxford.
McCord, J. (2000) The evolution of free radicals and oxidative stress. Am. J. Med.,
108, 652-659.
McFadden, G.I. (1999) Endosymbiosis and evolution of the plant cell. Curr. Opin.
Plant Biol., 2, 513-519.
Montgomery, W.L. and Pollak, P.E. (1988) Epulopiscium fishelsoni n. g., n. sp., a
protist of uncertain taxonomic affinities from the gut of an herbivorous reef fish. J.
Protozool., 35, 565-569.
Moran, N. and Baumann, P. (2000) Bacterial endosymbionts in animals. Curr. Opin.
Microbiol., 3, 270-275.
Moran, N.A. (1996) Accelerated evolution and Muller's Ratchet in endosymbiotic
bacteria. Proc. Natl. Acad. Sci. USA, 93, 2873-2878.
Moreira, D. and López-García, P. (1998) Symbiosis between methanogenic archaea
and delta-proteobacteria as the origin of eukaryotes: the syntrophic hypothesis. J.
Mol. Evol., 47, 517-530.
Myllykallio, H., Lopez, P., López-García, P., Heilig, R., Saurin, W., Zivanovic, Y.,
Philippe, H. and Forterre, P. (2000) Bacterial mode of replication with eukaryotic-like
machinery in a hyperthermophilic archaeon. Science, 288, 2212-2215.
Nakielny, S. and Dreyfuss, G. (1999) Transport of proteins and RNAs in and out of
the nucleus. Cell, 99, 677-690.
Nelson, K.E., Levy, M. and Miller, S.L. (2000) Peptide nucleic acids rather than RNA
may have been the first genetic molecule. Proc. Natl. Acad. Sci. USA, 97, 3868-3871.
Noller, H.F. (1999) On the origin of the ribosome: coevolution of subdomains of
tRNA and rRNA. In: The RNA World, 2nd Edn. Gesteland, R.F., Cech, T.R. and
Atkins, J.F., eds. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York. pp. 351-380.
Olson, M.O.J., Dundr, M. and Szebeni, A. (2000) The nucleolus: an old factory with
unexpected capabilities. Trends Cell Biol., 10, 189-196.
Omer, A.D., Lowe, T.M., Russell, A.G., Ebhardt, H., Eddy, S.R. and Dennis, P.P.
(2000) Homologs of small nucleolar RNAs in Archaea. Science, 288, 517-522.
Pannucci, J.A., Haas, E.S., Hall, T.A., Harris, J.K. and Brown, J.W. (1999) RNase P
RNAs from some Archaea are catalytically active. Proc. Natl. Acad. Sci. USA, 96,
7803-7808.
Patterson, D.J. (1999) The diversity of eukaryotes. Am. Nat., 154, S96-S124.
Penny, D. and Poole, A. (1999) The nature of the last universal common ancestor.
Curr. Opin. Genet. Dev., 9, 672-677.
Pereira, S.L. and Reeve, J.N. (1998) Histones and nucleosomes in Archaea and
Eukarya: a comparative analysis. Extremophiles, 2, 141-148.
Philippe, H., Germot, A. and Moreira, D. (2000a) The new phylogeny of eukaryotes.
Curr. Opin. Genet. Dev. 10, 596-601.
Philippe, H., Lopez, P., Brinkmann, H., Budin, K., Germot, A., Laurent, J., Moreira,
D., Müller, M. and Le Guyader, H. (2000b) Early branching or fast evolving
eukaryotes? An answer based on slowly evolving positions. Philos. Trans. R. Soc.
Lond. B 267, 1213-1221.
Poole, A., Jeffares, D. and Penny, D. (1999) Early evolution: prokaryotes, the new
kids on the block. Bioessays, 21, 880-889.
Poole, A. and Penny, D. (2001) Does endosymbiosis explain the origin of the
nucleus? Nat. Cell Biol. 3, E173.
Poole, A., Penny, D. and Sjöberg, B.-M. (2000) Methyl-RNA: an evolutionary bridge
between RNA and DNA? Chem. Biol. 7, R207-R216.
Poole, A.M., Jeffares, D.C. and Penny, D. (1998) The path from the RNA world. J.
Mol. Evol., 46, 1-17.
Poole, A.M., Phillips, M.J. and Penny, D. (2001) Prokaryote and eukaryote
evolvability. Biosystems, submitted.
Popenda, M., Biala, E., Milecki, J. and Adamiak, R.W. (1997) Solution structure of
RNA duplexes containing alternating CG base pairs: NMR study of r(CGCGCG)2
and 2'-O-Me(CGCGCG)2 under low salt conditions. Nucleic Acids Res. 25, 4589-4598.
Race, H.L., Herrmann, R.G. and Martin, W. (1999) Why have organelles retained
genomes? Trends Genet. 15, 364-370.
Rao, M., Eichberg, J. and Oró, J. (1982) Synthesis of phosphatidylcholine under
possible primitive earth conditions. J. Mol. Evol., 18, 196-202.
Rao, M., Eichberg, J. and Oró, J. (1987) Synthesis of phosphatidylethanolamine under
possible primitive earth conditions. J. Mol. Evol., 25, 1-6.
Razin, S., Yogev, D. and Naot, Y. (1998) Molecular biology and pathogenicity of
mycoplasmas. Microbiol. Mol. Biol. Rev., 62, 1094-1156.
Reanney, D.C. (1974) On the origin of prokaryotes. J. Theor. Biol., 48, 243-251.
Ribeiro, S. and Golding, G.B. (1998) The mosaic nature of the eukaryotic nucleus.
Mol. Biol. Evol., 15, 779-788.
Richter, C., Park, J.W. and Ames, B.N. (1988) Normal oxidative damage to
mitochondrial and nuclear DNA is extensive. Proc. Natl. Acad. Sci. USA, 85,
6465-6467.
Rivera, M.C., Jain, R., Moore, J.E. and Lake, J.A. (1998) Genomic evidence for two
functionally distinct gene classes. Proc. Natl. Acad. Sci. USA, 95, 6239-6244.
Rotte, C., Henze, K., Müller, M. and Martin, W. (2000) Origins of hydrogenosomes
and mitochondria. Curr. Opin. Microbiol., 3, 481-486.
Rout, M.P., Aitchison, J.D., Suprapto, A., Hjertaas, K., Zhao, Y. and Chait, B.T.
(2000) The yeast nuclear pore complex: composition, architecture, and transport
mechanism. J. Cell Biol., 148, 635-651.
Scheuring, I. (2000) Avoiding Catch-22 of early evolution by stepwise increase in
copying fidelity. Selection 1, 135-145.
Schopf, J.W. and Packer, B.M. (1987) Early Archean (3.3 billion to 3.5 billion year old)
microfossils from Warrawoona Group, Australia. Science, 237, 70-73.
Segré, D., Ben-Eli, D., Deamer, D.W. and Lancet, D. (2001) The lipid world. Origins Life
Evol. Biosph. 31, 119-145.
Segré, D. and Lancet, D. (2000) Composing life. EMBO Rep., 1, 217-222.
Shapiro, R. (1999) Prebiotic cytosine synthesis: a critical analysis and implications for
the origin of life. Proc. Natl. Acad. Sci. USA, 96, 4396-4401.
Shulga, N., Mosammaparast, N., Wozniak, R. and Goldfarb, D.S. (2000) Yeast
nucleoporins involved in passive nuclear envelope permeability. J. Cell Biol., 149,
1027-1038.
Sogin, M.L. (1997) History assignment: when was the mitochondrion founded? Curr.
Opin. Genet. Dev., 7, 792-799.
Sogin, M.L., Silberman, J.D., Hinkle, G. and Morrison, H.G. (1996) Problems with
molecular diversity in the eukarya. In: Evolution of Microbial Life. Roberts, D.M.,
Sharp, P., Alderson, G. and Collins, M., eds. Cambridge University Press,
Cambridge. pp. 167-184.
Soll, J., Bölter, B., Wagner, R. and Hinnah, S.C. (2000) ... response: The chloroplast
outer envelope: a molecular sieve? Trends Plant Sci., 5, 137-138.
Stoltzfus, A. (1999) On the possibility of constructive neutral evolution. J. Mol.
Evol., 49, 169-181.
Szostak, J.W., Bartel, D.P. and Luisi, P.L. (2001) Synthesizing life. Nature, 409, 387-390.
Vellai, T., Takács, K. and Vida, G. (1998) A new aspect to the origin and evolution of
eukaryotes. J. Mol. Evol., 46, 499-507.
Venema, J. and Tollervey, D. (1999) Ribosome synthesis in Saccharomyces
cerevisiae. Annu. Rev. Genet., 33, 261-311.
Veronese, A. and Luisi, P.L. (1998) An autocatalytic reaction leading to
spontaneously assembled phosphatidyl nucleoside giant vesicles. J. Am. Chem.
Soc., 120, 2662-2663.
Wächtershäuser, G. (1990) Evolution of the first metabolic cycles. Proc. Natl. Acad.
Sci. USA, 87, 200-204.
Wächtershäuser, G. (1992) Groundworks for an evolutionary biochemistry: the Iron-Sulphur
World. Prog. Biophys. Mol. Biol., 58, 85-201.
Walsh, M.M. (1992) Microfossils and possible microfossils from the Early Archean
Onverwacht Group, Barberton Mountain Land, South Africa. Precambrian Res.,
54, 271-293.
Wente, S.R. (2000) Gatekeepers of the nucleus. Science, 288, 1374-1377.
Westheimer, F.H. (1987) Why nature chose phosphates. Science, 235, 1173-1178.
White, H.B. (1976) Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol.,
7, 101-104.
Williamson, J.R., Raghuraman, M.K. and Cech, T.R. (1989) Monovalent cation-induced
structure of telomeric DNA: the G-quartet model. Cell 59, 871-880.
Woese, C.R. and Fox, G.E. (1977) Phylogenetic structure of the prokaryotic domain:
the primary kingdoms. Proc. Natl. Acad. Sci. USA, 74, 5088-5090.
Wren, B.W. (2001) Microbial genome analysis: insights into virulence, host
adaptation and evolution. Nat. Rev. Genet. 1, 30-39.
Yarus, M. (1999) Boundaries for an RNA world. Curr. Opin. Chem. Biol. 3, 260-267.
Table 1. Features that need to be explained, and issues that must be addressed in
examining the origins of the eukaryote nucleus.
1. Chimeric nature of the eukaryotic genome.
2. Absence of phagocytosis in bacteria and archaea.
3. The structure of the nuclear envelope.
4. The nuclear pore complex.
5. Nuclear export and import processes.
6. Disappearance of the nuclear membrane, but not of other organellar membranes,
during cell division in some eukaryotes.
7. The origin of meiosis/mitosis.
8. Eukaryotic linear chromosomes with multiple origins of replication and telomeres.
9. Preservation of RNA world relics in eukaryotes, and reduction in prokaryotes.
10. Coupled transcription and translation in prokaryotes compared with mRNA
splicing and processing in eukaryotes.
11. Any theory for the origins of the nucleus must also explain the absence of this
structure in prokaryotes.
Poole AM, Phillips MJ & Penny D. 
Prokaryote and eukaryote evolvability.
Biosystems (submitted). 
Paper 7 

Prokaryote and Eukaryote Evolvability. 
Anthony M Poole*, Matthew J Phillips & David Penny 
Institute of Molecular BioSciences
Massey University 
*Corresponding author. 
Private Bag 1 1222 
Palmerston North 
New Zealand 
Email:  a.m.poole@massey.ac.nz 
Fax: +64 6 350 5688 
Abbreviations: 
ESND: Evolutionarily-Stable Niche-Discontinuity 
PSF: Periodically-selected function 
LUCA: Last Universal Common Ancestor 
Paper 7: Poole, Phillips & Penny 
Abstract: 
The concept of evolvability covers a broad spectrum of often contradictory ideas. At one end
of the spectrum it is equivalent to the statement that evolution is possible; at the other end are
untestable post hoc explanations, such as the suggestion that current evolutionary theory
cannot explain the evolution of evolvability. We examine similarities and differences in
eukaryote and prokaryote evolvability, and look for explanations that are compatible with a
wide range of observations. Differences in genome organisation between eukaryotes and
prokaryotes meet this criterion. The single origin of replication in prokaryote chromosomes
(versus multiple origins in eukaryotes) accounts for many differences because the time to 
replicate a prokaryote genome limits its size (and the accumulation of junk DNA). Both 
prokaryotes and eukaryotes appear to switch from genetic stability to genetic change in 
response to stress. We examine a range of stress responses, and discuss how these impact on 
evolvability, particularly in unicellular organisms versus complex multicellular ones. 
Evolvability is also limited by environmental interactions (including competition) and we 
describe a model that places limits on potential evolvability. Examples are given of its 
application to predator competition and limits to lateral gene transfer. We suggest that 
unicellular organisms evolve largely through a process of metabolic change, resulting in 
biochemical diversity. Multicellular organisms evolve largely through morphological 
changes, not through extensive changes to cellular biochemistry. 
Keywords: 
evolvability, evolutionarily-stable niche-discontinuity, eukaryote evolution, genome 
evolution, prokaryote evolution. 
Introduction. 
Evolvability is a central concept in evolution but is easily misconstrued, hence its use
must be defined carefully. At a basic level, evolvability is the fundamental concept of
evolution. From the late 17th to mid-19th centuries it was generally assumed that species had
an unchangeable 'essence'. This Platonic concept was introduced in the late 17th century when
it became increasingly clear that continuing spontaneous generation of larger life forms did
not occur (see Farley 1977). If species had an unchangeable essence then, by definition, there
could be no evolution, even if individual organisms deviated from the 'ideal type'.
'Evolvability', by denying that species have an unchangeable essence, is central to evolution.
Since all evolutionists agree, this definition is not that interesting.
Burch and Chao (2000) offer a more limited definition, "the ability to generate 
adaptive mutations". We consider the two aspects of this definition: 'adaptive mutations' and 
'ability to generate'. That adaptive mutations occur is the evolvability concept from the
previous paragraph, but in modern terminology: some mutations are advantageous. In the
early 19th century many accepted selection, but only in elimination of deleterious variants. 
Selection, by eliminating such variants, tended to preserve the unchanging essence of the 
species. In contrast, the existence of adaptive variants and positive selection allows evolution 
through time and is an essential part of evolvability.
The 'ability to generate' adaptive mutations is more problematic, and is mirrored in 
Kirschner & Gerhart's (1998) definition: 'the capacity to generate [our emphasis] heritable,
selectable phenotypic variation'. If it is simply the observation that advantageous mutations
occur, then, again, the usage is uncontroversial, though uninteresting. If it implies that
advantageous mutations can be generated 'on demand' (e.g. Cairns et al. 1988) then it is a
specialised (and controversial) usage. Some discussions on evolvability appear to give the
impression of 'the more change the better', yet most major change is highly deleterious. For
instance, Radman et al. (1999) point out that selection for increased fidelity of DNA synthesis
has been achieved in the lab (Fijalkowska et al. 1993), and that this demonstrates 'there was
no durable selective pressure in nature for maximal fidelity'.
However, the majority of discussions on evolvability (e.g. Wagner & Altenberg 1996;
Wagner 1996; Kirschner & Gerhart 1998; Partridge & Barton 2000) acknowledge that directed
mutation is not required to understand evolvability. Nevertheless, confusion arises easily, as
shown by reactions to work from Lindquist's group (Rutherford & Lindquist 1998; True &
Lindquist 2000). Other workers concluded (Dickinson and Seger 1999; Partridge & Barton
2000) that these authors favoured the idea that certain traits have been selected for their utility
to contribute to organismal evolvability, and nothing else. While Lindquist points out that this
was never her interpretation (Lindquist 2000), the subsequent correspondence generated by
this work (Dickinson and Seger 1999; Partridge & Barton 2000; Dover 2000) illustrates how
problematic this concept can be. There is no agreed definition for evolvability that explicitly
avoids the problem of evolutionary forethought. Indeed, whenever the phrase 'the evolution
of evolvability' is used, there is the possibility of it being misconstrued. This is not because
evolvability cannot evolve through accepted processes of evolution. Rather, under known 
processes of Darwinian evolution, evolvability cannot evolve in itself because the origin and 
maintenance of a trait would have to precede selection for the trait. 
Evolvability can, however, be a by-product of selection. For example, activation of a
transposable element might lead to a mutation that is selected, thereby inadvertently leading 
to additional mutations (through additional element insertions) in the future. Such future 
mutations may be deleterious or advantageous; the increased mutation rate is a by-product of 
the transposable element hitchhiking with the selected mutation. 
Still at issue is the evolutionary origin of traits that contribute to evolvability and 
adaptive mutations. Examination of the origins of such traits is an important step in alleviating 
controversy surrounding this area. This is particularly so with evolvability in multicellular 
organisms, where one gets the impression that we should be in awe of the exciting molecular 
and genetic mechanisms that contribute to eukaryote evolvability (Kirschner & Gerhart 1998;
Herbert & Rich 1999). Other reviews on the evolution of evolvability (e.g. Partridge and
Barton 2000; Kirschner and Gerhart 1998; Moxon and Thaler 1997) identify mechanisms by
which genome architecture can influence this (see also Box 3). We focus here on the genome
organisation of prokaryotes and eukaryotes, and interactions between these and the 
environment. 
We review recent work on the evolutionary origins of the differing genome
organisation of prokaryotes (archaea and bacteria) and eukaryotes, and how this impacts on our
understanding of the 'evolution of evolvability'. Our previous work, and that of others
(particularly on parasites), suggests that many of the differences can be explained through 
constraints (or lack thereof) on genome size and architecture. Another significant area is stress 
responses. Experimental data, both with bacterial and metazoan models, point towards a 
general response to stress as being important in understanding how traits contributing to 
evolvability may have hitchhiked on survival of individuals. Horizontal gene transfer,
stationary phase hypermutation, switching between sexual and asexual cycles, and the role of
Hsp90 in Drosophila are considered.
Finally, we discuss how the physical and biotic environment limits potential 
evolvability, allowing a distinction to be drawn between this and realised evolvability (Fig. 2). 
Our model, which we call Evolutionarily-Stable Niche-Discontinuity (ESND), describes how 
competition allows colonisation of a fitness peak, and subsequently, how intraspecific 
competition limits movement away from that peak (Fig. 1). Examples of interspecies
competition and predator-prey coevolution are considered, and are aimed at understanding 
evolvability in eukaryotes. 
Assumptions versus hypotheses. 
It is almost universally assumed that eukaryotes evolved from ancestral prokaryote 
forms, an assumption that seems intuitively correct. However, it is just that - an intuitive bias 
that simple evolves to complex - and is taken as given by a large majority of researchers (see 
Forterre & Philippe 1999 for critique). An extensive body of literature and ongoing research
challenges this notion (Reanney 1974; Darnell & Doolittle 1986; Forterre 1995; Poole et al.
1998, 1999; Forterre & Philippe 1999; Penny & Poole 1999; Glansdorff 2000). What is
important for evolvability studies is that the assumption of a prokaryote to eukaryote 
transition effectively removes selection from discussions on the evolution of prokaryotes -
they are by definition the ancestral state. Since the direction of change is assumed to be from
simple prokaryote cells to complex eukaryote cells, the question becomes, by default, what 
drove eukaryote genomes to become so complex? We will argue that factors affecting the 
origin of prokaryotic genome organisation are equally important. 
There are strong parallels in the evolution of complexity and the evolution of 
evolvability. Neither complexity nor evolvability can be directly selected for; both concern
future evolution, and direct selection for either would violate the principle of evolution as
tinkering. Szathmáry and Maynard
Smith (1995) point out that 'There is no theoretical reason to expect evolutionary lineages to
increase in complexity with time, and no empirical evidence that they do so'. Unlike with
evolvability, however, there is little apparent controversy here. It is accepted that complexity is
sometimes a consequence of evolution, but not a predictable outcome of evolution. Reductive
evolution in parasites and eukaryotic organelles are important examples (see below).
How can we account for traits that contribute to complexity which are conserved in 
most eukaryotes when we know that, as with evolvability, complexity is not directly 
selectable? It is not sufficient to claim that a trait conserved across a broad range of species is 
evidence for selection. A recent example is the claim that junk DNA has a function because a
survey of genome size shows that it correlates with cell size in cryptomonads (Beaton & Cavalier-Smith
1999). The argument seems to be that selection for increased cell size has led to the expansion
of junk regions because these take up space, and therefore the amount of DNA can 'specify'
cell size. Correlation is ambiguous, and in this case it is unclear which is cause and which 
effect. Junk DNA may persist because it has not been selected against. 
A theory that explains a range of phenomena (explanatory power) and leads to new 
tests (predictive power) is certainly preferable to post hoc explanations. These one-off 
explanations are proposed after a discovery has been made, hence post hoc - 'after the event'.
When explaining to students the lack of scientific rigour in post hoc explanations, we use the
story of Darryl (Box 1). The humour is incidental to the main point, that scientific statements
are best made as predictions, not thought up after the event. An example is an old natural 
theology explanation of why the earth changes its tilt on its axis as it orbits the sun.
The change in tilt is for generating the seasons. A delightful post hoc explanation! 
This criticism of post hoc explanations is similar to Gould and Lewontin's (1979)
critique of the 'adaptationist programme', the view that everything about an organism can be explained as
aiding some aspect of its life cycle. Post hoc explanations may nevertheless be correct 
(equally, 'good' theories can be incorrect). The aim should be to reformulate them into testable 
hypotheses, and to look for explanations that account for a range of phenomena (not just the 
original observation that led to the hypothesis). Gibson (2000) points out that the tendency for 
researchers to give post hoc adaptationist explanations is still alive and well in developmental 
biology. He writes that 'selection should only be invoked when the null hypothesis of
neutrality cannot explain the data'. In molecular evolution the importance of neutral evolution
is often taken into account, and extremely complex traits such as the spliceosome, mRNA
editing in trypanosomes, and the scrambled genes of ciliates have been argued to be neutral
(Stoltzfus 1999). While it is not certain if any of these traits originated through neutral
evolution, the idea is an important one, since it shifts theorising away from post hoc 
explanations, and frames the problems in the manner advocated by Gibson (2000). 
Returning to the evolution of prokaryotes and eukaryotes, we shall argue that many 
complexities of the eukaryote genome can be explained by the null hypothesis of neutralism, 
while the prokaryote genome cannot. This is an important point, since it changes our view of 
the evolution of genomic features contributing to evolvability. 
Origins of prokaryote and eukaryote genome architecture. 
Key aspects of eukaryotic genome architecture appear to be conserved from a very 
early period in evolution, pre-dating the Last Universal Common Ancestor (LUCA). In 
contrast, prokaryote genome architecture results from one or more periods of reductive 
evolution (Poole et al. 1999; Penny & Poole 1999). Others (Forterre 1995; Forterre & Philippe
1999; Galtier et al. 1999) have developed similar views from different data. Our argument is
based on extant genome architectures and the observation that the greatest diversity of RNA
world relics (RNAs that appear to predate the origins of proteins and DNA) are found in 
eukaryotes. For prokaryotes, both the loss of ancient RNA genes and their genome
architecture can be explained in terms of reductive evolution. 
Some of our reasoning is given below, but it is not necessary to accept all our 
conclusions to accept our general argument on eukaryote and prokaryote evolvability. Our 
conclusions are consistent with Kirschner and Gerhart's (1998) description of prokaryote and
eukaryote modes of evolvability. Prokaryotes 'have undergone limited morphological change
but instead have achieved extensive biochemical diversification'. Multicellular
eukaryotes, specifically metazoa, by contrast 'achieved extensive control over the milieu of internal cells
and generated many physiologically sensitive micro-environments in that milieu'. In this
latter multicellular group, biochemical evolution is limited, and cells receive a more constant 
level of nutrition with little or no variation in the type of nutrients available. If evolution is 
biochemically conservative in metazoa and biochemically innovative in prokaryotes, it is 
perhaps no surprise to find ancient biochemical traits conserved in eukaryotic cells, while 
these have been lost from prokaryotes. 
Broad differences between eukaryote and prokaryote lifestyle have been described in 
terms of r and K selection (Carlile 1982), terms derived from the equation for the rate of
population growth (Box 2). Relative to prokaryotes, eukaryotes are K-selected, where
K-selected organisms are broadly defined as having a relatively slow rate of reproduction and
longer generation time, a stable (though limiting) nutrient supply, relatively stable
populations, and larger size. In contrast, prokaryotes are relatively more r-selected, with
faster reproduction and short generation times, small size, fast response times to a fluctuating
nutrient supply, and large fluctuations in population size. There is a spectrum of values
with perhaps E. coli and yeast near the r-selection end, and elephants and oak trees near the
K-selection end of the spectrum.
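The r and K terminology can be made concrete with the standard logistic growth equation, dN/dt = rN(1 - N/K), where r is the maximum per-capita growth rate and K the carrying capacity. This is assumed to be the equation referred to above; Box 2 is not reproduced here. The following Python sketch is illustrative only: the function names and parameter values are assumptions and are not taken from the cited sources.

```python
def logistic_growth_rate(n, r, k):
    """Instantaneous growth rate dN/dt = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k)


def simulate_logistic(n0, r, k, dt=0.01, steps=5000):
    """Integrate the logistic equation with simple Euler steps; return final N."""
    n = float(n0)
    for _ in range(steps):
        n += logistic_growth_rate(n, r, k) * dt
    return n


# Assumed, illustrative values only: a small founding population grows
# towards the carrying capacity K.
if __name__ == "__main__":
    print(simulate_logistic(n0=10, r=1.0, k=1000.0))
```

When N is far below K, growth is close to rN, so lineages that exploit transient resources are dominated by r. As N approaches K the growth rate falls to zero, and competitive ability at high density matters more than the raw reproductive rate.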
Prokaryote genomes. 
Prokaryote genomes possess only one origin of replication per chromosome. 
Consequently, genome size limits how quickly a chromosome can be replicated. Because fidelity is
affected by replication rate, that rate is in turn constrained by the need to copy and
maintain the genome faithfully.
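The constraint can be illustrated with a back-of-envelope calculation. The Python sketch below assumes evenly spaced origins that fire simultaneously, each launching two forks moving at a constant rate. The fork rate of 1000 bp per second and the 4.6 Mb genome size are round, hypothetical figures chosen for illustration, not values taken from the text, and the absolute times depend entirely on them.

```python
def replication_time_s(genome_bp, n_origins=1, fork_rate_bp_per_s=1000.0):
    """Minimum time in seconds to copy a chromosome.

    Assumes evenly spaced origins that fire together, each launching two
    forks that move in opposite directions at a constant rate.
    """
    if genome_bp <= 0 or n_origins < 1 or fork_rate_bp_per_s <= 0:
        raise ValueError("genome size, origin count and fork rate must be positive")
    return genome_bp / (2 * n_origins * fork_rate_bp_per_s)


# Hypothetical figures: a 4.6 Mb chromosome with one origin versus ten.
single = replication_time_s(4_600_000)
multi = replication_time_s(4_600_000, n_origins=10)
```

Under these assumptions, replication time scales linearly with chromosome size when there is a single origin, whereas adding origins divides the time proportionally. This is the sense in which a single origin ties replication time, and hence reproductive rate, to genome size.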
Transient global hypermutation occurs in stationary phase (Table 1), whereas selection
for fast replication operates during periods of exponential growth. There is no precedent for 
assuming that higher mutation rates will be selected for during exponential growth where 
proliferation of a successful strategy is required. Rather, a quick response to nutrient 
availability, followed by clonal proliferation, is advantageous. r-selection revolves around 
competition (during exponential growth) for resources that fluctuate in availability, and this 
places the reproductive rate under selection (Box 3). 
That replication is rate-limiting during exponential growth has been documented for E. 
coli, where genome doubling takes one hour, and cell doubling occurs every 20 minutes 
(Alberts et al. 1994). The effect on the genome is straightforward - anything that can be lost
will eventually be lost. Selection does not distinguish between junk, and what may be 
advantageous later (e.g. on a new nutrient source), so even essential functions required only 
periodically may be lost from the genome. It is therefore of little surprise to find that, in both 
E. coli and Salmonella enterica, genome size varies within species by around 20%. Similar
variability is found in Helicobacter pylori and Neisseria meningitidis, and is interpreted as 
different genes being maintained in different isolates, which often inhabit different niches 
(Lan & Reeves 2000). 
Periodically-selected functions (PSFs) are regularly lost from individuals, but are
maintained in bacterial populations through lateral gene transfer. PSFs are essential in the 
long term, given that environmental fluctuation is normal and that organisms must continually 
cope with such fluctuations. In a completely clonal population where replication time is rate
limiting, PSFs would be irreversibly lost. Constant selection of PSFs within a population,
coupled with lateral transfer, is likely central to prokaryote genome architecture, permitting
maintenance of PSFs crucial to long term survival under conditions where these are frequently 
lost. 
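The balance between loss and reacquisition can be sketched with a deliberately simple toy model. It is not a model from this paper: it tracks only the fraction of cells carrying a PSF, with a hypothetical per-generation loss probability and a hypothetical lateral transfer rate proportional to how common carriers are. Periodic selection for the function is left out, so the sketch shows only how transfer can offset loss.

```python
def next_carrier_fraction(p, loss, transfer):
    """One generation of a toy model for the fraction p of cells carrying a PSF.

    Carriers lose the function with probability `loss`; non-carriers regain it
    by lateral transfer at a rate proportional to how common carriers are.
    """
    return p * (1.0 - loss) + transfer * p * (1.0 - p)


def carrier_fraction_after(p0, loss, transfer, generations):
    """Iterate the toy model and return the final carrier fraction."""
    p = p0
    for _ in range(generations):
        p = next_carrier_fraction(p, loss, transfer)
    return p


# Hypothetical rates chosen only for illustration.
clonal = carrier_fraction_after(0.5, loss=0.01, transfer=0.0, generations=2000)
with_transfer = carrier_fraction_after(0.5, loss=0.01, transfer=0.05, generations=2000)
```

In this sketch, with no transfer the carrier fraction decays geometrically towards zero, matching the expectation for a completely clonal population. When the transfer rate exceeds the loss rate, the fraction instead settles at 1 - loss/transfer, so the function persists in the population even though individual cells keep losing it.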
Plasmids are complete transferable units that can be immediately expressed, but do
not increase the replication time of the genome. While a genomic copy of a PSF must be lost
through gene decay (mutations and deletions) and reestablishment requires reinsertion, a
plasmid can be lost without gene decay (this would be advantageous during exponential
growth), is readily reacquired, and can be replicated in parallel with the genome. Supernumerary
chromosomes in fungi have been likened to plasmids, as they are not permanent and in
several cases have been found to carry genes for pathogenicity, detoxification of host
antimicrobials, and antibiotic resistance (Covert 1998).
An obvious solution to the prokaryote dilemma is to distribute genes across several
chromosomes and to use multiple origins of replication, thereby permitting a larger genome
without slowing replication. A number of prokaryote genomes are spread across multiple 
chromosomes, and some may possess more genes than yeast (Bendich & Drlica 2000). 
Circular chromosomes with single origins of replication nevertheless place limits on 
individual chromosome size. 
That circular chromosomes are only found in prokaryotes may be historical accident. 
Forterre (1995) has argued that the prokaryote lineages arose through adaptation to high
temperatures (the thermoreduction hypothesis). Currently his is the best explanation for the 
presence of circular chromosomes in prokaryotes; circular DNA is more thermostable than 
linear (Marguet & Forterre 1994). Other data are also consistent with thermoreduction
(Poole et al. 1999; Penny & Poole 1999), and while some prokaryotes possess linear genomes
(Bendich & Drlica 2000), this state appears derived (Poole et al. 1998, 1999).
Eukaryotes. 
K-selected organisms have a steadier rate of reproduction, with relatively smaller 
population fluctuation, particularly in multicellular eukaryotes. Eukaryote chromosomes 
possess multiple origins of replication, and accumulation of repetitive elements largely 
accounts for the 80,000-fold genome size variation in this domain (Hartl 2000). In many
cases, increases in size are probably not a result of selection (Hartl 2000), and consequently,
some eukaryote genome sizes are probably only limited by the fidelity of replication (see
Table 4 in Drake 1999).
With few apparent constraints on genome size, gene duplication followed by 
divergence is an effective means for the evolution of new functions. Neither duplication, nor 
the presence of pseudogenes, is inherently deleterious in eukaryotes (in contrast to 
prokaryotes). Gene duplication and divergence has resulted in major expansions of 
developmental gene families, e.g. the homeobox family (Ruddle et al. 1999). Genome
duplication is also considered a feature of eukaryote genome evolution (Wolfe et al. 1997;
Ruddle et al. 1999), a good example being polyploidy in plants.
Lack of constraint on genome size has enabled large numbers of 'selfish' elements to 
co-exist in eukaryotic genomes (Smit 1999; Brosius 1999). Such elements can occasionally be
recruited into the cellular repertoire. Examples include the dendrite-specific RNAs, rodent BC1
and primate BC200. BC1 has been recruited from tRNA(Ala) and BC200 from an Alu element
(Brosius 1999). V(D)J recombination in the vertebrate immune system is another example.
Proteins RAG1 and RAG2 mediate V(D)J recombination, forming a site-specific recombinase
which recognises and cleaves DNA at conserved recombination signal sequences (Agrawal et
al. 1998; Hiom et al. 1998). Similarities in gene organisation, signal sequences, mechanism of
action, and the presence of a transposase DDE motif in RAG1 (Landree et al. 1999) suggest
this system originated through a germline transposition event into a receptor gene in the
ancestor of jawed vertebrates (Agrawal et al. 1998; Plasterk 1998). An unforeseen
consequence of the recruitment that gave rise to V(D)J joining is that it also appears to 
participate in at least some chromosomal translocation events, though probably at low 
frequency (Melek & Gellert 2000). 
Aspects of placental development in eutherian mammals appear similar to viral 
infection (Larsson and Andersson 1998; Harris 1998). Cell fusion, forming the placental 
syncytium, is also a feature of endogenous retroviruses (providing an efficient means of 
infecting new cells). In human placental development, an envelope protein from the 
endogenous retrovirus ERV-3 is responsible for cell fusion and other differentiation events 
during formation of the syncytium (Lin et al. 1999). Production of endogenous retroviral 
particles early in placental development increases the chance of germline insertion, but also 
provides immunosuppression, thereby preventing the maternal immune system from rejecting 
the foetus. Indeed, retroviral envelope protein expression suppresses the immune response 
(Mangeney & Heidmann 1998).  
These examples highlight the centrality of the tinkering concept in evolution (Jacob 
1977). In all cases, complex structures appear to have evolved from selfish 
elements. Occasional recruitment of such elements into new functions appears to be a consequence 
of the lack of selection against genome size, making the genomes of higher eukaryotes more 
vulnerable to intragenomic parasites. Overall, the neutrality of non-coding sequences in 
chromosomes with multiple centres of replication explains many aspects of eukaryote 
evolvability. 
Transcript processing. 
Extensive transcript processing is a feature of eukaryotes, and includes mRNA 
splicing (Sharp 1994), editing (Smith et al. 1997), and snoRNA-mediated cleavage, 
methylation and pseudouridylation of RNA (Weinstein & Steitz 1999). Splicing and editing 
are absent from prokaryotes, and snoRNA-mediated modifications are absent in bacteria 
(though methylation is present in archaea). Though disputed (Lafontaine & Tollervey 1998; 
Sontheimer et al. 1999), splicing and snoRNA-mediated modifications probably predate the 
LUCA (Poole et al. 1998, 1999). 
Under r-selection and a single origin of replication, spliceosomal introns and snoRNA-mediated 
modifications are expected to be reduced or lost. mRNA processing delays the 
expression of proteins, the transcript being processed largely by RNA-mediated reactions. 
Methylation and pseudouridylation of RNA are ubiquitous, though heavily reduced in bacteria. 
In archaea, methylation is extensive, and requires snoRNA-like sRNAs (Omer et al. 2000), 
smaller than in eukaryotes. Each sRNA guides two methylations (the majority of eukaryotic 
snoRNAs guide just one). Pseudouridylation is minimal in archaea, with numbers 
comparable to those for bacteria (Charette & Gray 2000). Modifications in bacteria are 
limited to highly conserved regions of the rRNA, which may explain their maintenance, while 
methylation may be important in archaeal rRNA for stability at high temperature (Omer et al. 
2000). 
In scenarios of the evolution of snoRNAs post-LUCA, the argument has largely been 
post hoc, with the emphasis being on how these RNAs could have diversified in eukaryotes 
(Morrissey & Tollervey 1995; Lafontaine & Tollervey 1998). The finding of sRNAs in 
archaea requires a revision of that theory. The alternative, loss under r-selection in 
prokaryotes, is the best explanation for the current data. 
Some snoRNAs are paternally imprinted in rodent and human brain, and do not direct 
methylation of rRNA or other functional RNAs (Cavaillé et al. 2000). One of these may 
regulate A-to-I editing and/or alternative splicing of the serotonin 5-HT2C receptor mRNA 
through methylation (Cavaillé et al. 2000; Filipowicz 2000). Indeed, splicing and A-to-I 
editing, perhaps also modification by methylation and pseudouridylation, are central to the 
generation of multiple products from one mRNA (Herbert & Rich 1999). It is unclear how 
A-to-I editing of nuclear mRNAs arose in evolution, but the targets have largely been found in 
signalling in the nervous system of both invertebrates and vertebrates (Reenan 2001). The role 
of splicing in generating alternative protein products, and in regulating developmental fate 
(Graveley 2001), is possibly a consequence of its maintenance in the absence of selection to 
remove this apparatus long after its hypothesised role in early genomes would have become 
redundant. RNA processing pathways can be co-opted and contribute to evolvability, but 
clearly had other origins. 
Cytosine methylation, a double-edged sword. 
Cytosine methylation is widespread in eukaryotes, and is considered to provide a 
mechanism for gene silencing, and parental imprinting. Cytosine is an unstable base, readily 
deaminating to uracil, which, if unrepaired, will result in a C·G to T·A mutation in one of two 
daughter copies. Methylation of cytosine produces 5-methylcytosine (5-meC), which 
deaminates more rapidly than unmethylated cytosine, yielding thymine (Poole et al. 2001). 
Cytosine methylation, while apparently providing a means of epigenetic control, also 
produces mutational hotspots, and this can potentially be beneficial or deleterious, depending 
on context. 
Gene silencing has been considered to represent the main function of cytosine 
methylation, but Yoder et al. (1997) point out that evidence is limited. The majority of 5-meC 
residues are found in transposable elements, not promoters. They suggest that methylation is 
primarily a mechanism for silencing transposons, with the corollary that 5-meC to T 
deamination is largely beneficial because it results in faster inactivation of these elements 
through mutation. That this cannot be the only function of cytosine methylation is supported 
by the existence of at least two repair mechanisms (Schärer & Jiricny 2001; Poole et al. 2001). 
If both gene regulation and transposon inactivation are mediated by cytosine methylation, 
there is a trade-off because in the former 5-meC to T deaminations are potentially deleterious, 
whereas in the latter they are potentially beneficial. The presence of deamination repair 
mechanisms would therefore be important for repairing damaged genes, but weaken the 
potential for transposon inactivation (Poole et al. 2001). 
The picture is further complicated, because methylation of transposable elements may 
contribute to epigenetic effects on adjacent genes (Whitelaw & Martin 2001). Patterns of 
methylation are known to be inherited, and to have a phenotypic effect. An example is the agouti 
locus in mice, where coat colour is inherited epigenetically through the female line in the 
absence of genetic variation (Morgan et al. 1999). Whitelaw & Martin (2001) coined the term 
epigenotype for the effect that epigenetic inheritance has on phenotype, and excitingly, this 
may provide a means of exploring phenotypic space. However, work on agouti demonstrated 
that, even with selection for a given epigenotype, the original proportions of epigenotypes 
may reappear (Morgan et al. 1999, Whitelaw & Martin 2001), making it hard to see how 
parental imprinting mechanisms could lead to genetic fixation of a phenotypic trait. However, 
Monk (1995) has proposed that 5-meC deamination may contribute to fixation, since this 
would make permanent the silencing effect at a given site. In this way, the epigenotype could 
permit exploration of alternative phenotypes that could then become 'hard-wired' in the 
genome. 
Again, this mechanism impacts on evolvability, but did not evolve for evolvability's 
sake. Prerequisites for such complex regulation may instead have been the invasion of 
eukaryote genomes by transposable elements, and selection to silence these, given the 
apparent inability to prevent their insertion. The conflicting need to eliminate these and the 
recruitment of methylation into gene regulation, perhaps through adjacent transposons, may 
have set up the requirement to repair 5-meC to T deaminations. Imperfect repair of these 
(Holliday & Grigg 1993) may be the cost associated with the conflicting roles of methylation 
in the genome. However, it may provide a mechanism where 5-meC to T deamination gives 
rise to a heritable phenotypic trait from an epigenetic trait with limited heritability. Again, it is 
difficult to establish which came first, transposon inactivation or gene regulation, but the 
example serves to make the point that it is necessary to examine the origins of a process when 
considering the evolution of evolvability. 
Another example is somatic hypermutation at the V(D)J locus in formation of the 
antibody variable region by C to U editing (Muramatsu et al. 2000; Revy et al. 2000). This is 
effectively enzyme-catalysed cytosine deamination at hotspots (contingency loci). The 
function is opposite to that of the uracil-DNA glycosylases, which are involved in repair of cytosine 
deaminations (Schärer & Jiricny 2001), and is also seen in apolipoprotein B transcript editing 
(Herbert & Rich 1999). 
Parasites: evolvability or reductive evolution? 
Parasites are interesting in regard to evolvability because they represent a strategy 
common to eukaryotes, prokaryotes, viruses, and selfish elements. Parasites are often 
fast-evolving, and have often moved from a non-parasitic to a parasitic lifestyle. We consider the 
following questions: 
• Were ancestral groups from which parasites arose inherently more 'evolvable'? 
• If there is fast evolution in parasites, are they inherently more evolvable? 
• Is the concept of evolvability useful here? 
Parasitism is widespread - in plants, fungi, insects, worms, protists, bacteria, etc. 
Conspicuously absent are parasitic mammals, birds, amphibians and reptiles (tetrapods). Is 
this due to limited evolvability or an ecological limitation? We think the latter. The 
dependence of the eutherian embryo on the mother for nutrients is much like the dependence 
of endoparasitic larvae on the host for nutrition (Grbic 2000). Suckling in mammals, and 
nutritional dependence of juvenile birds and mammals on parents might to a lesser extent be 
seen in this light. Indeed, juvenile parasitic stages in early development serve the same role as 
parentally-supplied nutrition, and it is worth noting that egg-yolk mass has become reduced in 
endoparasitic wasps (Grbic 2000). Clearly, this modus operandi of early development has 
been made use of in mammals, and absence of true parasitism in tetrapods may simply reflect 
an absence of niches, though a few examples, such as brood parasitism, exist. 
What distinguishes lineages that have become obligate parasites from those that are 
free-living? The discussion above suggests it is the presence of an available niche, not limits 
on evolvability. However, there must be adaptation in order to secure nutrients from the host, 
fine-tune development to coincide with host life cycle, and not kill the host before the parasite 
has matured or moved to the next host. Studies of unicellular parasite genomes suggest that 
the loss of traits no longer required in the parasitic lifestyle accounts for most change. For 
example, in the Rickettsiae, adenosylmethionine synthetase is in the process of being lost 
from the genomes of this genus (Andersson & Andersson 1999). Likewise, cases of loss from 
parasitic genomes of primary biosynthetic pathways, such as amino acid synthesis and de 
novo pathways for deoxyribonucleotide synthesis (Fraser et al. 1998; Andersson et al. 1998b) 
are consequent to the evolution of mechanisms for extracting these nutrients from the host. 
Genome reduction and higher rates of evolution appear to be general features of 
parasitic genomes, being reported in the leprosy bacillus (Cole et al. 2001), the obligate 
intracellular parasites Chlamydia (Kalman et al. 1999) and Buchnera, and other endosymbiont 
bacteria (Moran & Baumann 2000). Genome reduction is likely a consequence of redundancy, 
while the higher rates of evolution seen in parasites are attributable to Muller's Ratchet, the 
fixation of slightly deleterious mutants within small asexual populations (Moran 1996). 
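The ratchet can be illustrated with a minimal simulation sketch. It is not taken from the cited studies: the function name, population size, mutation rate and selection coefficient are illustrative assumptions, and the model (a Wright-Fisher population with no recombination and no back mutation) is only one simple way to represent a small asexual population. It records the mutation load of the least-loaded class, which under these assumptions can stay level or rise but never fall.

```python
import random


def mullers_ratchet(pop_size=50, mutation_rate=0.3, selection=0.02,
                    generations=200, seed=1):
    """Toy model: return the minimum mutation load in each generation.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each individual
    min_load_history = []
    for _ in range(generations):
        # Fitness falls multiplicatively with each deleterious mutation.
        weights = [(1.0 - selection) ** k for k in loads]
        # Asexual reproduction: each offspring copies one parent's load.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Each offspring may gain one new mutation; there is no back mutation.
        loads = [k + (1 if rng.random() < mutation_rate else 0)
                 for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


if __name__ == "__main__":
    history = mullers_ratchet()
    print("least-loaded class after 200 generations:", history[-1])
```

Each increase in the recorded minimum corresponds to one "click" of the ratchet: the fittest class has been lost and, without recombination, cannot be rebuilt.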
Genome reduction is extreme in chloroplasts (McFadden 1999), mitochondria (Gray et 
al. 1999), and nucleomorphs, the remains of nuclei in secondary endosymbionts (Douglas et 
al. 2001). There is a difficulty in separating selection for evolvability per se from other 
potential selective pressures. Reductive evolution, and increased rates of evolution are 
consequences of parasitism or endosymbiosis, so while the process of adaptation can be 
extensively studied, the initial conditions cannot. What can be said is that to the parasite or 
endosymbiont, the host is a resource, so general models of evolvability are likely to be useful 
in understanding parasitism. In the following section, we consider this problem in greater 
depth. 
The stress response and evolvability. 
In this section, we consider how stress responses promote organismal survival. 
Hypermutation (adaptive evolution), horizontal transfer, sex in organisms with an asexual 
cycle, recombination, cell-cell interactions, and cell specialisation can all be understood as 
stress adaptations (Table 1). That they contribute to evolvability in prokaryotes and 
unicellular organisms is consequential - these traits have not been selected for their propensity 
to promote evolvability, and the evolutionary origins of these phenomena need not be in the 
adaptation to stress. Rather, what is important is that they currently contribute to adaptation to 
stress in a range of organisms, and that this has an impact on evolvability. 
We suggest that these mechanisms are important for understanding periods of genetic 
stability versus genetic change within the lifecycle of a range of organisms. Respectively, 
these might be described as 'if it ain't broke, don't fix it' and 'adapt or die' strategies. Switching 
between strategies is expected to be more effective in prokaryotes and unicellular eukaryotes 
than in multicellular eukaryotes since, as described below, mechanisms for alleviating lethal 
stresses exist in the first two groups, but not the third. 
A range of starvation responses, which can be described as 'adapt or die' strategies, are 
seen in prokaryotes (Table 1). In Bacillus subtilis, sporulation and genetic competence (to 
take up DNA from the external milieu) are both controlled by an extracellular peptide, CSF 
(competence and sporulation factor). At low concentrations, CSF stimulates competence, and 
this occurs 2-3 generations prior to entry into stationary phase. At high concentrations, which 
arise shortly after entry into stationary phase, CSF inhibits competence, and stimulates 
sporulation (Lazazzera et al. 1999). Importantly, the SOS response and competence are 
coinduced and DNA uptake may provide a template for repair of endogenous DNA (Tortosa 
& Dubnau 1999). Alternatively, formation of double-strand breaks may permit integration 
of foreign DNA concurrent with uptake. Perhaps favouring the first possibility is the 
observation that these 'quorum sensing' mechanisms are often strain-specific, which may 
favour uptake from closely-related strains. 
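The concentration-dependent switch described above can be summarised as a simple decision rule. This is a sketch only: the function name, the threshold value and its arbitrary units are hypothetical, and no quantitative concentrations are given here.

```python
def csf_response(csf_conc, high_threshold=10.0):
    """Toy rule for the programme favoured at a given CSF concentration.

    csf_conc and high_threshold are in arbitrary, hypothetical units.
    """
    if csf_conc < 0:
        raise ValueError("concentration cannot be negative")
    if csf_conc == 0:
        return "neither"       # no signal present
    if csf_conc < high_threshold:
        return "competence"    # low CSF stimulates competence
    return "sporulation"       # high CSF inhibits competence, favours sporulation


if __name__ == "__main__":
    for conc in (0.0, 1.0, 50.0):
        print(conc, csf_response(conc))
```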
Concurrent with competence (and controlled by the same pathway), degradative 
enzymes are expressed and these may act to increase the availability of extracellular nutrients 
(Tortosa & Dubnau 1999). The same situation is seen in sexual sporulation in the fungus 
Aspergillus nidulans, where the α-(1,3)-glucan, which makes up the vegetative hyphal wall, is 
degraded to glucose (Champe et al. 1994). 
A parallel to meiosis and sexual sporulation in fungi is evident here. Meiosis and 
competence precede sporulation, and DNA uptake in some bacteria may be most favoured 
between closely related strains, thereby approximating sex. The response to starvation is to 
change from a mode of development where genetic change is minimised, to one where there 
is active change, before dispersal to a new environment. 
In Aspergillus, hyphae are sent out into the medium in a radial pattern away from the 
centre of the colony. Closer to the centre, asexual spores develop, which allow dispersal to 
new nutrient sources. This strategy is analogous to exponential growth in bacteria. Sexual 
sporulation occurs later in the lifecycle of the fungus; sexual spores are formed at the centre 
of the original colony, where nutrients will have been most exhausted (Champe et al. 1994), 
and this is equivalent to the stationary phase events of genetic competence and sporulation in 
Bacillus. 
DNA uptake by prokaryotes is apparently not always an approximation of eukaryote 
sex. Distant transfers between archaea and bacteria have been documented (Nelson et al. 
1999; Forterre et al. 2000), and both Neisseria and Haemophilus are apparently competent all 
the time (Solomon & Grossman 1996), each containing well over a thousand copies of a DNA 
uptake signal sequence (Smith et al. 1999). 
It seems unlikely that horizontal transfer is unbridled and without patterns, despite the 
vigour with which many in the phylogenetics community have taken on this idea as a post hoc 
explanation for current difficulties in explaining conflicting datasets (Woese 1998; Doolittle 
1998). DNA loss, due to constraints on replication rate during exponential growth, suggests 
that any sequences taken up will only be fixed if they confer a selective advantage to the 
organism. Greater promiscuity permits greater sampling of environmental DNA, potentially 
bestowing a greater propensity to adapt to environmental change (greater evolvability). 
Greater promiscuity may also equate to greater parasite susceptibility, which might explain 
the existence of strain-specific competence factors. 
There is now overwhelming evidence for transient hypermutation, induced by the SOS 
response to starvation (Torkelson et al. 1997; Foster 1999; McKenzie et al. 2000). Metzgar 
and Wills (2000) argue that it may simply be a spandrel, that is, a by-product, not a 
directly-selected adaptation. The DNA polymerases involved in the response have been selected to 
copy highly damaged DNA, which constitutive polymerases (with higher replication fidelity) 
are unable to copy. The lower-fidelity polymerases repair damaged DNA, but the lower 
specificity of polymerisation required to bypass lesions also results in a transient increase in 
mutation rate. 
In the lab, global mutators have been successfully selected for, and tend to outcompete 
nonmutators (Sniegowski et al. 1997). Mutators can arise by chance, and it has been argued 
that they could be maintained in asexual populations through genetic hitch-hiking on an 
advantageous allele created as a result of mutation. While it is thought that complete fixation 
of mutators would be rare, there seems to be a correlation between elevated mutation rate and 
virulence in pathogens (see Metzgar & Wills 2000 for discussion). Perhaps this is not 
surprising, given that their hosts make use of somatic hypermutation in antibody formation, 
setting up a Red Queen race. However, the side effects for bacterial mutators are potentially 
worse: mutational meltdown due to the accumulation of deleterious mutations. 
Horizontal transfer and genome copy number may be crucial in the maintenance of 
elevated global mutation rates resulting from the appearance of heritable global mutators. 
Tenaillon et al. (2000) point out that horizontal transfer provides a potential mechanism for 
the spread of selectively advantageous mutations (such as those rare beneficial mutations 
arising during hypermutation) within a population. This might result in the advantageous 
allele being selected for while the mutator is selected against (due to an increase in deleterious 
mutations) and thus lost. The ability to segregate the beneficial mutation from the mutator 
phenotype may serve to provide a mechanism for the elimination of mutator alleles from a 
population in the long term. 
Prokaryotes with multiple copies of the genome are widespread (Bendich & Drlica 
2000), perhaps even the rule. For instance, E. coli is polyploid throughout its cell cycle 
(Åkerlund et al. 1995). Multiple genomic copies will serve as a buffer to deleterious mutation, 
minimising the detrimental effects of hypermutation, and at the same time, permitting new 
alleles to arise and be selected for (Koch 1984). Azotobacter vinelandii maintains over 100 
genomic copies in stationary phase (Maldonado et al. 1994), making it a potentially very 
interesting model organism for mutation studies. 
Another mechanism contributing to adaptive evolution is transient gene amplification 
of the lac operons of Salmonella typhimurium (Andersson et al. 1998a) and E. coli (Hastings 
et al. 2000). Multiple copies of a mutant locus with residual activity produce an unstable 
'wild type' revertant. At the same time, presence of multiple copies increases the likelihood of 
a true reversion event. This last point is important, since, in effect, multiple copies provide 
mutation with a bigger 'target' without deleterious changes being lethal. This mechanism 
(Andersson et al. 1998a) may be important in rescuing periodically-selected functions (PSFs) 
from loss during selection to reduce genome size. While Hastings et al. (2000) did not find 
such revertants in their studies on E. coli, this does not necessarily imply that this cannot 
occur. 
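The 'bigger target' argument can be made concrete with a short calculation. Assuming, for illustration only, that each copy of the mutant locus reverts independently with the same small probability per generation, the chance that at least one copy reverts rises with copy number. The function name and the example rate are hypothetical, not values from the cited studies.

```python
def reversion_probability(per_copy_rate, copies):
    """Probability that at least one of `copies` independent copies reverts.

    Assumes independent, identical per-copy reversion (an illustrative
    simplification, not a measured property of the lac system).
    """
    if not 0.0 <= per_copy_rate <= 1.0:
        raise ValueError("per_copy_rate must lie between 0 and 1")
    if copies < 0:
        raise ValueError("copies cannot be negative")
    return 1.0 - (1.0 - per_copy_rate) ** copies


if __name__ == "__main__":
    rate = 1e-8  # hypothetical per-copy reversion probability
    for n in (1, 10, 100):
        print(n, reversion_probability(rate, n))
```

For small per-copy rates the probability grows almost linearly with copy number, which is the sense in which amplification enlarges the mutational target.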
An additional link between stress response and evolvability is reported in Drosophila. 
Rutherford and Lindquist (1998) mutated the hsp83 locus (encoding HSP90), finding 
mutations of unrelated morphological traits in heterozygotes. The morphological mutations 
are stable even after subsequent crosses restore progeny to wild type. They argue that such a 
situation might arise in nature due to titration of HSP90 during heat shock, or other stresses 
where heat shock proteins are expressed. 
In contrast to the previous examples where change is immediate, the stress, and the 
release from HSP90 buffering, would presumably have to be sustained across generations for 
an alternate phenotype to be expressed and for selection to act upon this. Developmental 
processes (formation of adult structures, for instance) must run before phenotype is expressed. 
The comparison highlights the difference in the nature of adaptation between unicellular and 
multicellular organisms. A relaxation of buffering in response to stress could promote survival 
through expression of new variants, but the stress must be sustained and non-lethal. A lethal 
stress such as application of an antibiotic can however be dealt with in unicellular organisms, 
where beneficial mutations or genes received through horizontal transfer confer instant 
alleviation of the stress. 
A parallel system exists in yeast, where, under conditions of heat shock, the PSI 
protein, which has a role in translation termination, undergoes a conformational change, 
becoming a prion (True & Lindquist 2000). This conformational switch impairs translation 
termination, and there is extensive readthrough, producing alternative protein products. 
Reversion to the non-prion form is possible, and the process can result in heritable changes. 
As Metzgar and Wills (2000) point out, it is not possible to establish whether these examples 
are best described as spandrels, or whether there was selection for the buffering of variability 
in the absence of stress, and release from buffering during stress. The latter scenario is not 
incompatible with current evolutionary theory, as demonstrated by the above discussion of 
stress response in unicellular prokaryotes and eukaryotes, but given Rutherford & Lindquist's 
(1998) titration model, we favour the first possibility. 
In Table 1, sporulation and cell-cell interaction are also listed as environmentally 
regulated and promoting survival during stress. Sporulation or cell-cell aggregation to form 
fruiting bodies, biofilms and other transient multicellular structures in response to 
environmental stress is not controversial. The difference between these and the more 
controversial mechanisms is that the controversial mechanisms require mutation. If such 
responses can be selected for under lethal conditions, such as starvation, then so can the latter. 
However, that transient hypermutation and horizontal transfer are selected is best explained as 
occurring through hitch-hiking, not direct selection. The twist is that the fixation and 
subsequent maintenance of adaptive evolutionary traits through hitch-hiking may be on 
different loci at each round of selection. 
To conclude this section, while the evolutionary origins of many of the stress 
responses in Table 1 are still obscure, it is nevertheless possible to identify selection pressures 
which result in their maintenance and heritability. These are all 'adapt or die' strategies with a 
short term survival advantage, consistent with standard evolution. As pointed out by Metzgar 
& Wills (2000) and Hastings et al. (2000) there is no requirement for evolutionary 
forethought. If the ultimate consequence of starvation (or other environmental stresses) is 
death, then individuals in which elevated mutation rates, genetic competence or locus specific 
amplification are induced may survive. There are therefore two aspects: the ability to induce 
the mechanism to generate variability, and advent of a new function which may alleviate the 
stress. 
An ecological perspective: Evolutionarily-stable niche-discontinuity (ESND). 
Between groups of (complex multicellular) taxa, there often appear to be long-term 
stable niche boundaries. In a fitness landscape these boundaries limit access to a single peak, 
or sub-set of peaks, and thus limit evolutionary potential. For example, the vertebrate flying 
insectivore niche has been occupied by birds at day and bats at night for over 55 million years 
(Novacek, 1985) with little crossover between nocturnal and diurnal niches. Dinosaurs and 
mammals may have provided niche boundaries for each other for over 150 million years until 
many of the great Mesozoic reptiles became extinct around the Cretaceous-Tertiary boundary 
(Bromham et al., 1999; Sereno, 1999). 
Typically, niche restrictions are explained as dominance resulting from specialisation of 
the incumbent (Rosenzweig and McCord, 1991). This is basically inter-specific competition, 
with the species occupying the niche having had time for many optimisations compared with 
a potential competitor. We introduce the concept of evolutionarily-stable niche-discontinuity 
(ESND) to explain the maintenance of niche boundaries; in addition to interspecies 
competition, it attributes a major role to intraspecific competition within the competitor. A 
shift in an individual competitor (toward an alternative niche) typically involves a deleterious 
trade-off between interspecific and intraspecific competition. That is, a small heritable shift 
away from the fitness peak of the competitor's own gene pool will result in a greater fitness 
reduction (due to intraspecific competition) than the fitness increase from increased resources 
via interspecific competition. 
Figure 1 depicts a possible ESND for two taxa (1 & 2) that specialise on different food 
resources, with each taxon located near its own peak of fitness. The black and grey curves 
show the relative fitness derived from resources A and B respectively. The contributions sum 
(dashed line) to give the relative fitness for a hypothetical character. Models of resource 
partitioning among mammals (Phillips, in prep.) suggest that an ESND between two taxa can 
be maintained where potentially competing taxa specialise respectively on either side of an 
environmental discontinuity that may be physical (night vs. day) or biological (e.g. different 
prey species). 
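The summed fitness curve described for Figure 1 can be sketched numerically. The Gaussian shape of each curve, the peak positions and the width are assumptions made purely for illustration (none of them is specified here). The sketch shows only that, when the two resource peaks are well separated, a small shift away from a taxon's own peak lowers total fitness well before any gain from the other resource appears.

```python
import math


def gaussian(x, centre, width):
    """Relative fitness contribution from one resource (assumed Gaussian)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def total_fitness(x, centre_a=0.0, centre_b=1.0, width=0.15):
    """Sum of the contributions from resources A and B (the dashed line).

    centre_a, centre_b and width are hypothetical values for a
    hypothetical character; they are not fitted to any data.
    """
    return gaussian(x, centre_a, width) + gaussian(x, centre_b, width)


if __name__ == "__main__":
    # Walk a character value from taxon 1's peak towards taxon 2's peak.
    for step in range(11):
        x = step / 10.0
        print(f"character = {x:.1f}  relative fitness = {total_fitness(x):.4f}")
```

With these assumed values the landscape has a deep valley midway between the peaks, which is the discontinuity an individual would have to cross to enter the other niche.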
Niche partitioning among large cursorial carnivores illustrates ESND maintained by 
specialisation in several characters, and coevolution with resources. Throughout Eurasia, 
Africa and America, the cat and dog groups of carnivores fill niches for fast-burst and 
endurance predators respectively. As predators, cats and dogs have many differences (Jones 
and Stoddart, 1998). As fast-burst, first-strike predators, cats have a high proportion of fast 
twitch glycolytic muscle (like Olympic sprinters), powerful jaws and crushing canines, as well 
as having forelimbs as part of the killing mechanism. Conversely, as well as behavioural 
differences, large dogs, as endurance predators, have a low proportion of fast twitch 
glycolytic muscle (like marathon runners), slashing jaws and canines, and forelimbs 
(specialised for long-distance running) are not included in the killing mechanism. 
Thus multiple specialisations reinforce the ESND. Consider an individual dog (or cat) 
with a heritable shift in one of these characters towards the optimum phenotype of the other, 
but without concurrent shifts in the others. This change will reduce fitness in its own niche, 
but will still be of little benefit in accessing the other niche. Additionally, coevolution 
between predators and prey can strengthen the ESND. A dog with a slightly higher ratio of 
glycolytic to oxidative muscle is unlikely to benefit as a fast-burst predator because potential 
prey has coevolved with the faster burst-predators (cats). Yet other dogs will leave this mutant 
dog behind before they reach their endurance limit - intraspecific competition is strong. A 
consequence of ESND development for coevolution with prey resources is that evolvability 
may be more affected by ESNDs among taxa that prey on live organisms, than taxa that are 
autotrophs or detritivores. 
Given the prevalence in nature of physical and biological discontinuities, in the absence 
of extrinsic extinction and immigration of foreign (non-coevolved) competitors, ESNDs 
should develop between coevolved taxa that compete for resources. As such, it is not 
surprising that catastrophic physical events have so often been suggested to catalyse 
evolvability (Jablonski, 1986; Roy 1996). Although such events may not directly affect 
molecular and developmental mechanisms, they free lineages from ESND-restricted 
evolutionary trajectories. 
The establishment of ESNDs may differ between eukaryotes and prokaryotes in that 
horizontal transfer may break such barriers down in prokaryotes. For example, pathogenic 
Shigella strains of E. coli appear to have multiple independent origins within E. coli, probably 
concurrent with receipt of a plasmid carrying pathogenesis genes, and subsequent convergent 
gene losses (Pupo et al. 2000). Operons in both prokaryotes (Lawrence 1999) and fungi 
(Walton 2000) are also interesting in this regard, since, like plasmids, they represent a 
distinct, potentially transferable unit, such as an entire biosynthetic pathway, complete with 
regulatory sequences. 
Horizontal transfer of genes that allow an organism to compete in a new niche may have
a number of outcomes: (1) the incumbent is better adapted and the invader cannot colonise the
niche; (2) the invader is better adapted (this will depend on the genetic background of the
trait under selection in the niche); or (3) both have similar fitness, which may result in further
competition, extinction of one or the other, or specialisation leading to two new niches. In the context of
evolvability it is not sufficient just to consider interspecific competition between a potential 
invader and the incumbent species. Evolvability depends also on intraspecific competition 
within the invader, and coevolution between different levels of the food chain. 
Functional interactions between organisms and their environment necessarily invoke 
evolutionary constraints. Flowers, which interact with pollinators, are subject to greater
evolutionary constraints than are parts such as leaves and bark, which are not required to
interact specifically with other organisms (Raven et al., 1986). Evolutionary stability
conferred on plant reproductive structures has made them more useful than (for example) bark 
or leaves in determining phylogenetic relationships. 
Evolutionary constraint can also result when environmental interactions change during 
development. Many amphibian and reptile taxa experience dramatic shifts in their 
environment through development, essentially having to function in different niches. For 
instance, the Komodo dragon (Varanus komodoensis) begins life as an arboreal predator of
small insects, progressively moves onto larger insects, small vertebrates and eggs, then larger 
vertebrates and eventually fills a terrestrial large predator/scavenger niche. Mutations 
providing a potential fitness advantage at any point along this continuum may be deleterious 
somewhere else during growth. This effect is weaker in mammals and birds because they
typically feed their young until they can occupy the adult niche.
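As a minimal numerical sketch of this argument (not drawn from any dataset), suppose that survival to adulthood is the product of independent survival probabilities in each successive niche. The stage values and the function name below are illustrative assumptions only.

```python
from math import prod


def survival_to_adulthood(stage_survival):
    """Probability of surviving every developmental stage.

    Assumes the stages are independent, which is a simplification
    made for this sketch.
    """
    return prod(stage_survival)


# Hypothetical per-stage survival across four successive niches.
wild_type = [0.5, 0.5, 0.5, 0.5]
# Hypothetical mutant: better in the first niche, slightly worse in the rest.
mutant = [0.6, 0.45, 0.45, 0.45]

# Hypothetical case where only one niche is exposed to selection,
# as when parents provision their young until adulthood.
wild_type_provisioned = [0.5]
mutant_provisioned = [0.6]

for label, stages in [
    ("wild type, four niches", wild_type),
    ("mutant, four niches", mutant),
    ("wild type, one niche", wild_type_provisioned),
    ("mutant, one niche", mutant_provisioned),
]:
    print(f"{label}: {survival_to_adulthood(stages):.4f}")
```

Under these assumed values the mutant's gain in one niche is outweighed by small losses in the others, whereas a comparable gain is retained in full when only a single niche is occupied.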
Compared with other vertebrates, mammals and birds are also notable for an increased
emphasis on homeostasis, particularly endothermy (Ruben, 1995), thereby stabilising internal
biochemical and physiological conditions. Both effects, reducing the range of niches during 
development and stabilising internal conditions, should enhance morphological evolvability. 
Indeed, while mammals and birds have diversified into widely different niches and 
morphologies from their ancestors that shared the planet with dinosaurs 65 million years ago, 
amphibians, turtles, lepidosaurs (snakes and lizards) and crocodilians typically have not 
(Benton, 1993).
Plasticity, Learning and Evolvability 
Population genetics typically considers just the genetic contribution to the phenotype 
on the grounds that the genetic component is selectable. Phenotypic plasticity, such as the 
specific branching pattern of a tree that has grown into a gap of light in the forest, is not 
genetically determined - yet has an important bearing on evolvability. One suggestion, often 
called the Baldwin effect (Baldwin, 1896; though also proposed by others), is that useful
non-genetically acquired phenotypes will eventually tend to be determined genetically.
Schmalhausen (1949) and Simpson (1953) explained the Baldwin effect genetically, without
the inheritance of acquired characters. These explanations, however, assumed that the plasticity
was eventually lost as the optimal phenotype became the only developmental possibility, and
therefore heritable. This approach does not seem useful: a tree in the forest still
needs to be able to grow into a new gap where there is light, so plasticity needs to be retained.
Baldwin (1896) also proposed that learning tends to hasten the rate of evolution.
Traditionally (e.g. Wright 1931, Grant 1991), learning, or any non-genetic component of
phenotypic variability, was thought to slow the rate of evolution by diluting the genetic
component, thereby reducing the efficiency of natural selection in sorting genetic variance.
However, quantitative genetic models (Anderson, 1995) suggest that after an environmental
change, populations of individuals able to 'search phenotype space', and those that can learn,
will tend to find fitness peaks faster. Using neural networks, Hinton and Nowlan (1987)
showed that networks with non-genetically acquired phenotypes could find a fitness
peak faster than networks that had only genetically determined variability. In terms of fitness
landscapes, it is straightforward to produce models where a combination of phenotypic 
flexibility and genetic variants will find a new optimum faster than the same model with only 
the genetic component. Testing this hypothesis may be challenging, though we note the 
parallels with the earlier discussion on epigenotypes. 
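One such model can be sketched in a few lines, loosely following the set-up of Hinton and Nowlan (1987). A genotype has loci that are fixed wrongly, fixed correctly, or left plastic; plastic loci are set by random guessing during a lifetime of learning trials, and fitness rises with the number of trials left over once the single optimum is found. The trial count, the fitness scaling, the treatment of non-learners and the function names are assumptions of this sketch rather than a reproduction of their simulation.

```python
def fitness_with_learning(n_wrong, n_plastic, trials=1000, bonus=19.0):
    """Expected fitness of a genotype that can learn.

    n_wrong   : loci genetically fixed to the wrong allele
    n_plastic : loci left open and set by random guessing on each trial
    Fitness is 1 + bonus * (trials remaining after success) / trials.
    """
    if n_wrong > 0:
        return 1.0              # the optimum can never be reached
    if n_plastic == 0:
        return 1.0 + bonus      # already at the optimum, no trials used
    p = 0.5 ** n_plastic        # chance that one random guess is right
    expected_remaining = 0.0
    all_missed = 1.0            # probability that every earlier trial failed
    for t in range(1, trials + 1):
        expected_remaining += all_missed * p * (trials - t)
        all_missed *= 1.0 - p
    return 1.0 + bonus * expected_remaining / trials


def fitness_without_learning(n_wrong, n_plastic, bonus=19.0):
    """Expected fitness when open loci are set once, at random, at birth."""
    if n_wrong > 0:
        return 1.0
    return 1.0 + bonus * 0.5 ** n_plastic


for q in (0, 5, 10, 15, 20):
    print(
        f"plastic loci = {q:2d}  "
        f"with learning = {fitness_with_learning(0, q):7.3f}  "
        f"without = {fitness_without_learning(0, q):7.3f}"
    )
```

Under these assumptions, fitness without learning is close to flat except very near the optimum, whereas with learning it rises steadily as more loci become correctly fixed, giving selection a gradient to follow.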
Wyles et al. (1983) reported that land vertebrates had an increasing rate of
morphological evolution with increasing ratio of brain size to body size (encephalisation). How could
larger brain size lead, on average, to a faster rate of morphological evolution? Their 
suggestion was that the more flexible behaviour of larger-brained animals allows them to 
broaden, for example, their use of food sources. Because the behaviour of the species is more 
flexible, it is possible that a new morphological variant would be advantageous in using the 
new food source. In this suggestion there is no direct linkage between relative brain size and 
morphological evolution. Mutations leading to improved learning ability could be selected for 
if behaviour was more flexible, and quite independently this could allow a different mutation 
to be selected that modified some aspect of morphology. To follow the idea further, the
plasticity of flowering plants in varying their growth form in response to their local
environment can be considered the plant equivalent of flexible behaviour. For example, the
phytochrome pigment system, by detecting the level of shade, produces etiolation in plants
(Smith 1974).
An important conclusion of these last two sections is that the potential to evolve is 
dependent on other organisms in the environment, with both intra- and inter-group 
competition being important. Potential evolvability is thus greater than realised evolvability. 
Conclusions. 
In this paper, we have examined a wide range of biological phenomena relevant to the 
concept of evolvability. In agreement with most authors, we conclude that there is no need to
explain evolvability as having evolved in itself; the evolution of phenomena contributing to 
evolvability can be explained by current evolutionary theory. It is important to base models 
for evolvability on a range of data, rather than establishing post hoc explanations for a single 
dataset. To this end, we have examined how genome architecture affects evolvability in 
prokaryotes and eukaryotes. 
In prokaryotes, an r-selected lifestyle is characterised by exponential growth in 
response to an energy source, with competition driving shorter doubling times. That 
prokaryotes possess a single replication origin places pressure on chromosome size, since 
replication is the rate-limiting step in cell doubling during exponential phase. Consequently,
there is selection for elimination of superfluous DNA, including periodically-selected
functions (PSFs). PSFs can be maintained by horizontal transfer, permitting more or less
continual selection within a population or wider unit. Numerous prokaryotes maintain
multiple genomic copies, which may buffer against gene loss, provide a means of sidestepping
the rate-limiting effect of replication by genome copy stockpiling, and may also permit the
emergence of biochemical novelty through divergent evolution at identical copies of a given
locus. This latter point, given the potential for additional catalytic activities in numerous
enzymes (O'Brien & Herschlag 1999), may explain how prokaryotes have become so
biochemically diverse and colonised so many environments (Rothschild & Mancinelli 2001),
even with ongoing sequence elimination. 
In general, eukaryotes are K-selected relative to prokaryotes (Carlile 1982). They
possess multiple origins of replication per chromosome, and, with relatively stable nutrient
sources, doubling times are not the major component of competition. Genome size is therefore
not limited by replication rate, but by replication fidelity. Consequently, the accumulation of
junk DNA is not in itself selected against. In eukaryotes, neutral evolution appears to be
central to understanding complexity and evolvability. Accumulation of junk DNA is neutral,
and conducive to occasional co-option of junk or duplicated DNA into a new function. 
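The contrast between replication-limited and fidelity-limited genomes can be illustrated with simple arithmetic. If every origin fires two forks at once and origins are evenly spaced, the time needed to copy a genome is its length divided by the total fork throughput. The function name and all numbers below are round illustrative assumptions, not measurements.

```python
def replication_time_minutes(genome_bp, origins, fork_rate_bp_per_s):
    """Minutes needed to copy a genome once.

    Assumes evenly spaced origins that all fire together, each
    producing two forks moving at the same constant rate.
    """
    if origins < 1:
        raise ValueError("at least one origin is required")
    forks = 2 * origins
    return genome_bp / (forks * fork_rate_bp_per_s) / 60.0


# Hypothetical single-origin genome, before and after doubling in length.
single = replication_time_minutes(4_600_000, 1, 1000)
single_doubled = replication_time_minutes(9_200_000, 1, 1000)

# Hypothetical multi-origin genome in which origin number scales with length.
multi = replication_time_minutes(12_000_000, 400, 50)
multi_doubled = replication_time_minutes(24_000_000, 800, 50)

print(f"single origin: {single:.1f} min, doubled genome: {single_doubled:.1f} min")
print(f"many origins:  {multi:.1f} min, doubled genome: {multi_doubled:.1f} min")
```

With a single origin, doubling genome length doubles copying time, so extra DNA carries a direct cost in doubling time. When the number of origins scales with length, copying time is unchanged, which is consistent with junk DNA not being selected against on replication rate alone.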
Both prokaryotic and eukaryotic parasites and endosymbionts have repeatedly 
undergone reductive evolution, losing massive amounts of genetic material. This is a 
convergent feature resulting from redundancy subsequent to the evolution of mechanisms for 
nutrient import. There may be less pressure for loss of superfluous sequences compared to 
free-living prokaryotes, as suggested by the 24% non-coding content of the Rickettsia 
genome, compared with around 10% for other bacterial genomes (Andersson et al. 1998b).
No hard boundary delineates r and K lifestyles, which are best considered as a
spectrum, helpful in understanding general patterns, but problematic if used to compare
specific taxa. The utility of describing an r-K spectrum can be seen when comparing
unicellular and simple eukaryotes to prokaryotes and complex multicellular eukaryotes.
Unicellular eukaryotes appear to make use of horizontal transfer and tend to lose and gain
PSFs, as supernumerary chromosomes in fungi (Covert 1998) demonstrate, but the eukaryote
translation apparatus makes for response times on the order of an hour in yeast compared with 
minutes in E. coli. 
Where prokaryotes and, to a lesser extent, unicellular eukaryotes have diversified 
through biochemical adaptation to a wide range of environments, multicellular eukaryotes 
have tended to colonise niches very similar to the initial niche. These can be reached by virtue 
of changes in structures, rather than the underlying biochemistry (e.g., the beaks of Darwin's
finches (Lawrence 1999)). The emergence of an internal biochemical environment that can be
regulated in response to starvation (e.g. by release of large reserves of stored energy) may 
have been a prerequisite to the emergence of morphological evolution in multicellular 
organisms, permitting the colonisation of new niches, but precluding access to ancestral 
niches. 
Mechanisms for dealing with environmental stresses are also different between
eukaryotes and prokaryotes. On the whole, changes in environment which are lethal to the
organism will result in extinction in specialised multicellular eukaryotes, whereas adaptation
to non-lethal, sustained changes in environment may be possible. The process of heritable
adaptation cannot happen within-generation because developmental programs cannot be
re-run to produce new, slightly modified structures in an adult. In prokaryotes, unicellular
eukaryotes, and to some extent plants (which produce multiple centres of reproduction from
vegetative tissue), there is the possibility of within-generation adaptation through immediate
expression of a beneficial mutation or acquired gene. Viewed in these terms, prokaryote 'adapt
or die' strategies make them more evolvable in response to environmental stress, while 
mechanisms to stabilise the internal environment in complex multicellular eukaryotes serve as 
a buffer to the external environment. Unicellular and simple multicellular eukaryotes are 
perhaps somewhere in the middle. 
An important consequence of this is that the extensive biochemical change seen in 
prokaryotes and unicellular eukaryotes, together with reductive evolution, may explain the 
observation that r-selected organisms appear to have lost more early biochemical relics than 
multicellular eukaryotes (Poole et al. 1998, 1999). Much more of multicellular biochemistry
may in fact be a frozen accident, though many processes would have been lost because of the
diminished requirement for interaction with fluctuating environments. The relevance of
organisms in extreme environments as models for the earliest organisms (Nisbet & Sleep
2001) must be reconsidered within this framework.
The effects of stress have been very important in experimental studies relevant to 
evolvability (particularly in prokaryotes), but we emphasise that we have still not covered all 
aspects of evolvability. Questions such as redundancy and modularity need more 
consideration, and other aspects of the system will affect potential evolvability in more ways 
than those described in our treatment of genome architecture and environmental interactions. 
A formal treatment of time scale, from within generations, to millions or billions of years, is 
also required. 
Finally, our evolutionarily-stable niche-discontinuity (ESND) model emphasises the 
difference between potential and realised evolvability, the latter including limits placed on 
organisms from constraints in their environment. Lateral transfer in prokaryotes may break
down some ESNDs in a way that is similar to the niche competition that arises when organisms
adapted to previously isolated niches are able to interact (e.g. geological changes allowing interaction
of isolated biota, or the introduction of exotic species into an environment). Likewise, ESNDs
can break down in some cases where complex behaviour is a trait in one organism, humans
being the prime example. The emergence of plasticity, including complex behaviour, further
separates organisms from environmental changes because this allows a wider range of
responses for a given genotype. The effect of organisms restricting the potential evolvability 
of others needs more consideration, as does plasticity (including learning). 
Evolvability has been a loosely defined concept and it is important to avoid post hoc 
usages of it. As a final comment, evolvability, in one sense, never needed to evolve because
information transfer is always error-prone - early biological systems were of much lower
fidelity, and therefore inherently 'evolvable'.
Acknowledgements. 
We thank David Martin for helpful discussions regarding the epigenotype concept. This work 
was supported by the New Zealand Marsden Fund. 
References. 
Adams, T.H., Wieser, J.K., Yu, J.-H., 1998. Asexual Sporulation in Aspergillus nidulans.
Microbiol. Mol. Biol. Rev. 62, 35-54.
Agrawal, A., Eastman, Q.M., Schatz, D.G., 1998. Implications of transposition mediated by
V(D)J-recombination proteins RAG1 and RAG2 for origins of antigen-specific
immunity. Nature 394, 744-751.
Akerlund, T., Nordstrom, K., Bernander, R., 1995. Analysis of cell size and DNA content in
exponentially growing and stationary-phase batch cultures of Escherichia coli. J.
Bacteriol. 177, 6791-6797.
Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., Watson, J.D., 1994. Molecular Biology
of the Cell, 3rd ed. Garland Publishing, NY.
Anderson, R.W., 1995. Learning and evolution: a quantitative genetics approach. J. Theor.
Biol. 175, 89-101.
Andersson, D.I., Slechta, E.S., Roth, J.R., 1998a. Evidence that gene amplification underlies
adaptive mutability of the bacterial lac operon. Science 282, 1133-1135.
Andersson, J.O., Andersson, S.G.E., 1999. Genome degradation is an ongoing process in
Rickettsia. Mol. Biol. Evol. 16, 1178-1191.
Andersson, S.G.E., et al., 1998b. The genome sequence of Rickettsia prowazekii and the
origin of mitochondria. Nature 396, 133-140.
Baldwin, J.M., 1896. A new factor in evolution. Am. Nat. 30, 441-451.
Banuett, F., 1998. Signalling in the Yeasts: An Informational Cascade with Links to the
Filamentous Fungi. Microbiol. Mol. Biol. Rev. 62, 249-274.
Beaton, M.J., Cavalier-Smith, T., 1999. Eukaryotic non-coding DNA is functional: evidence
from the differential scaling of cryptomonad genomes. Proc. R. Soc. Lond. B 266,
2053-2059.
Bendich, A.J., Drlica, K., 2000. Prokaryotic and eukaryotic chromosomes: what's the
difference? Bioessays 22, 481-486.
Benton, M.J., 1993. The Fossil Record 2. Chapman and Hall, London.
Bromham, L., Phillips, M.J., Penny, D., 1999. Growing up with dinosaurs: molecular dates
and the mammalian radiation. Trends Ecol. Evol. 14, 113-118.
Brosius, J., 1999. RNAs from all categories generate retrosequences that may be exapted as
novel genes or regulatory elements. Gene 238, 115-134.
Burch, C.L., Chao, L., 2000. Evolvability of an RNA virus is determined by its mutational
neighbourhood. Nature 406, 625-628.
Cairns, J., Overbaugh, J., Miller, S., 1988. The origin of mutants. Nature 335, 142-145.
Carlile, M.J., 1982. Prokaryotes and eukaryotes: strategies and successes. Trends Biochem.
Sci. 7, 128-130.
Cavaille, J., et al., 2000. Identification of brain-specific and imprinted small nucleolar RNA
genes exhibiting an unusual genomic organization. Proc. Natl. Acad. Sci. USA 97,
14311-14316.
Champe, S.P., Nagle, D.L., Yager, L.N., 1994. Sexual sporulation. Prog. Ind. Microbiol. 29,
429-454.
Charette, M., Gray, M.W., 2000. Pseudouridine in RNA: What, Where, How, and Why.
IUBMB Life 49, 341-351.
Cole, S.T., et al., 2001. Massive gene decay in the leprosy bacillus. Nature 409, 1007-1011.
Covert, S.F., 1998. Supernumerary chromosomes in filamentous fungi. Curr. Genet. 33,
311-319.
Crespi, B.J., 2001. The evolution of social behaviour in microorganisms. Trends Ecol. Evol.
16, 178-183.
Darnell, J.E., Doolittle, W.F., 1986. Speculations on the early course of evolution. Proc. Natl.
Acad. Sci. USA 83, 1271-1275.
Dickinson, W.J., Seger, J., 1999. Cause and effect in evolution. Nature 399, 30.
Doolittle, W.F., 1998. You are what you eat: a gene transfer ratchet could account for
bacterial genes in eukaryotic nuclear genomes. Trends Genet. 14, 307-311.
Douglas, S., et al., 2001. The highly reduced genome of an enslaved algal nucleus. Nature
410, 1091-1096.
Dover, G., 2000. Results may not fit well with current theories... Nature 408, 17.
Drake, J.W., 1999. The distribution of rates of spontaneous mutation over viruses,
prokaryotes, and eukaryotes. Ann. N.Y. Acad. Sci. 870, 100-107.
Farley, J., 1977. The spontaneous generation controversy from Descartes to Oparin. Johns
Hopkins University Press, Baltimore MD.
Fijalkowska, I.J., Dunn, R.L., Schaaper, R.M., 1993. Mutants of Escherichia coli with
increased fidelity of DNA replication. Genetics 134, 1023-1030.
Filipowicz, W., 2000. Imprinted expression of small nucleolar RNAs in brain: Time for
RNomics. Proc. Natl. Acad. Sci. USA 97, 14035-14037.
Finkel, S.E., Kolter, R., 1999. Evolution of microbial diversity during prolonged starvation.
Proc. Natl. Acad. Sci. USA 96, 4023-4027.
Forterre, P., 1995. Thermoreduction, a hypothesis for the origin of prokaryotes. C. R. Acad. Sci.
Paris III 318, 415-422.
Forterre, P., Philippe, H., 1999. Where is the root of the universal tree of life? Bioessays 21,
871-879.
Forterre, P., Bouthier de la Tour, C., Philippe, H., Duguet, M., 2000. Reverse gyrase from
hyperthermophiles: probable transfer of a thermoadaptation trait from Archaea to
Bacteria. Trends Genet. 16, 152-154.
Foster, P.L., 1999. Mechanisms of stationary phase mutation: a decade of adaptive mutation.
Annu. Rev. Genet. 33, 57-88.
Fraser, C.M., et al., 1998. Complete genome sequence of Treponema pallidum, the syphilis
spirochaete. Science 281, 375-388.
Galtier, N., Tourasse, N., Gouy, M., 1999. A nonhyperthermophilic common ancestor to
extant life forms. Science 283, 220-221.
Gibson, G., 2000. Evolution: Hox genes and the cellared wine principle. Curr. Biol. 10,
R452-R455.
Glansdorff, N., 2000. About the last common ancestor, the universal life-tree and lateral gene
transfer: a reappraisal. Mol. Microbiol. 38, 177-185.
Gould, S.J., Lewontin, R.C., 1979. The spandrels of San Marco and the Panglossian
paradigm: A critique of the adaptationist program. Proc. R. Soc. Lond. B 205, 581-598.
Grant, V., 1991. The Evolutionary Process. Columbia Univ. Press, New York.
Graveley, B.R., 2001. Alternative splicing: increasing diversity in the proteomic world.
Trends Genet. 17, 100-107.
Gray, M.W., Burger, G., Lang, B.F., 1999. Mitochondrial evolution. Science 283, 1476-1481.
Grbic, M., 2000. "Alien" wasps and evolution of development. Bioessays 22, 920-932.
Harris, J.R., 1998. Placental endogenous retrovirus (ERV): structural, functional, and
evolutionary significance. Bioessays 20, 307-316.
Hartl, D.L., 2000. Molecular melodies in high and low C. Nat. Rev. Genet. 1, 145-149.
Hastings, P.J., Bull, H.J., Klump, J.R., Rosenberg, S.M., 2000. Adaptive amplification: an
inducible chromosomal instability mechanism. Cell 103, 723-731.
Herbert, A., Rich, A., 1999. RNA processing in evolution. The logic of soft-wired genomes.
Ann. N.Y. Acad. Sci. 870, 119-132.
Hinton, G.E., Nowlan, S.J., 1987. How learning can guide evolution. Complex Systems 1,
495-502.
Hiom, K., Melek, M., Gellert, M., 1998. DNA transposition by the RAG1 and RAG2 proteins:
a possible source of oncogenic translocations. Cell 94, 463-470.
Holliday, R., Grigg, G.W., 1993. DNA methylation and mutation. Mutat. Res. 285, 61-67.
Hood, D.W., et al., 1996. DNA repeats identify novel virulence genes in Haemophilus
influenzae. Proc. Natl. Acad. Sci. USA 93, 11121-11125.
Jablonski, D., 1986. Background and mass extinctions: the alteration of macroevolutionary
regimes. Science 231, 129-133.
Jacob, F., 1977. Evolution and Tinkering. Science 196, 1161-1166.
Jacobs, H., Bross, L., 2001. Towards an understanding of somatic hypermutation. Curr. Opin.
Immunol. 13, 208-218.
Jones, M.E., Stoddart, D.M., 1998. Reconstruction of the predatory behaviour of the extinct
marsupial thylacine (Thylacinus cynocephalus). J. Zool. Soc. Lond. 246, 239-246.
Kalman, S., et al., 1999. Comparative genomes of Chlamydia pneumoniae and C. trachomatis.
Nat. Genet. 21, 385-389.
Kasak, L., Horak, R., Kivisaar, M., 1997. Promoter-creating mutations in Pseudomonas
putida: A model system for the study of mutation in starving bacteria. Proc. Natl. Acad.
Sci. USA 94, 3134-3139.
Kirschner, M., Gerhart, J., 1998. Evolvability. Proc. Natl. Acad. Sci. USA 95, 8420-8427.
Koch, A.L., 1984. Evolution vs the number of gene copies per primitive cell. J. Mol. Evol. 20,
71-76.
Lafontaine, D.L.J., Tollervey, D., 1998. Birth of the snoRNPs: the evolution of the
modification-guide snoRNAs. Trends Biochem. Sci. 23, 383-388.
Lan, R., Reeves, P.R., 2000. Intra-species variation in bacterial genomes: the need for a
species genome concept. Trends Microbiol. 8, 396-401.
Landree, M.A., Wibbenmeyer, J.A., Roth, D.B., 1999. Mutational analysis of RAG1 and
RAG2 identifies three catalytic amino acids in RAG1 critical for both cleavage steps of
V(D)J recombination. Genes Dev. 13, 3059-3069.
Larsson, E., Andersson, G., 1998. Beneficial role of Human Endogenous Retroviruses: Facts
and Hypotheses. Scand. J. Immunol. 48, 329-338.
Lawrence, J., 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes
and eukaryotes. Curr. Opin. Genet. Dev. 9, 642-648.
Lazazzera, B.A., Kurtser, I.G., McQuade, R.S., Grossman, A.D., 1999. An autoregulatory
circuit affecting peptide signalling in Bacillus subtilis. J. Bacteriol. 181, 5193-5200.
Levin, P.A., Grossman, A.D., 1998. Cell cycle and sporulation in Bacillus subtilis. Curr.
Opin. Microbiol. 1, 630-635.
Lin, L., Xu, B., Rote, N.S., 1999. Expression of Endogenous Retrovirus ERV-3 Induces
Differentiation in BeWo, a Choriocarcinoma Model of Human Placental Trophoblast.
Placenta 20, 109-118.
Lindquist, S., 2000. ...but yeast prion offers clues about evolution. Nature 408, 17-18.
Maldonado, R., Jimenez, J., Casadesus, J., 1994. Changes of ploidy during the Azotobacter
vinelandii growth cycle. J. Bacteriol. 176, 3911-3919.
Mangeney, M., Heidmann, T., 1998. Tumor cells expressing a retroviral envelope escape
immune rejection in vivo. Proc. Natl. Acad. Sci. USA 95, 14920-14925.
Marguet, E., Forterre, P., 1994. DNA stability at temperatures typical for thermophiles.
Nucleic Acids Res. 22, 1681-1686.
McFadden, G.I., 1999. Endosymbiosis and evolution of the plant cell. Curr. Opin. Plant Biol.
2, 513-519.
McKenzie, G.J., Harris, R.S., Lee, P.L., Rosenberg, S.M., 2000. The SOS response regulates
adaptive mutation. Proc. Natl. Acad. Sci. USA 97, 6646-6651.
Melek, M., Gellert, M., 2000. RAG1/2-mediated resolution of transposition intermediates:
two pathways and possible consequences. Cell 101, 625-633.
Metzgar, D., Wills, C., 2000. Evidence for the adaptive evolution of mutation rates. Cell 101,
581-584.
Monk, M., 1995. Epigenetic programming of differential gene expression in development and
evolution. Dev. Genet. 17, 188-197.
Moran, N.A., 1996. Accelerated evolution and Muller's Ratchet in endosymbiotic bacteria.
Proc. Natl. Acad. Sci. USA 93, 2873-2878.
Moran, N., Baumann, P., 2000. Bacterial endosymbionts in animals. Curr. Opin. Microbiol. 3,
270-275.
Morgan, H.D., Sutherland, H.G.E., Martin, D.I.K., Whitelaw, E., 1999. Epigenetic inheritance
at the agouti locus in the mouse. Nat. Genet. 23, 314-318.
Morrissey, J.P., Tollervey, D., 1995. Birth of the snoRNPs: the evolution of RNase MRP and
the eukaryotic pre-rRNA-processing system. Trends Biochem. Sci. 20, 78-82.
Moxon, E.R., Rainey, P.B., Nowak, M.A., Lenski, R.E., 1994. Adaptive evolution of highly
mutable loci in pathogenic bacteria. Curr. Biol. 4, 24-33.
Moxon, E.R., Thaler, D.S., 1997. The Tinkerer's evolving toolbox. Nature 387, 659-662.
Muramatsu, M., Kinoshita, K., Fagarasan, S., Yamada, S., Shinkai, Y., Honjo, T., 2000. Class
switch recombination and hypermutation require activation-induced cytidine deaminase
(AID), a potential RNA editing enzyme. Cell 102, 553-563.
Nelson, K.E., et al., 1999. Evidence for lateral gene transfer between Archaea and Bacteria
from genome sequence of Thermotoga maritima. Nature 399, 323-329.
Nisbet, E.G., Sleep, N.H., 2001. The habitat and nature of early life. Nature 409, 1083-1091.
Novacek, M.J., 1985. Evidence for echolocation in the oldest known bats. Nature 306,
683-684.
O'Brien, P.J., Herschlag, D., 1999. Catalytic promiscuity and the evolution of new enzymatic
activities. Chem. Biol. 6, R91-R105.
Omer, A.D., Lowe, T.M., Russell, A.G., Ebhardt, H., Eddy, S.R., Dennis, P.P., 2000.
Homologs of small nucleolar RNAs in Archaea. Science 288, 517-522.
Partridge, L., Barton, N.H., 2000. Evolving evolvability. Nature 407, 457-458.
Penny, D., Poole, A., 1999. The nature of the Last Universal Common Ancestor. Curr. Opin.
Genet. Dev. 9, 672-677.
Plaga, W., Schairer, H.U., 1999. Intercellular signalling in Stigmatella aurantiaca. Curr.
Opin. Microbiol. 2, 593-597.
Plasterk, R., 1998. V(D)J recombination. Ragtime jumping. Nature 394, 718-719.
Poole, A.M., Jeffares, D.C., Penny, D., 1998. The path from the RNA world. J. Mol. Evol. 46,
1-17.
Poole, A., Jeffares, D., Penny, D., 1999. Early evolution: prokaryotes, the new kids on the
block. Bioessays 21, 880-889.
Poole, A., Penny, D., Sjoberg, B.-M., 2001. Confounded cytosine! Tinkering and the
evolution of DNA. Nat. Rev. Mol. Cell Biol. 2, 147-151.
Powell, S.C., Wartell, R.M., 2001. Different characteristics distinguish early versus late
arising adaptive mutations in Escherichia coli FC40. Mutat. Res. 473, 219-228.
Pupo, G.M., Lan, R., Reeves, P.R., 2000. Multiple independent origins of Shigella clones of
Escherichia coli and convergent evolution of many of their characteristics. Proc. Natl.
Acad. Sci. USA 97, 10567-10572.
Radman, M., Matic, I., Taddei, F., 1999. Evolution of evolvability. Ann. N.Y. Acad. Sci. 870,
146-155.
Raven, P.H., Evert, R.F., Eichhorn, S.E., 1986. Biology of Plants. Worth Publishers, New
York.
Reanney, D.C., 1974. On the origin of prokaryotes. J. Theor. Biol. 48, 243-251.
Reenan, R.A., 2001. The RNA world meets behavior: A-I pre-mRNA editing in animals.
Trends Genet. 17, 53-56.
Revy, P., et al., 2000. Activation-induced cytidine deaminase (AID) deficiency causes the
autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell 102, 565-575.
Rosenzweig, M.L., McCord, R.D., 1991. Incumbent Replacement: evidence for long-term
evolutionary progress. Paleobiology 17, 202-213.
Rothschild, L.J., Mancinelli, R.L., 2001. Life in extreme environments. Nature 409,
1092-1101.
Roy, K., 1996. The roles of mass extinction and biotic interaction in large-scale replacements:
a reexamination using the fossil record of stromboidean gastropods. Paleobiology 22,
436-452.
Ruben, J., 1995. The evolution of endothermy in mammals and birds: from physiology to
fossils. Annu. Rev. Physiol. 57, 69-95.
Ruddle, F.H., et al., 1999. Evolution of chordate Hox gene clusters. Ann. N.Y. Acad. Sci. 870,
238-248.
Rutherford, S.L., Lindquist, S.L., 1998. Hsp90 as a capacitor for morphological evolution.
Nature 396, 336-342.
Scharer, O.D., Jiricny, J., 2001. Recent progress in the biology, chemistry and structural
biology of DNA glycosylases. Bioessays 23, 270-281.
Schmalhausen, I.I., 1949. Factors of Evolution. University of Chicago Press, Chicago.
Sereno, P., 1999. The evolution of dinosaurs. Science 284, 2137-2147.
Sharp, P.A., 1994. Split genes and RNA splicing. Cell 77, 805-815.
Simpson, G.G., 1953. The Baldwin effect. Evolution 7, 110-117.
Smit, A.F.A., 1999. Interspersed repeats and other mementos of transposable elements in
mammalian genomes. Curr. Opin. Genet. Dev. 9, 657-663.
Smith, H., 1974. Phytochrome and photomorphogenesis: an introduction to the photocontrol
of plant development. McGraw-Hill, London.
Smith, H.O., Gwinn, M.L., Salzberg, S.L., 1999. DNA uptake signal sequences in naturally
transformable bacteria. Res. Microbiol. 150, 603-616.
Smith, H.C., Gott, J.M., Hanson, M.R., 1997. A guide to RNA editing. RNA 3, 1105-1123.
Sniegowski, P.D., Gerrish, P.J., Lenski, R.E., 1997. Evolution of high mutation rates in
experimental populations of E. coli. Nature 387, 703-705.
Solomon, J.M., Grossman, A.D., 1996. Who's competent and when: regulation of natural
genetic competence in bacteria. Trends Genet. 12, 150-155.
Sontheimer, E.J., Gordon, P.M., Piccirilli, J.A., 1999. Metal ion catalysis during group II
intron self-splicing: parallels with the spliceosome. Genes Dev. 13, 1729-1741.
Stoltzfus, A ,  1 999. On the possibility of constructive neutral evolution. 49, 1 69- 1 8 1 .  
Szathmary, E. ,  Maynard Smith, J. ,  1 995. The major evolutionary transitions. Nature 374, 227-
232. 
Torkelson, J.  Harris, RS., Lombardo, M-l, Nagendran, J . ,  Thulin, c . ,  Rosenberg, S .M. ,  1 997. 
Genome-wide hypermutation in a subpopulation of stationary-phase cells underlies 
recombination-dependent adaptive mutation. EMBO J. 1 6, 3303-33 1 1  
Tortosa, P . ,  Dubnau, D. ,  1999. Competence for transformation: a matter of taste. Curf. Opin. 
Microbiol. 2, 588-592. 
True, H.L. ,  Lindquist, S .L., 2000. A yeast prion provides a mechanism for genetic variation 
and phenotypic variability. Nature 407, 477-483 .  
Varon, M. ,  Choder, M . ,  2000. Organization and cell-cell interaction in  starved Saccharomyces 
cerevisiae colonies. 1. Bacteriol. 1 82, 3877-3880. 
Wagner, A., 1996. Does evolutionary plasticity evolve? Evolution 50, 1008-1023.
Wagner, G.P., Altenberg, L., 1996. Complex adaptations and the evolution of evolvability.
Evolution 50, 967-976.
Walton, J.D., 2000. Horizontal gene transfer and the evolution of secondary metabolite gene
clusters in fungi: an hypothesis. Fungal Genet. Biol. 30, 167-171.
Ward, M.J., Zusman, D.R., 1999. Motility in Myxococcus xanthus and its role in
developmental aggregation. Curr. Opin. Microbiol. 2, 624-629.
Weinstein, L.B., Steitz, J.A., 1999. Guided tours: from precursor snoRNA to functional
snoRNP. Curr. Opin. Cell. Biol. 11, 378-384.
Whitelaw, E., Martin, D.I.K., 2001. Retrotransposons as epigenetic mediators of phenotypic
variation in mammals. Nat. Genet. 27, 361-365.
Woese, C.R., 1998. The universal ancestor. Proc. Natl. Acad. Sci. USA 95, 6854-6859.
Wolfe, K.H., Shields, D.C., 1997. Molecular evidence for an ancient duplication of the entire
yeast genome. Nature 387, 708-713.
Wright, S.K., 1931. Evolution in Mendelian populations. Genetics 16, 97-159.
Wyles, J.S., Kunkel, J.G., Wilson, A.C., 1983. Birds, behavior, and anatomical evolution.
Proc. Natl. Acad. Sci. USA 80, 4394-4397.
Yoder, J.A., Walsh, C.P., Bestor, T.H., 1997. Cytosine methylation and the ecology of
intragenomic parasites. Trends Genet. 13, 335-340.
Table 1. Examples of stress responses which may affect evolvability.

Prokaryotes

Mechanism: Global hypermutation
Activating stress: Occurs in stationary phase, thus likely to be a starvation response.
Organism(s): E. coli; Pseudomonas putida?
Notes: Hypermutation is transient, recombination-dependent and in stationary phase.
References: Torkelson et al. 1997; McKenzie et al. 2000; Kasak et al. 1997

Mechanism: Local hypermutation (contingency loci)
Activating stress: Recurrent selection, such as in host-parasite coevolution.
Organism(s): H. influenzae; E. coli; S. typhimurium
Notes: E.g. phenotypic switching of surface antigens, hypermutable virulence factors. cf. V(D)J hypervariability.
References: Moxon et al. 1994; Hood et al. 1996

Mechanism: Gene amplification
Activating stress: Occurs in stationary phase, thus likely to be a starvation response.
Organism(s): S. typhimurium; E. coli
Notes: Requires residual activity at amplified locus. In E. coli, in late arising colonies.
References: Andersson et al. 1998a; Powell & Wartell 2001; Hastings et al. 2000

Mechanism: Genetic competence (DNA uptake)
Activating stress: Occurs in stationary phase.
Organism(s): B. subtilis; Streptococcus pneumoniae; H. influenzae
Notes: Extracellular signalling molecules indicate a cell density 'quorum' which establishes competence.
References: Solomon & Grossman 1996; Tortosa & Dubnau 1999

Mechanism: Sporulation
Activating stress: (none listed)
Organism(s): B. subtilis
Notes: Sporulation controlled by the same pathway as competence.
References: Levin & Grossman 1998

Mechanism: Cell-cell interaction
Activating stress: Starvation
Organism(s): Stigmatella aurantiaca; Myxococcus xanthus
Notes: Sporulation occurs in response to starvation in these myxobacteria.
References: Ward & Zusman 1999; Plaga & Schairer 1999

Eukaryotes

Mechanism: Sexual sporulation
Activating stress: Starvation
Organism: S. cerevisiae; A. nidulans
Notes: Saccharomyces enters meiosis upon nitrogen starvation (Banuett 1998). Aspergillus sporulates sexually at low glucose concentrations. At high glucose it switches to asexual sporulation (dispersal) (Adams et al. 1998).
References: Banuett 1998; Adams et al. 1998

Mechanism: Supernumerary chromosomes
Activating stress: (none listed)
Organism: Fungi
Notes: Not usually stably maintained in the genome. cf. plasmids.
References: Covert 1998

Mechanism: Cell-cell interaction
Activating stress: Starvation
Organism: S. cerevisiae; D. discoideum
Notes: In yeast, connecting filaments form between cells (Varon & Choder 2000). In D. discoideum, starvation promotes fruiting body and spore formation (Crespi 2001).
References: Varon & Choder 2000; Crespi 2001

Mechanism: PSI-dependent translation readthrough.
Activating stress: Heat shock protein-mediated
Organism: S. cerevisiae
Notes: PSI normally translation terminator. Change in protein conformation occurs.
References: True & Lindquist 2000

Mechanism: Hsp 90-mediated phenotype exploration.
Activating stress: Heat stress, other stresses involving Hsp 90.
Organism: D. melanogaster
Notes: Hypothesised that Hsp 90 titration during heat stress lifts buffering, resulting in hidden phenotypes being tested.
References: Rutherford & Lindquist 1998

Mechanism: Local hypermutation
Activating stress: Host-parasite interactions
Organism: Mammals
Notes: Somatic hypermutation of V(D)J genes in antibody formation.
References: Jacobs & Bross 2001
[Figure 1: relative fitness curves for resource A, resource B and resource A+B, plotted against Quantitative Phenotype.]
Figure 1. Evolutionarily stable niche discontinuity between two taxa. The curves represent
relative fitness contribution derived by an organism from access to resources A (black
line) and B (grey line), as dependent on a quantitative phenotype. The sum of the curves
for resources A+B (dashed line) represents the overall relative fitness of organisms with
respect to a quantitative phenotype. The signature of an ESND is a direction of selection
pattern creating a valley of low fitness. This is expected to occur where there is a
deleterious phenotype shift trade-off between interspecific and intraspecific competition.
[Figure 2: potential and realised niche plotted against Physical conditions (axis 1) and Access to biological resources (axis 2), shown at times A, B and C along a Time axis.]
Figure 2 traces the relationship between the potential (transparent) and realised (shaded)
niche through time for a hypothetical organism. The potential niche includes the full
range of physical (axis 1) and biotic (axis 2) conditions for which the organism can
survive and reproduce. The effect of competition and predation on fitness contracts this
range, leaving the realised niche (shaded) that naturally occurs. Extinction of a predator at
time B allows the expansion of the realised niche (within the bounds of the potential
niche). Changes to the potential niche may follow due to alteration of the fitness
landscape owing to the expansion of the realised niche.
Box 1 - Post hoc explanations 
The Story of Darryl. 
Darryl lived in a small farmhouse on the edge of an isolated village. Perhaps as a
result of generations of inbreeding, he was slow, but very gentle and wouldn't even harm a
fly. Darryl had one ability that really endeared him to the locals. He was a fantastic shot.
The wall of Darryl's barn was covered with small round circles, each with a small 
hole right in the centre where a bullet had hit. An intrigued journalist from the neighboring 
town arranged an interview for a feature story. 
"Tell me Darryl", she said, "how is it that you are such a good shot with a rifle?" Darryl 
replied,
"It's veeery simple, --- I taaakes my rifle, --- aaaims it at the wall, --- puuulls the
trigger, --- fiiinds where it hits, --- and draaaws a circle around it."
Box 2. r and K selection. 
Rate of population growth, R, is given by the equation: 
$R = \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$
Where:
r = maximum intrinsic rate of increase for a population
N = number of organisms
K = carrying capacity (of the environment)
r-selected organisms:
• small
• high reproductive rates
• short life cycles
• live in unpredictable environments
• fluctuation in resource availability and type requires fast response times
• population size varies hugely
K-selected organisms:
• large
• lower, more constant, reproductive rate
• longer life cycles
• live in more stable environments
• resources in more constant supply (though limited in amount)
• population size relatively stable
r- and K- selection is a relative measure. While specific application of this concept is 
problematic (organism A may be r-selected relative to organism B, but K-selected relative
to organism C), it is no more problematic than fitness, which is also a relative measure. 
The concept is useful in general discussions such as this since it aims to explain many 
aspects of prokaryotes and eukaryotes, rather than invoke special explanations for each 
feature. 
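As a minimal sketch of the growth equation in Box 2, the following Python fragment evaluates R = rN(1 - N/K) and steps it forward in time. The function names, the forward-Euler scheme and all parameter values are assumptions made for illustration only; none of them come from the thesis.

```python
def growth_rate(n, r, k):
    """Logistic growth rate R = dN/dt = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k)


def simulate(n0, r, k, dt, steps):
    """Forward-Euler integration of the logistic equation (illustrative only)."""
    n = n0
    trajectory = [n]
    for _ in range(steps):
        n += growth_rate(n, r, k) * dt
        trajectory.append(n)
    return trajectory


# Hypothetical parameter values, chosen only to show the shape of the curve.
traj = simulate(n0=10.0, r=0.5, k=1000.0, dt=0.1, steps=400)
print(traj[0], traj[-1])
```

In this sketch a population far below K grows almost exponentially at rate r (the regime emphasised for r-selected organisms), while growth stalls as N approaches K (the regime emphasised for K-selected organisms).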
Box 3. Selection pressures on genome organisation and consequences for 
evolvability. 
Prokaryotes. 
Fast reproductive rate during exponential growth is a consequence of r selection. 
• Under r selection, fast replication is selectively advantageous. 
• Under fast replication, a single origin of replication per chromosome limits genome size. 
• Consequently there is selection against multiple copies of genes, 'junk DNA', and genes
that are only rarely required ('periodically-selected').
• Horizontal transfer is advantageous for recovering 'periodically-selected' genes.
Copy number. 
• Retaining multiple copies of the genome appears widespread (Bendich & Drlica 2000). 
• This redundancy provides a buffer to deleterious mutation, and is expected to promote 
survival during hypermutation in the stationary phase (Finkel & Kolter 1999).
• Redundancy may favour diversification of new functions similar to duplication and 
divergence in eukaryote genomes. 
• May maintain faster cell division through genome stockpiling - overcomes a problem if 
replication takes an hour, but cells can double in 20 minutes during exponential growth. 
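The arithmetic behind the last point can be sketched as follows. If one round of replication takes longer than the doubling time, several rounds have to be in progress at once for a completed genome to be available at each division. The function name and the steady-state bookkeeping are assumptions of this sketch, not a model taken from the thesis; only the figures of one hour and 20 minutes come from the text.

```python
import math


def concurrent_replication_rounds(replication_minutes, doubling_minutes):
    """Minimum number of overlapping replication rounds needed so that one
    round completes every doubling time (simple steady-state bookkeeping)."""
    return max(1, math.ceil(replication_minutes / doubling_minutes))


# Figures quoted in the text: replication takes an hour, doubling takes 20 minutes.
rounds = concurrent_replication_rounds(60, 20)
print(rounds)
```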
Plasmids. 
• Maintain periodically-selected functions in r-selected populations; a gene on a plasmid 
can be retained within a population though lost from individuals. 
Operons. 
• Transferable units of metabolism. The origin of the operon organisation is debated, but 
once formed, an operon may be spread through horizontal transfer (Lawrence 1999).
Response times. 
• Ability to respond quickly to changes in environment, e.g. presence of a new substrate, is 
a feature of r-selected organisms. 
• Beginning translation before transcription is finished allows fast response. mRNA being 
extensively processed, and then exported from the nucleus, makes response times much 
slower even in r-selected eukaryotes such as yeast. Response times are in minutes in 
prokaryotes, and of the order of an hour in yeast (Alberts et al. 1994).
• Loss of extensive transcript processing will be selected for.
Environmental interactions.
• Regulation of developmental pathways is strongly linked to environmental cues.
Examples are fruiting body formation (asexual sporulation), genetic competence, biofilm
formation, regulation of virulence (see Table 1 in Crespi 2001).
Eukaryotes. 
In K-selected organisms reproductive rate is slower. 
• Given many centres of replication (and replicons) there are few constraints on genome 
size, and accumulation of junk DNA is not inherently disadvantageous. Thus expansion of 
genome size through transposable elements, retroviral incorporation, duplication of genes 
or genomic regions can occur frequently. 
• Occasional recruitment of new function from this pool is possible. 
• Similarly, duplication and divergence of genes is a major source of evolutionary novelty. 
Extensive transcript processing. 
• K-selected organisms tend to occur where nutrient supply is more stable. 
• Fast gene expression is therefore not strongly selected, so extensive transcript processing 
is not strongly disadvantageous. 
• Any potential benefits of processing, such as alternative splicing and RNA editing can 
therefore be realised, and lead to many RNA intermediates from one gene, resulting in a 
more complex genotype-phenotype relationship (the ribotype concept of Herbert & Rich
1999).
Constitutive multicellularity in eukaryotes. 
• Increased propensity for division of labour among 'obligate cooperators' results in cell 
specialisation. (This occurs transiently in other eukaryotes and in prokaryotes). 
• Specialisation also results in different, irreversible, developmental fates of cells, tissues 
and organs, larval and adult stages in metazoa, polyphenic insects. 
• Specialisation can lead to efficient mechanisms for large-scale nutrient storage (e.g. 
adipose tissue, glycogen, and starch), further stabilising the control of nutrients. 
• Specialisation permits heavy investment in specific structures such as organs and 
mechanical tools for nutrient acquisition, defence or competition. 
• Regulation of developmental pathways is less dependent on environmental cues, with 
greater internal control. 
All the above are generalisations to which there must be exceptions. Describing an 
organism as r- or K-selected is relative, and focuses on the extremes (prokaryotes and 
multicellular eukaryotes). The differences are on a continuum. For instance, 
unicellular eukaryotes are r-selected relative to their multicellular relatives, and many 
of the points listed under prokaryotes apply to this group. Transcript processing and
junk accumulation are less extensive in unicellular eukaryotes; operons and
periodically-selected functions are a feature of their genomes; and developmental
regulation is tightly linked to environmental cues. Constitutive multicellularity makes
horizontal transfer unlikely, but unicellular eukaryotes may acquire new functions 
through DNA uptake. 
Future work 
Testing the thermoreduction hypothesis. 
In this thesis, I have examined a wide range of issues with respect to the origin
of prokaryotes and eukaryotes. My overall conclusion is that prokaryotes represent
derived lineages rather than ancestral ones, contrary to what has generally been thought.
By using a well-developed model for the RNA world as the outgroup, it has been possible
to establish that a number of expected features of the LUCA have been maintained in
eukaryotes and lost in prokaryotes. Eukaryotes have been more conservative in terms of
biochemical evolution, and I have presented a detailed discussion on prokaryote
and eukaryote evolvability to support this claim. I have questioned the dogma that all
eukaryote-specific features are recent evolutionary innovations, and have presented a
challenge to this dogma in the form of a critique of the problems surrounding the
question of the origin of the nucleus.
In each chapter, specific conclusions are presented, and it would be redundant 
to describe these again. This thesis has concentrated both on the use of RNA relics as 
a marker for establishing the direction of evolution at the root of the tree of life, and 
on ecological aspects of prokaryote and eukaryote lifestyle and how this could 
account for my findings. An independent test is however available, and has been 
briefly suggested in several of the chapters, but not described in detail.
If Forterre's thermoreduction hypothesis is correct, evidence of past
thermophily should be identifiable in prokaryote lineages, but not eukaryotes. If the
LUCA was a thermophile however, such evidence will be found in all three lineages.
A range of traits which contribute to thermostability at high temperatures might be
relevant in testing the thermoreduction hypothesis [see Forterre 1996; Daniel &
Cowan 2000] and three specific studies are outlined below.
Thermoreduction can be invoked in understanding the phylogenetic 
distribution of RNA relics, and r-selection reinforces this. Perhaps the most obvious 
prokaryotic feature that the latter cannot account for however is the emergence of 
circular genomes. If the prokaryote lineages did evolve through thermoreduction, this 
can be tested by looking for signatures of past thermophily in mesophilic prokaryotes 
and in eukaryotes. The thermoreduction hypothesis predicts that such signatures will 
be present in mesophilic prokaryotes, but absent from eukaryotes. The 'thermophilic
LUCA' hypothesis predicts that evidence of past thermophily will be present in all 
three domains, though this has never been tested. 
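The two predictions in the paragraph above can be written as a small decision rule. This is only a sketch: the function name, the boolean inputs and the wording of the returned labels are assumptions introduced here, and a real test would rest on many traits rather than one presence/absence call per domain.

```python
def interpret_signatures(bacteria, archaea, eukaryotes):
    """Map presence (True) or absence (False) of past-thermophily signatures
    in each domain onto the hypothesis that predicts that pattern."""
    if bacteria and archaea and eukaryotes:
        # Signatures in all three domains: predicted by a thermophilic LUCA.
        return "thermophilic LUCA"
    if bacteria and archaea and not eukaryotes:
        # Signatures in prokaryotes only: predicted by thermoreduction.
        return "thermoreduction"
    return "not predicted by either hypothesis"


print(interpret_signatures(bacteria=True, archaea=True, eukaryotes=False))
```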
Both hypotheses actually cover a range of possibilities, which in a simple form
can be considered as 'cold start, hot LUCA' or 'hot start, hot LUCA' for a thermophilic
LUCA, and 'cold start, cold LUCA' or 'hot start, cold LUCA' for thermoreduction.
This is discussed by Miller & Bada [1988], and by Moulton et al. [2000]. Moulton et
al. provide a more detailed set of scenarios, though because they only consider
conditions in the RNA world period and later, they do not consider the possibility of a
'hot start' prior to the emergence of RNA. Figure 1 extends the work of Moulton et al.
to cover all aspects of the origin of life, in line with the general consensus that the
RNA world would have been mesophilic, but earlier prebiotic periods could have
involved either low or high temperatures.
[Figure 1: stages in the origin and early evolution of life (pre-RNA world, RNA world, RNP, catalytic proteins, DNA, LUCA, post-LUCA) plotted against temperature range: up to 50 °C ('mesophily'), 50 °C - 85 °C ('thermophily'), 85 °C - 110 °C ('hyperthermophily').]
Figure 1. Relationship between temperature and the origin and early evolution
of life.
This figure is a formalisation of the relationship between temperature and the origin
of life, based on Figure 3 of Moulton et al. [2000]. Red dots indicate whether each
stage can, in principle, exist within the temperature ranges indicated to the right. In
the case of periods predating the RNA world, it is not clear whether life began at high
or low temperatures, and the limits are not well established, because the processes and
requisites are not established. For the RNA world period, the upper limit on stability
of tertiary structure of naked RNA [Brion & Westhof 1997] limits this period to the
lowest temperature range shown. The permissibility of later periods is established by
whether modern organisms living at various temperatures possess any of these traits
shown. For instance, for the RNP stage, the ribosome is known to be ubiquitous, so
RNPs can clearly function at over 100°C. While any combination of stages is possible
in principle, the blue rings indicate the hypothesis that best fits with the RNA world
data described in this thesis. The data cannot be used to examine earlier periods in the
origin of life, as has been pointed out elsewhere [Miller & Bada 1998].
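The temperature bands used in Figure 1 can be written down as a simple lookup, which makes the RNA world constraint explicit. This is a sketch under stated assumptions: the names are hypothetical, and treating each upper bound as inclusive is a choice made here, since the figure does not say which band a boundary temperature belongs to.

```python
# Temperature bands as labelled in Figure 1 (upper bounds in degrees Celsius).
# Treating each upper bound as inclusive is an assumption of this sketch.
BANDS = [
    (50.0, "mesophily"),
    (85.0, "thermophily"),
    (110.0, "hyperthermophily"),
]

# Upper limit on naked RNA tertiary structure used in the text.
RNA_TERTIARY_LIMIT_C = 50.0


def temperature_band(temp_c):
    """Return the Figure 1 band for a temperature, or None above 110 degrees C."""
    for upper, label in BANDS:
        if temp_c <= upper:
            return label
    return None


def permits_naked_rna_world(temp_c):
    """True if the temperature lies within the limit set by RNA tertiary structure."""
    return temp_c <= RNA_TERTIARY_LIMIT_C


print(temperature_band(37.0), permits_naked_rna_world(37.0))
```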
The RNA world data presented here support all scenarios where the RNA
world and LUCA existed at moderate temperatures. Specific estimates of the upper
temperature limit for both the RNA world and the LUCA are possible, given the
conclusions presented here. For the RNA world, the upper temperature limit is
dictated by the stability of RNA tertiary structure, which is already lost at temperatures
under 50°C (reviewed in Brion & Westhof [1997]). The upper limit might conceivably be
increased through stabilisation of RNA by Mg2+ [Brion & Westhof 1997], or through
methylation [Kowalak et al. 1994; Noon et al. 1998]. The limit for the LUCA is more
difficult to estimate, in that the potential for stabilisation of thermolabile traits is
available. Nevertheless, given that the RNA world data establish the eukaryotic lineage
as having been more conservative in terms of RNA replacement, it is reasonable to assume
that higher temperature tolerance in prokaryotes is derived (and concurrent with
replacement of ancestral RNA biochemistry), so the LUCA most probably existed at those
temperatures inhabited by modern-day eukaryotes. While some putatively
'thermophilic' eukaryotes have been identified (such as desert ants and bees,
polychaete worms from hydrothermal vents [McMullin et al. 2000] and Tetrahymena
thermophila [Hallberg et al. 1985]), none has been shown to withstand
sustained internal temperatures above 50°C. The Australian ant Melophorus bagoti is
capable of surviving at 54°C for one hour, with a critical thermal maximum of
56.7°C. On phylogeny, these have evolved from more mesophilic organisms, and
while little is known of their usage of RNA, the maximum may be predicted to be set
by RNA tertiary structure, as suggested by the close correlation between internal body
temperature [see McMullin et al. 2000] and upper limits on RNA tertiary structure
[see Brion & Westhof 1997]. In the case of the hydrothermal polychaete worms,
proteins such as haemoglobin and collagen have been shown to be unstable at
temperatures approaching 50°C [reviewed in McMullin et al. 2000].
Extremes of pressure might be relevant to increased stability, but recent
studies have suggested that pressure results in unfolding as a result of water
penetration into the protein matrix [Silva et al. 2001]. Nevertheless, prokaryotes have
clearly surpassed these limits [Rothschild & Mancinelli 2001], and likewise, proteins
have been identified that are stable well above the growth temperature of
hyperthermophiles [Hiller et al. 1997]. The point that is interesting in light of the
RNA world hypothesis is that, in the absence of mechanisms of stability (such as
protein-mediated stabilisation), the expectation is that the upper limit for life in this
period would not have exceeded 50°C, and the data in this thesis are most consistent
with this constraint having been present in the LUCA and in the lineage leading to
modern eukaryotes, while the prokaryote lineages developed mechanisms which
enabled colonisation of higher temperatures [Figure 1].
A final point is that the interpretation of early evolution described here is not
synonymous with the thermoreduction hypothesis, it is merely consistent with it. For 
any thesis suggesting that a high temperature lifestyle is ancestral, it is still necessary 
to explain what selection pressure led to the replacement of protein by RNA in ancient 
processes subsequent to the emergence of eukaryotes from prokaryote-like ancestors. 
The direction of evolution from RNA to RNP to protein described in this thesis was 
based not on temperature considerations, but on the evolution of catalytic efficiency. 
Where thermoreduction is perhaps important is that RNA instability at high 
temperatures may result in replacement of RNP with protein, even where the RNP had 
reached catalytic perfection. No detailed argument for protein replacement by RNP 
has been provided by those who favour the various thermophilic LUCA hypotheses. 
In table 1, I have described all patterns in the data that might be observed
(including those which have not been observed, and are not predicted by either
thermophilic or mesophilic LUCA hypotheses). In addition, I have provided an
interpretive framework for the patterns, in the form of the two proposed rootings of 
the tree of life (bacterial and eukaryotic). Since the archaeal rooting is not seriously 
considered, this is omitted, but the interpretations would overlap with those for the 
bacterial rooting. 
Importantly, the formal interpretations given in table 1 are expected to be very
limited in terms of hypothesis testing, since these consider only a single trait, whereas
many traits contribute to thermophily. Using the RNA world as an outgroup
for the mesophilic LUCA hypothesis greatly aids interpretation, but there are several
possible sources of potential conflict. The simplest would be that a trait contributing
to thermophily in archaea and bacteria was also found in eukaryotes, and no evidence
of horizontal transfer was detected (scenario 2a). For these data to overturn the
conclusion that the LUCA was mesophilic would require that the RNA world dataset,
the absence of circular genomes in eukaryotes and Forterre's reverse gyrase data
[Forterre 1995] can also be explained within this new context. Indeed, given that the
observation in scenario 2a relies on a lack of evidence for horizontal transfer, the
simplest interpretation of the data would be that detection of such an ancient
horizontal transfer is beyond the limits of current methods. Another is that the origin
of the trait was mesophilic, and that it was simply coopted during adaptation to
elevated temperatures.
Another complication is demonstrated by scenario 12, where, taking only the
trait described, support for a mesophilic or thermophilic LUCA is root-dependent.
The data from whole genome comparisons strongly suggest that eukaryote
genomes are chimeric, with operational genes (sensu Rivera et al. [1998]) being of
bacterial origin [Ribeiro & Golding 1998; Rivera et al. 1998; Horiike et al. 2001]. On
the model described in this thesis, the distributional data are best interpreted as the
result of transfer of endosymbiont genes to the nucleus, with subsequent widespread
replacement of proto-eukaryotic orthologues (scenarios 2b & 3b in table 1). The
implication is as follows. If the reductive evolution model described in this thesis is
correct, on the order of 50% of genes in eukaryotes are bacterial, and therefore had a
hot history under thermoreduction. These genes fit largely into the operational class.
With a hot LUCA and an archaeal-bacterial fusion origin for the eukaryote lineage,
100% of eukaryotic genes would be prokaryotic in origin, and would all retain
evidence of a hot history. Thus, testing the thermoreduction hypothesis would require
looking at the approximately 50% of genes which are argued to be most closely
related to archaeal genes, that is, informational genes (sensu Rivera et al. [1998]).
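The reasoning in this paragraph, that only one class of eukaryote genes can separate the two hypotheses, can be sketched as a comparison of predictions. The dictionary layout, the hypothesis labels and the function name are assumptions introduced for illustration; the predictions themselves restate the text.

```python
# Does each hypothesis predict a 'hot history' for a given class of eukaryote genes?
# The entries restate the paragraph above; the layout is an assumption of this sketch.
PREDICTS_HOT_HISTORY = {
    "thermoreduction": {
        "operational": True,     # bacterial origin via endosymbiont gene transfer
        "informational": False,  # retained from a mesophilic proto-eukaryote lineage
    },
    "hot LUCA with archaeal-bacterial fusion": {
        "operational": True,
        "informational": True,
    },
}


def discriminating_gene_classes(predictions):
    """Return the gene classes on which the hypotheses make different predictions."""
    hypotheses = list(predictions.values())
    classes = hypotheses[0].keys()
    return [c for c in classes if len({h[c] for h in hypotheses}) > 1]


print(discriminating_gene_classes(PREDICTS_HOT_HISTORY))
```

Both hypotheses predict a hot history for operational genes, so in this sketch only the informational class is returned as informative.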
Table 1. Interpreting trait distributions within the framework of a thermophilic or mesophilic LUCA, under bacterial or
eukaryote rootings of the tree of life.

Columns: Traits^a in bacteria (B), archaea (A) and eukaryotes (E); HT^b; Bacterial rooting, thermophilic (T)LUCA; Bacterial rooting, thermoreduction (M)LUCA; Eukaryote rooting, thermophilic (T)LUCA; Eukaryote rooting, mesophilic (M)LUCA.

Scenario 0. Traits (B A E): 1 2 -. HT: -.
Bacterial rooting, thermophilic (T)LUCA: Uninformative on simple parsimony alone - B-A convergence favours rejection.
Bacterial rooting, thermoreduction (M)LUCA: Uninformative on parsimony alone - but RNA world as outgroup supports (M)LUCA.
Eukaryote rooting, thermophilic (T)LUCA: B-A convergence precludes interpretation. Ancestral thermophily not supported.
Eukaryote rooting, mesophilic (M)LUCA: Uninformative on simple parsimony alone - but RNA world as outgroup supports (M)LUCA.

Scenario 1a (Not observed). Traits (B A E): 1 2 2. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: Falsified, only A-E MRCA a thermophile. Traits 1&2 suggest B&A thermophily convergent.
Bacterial rooting, thermoreduction (M)LUCA: Falsified, thermophilic origin for eukaryotes.
Eukaryote rooting, thermophilic (T)LUCA: Trait 2 suggests (T)LUCA, but trait 1 must be explained by NOR^c (not testable).
Eukaryote rooting, mesophilic (M)LUCA: Falsified, thermophilic origin for eukaryotes.

Scenario 1b (Not observed). Traits (B A E): 1 2 2. HT: ✓.
Bacterial rooting, thermophilic (T)LUCA: Not informative for A-E MRCA, HT obscures ancestral state. Reconciling convergence of traits 1&2 requires NOR - untestable.
Bacterial rooting, thermoreduction (M)LUCA: If HT is A→E, not inconsistent with thermoreduction/(M)LUCA. E→A not consistent with either.
Eukaryote rooting, thermophilic (T)LUCA: Not informative for LUCA, HT obscures ancestral state. Reconciling convergence of traits 1&2 requires NOR - untestable.
Eukaryote rooting, mesophilic (M)LUCA: A→E, not inconsistent with thermoreduction/(M)LUCA. E→A not consistent with either.

Scenario 2a (Not observed). Traits (B A E): 1 1 1. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: (T)LUCA supported.
Bacterial rooting, thermoreduction (M)LUCA: (M)LUCA falsified.
Eukaryote rooting, thermophilic (T)LUCA: (T)LUCA supported.
Eukaryote rooting, mesophilic (M)LUCA: (M)LUCA falsified.

Scenario 2b (Organellar glu mischarging, NH4-dep. CP synthesis). Traits (B A E): 1 1 1. HT: ✓.
Bacterial rooting, thermophilic (T)LUCA: Not informative, HT obscures ancestral state.
Bacterial rooting, thermoreduction (M)LUCA: If B/A→E, not inconsistent with thermoreduction. N.B. For organellar functions, cf. 3b.
Eukaryote rooting, thermophilic (T)LUCA: Not informative, HT obscures ancestral state.
Eukaryote rooting, mesophilic (M)LUCA: If B/A→E, not inconsistent with thermoreduction. N.B. For organellar functions, cf. 3b.

Scenario 3a (Not observed). Traits (B A E): 1 2 1. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: B-E MRCA (LUCA) a thermophile, A-B convergence requires NOR.
Bacterial rooting, thermoreduction (M)LUCA: Falsified, thermophilic origin for eukaryotes.
Eukaryote rooting, thermophilic (T)LUCA: B-E MRCA (LUCA) a thermophile, A-B convergence requires NOR.
Eukaryote rooting, mesophilic (M)LUCA: Falsified, thermophilic origin for eukaryotes.

Scenario 3b (cf. 2b). Traits (B A E): 1 2 1. HT: ✓.
Bacterial rooting, thermophilic (T)LUCA: Not informative, B-E HT obscures ancestral state. Convergence of thermophily favours rejection. cf. Scenario 0.
Bacterial rooting, thermoreduction (M)LUCA: B→E, not inconsistent with (M)LUCA; endosymbiont hypothesis gives additional test.
Eukaryote rooting, thermophilic (T)LUCA: Not informative, HT obscures ancestral state. Convergence favours rejection. cf. Scenario 0.
Eukaryote rooting, mesophilic (M)LUCA: B→E, not inconsistent with thermoreduction; endosymbiont hypothesis gives additional test.

Scenario 4a. Traits (B A E): 1 1 -. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: (T)LUCA supported on simple parsimony. N.B. Conclusion is dependent on correct rooting.
Bacterial rooting, thermoreduction (M)LUCA: (M)LUCA rejected on simple parsimony. N.B. Rejection is dependent on correct rooting.
Eukaryote rooting, thermophilic (T)LUCA: Uninformative - thermophilic trait only on one side of the root, cannot establish if it is ancestral.
Eukaryote rooting, mesophilic (M)LUCA: Uninformative on simple parsimony alone - but RNA world as outgroup supports (M)LUCA.

Scenario 4b (Reverse gyrase, T. maritima genome). Traits (B A E): 1 1 -. HT: ✓.
Bacterial rooting, thermophilic (T)LUCA: Uninformative - HT obscures ancestral state.
Bacterial rooting, thermoreduction (M)LUCA: HT between A&B not inconsistent with (M)LUCA rooted with RNA world dataset.
Eukaryote rooting, thermophilic (T)LUCA: Uninformative - HT obscures ancestral state. But thermophily only on one side of root - (T)LUCA not supported.
Eukaryote rooting, mesophilic (M)LUCA: HT between A&B not inconsistent with (M)LUCA rooted with RNA world dataset.

Scenario 5. Traits (B A E): - - 3. HT: -.
All rootings and hypotheses: Uninformative under all scenarios with only simple parsimony - not possible to establish whether the trait arose specifically in eukaryotes after divergence from the prokaryote lineages. RNA world dataset provides an exception as traits predate LUCA.

Scenario 6a (Not observed). Traits (B A E): 3 - 3. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: Reject.
Bacterial rooting, thermoreduction (M)LUCA: Consistent with (M)LUCA, but not thermoreduction.
Eukaryote rooting, thermophilic (T)LUCA: Reject.
Eukaryote rooting, mesophilic (M)LUCA: Consistent with (M)LUCA, but not thermoreduction.

Scenario 6b (Not observed). Traits (B A E): 3 - 3. HT: ✓.
Bacterial rooting, thermophilic (T)LUCA: Not supported.
Bacterial rooting, thermoreduction (M)LUCA: Not inconsistent.
Eukaryote rooting, thermophilic (T)LUCA: Not supported.
Eukaryote rooting, mesophilic (M)LUCA: Not inconsistent.

Scenario 7a (Not observed). Traits (B A E): - 3 3. HT: ✗.
Bacterial rooting, thermophilic (T)LUCA: Reject. A-E MRCA mesophilic.
Bacterial rooting, thermoreduction (M)LUCA: On simple parsimony, cannot establish ancestral state. Consistent with (M)LUCA, but not thermoreduction, using RNA world dataset.
Eukaryote rooting, thermophilic (T)LUCA: Reject.
Eukaryote rooting, mesophilic (M)LUCA: Consistent with (M)LUCA, but not thermoreduction.
Scenario 7b (Not observed). Traits (B, A, E): -, 3, 3. HT: yes.
  Bacterial rooting, thermophilic (T)LUCA: Not supported.
  Bacterial rooting, thermoreduction (M)LUCA: On simple parsimony, cannot establish ancestral state. Consistent with (M)LUCA, but not thermoreduction, using RNA world dataset.
  Eukaryote rooting, thermophilic (T)LUCA: Not supported.
  Eukaryote rooting, mesophilic (M)LUCA: HT obscures ancestral state. Using RNA world as outgroup, is consistent with (M)LUCA, but not thermoreduction.

Scenario 8 (Not observed). Traits (B, A, E): 1, 1, 3. HT: no.
  Bacterial rooting, thermophilic (T)LUCA: Supported on simple parsimony. Conclusion is root-dependent.
  Bacterial rooting, thermoreduction (M)LUCA: Rejected on simple parsimony. Conclusion is root-dependent.
  Eukaryote rooting, thermophilic (T)LUCA: Prokaryotic MRCA thermophilic, uninformative for (T)LUCA.
  Eukaryote rooting, mesophilic (M)LUCA: (M)LUCA supported using RNA world as outgroup. Prokaryotes monophyletic, thermoreduction supported.

Scenario 9. Traits (B, A, E): 1, 1, 3. HT: yes.
  Bacterial rooting, thermophilic (T)LUCA: Uninformative - HT of trait 1 obscures ancestral state.
  Bacterial rooting, thermoreduction (M)LUCA: Not inconsistent with (M)LUCA, using RNA world as outgroup.
  Eukaryote rooting, thermophilic (T)LUCA: Uninformative - HT of trait 1 obscures ancestral state.
  Eukaryote rooting, mesophilic (M)LUCA: (M)LUCA supported using RNA world as outgroup.
Scenario 10 (Convergent glu mischarging in archaea-bacteria, direct charging in eukaryotes). Traits (B, A, E): 1, 2, 3. HT: -.
  Bacterial rooting, thermophilic (T)LUCA: Reject on simple parsimony - thermophily convergent, not ancestral. NOR possible, but not demonstrable.
  Bacterial rooting, thermoreduction (M)LUCA: Not inconsistent with (M)LUCA, thermophily as derived.
  Eukaryote rooting, thermophilic (T)LUCA: Reject on simple parsimony - thermophily convergent, not ancestral. Even considering NOR, cannot show thermophily as ancestral.
  Eukaryote rooting, mesophilic (M)LUCA: (M)LUCA supported using RNA world as outgroup, thermophily derived.

Scenario 11 (Not observed). Traits (B, A, E): 3, 1, 3. HT: no.
  Bacterial rooting, thermophilic (T)LUCA: Reject on simple parsimony.
  Bacterial rooting, thermoreduction (M)LUCA: (M)LUCA supported on simple parsimony, thermoreduction only in archaea.
  Eukaryote rooting, thermophilic (T)LUCA: Reject on simple parsimony.
  Eukaryote rooting, mesophilic (M)LUCA: (M)LUCA supported on simple parsimony, thermoreduction only in archaea.

Scenario 12 (Direct gln charging in G+ bacteria). Traits (B, A, E): 3, 1, 3. HT: yes.
  Bacterial rooting, thermophilic (T)LUCA: Uninformative - HT of trait 3 obscures ancestral state. For gln charging, ancestral state in B is 1, on simple parsimony (& without HT of 1) (T)LUCA supported. Conclusion is root-dependent.
  Bacterial rooting, thermoreduction (M)LUCA: HT from E → B, consistent with endosymbiont hypothesis, but ancestral state obscured. For gln charging, ancestral state in B is 1, on simple parsimony (& without HT of 1) (M)LUCA rejected. Conclusion is root-dependent.
  Eukaryote rooting, thermophilic (T)LUCA: Uninformative - HT of trait 3 obscures ancestral state. For gln charging, ancestral state in B is 1, simple parsimony uninformative.
  Eukaryote rooting, mesophilic (M)LUCA: HT from E → B, consistent with endosymbiont hypothesis, but ancestral state obscured. For gln charging, ancestral state in B is 1, using RNA world dataset, (M)LUCA supported.
Scenario 13 (Not observed). Traits (B, A, E): 1, 3, 3. HT: no.
  Bacterial rooting, thermophilic (T)LUCA: Uninformative, cannot establish ancestral state on simple parsimony.
  Bacterial rooting, thermoreduction (M)LUCA: RNA world outgroup supports (M)LUCA, but not thermoreduction for Archaea.
  Eukaryote rooting, thermophilic (T)LUCA: Simple parsimony, RNA world outgroup support (M)LUCA, but not thermoreduction for Archaea.
  Eukaryote rooting, mesophilic (M)LUCA: RNA world outgroup supports (M)LUCA, but not thermoreduction for Archaea.

Scenario 14. Traits (B, A, E): 1, 3, 3. HT: yes.
  Bacterial rooting, thermophilic (T)LUCA: Uninformative, HT obscures ancestral state in A/E.
  Bacterial rooting, thermoreduction (M)LUCA: RNA world outgroup supports (M)LUCA, but not thermoreduction for Archaea, unless NOR - untestable.
  Eukaryote rooting, thermophilic (T)LUCA: HT obscures ancestral state.
  Eukaryote rooting, mesophilic (M)LUCA: HT obscures ancestral state.
a Each number represents an independent trait, not related to any of the others by common descent. Red numbers: traits contributing to thermophily. 
Blue numbers: mesophilic traits. B - Bacteria, A - Archaea, E - Eukaryotes. 
b HT is short for horizontal transfer. 
c NOR: Non-orthologous replacement. 
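The "simple parsimony" verdicts in the table can be illustrated with a minimal sketch of Fitch-style ancestral state reconstruction on the three-domain tree under the two rootings. This is an illustration only, not the procedure used to build the table. It assumes that the trait labels can be treated as alternative states of one character, that traits 1 and 2 are the thermophilic ones and trait 3 the mesophilic one (inferred from the table rows, since the colour coding is lost), and it ignores horizontal transfer, non-orthologous replacement and the RNA world outgroup. All function names are hypothetical.

```python
ABSENT = "-"  # trait not present in that domain


def fitch(left, right):
    """One Fitch step: keep the shared states, or pool them if none are shared."""
    shared = left & right
    return shared if shared else left | right


def root_states(traits, rooting):
    """Equally parsimonious root states for B, A, E under a given rooting.

    traits:  dict with keys 'B', 'A', 'E' mapping to a trait label.
    rooting: 'bacterial' places the root between B and (A, E);
             'eukaryote' places the root between E and (B, A).
    """
    b, a, e = {traits["B"]}, {traits["A"]}, {traits["E"]}
    if rooting == "bacterial":
        return fitch(b, fitch(a, e))
    if rooting == "eukaryote":
        return fitch(e, fitch(b, a))
    raise ValueError(f"unknown rooting: {rooting}")


def thermophilic_luca_verdict(traits, rooting, thermophilic=frozenset({"1", "2"})):
    """Verdict on a thermophilic LUCA from simple parsimony alone."""
    states = root_states(traits, rooting)
    if len(states) != 1 or ABSENT in states:
        return "uninformative"  # ambiguous or empty root state
    (state,) = states
    return "supported" if state in thermophilic else "rejected"


scenario_4a = {"B": "1", "A": "1", "E": "-"}
for rooting in ("bacterial", "eukaryote"):
    print(rooting, thermophilic_luca_verdict(scenario_4a, rooting))
```

Under these assumptions the sketch reproduces the root-dependence noted for Scenario 4a: the thermophilic trait is reconstructed at the root under the bacterial rooting, but the root state is ambiguous under the eukaryote rooting. It does not reproduce every cell of the table, which also draws on the RNA world dataset and on horizontal transfer.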
Glutamine usage. 
As a free amino acid, glutamine is relatively more thermolabile than when 
incorporated into a peptide chain [reviewed in Greenstein & Winitz 1961]. Early 
studies described deamidation of free glutamine at higher levels than asparagine on 
boiling with magnesia [see Chibnall & Westall 1932; Greenstein & Winitz 1961 and 
references therein], and heating of glutamine at 100°C for 2-3 hours at a range of pH 
values resulted in extensive deamidation [Chibnall & Westall 1932; Vickery et al. 
1935]. Gilbert et al. [1949] measured non-enzymatic deamidation of glutamine in the 
presence of a range of anions at various concentrations, pH and temperature. 
Non-enzymatic deamidation of glutamine is extensive in the presence of phosphate (at 
pH 8 and 37°C). Near complete deamidation can be seen within 48 hours at 47°C in 
the presence of phosphate. Glutamine does not appear to possess an optimal pH for 
stability, but at extremes of pH, deamidation is greater. However, added phosphate 
results in increased deamidation at increasing pH, and decreasing deamidation at 
decreasing pH. 
To measure the effect of temperature, Gilbert et al. [1949] made digests with 
0.1 M glutamine and 0.8 M phosphate in buffer at pH 8, and incubated these at either 
47.4°C or 37°C. After 1.5 hours, 10.6 µM and 4.8 µM ammonia was liberated at these 
respective temperatures, and after 3.25 hours, 21.0 and 9.2 µM ammonia was liberated 
respectively (of a total of 90 µM for complete deamidation). The temperature 
coefficient for glutamine in the presence of phosphate is approximately 2 for a 
difference of 10°C. 
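As a rough check on the temperature coefficient quoted above, the figures reported from Gilbert et al. can be put through the standard Q10 formula. The sketch below assumes that the amount of ammonia liberated in a fixed time is proportional to the deamidation rate (reasonable while only a small fraction of the glutamine has reacted); the function name is illustrative.

```python
def temperature_coefficient(rate_low, rate_high, temp_low, temp_high):
    """Q10: the factor by which a rate increases for a 10 degree C rise."""
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


# Ammonia liberated at 37 and 47.4 degrees C, keyed by incubation time in hours.
measurements = {1.5: (4.8, 10.6), 3.25: (9.2, 21.0)}

for hours, (at_37, at_47) in measurements.items():
    q10 = temperature_coefficient(at_37, at_47, 37.0, 47.4)
    print(f"{hours} h: Q10 = {q10:.2f}")
```

Both time points give a value a little above 2, in line with the coefficient of approximately 2 stated in the text.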
These data suggest that glutamine instability should present a significant 
problem for even moderately thermophilic organisms (i.e. living above 50°C), 
especially given the greater instability of this amino acid in the presence of phosphate. 
Indeed there are indications that this may be the case, and that the examination of 
glutamine usage will shed light on the competing hypotheses of thermoreduction and 
a thermophilic LUCA. 
Glutamine is a major nitrogen donor in eukaryote metabolism, but is predicted to 
be at such low concentrations in thermophiles that ammonia is expected to be used in 
its place [Papers 2&4]. One example from the hyperthermophilic archaeon Pyrococcus 
furiosus is carbamoyl phosphate synthesis via an ammonia-dependent pathway, as 
opposed to the standard glutamine-dependent pathway [Legrain et al. 1995]. Another 
example is glutamate mischarging [Ibba et al. 1997], where glutamate is charged to 
glutaminyl-tRNA then amidated to form glutamine. This has the effect of making 
glutamine synthesis the final step before incorporation into protein, suggesting that 
this is an adaptation to a high temperature environment [Poole et al. 1998]. In support 
of thermoreduction, mischarging is found in eubacteria, archaea, and eukaryote 
organelles, but not the cytoplasm [Ibba et al. 1997, Ibba & Söll 2001]. While 
mischarging of glutamate to glutaminyl-tRNA can be argued to be a 
thermoadaptation, in those organisms examined to date, the nitrogen donor for 
amidation of mischarged glutamate is glutamine [Ibba & Söll 2001]! That glutamine 
is the nitrogen donor in such cases is consistent with glutamine being favoured over 
ammonia under 'permissive' conditions (no hyperthermophilic pathways have been 
examined as yet). It also serves to exemplify that looking for relics of 
thermoreduction will not be a trivial exercise. 
Glutamine-dependent metabolism would be broadly classed as falling within 
the operational class of genes, so might be expected to have been replaced by genes 
from the endosymbiont. However, while I have argued in this thesis for a general 
replacement of proto-eukaryote operational orthologues by endosymbiont genes, there 
will be exceptions. I suggest that glutamine-dependent metabolism would be one 
example, since, if the endosymbiont pathways are ammonia-dependent, they will not 
be able to supplant the original proto-eukaryotic genes because they cannot utilise 
glutamine. Indeed, a clear case is in the different pathways for synthesis of carbamoyl 
phosphate in the cytosol and in the mitochondrion of eukaryotic cells. The cytosolic 
class of enzyme is glutamine-dependent, while the mitochondrial class is 
ammonia-dependent [Legrain et al. 1995]. 
Comparative metabolic databases such as WITS 
[http://wit.mcs.anl.gov/WIT2/] and KEGG [http://www.genome.ad.jp/kegg/], genome 
data, and the biochemical literature can be searched for all pathways where glutamine 
and/or ammonia act as nitrogen donors, to look for evidence of past thermophily. 
Furthermore, pathways involving other thermolabile metabolites such as carbamoyl 
phosphate [Van de Casteele et al. 1997] will also be examined. It is worth 
emphasising that while the example of carbamoyl phosphate synthesis represents a 
clear-cut case, this is not always to be expected. In thermophilic organisms, one 
should find mechanisms of adaptation to metabolite thermolability. However, in 
mesophilic prokaryotes, it is signatures of past thermophily that are important, and 
these will not necessarily be as clear as expected for comparisons between extant 
hyperthermophiles and eukaryotes. An example is that Gram negative bacteria have a 
direct pathway for glutaminyl-tRNA charging. In this case, however, it has been 
argued by several authors that this is the result of horizontal transfer from a 
eukaryote source [Lamour et al. 1994, Handy & Doolittle 1999]. Upon readaptation to 
mesophilic temperatures, free glutamine can become available intracellularly, so 
glutamine-dependent pathways could potentially replace ammonia-dependent 
pathways. 
One problem with thermolability studies is that these are generally carried out 
in vitro. This helps in establishing the physicochemical properties of a molecule, but 
this alone is not necessarily informative in all cases, as exemplified by the use of 
carbamoyl phosphate in hyperthermophiles. That this metabolite is used in an 
organism such as Pyrococcus furiosus might be considered anomalous if it were not 
known that metabolite channelling protects carbamoyl phosphate from being 
degraded [Van de Casteele et al. 1997]. In the case of glutamine, it is therefore of great 
interest to establish its stability intracellularly when present as a free amino acid, and 
moreover, to establish the fates of the ammonia and glutamate moieties in 
thermophilic and hyperthermophilic organisms. 
Nuclear Magnetic Resonance (NMR) studies provide the sort of resolution 
required, and have already been used in this way to examine intracellular free amino 
acid dynamics in archaea (e.g. Robertson et al. 1992). The advantage for the study of 
intracellular glutamine is that it would be possible to label the nitrogen (15N), 
which is released subsequent to deamidation, as well as to 13C-label the glutamine [see 
Lundberg et al. 1990, for review]. In this way, the fates of both moieties could be 
examined, in particular, making use of the fact that enzymatic deamidation of 
glutamine yields glutamate, while non-enzymatic deamidation yields 
pyrrolidonecarboxylic acid [Chibnall & Westall 1932; Vickery et al. 1935; Greenstein 
& Winitz 1961]. It should also be possible to establish whether glutamine is directly 
incorporated into protein. Likewise, the fate of free ammonia and glutamate could 
be examined to see whether these are coincorporated into protein. 
Some data are available on intracellular glutamine concentration in the 
archaeon Methanobacterium thermoautotrophicum, based on NMR studies of 
nitrogen assimilation [Choi et al. 1986, Choi & Roberts 1987]. M. 
thermoautotrophicum can utilise glutamine, urea or ammonia as sole nitrogen source. 
Choi & Roberts [1987] report that when cells were grown on [δ-15N]glutamine as 
sole nitrogen source, intracellular concentrations of this amino acid were too low to be 
detectable, yet glutamine was reported to be stable for several days in the presence of 
an anaerobic cell extract (though the incubation temperature was not described). The 
authors conclude that their data best support the existence of an efficient glutamine 
permease for uptake, coupled with glutamate synthase. They rule out the presence of a 
glutaminase on the basis of the stability of glutamine when incubated with cell 
extract. This explanation requires the glutamate synthase to be located at the 
membrane, coupled to the permease. It has alternatively been suggested that 
non-enzymatic degradation in the cell medium prior to uptake is also a possibility 
[Friedman & Thauer 1987]. 
The genome sequence of M. thermoautotrophicum [Smith et al. 1997] sheds 
some light on these conflicting positions. No glutaminase was detected in the 
published annotation, consistent with Choi & Roberts' conclusion. An ABC 
transporter for glutamine is present, and enzymes such as glutamine synthetase and 
glutamate synthase are also detected, again consistent with Choi & Roberts [1987]. 
Nevertheless, ammonium transporters are also present, so, under some conditions, 
ammonia liberated from glutamine would be taken up. Indeed, given that M. 
thermoautotrophicum has been documented to grow at temperatures ranging from 
40-70°C [Smith et al. 1997, and references therein], it would be interesting to examine 
the fate of extracellular glutamine at various temperatures, labelling with both 13C and 
15N. 
While such studies were outside the scope of this thesis, the use of NMR 
spectroscopy for examining intracellular glutamine concentrations and fate is 
technically feasible, and together with genome data, will provide a rich source of data 
of relevance to studies of the nature of the LUCA. This would serve to establish 
substrate usage and/or preference where ammonia and glutamine may be 
interchangeably utilised, and under a range of growth temperatures. 
Other thermolabile metabolites can also be examined. Forterre [1996] points 
out that ATP is thermolabile, and that its use is avoided in hyperthermophilic archaea, 
which make use of ADP or PPi for energy storage in glycolysis. Nevertheless, given 
that ATP stores more energy than these other two energy cofactors, it would not 
necessarily be an argument against thermoreduction to find that ATP is now used in 
non-hyperthermophilic archaea - it may simply reflect selection for ATP over the 
other two cofactors at lower temperatures. There are a number of key metabolites and 
coenzymes which are thermolabile at hyperthermophilic growth temperatures, such as 
NAD, pyridoxal phosphate and acetyl phosphate [see table 1 in Daniel & Cowan 
2000], yet these are nevertheless present in hyperthermophiles, suggesting that 
mechanisms for prevention of thermodegradation are present [Daniel & Cowan 2000]. 
Another thermolabile metabolite is carbamoyl phosphate, which is a 
ubiquitous intermediate in the synthesis of arginine and pyrimidines. Interestingly, it 
is subject to metabolite channelling (where a metabolite moves through a channel in 
a multienzyme complex - the intermediates are not released but instead move along 
the channel to the next active site), which has the effect of stabilising it at high 
temperatures [Van de Casteele et al. 1997]. If thermoreduction has occurred, 
channelling ought to be found in mesophilic prokaryotes, but not necessarily in 
eukaryotes, assuming that orthologous gene replacement has not occurred. 
RNA thermoadaptation. 
A pivotal study on RNase P RNA in bacteria established evidence for past 
thermophily in E. coli, where the optimum temperature for operation of this RNA is 
50°C, well above the growth temperature of E. coli [Brown et al. 1993]. While this 
was interpreted as evidence for a thermophilic LUCA, eukaryote RNase P RNAs were 
not compared. This comparison is necessarily difficult since eukaryote RNase P RNA 
is not catalytically active in the absence of its cognate proteins [Altman & Kirsebom 
1999]. A broader study of RNase P, as well as other RNAs, can be approached by 
examining frequency of mismatches and non-canonical base-pairs in helices, percent 
pairing, percent G-C pairing, and other parameters that were shown to impact on 
RNA thermostability. In their original analysis, Brown et al. [1993] established that 
these parameters were more central to thermostability than GC content. 
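The helix parameters listed above are straightforward to tabulate once a secondary structure is available. The following is a minimal sketch, not the analysis of Brown et al.: it assumes the structure is supplied in dot-bracket notation without pseudoknots, counts G-U wobble pairs separately from other non-Watson-Crick pairs (a choice made here for illustration), and uses hypothetical function names.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def helix_statistics(sequence, structure):
    """Percent pairing, percent G-C pairs, and counts of wobble and other pairs."""
    sequence = sequence.upper().replace("T", "U")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    pairs = [(sequence[i], sequence[j]) for i, j in base_pairs(structure)]
    n_pairs = len(pairs)
    gc_pairs = sum(pair in {("G", "C"), ("C", "G")} for pair in pairs)
    wobble_pairs = sum(pair in WOBBLE for pair in pairs)
    other_pairs = sum(pair not in WATSON_CRICK | WOBBLE for pair in pairs)
    return {
        "percent_paired": 100.0 * 2 * n_pairs / len(sequence),
        "percent_gc_pairs": 100.0 * gc_pairs / n_pairs if n_pairs else 0.0,
        "wobble_pairs": wobble_pairs,
        "noncanonical_pairs": other_pairs,
    }


# Toy hairpin: three-pair stem closing a four-nucleotide loop.
print(helix_statistics("GCAAAAAAGC", "(((....)))"))
```

Applied to homologous RNAs from thermophiles, mesophilic prokaryotes and eukaryotes, such a tabulation would give the per-molecule values that the proposed comparison needs; the structures themselves would have to come from comparative analysis or prediction.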
Secondary structure melting profiles for RNA are reasonably accurately 
predicted by theoretical approaches [Moulton et al. 2000], allowing a wide range of 
RNAs to be tested. It is noteworthy that thermostability of methylated RNA, as found 
in hyperthermophiles, will be underpredicted by such analyses [Kowalak et al. 1994; 
Noon et al. 1998]. 
There is little likelihood that horizontal transfers will confound such an 
analysis. In the case of RNase P, the differences between eukaryotic and prokaryotic 
RNase P RNPs are significant, and notably, the protein composition is completely 
different [Altman & Kirsebom 1999]. The same is expected for other ubiquitous 
RNAs that could be compared, such as the signal recognition particle (srp) RNA 
[Stroud & Walter 1999; Eichler & Moll 2001]. 
Protein thermoadaptation. 
Several studies have compared proteins between thermophilic and mesophilic 
prokaryotes [McDonald et al. 1999; Haney et al. 1999; McDonald 2001], finding that 
serine, asparagine, glutamine, threonine and methionine tend to be reduced in the 
proteins of thermophiles, while isoleucine, arginine, glutamate, lysine and proline 
residues tend to be increased. No composition asymmetry analyses have yet been 
carried out across all three domains. Testing thermoreduction by examining protein 
thermoadaptation requires such an analysis. Concern has been raised that the effects 
of thermophily on protein composition would be obscured by effects such as G+C 
content and environment-specific effects [McDonald 2001]. However, studies to date 
have focused on quite narrow datasets, and the expectation is that temperature effects 
should be distinguishable from other effects by examining a broad range of proteins 
from a broad range of organisms and looking for trends common to the entire dataset. 
Furthermore, by distinguishing physicochemical properties from the outset, it may be 
possible to carry out a more specific analysis than previous analyses which have 
concentrated on composition asymmetries at all sites [Haney et al. 1999, McDonald et 
al. 1999]. For instance, glutamine is less thermolabile when incorporated into a 
peptide chain than as a free amino acid, whilst the opposite is true for asparagine 
[Greenstein & Winitz 1961]. Das & Gerstein [2000], who carried out a comparison of 
12 genomes across all three domains, reported, among other things, that thermophilic 
proteins tend to have reduced amounts of glutamine and asparagine compared to 
mesophilic proteins. 
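The composition trends described above suggest a simple summary statistic that could be computed per protein set. The sketch below is purely illustrative: the "bias" score (fraction of residues from the set reported as increased in thermophiles minus the fraction from the set reported as reduced) is a hypothetical measure introduced here, not one taken from the cited studies, and the toy sequences are invented.

```python
from collections import Counter

STANDARD = "ACDEFGHIKLMNPQRSTVWY"
# Residue sets as listed in the text above.
REDUCED_IN_THERMOPHILES = "SNQTM"    # Ser, Asn, Gln, Thr, Met
INCREASED_IN_THERMOPHILES = "IREKP"  # Ile, Arg, Glu, Lys, Pro


def residue_counts(proteins):
    """Pooled counts of the 20 standard amino acids over a set of sequences."""
    counts = Counter()
    for sequence in proteins:
        counts.update(sequence.upper())
    return {aa: counts[aa] for aa in STANDARD}


def thermophily_bias(proteins):
    """(increased-set residues - reduced-set residues) / all standard residues."""
    counts = residue_counts(proteins)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    increased = sum(counts[aa] for aa in INCREASED_IN_THERMOPHILES)
    reduced = sum(counts[aa] for aa in REDUCED_IN_THERMOPHILES)
    return (increased - reduced) / total


# Invented toy sequences, for illustration only.
print(thermophily_bias(["MKIREEPLKRIE", "IKEPRRLEEK"]))
print(thermophily_bias(["MSNQTTSAQNGS", "TSQNMASNQT"]))
```

A score of this kind would need to be calculated separately for informational and operational genes and compared across many genomes before any signal could be distinguished from G+C content or lineage-specific effects.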
A large scale analysis is likely to be necessary in order to be able to 
distinguish between fluctuations in individual proteins and a consistent signal. The 
work would need to be carried out separately for informational and operational genes, 
and this would be interesting in itself, since, if a signature has been maintained for 
this length of time, it ought to be seen for eukaryote operational genes, but not for 
informational genes. The data reported by Das & Gerstein [2000] suggest that it will 
be possible to examine amino acid composition to see if a signature of past 
thermophily is detectable in mesophilic prokaryotes, though the signal may be weak. 
There are a number of other physicochemical factors that impact on protein 
thermostability, such as prevalence of salt bridges [Das & Gerstein 2000], increased 
hydrogen bonding, shortening of loops and helix dipole stabilisation [Jaenicke & 
Böhm 1998]. However, the emerging consensus is that it is the combination of a 
number of factors which contributes to thermostability, and rather than observing 
clear common differences between proteins from mesophilic and thermophilic 
organisms, there appear to be multiple routes to protein stability. Individual proteins 
may differ substantially in terms of properties which contribute to thermostability 
[Jaenicke & Böhm 1998]. 
Thermoreduction: once or twice? 
One question which this proposed work might be able to answer is whether 
thermoreduction has occurred once, implying that the prokaryotes are a monophyletic 
group, or whether it has happened twice, once for archaea and once for bacteria. This 
question could not be approached using the data described in this thesis, hence the use 
of the terms prokaryote and eukaryote, even though it has been accepted throughout 
that this may not reflect a phylogenetic grouping. 
If thermoreduction has happened twice, convergent thermoadaptations should be 
observed between archaea and bacteria. Interestingly, circular genomes may be such 
an example, as the origins of replication, and the associated replication proteins in 
archaea and bacteria are best explained as having evolved independently [Myllykallio 
et al. 2000] . Another example is the presence of a lipid monolayer in thermophilic 
archaea as opposed to the bilayer found in bacteria and eukaryotes. Likewise, the 
latter two groups possess ester-linked lipids, while archaea possess more stable 
ether-linked lipids [reviewed in Daniel & Cowan 2000]. Ether-linked lipids are however 
found in some thermophilic bacteria, making for a more complex picture. Forterre 
[1996] has nevertheless argued that, especially given the presence of lipid monolayers 
only in thermophilic archaea, mechanisms of membrane stability have evolved 
independently in thermophilic archaea and bacteria. 
Because thermophily is not a single trait, but a descriptive term for lifestyle that 
comprises many traits, it is imperative to systematically look at a large number of 
traits to establish whether thermophily arose once or twice. The thermophilic LUCA 
hypothesis requires that thermophily arose only once, irrespective of whether the 
bacterial or eukaryotic rooting is correct (though in both cases, it also requires 
evidence for past thermophily in eukaryotes) . Finding that archaea and bacteria have 
independently adaptated to thermophily would be a falsification of the thermophilic 
LUCA hypothesis. Thermoreduction is consistent with thermophily evolving once, if 
the prokaryotes are monophyletic, whereas it is only consistent with independent 
adaptation to thermophily by archaea and bacteria under the bacterial rooting. 
Horizontal transfer would blur these distinctions (see table 1 ) ,  so not only is it 
necessary to establish the nature of thermophily in archaea and bacteria, but whether 
1) common traits have been subject to horizontal transfer, and 2) whether the entire 
domain possesses such traits, or whether these are restricted to only some regions of 
the tree. In addition, it will be necessary to establish 3) whether traits unique to 
members of a domain are ubiquitous, and 4) whether there have been within-domain 
transfers. In this respect, correctly rooting the tree is not going to be sufficient to 
establish which of the two hypotheses is correct. 
I fully expect that horizontal transfer from endosymbiont to nucleus (but also 
other instances of horizontal transfer such as between archaeal and bacterial 
hyperthermophiles [Aravind et al. 1999; Kyrpides & Olsen 1999; Nelson et al. 1999; 
Forterre et al. 2000]) will complicate testing of the thermoreduction and thermophilic 
LUCA hypotheses, but on current expectations, it ought to be possible to establish 
such cases, and exclude them from the analysis [see Nara et al. 2000, for a discussion 
of this with respect to pyrimidine biosynthesis, a carbamoyl phosphate-dependent 
pathway]. Furthermore, I expect that the examination of glutamine-dependent and 
ammonia-dependent pathways, and pathways involving other thermolabile 
metabolites, will not be straightforward to interpret. Additionally, while the presence 
of alternate pathways solves the problem of thermolabile metabolites (e.g. NAD(P), 
acetyl phosphate, ATP - see Daniel & Cowan [2000]), and should be detectable by 
comparative genome analyses, other mechanisms of thermoadaptation (such as the 
high catalytic efficiency of phosphoribosyl anthranilate isomerase as a means of 
abrogating phosphoribosyl anthranilate thermolability [Sterner et al. 1996]) will not 
be amenable to bioinformatic analyses. 
The implicit assumption in the proposed work is that, at temperatures typical of 
mesophiles, thermoadaptations are not selectively disadvantageous, and hence may 
persist as relics. Certainly it is not expected that this should be the case for all such 
adaptations, such that it may be impossible to establish whether traits found in extant 
thermophiles date back to the common ancestor of one or both prokaryotic lineages. 
Nevertheless, if a general trend emerges from several unrelated datasets (metabolism, 
RNA and protein) it might be possible to use these data to test the thermoreduction 
hypothesis, and might enable an examination of the question of the monophyly of the 
prokaryotes independent of phylogenetic analyses. 
In spite of the potential pitfalls of such analyses, the recent demonstration that 
the pathways of glutamate mischarging in archaea and bacteria have arisen by 
independent recruitment of enzymes involved in amino acid metabolism, as opposed to 
being related by descent [Tumbula et al. 2000], is consistent with thermoreduction 
having occurred twice rather than once (and thus favours thermoreduction over the 
thermophilic LUCA hypothesis). At the time of writing Papers 4&5, these data were 
not available, but they are important for two reasons. First, they are consistent with 
other data (see above) that suggest archaeal and bacterial thermophiles are convergent 
rather than divergent, and second, they overturn the other major interpretation of 
mischarging, that it is a relic from the evolution of the genetic code [Di Giulio 2000]. 
In concluding this thesis, I wish to underscore the inherent difficulties with 
phylogenetic approaches to understanding the nature of the LUCA, and the evolution 
of prokaryotes and eukaryotes. A tree with three major groupings holds too little 
information to be able to establish the nature of the organism at the root. The tree is 
useful for systematic analyses, and should in principle be able to establish whether the 
prokaryotes are monophyletic, assuming that a reliable phylogenetic signal can be 
recovered. Nevertheless, even with the correct topology, it is not possible to infer the 
nature of the ancestor from the topology alone. As I have shown in this thesis, the 
traditional bacterial rooting could be correct and yet the LUCA could still be 
eukaryote-like. The 3-domain tree simply does not hold enough information to 
establish the direction of evolutionary change, that is, to determine ancestral and 
derived states. These have simply been assumed because the notion that prokaryotes 
predate eukaryotes has been taken as given. This thesis has provided an alternative to 
phylogenetic approaches which reveals their weakness in terms of inferring the nature 
of the LUCA, and which has challenged the prokaryote dogma. 
A final note: the origin of DNA. 
In the sections dealing with the RNA world, I have considered the question of 
the origin of protein synthesis in depth but the question of the origin of DNA is only 
briefly mentioned. I have examined this question in depth [Poole et al. 2000, 
2001a,b], but this work is not included in this thesis. Since I discuss this question in 
Poole et al. 1998 and 1999, I shall briefly state my major conclusions for 
completeness. 
Most significantly, I conclude that the RNA to DNA transition had to occur 
subsequent to the advent of genetically-encoded protein synthesis, and that the low 
coding capacity of RNA as a genetic material presents a major problem in 
understanding how ribonucleotide reduction arose. This is counter to earlier 
suggestions, notably by Benner et al. [1989], who argue for an RNA world with a 
DNA genome, with protein synthesis arising later. In their account, Benner et al. 
[1989] argue for deoxyribonucleotide synthesis from glyceraldehyde-3-phosphate and 
acetaldehyde as opposed to ribonucleotide reduction. 
Ribonucleotide reduction is the only known pathway for de novo synthesis of 
deoxyribonucleotides, and requires protein radical chemistry. In all three classes of 
ribonucleotide reductase, a radical is generated and subsequently transferred to a 
cysteine residue, forming a thiyl radical. For this reduction to take place, a mechanism 
for radical generation, storage and specific control and transfer to the substrate is 
required. Other than radical generation, these roles could not be carried out by RNA, 
which is non-specifically cleaved by radicals, so either catalytic proteins predate DNA 
[Poole et al. 2000], or an alternative, chemically simpler but unobserved pathway 
existed [Benner et al. 1989]. The latter is chemically feasible, given the presence of 
the degradative pathway in deoxyribonucleotide salvage. Indeed it was considered the 
most likely route for deoxyribonucleotide synthesis, prior to the demonstration that the 
sole route was ribonucleotide reduction [reviewed in Reichard 1989]. However, the 
pathway suggested by Benner et al. [1989] may simply not have been 'discovered' by 
evolution [Poole et al. 2001b], given that evolution is analogous to tinkering, not 
engineering [Jacob 1977]. Indeed, the evolution of ribonucleotide reduction as 
opposed to a simpler reaction may have been contingent on the presence of an 
established pathway for ribonucleotide synthesis and thus availability of 
ribonucleotides, with acetaldehyde and glyceraldehyde-3-phosphate perhaps not being 
available in large enough quantities for deoxyribonucleotide synthesis via this route 
[Reichard 1989, Poole et al. 2001b]. 
If ribonucleotide reduction was prerequisite for the advent of DNA synthesis, then 
this causes problems for the RNA world theory, since, in the absence of proofreading 
and repair, RNA is not expected to be capable of maintaining sufficiently large 
amounts of genetic information for proteins of the complexity of ribonucleotide 
reductase to emerge (see the Darwin-Eigen cycle in Poole et al. [1999]). I have 
proposed a possible solution to this problem [Poole et al. 2000]. 
In brief, I have argued that post-replicative 2'-O-methylation of RNA could have 
provided a more stable genetic material than RNA, having some, but not all of the 
features that make DNA a more stable information storage molecule than RNA. 
Post-replicative 2'-O-methylation would eliminate the tendency for RNA to 
self-cleave because the modification renders the reactive 2'-hydroxyl group of the ribose 
inactive. Consequently, 2'-O-methyl RNA would potentially be a more stable genetic 
material than unmodified RNA. Incomplete ribose modification, post-replicative 
versus pre-replicative modification (deoxyribonucleotides are synthesised prior to 
DNA synthesis) and deep groove hydrophobicity resulting from extensive 
methylation make 2'-O-methyl RNA inferior to DNA, and hence provide selection for 
replacement. 2'-O-methylation of RNA is found in all three domains of life and has 
been argued to be a feature of the RNA world, and the theory describes a plausible 
scenario for the recruitment of 2'-O-methylation from functional RNAs to genomic 
RNA [Poole et al. 2000]. 
I have also examined the second stage in the RNA to DNA transition, where uracil 
was replaced by thymine [Poole et al. 2001a]. The substrates for ribonucleotide 
reduction are ATP, CTP, GTP and UTP (some ribonucleotide reductases make use of 
diphosphate substrates), forming dATP, dCTP, dGTP and dUTP. dUTP is 
subsequently converted to dTTP, and this indirect pathway suggests that the U to T 
transition occurred subsequent to the replacement of ribose by deoxyribose. However, 
the standard argument for the replacement of uracil by thymine in the evolution of 
DNA is flawed. It suggests that replacing uracil with thymine eliminated the problem 
of cytosine deamination to uracil, permitting deaminations to be identified since uracil 
was no longer native to DNA. This requires evolutionary forethought, since thymine 
only provides a means of recognising deaminations, not of repairing them. If repair 
evolved before thymine replaced uracil, this is also problematic, as there would then 
be no selection for thymine to replace uracil. This problem, and the question of what 
selection pressure might account for the U to T transition, are discussed in Poole et al. 
[2001a]. 
Concluding remark. 
In this thesis, a case has been made for continuity from the RNA world through to 
the emergence of the three domains, eukaryotes, bacteria and archaea. The major 
conclusion is that using the RNA world as outgroup to root the tree of life suggests 
that eukaryotes have retained more ancestral features than prokaryotes. From this 
conclusion it is then possible to examine the differing modes of evolution in 
prokaryotes and eukaryotes, and a biological basis for these differences is described. 
The work is based on the established principles of the error threshold (Eigen limit), 
the relationship between rate of diffusion and catalytic efficiency, the 
physicochemical properties of RNA, r- and K-selection and standard evolutionary 
theory. I believe it provides a significant improvement over previous studies on early 
evolution in that a wide range of phenomena can be explained consistently, as 
opposed to being treated as unrelated problems. Importantly, while the model 
described may not be correct, it is testable, as described above. 
References. 
Altman S, Kirsebom L: Ribonuclease P. In: Gesteland R, Cech T, Atkins J, eds. The 
RNA World, 2nd Ed. New York: Cold Spring Harbor Laboratory Press 1999, pp 
351-380. 
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence for massive 
gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 
1998, 14, 442-444. 
Benner SA, Ellington AD, Tauer A: Modern metabolism as a palimpsest of the RNA 
world. Proc Natl Acad Sci USA 1989, 86, 7054-7058. 
Brion P, Westhof E: Hierarchy and dynamics of RNA folding. Annu Rev Biophys 
Biomol Struct 1997, 26, 113-137. 
Brown JW, Haas ES, Pace NR: Characterization of ribonuclease P RNAs from 
thermophilic bacteria. Nucleic Acids Res 1993, 21, 671-679. 
Chibnall AC, Westall RG: The estimation of glutamine in the presence of asparagine. 
Biochem J 1932, 26, 122-132. 
Choi B-S, Roberts JE, Evans JNS, Roberts MF: Nitrogen metabolism in 
Methanobacterium thermoautotrophicum: a solution and solid-state 15N NMR 
study. Biochem 1986, 25, 2243-2248. 
Choi B-S, Roberts MF: 15N-NMR studies of Methanobacterium 
thermoautotrophicum: comparison of assimilation of different nitrogen sources. 
Biochim Biophys Acta 1987, 928, 259-265. 
Daniel RM, Cowan DA: Biomolecular stability and life at high temperatures. Cell 
Mol Life Sci 2000, 57, 250-264. 
Das R, Gerstein M: The stability of thermophilic proteins: a study based on 
comprehensive genome comparison. Funct Integr Genomics 2000, 1, 76-88. 
Di Giulio M: The RNA world, the genetic code and the tRNA molecule. Trends Genet 
2000, 16, 17-19. 
Eichler J, Moll R: The signal recognition particle of Archaea. Trends Microbiol 2001, 
9, 130-136. 
Friedman HC, Thauer RK: FEMS Microbiol Lett 1987, 40, 179-181. 
Forterre P: A hot topic: the origin of hyperthermophiles. Cell 1996, 85, 789-792. 
Forterre P, Bouthier De La Tour C, Philippe H, Duguet M: Reverse gyrase from 
hyperthermophiles: probable transfer of a thermoadaptation trait from archaea to 
bacteria. Trends Genet 2000, 16, 152-154. 
Galtier N, Tourasse N, Gouy M: A nonhyperthermophilic common ancestor to extant 
life forms. Science 1999, 283, 220-221. 
Gilbert JB, Price YE, Greenstein JP: Effect of anions on the non-enzymatic 
desamidation of glutamine. J Biol Chem 1949, 180, 209-218. 
Greenstein JP, Winitz M: Glutamic acid and glutamine. In Chemistry of the Amino 
Acids. John Wiley and Sons, NY 1961, pp 1929-1954. 
Hallberg RL, Kraus KW, Hallberg EM: Induction of acquired thermotolerance in 
Tetrahymena thermophila: effects of protein synthesis inhibitors. Mol Cell Biol 
1985, 5, 2061-2069. 
Handy J, Doolittle RF: An attempt to pinpoint the phylogenetic introduction of 
glutaminyl-tRNA synthetase among bacteria. J Mol Evol 1999, 49, 709-715. 
Haney PJ, et al.: Thermal adaptation analysed by comparison of protein sequences 
from mesophilic and extremely thermophilic Methanococcus species. Proc Natl 
Acad Sci USA 1999, 96, 3578-3583. 
Hiller R, Zhou ZH, Adams MW, Englander SW: Stability and dynamics in a 
hyperthermophilic protein with melting temperature close to 200 degrees C. Proc 
Natl Acad Sci USA 1997, 94, 11329-11332. 
Horiike T, Hamada K, Kanaya S, Shinozawa T: Origin of eukaryotic cell nuclei by 
symbiosis of Archaea and Bacteria is revealed by homology-hit analysis. Nat Cell 
Biol 2001, 3, 210-214. 
Ibba M, Curnow AW, Soll D: Aminoacyl-tRNA synthesis: divergent routes to a 
common goal. Trends Biochem Sci 1997, 22, 39-42. 
Ibba M, Soll D: The renaissance of aminoacyl-tRNA synthesis. EMBO Rep 2001, 2, 
382-387. 
Jacob F: Evolution and Tinkering. Science 1977, 196, 1161-1166. 
Jaenicke R, Bohm G: The stability of proteins in extreme environments. Curr Opin 
Struct Biol 1998, 8, 738-748. 
Kowalak JA et al.: The role of posttranscriptional modification in stabilization of 
transfer RNA from hyperthermophiles. Biochemistry 1994, 33, 7869-7876. 
Kyrpides NC, Olsen GJ: Archaeal and bacterial hyperthermophiles: horizontal gene 
exchange or common ancestry?/Aravind et al.: Reply. Trends Genet 1999, 15, 
298-300. 
Lamour V, Quevillon S, Diriong S, N'Guyen VC, Lipinski M, Mirande M: Evolution 
of the Glx-tRNA synthetase family: the glutaminyl enzyme as a case of horizontal 
gene transfer. Proc Natl Acad Sci USA 1994, 91, 8670-8674. 
Legrain C, Demarez M, Glansdorff N, Pierard A: Ammonia-dependent synthesis and 
metabolic channelling of carbamoyl phosphate in the hyperthermophilic archaeon 
Pyrococcus furiosus. Microbiol 1995, 141, 1093-1099. 
Lundberg P, Harmsen E, Ho C, Vogel HJ: Nuclear Magnetic Resonance Studies of 
Cellular Metabolism. Anal Biochem 1990, 191, 193-222. 
McDonald JH, Grasso AM, Rejto LK: Patterns of temperature adaptation in proteins 
from Methanococcus and Bacillus. Mol Biol Evol 1999, 16, 785-790. 
McDonald JH: Patterns of temperature adaptation in proteins from the bacteria 
Deinococcus radiodurans and Thermus thermophilus. Mol Biol Evol 2001, 18, 
741-749. 
McMullin ER, Bergquist DC, Fisher CR: Metazoans in Extreme Environments: 
Adaptations of Hydrothermal Vent and Hydrocarbon Seep Fauna. Gravit Space 
Biol Bull 2000, 13, 13-23. 
Miller SL, Bada JL: Submarine hot springs and the origin of life. Nature 1988, 334, 
609-611. 
Moulton V et al.: RNA folding argues against a hot-start origin of life. J Mol Evol 
2000, 51, 416-421. 
Myllykallio H, et al.: Bacterial mode of replication with eukaryotic-like machinery in 
a hyperthermophilic archaeon. Science 2000, 288, 2212-2215. 
Nara T, Hashimoto T, Aoki T: Evolutionary implications of the mosaic 
pyrimidine-biosynthetic pathway in eukaryotes. Gene 2000, 257, 209-222. 
Nelson KE, et al.: Evidence for lateral gene transfer between Archaea and Bacteria 
from the genome sequence of Thermotoga maritima. Nature 1999, 399, 323-329. 
Noon KE, Bruenger E, McCloskey JA: Post-transcriptional modifications in 16S and 
23S rRNAs of the archaeal hyperthermophile Sulfolobus solfataricus. J Bacteriol 
1998, 180, 2883-2888. 
Poole AM, Jeffares DC, Penny D: The path from the RNA world. J Mol Evol 1998, 
46, 1-17. 
Poole A, Jeffares D, Penny D: Early evolution: prokaryotes, the new kids on the 
block. BioEssays 1999, 21, 880-889. 
Poole A, Penny D, Sjoberg B-M: Methyl-RNA: an evolutionary bridge between RNA 
and DNA? Chem Biol 2000, 7, R207-R216. 
Poole A, Penny D, Sjoberg B-M: Confounded cytosine! Tinkering and the evolution 
of DNA. Nat Rev Mol Cell Biol 2001a, 2, 147-151. 
Poole AM, Logan DT, Sjoberg B-M: The evolution of the ribonucleotide reductases: 
much ado about oxygen. J Mol Evol 2001b, accepted. 
Reichard P: Commentary on 'Formation of deoxycytidine 5'-phosphate from cytidine 
5'-phosphate with enzymes from Escherichia coli' by P Reichard & L Rutberg. 
Biochim Biophys Acta 1989, 1000, 49-50. 
Ribeiro S, Golding GB: The mosaic nature of the eukaryotic nucleus. Mol Biol Evol 
1998, 15, 779-788. 
Rivera MC, Jain R, Moore JE, Lake JA: Genomic evidence for two functionally 
distinct gene classes. Proc Natl Acad Sci USA 1998, 95, 6239-6244. 
Robertson DE, Noll D, Roberts MF: Free amino acid dynamics in marine 
microorganisms. J Biol Chem 1992, 267, 14893-14901. 
Rothschild LJ, Mancinelli RL: Life in extreme environments. Nature 2001, 409, 
1092-1101. 
Silva JL, Foguel D, Royer CA: Pressure provides new insights into protein folding, 
dynamics and structure. Trends Biochem Sci 2001, 26, 612-618. 
Smith DR, Doucette-Stamm LA, Deloughery C, Lee H, Dubois J, Aldredge T, 
Bashirzadeh R, Blakely D, Cook R, Gilbert K, Harrison D, Hoang L, Keagle P, 
Lumm W, Pothier B, Qiu D, Spadafora R, Vicaire R, Wang Y, Wierzbowski J, 
Gibson R, Jiwani N, Caruso A, Bush D, Safer H, Patwell D, Prabhakar S, 
Mcdougall S, Shimer G, Goyal A, Pietrokovski S, Church GM, Daniels CJ, Mao J-I, 
Rice P, Nolling J, Reeve JN: Complete Genome Sequence of Methanobacterium 
thermoautotrophicum ΔH: Functional Analysis and Comparative Genomics. J Bact 
1997, 179, 7135-7155. 
Sterner R, Kleeman GR, Szadkowski H, Lustig A, Hennig M, Kirschner K: 
Phosphoribosyl anthranilate isomerase from Thermotoga maritima is an extremely 
stable and active homodimer. Protein Sci 1996, 5, 2000-2008. 
Stroud RM, Walter P: Signal sequence recognition and protein targeting. Curr Opin 
Struct Biol 1999, 9, 754-759. 
Tumbula DL, Becker HD, Chang W-Z, Soll D: Domain-specific recruitment of amide 
amino acids for protein synthesis. Nature 2000, 407, 106-110. 
Van de Casteele M et al.: Molecular physiology of carbamoylation under extreme 
conditions: what can we learn from extreme thermophilic microorganisms? Comp 
Biochem Physiol A 1997, 118, 463-473. 
Vickery HB, Pucher GW, Clark HE, Chibnall AC, Westall RG: The determination of 
glutamine in the presence of asparagine. Biochem J 1935, 29, 2710-2720. 
Appendix 
Poole A & Penny D. 
Does endosymbiosis explain the origin of the nucleus? 
Nature Cell Biology 3, E173 (2001). 
Letter in response to: 
Horiike T, Hamada K, Kanaya S & Shinozawa T. 
Origin of eukaryotic cell nuclei by symbiosis of Archaea and Bacteria is revealed by 
homology-hit analysis. 
Nature Cell Biology 3, 210-214 (2001).