Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

STRUCTURAL STUDIES ON THE NUCLEAR LAMINS AND OTHER INTERMEDIATE FILAMENT PROTEINS

A thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Biophysics at Massey University.

James Frederick CONWAY
October, 1989

ABSTRACT

A number of aspects of intermediate filament (IF) chain and molecular structure, as well as molecular aggregation, have been examined. These include the delineation of periodicities in the sequences of structural domains of IF proteins, the distribution of amino acid residues within the heptad substructure, the flexibility of the peptide backbone, the extent of homology among the IF proteins, the packing of chains in the dimeric molecule, and the axial packing of molecules in the IF.
Particular focus has been placed on the newly sequenced type V IF proteins (the nuclear lamins and the Helix pomatia B protein) and on a type III IF protein (peripherin). A parallel, in-register arrangement of chains in the molecule is predicted for peripherin and the type V chains from a consideration of interchain ionic interactions. Also, periodicities in the linear distribution of charged residues in the rod domains of these proteins are shown to be comparable with periods in other IF chains. Ionic interactions between lamin molecules have been used to assess the likely modes of molecular aggregation in an in vitro assembly, and a model is presented that also satisfies the constraints imposed by electron microscope data. In this model, antiparallel arrays of molecules are half-staggered and an extended conformation is predicted for the carboxy-terminal domains. Simple explanations are given for the transition between paracrystalline and lattice structures and for the disassembly of the lamin meshwork concomitant with hyperphosphorylation. The method of calculating intermolecular ionic interaction profiles is enhanced and a new, three-dimensional method is developed. The inhomogeneous distribution of residues in the heptad substructure can be correlated with the coiled-coil structure and chain packing in the molecule. In particular, the ~75% occupancy rate of apolar residues in the internal a and d heptad positions is shown to be a general feature of α-fibrous proteins. Variability of residues in the outer b, c and f positions indicates that structural or functional specificity in the rod domain may be determined by these parts of the sequence. The predicted flexibilities of IF chains have been compared with the underlying structure of the chains. Evidence from sequence homology studies suggests that several new subtypes are appropriate in the classification scheme.
For the hard keratins the terms types Ia and IIa are proposed and for the soft keratins, types Ib and IIb; the need to separate the neurofilaments into a type IV class, distinct from the type III IF chains, is confirmed; and division of the type V chains into cytoskeletal and karyoskeletal groups is indicated. A more detailed delineation is made of regions within the amino- and carboxy-terminal domains than has been possible previously. Periodic features of the homology profiles for the rod domain are examined and found to be similar to those in the linear distribution of residues in the amino acid sequences. Comparison between amphibian and mammalian keratins, and also between hard and soft keratins, reveals that type II chains are maintained at a higher level of fidelity than type I chains. Consensus rod domain sequences are derived for the various IF subtypes: absolutely conserved regions of primary structure identify types or subtypes.

ACKNOWLEDGEMENTS

It is with great pleasure that I acknowledge my chief supervisor, Prof. D.A.D. Parry, for his assistance and commitment to this work. His indefatigable energy and high standards have contributed largely to the work described herein and I look forward to working with him in the future. Prof. P.T. Callaghan and the staff of the Physics and Biophysics Department at Massey University have been very supportive during the course of my studies and I appreciate their efforts on my behalf. The other doctoral and graduate students have provided a friendly atmosphere within which to work and I wish them every success in their endeavours. Mr P.M. Ngan, Director of the Massey University Image Analysis Unit, has been of great assistance in providing access to the Unit's facilities and in many interesting discussions on a variety of topics. I much appreciate the kind hospitality provided by Drs. P.M. Steinert and A.C.
Steven during the course of a visit to the National Institutes of Health in 1986, and also the extra impetus they have provided towards the completion of this work. I look forward to continued collaboration with them. In addition, I acknowledge the valuable contributions towards the cost of that trip from the Royal Society of New Zealand, the Dean of the Science Faculty at Massey, Prof. G.N. Malcolm, and the Fibrous Protein Merit Award. Thanks also to my parents for their encouragement and assistance during the earlier period of my studies. Finally, special thanks to Sharon Wards for her support and encouragement during the course of my PhD studies. I am indebted to her for all her time spent directly and indirectly in the preparation of this manuscript.

Some amino acid sequences were kindly provided prior to publication by:
Dr R.A. Lazzarini (human NF-M chain)
Dr S.S.M. Chin (rat NF-M chain)
Professor G.E. Rogers and Dr B.C. Powell (a type II keratin chain from sheep)
Dr P.M. Steinert and Dr D.R. Roop (mouse M50k and M55k chains)

Electron micrographs were kindly provided by:
Dr P.M. Steinert (Figure 1-1 of reconstituted IF)
Professor G.E. Rogers (Figure 1-2 of wool filaments in cross-section)
Dr A.E. Goldman and Dr R.D. Goldman (Figure 4-2 of lamin paracrystals)
Dr C. Cohen (Figure 4-8 of a tropomyosin lattice/paracrystal structure)

TABLE OF CONTENTS

1. An Overview of Intermediate Filament Structure
1.1 Early Structural Work on Keratin IF
1.2 Soft Keratin and Other IF
1.3 Current Models of Intermediate Filament Structure
1.4 Conservation of Amino Acid and Gene Sequences
1.5 Lamin, Peripherin and the Helix A and B proteins
1.6 Function of Intermediate Filaments
1.7 Structural Form of the Thesis
2. Primary Structure
2.1 Fourier Analysis
2.1.1 Fourier Analysis - Method
2.1.2 Fourier Analysis - Results
2.2 Residue Distribution in the Heptad
2.3 Flexibility
2.4 Summary
3. Sequence Homology
3.1 Homology Statistics
3.1.1 Amino Acid Homology Score, ha
3.1.2 Residue Homology Score, hr
3.1.3 Segment Homology Score, hs
3.2 Homology within the Rod Domain
3.2.1 Coiled-Coil Segments
3.2.2 Link Segments
3.2.3 Type V Chains
3.3 Subtyping on the basis of Homology
3.3.1 Hard and Soft Keratin IF Chains
3.3.2 Type IV IF Chains
3.4 Homology among the 'H' subdomains
3.5 Comparison of Amphibian and Mammalian Scores
3.6 Periodic Features in the Homology Score Distributions
3.7 Consensus Sequences for IF Chain Types
3.8 Summary
4. Secondary and Tertiary Structure
4.1 1D Ionic Interactions Between Chains
4.2 1D Ionic Interactions Between Molecules
4.3 Modelling Lamin
4.4 3D Ionic Interactions Between Molecules
4.4.1 Generation of Coordinates for a Coiled-Coil Molecule
4.4.2 Determination of Interactions
4.4.3 Analysis of the Interaction Maps
4.5 Summary
5. Summary
Appendices
Appendix A: Zero-Filling prior to Fourier Transformation
Appendix B: Fourier Transforms
Appendix C: Curve Smoothing
Appendix D: Intermolecular Ionic Interactions
Bibliography
Publications

LIST OF FIGURES

Figure 1-1 Electron micrographs of IF in vitro
Figure 1-2 Electron micrograph of wool microfibrils in cross-section
Figure 1-3 Schematic representation of the IF protein chain
Figure 1-4 Schematic representation of the heptad substructure
Figure 1-5 Schematic comparison of IF and lamin chain structures
Figure 2-1 Example of applying the baseline correction operation prior to the Fourier transform
Figure 2-2 Comparison of the human and Xenopus lamin A protein sequences
Figure 2-3 Flexibility profiles for a selection of IF chains
Figure 3-1 Homology profiles for the IF chain types
Figure 3-2 Comparison of the link segments L1, L12 and L2
Figure 3-3 Fourier transforms of the homology profiles
Figure 3-4 Consensus sequences of the rod domain
Figure 4-1 Intermolecular ionic interaction curves for human lamin A
Figure 4-2 Electron micrographs of lamin paracrystals
Figure 4-3 Diagram of staggered arrays of lamin molecules in Model A
Figure 4-4 Diagram of staggered arrays of lamin molecules in Model B
Figure 4-5 The alignment of conserved sequences
Figure 4-6 Comparison of Models A and B
Figure 4-7 Schematic of a lattice collapsing into a paracrystal
Figure 4-8 Electron micrograph of a tropomyosin lattice/paracrystal structure
Figure 4-9 Schematic of the relative orientations of molecules
Figure 4-10 Schematic of a pair of segment 1B dimers
Figure 4-11 Intensity map of the 3D intermolecular ionic interactions
Figure A-1 Rectangle and sinc functions - Fourier pairs
Figure B-1 Fourier transforms for peripherin
Figure B-2 Fourier transforms for human lamins A and C
Figure B-3 Fourier transforms for Xenopus lamin A
Figure B-4 Fourier transforms for Xenopus lamin B
Figure B-5 Fourier transforms for Helix pomatia B
Figure D-1 Intermolecular ionic interaction curves for peripherin
Figure D-2 Intermolecular ionic interaction curves for human lamin A
Figure D-3 Intermolecular ionic interaction curves for Xenopus lamin A
Figure D-4 Intermolecular ionic interaction curves for Xenopus lamin B
Figure D-5 Intermolecular ionic interaction curves for Helix pomatia B

LIST OF TABLES

Table 1-1 Lengths of the rod domain segments for IF types I-IV
Table 2-1 IF amino acid sequences
Table 2-2 Comparison of periods present in peripherin and other type III IF proteins
Table 2-3 Peaks resulting from multiplying the Fourier transforms together
Table 2-4 Residue distribution in the heptad for IF and myosins
Table 2-5 Mean flexibility indices for chain segments from a selection of IF chains
Table 3-1 Look-up table for mixed homology scores
Table 3-2 Look-up tables for acidic homology, basic homology and large apolar homology scores
Table 3-3 Regions of high sequence homology (hr ≥ 90%)
Table 3-4 Regions of low sequence homology (hr < 60%)
Table 3-5 Mean segment homology scores
Table 3-6 Mean segment homology scores for the rod domain segments in soft and hard keratins
Table 3-7 Lengths of the homologous subdomains H1 and H2
Table 3-8 Extents of the structural domains in IF chains
Table 3-9 Mean segment homology scores for the rod domain segments in amphibian and mammalian keratins
Table 3-10 A selection of the most significant periodicities in the homology profiles
Table 3-11 Mean residue homology scores for each position in the heptad substructure
Table 3-12 Percentage occurrence of highly conserved residues
Table 4-1 Interchain ionic interactions for peripherin and type V IF
Table 4-2 Interchain ionic interactions for a selection of IF
Table 4-3 Interchain ionic interactions per dual heptad
Table 4-4 Significant intermolecular ionic interactions between human lamin A molecules
Table 4-5 Significant ionic interactions between peripherin molecules
Table 4-6 Volume calculation for the C-terminal domains of human lamins A and C
Table 4-7 Ionic interactions used for Models A and B
Table 4-8 Values used to derive a set of five pitch lengths for the coiled-coil
Table 4-9 Starting set coordinates in an undistorted α-helix
Table 4-10 Coordinates for a pair of segment 1B dimers
Table 4-11 Selection of the highest 3D interaction scores (antiparallel)
Table 4-12 Selection of the highest 3D interaction scores (parallel)
Table B-1 Peaks in the Fourier transforms for peripherin
Table B-2 Peaks in the Fourier transforms for human lamins A and C
Table B-3 Peaks in the Fourier transforms for Xenopus lamin A
Table B-4 Peaks in the Fourier transforms for Xenopus lamin B
Table B-5 Peaks in the Fourier transforms for Helix pomatia B
Table D-1 Peaks in the ionic interaction curves for peripherin
Table D-2 Peaks in the ionic interaction curves for human lamin A
Table D-3 Peaks in the ionic interaction curves for Xenopus lamin A
Table D-4 Peaks in the ionic interaction curves for Xenopus lamin B
Table D-5 Peaks in the ionic interaction curves for Helix pomatia B

CHAPTER 1: Introduction

1. AN OVERVIEW OF INTERMEDIATE FILAMENT STRUCTURE

Intermediate filaments (IF) are long, unbranched structures found in a diverse range of cell types (Figure 1-1). They form one of three major classes of cytoskeletal networks and are so called because their diameters (~10 nm) lie intermediate in size between the smaller actin-containing microfilaments and the larger microtubules and myosin-containing thick filaments.
The recent addition of the nuclear lamin proteins to the IF family has shown that IF networks are found not only throughout the cytoplasm but also within the cell nucleus, where they are thought to provide at least part of the structural integrity of the nuclear envelope. The IF proteins were initially classified according to the cell types with which they were associated. The first of these classes comprised the relatively large group of keratins from epithelial cells; the second class contained the desmin protein found in myogenic cells; glial fibrillary acidic protein (GFAP) in astroglial cells and vimentin in cells of mesenchymal origin made up the third and fourth classes respectively (vimentin, however, was also found in cells that expressed other IF). The fifth class encompassed the neurofilaments from neuronal cells and was further divisible into three sub-classes according to the relative molecular weights of the chains.

Figure 1-1 Examples of the 10 nm-diameter intermediate filament (IF) structures in vitro (separate images are not to scale). Branching filaments are evident, as are filaments crossing over each other. Figure courtesy of Dr P.M. Steinert.

As amino acid sequence data became available for the various IF chains, an alternative classification system, based on sequence homology in the major α-helix-containing fraction of the IF proteins, largely replaced the cell-specific nomenclature. The early sequence studies of Crewther and co-workers on wool keratins revealed two kinds of α-helix-rich segment: type I had a net acidic character and type II was neutral-basic. Subsequent keratin sequences, primarily from the epidermal keratins, were found to fall into these same two groups, and consequently the chains from which these characteristic sequences were derived were termed type I and type II IF chains.
The equivalent α-helical regions in desmin, vimentin and GFAP showed greater sequence homology with one another than with the keratins and were classified together as type III IF chains. Although the neurofilament proteins were initially included within the type III grouping, increasing evidence has indicated that they form a distinct group, and they are now referred to as type IV IF chains. Several recent additions to the IF family of proteins have been made: peripherin, a type III IF protein; the nuclear lamin proteins, which are now established as a new IF type (type V); and the Helix pomatia B protein, which shows structural homology with the nuclear lamins. Since the presence of cell-specific IF suggests a diversity of function, it is pertinent to ascertain whether this translates into a number of variants on the basic model for IF structure. The low-resolution substructures of some IF have been partially defined in the last five years or so, and this work forms the basis of the studies described here. This thesis, therefore, represents a collection of studies on the structure and aggregation of IF molecules with special reference to the more recently characterized members of the IF family: the nuclear lamin proteins in particular, but also peripherin and the Helix pomatia B protein.

1.1 Early Structural Work on Keratin IF

The structure of wool keratin has been a focus of research over the past 50 years, due largely to the commercial significance of the wool fibre. The keratin IF (originally termed microfibrils) in wool, hair and quill are about 10 nm in diameter and are embedded in a matrix of proteins of high-sulphur and high-glycine-tyrosine content (Figure 1-2). Keratins from these sources are termed 'hard' to distinguish them from the 'soft' (or epidermal) keratins found in the stratum corneum of skin.
IF in hard keratin are preferentially oriented parallel to the axis of the fibre and have proven to be ideal subjects for X-ray diffraction studies. In contrast, the orientation of IF in the soft keratins is much poorer and consequently few X-ray diffraction data have been recorded. Pioneering work in this area was carried out by Astbury and co-workers, who studied hard keratinized tissues from a variety of sources (Astbury and Street, 1931; Astbury and Marwick, 1932; Astbury and Woods, 1933). They noted a number of distinctive X-ray diffraction patterns: α (from wool keratin), β (from stretched mammalian hard keratin), the feather pattern (from hard avian and reptilian tissue), and the amorphous pattern (from the cuticle of animal hair). The terms 'α-keratin' and 'α-fibrous protein' are derived from this nomenclature and refer to proteins that give rise to the α-pattern. Astbury and Woods (1933) proposed that the α-pattern was generated by a folded structure and that the β-form of keratin arose from stretching the molecular chains into an almost fully extended conformation. Current descriptions of the β-structure are based on the pleated-sheet models of Pauling and Corey (1951b, 1953b) and will not be discussed further here (see, for example, Fraser et al, 1972 and Fraser and MacRae, 1973b). A number of polypeptide chain structures were proposed to explain the α-pattern (see, for example, Astbury and Bell, 1941; Huggins, 1943; Astbury et al, 1948; Ambrose et al, 1949; Bragg et al, 1950; Pauling and Corey, 1950). The model of Pauling and Corey (elaborated in Pauling and Corey, 1951a and Pauling et al, 1951) placed the residues in a helical arrangement, and two variants were described: one had 3.7 residues per turn of helix and the other had 5.1 residues per turn. These were termed the α-helix and the γ-helix respectively (Pauling and Corey, 1951a); both featured intrachain hydrogen bonds and planar amide groups.
Perutz (1951a,b) eliminated from consideration all models but the Pauling and Corey α-helix by observing a meridional reflection of spacing 0.149 nm in the X-ray patterns of poly-γ-benzyl-L-glutamate, keratin and haemoglobin. Only the α-helix was expected to give rise to this spacing, which corresponds to the axial rise per residue. (The 0.149 nm meridional reflection was originally noted by MacArthur (1943) in the X-ray diffraction pattern from African porcupine quill tip but was interpreted in terms of repeating sidechains along the polypeptide chain. By using the synthetic polypeptide poly-γ-benzyl-L-glutamate, Perutz demonstrated that the 0.149 nm reflection was independent of the specific nature of the sidechains.)

Figure 1-2 Cross-section of intermediate filaments in wool. Figure courtesy of Professor G.E. Rogers.

Cochran et al (1952) calculated the Fourier transform of an α-helix, and quantitative agreement for this conformation was found from X-ray diffraction studies on the synthetic polypeptides poly-γ-methyl-L-glutamate (Bamford et al, 1952) and poly-L-alanine (Brown and Trotter, 1956). The screw sense of the α-helix in vivo was not determined for a number of years. Pauling et al (1951) observed that for residues with the L-configuration (which is predominantly the case in vivo) the position of the sidechains differed between left- and right-handed α-helices (except for glycine, whose sidechain consists of a single hydrogen atom). Huggins (1952) showed that for L-amino acids, significant steric hindrance was probable between the β-carbon and the carbonyl oxygen of the same residue in a left-handed α-helix, whereas no such hindrance would occur in a right-handed one.
Subsequent studies on α-helix-forming synthetic polypeptides (for example poly-L-alanine: Elliott and Malcolm, 1956, 1959) showed that the α-helices were right-handed, as is the general case, although several left-handed α-helices of marginal stability have subsequently been reported (see Fraser et al, 1972). The first direct confirmation of the presence of α-helices in proteins was obtained from the 0.2 nm Fourier synthesis of myoglobin (Kendrew et al, 1960). In addition, the screw sense of all the α-helical segments in the molecule was shown to be right-handed. A prominent feature on the meridian of the X-ray diffraction patterns from α-keratin and other α-fibrous proteins (for example α-tropomyosin) is a strong reflection at a spacing of 0.515 nm (Astbury and Street, 1931). This is not predicted by an undistorted α-helical structure, which should instead show off-meridional layer-line diffraction at an axial spacing of 0.54 nm (Crick, 1952). An explanation of the strong 0.515 nm reflection and the missing layer line was made independently by Crick (1952) and Pauling and Corey (1953a): in their models the α-helices were distorted so as to wrap around one another in a supercoil of opposite sense to that of the α-helix, and this structure was termed a 'coiled-coil rope'. The 0.515 nm reflection was shown to arise from the axial rise per turn of the supercoiled α-helix, measured along the axis of the coiled-coil. Crick (1953) described a seven-residue substructure, later termed the heptapeptide or heptad repeat, in which sidechains from two of every seven residues could be made to fit neatly into the spaces between the sidechains on the other α-helix or α-helices. This arrangement of sidechains along the line of contact was termed 'knob-hole' packing, and the nature of these internalized sidechains was suggested to be hydrophobic.
Confirmation of this suggestion was provided many years later by Hodges et al (1972), Parry (1974) and Stone et al (1974), who demonstrated such a substructure in the amino acid sequence of tropomyosin: apolar residues were indeed successively three and four residues apart. McLachlan and Stewart (1975) introduced a nomenclature for the positions of the residues in the heptad, a, b, c, d, e, f and g, where a and d are the internal positions of the coiled-coil and are filled predominantly by apolar residues. In IF proteins, about 75% of residues in the a and d positions are apolar (Parry and Fraser, 1985). Hard α-keratin, although ideal for X-ray diffraction studies, is not readily amenable to chemical analysis. The high content of covalent disulphide bonds present in wool, for example, effectively welds the proteins into a mechanically inert structure (Fraser et al, 1972). Even when the disulphide bonds are reduced, a complex mixture of related proteins is revealed (Crewther et al, 1965). The problems associated with sequencing the keratin proteins in wool using the classic protein sequencing techniques were summarized by Hogg et al (1978) as follows: the protein chains were relatively large at 40-50 kDa; the number of similar chains within the relatively small range of molecular weights was also large; and identification of particular chains was difficult. Nonetheless, Crewther and Dowling (1971) and Dowling and Crewther (1972) showed by amino acid analysis, peptide mapping and optical rotatory dispersion measurements that two types of chain segment, termed I and II, were present in subfractions of helical fragments of S-carboxymethylkerateine-A.
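The heptad nomenclature lends itself to a simple calculation. The Python sketch below labels each residue of a sequence with its heptad position (a to g) and reports the fraction of a and d residues that are apolar, the quantity quoted above as about 75% for IF proteins. The set of residues counted as apolar, the function names and the requirement that the heptad phase be supplied by the user (rather than detected) are all assumptions made for illustration; they are not the definitions used by Parry and Fraser (1985).

```python
# Assumption: residues treated as apolar for this illustration only.
APOLAR = set("ACFILMVWY")
HEPTAD = "abcdefg"


def heptad_positions(seq: str, phase: int = 0) -> list[str]:
    """Label each residue a-g; `phase` is the index of the first 'a' residue."""
    return [HEPTAD[(i - phase) % 7] for i in range(len(seq))]


def apolar_fraction_ad(seq: str, phase: int = 0) -> float:
    """Fraction of residues at the internal a and d positions that are apolar."""
    labels = heptad_positions(seq, phase)
    core = [res for res, pos in zip(seq, labels) if pos in ("a", "d")]
    if not core:
        return 0.0
    return sum(res in APOLAR for res in core) / len(core)


# Hypothetical test sequence: three 'ideal' heptads (Leu at a, Ile at d)
# followed by one heptad with polar residues (Ser, Gln) at a and d.
seq = "LKEIEKQ" * 3 + "SKEQEKQ"
print(apolar_fraction_ad(seq))
```

For this 28-residue example six of the eight a and d residues are apolar, so the function returns 0.75. Real rod-domain sequences would need their heptad phase (and any phase discontinuities) established first.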
Crewther and co-workers (Crewther, 1976; Crewther et al, 1978a; Elleman et al, 1978; Gough et al, 1978; Hogg et al, 1978) sequenced a type I and a type II segment of lengths 103 and 109 residues respectively and showed that a heptad substructure existed, with apolar residues common in the a and d positions. Consequently, in a helix of 3.6 residues per turn these apolar residues generate a stripe that winds slowly around the outside of the undistorted α-helix, a result in accord with the coiled-coil model of Crick (1953) and the tropomyosin model (see above). They also noted that 32% of the residues were identical between the type I and the type II segments when the sequences were aligned to maximize homology. Crewther et al (1978b) extended these results by comparing the sequences of five fragments (~30 residues each) from different wool keratin proteins: of those, three were type I segments and two were type II segments. As before, the identity between the groups was about 30%, but within each type there were few differences. More recently, Crewther et al (1985) have fully sequenced the rod domains of two type I wool keratin proteins (components 8a and 8c-1) and two type II proteins (components 7 and 7c) and have reported that the identity within the type I and II groupings is 90-95%.

Long-standing problems concerning the structure of the coiled-coil in α-keratin proteins have been to determine the pitch length of the coiled-coil and the number and relative orientation of polypeptide chains that comprise the rope. Crick (1953) described coiled-coil ropes with two and three α-helical strands and suggested that the three-stranded rope would be an appropriate model for α-keratin. Pauling and Corey (1953a) originally suggested that α-keratin comprised 'α-cables' of six α-helices coiled around a seventh, with single α-helices in the interstices between the cables.
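The percentage identities quoted above are simply the fraction of aligned positions that carry the same residue. A minimal Python sketch follows, assuming the two sequences have already been aligned to equal length with '-' marking gaps. How gaps were treated in the cited comparisons is not stated here, so excluding gapped positions from the denominator is an assumption, as is the function name.

```python
def percent_identity(aligned1: str, aligned2: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned, ungapped positions.

    Assumes the sequences are already aligned; positions where either
    sequence has a gap are excluded from the count (an assumption).
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("sequences must already be aligned to equal length")
    pairs = [(x, y) for x, y in zip(aligned1, aligned2) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical heptads differing at two of seven positions: about 71.4% identity.
print(percent_identity("LKEIEKQ", "LKDIEKR"))
```

Choosing the alignment that maximizes homology, as Crewther and co-workers did, is a separate step; this function only scores an alignment once it exists.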
Evidence for a two-strand rope was found by Cohen and Holmes (1963) in the X-ray diffraction pattern of a highly oriented specimen of paramyosin, and they concluded that the pitch length of the left-handed coiled-coil was 17.8 ± 1 nm. The degree of orientation in the X-ray diffraction pattern from α-keratin was insufficient to allow a clear choice to be made between two and three strands (Fraser et al, 1964a, 1964b, 1965), and in addition the X-ray diffraction pattern was overlaid with an elaborate interference pattern associated with higher levels of structure (Fraser et al, 1972). Estimates of the pitch length for α-keratin ranged from 14 to 17 nm and from 21 to 26 nm for two- and three-stranded ropes respectively (Fraser et al, 1965), but it was suggested that the degree of similarity between the diffraction patterns from tropomyosin, myosin and paramyosin (all of which were known from physico-chemical data to contain two-stranded ropes) and that from α-keratin would tend to favour a two-strand rope model for α-keratin.

In principle, two-stranded coiled-coils can adopt one of two forms: in the first, both α-helices are similarly directed, or parallel (ie, the polypeptide mainchain sequences, -NH-Cα-C'-, are oriented in the same direction), whereas in the other the chains are oppositely directed, or antiparallel. Parry and Suzuki (1969) calculated the free energies of the parallel and antiparallel forms for a two-stranded model of poly-L-alanine and compared them with the energies of the analogous pairs of straight α-helices. The coiled-coils were shown to be significantly more stable than the straight α-helix analogues, and the antiparallel coiled-coil was favoured over the parallel. A feature not considered at that time, however, was the role played by charged sidechains in specifying the orientation of the chains in the coiled-coil (Parry, 1974, 1975; McLachlan and Stewart, 1975, 1976).
McLachlan and Stewart (1975) and Parry (1975) showed that the distribution of charged residues in α-tropomyosin resulted in a largely acidic stripe in the e position of the heptad and a largely basic stripe in the g position, both stripes being adjacent to, but on opposite sides of, the 'internal' apolar stripe. Parallel chains allowed salt bridges (or ionic interactions) to be made between a pair of helices, but antiparallel chains placed similarly charged residues in close proximity and caused destabilization. Interchain ionic interactions were studied for the type I and type II chain segments of Crewther et al (described above) in a two-stranded rope (Parry et al, 1977) and a three-stranded rope (Parry, 1979), and all combinations of chains, polarity (ie, parallel or antiparallel) and relative stagger were examined. While the number of chains in the molecule could not be determined, a parallel, in-register arrangement of a type I chain and a type II chain was considered among the more likely candidates. A related study by McLachlan (1978) suggested that the coiled-coil in α-keratin was more likely to be two-stranded than three-stranded.

The chemical studies on wool keratin originally appeared to favour a three-chain subunit (Crewther and Harrap, 1967; Crewther et al, 1968; Crewther and Dowling, 1971; Skerrow et al, 1973; Lee and Baden, 1976; Lotay and Speakman, 1977; Steinert, 1978; Steinert et al, 1980a), although in some cases a two-chain structure could not be excluded. Subsequent studies by Crewther et al (1980) showed that type I proteins interact specifically with type II proteins in a 1:1 molar ratio, although there was no evidence to show whether single or multiple chains of each species were involved in the molecular structure. However, crosslinking studies by Ahmadi and Speakman (1978) and Ahmadi et al (1980) provided strong evidence for a four-chain subunit.
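Why an acidic e stripe and a basic g stripe favour parallel over antiparallel chains can be illustrated with a deliberately simplified Python sketch. It assumes that each sequence starts at heptad position a and contains no stutters; that in a parallel, in-register pair the g residue of one heptad faces the e' residue of the following heptad on the partner chain (five residues further along); and that in an antiparallel pair e faces e' and g faces g' in oppositely numbered heptads. Attractive pairs score +1 and repulsive pairs -1. The function names, the geometry and the scoring are illustrative assumptions, not the procedure of Parry et al (1977) or McLachlan and Stewart (1975).

```python
# Illustrative scoring of interchain e-g ionic interactions (simplified model).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # His treated as uncharged (assumption)


def charge(residue: str) -> int:
    return CHARGE.get(residue.upper(), 0)


def parallel_score(chain_a: str, chain_b: str) -> int:
    """Net attractive g-e' pairs for a parallel, in-register pair of chains.

    Both chains are assumed to start at heptad position a with no stutters, so
    g residues sit at indices 6, 13, 20, ... and the e' partner of the g residue
    at index p is taken to be index p + 5 on the other chain.
    """
    n = min(len(chain_a), len(chain_b))
    score = 0
    for p in range(6, n - 5, 7):
        for x, y in ((chain_a, chain_b), (chain_b, chain_a)):
            score -= charge(x[p]) * charge(y[p + 5])  # opposite charges add +1
    return score


def antiparallel_score(chain_a: str, chain_b: str) -> int:
    """Net attractive pairs for an antiparallel pair under a simplified geometry.

    Heptad i of chain A is assumed to lie opposite heptad h - 1 - i of chain B,
    with e facing e' and g facing g'.
    """
    h = min(len(chain_a), len(chain_b)) // 7
    score = 0
    for i in range(h):
        j = h - 1 - i
        for offset in (4, 6):  # heptad positions e and g
            score -= charge(chain_a[7 * i + offset]) * charge(chain_b[7 * j + offset])
    return score


# Idealized heptad LAALEAK: Leu at a and d, acidic Glu at e, basic Lys at g.
chain = "LAALEAK" * 6
print(parallel_score(chain, chain), antiparallel_score(chain, chain))  # 10 -12
```

With an acidic e stripe and a basic g stripe every parallel g-e' contact is attractive, whereas the antiparallel e-e' and g-g' contacts all bring like charges together, which is the qualitative point made in the text.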
Further studies by Woods and Gruen (1981) and Gruen and Woods (1983) showed that the wool microfibril contained equal numbers of type I and type II chains and that the fundamental subunit was a tetramer of chains configured as a pair of coiled-coils. More detailed studies by Inglis et al (1983) and Woods and Inglis (1984) confirmed that the dimer of chains in the coiled-coil molecule from wool keratin was indeed composed of a type I chain and a type II chain (a heterodimeric structure) and that the chains were parallel. Their data also indicated that the molecules were antiparallel and approximately half-staggered, so that the N-terminal regions of the rod domain were overlapped.

1.2 Soft Keratin and Other IF

Until relatively recently, the so-called 'soft' or epidermal keratins from the stratum corneum of skin had not been studied as extensively as the hard keratins from wool or quill. Certainly the filaments found in the stratum corneum were known to be similar in diameter to those from hard keratin-containing tissue (~10 nm). However, their degree of alignment was so much poorer than in the hard keratins that the use of physical techniques such as X-ray diffraction and electron microscopy had been very limited, although specimens prepared from mammalian epidermal extracts had yielded an α-pattern of poor quality. A comparison of the amino acid contents of hard and soft keratins had shown much similarity. Major chemical differences did exist, however, between the hard and soft keratins, mainly in the relative contents of cystine and glycine residues (see Fraser et al, 1972). The hard keratins are very insoluble and are embedded in a matrix of high-sulphur proteins and high-glycine-tyrosine proteins. Extensive disulphide crosslinking between hard-keratin chains gives rise to this insolubility as well as to the 'hardness' of the filament. It also gives rise to the degree of orientation so necessary for useful X-ray diffraction study.
The soft keratins, however, have few disulphide crosslinks and are much more amenable to chemical investigation. Chemical studies on all members of the IF family revealed that the proteins were more closely related to one another than had been thought initially (for example, Steinert et al, 1978, 1980a,b). Studies on the proteolytic digests of epidermal keratins (Woods, 1983; Parry et al, 1985) produced four-chain and two-chain particles analogous to those produced from hard keratins (Woods and Inglis, 1984; and see above), and it was concluded that the epidermal keratin IF molecule was also a type I-type II heterodimer in the form of a parallel, two-stranded coiled-coil. Quinlan and Franke (1982, 1983) induced crosslinks in type III IF chains using 1,10-phenanthroline cupric ion complexes and subsequently isolated crosslinked homo- and heterodimers; they suggested that the crosslink between the single cysteine residues in these sequences was intermolecular. Parry et al (1985) showed that the crosslink could instead be made intramolecularly between parallel, in-register chains and used this as further evidence for a common molecular structure for all IF proteins. Sequence studies of the wool keratins, the epidermal keratins and the other IF have also shown that these proteins, originating from a wide variety of cell types, exhibit a high degree of homology in the central, largely α-helical (rod domain) portion of the chains (Geisler and Weber, 1982; Weber and Geisler, 1982; Crewther et al, 1983; Dowling et al, 1983).

Little was known about the terminal domains of IF proteins before amino acid sequence data became available. Optical rotatory dispersion experiments had indicated that the keratin molecule was not entirely α-helical: Harrap (1963), for example, showed that the helical content was only 50-60% (see also Crewther et al, 1966, 1968).
Early sequencing studies on wool keratin by O'Donnell and co-workers (for example, O'Donnell, 1969) had revealed fragments with no structure that could be related to the regular α-pattern deduced from the X-ray diffraction data. Later studies by Crewther et al (1983) on more complete hard α-keratin sequences showed that the N- and C-terminal portions of the chains were not likely to be α-helical in conformation; in fact, β-bends were largely predicted. It is now clear that all IF chains share a common structural plan: a central coiled-coil domain bracketed by terminal domains of non-repetitive secondary structure (see Figure 1-3; Geisler et al, 1982a; Crewther et al, 1983). However, a large range of molecular weights is apparent in the IF group. This arises almost entirely from variations in the sizes of the terminal domains. Extreme examples are the cytokeratin 19 protein (mol. wt. 44 kDa: Bader et al, 1986; Eckert, 1988), which is almost devoid of a C-terminal domain (only nine residues in length from the generally accepted boundary between the rod and C-terminal domains), and the neurofilament NF-H protein (mol. wt. 140 kDa: for example the mouse NF-H chain, Shneidman et al, 1988), which has a C-terminal domain 658 residues in length.

[Figure 1-3: schematic of the IF chain, showing the amino-terminal domain (E1, V1, H1), the rod domain (segment 1: 1A, 1B; segment 2: 2A, 2B) and the carboxy-terminal domain (H2, V2, E2)]

Figure 1-3 Schematic representation of the IF protein chain. The central rod region is largely of coiled-coil structure (shaded), with breaks at the linker regions L1 and L12, neither of which is predicted to be coiled-coil or α-helical in conformation, and at link L2, which is predicted to be α-helical but not a coiled-coil. The lengths of segments 1 and 2 are each about 20.5 nm and the combined length of the rod is predicted to be about 47 nm. The terminal domains vary greatly in size among the IF proteins and are largely non-α-helical.
The structural role of the terminal domains is not altogether clear, although the amino-terminal domain does appear to be involved in the formation of at least some IF. Certainly the carboxy-terminal domain is not essential for keratin IF formation, as exemplified by the tailless cytokeratin 19 protein. Also, removal of a portion of the carboxy-terminal domain does not inhibit the development of normal type III IF, but removal of part of the amino-terminal domain from desmin (Geisler et al, 1982a; Kaufmann et al, 1985), vimentin (Traub and Vorgias, 1983) and keratin (Sauk et al, 1984) does appear to limit the filament-forming ability of these proteins. (Lu and Johnson, 1983, have disputed the involvement of the amino-terminal domain in filament formation, but no corroboration of their evidence has been forthcoming.) However, Steinert et al (1983a) have shown that pieces of the non-α-helical terminal domains of epidermal keratin chains could be removed enzymatically from intact filaments without affecting the structural integrity of the filament, as judged by its appearance in the electron microscope. Hence, although portions of the terminal domains may be necessary for the assembly of IF, the mature structure seems to be stabilized largely by the rod domains and does not necessarily rely on the terminal domains for its maintenance. The accessibility of the terminal domains to enzymatic cleavage implies that they lie at least partly, if not fully, on the exterior surface of the filament.

Examination of the N- and C-terminal domain sequences enables classification into subdomains which show homology within each IF type (Steinert and Parry, 1985; Steinert et al, 1985b; see Figure 1-3). Adjacent to the rod domain are regions of high sequence homology, H1 and H2. Distal to these are domains which are variable in content and size, termed V1 and V2.
At the ends of the chains are the E1 and E2 domains, which usually have a high charge content. Keratin chains, for example, often exhibit glycine-serine-rich motifs in the V1 and V2 domains and, like the type III chains, are basic in the E1 and E2 domains. This net basic character is in contrast to the major coiled-coil regions of the rod domain, which are acidic (Steinert and Parry, 1985). Some of the N- and C-terminal subdomains are not evident in particular IF groups: type I soft keratins lack the H2 domain, for example, whereas the C-terminal domain of type III chains is all H2, with no V2 or E2 regions.

1.3 Current Models of Intermediate Filament Structure

Aspects of the currently held model for the structure of the IF molecule have already been outlined: a pair of parallel, right-handed α-helices aligned in axial register to form a left-handed coiled-coil (the rod domain), bracketed by terminal domains of non-repetitive secondary structure. Predictive schemes have indicated that the rod domain of the IF molecule is composed of four coiled-coil regions (segments 1A: 35 residues; 1B: 101 residues; 2A: 19 residues; and 2B: 121 residues) separated by short links (link L1: 7-14 residues; L12: 16-17 residues; and L2: 8 residues) of largely unknown conformation (Crewther et al, 1983; Steinert et al, 1983a; Dowling et al, 1983; Steinert and Parry, 1985; see Figure 1-3 and Table 1-1). The coiled-coil regions are readily grouped into two segments, 1 (1A-L1-1B) and 2 (2A-L2-2B), each about 20.5 nm in length (assuming the average axial rise per residue of 0.1485 nm measured from X-ray diffraction studies of α-keratin). This is very similar to a structural repeat deduced from the X-ray diffraction patterns of hard keratin (see Fraser et al, 1980) and visualized in electron microscopy studies of keratins and other IF (Milam and Erickson, 1982; Sauk et al, 1984).
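The segment lengths above, together with the alternative linker conformations considered in the following paragraph, translate into rod lengths by simple arithmetic. The Python sketch below uses the residue counts and axial rises quoted in the text; the particular linker lengths (L1 = 11, L12 = 17, L2 = 8 residues) are an assumption, chosen from within the quoted ranges, and the dictionary keys are hypothetical labels.

```python
# Illustrative rod-domain length arithmetic (segment sizes and rises from the text;
# the linker lengths are representative picks within the quoted ranges, an assumption).
RISE_HELIX = 0.1485    # nm per residue in the coiled-coil (X-ray value for alpha-keratin)
RISE_EXTENDED = 0.334  # nm per residue for a fully extended chain

SEGMENTS = {"1A": 35, "1B": 101, "2A": 19, "2B": 121}
LINKS = {"L1": 11, "L12": 17, "L2": 8}  # assumed: L1 7-14, L12 16-17, L2 8


def rod_lengths(segments=SEGMENTS, links=LINKS):
    helix = sum(segments.values()) * RISE_HELIX
    n_links = sum(links.values())
    return {
        # links assumed to have the same axial rise as the coiled-coil
        "uniform": helix + n_links * RISE_HELIX,
        # links looped out, contributing no axial length, rod kept linear
        "looped_linear": helix,
        # links looped out and rod folded back at L12: longer coiled-coil segment
        "looped_folded": max(segments["1A"] + segments["1B"],
                             segments["2A"] + segments["2B"]) * RISE_HELIX,
        # links fully extended
        "extended": helix + n_links * RISE_EXTENDED,
    }


for name, length in rod_lengths().items():
    print(f"{name:>14}: {length:5.1f} nm")
```

With these assumed linker values the sketch gives about 46.3 nm (uniform rise), 41.0 nm (looped, linear), 20.8 nm (looped, folded at L12) and 53.0 nm (extended links), in line with the 47, 41, 20-21 and 53 nm figures discussed in the text. The coiled-coil residues alone give 20.2 nm for segment 1 and 20.8 nm for segment 2, which is consistent with the quoted value of about 20.5 nm for each segment.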
If the same average axial rise per residue is assumed for the linker regions as for the coiled-coil segments, the combined length of the rod domain would be about 47 nm. The linker regions, however, could be arranged as loops external to the coiled-coil, and this could result in a minimum length of 20-21 nm if the rod domain were folded back on itself around the L12 linker region, or of 41 nm for a linear rod structure. Alternatively, the links could be fully extended, with an axial rise per residue of 0.334 nm, leading to a rod domain as long as 53 nm. High-resolution X-ray diffraction studies have revealed an axial period of 47 nm (Fraser et al, 1976), and measurements of rotary shadowed IF molecules from a variety of sources show that the length of the molecule is about 45-50 nm (Steinert, 1981; Geisler et al, 1982a, 1985b; Quinlan et al, 1984; Ip et al, 1985; see also Steven et al, 1989). This is consistent with a partially extended structure of the link regions in a linear rod domain of the type described above.

IF type   1A   L1      1B    L12      2A   L2   2B
I         35   11-14   101   16       19   8    121
II        35   10-14   101   17       19   8    121
III       35   8       101   16       19   8    121
IV        35   10      101   17, 22   19   8    121

Table 1-1 Comparison of the lengths (in residues) of structural domains within the rod domain of IF proteins.

The regular coiled-coil structure of the rod domain is correlated with the regular distribution of apolar residues in a heptad substructure: about 75% of the residues in the a and d positions of the heptad are apolar (Parry and Fraser, 1985). Aggregation of pairs of coiled-coil chains into a parallel, in-register molecule is largely due to the basic and acidic residues (lysine, arginine, and aspartic and glutamic acids), which are also distributed in a non-random manner within the rod domain.
The g position of the heptad has a net basic character while the e position is net acidic, and hence ionic interactions (ie, salt bridges) can be made between residues in the e and g positions of adjacent chains in the molecule. This implies a parallel rather than an antiparallel arrangement of chains (see Figure 1-4). Ionic interactions are maximised for an in-register arrangement of parallel chains.

[Figure 1-4: panels (a), (b) and (c)]

Figure 1-4 The heptad pattern in coiled-coils can be represented as (a-b-c-d-e-f-g)n, where a and d are typically apolar. The coiled-coil structure is stabilized by the knob-hole packing of these apolar residues, as shown in (a), where the a and d positions joined by dashed lines are axially staggered relative to each other to optimize the meshing of the hydrophobic sidechains. Oppositely charged residues in the e and g positions are also able to interact by forming salt bridges (ionic interactions) and, in doing so, specify both the parallel orientation of the chains and their in-register alignment. Two stutters are apparent in the heptad substructure of the rod domains. Continuation of the undistorted coiled-coil geometry through these points is shown in (b) for the stutter near the midpoint of segment 2B and in (c) for the stutter within the link L2. Figure adapted from Parry and Fraser (1985).

In addition to regularities involving the e and g positions, highly significant long-range periods have been found in the linear distributions of the acidic and basic residues in the coiled-coil domains (Parry et al, 1977; Parry and Fraser, 1985). In segment 1B (101 residues) the periods for the acidic and the basic residues are both ≈9.6 residues (≈1.42 nm), and in segment 2 (2A-L2-2B) they are ≈9.8 residues (≈1.46 nm). The periods of the acidic and basic residues differ in phase by about 180° in each case.
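Periods such as the ≈9.6-residue repeat in the charged residues are found by Fourier analysis of their linear distribution along the sequence. A minimal Python sketch of this kind of analysis follows: it converts a sequence into a 0/1 indicator for a chosen residue class (acidic D and E, for example), subtracts the mean, and scans trial periods for the maximum Fourier intensity. The period range, the step size and the function names are assumptions for illustration, and the sketch does not attempt the significance testing used in the published analyses.

```python
import cmath
import math


def fourier_intensity(indicator, period):
    """Normalized Fourier intensity of a 0/1 residue indicator at a trial period."""
    mean = sum(indicator) / len(indicator)
    total = sum((x - mean) * cmath.exp(2j * math.pi * k / period)
                for k, x in enumerate(indicator))
    return abs(total) ** 2 / len(indicator)


def best_period(seq, residue_class, lo=7.0, hi=12.0, step=0.1):
    """Trial period (in residues) giving the largest intensity for residue_class."""
    indicator = [1 if r in residue_class else 0 for r in seq.upper()]
    n_steps = int(round((hi - lo) / step))
    periods = [round(lo + i * step, 4) for i in range(n_steps + 1)]
    return max(periods, key=lambda p: fourier_intensity(indicator, p))


# Synthetic test sequence with acidic residues placed on a 9.6-residue period.
seq = "".join("E" if math.cos(2 * math.pi * k / 9.6) > 0.5 else "A" for k in range(192))
print(best_period(seq, "DE"))
```

On this synthetic sequence the scan recovers a period close to the 9.6 residues that was built in. For a real rod segment the same function would be applied separately to the acidic and the basic residues, and the phase of the complex sum could be compared to check the ≈180° offset described above.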
Despite the heptad stutters in segment 2, the period in the charged residues is continuous throughout this section of the rod domain, which indicates that segment 2 is a single structural unit (Parry, 1989). The slight but significant difference in periods between segments 1B and 2 could prevent the aggregation of these segments via ionic interactions, so that only 1B-1B and 2-2 combinations might be possible. However, supercoiling of segments 1B and 2 about each other might permit an alignment of the axial periods of the charged residues and allow 1B-2 aggregation to be achieved (Parry and Fraser, 1985).

The possible arrangements of a pair of two-stranded coiled-coils to form the four-chain structural unit (tetramer) were investigated by Crewther et al (1983). Only alignments that maximised the overlap of the two major coiled-coil domains, segments 1B and 2, were considered, as these provided the maximum opportunity for ionic interactions to be made between the molecules. Five classes of arrangement were considered possible: parallel molecules either in-register or approximately half-staggered, or antiparallel molecules that are either completely overlapped or half-staggered. The favoured model was a half-staggered, antiparallel arrangement with the N-terminal segments of the rod overlapped by about 28 nm, giving a combined length (excluding the terminal domains) of about 60 nm. Support for this model was found in the tryptic digestion study of Woods and Inglis (1984) on hard α-keratin, in which two types of helical particle were found: a four-chain fragment from the N-terminal segment of the rod domains (previously described by Woods and Gruen, 1981, and Gruen and Woods, 1983) and a heterodimeric fragment from the C-terminal segment of the rod domains. Since no fragment containing both N- and C-terminal segments was found, the data strongly implied an antiparallel rather than a parallel arrangement.
(Note that an in-register, parallel arrangement of coiled-coil dimers is possible in theory but would not give rise to a filament structure.) Further evidence for the antiparallel arrangement of dimers has come from limited chymotryptic digestion experiments on reduced, carboxymethylated wool IF, which yielded covalently linked N-terminal segments of the rod domain (Sparrow et al, 1989). Crosslinking peptides were isolated and characterized, and their locations were compatible only with the dimers being approximately half-staggered and antiparallel. Support for a half-staggered arrangement of molecules is provided by the electron microscopy studies of Steven et al (1989), who visualized particles with a variety of lengths: 20-25 nm rods; 45-50 nm rods, some of which were kinked near their centres; and 70-80 nm rods, which were kinked about one-third of the way along from one end. All of these structures are compatible with the ~20 nm dimensions of the major coiled-coil domains in the IF chain. In addition, they show at least two modes of molecular aggregation: fully overlapped and half-overlapped.

The conformations of the terminal domains are currently unknown. The enzyme cleavage studies on epidermal keratin IF by Steinert et al (1983a, described above) indicate that at least part of the terminal domains is located on the outside of the filament. Solid-state Nuclear Magnetic Resonance (NMR) studies on epidermal keratin IF (Mack et al, 1988; Steinert et al, 1989) and prekeratin IF (Steven et al, 1989) revealed little order in the structures of the N- and C-terminal domains but showed high flexibility about the ubiquitous glycyl peptide bonds and, by implication, of the polypeptide backbones in these domains. Properties of the V1 and V2 domains bear some similarities to those of the proposed a-loop (Zhou et al, 1988; Steven et al, 1989), but as yet there is no direct evidence for this element of secondary structure in IF chains.
The reported diameter of the IF has varied from the earlier estimates of 7 nm (electron microscopy: Birbeck and Mercer, 1957; Rogers, 1959; X-ray diffraction: Fraser and MacRae, 1959; Fraser et al, 1959, 1973) to 14-15 nm (Steven et al, 1982, 1985; Steven, 1989) and depends to some extent on the criteria used for determining the filament's 'edge'. Protofilaments of about 2 nm diameter were described by Fraser et al (1962), and a '9+2' arrangement of protofilaments, reminiscent in part of the organization of tubules in cilia and flagella, was suggested for the filament by Filshie and Rogers (1961) on the basis of evidence from thin-section electron microscopy. Subsequent evidence (for example, Fraser et al, 1972) suggested instead a ring-core substructure. Scanning transmission electron microscopy (STEM) measurements of the radial density of unstained, freeze-dried IF showed that the filament was composed of a core of uniform density, 9-10 nm in diameter, surrounded by a diffuse periphery that extended to 15-16 nm (Steinert et al, 1983b; Steven et al, 1985, 1989). The ratios of core mass to peripheral mass for vimentin and an epidermal keratin were similar to the ratios of rod domain to terminal domain mass, implying that the terminal domains are largely on the external surface of the filament. Mass-per-length measurements of native, unstained filaments of vimentin using STEM have also indicated that IF are polymorphic: the density of the major component IF was consistent with 16 molecules in cross-section (=32 chains) and a minor component contained about 11 molecules (Steven et al, 1982). The lower-mass variant was suggested to be a breakdown product or an immature filament (Steven, 1989). Reconstituted keratin IF also showed several mass variants, and in each case the axial density of the filament scaled with the average density of its constituent subunits (Steven et al, 1983a,b).
Neurofilaments reconstituted in vitro also contained 16 molecules in cross-section, although polymorphs with fewer or greater numbers of chains were present to a lesser degree (Troncoso et al, 1989). X-ray diffraction studies on well-ordered hard keratin tissue have revealed an axial period of 47 nm in which there are seven or eight quasi-equivalent units on a helix of pitch 22 nm (Fraser and MacRae, 1973a, 1983, 1985; Fraser et al, 1985, 1986). A discontinuous surface lattice model has been generated from the X-ray diffraction data, the STEM mass data and ionic interaction calculations (Fraser et al, 1985, 1986, 1988). Each lattice point is associated with a single tetramer (two antiparallel molecules, approximately half-staggered). Some studies have suggested that the IF structure is organized by increasing levels of aggregation of subfilaments: a single molecule forms the 2 nm-diameter protofilament; pairs of these protofilaments form a 4.5 nm-diameter protofibril (the tetramer); and protofibrils aggregate, typically in groups of four, to form variants of the intact IF (Aebi et al, 1983; see also Steinert and Roop, 1988).

1.4 Conservation of Amino Acid and Gene Sequences

A comparison of the amino acid sequences of the rod domains of IF chains reveals several features common to all IF. Two regions of very highly conserved sequence occur: in the last 4-5 heptads at the C-terminal end of segment 2B (Geisler and Weber, 1982; Hanukoglu and Fuchs, 1983) and in heptads 3-4 of segment 1A (Weber and Geisler, 1982, 1984; Steinert et al, 1984a; Parry and Fraser, 1985; Steinert and Parry, 1985). In contrast, the linker segment L1 is variable in length and shows little or no homology amongst the various IF chains. Segment L12 is more regular in length (generally 16 or 17 residues, depending on the chain type) and in homology (Crewther et al, 1983).
Neither of these two link segments is predicted to be α-helical, and neither contains the heptad substructure or the regular distribution of ionic residues apparent in the coiled-coil regions. The third linker segment, L2, is absolutely conserved in length (8 residues) and shows high homology across the IF chains. In addition, the regularities in the distributions of ionic residues are maintained across L2, although the heptad substructure is disturbed somewhat by the irregular step between consecutive apolar residues at that point (see below). The heptad substructure of the coiled-coil undergoes a conserved stutter (ie, the heptad sequence a-b-c-d-e-f-g is broken by the insertion or deletion of several residues) near the midpoint of segment 2B (an insertion of four residues: Geisler et al, 1982a; Dowling et al, 1983; Steinert et al, 1985c). This general feature has been observed in the coiled-coil sequences of all α-fibrous proteins except tropomyosin (see Cohen and Parry, 1989). The consequences of this disruption of the regular heptad structure are unclear, although possible distortions of the coiled-coil have been proposed (McLachlan and Karn, 1983; Parry and Fraser, 1985). The effect of this particular stutter, however, is to rotate the internal a and d positions by 360°/7 (≈51°) relative to the axis of the α-helix (see Figure 1-4b). Assuming that the coiled-coil is undistorted at some distance from the stutter on both sides, the stutter may be accommodated by a gradual change in the pitch length of the coiled-coil over an extensive region (McLachlan and Karn, 1983). Alternatively, there may be a highly localized discontinuity in structure, which could result in a kink in the axis of the coiled-coil (Parry and Fraser, 1985).
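The rotations associated with the stutters follow from simple arithmetic if, as in an undistorted coiled-coil, each heptad is taken to occupy exactly two turns, so that successive residues are 360°/3.5 (≈102.9°) apart about the helix axis in the coiled-coil frame. An insertion of k residues then rotates the downstream a and d positions by k × 360°/3.5, taken modulo 360°. The Python sketch below illustrates this under that idealized assumption; it reproduces the ≈51° rotation for the four-residue insertion in segment 2B and the ≈154° rotation for the five-residue insertion (equivalently, a two-residue deletion) in L2 described in the next paragraph. The function name is hypothetical.

```python
def stutter_rotation(inserted_residues: int, residues_per_turn: float = 3.5) -> float:
    """Rotation (degrees, in [0, 360)) of heptad positions after an insertion.

    A negative argument represents a deletion. residues_per_turn = 3.5 is the
    idealized coiled-coil value (one heptad = two turns), an assumption here.
    """
    return (inserted_residues * 360.0 / residues_per_turn) % 360.0


for label, k in [("2B insertion", 4), ("L2 insertion", 5), ("L2 as deletion", -2)]:
    print(f"{label:>15}: {stutter_rotation(k):6.1f} deg")
```

The sketch prints 51.4° for the segment 2B stutter and 154.3° for both descriptions of the L2 stutter, which shows why an insertion of five residues and a deletion of two are equivalent ways of describing the same phase shift.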
A similar feature occurs in the link region L2, where an insertion of five residues (or deletion of two) occurs, and as a result the a and d positions are rotated by ≈154° (Figure 1-4c). The requirement for such distortion of the regular coiled-coil structure is not known, but its importance is reflected in the absolute conservation of these stutters in all IF protein sequences. Regularities in the dispositions of acidic and basic residues in the rod domain of IF proteins have already been described: the charged stripes evident in the e and g positions of the heptad specify in part the alignment of chains to form the dimeric IF molecule, and the next level of structure appears to be determined by the 9.6-residue period of the charged residues. It is possible that higher levels of IF structure also depend to some degree on other regularities in the positioning of these residues. The self-assembly of IF in vitro, and possibly also in vivo, must be some function of the protein sequences, and the dispositions of the acidic and basic residues, as well as of the apolar residues, appear to play an important part in this process.

Conserved features are also apparent in the nucleotide sequences of the genes that encode IF proteins. In particular, the positions of six introns are generally conserved within the rod domain (Quax et al, 1983, 1985; Lehnert et al, 1984; Marchuk et al, 1984; Krieg et al, 1985; Rieger et al, 1985; Tyner et al, 1985; Lewis and Cowan, 1986). Desmin, vimentin and GFAP have, in addition to these six introns, two others at common positions within the C-terminal domain, and several keratin genes also exhibit one of these (Marchuk et al, 1984; Tyner et al, 1985; Lewis and Cowan, 1986). An interesting exception to this pattern is the gene for bovine cytokeratin 19 (Bader et al, 1986), which was found to lack the intron near the junction of the C-terminal and rod domains.
This protein is also unusual in that its C-terminal domain is small (9 residues in length) and effectively constitutes a continuation of the heptad substructure from the rod domain: it is apparently more an extension of the rod than a distinct non-α-helical domain. The corresponding human cytokeratin 19 (Eckert, 1988) has the same intron structure and the same number of amino acids; furthermore, the two protein sequences are 89% identical. Other exceptions to the general IF intron pattern are the neurofilament proteins, which have only two introns within the rod domain, both of which are conserved among the NF proteins but neither of which corresponds to an intron position in the other IF proteins (NF-L: Lewis and Cowan, 1986; NF-M: Myers et al, 1987; NF-H: Lees et al, 1988). Genes for the mouse and human NF-L proteins show 90% homology within the exons and conservation of the positions and sizes of the introns, but considerable differences in the nucleotide sequences of the introns (Julien et al, 1987). The purpose of introns has not been established, and their positions do not seem to coincide with structural domains or functional properties of the proteins in any regular way. They do, however, allow some conclusions to be drawn about the evolution of the genetic material of which they are a part. The conservation of introns within a protein family expressed in such diverse cell types is an important indication of a common ancestral gene. In addition, the difference between the intron positions of the type I-III and type IV groups provides evidence of how long ago these groups diverged (see, for example, Steinert and Roop, 1988).

1.5 Lamin, Peripherin and the Helix A and B proteins

The nuclear envelope is a double membrane composed of two lipid bilayers separated by approximately 20 to 40 nm. Between the outer and inner membranes is a region known as the perinuclear space.
The outer nuclear membrane is continuous with the membrane of the endoplasmic reticulum, as is the perinuclear space with the lumen of the endoplasmic reticulum. The ribosomes that 'stud' the rough endoplasmic reticulum are also apparent on the outer nuclear membrane. The inner and outer nuclear membranes are connected at the nuclear pores, which allow exchange of material between the cytoplasm and the nucleus. A complex layer of filaments lining the nucleoplasmic surface of the inner nuclear membrane has been observed in invertebrate cells, and a similar layer of filaments, 15-20 nm thick, has also been seen in ultra-thin sections of vertebrate cells (Fawcett, 1966; Scheer et al, 1976). The diameters of the filaments in this layer, called the nuclear lamina, have been reported as 5-10 nm (Scheer et al, 1976). The native lamin meshwork of Xenopus oocytes was shown to be an orthogonal network with sides of about 52 nm (Aebi et al, 1986). This filamentous meshwork forms a scaffolding associated with the pore complexes, the inner nuclear membrane and interphase chromatin, and it also appears to assist in the structural organization of the nucleus. Gerace et al (1978) showed that the nuclear lamina from rat liver cells was composed mainly of three proteins (termed lamins A, B and C by Gerace and Blobel, 1980), with molecular weights of 60-70 kDa, which were located at the periphery of the normal interphase nucleus. In addition, lamin B was found to be significantly different from lamins A and C on the basis of peptide mapping and immunological studies (Gerace and Blobel, 1980; Shelton et al, 1980). Coincident with the disassembly of the nuclear envelope during prophase, however, the antigens of the three nuclear lamin proteins became distributed evenly throughout the cell until telophase, at which stage they became localized around the daughter chromosomes.
It was proposed that the lamina was reversibly disassembled concomitant with disintegration of the nuclear envelope during mitosis. The nuclear lamina was shown to be interposed between the nuclear envelope and the chromatin (Fawcett, 1966). Lamin B was more resistant to extraction from membranes than lamins A and C (Gerace and Blobel, 1982), which suggests that it has a special membrane-binding role; indeed, a lamin B receptor protein has been described recently in turkey erythrocytes (Worman et al, 1988) and in yeast cells (Georgatos et al, 1989). During mitosis, lamins A and C become diffuse throughout the cytoplasm but lamin B remains associated with membranes (Gerace and Blobel, 1980; Burke and Gerace, 1986). It has been suggested that nuclear membrane fragments may be 'labelled' by lamin B for use in re-establishment of the envelope (see Stick et al, 1988). A study on the chicken nuclear lamin proteins, lamins A, B1 and B2, also found that lamin A was dispersed throughout the cytoplasm during mitosis and that lamins B1 and B2 were membrane-bound; the latter were hence considered functional analogues of the mammalian lamin B (Stick et al, 1988). In addition, lamin B2 was found to be associated with the endoplasmic reticulum and, during the mitotic disassembly of the nuclear envelope, the lamin B2 proteins may become reversibly concentrated in the endoplasmic reticulum. The different biochemical and structural features of the A and B lamins have been characterized recently by Peter et al (1989). The integrity of the lamin meshwork is thought to be regulated by the degree of phosphorylation of the lamin proteins (Gerace and Blobel, 1980). Sites for phosphorylation in keratin are specific serine and threonine residues in the terminal domains (Steinert, 1988).
Such residues are abundant in the C-terminal domains of the lamin molecules: for example, examination of the protein sequences shows that this domain of human lamins A and C contains 44 and 23 serine residues respectively, and 20 and 14 threonine residues respectively. However, newly synthesized lamin proteins require some alternative to phosphorylation in order to avoid premature oligomerization while in the cytoplasm. A higher molecular mass precursor (71 kDa) to lamin A (68 kDa) was studied by Lehner et al (1986a), who suggested that the precursor form could allow migration of lamin A protein into the nucleus, where the precursor would be processed into the mature protein and integrated into the lamina meshwork in some unspecified manner. Lehner et al (1986a) also identified two variants of the lamin B protein: a mammalian form, lamin B1, and an avian form, lamin B2. An apparent higher molecular mass precursor for lamin B2 was also synthesized, but in vitro translation produced equal amounts of the two forms of the protein. The reason for this is unclear. No precursor was found for lamin B1. (No investigation was made into precursors for lamin C. However, if the lamin proteins copolymerize, a non-aggregating form of one protein could be sufficient to prevent filament formation.) Zackroff et al (1984) isolated 'keratin-like' proteins located near the periphery of the nucleus and suggested a relationship with the nuclear lamin proteins. This was confirmed in further studies by Goldman et al (1986), who also proposed that the nuclear lamins formed an IF-like network in the nuclear lamina and that this was connected with the cytoplasmic IF network, possibly through the nuclear pores or by some trans-membrane mechanism.
The possibility that the two filamentous networks were unrelated was discounted for two reasons: the component proteins were related biochemically, and the cytoplasmic IF network was known to be closely associated with the nuclear surface as well as the plasma membrane. The protein sequences for human lamins A and C were determined by McKeon et al (1986) and Fisher et al (1986), who showed conclusively that the nuclear lamins were indeed IF-type proteins. Amphibian lamin proteins from Xenopus laevis have been described and named lamins A, LI, LII, LIII and LIV (for review, see Wolin et al, 1987). Invertebrate lamins from the surf clam and Drosophila have also been reported, as have minor lamins (Lehner et al, 1986b). The primary structures of human lamins A and C bear a remarkable resemblance to those of the other IF proteins, and common structural properties have been postulated for both lamins and IF (Gerace, 1985; McKeon et al, 1986; Fisher et al, 1986). McKeon et al (1986) demonstrated that human lamins A and C shared the general structural plan of IF chains (an α-helix-rich rod domain bracketed by largely non-α-helical N- and C-terminal domains) and that the rod domain of the lamins may be subdivided into segments in much the same manner as for IF. In addition, the human A and C lamins were seen to be identical from the N-terminus through the rod domain and into the C-terminal domain. McKeon et al (1986) suggested that both proteins arose from the same gene by differential processing, which may be related to the unusual repeat of four histidine residues after which the two polypeptide sequences diverge. An important difference noted between the rod domains of IF and lamin was a six-heptad insert (ie, 42 residues) in segment 1B of the lamin chains. Weber (1986) also noted that the start of the insert occurred at the position of a conserved intron (see also Steinert et al, 1985b).
This extension is predicted to result in a length of ~52 nm for the rod domain of lamin molecules (the 42 additional residues, at an axial rise of about 0.15 nm per residue in an α-helix, lengthen the rod by roughly 6 nm). Dimers of lamins A and C from rat liver reveal a ~52 nm rod with two globular heads at one end (Aebi et al, 1986), presumably the larger C-terminal domains. Dimers of lamin B were similar in form, although typically only one globular head was apparent.

[Figure 1-5: schematic chain structures. Top: IF types I-IV, with rod segments 1A, 1B, 2A and 2B, linkers L1, L12 and L2, and N- and C-terminal domains. Bottom: lamin, with segments 1 and 2 and a stutter in place of L1.]

Figure 1-5 Schematic representations of the chain structures for IF types I-IV (top) and lamin (bottom). Linker segments L1 and L12 in types I-IV IF are not predicted to be α-helical in structure, as indicated by boxes wider than the shaded coiled-coil regions (top). In the type V (lamin) chain, however, the entire rod domain is predicted to be α-helical (bottom). Segment 1B is extended by a 42-residue insertion and linker segment L1 has been replaced by a stutter in the heptad substructure. Figure adapted from Parry et al (1986).

Other points noted by Parry et al (1986) in their analysis of the sequence data were (i) the 42-residue insert is part of a 70-residue piece in human lamins A and C that replaces residues 43-70 inclusive in segment 1B of IF chains (numbering from the amino-terminal end of segment 1B), (ii) segment L12 is predicted to be α-helical in lamin chains but non-α-helical in other IF chains, (iii) segment L1 in lamin has a heptad substructure and is also predicted to be α-helical, and indeed it can be reduced to a single stutter in the heptad phasing for segment 1, and (iv) unlike IF type I-IV chains, lamins do not appear to form long filamentous structures in vitro but instead form tactoids or paracrystalline arrays, which show alternating dark- and light-staining bands when negatively stained and observed by electron microscopy (Zackroff et al, 1984; Goldman et al, 1986). The resultant model for the lamin chain is shown in Figure 1-5 alongside that for IF chains (Parry et al, 1986).
The rod domain of human lamins A and C is predicted to be entirely α-helical, with a heptad substructure present throughout segment 1 and in segments 2A and 2B. In addition, the ionic and heptad regularities in the rod domain were found to be maintained across the linker region L12. Parry et al (1986) suggested that the lamins be termed type V IF proteins, since they showed a high degree of homology with IF overall but did not fall naturally into the type I-IV classification previously established. Indeed, the novel extension to segment 1B and the intranuclear location of these proteins together imply that the lamins are fundamentally different from the cytoplasmic IF proteins. An interesting parallel to the description of the lamin protein structure is the recent characterization of other proteins that have the same longer 1B segment seen first in the lamin A and C proteins. Weber et al (1988) have sequenced two epithelial proteins from the snail Helix pomatia, of molecular weights 66 kDa (Helix A) and 52 kDa (Helix B). Both form monocomponent IF in vitro, indicating that they are not analogous to the heteropolymeric keratin IF despite their occurrence in near-equimolar ratios in epithelial tissue. Although the homology in structure and sequence between the nuclear lamin and Helix proteins might have been taken as evidence for a similar location and function, this is not supported by Weber et al (1988), who note that a karyophilic motif in the C-terminal domains of the nuclear lamin proteins (the sequence Lys-Lys-Arg-Lys-Leu-Glu; Fisher et al, 1986) is not apparent in the sequence of the Helix proteins. However, the departures from the type I-IV IF structure described above for human lamins A and C are shared also by the Helix B protein. No other IF proteins have demonstrated this variation in the structure of the rod domain, and so it is appropriate to designate the Helix B protein a type V IF chain.
The sequence of the Helix A protein reported by Weber et al (1988) was incomplete and no assignment is yet possible for this chain. Studies on a neuronal protein of molecular weight ~58 kDa by Liem et al (1978), Portier et al (1984a,b), Franke et al (1986) and Parysek and Goldman (1987) had shown that it was related to the IF family but was not one of the neurofilament triplet proteins, NF-L, NF-M or NF-H. This protein was termed peripherin by Portier et al (1984a). Its sequence (Leonard et al, 1988; Parysek et al, 1988) was shown to have greatest homology with the type III IF sequences, desmin and vimentin. Although its expression in neuronal tissue might suggest that peripherin is a type IV IF protein, its gene structure (Thompson and Ziff, 1989) follows the pattern for the type I-III IF proteins (six conserved introns in the rod domain) rather than that for the neurofilament proteins (two conserved introns in the rod domain), and peripherin is therefore conclusively not a type IV IF protein. The high content of serine residues in the N-terminal domain was described as being "novel" (Leonard et al, 1988). However, peripherin is only average in this respect amongst the type III IF proteins, which have 23 (hamster desmin), 25 (hamster vimentin), 7 (mouse GFAP) and 24 (peripherin) serines in this domain (sequences as cited by Leonard et al, 1988). Indeed, only GFAP is unusual, in having a lower serine content in its N-terminal domain. The serine residues are potential sites for phosphorylation, a process which appears to have importance as a regulatory mechanism for the polymerization of IF proteins in general (see later). Phosphorylated peripherin, however, is found almost exclusively in the insoluble cytoskeleton (Aletta et al, 1989), indicating that it may still be an integral part of the filament network.
Aletta et al (1989) suggest that this is consistent with phosphorylation-promoted assembly, an effect opposite to that reported for other IF proteins. Vimentin (Inagaki et al, 1987) and desmin (Geisler and Weber, 1988), for example, have a reduced ability to polymerize when phosphorylated, and the nuclear lamin network disassembles concomitant with phosphorylation of the lamin proteins (Gerace and Blobel, 1980). Aletta et al (1989) also report that peripherin with no detectable phosphorylation is present in the cytoskeleton, and so no firm conclusion can be drawn from these data as to the effect of phosphorylation on the proteins in situ. A truly novel feature of the peripherin protein is its expression in neuronal tissue, previously thought to be the exclusive domain of the neurofilament proteins. Leonard et al (1988) reported that the distributions of the NF proteins and of peripherin mRNA within the nervous system are different, although there is some overlap, and that the regions which express peripherin are all evolutionarily old. The term "peripherin" originated from the location of this protein in neurons of the peripheral nervous system (Portier et al, 1984a), although it is also expressed in certain neurons of the central nervous system (Leonard et al, 1988).

1.6 Function of Intermediate Filaments

Intermediate filaments form a highly insoluble skeleton that extends throughout the cytoplasm and is also found in the nucleus. The functions of the IF networks are unclear, as are the mechanisms by which they are controlled and their associations with other elements of the cytoplasm. Their ubiquity and underlying homology imply some fundamental importance for cell physiology that transcends species and cell types, although some cultured cell lines apparently grow normally in the absence of IF (Venetianer et al, 1983).
The nuclear lamin proteins appear to be especially widespread in eukaryotic cells: lamin A and B analogues have been reported in yeast cells (Georgatos et al, 1989) in addition to mammals, amphibia, insects and birds. The first steps towards understanding the role of IF have been made by determining the anchorage sites of some IF proteins to membranes, and it has become increasingly apparent that IF constitute some form of connection between the cell surface and the nucleus. Vimentin IF were shown to be associated with the cell membrane, possibly via head-on interactions between the non-α-helical amino-terminal domain and at least one membrane-anchoring protein, ankyrin (Georgatos et al, 1985; Georgatos and Marchesi, 1985). Studies on desmin IF located the plasma membrane binding site within the amino-terminal domain (Georgatos et al, 1987). Georgatos et al (1985) showed that ankyrin inhibited IF formation of vimentin beyond a protofilament stage (probably the tetrameric subunit) and, since the in vitro assembly process for IF appears to require intact amino-terminal domains (vimentin: Traub and Vorgias, 1983; desmin: Kaufmann et al, 1985), this suggests that ankyrin may serve in vivo as a "capping site" that effectively terminates the filament at the membrane (Georgatos et al, 1987; Steinert and Roop, 1988). An alternative mode of attachment was suggested by Georgatos et al (1985), whereby the filaments bind side-on to the membrane via short branches and loop back into the cytoplasm. However, the branches might themselves abut the membrane in an end-on fashion. In addition to vimentin and desmin, keratin IF also appear to attach to the cell periphery, possibly to desmoplakin proteins, which are found in complex regions of cell-to-cell adhesion called desmosomes (see, for example, Goldman and Dessev, 1989; Green et al, 1989). IF are also localized around the exterior of the nuclear surface.
In vitro binding of desmin to lamin B has been described (Georgatos et al, 1987), and this association apparently involves a portion of the carboxy-terminal domain of desmin adjacent to the rod domain. Vimentin has also been observed to associate with lamin B in a similar manner to desmin (Georgatos and Blobel, 1987). Indeed, lamin B has been suggested as a nucleating centre for IF (see, for example, Gerace and Burke, 1988), since it is clear that the ankyrin-mediated attachment to the plasma membrane is not capable of performing such a role (Georgatos et al, 1985). The method by which lamin B might interact with IF proteins through the nuclear envelope, if indeed it does so at all, is unclear. A connection through the nuclear pores is possible, as is a trans-membrane interaction mediated by some membrane process. Georgatos and Blobel (1987) have suggested that IF networks may be directly anchored to the nuclear lamina at distinct locations coinciding with the nuclear pores. A feature of the model in which IF connect directly to membrane-binding proteins is that a certain amount of local polarity is implied for the IF: the amino-terminal domains of the IF chains link to the plasma membrane whereas the carboxy-terminal domains bind at the nuclear membrane. However, the structures reconstituted in vitro show no axial polarity (see, for example, the GFAP paracrystals of Stewart et al, 1989a,b and Quinlan et al, 1989). If the filaments are axially apolar (ie, contain equal numbers of oppositely directed chains), why should the amino- and carboxy-terminal domains of the molecule associate with separate membranes in the cell? What role then is played by the amino- and carboxy-terminal domains in the main body of the filaments? Do they bind to some other membranes in the cytoplasmic space? In addition to the locale of IF attachment, recent work has examined the role of phosphorylation as a post-synthetic modifier of IF structure.
Phosphorylation of the nuclear lamina has already been described, as has the general oligomerization-blocking effect of phosphorylation on vimentin and desmin (above), and evidence is now accumulating that the cytoplasmic IF networks also undergo structural changes as a result of phosphorylation. Major sites for phosphorylation have been determined for the human epidermal keratin 1 protein, as have the turnover rates of phosphate isomers of this protein (Steinert, 1988). Sites in the more mobile parts of the terminal domains showed the highest rate of turnover, and the rate dropped for sites closer to the relatively stiff rod domain. This may be due in part to the accessibility of different regions of the chain to external phosphorylating agents. As Steinert (1988) points out, there is no indication why the keratin IF examined should undergo post-synthetic disassembly and reassembly, and so the purpose of this phosphorylation remains unclear. Other cellular activities may bring phosphorylating agents into proximity with the IF network, leading to opportunistic interactions. Disassembly of IF networks during mitosis has been studied recently by Chou et al (1989). The amount and sites of phosphorylation on the IF chains (desmin and vimentin) were reported to differ between mitosis and interphase, and complete disassembly of the IF network in vitro by phosphorylation was described. These results indicate that phosphorylation is a regulator of IF structure: phosphorylation can disassemble IF networks (and block oligomerization of the subunits: Geisler and Weber, 1988) and, conversely, dephosphorylation will allow self-assembly of the molecules into IF. An interesting development has been the recent discovery of lamins A and B in yeast (Georgatos et al, 1989).
These cells divide by closed mitosis, in which the nuclear envelope is not dismantled as in higher eukaryotic cells but instead develops a cytoplasmic extension that 'buds off' as a daughter nucleus with a separate membrane. There is no apparent requirement for the disassembly and reassembly of the nuclear lamina, and the role of the lamina must be directed entirely towards maintaining the structural integrity of the nuclear envelope (as described above). An investigation into the primary structure of such lamins would be of use in identifying regions of the chain that are involved in specific functions. Also, the consequences of hyperphosphorylation of the yeast lamins, which presumably does not take place during closed mitosis, would be of interest.

1.7 Structural Form of the Thesis

The detailed structure of IF cannot be determined from the currently available X-ray diffraction data or from electron microscopy studies. However, some attributes of their conformations may be elucidated by non-physical techniques. The studies described in this thesis are a collection of such indirect methods, which are nonetheless evaluated on the basis of their ability to conform with such physical data as are available. Protein sequence data are the raw material on which these studies are based: the ever-increasing body of available sequences allows new insights to be gained into the structural hierarchy of the IF. Periodicities are examined in the linear distribution of charged residues within the rod domains of a type III IF protein (peripherin) and of various type V IF chains (the nuclear lamins and the Helix pomatia B protein). It is important to establish whether differences occur between these proteins and other IF chains, especially with regard to the extended 1B segment characteristic of the type V chains.
Studies on the primary sequences of IF in general include investigation of residue occupancy in the heptad substructure, calculation of chain flexibilities (Chapter 2) and quantification of the degree of homology present amongst the IF proteins (Chapter 3). For the latter work, amino acid sequences of the rod domains of IF chains are compared in order to describe more accurately the features that are common to IF chains as well as to distinguish sub-groups. Consensus sequences for the subtypes and for all IF chains are derived and, in addition, the homology among residues in each heptad position of the coiled-coil is determined. The current database is sufficiently large to allow some interesting conclusions to be drawn that aid in further elucidation of IF structure and function. The packing of IF chains into the molecule and of the molecules into larger-scale structures is also studied (Chapter 4). Prediction schemes based on the calculation of potential ionic interactions enable the most favourable axial alignments to be found. The simple methods employed in investigating the alignment of molecules are expanded to include more of the three-dimensional information in the molecular structure. Models for the aggregation of lamin molecules based on electron microscopy data and ionic interaction studies are constructed. These are used to explain some of the more unusual features of the nuclear lamina, as well as to indicate how phosphorylation can result in depolymerization of the lamina filaments.

CHAPTER 2: Primary Structure page 26

2. PRIMARY STRUCTURE

The number of IF protein sequences available has increased dramatically over the seven or so years since the first full sequence, that of chicken gizzard desmin, was determined by Geisler and Weber (1982). At the present time over 40 sequences have been fully or partially completed. Examination of these sequences has yielded valuable clues to the likely molecular and filament structure of the proteins.
For example, secondary structure can be predicted fairly reliably for fibrous proteins, in which extensive regions of fixed conformation exist. On this basis, a model for the structure of the IF molecule has been proposed in which there is a central rod-like domain with a regular coiled-coil structure that is interrupted by well-conserved breaks. Higher levels of organization, however, are less easy to determine because of the complexity of the information encoded in the protein sequences. Regularities in the disposition of residues having a certain character (for example, the mutually attractive acidic and basic residues) may be indicators of special modes of aggregation of specific IF chains. The likelihood of such regularities arising purely by chance is greatly reduced where they are shared by many other protein chains. The characterization of regular patterns in the distribution of residues is not sufficient to deduce the packing order of the protein chains, but it does indicate that regular packing of the proteins occurs. A list of the proteins used in various studies in this work is given in Table 2-1. Several other IF protein sequences have been published recently but are not included in any of the analyses detailed in this thesis. These include the human epidermal type II keratin K5 (Lersch and Fuchs, 1988) and the rat NF-H protein (Dautigny et al, 1988). In this Chapter, regularities in the dispositions of the charged residues are first examined for the rod domains of the newly characterized proteins: the human and Xenopus lamin proteins, the Helix pomatia B protein and peripherin. Fourier transforms of the distributions of these residues have already been undertaken on a number of type I to IV IF proteins, and these are compared with the new data presented here. Secondly, this study looks at the distribution of residues in the heptad substructure and how it relates to the predicted structures of the coiled-coil segments in the rod domain.
Finally, the flexibility profiles of representatives of each of the classes of IF proteins are calculated with a view to comparing known structural features of the chains with their predicted mobility.

Table 2-1 IF amino acid sequences.

Type I
Abbrev.    Full name                   Source
B40K       Cow (Bovine) 40K            Bader et al (1986)
B50K       Cow (Bovine) 50K            Jorcano et al (1984)
B54K       Cow (Bovine) 54K            Jorcano et al (1984), Rieger et al (1985)
H46K       Human 46K                   Raychaudury et al (1986)
H50K       Human 50K                   Hanukoglu and Fuchs (1982), Marchuk et al (1984, 1985)
H56.5K     Human 56.5K                 Steinert et al (private communication)
M47K       Mouse 47K                   Singer et al (1986)
M50K       Mouse 50K                   Steinert and Roop (private communication)
M50K       Mouse M pkSCC 50K           Knapp et al (1987)
M52K       Mouse M pkSCC 52K           Knapp et al (1987)
M55K       Mouse                       Steinert and Roop (private communication)
M59K       Mouse                       Krieg et al (1985), Steinert et al (1983a)
8a         Sheep Component 8a          Crewther et al (1985)
8c-1       Sheep Component 8c-1        Crewther et al (1983), Dowling et al (1983, 1986)
XL51       Xenopus                     Hoffmann and Franz (1984)
XL70       Xenopus                     Winkles et al (1985)
XL81       Xenopus                     Jonas et al (1985)

Type II
Abbrev.    Full name                   Source
H55K       Human 55K                   Glass et al (1985)
H56K       Human 56K                   Hanukoglu and Fuchs (1983), Tyner et al (1985)
H67K       Human 67K                   Steinert et al (1985a)
M60K       Mouse 60K                   Steinert et al (1984a)
M67K       Mouse 67K                   Steinert et al (1985a)
5          Sheep Component 5           Crewther et al (1985)
7c         Sheep Component 7c          Sparrow and Inglis (1980), Crewther et al (1983), Dowling et al (1983), Rogers (1984)
7x         Sheep Component 7x          Powell and Rogers (private communication)
           (unnamed)
XL64 23    Xenopus XL 64 (pUF23)       Hoffmann et al (1985)
XL64 164   Xenopus XL 64 (pUF164)      Hoffmann et al (1985)

Type III
Abbrev.    Full name                                Source
CGD        Chicken Gizzard Desmin                   Geisler and Weber (1982, 1983), Geisler et al (1982a)
HD         Hamster Desmin                           Quax et al (1984)
PSD        Pig Stomach Desmin                       Geisler and Weber (1981), Geisler et al (1982b)
CV         Chicken Vimentin                         Zehner and Paterson (1985)
HELV       Hamster Eye Lens Vimentin                Quax et al (1983), Quax-Jeuken et al (1983)
PELV       Pig Eye Lens Vimentin                    Geisler and Weber (1981), Geisler et al (1982b)
MGFAP      Mouse Glial Fibrillary Acidic Protein    Lewis et al (1984)
PGFAP      Pig Glial Fibrillary Acidic Protein      Geisler and Weber (1982, 1983)
RP         Rat Peripherin                           Leonard et al (1988), Parysek et al (1988), Thompson and Ziff (1989)

Type IV
Abbrev.    Full name                      Source
HNF-L      Human Neurofilament Light      Julien et al (1987)
MNF-L      Mouse Neurofilament Light      Lewis and Cowan (1985, 1986)
PNF-L      Pig Neurofilament Light        Geisler et al (1982b, 1983, 1985c)
RNF-L      Rat Neurofilament Light        Julien et al (1985)
HNF-M      Human Neurofilament Medium     Myers et al (1987)
MNF-M      Mouse Neurofilament Medium     Levy et al (1987)
PNF-M      Pig Neurofilament Medium       Geisler et al (1984)
RNF-M      Rat Neurofilament Medium       Napolitano et al (1987)
HNF-H      Human Neurofilament Heavy      Lees et al (1988)
MNF-H      Mouse Neurofilament Heavy      Shneidman et al (1988)
PNF-H      Pig Neurofilament Heavy        Geisler et al (1985a)

Type V
Abbrev.    Full name                   Source
HLA        Human Lamin A               McKeon et al (1986), Fisher et al (1986)
XLA        Xenopus Lamin A             Krohne et al (1987)
XLB        Xenopus Lamin B             Wolin et al (1987)
Hlx B      Helix pomatia B             Weber et al (1988)

2.1 Fourier Analysis

This section is concerned with an investigation of the regularities in the linear disposition of the acidic and the basic residues in the amino acid sequences comprising the major heptad-containing segments of human lamins A and C, Xenopus lamins A and B, the Helix pomatia B protein and peripherin (see Table 2-1 for sources). Each of these proteins manifests some unique extension to the basic model for IF protein structure or location: the lamin proteins are special amongst IF in that they occur within the cell nucleus (other than during mitosis); the type III peripherin protein occurs in neuronal tissue, previously thought to be the exclusive domain of the type IV neurofilament proteins; and the lamin and Helix pomatia proteins have a novel extension to the 1B segment of the rod domain that would allow the coiled-coil structure to be extended, although, unlike the lamins, the Helix pomatia proteins do not appear to be located within the cell nucleus (Weber et al, 1988). Regularities in amino acid sequences are associated with either the structural or the functional features of proteins, and their evaluation gives important clues in model-building studies. The heptad substructure characteristic of regions of coiled-coil conformation, for example, is largely defined by the apolar residues spaced at intervals of three and four residues successively (described in more detail in Chapter 1).
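The alternating spacing of three and four residues between the apolar positions can be made concrete with a short illustrative sketch. It assumes an unbroken heptad phase (no stutters), labels the positions a-g as in Chapter 1, and uses `heptad_positions` and `core_spacings` as hypothetical helper names; it is not a method taken from the sources cited here.

```python
# Illustrative sketch: heptad labels a-g along an unbroken coiled-coil segment.
HEPTAD = "abcdefg"


def heptad_positions(length, start="a"):
    """Heptad label for each residue, assuming one unbroken heptad phase."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(i + offset) % 7] for i in range(length)]


def core_spacings(length, start="a"):
    """Gaps between successive a and d (apolar core) positions."""
    labels = heptad_positions(length, start)
    core = [i for i, label in enumerate(labels) if label in "ad"]
    return [later - earlier for earlier, later in zip(core, core[1:])]


print("".join(heptad_positions(14)))  # -> abcdefgabcdefg
print(core_spacings(21))              # -> [3, 4, 3, 4, 3]
```

A stutter or skip in a real sequence shifts the phase, so applying this labelling to an actual rod segment would require restarting the assignment at each break.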
In addition, the charged residues are often found in specific positions within the heptad, which facilitate either interchain or intermolecular interactions. Patterns such as these, involving the e and g heptad positions (defined in Chapter 1) together with the apolar a and d positions, allow pairs of α-helical chains to pack together in a coiled-coil in an optimal way (the 'knob-into-hole' packing postulated by Crick, 1952), shielding hydrophobic residues from the aqueous environment and bringing oppositely charged residues from both chains into close proximity, where they can form attractive electrostatic interactions. IF proteins express these sequence features clearly, and together they are largely responsible for the self-assembly of chains into the dimeric molecule. Other regular periods are also evident in IF protein sequences, and these are thought to direct higher levels of aggregation: from the molecule to the tetrameric unit, and from the tetrameric unit to the 10 nm filament. Parry and Fraser (1985) have investigated the rod domain sequences of a number of IF proteins for regularities in the placement of acidic and basic residues. They described a feature common to the major coiled-coil segments of all IF proteins in the type I to IV categories: segment 1B shows a ~9.54-residue period for these residues and segment 2 has a slightly longer period of ~9.84 residues. In addition, the periodic components of the acidic and basic residues were found to lie approximately 180° out of phase, indicating that charged residues are grouped along the chain in clumps of alternating charge.

2.1.1 Fourier Analysis - Method

Fourier transform techniques applied to primary sequence data from proteins have been described by McLachlan and Stewart (1976) and are used here with a minor modification. The original method is as follows: residues of interest in the amino acid sequence (for example, acidic or basic residues) are represented by unity and all other residues are replaced by zero.
The data set is then zero-filled, both to improve the resolution of the resulting Fourier data and to meet the power-of-two size constraint imposed on the data set by the Fast Fourier Transform (FFT) algorithm used. The sequences studied have lengths of 101 residues (segment 1B for IF types I-IV), 143 residues (segment 1B for IF type V) or 148 residues (segment 2 for all IF types), and all are zero-filled to 2048. The discrete Fourier transform D(f) of the zero-filled data set d(x) is then calculated:

$$D(f) = \mathcal{F}\{d(x)\} = \frac{1}{N_r}\sum_{x=0}^{N_f-1} d(x)\,\exp\!\left(\frac{2\pi i f x}{N_f}\right)$$

The variable x is the residue position in the linear sequence and f is its analogue in the Fourier domain. The Fourier intensities are scaled by the factor

$$\frac{N_r e_r}{(N_r - e_r)\,I_0}$$

where $N_r$ is the size of the unfilled residue data set, $e_r$ is the count of residues of interest and $I_0$ is the Fourier intensity at zero frequency. The scaled Fourier intensities will thus be given by

$$FI\{d(x)\} = \frac{N_r e_r}{N_r - e_r}\,\frac{|D(f)|^2}{I_0}$$

where $|D(f)|^2$ are the Fourier intensities of D(f). The method outlined above is modified by the introduction of an extra operation prior to zero-filling: the linear sequence is baseline corrected, ie, the average value of the sequence is subtracted from all data points so that the new average is zero. This simple procedure avoids the large zero-frequency spike and its associated side lobes that are otherwise evident in the Fourier transforms. A comparison between the original and the baseline correction methods is shown in Figure 2-1, together with the difference between the transforms. The large zero-frequency peak (Figure 2-1b) results from the non-zero average value of the data set and would not generally be considered a problem. However, in combination with the zero-filling operation, the large spike is spread by a significant amount through the Fourier domain and interferes to varying degrees with other peaks, as well as producing 'spurious' peaks at low frequencies (see Appendix A).
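The baseline-corrected procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the program used in this work: acidic residues are taken to be Asp and Glu and basic residues Lys and Arg (His is ignored), the transform is normalized by $N_r$ so that the zero-frequency intensity of the uncorrected data equals the squared mean of the digitized sequence, and `digitize`, `scaled_intensities` and `strongest_period` are hypothetical helper names. NumPy's FFT uses the opposite sign in the exponent, which does not affect the intensities.

```python
import numpy as np

# Assumed residue classes (not specified in the text): His treated as uncharged.
ACIDIC = frozenset("DE")  # Asp, Glu
BASIC = frozenset("KR")   # Lys, Arg


def digitize(sequence, residues):
    """Represent residues of interest by 1 and all other residues by 0."""
    return np.array([1.0 if aa in residues else 0.0 for aa in sequence])


def scaled_intensities(sequence, residues, n_filled=2048, baseline_correct=True):
    """Scaled Fourier intensities (McLachlan-Stewart scaling, I0 = squared mean)."""
    d = digitize(sequence, residues)
    n_r = d.size               # Nr: size of the unfilled data set
    e_r = d.sum()              # er: count of residues of interest
    if not 0 < e_r < n_r:
        raise ValueError("need at least one residue of interest and one other")
    mean = e_r / n_r
    if baseline_correct:
        d = d - mean           # new average is zero, so D(0) = 0
    filled = np.zeros(n_filled)
    filled[:n_r] = d           # zero-fill to the power-of-two FFT size Nf
    D = np.fft.fft(filled) / n_r   # assumed normalization: uncorrected D(0) = mean
    i0 = mean ** 2             # zero-frequency intensity of the uncorrected data
    return n_r * e_r / (n_r - e_r) * np.abs(D) ** 2 / i0


def strongest_period(intensities, n_filled=2048, min_period=2.0, max_period=20.0):
    """Return (period, intensity) of the largest peak within a period window."""
    freqs = np.arange(1, n_filled // 2 + 1)   # skip f = 0 (infinite period)
    periods = n_filled / freqs                # period = Nf / frequency
    keep = (periods >= min_period) & (periods <= max_period)
    candidates = freqs[keep]
    best = candidates[np.argmax(intensities[candidates])]
    return float(n_filled / best), float(intensities[best])


if __name__ == "__main__":
    # Synthetic 147-residue test chain with Glu at every seventh position.
    seq = "".join("E" if i % 7 == 0 else "A" for i in range(147))
    fi = scaled_intensities(seq, ACIDIC)
    period, peak = strongest_period(fi, min_period=5.0)
    print(round(period, 1))  # -> 7.0
```

With this scaling a randomly placed set of residues gives intensities averaging about 1, so peaks are read relative to that level. Period resolution is limited by the bin spacing of the filled transform: near a 9.6-residue period with $N_f$ = 2048, adjacent bins correspond to 2048/213 ≈ 9.62 and 2048/214 ≈ 9.57 residues.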
The 'spurious' low-frequency peaks, or side lobes, produced by the zero-frequency spike are apparent in the Fourier data only when the central peak is relatively large (the intensity ratio of the central lobe of the sinc function to the first side lobe is approximately 20:1). This is generally the case for the zero-frequency peak if the baseline is not corrected. As can be seen from Figure 2-1c, the effect of the baseline correction is small beyond the zero-frequency sinc function.

The scaling factor described above from McLachlan and Stewart (1976) makes use of the zero-frequency intensity I_0 produced by their method. However, the baseline correction method results in I_0 being zero, and so an alternative derivation of the scaling factor must be used. For the original method, I_0 is equal to the square of the average value of the digitized sequence prior to baseline correction, and this value may therefore be used for I_0 in the calculation of the scaling factor. The relationship between frequency in Fourier space and period in the residue sequence is:

$$\text{period} = \frac{N_f}{\text{frequency}}$$

where N_f = size of the filled data set.

[Figure 2-1: (a) Peripherin, segment 1B: FT of acidic residues, baseline corrected; (b) Peripherin, segment 1B: FT of acidic residues, uncorrected; (c) difference between the corrected and uncorrected transforms. Scaled intensity is plotted against frequency (0-1024).]

Figure 2-1 Comparison of the scaled Fourier intensities of (a) the baseline-corrected and (b) raw data. The differences between the Fourier intensities of (a) and (b) are also shown (c).

2.1.2 Fourier Analysis - Results

The scaled Fourier intensities for acidic and basic residues in rod domain segments of peripherin,
Xenopus lamins A and B and the Helix pomotia B protein are shown in Appendix B, Figures B-1 to B-5, and a selection of the peaks are listed in Tables B-1 to B-5. The transform for human lamin A, which has been analysed previously by Parry et al (1986), has been included to provide a comparison with the other sequences showing an extended 1B segment.

The peripherin data display strong peaks corresponding to a 9.6-9.9 residue period in both the acidic and the basic residues in segments 1B and 2. In the case of the 1B segment, however, the peak for the acidic residues appears to have merged with a nearby peak of slightly longer period to produce a single, wider peak (Appendix B, Figure B-1). Comparison with the results obtained for some other type III proteins (chicken gizzard desmin, hamster eye lens vimentin and mouse GFAP; Parry and Fraser, 1985) shows a high degree of similarity (Table 2-2) and provides further support for the assignment of peripherin as a type III IF protein. The transform of the acidic residues in segment 1B also reveals the second and third orders of the heptad period (3.47 and 2.33 residues respectively), which is to be expected since charged residues are located predominantly in the b, c, e, f and g positions of the heptad but rarely in positions a and d. The heptad repeat is interrupted near the middle of segment 2B and also in link segment L2, and consequently no significant orders of this period are observed in the transforms of either the acidic or the basic residues in segment 2. This is generally true of all short-range (but not necessarily long-range) periods in sequences containing heptad stutters in otherwise regular patterns.

                          Peripherin            Other type III IF
                          Period   Intensity    Period          Intensity
Segment 1B    Acidics     -        -            9.63 ± 0.21     5.62 ± 1.41
              Basics      9.62     5.64         9.62 ± 0.05     4.39 ± 0.40
Segment 2     Acidics     9.75     7.39         9.76 ± 0.05     4.40 ± 2.10
              Basics      9.89     7.36         9.82 ± 0.02     6.05 ± 0.34

Table 2-2 Comparison of the dominant period in the linear distribution of the acidic and the basic residues for segments 1B and 2 of rat peripherin and some other type III IF proteins (Parry and Fraser, 1985). The ~9.6 residue period for the acidic residues in segment 1B of peripherin has merged with a nearby peak to produce a single broad peak (Appendix B, Figure B-1), and so no exact value can be determined for it.

Xenopus lamins A and B and the Helix pomotia B protein share the extended 1B segment which was first observed in the human lamin A and C proteins. The Fourier transforms of charged residues from the rod domain segments of these proteins are shown in Figures B-3 and B-4, with the human lamin A protein providing a basis for comparison (Figure B-2; see also Parry et al, 1986). The similarities between the transforms of the human lamin A and Xenopus lamin A proteins are striking. Examination of the dispositions of the charged residues in the rod segments of the two proteins shows that they are nearly identical, although some difference does occur in the distribution of the acidic residues in segment 2. Extensive homology is apparent throughout the primary sequences (Krohne et al, 1987): the subdomains of the rod are identical in length, and 83% of the residues within the rod are identical, with a further 7% conserved in character (see Figure 2-2).

An interesting feature of the transforms of human lamin A that was not noted previously (Parry et al, 1986) is an apparent ~19.86 residue period, which can be inferred from peaks corresponding to periods of 9.94 residues (≈ 19.86/2) and 6.61 residues (≈ 19.86/3). This pair is particularly strong for the acidic residues in segment 1 and in the rod domain as a whole, but is notably absent from the basic residues in segment 2. The Xenopus lamin A protein also shows this feature, but the Xenopus lamin B protein does not.
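Identity and "conserved in character" figures such as the 83% and 7% quoted above are ratios over aligned positions. The sketch below shows one way to compute them in Python. The residue groupings used to decide conservation, and the choice to skip gapped positions, are assumptions for illustration; the thesis does not state which grouping was used, so the function will not necessarily reproduce the quoted values.

```python
# Hypothetical residue groupings for "conserved in character"; these are
# placeholders, not the grouping used in the thesis.
SIMILARITY_GROUPS = [set("DE"), set("KRH"), set("NQ"), set("ST"),
                     set("AVLIM"), set("FWY"), set("G"), set("P"), set("C")]


def identity_and_similarity(seq_a, seq_b, gap="-"):
    """Percent identical and percent conserved-in-character positions,
    counted over aligned positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if gap not in (x, y)]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    identical = sum(x == y for x, y in pairs)
    conserved = sum(
        x != y and any(x in g and y in g for g in SIMILARITY_GROUPS)
        for x, y in pairs
    )
    n = len(pairs)
    return 100 * identical / n, 100 * conserved / n
```

Applied to the first ten aligned residues of Figure 2-2 (ETPSQRRATR against ETPGQKRATR), it returns 80.0% identical and 10.0% conserved (the R/K pair); the S/G pair counts as neither under these groupings.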
No evidence for a ~19.86 residue period in the distributions of charged residues has been reported for other IF proteins in either of the two major rod domain segments (Parry and Fraser, 1985). The significance (or otherwise) of this observation is not apparent. Several of the transforms of charged residues in both human and Xenopus lamin A show peaks corresponding to orders of the heptad substructure (7, 3.5 and 2.33 residue periods). In addition, the basic residues of segment 2 and of the rod domain as a whole reveal two strong peaks corresponding to the first two orders of a ~5.06 residue repeat. The segment 1 transforms also show strong peaks corresponding to periods of 3.10 residues (acidics) and 3.02 residues (basics). Neither the ~5.06 residue period nor the 3.10 and 3.02 residue periods fall on any order of the ~19.86 residue repeat (19.86/4 = 4.97 and 19.86/6 = 3.31), and no explanation of the significance of these periods is currently possible.

Xenopus lamin B does not reveal a ~19.86 residue repeat but shows the 9-10 residue repeat common to all IF. In the acidic residues this period is 9.85 residues, and in the basic residues it varies from 9.23 residues (segment 1) to 9.85 residues (segment 2). The 3.03 residue repeat apparent in the segment 1 basics of the lamin A proteins is also strongly represented in Xenopus lamin B and falls close to the third order of the 9.23 residue repeat (9.23/3 = 3.08).
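The argument above repeatedly asks whether an observed period lies on an order (fundamental/n) of a longer repeat. A small Python sketch makes that test explicit. The 0.05-residue tolerance and the limit of eight orders are assumptions chosen for illustration, not criteria stated in the thesis.

```python
def harmonic_order(period, fundamental, max_order=8, tol=0.05):
    """Smallest order n (1..max_order) with |period - fundamental/n| <= tol,
    or None if the period does not fall on any order of the fundamental."""
    for n in range(1, max_order + 1):
        if abs(period - fundamental / n) <= tol:
            return n
    return None


def classify_periods(fundamental, observed, **kwargs):
    """Map each observed period to its harmonic order (or None)."""
    return {p: harmonic_order(p, fundamental, **kwargs) for p in observed}
```

With the human lamin A values, classify_periods(19.86, [9.94, 6.61, 5.06, 3.10, 3.02]) assigns 9.94 and 6.61 to orders 2 and 3 and leaves the other three unassigned, matching the reasoning in the text; harmonic_order(3.03, 9.23) returns 3. Tightening the tolerance to 0.02 residues would drop that last match, which is why the text describes it only as falling "close to" the third order.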
HLA   1  ETPSQRRATR SGAQASSTPL SPTRITRLQE KEDLQELNDR LAVYIDRVRS
XLA      ETPGQKRATR S----THTPL SPTRITRLQE KEDLQGLNDR LAVYIDKVRS

HLA  51  LETENAGLRL RITESEEVVS REVSGIKAAY EAELGDARKT LDSVAKERAR
XLA      LELENARLRL RITESEDVIS REVTGIKSAY ETELADARKT LDSVAKERAR

HLA 101  LQLELSKVRE EFKELKARNT KKEGDLIAAQ ARLKDLEALL NSKEAALSTA
XLA      LQLELSKIRE EHKELKARNA KKESDLLTAQ ARLKDLEALL NSKDAALTTA

HLA 151  LSEKRTLEGE LHDLRGQVAK LEAALGEAKK QLQDEMLRRV DAENRLQTMK
XLA      LGEKRNLENE IRELKAHIAK LEASLADTKK QLQDEMLRRV DTENRNQTLK

HLA 201  EELDFQKNIY SEELRETKRR HETRLVEIDN GKQREFESRL ADALQELRAQ
XLA      EELEFQKSIY NEEMRETKRR HETRLVEVDN GRQREFESKL ADALHELRAQ

HLA 251  HEDQVEQYKK ELEKTYSAKL DNARQSAERN SNLVGAAHEE LQQSRIRIDS
XLA      HEGQIGLYKE ELGKTYNAKL ENAKQSAERN SSLVGEAQEE IQQSRIRIDS

HLA 301  LSAQLSQLQK QLAAKEAKLR DLEDSLARER DTSRRLLAEK EREMAEMRAR
XLA      LSAQLSQLQK QLAAREAKLR DLEDAYARER DSSRRLLADK DREMAEMRAR

HLA 351  MQQQLDEYQE LLDIKLALDM EIHAYRKLLE GEEERLRLSP SPTSQRSRGR
XLA      MQQQLDEYQE LLDIKLALDM EINAYRKLLE GEEERLRLSP SPNTQKRSAR

HLA 401  ASSHSSQTQG GGSVTKKRKL ESTESRSS-F SQHARTSGRV AVEEVDEEGK
XLA      TIASHSGAHI SSSASKRRRL EEGESRSSSF TQHARTTGKV SVEEVDPEGK

HLA 451  FVRLRNKSNE DQSMGNWQIK RQNGDDPLLT YRFPPKFTLK AGQVVTIWAA
XLA      YVRLRNKSNE DQSLGNWQIK RQIGDETPIV YKFPPRLTLK AGQTVTIWAS

HLA 501  GAGATHSPPT DLVWKAQNTW GCGNSLRTAL INSTGEEVAM RKLVRSVTVV
XLA      GAGATNSPPS DLVWKAQSSW GTGDSIRTAL LTSSNEEVAM RKLVRTVVIN

HLA 551  EDDEDEDGDD LLHHHHGSHC S----SSGDP AEYNLRSRTV LCGTCGQPAD
XLA      DEDDEDNDDM EHHHHHHHHH HDGQNSSGDP GEYNLRSRTI VCTSCGRPAE

HLA 601  KASASGSGAQ VGGPISSGSS ASSVTVTRSY RSVGG-SGGG SFGDNLVTRS
XLA      KSVLASQGSG LVTG-SSGSS SSSVTLTRTY RSTGGTSGGS GLGESPVTRN

HLA 651  YLLGNSSPRT QSPQNCSIM
XLA      FIVGNGQRAQ VAPQNCSIM

Figure 2-2 Protein sequences for human lamin A (HLA) and Xenopus lamin A (XLA); for sources, see Table 2-1. The heptad-containing regions of the rod domain are indicated above the sequences by a line.
Several blanks have been inserted to maintain optimal alignment of the sequences. The internal a and d positions of the heptad are marked above the sequences by '.'.

The Helix pomotia B protein reveals a very strong peak corresponding to a 9.23 residue repeat in the disposition of basic residues in the rod domain (Figure B-5). This is slightly less than the 9.31 and 9.35 residue periods of segments 1 and 2 respectively. The acidic residues of the rod domain also show a dominant peak corresponding to a 9.27 residue period. Another strongly represented period in the basic residues is a 12.19 residue repeat, which is apparent in both segment 1 and the whole rod domain.

The dominant 9.23-9.35 residue repeats reported above for the Xenopus lamin B and Helix B proteins are similar to the period resolved in the acidic residues of segment 1B of the type I keratins (9.28 ± 0.02; Parry and Fraser, 1985). These periods are significantly shorter than the 9.5-10 residue periods found in all other transforms undertaken here and by Parry and Fraser (1985). Weber et al (1988) located the Helix pomotia A and B proteins in cells that also expressed keratin IF and suggested that the A and B proteins, unlike the structurally similar lamin proteins, were not localized in the nucleus. If the Helix B protein coexists in the cytoplasm with keratin, then it is possible that the common ~9.23 residue period in charged residues may allow coaggregation, although there is no evidence yet to support this idea. Similarly, the lamin B protein, although sited within the nuclear envelope, has been postulated as a nuclear anchorage site for the cytoplasmic IF network (Gerace and Burke, 1988). It may be of significance that the ~9.25 residue period is present in the distribution of acidic residues of segment 1B in keratins and in the basic residues of segment 1 in the Xenopus lamin B protein.
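The next step multiplies together the six transforms obtained for each type V protein (presumably the acidic and basic residues of segment 1, segment 2 and the whole rod, as discussed above). Because the scaled intensities average about 1 for a random distribution, the product stays small wherever any one transform is unremarkable and becomes large only where all of them share a peak. A minimal Python sketch, assuming spectra are stored as lists whose index i holds frequency f = i + 1; the function names are hypothetical and the threshold of 100 follows the caption of Table 2-3.

```python
def combined_spectrum(spectra):
    """Point-by-point product of scaled-intensity spectra that share
    the same frequency grid."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    n = len(spectra[0])
    if any(len(s) != n for s in spectra):
        raise ValueError("spectra must share a frequency grid")
    product = [1.0] * n
    for s in spectra:
        product = [p * v for p, v in zip(product, s)]
    return product


def peaks_above(spectrum, threshold=100, n_filled=2048):
    """Local maxima above threshold, as (period, intensity) pairs.
    spectrum[i] is assumed to hold frequency f = i + 1."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        v = spectrum[i]
        if v > threshold and v >= spectrum[i - 1] and v >= spectrum[i + 1]:
            peaks.append((n_filled / (i + 1), v))
    return peaks
```

In a toy case with two short spectra that both peak at the fifth grid point (intensities 5 and 4), the product reaches 20 there, while a peak of 3 present in only one spectrum stays at 3 and falls below a threshold of 10.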
The six transforms for each of the type V proteins have been multiplied together to highlight periods that are commonly represented in each. The results are summarized in Table 2-3. A low-intensity period of 11.64 residues is revealed for the human and Xenopus lamin A proteins, in addition to the dominant 9.94 residue period already discussed for these two lamins. A similar operation on the four transforms of peripherin yields a single peak (intensity 1130) corresponding to a repeat of 9.85 residues.

Human lamin A    Xenopus lamin A    Xenopus lamin B    Helix pomotia B
11.64 (452)      11.64 (136)        12.19 (166)        10.14 (228)
9.94 (2938)      9.94 (1702)        9.89 (3198)        9.25 (11932)
2.77 (117)   2.84 (113)   2.52 (142)   2.54 (103)   2.47 (290)   2.022 (100)   2.022 (320)

Table 2-3 Periods corresponding to major peaks (ie, scaled intensities greater than 100) resulting from multiplying the six Fourier transforms for each protein together. The intensities are shown in brackets.

2.2 Residue Distribution in the Heptad

The seven-residue quasi-repeat, or heptad substructure, that is found in major portions of the primary sequences of α-fibrous proteins is fundamental to the coiled-coil structure. Apolar residues are commonly found at alternating spacings of three and four residues and give rise to a hydrophobic stripe that winds in a left-handed sense around the axis of a right-handed α-helix. Pairs of α-helices undergo a small distortion that allows the stripes to be aligned along the interior axis of a supercoil, thus shielding them from the aqueous environment. The apolar residues interlock in a 'knob-into-hole' packing that provides much stability to the coiled-coil structure (see Chapter 1). These internal positions of the coiled-coil in IF are, on average, occupied at a rate of 75% by apolar residues (Parry and Fraser, 1985).
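An occupancy figure such as the 75% quoted above is a count over the internal a and d positions once heptad labels have been assigned. The Python sketch below shows that count under a simplifying assumption: the heptad runs without interruption from a stated starting position, so stutters and skips (which in real IF sequences must be located first) are ignored. The large apolar set follows the footnote of Table 2-4a; the function names are hypothetical.

```python
HEPTAD = "abcdefg"
# Large apolar residues as listed in the footnote to Table 2-4a
LARGE_APOLAR = set("FILMVWY")


def heptad_positions(n, first="a"):
    """Heptad labels for n consecutive residues, assuming an uninterrupted
    heptad (no stutters or skips) starting at position `first`."""
    start = HEPTAD.index(first)
    return [HEPTAD[(start + i) % 7] for i in range(n)]


def apolar_occupancy(seq, first="a", sites="ad"):
    """Fraction of residues at the given heptad sites that are large apolar."""
    labels = heptad_positions(len(seq), first)
    at_sites = [aa for aa, p in zip(seq, labels) if p in sites]
    if not at_sites:
        raise ValueError("sequence too short to contain the requested sites")
    return sum(aa in LARGE_APOLAR for aa in at_sites) / len(at_sites)
```

For an idealized heptad LEEIKKA repeated four times, every a and d site holds Leu or Ile and the occupancy is 1.0; replacing a single d-position Ile with Lys lowers it to 7/8 = 0.875.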
The adjacent e and g positions are frequently occupied by charged residues: in IF proteins the e positions are more commonly basic (positively charged) and the g positions are often acidic (negatively charged). This arrangement specifies the relative alignment of the α-helices prior to their precise docking along the apolar stripes: the oppositely charged e and g stripes from different α-helices are adjacent when the chains are parallel.

The distribution of residues in the major heptad-containing portions of IF protein sequences is presented in Table 2-4a. The data reveal that (i) acidic residues are very uncommon in position a and relatively uncommon in d, and occur with approximately equal frequency in positions b, c, e and g; (ii) basic residues occur very rarely in d, though they may occur in a, and are favoured in e relative to g; and (iii) apolar residues are most highly maintained in positions a and d, are found comparably in positions e, f and g, and are least common in positions b and c.

Table 2-4a shows other points of interest, some of which have been noted previously (Parry, 1982; Parry and Fraser, 1985). For example, certain residues occur with greatest frequency in the outer positions of the coiled-coil (b, c and f); these include glycine (80%), serine (76%), aspartic acid (73%), cysteine (63%), alanine (61%) and asparagine (56%). In general, such residues (excluding aspartic acid) are not normally associated with the ability to specify intermolecular association. Residues occurring with greatest frequency in the innermost a and d positions are tyrosine (86%), leucine (74%), isoleucine (72%), tryptophan (70%), valine (52%), methionine (50%) and histidine (48%). In some cases there is an asymmetric distribution between the a and d positions, thus emphasizing the structural uniqueness of each position. Examples include isoleucine (a 81%; d 19%), valine (a 64%; d 36%), methionine (a 61%; d 39%), histidine (a 39%; d 61%) and leucine (a 36%; d 64%). Another feature of Table 2-4a is the preponderance of leucine residues in the d position, where leucine occurs four times as often as any other residue and makes up 44% of the total. Leucine and isoleucine contribute 26% and 21% respectively to the residues found in the a position. The average occurrence of large apolar residues in the a and d positions is 72%, a figure in close agreement with that determined by Parry and Fraser (1985). Including the alanines in the apolar group raises this figure to 78%.

          Residue count                                          Percentage occurrence
            a     b     c     d     e     f     g   Total        a      b      c      d      e      f      g
Ala        76   263   171   123    66   200   143    1042      4.8   17.0   10.4    7.5    4.0   12.2    8.9
Cys         9     0    23     8    13    34     3      90      0.6    0.0    1.4    0.5    0.8    2.1    0.2
Asp         5   148   198     2    66   154   108     681      0.3    9.5   12.1    0.1    4.0    9.4    6.7
Glu         9   294   282   118   348   200   382    1633      0.6   19.0   17.2    7.2   21.2   12.2   23.9
Phe        60     1    47    79    13     0     4     204      3.8    0.1    2.9    4.8    0.8    0.0    0.2
Gly        16    38   121    17    20    78     6     296      1.0    2.5    7.4    1.0    1.2    4.8    0.4
His        27    17    24    42     3    18    12     143      1.7    1.1    1.5    2.6    0.2    1.1    0.7
Ile       334    14    12    80    55    51    31     577     20.9    0.9    0.7    4.9    3.4    3.1    1.9
Lys        95    79    50     8   226   116   181     755      6.0    5.1    3.1    0.5   13.8    7.1   11.3
Leu       406    41    33   717   117    86   112    1512     25.5    2.6    2.0   43.6    7.1    5.2    7.0
Met        76    31     8    48    34    30    23     250      4.8    2.0    0.5    2.9    2.1    1.8    1.4
Asn        63    69   111    31   101   117    35     527      3.9    4.5    6.8    1.9    6.2    7.1    2.2
Pro         2     9     3     0     0     0     1      15      0.1    0.6    0.2    0.0    0.0    0.0    0.1
Gln         8   230    91    43   115   121   228     836      0.5   14.8    5.6    2.6    7.0    7.4   14.2
Arg        86   137   142     9   262   167   114     917      5.4    8.8    8.7    0.5   16.0   10.2    7.1
Ser        24    95   193    19    54   150    44     579      1.5    6.1   11.8    1.2    3.3    9.1    2.7
Thr        15    51    81    37   104    61    95     444      0.9    3.3    4.9    2.3    6.3    3.7    5.9
Val       147    30    30    81    29    51    67     435      9.2    1.9    1.8    4.9    1.8    3.1    4.2
Trp         6     0     0     8     0     6     0      20      0.4    0.0    0.0    0.5    0.0    0.4    0.0
Tyr       131     3    19   174    15     2    12     356      8.2    0.2    1.2   10.6    0.9    0.1    0.7
Totals   1595  1550  1639  1644  1641  1642  1601   11312    100.0  100.0  100.0  100.0  100.0  100.0  100.0
Apolar   1154   120   149  1179   263   220   249    3334     72.4    7.7    9.1   71.7   16.0   13.4   15.6
Basic     181   216   192    17   488   283   295    1672     11.3   13.9   11.7    1.0   29.7   17.2   18.4
Acidic     14   442   480   120   414   354   490    2314      0.9   28.5   29.3    7.3   25.2   21.6   30.6

Table 2-4a The distribution of residues in the heptad-containing regions of IF proteins combined. The symbols a-g represent residue positions within the heptad. Large apolar (Phe, Ile, Leu, Met, Val, Trp, Tyr), basic (Lys, Arg) and acidic (Asp, Glu) residues are summarized at the bottom of the table.

          Residue count                                          Percentage occurrence
            a     b     c     d     e     f     g   Total        a      b      c      d      e      f      g
Ala        32    36    22    75    17    36    35     253     10.2   11.9    7.0   24.0    5.4   11.5   11.2
Cys         2     0     0     1     0     0     0       3      0.6    0.0    0.0    0.3    0.0    0.0    0.0
Asp         0    42    45     3    14    27    23     154      0.0   13.9   14.2    1.0    4.5    8.6    7.3
Glu         1    63    69    21    91    43    60     348      0.3   20.8   21.8    6.7   29.1   13.7   19.2
Phe         6     2     1     5     1     2     0      17      1.9    0.7    0.3    1.6    0.3    0.6    0.0
Gly         2     6     8     4     6    14     4      44      0.6    2.0    2.5    1.3    1.9    4.5    1.3
His         4    12     5     3     4    10     3      41      1.3    4.0    1.6    1.0    1.3    3.2    1.0
Ile        35     3     6    16     8     5     7      80     11.1    1.0    1.9    5.1    2.6    1.6    2.2
Lys        31    50    48     3    20    37    36     225      9.9   16.5   15.2    1.0    6.4   11.8   11.5
Leu       104     5    11   109    17     9    20     275     33.1    1.7    3.5   34.8    5.4    2.9    6.4
Met         8     2     2     8     2     3     0      25      2.5    0.7    0.6    2.6    0.6    1.0    0.0
Asn        14    16    13     4    20    11    10      88      4.5    5.3    4.1    1.3    6.4    3.5    3.2
Pro         0     0     0     0     0     0     0       0      0.0    0.0    0.0    0.0    0.0    0.0    0.0
Gln         4    20    29    14    59    18    48     192      1.3    6.6    9.2    4.5   18.8    5.8   15.3
Arg        22    22    27     4    15    47    32     169      7.0    7.3    8.5    1.3    4.8   15.0   10.2
Ser         7    10    19     6    14    27    16      99      2.2    3.3    6.0    1.9    4.5    8.6    5.1
Thr         5     9     8     7    17    11    10      67      1.6    3.0    2.5    2.2    5.4    3.5    3.2
Val        31     3     3    18     8    12     9      84      9.9    1.0    0.9    5.8    2.6    3.8    2.9
Trp         0     0     0     3     0     0     0       3      0.0    0.0    0.0    1.0    0.0    0.0    0.0
Tyr         6     2     0     9     0     1     0      18      1.9    0.7    0.0    2.9    0.0    0.3    0.0
Totals    314   303   316   313   313   313   313    2185    100.0  100.0  100.0  100.0  100.0  100.0  100.0
Apolar    190    17    23   165    36    32    36     499     60.5    5.6    7.3   52.7   11.5   10.2   11.5
Basic      53    72    75     7    35    84    68     394     16.9   23.8   23.7    2.2   11.2   26.8   21.7
Acidic      1   105   114    24   105    70    83     502      0.3   34.7   36.1    7.7   33.5   22.4   26.5

Table 2-4b The distribution of residues in the heptad-containing regions of paramyosin, myosin and tropomyosin combined. Large apolar (Phe, Ile, Leu, Met, Val, Trp, Tyr), basic (Lys, Arg) and acidic (Asp, Glu) residues are summarized at the bottom of the table.

          Total counts                                           Average percentage occurrence
            a     b     c     d     e     f     g   Total        a      b      c      d      e      f      g
Ala       108   299   193   198    83   236   178    1295      7.5   14.4    8.7   15.7    4.7   11.8   10.1
Cys        11     0    23     9    13    34     3      93      0.6    0.0    0.7    0.4    0.4    1.0    0.1
Asp         5   190   243     5    80   181   131     835      0.2   11.7   13.2    0.5    4.2    9.0    7.0
Glu        10   357   351   139   439   243   442    1981      0.4   19.9   19.5    6.9   25.1   13.0   21.5
Phe        66     3    48    84    14     2     4     221      2.8    0.4    1.6    3.2    0.6    0.3    0.1
Gly        18    44   129    21    26    92    10     340      0.8    2.2    5.0    1.2    1.6    4.6    0.8
His        31    29    29    45     7    28    15     184      1.5    2.5    1.5    1.8    0.7    2.1    0.9
Ile       369    17    18    96    63    56    38     657     16.0    0.9    1.3    5.0    3.0    2.4    2.1
Lys       126   129    98    11   246   153   217     980      7.9   10.8    9.1    0.7   10.1    9.4   11.4
Leu       510    46    44   826   134    95   132    1787     29.3    2.1    2.7   39.2    6.3    4.1    6.7
Met        84    33    10    56    36    33    23     275      3.7    1.3    0.6    2.7    1.4    1.4    0.7
Asn        77    85   124    35   121   128    45     615      4.2    4.9    5.4    1.6    6.3    5.3    2.7
Pro         2     9     3     0     0     0     1      15      0.1    0.3    0.1    0.0    0.0    0.0    0.0
Gln        12   250   120    57   174   139   276    1028      0.9   10.7    7.4    3.5   12.9    6.6   14.8
Arg       108   159   169    13   277   214   146    1086      6.2    8.0    8.6    0.9   10.4   12.6    8.7
Ser        31   105   212    25    68   177    60     678      1.9    4.7    8.9    1.5    3.9    8.9    3.9
Thr        20    60    89    44   121    72   105     511      1.3    3.1    3.7    2.2    5.9    3.6    4.6
Val       178    33    33    99    37    63    76     519      9.5    1.5    1.4    5.3    2.2    3.5    3.5
Trp         6     0     0    11     0     6     0      23      0.2    0.0    0.0    0.7    0.0    0.2    0.0
Tyr       137     5    19   183    15     3    12     374      5.1    0.4    0.6    6.7    0.5    0.2    0.4
Totals   1909  1853  1955  1957  1954  1955  1914   13497    100.0  100.0  100.0  100.0  100.0  100.0  100.0
Apolar   1344   137   172  1344   299   252   285    3833     66.4    6.7    8.2   62.2   13.8   11.8   13.5
Basic     234   288   267    24   523   367   363    2066     14.1   18.8   17.7    1.6   20.5   22.0   20.1
Acidic     15   547   594   144   519   424   573    2816      0.6   31.6   32.7    7.5   29.4   22.0   28.6

Table 2-4c (Left) The distribution of residues in the heptad-containing regions of the IF and myosin-type proteins used in Tables 2-4a and 2-4b. (Right) Average percentage distribution for the IF and myosin-type proteins used in Tables 2-4a and 2-4b. Large apolar (Phe, Ile, Leu, Met, Val, Trp, Tyr), basic (Lys, Arg) and acidic (Asp, Glu) residues are summarized at the bottom of the table.

The coiled-coil rod regions of several myosin-type sequences (myosin, paramyosin and tropomyosin) have also been analysed (Table 2-4b) for comparison with the IF proteins. The occurrence of large apolar residues in the a and d positions is not as great as for IF proteins (57%), but this figure becomes comparable when alanines are included (74%). Interestingly, acidic residues clearly dominate the e position for the myosin-type proteins (33.5% acidic : 11.2% basic) but the g position for the IF proteins (29.8% : 18.6%). When the "myosins" and IF proteins are combined into a single table (Table 2-4c), the dominance of leucine is again apparent in the d position of the heptad, and together with isoleucine it dominates the a position. Acidics and basics are now more evenly matched in e, but acidics remain more common in g. It is interesting to note that some of the less common residues are excluded from certain positions within the heptad: cysteine is never found in b; proline (an α-helix-disrupting residue) is never found in d, e or f; and tryptophan is found only in a, d and f.
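Several of the percentages quoted in this section are simple ratios of the counts in Tables 2-4a and 2-4b. The short Python sketch below reproduces a few of them from those counts, as a check on the arithmetic; the function names are illustrative and all numbers are taken directly from the tables.

```python
def pct(part, whole):
    """Percentage of part in whole."""
    return 100 * part / whole


# (position a, position d) counts from Table 2-4a (IF proteins)
IF_A_D = {"Ile": (334, 80), "Val": (147, 81), "Met": (76, 48),
          "His": (27, 42), "Leu": (406, 717)}


def a_d_split(counts):
    """Rounded share (%) of each residue's a + d occurrences at a and at d."""
    return {res: (round(pct(a, a + d)), round(pct(d, a + d)))
            for res, (a, d) in counts.items()}


def internal_apolar(apolar_a, apolar_d, total_a, total_d, ala_a=0, ala_d=0):
    """Percentage of a and d sites occupied by large apolar residues,
    optionally counting alanine as apolar."""
    return pct(apolar_a + apolar_d + ala_a + ala_d, total_a + total_d)
```

With the IF counts, internal_apolar(1154, 1179, 1595, 1644) gives about 72% and adding the alanines (76 and 123) gives about 78%; with the myosin-type counts of Table 2-4b, internal_apolar(190, 165, 314, 313) gives about 57% and adding alanines (32 and 75) about 74%, as stated in the text.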
This study emphasizes the importance of apolar residues in specifying the heptad substructure and hence the stability of the coiled-coil. In particular, isoleucine and leucine are especially common in the a position of the heptad, and leucine is dominant in the d position. Charged residues are found fairly evenly amongst the remaining, more external positions: those in the e and g positions are involved in specifying the relative orientations of the chains that form the dimeric molecule, while the others are on the exterior of the coiled-coil and specify higher orders of aggregation.

2.3 Flexibility

Prediction schemes are used to characterize the attributes of protein chains where no direct method of measurement is readily available. These schemes generally apply a statistically weighted scoring system to a region of protein sequence in an effort to establish the potential for a given characteristic, such as hydrophobicity (Rose, 1978; Kyte and Doolittle, 1982), antigenicity (Hopp et al, 1981) or the probability of adopting a particular element of secondary structure (Chou and Fasman, 1974; Garnier et al, 1978). A scheme for calculating the relative axial stagger of coiled-coil proteins has met with particular success in predicting the stagger of collagen molecules (and also of myosin and paramyosin) and is used in Chapter 4 in a modified form.

Karplus and Schulz (1985) introduced a method for predicting the flexibility of protein chains using temperature factor data deduced from refined crystallographic structures. Their prime aim was to provide an improved tool for selecting peptide antigens and cross-reacting peptides, based on the link between segmental flexibility and antigenic determinacy (Westhof et al, 1984; Tainer et al, 1984). This method was used by Conway et al (1989) and Conway and Parry (1989) to compare the flexibilities of different IF chain types, and also of the different segments of the IF chains.
However, some care must be taken in interpreting the results from IF protein chains, as the method of Karplus and Schulz is based on data collected from globular proteins; the rod domain of IF is clearly not globular, although the N- and C-terminal domains may be. The database of Karplus and Schulz comprised the temperature factors of the Cα atoms of residues from 31 globular proteins. The averages and ranges of these factors were noted to vary between proteins, and this was assumed to be a result of differences in structure refinement methods rather than natural variation. As a consequence, the temperature factors were normalized and the root-mean-square deviation was made constant. The average normalized temperature factor was determined for each residue type, and the residues were grouped into two classes: average normalized temperature factor > 1 ('flexible') or < 1 ('rigid'). Separate flexibility indices were determined from the average normalized temperature factors according to whether a residue had zero, one or two 'rigid' neighbours. These indices were used to generate a profile of chain flexibility, which was then smoothed with a triangular weighting profile (ie, weights 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16).

Conway et al (1989) and Conway and Parry (1989) assumed that the same method could be applied to the IF chains even though they are not globular proteins. The weighting scheme was modified so as not to disguise effects arising from the relatively short 7 and 10 residue periods known to be present in the rod domain sequences (ie, the heptad substructure and the charged residue periods respectively), and the weights used were 1/4, 2/4, 1/4. Flexibility profiles for a selection of IF chains covering all chain types are shown in Figure 2-3, and the average scores (± s.d.) for the chain segments are listed in Table 2-5.
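The two stages described above, choosing a per-residue index according to the number of 'rigid' neighbours and then smoothing with a short weighted window, can be sketched in Python. This is an illustration under stated assumptions, not the program of Karplus and Schulz or of Conway et al: the index table must be supplied by the user (the values in the usage lines are hypothetical, not the published indices), a residue at a chain end counts only the one neighbour it has, and only positions where the whole smoothing window fits are returned.

```python
def raw_flexibility(seq, index_table, rigid):
    """Per-residue flexibility index chosen by the number (0, 1 or 2) of
    'rigid' neighbours. index_table[aa] = (B0, B1, B2) must be supplied."""
    values = []
    for i, aa in enumerate(seq):
        n_rigid = sum(1 for j in (i - 1, i + 1)
                      if 0 <= j < len(seq) and seq[j] in rigid)
        values.append(index_table[aa][n_rigid])
    return values


def smooth(profile, weights):
    """Centred weighted moving average; only positions where the whole
    window fits are returned (edge handling is an assumption here)."""
    k = len(weights)
    if k % 2 == 0 or k > len(profile):
        raise ValueError("need an odd window no longer than the profile")
    total = sum(weights)
    w = [x / total for x in weights]
    return [sum(wj * profile[i + j] for j, wj in enumerate(w))
            for i in range(len(profile) - k + 1)]


KARPLUS_SCHULZ_WEIGHTS = [1, 2, 3, 4, 3, 2, 1]   # divided by 16 in smooth()
IF_WEIGHTS = [1, 2, 1]                            # divided by 4 in smooth()

# Usage with a made-up two-residue index table (hypothetical values):
TOY_INDEX = {"G": (1.10, 1.00, 0.90), "L": (1.00, 0.95, 0.90)}
toy_profile = smooth(raw_flexibility("LGLGLLG", TOY_INDEX, rigid={"L"}),
                     IF_WEIGHTS)
```

The shorter three-point window spreads a single spike over only three residues (weights 1/4, 2/4, 1/4), whereas the seven-point triangle would average across a whole heptad and so blur the 7 and 10 residue periods the modification was designed to preserve.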
[Figure 2-3: flexibility profiles (flexibility index, vertical axis 0.8-1.2) for (a) Component 8c-1, (b) Component 7, (c) Mouse M59K, (d) Mouse M67K, (e) Chicken Gizzard Desmin, (f) Pig NF-L, (g) Human NF-M and (h) Human Lamin A, with the rod domain segments marked.]

Figure 2-3 Flexibility profiles for a selection of IF chains. The horizontal axis is divided into units of 100 residues. Scores above 1.0 on the vertical axis indicate greater than average flexibility, and scores below 1.0 indicate more rigid regions of sequence. Sequences are aligned by the N-terminus of segment 2A.

Segment      Type Ia       Type IIa      Type Ib       Type IIb      Type III      Type IV-L     Type IV-M     Type V
             Comp. 8c-1    Comp. 7       M59K          M67K          CGD           PNF-L         HNF-M         Lamin A
E1 and V1    ?             ?             1.064±0.082   1.061±0.081   1.037±0.074   -             -             -
H1           ?             ?             1.060±0.078   1.031±0.071   0.995±0.080   1.016±0.073   1.044±0.083   1.053±0.061
1A           1.004±0.064   1.002±0.087   1.000±0.070   1.009±0.078   0.982±0.074   0.989±0.077   0.998±0.080   1.007±0.073
L1           1.062±0.107   1.014±0.092   1.072±0.070   1.050±0.053   1.064±0.074   1.058±0.074   1.003±0.085   -
1B           1.010±0.073   0.999±0.074   1.003±0.068   1.012±0.068   1.001±0.068   1.005±0.072   1.000±0.071   1.011±0.067
L12          0.990±0.049   1.011±0.089   1.002±0.068   1.014±0.070   0.983±0.070   0.976±0.079   0.988±0.074   1.041±0.075
2A           0.987±0.061   0.984±0.071   1.005±0.052   0.989±0.074   0.979±0.072   0.986±0.074   1.038±0.059   0.987±0.050
L2           1.032±0.061   1.027±0.077   1.046±0.061   1.033±0.081   0.976±0.060   1.002±0.046   0.967±0.073   1.040±0.052
2B           1.003±0.074   1.008±0.075   1.024±0.069   1.015±0.069   1.001±0.070   1.006±0.072   1.007±0.068   1.007±0.071
H2           ?             ?             -             1.017±0.056   1.015±0.085   1.030±0.067   1.021±0.069   1.025±0.083
E2 and V2    ?             ?             1.110±0.064   1.105±0.067   -             1.079±0.044   1.062±0.066   1.041±0.081
N            1.003±0.084   1.002±0.076   1.063±0.082   1.051±0.079   1.028±0.077   1.016±0.073   1.044±0.083   1.053±0.061
Rod          1.007±0.072   1.004±0.077   1.015±0.069   1.013±0.072   0.998±0.070   1.003±0.073   1.003±0.071   1.010±0.069
C            0.998±0.079   1.000±0.074   1.110±0.064   1.094±0.072   1.015±0.085   1.047±0.064   1.055±0.068   1.025±0.082

Table 2-5 Mean flexibility indices (± s.d.) for chain segments from all IF types. Sources are shown in Table 3-1. Abbreviations are as follows: Comp. 8c-1, Component 8c-1; Comp. 7, Component 7; M59K, Mouse 59K; M67K, Mouse 67K; CGD, Chicken Gizzard Desmin; PNF-L, Pig neurofilament light chain; HNF-M, Human neurofilament medium chain; ?, extent of the segment has not been determined (see Chapter 3); -, segment is not present in the sequence.

A striking feature of the flexibility profiles is the regular pattern shown in the terminal domains of the epidermal keratins M59K and M67K and also in parts of the C-terminal domain of human NF-M.
The calculated flexibility values are particularly high for these 'soft' keratin chains in the V1 and V2 subdomains, and this is consistent with these regions being flexible, interactive and located on the surface of the IF. Indeed, a high degree of flexibility as well as strength is required in the epidermis, which forms the barrier between an animal and its environment. The C-terminal domains of neurofilament chains are part of an extended bridge between IF and microtubules and may also be seen to require a degree of flexibility. The periodicities present in the flexibility profiles of the V1 and V2 subdomains of the epidermal keratins include a 5-residue period and a quasi-halved 9-residue period in the V1 subdomain of M59K, a 10-residue period in the V2 subdomain of M59K, a 5-residue period in the V1 subdomain of M67K, and a 28-residue period in the V2 subdomain of M67K. These periods and others (see Steinert et al, 1983a, 1985a) are in each case associated with a glycine-serine rich structural motif. The regular pattern observed in the middle of the C-terminal domain of human NF-M is due to a 13-residue repeat (KSPVEEKGKSPVP).

Average flexibility scores for the segments are shown in Table 2-5. Segments 1A, 1B and 2B typically score close to unity, as does the rod domain as a whole. The H1 and H2 domains are generally more flexible than the rod domains (with the exception of H1 in the chicken gizzard desmin sequence), and the End and Variable domains combined (ie, E1 and V1, E2 and V2) are the most flexible. The hard α-keratin chains, however, have mean flexibility indices close to unity over their entire N- and C-terminal domains, and this may be an indication that these regions display a considerable degree of intra-subtype homology (see also Crewther et al, 1985, and Sparrow et al, 1989).
Furthermore, their particular scores are comparable to those for the entire rod domains of all the chains studied, which suggests that the terminal domains of the hard α-keratins are better defined structurally than those of most other IF chains.

The rod domain segment predicted to have the highest mean flexibility is generally the link L1. This is consistent with the variable length of this link, which can differ even within a single chain type and which is actually absent in type V chains. Almost certainly the structure of L1 is poorly defined. Segment 2A is predicted to have (marginally) the lowest flexibility, while the adjacent link segment L2 is predicted to have the second highest within the rod domain (except for component 7 from wool α-keratin, where it is the highest). Interestingly, the pig NF-L sequence shows the reverse of this trend: segment 2A has the highest mean flexibility score and segment L2 has the lowest. These segments are of particular interest as they represent the major portion of a region where the coiled-coil undergoes a rearrangement caused by the discontinuity, or stutter, in the heptad phasing that occurs in segment L2. This latter segment, although predicted to be α-helical (see, for example, Geisler and Weber, 1982; Parry and Fraser, 1985), lacks the heptad repeat characteristic of the coiled-coil conformations found in the α-fibrous class of protein structures. The flexibility scores indicate that both chains of the IF molecule are flexible in the region of L2, and it is possible that such flexibility provides a means by which the integrity of the coiled-coil structure in segments 2A and 2B can be maintained while allowing the coherence of segment 2 across the stutter to be re-established gradually. The differences between flexibility profiles in this region may indicate alternative conformations of the two chains around the stutter.
The flexibility profiles of the chains in a heterodimeric keratin molecule are not in phase near the second heptad stutter, which lies close to the centre of segment 2B in all IF chains. This may indicate that the heptad phasing is re-established at this point by a relatively sharp discontinuity in the molecular structure, possibly involving a kink in the axis of the coiled-coil. This contrasts with the gradual rearrangement postulated above for the stutter in segment L2.

A feature of the flexibility profiles is that regions of high flexibility in the rod domain are frequently bounded by regions of particularly low flexibility. It follows that a high local flexibility score should not be read as a breakdown in coiled-coil structure. It more likely marks a short region with a greater propensity for interaction with other IF molecules or perhaps with intermediate filament associated proteins (IFAPs). Indeed, many of the peaks can be correlated with the presence of basic residues (predominantly lysine) and acidic residues. These charged residues are intimately involved in the interactions that specify both molecular assembly and aggregation in vivo (see Chapter 1).

Unlike most of the other sequences studied, the lamin chain shows a higher flexibility score for the N-terminal domain than for the C-terminal domain. The N-terminal domain of lamin is amongst the smallest of any IF chain. The C-terminal domain of lamin A is one of the largest, exceeded only by those of the NF-H chains.

In summary, some features of the flexibility profiles can be related to the sequences of the chains, such as the glycine-serine rich regions in the terminal domains of the epidermal keratins. A tentative link to function can also be proposed, namely the flexibility of skin.
The generalizations made above about the profile data always have exceptions, and these may reflect a link between predicted flexibility and the function of individual IF. IF share a common plan for the rod domain but are found in a wide range of cell types. The variability in the terminal domains, and in the less well defined regions of the rod domain, is therefore expected to reflect differing function.

2.4 Summary

A variety of structural studies based on IF protein sequences have been presented in this Chapter. These include an examination of the periodicities in the rod domains, the distribution of amino acid residues in the heptad substructure and the flexibility of the peptide backbone.

Fourier techniques were used to investigate regularities in the disposition of charged residues in the rod domains of peripherin, the nuclear lamin proteins and the Helix pomotia B protein. The Fourier data for peripherin are highly similar to those from other type III IF chains and confirm the assignment of peripherin to the type III grouping. The close homology between the human and Xenopus lamin A proteins is reflected in the Fourier transforms of their charged residues. A novel feature of these transforms is an apparent doubling of the 9-10 residue period commonly found in IF chains. This doubling is associated with the larger-scale packing of the molecules, whereas the short-range heptad structure relates directly to the mode of chain packing.

The transforms of the Xenopus lamin B rod domain segments reveal a period of ~9.3 residues in the disposition of basic residues in segment 1B. This is a little shorter than the period generally found in the same segment of other IF chains. It is, however, characteristic of the period found in acidic residues in segment 1B of type I keratins.
The Helix protein also shows high-intensity peaks at a similar period. Lamin B proteins are thought to be nuclear anchorage sites for the cytoplasmic IF network (Georgatos and Blobel, 1987; Gerace and Burke, 1988). This common shorter period allows the possibility of a direct linkage between the cytoplasmic and nuclear IF networks. Helix B type V IF are apparently coexpressed with keratin IF in the cytoplasm (Weber et al, 1988). The common period might therefore also allow the two networks to coaggregate, although there is as yet no evidence that this actually takes place.

An analysis of the distribution of residues within the heptad substructure shows that certain types of residues prefer certain positions. Apolar residues are confirmed to occupy ~75% of the a and d positions, but the distribution of residues between these two positions is highly non-uniform. For example, leucine is found much more frequently in the d position than in the a position, whereas isoleucine occupies the a position more often than the d.

The a and d positions are similar in that both are internal to the coiled-coil and crucial for forming the hydrophobic core that stabilizes the molecule. These preferences show that the two positions are nonetheless stereochemically non-equivalent. This difference had also been observed by Phillips et al (1986). They correlated branched-sidechain apolar residues in the d position with more flexible regions of coiled-coil, or even with "hinge" regions in the case of myosin (see also Cohen and Parry, 1989).

The charged residues are also unevenly distributed between the a and d positions and between the e and g positions. There are, of course, relatively few non-apolar residues in the internal a and d positions.
The asymmetric distribution does, however, let basic residues in the a and e positions of one chain interact with acidic residues in the d and g positions of the adjacent chain (see Figure 1-4). These interactions stabilize the two-chain molecular structure.

A comparison between two families of α-fibrous proteins is revealing. In IF proteins, acidic residues are more common than basic residues in the g position of the heptad substructure, whereas in myosins acidic residues dominate in the structurally similar e position. These two heptad positions are important for specifying the (parallel) orientation of the chains in the molecule. Tropomyosin has a more pronounced division of charged residues between the two positions (Parry, 1975). In tropomyosin, the e and g stripes on adjacent chains attract if the chains are parallel and repel if they are antiparallel. For IF and the myosin-type proteins in general, the net charge of each position is more even.

The flexibility profiles of representative IF chains have revealed some features that may be identified with structural elements of the molecule. The terminal domains are generally more flexible than the rod domains, especially for the soft keratins and the neurofilament chains studied. This flexibility may be a functional requirement of these domains, or at least of parts of them. The epidermis, for example, is tough yet yielding. This 'soft' character may be attributable in part to the terminal domains of the constituent IF chains and their interactions with other IF or IF-associated proteins. In contrast, the hard keratin chains studied are predicted to be less flexible than their epidermal counterparts. Regular patterns of residues in parts of the amino acid sequence are also reflected in the flexibility profiles.
For most IF chains the link region L1 is predicted to have the greatest flexibility of the rod domain segments. It is also one of the rod domain regions with the lowest degree of sequence homology. Link L2, which also contains a stutter in the phasing of the heptad substructure, is likewise notably flexible. The apparent conformational freedom of the chains in this region may allow the coiled-coil structures to remain intact across the discontinuity. The stutter towards the centre of segment 2B, however, appears to be accommodated by a more abrupt conformational rearrangement of the coiled-coils.

3. SEQUENCE HOMOLOGY

In general terms, homology may be described as a measure of correspondence or sameness among objects. The concept is useful for identifying groupings of objects with high degrees of homology. It may also show which features are common to a grouping and which differentiate one group from another.

In this chapter, homology is defined as the degree of similarity between the linear distributions of amino acid residues in the coiled-coil rod domain of IF protein molecules. The techniques used here could, however, be applied to the sequences of any protein family. The rod domain is relatively well defined and constant in length compared with the terminal domains, so sequence comparisons are more easily carried out in this domain.

Sequence data from a selection of IF proteins (Table 2-1) are used to generate three homology statistics (ha, hr, hs). Together these describe, in large part, the degree of correspondence amongst groupings of the proteins. Mammalian sequences are generally examined, although some parts of this study compare non-mammalian (usually amphibian) and mammalian proteins.
The homology statistics are also used to generate a consensus, or archetypal, sequence for the various IF chain types I-V. The alignment of incomplete rod domain sequences is maintained by blank positions, which are otherwise ignored when calculating homology. Blanks are also used in the variable-length segments L1 and L12, both to maximize homology within these segments and to maintain alignment. The positions and numbers of blanks in L1 and L12 are somewhat arbitrary because homology there is relatively low. The rod domain of types I-IV IF has been set to 327 residue positions. For type V IF chains this has been increased to 359 for two reasons: the 42-residue insert in segment 1B, and the replacement of link L1 by extensions to segments 1A and 1B, which need no padding.

Two sequences have anomalous lengths for some rod domain segments:
- The Endo B cytokeratin (M47K; Singer et al, 1986) contains 17 residues in link L12 instead of the usual 16 for type I IF chains (Table 1-1). It also has an extra (acidic) residue inserted within the last heptad of segment 2B, a segment whose length is strictly conserved in all other IF chains. For these reasons this protein has been excluded from the analyses described in this section.
- The human NF-L chain (Julien et al, 1987) has 103 residues in segment 1B, unlike all other type I-IV IF chains. The two extra residues, a glutamic acid and a threonine, are easily detected by comparison with other NF-L chains; each repeats the residue immediately preceding it. In addition, its segment L12 is only 16 residues long, whereas all other NF-L and NF-M chains have 17. Because relatively few NF-L sequences are available, a modified sequence has been used in which the two repeated residues in segment 1B have been deleted.

3.1 Homology Statistics

Suppose that N sequences are to be compared and that the padded length of the sequences is n (n = 327 for type I-IV chains; n = 359 for type V chains), with the origin at the amino-terminus of segment 1A. Let A(i,j) denote the amino acid in protein sequence i (1 ≤ i ≤ N) at linear position j (1 ≤ j ≤ n). The segments of the rod domain for types I-IV IF are defined as follows:

segment 1A    A(i,1) to A(i,35)       35 residues
segment L1    A(i,36) to A(i,56)      21 residues
segment 1B    A(i,57) to A(i,157)     101 residues
segment L12   A(i,158) to A(i,179)    22 residues
segment 2A    A(i,180) to A(i,198)    19 residues
segment L2    A(i,199) to A(i,206)    8 residues
segment 2B    A(i,207) to A(i,327)    121 residues

For the type V sequences, the segment boundaries are:

segment 1A    A(i,1) to A(i,41)       41 residues
segment 1B    A(i,43) to A(i,189)     147 residues
segment L12   A(i,190) to A(i,211)    22 residues
segment 2A    A(i,212) to A(i,230)    19 residues
segment L2    A(i,231) to A(i,238)    8 residues
segment 2B    A(i,239) to A(i,359)    121 residues

The 42-residue insert in segment 1B is part of a larger 70-residue piece at positions 90 to 159 inclusive in the type V chains. This piece replaces residues 99 to 126 of the type I-IV chains (Parry et al, 1986). It misaligns the type V chains relative to the type I-IV chains. This does not matter, because homology is calculated only within chain types.
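The boundary lists above can be checked mechanically. The Python sketch below transcribes them, reports which segment contains a given padded position j, and confirms the stated segment lengths. The names `TYPES_I_IV`, `TYPE_V`, `segment_of` and `segment_lengths` are hypothetical, introduced only for this illustration. Position 42 of the type V layout lies between segments 1A and 1B and so belongs to no segment.

```python
# Rod domain segment boundaries (padded positions, 1-based, inclusive),
# transcribed from the lists above.
TYPES_I_IV = [
    ("1A", 1, 35), ("L1", 36, 56), ("1B", 57, 157), ("L12", 158, 179),
    ("2A", 180, 198), ("L2", 199, 206), ("2B", 207, 327),
]
TYPE_V = [
    ("1A", 1, 41), ("1B", 43, 189), ("L12", 190, 211),
    ("2A", 212, 230), ("L2", 231, 238), ("2B", 239, 359),
]


def segment_of(j, layout):
    """Name of the segment containing padded position j, or None if none does."""
    for name, first, last in layout:
        if first <= j <= last:
            return name
    return None


def segment_lengths(layout):
    """Number of padded positions in each segment."""
    return {name: last - first + 1 for name, first, last in layout}


print(segment_lengths(TYPES_I_IV))
print(segment_of(42, TYPE_V), segment_of(200, TYPES_I_IV))
```

The type I-IV lengths sum to n = 327. The type V lengths sum to 358, one fewer than n = 359, because position 42 is left unassigned.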
The stutter in the heptad phasing near the middle of segment 2B requires this segment to be subdivided into 2B1 and 2B2. For types I-IV IF these are bounded as follows:

segment 2B1   A(i,207) to A(i,264)    58 residues
segment 2B2   A(i,266) to A(i,327)    62 residues

and for type V:

segment 2B1   A(i,239) to A(i,296)    58 residues
segment 2B2   A(i,298) to A(i,359)    62 residues

A single residue has been designated as the 'locus' of this heptad stutter: residue 265 in types I-IV IF and residue 297 in type V IF. This choice is somewhat arbitrary. The heptad pattern can in fact be continued a short way through the 'locus' from either direction. Indeed, it is not at all clear whether the stutter is a sharp discontinuity or a more extensive, gradual re-ordering of the coiled-coil structure.

A similar situation arises for link L1. In type V chains it is reduced to a stutter in the heptad phasing, whereas in type I-IV chains it is a short non-α-helical segment. As with the segment 2B stutter, a single residue position in the type V chains has been chosen as the centre of the stutter: position 42.

The homology statistics ha, hr and hs are defined below. The homology scores are given as percentages, with 100% homology indicating identical sequences.

3.1.1 Amino Acid Homology Score, ha

Each amino acid is compared in turn with every amino acid at the same position j in the N sequences, including itself. Each comparison is scored with a look-up table (for example, Table 3-1) that gives the degree of homology between the pair of residues. The scoring scheme is:
- residue pairs with no relationship score zero;
- residue pairs of conservative size and character (e.g. Leu-Val, Met-Phe) score 0.5;
- residue pairs of conservative charge (e.g. Lys-Arg or Asp-Glu) score 0.75;
- pairs of identical residues score unity.
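The scoring scheme just described can be written as a symmetric look-up function. This Python sketch fills in only the residue pairs named in the text: Asp-Glu and Lys-Arg at 0.75, and Leu-Val and Met-Phe at 0.5. The other non-zero entries of Table 3-1 are not reproduced, so the dictionary `PAIR_SCORES` (a hypothetical name) is an illustrative subset rather than the full table. One-letter residue codes are assumed.

```python
# Illustrative subset of the mixed look-up table (Table 3-1), keyed by unordered
# pairs of one-letter residue codes. Only pairs named in the text are included;
# every pair not listed scores zero.
PAIR_SCORES = {
    frozenset("DE"): 0.75,  # conservative charge: Asp-Glu
    frozenset("KR"): 0.75,  # conservative charge: Lys-Arg
    frozenset("LV"): 0.5,   # conservative size and character: Leu-Val
    frozenset("MF"): 0.5,   # conservative size and character: Met-Phe
}


def mixed_score(x: str, y: str) -> float:
    """Look-up score S[x:y]: 1 for identical residues, 0 for unrelated pairs."""
    if x == y:
        return 1.0
    return PAIR_SCORES.get(frozenset((x, y)), 0.0)


print(mixed_score("E", "D"), mixed_score("V", "L"), mixed_score("G", "W"))
```

Keying on `frozenset` makes the table symmetric, so S[x:y] = S[y:x] without storing each pair twice.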
Other look-up tables, listed in Table 3-2, score homology among specific types of residues. The scores from all comparisons at position j are summed and normalized to give the homology statistic ha(i,j):

$$h_a(i,j) = \frac{1}{N}\sum_{k=1}^{N} S[A(i,j):A(k,j)]$$

where S[A(i,j):A(k,j)] is the look-up score for the residue pair A(i,j) and A(k,j). Because each residue is also compared with itself, the minimum homology score is (1/N) S[A(i,j):A(i,j)]. For the mixed look-up table (Table 3-1), where every identical pair scores unity, this minimum is 1/N.

3.1.2 Residue Homology Score, hr

This is defined as the largest value of ha(i,j) at residue position j, i.e.

$$h_r(j) = \max_{1 \le i \le N} h_a(i,j)$$

It reflects the most common feature of the sequences being compared at that position. It leads naturally to the idea of a consensus residue and, by extension, to a consensus sequence.

Table 3-1 Look-up table for mixed homology scores S[A(i,j):A(k,j)] between the amino acids. Identical pairs score 1. The other entries follow the scheme in Section 3.1.1, and several pairs involving Ala score 0.25.

Table 3-2 Look-up tables for acidic, basic and large apolar homology:
- Acidic table: any pair drawn from {Asp, Glu} scores 1.
- Basic table: any pair drawn from {Arg, Lys} scores 1.
- Large apolar table: any pair drawn from {Ile, Leu, Met, Phe, Tyr, Val} scores 1.
- All other pairs score zero.

3.1.3 Segment Homology Score, hs

The value of hs(i) is the average of ha(i,j) over an entire sequence or a subsequence (often a coiled-coil or link segment).
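Below is a minimal Python sketch of the three statistics. It assumes aligned one-letter sequences of equal padded length, with '-' marking blanks. Two details are assumptions not fixed by the text:
- Blank positions are skipped, and the sum is normalized by the number of non-blank residues in column j rather than by N. hs likewise averages only the non-blank positions.
- Ties in hr are broken arbitrarily when a consensus residue is reported.

Indices i and j are 1-based, as in A(i,j), and scores are returned as fractions rather than percentages. The helper `class_scorer` and the other function names are hypothetical. `class_scorer` builds Table 3-2 style tables, in which any two residues of the same class score 1.

```python
from statistics import mean

BLANK = "-"  # assumed padding symbol; blank positions are left out of all sums


def class_scorer(members):
    """Table 3-2 style look-up: 1 if both residues belong to the class, else 0."""
    members = frozenset(members)
    return lambda x, y: 1.0 if x in members and y in members else 0.0


ACIDIC = class_scorer("DE")
BASIC = class_scorer("KR")
LARGE_APOLAR = class_scorer("ILMFYV")


def ha(seqs, i, j, score):
    """Amino acid homology score ha(i,j) as a fraction (multiply by 100 for %).

    A(i,j) is compared with every non-blank residue in column j, itself included.
    Returns None when A(i,j) is a blank.
    """
    residue = seqs[i - 1][j - 1]
    if residue == BLANK:
        return None
    column = [s[j - 1] for s in seqs if s[j - 1] != BLANK]
    return sum(score(residue, other) for other in column) / len(column)


def hr(seqs, j, score):
    """Return (hr(j), consensus residue) for column j."""
    candidates = []
    for i in range(1, len(seqs) + 1):
        value = ha(seqs, i, j, score)
        if value is not None:
            candidates.append((value, seqs[i - 1][j - 1]))
    # Ties in ha go to the alphabetically later residue code (an arbitrary choice).
    return max(candidates)


def hs(seqs, i, m1, m2, score):
    """Segment homology score: mean of ha(i,j) over non-blank positions m1..m2."""
    values = [ha(seqs, i, j, score) for j in range(m1, m2 + 1)]
    return mean(v for v in values if v is not None)


def identity(x, y):
    """Simplest look-up table: identical residues score 1, all others 0."""
    return 1.0 if x == y else 0.0


seqs = ["LKE", "VKD", "LRE"]
print(hr(seqs, 1, identity))
print(hs(seqs, 1, 1, 3, identity))
```

Swapping `identity` for `ACIDIC`, `BASIC` or `LARGE_APOLAR` gives class-specific homology arrays of the kind compared in Table 3-2. The mixed table of Table 3-1 would be passed in the same way.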
If m1 and m2 are the positions of the first and last residues of a subsequence of interest, the hs(i) score for that subsequence is defined as

$$h_s(i) = \frac{1}{m_2 - m_1 + 1}\sum_{j=m_1}^{m_2} h_a(i,j)$$

where 1 ≤ m1 ≤ m2 ≤ n.

Table 3-3 Regions in the homology profiles (hr scores) where the scores are greater than or equal to 90%. The regions listed are at least three residues in length and indicate highly conserved primary structure.

Regions with hr ≥ 90% or hr < 60% are listed in Tables 3-3 and 3-4 respectively. In addition, the segment homology scores (hs) have been calculated for the 7 subdomains of the rod region and for the rod domain as a whole. The scores for the chains of each type were then averaged and are shown in Table 3-5. The lamins have also been analysed as a subgroup of type V IF. This highlights how much more homologous they are than when taken together with the Helix pomotia B protein.

3.2.1 Coiled-Coil Segments

The homology data indicate that segment 1A is generally highly conserved within each chain type (Figure 3-1 and Table 3-4). Compared with the other coiled-coil segments, it is also highly conserved across the type I-IV IF chains as a whole (Table 3-5). Highly conserved sequences within segment 1A and at the C-terminal end of segment 2B have been noted previously (see Chapter 1), and this analysis confirms them. Indeed, homology is very high in the C-terminal portion of segment 2B for the type III and type IV chains, and close to 100% over its last 30 residues (Figure 3-1; Table 3-3). The type V chains show a similarly conserved region in segment 1A but none at the C-terminus of segment 2B.
This is due to the overall low homology of the Helix pomotia B protein with the other type V rod domains, despite other obvious structural similarities (see Table 3-5). The lamins taken alone do show this conserved region.

Table 3-4 Regions in the homology profiles (hr scores) where the scores are less than 60%. The regions listed are at least three residues in length and indicate variability in the conservation of primary structure.

The mean hs scores (Table 3-5) reveal a difference between coiled-coil segments for the types I-IV IF:
- For type I chains, the mean hs for segment 1B is about 5% higher than for segment 2B (and indeed for segment 2 as a whole).
- For types II, III and IV chains, the mean hs for segment 1B is lower than for segment 2B, by about 6, 9 and 12% respectively.
- The type V lamin chains have similar mean scores for segments 1B and 2B (82.3% and 86.7% respectively).

Keratin molecules are believed to be obligate heteropolymers containing one type I and one type II chain (see Chapter 1). In these molecules the sequences of segments 1B and 2 are preserved at comparable levels of significance. In the homopolymeric type III and type IV molecules, by contrast, the sequence of segment 2B has been maintained with greater fidelity than that of segment 1B. This indicates that segment 2 plays a slightly different structural role in type III and type IV molecules than in keratins. It again raises the possibility that IF may exhibit more than one structural form.

3.2.2 Link Segments

The α-helical link segment L2 is more homologous, both within and across chain types, than the other link segments (Table 3-3, Table 3-5, Figure 3-1, Figure 3-2).
The hs scores for L2 range from 83-89% for types I, II and III chains, very similar to those noted previously for coiled-coil segment 1A. The type IV and V hs scores for segment L2 are a little lower, at 68.4% and 66.3% respectively.

Segment   Type I          Type II         Types I and II
1A        85.8±4.4 (8)    86.3±1.9 (8)    69.9±2.2 (16)
L1        42.1±6.8 (10)   42.5±3.6 (8)    37.6±5.4 (18)
1B        80.0±3.7 (10)   72.5±1.8 (8)    58.8±4.8 (18)
L12       54.4±4.8 (12)   62.9±1.7 (8)    48.9±4.1 (20)
2A        74.2±4.7 (13)   78.4±4.3 (8)    63.8±4.7 (21)
L2        88.2±3.1 (13)   89.6±3.8 (8)    72.1±5.5 (21)
2B        74.8±4.2 (13)   78.3±2.4 (8)    59.7±4.2 (21)
Rod       70.4±4.3 (8)    72.5±1.8 (8)    57.5±3.3 (16)

Segment   Type III        Type IV         Types III and IV
1A        85.0±2.7 (5)    81.8±3.3 (9)    76.4±3.4 (14)
L1        31.8±1.5 (5)    38.7±2.4 (9)    34.6±4.6 (14)
1B        78.0±4.1 (4)    70.4±3.6 (9)    65.5±2.9 (13)
L12       56.3±3.7 (6)    51.3±0.9 (10)   46.8±4.4 (16)
2A        88.6±4.9 (7)    75.2±0.6 (11)   73.4±4.9 (18)
L2        86.0±3.7 (7)    68.8±7.6 (11)   71.1±7.3 (18)
2B        86.9±2.5 (8)    82.0±2.2 (10)   75.4±1.8 (18)
Rod       71.8±2.9 (4)    66.7±2.5 (9)    62.5±2.9 (13)

Segment   Types I, II, III and IV   Type V (lamins only)   Type V (all)
1A        66.3±2.3 (30)             86.7±3.2 (3)           72.5±8.5 (4)
L1        32.6±6.0 (32)             -                      -
1B        52.7±2.9 (31)             82.3±2.9 (3)           64.5±11 (4)
L12       38.7±5.1 (36)             78.0±2.6 (3)           57.3±19 (4)
2A        60.2±5.1 (39)             75.7±4.0 (3)           60.0±8.9 (4)
L2        61.3±7.3 (39)             83.0±0.0 (3)           66.3±10 (4)
2B        56.7±3.8 (39)             86.7±4.0 (3)           67.3±12 (4)
Rod       51.7±2.9 (29)             70.0±2.6 (3)           55.0±10 (4)

Table 3-5 Mean segment homology scores (hs) for the rod domain segments in IF proteins (see Figure 3-1). Values are quoted as mean ± s.d. (number of complete sequences in group). Segments L1 and L12 are padded to allow optimum alignment of the sequences. Their values correspond to the maximum possible score for those segments and cannot be directly compared with other entries in the table.
The type V sequences have been examined as two groups: one without the Helix pomotia B protein (lamins only) and one with it (all).

[Figure 3-3b panels: Fourier intensity against frequency (0-512) for (a) 1B acidics, (b) 2 acidics, (c) 1B basics, (d) 2 basics]

Figure 3-3b Fourier transforms of the homology arrays for type II IF proteins. The arrays are derived from the acidic residues in (a) segment 1B and (b) segment 2, the basic residues in (c) segment 1B and (d) segment 2, and the large apolar residues in (e) segment 1B, (f) segment 2B1 and (g) segment 2B2. The frequency axes are related to periods in the homology profiles by the expression: period = 1024/frequency.

The main conclusions from this section are summarized as follows:

(1) The homology profiles for large apolar residues in segments 1B and 2 are dominated by the heptad repeat in all chain types. This emphasizes the importance of these residues in specifying the heptad repeat and, by implication, the integrity of the coiled-coil structure of the IF molecule.
(2) The distribution of acidic and basic residues in both segment 1B and segment 2 is periodic and highly conserved. This implies that these residues play an important role in the self-assembly of IF molecules into IF through specific ionic interactions.
(3) The regularity in the distribution of the homologous ionic residues varies between chain types, as judged by the magnitude of the Fourier intensities and the number of orders present. This indicates the possibility that alternative modes of assembly via ionic interactions may occur in the IF
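The frequency-to-period conversion used in these figures can be illustrated with a plain discrete Fourier transform of a zero-padded array. The Python sketch below is an assumption about the general approach, not a reproduction of the software used for this thesis. It pads a residue array to 1024 points by default. The array could be a 0/1 indicator of charged residues or a homology profile. The sketch then computes intensities |F(k)|² and reports the strongest peak within a chosen period range, using period = padded_length/frequency. The function names `power_spectrum` and `strongest_period` and the 5-20 residue search window are illustrative choices.

```python
import cmath
import math


def power_spectrum(values, padded_length=1024):
    """Intensities |F(k)|^2 of a zero-padded 1D array for k = 0 .. padded_length // 2."""
    if len(values) > padded_length:
        raise ValueError("array is longer than the padded length")
    spectrum = []
    for k in range(padded_length // 2 + 1):
        # Zero padding: positions beyond len(values) contribute nothing to the sum.
        f = sum(v * cmath.exp(-2j * math.pi * j * k / padded_length)
                for j, v in enumerate(values) if v)
        spectrum.append(abs(f) ** 2)
    return spectrum


def strongest_period(values, padded_length=1024, min_period=5.0, max_period=20.0):
    """Period (residues) of the most intense peak with min_period <= period <= max_period."""
    spectrum = power_spectrum(values, padded_length)
    k_lo = math.ceil(padded_length / max_period)
    k_hi = min(math.floor(padded_length / min_period), padded_length // 2)
    k_best = max(range(k_lo, k_hi + 1), key=lambda k: spectrum[k])
    return padded_length / k_best  # period = padded_length / frequency


# Hypothetical 0/1 array marking a charged residue every 10 positions.
demo = [1 if j % 10 == 0 else 0 for j in range(100)]
print(round(strongest_period(demo), 2))
```

The frequency axis is sampled at integer k, so the reported period is quantized. In the example it comes out as 1024/102, about 10.04 residues, for a true spacing of 10. Padding to 2048 points, as in the Appendix B figures, halves the frequency step.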
APPENDIX B: Fourier Transforms

[Figure B-3b panel: Xenopus Lamin A, Rod domain: FT of basic residues; intensity against frequency]
Figure B-3b Fourier transforms of the dispositions of basic residues in rod domain segments of the Xenopus lamin A protein. A selection of peaks is listed in Table B-3. Period is related to frequency by the expression: period = 2048/frequency.

[Table B-4: periods (residues) of peaks for acidic and basic residues in segment 1B, segment 2 and the rod domain of Xenopus lamin B]
Table B-4 Peaks in the Fourier transforms of acidic and basic residues in the major coiled-coil segments of Xenopus lamin B (as for Table B-1). See Figure B-4.

[Figure B-4a panels: Xenopus Lamin B, Segment 1, Segment 2 and Rod Domain: FT of acidic residues; intensity against frequency]
Figure B-4a Fourier transforms of the dispositions of acidic residues in rod domain segments of the Xenopus lamin B protein. A selection of peaks is listed in Table B-4. Period is related to frequency by the expression: period = 2048/frequency.

[Figure B-4b panels: Xenopus Lamin B, Segment 1 and Rod Domain: FT of basic residues; intensity against frequency]
Figure B-4b Fourier transforms of the dispositions of basic residues in rod domain segments of the Xenopus lamin B protein. A selection of peaks is listed in Table B-4.
Period is related to frequency by the expression: period = 2048/frequency.

[Table B-5: periods (residues) of peaks for acidic and basic residues in segment 1B, segment 2 and the rod domain of the Helix pomotia B protein]
Table B-5 Peaks in the Fourier transforms of acidic and basic residues in the major coiled-coil segments of the Helix pomotia B protein (as for Table B-1). See Figure B-5.

[Figure B-5a panels: Helix B, Segment 1, Segment 2 and Rod Segment: FT of acidic residues; intensity against frequency]
Figure B-5a Fourier transforms of the dispositions of acidic residues in rod domain segments of the Helix pomotia B protein. A selection of peaks is listed in Table B-5. Period is related to frequency by the expression: period = 2048/frequency.

[Figure B-5b panels: Helix B, Segment 1 and Segment 2: FT of basic residues]