Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere without the permission of the Author.

EXPRESSION, PURIFICATION AND CHARACTERISATION OF RECOMBINANT PEPTIDE:N-GLYCOSIDASE F

A thesis presented in partial fulfilment of the requirements for the degree of Master of Philosophy in Biochemistry at Massey University, New Zealand.

Trevor Stephen Loo
2000

"I have not failed. I've just found 10,000 ways that won't work." Thomas Edison

ABSTRACT

PNGase F (Peptide-N4-(N-acetyl-β-D-glucosaminyl) asparagine amidase F) is an amidohydrolase isolated from the extracellular medium of the Gram-negative bacterium Flavobacterium meningosepticum. The 34.8-kDa enzyme catalyses the complete and intact cleavage of asparagine-linked oligosaccharide chains from their associated proteins. A T7 promoter-based E. coli expression system was developed in which PNGase F was expressed as a fusion protein with a leader sequence from the ompA gene. The hexa-histidine-tagged PNGase F was correctly processed and exported to the E. coli periplasm and had a calculated molecular weight of 36.2 kDa. A single step purification using immobilised metal affinity chromatography yielded 8 mg of pure protein per litre of culture. The sequence of the PNGase F coding region from the CDC strain 3352 of F. meningosepticum was found to differ from a published sequence from another strain of the bacterium (ATCC 33958) in 57 positions. These differences between the two strains result in eight amino acid substitutions, which are mostly conservative in nature and are on the surface of the protein. Moreover, three potential N-glycosylation sites not present in the ATCC strain 33958 were detected in CDC strain 3352.
The recombinant enzyme has characteristics similar to those of the native enzyme, with a pH optimum of 8.5, and is strongly inhibited by Ag+, Cu2+ and Fe3+ ions but not by sulfhydryl-targeting agents such as DTT and NEM. This indicates that inhibition by these ions probably occurs through interactions with a histidine residue at position 193 that may be involved in substrate recognition or catalysis. The specific activity of native PNGase F is about four times that of the recombinant protein, which may be attributed to inhibition by components of the Complete™ protease inhibitor tablets used in the enzyme preparation or to the modifications made for cloning and purification. Using a discontinuous assay with a non-labelled 11-mer ovalbumin-derived glycopeptide as substrate, the Michaelis constant (Km) of recombinant PNGase F was roughly estimated to be 2.1 µM. An intriguing observation in the activity assays was the apparent product inhibition of enzyme activity. The inhibitor may be the peptide and/or the glycan component, and the cause of the inhibition requires further investigation.

ACKNOWLEDGEMENTS

The many staff and students at the Institute of Molecular Biosciences are to be thanked for their help and support. The following people deserve special mention:

My supervisor Dr. Gillian Norris for her invaluable advice, infinite patience and enthusiasm throughout the course of my study. Sincere thanks also to my co-supervisors Dr. Mark Patchett and Dr. Shaun Lott for their advice, encouragement and support.

I would also like to thank my friends in X-lab: Deborah Frumau for her assistance, Julian Adams for his advice and for modelling 3D structures, and F. Y. Chai for his humour and assistance.

Dr. Cristina Weinberg for her encouragement as a caring friend who was forever patient with my questions, kept me company late at night, and made me eat healthily.
Joanne Mudford for her friendship and for giving me a "crash course" about her job before moving to Dunedin.

Carmen Norris and David Elgar from NZDRI for giving me permission to use their HPLC system, Dr. Robert Norris for proof-reading my thesis, Associate Professor Geoff Jamerson for his generosity in equipment purchase and consumables, Associate Professor David Harding and Dick Poll for their valuable advice on chemistry and equipment loans, Dr. Catherine Day and Mrs. Carole Flyger for advice and assistance when I was working on my expression constructs, and Professor Pat Sullivan for his interest in and encouragement of this work.

I should like to thank my long-suffering flatmate Desmond for enduring the chaos and for his unfailing efforts to keep the house habitable during house renovations and my long absences during the writing of this manuscript.

Finally, thanks to my family, who have always encouraged me to pursue my interests regardless of how bizarre they seemed.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
ABBREVIATIONS OF AMINO ACIDS
ABBREVIATIONS OF SUGARS

Chapter 1 Introduction and Literature Review
1.1 Protein Glycosylation
1.1.1 Carbohydrates
1.1.2 Glycoproteins
1.1.2.1 The Nature of Glycoproteins
1.1.2.2 Types of Glycosidic Linkages
1.1.2.2.1 Glycosylphosphatidylinositol (GPI) Anchors
1.1.2.2.2 N-linked Glycosylation
1.1.2.2.3 O-linked Glycosylation
1.1.2.3 Complexity and Diversity of Glycans
1.1.2.4 Potential Roles of Protein Glycosylation
1.2 Deglycosylation
1.2.1 Tools to Study the Roles of Glycans in Glycoprotein Functions
1.2.1.1 Chemical Deglycosylation
1.2.1.2 Enzymatic Deglycosylation
1.2.2 Endo-N-acetyl-β-D-glucosaminidases (ENGases)
1.2.3 Peptide-N4-(N-acetyl-β-D-glucosaminyl) asparagine amidase (PNGases)
1.2.3.1 The in vivo Functions of PNGases
1.2.4 Other PROXIases
1.2.5 Glycosidases from Flavobacterium meningosepticum
1.2.5.1 Endo F1
1.2.5.2 Endo F2 and F3
1.2.5.3 PNGase F
1.2.5.3.1 Properties of PNGase F
1.2.5.3.2 Substrate Structure Requirements of PNGase F
1.2.5.3.3 The Three Dimensional Structure of PNGase F
1.2.5.3.4 The Active Site of PNGase F
1.2.6 Constructs of PNGase F
1.2.6.1 The First Clones of PNGase F
1.2.6.2 PNGase F Cloned as GST Fusions
1.2.6.3 A Clone of PNGase F from F. meningosepticum (CDC strain 3352)
1.2.6.4 Inclusion Bodies Formation in E. coli
1.3 Targeting and Assembly of Proteins in the Bacterial Periplasm
1.3.1 The Bacterial Periplasm
1.3.2 The Signal Peptide
1.3.3 The Sec Export Pathway
1.3.4 Protein Folding in the Periplasm
1.4 The Scope of this Project

Chapter 2 Construction of an Expression Vector for the Production of PNGase F in E. coli
2.1 Introduction
2.2 Experimental Objectives and Strategies
2.3 Materials
2.3.1 Chemicals and Enzymes
2.3.2 Plasmids and Bacterial Strains Used in This Study
2.3.3 Bacterial Growth Media
2.3.3.1 M9 with 0.5% Casamino Acids
2.3.3.2 LB Broth
2.3.3.3 SOB Medium
2.3.4 Storage and Propagation of Bacterial Cultures
2.4 Methods used for Cloning
2.4.1 Phenol:Chloroform Extraction of DNA from a DNA/Protein Mixture
2.4.2 Ethanol Precipitation of DNA
2.4.3 Size-Selective Polyethylene Glycol (PEG) Precipitation of DNA
2.4.4 Agarose Gel Electrophoresis
2.4.5 Quantitation and Size Determination of DNA Fragments
2.4.6 Amplification of PNGase F gene by PCR
2.4.7 Digestion of DNA with Restriction Endonucleases
2.4.8 Purification of DNA from Agarose Gels
2.4.9 Ligation of DNA Fragments
2.4.10 Preparation of Competent E. coli Cells
2.4.11 Transformation of Ligated Plasmids into Competent E. coli Cells
2.4.12 Small-Scale Preparation of Plasmid DNA
2.4.13 Sequence Analysis of DNA
2.5 Results
2.5.1 Amplification of PNGase F Gene from pT7-PNG by PCR
2.5.2 Restriction Enzyme Digest of the PNGase F DNA
2.5.3 Ligation of the PNGase F Gene into the Vector
2.5.4 Transformation of Competent E. coli Cells with Ligation Products
2.6 Discussion and Conclusions

Chapter 3 Expression and Purification of PNGase F
3.1 Experimental Objectives
3.2 Materials
3.3 Methods used for Expression and Analysis of PNGase F
3.3.1 General Methods
3.3.1.1 Determination of Protein Concentration
3.3.1.1.1 Alkaline Copper (Lowry) Protein Assay
3.3.1.1.2 Bicinchoninic Acid (BCA) Protein Assay
3.3.1.1.3 Coomassie Blue (Bradford) Protein Assay
3.3.1.1.4 UV Method
3.3.1.2 Calculation of PNGase F Extinction Coefficient
3.3.1.3 Determination of PNGase F Activity with a Reverse Phase High Performance Liquid Chromatography (RP-HPLC) Assay
3.3.1.3.1 Mechanism of Assay
3.3.1.3.2 Preparation of Samples
3.3.1.3.3 Determination of Specific Activity of PNGase F
3.3.1.4 Polyacrylamide Gel Electrophoresis (PAGE)
3.3.1.5 Electroblotting of Proteins from Acrylamide Gels
3.3.1.6 N-Terminal Sequencing
3.3.1.7 Isoelectric Focusing (IEF) of Native and Recombinant PNGase F
3.3.1.8 Electrospray Ionisation Mass Spectrometry
3.3.1.9 Cell Growth and Protein Production Studies
3.3.2 Expression of PNGase F
3.3.2.1 Growth and Induction of F. meningosepticum Culture
3.3.2.2 Growth and Induction of E. coli Culture
3.3.2.2.1 Preparation of Periplasmic Fraction from E. coli
3.3.3 Purification of PNGase F
3.3.4 Chromatographic Methods
3.3.4.1 Hydrophobic Interaction Chromatography (HIC)
3.3.4.1.1 Hydrophobic Interaction Chromatography with Phenyl Sepharose 6 Fast-Flow (Low Substitution)
3.3.4.1.2 HIC with t-butyl-TSK
3.3.4.1.3 HIC Tests with Various Hydrophobic Matrices
A. Pharmacia HiTrap™ Test Kit
B. Macro-Prep t-butyl HIC Econo-Pac Cartridges
C. TSK-butyl-Toyopearl 650M
D. Alkyl Sepharose HR 10/10
3.3.4.2 Ion Exchange Chromatography (IEX)
3.3.4.2.1 IEX Test with various Ion Exchangers
A. Mono-Q HR5/5 & Uno-Q
B. Uno-S Polishing Column
C. Hydroxyapatite
3.3.4.3 Size Exclusion Chromatography (SEC) with Superdex 75
3.3.4.4 Purification of Recombinant PNGase F using Immobilised Metal Affinity Chromatography (IMAC)
3.3.4.5 Size Exclusion Chromatography with Superdex 75
3.4 Results and Discussion
3.4.1 Verification of the Extinction Coefficient of Recombinant PNGase F
3.4.2 Comparison of Protein Quantitation Methods
3.4.3 Growth and Protein Production Studies of Transformed E. coli Cells
3.4.4 Purification of Native PNGase F
3.4.4.1 HIC with Phenyl Sepharose 6 Fast-Flow (Low Substitution)
3.4.4.2 HIC with Home-Made t-butyl Substituted TSK
3.4.4.3 HIC Tests with various Hydrophobic Matrices
3.4.4.3.1 Phenyl Sepharose 6 Fast-Flow (High Substitution)
3.4.4.3.2 Butyl Sepharose 4 Fast-Flow
3.4.4.3.3 Macro-Prep t-butyl Econo-Pac
3.4.4.3.4 TSK-butyl-Toyopearl 650M
3.4.4.3.5 Alkyl Sepharose
3.4.4.4 Ion Exchange Tests with various Ion Exchangers
3.4.4.4.1 Anion Exchange Chromatography
A. Mono-Q at pH 8.9
B. Mono-Q at pH 9.8
3.4.4.4.2 Uno-Q
A. Uno-Q at pH 8.9
B. Uno-Q at pH 9.8
3.4.4.4.3 Cation Exchange Chromatography: Uno-S at pH 7.2
3.4.4.4.4 Hydroxyapatite Column
3.4.4.5 HIC with Newly Synthesised t-butyl TSK
3.4.4.6 SEC with Superdex 75
A. Superdex 75 HR 16/60 Prep-Grade Resin
B. Superdex 75 HR 10/30 Super Fine Resin
C. Superdex 75 HR 10/30 Super Fine Resin in the presence of Thesit
3.4.5 Purification of Recombinant PNGase F
3.4.5.1 IMAC Test of various Chelating Resins
A. Boehringer Mannheim Poly-His Purification Resin Chelated with Zn2+
B. Boehringer Mannheim Poly-His Purification Resin Chelated with Ni2+
C. Pharmacia HiTrap™ Chelating Resin Chelated with Zn2+
D. Pharmacia HiTrap™ Chelating Resin Chelated with Ni2+
E. Elution Profile of Recombinant PNGase F from Ni2+ Charged HiTrap™ Chelating Resin
3.4.5.2 SEC of Recombinant Enzyme with Superdex 75 in the presence of 0.006% Thesit
3.5 Conclusion

Chapter 4 Characterisation of Recombinant PNGase F
4.1 Introduction
4.2 Experimental Objectives
4.3 Materials
4.4 Methods used in Characterisation
4.4.1 Comparison of Substrate Specificity
4.4.1.1 Detection of Deglycosylated Substrate with Digoxigenin (DIG) Glycan/Protein Double Labelling kit
4.4.1.1.1 Principle of Detection
4.4.2 Characterisation of Recombinant PNGase F
4.4.2.1 Effects of Temperature
4.4.2.2 Effects of pH
4.4.2.3 Effects of Metal ions and Additives
4.4.2.4 Shelf-Life
4.4.3 Determination of the Michaelis Constant (Km) of Native and Recombinant PNGase F
4.5 Results and Discussion
4.5.1 Comparison of Substrate Specificity
4.5.2 Characterisation of Recombinant PNGase F
4.5.2.1 Effects of Temperature
4.5.2.2 Effects of pH
4.5.2.3 Effects of Metal Ions and Additives
4.5.2.5 Shelf-Life of Recombinant PNGase F
4.5.3 Determination of the Michaelis Constant (Km) of Native and Recombinant PNGase F
4.5.3.1 Experimental Strategy
4.5.3.2 Results
4.5.3.2 Shortcomings of the Discontinuous Activity Assay
4.5.3.2.1 Limited Availability of the Substrate
4.5.3.2.2 Low Detection Sensitivity of the Assay
4.5.3.2.3 Other Difficulties of the Assay
4.5.4 Alternative Methods to Measure Activity
4.5.4.1 Oxidation of NADPH by L-Glutamic Dehydrogenase
4.5.4.1.1 Principle of Detection
4.5.4.1.2 Sample Preparation
4.5.4.1.3 Results
4.5.4.2 Real Time Measurement of Km using Surface Plasmon Resonance (SPR) Technology
4.5.4.2.1 Principle of Detection
4.5.4.2.2 Application of SPR to the Km Determination of PNGase F
4.5.4.3 Kinetic Analysis of Ribonuclease B Products by ES-MS
4.5.4.4 Pre-Kinetic Analysis of PNGase F Deglycosylation Reaction by ES-MS
4.6 Conclusions

Chapter 5 Final Conclusion

REFERENCES
APPENDIX

LIST OF FIGURES

Figure 1.1 The different configurations of monosaccharides
Figure 1.2 Condensation reaction between two monosaccharides
Figure 1.3 The β-aspartylglucosylamine link of a glycosylated asparagine
Figure 1.4 Complex type N-linked glycan
Figure 1.5 High mannose type N-linked glycan
Figure 1.6 Hybrid type N-linked glycan
Figure 1.7 The glycosidic bond cleaved by ENGases
Figure 1.8 The two step cleavage reaction catalysed by PNGases
Figure 1.9 Topology of the PNGase F molecule
Figure 1.10 The PNGase F structure
Figure 1.11 The orientation of N,N'-diacetylchitobiose (CTB) inside the active site of PNGase F
Figure 1.12 Schematic diagram showing the intermolecular hydrogen bonding contacts between PNGase F, N,N'-diacetylchitobiose, and water molecules
Figure 1.13 Structure of PNGase F signal sequence
Figure 1.14 A model for protein translocation across the inner membrane
Figure 1.15 Model of disulfide formation catalysed by the various Dsb enzymes in the periplasm of E. coli
Figure 2.1 Map of pKS-OmpA3-His
Figure 2.2 An inverse image of a 12% SDS-PAGE with silver staining shows the solubility problem of recombinant PNGase F expressed in E. coli
Figure 2.3 Cloning strategy for PNGase F
Figure 2.4 Synthetic oligonucleotides used for PCR amplification
Figure 2.5 PNGase F amplified from pT7-PNG using PCR
Figure 2.6 Double digested vector and PNGase F DNA
Figure 2.7 Schematic diagram showing predicted PCR products using various primer pairs on a successfully ligated plasmid
Figure 2.8 PCR confirmation of successful ligation
Figure 2.9 Plasmid miniprep of XL1-Blue transformed with pOPH6
Figure 2.10 Possible nucleotide misincorporation by Taq polymerase during the amplification of CDC strain 3352 PCR product
Figure 3.1 PNGase F cleaves the amide bond in the ovalbumin glycopeptide
Figure 3.2 Formation of the homoserine lactone (Hsl) ring
Figure 3.3 Chromatogram of a typical deglycosylation activity assay
Figure 3.4 A summary of the purification scheme for PNGase F
Figure 3.5 Growth curve of E. coli transformed with pOPH6
Figure 3.6 A schematic diagram of the E. coli BL21(DE3) cell showing IPTG induction of PNGase F synthesis and subsequent secretion of the nascent polypeptide into the periplasm
Figure 3.7 PNGase F production in induced cells at different time points
Figure 3.8 An inverse image of a 12% SDS-PAGE with silver staining shows the protein solubility problem of pKS-PNG is resolved by secretion in pKS-OPH6
Figure 3.9 Standard curve used to estimate the molecular weight of recombinant PNGase F
Figure 3.10 Chromatogram of the elution from Phenyl Sepharose 6 Fast-Flow (low substitution) of F. meningosepticum culture medium
Figure 3.11 SDS-PAGE of the PNGase F fractions from Phenyl Sepharose 6 Fast-Flow (low substitution)
Figure 3.12 Chromatogram of PNGase F on t-butyl substituted TSK
Figure 3.13 SDS-PAGE of fractions from t-butyl substituted TSK
Figure 3.14 Chromatogram of elution of PNGase F from Phenyl Sepharose 6 Fast-Flow (high substitution)
Figure 3.15 SDS-PAGE of fractions from Phenyl Sepharose 6 Fast-Flow (high substitution)
Figure 3.16 Chromatogram of elution from Butyl Sepharose 4 Fast-Flow
Figure 3.17 SDS-PAGE analysis of active fractions from Butyl Sepharose 4 Fast-Flow
Figure 3.18 Chromatogram of elution from t-butyl Econo-Pac
Figure 3.19 SDS-PAGE of active fractions from Bio-Rad t-butyl Econo-Pac
Figure 3.20 Chromatogram of elution from TSK-butyl-Toyopearl
Figure 3.21 SDS-PAGE analysis of active fractions from TSK-butyl-Toyopearl
Figure 3.22 Chromatogram of elution of PNGase F from Alkyl Sepharose
Figure 3.23 SDS-PAGE analysis of fractions from Alkyl Sepharose
Figure 3.24 SDS-PAGE analysis of native and recombinant PNGase F used for IEF
Figure 3.25 Isoelectric focusing of native and recombinant PNGase F
Figure 3.26 Chromatogram of elution from Mono-Q at pH 8.9
Figure 3.27 SDS-PAGE of active fractions from Mono-Q at pH 8.9
Figure 3.28 Chromatogram of elution from Mono-Q at pH 9.8
Figure 3.29 SDS-PAGE of active fractions from Mono-Q at pH 9.8
Figure 3.30 Chromatogram of elution from Uno-Q at pH 8.9
Figure 3.31 SDS-PAGE analysis of fractions from Uno-Q at pH 8.9
Figure 3.32 Chromatogram of elution from Uno-Q at pH 9.8
Figure 3.33 SDS-PAGE analysis of the fractions from Uno-Q at pH 9.8
Figure 3.34 Chromatogram of elution from Uno-S at pH 7.2
Figure 3.35 SDS-PAGE analysis of fractions from Uno-S at pH 7.2
Figure 3.36 Chromatogram of elution from hydroxyapatite
Figure 3.37 SDS-PAGE analysis of fractions from hydroxyapatite
Figure 3.38 Charged groups distribution on the surface of native PNGase F
Figure 3.39 Chromatogram of elution from new t-butyl substituted TSK
Figure 3.40 SDS-PAGE analysis of fractions from new t-butyl TSK
Figure 3.41 Chromatogram of elution from Superdex 75 HR 16/60
Figure 3.42 SDS-PAGE analysis of fractions from Superdex 75 HR 16/60
Figure 3.43 Chromatogram of elution from Superdex 75 HR 10/30
Figure 3.44 SDS-PAGE analysis of fractions from Superdex 75 HR 10/30
Figure 3.45 Chromatogram of elution from Superdex 75 HR 10/30 with Thesit
Figure 3.46 An inverse image of a silver stained SDS-PAGE of fractions from Superdex 75 HR 10/30 with Thesit
Figure 3.47 SDS-PAGE of fractions from Zn2+ charged Poly-His resin
Figure 3.48 SDS-PAGE of fractions from Ni2+ charged Poly-His resin
Figure 3.49 SDS-PAGE of fractions from Zn2+ charged HiTrap™ chelating resin
Figure 3.50 An inverse image of a silver stained SDS-PAGE of fractions from Ni2+ charged HiTrap™ chelating resin
Figure 3.51 Elution profile from Ni2+ charged Pharmacia chelating resin
Figure 3.52 Chromatogram of elution from Superdex 75 HR 10/30
Figure 3.53 An inverse image of a SDS-PAGE stained with silver staining method shows the purity of recombinant PNGase F from Superdex 75 HR 10/30
Figure 3.54 The reciprocal relationship of substrate depletion and product formation in the activity assay
Figure 3.55 The linear relationship of product area and substrate concentration in the activity assay
Figure 4.1 Schematic representation of the detection principle of DIG glycan/protein double labelling kit
Figure 4.2 Lineweaver-Burk plot with error bars of ± 0.05 V
Figure 4.3 Hanes-Woolf plot with error bars of ± 0.05 V
Figure 4.4 A 12% SDS-PAGE of the deglycosylated products of fetuin by recombinant PNGase F under different buffering conditions after 48 hours of incubation at 37°C
Figure 4.5 The deglycosylated products of different glycoproteins used in DIG glycan/protein double labelling kit
Figure 4.6 The detection of glycans and proteins in deglycosylated products of various glycoproteins using DIG glycan/protein double labelling kit
Figure 4.7 The comparison of glycoprotein deglycosylation by native and recombinant PNGase F
Figure 4.8 Temperature profile of recombinant PNGase F
Figure 4.9 pH profile of recombinant PNGase F
Figure 4.10 Structures of Caps and Capso buffers
Figure 4.11 The effects of metal ions on recombinant PNGase F activity
Figure 4.12 The effects of additives on recombinant PNGase F activity
Figure 4.13 Initial estimate of the Km of recombinant PNGase F with ovalbumin-derived glycopeptide in small reaction volume
Figure 4.14 Km determination of dialysed recombinant PNGase F in a large reaction volume
Figure 4.15 The curve generated for calculation of Km by the program Enzfitter
Figure 4.16 A continuous colorimetric assay for measuring PNGase F activity through the oxidation of NADPH by L-glutamic dehydrogenase (L-GLDH)
Figure 4.17 Chromatogram from the L-GLDH coupling assay
Figure 4.18 Detection of biomolecular binding events by SPR
Figure 4.19 Understanding the sensorgram
Figure 4.20 The sensor chip NTA
Figure 4.21 The spectrum for ribonuclease B product quantitation by ES-MS
Figure 4.22 The spectrum for ovalbumin glycopeptide product quantitation by ES-MS
Figure 4.23 Structures of Capso, glycerol, and isopropanol

LIST OF TABLES

Table 1.1 Some consequences of genetic defects or polymorphisms in glycosylation
Table 1.2 Some biological roles of oligosaccharides
Table 1.3 The occurrence of some endo-N-acetyl-β-D-glucosaminidases (ENGases)
Table 1.4 The occurrence of some peptide-N4-(N-acetyl-β-D-glucosaminyl) asparagine amidases (PNGases)
Table 1.5 Characteristics of known purified PNGases
Table 1.6 The occurrence of other PROXIases
Table 2.1 Colony count from transformation of pOPH6 ligation mix into E. coli XL1-Blue cells
Table 2.2 Amino acid variation between ATCC strain 33958 and CDC strain 3352
Table 3.1 Chromatographic supports used in purification
Table 3.2 PNGase activity assay gradient programme
Table 3.3 The nine glycoforms of hen egg white ovalbumin
Table 3.4 Summary of purification of native and recombinant PNGase F
Table 4.1 List of buffers used in the activity test
Table 4.2 Protease inhibitor set (Roche) was suspected to have components similar to those of the Complete™ protease inhibitor tablets
Table 4.3 The effects of long-term storage on recombinant PNGase F activity

ABBREVIATIONS

amu               Atomic mass units
Amp               Ampicillin
ATCC 33958        American Type Culture Collection No. 33958; sequence from this strain was published by Tarentino et al., 1990.
AUFS              Absorbance units at full scale
BSA               Bovine serum albumin
BCA               Bicinchoninic acid
Caps              3-Cyclohexylamino-1-propanesulfonic acid
Capso             3-Cyclohexylamino-2-hydroxy-1-propanesulfonic acid
CDC strain 3352   United States Communicable Disease Centre culture collection strain 3352; the strain used in this study.
CDI               1,1'-carbonyldiimidazole
cfu               Colony forming units
CIAP              Calf intestinal alkaline phosphatase
CNBr              Cyanogen bromide
CTB               N,N'-diacetylchitobiose
Dabsyl            4-(dimethylamino)-azobenzene-4'-sulfonyl
DIG               Digoxigenin
DNA               Deoxyribose nucleic acid
dNTPs             Deoxyribose nucleotide triphosphates
DMSO              Dimethyl sulphoxide
DTT               Dithiothreitol
EDTA              Ethylenediamine tetra-acetic acid (di-sodium salt)
EGCase            Endoglycoceramidase
ENGase            Endo-N-acetyl-β-D-glucosaminidase or endoglycosidase
EPPS              N-[2-Hydroxyethyl]piperazine-N'-[3-propanesulfonic acid]
ER                Endoplasmic reticulum
ES-MS             Electrospray mass spectrometry
EtBr              Ethidium bromide
FPLC              Fast protein liquid chromatography
L-GLDH            L-Glutamic dehydrogenase
GdmHCl            Guanidine hydrochloride
GPI               Glycosylphosphatidylinositol
HCl               Hydrochloric acid
IMAC              Immobilised metal ion affinity chromatography
IPTG              Isopropyl-1-thio-β-D-galactopyranoside
kb                kilo base pairs
kDa               kilo daltons
α-KG              α-Ketoglutaric acid
LB                Luria broth
MW                Molecular weight
MWCO              Molecular weight cut-off
Mes               Morpholinoethane sulfonic acid
MOPS              3-[N-Morpholino]propanesulfonic acid
NaAc              Sodium acetate buffer
NaCl              Sodium chloride
NADPH             α-Nicotinamide adenine dinucleotide phosphate (reduced form)
NEM               N-Ethylmaleimide
NGase             β-aspartyl-N-acetylglucosamine hydrolase
(NH4)2SO4         Ammonium sulphate
NMR               Nuclear magnetic resonance
O-GlcNAcase       Cytoplasmic β-GlcNAcase
PAGE              Polyacrylamide gel electrophoresis
PCR               Polymerase chain reaction
PEG               Polyethylene glycol
PI                Phosphatidyl inositol
PMF               Proton motive force
PMSF              Phenylmethylsulfonyl fluoride
PNGase            Peptide-N4-(N-acetyl-β-D-glucosaminyl) asparagine amidase F
POGase            Peptide-O-glycanase
psi               Pounds per square inch
PVDF              Polyvinylidene difluoride
RP-HPLC           Reverse phase high performance liquid chromatography
rpm               Revolutions per minute
SDS               Sodium dodecyl sulfate
SPR               Surface plasmon resonance
Taps              N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid
TEMED             N,N,N',N'-tetramethylethylenediamine
Tris              Tris(hydroxymethyl)-aminomethane
TFA               Trifluoroacetic acid
Thesit            Polyoxyethylene 9-laurylether
TSK               Chromatography matrix copolymer of oligoethylene glycol, glycidylmethacrylate and pentaerythritol-dimethacrylate
UV                Ultra violet
X-Gal             5-bromo-4-chloro-3-indolyl-β-D-galactoside

Amino Acid Abbreviations

Amino acid                    Three-letter  One-letter  MW    Side chain structure
                              symbol        symbol
Alanine                       Ala           A           89    -CH3
Arginine                      Arg           R           174   -(CH2)3-NH-C(=NH)-NH2
Asparagine                    Asn           N           132   -CH2-C(=O)-NH2
Aspartic acid                 Asp           D           133   -CH2-COOH
Asparagine or aspartic acid   Asx           B
Cysteine                      Cys           C           121   -CH2-SH
Glutamic acid                 Glu           E           147   -(CH2)2-COOH
Glutamine                     Gln           Q           146   -(CH2)2-C(=O)-NH2
Glutamine or glutamic acid    Glx           Z
Glycine                       Gly           G           75    -H
Histidine                     His           H           154   -CH2-(imidazole ring)
Homoserine                    Hs            Hs          119   -CH2-CH2-OH
Homoserine lactone            Hsl           Hsl         101   (lactone ring)
Isoleucine                    Ile           I           131   -CH(CH3)-CH2-CH3
Leucine                       Leu           L           131   -CH2-CH(CH3)2
Lysine                        Lys           K           146   -(CH2)4-NH2
Methionine                    Met           M           149   -(CH2)2-S-CH3
Phenylalanine                 Phe           F           165   -CH2-(phenyl ring)
Proline                       Pro           P           115   (pyrrolidine ring that includes the α-amino nitrogen)
Serine                        Ser           S           105   -CH2-OH
Threonine                     Thr           T           119   -CH(OH)-CH3
Tryptophan                    Trp           W           204   -CH2-(indole ring)
Tyrosine                      Tyr           Y           181   -CH2-(phenyl ring)-OH
Valine                        Val           V           117   -CH(CH3)2

Sugar Abbreviations

Sugar                                 Three-letter symbol
Fucose                                Fuc
Galactose                             Gal
Mannose                               Man
N-acetylgalactosamine                 GalNAc
N-acetylglucosamine                   GlcNAc
N-acetylneuraminic (sialic) acid      NeuNAc

Note: Sugar linkages are described using conventional carbon ring numbers connected by a dash, and anomericity is denoted by α or β. For example, galactose β1-4 linked to N-acetylglucosamine is written as Galβ1-4GlcNAc.

Chapter 1 Introduction and Literature Review

This chapter provides a brief overview of protein glycosylation and of the enzymes that remove glycans from glycoproteins. The diversity of glycans, the manner in which they are linked to proteins, and their functional roles in plants and animals are reviewed.
The properties and possible functions of the glycan-removing enzymes are discussed, with the main focus on PNGase F.

1.1 Protein Glycosylation

1.1.1 Carbohydrates

Carbohydrates are the most abundant biomolecules on earth. Simple carbohydrates are polyhydroxy aldehydes or ketones, while complex carbohydrates are substances that yield these compounds on hydrolysis. Certain carbohydrates such as sugar and starch are a dietary staple in most parts of the world, and the oxidation of carbohydrates (glycolysis of D-glucose) is the central energy-yielding pathway in most non-photosynthetic cells. Insoluble carbohydrate polymers feature as structural and protective elements in the cell walls of bacteria (glycosaminoglycans) and plants (cellulose), and in the connective tissues and cell coats of animals (proteoglycans). Other carbohydrate polymers lubricate skeletal joints (hyaluronates) and provide adhesion between cells. Complex carbohydrate polymers are found covalently linked to proteins or lipids, where they serve as signals that determine the intracellular transport or the metabolic fate of these glycoconjugates.

There are three major classes of carbohydrates: monosaccharides, oligosaccharides, and polysaccharides.

(1) Monosaccharides consist of a single polyhydroxy aldehyde or ketone unit, the most abundant monosaccharide in nature being D-glucose. Each unit may take up either anomeric configuration (α or β), exist as a D or L isomer, and exist in a furanose or pyranose form, resulting in eight possible configurations (figure 1.1).

[Figure 1.1: structure diagrams of α-D-fructofuranose, β-L-fructofuranose, β-D-fructofuranose and β-L-fructopyranose]

Figure 1.1 The different configurations of monosaccharides.
α-D-Fructofuranose, fructose in furanose form, is a ketone (C2→C5) in the anomeric α-configuration (CH2OH group on C1 is above the plane of the ring), whereas β-L-fructofuranose is a stereoisomer of β-D-fructofuranose. β-L-Fructopyranose, fructose in pyranose form, is an aldehyde (C1→C5).

(2) Oligosaccharides consist of a number of monosaccharide units linked through a condensation reaction between two hydroxyl groups from two sugar units to form a glycosidic bond (figure 1.2). They are generally short chains of two to ten monosaccharides. When two different sugar units are linked together, there are four distinct isomeric links that one hydroxyl group of one sugar unit can form, with the hydroxyl group at the C2, C3, C4 or C6 position of the other monosaccharide. Taking the eight possible configurations of each monosaccharide unit into account, thirty-two possible configurations may be formed in the linkage of two monosaccharide units. The number of possible permutations and combinations of monosaccharide types and glycosidic linkages is therefore enormous.

[Figure 1.2: reaction scheme showing α-D-glucose and β-D-glucose condensing to form maltose, with hydrolysis as the reverse reaction]

Figure 1.2 Condensation reaction between two monosaccharides. The hydroxyl group of β-D-glucose condenses with the hemiacetal of α-D-glucose, resulting in the elimination of H2O and formation of the glycosidic bond. The reverse reaction is hydrolysis, or attack by H2O on the glycosidic bond. Adapted from Lehninger et al. (1993).

(3) Polysaccharides consist of long chains having hundreds to thousands of monosaccharide units. Some polysaccharides such as cellulose occur in linear chains, whereas others such as glycogen have branched chains.

1.1.2 Glycoproteins

1.1.2.1 The Nature of Glycoproteins

Glycosylation, where proteins possess a covalently linked oligosaccharide moiety, is the most widespread form of post-translational modification of proteins.
These oligosaccharides are known to encode a large amount of structural and biochemical information. Genetic defects or polymorphisms in glycosylation are uncommon in higher animals because most are fatal. However, some cause changes in the glycan primary structures, with the consequences shown in table 1.1 (Varki, 1993).

Table 1.1 Some consequences of genetic defects or polymorphisms in glycosylation

Genetic defect/variation | Defect in glycosylation | Biological consequence(s)
Partial deficiency of xylosylprotein 4-β-galactosyltransferase | Decreased production of glycosaminoglycan chains with core Galβ1→4Xyl linkage. | Progeroid syndrome with delayed mental development, and multiple connective tissue abnormalities.
Hereditary opsonic defect | Point mutation in serum mannose-binding protein. | Heterozygous state causes low opsonisation of pathogens. Increased infections in childhood.
Haemophilia A variant | Point mutation creates a new N-linked glycosylation site. | Decreased function of Factor VIII, leading to bleeding disorder.
Deficiency of UDP-Gal:3-α-galactosyltransferase | Marked decrease of Galα1→3Galβ1→4GlcNAc sequences terminating glycoprotein and glycolipid oligosaccharides. | No obvious abnormality results. All humans have a natural antibody (up to 1% of circulating IgG) against Galα1→3Galβ1→4GlcNAc sequences.

1.1.2.2 Types of Glycosidic Linkages

The cellular mechanisms for carrying out glycosylation are available to all proteins that carry the appropriate signals in the polypeptide chain. Firstly, the polypeptide must contain a signal for entry into the secretory pathway so that, as it leaves the ribosome, the amino-terminus translocates into the lumen of the endoplasmic reticulum (ER), where it first encounters the glycosylation machinery.
One or more sites on a given protein may be glycosylated in the ER and the Golgi apparatus with a large population of structurally related oligosaccharides. These oligosaccharides are produced by the sequential addition of monosaccharide units to a core structure and are classified by the way in which they are bound to the protein. The three main classes of covalent glycosidic linkages to proteins are glycosylphosphatidylinositol (GPI) anchors, O-linked glycans, and N-linked glycans.

1.1.2.2.1 Glycosylphosphatidylinositol (GPI) Anchors

GPI anchors serve to link some cell surface proteins to the lipid bilayer of membranes. The post-translational attachment of a GPI anchor to a fully folded protein occurs in the ER and involves the transfer of a pre-assembled precursor onto a specific site on the protein through an ethanolamine phosphate linkage. GPI anchors are attached selectively to those proteins that contain a GPI signal sequence at the carboxy-terminus. This sequence is cleaved and replaced by a pre-assembled GPI anchor precursor, which may subsequently be modified. In the fully formed anchor, the glycan contains a conserved backbone sequence (Manα1-2Manα1-6Manα1-4GlcNH2) which is linked to the 6-position of the myo-inositol ring of phosphatidylinositol (PI). All GPI anchors contain two lipids, normally one acyl and one alkyl, attached to the glycan through the phosphate and glycerol on the inositol ring.

1.1.2.2.2 N-linked Glycosylation

N-glycosylation is a co-translational modification available to, but not necessarily used by, all proteins that contain the sequon Asn-X-Ser/Thr (where X is any amino acid except Asp and Pro). The oligosaccharides are N-linked to glycoproteins through an amide bond between the asparagine residue and a GlcNAc moiety at the reducing end of the oligosaccharide, commonly known as the β-aspartylglucosylamine link (figure 1.3).
[Figure 1.3: structure diagram showing the amide bond between the asparagine side chain and the GlcNAc residue]

Figure 1.3 The β-aspartylglucosylamine link of a glycosylated asparagine.

N-glycosylation begins with the biosynthesis of a lipid-linked dolichol-oligosaccharide precursor (Glc3Man9GlcNAc2-PP-Dol) that is transferred onto the asparagine of the sequon Asn-X-Ser/Thr in the nascent polypeptide chain. The transfer and the subsequent trimming and elongation reactions are catalysed by glucosidases, mannosidases and a number of membrane-bound oligosaccharyl transferases in the rough endoplasmic reticulum (RER) and the Golgi apparatus, producing a large population of glycans at each glycosylation site.

The amino acids that surround the glycosylation site, and those within the sequon itself, have been shown to influence the efficiency of core glycosylation. Studies with a series of peptides have shown that Asn-X-Thr sequons are glycosylated more readily than Asn-X-Ser sequons (Kaplan et al., 1987) and, interestingly, peptides with Asn-X-Cys sequons were reported to be glycosylated, albeit at low levels (Shakin-Eshleman, 1996). Site-directed mutagenesis studies have shown that the Y amino acid in the Asn-Leu-Ser/Thr-Y sequon is another important determinant of glycosylation efficiency (Mellquist et al., 1998). Asn-Leu-Thr-Y sequons are more efficient sites for glycosylation than Asn-Leu-Ser-Y sequons, and the efficiency decreases with aromatic residues, negatively charged residues or proline in the Y position. The identity of the X amino acid in the sequon was also shown to alter the efficiency of glycosylation. Negatively charged amino acids, aromatic amino acids, or proline in the X position appear to inhibit glycosylation, whereas glycosylation is favoured when small or positively charged amino acids occupy the X position of the sequon (Shakin-Eshleman, 1996).
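The sequon rule described above lends itself to a short illustration. The following Python sketch scans a protein sequence for potential N-glycosylation sites, assuming the definition used in this chapter (Asn-X-Ser/Thr, where X is any residue except Asp and Pro). The function name `find_sequons` and the example peptide are hypothetical and serve only as an illustration. A match marks a potential site only, since, as noted above, not every sequon is glycosylated and the flanking residues affect efficiency.

```python
import re

# Asn-X-Ser/Thr, where X is any residue except Asp (D) or Pro (P), following
# the definition used in this chapter. The lookahead means overlapping
# sequons (e.g. in "NNSS") are each reported.
SEQUON = re.compile(r"N(?=[^DP][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each potential site."""
    sequence = sequence.upper()
    return [(m.start() + 1, sequence[m.start():m.start() + 3])
            for m in SEQUON.finditer(sequence)]


# Hypothetical peptide, used only for illustration.
peptide = "MKNLTAANPSGGNDTQNNSSNVC"
for position, tripeptide in find_sequons(peptide):
    print(position, tripeptide)
```

In this example the Asn-Pro-Ser and Asn-Asp-Thr tripeptides are skipped because of the residue in the X position, and Asn-Val-Cys is skipped because only Ser or Thr is accepted in the third position.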
Mature N-linked oligosaccharides all have a common trimannosyl core, because they arise from the same biosynthetic precursor, Glc3Man9(GlcNAc)2. They fall into three general categories: complex, high-mannose, and hybrid glycans.

(1) Complex oligosaccharides have ~-lactosamine substituted onto the core mannose termini and have the greatest structural variation (figure 1.4). The trimannosyl core can be linked to up to five oligosaccharide chains and extended by the addition of a wide variety of sugars such as fucose, xylose, galactose and sialic acid. Further structural variation comes from the addition of fucose residues at the C3 or C6 positions of the αGlcNAc proximal to the asparagine residue.

[Figure 1.4: structure diagram of a complex type N-linked glycan, with sialylated antennae attached to the trimannosyl core and the core linked to Asn through Manβ1-4GlcNAcβ1-4GlcNAcβ1-Asn]

Figure 1.4 Complex type N-linked glycan. Modified from Mort and Pierce (1995).

(2) High-mannose type glycans are composed entirely of mannose, apart from the two GlcNAc residues of the trimannosyl core (figure 1.5).

[Figure 1.5: structure diagram of a high-mannose type N-linked glycan, with Manα1-2Manα1 branches attached to the trimannosyl core]

Figure 1.5 High mannose type N-linked glycan. Adapted from Mort and Pierce (1995).

(3) Hybrid structures have features of both complex and high-mannose type oligosaccharides, and most contain a bisecting GlcNAc β1-4 linked to the β-mannose residue (figure 1.6).

[Figure 1.6: structure diagram of a hybrid type N-linked glycan, with mannose branches on one arm of the trimannosyl core and a sialylated antenna on the other]

Figure 1.6 Hybrid type N-linked glycan. Adapted from Mort and Pierce (1995).
1.1.2.2.3 O-linked Glycosylation

In contrast to N-glycosylation and the addition of GPI anchors, O-glycosylation does not begin with the addition of a precursor to the polypeptide chain, but of a single monosaccharide, usually N-acetylgalactosamine (GalNAc) but sometimes N-acetylglucosamine (GlcNAc) or xylose. This is transferred to the side chain oxygen of a serine, threonine, hydroxyproline, or hydroxylysine in both fully folded proteins and nascent polypeptide chains. The link between an α-N-acetylgalactosamine and a serine or threonine residue (αGalNAc-Ser/Thr) is known as a mucin type linkage in mammals. O-linked glycans can be either linear, formed by extension of the oligosaccharide chain through a β1,3-linked galactose to the αGalNAc (Galβ1-3GalNAc-Ser/Thr), or branched, where additional extensions are made from the GalNAcα-Ser/Thr core through a β1,6-link between a GlcNAc and the αGalNAc. Single monosaccharide units such as GalNAc or GlcNAc are found O-linked to a growing number of proteins, most commonly nucleoplasmic and cytoplasmic proteins. There appears to be no defined sequon requirement for attachment (Haltiwanger et al., 1991).

1.1.2.3 Complexity and Diversity of Glycans

The synthesis of the polypeptide chain of a glycoprotein is under direct genetic control. In contrast, oligosaccharides are attached to the protein and then processed by a series of enzymes. While these enzymes are themselves the products of a genetic template, their activities are governed by the availability of substrates, conditions in the cell and competition from other enzymes. Consequently, a single glycosylated polypeptide normally emerges from the biosynthetic pathway as a population of proteins that have the same polypeptide chain but different oligosaccharide chains or glycans, known as glycoforms.
Because the same glycosylation machinery is available to all proteins that enter the secretory pathway in a given cell, most glycoproteins emerge with characteristic glycosylation patterns but heterogeneous populations of glycans at each glycosylation site. This suggests that the proteins themselves direct the processing of their own glycan chains within the constraints imposed by a given array of enzymes and sugar nucleotides. Within the ER all glycoproteins contain the same limited range of oligomannose sugars, and it is later, within the Golgi, that the extensive heterogeneity develops. The factors that control the composition of the glycoform populations, and the role that heterogeneity plays in the function of glycoproteins, are important questions that are not yet fully understood.

Different combinations of stereochemical configuration and monosaccharide composition lead to slight differences in physicochemical properties and hence the potential for diversity in function. This contrasts with the linear linking of amino acids or nucleic acids. The types of variation possible are summarised below:

• Glycosylation is cell, tissue and species specific
• Glycoproteins from the same cell may contain different oligosaccharide chains
• Individual polypeptides may have identical glycosylation patterns at a particular glycosylation site
• Identical polypeptides may have different oligosaccharide structures at a particular glycosylation site
• Polypeptides may contain multiple glycosylation sites, each of which may have a different glycan structure
• The pattern of oligosaccharide heterogeneity at a single glycosylation site under constant physiological conditions is reproducible and not random

1.1.2.4 Potential Roles of Protein Glycosylation

Post-translational glycosylation is an important process for modifying the structure and function of the majority of eukaryotic secreted and membrane proteins.
Covalently linked oligosaccharides are known to alter the physicochemical properties of their parent proteins, such as viscosity, isoelectric point (pI), solubility and thermal stability. They also influence the global physical properties of protein structure (Lehninger et al., 1993): the conjugated clusters of hydrophilic carbohydrates may alter the polarity and solubility of glycoproteins. Oligosaccharide chains attached to newly synthesised proteins in the Golgi complex may also influence the sequence of polypeptide-folding events that lead to the native tertiary structure of the protein in the following ways:

(i) Steric interactions between peptide and oligosaccharide may repress one folding route and favour another.
(ii) When a population of negatively charged oligosaccharide chains is clustered in one region of a protein, the charge repulsion among them favours the formation of an extended, rod-like structure in that region.
(iii) The bulkiness and negative charge of oligosaccharide chains may protect some proteins from attack by proteolytic enzymes.

Beyond these effects on tertiary structure, more specific biological functions are ascribed to the oligosaccharide chains on glycoproteins, depending on the protein to which they are attached. These include cell-cell recognition, intracellular sorting and targeting, and affinity for receptors. Some examples are listed in table 1.2 (Varki, 1993; Lis and Sharon, 1993). However, although these observations suggest that oligosaccharides play important biological roles in cell growth and differentiation, and although changes in oligosaccharide structure are observed in various disease states, no single common theory has emerged to explain the diversity of structures and the variety of roles.

By studying the function of glycoproteins in their glycosylated and deglycosylated states, some of the biological roles played by oligosaccharides have been elucidated.
An example is the inhibition by tunicamycin of the initial N-glycosylation of ecto-apyrase (HB6), a human brain E-type ATPase, expressed in COS cells. The non-glycosylated protein is devoid of ATP and ADP hydrolysing activity and exists as a monomer, unlike the glycosylated HB6, which exists as a homodimer (Smith et al., 1999).

Table 1.2 Some biological roles of oligosaccharides.

Category | Biological role of glycans | Example(s)
Structural, protective, and stabilising | Maintenance of tissue structure, integrity and porosity. Protects polypeptides from recognition by proteases or antibodies. Initiation of correct folding in the rough endoplasmic reticulum. | Gastric mucus prevents the stomach from digesting itself.
Organisational and barrier | Oligosaccharide binding domains of glycoconjugates are involved in the organisation of the extracellular matrix. | Chondroitin/dermatan sulphate chains of the proteoglycan decorin are required for the deposition of fibronectin in Chinese Hamster Ovary cell matrix.
Traitorous | As specific receptors for a variety of viruses, bacteria, parasites, and plant and bacterial toxins. Antigens for autoimmune and alloimmune reactions. | Recognition of terminal sialic acids on glycans is the first step in the infectious process of the influenza virus.
Masking and decoy | Addition of specific monosaccharides masks the sequences recognised by microorganisms, toxins, or autoimmune antibodies. | Addition of galactose and sialic acid to the Tn antigen abolishes its autoimmune reactivity.
Symbiotic | As specific receptors for microorganisms in symbiotic relationships. | Certain gut bacteria in animals and some root-nodule forming bacteria in plants mediate their binding to host cell surfaces through specific sugar sequences.
On-off and tuning | Glycosylation can substantially modulate the interaction of peptides with their cognate ligands or receptors. | Binding affinities and biological activities of hematopoietic growth factors such as erythropoietin change substantially with differing degrees of N-linked glycosylation.
Targeting and clearance | Glycosylation can affect protein turnover and half-life in a single cell. | Exposed terminal β-Gal residues on mammalian plasma proteins are recognised by the asialoglycoprotein receptor and cleared from circulation.
Hormonal action | Free oligosaccharides can have biological effects in various systems. | Various plant receptors recognise a specific β-glycan oligosaccharide of the Phytophthora fungal cell walls, causing the release of phytoalexins.
Cell-cell and cell-matrix recognition | Oligosaccharides closely spaced together on a polypeptide generate clustered sugars for specific recognition. | Involvement of the selectin family of receptors in response to tissue injury or infection. The ligands involved in recognition appear to be the sialylated, fucosylated structures Sialyl Lewis x and Sialyl Lewis a.

1.2 Deglycosylation

The previous sections have reviewed the significance of glycosylation in altering the physicochemical properties of glycoproteins, the biochemical information encoded by the attached glycans, and their possible biological roles. Much of this information has been obtained by removing these oligosaccharides to determine both the structure and function of the glycan moieties.

1.2.1 Tools to Study the Roles of Glycans in Glycoprotein Functions

Approaches include enzymatic or chemical removal of glycan chains, inhibition of initial glycosylation with tunicamycin, changing the glycosylation pathway, prevention of glycan processing with inhibitors such as castanospermine, and elimination of specific glycosylation sites by site-directed mutagenesis (Varki, 1993).
1.2.1.1 Chemical Deglycosylation

The most widely employed chemical procedure for releasing N-linked oligosaccharide chains is hydrazinolysis. This method is non-selective and therefore gives uniform release of unreduced N-linked oligosaccharides from glycoproteins in high yield. Hydrazinolysis requires relatively harsh reaction conditions that, in addition to cleaving the aspartyl-N-acetylglucosamine bond, result in partial release of any O-linked sugars, de-N-acetylation, hydrazone formation and, in some cases, peeling and other side reactions (Takasaki et al., 1982). Furthermore, not only is the protein component of the glycoprotein destroyed in the reaction, but it is also necessary to re-N-acetylate each oligosaccharide and to regenerate its free reducing terminus before proceeding with further characterisation (Hirani et al., 1987).

1.2.1.2 Enzymatic Deglycosylation

'Proximal glycanases' (PROXIases) have been defined as a class of enzymes involved in the deglycosylation of glycoconjugates (Suzuki et al., 1994c). These enzymes catalyse the cleavage of the linkage between the proximal monosaccharide and the core protein (or ceramide), or between the two proximal monosaccharide moieties, to release free glycan and apo-glycoconjugates. PROXIases have rapidly become the biochemical tools of choice for researchers attempting to analyse the structure and function of the carbohydrate moiety and to aid in the crystallisation of glycoproteins. In contrast to chemical deglycosylation, a simple incubation with PROXIases under mild conditions releases intact oligosaccharides and their protein counterpart in a form suitable for purification and characterisation. Furthermore, oligosaccharide structure can be deduced on the basis of the specificity limitations of each PROXIase (Hirani et al., 1987; Maley et al., 1989).
Side reactions are unlikely unless contaminating enzymes such as exoglycosidases (enzymes that remove sugars from the non-reducing end) or phosphatases are present. PROXIases are organised into five classes: cytoplasmic β-GlcNAcase (O-GlcNAcase), endoglycoceramidase (EGCase), endo-N-glycanase (ENGase), peptide-N-glycanase (PNGase), and peptide-O-glycanase (POGase). The occurrence of these enzymes, their functions and their possible biological roles are briefly reviewed in the following sections.

1.2.2 Endo-N-acetyl-β-D-glucosaminidases (ENGases)

ENGases (EC 3.2.1.96) have been found in various sources (table 1.3). The first ENGase was detected in a crude extract of fig (Ogata-Arakawa et al., 1977), and more than 30 have since been found and characterised. These enzymes hydrolyse the glycosidic linkage between the two N-acetylglucosamine (GlcNAc) residues within the asparagine-linked oligosaccharide core, leaving one GlcNAc residue on the glycoasparagine and releasing a free oligosaccharide with one less reducing-end GlcNAc residue, as shown in figure 1.7 (Tai et al., 1977).

The key structural determinants for ENGase activity are a polypeptide on either side of the asparagine residue and a dichitobiose core (two GlcNAc residues) with at least three mannose residues. There are also specific requirements for the geometric configuration and monosaccharide composition of the oligosaccharide chain (Maley et al., 1989). For example, Endo F1 and Endo H recognise only high-mannose structures and cannot hydrolyse complex asparagine-linked oligosaccharides such as glycopeptides with a fucose linked α1-6 to the proximal N-acetylglucosamine. Other ENGases have their own different and restricted substrate specificities.

Table 1.3 The occurrence of some endo-N-acetyl-β-D-glucosaminidases (ENGases).

Kingdom | Organism | ENGase name | Reference(s)
Animals | Hen oviduct | - | Tarentino et al., 1976
Animals | Human kidney | - | DeGasperi et al., 1989
Animals | Human saliva | HS | Ito et al., 1993
Animals | Rat liver | - | Fujisaki et al., 1991
Bacteria | Arthrobacter protophormiae | A | Takegawa et al., 1989
Bacteria | Clostridium perfringens | CI, CII | Ito et al., 1975
Bacteria | Streptococcus pneumoniae | D | Muramatsu et al., 1971
Bacteria | Flavobacterium meningosepticum | F1, F2, F3 | Plummer et al., 1991
Bacteria | Streptomyces plicatus | H | Tarentino et al., 1972
Bacteria | Streptomyces plicatus | L | Tarentino et al., 1974
Bacteria | Pseudomonas sp. | PI, PII | Takegawa et al., 1991
Bacteria | Stigmatella aurantiaca | St | Bourgerie et al., 1993
Fungi | Sporotrichum dimorphosporum | B | Bouquelet et al., 1980
Fungi | Mucor hiemalis | M | Kadowaki et al., 1990
Mould | Dictyostelium discoideum | S | Freeze et al., 1984
Plants | Fig | FI, FII | Chien et al., 1977
Plants | Canavalia ensiformis (Jack bean) | J | Yet et al., 1988
Plants | Phyllostachys heterocycla (Bamboo shoots) | P | Nishiyama et al., 1991
Plants | Raphanus sativus (Radish) | R | Berger et al., 1995b
Plants | Silene alba (White campion) | Se | Lhernould et al., 1995

[Figure 1.7: structure diagram of an oligosaccharide N-linked to asparagine, marking the glycosidic bond between the two core GlcNAc residues]

Figure 1.7 The glycosidic bond cleaved by ENGases.

1.2.3 Peptide N4-(N-acetyl-β-D-glucosaminyl) asparagine amidases (PNGases)

PNGases (EC 3.5.1.52) differ from ENGases because they specifically hydrolyse the β-aspartyl-glucosaminylamine bond between the asparagine residue in peptide linkage and the GlcNAc moiety at the reducing end of the oligosaccharide chain. This cleavage converts the asparagine residue to an aspartic acid, with the concomitant liberation of free and intact oligosaccharides. These enzymes are therefore more correctly described as amidases (amidohydrolases) rather than endoglycosidases, which cleave the glycosidic bond between two GlcNAc sugars. The first PNGase discovered was from almond emulsin and was designated PNGase A (Takahashi et al., 1977).
Other PNGases have subsequently been discovered from many sources (table 1.4), but only a few have been purified to homogeneity and characterised (table 1.5).

Table 1.4 The occurrence of peptide N4-(N-acetyl-β-D-glucosaminyl) asparagine amidases (PNGases).

Kingdom | Organism | Name | Reference(s)
Animals | Hen oviduct | HO | Suzuki et al., 1997
Animals | Mouse (liver ER) | - | Weng et al., 1997
Animals | Mouse (L-929 fibroblast cells) | L-929 | Chang et al., 1997
Animals | Humans, chickens | - | Suzuki et al., 1995a
Animals | Various mouse organs | - | Kitajima et al., 1995
Animals | Oryzias latipes (Medaka fish embryo) | M | Seko et al., 1991, 1999
Bacteria | Flavobacterium meningosepticum | F | Plummer et al., 1984
Fungi | Saccharomyces cerevisiae (Yeast ER) | - | Suzuki et al., 1998
Fungi | Aspergillus tubingensis | At | Paquin et al., 1997
Plant | Prunus amygdalus (Almond emulsin) | A | Takahashi, 1977
Plant | Glycine max (Soybean seeds) | GM | Kimura et al., 1998
Plant | Canavalia ensiformis (Jack beans) | J | Sugiyama et al., 1983
Plant | Hordeum vulgare (Barley seeds) | JIP60 | Dunaeva et al., 1999
Plant | Oryza sativa (Rice seeds) | Os | Chang et al., 2000
Plant | Pisum sativum (Split pea) | P | Plummer et al., 1987
Plant | Raphanus sativus (Radish) | R | Berger et al., 1994
Plant | Silene alba (White campion) | Se | Lhernould et al., 1992
Plant | Various plant seeds | - | Plummer et al., 1987

Table 1.5 Characteristics of known purified PNGases. Values are listed in the order of the PNGases in the first row; ND, not determined.

PNGase: A, At, F, GM, J, L-929, M (acidic), Os
pH optimum: 4.5, 5.0, 8.5, 5.0, 5.0, 7.0, 4.0, 5.0
Subunit: heterodimer, monomer, monomer, monomeric, multimeric, homodimeric, monomeric
Molecular weight (kDa): 75.5, 78, 34.8, 93, 69, 212, 150, 80
-SH group requirement for activity: No, No, No, ND, No, Yes, No, No
Action on plant complex type N-glycan-peptide (e.g. bromelain or ricin D glycopeptides that contain fucose α1-3 linked to the proximal GlcNAc): Yes, Yes, No, Yes, ND, No, No, ND
Action on sialylated glycopeptide (e.g. fetuin glycopeptides): Yes, ND, Yes, Yes, No, Yes, Yes, ND; (strong) (weak) (weak)
Action on peptides containing a single GlcNAc residue: Yes, Yes, No, No, ND, No, ND, ND; (weak) (almost) (almost)

Results from 1H NMR and kinetic studies have shown that the cleavage of the β-aspartyl-glucosaminylamine bond occurs in two steps (Risley and Van Etten, 1985). The two-step reaction is presented in figure 1.8. In the first step, hydrolysis of the amide bond generates an aspartic acid on the polypeptide and liberates the carbohydrate moiety as a 1-amino-oligosaccharide intermediate, which retains the amino group from the asparagine. In the second step, the 1-amino-oligosaccharide decomposes non-enzymatically to release ammonia; this decomposition is very slow at pH 8.6 and above, and very fast below pH 8 (Risley and Van Etten, 1985; Tarentino et al., 1982). The reaction is mechanistically analogous to that of a lysosomal enzyme, β-aspartyl-N-acetylglucosamine hydrolase (NGase), but differs in that the lysosomal enzyme cannot hydrolyse the amide bond when the amino or carboxyl end of the asparagine is substituted (Tarentino et al., 1969). Conversely, PNGases cannot cleave oligosaccharides from a single asparagine residue (Plummer and Tarentino, 1981; Suzuki et al., 1994a and 1994c).

The structural determinants for recognition of the substrate appear to include both the polypeptide chain and the glycan core. Studies have shown that the location of the oligosaccharide on the peptide backbone and the length of the peptide chain are major determinants of enzymatic activity (Plummer and Tarentino, 1981). Glycosylated asparagine residues are hydrolysed less favourably if present at the carboxyl- or amino-terminal position of a peptide chain. In the same paper, the authors showed that activity was inhibited when substrates contained long polypeptide chains (>20 residues) and that, in general, glycopeptides with a dipeptide on either side of the N-linked asparagine are good substrates for PNGases.
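The conversion of asparagine to aspartic acid in the first step of the reaction can be illustrated with a short mass calculation. The sketch below assumes standard monoisotopic atomic masses, and the names `ATOMIC_MASS` and `monoisotopic_mass` are hypothetical helpers introduced only for this illustration. It shows that replacing the side-chain amide (-NH2) with a hydroxyl (-OH) raises the mass by a little under 1 Da, in line with the molecular weights of 132 and 133 listed for Asn and Asp in the amino acid abbreviations table.

```python
# Monoisotopic atomic masses in Da (standard values, assumed here).
ATOMIC_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

# Elemental compositions of the free amino acids.
ASPARAGINE = {"C": 4, "H": 8, "N": 2, "O": 3}      # C4H8N2O3
ASPARTIC_ACID = {"C": 4, "H": 7, "N": 1, "O": 4}   # C4H7NO4


def monoisotopic_mass(composition):
    """Sum the atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in composition.items())


asn = monoisotopic_mass(ASPARAGINE)
asp = monoisotopic_mass(ASPARTIC_ACID)
shift = asp - asn  # net change: lose N and H, gain O
print(f"Asn {asn:.4f} Da, Asp {asp:.4f} Da, shift {shift:+.4f} Da")
```

The same difference applies to the residue within a peptide, because the peptide backbone is unchanged by the reaction.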
Substrates with a negative charge in position X of the N-glycosylation sequon -Asn-X-Ser/Thr- have been shown to have a negative influence on the rate of hydrolysis (Tarentino et al., 1993a).

[Figure 1.8: reaction scheme. PNGase hydrolyses the glycosylated Asn to give Asp and a 1-amino-oligosaccharide, which then releases ammonia to give the intact oligosaccharide.]

Figure 1.8 The two-step cleavage reaction catalysed by PNGases. The hydrolysis of the β-aspartylglucosaminylamine bond converts Asn to Asp and releases oligosaccharide intermediates that slowly degrade to intact oligosaccharides and free ammonia.

1.2.3.1 The in vivo Functions of PNGases

Although PNGases are widely used in research to remove sugar chains for structural and functional studies of the oligosaccharide moieties and to aid in protein crystallisation, the functional roles of PNGases in vivo are unknown. Studies of the occurrence of PNGases in various living organisms have led to the suggestion that these enzymes play many different roles depending on their cellular location and the developmental stage of the cell. For example, two PNGases with distinct enzymatic properties and different pH optima are expressed during embryogenesis of medaka fish (Seko et al., 1999). The neutral PNGase M from medaka fish was suggested to control the biological functions of L-hyosophorin, which is involved in the regulation of cell-to-cell interactions during early development, whereas the acidic PNGase M is responsible for the production of glycophosphoprotein-type free glycans that may contribute to the degradation and absorption of glycophosphoprotein by the developing embryos.
Some PNGases are believed to be responsible for quality control of de novo synthesised proteins inside the cell, such as the neutral PNGases from the ER of rats and hen oviduct and in the cytosolic fraction of the yeast ER (Weng and Spiro, 1997; Suzuki et al., 1997, 1998). PNGase L-929 is proposed to have a dual role as both a glycosidase and a lectin-like receptor protein in vivo (Suzuki et al., 1994b). This enzyme possesses a unique carbohydrate-binding site that is separate from the catalytic site, and regulates the catalytic site by feedback inhibition at high concentrations of free oligosaccharides. PNGase J from jack bean converts the glycosylated concanavalin A precursor into an active form of the lectin (Bowles et al., 1986) and releases the unconjugated glycans (UNGs), which have a specific role in the plant cell (Priem et al., 1994).

Detailed knowledge of the characteristics of many PNGases is limited because of the extremely small quantities of PNGases present in living organisms. This makes localisation, isolation, purification, and characterisation of these enzymes very difficult. Furthermore, some of the PNGases are expressed only during certain stages of growth and development. Creation of knock-out strains that are deficient in PNGase may not have an obvious effect on the organisms involved, as some enzymatic pathways have redundant systems for important processes needed for survival. In addition, creation of such strains requires detailed knowledge of the sequence encoding the PNGases and the genomes of the organisms involved. Antibodies raised against different PNGases can facilitate studies on the cellular localisation and developmental profiles of PNGases. However, the generation of these antibodies again requires the protein to be purified in relatively large quantities, and antibodies raised against one PNGase may not detect other PNGases if the epitopes on those PNGases are not recognised.
Nevertheless, the functional roles played by PNGases will become clear as more of these enzymes are characterised.

1.2.4 Other PROXIases

Other enzymes involved in deglycosylation of glycoconjugates are listed in table 1.6. They are, however, beyond the scope of this review and will not be discussed further.

Table 1.6 The occurrence of other PROXIases.

Enzyme           Kingdom    Organism                  Reference(s)
EGCase           Animal     Earthworm                 Li et al., 1987
                            Leech                     Li et al., 1986
                            Rabbit                    Basu et al., 1990
                 Bacteria   Corynebacterium sp.       Ashida et al., 1992
                            Rhodococcus sp.           Ito and Yamagata, 1986
Endo-β-Xylase    Animal     Mollusc                   Takagaki et al., 1990
                            Rabbit                    Takagaki et al., 1988
O-GlcNAcase      Animal     Rat                       Dong and Hart, 1994
POGase           Bacteria   Alcaligenes sp.           Fan et al., 1988
                            Diplococcus pneumoniae    Bhavanandan et al., 1976
                            Streptomyces sp.          Ishii-Karakasa et al., 1992

1.2.5 Glycosidases from Flavobacterium meningosepticum

This aerobic, rod-shaped, Gram-negative bacterium secretes an amidase, PNGase F, and three endoglycosidases, Endo F1, Endo F2, and Endo F3 (Elder et al., 1982; Plummer et al., 1984; Trimble et al., 1991). The properties and specificities of the endoglycosidases from F. meningosepticum are briefly examined, with the main focus being on PNGase F.

1.2.5.1 Endo F1

Endo F1 shows maximum activity between pH 5 and 6. It retains 65% activity at pH 7.0, but below pH 5 the activity drops off rapidly (Maley et al., 1989). The Endo F1 gene has been cloned and sequenced, and codes for a mature protein of 289 amino acids with a molecular mass of 31,667 Da (Tarentino et al., 1992). Endo F1 has a very similar substrate specificity to Endo H from Streptomyces plicatus. Both enzymes are nearly identical in their ability to hydrolyse high-mannose oligosaccharides and differ only in their specificity for a core-substituted fucose, which impedes hydrolysis by Endo F1 (Trimble et al., 1991). Neither Endo F1 nor Endo H hydrolyses any complex-type N-glycans.

1.2.5.2 Endo F2 and F3

Endo F2 and Endo F3 are most active between pH 4.0 and 4.5, retaining 70% of their activity at pH 3.0, but above pH 6 the activity of both glycosidases decreases sharply. Endo F2 preferentially hydrolyses biantennary complex glycans and, although it can hydrolyse high-mannose oligosaccharides, it does so at a greatly diminished rate. Endo F2 does not cleave fucose-containing hybrid structures, or tri- or tetra-antennary oligosaccharides (Trimble et al., 1991). Endo F3 also hydrolyses bi- and tri-antennary glycans but at rates slower than Endo F2. However, if the core asparagine-proximal N-acetylglucosamine is substituted with an α1-6 fucose residue, then Endo F3 will hydrolyse the glycan at a much greater rate (Tarentino et al., 1994a).

1.2.5.3 PNGase F

PNGase F is the best-characterised PNGase with a known tertiary structure (Norris et al., 1994b; Tarentino et al., 1994b), and the gene has been cloned, sequenced, and expressed in E. coli by several groups (Lemp et al., 1990; Tarentino et al., 1990; Barsomian et al., 1990; Grueninger-Leitch et al., 1996).

1.2.5.3.1 Properties of PNGase F

While it is most active at pH 8.5, the enzyme is at least 80% active between pH 7.5 and 9.5. The mature enzyme comprises 314 amino acids, has a molecular weight of 34,779 Da and contains a relatively high number of tryptophan residues (nine) (Tarentino et al., 1990). PNGase F is compatible with a wide variety of inorganic and organic buffers (≤0.1 M), including sodium phosphate, lithium carbonate, ammonium bicarbonate, Tris-HCl, glycylglycine, HEPES and triethylamine acetate, but sodium borate is inhibitory (Tarentino et al., 1994a). It is stable in protein denaturants such as 2.5 M urea at 37°C for 2 hours and still retains 40% of its activity in 5 M urea.
The activity is not inhibited by high concentrations of chelating agents such as EDTA and 1,10-orthophenanthroline or by the serine protease inhibitor PMSF (Maley et al., 1989). The enzyme is stable for at least 6 months at 4°C and indefinitely at -80°C with 50% (v/v) glycerol, and should not be exposed to repeated freezing and thawing. Glycerol (5%) is inhibitory, and its concentration should not exceed 0.1% for maximal activity (Dr. G. E. Norris, personal communication).

1.2.5.3.2 Substrate Structure Requirements of PNGase F

PNGase F activity requires both the amino and carboxyl groups of the asparagine residue to be in peptide linkage, while the oligosaccharide must consist of at least two GlcNAc residues. The enzyme is also highly sensitive to modifications of the sugar core: an α1-3-fucose substituent on the asparagine-proximal GlcNAc, found in glycoproteins from plants and insects, completely blocks PNGase F activity, but an α1-6-fucose substituent has no effect (Tretter et al., 1991). However, PNGase F shows no selectivity for the outer carbohydrate structure, giving it a broad specificity for N-linked glycoproteins. Studies with a stepwise degradation of a biantennary glycopeptide with exoglycosidases have shown that the size of the carbohydrate moiety on the substrate has little influence on enzyme activity and that the hydrolysis rate may be primarily determined by the length of the peptide (Altman et al., 1995).

Recently, Fan et al. (1997) carried out detailed studies on the substrate structure requirements of PNGase A and F, using more than 30 glycopeptides of varying peptide lengths and oligosaccharide moieties, and made several interesting observations:
(i) Neither enzyme cleaves cellobiose- or lactose-substituted glycopeptides, indicating that the 2-acetamido group on the Asn-linked GlcNAc is important in substrate recognition.
(ii) PNGase A efficiently cleaves a Gln-bound CTB glycopeptide (Gln replaces the Asn), but the action of PNGase F on this peptide is minimal.
(iii) PNGase A can act on CTB dipeptides whereas PNGase F prefers a tripeptide or longer.

1.2.5.3.3 The Three Dimensional Structure of PNGase F

Structures of PNGase F from two strains of F. meningosepticum (ATCC 33958 and CDC strain 3352) have been published (Kuhn et al., 1994; Norris et al., 1994a). Although the two proteins were crystallised under different conditions (at pH 4.3 and pH 8.5), the structures are essentially identical. The structure consists of two tightly associated all-β domains, an amino-terminal domain comprising residues 1-135 and a carboxyl-terminal domain comprising residues 142-314. The β-barrels in each domain adopt a 4+4 jelly roll arrangement (figure 1.9) that closely resembles viral capsid proteins (Rossmann et al., 1983).

Figure 1.9 Topology of the PNGase F molecule. The β-strands are identified using the convention adopted for the viral coat proteins and the residue numbers associated with each strand are given. The inset shows the classic eight-stranded β-jelly roll motif for comparison. (Figure from Norris et al., 1994a.)

A short extended piece of polypeptide (residues 136-141) links the two domains at the bottom of the molecule. At the top of the molecule, several connecting loops from domain 2 reach across to interact closely with domain 1 to tie the domains together (figure 1.10). The most important of these comprises residues 227-257, which links strands F and G of domain 2 and forms a double loop in which residues 227-249 form the first part. This loop extends to and back from domain 1 and includes a wide Ω loop between residues 231-245.
The second part of the loop is formed by residues 250-257 and is tied to the first part by a disulfide bridge, 231-252. At the back of the molecule, the loop 151-159, which links strands B and C of domain 2, also stretches across towards domain 1, with residues 151-155 making a number of hydrogen bonds. These loops provide most of the inter-domain interactions and also play a major role in forming the active site. The domains are packed back to back to create an approximately rectangular molecule of overall dimensions 50 × 45 × 30 Å (Norris et al., 1994a).

Figure 1.10 The PNGase F structure. The protein is folded into two domains, each with an eight-stranded, antiparallel β-jelly roll configuration similar to structures found in lectins. Domain 1 is on the right and domain 2 is on the left. This figure was produced by the program MolScript (Kraulis et al., 1991) and viewed with Raster3D (Merritt and Bacon, 1997).

1.2.5.3.4 The Active Site of PNGase F

Kuhn et al. (1995) found that a disaccharide, GlcNAc-GlcNAc, acted as an inhibitor of the enzyme and were able to obtain a crystal structure with the inhibitor bound (Figure 1.11). The active site is located in a deep cleft at the interface between the two domains at the top of the molecule. This cleft, formed by long loops that connect the β-strands between the β-sheets, is lined by His-193 (light green) and five exposed tryptophan residues: Trp-59, Trp-86 and Trp-120 on domain 1 (blue), and Trp-191 and Trp-207 on domain 2 (cyan). Three acidic residues, Asp-60 (red), Glu-206 and Glu-118 (magenta), located at the bottom of the cleft have been shown to be essential for activity by site-directed mutagenesis studies (Kuhn et al., 1995). The D60N mutant has no detectable activity, while E206Q and E118Q have less than 0.01% and 0.1% of the wild type activity, respectively.

Figure 1.11 The orientation of N,N'-diacetylchitobiose (CTB) inside the active site of PNGase F.
The CTB, in α-configuration, is positioned with the non-reducing end pointing into the cleft. The active site is lined by one basic (His-193 in light green), three acidic (Asp-60 in red and Glu-118 and Glu-206 in magenta), and five tryptophan residues (blue on domain 1 and cyan on domain 2). This figure was generated by MolScript (Kraulis et al., 1991) and viewed with Raster3D (Merritt and Bacon, 1997).

Figure 1.12 is a schematic diagram of the contacts made between the dichitobiose inhibitor and the enzyme. It shows that the oxygen (OD1) of Asp-60 forms hydrogen bonds with O1 of the reducing end of the disaccharide substrate and with the water molecule Wat346, which connects Asp-60 to Glu-206. This indicates that Asp-60 is the primary catalytic residue and, since Glu-206 is not in direct contact with the substrate, it may be important for stabilisation of the reaction intermediates or for interaction with the Oδ of the substrate asparagine. Glu-118 forms a hydrogen bond with the O6 of the second N-acetylglucosamine residue of the substrate, and the low activity of the E118Q mutant is probably due to its reduced ability to bind the oligosaccharide. Glu-118 is thus probably responsible for positioning the CTB in the active site. The contacts of the second GlcNAc are much weaker; only O6 and O7 are involved in direct hydrogen bonds with the protein, and these may assist in CTB positioning. Two aromatic residues that line the active site probably also aid in the orientation of CTB in the active site. Trp-191 is positioned nearly perpendicular to the disaccharide and forms a hydrogen bond with the O3 of the reducing-end N-acetylglucosamine residue of the substrate, while Trp-120 is positioned so that it can make a hydrophobic contact with the first mannose residue in the trimannosyl core.
If this structure represents the true binding of the dichitobiose core when it is linked to the polypeptide, it is clear that steric hindrance would prevent N-glycans with an α1-3 fucose substituted on the proximal GlcNAc from binding as shown in this diagram.

Figure 1.12 Schematic diagram showing the intermolecular hydrogen bonding contacts between PNGase F, N,N'-diacetylchitobiose and water molecules. Protein residues are indicated with single letter amino acid code and sequence number in boxes, and water molecules are indicated by a number analogous to their number in the Protein Data Bank. The reducing-end GlcNAc residue is on the left. Hydrogen bonding distances, in Å, are shown in italics. Note that Wat349 is present twice, once in contact with O3 and once with R61. (Figure reproduced from Kuhn et al., 1995.)

1.2.6 Constructs of PNGase F

N-glycosylation is cell- and species-specific and the oligosaccharide moieties play important roles in the cell. It is therefore of considerable interest to biochemists studying the structure and function of such oligosaccharide moieties to be able to remove the intact oligosaccharides from glycoproteins without compromising the integrity of the protein. Several constructs for the E. coli based expression of recombinant PNGase F have been reported in the literature and are briefly discussed.

1.2.6.1 The First Clones of PNGase F

Genomic DNA encoding PNGase F from F. meningosepticum (ATCC 33958) was first isolated independently in three laboratories in the United States at about the same time (Barsomian et al., 1990; Lemp et al., 1990; Tarentino et al., 1990). The DNA isolated by the three groups contained both a promoter region and an unusually long leader signal (42 residues) of the native bacterium. This was cloned into pBluescript or pUC18 expression vectors under the control of a lac promoter.
The constructs from these groups were essentially identical, and produced PNGase F in E. coli at only low levels (30% of that produced by the native bacterium). Interestingly, the expression level was independent of the orientation of the F. meningosepticum DNA insert relative to the vector-encoded lac promoter. Furthermore, when the promoter regions were deleted from the vector designed by Barsomian et al., the level of expression decreased to only 10% of that of the original clone. Barsomian et al. suggested that this was because the deleted transcription promoter regions are recognised by E. coli. Subsequent DNA sequencing showed the presence of consensus hexamer E. coli promoter sequences in that region (Barsomian et al., 1990). Moreover, the leader sequence was not processed correctly and the nascent protein remained in the cytoplasm. The leader signal was also cleaved at two different sites (Figure 1.13), although both these cleavage products deglycosylated hen ovomucoid glycopeptide. Observing this, the Barsomian group deleted the unnecessary promoter and leader signal sequences and increased expression by incorporating a ribosome-binding site upstream from the PNGase F DNA to produce the plasmid pBR29. This construct resulted in a two-fold increase in protein expression and produced a mature cytosolic enzyme with no additional pre-PNGase F sequences (Barsomian et al., 1990).

MRKLLIFSISAYLMAGIVSCKG VOS ATPVTEDRLALNAVN APADNT

Figure 1.13 Structure of PNGase F signal sequence. The 40 amino acid signal sequence and the first 6 amino acids of the mature PNGase F produced by F. meningosepticum are shown. Cleavage sites are indicated by arrows and the underlined sequence shows the N-terminus of PNGase F.
Structural features indicated are: A, N-terminal positively charged residues; B, hydrophobic region punctuated with glycine; C, serine residue connecting the hydrophobic region of the E. coli cleavage sites (Gly-22 or Ser-25).

1.2.6.2 PNGase F Cloned as GST Fusions

In 1995, a Swedish group cloned PNGase F (ATCC 33958) into a pGEX-3X vector and expressed the enzyme as a fusion protein with a 26 kDa glutathione-S-transferase to give the plasmid, GST-PNGase F (Grueninger-Leitch et al., 1995). Approximately 5 mg of pure GST-PNGase F was recovered per litre of culture, giving a yield of 2.5 mg L⁻¹. The fusion enzyme was active on a range of glycoprotein substrates including phytase, acid phosphatase and human renin.

1.2.6.3 A Clone of PNGase F from F. meningosepticum (CDC strain 3352)

The PNGase F DNA with no leader signal sequence was isolated from the genomic DNA of F. meningosepticum (CDC strain 3352) and cloned into the plasmid pT7-7 to produce the construct pT7-PNG. The cloned PNGase DNA contained an extra 27 base pairs (bp) downstream from the initiation codon of pT7-7. The recombinant enzyme was mainly produced in inclusion bodies, although enough soluble enzyme was produced to demonstrate that it was active on both glycoproteins and glycopeptides (Dr. G. E. Norris, personal communication). In order to produce a high copy number expression vector containing an f1 origin for single stranded DNA production, deletion PCR was carried out on pT7-PNG to move the DNA to start at the initiation codon at the unique Nde I site in pT7-7. The Xba I/Pst I fragment containing the PNGase F gene and ribosome binding site of pT7-PNG was subcloned into pBluescript KS(-) downstream of the T7 promoter, to produce the plasmid pKS-PNG. As with the previous construct, the protein was expressed at high levels in an insoluble state.

1.2.6.4 Inclusion Body Formation in E. coli

Despite extensive knowledge of the genetics of E.
coli, not all genes can be expressed efficiently in this organism. This may be due to subtle structural features of the gene sequence, the stability and translational efficiency of the mRNA, the ease of protein folding, degradation of the protein by host cell proteases, major differences in codon usage between the foreign gene and native E. coli genes, and the potential toxicity of the protein to the host (Makrides, 1996). Proteins expressed in E. coli often accumulate intracellularly in the form of inclusion bodies (Marston, 1986). Biologically active protein can be recovered from such aggregates by denaturation and refolding in vitro (Hockney, 1994).

A statistical analysis of the composition of 81 proteins that do and do not form inclusion bodies in E. coli showed that six parameters are correlated with inclusion body formation. They are average charge, fraction of turn-forming residues, cysteine content, proline content, hydrophobicity and the total number of residues (Wilkinson et al., 1991). While there are some advantages in producing recombinant proteins as inclusion bodies, the process of denaturation and refolding into a biologically active conformation is very empirical and, in most cases, not applicable for the efficient reconstitution of biologically active proteins (Rudolph and Lilie, 1996).

It has been shown that the post-translational folding of proteins, the assembly of polypeptides into oligomeric structures and the localisation of proteins are mediated by specialised host proteins known as molecular chaperones, such as the GroES and GroEL proteins in E. coli (Backer and Craig, 1994; Clarke, 1996; Ellis and Hartl, 1996). Several of these chaperones are heat-shock proteins whose synthesis is induced in response to stress.
The exact mechanism of chaperone-assisted protein folding is still unclear, but it has been suggested that chaperones assist protein folding by preventing unproductive side reactions such as aggregation (Rudolph and Lilie, 1996). However, experiments in which chaperones were used to assist protein folding have been inconclusive, with the effects of chaperone co-production on gene expression in E. coli being protein specific (Wall and Pluckthun, 1995). One solution to the solubility problem is to export proteins to the periplasm.

1.3 Targeting and Assembly of Proteins in the Bacterial Periplasm

All cells sequester biological activities in subcellular compartments that are bounded by lipid bilayers. If lipid bilayers are considered as compartments, then Gram-negative bacteria such as Escherichia coli can be divided into four regions: cytoplasm, inner membrane, periplasm, and outer membrane. The latter three compartments can also be viewed as one structure that envelops and separates the cytoplasm from the external environment, and are collectively known as the bacterial envelope. While compartmentalisation is essential for viability, it also poses a problem for the cell. For example, in E. coli all proteins are synthesised in the cytoplasm, and proteins destined for the periplasm and the outer membrane must be translocated across the inner membrane and then exported to their specific targets. This process involves the proteins being synthesised as precursor proteins. These proteins contain an amino-terminal signal sequence that directs the precursors to a collection of proteins that catalyse precursor translocation across the inner membrane. During translocation, the signal sequence is proteolytically cleaved, generating a mature form of the protein, which is then transported to its appropriate destination.

1.3.1 The Bacterial Periplasm

The periplasm is the region between the inner and outer membranes of Gram-negative bacteria and constitutes about 30% of the total cell volume of E. coli (van Wielink and Duine, 1990). It has a gel-like structure, filled with a peptidoglycan matrix that is progressively more tightly cross-linked towards the outer membrane (Hobot et al., 1984), and has been shown to be impermeable to whole nucleotides. The non-reducing environment of the periplasm favours disulfide bridge formation in proteins and is thus of particular interest for the heterologous periplasmic expression of recombinant proteins, such as PNGase F, that contain multiple disulfide bonds which contribute to their stability and, in some cases, their catalytic activity.

1.3.2 The Signal Peptide

Proteins destined for export are synthesised as precursor proteins that contain amino-terminal signal sequences. These signal peptides have been recognised to possess three distinct regions based on length, hydrophobicity and conformation (von Heijne, 1990). The amino-terminal region of the signal peptide is about 5-8 amino acids long and is characterised by the presence of basic residues. The net positive charge is essential for interaction with the negatively charged surface of the inner membrane (Inouye et al., 1982). The central region is about 8-12 non-polar amino acids long and has a high propensity for α-helix formation, which may facilitate translocation across the bilayer (Engelman and Steitz, 1981). The carboxyl cleavage region is typically about 3-9 amino acids long (Jain et al., 1994), and is involved in signal peptidase recognition and cleavage during folding and localisation of the exported protein. Many recombinant proteins, when expressed intracellularly in E. coli at high levels, form inclusion bodies.
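The three-region description of a signal peptide given above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not taken from the thesis: the function name describe_signal_peptide is hypothetical, the region boundaries are supplied by the user rather than predicted, the charge count considers only Lys/Arg and Asp/Glu side chains, and the E. coli OmpA signal peptide used as the example is quoted from general knowledge and should be checked against a sequence database before being relied on.

```python
BASIC = set("KR")              # positively charged side chains
ACIDIC = set("DE")             # negatively charged side chains
NONPOLAR = set("AVLIMFWGP")    # residues treated as non-polar in this sketch


def describe_signal_peptide(seq, n_len, h_len):
    """Split a signal peptide into n-, h- and c-regions at user-chosen boundaries."""
    seq = seq.upper()
    if n_len < 0 or h_len <= 0 or n_len + h_len > len(seq):
        raise ValueError("region lengths do not fit the sequence")
    n_region = seq[:n_len]
    h_region = seq[n_len:n_len + h_len]
    c_region = seq[n_len + h_len:]
    # Net side-chain charge of the amino-terminal region.
    n_charge = (sum(aa in BASIC for aa in n_region)
                - sum(aa in ACIDIC for aa in n_region))
    # Fraction of non-polar residues in the central region.
    h_nonpolar = sum(aa in NONPOLAR for aa in h_region) / len(h_region)
    return {
        "n_region": n_region,
        "h_region": h_region,
        "c_region": c_region,
        "n_charge": n_charge,
        "h_nonpolar_fraction": h_nonpolar,
        "coding_length_bp": 3 * len(seq),
    }


# Assumed OmpA signal peptide (21 residues); the 5/12 split is an arbitrary
# choice within the length ranges quoted in the text.
OMPA_SIGNAL = "MKKTAIAIAVALAGFATVAQA"

if __name__ == "__main__":
    print(describe_signal_peptide(OMPA_SIGNAL, n_len=5, h_len=12))
```

With these assumed boundaries, the amino-terminal region carries a net positive charge, the central region is almost entirely non-polar, and the short remainder corresponds to the cleavage region, in line with the general description above.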
The introduction of an appropriate signal sequence at the 5' end of the sequence encoding the gene of interest usually results in the polypeptide chain being exported into the periplasm of the bacterium. One example is the addition of a 63 base pair long signal sequence from a major outer membrane protein of E. coli, OmpA, to β-lactamase (Ghrayeb et al., 1984). Upon induction of gene expression, β-lactamase was secreted into the periplasm with a correctly processed amino-terminus.

1.3.3 The Sec Export Pathway

Compartmentalisation inside E. coli demands that nascent extracytoplasmic proteins be actively transported to destinations such as the periplasm and outer membrane. To perform this duty, E. coli employs a series of Sec (secretion) proteins arranged in a hexameric complex, known as the preprotein translocase, to catalyse the translocation of various polypeptides through the inner membrane. The multisubunit enzyme is composed of an integral membrane domain, SecYEGDFyajC, and a peripheral membrane domain, SecA (Wickner and Rice-Leonard, 1996). Genetic and biochemical studies have identified the core subunits of this enzyme to be the SecY, SecE and SecA proteins (Schatz and Beckwith, 1990; Akimaru et al., 1991; Duong and Wickner, 1997). Translocation depends upon the energy of ATP hydrolysis by SecA (Chen and Tai, 1985) and is strongly stimulated by the proton motive force (PMF) across the membrane (Geller et al., 1986). A model for protein translocation across the inner membrane by preprotein translocase is shown in figure 1.14.

Figure 1.14 A model for protein translocation across the inner membrane. Beginning from the lower left, a precursor protein emerges from the ribosome and interacts with SecB.
The SecB/precursor complex then interacts with membrane-associated SecA at the cytoplasmic face of the SecYEG complex (Binding). The signal sequence of the precursor is verified, and SecB is released (Proof-reading). Binding of ATP to the SecA high-affinity site causes insertion of SecA and the bound 20-30 residues of the precursor into the membrane. The signal sequence is cleaved (Insertion/SS Cleavage). SecD/F stabilise the membrane-inserted form of SecA and the proton motive force stimulates additional translocation of the SecA-bound polypeptide (PMF). ATP hydrolysis causes SecA to deinsert from the membrane (Deinsertion). Additional ATP binding causes reinsertion of SecA into the membrane. This reinsertion is accompanied by additional stepwise protein translocation. (Figure reproduced from Danese and Silhavy, 1998.)

1.3.4 Protein Folding in the Periplasm

Protein folding in the cytoplasm is catalysed by two families of chaperones, Hsp60 and Hsp70, also known as the GroEL and DnaK proteins, respectively, in E. coli (Georgopoulos, 1992; Gething and Sambrook, 1992). These chaperones stabilise unstable conformers of proteins during folding, translocation and assembly, and are expressed at higher levels at elevated temperatures to prevent drastic protein aggregation and proteolysis of damaged proteins (Hendrick and Hartl, 1993). However, these chaperones have not been located in the periplasm, probably because ATP, which is essential for chaperone activity, is absent. Instead, two types of folding catalysts have been found in the periplasm, each of which accelerates only a specific rate-limiting step of the folding reaction. One type of slow conformational rearrangement, caused by thiol-disulfide exchanges during disulfide bond formation or rearrangement, is catalysed by protein disulfide isomerases (PDI). An example of PDI in E. coli is the Dsb family of proteins, which belongs to the thioredoxin superfamily.
Dsb proteins do not share overall sequence homology with thioredoxins but they share at least one common active-site motif, Cys-X-X-Cys (Missiakas and Raina, 1997). A proposed model of disulfide formation catalysed by various Dsb proteins in the periplasm of E. coli is shown in figure 1.15.

Figure 1.15 Mode of disulfide formation catalysed by the various Dsb enzymes in the periplasm of E. coli. (Figure reproduced from Missiakas and Raina, 1997.)

Peptide bonds display a partial double-bond character that forces the carbonyl and amino groups into a planar structure. The α-carbons attached to the carbonyl carbon and the amide nitrogen can assume a cis or trans conformation with respect to each other. For most dipeptides, these groups are predominantly in the trans conformation, with the exception of peptide bonds formed from X-proline dipeptides (X is any residue), which can be in either conformation (Levitt, 1981). The cis/trans isomerisation of prolyl peptides is a slow process that impedes folding in proteins such as the bovine pancreatic trypsin inhibitor derivative RCAM(14-38) (Jullien and Baldwin, 1981). Peptidyl prolyl isomerases (PPIases) catalyse the isomerisation around the X-Pro peptidyl bonds. Four PPIases have been found in the E. coli periplasm: RotA, FklB, FkpA, and SurA.

1.4 The Scope of this Project

PNGases are widely used in research, both for investigations into the function and structural characterisation of the glycan moieties on glycoproteins and for the removal of heterogeneous sugar chains from glycoproteins to aid their crystallisation.
Studies of the sugar moieties of glycoproteins are particularly important in the biotechnology industry, as many recombinant proteins designed for therapeutic use are in fact glycosylated and many disease states exhibit changes in oligosaccharide structures. Therefore, it is of great interest to be able to produce relatively large quantities of the enzymes to aid in such studies.

The focus of this project is on PNGase F, which is one of the best-characterised PNGases, but several important aspects of the enzyme have not yet been addressed:
(1) The protein is secreted in small amounts by F. meningosepticum and the purification of the secreted enzyme is a tedious multistep task with a low yield (at best < 0.5 mg L⁻¹ culture medium).
(2) The biochemical properties and the kinetic aspects of the protein have not been determined in great detail. The Km of the PNGase F from ATCC 33958 strain of F. meningoseptic