Chapter 3
AMINO ACIDS, PEPTIDES, AND PROTEINS

3.1 Amino Acids
3.2 Peptides and Proteins
3.3 Working with Proteins
3.4 The Covalent Structure of Proteins
3.5 Protein Sequences and Evolution

The word protein that I propose to you . . . I would wish to derive from proteios, because it appears to be the primitive or principal substance of animal nutrition that plants prepare for the herbivores, and which the latter then furnish to the carnivores.
J. J. Berzelius, letter to G. J. Mulder, 1838

Proteins are the most abundant biological macromolecules, occurring in all cells and all parts of cells. Proteins also occur in great variety; thousands of different kinds, ranging in size from relatively small peptides to huge polymers with molecular weights in the millions, may be found in a single cell. Moreover, proteins exhibit enormous diversity of biological function and are the most important final products of the information pathways discussed in Part III of this book. Proteins are the molecular instruments through which genetic information is expressed.

Relatively simple monomeric subunits provide the key to the structure of the thousands of different proteins. All proteins, whether from the most ancient lines of bacteria or from the most complex forms of life, are constructed from the same ubiquitous set of 20 amino acids, covalently linked in characteristic linear sequences. Because each of these amino acids has a side chain with distinctive chemical properties, this group of 20 precursor molecules may be regarded as the alphabet in which the language of protein structure is written.

What is most remarkable is that cells can produce proteins with strikingly different properties and activities by joining the same 20 amino acids in many different combinations and sequences.
From these building blocks different organisms can make such widely diverse products as enzymes, hormones, antibodies, transporters, muscle fibers, the lens protein of the eye, feathers, spider webs, rhinoceros horn, milk proteins, antibiotics, mushroom poisons, and myriad other substances having distinct biological activities (Fig. 3–1). Among these protein products, the enzymes are the most varied and specialized. Virtually all cellular reactions are catalyzed by enzymes.

Protein structure and function are the topics of this and the next three chapters. We begin with a description of the fundamental chemical properties of amino acids, peptides, and proteins.

3.1 Amino Acids

Proteins are polymers of amino acids, with each amino acid residue joined to its neighbor by a specific type of covalent bond. (The term "residue" reflects the loss of the elements of water when one amino acid is joined to another.) Proteins can be broken down (hydrolyzed) to their constituent amino acids by a variety of methods, and the earliest studies of proteins naturally focused on the free amino acids derived from them. Twenty different amino acids are commonly found in proteins. The first to be discovered was asparagine, in 1806. The last of the 20 to be found, threonine, was not identified until 1938. All the amino acids have trivial or common names, in some cases derived from the source from which they were first isolated. Asparagine was first found in asparagus, and glutamate in wheat gluten; tyrosine was first isolated from cheese (its name is derived from the Greek tyros, "cheese"); and glycine (Greek glykos, "sweet") was so named because of its sweet taste.

Amino Acids Share Common Structural Features

All 20 of the common amino acids are α-amino acids. They have a carboxyl group and an amino group bonded to the same carbon atom (the α carbon) (Fig. 3–2).
They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and which influence the solubility of the amino acids in water. In addition to these 20 amino acids there are many less common ones. Some are residues modified after a protein has been synthesized; others are amino acids present in living organisms but not as constituents of proteins. The common amino acids of proteins have been assigned three-letter abbreviations and one-letter symbols (Table 3–1), which are used as shorthand to indicate the composition and sequence of amino acids polymerized in proteins.

Two conventions are used to identify the carbons in an amino acid, a practice that can be confusing. The additional carbons in an R group are commonly designated β, γ, δ, ε, and so forth, proceeding out from the α carbon. For most other organic molecules, carbon atoms are simply numbered from one end, giving highest priority (C-1) to the carbon with the substituent containing the atom of highest atomic number. Within this latter convention, the carboxyl carbon of an amino acid would be C-1 and the α carbon would be C-2. In some cases, such as amino acids with heterocyclic R groups, the Greek lettering system is ambiguous and the numbering convention is therefore used.

[Structure: lysine, ⁺H₃N–CH₂–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻, with its carbons numbered 1 to 6 starting from the carboxyl carbon and lettered α to ε starting from the α carbon (C-2).]

For all the common amino acids except glycine, the α carbon is bonded to four different groups: a carboxyl group, an amino group, an R group, and a hydrogen atom (Fig. 3–2; in glycine, the R group is another hydrogen atom). The α-carbon atom is thus a chiral center (p. 17). Because of the tetrahedral arrangement of the bonding orbitals around the α-carbon atom, the four different groups can occupy two unique spatial arrangements, and thus amino acids have two possible stereoisomers. Since they are nonsuperimposable mirror images of each other (Fig. 3–3), the two forms represent a class of stereoisomers called enantiomers (see Fig. 1–19). All molecules with a chiral center are also optically active; that is, they rotate plane-polarized light (see Box 1–2).

FIGURE 3–1 Some functions of proteins. (a) The light produced by fireflies is the result of a reaction involving luciferin and ATP, catalyzed by the enzyme luciferase (see Box 13–2). (b) Erythrocytes contain large amounts of the oxygen-transporting protein hemoglobin. (c) The protein keratin, formed by all vertebrates, is the chief structural component of hair, scales, horn, wool, nails, and feathers. The black rhinoceros is nearing extinction in the wild because of the belief prevalent in some parts of the world that a powder derived from its horn has aphrodisiac properties. In reality, the chemical properties of powdered rhinoceros horn are no different from those of powdered bovine hooves or human fingernails.

FIGURE 3–2 General structure of an amino acid. [Structure: ⁺H₃N–CH(R)–COO⁻.] This structure is common to all but one of the α-amino acids. (Proline, a cyclic amino acid, is the exception.) The R group or side chain (red) attached to the α carbon (blue) is different in each amino acid.

Special nomenclature has been developed to specify the absolute configuration of the four substituents of asymmetric carbon atoms. The absolute configurations of simple sugars and amino acids are specified by the D, L system (Fig. 3–4), based on the absolute configuration of the three-carbon sugar glyceraldehyde, a convention proposed by Emil Fischer in 1891. (Fischer knew what groups surrounded the asymmetric carbon of glyceraldehyde but had to guess at their absolute configuration; his guess was later confirmed by x-ray diffraction analysis.)
For all chiral compounds, stereoisomers having a configuration related to that of L-glyceraldehyde are designated L, and stereoisomers related to D-glyceraldehyde are designated D. The functional groups of L-alanine are matched with those of L-glyceraldehyde by aligning those that can be interconverted by simple, one-step chemical reactions. Thus the carboxyl group of L-alanine occupies the same position about the chiral carbon as does the aldehyde group of L-glyceraldehyde, because an aldehyde is readily converted to a carboxyl group via a one-step oxidation. Historically, the similar l and d designations were used for levorotatory (rotating light to the left) and dextrorotatory (rotating light to the right). However, not all L-amino acids are levorotatory, and the convention shown in Figure 3–4 was needed to avoid potential ambiguities about absolute configuration. By Fischer's convention, L and D refer only to the absolute configuration of the four substituents around the chiral carbon, not to optical properties of the molecule.

Another system of specifying configuration around a chiral center is the RS system, which is used in the systematic nomenclature of organic chemistry and describes more precisely the configuration of molecules with more than one chiral center (see p. 18).

The Amino Acid Residues in Proteins Are L Stereoisomers

Nearly all biological compounds with a chiral center occur naturally in only one stereoisomeric form, either D or L. The amino acid residues in protein molecules are exclusively L stereoisomers. D-Amino acid residues have been found only in a few, generally small peptides, including some peptides of bacterial cell walls and certain peptide antibiotics.

It is remarkable that virtually all amino acid residues in proteins are L stereoisomers.
When chiral compounds are formed by ordinary chemical reactions, the result is a racemic mixture of D and L isomers, which are difficult for a chemist to distinguish and separate. But to a living system, D and L isomers are as different as the right hand and the left. The formation of stable, repeating substructures in proteins (Chapter 4) generally requires that their constituent amino acids be of one stereochemical series. Cells are able to specifically synthesize the L isomers of amino acids because the active sites of enzymes are asymmetric, causing the reactions they catalyze to be stereospecific.

FIGURE 3–3 Stereoisomerism in α-amino acids. [Structures: L-alanine and D-alanine, shown in panels (a), (b), and (c).] (a) The two stereoisomers of alanine, L- and D-alanine, are nonsuperimposable mirror images of each other (enantiomers). (b, c) Two different conventions for showing the configurations in space of stereoisomers. In perspective formulas (b) the solid wedge-shaped bonds project out of the plane of the paper, the dashed bonds behind it. In projection formulas (c) the horizontal bonds are assumed to project out of the plane of the paper, the vertical bonds behind. However, projection formulas are often used casually and are not always intended to portray a specific stereochemical configuration.

FIGURE 3–4 Steric relationship of the stereoisomers of alanine to the absolute configuration of L- and D-glyceraldehyde. [Structures: L-glyceraldehyde above L-alanine, and D-glyceraldehyde above D-alanine; the glyceraldehyde carbons are numbered 1 (CHO) to 3 (CH₂OH).] In these perspective formulas, the carbons are lined up vertically, with the chiral atom in the center.
The carbons in these molecules are numbered beginning with the terminal aldehyde or carboxyl carbon (red), 1 to 3 from top to bottom as shown. When presented in this way, the R group of the amino acid (in this case the methyl group of alanine) is always below the α carbon. L-Amino acids are those with the α-amino group on the left, and D-amino acids have the α-amino group on the right.

Amino Acids Can Be Classified by R Group

Knowledge of the chemical properties of the common amino acids is central to an understanding of biochemistry. The topic can be simplified by grouping the amino acids into five main classes based on the properties of their R groups (Table 3–1), in particular, their polarity, or tendency to interact with water at biological pH (near pH 7.0). The polarity of the R groups varies widely, from nonpolar and hydrophobic (water-insoluble) to highly polar and hydrophilic (water-soluble).

The structures of the 20 common amino acids are shown in Figure 3–5, and some of their properties are listed in Table 3–1. Within each class there are gradations of polarity, size, and shape of the R groups.

TABLE 3–1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

| Amino acid | Abbreviation | Symbol | Mr | pK1 (–COOH) | pK2 (–NH₃⁺) | pKR (R group) | pI | Hydropathy index* | Occurrence in proteins (%)† |
|---|---|---|---|---|---|---|---|---|---|
| Nonpolar, aliphatic R groups | | | | | | | | | |
| Glycine | Gly | G | 75 | 2.34 | 9.60 | | 5.97 | −0.4 | 7.2 |
| Alanine | Ala | A | 89 | 2.34 | 9.69 | | 6.01 | 1.8 | 7.8 |
| Proline | Pro | P | 115 | 1.99 | 10.96 | | 6.48 | 1.6 | 5.2 |
| Valine | Val | V | 117 | 2.32 | 9.62 | | 5.97 | 4.2 | 6.6 |
| Leucine | Leu | L | 131 | 2.36 | 9.60 | | 5.98 | 3.8 | 9.1 |
| Isoleucine | Ile | I | 131 | 2.36 | 9.68 | | 6.02 | 4.5 | 5.3 |
| Methionine | Met | M | 149 | 2.28 | 9.21 | | 5.74 | 1.9 | 2.3 |
| Aromatic R groups | | | | | | | | | |
| Phenylalanine | Phe | F | 165 | 1.83 | 9.13 | | 5.48 | 2.8 | 3.9 |
| Tyrosine | Tyr | Y | 181 | 2.20 | 9.11 | 10.07 | 5.66 | −1.3 | 3.2 |
| Tryptophan | Trp | W | 204 | 2.38 | 9.39 | | 5.89 | −0.9 | 1.4 |
| Polar, uncharged R groups | | | | | | | | | |
| Serine | Ser | S | 105 | 2.21 | 9.15 | | 5.68 | −0.8 | 6.8 |
| Threonine | Thr | T | 119 | 2.11 | 9.62 | | 5.87 | −0.7 | 5.9 |
| Cysteine | Cys | C | 121 | 1.96 | 10.28 | 8.18 | 5.07 | 2.5 | 1.9 |
| Asparagine | Asn | N | 132 | 2.02 | 8.80 | | 5.41 | −3.5 | 4.3 |
| Glutamine | Gln | Q | 146 | 2.17 | 9.13 | | 5.65 | −3.5 | 4.2 |
| Positively charged R groups | | | | | | | | | |
| Lysine | Lys | K | 146 | 2.18 | 8.95 | 10.53 | 9.74 | −3.9 | 5.9 |
| Histidine | His | H | 155 | 1.82 | 9.17 | 6.00 | 7.59 | −3.2 | 2.3 |
| Arginine | Arg | R | 174 | 2.17 | 9.04 | 12.48 | 10.76 | −4.5 | 5.1 |
| Negatively charged R groups | | | | | | | | | |
| Aspartate | Asp | D | 133 | 1.88 | 9.60 | 3.65 | 2.77 | −3.5 | 5.3 |
| Glutamate | Glu | E | 147 | 2.19 | 9.67 | 4.25 | 3.22 | −3.5 | 6.3 |

*A scale combining hydrophobicity and hydrophilicity of R groups; it can be used to measure the tendency of an amino acid to seek an aqueous environment (− values) or a hydrophobic environment (+ values). See Chapter 11. From Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
†Average occurrence in more than 1,150 proteins. From Doolittle, R.F. (1989) Redundancies in protein sequences. In Prediction of Protein Structure and the Principles of Protein Conformation (Fasman, G.D., ed.), pp. 599–623, Plenum Press, New York.

Nonpolar, Aliphatic R Groups. The R groups in this class of amino acids are nonpolar and hydrophobic. The side chains of alanine, valine, leucine, and isoleucine tend to cluster together within proteins, stabilizing protein structure by means of hydrophobic interactions. Glycine has the simplest structure. Although it is formally nonpolar, its very small side chain makes no real contribution to hydrophobic interactions. Methionine, one of the two sulfur-containing amino acids, has a nonpolar thioether group in its side chain. Proline has an aliphatic side chain with a distinctive cyclic structure. The secondary amino (imino) group of proline residues is held in a rigid conformation that reduces the structural flexibility of polypeptide regions containing proline.

Aromatic R Groups. Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains, are relatively nonpolar (hydrophobic). All can participate in hydrophobic interactions. The hydroxyl group of tyrosine can form hydrogen bonds, and it is an important functional group in some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine, because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring.

Tryptophan and tyrosine, and to a much lesser extent phenylalanine, absorb ultraviolet light (Fig. 3–6; Box 3–1). This accounts for the characteristic strong absorbance of light by most proteins at a wavelength of 280 nm, a property exploited by researchers in the characterization of proteins.
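The 280 nm absorbance described above is usually converted into a concentration with the Lambert-Beer law of Box 3–1, A = εcl. The following is a minimal sketch of that arithmetic, not a laboratory protocol. The function names and the extinction coefficient in the usage lines (5,600 M⁻¹ cm⁻¹) are illustrative assumptions and are not values given in the text.

```python
import math


def absorbance(incident: float, transmitted: float) -> float:
    """Absorbance A = log10(I0 / I) from incident and transmitted intensities."""
    if incident <= 0 or transmitted <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(incident / transmitted)


def concentration(a: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Solve the Lambert-Beer law A = epsilon * c * l for c, in mol/L.

    epsilon is the molar extinction coefficient (L mol^-1 cm^-1) and
    path_cm is the path length l of the cuvette in centimeters.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a / (epsilon * path_cm)


# Hypothetical usage: a sample that transmits 10% of the incident light
# in a 1.0 cm cuvette. EPSILON_280 is an assumed, illustrative value.
EPSILON_280 = 5600.0
a280 = absorbance(incident=100.0, transmitted=10.0)
c = concentration(a280, EPSILON_280, path_cm=1.0)
print(f"A = {a280:.2f}, c = {c:.2e} M")
```

Because A depends on the ratio I₀/I, the units of the two intensities cancel; only ε and the path length carry units into the result.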
FIGURE 3–5 The 20 common amino acids of proteins. [Structures, grouped by class. Nonpolar, aliphatic R groups: glycine, alanine, proline, valine, leucine, isoleucine, methionine. Aromatic R groups: phenylalanine, tyrosine, tryptophan. Polar, uncharged R groups: serine, threonine, cysteine, asparagine, glutamine. Positively charged R groups: lysine, arginine, histidine. Negatively charged R groups: aspartate, glutamate.] The structural formulas show the state of ionization that would predominate at pH 7.0. The unshaded portions are those common to all the amino acids; the portions shaded in red are the R groups. Although the R group of histidine is shown uncharged, its pKa (see Table 3–1) is such that a small but significant fraction of these groups are positively charged at pH 7.0.

Polar, Uncharged R Groups. The R groups of these amino acids are more soluble in water, or more hydrophilic, than those of the nonpolar amino acids, because they contain functional groups that form hydrogen bonds with water.
This class of amino acids includes serine, threonine, cysteine, asparagine, and glutamine. The polarity of serine and threonine is contributed by their hydroxyl groups; that of cysteine by its sulfhydryl group; and that of asparagine and glutamine by their amide groups.

Asparagine and glutamine are the amides of two other amino acids also found in proteins, aspartate and glutamate, respectively, to which asparagine and glutamine are easily hydrolyzed by acid or base. Cysteine is readily oxidized to form a covalently linked dimeric amino acid called cystine, in which two cysteine molecules or residues are joined by a disulfide bond (Fig. 3–7). The disulfide-linked residues are strongly hydrophobic (nonpolar). Disulfide bonds play a special role in the structures of many proteins by forming covalent links between parts of a protein molecule or between two different polypeptide chains.

Positively Charged (Basic) R Groups. The most hydrophilic R groups are those that are either positively or negatively charged. The amino acids in which the R groups have significant positive charge at pH 7.0 are lysine, which has a second primary amino group at the ε position on its aliphatic chain; arginine, which has a positively charged guanidino group; and histidine, which has an imidazole group. Histidine is the only common amino acid having an ionizable side chain with a pKa near neutrality. In many enzyme-catalyzed reactions, a His residue facilitates the reaction by serving as a proton donor/acceptor.

Negatively Charged (Acidic) R Groups. The two amino acids having R groups with a net negative charge at pH 7.0 are aspartate and glutamate, each of which has a second carboxyl group.

Uncommon Amino Acids Also Have Important Functions

In addition to the 20 common amino acids, proteins may contain residues created by modification of common residues already incorporated into a polypeptide (Fig. 3–8a).
Among these uncommon amino acids are 4-hydroxyproline, a derivative of proline, and 5-hydroxylysine, derived from lysine. The former is found in plant cell wall proteins, and both are found in collagen, a fibrous protein of connective tissues. 6-N-Methyllysine is a constituent of myosin, a contractile protein of muscle. Another important uncommon amino acid is γ-carboxyglutamate, found in the blood-clotting protein prothrombin and in certain other proteins that bind Ca²⁺ as part of their biological function. More complex is desmosine, a derivative of four Lys residues, which is found in the fibrous protein elastin.

Selenocysteine is a special case. This rare amino acid residue is introduced during protein synthesis rather than created through a postsynthetic modification. It contains selenium rather than the sulfur of cysteine. Actually derived from serine, selenocysteine is a constituent of just a few known proteins.

Some 300 additional amino acids have been found in cells. They have a variety of functions but are not constituents of proteins. Ornithine and citrulline (Fig. 3–8b) deserve special note because they are key intermediates (metabolites) in the biosynthesis of arginine (Chapter 22) and in the urea cycle (Chapter 18).

FIGURE 3–6 Absorption of ultraviolet light by aromatic amino acids. [Plot: absorbance (0 to 6) versus wavelength (230 to 310 nm) for tryptophan and tyrosine.] Comparison of the light absorption spectra of the aromatic amino acids tryptophan and tyrosine at pH 6.0. The amino acids are present in equimolar amounts (10⁻³ M) under identical conditions. The measured absorbance of tryptophan is as much as four times that of tyrosine. Note that the maximum light absorption for both tryptophan and tyrosine occurs near a wavelength of 280 nm. Light absorption by the third aromatic amino acid, phenylalanine (not shown), generally contributes little to the spectroscopic properties of proteins.

FIGURE 3–7 Reversible formation of a disulfide bond by the oxidation of two molecules of cysteine. [Reaction: 2 cysteine (–CH₂–SH) ⇌ cystine (–CH₂–S–S–CH₂–) + 2H⁺ + 2e⁻.] Disulfide bonds between Cys residues stabilize the structures of many proteins.

FIGURE 3–8 Uncommon amino acids. [Structures: (a) 4-hydroxyproline, 5-hydroxylysine, 6-N-methyllysine, γ-carboxyglutamate, desmosine, selenocysteine; (b) ornithine, citrulline.] (a) Some uncommon amino acids found in proteins. All are derived from common amino acids. Extra functional groups added by modification reactions are shown in red. Desmosine is formed from four Lys residues (the four carbon backbones are shaded in yellow). Note the use of either numbers or Greek letters to identify the carbon atoms in these structures. (b) Ornithine and citrulline, which are not found in proteins, are intermediates in the biosynthesis of arginine and in the urea cycle.

Amino Acids Can Act as Acids and Bases

When an amino acid is dissolved in water, it exists in solution as the dipolar ion, or zwitterion (German for "hybrid ion"), shown in Figure 3–9. A zwitterion can act as either an acid (proton donor):

\[ \underset{\text{Zwitterion}}{\mathrm{R{-}CH(NH_3^+){-}COO^-}} \;\rightleftharpoons\; \mathrm{R{-}CH(NH_2){-}COO^-} + \mathrm{H^+} \]

or a base (proton acceptor):

\[ \underset{\text{Zwitterion}}{\mathrm{R{-}CH(NH_3^+){-}COO^-}} + \mathrm{H^+} \;\rightleftharpoons\; \mathrm{R{-}CH(NH_3^+){-}COOH} \]

Substances having this dual nature are amphoteric and are often called ampholytes (from "amphoteric electrolytes"). A simple monoamino monocarboxylic α-amino acid, such as alanine, is a diprotic acid when fully protonated; it has two groups, the –COOH group and the –NH₃⁺ group, that can yield protons:

\[ \underset{\text{net charge } +1}{\mathrm{R{-}CH(NH_3^+){-}COOH}} \;\rightleftharpoons\; \underset{0}{\mathrm{R{-}CH(NH_3^+){-}COO^-}} + \mathrm{H^+} \;\rightleftharpoons\; \underset{-1}{\mathrm{R{-}CH(NH_2){-}COO^-}} + \mathrm{H^+} \]

FIGURE 3–9 Nonionic and zwitterionic forms of amino acids. [Structures: nonionic form, H₂N–CH(R)–COOH; zwitterionic form, ⁺H₃N–CH(R)–COO⁻.] The nonionic form does not occur in significant amounts in aqueous solutions. The zwitterion predominates at neutral pH.

Amino Acids Have Characteristic Titration Curves

Acid-base titration involves the gradual addition or removal of protons (Chapter 2). Figure 3–10 shows the titration curve of the diprotic form of glycine. The plot has two distinct stages, corresponding to deprotonation of two different groups on glycine. Each of the two stages resembles in shape the titration curve of a monoprotic acid, such as acetic acid (see Fig. 2–17), and can be analyzed in the same way. At very low pH, the predominant ionic species of glycine is the fully protonated form, ⁺H₃N–CH₂–COOH. At the midpoint in the first stage of the titration, in which the –COOH group of glycine loses its proton, equimolar concentrations of the proton-donor (⁺H₃N–CH₂–COOH) and proton-acceptor (⁺H₃N–CH₂–COO⁻) species are present. At the midpoint of any titration, a point of inflection is reached where the pH is equal to the pKa of the protonated group being titrated (see Fig.
2–18). For glycine, the pH at the midpoint is 2.34; thus its –COOH group has a pKa (labeled pK1 in Fig. 3–10) of 2.34. (Recall from Chapter 2 that pH and pKa are simply convenient notations for proton concentration and the equilibrium constant for ionization, respectively. The pKa is a measure of the tendency of a group to give up a proton, with that tendency decreasing tenfold as the pKa increases by one unit.) As the titration proceeds, another important point is reached at pH 5.97. Here there is another point of inflection, at which removal of the first proton is essentially complete and removal of the second has just begun. At this pH glycine is present largely as the dipolar ion ⁺H₃N–CH₂–COO⁻. We shall return to the significance of this inflection point in the titration curve (labeled pI in Fig. 3–10) shortly.

The second stage of the titration corresponds to the removal of a proton from the –NH₃⁺ group of glycine. The pH at the midpoint of this stage is 9.60, equal to the pKa (labeled pK2 in Fig. 3–10) for the –NH₃⁺ group. The titration is essentially complete at a pH of about 12, at which point the predominant form of glycine is H₂N–CH₂–COO⁻.

BOX 3–1 WORKING IN BIOCHEMISTRY
Absorption of Light by Molecules: The Lambert-Beer Law

A wide range of biomolecules absorb light at characteristic wavelengths, just as tryptophan absorbs light at 280 nm (see Fig. 3–6). Measurement of light absorption by a spectrophotometer is used to detect and identify molecules and to measure their concentration in solution. The fraction of the incident light absorbed by a solution at a given wavelength is related to the thickness of the absorbing layer (path length) and the concentration of the absorbing species (Fig. 1).
These two relationships are combined into the Lambert-Beer law,

\[ \log \frac{I_0}{I} = \varepsilon c l \]

where I₀ is the intensity of the incident light, I is the intensity of the transmitted light, ε is the molar extinction coefficient (in units of liters per mole-centimeter), c is the concentration of the absorbing species (in moles per liter), and l is the path length of the light-absorbing sample (in centimeters). The Lambert-Beer law assumes that the incident light is parallel and monochromatic (of a single wavelength) and that the solvent and solute molecules are randomly oriented. The expression log (I₀/I) is called the absorbance, designated A.

It is important to note that each successive millimeter of path length of absorbing solution in a 1.0 cm cell absorbs not a constant amount but a constant fraction of the light that is incident upon it. However, with an absorbing layer of fixed path length, the absorbance, A, is directly proportional to the concentration of the absorbing solute.

The molar extinction coefficient varies with the nature of the absorbing compound, the solvent, and the wavelength, and also with pH if the light-absorbing species is in equilibrium with an ionization state that has different absorbance properties.

FIGURE 1 The principal components of a spectrophotometer. [Diagram: lamp, monochromator, sample cuvette of path length l containing c moles/liter of absorbing species, and detector; intensity of incident light I₀, intensity of transmitted light I; example detector reading A = 0.012.] A light source emits light along a broad spectrum, then the monochromator selects and transmits light of a particular wavelength. The monochromatic light passes through the sample in a cuvette of path length l and is absorbed by the sample in proportion to the concentration of the absorbing species. The transmitted light is measured by a detector.

From the titration curve of glycine we can derive several important pieces of information.
First, it gives a quantitative measure of the pKa of each of the two ionizing groups: 2.34 for the –COOH group and 9.60 for the –NH₃⁺ group. Note that the carboxyl group of glycine is over 100 times more acidic (more easily ionized) than the carboxyl group of acetic acid, which, as we saw in Chapter 2, has a pKa of 4.76, about average for a carboxyl group attached to an otherwise unsubstituted aliphatic hydrocarbon. The perturbed pKa of glycine is caused by repulsion between the departing proton and the nearby positively charged amino group on the α-carbon atom, as described in Figure 3–11. The opposite charges on the resulting zwitterion are stabilizing, nudging the equilibrium farther to the right. Similarly, the pKa of the amino group in glycine is perturbed downward relative to the average pKa of an amino group. This effect is due partly to the electronegative oxygen atoms in the carboxyl group, which tend to pull electrons toward them, increasing the tendency of the amino group to give up a proton. Hence, the α-amino group has a pKa that is lower than that of an aliphatic amine such as methylamine (Fig. 3–11). In short, the pKa of any functional group is greatly affected by its chemical environment, a phenomenon sometimes exploited in the active sites of enzymes to promote exquisitely adapted reaction mechanisms that depend on the perturbed pKa values of proton donor/acceptor groups of specific residues.

FIGURE 3–10 Titration of an amino acid. [Plot: pH (0 to 13) versus OH⁻ added (0 to 2 equivalents) for glycine, marking pK1 = 2.34, pI = 5.97, and pK2 = 9.60. Species shown above the graph: ⁺H₃N–CH₂–COOH, ⁺H₃N–CH₂–COO⁻, H₂N–CH₂–COO⁻.] Shown here is the titration curve of 0.1 M glycine at 25 °C. The ionic species predominating at key points in the titration are shown above the graph. The shaded boxes, centered at about pK1 = 2.34 and pK2 = 9.60, indicate the regions of greatest buffering power.
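The statement that the carboxyl group of glycine is "over 100 times more acidic" than that of acetic acid follows directly from the definition pKa = −log Ka. As a worked check, using only the two pKa values quoted in the text (2.34 and 4.76):

```latex
\[
\frac{K_a(\text{glycine, } \mathrm{COOH})}{K_a(\text{acetic acid})}
  = \frac{10^{-2.34}}{10^{-4.76}}
  = 10^{\,4.76 - 2.34}
  = 10^{2.42}
  \approx 263
\]
```

A difference of 2.42 pKa units therefore corresponds to a ratio of a few hundred in the ionization equilibrium constants, consistent with "over 100 times."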
FIGURE 3–11 Effect of the chemical environment on pKa. The pKa values for the ionizable groups in glycine are lower than those for simple, methyl-substituted amino and carboxyl groups. These downward perturbations of pKa are due to intramolecular interactions. Similar effects can be caused by chemical groups that happen to be positioned nearby, for example, in the active site of an enzyme. Panel annotations: the normal pKa for a carboxyl group (acetic acid) is about 4.8, and the normal pKa for an amino group (methylamine) is about 10.6. In the α-amino acid glycine the carboxyl pKa is 2.34: repulsion between the amino group and the departing proton lowers the pKa for the carboxyl group, and oppositely charged groups lower the pKa by stabilizing the zwitterion. The amino pKa in glycine is 9.60: electronegative oxygen atoms in the carboxyl group pull electrons away from the amino group, lowering its pKa.

The second piece of information provided by the titration curve of glycine is that this amino acid has two regions of buffering power. One of these is the relatively flat portion of the curve, extending for approximately 1 pH unit on either side of the first pKa of 2.34, indicating that glycine is a good buffer near this pH. The other buffering zone is centered around pH 9.60. (Note that glycine is not a good buffer at the pH of intracellular fluid or blood, about 7.4.) Within the buffering ranges of glycine, the Henderson-Hasselbalch equation (see Box 2–3) can be used to calculate the proportions of proton-donor and proton-acceptor species of glycine required to make a buffer at a given pH.

Titration Curves Predict the Electric Charge of Amino Acids

Another important piece of information derived from the titration curve of an amino acid is the relationship between its net electric charge and the pH of the solution. At pH 5.97, the point of inflection between the two stages in its titration curve, glycine is present predominantly as its dipolar form, fully ionized but with no net electric charge (Fig. 3–10). The characteristic pH at which the net electric charge is zero is called the isoelectric point or isoelectric pH, designated pI. For glycine, which has no ionizable group in its side chain, the isoelectric point is simply the arithmetic mean of the two pKa values:

$$\mathrm{pI} = \tfrac{1}{2}(\mathrm{p}K_1 + \mathrm{p}K_2) = \tfrac{1}{2}(2.34 + 9.60) = 5.97$$

As is evident in Figure 3–10, glycine has a net negative charge at any pH above its pI and will thus move toward the positive electrode (the anode) when placed in an electric field. At any pH below its pI, glycine has a net positive charge and will move toward the negative electrode (the cathode). The farther the pH of a glycine solution is from its isoelectric point, the greater the net electric charge of the population of glycine molecules. At pH 1.0, for example, glycine exists almost entirely as the form ⁺H₃N-CH₂-COOH, with a net positive charge of 1.0. At pH 2.34, where there is an equal mixture of ⁺H₃N-CH₂-COOH and ⁺H₃N-CH₂-COO⁻, the average or net positive charge is 0.5. The sign and the magnitude of the net charge of any amino acid at any pH can be predicted in the same way.

Amino Acids Differ in Their Acid-Base Properties

The shared properties of many amino acids permit some simplifying generalizations about their acid-base behaviors. First, all amino acids with a single α-amino group, a single α-carboxyl group, and an R group that does not ionize have titration curves resembling that of glycine (Fig. 3–10). These amino acids have very similar, although not identical, pKa values: pKa of the -COOH group in the range of 1.8 to 2.4, and pKa of the -NH₃⁺ group in the range of 8.8 to 11.0 (Table 3–1).

Second, amino acids with an ionizable R group have more complex titration curves, with three stages corresponding to the three possible ionization steps; thus they have three pKa values. The additional stage for the titration of the ionizable R group merges to some extent with the other two. The titration curves for two amino acids of this type, glutamate and histidine, are shown in Figure 3–12.

FIGURE 3–12 Titration curves for (a) glutamate and (b) histidine. The pKa of the R group is designated here as pKR. Both curves plot pH against equivalents of OH⁻ added. (a) Glutamate: pK1 = 2.19, pKR = 4.25, pK2 = 9.67. (b) Histidine: pK1 = 1.82, pKR = 6.0, pK2 = 9.17.

The isoelectric points reflect the nature of the ionizing R groups present. For example, glutamate has a pI of 3.22, considerably lower than that of glycine.
This is due to the presence of two carboxyl groups, which, at the average of their pKa values (3.22), contribute a net charge of −1 that balances the +1 contributed by the amino group. Similarly, the pI of histidine, with two groups that are positively charged when protonated, is 7.59 (the average of the pKa values of the amino and imidazole groups), much higher than that of glycine.

Finally, as pointed out earlier, under the general condition of free and open exposure to the aqueous environment, only histidine has an R group (pKa = 6.0) providing significant buffering power near the neutral pH usually found in the intracellular and extracellular fluids of most animals and bacteria (Table 3–1).

SUMMARY 3.1 Amino Acids

■ The 20 amino acids commonly found as residues in proteins contain an α-carboxyl group, an α-amino group, and a distinctive R group substituted on the α-carbon atom. The α-carbon atom of all amino acids except glycine is asymmetric, and thus amino acids can exist in at least two stereoisomeric forms. Only the L stereoisomers, with a configuration related to the absolute configuration of the reference molecule L-glyceraldehyde, are found in proteins.

■ Other, less common amino acids also occur, either as constituents of proteins (through modification of common amino acid residues after protein synthesis) or as free metabolites.

■ Amino acids are classified into five types on the basis of the polarity and charge (at pH 7) of their R groups.

■ Amino acids vary in their acid-base properties and have characteristic titration curves. Monoamino monocarboxylic amino acids (with nonionizable R groups) are diprotic acids (⁺H₃NCH(R)COOH) at low pH and exist in several different ionic forms as the pH is increased. Amino acids with ionizable R groups have additional ionic species, depending on the pH of the medium and the pKa of the R group.
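The isoelectric points quoted in this section (5.97 for glycine, 3.22 for glutamate, 7.59 for histidine) can be checked numerically. The sketch below is a simplified model, not a method given in the text: it assumes each ionizable group titrates independently according to the Henderson-Hasselbalch relationship, uses the pKa values shown in Figures 3–10 and 3–12, and finds the pH of zero net charge by bisection. All function names are illustrative.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a group carrying its proton at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(ph: float, acidic_pkas, basic_pkas) -> float:
    """Average net charge of a molecule, treating each group independently.

    Basic groups (amino, imidazole) are +1 when protonated and 0 otherwise.
    Acidic groups (carboxyl) are 0 when protonated and -1 otherwise.
    """
    positive = sum(fraction_protonated(ph, pka) for pka in basic_pkas)
    negative = sum(1.0 - fraction_protonated(ph, pka) for pka in acidic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas, basic_pkas) -> float:
    """Find the pH of zero net charge by bisection between pH 0 and pH 14."""
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = (low + high) / 2.0
        # Net charge falls steadily as pH rises, so a positive charge
        # means the isoelectric point lies at a higher pH.
        if net_charge(mid, acidic_pkas, basic_pkas) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# (acidic pKa values, basic pKa values) taken from Figures 3-10 and 3-12.
AMINO_ACIDS = {
    "glycine": ([2.34], [9.60]),
    "glutamate": ([2.19, 4.25], [9.67]),
    "histidine": ([1.82], [6.0, 9.17]),
}

for name, (acidic, basic) in AMINO_ACIDS.items():
    print(f"{name}: pI = {isoelectric_point(acidic, basic):.2f}")

print(f"glycine net charge at pH 2.34: {net_charge(2.34, [2.34], [9.60]):+.2f}")
```

Under these assumptions the numerical results agree with the averages of the two relevant pKa values to within rounding, and the glycine charge at pH 2.34 comes out at about +0.5, as described for the equal mixture of the two ionic forms.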
3.2 Peptides and Proteins

We now turn to polymers of amino acids, the peptides and proteins. Biologically occurring polypeptides range in size from small to very large, consisting of two or three to thousands of linked amino acid residues. Our focus is on the fundamental chemical properties of these polymers.

Peptides Are Chains of Amino Acids

Two amino acid molecules can be covalently joined through a substituted amide linkage, termed a peptide bond, to yield a dipeptide. Such a linkage is formed by removal of the elements of water (dehydration) from the α-carboxyl group of one amino acid and the α-amino group of another (Fig. 3–13). Peptide bond formation is an example of a condensation reaction, a common class of reactions in living cells. Under standard biochemical conditions, the equilibrium for the reaction shown in Figure 3–13 favors the amino acids over the dipeptide. To make the reaction thermodynamically more favorable, the carboxyl group must be chemically modified or activated so that the hydroxyl group can be more readily eliminated. A chemical approach to this problem is outlined later in this chapter. The biological approach to peptide bond formation is a major topic of Chapter 27.

Three amino acids can be joined by two peptide bonds to form a tripeptide; similarly, amino acids can be linked to form tetrapeptides, pentapeptides, and so forth. When a few amino acids are joined in this fashion, the structure is called an oligopeptide. When many amino acids are joined, the product is called a polypeptide. Proteins may have thousands of amino acid residues. Although the terms "protein" and "polypeptide" are sometimes used interchangeably, molecules referred to as polypeptides generally have molecular weights below 10,000, and those called proteins have higher molecular weights.

Figure 3–14 shows the structure of a pentapeptide.
As already noted, an amino acid unit in a peptide is often called a residue (the part left over after losing a hydrogen atom from its amino group and the hydroxyl moiety from its carboxyl group). In a peptide, the amino acid residue at the end with a free α-amino group is the amino-terminal (or N-terminal) residue; the residue at the other end, which has a free carboxyl group, is the carboxyl-terminal (C-terminal) residue.

FIGURE 3–13 Formation of a peptide bond by condensation. The α-amino group of one amino acid (with R2 group) acts as a nucleophile to displace the hydroxyl group of another amino acid (with R1 group), forming a peptide bond (shaded in yellow). Amino groups are good nucleophiles, but the hydroxyl group is a poor leaving group and is not readily displaced. At physiological pH, the reaction shown does not occur to any appreciable extent.

Although hydrolysis of a peptide bond is an exergonic reaction, it occurs slowly because of its high activation energy. As a result, the peptide bonds in proteins are quite stable, with an average half-life (t1/2) of about 7 years under most intracellular conditions.

Peptides Can Be Distinguished by Their Ionization Behavior

Peptides contain only one free α-amino group and one free α-carboxyl group, at opposite ends of the chain (Fig. 3–15). These groups ionize as they do in free amino acids, although the ionization constants are different because an oppositely charged group is no longer linked to the α carbon. The α-amino and α-carboxyl groups of all nonterminal amino acids are covalently joined in the peptide bonds, which do not ionize and thus do not contribute to the total acid-base behavior of peptides.
However, the R groups of some amino acids can ionize (Table 3–1), and in a peptide these contribute to the overall acid-base properties of the molecule (Fig. 3–15). Thus the acid-base behavior of a peptide can be predicted from its free α-amino and α-carboxyl groups as well as the nature and number of its ionizable R groups.

Like free amino acids, peptides have characteristic titration curves and a characteristic isoelectric pH (pI) at which they do not move in an electric field. These properties are exploited in some of the techniques used to separate peptides and proteins, as we shall see later in the chapter. It should be emphasized that the pKa value for an ionizable R group can change somewhat when an amino acid becomes a residue in a peptide. The loss of charge in the α-carboxyl and α-amino groups, the interactions with other peptide R groups, and other environmental factors can affect the pKa. The pKa values for R groups listed in Table 3–1 can be a useful guide to the pH range in which a given group will ionize, but they cannot be strictly applied to peptides.

Biologically Active Peptides and Polypeptides Occur in a Vast Range of Sizes

No generalizations can be made about the molecular weights of biologically active peptides and proteins in relation to their functions. Naturally occurring peptides range in length from two to many thousands of amino acid residues. Even the smallest peptides can have biologically important effects. Consider the commercially synthesized dipeptide L-aspartyl-L-phenylalanine methyl ester, the artificial sweetener better known as aspartame or NutraSweet.

Many small peptides exert their effects at very low concentrations. For example, a number of vertebrate hormones (Chapter 23) are small peptides.
These include oxytocin (nine amino acid residues), which is secreted by the posterior pituitary and stimulates uterine contractions; bradykinin (nine residues), which inhibits inflammation of tissues; and thyrotropin-releasing factor (three residues), which is formed in the hypothalamus and stimulates the release of another hormone, thyrotropin, from the anterior pituitary gland. Some extremely toxic mushroom poisons, such as amanitin, are also small peptides, as are many antibiotics.

Slightly larger are small polypeptides and oligopeptides such as the pancreatic hormone insulin, which contains two polypeptide chains, one having 30 amino acid residues and the other 21. Glucagon, another pancreatic hormone, has 29 residues; it opposes the action of insulin. Corticotropin is a 39-residue hormone of the anterior pituitary gland that stimulates the adrenal cortex.

Structure: L-Aspartyl-L-phenylalanine methyl ester (aspartame).

FIGURE 3–14 The pentapeptide serylglycyltyrosylalanylleucine, or Ser–Gly–Tyr–Ala–Leu. Peptides are named beginning with the amino-terminal residue, which by convention is placed at the left. The peptide bonds are shaded in yellow; the R groups are in red.

FIGURE 3–15 Alanylglutamylglycyllysine (Ala–Glu–Gly–Lys). This tetrapeptide has one free α-amino group, one free α-carboxyl group, and two ionizable R groups. The groups ionized at pH 7.0 are in red.

How long are the polypeptide chains in proteins? As Table 3–2 shows, lengths vary considerably.
Human cytochrome c has 104 amino acid residues linked in a single chain; bovine chymotrypsinogen has 245 residues. At the extreme is titin, a constituent of vertebrate muscle, which has nearly 27,000 amino acid residues and a molecular weight of about 3,000,000. The vast majority of naturally occurring proteins are much smaller than this, containing fewer than 2,000 amino acid residues.

Some proteins consist of a single polypeptide chain, but others, called multisubunit proteins, have two or more polypeptides associated noncovalently (Table 3–2). The individual polypeptide chains in a multisubunit protein may be identical or different. If at least two are identical, the protein is said to be oligomeric, and the identical units (consisting of one or more polypeptide chains) are referred to as protomers. Hemoglobin, for example, has four polypeptide subunits: two identical α chains and two identical β chains, all four held together by noncovalent interactions. Each α subunit is paired in an identical way with a β subunit within the structure of this multisubunit protein, so that hemoglobin can be considered either a tetramer of four polypeptide subunits or a dimer of αβ protomers.

A few proteins contain two or more polypeptide chains linked covalently. For example, the two polypeptide chains of insulin are linked by disulfide bonds. In such cases, the individual polypeptides are not considered subunits but are commonly referred to simply as chains.

We can calculate the approximate number of amino acid residues in a simple protein containing no other chemical constituents by dividing its molecular weight by 110. Although the average molecular weight of the 20 common amino acids is about 138, the smaller amino acids predominate in most proteins. If we take into account the proportions in which the various amino acids occur in proteins (Table 3–1), the average molecular weight of protein amino acids is nearer to 128.
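The divide-by-110 rule of thumb described in this passage is easy to try against the molecular weights and residue counts listed in Table 3–2. The sketch below is only an illustration of that arithmetic; the function name and the choice of four example proteins are assumptions made here, and the estimates are approximate by design.

```python
# Average mass of an amino acid residue in a protein, as given in the text.
AVERAGE_RESIDUE_MASS = 110


def estimate_residue_count(molecular_weight: float) -> int:
    """Approximate number of residues in a simple protein (no prosthetic groups)."""
    return round(molecular_weight / AVERAGE_RESIDUE_MASS)


# (molecular weight, actual number of residues) from Table 3-2.
TABLE_3_2_EXAMPLES = {
    "Cytochrome c (human)": (13_000, 104),
    "Hemoglobin (human)": (64_500, 574),
    "Serum albumin (human)": (68_500, 609),
    "Titin (human)": (2_993_000, 26_926),
}

for protein, (mol_weight, actual) in TABLE_3_2_EXAMPLES.items():
    estimate = estimate_residue_count(mol_weight)
    error_pct = 100.0 * (estimate - actual) / actual
    print(f"{protein}: estimated {estimate}, actual {actual} ({error_pct:+.1f}%)")
```

For these entries the estimate lands within roughly 15% of the tabulated residue count, which is about what one would expect from an average that ignores each protein's actual amino acid composition.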
Because a molecule of water (Mr 18) is removed to create each peptide bond, the average molecular weight of an amino acid residue in a protein is about 128 − 18 = 110.

TABLE 3–2 Molecular Data on Some Proteins

| Protein | Molecular weight | Number of residues | Number of polypeptide chains |
|---|---|---|---|
| Cytochrome c (human) | 13,000 | 104 | 1 |
| Ribonuclease A (bovine pancreas) | 13,700 | 124 | 1 |
| Lysozyme (chicken egg white) | 13,930 | 129 | 1 |
| Myoglobin (equine heart) | 16,890 | 153 | 1 |
| Chymotrypsin (bovine pancreas) | 21,600 | 241 | 3 |
| Chymotrypsinogen (bovine) | 22,000 | 245 | 1 |
| Hemoglobin (human) | 64,500 | 574 | 4 |
| Serum albumin (human) | 68,500 | 609 | 1 |
| Hexokinase (yeast) | 102,000 | 972 | 2 |
| RNA polymerase (E. coli) | 450,000 | 4,158 | 5 |
| Apolipoprotein B (human) | 513,000 | 4,536 | 1 |
| Glutamine synthetase (E. coli) | 619,000 | 5,628 | 12 |
| Titin (human) | 2,993,000 | 26,926 | 1 |

Polypeptides Have Characteristic Amino Acid Compositions

Hydrolysis of peptides or proteins with acid yields a mixture of free α-amino acids. When completely hydrolyzed, each type of protein yields a characteristic proportion or mixture of the different amino acids. The 20 common amino acids almost never occur in equal amounts in a protein. Some amino acids may occur only once or not at all in a given type of protein; others may occur in large numbers. Table 3–3 shows the composition of the amino acid mixtures obtained on complete hydrolysis of bovine cytochrome c and chymotrypsinogen, the inactive precursor of the digestive enzyme chymotrypsin. These two proteins, with very different functions, also differ significantly in the relative numbers of each kind of amino acid they contain.

Complete hydrolysis alone is not sufficient for an exact analysis of amino acid composition, however, because some side reactions occur during the procedure. For example, the amide bonds in the side chains of asparagine and glutamine are cleaved by acid treatment, yielding aspartate and glutamate, respectively. The side chain of tryptophan is almost completely degraded by acid hydrolysis, and small amounts of serine, threonine, and tyrosine are also lost. When a precise amino acid composition is required, biochemists use additional procedures to resolve the ambiguities that remain from acid hydrolysis.

Some Proteins Contain Chemical Groups Other Than Amino Acids

Many proteins, for example the enzymes ribonuclease A and chymotrypsinogen, contain only amino acid residues and no other chemical constituents; these are considered simple proteins. However, some proteins contain permanently associated chemical components in addition to amino acids; these are called conjugated proteins. The non-amino acid part of a conjugated protein is usually called its prosthetic group. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups (Table 3–4); for example, lipoproteins contain lipids, glycoproteins contain sugar groups, and metalloproteins contain a specific metal. A number of proteins contain more than one prosthetic group. Usually the prosthetic group plays an important role in the protein's biological function.

There Are Several Levels of Protein Structure

For large macromolecules such as proteins, the tasks of describing and understanding structure are approached at several levels of complexity, arranged in a kind of conceptual hierarchy. Four levels of protein structure are commonly defined (Fig. 3–16). A description of all covalent bonds (mainly peptide bonds and disulfide bonds) linking amino acid residues in a polypeptide chain is its primary structure. The most important element of primary structure is the sequence of amino acid residues. Secondary structure refers to particularly stable arrangements of amino acid residues giving rise to recurring structural patterns.
Tertiary structure describes all aspects of the three-dimensional folding of a polypeptide. When a protein has two or more polypeptide subunits, their arrangement in space is referred to as quaternary structure. Primary structure is the focus of Section 3.4; the higher levels of structure are discussed in Chapter 4.

TABLE 3–3 Amino Acid Composition of Two Proteins

Number of residues per molecule of protein*

| Amino acid | Bovine cytochrome c | Bovine chymotrypsinogen |
|---|---|---|
| Ala | 6 | 22 |
| Arg | 2 | 4 |
| Asn | 5 | 15 |
| Asp | 3 | 8 |
| Cys | 2 | 10 |
| Gln | 3 | 10 |
| Glu | 9 | 5 |
| Gly | 14 | 23 |
| His | 3 | 2 |
| Ile | 6 | 10 |
| Leu | 6 | 19 |
| Lys | 18 | 14 |
| Met | 2 | 2 |
| Phe | 4 | 6 |
| Pro | 4 | 9 |
| Ser | 1 | 28 |
| Thr | 8 | 23 |
| Trp | 1 | 8 |
| Tyr | 4 | 4 |
| Val | 3 | 23 |
| Total | 104 | 245 |

*In some common analyses, such as acid hydrolysis, Asp and Asn are not readily distinguished from each other and are together designated Asx (or B). Similarly, when Glu and Gln cannot be distinguished, they are together designated Glx (or Z). In addition, Trp is destroyed. Additional procedures must be employed to obtain an accurate assessment of complete amino acid content.

TABLE 3–4 Conjugated Proteins

| Class | Prosthetic group | Example |
|---|---|---|
| Lipoproteins | Lipids | β1-Lipoprotein of blood |
| Glycoproteins | Carbohydrates | Immunoglobulin G |
| Phosphoproteins | Phosphate groups | Casein of milk |
| Hemoproteins | Heme (iron porphyrin) | Hemoglobin |
| Flavoproteins | Flavin nucleotides | Succinate dehydrogenase |
| Metalloproteins | Iron | Ferritin |
| | Zinc | Alcohol dehydrogenase |
| | Calcium | Calmodulin |
| | Molybdenum | Dinitrogenase |
| | Copper | Plastocyanin |

SUMMARY 3.2 Peptides and Proteins

■ Amino acids can be joined covalently through peptide bonds to form peptides and proteins. Cells generally contain thousands of different proteins, each with a different biological activity.

■ Proteins can be very long polypeptide chains of 100 to several thousand amino acid residues. However, some naturally occurring peptides have only a few amino acid residues. Some proteins are composed of several noncovalently associated polypeptide chains, called subunits. Simple proteins yield only amino acids on hydrolysis; conjugated proteins contain in addition some other component, such as a metal or organic prosthetic group.

■ The sequence of amino acids in a protein is characteristic of that protein and is called its primary structure. This is one of four generally recognized levels of protein structure.

3.3 Working with Proteins

Our understanding of protein structure and function has been derived from the study of many individual proteins. To study a protein in detail, the researcher must be able to separate it from other proteins and must have the techniques to determine its properties. The necessary methods come from protein chemistry, a discipline as old as biochemistry itself and one that retains a central position in biochemical research.

Proteins Can Be Separated and Purified

A pure preparation is essential before a protein's properties and activities can be determined. Given that cells contain thousands of different kinds of proteins, how can one protein be purified?
Methods for separating proteins take advantage of properties that vary from one protein to the next, including size, charge, and binding properties.

The source of a protein is generally tissue or microbial cells. The first step in any protein purification procedure is to break open these cells, releasing their proteins into a solution called a crude extract. If necessary, differential centrifugation can be used to prepare subcellular fractions or to isolate specific organelles (see Fig. 1–8).

Once the extract or organelle preparation is ready, various methods are available for purifying one or more of the proteins it contains. Commonly, the extract is subjected to treatments that separate the proteins into different fractions based on a property such as size or charge, a process referred to as fractionation. Early fractionation steps in a purification utilize differences in protein solubility, which is a complex function of pH, temperature, salt concentration, and other factors. The solubility of proteins is generally lowered at high salt concentrations, an effect called "salting out." The addition of a salt in the right amount can selectively precipitate some proteins, while others remain in solution. Ammonium sulfate ((NH₄)₂SO₄) is often used for this purpose because of its high solubility in water.

A solution containing the protein of interest often must be further altered before subsequent purification steps are possible. For example, dialysis is a procedure that separates proteins from solvents by taking advantage of the proteins' larger size. The partially purified extract is placed in a bag or tube made of a semipermeable membrane. When this is suspended in a much larger volume of buffered solution of appropriate ionic strength, the membrane allows the exchange of salt and buffer but not proteins.
Thus dialysis retains large proteins within the membranous bag or tube while allowing the concentration of other solutes in the protein preparation to change until they come into equilibrium with the solution outside the membrane. Dialysis might be used, for example, to remove ammonium sulfate from the protein preparation.

FIGURE 3–16 Levels of structure in proteins. The primary structure consists of a sequence of amino acids linked together by peptide bonds and includes any disulfide bonds. The resulting polypeptide can be coiled into units of secondary structure, such as an α helix. The helix is a part of the tertiary structure of the folded polypeptide, which is itself one of the subunits that make up the quaternary structure of the multisubunit protein, in this case hemoglobin. Panel labels: primary structure (amino acid residues), secondary structure (α helix), tertiary structure (polypeptide chain), quaternary structure (assembled subunits).

The most powerful methods for fractionating proteins make use of column chromatography, which takes advantage of differences in protein charge, size, binding affinity, and other properties (Fig. 3–17). A porous solid material with appropriate chemical properties (the stationary phase) is held in a column, and a buffered solution (the mobile phase) percolates through it. The protein-containing solution, layered on the top of the column, percolates through the solid matrix as an ever-expanding band within the larger mobile phase (Fig. 3–17). Individual proteins migrate faster or more slowly through the column depending on their properties. For example, in cation-exchange chromatography (Fig. 3–18a), the solid matrix has negatively charged groups. In the mobile phase, proteins with a net positive charge migrate through the matrix more slowly than those with a net negative charge, because the migration of the former is retarded more by interaction with the stationary phase. The two types of protein can separate into two distinct bands. The expansion of the protein band in the mobile phase (the protein solution) is caused both by separation of proteins with different properties and by diffusional spreading. As the length of the column increases, the resolution of two types of protein with different net charges generally improves. However, the rate at which the protein solution can flow through the column usually decreases with column length. And as the length of time spent on the column increases, the resolution can decline as a result of diffusional spreading within each protein band.

FIGURE 3–17 Column chromatography. The standard elements of a chromatographic column include a solid, porous material supported inside a column, generally made of plastic or glass. The solid material (matrix) makes up the stationary phase through which flows a solution, the mobile phase. The solution that passes out of the column at the bottom (the effluent) is constantly replaced by solution supplied from a reservoir at the top. The protein solution to be separated is layered on top of the column and allowed to percolate into the solid matrix. Additional solution is added on top. The protein solution forms a band within the mobile phase that is initially the depth of the protein solution applied to the column. As proteins migrate through the column, they are retarded to different degrees by their different interactions with the matrix material. The overall protein band thus widens as it moves through the column. Individual types of proteins (such as A, B, and C, shown in blue, red, and green) gradually separate from each other, forming bands within the broader protein band. Separation improves (resolution increases) as the length of the column increases. However, each individual protein band also broadens with time due to diffusional spreading, a process that decreases resolution. In this example, protein A is well separated from B and C, but diffusional spreading prevents complete separation of B and C under these conditions.

Figure 3–18 shows two other variations of column chromatography in addition to ion exchange. Size-exclusion chromatography separates proteins according to size. In this method, large proteins emerge from the column sooner than small ones, a somewhat counterintuitive result. The solid phase consists of beads with engineered pores or cavities of a particular size. Large proteins cannot enter the cavities, and so take a short (and rapid) path through the column, around the beads. Small proteins enter the cavities, and migrate through the column more slowly as a result (Fig. 3–18b). Affinity chromatography is based on the binding affinity of a protein. The beads in the column have a covalently attached chemical group. A protein with affinity for this particular chemical group will bind to the beads in the column, and its migration will be retarded as a result (Fig. 3–18c).

FIGURE 3–18 Three chromatographic methods used in protein purification. (a) Ion-exchange chromatography exploits differences in the sign and magnitude of the net electric charges of proteins at a given pH. The column matrix is a synthetic polymer containing bound charged groups; those with bound anionic groups are called cation exchangers, and those with bound cationic groups are called anion exchangers. Ion-exchange chromatography on a cation exchanger is shown here. The affinity of each protein for the charged groups on the column is affected by the pH (which determines the ionization state of the molecule) and the concentration of competing free salt ions in the surrounding solution. Separation can be optimized by gradually changing the pH and/or salt concentration of the mobile phase so as to create a pH or salt gradient. Panel annotation: proteins move through the column at rates determined by their net charge at the pH being used; with cation exchangers, proteins with a more negative net charge move faster and elute earlier. (b) Size-exclusion chromatography, also called gel filtration, separates proteins according to size. The column matrix is a cross-linked polymer with pores of selected size. Larger proteins migrate faster than smaller ones, because they are too large to enter the pores in the beads and hence take a more direct route through the column. The smaller proteins enter the pores and are slowed by their more labyrinthine path through the column. Panel annotation: protein molecules separate by size; larger molecules pass more freely, appearing in the earlier fractions. (c) Affinity chromatography separates proteins by their binding specificities. The proteins retained on the column are those that bind specifically to a ligand cross-linked to the beads. (In biochemistry, the term "ligand" is used to refer to a group or molecule that binds to a macromolecule such as a protein.) After proteins that do not bind to the ligand are washed through the column, the bound protein of particular interest is eluted (washed out of the column) by a solution containing free ligand.

A modern refinement in chromatographic methods is HPLC, or high-performance liquid chromatography. HPLC makes use of high-pressure pumps that speed the movement of the protein molecules down the column, as well as higher-quality chromatographic materials that can withstand the crushing force of the pressurized flow. By reducing the transit time on the column, HPLC can limit diffusional spreading of protein bands and thus greatly improve resolution.

The approach to purification of a protein that has not previously been isolated is guided both by established precedents and by common sense. In most cases, several different methods must be used sequentially to purify a protein completely. The choice of method is somewhat empirical, and many protocols may be tried before the most effective one is found. Trial and error can often be minimized by basing the procedure on purification techniques developed for similar proteins. Published purification protocols are available for many thousands of proteins. Common sense dictates that inexpensive procedures such as salting out be used first, when the total volume and the number of contaminants are greatest. Chromatographic methods are often impractical at early stages, because the amount of chromatographic medium needed increases with sample size. As each purification step is completed, the sample size generally becomes smaller (Table 3–5), making it feasible to use more sophisticated (and expensive) chromatographic procedures at later stages.
Proteins Can Be Separated and Characterized by Electrophoresis

Another important technique for the separation of proteins is based on the migration of charged proteins in an electric field, a process called electrophoresis. These procedures are not generally used to purify proteins in large amounts, because simpler alternatives are usually available and electrophoretic methods often adversely affect the structure and thus the function of proteins. Electrophoresis is, however, especially useful as an analytical method. Its advantage is that proteins can be visualized as well as separated, permitting a researcher to estimate quickly the number of different proteins in a mixture or the degree of purity of a particular protein preparation. Also, electrophoresis allows determination of crucial properties of a protein such as its isoelectric point and approximate molecular weight.

Electrophoresis of proteins is generally carried out in gels made up of the cross-linked polymer polyacrylamide (Fig. 3–19). The polyacrylamide gel acts as a molecular sieve, slowing the migration of proteins approximately in proportion to their charge-to-mass ratio. Migration may also be affected by protein shape. In electrophoresis, the force moving the macromolecule is the electrical potential, E. The electrophoretic mobility of the molecule, μ, is the ratio of the velocity of the molecule, V, to the electrical potential. Electrophoretic mobility is also equal to the net charge of the molecule, Z, divided by the frictional coefficient, f, which reflects in part a protein’s shape. Thus:

$$\mu = \frac{V}{E} = \frac{Z}{f}$$

The migration of a protein in a gel during electrophoresis is therefore a function of its size and its shape.

An electrophoretic method commonly employed for estimation of purity and molecular weight makes use of the detergent sodium dodecyl sulfate (SDS).
SDS binds to most proteins in amounts roughly proportional to the molecular weight of the protein, about one molecule of SDS for every two amino acid residues. The bound SDS contributes a large net negative charge, rendering the intrinsic charge of the protein insignificant and conferring on each protein a similar charge-to-mass ratio. In addition, the native conformation of a protein is altered when SDS is bound, and most proteins assume a similar shape. Electrophoresis in the presence of SDS therefore separates proteins almost exclusively on the basis of mass (molecular weight), with smaller polypeptides migrating more rapidly. After electrophoresis, the proteins are visualized by adding a dye such as Coomassie blue, which binds to proteins but not to the gel itself (Fig. 3–19b). Thus, a researcher can monitor the progress of a protein purification procedure as the number of protein bands visible on the gel decreases after each new fractionation step. When compared with the positions to which proteins of known molecular weight migrate in the gel, the position of an unidentified protein can provide an excellent measure of its molecular weight (Fig. 3–20). If the protein has two or more different subunits, the subunits will generally be separated by the SDS treatment and a separate band will appear for each.

[Structure: sodium dodecyl sulfate (SDS), Na+ −O–SO2–O–(CH2)11–CH3]

TABLE 3–5 A Purification Table for a Hypothetical Enzyme

Procedure or step                         Fraction volume (ml)   Total protein (mg)   Activity (units)   Specific activity (units/mg)
1. Crude cellular extract                 1,400                  10,000               100,000            10
2. Precipitation with ammonium sulfate    280                    3,000                96,000             32
3. Ion-exchange chromatography            90                     400                  80,000             200
4. Size-exclusion chromatography          80                     100                  60,000             600
5. Affinity chromatography                6                      3                    45,000             15,000

Note: All data represent the status of the sample after the designated procedure has been carried out.
Activity and specific activity are defined on page 94.

FIGURE 3–19 Electrophoresis. (a) Different samples are loaded in wells or depressions at the top of the polyacrylamide gel. The proteins move into the gel when an electric field is applied. The gel minimizes convection currents caused by small temperature gradients, as well as protein movements other than those induced by the electric field. (b) Proteins can be visualized after electrophoresis by treating the gel with a stain such as Coomassie blue, which binds to the proteins but not to the gel itself. Each band on the gel represents a different protein (or protein subunit); smaller proteins move through the gel more rapidly than larger proteins and therefore are found nearer the bottom of the gel. This gel illustrates the purification of the enzyme RNA polymerase from E. coli. The first lane shows the proteins present in the crude cellular extract. Successive lanes (left to right) show the proteins present after each purification step. The purified protein contains four subunits, as seen in the last lane on the right.

[Figure 3–20a, Mr standards: Myosin, 200,000; β-Galactosidase, 116,250; Glycogen phosphorylase b, 97,400; Bovine serum albumin, 66,200; Ovalbumin, 45,000; Carbonic anhydrase, 31,000; Soybean trypsin inhibitor, 21,500; Lysozyme, 14,400. Figure 3–20b axes: log Mr versus relative migration.]

FIGURE 3–20 Estimating the molecular weight of a protein. The electrophoretic mobility of a protein on an SDS polyacrylamide gel is related to its molecular weight, Mr. (a) Standard proteins of known molecular weight are subjected to electrophoresis (lane 1). These marker proteins can be used to estimate the molecular weight of an unknown protein (lane 2).
(b) A plot of log Mr of the marker proteins versus relative migration during electrophoresis is linear, which allows the molecular weight of the unknown protein to be read from the graph.

Isoelectric focusing is a procedure used to determine the isoelectric point (pI) of a protein (Fig. 3–21). A pH gradient is established by allowing a mixture of low molecular weight organic acids and bases (ampholytes; p. 81) to distribute themselves in an electric field generated across the gel. When a protein mixture is applied, each protein migrates until it reaches the pH that matches its pI (Table 3–6). Proteins with different isoelectric points are thus distributed differently throughout the gel.

[Figure 3–21, panel annotations: An ampholyte solution is incorporated into a gel. A stable pH gradient (pH 9 to pH 3) is established in the gel after application of an electric field. Protein solution is added and electric field is reapplied. After staining, proteins are shown to be distributed along pH gradient according to their pI values.]

FIGURE 3–21 Isoelectric focusing. This technique separates proteins according to their isoelectric points. A stable pH gradient is established in the gel by the addition of appropriate ampholytes. A protein mixture is placed in a well on the gel. With an applied electric field, proteins enter the gel and migrate until each reaches a pH equivalent to its pI. Remember that when pH = pI, the net charge of a protein is zero.

Combining isoelectric focusing and SDS electrophoresis sequentially in a process called two-dimensional electrophoresis permits the resolution of complex mixtures of proteins (Fig. 3–22). This is a more sensitive analytical method than either electrophoretic method alone.
Two-dimensional electrophoresis separates proteins of identical molecular weight that differ in pI, or proteins with similar pI values but different molecular weights.

Unseparated Proteins Can Be Quantified

To purify a protein, it is essential to have a way of detecting and quantifying that protein in the presence of many other proteins at each stage of the procedure. Often, purification must proceed in the absence of any information about the size and physical properties of the protein or about the fraction of the total protein mass it represents in the extract. For proteins that are enzymes, the amount in a given solution or tissue extract can be measured, or assayed, in terms of the catalytic effect the enzyme produces, that is, the increase in the rate at which its substrate is converted to reaction products when the enzyme is present. For this purpose one must know (1) the overall equation of the reaction catalyzed, (2) an analytical procedure for determining the disappearance of the substrate or the appearance of a reaction product, (3) whether the enzyme requires cofactors such as metal ions or coenzymes, (4) the dependence of the enzyme activity on substrate concentration, (5) the optimum pH, and (6) a temperature zone in which the enzyme is stable and has high activity. Enzymes are usually assayed at their optimum pH and at some convenient temperature within the range 25 to 38 °C. Also, very high substrate concentrations are generally used so that the initial reaction rate, measured experimentally, is proportional to enzyme concentration (Chapter 6).

By international agreement, 1.0 unit of enzyme activity is defined as the amount of enzyme causing transformation of 1.0 μmol of substrate per minute at 25 °C under optimal conditions of measurement. The term activity refers to the total units of enzyme in a solution. The specific activity is the number of enzyme units per milligram of total protein (Fig. 3–23).
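As a hedged illustration of these definitions, the following Python sketch recomputes specific activity from the total protein and activity values of the hypothetical enzyme in Table 3–5. The yield and fold-purification columns do not appear in that table; they are common bookkeeping additions and are included here as assumptions. The names `STEPS` and `purification_table` are illustrative only.

```python
# Purification bookkeeping for the hypothetical enzyme of Table 3-5.
# Each row: (step, total protein in mg, activity in units).
STEPS = [
    ("Crude cellular extract", 10_000, 100_000),
    ("Precipitation with ammonium sulfate", 3_000, 96_000),
    ("Ion-exchange chromatography", 400, 80_000),
    ("Size-exclusion chromatography", 100, 60_000),
    ("Affinity chromatography", 3, 45_000),
]


def purification_table(steps):
    """Return one dict per step with specific activity (units/mg).

    Yield (%) and fold purification, both relative to the first step,
    are assumed extras that are not part of Table 3-5.
    """
    _, protein0, activity0 = steps[0]
    base_specific = activity0 / protein0
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg  # units per mg total protein
        rows.append({
            "step": name,
            "specific_activity": specific,
            "yield_percent": 100 * activity_units / activity0,
            "fold_purification": specific / base_specific,
        })
    return rows


TABLE = purification_table(STEPS)

if __name__ == "__main__":
    for row in TABLE:
        print(f"{row['step']:<38}"
              f"{row['specific_activity']:>10.0f} units/mg"
              f"{row['yield_percent']:>7.0f}%"
              f"{row['fold_purification']:>8.0f}x")
```

Under these assumptions the final step retains 45% of the starting activity while the specific activity is 1,500 times that of the crude extract, which is the pattern a purification table is meant to expose.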
The specific activity is a measure of enzyme purity: it increases during purification of an enzyme and becomes maximal and constant when the enzyme is pure (Table 3–5).

TABLE 3–6 The Isoelectric Points of Some Proteins

Protein              pI
Pepsin               <1.0
Egg albumin          4.6
Serum albumin        4.9
Urease               5.0
β-Lactoglobulin      5.2
Hemoglobin           6.8
Myoglobin            7.0
Chymotrypsinogen     9.5
Cytochrome c         10.7
Lysozyme             11.0

After each purification step, the activity of the preparation (in units of enzyme activity) is assayed, the total amount of protein is determined independently, and the ratio of the two gives the specific activity. Activity and total protein generally decrease with each step. Activity decreases because some loss always occurs due to inactivation or nonideal interactions with chromatographic materials or other molecules in the solution. Total protein decreases because the objective is to remove as much unwanted or nonspecific protein as possible. In a successful step, the loss of nonspecific protein is much greater than the loss of activity; therefore, specific activity increases even as total activity falls. The data are then assembled in a purification table similar to Table 3–5. A protein is generally considered pure when further purification steps fail to increase specific activity and when only a single protein species can be detected (for example, by electrophoresis).

For proteins that are not enzymes, other quantification methods are required. Transport proteins can be assayed by their binding to the molecule they transport, and hormones and toxins by the biological effect they produce; for example, growth hormones will stimulate the growth of certain cultured cells. Some structural proteins represent such a large fraction of a tissue mass that they can be readily extracted and purified without a functional assay.
The approaches are as varied as the proteins themselves.

FIGURE 3–22 Two-dimensional electrophoresis. (a) Proteins are first separated by isoelectric focusing in a cylindrical gel. The gel is then laid horizontally on a second, slab-shaped gel, and the proteins are separated by SDS polyacrylamide gel electrophoresis. Horizontal separation reflects differences in pI; vertical separation reflects differences in molecular weight. (b) More than 1,000 different proteins from E. coli can be resolved using this technique.

FIGURE 3–23 Activity versus specific activity. The difference between these two terms can be illustrated by considering two beakers of marbles. The beakers contain the same number of red marbles, but different numbers of marbles of other colors. If the marbles represent proteins, both beakers contain the same activity of the protein represented by the red marbles. The second beaker, however, has the higher specific activity because here the red marbles represent a much higher fraction of the total.

SUMMARY 3.3 Working with Proteins

■ Proteins are separated and purified by taking advantage of differences in their properties. Proteins can be selectively precipitated by the addition of certain salts. A wide range of chromatographic procedures makes use of differences in size, binding affinities, charge, and other properties. These include ion-exchange, size-exclusion, affinity, and high-performance liquid chromatography.

■ Electrophoresis separates proteins on the basis of mass or charge. SDS gel electrophoresis and isoelectric focusing can be used separately or in combination for higher resolution.
■ All purification procedures require a method for quantifying or assaying the protein of interest in the presence of other proteins. Purification can be monitored by assaying specific activity.

3.4 The Covalent Structure of Proteins

Purification of a protein is usually only a prelude to a detailed biochemical dissection of its structure and function. What is it that makes one protein an enzyme, another a hormone, another a structural protein, and still another an antibody? How do they differ chemically? The most obvious distinctions are structural, and these distinctions can be approached at every level of structure defined in Figure 3–16.

The differences in primary structure can be especially informative. Each protein has a distinctive number and sequence of amino acid residues. As we shall see in Chapter 4, the primary structure of a protein determines how it folds up into a unique three-dimensional structure, and this in turn determines the function of the protein. Primary structure is the focus of the remainder of this chapter. We first consider empirical clues that amino acid sequence and protein function are closely linked, then describe how amino acid sequence is determined; finally, we outline the many uses to which this information can be put.

The Function of a Protein Depends on Its Amino Acid Sequence

The bacterium Escherichia coli produces more than 3,000 different proteins; a human produces 25,000 to 35,000. In both cases, each type of protein has a unique three-dimensional structure and this structure confers a unique function. Each type of protein also has a unique amino acid sequence. Intuition suggests that the amino acid sequence must play a fundamental role in determining the three-dimensional structure of the protein, and ultimately its function, but is this supposition correct?
A quick survey of proteins and how they vary in amino acid sequence provides a number of empirical clues that help substantiate the important relationship between amino acid sequence and biological function. First, as we have already noted, proteins with different functions always have different amino acid sequences. Second, thousands of human genetic diseases have been traced to the production of defective proteins. Perhaps one-third of these proteins are defective because of a single change in their amino acid sequence; hence, if the primary structure is altered, the function of the protein may also be changed. Finally, on comparing functionally similar proteins from different species, we find that these proteins often have similar amino acid sequences. An extreme case is ubiquitin, a 76-residue protein involved in regulating the degradation of other proteins. The amino acid sequence of ubiquitin is identical in species as disparate as fruit flies and humans.

Is the amino acid sequence absolutely fixed, or invariant, for a particular protein? No; some flexibility is possible. An estimated 20% to 30% of the proteins in humans are polymorphic, having amino acid sequence variants in the human population. Many of these variations in sequence have little or no effect on the function of the protein. Furthermore, proteins that carry out a broadly similar function in distantly related species can differ greatly in overall size and amino acid sequence.

Although the amino acid sequence in some regions of the primary structure might vary considerably without affecting biological function, most proteins contain crucial regions that are essential to their function and whose sequence is therefore conserved. The fraction of the overall sequence that is critical varies from protein to protein, complicating the task of relating sequence to three-dimensional structure, and structure to function.
Before we can consider this problem further, however, we must examine how sequence information is obtained.

The Amino Acid Sequences of Millions of Proteins Have Been Determined

Two major discoveries in 1953 were of crucial importance in the history of biochemistry. In that year James D. Watson and Francis Crick deduced the double-helical structure of DNA and proposed a structural basis for its precise replication (Chapter 8). Their proposal illuminated the molecular reality behind the idea of a gene. In that same year, Frederick Sanger worked out the sequence of amino acid residues in the polypeptide chains of the hormone insulin (Fig. 3–24), surprising many researchers who had long thought that elucidation of the amino acid sequence of a polypeptide would be a hopelessly difficult task. It quickly became evident that the nucleotide sequence in DNA and the amino acid sequence in proteins were somehow related. Barely a decade after these discoveries, the role of the nucleotide sequence of DNA in determining the amino acid sequence of protein molecules was revealed (Chapter 27). An enormous number of protein sequences can now be derived indirectly from the DNA sequences in the rapidly growing genome databases. However, many are still deduced by traditional methods of polypeptide sequencing.

The amino acid sequences of thousands of different proteins from many species have been determined using principles first developed by Sanger. These methods are still in use, although with many variations and improvements in detail. Chemical protein sequencing now complements a growing list of newer methods, providing multiple avenues to obtain amino acid sequence data. Such data are now critical to every area of biochemical investigation.
Short Polypeptides Are Sequenced Using Automated Procedures

Various procedures are used to analyze protein primary structure. Several protocols are available to label and identify the amino-terminal amino acid residue (Fig. 3–25a). Sanger developed the reagent 1-fluoro-2,4-dinitrobenzene (FDNB) for this purpose; other reagents used to label the amino-terminal residue, dansyl chloride and dabsyl chloride, yield derivatives that are more easily detectable than the dinitrophenyl derivatives. After the amino-terminal residue is labeled with one of these reagents, the polypeptide is hydrolyzed to its constituent amino acids and the labeled amino acid is identified. Because the hydrolysis stage destroys the polypeptide, this procedure cannot be used to sequence a polypeptide beyond its amino-terminal residue. However, it can help determine the number of chemically distinct polypeptides in a protein, provided each has a different amino-terminal residue. For example, two residues (Phe and Gly) would be labeled if insulin (Fig. 3–24) were subjected to this procedure.

[Figure 3–24: residue-by-residue diagram of the A and B chains of bovine insulin with their disulfide cross-linkages; structural formulas of dansyl chloride and dabsyl chloride.]

FIGURE 3–24 Amino acid sequence of bovine insulin. The two polypeptide chains are joined by disulfide cross-linkages. The A chain is identical in human, pig, dog, rabbit, and sperm whale insulins. The B chains of the cow, pig, dog, goat, and horse are identical.

To sequence an entire polypeptide, a chemical method devised by Pehr Edman is usually employed.
The Edman degradation procedure labels and removes only the amino-terminal residue from a peptide, leaving all other peptide bonds intact (Fig. 3–25b). The peptide is reacted with phenylisothiocyanate under mildly alkaline conditions, which converts the amino-terminal amino acid to a phenylthiocarbamoyl (PTC) adduct. The peptide bond next to the PTC adduct is then cleaved in a step carried out in anhydrous trifluoroacetic acid, with removal of the amino-terminal amino acid as an anilinothiazolinone derivative. The derivatized amino acid is extracted with organic solvents, converted to the more stable phenylthiohydantoin derivative by treatment with aqueous acid, and then identified. The use of sequential reactions carried out under first basic and then acidic conditions provides control over the entire process. Each reaction with the amino-terminal amino acid can go essentially to completion without affecting any of the other peptide bonds in the peptide. After removal and identification of the amino-terminal residue, the new amino-terminal residue so exposed can be labeled, removed, and identified through the same series of reactions. This procedure is repeated until the entire sequence is determined. The Edman degradation is carried out on a machine, called a sequenator, that mixes reagents in the proper proportions, separates the products, identifies them, and records the results. These methods are extremely sensitive. Often, the complete amino acid sequence can be determined starting with only a few micrograms of protein.

[Figure 3–25, reaction scheme. (a) Polypeptide + FDNB gives the 2,4-dinitrophenyl derivative of the polypeptide; hydrolysis in 6 M HCl gives the 2,4-dinitrophenyl derivative of the amino-terminal residue plus free amino acids. Identify amino-terminal residue of polypeptide. (b) Polypeptide + phenylisothiocyanate (OH−) gives the PTC adduct; CF3COOH gives the anilinothiazolinone derivative of the amino acid residue plus the shortened peptide; H+ gives the phenylthiohydantoin derivative of the amino acid residue. Identify amino-terminal residue; purify and recycle remaining peptide fragment through Edman process.]

FIGURE 3–25 Steps in sequencing a polypeptide. (a) Identification of the amino-terminal residue can be the first step in sequencing a polypeptide. Sanger’s method for identifying the amino-terminal residue is shown here. (b) The Edman degradation procedure reveals the entire sequence of a peptide. For shorter peptides, this method alone readily yields the entire sequence, and step (a) is often omitted. Step (a) is useful in the case of larger polypeptides, which are often fragmented into smaller peptides for sequencing (see Fig. 3–27).

The length of polypeptide that can be accurately sequenced by the Edman degradation depends on the efficiency of the individual chemical steps. Consider a peptide beginning with the sequence Gly–Pro–Lys– at its amino terminus. If glycine were removed with 97% efficiency, 3% of the polypeptide molecules in the solution would retain a Gly residue at their amino terminus. In the second Edman cycle, 97% of the liberated amino acids would be proline, and 3% glycine, while 3% of the polypeptide molecules would retain Gly (0.1%) or Pro (2.9%) residues at their amino terminus.
At each cycle, peptides that did not react in earlier cycles would contribute amino acids to an ever-increasing background, eventually making it impossible to determine which amino acid is next in the original peptide sequence. Modern sequenators achieve efficiencies of better than 99% per cycle, permitting the sequencing of more than 50 contiguous amino acid residues in a polypeptide. The primary structure of insulin, worked out by Sanger and colleagues over a period of 10 years, could now be completely determined in a day or two.

Large Proteins Must Be Sequenced in Smaller Segments

The overall accuracy of amino acid sequencing generally declines as the length of the polypeptide increases. The very large polypeptides found in proteins must be broken down into smaller pieces to be sequenced efficiently. There are several steps in this process. First, the protein is cleaved into a set of specific fragments by chemical or enzymatic methods. If any disulfide bonds are present, they must be broken. Each fragment is purified, then sequenced by the Edman procedure. Finally, the order in which the fragments appear in the original protein is determined and disulfide bonds (if any) are located.

Breaking Disulfide Bonds

Disulfide bonds interfere with the sequencing procedure. A cystine residue (Fig. 3–7) that has one of its peptide bonds cleaved by the Edman procedure may remain attached to another polypeptide strand via its disulfide bond. Disulfide bonds also interfere with the enzymatic or chemical cleavage of the polypeptide. Two approaches to irreversible breakage of disulfide bonds are outlined in Figure 3–26.

Cleaving the Polypeptide Chain

Several methods can be used for fragmenting the polypeptide chain. Enzymes called proteases catalyze the hydrolytic cleavage of peptide bonds. Some proteases cleave only the peptide bond adjacent to particular amino acid residues (Table 3–7) and thus fragment a polypeptide chain in a predictable and reproducible way.
A number of chemical reagents also cleave the peptide bond adjacent to specific residues.

Among proteases, the digestive enzyme trypsin catalyzes the hydrolysis of only those peptide bonds in which the carbonyl group is contributed by either a Lys or an Arg residue, regardless of the length or amino acid sequence of the chain. The number of smaller peptides produced by trypsin cleavage can thus be predicted from the total number of Lys or Arg residues in the original polypeptide, as determined by hydrolysis of an intact sample (Fig. 3–27). A polypeptide with five Lys and/or Arg residues will usually yield six smaller peptides on cleavage with trypsin. Moreover, all except one of these will have a carboxyl-terminal Lys or Arg. The fragments produced by trypsin (or other enzyme or chemical) action are then separated by chromatographic or electrophoretic methods.

[Figure 3–26, reaction scheme: a disulfide bond (cystine) is either oxidized by performic acid to give two cysteic acid residues, or reduced by dithiothreitol (DTT) to give two Cys residues, which are then acetylated by iodoacetate to give acetylated cysteine residues.]

FIGURE 3–26 Breaking disulfide bonds in proteins. Two common methods are illustrated. Oxidation of a cystine residue with performic acid produces two cysteic acid residues. Reduction by dithiothreitol to form Cys residues must be followed by further modification of the reactive –SH groups to prevent re-formation of the disulfide bond. Acetylation by iodoacetate serves this purpose.

Sequencing of Peptides

Each peptide fragment resulting from the action of trypsin is sequenced separately by the Edman procedure.
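The counting rule for trypsin fragments can be illustrated with a short Python sketch. It applies the rule exactly as stated above (cleave on the carboxyl side of every Lys or Arg) to the 38-residue example polypeptide of Figure 3–27, written in one-letter symbols. The function name `cleave_after` is hypothetical, and the sketch ignores any sequence-context effects on real protease activity.

```python
def cleave_after(sequence, residues):
    """Split a one-letter amino acid sequence on the carboxyl side of
    every residue listed in `residues`; fragments are returned in
    amino-to-carboxyl order."""
    fragments, current = [], ""
    for aa in sequence:
        current += aa
        if aa in residues:
            fragments.append(current)
            current = ""
    if current:  # carboxyl-terminal fragment, if the chain does not end at a site
        fragments.append(current)
    return fragments


# Example polypeptide of Figure 3-27 (one-letter symbols).
POLYPEPTIDE = "EGAAYHDFEPIDPRGASMALIKYLIACGPMTKDCVHSD"

# Trypsin: carbonyl group contributed by Lys (K) or Arg (R).
TRYPTIC = cleave_after(POLYPEPTIDE, "KR")

if __name__ == "__main__":
    n_sites = sum(POLYPEPTIDE.count(aa) for aa in "KR")
    print(f"{n_sites} Lys/Arg residues -> {len(TRYPTIC)} fragments")
    for fragment in TRYPTIC:
        print(fragment)
```

Three Lys/Arg residues give four fragments here, and every fragment except the carboxyl-terminal one ends in K or R. If the chain itself ended in Lys or Arg, the fragment count would equal the number of cleavable residues, which is one reason the rule is stated with "usually."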
Ordering Peptide Fragments

The order of the “trypsin fragments” in the original polypeptide chain must now be determined. Another sample of the intact polypeptide is cleaved into fragments using a different enzyme or reagent, one that cleaves peptide bonds at points other than those cleaved by trypsin. For example, cyanogen bromide cleaves only those peptide bonds in which the carbonyl group is contributed by Met. The fragments resulting from this second procedure are then separated and sequenced as before.

The amino acid sequences of each fragment obtained by the two cleavage procedures are examined, with the objective of finding peptides from the second procedure whose sequences establish continuity, because of overlaps, between the fragments obtained by the first cleavage procedure (Fig. 3–27). Overlapping peptides obtained from the second fragmentation yield the correct order of the peptide fragments produced in the first. If the amino-terminal amino acid has been identified before the original cleavage of the protein, this information can be used to establish which fragment is derived from the amino terminus. The two sets of fragments can be compared for possible errors in determining the amino acid sequence of each fragment. If the second cleavage procedure fails to establish continuity between all peptides from the first cleavage, a third or even a fourth cleavage method must be used to obtain a set of peptides that can provide the necessary overlap(s).

Locating Disulfide Bonds

If the primary structure includes disulfide bonds, their locations are determined in an additional step after sequencing is completed. A sample of the protein is again cleaved with a reagent such as trypsin, this time without first breaking the disulfide bonds. The resulting peptides are separated by electrophoresis and compared with the original set of peptides generated by trypsin.
For each disulfide bond, two of the original peptides will be missing and a new, larger peptide will appear. The two missing peptides represent the regions of the intact polypeptide that are linked by the disulfide bond.

Amino Acid Sequences Can Also Be Deduced by Other Methods

The approach outlined above is not the only way to determine amino acid sequences. New methods based on mass spectrometry permit the sequencing of short polypeptides (20 to 30 amino acid residues) in just a few minutes (Box 3–2). In addition, with the development of rapid DNA sequencing methods (Chapter 8), the elucidation of the genetic code (Chapter 27), and the development of techniques for isolating genes (Chapter 9), researchers can deduce the sequence of a polypeptide by determining the sequence of nucleotides in the gene that codes for it (Fig. 3–28). The techniques used to determine protein and DNA sequences are complementary. When the gene is available, sequencing the DNA can be faster and more accurate than sequencing the protein. Most proteins are now sequenced in this indirect way. If the gene has not been isolated, direct sequencing of peptides is necessary, and this can provide information (the location of disulfide bonds, for example) not available in a DNA sequence. In addition, a knowledge of the amino acid sequence of even a part of a polypeptide can greatly facilitate the isolation of the corresponding gene (Chapter 9).

The array of methods now available to analyze both proteins and nucleic acids is ushering in a new discipline of “whole cell biochemistry.” The complete sequence of an organism’s DNA, its genome, is now available for organisms ranging from viruses to bacteria to multicellular eukaryotes (see Table 1–4). Genes are being discovered by the millions, including many that encode proteins with no known function. To describe the entire protein complement encoded by an organism’s DNA, researchers have coined the term proteome. As described in Chapter 9, the new disciplines of genomics and proteomics are complementing work carried out on cellular intermediary metabolism and nucleic acid metabolism to provide a new and increasingly complete picture of biochemistry at the level of cells and even organisms.

TABLE 3–7 The Specificity of Some Common Methods for Fragmenting Polypeptide Chains
Reagent (biological source)* | Cleavage points†
Trypsin (bovine pancreas) | Lys, Arg (C)
Submaxillarus protease (mouse submaxillary gland) | Arg (C)
Chymotrypsin (bovine pancreas) | Phe, Trp, Tyr (C)
Staphylococcus aureus V8 protease (bacterium S. aureus) | Asp, Glu (C)
Asp-N-protease (bacterium Pseudomonas fragi) | Asp, Glu (N)
Pepsin (porcine stomach) | Phe, Trp, Tyr (N)
Endoproteinase Lys C (bacterium Lysobacter enzymogenes) | Lys (C)
Cyanogen bromide | Met (C)
*All reagents except cyanogen bromide are proteases. All are available from commercial sources.
†Residues furnishing the primary recognition point for the protease or reagent; peptide bond cleavage occurs on either the carbonyl (C) or the amino (N) side of the indicated amino acid residues.

FIGURE 3–27 Cleaving proteins and sequencing and ordering the peptide fragments. First, the amino acid composition and amino-terminal residue of an intact sample are determined. Then any disulfide bonds are broken before fragmenting so that sequencing can proceed efficiently. In this example, there are only two Cys (C) residues and thus only one possibility for location of the disulfide bond. In polypeptides with three or more Cys residues, the position of disulfide bonds can be determined as described in the text. (The one-letter symbols for amino acids are given in Table 3–1.)
Steps shown in the figure (procedure; result; conclusion):
Procedure: hydrolyze; separate amino acids. Result: A 5, C 2, D 4, E 2, F 1, G 3, H 2, I 3, K 2, L 2, M 2, P 3, R 1, S 2, T 1, V 1, Y 2. Conclusion: Polypeptide has 38 amino acid residues. Trypsin will cleave three times (at one R (Arg) and two K (Lys)) to give four fragments. Cyanogen bromide will cleave at two M (Met) to give three fragments.
Procedure: react with FDNB; hydrolyze; separate amino acids. Result: 2,4-dinitrophenylglutamate detected. Conclusion: E (Glu) is amino-terminal residue.
Procedure: reduce disulfide bonds (if present).
Procedure: cleave with trypsin; separate fragments; sequence by Edman degradation. Result: T-1 GASMALIK; T-2 EGAAYHDFEPIDPR; T-3 DCVHSD; T-4 YLIACGPMTK. Conclusion: T-2 placed at amino terminus because it begins with E (Glu). T-3 placed at carboxyl terminus because it does not end with R (Arg) or K (Lys).
Procedure: cleave with cyanogen bromide; separate fragments; sequence by Edman degradation. Result: C-1 EGAAYHDFEPIDPRGASM; C-2 TKDCVHSD; C-3 ALIKYLIACGPM. Conclusion: C-3 overlaps with T-1 and T-4, allowing them to be ordered.
Procedure: establish sequence. Result (amino terminus to carboxyl terminus): EGAAYHDFEPIDPRGASMALIKYLIACGPMTKDCVHSD, with trypsin fragments in the order T-2, T-1, T-4, T-3 and cyanogen bromide fragments in the order C-1, C-3, C-2.

FIGURE 3–28 Correspondence of DNA and amino acid sequences. Each amino acid is encoded by a specific sequence of three nucleotides in DNA. The genetic code is described in detail in Chapter 27.
Amino acid sequence (protein): Gln–Tyr–Pro–Thr–Ile–Trp
DNA sequence (gene): CAGTATCCTACGATTTGG

BOX 3–2 WORKING IN BIOCHEMISTRY
Investigating Proteins with Mass Spectrometry

The mass spectrometer has long been an indispensable tool in chemistry. Molecules to be analyzed, referred to as analytes, are first ionized in a vacuum. When the newly charged molecules are introduced into an electric and/or magnetic field, their paths through the field are a function of their mass-to-charge ratio, m/z. This measured property of the ionized species can be used to deduce the mass (M) of the analyte with very high precision.
Although mass spectrometry has been in use for many years, it could not at first be applied to macromolecules such as proteins and nucleic acids. The m/z measurements are made on molecules in the gas phase, and the heating or other treatment needed to transfer a macromolecule to the gas phase usually caused its rapid decomposition. In 1988, two different techniques were developed to overcome this problem. In one, proteins are placed in a light-absorbing matrix. With a short pulse of laser light, the proteins are ionized and then desorbed from the matrix into the vacuum system. This process, known as matrix-assisted laser desorption/ionization mass spectrometry, or MALDI MS, has been successfully used to measure the mass of a wide range of macromolecules. In a second and equally successful method, macromolecules in solution are forced directly from the liquid to gas phase. A solution of analytes is passed through a charged needle that is kept at a high electrical potential, dispersing the solution into a fine mist of charged microdroplets. The solvent surrounding the macromolecules rapidly evaporates, and the resulting multiply charged macromolecular ions are thus introduced nondestructively into the gas phase. This technique is called electrospray ionization mass spectrometry, or ESI MS. Protons added during passage through the needle give additional charge to the macromolecule. The m/z of the molecule can be analyzed in the vacuum chamber.

Mass spectrometry provides a wealth of information for proteomics research, enzymology, and protein chemistry in general. The techniques require only minuscule amounts of sample, so they can be readily applied to the small amounts of protein that can be extracted from a two-dimensional electrophoretic gel. The accurately measured molecular mass of a protein is one of the critical parameters in its identification.
Once the mass of a protein is accurately known, mass spectrometry is a convenient and accurate method for detecting changes in mass due to the presence of bound cofactors, bound metal ions, covalent modifications, and so on.

FIGURE 1 Electrospray mass spectrometry of a protein. (a) A protein solution is dispersed into highly charged droplets by passage through a needle under the influence of a high-voltage electric field. The droplets evaporate, and the ions (with added protons in this case) enter the mass spectrometer for m/z measurement. The spectrum generated (b) is a family of peaks, with each successive peak (from right to left) corresponding to a charged species increased by 1 in both mass and charge. A computer-generated transformation of this spectrum is shown in the inset.
[Figure 1: (a) sample solution in a glass capillary at high voltage, vacuum interface, mass spectrometer; (b) relative intensity (%) versus m/z (800 to 1,600), with peaks labeled 50+, 40+, and 30+; inset, relative intensity versus M_r (47,000 to 48,000), with a single peak at 47,342.]

The process for determining the molecular mass of a protein with ESI MS is illustrated in Figure 1. As it is injected into the gas phase, a protein acquires a variable number of protons, and thus positive charges, from the solvent. This creates a spectrum of species with different mass-to-charge ratios. Each successive peak corresponds to a species that differs from that of its neighboring peak by a charge difference of 1 and a mass difference of 1 (1 proton). The mass of the protein can be determined from any two neighboring peaks. The measured m/z of one peak is

$$(m/z)_2 = \frac{M + n_2 X}{n_2}$$

where M is the mass of the protein, n_2 is the number of charges, and X is the mass of the added groups (protons in this case). Similarly for the neighboring peak,

$$(m/z)_1 = \frac{M + (n_2 + 1)X}{n_2 + 1}$$

We now have two unknowns (M and n_2) and two equations.
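As a numerical illustration of this two-peak calculation, the sketch below generates two neighboring peaks for a protein of known mass and then recovers the charge and mass from them. It is a minimal example under stated assumptions: the function names are hypothetical, the added group X is taken to be a proton of mass 1.00728 Da, and the "peaks" are synthetic values computed from the 47,342 Da mass quoted for Figure 1b, not measured data.

```python
PROTON = 1.00728  # assumed mass of the added group X (a proton), in Da


def mz(mass, charges, adduct=PROTON):
    """m/z of a species of mass `mass` carrying `charges` added groups."""
    return (mass + charges * adduct) / charges


def mass_from_adjacent_peaks(mz_1, mz_2, adduct=PROTON):
    """Solve the two peak equations for n2 and M.

    mz_2 is (m/z)2, the peak carrying n2 charges; mz_1 is (m/z)1, the
    neighboring peak carrying n2 + 1 charges (so mz_1 < mz_2).
    Eliminating M from the two equations gives
        n2 = ((m/z)1 - X) / ((m/z)2 - (m/z)1)
    and then M = n2 * ((m/z)2 - X).
    """
    n2 = round((mz_1 - adduct) / (mz_2 - mz_1))
    return n2, n2 * (mz_2 - adduct)


# Synthetic neighboring peaks for a 47,342 Da protein with 41 and 40 protons.
peak_41 = mz(47342.0, 41)
peak_40 = mz(47342.0, 40)

n2, mass = mass_from_adjacent_peaks(peak_41, peak_40)
print(n2, round(mass, 1))
```

Rounding n2 to the nearest integer reflects the fact that charge comes in whole units; with real spectra, measurement error in the peak positions is what limits the accuracy of M.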
We can solve first for n_2 and then for M:

$$n_2 = \frac{(m/z)_1 - X}{(m/z)_2 - (m/z)_1}$$

$$M = n_2\,[(m/z)_2 - X]$$

This calculation using the m/z values for any two peaks in a spectrum such as that shown in Figure 1b usually provides the mass of the protein (in this case, aerolysin k; 47,342 Da) with an error of only ±0.01%. Generating several sets of peaks, repeating the calculation, and averaging the results generally provides an even more accurate value for M. Computer algorithms can transform the m/z spectrum into a single peak that also provides a very accurate mass measurement (Fig. 1b, inset).

Mass spectrometry can also be used to sequence short stretches of polypeptide, an application that has emerged as an invaluable tool for quickly identifying unknown proteins. Sequence information is extracted using a technique called tandem MS, or MS/MS. A solution containing the protein under investigation is first treated with a protease or chemical reagent to hydrolyze it to a mixture of shorter peptides. The mixture is then injected into a device that is essentially two mass spectrometers in tandem (Fig. 2a, top). In the first, the peptide mixture is sorted and the ionized fragments are manipulated so that only one of the several types of peptides produced by cleavage emerges at the other end. The sample of the selected

[Figure 2: (a) top, electrospray ionization, MS-1 (separation), collision cell (breakage), MS-2, detector; bottom, peptide backbone with side chains R1 to R5 showing the peptide-bond break that yields b and y fragments. (b) Relative intensity (%) versus m/z (200 to 1,000), with peaks labeled y1″ through y9″.]
FIGURE 2 Obtaining protein sequence information with tandem MS.
(a) After proteolytic hydrolysis, a protein solution is injected into a mass spectrometer (MS-1). The different peptides are sorted so that only one type is selected for further analysis. The selected peptide is further fragmented in a chamber between the two mass spectrometers, and m/z for each fragment is measured in the second mass spectrometer (MS-2). Many of the ions generated during this second fragmentation result from breakage of the peptide bond, as shown. These are called b-type or y-type ions, depending on whether the charge is retained on the amino- or carboxyl-terminal side, respectively. (b) A typical spectrum with peaks representing the peptide fragments generated from a sample of one small peptide (10 residues). The labeled peaks are y-type ions. The large peak next to y5″ is a doubly charged ion and is not part of the y set. The successive peaks differ by the mass of a particular amino acid in the original peptide. In this case, the deduced sequence was Phe–Pro–Gly–Gln–(Ile/Leu)–Asn–Ala–Asp–(Ile/Leu)–Arg. Note the ambiguity about Ile and Leu residues, because they have the same molecular mass. In this example, the set of peaks derived from y-type ions predominates, and the spectrum is greatly simplified as a result. This is because an Arg residue occurs at the carboxyl terminus of the peptide, and most of the positive charges are retained on this residue.

Small Peptides and Proteins Can Be Chemically Synthesized

Many peptides are potentially useful as pharmacologic agents, and their production is of considerable commercial importance. There are three ways to obtain a peptide: (1) purification from tissue, a task often made difficult by the vanishingly low concentrations of some peptides; (2) genetic engineering (Chapter 9); or (3) direct chemical synthesis. Powerful techniques now make direct chemical synthesis an attractive option in many cases.
In addition to commercial applications, the synthesis of specific peptide portions of larger proteins is an increasingly important tool for the study of protein structure and function.

The complexity of proteins makes the traditional synthetic approaches of organic chemistry impractical for peptides with more than four or five amino acid residues. One problem is the difficulty of purifying the product after each step.

The major breakthrough in this technology was provided by R. Bruce Merrifield in 1962. His innovation involved synthesizing a peptide while keeping it attached at one end to a solid support. The support is an insoluble polymer (resin) contained within a column, similar to that used for chromatographic procedures. The peptide is built up on this support one amino acid at a time using a standard set of reactions in a repeating cycle (Fig. 3–29). At each successive step in the cycle, protective chemical groups block unwanted reactions.

The technology for chemical peptide synthesis is now automated. As in the sequencing reactions already considered, the most important limitation of the process is the efficiency of each chemical cycle, as can be seen by calculating the overall yields of peptides of various

BOX 3–2 WORKING IN BIOCHEMISTRY (continued)

peptide, each molecule of which has a charge somewhere along its length, then travels through a vacuum chamber between the two mass spectrometers. In this collision cell, the peptide is further fragmented by high-energy impact with a “collision gas,” a small amount of a noble gas such as helium or argon that is bled into the vacuum chamber. This procedure is designed to fragment many of the peptide molecules in the sample, with each individual peptide broken in only one place, on average. Most breaks occur at peptide bonds. This fragmentation does not involve the addition of water (it is done in a near-vacuum), so the products may include molecular ion radicals such as carbonyl radicals (Fig. 2a, bottom). The charge on the original peptide is retained on one of the fragments generated from it.

The second mass spectrometer then measures the m/z ratios of all the charged fragments (uncharged fragments are not detected). This generates one or more sets of peaks. A given set of peaks (Fig. 2b) consists of all the charged fragments that were generated by breaking the same type of bond (but at different points in the peptide) and are derived from the same side of the bond breakage, either the carboxyl- or amino-terminal side. Each successive peak in a given set has one less amino acid than the peak before. The difference in mass from peak to peak identifies the amino acid that was lost in each case, thus revealing the sequence of the peptide. The only ambiguities involve leucine and isoleucine, which have the same mass.

The charge on the peptide can be retained on either the carboxyl- or amino-terminal fragment, and bonds other than the peptide bond can be broken in the fragmentation process, with the result that multiple sets of peaks are usually generated. The two most prominent sets generally consist of charged fragments derived from breakage of the peptide bonds. The set consisting of the carboxyl-terminal fragments can be unambiguously distinguished from that consisting of the amino-terminal fragments. Because the bond breaks generated between the spectrometers (in the collision cell) do not yield full carboxyl and amino groups at the sites of the breaks, the only intact α-amino and α-carboxyl groups on the peptide fragments are those at the very ends (Fig. 2a). The two sets of fragments can thereby be identified by the resulting slight differences in mass. The amino acid sequence derived from one set can be confirmed by the other, improving the confidence in the sequence information obtained.

Even a short sequence is often enough to permit unambiguous association of a protein with its gene, if the gene sequence is known.
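The way peak-to-peak mass differences reveal a sequence can be illustrated with a short Python sketch. Everything here is a simplified assumption rather than the procedure used to produce Figure 2b: the function names are hypothetical, the residue masses are standard monoisotopic values (not taken from the text), each y-type ion is treated as singly charged with a mass equal to its residues plus one water and one proton, and the ladder is an idealized, complete one built from one peptide consistent with the (Ile/Leu)-ambiguous sequence in the figure legend.

```python
# Standard monoisotopic residue masses in Da (assumed values, not from the
# text). Leu and Ile share one entry because they have the same mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "[I/L]": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ion_ladder(peptide):
    """Singly charged y-type ion m/z values, shortest fragment first."""
    ladder, total = [], WATER + PROTON
    for residue in reversed(peptide):
        key = "[I/L]" if residue in "IL" else residue
        total += RESIDUE_MASS[key]
        ladder.append(total)
    return ladder


def residue_for(mass, tolerance=0.01):
    """Name the residue whose mass matches `mass`, or '?' if none does."""
    for name, value in RESIDUE_MASS.items():
        if abs(mass - value) <= tolerance:
            return name
    return "?"


def read_sequence(ladder):
    """Read a sequence (amino to carboxyl terminus) from a y-ion ladder."""
    peaks = sorted(ladder)
    steps = [peaks[0] - WATER - PROTON]              # carboxyl-terminal residue
    steps += [b - a for a, b in zip(peaks, peaks[1:])]
    # The smallest y ion holds the carboxyl-terminal residue, so reverse.
    return "".join(residue_for(step) for step in reversed(steps))


ladder = y_ion_ladder("FPGQINADLR")
sequence = read_sequence(ladder)
print(sequence)
```

Real spectra contain noise, missing peaks, and multiply charged ions, so practical software has to do considerably more than subtract neighboring peaks; the sketch only shows why Leu and Ile cannot be told apart while residues as close in mass as Gln and Lys can, given sufficient mass accuracy.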
Sequencing by mass spectrometry cannot replace the Edman degradation procedure for the sequencing of long polypeptides, but it is ideal for proteomics research aimed at cataloging the hundreds of cellular proteins that might be separated on a two-dimensional gel. In the coming decades, detailed genomic sequence data will be available from hundreds, eventually thousands, of organisms. The ability to rapidly associate proteins with genes using mass spectrometry will greatly facilitate the exploitation of this extraordinary information resource.

lengths when the yield for addition of each new amino acid is 96.0% versus 99.8% (Table 3–8). Incomplete reaction at one stage can lead to formation of an impurity (in the form of a shorter peptide) in the next. The chemistry has been optimized to permit the synthesis of proteins of 100 amino acid residues in a few days in reasonable yield. A very similar approach is used to synthesize nucleic acids (see Fig. 8–38). It is worth noting that this technology, impressive as it is, still pales when compared with biological processes. The same 100-amino-acid protein would be synthesized with exquisite fidelity in about 5 seconds in a bacterial cell.

FIGURE 3–29 Chemical synthesis of a peptide on an insoluble polymer support. Reactions 1 through 4 are necessary for the formation of each peptide bond. The 9-fluorenylmethoxycarbonyl (Fmoc) group (shaded blue) prevents unwanted reactions at the α-amino group of the residue (shaded red). Chemical synthesis proceeds from the carboxyl terminus to the amino terminus, the reverse of the direction of protein synthesis in vivo (Chapter 27).
Steps shown in the figure:
Starting materials: amino acid 1 with α-amino group protected by Fmoc group; insoluble polystyrene bead.
Reaction 1: Attachment of carboxyl-terminal amino acid to reactive group on resin.
Reaction 2: Protecting group is removed by flushing with solution containing a mild organic base.
Reaction 3: Amino acid 2 with protected α-amino group is activated at carboxyl group by DCC (dicyclohexylcarbodiimide).
Reaction 4: α-Amino group of amino acid 1 attacks activated carboxyl group of amino acid 2 to form peptide bond, releasing a dicyclohexylurea byproduct.
Reactions 2 to 4 repeated as necessary.
Reaction 5: Completed peptide is deprotected as in reaction 2; HF cleaves ester linkage between peptide and resin.

A variety of new methods for the efficient ligation (joining together) of peptides has made possible the assembly of synthetic peptides into larger proteins. With these methods, novel forms of proteins can be created with precisely positioned chemical groups, including those that might not normally be found in a cellular protein. These novel forms provide new ways to test theories of enzyme catalysis, to create proteins with new chemical properties, and to design protein sequences that will fold into particular structures. This last application provides the ultimate test of our increasing ability to relate the primary structure of a peptide to the three-dimensional structure that it takes up in solution.
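The dependence of overall yield on the yield of each coupling step, noted earlier in this section in connection with Table 3–8, can be checked with a short calculation. The sketch below assumes, as an interpretation rather than a statement from the text, that a peptide of n residues requires n − 1 coupling cycles, so the overall yield is the stepwise yield raised to the power n − 1; the function name is hypothetical.

```python
def overall_yield(residues, step_yield):
    """Percent overall yield, assuming residues - 1 coupling cycles."""
    return 100 * step_yield ** (residues - 1)


for n in (11, 21, 31, 51, 100):
    low = overall_yield(n, 0.960)   # 96.0% per step
    high = overall_yield(n, 0.998)  # 99.8% per step
    print(n, round(low, 1), round(high, 1))
```

Under this assumption the computed values agree with the tabulated ones to within rounding for the shorter peptides, and they make the central point plainly: a few percent of loss per cycle compounds into a very low yield by 100 residues.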
Amino Acid Sequences Provide Important Biochemical Information

Knowledge of the sequence of amino acids in a protein can offer insights into its three-dimensional structure and its function, cellular location, and evolution. Most of these insights are derived by searching for similarities with other known sequences. Thousands of sequences are known and available in databases accessible through the Internet. A comparison of a newly obtained sequence with this large bank of stored sequences often reveals relationships both surprising and enlightening.

Exactly how the amino acid sequence determines three-dimensional structure is not understood in detail, nor can we always predict function from sequence. However, protein families that have some shared structural or functional features can be readily identified on the basis of amino acid sequence similarities. Individual proteins are assigned to families based on the degree of similarity in amino acid sequence. Members of a family are usually identical across 25% or more of their sequences, and proteins in these families generally share at least some structural and functional characteristics. Some families are defined, however, by identities involving only a few amino acid residues that are critical to a certain function. A number of similar substructures (to be defined in Chapter 4 as “domains”) occur in many functionally unrelated proteins. These domains often fold into structural configurations that have an unusual degree of stability or that are specialized for a certain environment. Evolutionary relationships can also be inferred from the structural and functional similarities within protein families.

Certain amino acid sequences serve as signals that determine the cellular location, chemical modification, and half-life of a protein.
Special signal sequences, usually at the amino terminus, are used to target certain proteins for export from the cell; other proteins are targeted for distribution to the nucleus, the cell surface, the cytosol, and other cellular locations. Other sequences act as attachment sites for prosthetic groups, such as sugar groups in glycoproteins and lipids in lipoproteins. Some of these signals are well characterized and are easily recognized in the sequence of a newly characterized protein (Chapter 27).

SUMMARY 3.4 The Covalent Structure of Proteins
■ Differences in protein function result from differences in amino acid composition and sequence. Some variations in sequence are possible for a particular protein, with little or no effect on function.
■ Amino acid sequences are deduced by fragmenting polypeptides into smaller peptides using reagents known to cleave specific peptide bonds; determining the amino acid sequence of each fragment by the automated Edman degradation procedure; then ordering the peptide fragments by finding sequence overlaps between fragments generated by different reagents. A protein sequence can also be deduced from the nucleotide sequence of its corresponding gene in DNA.
■ Short proteins and peptides (up to about 100 residues) can be chemically synthesized. The peptide is built up, one amino acid residue at a time, while remaining tethered to a solid support.

3.5 Protein Sequences and Evolution

The simple string of letters denoting the amino acid sequence of a given protein belies the wealth of information this sequence holds. As more protein sequences have become available, the development of more powerful methods for extracting information from them has become a major biochemical enterprise.
Each protein’s function relies on its three-dimensional structure, which in turn is determined largely by its primary structure. Thus, the biochemical information conveyed by a protein sequence is in principle limited only by our own understanding of structural and functional principles. On a different level of inquiry, protein sequences are beginning to tell us how the proteins evolved and, ultimately, how life evolved on this planet.

TABLE 3–8 Effect of Stepwise Yield on Overall Yield in Peptide Synthesis
Number of residues in the final polypeptide | Overall yield of final peptide (%) when the yield of each step is 96.0% | Overall yield of final peptide (%) when the yield of each step is 99.8%
11 | 66 | 98
21 | 44 | 96
31 | 29 | 94
51 | 13 | 90
100 | 1.7 | 82

Protein Sequences Can Elucidate the History of Life on Earth

The field of molecular evolution is often traced to Emile Zuckerkandl and Linus Pauling, whose work in the mid-1960s advanced the use of nucleotide and protein sequences to explore evolution. The premise is deceptively straightforward. If two organisms are closely related, the sequences of their genes and proteins should be similar. The sequences increasingly diverge as the evolutionary distance between two organisms increases. The promise of this approach began to be realized in the 1970s, when Carl Woese used ribosomal RNA sequences to define archaebacteria as a group of living organisms distinct from other bacteria and eukaryotes (see Fig. 1–4). Protein sequences offer an opportunity to greatly refine the available information. With the advent of genome projects investigating organisms from bacteria to humans, the number of available sequences is growing at an enormous rate. This information can be used to trace biological history. The challenge is in learning to read the genetic hieroglyphics.

Evolution has not taken a simple linear path.
Complexities abound in any attempt to mine the evolutionary information stored in protein sequences. For a given protein, the amino acid residues essential for the activity of the protein are conserved over evolutionary time. The residues that are less important to function may vary over time (that is, one amino acid may substitute for another), and these variable residues can provide the information used to trace evolution. Amino acid substitutions are not always random, however. At some positions in the primary structure, the need to maintain protein function may mean that only particular amino acid substitutions can be tolerated. Some proteins have more variable amino acid residues than others. For these and other reasons, proteins can evolve at different rates.

Another complicating factor in tracing evolutionary history is the rare transfer of a gene or group of genes from one organism to another, a process called lateral gene transfer. The transferred genes may be quite similar to the genes they were derived from in the original organism, whereas most other genes in the same two organisms may be quite distantly related. An example of lateral gene transfer is the recent rapid spread of antibiotic-resistance genes in bacterial populations. The proteins derived from these transferred genes would not be good candidates for the study of bacterial evolution, because they share only a very limited evolutionary history with their “host” organisms.

The study of molecular evolution generally focuses on families of closely related proteins. In most cases, the families chosen for analysis have essential functions in cellular metabolism that must have been present in the earliest viable cells, thus greatly reducing the chance that they were introduced relatively recently by lateral gene transfer. For example, a protein called EF-1α (elongation factor 1α) is involved in the synthesis of proteins in all eukaryotes.
A similar protein, EF-Tu, with the same function, is found in bacteria. Similarities in sequence and function indicate that EF-1α and EF-Tu are members of a family of proteins that share a common ancestor. The members of protein families are called homologous proteins, or homologs. The concept of a homolog can be further refined. If two proteins within a family (that is, two homologs) are present in the same species, they are referred to as paralogs. Homologs from different species are called orthologs (see Fig. 1–37). The process of tracing evolution involves first identifying suitable families of homologous proteins and then using them to reconstruct evolutionary paths.

Homologs are identified using increasingly powerful computer programs that can directly compare two or more chosen protein sequences, or can search vast databases to find the evolutionary relatives of one selected protein sequence. The electronic search process can be thought of as sliding one sequence past the other until a section with a good match is found. Within this sequence alignment, a positive score is assigned for each position where the amino acid residues in the two sequences are identical (the value of the score varying from one program to the next) to provide a measure of the quality of the alignment. The process has some complications. Sometimes the proteins being compared match well at, say, two sequence segments, and these segments are connected by less related sequences of different lengths. Thus the two matching segments cannot be aligned at the same time. To handle this, the computer program introduces “gaps” in one of the sequences to bring the matching segments into register (Fig. 3–30). Of course, if a sufficient number of gaps are introduced, almost any two sequences could be brought into some sort of alignment. To avoid uninformative alignments, the programs include penalties for each gap introduced, thus lowering the overall alignment score. With electronic trial and error, the program selects the alignment with the optimal score that maximizes identical amino acid residues while minimizing the introduction of gaps.

FIGURE 3–30 Aligning protein sequences with the use of gaps. Shown here is the sequence alignment of a short section of the EF-Tu protein from two well-studied bacterial species, E. coli and Bacillus subtilis. Introduction of a gap in the B. subtilis sequence allows a better alignment of amino acid residues on either side of the gap. Identical amino acid residues are shaded.
[Figure 3–30: aligned E. coli and B. subtilis EF-Tu sequence segments, with a gap in the B. subtilis sequence.]

Identical amino acids are often inadequate to identify related proteins or, more importantly, to determine how closely related the proteins are on an evolutionary time scale. A more useful analysis includes a consideration of the chemical properties of substituted amino acids. When amino acid substitutions are found within a protein family, many of the differences may be conservative; that is, an amino acid residue is replaced by a residue having similar chemical properties. For example, a Glu residue may substitute in one family member for the Asp residue found in another; both amino acids are negatively charged. Such a conservative substitution should logically garner a higher score in a sequence alignment than does a nonconservative substitution, such as the replacement of the Asp residue with a hydrophobic Phe residue.

To determine what scores to assign to the many different amino acid substitutions, Steven Henikoff and Jorja Henikoff examined the aligned sequences from a variety of different proteins.
They did not analyze entire protein sequences, focusing instead on thousands of short conserved blocks where the fraction of identical amino acids was high and the alignments were thus reliable. Looking at the aligned sequence blocks, the Henikoffs analyzed the nonidentical amino acid residues within the blocks. Higher scores were given to nonidentical residues that occurred frequently than to those that appeared rarely. Even the identical residues were given scores based on how often they were replaced, such that amino acids with unique chemical properties (such as Cys and Trp) received higher scores than those more conservatively replaced (such as Asp and Glu). The result of this scoring system is a Blosum (blocks substitution matrix) table. The table in Figure 3–31 was generated from sequences that were identical in at least 62% of their amino acid residues, and it is thus referred to as Blosum62. Similar tables have been generated for blocks of homologous sequences that are 50% or 80% identical. When higher levels of identity are required, the most conservative amino acid substitutions can be overrepresented, which limits the usefulness of the matrix in identifying homologs that are somewhat distantly related. Tests have shown that the Blosum62 table provides the most reliable alignments over a wide range of protein families, and it is the default table in many sequence alignment programs.

FIGURE 3–31 The Blosum62 table. This blocks substitution matrix was created by comparing thousands of short blocks of aligned sequences that were identical in at least 62% of their amino acid residues. The nonidentical residues were assigned scores based on how frequently they were replaced by each of the other amino acids. Each substitution contributes to the score given to a particular alignment. Positive numbers (shaded yellow) add to the score for a particular alignment; negative numbers subtract from the score. Identical residues in sequences being compared (the shaded diagonal from top left to bottom right in the matrix) receive scores based on how often they are replaced, such that amino acids with unique chemical properties (e.g., Cys and Trp) receive higher scores (9 and 11, respectively) than those more easily replaced in conservative substitutions (e.g., Asp (6) and Glu (5)). Many computer programs use Blosum62 to assign scores to new sequence alignments.
Lower triangle of the matrix. Each row lists scores against columns A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, in that order, ending at the diagonal:
A Ala: 4
C Cys: 0 9
D Asp: −2 −3 6
E Glu: −1 −4 2 5
F Phe: −2 −2 −3 −3 6
G Gly: 0 −3 −1 −2 −3 6
H His: −2 −3 −1 0 −1 −2 8
I Ile: −1 −1 −3 −3 0 −4 −3 4
K Lys: −1 −3 −1 1 −3 −2 −1 −3 5
L Leu: −1 −1 −4 −3 0 −4 −3 2 −2 4
M Met: −1 −1 −3 −2 0 −3 −2 1 −1 2 5
N Asn: −2 −3 1 0 −3 0 1 −3 0 −3 −2 6
P Pro: −1 −3 −1 −1 −4 −2 −2 −3 −1 −3 −2 −2 7
Q Gln: −1 −3 0 2 −3 −2 0 −3 1 −2 0 0 −1 5
R Arg: −1 −3 −2 0 −3 −2 0 −3 2 −2 −1 0 −2 1 5
S Ser: 1 −1 0 0 −2 0 −1 −2 0 −2 −1 1 −1 0 −1 4
T Thr: 0 −1 −1 −1 −2 −2 −2 −1 −1 −1 −1 0 −1 −1 −1 1 5
V Val: 0 −1 −3 −2 −1 −3 −3 3 −2 1 1 −3 −2 −2 −3 −2 0 4
W Trp: −3 −2 −4 −3 1 −2 −2 −3 −3 −2 −1 −4 −4 −2 −3 −3 −2 −3 11
Y Tyr: −2 −2 −3 −2 3 −3 2 −1 −2 −1 −1 −2 −3 −1 −2 −2 −2 −1 2 7

For most efforts to find homologies and explore evolutionary relationships, protein sequences (derived either directly from protein sequencing or from the sequencing of the DNA encoding the protein) are superior to nongenic nucleic acid sequences (those that do not encode a protein or functional RNA). For a nucleic acid, with its four different types of residues, random alignment of nonhomologous sequences will generally yield matches for at least 25% of the positions. Introduction of a few gaps can often increase the fraction of matched residues to 40% or more, and the probability of chance alignment of unrelated sequences becomes quite high. The 20 different amino acid residues in proteins greatly lower the probability of uninformative chance alignments of this type.

The programs used to generate a sequence alignment are complemented by methods that test the reliability of the alignments.
A common computerized test is to shuffle the amino acid sequence of one of the proteins being compared to produce a random sequence, then instruct the program to align the shuffled sequence with the other, unshuffled one. Scores are assigned to the new alignment, and the shuffling and alignment process is repeated many times. The original alignment, before shuffling, should have a score significantly higher than any of those within the distribution of scores generated by the random alignments; this increases the confidence that the sequence alignment has identified a pair of homologs. Note that the absence of a significant alignment score does not necessarily mean that no evolutionary relationship exists between two proteins. As we shall see in Chapter 4, three-dimensional structural similarities sometimes reveal evolutionary relationships where sequence homology has been wiped away by time.

Using a protein family to explore evolution requires the identification of family members with similar molecular functions in the widest possible range of organisms. Information from the family can then be used to trace the evolution of those organisms. By analyzing the sequence divergence in selected protein families, investigators can segregate organisms into classes based on their evolutionary relationships. This information must be reconciled with more classical examinations of the physiology and biochemistry of the organisms.

Certain segments of a protein sequence may be found in the organisms of one taxonomic group but not in other groups; these segments can be used as signature sequences for the group in which they are found. An example of a signature sequence is an insertion of 12 amino acids near the amino terminus of the EF-1α/EF-Tu proteins in all archaebacteria and eukaryotes but not in other types of bacteria (Fig. 3–32).
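The shuffling test described earlier in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular alignment program: the scoring function is a simple global alignment with assumed values (+1 for a match, -1 for a mismatch, -2 per gap position) rather than Blosum62, and the names `align_score` and `shuffle_test`, like the example sequences, are invented for the sketch.

```python
import random
import statistics


def align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (Needleman-Wunsch,
    score only, no traceback). Scoring values are illustrative assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_test(a, b, trials=200, seed=0):
    """Compare the score of the real alignment with scores obtained after
    repeatedly shuffling sequence a and realigning it with b."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = align_score(a, b)
    residues = list(a)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)          # same composition, random order
        shuffled_scores.append(align_score("".join(residues), b))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    n_ge = sum(s >= observed for s in shuffled_scores)
    return observed, mean, sd, n_ge


if __name__ == "__main__":
    seq_a = "MKVLAAGIVGHVDHGKST"       # made-up sequence
    seq_b = "MKVLSAGIIGHVDSGKST"       # made-up relative of seq_a
    observed, mean, sd, n_ge = shuffle_test(seq_a, seq_b)
    print(f"observed score: {observed}")
    print(f"shuffled scores: mean {mean:.2f}, sd {sd:.2f}")
    print(f"shuffled alignments scoring at least as high: {n_ge}")
```

With real data, an observed score far above every shuffled score supports homology; an unremarkable score, as noted earlier, does not rule out an evolutionary relationship.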
The signature is one of many biochemical clues that can help establish the evolutionary relatedness of eukaryotes and archaebacteria. As another example, the major taxa of bacteria can be distinguished by signature sequences in several different proteins. The β and γ proteobacteria have signature sequences in the Hsp70 and DNA gyrase protein families (families of proteins involved in protein folding and DNA replication, respectively) that are not present in any other bacteria, including the other proteobacteria. The other types of proteobacteria (α, δ, ε), along with the β and γ proteobacteria, have a separate Hsp70 signature sequence and a signature in alanyl-tRNA synthetase (an enzyme of protein synthesis) that are not present in other bacteria. The appearance of unique signatures in the β and γ proteobacteria suggests that the α, δ, and ε proteobacteria arose before their β and γ cousins.

[Figure 3–32: sequence alignment not reproduced. Aligned amino-terminal sequences of EF-1α/EF-Tu from Halobacterium halobium and Sulfolobus solfataricus (archaebacteria), Saccharomyces cerevisiae and Homo sapiens (eukaryotes), Bacillus subtilis (gram-positive bacterium), and Escherichia coli (gram-negative bacterium), with the signature sequence boxed.]

FIGURE 3–32 A signature sequence in the EF-1α/EF-Tu protein family. The signature sequence (boxed) is a 12-amino-acid insertion near the amino terminus of the sequence. Residues that align in all species are shaded yellow. Both archaebacteria and eukaryotes have the signature, although the sequences of the insertions are quite distinct for the two groups. The variation in the signature sequence reflects the significant evolutionary divergence that has occurred at this site since it first appeared in a common ancestor of both groups.

By considering the entire sequence of a protein, researchers can now construct more elaborate evolutionary trees with many species in each taxonomic group. Figure 3–33 presents one such tree for bacteria, based on sequence divergence in the protein GroEL (a protein present in all bacteria that assists in the proper folding of proteins). The tree can be refined by basing it on the sequences of multiple proteins and by supplementing the sequence information with data on the unique biochemical and physiological properties of each species. There are many methods for generating trees, each with its own advantages and shortcomings, and many ways to represent the resulting evolutionary relationships. In Figure 3–33, the free end points of lines are called “external nodes”; each represents an extant species, and each is so labeled. The points where two lines come together, the “internal nodes,” represent extinct ancestor species. In most representations (including Fig. 3–33), the lengths of the lines connecting the nodes are proportional to the number of amino acid substitutions separating one species from another. If we trace two extant species to a common internal node (representing the common ancestor of the two species), the length of the branch connecting each external node to the internal node represents the number of amino acid substitutions separating one extant species from this ancestor. The sum of the lengths of all the line segments that connect an extant species to another extant species through a common ancestor reflects the number of substitutions separating the two extant species.
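The branch-length bookkeeping just described can be made concrete with a small sketch. The tree below is a toy example: the species names and branch lengths are invented, and the dictionary layout (each node mapped to its parent and the length of the connecting branch) is only one convenient assumption, not the way any particular tree-building program stores its results.

```python
# Toy tree: node -> (parent node, branch length in substitutions per site).
# All names and numbers are hypothetical.
TREE = {
    "species_A": ("ancestor_1", 0.05),
    "species_B": ("ancestor_1", 0.08),
    "ancestor_1": ("root", 0.10),
    "species_C": ("root", 0.20),
}


def path_to_root(node, tree):
    """Map each ancestor of node (and node itself) to the summed branch
    length separating it from node."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:                # the root has no entry, so we stop there
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def tree_distance(x, y, tree):
    """Sum of branch lengths connecting x and y through their most recent
    common ancestor (the internal node giving the shortest combined path)."""
    from_x = path_to_root(x, tree)
    from_y = path_to_root(y, tree)
    return min(from_x[n] + from_y[n] for n in from_x if n in from_y)


if __name__ == "__main__":
    for pair in [("species_A", "species_B"), ("species_A", "species_C")]:
        print(pair, round(tree_distance(*pair, TREE), 3))
```

In this toy tree the two species that share `ancestor_1` are separated by the sum of their two terminal branches, whereas a comparison with `species_C` must also include the branch leading from `ancestor_1` to the root.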
To determine how much time was needed for the various species to diverge, the tree must be calibrated by comparing it with information from the fossil record and other sources.

[Figure 3–33: tree diagram not reproduced. Labeled groups: Spirochaetes; gram-positive bacteria (low G + C and high G + C); cyanobacteria and chloroplasts; proteobacteria (α, β, γ, δ/ε); Chlamydia; Bacteroides. Scale bar: 0.1 substitutions/site.]

FIGURE 3–33 Evolutionary tree derived from amino acid sequence comparisons. A bacterial evolutionary tree, based on the sequence divergence observed in the GroEL family of proteins. Also included in this tree (lower right) are the chloroplasts (chl.) of some nonbacterial species.

As more sequence information is made available in databases, we can generate evolutionary trees based on a variety of different proteins. Some proteins evolve faster than others, or change faster within one group of species than another. A large protein, with many variable amino acid residues, may exhibit a few differences between two closely related species. Another, smaller protein may be identical in the same two species. For many reasons, some details of an evolutionary tree based on the sequences of one protein may differ from those of a tree based on the sequences of another protein. Increasingly sophisticated analyses using the sequences of many different proteins can provide an exquisitely detailed and accurate picture of evolutionary relationships. The story is a work in progress, and the questions being asked and answered are fundamental to how humans view themselves and the world around them. The field of molecular evolution promises to be among the most vibrant of the scientific frontiers in the twenty-first century.

SUMMARY 3.5 Protein Sequences and Evolution

■ Protein sequences are a rich source of information about protein structure and function, as well as the evolution of life on this planet. Sophisticated methods are being developed to trace evolution by analyzing the resultant slow changes in the amino acid sequences of homologous proteins.

Key Terms

amino acids; R group; chiral center; enantiomers; absolute configuration; D, L system; polarity; zwitterion; absorbance, A; isoelectric pH (isoelectric point, pI); peptide; protein; peptide bond; oligopeptide; polypeptide; oligomeric protein; protomer; conjugated protein; prosthetic group; primary structure; secondary structure; tertiary structure; quaternary structure; crude extract; fractionation; dialysis; column chromatography; high-performance liquid chromatography (HPLC); electrophoresis; sodium dodecyl sulfate (SDS); isoelectric focusing; Edman degradation; proteases; proteome; lateral gene transfer; homologous proteins; homolog; paralog; ortholog; signature sequence

Terms in bold are defined in the glossary.

Problems

1. Absolute Configuration of Citrulline
The citrulline isolated from watermelons has the structure shown below. Is it a D- or L-amino acid? Explain.

[Structure of citrulline: stereochemical drawing not reproduced.]

2. Relationship between the Titration Curve and the Acid-Base Properties of Glycine
A 100 mL solution of 0.1 M glycine at pH 1.72 was titrated with 2 M NaOH solution. The pH was monitored and the results were plotted on a graph, as shown below. The key points in the titration are designated I to V. For each of the statements (a) to (o), identify the appropriate key point in the titration and justify your choice.

[Figure: titration curve of glycine, pH (0 to 12) plotted against OH⁻ added (0 to 2.0 equivalents), with key points labeled (I) to (V) and the pH values 2.34, 5.97, 9.60, and 11.30 marked.]

(a) Glycine is present predominantly as the species ⁺H₃N–CH₂–COOH.
(b) The average net charge of glycine is +1/2.
(c) Half of the amino groups are ionized.
(d) The pH is equal to the pKa of the carboxyl group.
(e) The pH is equal to the pKa of the protonated amino group.
(f) Glycine has its maximum buffering capacity.
(g) The average net charge of glycine is zero.
(h) The carboxyl group has been completely titrated (first equivalence point).
(i) Glycine is completely titrated (second equivalence point).
(j) The predominant species is ⁺H₃N–CH₂–COO⁻.
(k) The average net charge of glycine is −1.
(l) Glycine is present predominantly as a 50:50 mixture of ⁺H₃N–CH₂–COOH and ⁺H₃N–CH₂–COO⁻.
(m) This is the isoelectric point.
(n) This is the end of the titration.
(o) These are the worst pH regions for buffering power.

3. How Much Alanine Is Present as the Completely Uncharged Species?
At a pH equal to the isoelectric point of alanine, the net charge on alanine is zero. Two structures can be drawn that have a net charge of zero, but the predominant form of alanine at its pI is zwitterionic.

Zwitterionic: ⁺H₃N–CH(CH₃)–COO⁻
Uncharged: H₂N–CH(CH₃)–COOH

(a) Why is alanine predominantly zwitterionic rather than completely uncharged at its pI?
(b) What fraction of alanine is in the completely uncharged form at its pI? Justify your assumptions.

4. Ionization State of Amino Acids
Each ionizable group of an amino acid can exist in one of two states, charged or neutral. The electric charge on the functional group is determined by the relationship between its pKa and the pH of the solution. This relationship is described by the Henderson-Hasselbalch equation.
(a) Histidine has three ionizable functional groups. Write the equilibrium equations for its three ionizations and assign the proper pKa for each ionization. Draw the structure of histidine in each ionization state. What is the net charge on the histidine molecule in each ionization state?
(b) Draw the structures of the predominant ionization state of histidine at pH 1, 4, 8, and 12. Note that the ionization state can be approximated by treating each ionizable group independently.
(c) What is the net charge of histidine at pH 1, 4, 8, and 12? For each pH, will histidine migrate toward the anode (+) or cathode (−) when placed in an electric field?

5. Separation of Amino Acids by Ion-Exchange Chromatography
Mixtures of amino acids are analyzed by first separating the mixture into its components through ion-exchange chromatography. Amino acids placed on a cation-exchange resin containing sulfonate groups (see Fig.
3–18a) flow down the column at different rates because of two factors that influence their movement: (1) ionic attraction between the –SO₃⁻ residues on the column and positively charged functional groups on the amino acids, and (2) hydrophobic interactions between amino acid side chains and the strongly hydrophobic backbone of the polystyrene resin. For each pair of amino acids listed, determine which will be eluted first from an ion-exchange column using a pH 7.0 buffer.
(a) Asp and Lys
(b) Arg and Met
(c) Glu and Val
(d) Gly and Leu
(e) Ser and Ala

6. Naming the Stereoisomers of Isoleucine
The structure of the amino acid isoleucine is

⁺H₃N–CH(COO⁻)–CH(CH₃)–CH₂–CH₃

(a) How many chiral centers does it have?
(b) How many optical isomers?
(c) Draw perspective formulas for all the optical isomers of isoleucine.

7. Comparing the pKa Values of Alanine and Polyalanine
The titration curve of alanine shows the ionization of two functional groups with pKa values of 2.34 and 9.69, corresponding to the ionization of the carboxyl and the protonated amino groups, respectively. The titration of di-, tri-, and larger oligopeptides of alanine also shows the ionization of only two functional groups, although the experimental pKa values are different. The trend in pKa values is summarized in the table.

Amino acid or peptide        pK₁     pK₂
Ala                          2.34    9.69
Ala–Ala                      3.12    8.30
Ala–Ala–Ala                  3.39    8.03
Ala–(Ala)n–Ala, n ≥ 4        3.42    7.94

(a) Draw the structure of Ala–Ala–Ala. Identify the functional groups associated with pK₁ and pK₂.
(b) Why does the value of pK₁ increase with each addition of an Ala residue to the Ala oligopeptide?
(c) Why does the value of pK₂ decrease with each addition of an Ala residue to the Ala oligopeptide?

8. The Size of Proteins
What is the approximate molecular weight of a protein with 682 amino acid residues in a single polypeptide chain?

9. The Number of Tryptophan Residues in Bovine Serum Albumin
A quantitative amino acid analysis reveals that bovine serum albumin (BSA) contains 0.58% tryptophan (Mr 204) by weight.
(a) Calculate the minimum molecular weight of BSA (i.e., assuming there is only one tryptophan residue per protein molecule).
(b) Gel filtration of BSA gives a molecular weight estimate of 70,000. How many tryptophan residues are present in a molecule of serum albumin?

10. Net Electric Charge of Peptides
A peptide has the sequence

Glu–His–Trp–Ser–Gly–Leu–Arg–Pro–Gly

(a) What is the net charge of the molecule at pH 3, 8, and 11? (Use pKa values for side chains and terminal amino and carboxyl groups as given in Table 3–1.)
(b) Estimate the pI for this peptide.

11. Isoelectric Point of Pepsin
Pepsin is the name given to several digestive enzymes secreted (as larger precursor proteins) by glands that line the stomach. These glands also secrete hydrochloric acid, which dissolves the particulate matter in food, allowing pepsin to enzymatically cleave individual protein molecules. The resulting mixture of food, HCl, and digestive enzymes is known as chyme and has a pH near 1.5. What pI would you predict for the pepsin proteins? What functional groups must be present to confer this pI on pepsin? Which amino acids in the proteins would contribute such groups?

12. The Isoelectric Point of Histones
Histones are proteins found in eukaryotic cell nuclei, tightly bound to DNA, which has many phosphate groups. The pI of histones is very high, about 10.8. What amino acid residues must be present in relatively large numbers in histones? In what way do these residues contribute to the strong binding of histones to DNA?

13. Solubility of Polypeptides
One method for separating polypeptides makes use of their differential solubilities. The solubility of large polypeptides in water depends upon the relative polarity of their R groups, particularly on the number of ionized groups: the more ionized groups there are, the more soluble the polypeptide. Which of each pair of the polypeptides that follow is more soluble at the indicated pH?
(a) (Gly)₂₀ or (Glu)₂₀ at pH 7.0
(b) (Lys–Ala)₃ or (Phe–Met)₃ at pH 7.0
(c) (Ala–Ser–Gly)₅ or (Asn–Ser–His)₅ at pH 6.0
(d) (Ala–Asp–Gly)₅ or (Asn–Ser–His)₅ at pH 3.0

14. Purification of an Enzyme
A biochemist discovers and purifies a new enzyme, generating the purification table below.

Procedure                              Total protein (mg)    Activity (units)
1. Crude extract                       20,000                4,000,000
2. Precipitation (salt)                5,000                 3,000,000
3. Precipitation (pH)                  4,000                 1,000,000
4. Ion-exchange chromatography         200                   800,000
5. Affinity chromatography             50                    750,000
6. Size-exclusion chromatography       45                    675,000

(a) From the information given in the table, calculate the specific activity of the enzyme solution after each purification procedure.
(b) Which of the purification procedures used for this enzyme is most effective (i.e., gives the greatest relative increase in purity)?
(c) Which of the purification procedures is least effective?
(d) Is there any indication based on the results shown in the table that the enzyme after step 6 is now pure? What else could be done to estimate the purity of the enzyme preparation?

15. Sequence Determination of the Brain Peptide Leucine Enkephalin
A group of peptides that influence nerve transmission in certain parts of the brain has been isolated from normal brain tissue. These peptides are known as opioids, because they bind to specific receptors that also bind opiate drugs, such as morphine and naloxone. Opioids thus mimic some of the properties of opiates. Some researchers consider these peptides to be the brain’s own pain killers. Using the information below, determine the amino acid sequence of the opioid leucine enkephalin. Explain how your structure is consistent with each piece of information.
(a) Complete hydrolysis by 6 M HCl at 110 °C followed by amino acid analysis indicated the presence of Gly, Leu, Phe, and Tyr, in a 2:1:1:1 molar ratio.
(b) Treatment of the peptide with 1-fluoro-2,4-dinitrobenzene followed by complete hydrolysis and chromatography indicated the presence of the 2,4-dinitrophenyl derivative of tyrosine. No free tyrosine could be found.
(c) Complete digestion of the peptide with pepsin followed by chromatography yielded a dipeptide containing Phe and Leu, plus a tripeptide containing Tyr and Gly in a 1:2 ratio.

16. Structure of a Peptide Antibiotic from Bacillus brevis
Extracts from the bacterium Bacillus brevis contain a peptide with antibiotic properties. This peptide forms complexes with metal ions and apparently disrupts ion transport across the cell membranes of other bacterial species, killing them. The structure of the peptide has been determined from the following observations.
(a) Complete acid hydrolysis of the peptide followed by amino acid analysis yielded equimolar amounts of Leu, Orn, Phe, Pro, and Val. Orn is ornithine, an amino acid not present in proteins but present in some peptides. It has the structure

⁺H₃N–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻

(b) The molecular weight of the peptide was estimated as about 1,200.
(c) The peptide failed to undergo hydrolysis when treated with the enzyme carboxypeptidase. This enzyme catalyzes the hydrolysis of the carboxyl-terminal residue of a polypeptide unless the residue is Pro or, for some reason, does not contain a free carboxyl group.
(d) Treatment of the intact peptide with 1-fluoro-2,4-dinitrobenzene, followed by complete hydrolysis and chromatography, yielded only free amino acids and the following derivative:

2,4-(O₂N)₂C₆H₃–NH–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻

(Hint: Note that the 2,4-dinitrophenyl derivative involves the amino group of a side chain rather than the α-amino group.)
(e) Partial hydrolysis of the peptide followed by chromatographic separation and sequence analysis yielded the following di- and tripeptides (the amino-terminal amino acid is always at the left):

Leu–Phe   Phe–Pro   Orn–Leu   Val–Orn
Val–Orn–Leu   Phe–Pro–Val   Pro–Val–Orn

Given the above information, deduce the amino acid sequence of the peptide antibiotic. Show your reasoning. When you have arrived at a structure, demonstrate that it is consistent with each experimental observation.

17. Efficiency in Peptide Sequencing
A peptide with the primary structure Lys–Arg–Pro–Leu–Ile–Asp–Gly–Ala is sequenced by the Edman procedure. If each Edman cycle is 96% efficient, what percentage of the amino acids liberated in the fourth cycle will be leucine? Do the calculation a second time, but assume a 99% efficiency for each cycle.

18. Biochemistry Protocols: Your First Protein Purification
As the newest and least experienced student in a biochemistry research lab, your first few weeks are spent washing glassware and labeling test tubes. You then graduate to making buffers and stock solutions for use in various laboratory procedures. Finally, you are given responsibility for purifying a protein. It is a citric acid cycle enzyme, citrate synthase, located in the mitochondrial matrix. Following a protocol for the purification, you proceed through the steps below. As you work, a more experienced student questions you about the rationale for each procedure. Supply the answers. (Hint: See Chapter 2 for information about osmolarity; see p. 6 for information on separation of organelles from cells.)
(a) You pick up 20 kg of beef hearts from a nearby slaughterhouse. You transport the hearts on ice, and perform each step of the purification on ice or in a walk-in cold room. You homogenize the beef heart tissue in a high-speed blender in a medium containing 0.2 M sucrose, buffered to a pH of 7.2. Why do you use beef heart tissue, and in such large quantity? What is the purpose of keeping the tissue cold and suspending it in 0.2 M sucrose, at pH 7.2? What happens to the tissue when it is homogenized?
(b) You subject the resulting heart homogenate, which is dense and opaque, to a series of differential centrifugation steps. What does this accomplish?
(c) You proceed with the purification using the supernatant fraction that contains mostly intact mitochondria. Next you osmotically lyse the mitochondria. The lysate, which is less dense than the homogenate, but still opaque, consists primarily of mitochondrial membranes and internal mitochondrial contents. To this lysate you add ammonium sulfate, a highly soluble salt, to a specific concentration. You centrifuge the solution, decant the supernatant, and discard the pellet. To the supernatant, which is clearer than the lysate, you add more ammonium sulfate. Once again, you centrifuge the sample, but this time you save the pellet because it contains the protein of interest. What is the rationale for the two-step addition of the salt?
(d) You solubilize the ammonium sulfate pellet containing the mitochondrial proteins and dialyze it overnight against large volumes of buffered (pH 7.2) solution. Why isn’t ammonium sulfate included in the dialysis buffer? Why do you use the buffer solution instead of water?
(e) You run the dialyzed solution over a size-exclusion chromatographic column.
Following the protocol, you collect the first protein fraction that exits the column, and discard the rest of the fractions that elute from the column later. You detect the protein by measuring UV absorbance (at 280 nm) in the fractions. What does the instruction to collect the first fraction tell you about the protein? Why is UV ab- sorbance at 280 nm a good way to monitor for the pres- ence of protein in the eluted fractions? (f) You place the fraction collected in (e) on a cation- exchange chromatographic column. After discarding the ini- tial solution that exits the column (the flowthrough), you add a washing solution of higher pH to the column and collect the protein fraction that immediately elutes. Explain what you are doing. (g) You run a small sample of your fraction, now very reduced in volume and quite clear (though tinged pink), on an isoelectric focusing gel. When stained, the gel shows three sharp bands. According to the protocol, the protein of inter- est is the one with the pI of 5.6, but you decide to do one more assay of the protein’s purity. You cut out the pI 5.6 band and subject it to SDS polyacrylamide gel electrophoresis. The protein resolves as a single band. Why were you uncon- vinced of the purity of the “single” protein band on your isoelectric focusing gel? What did the results of the SDS gel tell you? Why is it important to do the SDS gel elec- trophoresis after the isoelectric focusing? 8885d_c03_115 1/16/04 6:09 AM Page 115 mac76 mac76:385_reb: chapter THE THREE-DIMENSIONAL STRUCTURE OF PROTEINS 4.1 Overview of Protein Structure 116 4.2 Protein Secondary Structure 120 4.3 Protein Tertiary and Quaternary Structures 125 4.4 Protein Denaturation and Folding 147 Perhaps the more remarkable features of [myoglobin] are its complexity and its lack of symmetry. 
The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates, and it is more complicated than has been predicted by any theory of protein structure.
—John Kendrew, article in Nature, 1958

The covalent backbone of a typical protein contains hundreds of individual bonds. Because free rotation is possible around many of these bonds, the protein can assume an unlimited number of conformations. However, each protein has a specific chemical or structural function, strongly suggesting that each has a unique three-dimensional structure (Fig. 4–1). By the late 1920s, several proteins had been crystallized, including hemoglobin (Mr 64,500) and the enzyme urease (Mr 483,000). Given that the ordered array of molecules in a crystal can generally form only if the molecular units are identical, the simple fact that many proteins can be crystallized provides strong evidence that even very large proteins are discrete chemical entities with unique structures. This conclusion revolutionized thinking about proteins and their functions.

In this chapter, we explore the three-dimensional structure of proteins, emphasizing five themes. First, the three-dimensional structure of a protein is determined by its amino acid sequence. Second, the function of a protein depends on its structure. Third, an isolated protein usually exists in one or a small number of stable structural forms. Fourth, the most important forces stabilizing the specific structures maintained by a given protein are noncovalent interactions. Finally, amid the huge number of unique protein structures, we can recognize some common structural patterns that help us organize our understanding of protein architecture.

These themes should not be taken to imply that proteins have static, unchanging three-dimensional structures. Protein function often entails an interconversion between two or more structural forms.
The dynamic aspects of protein structure will be explored in Chapters 5 and 6.

The relationship between the amino acid sequence of a protein and its three-dimensional structure is an intricate puzzle that is gradually yielding to techniques used in modern biochemistry. An understanding of structure, in turn, is essential to the discussion of function in succeeding chapters. We can find and understand the patterns within the biochemical labyrinth of protein structure by applying fundamental principles of chemistry and physics.

4.1 Overview of Protein Structure

The spatial arrangement of atoms in a protein is called its conformation. The possible conformations of a protein include any structural state that can be achieved without breaking covalent bonds. A change in conformation could occur, for example, by rotation about single bonds. Of the numerous conformations that are theoretically possible in a protein containing hundreds of single bonds, one or (more commonly) a few generally predominate under biological conditions. The need for multiple stable conformations reflects the changes that must occur in most proteins as they bind to other molecules or catalyze reactions. The conformations existing under a given set of conditions are usually the ones that are thermodynamically the most stable, having the lowest Gibbs free energy (G). Proteins in any of their functional, folded conformations are called native proteins.

What principles determine the most stable conformations of a protein? An understanding of protein conformation can be built stepwise from the discussion of primary structure in Chapter 3 through a consideration of secondary, tertiary, and quaternary structures.
To this traditional approach must be added a new emphasis on supersecondary structures, a growing set of known and classifiable protein folding patterns that provides an important organizational context to this complex endeavor. We begin by introducing some guiding principles.

A Protein’s Conformation Is Stabilized Largely by Weak Interactions

In the context of protein structure, the term stability can be defined as the tendency to maintain a native conformation. Native proteins are only marginally stable; the ΔG separating the folded and unfolded states in typical proteins under physiological conditions is in the range of only 20 to 65 kJ/mol. A given polypeptide chain can theoretically assume countless different conformations, and as a result the unfolded state of a protein is characterized by a high degree of conformational entropy. This entropy, and the hydrogen-bonding interactions of many groups in the polypeptide chain with solvent (water), tend to maintain the unfolded state. The chemical interactions that counteract these effects and stabilize the native conformation include disulfide bonds and the weak (noncovalent) interactions described in Chapter 2: hydrogen bonds, and hydrophobic and ionic interactions. An appreciation of the role of these weak interactions is especially important to our understanding of how polypeptide chains fold into specific secondary and tertiary structures, and how they combine with other polypeptides to form quaternary structures.

About 200 to 460 kJ/mol are required to break a single covalent bond, whereas weak interactions can be disrupted by a mere 4 to 30 kJ/mol. Individual covalent bonds that contribute to the native conformations of proteins, such as disulfide bonds linking separate parts of a single polypeptide chain, are clearly much stronger than individual weak interactions.
Yet, because they are so numerous, it is weak interactions that predominate as a stabilizing force in protein structure. In general, the protein conformation with the lowest free energy (that is, the most stable conformation) is the one with the maximum number of weak interactions.

The stability of a protein is not simply the sum of the free energies of formation of the many weak interactions within it. Every hydrogen-bonding group in a folded polypeptide chain was hydrogen-bonded to water prior to folding, and for every hydrogen bond formed in a protein, a hydrogen bond (of similar strength) between the same group and water was broken. The net stability contributed by a given weak interaction, or the difference in free energies of the folded and unfolded states, may be close to zero. We must therefore look elsewhere to explain why the native conformation of a protein is favored.

We find that the contribution of weak interactions to protein stability can be understood in terms of the properties of water (Chapter 2). Pure water contains a network of hydrogen-bonded H2O molecules. No other molecule has the hydrogen-bonding potential of water, and other molecules present in an aqueous solution disrupt the hydrogen bonding of water. When water surrounds a hydrophobic molecule, the optimal arrangement of hydrogen bonds results in a highly structured shell, or solvation layer, of water in the immediate vicinity. The increased order of the water molecules in the solvation layer correlates with an unfavorable decrease in the entropy of the water. However, when nonpolar groups are clustered together, there is a decrease in the extent of the solvation layer because each group no longer presents its entire surface to the solution. The result is a favorable increase in entropy. As described in Chapter 2, this entropy term is the major thermodynamic driving force for the association of hydrophobic groups in aqueous solution. Hydrophobic amino acid side chains therefore tend to be clustered in a protein’s interior, away from water.

FIGURE 4–1 Structure of the enzyme chymotrypsin, a globular protein. Proteins are large molecules and, as we shall see, each has a unique structure. A molecule of glycine (blue) is shown for size comparison. The known three-dimensional structures of proteins are archived in the Protein Data Bank, or PDB (www.rcsb.org/pdb). Each structure is assigned a unique four-character identifier, or PDB ID. Where appropriate, we will provide the PDB IDs for molecular graphic images in the figure captions. The image shown here was made using data from the PDB file 6GCH. The data from the PDB files provide only a series of coordinates detailing the location of atoms and their connectivity. Viewing the images requires easy-to-use graphics programs such as RasMol and Chime that convert the coordinates into an image and allow the viewer to manipulate the structure in three dimensions. You will find instructions for downloading Chime with the structure tutorials on the textbook website (www.whfreeman.com/lehninger). The PDB website has instructions for downloading other viewers. We encourage all students to take advantage of the resources of the PDB and the free molecular graphics programs.

Under physiological conditions, the formation of hydrogen bonds and ionic interactions in a protein is driven largely by this same entropic effect. Polar groups can generally form hydrogen bonds with water and hence are soluble in water. However, the number of hydrogen bonds per unit mass is generally greater for pure water than for any other liquid or solution, and there are limits to the solubility of even the most polar molecules as their presence causes a net decrease in hydrogen bonding per unit mass.
Therefore, a solvation shell of structured water will also form to some extent around polar molecules. Even though the energy of formation of an intramolecular hydrogen bond or ionic interaction between two polar groups in a macromolecule is largely canceled out by the elimination of such interactions between the same groups and water, the release of structured water when the intramolecular interaction is formed provides an entropic driving force for folding. Most of the net change in free energy that occurs when weak interactions are formed within a protein is therefore derived from the increased entropy in the surrounding aqueous solution resulting from the burial of hydrophobic surfaces. This more than counterbalances the large loss of conformational entropy as a polypeptide is constrained into a single folded conformation.

Hydrophobic interactions are clearly important in stabilizing a protein conformation; the interior of a protein is generally a densely packed core of hydrophobic amino acid side chains. It is also important that any polar or charged groups in the protein interior have suitable partners for hydrogen bonding or ionic interactions. One hydrogen bond seems to contribute little to the stability of a native structure, but the presence of hydrogen-bonding or charged groups without partners in the hydrophobic core of a protein can be so destabilizing that conformations containing these groups are often thermodynamically untenable. The favorable free-energy change realized by combining such a group with a partner in the surrounding solution can be greater than the difference in free energy between the folded and unfolded states. In addition, hydrogen bonds between groups in proteins form cooperatively. Formation of one hydrogen bond facilitates the formation of additional hydrogen bonds.
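As a rough illustration of how marginal this stability is, the 20 to 65 kJ/mol range quoted earlier for the free-energy difference between folded and unfolded states can be converted into an equilibrium constant. The sketch below assumes a simple two-state (folded/unfolded) equilibrium and a temperature of 37 °C; neither assumption comes from the text, and the function name is purely illustrative.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def folding_equilibrium(delta_g_unfold_kj, temp_k=310.15):
    """Two-state estimate (an assumption, not a statement from the text).

    delta_g_unfold_kj: G(unfolded) - G(folded) in kJ/mol, positive when
    the folded state is the more stable one.
    Returns (K_fold, fraction_unfolded), where K_fold = [folded]/[unfolded].
    """
    k_fold = math.exp(delta_g_unfold_kj / (R_KJ * temp_k))
    fraction_unfolded = 1.0 / (1.0 + k_fold)
    return k_fold, fraction_unfolded


# The two ends of the stability range given in the text.
for dg in (20.0, 65.0):
    k_fold, f_unfolded = folding_equilibrium(dg)
    print(f"dG = {dg:4.1f} kJ/mol  K_fold = {k_fold:.1e}  "
          f"fraction unfolded = {f_unfolded:.1e}")
```

Under these assumptions the equilibrium constant spans roughly 10^3 to 10^11, so the folded state dominates strongly even though the total stabilization is no larger than the energy of a few individual weak interactions (4 to 30 kJ/mol each).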
The overall contribution of hydrogen bonds and other noncovalent interactions to the stabilization of protein conformation is still being evaluated. The interaction of oppositely charged groups that form an ion pair (salt bridge) may also have a stabilizing effect on one or more native conformations of some proteins.

Most of the structural patterns outlined in this chapter reflect two simple rules: (1) hydrophobic residues are largely buried in the protein interior, away from water; and (2) the number of hydrogen bonds within the protein is maximized. Insoluble proteins and proteins within membranes (which we examine in Chapter 11) follow somewhat different rules because of their function or their environment, but weak interactions are still critical structural elements.

The Peptide Bond Is Rigid and Planar

Protein Architecture—Primary Structure Covalent bonds also place important constraints on the conformation of a polypeptide. In the late 1930s, Linus Pauling and Robert Corey embarked on a series of studies that laid the foundation for our present understanding of protein structure. They began with a careful analysis of the peptide bond. The α carbons of adjacent amino acid residues are separated by three covalent bonds, arranged as Cα–C–N–Cα. X-ray diffraction studies of crystals of amino acids and of simple dipeptides and tripeptides demonstrated that the peptide C–N bond is somewhat shorter than the C–N bond in a simple amine and that the atoms associated with the peptide bond are coplanar. This indicated a resonance or partial sharing of two pairs of electrons between the carbonyl oxygen and the amide nitrogen (Fig. 4–2a). The oxygen has a partial negative charge and the nitrogen a partial positive charge, setting up a small electric dipole. The six atoms of the peptide group lie in a single plane, with the oxygen atom of the carbonyl group and the hydrogen atom of the amide nitrogen trans to each other.
From these findings Pauling and Corey concluded that the peptide C–N bonds are unable to rotate freely because of their partial double-bond character. Rotation is permitted about the N–Cα and the Cα–C bonds. The backbone of a polypeptide chain can thus be pictured as a series of rigid planes with consecutive planes sharing a common point of rotation at Cα (Fig. 4–2b). The rigid peptide bonds limit the range of conformations that can be assumed by a polypeptide chain.

By convention, the bond angles resulting from rotations at Cα (strictly speaking, dihedral or torsion angles) are labeled φ (phi) for the N–Cα bond and ψ (psi) for the Cα–C bond. Again by convention, both φ and ψ are defined as 180° when the polypeptide is in its fully extended conformation and all peptide groups are in the same plane (Fig. 4–2b). In principle, φ and ψ can have any value between −180° and +180°, but many values are prohibited by steric interference between atoms in the polypeptide backbone and amino acid side chains. The conformation in which both φ and ψ are 0° (Fig. 4–2c) is prohibited for this reason; this conformation is used merely as a reference point for describing the angles of rotation. Allowed values for φ and ψ are graphically revealed when ψ is plotted versus φ in a Ramachandran plot (Fig. 4–3), introduced by G. N. Ramachandran.

[Figure 4–2 panel annotations: (a) The carbonyl oxygen has a partial negative charge (δ−) and the amide nitrogen a partial positive charge (δ+), setting up a small electric dipole. Virtually all peptide bonds in proteins occur in this trans configuration; an exception is noted in Figure 4–8b. (b) Backbone from amino terminus to carboxyl terminus, with bond lengths C=O 1.24 Å, C–N 1.32 Å, N–Cα 1.46 Å, and Cα–C 1.53 Å, and the angles φ and ψ marked at each Cα.]

FIGURE 4–2 The planar peptide group. (a) Each peptide bond has some double-bond character due to resonance and cannot rotate. (b) Three bonds separate sequential α carbons in a polypeptide chain. The N–Cα and Cα–C bonds can rotate, with bond angles designated φ and ψ, respectively. The peptide C–N bond is not free to rotate. Other single bonds in the backbone may also be rotationally hindered, depending on the size and charge of the R groups. In the conformation shown, φ and ψ are 180° (or −180°). As one looks out from the α carbon, the ψ and φ angles increase as the carbonyl or amide nitrogens (respectively) rotate clockwise. (c) By convention, both φ and ψ are defined as 0° when the two peptide bonds flanking that α carbon are in the same plane and positioned as shown. In a protein, this conformation is prohibited by steric overlap between an α-carbonyl oxygen and an α-amino hydrogen atom. To illustrate the bonds between atoms, the balls representing each atom are smaller than the van der Waals radii for this scale. 1 Å = 0.1 nm.

[Figure 4–3 plot: ψ (degrees) on the vertical axis versus φ (degrees) on the horizontal axis, each running from −180 to +180.]

FIGURE 4–3 Ramachandran plot for L-Ala residues. The conformations of peptides are defined by the values of φ and ψ. Conformations deemed possible are those that involve little or no steric interference, based on calculations using known van der Waals radii and bond angles. The areas shaded dark blue reflect conformations that involve no steric overlap and thus are fully allowed; medium blue indicates conformations allowed at the extreme limits for unfavorable atomic contacts; the lightest blue area reflects conformations that are permissible if a little flexibility is allowed in the bond angles.
The asymmetry of the plot results from the L stereochemistry of the amino acid residues. The plots for other L-amino acid residues with unbranched side chains are nearly identical. The allowed ranges for branched amino acid residues such as Val, Ile, and Thr are somewhat smaller than for Ala. The Gly residue, which is less sterically hindered, exhibits a much broader range of allowed conformations. The range for Pro residues is greatly restricted because φ is limited by the cyclic side chain to the range of −35° to −85°.

SUMMARY 4.1 Overview of Protein Structure

■ Every protein has a three-dimensional structure that reflects its function.
■ Protein structure is stabilized by multiple weak interactions. Hydrophobic interactions are the major contributors to stabilizing the globular form of most soluble proteins; hydrogen bonds and ionic interactions are optimized in the specific structures that are thermodynamically most stable.
■ The nature of the covalent bonds in the polypeptide backbone places constraints on structure. The peptide bond has a partial double-bond character that keeps the entire six-atom peptide group in a rigid planar configuration. The N–Cα and Cα–C bonds can rotate to assume bond angles of φ and ψ, respectively.

4.2 Protein Secondary Structure

The term secondary structure refers to the local conformation of some part of a polypeptide. The discussion of secondary structure most usefully focuses on common regular folding patterns of the polypeptide backbone. A few types of secondary structure are particularly stable and occur widely in proteins. The most prominent are the α helix and β conformations described below.
Using fundamental chemical principles and a few experimental observations, Pauling and Corey predicted the existence of these secondary structures in 1951, several years before the first complete protein structure was elucidated.

The α Helix Is a Common Protein Secondary Structure

Protein Architecture—α Helix Pauling and Corey were aware of the importance of hydrogen bonds in orienting polar chemical groups such as the C=O and N–H groups of the peptide bond. They also had the experimental results of William Astbury, who in the 1930s had conducted pioneering x-ray studies of proteins. Astbury demonstrated that the protein that makes up hair and porcupine quills (the fibrous protein α-keratin) has a regular structure that repeats every 5.15 to 5.2 Å. (The angstrom, Å, named after the physicist Anders J. Ångström, is equal to 0.1 nm. Although not an SI unit, it is used universally by structural biologists to describe atomic distances.) With this information and their data on the peptide bond, and with the help of precisely constructed models, Pauling and Corey set out to determine the likely conformations of protein molecules.

The simplest arrangement the polypeptide chain could assume with its rigid peptide bonds (but other single bonds free to rotate) is a helical structure, which Pauling and Corey called the α helix (Fig. 4–4). In this structure the polypeptide backbone is tightly wound around an imaginary axis drawn longitudinally through the middle of the helix, and the R groups of the amino acid residues protrude outward from the helical backbone. The repeating unit is a single turn of the helix, which extends about 5.4 Å along the long axis, slightly greater than the periodicity Astbury observed on x-ray analysis of hair keratin. The amino acid residues in an α helix have conformations with ψ = −45° to −50° and φ = −60°, and each helical turn includes 3.6 amino acid residues.
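The two numbers just given, 5.4 Å per turn and 3.6 residues per turn, are enough to work out how far each residue advances along the helix axis and how far it rotates about it. The following is a minimal sketch of that arithmetic; the constant and function names are hypothetical, chosen only for this illustration.

```python
# Values taken from the text.
PITCH_ANGSTROM = 5.4      # length of one helical turn along the axis
RESIDUES_PER_TURN = 3.6   # amino acid residues in one turn

# Derived quantities.
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN   # about 1.5 angstroms
ROTATION_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN        # about 100 degrees


def helix_length(n_residues):
    """Axial length in angstroms of an ideal alpha helix of n residues."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of complete or partial turns made by n residues."""
    return n_residues / RESIDUES_PER_TURN


n = 18
print(f"rise per residue:     {RISE_PER_RESIDUE:.2f} A")
print(f"rotation per residue: {ROTATION_PER_RESIDUE:.0f} degrees")
print(f"{n} residues: {helix_turns(n):.1f} turns, {helix_length(n):.1f} A long")
```

Each residue therefore advances about 1.5 Å and turns about 100° around the axis, so an 18-residue helix completes five turns over roughly 27 Å.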
The helical twist of the α helix found in all proteins is right-handed (Box 4–1). The α helix proved to be the predominant structure in α-keratins. More generally, about one-fourth of all amino acid residues in polypeptides are found in α helices, the exact fraction varying greatly from one protein to the next.

Why does the α helix form more readily than many other possible conformations? The answer is, in part, that an α helix makes optimal use of internal hydrogen bonds. The structure is stabilized by a hydrogen bond between the hydrogen atom attached to the electronegative nitrogen atom of a peptide linkage and the electronegative carbonyl oxygen atom of the fourth amino acid on the amino-terminal side of that peptide bond (Fig. 4–4b). Within the α helix, every peptide bond (except those close to each end of the helix) participates in such hydrogen bonding. Each successive turn of the α helix is held to adjacent turns by three to four hydrogen bonds. All the hydrogen bonds combined give the entire helical structure considerable stability.

Further model-building experiments have shown that an α helix can form in polypeptides consisting of either L- or D-amino acids. However, all residues must be of one stereoisomeric series; a D-amino acid will disrupt a regular structure consisting of L-amino acids, and vice versa. Naturally occurring L-amino acids can form either right- or left-handed α helices, but extended left-handed helices have not been observed in proteins.

[Photographs: Linus Pauling, 1901–1994; Robert Corey, 1897–1971]

Amino Acid Sequence Affects α Helix Stability

Not all polypeptides can form a stable α helix. Interactions between amino acid side chains can stabilize or destabilize this structure.
For example, if a polypeptide chain has a long block of Glu residues, this segment of the chain will not form an α helix at pH 7.0. The negatively charged carboxyl groups of adjacent Glu residues repel each other so strongly that they prevent formation of the α helix. For the same reason, if there are many adjacent Lys and/or Arg residues, which have positively charged R groups at pH 7.0, they will also repel each other and prevent formation of the α helix. The bulk and shape of Asn, Ser, Thr, and Cys residues can also destabilize an α helix if they are close together in the chain.

The twist of an α helix ensures that critical interactions occur between an amino acid side chain and the side chain three (and sometimes four) residues away on either side of it (Fig. 4–5). Positively charged amino acids are often found three residues away from negatively charged amino acids, permitting the formation of an ion pair. Two aromatic amino acid residues are often similarly spaced, resulting in a hydrophobic interaction.

[Figure 4–4 panel labels: (a) amino terminus to carboxyl terminus, with one turn marked 5.4 Å (3.6 residues); (b) atom key: carbon, hydrogen, oxygen, nitrogen, R group; (c); (d).]

FIGURE 4–4 Four models of the α helix, showing different aspects of its structure. (a) Formation of a right-handed α helix. The planes of the rigid peptide bonds are parallel to the long axis of the helix, depicted here as a vertical rod. (b) Ball-and-stick model of a right-handed α helix, showing the intrachain hydrogen bonds. The repeat unit is a single turn of the helix, 3.6 residues. (c) The α helix as viewed from one end, looking down the longitudinal axis (derived from PDB ID 4TNC). Note the positions of the R groups, represented by purple spheres. This ball-and-stick model, used to emphasize the helical arrangement, gives the false impression that the helix is hollow, because the balls do not represent the van der Waals radii of the individual atoms.
As the space-filling model (d) shows, the atoms in the center of the α helix are in very close contact.

FIGURE 4–5 Interactions between R groups of amino acids three residues apart in an α helix. An ionic interaction between Asp 100 and Arg 103 in an α-helical region of the protein troponin C, a calcium-binding protein associated with muscle, is shown in this space-filling model (derived from PDB ID 4TNC). The polypeptide backbone (carbons, α-amino nitrogens, and α-carbonyl oxygens) is shown in gray for a helix segment 13 residues long. The only side chains represented here are the interacting Asp (red) and Arg (blue) side chains.

A constraint on the formation of the α helix is the presence of Pro or Gly residues. In proline, the nitrogen atom is part of a rigid ring (see Fig. 4–8b), and rotation about the N–Cα bond is not possible. Thus, a Pro residue introduces a destabilizing kink in an α helix. In addition, the nitrogen atom of a Pro residue in peptide linkage has no substituent hydrogen to participate in hydrogen bonds with other residues. For these reasons, proline is only rarely found within an α helix. Glycine occurs infrequently in α helices for a different reason: it has more conformational flexibility than the other amino acid residues. Polymers of glycine tend to take up coiled structures quite different from an α helix.

A final factor affecting the stability of an α helix in a polypeptide is the identity of the amino acid residues near the ends of the α-helical segment. A small electric dipole exists in each peptide bond (Fig. 4–2a). These dipoles are connected through the hydrogen bonds of the helix, resulting in a net dipole extending along the helix that increases with helix length (Fig. 4–6). The four amino acid residues at each end of the helix do not participate fully in the helix hydrogen bonds.
The partial positive and negative charges of the helix dipole actually reside on the peptide amino and carbonyl groups near the amino-terminal and carboxyl-terminal ends of the helix, respectively. For this reason, negatively charged amino acids are often found near the amino terminus of the helical segment, where they have a stabilizing interaction with the positive charge of the helix dipole; a positively charged amino acid at the amino-terminal end is destabilizing. The opposite is true at the carboxyl-terminal end of the helical segment.

Thus, five different kinds of constraints affect the stability of an α helix: (1) the electrostatic repulsion (or attraction) between successive amino acid residues with charged R groups, (2) the bulkiness of adjacent R groups, (3) the interactions between R groups spaced three (or four) residues apart, (4) the occurrence of Pro and Gly residues, and (5) the interaction between amino acid residues at the ends of the helical segment and the electric dipole inherent to the α helix. The tendency of a given segment of a polypeptide chain to fold up as an α helix therefore depends on the identity and sequence of amino acid residues within the segment.

BOX 4–1 WORKING IN BIOCHEMISTRY
Knowing the Right Hand from the Left

There is a simple method for determining whether a helical structure is right-handed or left-handed. Make fists of your two hands with thumbs outstretched and pointing straight up. Looking at your right hand, think of a helix spiraling up your right thumb in the direction in which the other four fingers are curled as shown (counterclockwise). The resulting helix is right-handed. Your left hand will demonstrate a left-handed helix, which rotates in the clockwise direction as it spirals up your thumb.

[Figure 4–6 diagram: helix with + and − symbols on each peptide bond, δ+ at the amino terminus and δ− at the carboxyl terminus.]

FIGURE 4–6 Helix dipole.
The electric dipole of a peptide bond (see Fig. 4–2a) is transmitted along an α-helical segment through the intrachain hydrogen bonds, resulting in an overall helix dipole. In this illustration, the amino and carbonyl constituents of each peptide bond are indicated by + and − symbols, respectively. Non-hydrogen-bonded amino and carbonyl constituents in the peptide bonds near each end of the α-helical region are shown in red.

The β Conformation Organizes Polypeptide Chains into Sheets

Protein Architecture—β Sheet Pauling and Corey predicted a second type of repetitive structure, the β conformation. This is a more extended conformation of polypeptide chains, and its structure has been confirmed by x-ray analysis. In the β conformation, the backbone of the polypeptide chain is extended into a zigzag rather than helical structure (Fig. 4–7). The zigzag polypeptide chains can be arranged side by side to form a structure resembling a series of pleats. In this arrangement, called a β sheet, hydrogen bonds are formed between adjacent segments of polypeptide chain. The individual segments that form a β sheet are usually nearby on the polypeptide chain, but can also be quite distant from each other in the linear sequence of the polypeptide; they may even be segments in different polypeptide chains. The R groups of adjacent amino acids protrude from the zigzag structure in opposite directions, creating the alternating pattern seen in the side views in Figure 4–7.

The adjacent polypeptide chains in a β sheet can be either parallel or antiparallel (having the same or opposite amino-to-carboxyl orientations, respectively). The structures are somewhat similar, although the repeat period is shorter for the parallel conformation (6.5 Å, versus 7 Å for antiparallel) and the hydrogen-bonding patterns are different.
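To see how much more extended the β conformation is than the α helix, the repeat periods above can be turned into an approximate length per residue. The sketch below assumes that one β repeat period spans two residues (one full zigzag), which is the usual reading of the pleated geometry but is not stated explicitly in the text; the helix figure comes from the 5.4 Å, 3.6-residue turn described earlier. All names are illustrative.

```python
# Approximate axial advance per residue, in angstroms.
# ASSUMPTION: each beta repeat period covers two residues.
RISE_PER_RESIDUE = {
    "alpha_helix": 5.4 / 3.6,         # one turn = 5.4 A, 3.6 residues
    "beta_parallel": 6.5 / 2.0,       # repeat period 6.5 A
    "beta_antiparallel": 7.0 / 2.0,   # repeat period 7 A
}


def segment_length(n_residues, conformation):
    """Approximate end-to-end length of a regular segment, in angstroms."""
    return n_residues * RISE_PER_RESIDUE[conformation]


n = 10
for name in RISE_PER_RESIDUE:
    print(f"{name:18s} {n} residues -> {segment_length(n, name):5.1f} A")
```

On these assumptions a 10-residue segment is about 15 Å long as an α helix but 32.5 to 35 Å long in the β conformation, a bit more than twice as extended.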
Some protein structures limit the kinds of amino acids that can occur in the β sheet. When two or more β sheets are layered close together within a protein, the R groups of the amino acid residues on the touching surfaces must be relatively small. β-Keratins such as silk fibroin and the fibroin of spider webs have a very high content of Gly and Ala residues, the two amino acids with the smallest R groups. Indeed, in silk fibroin Gly and Ala alternate over large parts of the sequence.

β Turns Are Common in Proteins

Protein Architecture—β Turn In globular proteins, which have a compact folded structure, nearly one-third of the amino acid residues are in turns or loops where the polypeptide chain reverses direction (Fig. 4–8). These are the connecting elements that link successive runs of α helix or β conformation. Particularly common are β turns that connect the ends of two adjacent segments of an antiparallel β sheet. The structure is a 180° turn involving four amino acid residues, with the carbonyl oxygen of the first residue forming a hydrogen bond with the amino-group hydrogen of the fourth. The peptide groups of the central two residues do not participate in any interresidue hydrogen bonding. Gly and Pro residues often occur in β turns, the former because it is small and flexible, the latter because peptide bonds involving the imino nitrogen of proline readily assume the cis configuration (Fig. 4–8b), a form that is particularly amenable to a tight turn. Of the several types of β turns, the two shown in Figure 4–8a are the most common. Beta turns are often found near the surface of a protein, where the peptide groups of the central two amino acid residues in the turn can hydrogen-bond with water. Considerably less common is the γ turn, a three-residue turn with a hydrogen bond between the first and third residues.
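The hydrogen-bonding patterns described so far differ mainly in how many residues separate the two partners: four in the α helix, three in the β turn, and two in the γ turn. The sketch below lists the bonded pairs for an idealized segment. The function and dictionary names are hypothetical, and for the γ turn the carbonyl-to-amide direction is an assumption, since the text says only that the first and third residues are bonded.

```python
# Offset from the residue supplying the C=O group to the residue
# supplying the N-H group (both counted from the amino terminus).
HBOND_OFFSET = {
    "alpha_helix": 4,   # N-H of residue i bonds to C=O of residue i - 4
    "beta_turn": 3,     # C=O of residue 1 bonds to N-H of residue 4
    "gamma_turn": 2,    # residues 1 and 3 (direction assumed here)
}


def hbond_pairs(n_residues, motif):
    """Return 1-based (carbonyl_residue, amide_residue) pairs for an
    idealized segment of n_residues in the given motif."""
    offset = HBOND_OFFSET[motif]
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


print(hbond_pairs(8, "alpha_helix"))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(hbond_pairs(4, "beta_turn"))    # [(1, 4)]
print(hbond_pairs(3, "gamma_turn"))   # [(1, 3)]
```

The α-helix listing also shows why residues near the ends of a helix are not fully hydrogen-bonded: the first four residues have no partner for their N–H groups and the last four have none for their C=O groups.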
[Figure 4–7 panels: (a) antiparallel, top view and side view; (b) parallel, top view and side view.]

FIGURE 4–7 The β conformation of polypeptide chains. These top and side views reveal the R groups extending out from the β sheet and emphasize the pleated shape described by the planes of the peptide bonds. (An alternative name for this structure is β-pleated sheet.) Hydrogen-bond cross-links between adjacent chains are also shown. (a) Antiparallel β sheet, in which the amino-terminal to carboxyl-terminal orientation of adjacent chains (arrows) is inverse. (b) Parallel β sheet.

[Figure 4–8 panels: (a) β turns, type I and type II, with residues numbered 1 to 4; (b) proline isomers, trans and cis.]

Common Secondary Structures Have Characteristic Bond Angles and Amino Acid Content

The α helix and the β conformation are the major repetitive secondary structures in a wide variety of proteins, although other repetitive structures do exist in some specialized proteins (an example is collagen; see Fig. 4–13 on page 128). Every type of secondary structure can be completely described by the bond angles φ and ψ at each residue. As shown by a Ramachandran plot, the α helix and β conformation fall within a relatively restricted range of sterically allowed structures (Fig. 4–9a). Most values of φ and ψ taken from known protein structures fall into the expected regions, with high concentrations near the α helix and β conformation values as predicted (Fig. 4–9b). The only amino acid residue often found in a conformation outside these regions is glycine. Because its side chain, a single hydrogen atom, is small, a Gly residue can take part in many conformations that are sterically forbidden for other amino acids.

Some amino acids are accommodated better than others in the different types of secondary structures.
An overall summary is presented in Figure 4–10. Some biases, such as the common presence of Pro and Gly residues in β turns and their relative absence in α helices, are readily explained by the known constraints on the different secondary structures. Other evident biases may be explained by taking into account the sizes or charges of side chains, but not all the trends shown in Figure 4–10 are understood.

SUMMARY 4.2 Protein Secondary Structure

■ Secondary structure is the regular arrangement of amino acid residues in a segment of a polypeptide chain, in which each residue is spatially related to its neighbors in the same way.
■ The most common secondary structures are the α helix, the β conformation, and β turns.
■ The secondary structure of a polypeptide segment can be completely defined if the φ and ψ angles are known for all amino acid residues in that segment.

FIGURE 4–8 Structures of β turns. (a) Type I and type II β turns are most common; type I turns occur more than twice as frequently as type II. Type II β turns always have Gly as the third residue. Note the hydrogen bond between the peptide groups of the first and fourth residues of the bends. (Individual amino acid residues are framed by large blue circles.) (b) The trans and cis isomers of a peptide bond involving the imino nitrogen of proline. Of the peptide bonds between amino acid residues other than Pro, over 99.95% are in the trans configuration. For peptide bonds involving the imino nitrogen of proline, however, about 6% are in the cis configuration; many of these occur at β turns.

4.3 Protein Tertiary and Quaternary Structures

The overall three-dimensional arrangement of all atoms in a protein is referred to as the protein's tertiary structure.
Whereas the term "secondary structure" refers to the spatial arrangement of amino acid residues that are adjacent in the primary structure, tertiary structure includes longer-range aspects of amino acid sequence. Amino acids that are far apart in the polypeptide sequence and that reside in different types of secondary structure may interact within the completely folded structure of a protein. The location of bends (including β turns) in the polypeptide chain and the direction and angle of these bends are determined by the number and location of specific bend-producing residues, such as Pro, Thr, Ser, and Gly. Interacting segments of polypeptide chains are held in their characteristic tertiary positions by different kinds of weak bonding interactions (and sometimes by covalent bonds such as disulfide cross-links) between the segments.

Some proteins contain two or more separate polypeptide chains, or subunits, which may be identical or different. The arrangement of these protein subunits in three-dimensional complexes constitutes quaternary structure.

In considering these higher levels of structure, it is useful to classify proteins into two major groups: fibrous proteins, having polypeptide chains arranged in long strands or sheets, and globular proteins, having polypeptide chains folded into a spherical or globular shape. The two groups are structurally distinct: fibrous proteins usually consist largely of a single type of secondary structure; globular proteins often contain several types of secondary structure. The two groups differ functionally in that the structures that provide support, shape, and external protection to vertebrates are made of fibrous proteins, whereas most enzymes and regulatory proteins are globular proteins. Certain fibrous proteins played a key role in the development of our modern understanding of protein structure and provide particularly clear examples of the relationship between structure and function.
We begin our discussion with fibrous proteins, before turning to the more complex folding patterns observed in globular proteins.

[Figure 4–9 panels (a) and (b): Ramachandran plots of ψ (degrees) against φ (degrees), each axis running from −180 to +180. Regions labeled in (a): antiparallel β sheets, collagen triple helix, right-twisted β sheets, parallel β sheets, left-handed α helix, right-handed α helix.]

FIGURE 4–9 Ramachandran plots for a variety of structures. (a) The values of φ and ψ for various allowed secondary structures are overlaid on the plot from Figure 4–3. Although left-handed α helices extending over several amino acid residues are theoretically possible, they have not been observed in proteins. (b) The values of φ and ψ for all the amino acid residues except Gly in the enzyme pyruvate kinase (isolated from rabbit) are overlaid on the plot of theoretically allowed conformations (Fig. 4–3). The small, flexible Gly residues were excluded because they frequently fall outside the expected ranges (blue).

FIGURE 4–10 Relative probabilities that a given amino acid will occur in the three common types of secondary structure.

[Figure 4–10 layout: columns for α helix, β conformation, and β turn; rows for Glu, Met, Ala, Leu, Lys, Phe, Gln, Trp, Ile, Val, Asp, His, Arg, Thr, Ser, Cys, Tyr, Asn, Pro, Gly.]

Fibrous Proteins Are Adapted for a Structural Function

α-Keratin, collagen, and silk fibroin nicely illustrate the relationship between protein structure and biological function (Table 4–1). Fibrous proteins share properties that give strength and/or flexibility to the structures in which they occur. In each case, the fundamental structural unit is a simple repeating element of secondary structure.
All fibrous proteins are insoluble in water, a property conferred by a high concentration of hydrophobic amino acid residues both in the interior of the protein and on its surface. These hydrophobic surfaces are largely buried by packing many similar polypeptide chains together to form elaborate supramolecular complexes. The underlying structural simplicity of fibrous proteins makes them particularly useful for illustrating some of the fundamental principles of protein structure discussed above.

α-Keratin. The α-keratins have evolved for strength. Found in mammals, these proteins constitute almost the entire dry weight of hair, wool, nails, claws, quills, horns, hooves, and much of the outer layer of skin. The α-keratins are part of a broader family of proteins called intermediate filament (IF) proteins. Other IF proteins are found in the cytoskeletons of animal cells. All IF proteins have a structural function and share structural features exemplified by the α-keratins.

The α-keratin helix is a right-handed α helix, the same helix found in many other proteins. Francis Crick and Linus Pauling in the early 1950s independently suggested that the α helices of keratin were arranged as a coiled coil. Two strands of α-keratin, oriented in parallel (with their amino termini at the same end), are wrapped about each other to form a supertwisted coiled coil. The supertwisting amplifies the strength of the overall structure, just as strands are twisted to make a strong rope (Fig. 4–11). The twisting of the axis of an α helix to form a coiled coil explains the discrepancy between the 5.4 Å per turn predicted for an α helix by Pauling and Corey and the 5.15 to 5.2 Å repeating structure observed in the x-ray diffraction of hair (p. 120). The helical path of the supertwists is left-handed, opposite in sense to the α helix.
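The discrepancy between the predicted and observed repeats can be put in rough numbers. The sketch below assumes, as a simplification introduced here rather than a statement from the text, that the repeat seen by x-ray diffraction is the 5.4 Å pitch of a straight α helix projected onto the fiber axis, so that observed = pitch × cos(tilt). The function name tilt_angle_deg is illustrative only.

```python
import math

# Values quoted in the text
PITCH_STRAIGHT_HELIX = 5.4        # Å per turn, Pauling and Corey prediction
OBSERVED_REPEATS = (5.15, 5.2)    # Å, repeat seen in x-ray diffraction of hair


def tilt_angle_deg(observed, pitch=PITCH_STRAIGHT_HELIX):
    """Tilt of the helix axis relative to the fiber axis, in degrees.

    Assumption (not from the text): the observed repeat is the helix
    pitch projected onto the fiber axis, observed = pitch * cos(tilt).
    """
    return math.degrees(math.acos(observed / pitch))


if __name__ == "__main__":
    for repeat in OBSERVED_REPEATS:
        angle = tilt_angle_deg(repeat)
        print(f"{repeat:.2f} Å repeat -> tilt of about {angle:.1f} degrees")
```

Under this assumption, a tilt somewhere between about 15 and 18 degrees would be enough to shorten the apparent repeat from 5.4 Å to the observed range, which is the kind of gentle inclination a supertwisted coiled coil would produce.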
The surfaces where the two α helices touch are made up of hydrophobic amino acid residues, their R groups meshed together in a regular interlocking pattern. This permits a close packing of the polypeptide chains within the left-handed supertwist. Not surprisingly, α-keratin is rich in the hydrophobic residues Ala, Val, Leu, Ile, Met, and Phe.

An individual polypeptide in the α-keratin coiled coil has a relatively simple tertiary structure, dominated by an α-helical secondary structure with its helical axis twisted in a left-handed superhelix. The intertwining of the two α-helical polypeptides is an example of quaternary structure. Coiled coils of this type are common structural elements in filamentous proteins and in the muscle protein myosin (see Fig. 5–29). The quaternary structure of α-keratin can be quite complex. Many coiled coils can be assembled into large supramolecular complexes, such as the arrangement of α-keratin to form the intermediate filament of hair (Fig. 4–11b).

[Figure 4–11b labels: cells, intermediate filament, protofibril, cross section of a hair, protofilament, two-chain coiled coil, α helix.]

FIGURE 4–11 Structure of hair. (a) Hair α-keratin is an elongated α helix with somewhat thicker elements near the amino and carboxyl termini. Pairs of these helices are interwound in a left-handed sense to form two-chain coiled coils. These then combine in higher-order structures called protofilaments and protofibrils. About four protofibrils (32 strands of α-keratin altogether) combine to form an intermediate filament. The individual two-chain coiled coils in the various substructures also appear to be interwound, but the handedness of the interwinding and other structural details are unknown. (b) A hair is an array of many α-keratin filaments, made up of the substructures shown in (a).

[Figure 4–11a labels: protofibril, protofilament, two-chain coiled coil, 20–30 Å.]
[Figure 4–11a label: keratin α helix.]

The strength of fibrous proteins is enhanced by covalent cross-links between polypeptide chains within the multihelical "ropes" and between adjacent chains in a supramolecular assembly. In α-keratins, the cross-links stabilizing quaternary structure are disulfide bonds (Box 4–2). In the hardest and toughest α-keratins, such as those of rhinoceros horn, up to 18% of the residues are cysteines involved in disulfide bonds.

Collagen. Like the α-keratins, collagen has evolved to provide strength. It is found in connective tissue such as tendons, cartilage, the organic matrix of bone, and the cornea of the eye. The collagen helix is a unique secondary structure quite distinct from the α helix. It is left-handed and has three amino acid residues per turn (Fig. 4–12). Collagen is also a coiled coil, but one with distinct tertiary and quaternary structures: three separate polypeptides, called α chains (not to be confused with α helices), are supertwisted about each other (Fig. 4–12c). The superhelical twisting is right-handed in collagen, opposite in sense to the left-handed helix of the α chains.

There are many types of vertebrate collagen. Typically they contain about 35% Gly, 11% Ala, and 21% Pro and 4-Hyp (4-hydroxyproline, an uncommon amino acid; see Fig. 3–8a). The food product gelatin is derived
The food product gelatin is derived 4.3 Protein Tertiary and Quaternary Structures 127 TABLE 4–1 Secondary Structures and Properties of Fibrous Proteins Structure Characteristics Examples of occurrence H9251 Helix, cross-linked by disulfide Tough, insoluble protective structures of H9251-Keratin of hair, feathers, and nails bonds varying hardness and flexibility H9252 Conformation Soft, flexible filaments Silk fibroin Collagen triple helix High tensile strength, without stretch Collagen of tendons, bone matrix BOX 4–2 THE WORLD OF BIOCHEMISTRY Permanent Waving Is Biochemical Engineering When hair is exposed to moist heat, it can be stretched. At the molecular level, the H9251 helices in the H9251-keratin of hair are stretched out until they arrive at the fully extended H9252 conformation. On cooling they spontaneously revert to the H9251-helical conformation. The characteristic “stretchability” of H9251-keratins, and their numerous disulfide cross-linkages, are the basis of permanent waving. The hair to be waved or curled is first bent around a form of appropriate shape. A so- lution of a reducing agent, usually a compound con- taining a thiol or sulfhydryl group (OSH), is then ap- plied with heat. The reducing agent cleaves the cross-linkages by reducing each disulfide bond to form two Cys residues. The moist heat breaks hydrogen bonds and causes the H9251-helical structure of the polypeptide chains to uncoil. After a time the reduc- ing solution is removed, and an oxidizing agent is added to establish new disulfide bonds between pairs of Cys residues of adjacent polypeptide chains, but not the same pairs as before the treatment. After the hair is washed and cooled, the polypeptide chains revert to their H9251-helical conformation. The hair fibers now curl in the desired fashion because the new disulfide cross-linkages exert some torsion or twist on the bun- dles of H9251-helical coils in the hair fibers. 
A permanent wave is not truly permanent, because the hair grows; in the new hair replacing the old, the α-keratin has the natural, nonwavy pattern of disulfide bonds.

[Box 4–2 figure: disulfide (S–S) cross-links between chains are reduced to free SH groups, the hair is curled, and new S–S cross-links form on oxidation. Step labels: reduce, curl, oxidize.]

[Figure 4–13 labels: heads of collagen molecules; section of collagen molecule; cross-striations 640 Å (64 nm); scale 250 nm.]

FIGURE 4–13 Structure of collagen fibrils. Collagen (Mr 300,000) is a rod-shaped molecule, about 3,000 Å long and only 15 Å thick. Its three helically intertwined α chains may have different sequences, but each has about 1,000 amino acid residues. Collagen fibrils are made up of collagen molecules aligned in a staggered fashion and cross-linked for strength. The specific alignment and degree of cross-linking vary with the tissue and produce characteristic cross-striations in an electron micrograph. In the example shown here, alignment of the head groups of every fourth molecule produces striations 640 Å apart.

from collagen; it has little nutritional value as a protein, because collagen is extremely low in many amino acids that are essential in the human diet. The unusual amino acid content of collagen is related to structural constraints unique to the collagen helix. The amino acid sequence in collagen is generally a repeating tripeptide unit, Gly–X–Y, where X is often Pro, and Y is often 4-Hyp. Only Gly residues can be accommodated at the very tight junctions between the individual α chains (Fig. 4–12d); the Pro and 4-Hyp residues permit the sharp twisting of the collagen helix. The amino acid sequence and the supertwisted quaternary structure of collagen allow a very close packing of its three polypeptides. 4-Hydroxyproline has a special role in the structure of collagen, and in human history (Box 4–3).
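The figures quoted for collagen in the Figure 4–13 caption can be cross-checked with simple arithmetic. The sketch below is only a consistency check on those quoted values; the function and key names are illustrative, and the calculation ignores any shortening of the chains caused by the supertwist.

```python
# Values quoted in the Figure 4-13 caption
MOLECULE_LENGTH_A = 3000.0      # Å, length of one collagen molecule
MOLECULE_MR = 300_000           # relative molecular mass of the triple helix
CHAINS = 3                      # α chains per molecule
RESIDUES_PER_CHAIN = 1000       # approximate residues per α chain
STRIATION_SPACING_A = 640.0     # Å, spacing of cross-striations


def collagen_estimates():
    """Return rough per-residue figures implied by the caption values."""
    return {
        # axial length contributed by each residue along the rod
        "rise_per_residue_A": MOLECULE_LENGTH_A / RESIDUES_PER_CHAIN,
        # mean residue mass across all three chains
        "mass_per_residue": MOLECULE_MR / (CHAINS * RESIDUES_PER_CHAIN),
        # striation spacing converted from Å to nm (10 Å = 1 nm)
        "striation_nm": STRIATION_SPACING_A / 10.0,
    }


if __name__ == "__main__":
    for name, value in collagen_estimates().items():
        print(f"{name}: {value:g}")
```

The implied rise of about 3 Å per residue is far larger than the 1.5 Å per residue of an α helix, in keeping with the extended character of the collagen helix. The implied mean residue mass of about 100 is lower than the value of roughly 110 commonly used for average proteins, which is plausible given collagen's high Gly and Ala content.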
FIGURE 4–12 Structure of collagen. (Derived from PDB ID 1CGD.) (a) The α chain of collagen has a repeating secondary structure unique to this protein. The repeating tripeptide sequence Gly–X–Pro or Gly–X–4-Hyp adopts a left-handed helical structure with three residues per turn. The repeating sequence used to generate this model is Gly–Pro–4-Hyp. (b) Space-filling model of the same α chain. (c) Three of these helices (shown here in gray, blue, and purple) wrap around one another with a right-handed twist. (d) The three-stranded collagen superhelix shown from one end, in a ball-and-stick representation. Gly residues are shown in red. Glycine, because of its small size, is required at the tight junction where the three chains are in contact. The balls in this illustration do not represent the van der Waals radii of the individual atoms. The center of the three-stranded superhelix is not hollow, as it appears here, but is very tightly packed.

[Structure drawing: dehydrohydroxylysinonorleucine, a cross-link joining two polypeptide chains, formed from a Lys residue minus its ε-amino group (norleucine) and a HyLys residue.]

The tight wrapping of the α chains in the collagen triple helix provides tensile strength greater than that of a steel wire of equal cross section. Collagen fibrils (Fig. 4–13) are supramolecular assemblies consisting of triple-helical collagen molecules (sometimes referred to as tropocollagen molecules) associated in a variety of ways to provide different degrees of tensile strength. The α chains of collagen molecules and the collagen molecules of fibrils are cross-linked by unusual types of covalent bonds involving Lys, HyLys (5-hydroxylysine; see Fig. 3–8a), or His residues that are present at a few of the X and Y positions in collagens. These links create uncommon amino acid residues such as dehydrohydroxylysinonorleucine.
The increasingly rigid and brittle character of aging connective tissue results from accumulated covalent cross-links in collagen fibrils.

A typical mammal has more than 30 structural variants of collagen, particular to certain tissues and each somewhat different in sequence and function. Some human genetic defects in collagen structure illustrate the close relationship between amino acid sequence and three-dimensional structure in this protein. Osteogenesis imperfecta is characterized by abnormal bone formation in babies; Ehlers-Danlos syndrome is characterized by loose joints. Both conditions can be lethal, and both result from the substitution of an amino acid residue with a larger R group (such as Cys or Ser) for a single Gly residue in each α chain (a different Gly residue in each disorder). These single-residue substitutions have a catastrophic effect on collagen function because they disrupt the Gly–X–Y repeat that gives collagen its unique helical structure. Given its role in the collagen triple helix (Fig. 4–12d), Gly cannot be replaced by another amino acid residue without substantial deleterious effects on collagen structure.

Silk Fibroin. Fibroin, the protein of silk, is produced by insects and spiders. Its polypeptide chains are predominantly in the β conformation. Fibroin is rich in Ala and Gly residues, permitting a close packing of β sheets and an interlocking arrangement of R groups (Fig. 4–14). The overall structure is stabilized by extensive hydrogen bonding between all peptide linkages in the polypeptides of each β sheet and by the optimization of van der Waals interactions between sheets. Silk does not stretch, because the β conformation is already highly extended (Fig. 4–7; see also Fig. 4–15).
However, the structure is flexible because the sheets are held together by numerous weak interactions rather than by covalent bonds such as the disulfide bonds in α-keratins.

Structural Diversity Reflects Functional Diversity in Globular Proteins

In a globular protein, different segments of a polypeptide chain (or multiple polypeptide chains) fold back on each other. As illustrated in Figure 4–15, this folding generates a compact form relative to polypeptides in a fully extended conformation. The folding also provides the structural diversity necessary for proteins to carry out a wide array of biological functions. Globular proteins include enzymes, transport proteins, motor proteins, regulatory proteins, immunoglobulins, and proteins with many other functions.

As a new millennium begins, the number of known three-dimensional protein structures is in the thousands and more than doubles every two years. This wealth of structural information is revolutionizing our understanding of protein structure, the relation of structure

[Figure 4–14 labels: (a) Ala side chain, Gly side chain, 3.5 Å, 5.7 Å; (b) scale 70 μm.]

FIGURE 4–14 Structure of silk. The fibers used to make silk cloth or a spider web are made up of the protein fibroin. (a) Fibroin consists of layers of antiparallel β sheets rich in Ala (purple) and Gly (yellow) residues. The small side chains interdigitate and allow close packing of each layered sheet, as shown in this side view. (b) Strands of fibroin (blue) emerge from the spinnerets of a spider in this colorized electron micrograph.

FIGURE 4–15 Globular protein structures are compact and varied. Human serum albumin (Mr 64,500) has 585 residues in a single chain. Given here are the approximate dimensions its single polypeptide chain would have if it occurred entirely in extended β conformation or as an α helix.
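The approximate lengths given in Figure 4–15 for the fully extended and fully helical forms of serum albumin (2,000 Å and 900 Å) can be reproduced from per-residue rises. The sketch below assumes an α helix with 3.6 residues per 5.4 Å turn and a β strand in which the 7 Å antiparallel repeat spans two residues; the 3.6 and the two-residue repeat are standard values supplied here as assumptions, and the function names are illustrative.

```python
# Values quoted in the Figure 4-15 caption
RESIDUES = 585
MR = 64_500

# Per-residue rise along the chain axis, in Å.
# Assumptions: 3.6 residues per 5.4 Å turn of α helix, and a 7 Å
# antiparallel β repeat that spans two residues.
RISE_ALPHA = 5.4 / 3.6    # about 1.5 Å per residue
RISE_BETA = 7.0 / 2.0     # 3.5 Å per residue


def extended_lengths(n_residues=RESIDUES):
    """Return (α-helix length, β-strand length) in Å for a single chain."""
    return n_residues * RISE_ALPHA, n_residues * RISE_BETA


def mean_residue_mass(mr=MR, n_residues=RESIDUES):
    """Average mass per residue implied by the quoted Mr."""
    return mr / n_residues


if __name__ == "__main__":
    helix, strand = extended_lengths()
    print(f"as a single α helix: about {helix:.0f} Å")
    print(f"as a single β strand: about {strand:.0f} Å")
    print(f"mean residue mass: about {mean_residue_mass():.0f}")
```

Both estimates round to the lengths shown in the figure, and either is many times the longest dimension of the native protein, which is the point of the comparison.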
Also shown is the size of the protein in its native globular form, as determined by x-ray crystallography; the polypeptide chain must be very compactly folded to fit into these dimensions.

[Figure 4–15 labels: β conformation, 2,000 × 5 Å; α helix, 900 × 11 Å; native globular form, 100 × 60 Å.]

BOX 4–3 BIOCHEMISTRY IN MEDICINE
Why Sailors, Explorers, and College Students Should Eat Their Fresh Fruits and Vegetables

. . . from this misfortune, together with the unhealthiness of the country, where there never falls a drop of rain, we were stricken with the "camp-sickness," which was such that the flesh of our limbs all shrivelled up, and the skin of our legs became all blotched with black, mouldy patches, like an old jack-boot, and proud flesh came upon the gums of those of us who had the sickness, and none escaped from this sickness save through the jaws of death. The signal was this: when the nose began to bleed, then death was at hand . . .
(from The Memoirs of the Lord of Joinville, ca. 1300)

This excerpt describes the plight of Louis IX's army toward the end of the Seventh Crusade (1248–1254), immediately preceding the battle of Fariskur, where the scurvy-weakened Crusader army was destroyed by the Egyptians. What was the nature of the malady afflicting these thirteenth-century soldiers?

Scurvy is caused by lack of vitamin C, or ascorbic acid (ascorbate). Vitamin C is required for, among other things, the hydroxylation of proline and lysine in collagen; scurvy is a deficiency disease characterized by general degeneration of connective tissue. Manifestations of advanced scurvy include numerous small hemorrhages caused by fragile blood vessels, tooth loss, poor wound healing and the reopening of old wounds, bone pain and degeneration, and eventually heart failure. Despondency and oversensitivity to stimuli of many kinds are also observed.
Milder cases of vitamin C deficiency are accompanied by fatigue, irritability, and an increased severity of respiratory tract infections. Most animals make large amounts of vitamin C, converting glucose to ascorbate in four enzymatic steps. But in the course of evolution, humans and some other animals (gorillas, guinea pigs, and fruit bats) have lost the last enzyme in this pathway and must obtain ascorbate in their diet. Vitamin C is available in a wide range of fruits and vegetables. Until 1800, however, it was often absent in the dried foods and other food supplies stored for winter or for extended travel.

Scurvy was recorded by the Egyptians in 1500 BCE, and it is described in the fifth century BCE writings of Hippocrates. Although scurvy played a critical role in medieval wars and made regular winter appearances in northern climates, it did not come to wide public notice until the European voyages of discovery from 1500 to 1800. The first circumnavigation of the globe, led by Ferdinand Magellan (1520), was accomplished only with the loss of more than 80% of his crew to scurvy. Vasco da Gama lost two-thirds of his crew to the same fate during his first exploration of trade routes to India (1499). During Jacques Cartier's second voyage to explore the St. Lawrence River (1535–1536), his band suffered numerous fatalities and was threatened with complete disaster until the native Americans taught the men to make a cedar tea that cured and prevented scurvy (it contained vitamin C) (Fig. 1). It is estimated that a million sailors died of scurvy in the years 1600 to 1800. Winter outbreaks of scurvy in Europe were gradually eliminated in the nineteenth century as the cultivation of the potato, introduced from South America, became widespread.

In 1747, James Lind, a Scottish surgeon in the Royal Navy (Fig. 2), carried out the first controlled clinical study in recorded history.
During an extended voyage on the 50-gun warship HMS Salisbury, Lind selected 12 sailors suffering from scurvy and separated them into groups of two. All 12 received the same diet, except that each group was given a different remedy for scurvy from among those recommended at the time. The sailors given lemons and oranges recovered and returned to duty. The sailors given boiled apple juice improved slightly. The remainder continued to deteriorate. Lind's Treatise on the Scurvy was published in 1753, but inaction persisted in the Royal Navy for another 40 years.

FIGURE 1 Iroquois showing Jacques Cartier how to make cedar tea as a remedy for scurvy.

FIGURE 2 James Lind, 1716–1794; naval surgeon, 1739–1748.

In 1795 the British admiralty finally mandated a ration of concentrated lime or lemon juice for all British sailors (hence the name "limeys"). Scurvy continued to be a problem in some other parts of the world until 1932, when Hungarian scientist Albert Szent-Györgyi, and W. A. Waugh and C. G. King at the University of Pittsburgh, isolated and synthesized ascorbic acid.

L-Ascorbic acid (vitamin C) is a white, odorless, crystalline powder. It is freely soluble in water and relatively insoluble in organic solvents. In a dry state, away from light, it is stable for a considerable length of time. The appropriate daily intake of this vitamin is still in dispute. The recommended daily allowance in the United States is 60 mg (Australia and the United Kingdom recommend 30 to 40 mg; Russia recommends 100 mg). Higher doses of vitamin C are sometimes recommended, although the benefit of such a regimen is disputed. Notably, animals that synthesize their own vitamin C maintain levels found in humans only if they consume hundreds of times the recommended daily allowance.
Along with citrus fruits and almost all other fresh fruits, other good sources of vitamin C include peppers, tomatoes, potatoes, and broccoli. The vitamin C of fruits and vegetables is destroyed by overcooking or prolonged storage.

So why is ascorbate so necessary to good health? Of particular interest to us here is its role in the formation of collagen. The proline derivative 4(R)-L-hydroxyproline (4-Hyp) plays an essential role in the folding of collagen and in maintaining its structure. As noted in the text, collagen is constructed of the repeating tripeptide unit Gly–X–Y, where X and Y are generally Pro or 4-Hyp. A constructed peptide with 10 Gly–Pro–Pro repeats will fold to form a collagen triple helix, but the structure melts at 41 °C. If the 10 repeats are changed to Gly–Pro–4-Hyp, the melting temperature jumps to 69 °C. The stability of collagen arises from the detailed structure of the collagen helix, determined independently by Helen Berman and Adriana Zagari and their colleagues. The proline ring is normally found as a mixture of two puckered conformations, called Cγ-endo and Cγ-exo (Fig. 3). The collagen helix structure requires the Pro residue in the Y positions to be in the Cγ-exo conformation, and it is this conformation that is enforced by the hydroxyl substitution at C-4 in 4-hydroxyproline. However, the collagen structure requires the Pro residue in the X positions to have the Cγ-endo conformation, and introduction of 4-Hyp here can destabilize the helix. The inability to hydroxylate the Pro at the Y positions when vitamin C is absent leads to collagen instability and the connective tissue problems seen in scurvy.

The hydroxylation of specific Pro residues in procollagen, the precursor of collagen, requires the action of the enzyme prolyl 4-hydroxylase. This enzyme (Mr 240,000) is an α2β2 tetramer in all vertebrate sources.
The proline-hydroxylating activity is found in the α subunits. (Researchers were surprised to find that the β subunits are identical to the enzyme protein disulfide isomerase (PDI; p. 152); these subunits do not participate in the prolyl hydroxylation activity.) Each α subunit contains one atom of nonheme iron (Fe2+), and the enzyme is one of a class of hydroxylases that require α-ketoglutarate in their reactions.

In the normal prolyl 4-hydroxylase reaction (Fig. 4a), one molecule of α-ketoglutarate and one of O2 bind to the enzyme. The α-ketoglutarate is oxidatively decarboxylated to form CO2 and succinate. The remaining oxygen atom is then used to hydroxylate an appropriate Pro residue in procollagen. No ascorbate is needed in this reaction. However, prolyl 4-hydroxylase also catalyzes an oxidative decarboxylation of α-ketoglutarate that is not coupled to proline hydroxylation, and this is the reaction that requires ascorbate (Fig. 4b). During this reaction, the enzyme-bound Fe2+ becomes oxidized, and the oxidized form of the enzyme is inactive, unable to hydroxylate proline. The ascorbate consumed in the reaction presumably functions to reduce the iron and restore enzyme activity.

But there is more to the vitamin C story than proline hydroxylation. Very similar hydroxylation reactions generate the less abundant 3-hydroxyproline and 5-hydroxylysine residues that also occur in collagen. The enzymes that catalyze these reactions are members of the same α-ketoglutarate-dependent dioxygenase family, and for all these enzymes ascorbate plays the same role. These dioxygenases are just a few of the dozens of closely related enzymes that play a variety of metabolic roles in different classes of organisms. Ascorbate serves other roles too. It is an antioxidant, reacting enzymatically and nonenzymatically with reactive oxygen species, which in mammals play an important role in aging and cancer.
FIGURE 3 The Cγ-endo conformation of proline and the Cγ-exo conformation of 4-hydroxyproline.

to function, and even the evolutionary paths by which proteins arrived at their present state, which can be glimpsed in the family resemblances that are revealed as protein databases are sifted and sorted. The sheer variety of structures can seem daunting. Yet as new protein structures become available it is becoming increasingly clear that they are manifestations of a finite set of recognizable, stable folding patterns.

Our discussion of globular protein structure begins with the principles gleaned from the earliest protein structures to be elucidated. This is followed by a detailed description of protein substructure and comparative categorization. Such discussions are possible only because of the vast amount of information available over the Internet from resources such as the Protein Data Bank (PDB; www.rcsb.org/pdb), an archive of experimentally determined three-dimensional structures of biological macromolecules.

Myoglobin Provided Early Clues about the Complexity of Globular Protein Structure

Protein Architecture: Tertiary Structure of Small Globular Proteins, II. Myoglobin

The first breakthrough in understanding the three-dimensional structure of a globular protein came from x-ray diffraction studies of myoglobin carried out by John Kendrew and his colleagues in the 1950s. Myoglobin is a relatively small (Mr 16,700), oxygen-binding protein of muscle cells. It functions both to store oxygen and to facilitate oxygen diffusion in rapidly contracting muscle tissue. Myoglobin contains a single polypeptide chain of 153 amino acid residues of known sequence and a single iron protoporphyrin, or heme, group.
The same heme group is found in hemoglobin, the oxygen-binding protein of erythrocytes, and is responsible for the deep red-brown color of both myoglobin and hemoglobin. Myoglobin is particularly abundant in the muscles of diving mammals such as the whale, seal, and porpoise, whose muscles are so rich in this protein that they are brown. Storage and distribution of oxygen by muscle myoglobin permit these animals to remain submerged for long periods of time. The activities of myoglobin and other globin molecules are investigated in greater detail in Chapter 5.

BOX 4–3 BIOCHEMISTRY IN MEDICINE (continued)

In plants, ascorbate is required as a substrate for the enzyme ascorbate peroxidase, which converts H₂O₂ to water. The peroxide is generated from the O₂ produced in photosynthesis, an unavoidable consequence of generating O₂ in a compartment laden with powerful oxidation-reduction systems (Chapter 19). Ascorbate is also a precursor of oxalate and tartrate in plants, and is involved in the hydroxylation of Pro residues in cell wall proteins called extensins. Ascorbate is found in all subcellular compartments of plants, at concentrations of 2 to 25 mM, which is why plants are such good sources of vitamin C.

Scurvy remains a problem today. The malady is still encountered not only in remote regions where nutritious food is scarce but, surprisingly, on U.S. college campuses. The only vegetables consumed by some students are those in tossed salads, and days go by without these young adults consuming fruit. A 1998 study of 230 students at Arizona State University revealed that 10% had serious vitamin C deficiencies, and 2 students had vitamin C levels so low that they probably had scurvy. Only half the students in the study consumed the recommended daily allowance of vitamin C. Eat your fresh fruit and vegetables.

FIGURE 4 The reactions catalyzed by prolyl 4-hydroxylase. (a) The normal reaction, coupled to proline hydroxylation, which does not require ascorbate. The fate of the two oxygen atoms from O₂ is shown in red. (b) The uncoupled reaction, in which α-ketoglutarate is oxidatively decarboxylated without hydroxylation of proline. Ascorbate is consumed stoichiometrically in this process as it is converted to dehydroascorbate.
[Reaction scheme not reproduced. Labels: (a) Pro residue + α-ketoglutarate + O₂ → 4-Hyp residue + succinate + CO₂, with Fe²⁺; (b) α-ketoglutarate + O₂ + ascorbate → succinate + CO₂ + dehydroascorbate, with Fe²⁺.]

Figure 4–16 shows several structural representations of myoglobin, illustrating how the polypeptide chain is folded in three dimensions (its tertiary structure). The red group surrounded by protein is heme. The backbone of the myoglobin molecule is made up of eight relatively straight segments of α helix interrupted by bends, some of which are β turns. The longest α helix has 23 amino acid residues and the shortest only 7; all helices are right-handed. More than 70% of the residues in myoglobin are in these α-helical regions. X-ray analysis has revealed the precise position of each of the R groups, which occupy nearly all the space within the folded chain.

Many important conclusions were drawn from the structure of myoglobin. The positioning of amino acid side chains reflects a structure that derives much of its stability from hydrophobic interactions. Most of the hydrophobic R groups are in the interior of the myoglobin molecule, hidden from exposure to water.
All but two of the polar R groups are located on the outer surface of the molecule, and all are hydrated. The myoglobin molecule is so compact that its interior has room for only four molecules of water. This dense hydrophobic core is typical of globular proteins. The fraction of space occupied by atoms in an organic liquid is 0.4 to 0.6; in a typical crystal the fraction is 0.70 to 0.78, near the theoretical maximum. In a globular protein the fraction is about 0.75, comparable to that in a crystal. In this packed environment, weak interactions strengthen and reinforce each other. For example, the nonpolar side chains in the core are so close together that short-range van der Waals interactions make a significant contribution to stabilizing hydrophobic interactions.

FIGURE 4–16 Tertiary structure of sperm whale myoglobin. (PDB ID 1MBO) The orientation of the protein is similar in all panels; the heme group is shown in red. In addition to illustrating the myoglobin structure, this figure provides examples of several different ways to display protein structure. (a) The polypeptide backbone, shown in a ribbon representation of a type introduced by Jane Richardson, which highlights regions of secondary structure. The α-helical regions are evident. (b) A “mesh” image emphasizes the protein surface. (c) A surface contour image is useful for visualizing pockets in the protein where other molecules might bind. (d) A ribbon representation, including side chains (blue) for the hydrophobic residues Leu, Ile, Val, and Phe. (e) A space-filling model with all amino acid side chains. Each atom is represented by a sphere encompassing its van der Waals radius. The hydrophobic residues are again shown in blue; most are not visible, because they are buried in the interior of the protein.
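The "theoretical maximum" packing fraction mentioned in the text above can be made concrete for one idealized case. The derivation below is a sketch that assumes identical hard spheres in a face-centered cubic arrangement (the symbols a, r, and η are introduced here for illustration); real molecular crystals and protein interiors contain atoms of different sizes joined by covalent bonds, so their packing fractions can differ somewhat from this value.

```latex
% Face-centered cubic unit cell: edge length a, 4 spheres of radius r per cell.
% Spheres touch along the face diagonal, so 4r = a\sqrt{2}, i.e. a = 2\sqrt{2}\,r.
\[
\eta_{\mathrm{fcc}}
  = \frac{4 \cdot \tfrac{4}{3}\pi r^{3}}{a^{3}}
  = \frac{\tfrac{16}{3}\pi r^{3}}{\left(2\sqrt{2}\,r\right)^{3}}
  = \frac{\tfrac{16}{3}\pi r^{3}}{16\sqrt{2}\,r^{3}}
  = \frac{\pi}{3\sqrt{2}}
  \approx 0.74
\]
```

A value of about 0.74 for close-packed equal spheres sits within the 0.70 to 0.78 range quoted for typical crystals and close to the value of about 0.75 quoted for globular proteins.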
Deduction of the structure of myoglobin confirmed some expectations and introduced some new elements of secondary structure. As predicted by Pauling and Corey, all the peptide bonds are in the planar trans configuration. The α helices in myoglobin provided the first direct experimental evidence for the existence of this type of secondary structure. Three of the four Pro residues of myoglobin are found at bends (recall that proline, with its fixed φ bond angle and lack of a peptide-bond N–H group for participation in hydrogen bonds, is largely incompatible with α-helical structure). The fourth Pro residue occurs within an α helix, where it creates a kink necessary for tight helix packing. Other bends contain Ser, Thr, and Asn residues, which are among the amino acids whose bulk and shape tend to make them incompatible with α-helical structure if they are in close proximity in the amino acid sequence (p. 121).

The flat heme group rests in a crevice, or pocket, in the myoglobin molecule. The iron atom in the center of the heme group has two bonding (coordination) positions perpendicular to the plane of the heme (Fig. 4–17). One of these is bound to the R group of the His residue at position 93; the other is the site at which an O₂ molecule binds. Within this pocket, the accessibility of the heme group to solvent is highly restricted. This is important for function, because free heme groups in an oxygenated solution are rapidly oxidized from the ferrous (Fe²⁺) form, which is active in the reversible binding of O₂, to the ferric (Fe³⁺) form, which does not bind O₂.

Knowledge of the structure of myoglobin allowed researchers for the first time to understand in detail the correlation between the structure and function of a protein. Many different myoglobin structures have been elucidated, allowing investigators to see how the structure changes when oxygen or other molecules bind to it. Hundreds of proteins have been subjected to similar analysis since then. Today, techniques such as NMR spectroscopy supplement x-ray diffraction data, providing more information on a protein’s structure (Box 4–4). The ongoing sequencing of genomic DNA from many organisms (Chapter 9) has identified thousands of genes that encode proteins of known sequence but unknown function. Our first insight into what these proteins do often comes from our still-limited understanding of how primary structure determines tertiary structure, and how tertiary structure determines function.

Globular Proteins Have a Variety of Tertiary Structures

With elucidation of the tertiary structures of hundreds of other globular proteins by x-ray analysis, it became clear that myoglobin illustrates only one of many ways in which a polypeptide chain can be folded. In Figure 4–18 the structures of cytochrome c, lysozyme, and ribonuclease are compared. These proteins have different amino acid sequences and different tertiary structures, reflecting differences in function. All are relatively small and easy to work with, facilitating structural analysis.

Cytochrome c is a component of the respiratory chain of mitochondria (Chapter 19). Like myoglobin, cytochrome c is a heme protein. It contains a single polypeptide chain of about 100 residues (Mr 12,400) and a single heme group. In this case, the protoporphyrin of the heme group is covalently attached to the polypeptide. Only about 40% of the polypeptide is in α-helical segments, compared with 70% of the myoglobin chain. The rest of the cytochrome c chain contains β turns and irregularly coiled and extended segments.
Lysozyme (Mr 14,600) is an enzyme abundant in egg white and human tears that catalyzes the hydrolytic cleavage of polysaccharides in the protective cell walls of some families of bacteria. Lysozyme, because it can lyse, or degrade, bacterial cell walls, serves as a bactericidal agent. As in cytochrome c, about 40% of its 129 amino acid residues are in α-helical segments, but the arrangement is different and some β-sheet structure is also present (Fig. 4–18). Four disulfide bonds contribute stability to this structure. The α helices line a long crevice in the side of the molecule, called the active site, which is the site of substrate binding and catalysis. The bacterial polysaccharide that is the substrate for lysozyme fits into this crevice.

Protein Architecture: Tertiary Structure of Small Globular Proteins, III. Lysozyme

FIGURE 4–17 The heme group. This group is present in myoglobin, hemoglobin, cytochromes, and many other heme proteins. (a) Heme consists of a complex organic ring structure, protoporphyrin, to which is bound an iron atom in its ferrous (Fe²⁺) state. The iron atom has six coordination bonds, four in the plane of, and bonded to, the flat porphyrin molecule and two perpendicular to it. (b) In myoglobin and hemoglobin, one of the perpendicular coordination bonds is bound to a nitrogen atom of a His residue. The other is “open” and serves as the binding site for an O₂ molecule.

Ribonuclease, another small globular protein (Mr 13,700), is an enzyme secreted by the pancreas into the small intestine, where it catalyzes the hydrolysis of certain bonds in the ribonucleic acids present in ingested food.
Its tertiary structure, determined by x-ray analysis, shows that little of its 124 amino acid polypeptide chain is in an α-helical conformation, but it contains many segments in the β conformation (Fig. 4–18). Like lysozyme, ribonuclease has four disulfide bonds between loops of the polypeptide chain.

In small proteins, hydrophobic residues are less likely to be sheltered in a hydrophobic interior; simple geometry dictates that the smaller the protein, the lower the ratio of volume to surface area. Small proteins also have fewer potential weak interactions available to stabilize them. This explains why many smaller proteins such as those in Figure 4–18 are stabilized by a number of covalent bonds. Lysozyme and ribonuclease, for example, have disulfide linkages, and the heme group in cytochrome c is covalently linked to the protein on two sides, providing significant stabilization of the entire protein structure.

Table 4–2 shows the proportions of α helix and β conformation (expressed as percentage of residues in each secondary structure) in several small, single-chain, globular proteins. Each of these proteins has a distinct structure, adapted for its particular biological function, but together they share several important properties. Each is folded compactly, and in each case the hydrophobic amino acid side chains are oriented toward the interior (away from water) and the hydrophilic side chains are on the surface. The structures are also stabilized by a multitude of hydrogen bonds and some ionic interactions.

FIGURE 4–18 Three-dimensional structures of some small proteins. Shown here are cytochrome c (PDB ID 1CCR), lysozyme (PDB ID 3LYM), and ribonuclease (PDB ID 3RN3). Each protein is shown in surface contour and in a ribbon representation, in the same orientation. In the ribbon depictions, regions in the β conformation are represented by flat arrows and the α helices are represented by spiral ribbons. Key functional groups (the heme in cytochrome c; amino acid side chains in the active site of lysozyme and ribonuclease) are shown in red. Disulfide bonds are shown (in the ribbon representations) in yellow.

TABLE 4–2 Approximate Amounts of α Helix and β Conformation in Some Single-Chain Proteins

                              Residues (%)*
Protein (total residues)      α Helix    β Conformation
Chymotrypsin (247)            14         45
Ribonuclease (124)            26         35
Carboxypeptidase (307)        38         17
Cytochrome c (104)            39          0
Lysozyme (129)                40         12
Myoglobin (153)               78          0

Source: Data from Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, p. 100, W. H. Freeman and Company, New York.
*Portions of the polypeptide chains that are not accounted for by α helix or β conformation consist of bends and irregularly coiled or extended stretches. Segments of α helix and β conformation sometimes deviate slightly from their normal dimensions and geometry.

BOX 4–4 WORKING IN BIOCHEMISTRY

Methods for Determining the Three-Dimensional Structure of a Protein

X-Ray Diffraction

The spacing of atoms in a crystal lattice can be determined by measuring the locations and intensities of spots produced on photographic film by a beam of x rays of given wavelength, after the beam has been diffracted by the electrons of the atoms. For example, x-ray analysis of sodium chloride crystals shows that Na⁺ and Cl⁻ ions are arranged in a simple cubic lattice. The spacing of the different kinds of atoms in complex organic molecules, even very large ones such as proteins, can also be analyzed by x-ray diffraction methods.
However, the technique for analyzing crystals of complex molecules is far more laborious than for simple salt crystals. When the repeating pattern of the crystal is a molecule as large as, say, a protein, the numerous atoms in the molecule yield thousands of diffraction spots that must be analyzed by computer.

The process may be understood at an elementary level by considering how images are generated in a light microscope. Light from a point source is focused on an object. The light waves are scattered by the object, and these scattered waves are recombined by a series of lenses to generate an enlarged image of the object. The smallest object whose structure can be determined by such a system (that is, the resolving power of the microscope) is determined by the wavelength of the light, in this case visible light, with wavelengths in the range of 400 to 700 nm. Objects smaller than half the wavelength of the incident light cannot be resolved. To resolve objects as small as proteins we must use x rays, with wavelengths in the range of 0.7 to 1.5 Å (0.07 to 0.15 nm). However, there are no lenses that can recombine x rays to form an image; instead the pattern of diffracted x rays is collected directly and an image is reconstructed by mathematical techniques.

The amount of information obtained from x-ray crystallography depends on the degree of structural order in the sample. Some important structural parameters were obtained from early studies of the diffraction patterns of the fibrous proteins arranged in fairly regular arrays in hair and wool. However, the orderly bundles formed by fibrous proteins are not crystals: the molecules are aligned side by side, but not all are oriented in the same direction. More detailed three-dimensional structural information about proteins requires a highly ordered protein crystal.
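The half-wavelength rule of thumb quoted above can be turned into a quick calculation. The sketch below is illustrative only: the function name and the printed labels are hypothetical, and the rule is applied as a simple lower bound rather than as a full treatment of diffraction geometry.

```python
ANGSTROM_PER_NM = 10.0


def resolution_limit_nm(wavelength_nm):
    """Smallest resolvable object size under the half-wavelength rule of thumb."""
    return wavelength_nm / 2.0


# Wavelength ranges are the ones quoted in the text.
examples = [
    ("visible light, 400 nm", 400.0),
    ("visible light, 700 nm", 700.0),
    ("x rays, 0.07 nm", 0.07),
    ("x rays, 0.15 nm", 0.15),
]

for label, wavelength in examples:
    limit = resolution_limit_nm(wavelength)
    print(f"{label}: limit ~{limit:g} nm ({limit * ANGSTROM_PER_NM:g} Å)")
```

With visible light the limit is hundreds of nanometers, far larger than a protein molecule. With x rays it drops below 1 Å, smaller than the length of a typical covalent bond (a C–C single bond is about 1.5 Å), which is why individual atoms can in principle be located.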
Protein crystallization is something of an empirical science, and the structures of many important proteins are not yet known, simply because they have proved difficult to crystallize. Practitioners have compared making protein crystals to holding together a stack of bowling balls with cellophane tape.

Operationally, there are several steps in x-ray structural analysis (Fig. 1). Once a crystal is obtained, it is placed in an x-ray beam between the x-ray source and a detector, and a regular array of spots called reflections is generated. The spots are created by the diffracted x-ray beam, and each atom in a molecule makes a contribution to each spot. An electron-density map of the protein is reconstructed from the overall diffraction pattern of spots by using a mathematical technique called a Fourier transform. In effect, the computer acts as a “computational lens.” A model for the structure is then built that is consistent with the electron-density map.

John Kendrew found that the x-ray diffraction pattern of crystalline myoglobin (isolated from muscles of the sperm whale) is very complex, with nearly 25,000 reflections. Computer analysis of these reflections took place in stages. The resolution improved at each stage, until in 1959 the positions of virtually all the non-hydrogen atoms in the protein had been determined. The amino acid sequence of the protein, obtained by chemical analysis, was consistent with the molecular model. The structures of thousands of proteins, many of them much more complex than myoglobin, have since been determined to a similar level of resolution.

The physical environment within a crystal, of course, is not identical to that in solution or in a living cell.
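The Fourier reconstruction of an electron-density map described above can be illustrated with a one-dimensional toy model. The following is a minimal sketch under stated assumptions, not the procedure used for myoglobin: the "unit cell", the two Gaussian "atoms", and all function names are hypothetical. The structure factors here carry both amplitude and phase, whereas a diffraction experiment records only spot intensities, so in practice the phases must be obtained separately. The sketch shows that including more reflections in the synthesis gives a map closer to the true density, in the spirit of the staged analysis of the myoglobin data.

```python
import cmath
import math

N = 64  # sample points across one hypothetical 1-D unit cell


def toy_density(n=N):
    """Density of two made-up Gaussian 'atoms' at fractional positions 0.25 and 0.60."""
    atoms = [(0.25, 8.0), (0.60, 6.0)]  # (position, peak height); illustrative only
    width = 0.03
    rho = []
    for j in range(n):
        x = j / n
        rho.append(sum(z * math.exp(-((x - p) ** 2) / (2 * width ** 2))
                       for p, z in atoms))
    return rho


def structure_factors(rho):
    """Discrete Fourier transform of the density: one complex value per index h."""
    n = len(rho)
    return [sum(rho[j] * cmath.exp(2j * math.pi * h * j / n) for j in range(n))
            for h in range(n)]


def fourier_synthesis(F, h_max):
    """Rebuild the density using only reflections with |h| <= h_max."""
    n = len(F)
    rho = []
    for j in range(n):
        total = 0j
        for h in range(n):
            h_signed = h if h <= n // 2 else h - n  # map index to a signed frequency
            if abs(h_signed) <= h_max:
                total += F[h] * cmath.exp(-2j * math.pi * h * j / n)
        rho.append(total.real / n)
    return rho


def rms_error(a, b):
    """Root-mean-square difference between two sampled maps."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rho = toy_density()
F = structure_factors(rho)
full = fourier_synthesis(F, h_max=N // 2)   # all reflections
medium = fourier_synthesis(F, h_max=10)     # intermediate resolution
coarse = fourier_synthesis(F, h_max=3)      # low resolution

print("rms error, all reflections:", rms_error(rho, full))    # essentially zero
print("rms error, |h| <= 10:      ", rms_error(rho, medium))
print("rms error, |h| <= 3:       ", rms_error(rho, coarse))
```

The error shrinks as higher-order reflections are added, which is the one-dimensional analog of a map sharpening as resolution improves.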
A crystal imposes a space and time average on the structure deduced from its analysis, and x-ray diffraction studies provide little information about molecular motion within the protein. The conformation of proteins in a crystal could in principle also be affected by nonphysiological factors such as incidental protein-protein contacts within the crystal. However, when structures derived from the analysis of crystals are compared with structural information obtained by other means (such as NMR, as described below), the crystal-derived structure almost always represents a functional conformation of the protein. X-ray crystallography can be applied successfully to proteins too large to be structurally analyzed by NMR.

FIGURE 1 Steps in the determination of the structure of sperm whale myoglobin by x-ray crystallography. (a) X-ray diffraction patterns are generated from a crystal of the protein. (b) Data extracted from the diffraction patterns are used to calculate a three-dimensional electron-density map of the protein. The electron density of only part of the structure, the heme, is shown. (c) Regions of greatest electron density reveal the location of atomic nuclei, and this information is used to piece together the final structure. Here, the heme structure is modeled into its electron-density map. (d) The completed structure of sperm whale myoglobin, including the heme (PDB ID 2MBW).

Nuclear Magnetic Resonance

An important complementary method for determining the three-dimensional structures of macromolecules is nuclear magnetic resonance (NMR). Modern NMR techniques are being used to determine the structures of ever-larger macromolecules, including carbohydrates, nucleic acids, and small to average-sized proteins. An advantage of NMR studies is that they are carried out on macromolecules in solution, whereas x-ray crystallography is limited to molecules that can be crystallized. NMR can also illuminate the dynamic side of protein structure, including conformational changes, protein folding, and interactions with other molecules.

NMR is a manifestation of nuclear spin angular momentum, a quantum mechanical property of atomic nuclei. Only certain atoms, including ¹H, ¹³C, ¹⁵N, ¹⁹F, and ³¹P, possess the kind of nuclear spin that gives rise to an NMR signal. Nuclear spin generates a magnetic dipole. When a strong, static magnetic field is applied to a solution containing a single type of macromolecule, the magnetic dipoles are aligned in the field in one of two orientations, parallel (low energy) or antiparallel (high energy). A short (~10 μs) pulse of electromagnetic energy of suitable frequency (the resonant frequency, which is in the radio frequency range) is applied at right angles to the nuclei aligned in the magnetic field. Some energy is absorbed as nuclei switch to the high-energy state, and the absorption spectrum that results contains information about the identity of the nuclei and their immediate chemical environment. The data from many such experiments performed on a sample are averaged, increasing the signal-to-noise ratio, and an NMR spectrum such as that in Figure 2 is generated.

¹H is particularly important in NMR experiments because of its high sensitivity and natural abundance. For macromolecules, ¹H NMR spectra can become quite complicated. Even a small protein has hundreds of ¹H atoms, typically resulting in a one-dimensional NMR spectrum too complex for analysis. Structural analysis of proteins became possible with the advent of two-dimensional NMR techniques (Fig. 3).
These methods allow measurement of distance-dependent coupling of nuclear spins in nearby atoms through space (the nuclear Overhauser effect (NOE), in a method dubbed NOESY) or the coupling of nuclear spins in atoms connected by covalent bonds (total correlation spectroscopy, or TOCSY).

Translating a two-dimensional NMR spectrum into a complete three-dimensional structure can be a laborious process. The NOE signals provide some information about the distances between individual atoms, but for these distance constraints to be useful, the atoms giving rise to each signal must be identified. Complementary TOCSY experiments can help identify which NOE signals reflect atoms that are linked by covalent bonds. Certain patterns of NOE signals have been associated with secondary structures such as α helices. Modern genetic engineering (Chapter 9) can be used to prepare proteins that contain the rare isotopes ¹³C or ¹⁵N. The new NMR signals produced by these atoms, and the coupling with ¹H signals resulting from these substitutions, help in the assignment of individual ¹H NOE signals. The process is also aided by a knowledge of the amino acid sequence of the polypeptide.

To generate a three-dimensional structure, researchers feed the distance constraints into a computer along with known geometric constraints such as chirality, van der Waals radii, and bond lengths and angles. The computer generates a family of closely related structures that represent the range of conformations consistent with the NOE distance constraints (Fig. 3c). The uncertainty in structures generated by NMR is in part a reflection of the molecular vibrations (breathing) within a protein structure in solution, discussed in more detail in Chapter 5. Normal experimental uncertainty can also play a role.
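The idea of a family of structures consistent with NOE distance constraints can be sketched in a few lines. This is an illustration under stated assumptions, not a structure-calculation program: the atom labels, coordinates, and upper-bound distances are invented, and real software searches conformational space (together with the geometric constraints listed above) rather than filtering a handful of pre-built models.

```python
import math

# Hypothetical NOE-derived restraints: (atom_a, atom_b, upper bound in Å).
RESTRAINTS = [
    ("H1", "H2", 5.0),
    ("H2", "H3", 5.0),
    ("H1", "H3", 4.0),
]


def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def violations(coords, restraints=RESTRAINTS, tolerance=0.0):
    """Return the restraints whose upper bound is exceeded in this model."""
    bad = []
    for a, b, upper in restraints:
        if distance(coords[a], coords[b]) > upper + tolerance:
            bad.append((a, b, upper))
    return bad


# Three made-up candidate models (coordinates in Å).
candidates = {
    "model_1": {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (3.0, 2.5, 0.0)},
    "model_2": {"H1": (0.0, 0.0, 0.0), "H2": (3.2, 0.0, 0.0), "H3": (2.8, 2.6, 0.0)},
    "model_3": {"H1": (0.0, 0.0, 0.0), "H2": (4.0, 0.0, 0.0), "H3": (8.0, 0.0, 0.0)},
}

# The "family" is every model that violates none of the restraints.
family = [name for name, coords in candidates.items() if not violations(coords)]
print(family)
```

Here model_1 and model_2 differ slightly but both satisfy every restraint, so both belong to the family; model_3 places H1 and H3 too far apart and is rejected. The spread among accepted models is the analog of the multiple backbone traces drawn for an NMR structure.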
When a protein structure has been determined by both x-ray crystallography and NMR, the structures generally agree well. In some cases, the precise locations of particular amino acid side chains on the protein exterior are different, often because of effects related to the packing of adjacent protein molecules in a crystal. The two techniques together are at the heart of the rapid increase in the availability of structural information about the macromolecules of living cells.

FIGURE 2 A one-dimensional NMR spectrum of a globin from a marine blood worm. This protein and sperm whale myoglobin are very close structural analogs, belonging to the same protein structural family and sharing an oxygen-transport function.
[Spectrum not reproduced. Horizontal axis: ¹H chemical shift (ppm), from 10.0 to −2.0.]

FIGURE 3 The use of two-dimensional NMR to generate a three-dimensional structure of a globin, the same protein used to generate the data in Figure 2. The diagonal in a two-dimensional NMR spectrum is equivalent to a one-dimensional spectrum. The off-diagonal peaks are NOE signals generated by close-range interactions of ¹H atoms that may generate signals quite distant in the one-dimensional spectrum. Two such interactions are identified in (a), and their identities are shown with blue lines in (b) (PDB ID 1VRF). Three lines are drawn for interaction 2 between a methyl group in the protein and a hydrogen on the heme. The methyl group rotates rapidly such that each of its three hydrogens contributes equally to the interaction and the NMR signal. Such information is used to determine the complete three-dimensional structure (PDB ID 1VRE), as in (c). The multiple lines shown for the protein backbone represent the family of structures consistent with the distance constraints in the NMR data. The structural similarity with myoglobin (Fig. 1) is evident. The proteins are oriented in the same way in both figures.
[Panels not reproduced. Both axes of the spectrum in (a): ¹H chemical shift (ppm), from 10.0 to −2.0.]

Analysis of Many Globular Proteins Reveals Common Structural Patterns

Protein Architecture: Tertiary Structure of Large Globular Proteins

For the beginning student, the very complex tertiary structures of globular proteins much larger than those shown in Figure 4–18 are best approached by focusing on structural patterns that recur in different and often unrelated proteins. The three-dimensional structure of a typical globular protein can be considered an assemblage of polypeptide segments in the α-helix and β-sheet conformations, linked by connecting segments. The structure can then be described to a first approximation by defining how these segments stack on one another and how the segments that connect them are arranged. This formalism has led to the development of databases that allow informative comparisons of protein structures, complementing other databases that permit comparisons of protein sequences.

An understanding of a complete three-dimensional structure is built upon an analysis of its parts. We begin by defining terms used to describe protein substructures, then turn to the folding rules elucidated from analysis of the structures of many proteins.

Supersecondary structures, also called motifs or simply folds, are particularly stable arrangements of several elements of secondary structure and the connections between them. There is no universal agreement among biochemists on the application of the three terms, and they are often used interchangeably. The terms are also applied to a wide range of structures.
Recognized motifs range from simple to complex, sometimes appearing in repeating units or combinations. A single large motif may comprise the entire protein. We have already encountered one well-studied motif, the coiled coil of α-keratin, also found in a number of other proteins.

Polypeptides with more than a few hundred amino acid residues often fold into two or more stable, globular units called domains. In many cases, a domain from a large protein will retain its correct three-dimensional structure even when it is separated (for example, by proteolytic cleavage) from the remainder of the polypeptide chain. A protein with multiple domains may appear to have a distinct globular lobe for each domain (Fig. 4–19), but, more commonly, extensive contacts between domains make individual domains hard to discern. Different domains often have distinct functions, such as the binding of small molecules or interaction with other proteins. Small proteins usually have only one domain (the domain is the protein).

Folding of polypeptides is subject to an array of physical and chemical constraints. A sampling of the prominent folding rules that have emerged provides an opportunity to introduce some simple motifs.

1. Hydrophobic interactions make a large contribution to the stability of protein structures. Burial of hydrophobic amino acid R groups so as to exclude water requires at least two layers of secondary structure. Two simple motifs, the β-α-β loop and the α-α corner (Fig. 4–20a), create two layers.

2. Where they occur together in proteins, α helices and β sheets generally are found in different structural layers. This is because the backbone of a polypeptide segment in the β conformation (Fig. 4–7) cannot readily hydrogen-bond to an α helix aligned with it.

3. Polypeptide segments adjacent to each other in the primary sequence are usually stacked adjacent to each other in the folded structure. Although distant segments of a polypeptide may come together in the tertiary structure, this is not the norm.

4. Connections between elements of secondary structure cannot cross or form knots (Fig. 4–20b).

5. The β conformation is most stable when the individual segments are twisted slightly in a right-handed sense. This influences both the arrangement of β sheets relative to one another and the path of the polypeptide connection between them. Two parallel β strands, for example, must be connected by a crossover strand (Fig. 4–20c). In principle, this crossover could have a right- or left-handed conformation, but in proteins it is almost always right-handed. Right-handed connections tend to be shorter than left-handed connections and tend to bend through smaller angles, making them easier to form. The twisting of β sheets also leads to a characteristic twisting of the structure formed when many segments are put together. Two examples of resulting structures are the β barrel and twisted β sheet (Fig. 4–20d), which form the core of many larger structures.

FIGURE 4–19 Structural domains in the polypeptide troponin C. (PDB ID 4TNC) This calcium-binding protein associated with muscle has separate calcium-binding domains, indicated in blue and purple.

FIGURE 4–20 Stable folding patterns in proteins. (a) Two simple and common motifs that provide two layers of secondary structure. Amino acid side chains at the interface between elements of secondary structure are shielded from water. Note that the β strands in the β-α-β loop tend to twist in a right-handed fashion. (b) Connections between β strands in layered β sheets. The strands are shown from one end, with no twisting included in the schematic. Thick lines represent connections at the ends nearest the viewer; thin lines are connections at the far ends of the β strands. The connections on a given end (e.g., near the viewer) do not cross each other. (c) Because of the twist in β strands, connections between strands are generally right-handed. Left-handed connections must traverse sharper angles and are harder to form. (d) Two arrangements of β strands stabilized by the tendency of the strands to twist. This β barrel is a single domain of α-hemolysin (a pore-forming toxin that kills a cell by creating a hole in its membrane) from the bacterium Staphylococcus aureus (derived from PDB ID 7AHL). The twisted β sheet is from a domain of photolyase (a protein that repairs certain types of DNA damage) from E. coli (derived from PDB ID 1DNP).

[Figure 4–20 panel labels: (a) β-α-β loop; α-α corner. (b) Typical connections in an all-β motif; crossover connection (not observed). (c) Right-handed connection between β strands; left-handed connection between β strands (very rare). (d) β barrel; twisted β sheet.]

Following these rules, complex motifs can be built up from simple ones. For example, a series of β-α-β loops, arranged so that the β strands form a barrel, creates a particularly stable and common motif called the α/β barrel (Fig. 4–21). In this structure, each parallel β segment is attached to its neighbor by an α-helical segment. All connections are right-handed. The α/β barrel is found in many enzymes, often with a binding site for a cofactor or substrate in the form of a pocket near one end of the barrel. Note that domains exhibiting similar folding patterns are said to have the same motif even though their constituent α helices and β sheets may differ in length.

Protein Motifs Are the Basis for Protein Structural Classification

Protein Architecture—Tertiary Structure of Large Globular Proteins, IV.
Structural Classification of Proteins

As we have seen, the complexities of tertiary structure are decreased by considering substructures. Taking this idea further, researchers have organized the complete contents of databases according to hierarchical levels of structure. The Structural Classification of Proteins (SCOP) database offers a good example of this very important trend in biochemistry. At the highest level of classification, the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop) borrows a scheme already in common use, in which protein structures are divided into four classes: all α, all β, α/β (in which the α and β segments are interspersed or alternate), and α + β (in which the α and β regions are somewhat segregated) (Fig. 4–22). Within each class are tens to hundreds of different folding arrangements, built up from increasingly identifiable substructures. Some of the substructure arrangements are very common; others have been found in just one protein. Figure 4–22 displays a variety of motifs arrayed among the four classes of protein structure. Those illustrated are just a minute sample of the hundreds of known motifs. The number of folding patterns is not infinite, however. As the rate at which new protein structures are elucidated has increased, the fraction of those structures containing a new motif has steadily declined. Fewer than 1,000 different folds or motifs may exist in all proteins. Figure 4–22 also shows how proteins can be organized based on the presence of the various motifs. The top two levels of organization, class and fold, are purely structural. Below the fold level, categorization is based on evolutionary relationships.

Many examples of recurring domain or motif structures are available, and these reveal that protein tertiary structure is more reliably conserved than primary sequence.
The comparison of protein structures can thus provide much information about evolution. Proteins with significant primary sequence similarity, and/or with demonstrably similar structure and function, are said to be in the same protein family. A strong evolutionary relationship is usually evident within a protein family. For example, the globin family has many different proteins with both structural and sequence similarity to myoglobin (as seen in the proteins used as examples in Box 4–4 and again in the next chapter). Two or more families with little primary sequence similarity sometimes make use of the same major structural motif and have functional similarities; these families are grouped as superfamilies. An evolutionary relationship between the families in a superfamily is considered probable, even though time and functional distinctions (hence different adaptive pressures) may have erased many of the telltale sequence relationships. A protein family may be widespread in all three domains of cellular life, the Bacteria, Archaea, and Eukarya, suggesting a very ancient origin. Other families may be present in only a small group of organisms, indicating that the structure arose more recently. Tracing the natural history of structural motifs, using structural classifications in databases such as SCOP, provides a powerful complement to sequence analyses in tracing many evolutionary relationships.

FIGURE 4–21 Constructing large motifs from smaller ones. The α/β barrel is a common motif constructed from repetitions of the simpler β-α-β loop motif. This α/β barrel is a domain of the pyruvate kinase (a glycolytic enzyme) from rabbit (derived from PDB ID 1PKN).

[Figure 4–21 panel labels: β-α-β loop; α/β barrel.]

FIGURE 4–22 Organization of proteins based on motifs. Shown here are just a small number of the hundreds of known stable motifs. They are divided into four classes: all α, all β, α/β, and α + β. Structural classification data from the SCOP (Structural Classification of Proteins) database (http://scop.mrc-lmb.cam.ac.uk/scop) are also provided. The PDB identifier is the unique number given to each structure archived in the Protein Data Bank (www.rcsb.org/pdb). The α/β barrel, shown in Figure 4–21, is another particularly common α/β motif.

PDB identifier | Fold | Superfamily | Family | Protein | Species
1AO6 | Serum albumin | Serum albumin | Serum albumin | Serum albumin | Human (Homo sapiens)
1JPC | β-Prism II | α-D-Mannose-specific plant lectins | α-D-Mannose-specific plant lectins | Lectin (agglutinin) | Snowdrop (Galanthus nivalis)
1LXA | Single-stranded left-handed β helix | Trimeric LpxA-like enzymes | UDP N-acetylglucosamine acyltransferase | UDP N-acetylglucosamine acyltransferase | Escherichia coli
1PEX | Four-bladed β propeller | Hemopexin-like domain | Hemopexin-like domain | Collagenase-3 (MMP-13), carboxyl-terminal domain | Human (Homo sapiens)
1GAI | α/α toroid | Six-hairpin glycosyltransferase | Glucoamylase | Glucoamylase | Aspergillus awamori, variant x100
1ENH | DNA/RNA-binding 3-helical bundle | Homeodomain-like | Homeodomain | engrailed Homeodomain | Drosophila melanogaster
1BCF | Ferritin-like | Ferritin-like | Ferritin | Bacterioferritin (cytochrome b₁) | Escherichia coli
1HOE | α-Amylase inhibitor tendamistat | α-Amylase inhibitor tendamistat | α-Amylase inhibitor tendamistat | α-Amylase inhibitor tendamistat | Streptomyces tendae
1CD8 | Immunoglobulin-like β sandwich | Immunoglobulin | V set domains (antibody variable domain-like) | CD8 | Human (Homo sapiens)
1DEH | NAD(P)-binding Rossmann-fold domains | NAD(P)-binding Rossmann-fold domains | Alcohol/glucose dehydrogenases, carboxyl-terminal domain | Alcohol dehydrogenase | Human (Homo sapiens)
2PIL | Pilin | Pilin | Pilin | Pilin | Neisseria gonorrhoeae
1U9A | UBC-like | UBC-like | Ubiquitin-conjugating enzyme, UBC | Ubiquitin-conjugating enzyme, UBC | Human (Homo sapiens) ubc9
1SYN | Thymidylate synthase/dCMP hydroxymethylase | Thymidylate synthase/dCMP hydroxymethylase | Thymidylate synthase/dCMP hydroxymethylase | Thymidylate synthase | Escherichia coli
1EMA | GFP-like | GFP-like | Fluorescent proteins | Green fluorescent protein, GFP | Jellyfish (Aequorea victoria)
1DUB | ClpP/crotonase | ClpP/crotonase | Crotonase-like | Enoyl-CoA hydratase (crotonase) | Rat (Rattus norvegicus)
1PFK | Phosphofructokinase | Phosphofructokinase | Phosphofructokinase | ATP-dependent phosphofructokinase | Escherichia coli

The SCOP database is curated manually, with the objective of placing proteins in the correct evolutionary framework based on conserved structural features.
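The classification levels described above (class, fold, superfamily, family) form a simple tree, and a small sketch may help make that concrete. The Python below is only an illustration, not how SCOP itself is stored or queried: the `ScopEntry` type and the `build_hierarchy` and `folds_per_class` helpers are hypothetical names, the six entries are taken from Figure 4–22, and the class label attached to each entry is an assumption added by hand for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopEntry:
    """One classified structure; field names are illustrative only."""
    pdb_id: str
    scop_class: str   # assumption: class label assigned by hand for this sketch
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str


# A few entries from Figure 4-22 (class labels are assumptions, see lead-in).
ENTRIES = [
    ScopEntry("1AO6", "all alpha", "Serum albumin", "Serum albumin",
              "Serum albumin", "Serum albumin", "Homo sapiens"),
    ScopEntry("1BCF", "all alpha", "Ferritin-like", "Ferritin-like",
              "Ferritin", "Bacterioferritin (cytochrome b1)", "Escherichia coli"),
    ScopEntry("1CD8", "all beta", "Immunoglobulin-like beta sandwich",
              "Immunoglobulin", "V set domains (antibody variable domain-like)",
              "CD8", "Homo sapiens"),
    ScopEntry("1DEH", "alpha/beta", "NAD(P)-binding Rossmann-fold domains",
              "NAD(P)-binding Rossmann-fold domains",
              "Alcohol/glucose dehydrogenases, carboxyl-terminal domain",
              "Alcohol dehydrogenase", "Homo sapiens"),
    ScopEntry("1PFK", "alpha/beta", "Phosphofructokinase", "Phosphofructokinase",
              "Phosphofructokinase", "ATP-dependent phosphofructokinase",
              "Escherichia coli"),
    ScopEntry("1EMA", "alpha+beta", "GFP-like", "GFP-like",
              "Fluorescent proteins", "Green fluorescent protein, GFP",
              "Aequorea victoria"),
]


def build_hierarchy(entries):
    """Nest entries as class -> fold -> superfamily -> family -> [PDB IDs]."""
    tree = {}
    for e in entries:
        families = (tree.setdefault(e.scop_class, {})
                        .setdefault(e.fold, {})
                        .setdefault(e.superfamily, {}))
        families.setdefault(e.family, []).append(e.pdb_id)
    return tree


def folds_per_class(tree):
    """List the fold names found under each class, sorted alphabetically."""
    return {cls: sorted(folds) for cls, folds in tree.items()}


if __name__ == "__main__":
    tree = build_hierarchy(ENTRIES)
    for cls, folds in folds_per_class(tree).items():
        print(cls, "->", folds)
```

The top two levels of the nested dictionary are purely structural and the lower two reflect evolutionary grouping, mirroring the distinction drawn in the text. A real database would hold many structures per family, so the leaves are lists rather than single identifiers.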
Two similar enterprises, the CATH (class, architecture, topology, and homologous superfamily) and FSSP (fold classification based on structure-structure alignment of proteins) databases, make use of more automated methods and can provide additional information.

Structural motifs become especially important in defining protein families and superfamilies. Improved classification and comparison systems for proteins lead inevitably to the elucidation of new functional relationships. Given the central role of proteins in living systems, these structural comparisons can help illuminate every aspect of biochemistry, from the evolution of individual proteins to the evolutionary history of complete metabolic pathways.

Protein Quaternary Structures Range from Simple Dimers to Large Complexes

Protein Architecture—Quaternary Structure

Many proteins have multiple polypeptide subunits. The association of polypeptide chains can serve a variety of functions. Many multisubunit proteins have regulatory roles; the binding of small molecules may affect the interaction between subunits, causing large changes in the protein’s activity in response to small changes in the concentration of substrate or regulatory molecules (Chapter 6). In other cases, separate subunits can take on separate but related functions, such as catalysis and regulation. Some associations, such as the fibrous proteins considered earlier in this chapter and the coat proteins of viruses, serve primarily structural roles. Some very large protein assemblies are the site of complex, multistep reactions. One example is the ribosome, site of protein synthesis, which incorporates dozens of protein subunits along with a number of RNA molecules.

A multisubunit protein is also referred to as a multimer. Multimeric proteins can have from two to hundreds of subunits. A multimer with just a few subunits is often called an oligomer.
If a multimer is composed of a number of nonidentical subunits, the overall structure of the protein can be asymmetric and quite complicated. However, most multimers have identical subunits or repeating groups of nonidentical subunits, usually in symmetric arrangements. As noted in Chapter 3, the repeating structural unit in such a multimeric protein, whether it is a single subunit or a group of subunits, is called a protomer.

The first oligomeric protein for which the three-dimensional structure was determined was hemoglobin (M_r 64,500), which contains four polypeptide chains and four heme prosthetic groups, in which the iron atoms are in the ferrous (Fe²⁺) state (Fig. 4–17). The protein portion, called globin, consists of two α chains (141 residues each) and two β chains (146 residues each). Note that in this case α and β do not refer to secondary structures. Because hemoglobin is four times as large as myoglobin, much more time and effort were required to solve its three-dimensional structure by x-ray analysis, finally achieved by Max Perutz, John Kendrew, and their colleagues in 1959. The subunits of hemoglobin are arranged in symmetric pairs (Fig. 4–23), each pair having one α and one β subunit. Hemoglobin can therefore be described either as a tetramer or as a dimer of αβ protomers.

Identical subunits of multimeric proteins are generally arranged in one or a limited set of symmetric patterns. A description of the structure of these proteins requires an understanding of conventions used to define symmetries. Oligomers can have either rotational symmetry or helical symmetry; that is, individual subunits can be superimposed on others (brought to coincidence) by rotation about one or more rotational axes, or by a helical rotation. In proteins with rotational symmetry, the subunits pack about the rotational axes to form closed structures. Proteins with helical symmetry tend to form structures that are more open-ended, with subunits added in a spiraling array.

There are several forms of rotational symmetry. The simplest is cyclic symmetry, involving rotation about a single axis (Fig. 4–24a). If subunits can be superimposed by rotation about a single axis, the protein has a symmetry defined by convention as C_n (C for cyclic, n for the number of subunits related by the axis). The axis itself is described as an n-fold rotational axis. The αβ protomers of hemoglobin (Fig. 4–23) are related by C_2 symmetry. A somewhat more complicated rotational symmetry is dihedral symmetry, in which a twofold rotational axis intersects an n-fold axis at right angles. The symmetry is defined as D_n (Fig. 4–24b). A protein with dihedral symmetry has 2n protomers.

Proteins with cyclic or dihedral symmetry are particularly common. More complex rotational symmetries are possible, but only a few are regularly encountered. One example is icosahedral symmetry. An icosahedron is a regular 12-cornered polyhedron having 20 equilateral triangular faces (Fig. 4–24c). Each face can be brought to coincidence with another by rotation about one or more of three rotational axes. This is a common structure in virus coats, or capsids. The human poliovirus has an icosahedral capsid (Fig. 4–25a). Each triangular face is made up of three protomers, each protomer containing single copies of four different polypeptide chains, three of which are accessible at the outer surface. Sixty protomers form the 20 faces of the icosahedral shell enclosing the genetic material (RNA).

The other major type of symmetry found in oligomers, helical symmetry, also occurs in capsids. Tobacco mosaic virus is a right-handed helical filament made up of 2,130 identical subunits (Fig. 4–25b). This cylindrical structure encloses the viral RNA. Proteins with subunits arranged in helical filaments can also form long, fibrous structures such as the actin filaments of muscle (see Fig. 5–30).

[Photographs: Max Perutz, 1914–2002 (left); John Kendrew, 1917–1997 (right)]

FIGURE 4–23 Quaternary structure of deoxyhemoglobin. (PDB ID 2HHB) X-ray diffraction analysis of deoxyhemoglobin (hemoglobin without oxygen molecules bound to the heme groups) shows how the four polypeptide subunits are packed together. (a) A ribbon representation. (b) A space-filling model. The α subunits are shown in gray and light blue; the β subunits in pink and dark blue. Note that the heme groups (red) are relatively far apart.

FIGURE 4–24 Rotational symmetry in proteins. (a) In cyclic symmetry, subunits are related by rotation about a single n-fold axis, where n is the number of subunits so related. The axes are shown as black lines; the numbers are values of n. Only two of many possible C_n arrangements are shown. (b) In dihedral symmetry, all subunits can be related by rotation about one or both of two axes, one of which is twofold. D_2 symmetry is most common. (c) Icosahedral symmetry. Relating all 20 triangular faces of an icosahedron requires rotation about one or more of three separate rotational axes: twofold, threefold, and fivefold. An end-on view of each of these axes is shown at the right.

[Figure 4–24 panel labels: (a) Two types of cyclic symmetry: C_2 (twofold), C_3 (threefold). (b) Two types of dihedral symmetry: D_2 (twofold axes), D_4 (fourfold and twofold axes). (c) Icosahedral symmetry: fivefold, threefold, twofold.]

There Are Limits to the Size of Proteins

The relatively large size of proteins reflects their functions.
The function of an enzyme, for example, requires a stable structure containing a pocket large enough to bind its substrate and catalyze a reaction. Protein size has limits, however, imposed by two factors: the genetic coding capacity of nucleic acids and the accuracy of the protein biosynthetic process. The use of many copies of one or a few proteins to make a large enclosing structure (capsid) is important for viruses because this strategy conserves genetic material. Remember that there is a linear correspondence between the sequence of a gene in the nucleic acid and the amino acid sequence of the protein for which it codes (see Fig. 1–31). The nucleic acids of viruses are much too small to encode the information required for a protein shell made of a single polypeptide. By using many copies of much smaller polypeptides, a much shorter nucleic acid is needed for coding the capsid subunits, and this nucleic acid can be efficiently used over and over again. Cells also use large complexes of polypeptides in muscle, cilia, the cytoskeleton, and other structures. It is simply more efficient to make many copies of a small polypeptide than one copy of a very large protein. In fact, most proteins with a molecular weight greater than 100,000 have multiple subunits, identical or different.

The second factor limiting the size of proteins is the error frequency during protein biosynthesis. The error frequency is low (about 1 mistake per 10,000 amino acid residues added), but even this low rate results in a high probability of a damaged protein if the protein is very large. Simply put, the potential for incorporating a “wrong” amino acid in a protein is greater for a large protein than for a small one.

SUMMARY 4.3 Protein Tertiary and Quaternary Structures

■ Tertiary structure is the complete three-dimensional structure of a polypeptide chain. There are two general classes of proteins based on tertiary structure: fibrous and globular.
■ Fibrous proteins, which serve mainly structural roles, have simple repeating elements of secondary structure.

■ Globular proteins have more complicated tertiary structures, often containing several types of secondary structure in the same polypeptide chain. The first globular protein structure to be determined, using x-ray diffraction methods, was that of myoglobin.

■ The complex structures of globular proteins can be analyzed by examining stable substructures called supersecondary structures, motifs, or folds. The thousands of known protein structures are generally assembled from a repertoire of only a few hundred motifs. Regions of a polypeptide chain that can fold stably and independently are called domains.

■ Quaternary structure results from interactions between the subunits of multisubunit (multimeric) proteins or large protein assemblies. Some multimeric proteins have a repeated unit consisting of a single subunit or a group of subunits referred to as a protomer. Protomers are usually related by rotational or helical symmetry.

FIGURE 4–25 Viral capsids. (a) Poliovirus (derived from PDB ID 2PLV). The coat proteins of poliovirus assemble into an icosahedron 300 Å in diameter. Icosahedral symmetry is a type of rotational symmetry (see Fig. 4–24c). On the left is a surface contour image of the poliovirus capsid. In the image on the right, lines have been superimposed to show the axes of symmetry. (b) Tobacco mosaic virus (derived from PDB ID 1VTM). This rod-shaped virus (as shown in the electron micrograph) is 3,000 Å long and 180 Å in diameter; it has helical symmetry.

[Figure 4–25 panel labels: (b) protein subunit; RNA.]

4.4 Protein Denaturation and Folding

All proteins begin their existence on a ribosome as a linear sequence of amino acid residues (Chapter 27). This polypeptide must fold during and following synthesis to take up its native conformation.
We have seen that a native protein conformation is only marginally stable. Modest changes in the protein’s environment can bring about structural changes that can affect function. We now explore the transition that occurs between the folded and unfolded states.

Loss of Protein Structure Results in Loss of Function

Protein structures have evolved to function in particular cellular environments. Conditions different from those in the cell can result in protein structural changes, large and small. A loss of three-dimensional structure sufficient to cause loss of function is called denaturation. The denatured state does not necessarily equate with complete unfolding of the protein and randomization of conformation. Under most conditions, denatured proteins exist in a set of partially folded states that are poorly understood.

Most proteins can be denatured by heat, which affects the weak interactions in a protein (primarily hydrogen bonds) in a complex manner. If the temperature is increased slowly, a protein’s conformation generally remains intact until an abrupt loss of structure (and function) occurs over a narrow temperature range (Fig. 4–26). The abruptness of the change suggests that unfolding is a cooperative process: loss of structure in one part of the protein destabilizes other parts. The effects of heat on proteins are not readily predictable. The very heat-stable proteins of thermophilic bacteria have evolved to function at the temperature of hot springs (~100 °C). Yet the structures of these proteins often differ only slightly from those of homologous proteins derived from bacteria such as Escherichia coli. How these small differences promote structural stability at high temperatures is not yet understood.

Proteins can be denatured not only by heat but by extremes of pH, by certain miscible organic solvents such as alcohol or acetone, by certain solutes such as urea and guanidine hydrochloride, or by detergents.
Each of these denaturing agents represents a relatively mild treatment in the sense that no covalent bonds in the polypeptide chain are broken. Organic solvents, urea, and detergents act primarily by disrupting the hydrophobic interactions that make up the stable core of globular proteins; extremes of pH alter the net charge on the protein, causing electrostatic repulsion and the disruption of some hydrogen bonding. The denatured states obtained with these various treatments need not be equivalent.

FIGURE 4–26 Protein denaturation. Results are shown for proteins denatured by two different environmental changes. In each case, the transition from the folded to unfolded state is fairly abrupt, suggesting cooperativity in the unfolding process. (a) Thermal denaturation of horse apomyoglobin (myoglobin without the heme prosthetic group) and ribonuclease A (with its disulfide bonds intact; see Fig. 4–27). The midpoint of the temperature range over which denaturation occurs is called the melting temperature, or T_m. The denaturation of apomyoglobin was monitored by circular dichroism, a technique that measures the amount of helical structure in a macromolecule. Denaturation of ribonuclease A was tracked by monitoring changes in the intrinsic fluorescence of the protein, which is affected by changes in the environment of Trp residues. (b) Denaturation of disulfide-intact ribonuclease A by guanidine hydrochloride (GdnHCl), monitored by circular dichroism.

[Figure 4–26 plots: (a) percent of maximum signal versus temperature (°C) for ribonuclease A and apomyoglobin, with T_m marked on each curve; (b) percent unfolded versus [GdnHCl] (M) for ribonuclease A, with T_m marked.]

Amino Acid Sequence Determines Tertiary Structure

The tertiary structure of a globular protein is determined by its amino acid sequence.
The most important proof of this came from experiments showing that denaturation of some proteins is reversible. Certain globular proteins denatured by heat, extremes of pH, or denaturing reagents will regain their native structure and their biological activity if returned to conditions in which the native conformation is stable. This process is called renaturation.

A classic example is the denaturation and renaturation of ribonuclease. Purified ribonuclease can be completely denatured by exposure to a concentrated urea solution in the presence of a reducing agent. The reducing agent cleaves the four disulfide bonds to yield eight Cys residues, and the urea disrupts the stabilizing hydrophobic interactions, thus freeing the entire polypeptide from its folded conformation. Denaturation of ribonuclease is accompanied by a complete loss of catalytic activity. When the urea and the reducing agent are removed, the randomly coiled, denatured ribonuclease spontaneously refolds into its correct tertiary structure, with full restoration of its catalytic activity (Fig. 4–27). The refolding of ribonuclease is so accurate that the four intrachain disulfide bonds are re-formed in the same positions in the renatured molecule as in the native ribonuclease. As calculated mathematically, the eight Cys residues could recombine at random to form up to four disulfide bonds in 105 different ways. In fact, an essentially random distribution of disulfide bonds is obtained when the disulfides are allowed to re-form in the presence of denaturant, indicating that weak bonding interactions are required for correct positioning of disulfide bonds and assumption of the native conformation.

This classic experiment, carried out by Christian Anfinsen in the 1950s, provided the first evidence that the amino acid sequence of a polypeptide chain contains all the information required to fold the chain into its native, three-dimensional structure.
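The figure of 105 quoted above for random disulfide pairing can be checked by brute force. The sketch below enumerates every way of splitting eight Cys residues into four pairs and compares the count with the closed form n!/(2^(n/2) (n/2)!). The function names are hypothetical. The residue positions are those labeled in Figure 4–27; the native pairing listed in `NATIVE` is not stated in the text and is included as an assumption (it is the pairing usually reported for bovine ribonuclease A).

```python
from math import factorial


def pairings(items):
    """Yield every way to split `items` (even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def count_pairings(n):
    """Closed form for the number of complete pairings of n items (n even)."""
    half = n // 2
    return factorial(n) // (2 ** half * factorial(half))


# Cys positions as labeled in Figure 4-27.
CYS_POSITIONS = [26, 40, 58, 65, 72, 84, 95, 110]

# Assumption: native disulfide pairing, not given in the text itself.
NATIVE = {(26, 84), (40, 95), (58, 110), (65, 72)}

if __name__ == "__main__":
    all_pairings = list(pairings(CYS_POSITIONS))
    native_hits = sum(1 for p in all_pairings if set(p) == NATIVE)
    print("enumerated pairings:", len(all_pairings))
    print("closed-form count:  ", count_pairings(len(CYS_POSITIONS)))
    print("pairings matching the native pattern:", native_hits)
```

Under purely random pairing only one arrangement in the full set would be the native one, which is why recovery of full activity on renaturation is such a strong argument that the sequence itself directs folding.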
Later, similar results were obtained using chemically synthesized, catalytically active ribonuclease. This eliminated the possibility that some minor contaminant in Anfinsen’s purified ribonuclease preparation might have contributed to the renaturation of the enzyme, thus dispelling any remaining doubt that this enzyme folds spontaneously.

Polypeptides Fold Rapidly by a Stepwise Process

In living cells, proteins are assembled from amino acids at a very high rate. For example, E. coli cells can make a complete, biologically active protein molecule containing 100 amino acid residues in about 5 seconds at 37 °C. How does such a polypeptide chain arrive at its native conformation? Let’s assume conservatively that each of the amino acid residues could take up 10 different conformations on average, giving 10^100 different conformations for the polypeptide. Let’s also assume that the protein folds itself spontaneously by a random process in which it tries out all possible conformations around every single bond in its backbone until it finds its native, biologically active form. If each conformation were sampled in the shortest possible time (~10^−13 second, or the time required for a single molecular vibration), it would take about 10^77 years to sample all possible conformations. Thus protein folding cannot be a completely random, trial-and-error process. There must be shortcuts. This problem was first pointed out by Cyrus Levinthal in 1968 and is sometimes called Levinthal’s paradox.

The folding pathway of a large polypeptide chain is unquestionably complicated, and not all the principles that guide the process have been worked out.
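The back-of-envelope estimate behind Levinthal’s paradox, described above, can be reproduced in a few lines. The sketch below works in base-10 logarithms so that the very large numbers stay manageable; the function name and the choice of a 365.25-day year are assumptions made for this illustration. With the inputs stated in the text the arithmetic lands near 10^79 years, within a couple of orders of magnitude of the value quoted above, and the argument does not depend on the exact exponent.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year, about 3.16e7 s


def log10_search_years(residues, conformations_per_residue, seconds_per_sample):
    """Return log10 of the years needed to visit every conformation once."""
    log10_conformations = residues * math.log10(conformations_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_sample)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Values stated in the text: 100 residues, 10 conformations per residue,
    # and roughly 1e-13 s to sample each conformation.
    exponent = log10_search_years(100, 10, 1e-13)
    print(f"exhaustive search: about 10^{exponent:.1f} years")

    # The text quotes about 5 s for E. coli to make an active 100-residue protein.
    log10_ratio = exponent + math.log10(SECONDS_PER_YEAR) - math.log10(5)
    print(f"ratio of search time to observed time: about 10^{log10_ratio:.0f}")
```

Changing the per-residue conformation count or the sampling time shifts the exponent but never brings an exhaustive search anywhere near biological time scales, which is the point of the paradox.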
However, extensive study has led to the development of several plausible models. In one, the folding process is envisioned as hierarchical. Local secondary structures form first. Certain amino acid sequences fold readily into α helices or β sheets, guided by constraints we have reviewed in our discussion of secondary structure. This is followed by longer-range interactions between, say, two α helices that come together to form stable supersecondary structures. The process continues until complete domains form and the entire polypeptide is folded (Fig. 4–28).

FIGURE 4–27 Renaturation of unfolded, denatured ribonuclease. Urea is used to denature ribonuclease, and mercaptoethanol (HOCH₂CH₂SH) to reduce and thus cleave the disulfide bonds to yield eight Cys residues. Renaturation involves reestablishment of the correct disulfide cross-links.

[Figure 4–27 panel labels: native state, catalytically active; addition of urea and mercaptoethanol gives the unfolded, inactive state, with disulfide cross-links reduced to yield Cys residues; removal of urea and mercaptoethanol restores the native, catalytically active state, with disulfide cross-links correctly re-formed. Cys residues are labeled at positions 26, 40, 58, 65, 72, 84, 95, and 110.]

In an alternative model, folding is initiated by a spontaneous collapse of the polypeptide into a compact state, mediated by hydrophobic interactions among nonpolar residues. The state resulting from this “hydrophobic collapse” may have a high content of secondary structure, but many amino acid side chains are not entirely fixed. The collapsed state is often referred to as a molten globule. Most proteins probably fold by a process that incorporates features of both models.
Instead of following a single pathway, a population of peptide molecules may take a variety of routes to the same end point, with the number of different partly folded conformational species decreasing as folding nears completion.

Thermodynamically, the folding process can be viewed as a kind of free-energy funnel (Fig. 4–29). The unfolded states are characterized by a high degree of conformational entropy and relatively high free energy. As folding proceeds, the narrowing of the funnel represents a decrease in the number of conformational species present. Small depressions along the sides of the free-energy funnel represent semistable intermediates that can briefly slow the folding process. At the bottom of the funnel, an ensemble of folding intermediates has been reduced to a single native conformation (or one of a small set of native conformations).

FIGURE 4–28 A simulated folding pathway. The folding pathway of a 36-residue segment of the protein villin (an actin-binding protein found principally in the microvilli lining the intestine) was simulated by computer. The process started with the randomly coiled peptide and 3,000 surrounding water molecules in a virtual “water box.” The molecular motions of the peptide and the effects of the water molecules were taken into account in mapping the most likely paths to the final structure among the countless alternatives. The simulated folding took place in a theoretical time span of 1 μs; however, the calculation required half a billion integration steps on two Cray supercomputers, each running for two months.

FIGURE 4–29 The thermodynamics of protein folding depicted as a free-energy funnel. At the top, the number of conformations, and hence the conformational entropy, is large. Only a small fraction of the intramolecular interactions that will exist in the native conformation are present. As folding progresses, the thermodynamic path down the funnel reduces the number of states present (decreases entropy), increases the amount of protein in the native conformation, and decreases the free energy. Depressions on the sides of the funnel represent semistable folding intermediates, which may, in some cases, slow the folding process.

[Figure 4–29 labels: axes are energy, entropy, and percentage of residues of protein in native conformation (0 to 100); regions marked are the beginning of helix formation and collapse, molten globule states, discrete folding intermediates, and the native structure.]

Defects in protein folding may be the molecular basis for a wide range of human genetic disorders. For example, cystic fibrosis is caused by defects in a membrane-bound protein called cystic fibrosis transmembrane conductance regulator (CFTR), which acts as a channel for chloride ions. The most common cystic fibrosis–causing mutation is the deletion of a Phe residue at position 508 in CFTR, which causes improper protein folding (see Box 11–3). Many of the disease-related mutations in collagen (p. 129) also cause defective folding. An improved understanding of protein folding may lead to new therapies for these and many other diseases (Box 4–5).

Thermodynamic stability is not evenly distributed over the structure of a protein: the molecule has regions of high and low stability. For example, a protein

BOX 4–5 BIOCHEMISTRY IN MEDICINE

Death by Misfolding: The Prion Diseases

A misfolded protein appears to be the causative agent of a number of rare degenerative brain diseases in mammals. Perhaps the best known of these is mad cow disease (bovine spongiform encephalopathy, BSE), an outbreak of which made international headlines in the spring of 1996. Related diseases include kuru and Creutzfeldt-Jakob disease in humans, scrapie in sheep, and chronic wasting disease in deer and elk.
These diseases are also referred to as spongiform en- cephalopathies, because the diseased brain frequently becomes riddled with holes (Fig. 1). Typical symptoms include dementia and loss of coordination. The dis- eases are fatal. In the 1960s, investigators found that prepara- tions of the disease-causing agents appeared to lack nucleic acids. At this time, Tikvah Alper suggested that the agent was a protein. Initially, the idea seemed heretical. All disease-causing agents known up to that time—viruses, bacteria, fungi, and so on—contained nucleic acids, and their virulence was related to ge- netic reproduction and propagation. However, four decades of investigations, pursued most notably by Stanley Prusiner, have provided evidence that spongi- form encephalopathies are different. The infectious agent has been traced to a single protein (M r 28,000), which Prusiner dubbed prion (from proteinaceous infectious only) protein (PrP). Prion protein is a normal constituent of brain tissue in all mammals. Its role in the mammalian brain is not known in detail, but it appears to have a molecular signaling function. Strains of mice lacking the gene for PrP (and thus the protein itself) suffer no obvious ill effects. Illness occurs only when the normal cellular PrP, or PrP C , occurs in an altered conformation called PrP Sc (Sc denotes scrapie). The interaction of PrP Sc with PrP C converts the latter to PrP Sc , initiating a domino effect in which more and more of the brain protein converts to the disease-causing form. The mechanism by which the presence of PrP Sc leads to spongiform encephalopathy is not understood. In inherited forms of prion diseases, a mutation in the gene encoding PrP produces a change in one amino acid residue that is believed to make the conversion of PrP C to PrP Sc more likely. A complete understanding of prion diseases awaits new information about how prion protein affects brain function. 
Structural information about PrP is beginning to provide insights into the mo- lecular process that allows the prion proteins to inter- act so as to alter their conformation (Fig. 2). FIGURE 1 A stained section of the cerebral cortex from a patient with Creutzfeldt-Jakob disease shows spongiform (vacuolar) degen- eration, the most characteristic neurohistological feature. The yel- lowish vacuoles are intracellular and occur mostly in pre- and post- synaptic processes of neurons. The vacuoles in this section vary in diameter from 20 to 100 H9262m. FIGURE 2 The structure of the globular domain of human PrP in monomeric (left) and dimeric (right) forms. The second subunit is gray to highlight the dramatic conformational change in the green H9251 helix when the dimer is formed. 8885d_c04_150 12/23/03 7:52 AM Page 150 mac111 mac111:reb: may have two stable domains joined by a segment with lower structural stability, or one small part of a domain may have a lower stability than the remainder. The re- gions of low stability allow a protein to alter its confor- mation between two or more states. As we shall see in the next two chapters, variations in the stability of re- gions within a given protein are often essential to pro- tein function. Some Proteins Undergo Assisted Folding Not all proteins fold spontaneously as they are synthe- sized in the cell. Folding for many proteins is facilitated by the action of specialized proteins. Molecular chap- erones are proteins that interact with partially folded or improperly folded polypeptides, facilitating correct folding pathways or providing microenvironments in which folding can occur. Two classes of molecular chap- erones have been well studied. Both are found in or- ganisms ranging from bacteria to humans. The first class, a family of proteins called Hsp70, generally have a molecular weight near 70,000 and are more abundant in cells stressed by elevated temperatures (hence, heat shock proteins of M r 70,000, or Hsp70). 
Hsp70 proteins bind to regions of unfolded polypeptides that are rich in hydrophobic residues, preventing inappropriate aggregation. These chaperones thus “protect” proteins that have been denatured by heat and peptides that are being synthesized (and are not yet folded). Hsp70 proteins also block the folding of certain proteins that must remain unfolded until they have been translocated across membranes (as described in Chapter 27). Some chaperones also facilitate the quaternary assembly of oligomeric proteins. The Hsp70 proteins bind to and release polypeptides in a cycle that also involves several other proteins (including a class called Hsp40) and ATP hydrolysis. Figure 4–30 illustrates chaperone-assisted folding as elucidated for the chaperones DnaK and DnaJ in E. coli, homologs of the eukaryotic Hsp70 and Hsp40. DnaK and DnaJ were first identified as proteins required for in vitro replication of certain viral DNA molecules (hence the “Dna” designation).

FIGURE 4–30 Chaperones in protein folding. The cyclic pathway by which chaperones bind and release polypeptides is illustrated for the E. coli chaperone proteins DnaK and DnaJ, homologs of the eukaryotic chaperones Hsp70 and Hsp40. The chaperones do not actively promote the folding of the substrate protein, but instead prevent aggregation of unfolded peptides. For a population of polypeptides, some fraction of the polypeptides released at the end of the cycle are in the native conformation. The remainder are rebound by DnaK or are diverted to the chaperonin system (GroEL; see Fig. 4–31). In bacteria, a protein called GrpE interacts transiently with DnaK late in the cycle (step 3), promoting dissociation of ADP and possibly DnaJ. No eukaryotic analog of GrpE is known.
Steps shown in the figure:
1. DnaJ binds to the unfolded or partially folded protein and then to DnaK.
2. DnaJ stimulates ATP hydrolysis by DnaK. DnaK–ADP binds tightly to the unfolded protein.
3. In bacteria, the nucleotide-exchange factor GrpE stimulates release of ADP.
4. ATP binds to DnaK and the protein dissociates.
[Figure labels: Unfolded protein; Partially folded protein; Folded protein (native conformation); To GroEL system.]

The second class of chaperones is called chaperonins. These are elaborate protein complexes required for the folding of a number of cellular proteins that do not fold spontaneously. In E. coli an estimated 10% to 15% of cellular proteins require the resident chaperonin system, called GroEL/GroES, for folding under normal conditions (up to 30% require this assistance when the cells are heat stressed). These proteins first became known when they were found to be necessary for the growth of certain bacterial viruses (hence the designation “Gro”). Unfolded proteins are bound within pockets in the GroEL complex, and the pockets are capped transiently by the GroES “lid” (Fig. 4–31). GroEL undergoes substantial conformational changes, coupled to ATP hydrolysis and the binding and release of GroES, which promote folding of the bound polypeptide. Although the structure of the GroEL/GroES chaperonin is known, many details of its mechanism of action remain unresolved.

FIGURE 4–31 Chaperonins in protein folding. (a) A proposed pathway for the action of the E. coli chaperonins GroEL (a member of the Hsp60 protein family) and GroES. Each GroEL complex consists of two large pockets formed by two heptameric rings (each subunit Mᵣ 57,000). GroES, also a heptamer (subunits Mᵣ 10,000), blocks one of the GroEL pockets. (b) Surface and cut-away images of the GroEL/GroES complex (PDB ID 1AON). The cut-away (right) illustrates the large interior space within which other proteins are bound.
Steps shown in panel (a):
1. Unfolded protein binds to the GroEL pocket not blocked by GroES.
2. ATP binds to each subunit of the GroEL heptamer.
3. ATP hydrolysis leads to release of 14 ADP and GroES.
4. 7 ATP and GroES bind to GroEL with a filled pocket.
5. Protein folds inside the enclosure.
6. The released protein is fully folded or in a partially folded state that is committed to adopt the native conformation.
7. Proteins not folded when released are rapidly bound again.

Finally, the folding pathways of a number of proteins require two enzymes that catalyze isomerization reactions. Protein disulfide isomerase (PDI) is a widely distributed enzyme that catalyzes the interchange or shuffling of disulfide bonds until the bonds of the native conformation are formed. Among its functions, PDI catalyzes the elimination of folding intermediates with inappropriate disulfide cross-links. Peptide prolyl cis-trans isomerase (PPI) catalyzes the interconversion of the cis and trans isomers of Pro peptide bonds (Fig. 4–8b), which can be a slow step in the folding of proteins that contain some Pro residue peptide bonds in the cis conformation.

Protein folding is likely to be a more complex process in the densely packed cellular environment than in the test tube. More classes of proteins that facilitate protein folding may be discovered as the biochemical dissection of the folding process continues.

SUMMARY 4.4 Protein Denaturation and Folding

■ The three-dimensional structure and the function of proteins can be destroyed by denaturation, demonstrating a relationship between structure and function. Some denatured proteins can renature spontaneously to form biologically active protein, showing that protein tertiary structure is determined by amino acid sequence.

■ Protein folding in cells probably involves multiple pathways.
Initially, regions of secondary structure may form, followed by folding into supersecondary structures. Large ensembles of folding intermediates are rapidly brought to a single native conformation.

■ For many proteins, folding is facilitated by Hsp70 chaperones and by chaperonins. Disulfide bond formation and the cis-trans isomerization of Pro peptide bonds are catalyzed by specific enzymes.

Key Terms

conformation; native conformation; solvation layer; peptide group; Ramachandran plot; secondary structure; α helix; β conformation; β sheet; β turn; tertiary structure; quaternary structure; fibrous proteins; globular proteins; α-keratin; collagen; silk fibroin; supersecondary structures; motif; fold; domain; protein family; multimer; oligomer; protomer; symmetry; denaturation; molten globule; prion; molecular chaperone; Hsp70; chaperonin

Terms in bold are defined in the glossary.

Problems

1. Properties of the Peptide Bond In x-ray studies of crystalline peptides, Linus Pauling and Robert Corey found that the C–N bond in the peptide link is intermediate in length (1.32 Å) between a typical C–N single bond (1.49 Å) and a C=N double bond (1.27 Å). They also found that the peptide bond is planar (all four atoms attached to the C–N group are located in the same plane) and that the two α-carbon atoms attached to the C–N are always trans to each other (on opposite sides of the peptide bond):

[Structure: the planar peptide group, Cα–C(=O)–N(H)–Cα, with the two α carbons trans to each other.]

(a) What does the length of the C–N bond in the peptide linkage indicate about its strength and its bond order (i.e., whether it is single, double, or triple)?
(b) What do the observations of Pauling and Corey tell us about the ease of rotation about the C–N peptide bond?

2. Structural and Functional Relationships in Fibrous Proteins William Astbury discovered that the x-ray pattern of wool shows a repeating structural unit spaced about 5.2 Å along the length of the wool fiber. When he steamed and stretched the wool, the x-ray pattern showed a new repeating structural unit at a spacing of 7.0 Å. Steaming and stretching the wool and then letting it shrink gave an x-ray pattern consistent with the original spacing of about 5.2 Å. Although these observations provided important clues to the molecular structure of wool, Astbury was unable to interpret them at the time.
(a) Given our current understanding of the structure of wool, interpret Astbury’s observations.
(b) When wool sweaters or socks are washed in hot water or heated in a dryer, they shrink. Silk, on the other hand, does not shrink under the same conditions. Explain.

3. Rate of Synthesis of Hair α-Keratin Hair grows at a rate of 15 to 20 cm/yr. All this growth is concentrated at the base of the hair fiber, where α-keratin filaments are synthesized inside living epidermal cells and assembled into ropelike structures (see Fig. 4–11). The fundamental structural element of α-keratin is the α helix, which has 3.6 amino acid residues per turn and a rise of 5.4 Å per turn (see Fig. 4–4b). Assuming that the biosynthesis of α-helical keratin chains is the rate-limiting factor in the growth of hair, calculate the rate at which peptide bonds of α-keratin chains must be synthesized (peptide bonds per second) to account for the observed yearly growth of hair.

4. Effect of pH on the Conformation of α-Helical Secondary Structures The unfolding of the α helix of a polypeptide to a randomly coiled conformation is accompanied by a large decrease in a property called its specific rotation, a measure of a solution’s capacity to rotate plane-polarized light. Polyglutamate, a polypeptide made up of only L-Glu residues, has the α-helical conformation at pH 3.
When the pH is raised to 7, there is a large decrease in the specific rotation of the solution. Similarly, polylysine (L-Lys residues) is an α helix at pH 10, but when the pH is lowered to 7 the specific rotation also decreases, as shown by the following graph.

[Graph: specific rotation versus pH (0 to 14). Poly(Glu) is an α helix at low pH and a random conformation at higher pH; poly(Lys) is a random conformation at lower pH and an α helix at high pH.]

What is the explanation for the effect of the pH changes on the conformations of poly(Glu) and poly(Lys)? Why does the transition occur over such a narrow range of pH?

5. Disulfide Bonds Determine the Properties of Many Proteins A number of natural proteins are very rich in disulfide bonds, and their mechanical properties (tensile strength, viscosity, hardness, etc.) are correlated with the degree of disulfide bonding. For example, glutenin, a wheat protein rich in disulfide bonds, is responsible for the cohesive and elastic character of dough made from wheat flour. Similarly, the hard, tough nature of tortoise shell is due to the extensive disulfide bonding in its α-keratin.
(a) What is the molecular basis for the correlation between disulfide-bond content and mechanical properties of the protein?
(b) Most globular proteins are denatured and lose their activity when briefly heated to 65 °C. However, globular proteins that contain multiple disulfide bonds often must be heated longer at higher temperatures to denature them. One such protein is bovine pancreatic trypsin inhibitor (BPTI), which has 58 amino acid residues in a single chain and contains three disulfide bonds. On cooling a solution of denatured BPTI, the activity of the protein is restored. What is the molecular basis for this property?

6. Amino Acid Sequence and Protein Structure Our growing understanding of how proteins fold allows researchers to make predictions about protein structure based on primary amino acid sequence data.

  1   2   3   4   5   6   7   8   9  10
Ile Ala His Thr Tyr Gly Pro Phe Glu Ala
 11  12  13  14  15  16  17  18  19  20
Ala Met Cys Lys Trp Glu Ala Gln Pro Asp
 21  22  23  24  25  26  27  28
Gly Met Glu Cys Ala Phe His Arg

(a) In the amino acid sequence above, where would you predict that bends or β turns would occur?
(b) Where might intrachain disulfide cross-linkages be formed?
(c) Assuming that this sequence is part of a larger globular protein, indicate the probable location (the external surface or interior of the protein) of the following amino acid residues: Asp, Ile, Thr, Ala, Gln, Lys. Explain your reasoning. (Hint: See the hydropathy index in Table 3–1.)

7. Bacteriorhodopsin in Purple Membrane Proteins Under the proper environmental conditions, the salt-loving bacterium Halobacterium halobium synthesizes a membrane protein (Mᵣ 26,000) known as bacteriorhodopsin, which is purple because it contains retinal (see Fig. 10–21). Molecules of this protein aggregate into “purple patches” in the cell membrane. Bacteriorhodopsin acts as a light-activated proton pump that provides energy for cell functions. X-ray analysis of this protein reveals that it consists of seven parallel α-helical segments, each of which traverses the bacterial cell membrane (thickness 45 Å). Calculate the minimum number of amino acid residues necessary for one segment of α helix to traverse the membrane completely. Estimate the fraction of the bacteriorhodopsin protein that is involved in membrane-spanning helices. (Use an average amino acid residue weight of 110.)

8. Pathogenic Action of Bacteria That Cause Gas Gangrene The highly pathogenic anaerobic bacterium Clostridium perfringens is responsible for gas gangrene, a condition in which animal tissue structure is destroyed. This bacterium secretes an enzyme that efficiently catalyzes the hydrolysis of the peptide bond indicated in red:

–X–Gly–Pro–Y– + H₂O → –X–COO⁻ + H₃N⁺–Gly–Pro–Y–

[The bond hydrolyzed, shown in red in the original figure, is the peptide bond between X and Gly.]

where X and Y are any of the 20 common amino acids. How does the secretion of this enzyme contribute to the invasiveness of this bacterium in human tissues? Why does this enzyme not affect the bacterium itself?

9. Number of Polypeptide Chains in a Multisubunit Protein A sample (660 mg) of an oligomeric protein of Mᵣ 132,000 was treated with an excess of 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent) under slightly alkaline conditions until the chemical reaction was complete. The peptide bonds of the protein were then completely hydrolyzed by heating it with concentrated HCl. The hydrolysate was found to contain 5.5 mg of the following compound:

[Structure: a 2,4-dinitrophenyl ring (substituents O₂N– and –NO₂) joined through –NH– to an α carbon that carries –H, –COOH, and a –CH(CH₃)₂ side chain.]

2,4-Dinitrophenyl derivatives of the α-amino groups of other amino acids could not be found.
(a) Explain how this information can be used to determine the number of polypeptide chains in an oligomeric protein.
(b) Calculate the number of polypeptide chains in this protein.
(c) What other protein analysis technique could you employ to determine whether the polypeptide chains in this protein are similar or different?

Biochemistry on the Internet

10. Protein Modeling on the Internet A group of patients suffering from Crohn’s disease (an inflammatory bowel disease) underwent biopsies of their intestinal mucosa in an attempt to identify the causative agent. A protein was identified that was expressed at higher levels in patients with Crohn’s disease than in patients with an unrelated inflammatory bowel disease or in unaffected controls.
The protein was isolated and the following partial amino acid sequence was obtained (reads left to right):

EAELCPDRCI HSFQNLGIQC VKKRDLEQAI SQRIQTNNNP FQVPIEEQRG
DYDLNAVRLC FQVTVRDPSG RPLRLPPVLP HPIFDNRAPN TAELKICRVN
RNSGSCLGGD EIFLLCDKVQ KEDIEVYFTG PGWEARGSFS QADVHRQVAI
VFRTPPYADP SLQAPVRVSM QLRRPSDREL SEPMEFQYLP DTDDRHRIEE
KRKRTYETFK SIMKKSPFSG PTDPRPPPRR IAVPSRSSAS VPKPAPQPYP

(a) You can identify this protein using a protein database on the Internet. Some good places to start include Protein Information Resource (PIR; pir.georgetown.edu/pirwww), Structural Classification of Proteins (SCOP; http://scop.berkeley.edu), and Prosite (http://us.expasy.org/prosite). At your selected database site, follow links to locate the sequence comparison engine. Enter about 30 residues from the sequence of the protein in the appropriate search field and submit it for analysis. What does this analysis tell you about the identity of the protein?
(b) Try using different portions of the protein amino acid sequence. Do you always get the same result?
(c) A variety of websites provide information about the three-dimensional structure of proteins. Find information about the protein’s secondary, tertiary, and quaternary structure using database sites such as the Protein Data Bank (PDB; www.rcsb.org/pdb) or SCOP.
(d) In the course of your Web searches try to find information about the cellular function of the protein.
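The query fragments for parts (a) and (b) of Problem 10 can be prepared programmatically. The following is a minimal Python sketch and is not part of the original problem; the names `PARTIAL_SEQUENCE`, `clean_sequence`, and `query_windows` are hypothetical helpers introduced here for illustration. It only removes the spacing from the sequence as printed and slices it into consecutive 30-residue fragments that can be pasted into a database search field; it does not contact any database and says nothing about the identity of the protein.

```python
# Partial amino acid sequence from Problem 10, in the 10-residue blocks
# used in the text (one-letter amino acid codes).
PARTIAL_SEQUENCE = """
EAELCPDRCI HSFQNLGIQC VKKRDLEQAI SQRIQTNNNP FQVPIEEQRG
DYDLNAVRLC FQVTVRDPSG RPLRLPPVLP HPIFDNRAPN TAELKICRVN
RNSGSCLGGD EIFLLCDKVQ KEDIEVYFTG PGWEARGSFS QADVHRQVAI
VFRTPPYADP SLQAPVRVSM QLRRPSDREL SEPMEFQYLP DTDDRHRIEE
KRKRTYETFK SIMKKSPFSG PTDPRPPPRR IAVPSRSSAS VPKPAPQPYP
"""

# One-letter codes of the 20 common amino acids.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Strip all whitespace and check that only standard residues remain."""
    sequence = "".join(raw.split()).upper()
    unexpected = set(sequence) - STANDARD_RESIDUES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence


def query_windows(sequence, size=30, step=30):
    """Yield (start, fragment) pairs; start is the 1-based residue number."""
    for i in range(0, len(sequence) - size + 1, step):
        yield i + 1, sequence[i:i + size]


if __name__ == "__main__":
    seq = clean_sequence(PARTIAL_SEQUENCE)
    for start, fragment in query_windows(seq):
        end = start + len(fragment) - 1
        print(f"residues {start}-{end}: {fragment}")
```

Setting `step` to a value smaller than `size` yields overlapping fragments, which may be convenient when comparing results from different portions of the sequence in part (b).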