CHAPTER 27
AMINO ACIDS, PEPTIDES, AND PROTEINS. NUCLEIC ACIDS

The relationship between structure and function reaches its ultimate expression in the chemistry of amino acids, peptides, and proteins.

Amino acids are carboxylic acids that contain an amine function. Under certain conditions the amine group of one molecule and the carboxyl group of a second can react, uniting the two amino acids by an amide bond. Amide linkages between amino acids are known as peptide bonds, and the product of peptide bond formation between two amino acids is called a dipeptide. The peptide chain may be extended to incorporate three amino acids in a tripeptide, four in a tetrapeptide, and so on. Polypeptides contain many amino acid units. Proteins are naturally occurring polypeptides that contain more than 50 amino acid units; most proteins are polymers of 100–300 amino acids.

The most striking thing about proteins is the diversity of their roles in living systems: silk, hair, skin, muscle, and connective tissue are proteins, and almost all enzymes are proteins. As in most aspects of chemistry and biochemistry, structure is the key to function. We'll explore the structure of proteins by first concentrating on their fundamental building block units, the α-amino acids. Then, after developing the principles of peptide structure, we'll see how the insights gained from these smaller molecules aid our understanding of proteins.

    H₃N⁺CH(R)CO₂⁻ + H₃N⁺CH(R′)CO₂⁻  →  H₃N⁺CH(R)C(=O)NHCH(R′)CO₂⁻ + H₂O
    Two α-amino acids                   Dipeptide                      Water
    (The C(=O)NH linkage in the dipeptide is the amide, or peptide, bond.)

The chapter concludes with a discussion of the nucleic acids, which are the genetic material of living systems and which direct the biosynthesis of proteins.
These two types of biopolymers, nucleic acids and proteins, are the organic chemicals of life.

27.1 CLASSIFICATION OF AMINO ACIDS

Amino acids are classified as α, β, γ, and so on, according to the location of the amine group on the carbon chain that contains the carboxylic acid function.

    1-Aminocyclopropanecarboxylic acid (cyclopropane ring bearing NH₃⁺ and CO₂⁻ on the same carbon): an α-amino acid that is the biological precursor to ethylene in plants
    H₃N⁺CH₂CH₂CO₂⁻  3-Aminopropanoic acid: known as β-alanine, it is a β-amino acid that makes up one of the structural units of coenzyme A
    H₃N⁺CH₂CH₂CH₂CO₂⁻  4-Aminobutanoic acid: known as γ-aminobutyric acid (GABA), it is a γ-amino acid and is involved in the transmission of nerve impulses

Although more than 700 different amino acids are known to occur naturally, a group of 20 of them commands special attention. These 20 are the amino acids that are normally present in proteins and are shown in Figure 27.1 and in Table 27.1. All the amino acids from which proteins are derived are α-amino acids, and all but one of these contain a primary amino function and conform to the general structure

    RCH(NH₃⁺)CO₂⁻

The one exception is proline, a secondary amine in which the amino nitrogen is incorporated into a five-membered ring.

Table 27.1 includes three-letter and one-letter abbreviations for the amino acids. Both enjoy wide use.

Our bodies can make some of the amino acids shown in the table. The others, which are called essential amino acids, we have to get from what we eat.

Margin note: The graphic that opened this chapter is an electrostatic potential map of glycine.

FIGURE 27.1 Electrostatic potential maps of the 20 common amino acids listed in Table 27.1. Each amino acid is oriented so that its side chain is in the upper left corner. The side chains affect the shape and properties of the amino acids. (Maps not reproduced. Panels: amino acids with nonpolar side chains; with polar but nonionized side chains; with acidic side chains; with basic side chains.)

TABLE 27.1 α-Amino Acids Found in Proteins

Name             Abbreviation   Structural formula*

Amino acids with nonpolar side chains
Glycine          Gly (G)        H₃N⁺CH₂CO₂⁻
Alanine          Ala (A)        CH₃CH(NH₃⁺)CO₂⁻
Valine†          Val (V)        (CH₃)₂CHCH(NH₃⁺)CO₂⁻
Leucine†         Leu (L)        (CH₃)₂CHCH₂CH(NH₃⁺)CO₂⁻
Isoleucine†      Ile (I)        CH₃CH₂CH(CH₃)CH(NH₃⁺)CO₂⁻
Methionine†      Met (M)        CH₃SCH₂CH₂CH(NH₃⁺)CO₂⁻
Proline          Pro (P)        five-membered ring made up of NH₂⁺, CH(CO₂⁻), and three CH₂ groups
Phenylalanine†   Phe (F)        C₆H₅CH₂CH(NH₃⁺)CO₂⁻
Tryptophan†      Trp (W)        (indol-3-yl)CH₂CH(NH₃⁺)CO₂⁻

Amino acids with polar but nonionized side chains
Asparagine       Asn (N)        H₂NC(=O)CH₂CH(NH₃⁺)CO₂⁻
Glutamine        Gln (Q)        H₂NC(=O)CH₂CH₂CH(NH₃⁺)CO₂⁻
Serine           Ser (S)        HOCH₂CH(NH₃⁺)CO₂⁻
Threonine†       Thr (T)        CH₃CH(OH)CH(NH₃⁺)CO₂⁻

Amino acids with acidic side chains
Aspartic acid    Asp (D)        ⁻O₂CCH₂CH(NH₃⁺)CO₂⁻
Glutamic acid    Glu (E)        ⁻O₂CCH₂CH₂CH(NH₃⁺)CO₂⁻
Tyrosine         Tyr (Y)        p-HOC₆H₄CH₂CH(NH₃⁺)CO₂⁻
Cysteine         Cys (C)        HSCH₂CH(NH₃⁺)CO₂⁻

Amino acids with basic side chains
Lysine†          Lys (K)        H₃N⁺CH₂CH₂CH₂CH₂CH(NH₃⁺)CO₂⁻
Arginine†        Arg (R)        H₂NC(=NH₂⁺)NHCH₂CH₂CH₂CH(NH₃⁺)CO₂⁻
Histidine†       His (H)        (imidazol-4-yl)CH₂CH(NH₃⁺)CO₂⁻

*All amino acids are shown in the form present in greatest concentration at pH 7.
†An essential amino acid, which must be present in the diet of animals to ensure normal growth.

Margin note: Learning By Modeling contains electrostatic potential maps of all the amino acids in this table.

27.2 STEREOCHEMISTRY OF AMINO ACIDS

Glycine is the simplest amino acid and the only one in Table 27.1 that is achiral. The α-carbon atom is a stereogenic center in all the others. Configurations in amino acids are normally specified by the D, L notational system. All the chiral amino acids obtained from proteins have the L configuration at their α-carbon atom.

    Fischer projections (drawings not reproduced):
    Glycine (achiral): CO₂⁻ at the top, H at the bottom, H₃N⁺ at the left, H at the right.
    L-Amino acid: CO₂⁻ at the top, R at the bottom, H₃N⁺ at the left, H at the right.

PROBLEM 27.1 What is the absolute configuration (R or S) at the α carbon atom in each of the following L-amino acids?
(a) L-Serine (side chain CH₂OH)
(b) L-Cysteine (side chain CH₂SH)
(c) L-Methionine (side chain CH₂CH₂SCH₃)

SAMPLE SOLUTION (a) First identify the four groups attached directly to the stereogenic center, and rank them in order of decreasing sequence rule precedence. For L-serine these groups are

    H₃N⁺ (highest ranked) > CO₂⁻ > CH₂OH > H (lowest ranked)

Next, translate the Fischer projection of L-serine to a three-dimensional representation, and orient it so that the lowest ranked substituent at the stereogenic center is directed away from you. (Drawings not reproduced.) In order of decreasing precedence the three highest ranked groups trace an anticlockwise path. The absolute configuration of L-serine is S.

PROBLEM 27.2 Which of the amino acids in Table 27.1 have more than one stereogenic center?

Although all the chiral amino acids obtained from proteins have the L configuration at their α carbon, that should not be taken to mean that D-amino acids are unknown. In fact, quite a number of D-amino acids occur naturally. D-Alanine, for example, is a constituent of bacterial cell walls. The point is that D-amino acids are not constituents of proteins.

A new technique for dating archaeological samples called amino acid racemization (AAR) is based on the stereochemistry of amino acids. Over time, the configuration at the α-carbon atom of a protein's amino acids is lost in a reaction that follows first-order kinetics. When the α carbon is the only stereogenic center, this process corresponds to racemization. For an amino acid with two stereogenic centers, changing the configuration of the α carbon from L to D gives a diastereomer. In the case of isoleucine, for example, the diastereomer is an amino acid not normally present in proteins, called alloisoleucine.

    Fischer projections (drawings not reproduced): L-isoleucine and D-alloisoleucine, which differ in configuration only at the α carbon.

By measuring the L-isoleucine/D-alloisoleucine ratio in the protein isolated from the eggshells of an extinct Australian bird, a team of scientists recently determined that this bird lived approximately 50,000 years ago. Radiocarbon (¹⁴C) dating is not accurate for samples older than about 35,000 years, so AAR is a useful addition to the tools available to paleontologists.
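The first-order kinetics behind AAR dating can be illustrated with a short calculation. The sketch below is only a minimal model and rests on assumptions that are not in the text: the L and D forms interconvert reversibly with the same rate constant in both directions (the simple racemization case, in which the α carbon is the only stereogenic center), the sample starts as pure L, and the rate constant K_ASSUMED is a made-up value chosen for illustration. Real rate constants depend strongly on temperature and on the amino acid. Under these assumptions the D/L ratio after time t is tanh(kt).

```python
import math

# Hypothetical rate constant (per year); NOT a measured value.
K_ASSUMED = 1.0e-5


def dl_ratio(age_years, k_per_year):
    """D/L ratio after age_years, starting from pure L.

    Assumes reversible first-order L <-> D interconversion with the
    same rate constant k in both directions, which gives
    D/L = (1 - exp(-2kt)) / (1 + exp(-2kt)) = tanh(kt).
    """
    return math.tanh(k_per_year * age_years)


def age_from_dl_ratio(ratio, k_per_year):
    """Invert dl_ratio: t = atanh(D/L) / k."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("D/L ratio must lie in [0, 1) for this model")
    return math.atanh(ratio) / k_per_year


ratio = dl_ratio(50_000, K_ASSUMED)
print("D/L ratio after 50,000 years:", round(ratio, 3))
print("Age recovered from that ratio:", round(age_from_dl_ratio(ratio, K_ASSUMED)))
```

For an amino acid such as isoleucine, where inversion at the α carbon gives a diastereomer, the forward and reverse rate constants need not be equal, so this equal-rate model would be only a rough approximation.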
27.3 ACID–BASE BEHAVIOR OF AMINO ACIDS

The physical properties of a typical amino acid such as glycine suggest that it is a very polar substance, much more polar than would be expected on the basis of its formulation as H₂NCH₂CO₂H. Glycine is a crystalline solid; it does not melt, but on being heated it eventually decomposes at 233°C. It is very soluble in water but practically insoluble in nonpolar organic solvents. These properties are attributed to the fact that the stable form of glycine is a zwitterion, or inner salt.

    H₂NCH₂C(=O)OH  ⇌  H₃N⁺CH₂C(=O)O⁻
                       Zwitterionic form of glycine

The equilibrium expressed by the preceding equation lies overwhelmingly to the side of the zwitterion.

Margin note: The zwitterion is also often referred to as a dipolar ion. Note, however, that it is not an ion, but a neutral molecule.

Glycine, as well as other amino acids, is amphoteric, meaning it contains an acidic functional group and a basic functional group. The acidic functional group is the ammonium ion H₃N⁺; the basic functional group is the carboxylate ion CO₂⁻. How do we know this? Aside from its physical properties, the acid–base properties of glycine, as illustrated by the titration curve in Figure 27.2, require it. In a strongly acidic medium the species present is H₃N⁺CH₂CO₂H. As the pH is raised, a proton is removed from this species. Is the proton removed from the positively charged nitrogen or from the carboxyl group? We know what to expect for the relative acid strengths of RNH₃⁺ and RCO₂H. A typical ammonium ion has pKa ≈ 9, and a typical carboxylic acid has pKa ≈ 5. The measured pKa for the conjugate acid of glycine is 2.35, a value closer to that expected for deprotonation of the carboxyl group. As the pH is raised, a second deprotonation step, corresponding to removal of a proton from nitrogen of the zwitterion, is observed. The pKa associated with this step is 9.78, much like that of typical alkylammonium ions.

    H₃N⁺CH₂CO₂H        ⇌        H₃N⁺CH₂CO₂⁻        ⇌        H₂NCH₂CO₂⁻
    Species present              Zwitterion; predominant      Species present
    in strong acid               species in solutions         in strong base
                                 near neutrality
    (Each step to the right is loss of H⁺; each step to the left is gain of H⁺.)

Thus, glycine is characterized by two pKa values: the one corresponding to the more acidic site is designated pKa1, the one corresponding to the less acidic site is designated pKa2. Table 27.2 lists pKa1 and pKa2 values for the α-amino acids that have neutral side chains, which are the first two groups of amino acids given in Table 27.1. In all cases their pKa values are similar to those of glycine.

Table 27.2 includes a column labeled pI, which gives isoelectric point values. The isoelectric point is the pH at which the amino acid bears no net charge; it corresponds to the pH at which the concentration of the zwitterion is a maximum. For the amino acids in Table 27.2 this is the average of pKa1 and pKa2 and lies slightly to the acid side of neutrality.

Some amino acids, including those listed in the last two sections of Table 27.1, have side chains that bear acidic or basic groups. As Table 27.3 indicates, these amino acids are characterized by three pKa values. The "extra" pKa value (it can be either pKa2 or pKa3) reflects the nature of the function present in the side chain. The isoelectric points of the amino acids in Table 27.3 are midway between the pKa values of the monocation and monoanion and are well removed from neutrality when the side chain bears a carboxyl group (aspartic acid, for example) or a basic amine function (lysine, for example).
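The rule for locating the isoelectric point can be written as a small calculation. The sketch below is illustrative only: the function name and the `max_charge` parameter (the net charge of the fully protonated form) are choices made for this example, and the pKa values are the ones listed in Tables 27.2 and 27.3. The idea is that the neutral species is reached after `max_charge` deprotonations, so pI is the average of the two pKa values on either side of it.

```python
def isoelectric_point(pkas, max_charge):
    """Average the two pKa values that bracket the neutral species.

    pkas       -- all pKa values of the amino acid
    max_charge -- net charge of the fully protonated form
                  (+1 for neutral or acidic side chains,
                   +2 for lysine, arginine, and histidine)
    """
    ordered = sorted(pkas)
    # After `max_charge` deprotonations the species is neutral, so the
    # bracketing pKa values are at indexes max_charge-1 and max_charge.
    return (ordered[max_charge - 1] + ordered[max_charge]) / 2


# pKa values taken from Tables 27.2 and 27.3
examples = {
    "glycine": ([2.34, 9.60], 1),
    "aspartic acid": ([1.88, 3.65, 9.60], 1),
    "lysine": ([2.18, 8.95, 10.53], 2),
}

for name, (pkas, charge) in examples.items():
    print(name, round(isoelectric_point(pkas, charge), 2))
```

Passing the charge of the fully protonated form explicitly keeps the function free of any side-chain lookup; for aspartic acid the bracketing values are pKa1 and pKa2, and for lysine they are pKa2 and pKa3.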
FIGURE 27.2 The titration curve of glycine. At pH values less than pKa1, H₃N⁺CH₂CO₂H is the major species present. At pH values between pKa1 and pKa2, the principal species is the zwitterion H₃N⁺CH₂CO₂⁻. The concentration of the zwitterion is a maximum at the isoelectric point pI. At pH values greater than pKa2, H₂NCH₂CO₂⁻ is the species present in greatest concentration. (Plot not reproduced. Horizontal axis: equivalents of base added, 0.0 to 2.0. Vertical axis: pH, 2 to 12. Marked points: pKa1 = 2.3, pI, pKa2 = 9.8.)

PROBLEM 27.3 Write the most stable structural formula for tyrosine:
(a) In its cationic form
(b) In its zwitterionic form
(c) As a monoanion
(d) As a dianion

SAMPLE SOLUTION (a) The cationic form of tyrosine is the one present at low pH. The positive charge is on nitrogen, and the species present is an ammonium ion.

    p-HOC₆H₄CH₂CH(NH₃⁺)CO₂H

TABLE 27.2 Acid-Base Properties of Amino Acids with Neutral Side Chains

Amino acid       pKa1*    pKa2*    pI
Glycine          2.34     9.60     5.97
Alanine          2.34     9.69     6.00
Valine           2.32     9.62     5.96
Leucine          2.36     9.60     5.98
Isoleucine       2.36     9.60     6.02
Methionine       2.28     9.21     5.74
Proline          1.99     10.60    6.30
Phenylalanine    1.83     9.13     5.48
Tryptophan       2.83     9.39     5.89
Asparagine       2.02     8.80     5.41
Glutamine        2.17     9.13     5.65
Serine           2.21     9.15     5.68
Threonine        2.09     9.10     5.60

*In all cases pKa1 corresponds to ionization of the carboxyl group; pKa2 corresponds to deprotonation of the ammonium ion.
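The way the three forms of glycine replace one another as the pH rises (Figure 27.2) can be checked numerically. The following is a minimal sketch, not part of the original text: it treats glycine as an ordinary diprotic acid, uses the Table 27.2 values pKa1 = 2.34 and pKa2 = 9.60 as defaults, and ignores activity corrections. The function name is chosen for this example.

```python
def glycine_fractions(ph, pka1=2.34, pka2=9.60):
    """Return the fractions (cation, zwitterion, anion) of glycine at a pH.

    Standard diprotic-acid expressions:
      cation     = [H+]^2      / D
      zwitterion = Ka1 [H+]    / D
      anion      = Ka1 Ka2     / D
    with D = [H+]^2 + Ka1 [H+] + Ka1 Ka2.
    """
    h = 10.0 ** (-ph)
    ka1 = 10.0 ** (-pka1)
    ka2 = 10.0 ** (-pka2)
    denom = h * h + ka1 * h + ka1 * ka2
    return (h * h / denom, ka1 * h / denom, ka1 * ka2 / denom)


for ph in (1.0, 2.34, 5.97, 9.60, 13.0):
    cation, zwitterion, anion = glycine_fractions(ph)
    print(f"pH {ph:5.2f}: cation {cation:.3f}  zwitterion {zwitterion:.3f}  anion {anion:.3f}")
```

At pH = pKa1 the cation and zwitterion are present in equal amounts, at pH = pKa2 the zwitterion and anion are, and near the isoelectric point almost all of the glycine is zwitterion, which is the behavior the figure caption describes.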
TABLE 27.3 Acid-Base Properties of Amino Acids with Ionizable Side Chains

Amino acid       pKa1*    pKa2     pKa3     pI
Aspartic acid    1.88     3.65     9.60     2.77
Glutamic acid    2.19     4.25     9.67     3.22
Tyrosine         2.20     9.11     10.07    5.66
Cysteine         1.96     8.18     10.28    5.07
Lysine           2.18     8.95     10.53    9.74
Arginine         2.17     9.04     12.48    10.76
Histidine        1.82     6.00     9.17     7.59

*In all cases pKa1 corresponds to ionization of the carboxyl group of RCH(NH₃⁺)CO₂H.

ELECTROPHORESIS

Electrophoresis is a method for separation and purification that depends on the movement of charged particles in an electric field. Its principles can be introduced by considering the electrophoretic behavior of some representative amino acids. The medium is a cellulose acetate strip that is moistened with an aqueous solution buffered at a particular pH. The opposite ends of the strip are placed in separate compartments containing the buffer, and each compartment is connected to a source of direct electric current (Figure 27.3a). If the buffer solution is more acidic than the isoelectric point (pI) of the amino acid, the amino acid has a net positive charge and migrates toward the negatively charged electrode. Conversely, when the buffer is more basic than the pI of the amino acid, the amino acid has a net negative charge and migrates toward the positively charged electrode. When the pH of the buffer corresponds to the pI, the amino acid has no net charge and does not migrate from the origin.

Thus if a mixture containing alanine, aspartic acid, and lysine is subjected to electrophoresis in a buffer that matches the isoelectric point of alanine (pH 6.0), aspartic acid (pI = 2.8) migrates toward the positive electrode, alanine remains at the origin, and lysine (pI = 9.7) migrates toward the negative electrode (Figure 27.3b).
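The migration rule just described amounts to comparing the buffer pH with the pI of each amino acid. A minimal sketch follows; the function name and the small tolerance used to decide that pH "matches" pI are assumptions made for this illustration, and the pI values come from Tables 27.2 and 27.3.

```python
def migration_direction(pi, buffer_ph, tolerance=0.05):
    """Predict where an amino acid moves during electrophoresis.

    tolerance is an arbitrary illustrative cutoff for treating the
    buffer pH as equal to the isoelectric point.
    """
    if abs(buffer_ph - pi) <= tolerance:
        return "remains at the origin (no net charge)"
    if buffer_ph < pi:
        # Buffer more acidic than pI: net positive charge.
        return "migrates toward the negative electrode"
    # Buffer more basic than pI: net negative charge.
    return "migrates toward the positive electrode"


# pI values from Tables 27.2 and 27.3
mixture = {"aspartic acid": 2.77, "alanine": 6.00, "lysine": 9.74}

for name, pi in mixture.items():
    print(f"{name}: {migration_direction(pi, buffer_ph=6.0)}")
```

The comparison says nothing about how fast each species moves; it reproduces only the direction of migration for the alanine, aspartic acid, and lysine mixture at pH 6.0.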
    Species present at pH 6.0:
    ⁻O₂CCH₂CH(NH₃⁺)CO₂⁻         Aspartic acid (monoanion)
    CH₃CH(NH₃⁺)CO₂⁻             Alanine (neutral)
    H₃N⁺(CH₂)₄CH(NH₃⁺)CO₂⁻      Lysine (monocation)

FIGURE 27.3 Application of electrophoresis to the separation of aspartic acid, alanine, and lysine according to their charge type at a pH corresponding to the isoelectric point (pI) of alanine. (Drawing not reproduced.)
(a) A mixture of amino acids (aspartic acid, alanine, and lysine) is placed at the center of a sheet of cellulose acetate. The sheet is soaked with an aqueous solution buffered at a pH of 6.0. At this pH aspartic acid exists as its −1 ion, alanine as its zwitterion, and lysine as its +1 ion.
(b) Application of an electric current causes the negatively charged ions to migrate to the + electrode, and the positively charged ions to migrate to the − electrode. The zwitterion, with a net charge of zero, remains at its original position.

Electrophoresis is used primarily to analyze mixtures of peptides and proteins, rather than individual amino acids, but analogous principles apply. Because they incorporate different numbers of amino acids and because their side chains are different, two peptides will have slightly different acid–base properties and slightly different net charges at a particular pH. Thus, their mobilities in an electric field will be different, and electrophoresis can be used to separate them. The medium used to separate peptides and proteins is typically a polyacrylamide gel, leading to the term gel electrophoresis for this technique.

A second factor that governs the rate of migration during electrophoresis is the size (length and shape) of the peptide or protein. Larger molecules move through the polyacrylamide gel more slowly than smaller ones. In current practice, the experiment is modified to exploit differences in size more than differences in net charge, especially in the SDS gel electrophoresis of proteins. Approximately 1.5 g of the detergent sodium dodecyl sulfate (SDS, page 745) per gram of protein is added to the aqueous buffer. SDS binds to the protein, causing the protein to unfold so that it is roughly rod-shaped with the CH₃(CH₂)₁₀CH₂ groups of SDS associated with the lipophilic portions of the protein. The negatively charged sulfate groups are exposed to the water. The SDS molecules that they carry ensure that all the protein molecules are negatively charged and migrate toward the positive electrode. Furthermore, all the proteins in the mixture now have similar shapes and tend to travel at rates proportional to their chain length. Thus, when carried out on a preparative scale, SDS gel electrophoresis permits proteins in a mixture to be separated according to their molecular weight. On an analytical scale, it is used to estimate the molecular weight of a protein by comparing its electrophoretic mobility with that of proteins of known molecular weight.

Later, in Section 27.29, we will see how gel electrophoresis is used in nucleic acid chemistry.

PROBLEM 27.4 Write structural formulas for the principal species present when the pH of a solution containing lysine is raised from 1 to 9 and again to 13.

The acid–base properties of their side chains are one way in which individual amino acids differ. This is important in peptides and proteins, where the properties of the substance depend on its amino acid constituents, especially on the nature of the side chains. It is also important in analyses in which a complex mixture of amino acids is separated into its components by taking advantage of the differences in their proton-donating and proton-accepting abilities.

27.4 SYNTHESIS OF AMINO ACIDS

One of the oldest methods for the synthesis of amino acids dates back to the nineteenth century and is simply a nucleophilic substitution in which ammonia reacts with an α-halo carboxylic acid.

    CH₃CHBrCO₂H + 2NH₃  →  CH₃CH(NH₃⁺)CO₂⁻ + NH₄Br        (in H₂O)
    2-Bromopropanoic acid + Ammonia  →  Alanine (65–70%) + Ammonium bromide

The α-halo acid is normally prepared by the Hell–Volhard–Zelinsky reaction (see Section 19.16).

PROBLEM 27.5 Outline the steps in a synthesis of valine from 3-methylbutanoic acid.

In the Strecker synthesis an aldehyde is converted to an α-amino acid with one more carbon atom by a two-stage procedure in which an α-amino nitrile is an intermediate. The α-amino nitrile is formed by reaction of the aldehyde with ammonia or an ammonium salt and a source of cyanide ion. Hydrolysis of the nitrile group to a carboxylic acid function completes the synthesis.

    CH₃CH=O  →  CH₃CH(NH₂)C≡N  →  CH₃CH(NH₃⁺)CO₂⁻
    Acetaldehyde  →  2-Aminopropanenitrile  →  Alanine (52–60%)
    First step: NH₄Cl, NaCN. Second step: 1. H₂O, HCl, heat; 2. HO⁻.

Margin note: The synthesis of alanine was described by Adolf Strecker of the University of Würzburg (Germany) in a paper published in 1850.

PROBLEM 27.6 Outline the steps in the preparation of valine by the Strecker synthesis.

The most widely used method for the laboratory synthesis of α-amino acids is a modification of the malonic ester synthesis (Section 21.7). The key reagent is diethyl acetamidomalonate, a derivative of malonic ester that already has the critical nitrogen substituent in place at the α-carbon atom. The side chain is introduced by alkylating diethyl acetamidomalonate in the same way as diethyl malonate itself is alkylated.

    CH₃C(=O)NHCH(CO₂CH₂CH₃)₂  →  CH₃C(=O)NHC⁻(CO₂CH₂CH₃)₂ Na⁺  →  CH₃C(=O)NHC(CH₂C₆H₅)(CO₂CH₂CH₃)₂
    Diethyl acetamidomalonate  →  Sodium salt of diethyl acetamidomalonate  →  Diethyl acetamidobenzylmalonate (90%)
    First step: NaOCH₂CH₃, CH₃CH₂OH. Second step: C₆H₅CH₂Cl.

Hydrolysis removes the acetyl group from nitrogen and converts the two ester functions to carboxyl groups. Decarboxylation gives the desired product.

    CH₃C(=O)NHC(CH₂C₆H₅)(CO₂CH₂CH₃)₂  →  H₃N⁺C(CH₂C₆H₅)(CO₂H)₂  →  C₆H₅CH₂CH(NH₃⁺)CO₂⁻
    Diethyl acetamidobenzylmalonate  →  (not isolated)  →  Phenylalanine (65%)
    First step: HBr, H₂O, heat. Second step: heat, loss of CO₂.

PROBLEM 27.7 Outline the steps in the synthesis of valine from diethyl acetamidomalonate. The overall yield of valine by this method is reported to be rather low (31%). Can you think of a reason why this synthesis is not very efficient?

Unless a resolution step is included, the α-amino acids prepared by the synthetic methods just described are racemic. Optically active amino acids, when desired, may be obtained by resolving a racemic mixture or by enantioselective synthesis. A synthesis is described as enantioselective if it produces one enantiomer of a chiral compound in an amount greater than its mirror image. Recall from Section 7.9 that optically inactive reactants cannot give optically active products. Enantioselective syntheses of amino acids therefore require an enantiomerically enriched chiral reagent or catalyst at some point in the process. If the chiral reagent or catalyst is a single enantiomer and if the reaction sequence is completely enantioselective, an optically pure amino acid is obtained. Chemists have succeeded in preparing α-amino acids by techniques that are more than 95% enantioselective. Although this is an impressive feat, we must not lose sight of the fact that the reactions that produce amino acids in living systems do so with 100% enantioselectivity.
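The step yields quoted for the acetamidomalonate route in Section 27.4 can be combined into an overall yield by multiplying them. The short sketch below does only that arithmetic; it assumes that the 90% figure refers to the alkylation step and the 65% figure to the hydrolysis and decarboxylation carried out on the alkylated ester, as the equations suggest, and the function name is chosen for this example.

```python
from math import prod


def overall_yield(step_yields_percent):
    """Overall percent yield of a linear sequence of steps.

    Each step yield is converted to a fraction, the fractions are
    multiplied, and the product is converted back to a percentage.
    """
    return 100.0 * prod(y / 100.0 for y in step_yields_percent)


# Phenylalanine from diethyl acetamidomalonate:
# alkylation (90%), then hydrolysis and decarboxylation (65%).
print(round(overall_yield([90, 65]), 1))
```

Because the fractions multiply, a single low-yielding step limits the whole sequence, which is a useful starting point for thinking about the 31% overall yield mentioned in Problem 27.7.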
27.5 REACTIONS OF AMINO ACIDS

Amino acids undergo reactions characteristic of both their amine and carboxylic acid functional groups. Acylation is a typical reaction of the amino group.

    H₃N⁺CH₂CO₂⁻ + CH₃C(=O)OC(=O)CH₃  →  CH₃C(=O)NHCH₂CO₂H + CH₃CO₂H
    Glycine + Acetic anhydride  →  N-Acetylglycine (89–92%) + Acetic acid

Ester formation is a typical reaction of the carboxyl group.

    CH₃CH(NH₃⁺)CO₂⁻ + CH₃CH₂OH  →  CH₃CH(NH₃⁺)C(=O)OCH₂CH₃ Cl⁻        (HCl)
    Alanine + Ethanol  →  Hydrochloride salt of alanine ethyl ester (90–95%)

The presence of amino acids can be detected by the formation of a purple color on treatment with ninhydrin. The same compound responsible for the purple color is formed from all amino acids in which the α-amino group is primary.

    2 Ninhydrin + H₃N⁺CH(R)CO₂⁻ + HO⁻  →  Violet dye ("Ruhemann's purple") + RCH=O + CO₂ + 4H₂O
    (The aldehyde RCH=O is formed, but not normally isolated. Structures of ninhydrin and the dye not reproduced.)

Proline, in which the α-amino group is secondary, gives an orange compound on reaction with ninhydrin.

Margin note: Ninhydrin is used to detect fingerprints.

PROBLEM 27.8 Suggest a reasonable mechanism for the reaction of an α-amino acid with ninhydrin.

27.6 SOME BIOCHEMICAL REACTIONS OF AMINO ACIDS

The 20 amino acids listed in Table 27.1 are biosynthesized by a number of different pathways, and we will touch on only a few of them in an introductory way. We will examine the biosynthesis of glutamic acid first, since it illustrates a biochemical process analogous to a reaction we have discussed earlier in the context of amine synthesis, reductive amination (Section 22.11). Glutamic acid is formed in most organisms from ammonia and α-ketoglutaric acid.
    HO₂CCH₂CH₂C(=O)CO₂H + NH₃  →  HO₂CCH₂CH₂CH(NH₃⁺)CO₂⁻        (enzymes, reducing agents)
    α-Ketoglutaric acid + Ammonia  →  L-Glutamic acid

α-Ketoglutaric acid is one of the intermediates in the tricarboxylic acid cycle (also called the Krebs cycle) and arises via metabolic breakdown of food sources: carbohydrates, fats, and proteins.

Margin note: The August 1986 issue of the Journal of Chemical Education (pp. 673–677) contains a review of the Krebs cycle.

Ammonia reacts with the ketone carbonyl group to give an imine (C=NH), which is then reduced to the amine function of the α-amino acid. Both imine formation and reduction are enzyme-catalyzed. The reduced form of nicotinamide adenine diphosphonucleotide (NADPH) is a coenzyme and acts as a reducing agent. The step in which the imine is reduced is the one in which the stereogenic center is introduced and gives only L-glutamic acid.

L-Glutamic acid is not an essential amino acid. It need not be present in the diet, since animals can biosynthesize it from sources of α-ketoglutaric acid. It is, however, a key intermediate in the biosynthesis of other amino acids by a process known as transamination. L-Alanine, for example, is formed from pyruvic acid by transamination from L-glutamic acid.

    HO₂CCH₂CH₂CH(NH₃⁺)CO₂⁻ + CH₃C(=O)CO₂H  →  HO₂CCH₂CH₂C(=O)CO₂H + CH₃CH(NH₃⁺)CO₂⁻        (enzymes)
    L-Glutamic acid + Pyruvic acid  →  α-Ketoglutaric acid + L-Alanine

In transamination an amine group is transferred from L-glutamic acid to pyruvic acid. An outline of the mechanism of transamination is presented in Figure 27.4.

FIGURE 27.4 The mechanism of transamination. All the steps are enzyme-catalyzed.

Step 1: The amine function of L-glutamate reacts with the ketone carbonyl of pyruvate to form an imine.

    ⁻O₂CCH₂CH₂CH(NH₃⁺)CO₂⁻ + O=C(CH₃)CO₂⁻  →  ⁻O₂CCH₂CH₂CH(CO₂⁻)N=C(CH₃)CO₂⁻
    L-Glutamate + Pyruvate  →  Imine

Step 2: Enzyme-catalyzed proton-transfer steps cause migration of the double bond, converting the imine formed in step 1 to an isomeric imine.

    ⁻O₂CCH₂CH₂CH(CO₂⁻)N=C(CH₃)CO₂⁻  →  ⁻O₂CCH₂CH₂C(CO₂⁻)=NCH(CH₃)CO₂⁻
    Imine from step 1  →  Rearranged imine
    (A base removes the proton from the carbon that becomes doubly bonded to nitrogen; an acid delivers a proton to the carbon that bears the methyl group.)

Step 3: Hydrolysis of the rearranged imine gives L-alanine and α-ketoglutarate.

    ⁻O₂CCH₂CH₂C(CO₂⁻)=NCH(CH₃)CO₂⁻ + H₂O  →  ⁻O₂CCH₂CH₂C(=O)CO₂⁻ + H₃N⁺CH(CH₃)CO₂⁻
    Rearranged imine + Water  →  α-Ketoglutarate + L-Alanine

One amino acid often serves as the biological precursor to another. L-Phenylalanine is classified as an essential amino acid, whereas its p-hydroxy derivative, L-tyrosine, is not. This is because animals can convert L-phenylalanine to L-tyrosine by hydroxylation of the aromatic ring. An arene oxide (Section 24.7) is an intermediate.

    C₆H₅CH₂CH(NH₃⁺)CO₂⁻  →  Arene oxide intermediate  →  p-HOC₆H₄CH₂CH(NH₃⁺)CO₂⁻
    L-Phenylalanine  →  (epoxide across one ring C–C bond)  →  L-Tyrosine
    First step: O₂, enzyme. Second step: enzyme.

Some people lack the enzymes necessary to convert L-phenylalanine to L-tyrosine. Any L-phenylalanine that they obtain from their diet is diverted along a different metabolic pathway, giving phenylpyruvic acid:

    C₆H₅CH₂CH(NH₃⁺)CO₂⁻  →  C₆H₅CH₂C(=O)CO₂H        (enzymes)
    L-Phenylalanine  →  Phenylpyruvic acid

Phenylpyruvic acid can cause mental retardation in infants who are deficient in the enzymes necessary to convert L-phenylalanine to L-tyrosine. This disorder is called phenylketonuria, or PKU disease. PKU disease can be detected by a simple test routinely administered to newborns. It cannot be cured, but is controlled by restricting the dietary intake of L-phenylalanine. In practice this means avoiding foods such as meat that are rich in L-phenylalanine.

Among the biochemical reactions that amino acids undergo is decarboxylation to amines. Decarboxylation of histidine, for example, gives histamine, a powerful vasodilator normally present in tissue and formed in excessive amounts under conditions of traumatic shock.

    (imidazol-4-yl)CH₂CH(NH₃⁺)CO₂⁻  →  (imidazol-4-yl)CH₂CH₂NH₂        (enzymes, loss of CO₂)
    Histidine  →  Histamine

Histamine is responsible for many of the symptoms associated with hay fever and other allergies. An antihistamine relieves these symptoms by blocking the action of histamine.

PROBLEM 27.9 One of the amino acids in Table 27.1 is the biological precursor to γ-aminobutyric acid (4-aminobutanoic acid), which it forms by a decarboxylation reaction. Which amino acid is this?

The chemistry of the brain and central nervous system is affected by a group of substances called neurotransmitters. Several of these neurotransmitters arise from L-tyrosine by structural modification and decarboxylation, as outlined in Figure 27.5.
Tyrosine → 3,4-dihydroxyphenylalanine (L-dopa) → dopamine → norepinephrine → epinephrine

FIGURE 27.5 Tyrosine is the biosynthetic precursor to a number of neurotransmitters. Each transformation is enzyme-catalyzed. Hydroxylation of the aromatic ring of tyrosine converts it to 3,4-dihydroxyphenylalanine (L-dopa), decarboxylation of which gives dopamine. Hydroxylation of the benzylic carbon of dopamine converts it to norepinephrine (noradrenaline), and methylation of the amino group of norepinephrine yields epinephrine (adrenaline).

For a review of neurotransmitters, see the February 1988 issue of the Journal of Chemical Education (pp. 108–111).

27.7 PEPTIDES

A key biochemical reaction of amino acids is their conversion to peptides, polypeptides, and proteins. In all these substances amino acids are linked together by amide bonds. The amide bond between the amino group of one amino acid and the carboxyl of another is called a peptide bond. Alanylglycine is a representative dipeptide.

By agreement, peptide structures are written so that the amino group (as H₃N⁺– or H₂N–) is at the left and the carboxyl group (as CO₂⁻ or CO₂H) is at the right. The left and right ends of the peptide are referred to as the N terminus (or amino terminus) and the C terminus (or carboxyl terminus), respectively. Alanine is the N-terminal amino acid in alanylglycine; glycine is the C-terminal amino acid. A dipeptide is named as an acyl derivative of the C-terminal amino acid. We call the precise order of bonding in a peptide its amino acid sequence. The amino acid sequence is conveniently specified by using the three-letter amino acid abbreviations for the respective amino acids and connecting them by hyphens.
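The hyphenated notation lends itself to simple bookkeeping. The Python sketch below converts between three-letter sequences and one-letter strings for the 20 amino acids of Table 27.1. The function names are illustrative assumptions, not part of any standard library, and no D or DL prefixes are handled.

```python
# Standard three-letter and one-letter abbreviations for the 20 protein amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(sequence):
    """Convert 'Ala-Gly' (N terminus at the left) to 'AG'."""
    return "".join(THREE_TO_ONE[residue] for residue in sequence.split("-"))


def to_three_letter(letters):
    """Convert 'AG' back to 'Ala-Gly'."""
    return "-".join(ONE_TO_THREE[letter] for letter in letters)


if __name__ == "__main__":
    print(to_one_letter("Ala-Gly"))              # alanylglycine
    print(to_one_letter("Tyr-Gly-Gly-Phe-Leu"))  # leucine enkephalin
    print(to_three_letter("GA"))                 # glycylalanine
```

Because the N terminus is always written at the left, Ala-Gly and Gly-Ala map to different strings (AG and GA), which mirrors the fact that they are constitutional isomers.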
Individual amino acid components of peptides are often referred to as amino acid residues.

Alanylglycine (Ala-Gly): H₃N⁺CH(CH₃)C(=O)NHCH₂CO₂⁻, in which alanine is the N-terminal amino acid and glycine is the C-terminal amino acid.

It is understood that α-amino acids occur as their L stereoisomers unless otherwise indicated. The D notation is explicitly shown when a D amino acid is present, and a racemic amino acid is identified by the prefix DL.

PROBLEM 27.10 Write structural formulas showing the constitution of each of the following dipeptides. Rewrite each sequence using one-letter abbreviations for the amino acids.
(a) Gly-Ala
(b) Ala-Phe
(c) Phe-Ala
(d) Gly-Glu
(e) Lys-Gly
(f) D-Ala-D-Ala

SAMPLE SOLUTION (a) Gly-Ala is a constitutional isomer of Ala-Gly. Glycine is the N-terminal amino acid in Gly-Ala; alanine is the C-terminal amino acid.

Figure 27.6 shows the structure of Ala-Gly as determined by X-ray crystallography. An important feature is the planar geometry associated with the peptide bond, and the most stable conformation with respect to this bond has the two α-carbon atoms anti to each other. Rotation about the amide linkage is slow because delocalization of the unshared electron pair of nitrogen into the carbonyl group gives partial double-bond character to the carbon–nitrogen bond.

FIGURE 27.6 Structural features of the dipeptide L-alanylglycine as determined by X-ray crystallography.

PROBLEM 27.11 Expand your answer to Problem 27.10 by showing the structural formula for each dipeptide in a manner that reveals the stereochemistry at the α-carbon atom.

SAMPLE SOLUTION (a) Glycine is achiral, and so Gly-Ala has only one stereogenic center, the α-carbon atom of the L-alanine residue.
When the carbon chain is drawn in an extended zigzag fashion and L-alanine is the C terminus, its structure is as shown:

[Structural drawing: glycyl-L-alanine (Gly-Ala), H₃N⁺CH₂C(=O)NHCH(CH₃)CO₂⁻, drawn as a zigzag chain with the stereochemistry shown at the α carbon of the alanine residue]

Glycylalanine (GA): H₃N⁺CH₂C(=O)NHCH(CH₃)CO₂⁻, in which glycine is the N-terminal amino acid and alanine is the C-terminal amino acid.

The structures of higher peptides follow in an analogous fashion. Figure 27.7 gives the structural formula and amino acid sequence of a naturally occurring pentapeptide known as leucine enkephalin. Enkephalins are pentapeptide components of endorphins, polypeptides present in the brain that act as the body’s own painkillers. A second substance, known as methionine enkephalin, is also present in endorphins. Methionine enkephalin differs from leucine enkephalin only in having methionine instead of leucine as its C-terminal amino acid.

PROBLEM 27.12 What is the amino acid sequence (using three-letter abbreviations) of methionine enkephalin? Show it using one-letter abbreviations.

Peptides having structures slightly different from those described to this point are known. One such variation is seen in the nonapeptide oxytocin, shown in Figure 27.8. Oxytocin is a hormone secreted by the pituitary gland that stimulates uterine contractions during childbirth. Rather than terminating in a carboxyl group, the terminal glycine residue in oxytocin has been modified so that it exists as the corresponding amide. Two cysteine units, one of them the N-terminal amino acid, are joined by the sulfur–sulfur bond of a large-ring cyclic disulfide unit. This is a common structural modification in polypeptides and proteins that contain cysteine residues. It provides a covalent bond between regions of peptide chains that may be many amino acid residues removed from each other.
FIGURE 27.7 The structure of the pentapeptide leucine enkephalin (Tyr-Gly-Gly-Phe-Leu) shown as (a) a structural drawing and (b) a molecular model. The shape of the molecular model was determined by X-ray crystallography. Hydrogens have been omitted for clarity.

Recall from Section 15.14 that compounds of the type RSH are readily oxidized to RSSR.

27.8 INTRODUCTION TO PEPTIDE STRUCTURE DETERMINATION

There are several levels of peptide structure. The primary structure is the amino acid sequence plus any disulfide links. With the 20 amino acids of Table 27.1 as building blocks, 20² dipeptides, 20³ tripeptides, 20⁴ tetrapeptides, and so on, are possible. Given a peptide of unknown structure, how do we determine its amino acid sequence?

We’ll describe peptide structure determination by first looking at one of the great achievements of biochemistry, the determination of the amino acid sequence of insulin by Frederick Sanger of Cambridge University (England). Sanger was awarded the 1958 Nobel Prize in chemistry for this work, which he began in 1944 and completed 10 years later. The methods used by Sanger and his coworkers are, of course, dated by now, but the overall strategy hasn’t changed very much. We’ll use Sanger’s insulin work to orient us with respect to strategy, then show how current methods of protein sequencing have evolved from it.

Sanger was a corecipient of a second Nobel Prize in 1980 for devising methods for sequencing nucleic acids. Sanger’s strategy for nucleic acid sequencing will be described in Section 27.29.

Sanger’s strategy can be outlined as follows:
1. Determine what amino acids are present and their molar ratios.
2. Cleave the peptide into smaller fragments, separate these fragments, and determine the amino acid composition of the fragments.
3. Identify the N-terminal and the C-terminal amino acid in the original peptide and in each fragment.
4. Organize the information so that the amino acid sequences of small fragments can be overlapped to reveal the full sequence.

FIGURE 27.8 The structure of oxytocin, a nonapeptide containing a disulfide bond between two cysteine residues. One of these cysteines is the N-terminal amino acid and is highlighted in blue. The C-terminal amino acid is the amide of glycine and is highlighted in red. There are no free carboxyl groups in the molecule; all exist in the form of carboxamides.

27.9 AMINO ACID ANALYSIS

The chemistry behind amino acid analysis is nothing more than acid-catalyzed hydrolysis of amide (peptide) bonds. The peptide is hydrolyzed by heating in 6 M hydrochloric acid for about 24 h to give a solution that contains all the amino acids. This mixture is then separated by ion-exchange chromatography, which separates the amino acids mainly according to their acid–base properties. As the amino acids leave the chromatography column, they are mixed with ninhydrin and the intensity of the ninhydrin color is monitored electronically. The amino acids are identified by comparing their chromatographic behavior with authentic samples, and their relative amounts are determined from peak areas as recorded on a strip chart. The entire operation is carried out automatically using an amino acid analyzer and is so sensitive that as little as 10⁻⁵–10⁻⁷ g of the peptide is required.
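The counting behind these statements can be sketched in a few lines of Python. The example below assumes a peptide of known sequence (leucine enkephalin) simply to generate the kind of composition data that an amino acid analyzer would report; the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import factorial


def possible_peptides(n, building_blocks=20):
    """Number of distinct sequences of length n built from the given number of amino acids."""
    return building_blocks ** n


def composition(sequence):
    """Residue counts for a hyphenated three-letter sequence (what total hydrolysis reveals)."""
    return Counter(sequence.split("-"))


def molar_ratios(counts):
    """Express each residue count relative to the least abundant residue."""
    smallest = min(counts.values())
    return {residue: count / smallest for residue, count in counts.items()}


def sequences_consistent_with(counts):
    """Number of distinct orderings of a fixed composition: n! / (n1! n2! ...)."""
    arrangements = factorial(sum(counts.values()))
    for count in counts.values():
        arrangements //= factorial(count)
    return arrangements


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, possible_peptides(n))
    counts = composition("Tyr-Gly-Gly-Phe-Leu")
    print(molar_ratios(counts))
    print(sequences_consistent_with(counts))
```

The last function makes the limitation of amino acid analysis explicit: the composition narrows the possibilities but leaves many sequences open, which is why fragmentation and end group analysis are needed.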
PROBLEM 27.13 Amino acid analysis of a certain tetrapeptide gave alanine, glycine, phenylalanine, and valine in equimolar amounts. What amino acid sequences are possible for this tetrapeptide?

27.10 PARTIAL HYDROLYSIS OF PEPTIDES

Whereas acid-catalyzed hydrolysis of peptides cleaves amide bonds indiscriminately and eventually breaks all of them, enzymatic hydrolysis is much more selective and is the method used to convert a peptide into smaller fragments.

The enzymes that catalyze the hydrolysis of peptides are called peptidases, proteases, or proteolytic enzymes. One group of pancreatic enzymes, known as carboxypeptidases, catalyzes only the hydrolysis of the peptide bond to the C-terminal amino acid, for example. Trypsin, a digestive enzyme present in the intestine, catalyzes only the hydrolysis of peptide bonds involving the carboxyl group of a lysine or arginine residue. Chymotrypsin, another digestive enzyme, is selective for peptide bonds involving the carboxyl group of amino acids with aromatic side chains (phenylalanine, tyrosine, tryptophan). In addition to these, many other digestive enzymes are known, and their selectivity is exploited in the selective hydrolysis of peptides.

PROBLEM 27.14 Digestion of the tetrapeptide of Problem 27.13 with chymotrypsin gave a dipeptide that on amino acid analysis gave phenylalanine and valine in equimolar amounts. What amino acid sequences are possible for the tetrapeptide?

27.11 END GROUP ANALYSIS

An amino acid sequence is ambiguous unless we know the direction in which to read it: left to right, or right to left. We need to know which end is the N terminus and which is the C terminus. As we saw in the preceding section, carboxypeptidase-catalyzed hydrolysis cleaves the C-terminal amino acid and so can be used to identify it. What about the N terminus?

Several chemical methods have been devised for identifying the N-terminal amino acid.
They all take advantage of the fact that the N-terminal amino group is free and can act as a nucleophile. The α-amino groups of all the other amino acids are part of amide linkages, are not free, and are much less nucleophilic. Sanger’s method for N-terminal residue analysis involves treating a peptide with 1-fluoro-2,4-dinitrobenzene, which is very reactive toward nucleophilic aromatic substitution.

1-Fluoro-2,4-dinitrobenzene, 2,4-(O₂N)₂C₆H₃F: nucleophiles attack the carbon that bears fluorine, displacing fluoride.

The amino group of the N-terminal amino acid displaces fluoride from 1-fluoro-2,4-dinitrobenzene and gives a peptide in which the N-terminal nitrogen is labeled with a 2,4-dinitrophenyl (DNP) group. This is shown for the case of Val-Phe-Gly-Ala in Figure 27.9. The 2,4-dinitrophenyl-labeled peptide DNP-Val-Phe-Gly-Ala is isolated and subjected to hydrolysis, after which the 2,4-dinitrophenyl derivative of the N-terminal amino acid is isolated and identified as DNP-Val by comparing its chromatographic behavior with that of standard samples of 2,4-dinitrophenyl-labeled amino acids. None of the other amino acid residues bear a 2,4-dinitrophenyl group; they appear in the hydrolysis product as the free amino acids.

Site of chymotrypsin-catalyzed hydrolysis: in the chain –NHCH(R)C(=O)–NHCH(R′)C(=O)–NHCH(R″)C(=O)–, the peptide bond on the carboxyl side of the R′ residue is cleaved when R′ is an aromatic side chain.

Papain, the active component of most meat tenderizers, is a proteolytic enzyme.

1-Fluoro-2,4-dinitrobenzene + Val-Phe-Gly-Ala → DNP-Val-Phe-Gly-Ala → DNP-Val + Phe + Gly + Ala   [second step: H₃O⁺]

The reaction is carried out by mixing the peptide and 1-fluoro-2,4-dinitrobenzene in the presence of a weak base such as sodium carbonate. In the first step the base abstracts a proton from the terminal H₃N⁺ group to give a free amino function.
The nucleophilic amino group attacks 1-fluoro-2,4-dinitrobenzene, displacing fluoride. Acid hydrolysis cleaves the amide bonds of the 2,4-dinitrophenyl-labeled peptide, giving the 2,4-dinitrophenyl-labeled N-terminal amino acid and a mixture of unlabeled amino acids.

DNP–F + H₂NCH[CH(CH₃)₂]C(=O)–NHCH(CH₂C₆H₅)C(=O)–NHCH₂C(=O)–NHCH(CH₃)CO₂⁻ → DNP–NHCH[CH(CH₃)₂]C(=O)–NHCH(CH₂C₆H₅)C(=O)–NHCH₂C(=O)–NHCH(CH₃)CO₂⁻

DNP–NHCH[CH(CH₃)₂]C(=O)–NHCH(CH₂C₆H₅)C(=O)–NHCH₂C(=O)–NHCH(CH₃)CO₂⁻ → DNP–NHCH[CH(CH₃)₂]CO₂H + H₃N⁺CH(CH₂C₆H₅)CO₂H + H₃N⁺CH₂CO₂H + H₃N⁺CH(CH₃)CO₂H   [H₃O⁺; DNP = 2,4-dinitrophenyl]

FIGURE 27.9 Use of 1-fluoro-2,4-dinitrobenzene to identify the N-terminal amino acid of a peptide.

1-Fluoro-2,4-dinitrobenzene is commonly referred to as Sanger’s reagent.

Labeling the N-terminal amino acid as its DNP derivative is mainly of historical interest and has been replaced by other methods. We’ll discuss one of these, the Edman degradation, in Section 27.13. First, though, we’ll complete our review of the general strategy for peptide sequencing by seeing how Sanger tied all of the information together into a structure for insulin.

27.12 INSULIN

Insulin has 51 amino acids, divided between two chains. One of these, the A chain, has 21 amino acids; the other, the B chain, has 30. The A and B chains are joined by disulfide bonds between cysteine residues (Cys–Cys). Figure 27.10 shows some of the information that defines the amino acid sequence of the B chain.

• Reaction of the B chain peptide with 1-fluoro-2,4-dinitrobenzene established that phenylalanine is the N terminus.
• Pepsin-catalyzed hydrolysis gave the four peptides shown in blue in Figure 27.10. (Their sequences were determined in separate experiments.) These four peptides contain 27 of the 30 amino acids in the B chain, but there are no points of overlap between them.
• The sequences of the four short peptides shown in red in Figure 27.10 bridge the gaps between three of the four “blue” peptides to give an unbroken sequence from 1 through 24.
• The peptide shown in yellow was isolated by trypsin-catalyzed hydrolysis and has an amino acid sequence that completes the remaining overlaps.

B chain, residues 1–30:
Phe-Val-Asn-Gln-His-Leu-Cys-Gly-Ser-His-Leu-Val-Glu-Ala-Leu-Tyr-Leu-Val-Cys-Gly-Glu-Arg-Gly-Phe-Phe-Tyr-Thr-Pro-Lys-Ala

Fragments (residue numbers in parentheses):
Blue (pepsin): Phe-Val-Asn-Gln-His-Leu-Cys-Gly-Ser-His-Leu (1–11); Val-Glu-Ala-Leu (12–15); Val-Cys-Gly-Glu-Arg-Gly-Phe (18–24); Tyr-Thr-Pro-Lys-Ala (26–30)
Red (acid hydrolysis): Ser-His-Leu-Val (9–12); Leu-Val-Glu-Ala (11–14); Ala-Leu-Tyr (14–16); Tyr-Leu-Val-Cys (16–19)
Yellow (trypsin): Gly-Phe-Phe-Tyr-Thr-Pro-Lys (23–29)

FIGURE 27.10 Diagram showing how the amino acid sequence of the B chain of bovine insulin can be determined by overlap of peptide fragments. Pepsin-catalyzed hydrolysis produced the fragments shown in blue, trypsin produced the one shown in yellow, and acid-catalyzed hydrolysis gave many fragments, including the four shown in red.

Sanger also determined the sequence of the A chain and identified the cysteine residues involved in disulfide bonds between the A and B chains as well as in the disulfide linkage within the A chain. The complete insulin structure is shown in Figure 27.11. The structure shown is that of bovine insulin (from cattle). The A chains of human insulin and bovine insulin differ in only two amino acid residues; their B chains are identical except for the amino acid at the C terminus.

27.13 THE EDMAN DEGRADATION AND AUTOMATED SEQUENCING OF PEPTIDES

The years that have passed since Sanger determined the structure of insulin have seen refinements in technique while retaining the same overall strategy.
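The overlap strategy that led to the B chain sequence can be imitated with a short Python sketch. It is an illustration only: the fragments are those of Figure 27.10 written in one-letter code, the N-terminal fragment is taken as the starting point because end group analysis showed phenylalanine to be the N terminus, and the depth-first search below is an assumed bookkeeping device rather than a description of how Sanger actually organized his data.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(start, fragments):
    """Extend `start` with every fragment in turn, each joined through an overlap.

    Candidates are tried in order of decreasing overlap; if a choice leads to a
    dead end (fragments left over that cannot be attached), the search backtracks.
    Returns the assembled sequence, or None if no complete chain exists.
    """
    if not fragments:
        return start
    ranked = sorted(fragments, key=lambda frag: overlap(start, frag), reverse=True)
    for frag in ranked:
        k = overlap(start, frag)
        if k == 0:
            break  # remaining candidates do not overlap at all
        remaining = list(fragments)
        remaining.remove(frag)
        result = assemble(start + frag[k:], remaining)
        if result is not None:
            return result
    return None


# Fragments of the insulin B chain from Figure 27.10, in one-letter code.
N_TERMINAL = "FVNQHLCGSHL"                      # residues 1-11
OTHERS = ["VEAL", "VCGERGF", "YTPKA",           # pepsin fragments
          "SHLV", "LVEA", "ALY", "YLVC",        # acid hydrolysis fragments
          "GFFYTPK"]                            # trypsin fragment

if __name__ == "__main__":
    print(assemble(N_TERMINAL, OTHERS))
```

Backtracking matters here because short overlaps are ambiguous: after Ala-Leu-Tyr has been attached, both Tyr-Leu-Val-Cys and Tyr-Thr-Pro-Lys-Ala overlap by a single tyrosine, and only the first choice allows every remaining fragment to be placed.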
Enzyme-catalyzed hydrolysis to convert a large peptide to smaller fragments remains an important component, as does searching for overlaps among these smaller fragments. The method for N-terminal residue analysis, however, has been improved so that much smaller amounts of peptide are required, and the analysis has been automated.

When Sanger’s method for N-terminal residue analysis was discussed, you may have wondered why it was not done sequentially. Simply start at the N terminus and work steadily back to the C terminus identifying one amino acid after another. The idea is fine, but it just doesn’t work well in practice, at least with 1-fluoro-2,4-dinitrobenzene.

A major advance was devised by Pehr Edman (University of Lund, Sweden) that has become the standard method for N-terminal residue analysis. The Edman degradation is based on the chemistry shown in Figure 27.12. A peptide reacts with phenyl isothiocyanate to give a phenylthiocarbamoyl (PTC) derivative, as shown in the first step. This PTC derivative is then treated with an acid in an anhydrous medium (Edman used nitromethane saturated with hydrogen chloride) to cleave the amide bond between the N-terminal amino acid and the remainder of the peptide. No other peptide bonds are cleaved in this step, because amide bond hydrolysis requires water. When the PTC derivative is treated with acid in an anhydrous medium, the sulfur atom of the C=S unit acts as an internal nucleophile, and the only amide bond cleaved under these conditions is the one to the N-terminal amino acid. The product of this cleavage, called a thiazolone, is unstable under the conditions of its formation and rearranges to a phenylthiohydantoin (PTH), which is isolated and identified by comparing it with standard samples of PTH derivatives of known amino acids. This is normally done by chromatographic methods, but mass spectrometry has also been used.
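Because each round of this chemistry removes and identifies only the N-terminal residue, repeating it amounts to reading a sequence one residue at a time from the left. The Python sketch below models just that bookkeeping, under the idealized assumption that every cycle goes to completion; the function names are hypothetical, and none of the actual chemistry (PTC formation, cleavage, rearrangement to the PTH derivative) is represented.

```python
def edman_cycle(residues):
    """One idealized cycle: return (N-terminal residue, remainder of the peptide)."""
    if not residues:
        raise ValueError("no residues left to degrade")
    return residues[0], residues[1:]


def edman_sequence(peptide, cycles=None):
    """Residues identified, in order, after the requested number of cycles.

    `peptide` is a hyphenated three-letter sequence with the N terminus at the left.
    By default the degradation is repeated until the whole peptide has been read.
    """
    residues = peptide.split("-")
    n_cycles = len(residues) if cycles is None else min(cycles, len(residues))
    identified = []
    for _ in range(n_cycles):
        n_terminal, residues = edman_cycle(residues)
        identified.append(n_terminal)  # detected experimentally as its PTH derivative
    return identified


if __name__ == "__main__":
    print(edman_sequence("Tyr-Gly-Gly-Phe-Leu"))
    print(edman_sequence("Tyr-Gly-Gly-Phe-Leu", cycles=2))
```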
FIGURE 27.11 The amino acid sequence in bovine insulin. The A chain is shown in red and the B chain in blue. The A chain is joined to the B chain by two disulfide units (yellow). There is also a disulfide bond linking cysteines 6 and 11 in the A chain. Human insulin has threonine and isoleucine at residues 8 and 10, respectively, in the A chain and threonine as the C-terminal amino acid in the B chain.

Only the N-terminal amide bond is broken in the Edman degradation; the rest of the peptide chain remains intact. It can be isolated and subjected to a second Edman procedure to determine its new N terminus. We can proceed along a peptide chain by beginning with the N terminus and determining each amino acid in order. The sequence is given directly by the structure of the PTH derivative formed in each successive degradation.

PROBLEM 27.15 Give the structure of the PTH derivative isolated in the second Edman cycle of the tetrapeptide Val-Phe-Gly-Ala.

Ideally, one could determine the primary structure of even the largest protein by repeating the Edman procedure. Because anything less than 100% conversion in any single Edman degradation gives a mixture containing some of the original peptide along with the degraded one, two different PTH derivatives are formed in the next Edman cycle, and the ideal is not realized in practice.
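A rough calculation shows how quickly incomplete conversion accumulates. The sketch below assumes, purely for illustration, that every cycle proceeds with the same fractional yield and that chains which miss a cycle never catch up; the 98% figure used in the example is a hypothetical value, not one taken from the text.

```python
def fraction_in_register(yield_per_cycle, cycles):
    """Fraction of chains that have undergone every one of `cycles` degradations."""
    return yield_per_cycle ** cycles


def cycles_until(yield_per_cycle, threshold):
    """Largest number of cycles for which the in-register fraction stays >= threshold."""
    if not 0 < yield_per_cycle < 1:
        raise ValueError("yield_per_cycle must lie strictly between 0 and 1")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie between 0 (exclusive) and 1")
    completed = 0
    fraction = 1.0
    while fraction * yield_per_cycle >= threshold:
        fraction *= yield_per_cycle
        completed += 1
    return completed


if __name__ == "__main__":
    assumed_yield = 0.98  # hypothetical per-cycle conversion
    for n in (10, 20, 60):
        print(n, round(fraction_in_register(assumed_yield, n), 3))
    print(cycles_until(assumed_yield, 0.5))
```

Under these assumptions the correctly degraded chains become a minority after a few dozen cycles, which is the practical limit the preceding paragraph describes.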
Nevertheless, some impressive results have been achieved. It is a fairly routine matter to sequence the first 20 amino acids from the N terminus by repetitive Edman cycles, and even 60 residues have been determined on a single sample of the protein myoglobin.

Step 1: A peptide is treated with phenyl isothiocyanate to give a phenylthiocarbamoyl (PTC) derivative.

C₆H₅N=C=S (phenyl isothiocyanate) + H₃N⁺CH(R)C(=O)NH–PEPTIDE → C₆H₅NHC(=S)NHCH(R)C(=O)NH–PEPTIDE (PTC derivative)

Step 2: On reaction with hydrogen chloride in an anhydrous solvent, the thiocarbonyl sulfur of the PTC derivative attacks the carbonyl carbon of the N-terminal amino acid. The N-terminal amino acid is cleaved as a thiazolone derivative from the remainder of the peptide.

PTC derivative → thiazolone + H₃N⁺–PEPTIDE (remainder of peptide)   [HCl]

Step 3: Once formed, the thiazolone derivative isomerizes to a more stable phenylthiohydantoin (PTH) derivative, which is isolated and characterized, thereby providing identification of the N-terminal amino acid. The remainder of the peptide (formed in step 2) can be isolated and subjected to a second Edman degradation.

Thiazolone → PTH derivative   [HCl]

FIGURE 27.12 Identification of the N-terminal amino acid of a peptide by Edman degradation.

The entire procedure has been automated and incorporated into a device called an Edman sequenator, which carries out all the operations under computer control. The amount of sample required is quite small; as little as 10⁻¹⁰ mol is typical.

So many peptides and proteins have been sequenced now that it is impossible to give an accurate count.
What was Nobel Prize-winning work in 1958 is routine today. Nor has the story ended. Sequencing of nucleic acids has advanced so dramatically that it is possible to clone the gene that codes for a particular protein, sequence its DNA, and deduce the structure of the protein from the nucleotide sequence of the DNA. We’ll have more to say about DNA sequencing later in the chapter.

27.14 THE STRATEGY OF PEPTIDE SYNTHESIS

One way to confirm the structure proposed for a peptide is to synthesize a peptide having a specific sequence of amino acids and compare the two. This was done, for example, in the case of bradykinin, a peptide present in blood that acts to lower blood pressure. Excess bradykinin, formed as a response to the sting of wasps and other insects containing substances in their venom that stimulate bradykinin release, causes severe local pain. Bradykinin was originally believed to be an octapeptide containing two proline residues; however, a nonapeptide containing three prolines in the following sequence was synthesized and determined to be identical with natural bradykinin in every respect, including biological activity:

Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg (bradykinin)

A reevaluation of the original sequence data established that natural bradykinin was indeed the nonapeptide shown. Here the synthesis of a peptide did more than confirm structure; synthesis was instrumental in determining structure.

Chemists and biochemists also synthesize peptides in order to better understand how they act. By systematically altering the sequence, it’s sometimes possible to find out which amino acids are intimately involved in the reactions that involve a particular peptide. Many synthetic peptides have been prepared in searching for new drugs.

The objective in peptide synthesis may be simply stated: to connect amino acids in a prescribed sequence by amide bond formation between them. A number of very effective methods and reagents have been designed for peptide bond formation, so that the joining together of amino acids by amide linkages is not difficult. The real difficulty lies in ensuring that the correct sequence is obtained. This can be illustrated by considering the synthesis of a representative dipeptide, Phe-Gly. Random peptide bond formation in a mixture containing phenylalanine and glycine would be expected to lead to four dipeptides:

H₃N⁺CH(CH₂C₆H₅)CO₂⁻ (phenylalanine) + H₃N⁺CH₂CO₂⁻ (glycine) → Phe-Gly + Phe-Phe + Gly-Phe + Gly-Gly

In order to direct the synthesis so that only Phe-Gly is formed, the amino group of phenylalanine and the carboxyl group of glycine must be protected so that they cannot react under the conditions of peptide bond formation. We can represent the peptide bond formation step by the following equation, where X and Y are amine- and carboxyl-protecting groups, respectively:

Thus, the synthesis of a dipeptide of prescribed sequence requires at least three operations:
1. Protect the amino group of the N-terminal amino acid and the carboxyl group of the C-terminal amino acid.
2. Couple the two protected amino acids by amide bond formation between them.
3. Deprotect the amino group at the N terminus and the carboxyl group at the C terminus.

Higher peptides are prepared in an analogous way by a direct extension of the logic just outlined for the synthesis of dipeptides.

Sections 27.15 through 27.18 describe the chemistry associated with the protection and deprotection of amino and carboxyl functions, along with methods for peptide bond formation.
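The sequence-control problem described in this section is, at bottom, a matter of counting which amino group can meet which carboxyl group. The following Python sketch makes the contrast explicit; it is a simplified model in which an N-protected amino acid can act only as the acyl component and a C-protected amino acid only as the amine component, and the function names are assumptions made for illustration.

```python
from itertools import product


def random_dipeptides(amino_acids):
    """Every dipeptide that unprotected amino acids could form with one another."""
    return ["-".join(pair) for pair in product(amino_acids, repeat=2)]


def directed_dipeptides(n_protected, c_protected):
    """Dipeptides possible when the N terminus donor and C terminus acceptor are fixed.

    `n_protected` amino acids have a free carboxyl group only (they end up at the
    N terminus); `c_protected` amino acids have a free amino group only (they end
    up at the C terminus).
    """
    return [f"{acyl}-{amine}" for acyl in n_protected for amine in c_protected]


if __name__ == "__main__":
    print(random_dipeptides(["Phe", "Gly"]))
    print(directed_dipeptides(["Phe"], ["Gly"]))
```

With both amino acids unprotected the model returns the four dipeptides listed above; with phenylalanine protected at nitrogen and glycine protected at its carboxyl group, Phe-Gly is the only possible coupling product.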
Peptide bond formation with protecting groups X and Y:

X–NHCH(CH₂C₆H₅)CO₂H (N-protected phenylalanine) + H₂NCH₂C(=O)Y (C-protected glycine) → X–NHCH(CH₂C₆H₅)C(=O)NHCH₂C(=O)Y (protected Phe-Gly)   [couple]

X–NHCH(CH₂C₆H₅)C(=O)NHCH₂C(=O)Y (protected Phe-Gly) → H₃N⁺CH(CH₂C₆H₅)C(=O)NHCH₂CO₂⁻ (Phe-Gly)   [deprotect]

27.15 AMINO GROUP PROTECTION

The reactivity of an amino group is suppressed by converting it to an amide, and amino groups are most often protected by acylation. The benzyloxycarbonyl group, C₆H₅CH₂OC(=O)–, is one of the most often used amino-protecting groups. It is attached by acylation of an amino acid with benzyloxycarbonyl chloride.

C₆H₅CH₂OC(=O)Cl (benzyloxycarbonyl chloride) + H₃N⁺CH(CH₂C₆H₅)CO₂⁻ (phenylalanine) → C₆H₅CH₂OC(=O)NHCH(CH₂C₆H₅)CO₂H (N-benzyloxycarbonylphenylalanine, 82–87%)   [1. NaOH, H₂O; 2. H⁺]

Another name for the benzyloxycarbonyl group is carbobenzoxy. This name, and its abbreviation Cbz, are often found in the older literature, but are no longer a part of IUPAC nomenclature.

PROBLEM 27.16 Lysine reacts with two equivalents of benzyloxycarbonyl chloride to give a derivative containing two benzyloxycarbonyl groups. What is the structure of this compound?

Just as it is customary to identify individual amino acids by abbreviations, so too with protected amino acids. The approved abbreviation for a benzyloxycarbonyl group is the letter Z. Thus, N-benzyloxycarbonylphenylalanine is represented as ZNHCH(CH₂C₆H₅)CO₂H, or more simply as Z-Phe.

The value of the benzyloxycarbonyl protecting group is that it is easily removed by reactions other than hydrolysis. In peptide synthesis, amide bonds are formed. We protect the N terminus as an amide but need to remove the protecting group without cleaving the very amide bonds we labored so hard to construct. Removing the protecting group by hydrolysis would surely bring about cleavage of peptide bonds as well. One advantage that the benzyloxycarbonyl protecting group enjoys over more familiar acyl groups such as acetyl is that it can be removed by hydrogenolysis in the presence of palladium. The following equation illustrates this for the removal of the benzyloxycarbonyl protecting group from the ethyl ester of Z-Phe-Gly:

C₆H₅CH₂OC(=O)NHCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ (N-benzyloxycarbonylphenylalanylglycine ethyl ester) → H₂NCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ (phenylalanylglycine ethyl ester, 100%) + C₆H₅CH₃ (toluene) + CO₂ (carbon dioxide)   [H₂, Pd]

Hydrogenolysis refers to the cleavage of a molecule under conditions of catalytic hydrogenation.

Alternatively, the benzyloxycarbonyl protecting group may be removed by treatment with hydrogen bromide in acetic acid:

C₆H₅CH₂OC(=O)NHCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ (N-benzyloxycarbonylphenylalanylglycine ethyl ester) → H₃N⁺CH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ Br⁻ (phenylalanylglycine ethyl ester hydrobromide, 82%) + C₆H₅CH₂Br (benzyl bromide) + CO₂ (carbon dioxide)   [HBr]

Deprotection by this method rests on the ease with which benzyl esters are cleaved by nucleophilic attack at the benzylic carbon in the presence of strong acids. Bromide ion is the nucleophile.

A related N-terminal-protecting group is tert-butoxycarbonyl, abbreviated Boc:

(CH₃)₃COC(=O)– (tert-butoxycarbonyl, Boc–)

(CH₃)₃COC(=O)NHCH(CH₂C₆H₅)CO₂H (N-tert-butoxycarbonylphenylalanine), also written as BocNHCH(CH₂C₆H₅)CO₂H or Boc-Phe

Like the benzyloxycarbonyl protecting group, the Boc group may be removed by treatment with hydrogen bromide (it is stable to hydrogenolysis, however). The tert-butyl group is cleaved as the corresponding carbocation.
Loss of a proton from tert-butyl cation converts it to 2-methylpropene. Because of the ease with which a tert-butyl group is cleaved as a carbocation, other acidic reagents, such as trifluoroacetic acid, may also be used.

Removal of the Boc group with hydrogen bromide:

(CH₃)₃COC(=O)NHCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ (N-tert-butoxycarbonylphenylalanylglycine ethyl ester) → H₃N⁺CH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ Br⁻ (phenylalanylglycine ethyl ester hydrobromide, 86%) + (CH₃)₂C=CH₂ (2-methylpropene) + CO₂ (carbon dioxide)   [HBr]

An experiment using Boc protection in the synthesis of a dipeptide can be found in the November 1989 issue of the Journal of Chemical Education, pp. 965–967.

27.16 CARBOXYL GROUP PROTECTION

Carboxyl groups of amino acids and peptides are normally protected as esters. Methyl and ethyl esters are prepared by Fischer esterification. Deprotection of methyl and ethyl esters is accomplished by hydrolysis in base. Benzyl esters are a popular choice because they can be removed by hydrogenolysis. Thus a synthetic peptide, protected at both its N terminus with a Z group and at its C terminus as a benzyl ester, can be completely deprotected in a single operation.

C₆H₅CH₂OC(=O)NHCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂C₆H₅ (N-benzyloxycarbonylphenylalanylglycine benzyl ester) → H₃N⁺CH(CH₂C₆H₅)C(=O)NHCH₂CO₂⁻ (phenylalanylglycine, 87%) + 2 C₆H₅CH₃ (toluene) + CO₂ (carbon dioxide)   [H₂, Pd]

Several of the amino acids listed in Table 27.1 bear side-chain functional groups, which must also be protected during peptide synthesis. In most cases, protecting groups are available that can be removed by hydrogenolysis.

27.17 PEPTIDE BOND FORMATION

To form a peptide bond between two suitably protected amino acids, the free carboxyl group of one of them must be activated so that it is a reactive acylating agent. The most familiar acylating agents are acyl chlorides, and they were once extensively used to couple amino acids. Certain drawbacks to this approach, however, led chemists to seek alternative methods.

In one method, treatment of a solution containing the N-protected and the C-protected amino acids with N,N′-dicyclohexylcarbodiimide (DCCI) leads directly to peptide bond formation:

ZNHCH(CH₂C₆H₅)CO₂H (Z-protected phenylalanine) + H₂NCH₂CO₂CH₂CH₃ (glycine ethyl ester) → ZNHCH(CH₂C₆H₅)C(=O)NHCH₂CO₂CH₂CH₃ (Z-protected Phe-Gly ethyl ester, 83%)   [DCCI, chloroform]

N,N′-Dicyclohexylcarbodiimide is the carbodiimide RN=C=NR in which R is cyclohexyl. The mechanism by which DCCI promotes the condensation of an amine and a carboxylic acid to give an amide is outlined in Figure 27.13.

PROBLEM 27.17 Show the steps involved in the synthesis of Ala-Leu from alanine and leucine using benzyloxycarbonyl and benzyl ester protecting groups and DCCI-promoted peptide bond formation.

In the second major method of peptide synthesis the carboxyl group is activated by converting it to an active ester, usually a p-nitrophenyl ester. Recall from Section 20.11 that esters react with ammonia and amines to give amides. p-Nitrophenyl esters are much more reactive than methyl and ethyl esters in these reactions because p-nitrophenoxide is a better (less basic) leaving group than methoxide and ethoxide.
Simply allowing the active ester and a C-protected amino acid to stand in a suitable solvent is sufficient to bring about peptide bond formation by nucleophilic acyl substitution.

The p-nitrophenol formed as a byproduct in this reaction is easily removed by extraction with dilute aqueous base. Unlike free amino acids and peptides, protected peptides are not zwitterionic and are more soluble in organic solvents than in water.

PROBLEM 27.18 p-Nitrophenyl esters are made from Z-protected amino acids by reaction with p-nitrophenol in the presence of N,N′-dicyclohexylcarbodiimide. Suggest a reasonable mechanism for this reaction.

PROBLEM 27.19 Show how you could convert the ethyl ester of Z-Phe-Gly to Leu-Phe-Gly (as its ethyl ester) by the active ester method.

[Structure drawing: N,N′-dicyclohexylcarbodiimide (DCCI)]

Higher peptides are prepared either by stepwise extension of peptide chains, one amino acid at a time, or by coupling of fragments containing several residues (the fragment condensation approach). Human pituitary adrenocorticotropic hormone (ACTH), for example, has 39 amino acids and was synthesized by coupling of smaller peptides containing residues 1–10, 11–16, 17–24, and 25–39. An attractive feature of this approach is that the various protected peptide fragments may be individually purified, which simplifies the purification of the final product. Among the substances that have been synthesized by fragment condensation are insulin (51 amino acids) and the protein ribonuclease A (124 amino acids). In the stepwise extension approach, the starting
peptide in a particular step differs from the coupling product by only one amino acid residue, and the properties of the two peptides may be so similar as to make purification by conventional techniques all but impossible. The following section describes a method by which many of the difficulties involved in the purification of intermediates have been overcome.

[Reaction scheme: Z-protected phenylalanine p-nitrophenyl ester + glycine ethyl ester, in chloroform, gives Z-protected Phe-Gly ethyl ester (78%) + p-nitrophenol.]

FIGURE 27.13 The mechanism of amide bond formation by N,N′-dicyclohexylcarbodiimide-promoted condensation of a carboxylic acid and an amine. (DCCI = N,N′-dicyclohexylcarbodiimide; R = cyclohexyl.)
Overall reaction: carboxylic acid + amine + DCCI (RN=C=NR) gives amide + N,N′-dicyclohexylurea (RNHC(O)NHR).
Step 1: In the first stage of the reaction, the carboxylic acid adds to one of the double bonds of DCCI to give an O-acylisourea.
Step 2: Structurally, O-acylisoureas resemble carboxylic acid anhydrides and are powerful acylating agents. In the reaction's second stage the amine adds to the carbonyl group of the O-acylisourea to give a tetrahedral intermediate.
Step 3: The tetrahedral intermediate dissociates to an amide and N,N′-dicyclohexylurea.

27.18 SOLID-PHASE PEPTIDE SYNTHESIS: THE MERRIFIELD METHOD

In 1962, R.
Bruce Merrifield of Rockefeller University reported the synthesis of the nonapeptide bradykinin (see Section 27.14) by a novel method. In Merrifield's method, peptide coupling and deprotection are carried out not in homogeneous solution but at the surface of an insoluble polymer, or solid support. Beads of a copolymer prepared from styrene containing about 2% divinylbenzene are treated with chloromethyl methyl ether and tin(IV) chloride to give a resin in which about 10% of the aromatic rings bear CH2Cl groups (Figure 27.14). The growing peptide is anchored to this polymer, and excess reagents, impurities, and byproducts are removed by thorough washing after each operation. This greatly simplifies the purification of intermediates.

The actual process of solid-phase peptide synthesis, outlined in Figure 27.15, begins with the attachment of the C-terminal amino acid to the chloromethylated polymer in step 1. Nucleophilic substitution by the carboxylate anion of an N-Boc-protected C-terminal amino acid displaces chloride from the chloromethyl group of the polymer to form an ester, protecting the C terminus while anchoring it to a solid support. Next, the Boc group is removed by treatment with acid (step 2), and the polymer containing the unmasked N terminus is washed with a series of organic solvents. Byproducts are removed, and only the polymer and its attached C-terminal amino acid residue remain. Next (step 3), a peptide bond to an N-Boc-protected amino acid is formed by condensation in the presence of N,N′-dicyclohexylcarbodiimide. Again, the polymer is washed thoroughly. The Boc protecting group is then removed by acid treatment (step 4), and after washing, the polymer is ready for the addition of another amino acid residue by a repetition of the cycle. When all the amino acids have been added, the synthetic peptide is removed from the polymeric support by treatment with hydrogen bromide in trifluoroacetic acid.
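The cycle just described is repetitive enough to write out as an ordered checklist. The following is a minimal Python sketch of that bookkeeping, not a laboratory protocol: the function name merrifield_plan and the wording of each operation are illustrative assumptions, and only the reagents named in the text (Boc protection, acid deprotection, DCCI coupling, HBr in trifluoroacetic acid for cleavage) appear in it.

```python
def merrifield_plan(residues):
    """List the operations for assembling a peptide on the resin.

    `residues` is written N terminus -> C terminus, e.g. ["Phe", "Gly"].
    The function name and step wording are hypothetical, for illustration.
    """
    if not residues:
        raise ValueError("need at least one amino acid residue")

    # Synthesis starts at the C terminus, so walk the sequence backwards.
    c_to_n = list(reversed(residues))

    plan = [
        f"anchor Boc-{c_to_n[0]} (as carboxylate) to chloromethylated resin, wash",
        "remove Boc with acid, wash",
    ]
    for amino_acid in c_to_n[1:]:
        plan.append(f"couple Boc-{amino_acid} using DCCI, wash")
        plan.append("remove Boc with acid, wash")
    plan.append("cleave peptide from resin with HBr in CF3CO2H")
    return plan


for number, operation in enumerate(merrifield_plan(["Phe", "Gly"]), start=1):
    print(number, operation)
```

Every residue after the first adds exactly one coupling and one deprotection, which is why the procedure lends itself to automation.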
By successively adding amino acid residues to the C-terminal amino acid, it took Merrifield only 8 days to synthesize bradykinin in 68% yield. The biological activity of synthetic bradykinin was identical with that of natural material.

Merrifield was awarded the 1984 Nobel Prize in chemistry for developing the solid-phase method of peptide synthesis.

FIGURE 27.14 A section of polystyrene showing one of the benzene rings modified by chloromethylation. Individual polystyrene chains in the resin used in solid-phase peptide synthesis are connected to one another at various points (cross-linked) by adding a small amount of p-divinylbenzene to the styrene monomer. The chloromethylation step is carried out under conditions such that only about 10% of the benzene rings bear CH2Cl groups.

Step 1: The Boc-protected amino acid is anchored to the resin. Nucleophilic substitution of the benzylic chloride by the carboxylate anion gives an ester.
Step 2: The Boc protecting group is removed by treatment with hydrochloric acid in dilute acetic acid. After the resin has been washed, the C-terminal amino acid is ready for coupling.
Step 3: The resin-bound C-terminal amino acid is coupled to an N-protected amino acid by using N,N′-dicyclohexylcarbodiimide. Excess reagent and N,N′-dicyclohexylurea are washed away from the resin after coupling is complete.
Step 4: The Boc protecting group is removed as in step 2. If desired, steps 3 and 4 may be repeated to introduce as many amino acid residues as desired.
Step n: When the peptide is completely assembled, it is removed from the resin by treatment with hydrogen bromide in trifluoroacetic acid. [Products drawn: the free peptide and the BrCH2-substituted resin.]

FIGURE 27.15 Peptide synthesis by the solid-phase method of Merrifield. Amino acid residues are attached sequentially beginning at the C terminus.

PROBLEM 27.20 Starting with phenylalanine and glycine, outline the steps in the preparation of Phe-Gly by the Merrifield method.

Merrifield successfully automated all the steps in solid-phase peptide synthesis, and computer-controlled equipment is now commercially available to perform this synthesis. Using an early version of his "peptide synthesizer," in collaboration with coworker Bernd Gutte, Merrifield reported the synthesis of the enzyme ribonuclease in 1969. It took them only 6 weeks to perform the 369 reactions and 11,391 steps necessary to assemble the sequence of 124 amino acids of ribonuclease.

Solid-phase peptide synthesis does not solve all purification problems, however. Even if every coupling step in the ribonuclease synthesis proceeded in 99% yield, the product would be contaminated with many different peptides containing 123 amino acids, 122 amino acids, and so on. Thus, Merrifield and Gutte's 6 weeks of synthesis was followed by 4 months spent in purifying the final product. The technique has since been refined to the point that yields at the 99% level and greater are achieved with current instrumentation, and thousands of peptides and peptide analogs have been prepared by the solid-phase method.

Merrifield's concept of a solid-phase method for peptide synthesis and his development of methods for carrying it out set the stage for an entirely new way to do chemical reactions.
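The remark about 99% coupling yields in the ribonuclease synthesis can be made quantitative with a short calculation. The sketch below is a simplified model, not a description of the actual synthesis. It assumes that each coupling succeeds independently with the same yield, that a 124-residue chain needs 123 couplings on the resin, and that losses in deprotection and cleavage can be ignored. Under those assumptions only about 29% of the chains would be full length at 99% per coupling, which is consistent with the long purification the text describes.

```python
def overall_yield(step_yield, n_steps):
    """Fraction of chains that survive n_steps independent steps."""
    return step_yield ** n_steps


# Assumption: a 124-residue chain requires 123 peptide-bond-forming steps.
n_couplings = 124 - 1

for per_step in (0.95, 0.99, 0.999):
    fraction = overall_yield(per_step, n_couplings)
    print(f"{per_step:.3f} per coupling -> {fraction:.1%} full-length chain")
```

The steep dependence on per-step yield shows why pushing coupling efficiency from 99% toward 99.9% matters so much for long peptides.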
Solid-phase synthesis has been extended to include numerous other classes of compounds and has helped spawn a whole new field called combinatorial chemistry. Combinatorial synthesis allows a chemist, using solid-phase techniques, to prepare hundreds of related compounds (called libraries) at a time. It is one of the most active areas of organic synthesis, especially in the pharmaceutical industry.

27.19 SECONDARY STRUCTURES OF PEPTIDES AND PROTEINS

The primary structure of a peptide is its amino acid sequence. We also speak of the secondary structure of a peptide, that is, the conformational relationship of nearest neighbor amino acids with respect to each other. On the basis of X-ray crystallographic studies and careful examination of molecular models, Linus Pauling and Robert B. Corey of the California Institute of Technology showed that certain peptide conformations were more stable than others. Two arrangements, the α helix and the pleated β sheet, stand out as secondary structural units that are both particularly stable and commonly encountered. Both of these incorporate two important features:

1. The geometry of the peptide bond is planar and the main chain is arranged in an anti conformation (Section 27.7).
2. Hydrogen bonding can occur when the N-H group of one amino acid unit and the C=O group of another are close in space; conformations that maximize the number of these hydrogen bonds are stabilized by them.

Figure 27.16 illustrates a β sheet structure for a protein composed of alternating glycine and alanine residues. There are hydrogen bonds between the C=O and H-N groups of adjacent antiparallel chains. Van der Waals repulsions between the α hydrogens of glycine and the methyl groups of alanine cause the chains to rotate with respect to one another to give a rippled effect.
Hence the name pleated β sheet. The pleated β sheet is an important secondary structure, especially in proteins that are rich in amino acids with small side chains, such as H (glycine), CH3 (alanine), and CH2OH (serine). Fibroin, the major protein of most silk fibers, is almost entirely pleated β sheet, and over 80% of it is a repeating sequence of the six-residue unit -Gly-Ser-Gly-Ala-Gly-Ala-. The pleated β sheet is flexible, but since the peptide chains are nearly in an extended conformation, it resists stretching.

Unlike the pleated β sheet, in which hydrogen bonds are formed between two chains, the α helix is stabilized by hydrogen bonds within a single chain. Figure 27.17 illustrates a section of peptide α helix constructed from L-alanine. A right-handed helical conformation with about 3.6 amino acids per turn permits each carbonyl oxygen to be hydrogen-bonded to an amide proton and vice versa. The α helix is found in many proteins; the principal protein components of muscle (myosin) and wool (α-keratin), for example, contain high percentages of α helix. When wool fibers are stretched, these helical regions are elongated by the breaking of hydrogen bonds. Disulfide bonds between cysteine residues of neighboring α-keratin chains are too strong to be broken during stretching, however, and they limit the extent of distortion. After the stretching force is removed, the hydrogen bonds reform spontaneously, and the wool fiber returns to its original shape. Wool has properties that are different from those of silk because the secondary structures of the two fibers are different, and their secondary structures are different because the primary structures are different.

Proline is the only amino acid in Table 27.1 that is a secondary amine, and its presence in a peptide chain introduces an amide nitrogen that has no hydrogen available for hydrogen bonding.
This disrupts the network of hydrogen bonds and divides the peptide into two separate regions of α helix. The presence of proline is often associated with a bend in the peptide chain.

Proteins, or sections of proteins, sometimes exist as random coils, an arrangement that lacks the regularity of the α helix or pleated β sheet.

FIGURE 27.16 The β-sheet secondary structure of a protein, composed of alternating glycine and alanine residues. Hydrogen bonding occurs between the amide N-H of one chain and the carbonyl oxygen of another. Van der Waals repulsions between substituents at the α-carbon atoms, shown here as vertical methyl groups, introduce creases in the sheet. The structure of the pleated β sheet is seen more clearly by examining the molecular model on Learning By Modeling and rotating it in three dimensions.

27.20 TERTIARY STRUCTURE OF PEPTIDES AND PROTEINS

The tertiary structure of a peptide or protein refers to the folding of the chain. The way the chain is folded affects both the physical properties of a protein and its biological function. Structural proteins, such as those present in skin, hair, tendons, wool, and silk, may have either helical or pleated-sheet secondary structures, but in general are elongated in shape, with a chain length many times the chain diameter. They are classed as fibrous proteins and, as befits their structural role, tend to be insoluble in water. Many other proteins, including most enzymes, operate in aqueous media; some are soluble, but most are dispersed as colloids. Proteins of this type are called globular proteins. Globular proteins are approximately spherical. Figure 27.18 shows carboxypeptidase A (Section 27.10), a globular protein containing 307 amino acids.
A typical protein such as carboxypeptidase A incorporates elements of a number of secondary structures: some segments are helical; others, pleated sheet; and still others correspond to no simple description.

FIGURE 27.17 An α helix of a portion of a protein in which all of the amino acids are alanine. The helix is stabilized by hydrogen bonds between the N-H proton of one amide group and the carbonyl oxygen of another. The methyl groups at the α carbon project away from the outer surface of the helix. When viewed along the helical axis, the chain turns in a clockwise direction (a right-handed helix). The structure of the α helix is seen more clearly by examining the molecular model on Learning By Modeling and rotating it in three dimensions.

The shape of a large protein is influenced by many factors, including, of course, its primary and secondary structure. The disulfide bond shown in Figure 27.18 links Cys-138 of carboxypeptidase A to Cys-161 and contributes to the tertiary structure. Carboxypeptidase A contains a Zn²⁺ ion, which is essential to the catalytic activity of the enzyme, and its presence influences the tertiary structure. The Zn²⁺ ion lies near the center of the enzyme, where it is coordinated to the imidazole nitrogens of two histidine residues (His-69, His-196) and to the carboxylate side chain of Glu-72.

Protein tertiary structure is also influenced by the environment. In water a globular protein usually adopts a shape that places its lipophilic groups toward the interior, with its polar groups on the surface, where they are solvated by water molecules. About 65% of the mass of most cells is water, and the proteins present in cells are said to be in their native state, the tertiary structure in which they express their biological activity.
When the tertiary structure of a protein is disrupted by adding substances that cause the protein chain to unfold, the protein becomes denatured and loses most, if not all, of its activity. Evidence that supports the view that the tertiary structure is dictated by the primary structure includes experiments in which proteins are denatured and allowed to stand, whereupon they are observed to spontaneously readopt their native-state conformation with full recovery of biological activity.

Most protein tertiary structures are determined by X-ray crystallography. The first, myoglobin, the oxygen storage protein of muscle, was determined in 1957. Since then thousands more have been determined. In the form of crystallographic coordinates, the data are deposited in the Protein Data Bank and are freely available. The three-dimensional structure of carboxypeptidase in Figure 27.18, for example, was produced by downloading the coordinates from the Protein Data Bank and converting them to a molecular model. At present, the Protein Data Bank averages about one new protein structure per day.

Knowing how the protein chain is folded is a key ingredient in understanding the mechanism by which an enzyme catalyzes a reaction. Take carboxypeptidase for example. This enzyme catalyzes the hydrolysis of the peptide bond at the C terminus. It is believed that an ionic bond between the positively charged side chain of an arginine residue (Arg-145) of the enzyme and the negatively charged carboxylate group of the substrate's terminal amino acid binds the peptide at the active site, the region of the enzyme's interior where the catalytically important functional groups are located. There,

[Figure labels: disulfide bond, Zn²⁺, Arg-145, N-terminus, C-terminus; panels (a) and (b).]
FIGURE 27.18 The structure of carboxypeptidase A displayed as (a) a tube model and (b) a ribbon diagram. The tube model shows all of the amino acids and their side chains.
The most evident feature illustrated by (a) is the globular shape of the enzyme. The ribbon diagram emphasizes the folding of the chain and the helical regions. As can be seen in (b), a substantial portion of the protein, the sections colored gray, is not helical but is random coil. The orientation of the protein and the color-coding are the same in both views.

For their work on myoglobin and hemoglobin, respectively, John C. Kendrew and Max F. Perutz were awarded the 1962 Nobel Prize in chemistry.

the Zn²⁺ ion acts as a Lewis acid toward the carbonyl oxygen of the peptide substrate, increasing its susceptibility to attack by a water molecule (Figure 27.19).

Living systems contain thousands of different enzymes. As we have seen, all are structurally quite complex, and there are no sweeping generalizations that can be made to include all aspects of enzymic catalysis. The case of carboxypeptidase A illustrates one mode of enzyme action, the bringing together of reactants and catalytically active functions at the active site.

27.21 COENZYMES

The number of chemical processes that protein side chains can engage in is rather limited. Most prominent among them are proton donation, proton abstraction, and nucleophilic addition to carbonyl groups. In many biological processes a richer variety of reactivity is required, and proteins often act in combination with nonprotein organic molecules to bring about the necessary chemistry. These "helper molecules," referred to as coenzymes, cofactors, or prosthetic groups, interact with both the enzyme and the substrate to produce the necessary chemical change. Acting alone, for example, proteins lack the necessary functionality to be effective oxidizing or reducing agents. They can catalyze biological oxidations and reductions, however, in the presence of a suitable coenzyme.
In earlier sections we saw numerous examples of these reactions in which the coenzyme NAD⁺ acted as an oxidizing agent, and others in which NADH acted as a reducing agent.

FIGURE 27.19 Proposed mechanism of hydrolysis of a peptide catalyzed by carboxypeptidase A. The peptide is bound at the active site by an ionic bond between its C-terminal amino acid and the positively charged side chain of arginine-145. Coordination of Zn²⁺ to oxygen makes the carbon of the carbonyl group more positive and increases the rate of nucleophilic attack by water.

Most, but not all, enzymes are proteins. For identifying certain RNA-catalyzed biological processes Sidney Altman (Yale University) and Thomas R. Cech (University of Colorado) shared the 1989 Nobel Prize in chemistry.

FIGURE 27.20 Heme shown as (a) a structural drawing and as (b) a space-filling model. The space-filling model shows the coplanar arrangement of the groups surrounding iron.

Heme (Figure 27.20) is an important prosthetic group in which iron(II) is coordinated with the four nitrogen atoms of a type of tetracyclic aromatic substance known as a porphyrin. The oxygen-storing protein of muscle, myoglobin, represented schematically in Figure 27.21, consists of a heme group surrounded by a protein of 153 amino acids. Four of the six available coordination sites of Fe²⁺ are taken up by the nitrogens of the porphyrin, one by a histidine residue of the protein, and the last by a water molecule. Myoglobin stores oxygen obtained from the blood by formation of an Fe-O2 complex.
The oxygen displaces water as the sixth ligand on iron and is held there until needed. The protein serves as a container for the heme and prevents oxidation of Fe²⁺ to Fe³⁺, an oxidation state in which iron lacks the ability to bind oxygen. Separately, neither heme nor the protein binds oxygen in aqueous solution; together, they do it very well.

27.22 PROTEIN QUATERNARY STRUCTURE: HEMOGLOBIN

Rather than existing as a single polypeptide chain, some proteins are assemblies of two or more chains. The manner in which these subunits are organized is called the quaternary structure of the protein.

Hemoglobin is the oxygen-carrying protein of blood. It binds oxygen at the lungs and transports it to the muscles, where it is stored by myoglobin. Hemoglobin binds oxygen in very much the same way as myoglobin, using heme as the prosthetic group. Hemoglobin is much larger than myoglobin, however, having a molecular weight of 64,500, whereas that of myoglobin is 17,500; hemoglobin contains four heme units, myoglobin only one. Hemoglobin is an assembly of four hemes and four protein chains, including two identical chains called the alpha chains and two identical chains called the beta chains.

Some substances, such as CO, form strong bonds to the iron of heme, strong enough to displace O2 from it. Carbon monoxide binds 30–50 times more effectively than oxygen to myoglobin and hundreds of times better than oxygen to hemoglobin. Strong binding of CO at the active site interferes with the ability of heme to perform its biological task of transporting and storing oxygen, with potentially lethal results.

How function depends on structure can be seen in the case of the genetic disorder sickle cell anemia. This is a debilitating, sometimes fatal, disease in which red blood cells become distorted ("sickle-shaped") and interfere with the flow of blood through the capillaries. This condition results from the presence of an abnormal hemoglobin in affected people.
The primary structures of the beta chain of normal and sickle cell hemoglobin differ by a single amino acid out of 149; sickle cell hemoglobin has valine in place of glutamic acid as the sixth residue from the N terminus. A tiny change in amino acid sequence can produce a life-threatening result! This modification is genetically controlled and probably became established in the gene pool because bearers of the trait have an increased resistance to malaria.

[Figure labels: N-terminus, C-terminus, heme; panels (a) and (b).]
FIGURE 27.21 The structure of sperm-whale myoglobin displayed as (a) a tube model and (b) a ribbon diagram. The tube model shows all of the amino acids in the chain; the ribbon diagram shows the folding of the chain. There are five separate regions of α helix in myoglobin, which are shown in different colors to distinguish them more clearly. The heme portion is included in both drawings, but is easier to locate in the ribbon diagram, as is the histidine side chain that is attached to the iron of heme.

An article entitled "Hemoglobin: Its Occurrence, Structure, and Adaptation" appeared in the March 1982 issue of the Journal of Chemical Education (pp. 173–178).

27.23 PYRIMIDINES AND PURINES

One of the major achievements in all of science has been the identification, at the molecular level, of the chemical interactions that are involved in the transfer of genetic information and the control of protein biosynthesis. The substances involved are biological macromolecules called nucleic acids. Nucleic acids were isolated over 100 years ago, and, as their name implies, they are acidic substances present in the nuclei of cells. There are two major kinds of nucleic acids: ribonucleic acid (RNA) and deoxyribonucleic acid (DNA).
To understand the complex structure of nucleic acids, we first need to examine some simpler substances, nitrogen-containing aromatic heterocycles called pyrimidines and purines. The parent substance of each class and the numbering system used are shown:

[Structures with ring numbering: pyrimidine (positions 1 to 6) and purine (positions 1 to 9).]

Recall that heterocyclic aromatic compounds were introduced in Section 11.21.

The pyrimidines that occur in DNA are cytosine and thymine. Cytosine is also a structural unit in RNA, which, however, contains uracil instead of thymine. Other pyrimidine derivatives are sometimes present but in small amounts.

[Structures: uracil (occurs in RNA), thymine (occurs in DNA), and cytosine (occurs in both RNA and DNA).]

PROBLEM 27.21 5-Fluorouracil is a drug used in cancer chemotherapy. What is its structure?

Adenine and guanine are the principal purines of both DNA and RNA.

The rings of purines and pyrimidines are aromatic and planar. You will see how important this flat shape is when we consider the structure of nucleic acids.

Pyrimidines and purines occur naturally in substances other than nucleic acids. Coffee, for example, is a familiar source of caffeine. Tea contains both caffeine and theobromine.

27.24 NUCLEOSIDES

The term nucleoside was once restricted to pyrimidine and purine N-glycosides of D-ribofuranose and 2-deoxy-D-ribofuranose, because these are the substances present in nucleic acids. The term is used more liberally now with respect to the carbohydrate portion, but is still usually limited to pyrimidine and purine substituents at the anomeric carbon. Uridine is a representative pyrimidine nucleoside; it bears a D-ribofuranose group at N-1. Adenosine is a representative purine nucleoside; its carbohydrate unit is attached at N-9.
It is customary to refer to the noncarbohydrate portion of a nucleoside as a purine or pyrimidine base.

[Structures: uridine (1-β-D-ribofuranosyluracil) and adenosine (9-β-D-ribofuranosyladenine).]

[Structures: caffeine, theobromine, adenine, and guanine.]

PROBLEM 27.22 The names of the principal nucleosides obtained from RNA and DNA are listed. Write a structural formula for each one.
(a) Thymidine (thymine-derived nucleoside in DNA)
(b) Cytidine (cytosine-derived nucleoside in RNA)
(c) Guanosine (guanine-derived nucleoside in RNA)

SAMPLE SOLUTION (a) Thymine is a pyrimidine base present in DNA; its carbohydrate substituent is 2-deoxyribofuranose, which is attached to N-1 of thymine.

Nucleosides of 2-deoxyribose are named in the same way. Carbons in the carbohydrate portion of the molecule are identified as 1′, 2′, 3′, 4′, and 5′ to distinguish them from atoms in the purine or pyrimidine base. Thus, the adenine nucleoside of 2-deoxyribose is called 2′-deoxyadenosine or 9-β-2′-deoxyribofuranosyladenine.

27.25 NUCLEOTIDES

Nucleotides are phosphoric acid esters of nucleosides. The 5′-monophosphate of adenosine is called 5′-adenylic acid or adenosine 5′-monophosphate (AMP). As its name implies, 5′-adenylic acid is an acidic substance; it is a diprotic acid with pKa values for ionization of 3.8 and 6.2, respectively. In aqueous solution at pH 7, both OH groups of the P(O)(OH)2 unit are ionized.

The analogous D-ribonucleotides of the other purines and pyrimidines are uridylic acid, guanylic acid, and cytidylic acid. Thymidylic acid is the 5′-monophosphate of thymidine (the carbohydrate is 2-deoxyribose in this case).
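The statement that both OH groups of 5′-adenylic acid are ionized at pH 7 can be checked against the two pKa values quoted above. The sketch below is only an estimate. It treats AMP as a simple diprotic acid H2A with pKa values of 3.8 and 6.2, as the text does, and it ignores activity effects and any other ionizable sites. The function name diprotic_fractions is hypothetical. With these inputs the doubly ionized form comes out as the major species at pH 7 (roughly 86%), with most of the remainder singly ionized.

```python
def diprotic_fractions(pH, pKa1, pKa2):
    """Return the equilibrium fractions (H2A, HA-, A2-) of a diprotic acid."""
    h = 10.0 ** (-pH)        # hydrogen ion concentration
    ka1 = 10.0 ** (-pKa1)    # first ionization constant
    ka2 = 10.0 ** (-pKa2)    # second ionization constant

    # Standard speciation expressions for a diprotic acid.
    denominator = h * h + ka1 * h + ka1 * ka2
    return (h * h / denominator,
            ka1 * h / denominator,
            ka1 * ka2 / denominator)


neutral, monoanion, dianion = diprotic_fractions(pH=7.0, pKa1=3.8, pKa2=6.2)
print(f"H2A: {neutral:.4f}  HA-: {monoanion:.3f}  A2-: {dianion:.3f}")
```

Changing the pH argument shows how quickly the balance shifts: at pH 6.2 the singly and doubly ionized forms are present in nearly equal amounts.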
[Structures: 5′-adenylic acid (AMP); thymidine.]

Other important 5′-nucleotides of adenosine include adenosine diphosphate (ADP) and adenosine triphosphate (ATP). Each phosphorylation step in the sequence adenosine → AMP → ADP → ATP is endothermic. The energy to drive each step comes from carbohydrates by the process of glycolysis. It is convenient to view ATP as the storage vessel for the energy released during conversion of carbohydrates to carbon dioxide and water. That energy becomes available to the cells when ATP undergoes hydrolysis. The hydrolysis of ATP to ADP and phosphate has a ΔG° value of −35 kJ/mol (−8.4 kcal/mol).

Adenosine 3′-5′-cyclic monophosphate (cyclic AMP or cAMP) is an important regulator of a large number of biological processes. It is a cyclic ester of phosphoric acid and adenosine involving the hydroxyl groups at C-3′ and C-5′.

27.26 NUCLEIC ACIDS

Nucleic acids are polynucleotides in which a phosphate ester unit links the 5′ oxygen of one nucleotide to the 3′ oxygen of another. Figure 27.22 is a generalized depiction of the structure of a nucleic acid. Nucleic acids are classified as ribonucleic acids (RNA) or deoxyribonucleic acids (DNA) depending on the carbohydrate present.

Research on nucleic acids progressed slowly until it became evident during the 1940s that they played a role in the transfer of genetic information.
[Structural formulas: adenosine 3′-5′-cyclic monophosphate (cAMP); adenosine diphosphate (ADP); adenosine triphosphate (ATP)]

[Scheme: adenosine → AMP → ADP → ATP; each step requires PO₄³⁻ and enzymes]

[Margin note: For a discussion of glycolysis, see the July 1986 issue of the Journal of Chemical Education (pp. 566–570).]

It was known that the genetic information of an organism resides in the chromosomes present in each of its cells and that individual chromosomes are made up of smaller units called genes. When it became apparent that genes are DNA, interest in nucleic acids intensified. There was a feeling that once the structure of DNA was established, the precise way in which it carried out its designated role would become more evident. In some respects the problems are similar to those of protein chemistry. Knowing that DNA is a polynucleotide is comparable with knowing that proteins are polyamides. What is the nucleotide sequence (primary structure)? What is the precise shape of the polynucleotide chain (secondary and tertiary structure)? Is the genetic material a single strand of DNA, or is it an assembly of two or more strands? The complexity of the problem can be indicated by noting that a typical strand of human DNA contains approximately 10⁸ nucleotides; if uncoiled it would be several centimeters long, yet it and many others like it reside in cells too small to see with the naked eye.

In 1953 James D. Watson and Francis H. C. Crick pulled together data from biology, biochemistry, chemistry, and X-ray crystallography, along with the insight they gained from molecular models, to propose a structure for DNA and a mechanism for its replication.
Their two brief papers paved the way for an explosive growth in our understanding of life processes at the molecular level, the field we now call molecular biology. Along with Maurice Wilkins, who was responsible for the X-ray crystallographic work, Watson and Crick shared the 1962 Nobel Prize in physiology or medicine.

[Margin note: Watson and Crick have each written accounts of their work, and both are well worth reading. Watson’s is entitled The Double Helix. Crick’s is What Mad Pursuit: A Personal View of Scientific Discovery.]

FIGURE 27.22 A portion of a polynucleotide chain. DNA: X = H; R = CH₃. RNA: X = OH; R = H.

[Figure 27.23 panels: (a) A···T base pair, 1080 pm; (b) G···C base pair, 1080 pm]

27.27 STRUCTURE AND REPLICATION OF DNA: THE DOUBLE HELIX

Watson and Crick were aided in their search for the structure of DNA by a discovery made by Erwin Chargaff (Columbia University). Chargaff found that there was a consistent pattern in the composition of DNAs from various sources. Although there was a wide variation in the distribution of the bases among species, half the bases in all samples of DNA were purines and the other half were pyrimidines. Furthermore, the ratio of the purine adenine (A) to the pyrimidine thymine (T) was always close to 1:1. Likewise, the ratio of the purine guanine (G) to the pyrimidine cytosine (C) was also close to 1:1.
Analysis of human DNA, for example, revealed the base composition and base ratios tabulated below.

Feeling that the constancy in the A/T and G/C ratios was no accident, Watson and Crick proposed that it resulted from a structural complementarity between A and T and between G and C. Consideration of various hydrogen bonding arrangements revealed that A and T could form the hydrogen-bonded base pair shown in Figure 27.23a and that G and C could associate as in Figure 27.23b. Specific base pairing of A to T and of G to C by hydrogen bonds is a key element in the Watson–Crick model for the structure of DNA. We shall see that it is also a key element in the replication of DNA.

Because each hydrogen-bonded base pair contains one purine and one pyrimidine, A···T and G···C are approximately the same size. Thus, two nucleic acid chains may be aligned side by side with their bases in the middle, as illustrated in Figure 27.24. The two chains are joined by the network of hydrogen bonds between the paired bases A···T and G···C. Since X-ray crystallographic data indicated a helical structure, Watson and Crick proposed that the two strands are intertwined as a double helix (Figure 27.25).

The Watson–Crick base pairing model for DNA structure holds the key to understanding the process of DNA replication. During cell division a cell’s DNA is duplicated, that in the new cell being identical with that in the original cell. At one stage of cell division the DNA double helix begins to unwind, separating the two chains. As portrayed in Figure 27.26, each strand serves as the template on which a new DNA strand is constructed. Each new strand is exactly like the original partner because the A···T, G···C base pairing requirement ensures that the new strand is the precise complement of the template, just as the old strand was. As the double helix unravels, each strand becomes one half of a new and identical DNA double helix.
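The pairing rule and the template idea can be illustrated with a few lines of code. This is a minimal sketch, not a model of the enzymatic process: the names `PAIR`, `complementary_strand`, and `duplex_composition` are hypothetical, and the ten-base sequence is only an example. Because the two strands are antiparallel, the complement returned here reads 3′→5′ when the template is written 5′→3′.

```python
from collections import Counter

# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template):
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIR[base] for base in template)


def duplex_composition(strand):
    """Count the bases in the double helix formed by a strand and its complement."""
    return Counter(strand + complementary_strand(strand))


old_strand = "ACTGTATGCA"                      # example sequence (an assumption)
new_strand = complementary_strand(old_strand)  # built on old_strand as template
counts = duplex_composition(old_strand)

print(new_strand)
print(dict(counts))
# Using the new strand as a template regenerates the original strand,
# which is why each daughter helix is identical to the parent helix.
print(complementary_strand(new_strand) == old_strand)
```

In any duplex built this way the count of A equals that of T and the count of G equals that of C, which is the pattern Chargaff observed in the measured compositions.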
Purine                     Pyrimidine                   Base ratio
Adenine (A)      30.3%     Thymine (T)         30.3%    A/T = 1.00
Guanine (G)      19.5%     Cytosine (C)        19.9%    G/C = 0.98
Total purines    49.8%     Total pyrimidines   50.1%

FIGURE 27.23 Base pairing between (a) adenine and thymine and (b) guanine and cytosine.

The structural requirements for the pairing of nucleic acid bases are also critical for utilizing genetic information, and in living systems this means protein biosynthesis.

27.28 DNA-DIRECTED PROTEIN BIOSYNTHESIS

Protein biosynthesis is directed by DNA through the agency of several types of ribonucleic acid called messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). There are two main stages in protein biosynthesis: transcription and translation.

In the transcription stage a molecule of mRNA having a nucleotide sequence complementary to one of the strands of a DNA double helix is constructed. A diagram illustrating transcription is presented in Figure 27.27. Transcription begins at the 5′ end of the DNA molecule, and ribonucleotides with bases complementary to the DNA bases are polymerized with the aid of the enzyme RNA polymerase. Thymine does not occur in RNA; the base that pairs with adenine in RNA is uracil. Unlike DNA, RNA is single-stranded.

FIGURE 27.24 Hydrogen bonds between complementary bases (A and T, and G and C) permit pairing of two DNA strands. The strands are antiparallel; the 5′ end of the left strand is at the top, while the 5′ end of the right strand is at the bottom.
FIGURE 27.25 Tube (a) and space-filling (b) models of a DNA double helix. The carbohydrate–phosphate “backbone” is on the outside and can be roughly traced in (b) by the red oxygen atoms. The blue atoms belong to the purine and pyrimidine bases and lie on the inside. The base-pairing is more clearly seen in (a).

FIGURE 27.26 During DNA replication the double helix unwinds, and each of the original strands serves as a template for the synthesis of its complementary strand.

AIDS

The explosive growth of our knowledge of nucleic acid chemistry and its role in molecular biology in the 1980s happened to coincide with a challenge to human health that would have defied understanding a generation ago. That challenge is acquired immune deficiency syndrome, or AIDS. AIDS is a condition in which the body’s immune system is devastated by a viral infection to the extent that it can no longer perform its vital function of identifying and destroying invading organisms. AIDS victims often die from “opportunistic” infections, diseases that are normally held in check by a healthy immune system but which can become deadly when the immune system is compromised. In the short time since its discovery, AIDS has claimed the lives of over 11 million people worldwide, and the most recent estimates place the number of those infected at more than 30 million.

The virus responsible for almost all the AIDS cases in the United States was identified by scientists at the Louis Pasteur Institute in Paris in 1983 and is known as human immunodeficiency virus 1 (HIV-1). HIV-1 is believed to have originated in Africa, where a related virus, HIV-2, was discovered in 1986 by the Pasteur Institute group.
Both HIV-1 and HIV-2 are classed as retroviruses, because their genetic material is RNA rather than DNA. HIVs require a host cell to reproduce, and the hosts in humans are the so-called T4 lymphocytes, which are the cells primarily responsible for inducing the immune system to respond when provoked. HIV penetrates the cell membrane of a T4 lymphocyte and deposits both its RNA and an enzyme called reverse transcriptase inside the T4 cell, where the reverse transcriptase catalyzes the formation of a DNA strand that is complementary to the viral RNA. The transcribed DNA then serves as the template for formation of double-helical DNA, which, with the information it carries for reproduction of the HIV, becomes incorporated into the T4 cell’s own genetic material. The viral DNA induces the host lymphocyte to begin producing copies of the virus, which then leave the host to infect other T4 cells. In the course of HIV reproduction, the ability of the T4 lymphocyte to reproduce itself is hampered. As the number of T4 cells decreases, so does the body’s ability to combat infections.

At this time, there is no known cure for AIDS, but progress is being made in delaying the onset of symptoms and prolonging the lives of those infected with HIV. The first advance in treatment came with drugs such as zidovudine, also known as azidothymidine, or AZT. AZT interferes with the ability of HIV to reproduce by blocking the action of reverse transcriptase. As its structure shows, AZT is a nucleoside. Several other nucleosides that are also reverse transcriptase inhibitors are in clinical use as well, sometimes in combination with AZT as “drug cocktails.” A mixture makes it more difficult for a virus to develop resistance than a single drug does.

The most recent advance has been to simultaneously attack HIV on a second front using a protease inhibitor. Recall from Section 27.10 that proteases are enzymes that catalyze the hydrolysis of proteins at specific points.
When HIV uses a cell’s DNA to synthesize its own proteins, those proteins are in a form that must be modified by protease-catalyzed hydrolysis to become useful. Protease inhibitors prevent this modification and, in combination with reverse transcriptase inhibitors, slow the reproduction of HIV and have been found to dramatically reduce the “viral load” in HIV-infected patients.

The AIDS outbreak has been and continues to be a tragedy on a massive scale. Until a cure is discovered, or a vaccine developed, sustained efforts at preventing its transmission offer our best weapon against the spread of AIDS.

[Structural formula: zidovudine (AZT)]

In the translation stage, the nucleotide sequence of the mRNA is decoded and “read” as an amino acid sequence to be constructed. Since there are only four different bases in mRNA and 20 amino acids to be coded for, codes using either one nucleotide to one amino acid or two nucleotides to one amino acid are inadequate. If nucleotides are read in sets of three, however, the four mRNA bases (A, U, C, G) generate 64 possible “words,” more than sufficient to code for 20 amino acids. It has been established that the genetic code is indeed made up of triplets of adjacent nucleotides called codons. The amino acids corresponding to each of the 64 possible codons of mRNA have been determined (Table 27.4).
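The counting argument, and the idea of reading an mRNA three bases at a time, can be sketched as follows. The names `count_words`, `split_into_codons`, and `SAMPLE_CODE` are hypothetical; the dictionary holds only five of the assignments from Table 27.4, and the fifteen-base message is an invented example rather than a real gene.

```python
from itertools import product

BASES = "AUCG"  # the four mRNA bases named in the text


def count_words(length):
    """Number of distinct 'words' of the given length over four bases."""
    return len(BASES) ** length


# Enumerate every triplet explicitly as a cross-check on the arithmetic.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# A small excerpt of Table 27.4, using three-letter amino acid abbreviations.
SAMPLE_CODE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCA": "Ala", "UGG": "Trp"}


def split_into_codons(mrna):
    """Split an mRNA string into consecutive triplets, dropping any incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


message = "AUGUUUGGCGCAUGG"  # hypothetical mRNA, written 5' to 3'
peptide = [SAMPLE_CODE[codon] for codon in split_into_codons(message)]

for n in (1, 2, 3):
    print(n, count_words(n))
print(len(codons))
print("-".join(peptide))
```

One-base and two-base codes give only 4 and 16 words, fewer than the 20 amino acids, whereas triplets give 64, which is why several amino acids in Table 27.4 are specified by more than one codon.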
TABLE 27.4 The Genetic Code (Messenger RNA Codons)*

Amino acid       Codons
Alanine          GCU GCA GCC GCG
Arginine         CGU CGA AGA CGC CGG AGG
Asparagine       AAU AAC
Aspartic acid    GAU GAC
Cysteine         UGU UGC
Glutamic acid    GAA GAG
Glutamine        CAA CAG
Glycine          GGU GGA GGC GGG
Histidine        CAU CAC
Isoleucine       AUU AUA AUC
Leucine          UUA CUU CUA UUG CUC CUG
Lysine           AAA AAG
Methionine       AUG
Phenylalanine    UUU UUC
Proline          CCU CCA CCC CCG
Serine           UCU UCA AGU UCC UCG AGC
Threonine        ACU ACA ACC ACG
Tryptophan       UGG
Tyrosine         UAU UAC
Valine           GUU GUA GUC GUG

*The first letter of each triplet corresponds to the nucleotide nearer the 5′ terminus, the last letter to the nucleotide nearer the 3′ terminus. UAA, UGA, and UAG are not included in the table; they are chain-terminating codons.

FIGURE 27.27 During transcription a molecule of mRNA is assembled by using DNA as a template. [Figure labels: DNA strand that serves as template for transcription; mRNA; nucleotides to be incorporated into mRNA; DNA strand complementary to one being transcribed]

PROBLEM 27.23 It was pointed out in Section 27.22 that sickle cell hemoglobin has valine in place of glutamic acid at one point in its protein chain. Compare the codons for valine and glutamic acid. How do they differ?

The mechanism of translation makes use of the same complementary base pairing principle used in replication and transcription. Each amino acid is associated with a particular tRNA. Transfer RNA is much smaller than DNA and mRNA. It is single-stranded and contains 70–90 ribonucleotides arranged in a “cloverleaf” pattern (Figure 27.28). Its characteristic shape results from the presence of paired bases in some regions and their absence in others.
All tRNAs have a CCA triplet at their 3′ terminus, to which is attached, by an ester linkage, an amino acid unique to that particular tRNA. At one of the loops of the tRNA there is a nucleotide triplet called the anticodon, which is complementary to a codon of mRNA. The codons of mRNA are read by the anticodons of tRNA, and the proper amino acids are transferred in sequence to the growing protein.

[Margin note: According to Crick, the so-called central dogma of molecular biology is “DNA makes RNA makes protein.”]

FIGURE 27.28 Phenylalanine tRNA. (a) A schematic drawing showing the sequence of bases. tRNAs usually contain modified bases (green boxes), slightly different from those in other RNAs. The anticodon for phenylalanine is shown in red, and the CCA triplet which bears the phenylalanine is in blue. (b) The experimentally determined structure for yeast phenylalanine tRNA. Complementary base-pairing is present in some regions, but not in others.

27.29 DNA SEQUENCING

In 1988, the United States Congress authorized the first allocation of funds in what may be a $3 billion project dedicated to determining the sequence of bases that make up the human genome. (The genome is the aggregate of all the genes that determine what an organism becomes.) Given that the human genome contains approximately 3 × 10⁹ base pairs, this expenditure amounts to $1 per base pair, a strikingly small cost when one considers both the complexity of the project and the increased understanding of human biology that is sure to result. DNA sequencing, which lies at the heart of the human genome project, is a relatively new technique but one that has seen dramatic advances in efficiency in a very short time.
To explain how DNA sequencing works, we must first mention restriction enzymes. Like all organisms, bacteria are subject to infection by external invaders (e.g., viruses and other bacteria) and possess defenses in the form of restriction enzymes that destroy the intruder by cleaving its DNA. About 200 different restriction enzymes are known. They differ in respect to the nucleotide sequence they recognize, and each restriction enzyme cleaves DNA at a specific nucleotide site. Thus, one can take a large piece of DNA and, with the aid of restriction enzymes, cleave it into units small enough to be sequenced conveniently. These smaller DNA fragments are separated and purified by gel electrophoresis. At a pH of 7.4, each phosphate link between adjacent nucleotides is ionized, giving the DNA fragments a negative charge and causing them to migrate to the positively charged electrode. Separation is size-dependent. Larger polynucleotides move more slowly through the polyacrylamide gel than smaller ones. The technique is so sensitive that two polynucleotides differing in length by only a single nucleotide can be separated from each other on polyacrylamide gels.

Once the DNA is separated into smaller fragments, each fragment is sequenced independently. Again, gel electrophoresis is used, this time as an analytical tool. In the technique devised by Frederick Sanger, the two strands of a sample of a small fragment of DNA, 100–200 base pairs in length, are separated and one strand is used as a template to create complements of itself. The single-stranded sample is divided among four test tubes, each of which contains the materials necessary for DNA synthesis. These materials include the four nucleosides present in DNA, 2′-deoxyadenosine (dA), 2′-deoxythymidine (dT), 2′-deoxyguanosine (dG), and 2′-deoxycytidine (dC) as their triphosphates dATP, dTTP, dGTP, and dCTP.
Also present in the first test tube is a synthetic analog of adenosine triphosphate in which both the 2′ and 3′ hydroxyl groups have been replaced by hydrogens. This compound is called 2′,3′-dideoxyadenosine triphosphate (ddATP). Similarly, ddTTP is added to the second tube, ddGTP to the third, and ddCTP to the fourth. Each tube also contains a “primer.” The primer is a short section of the complementary DNA strand, which has been labeled with a radioactive isotope of phosphorus (³²P) that emits β particles. When the electrophoresis gel is examined at the end of the experiment, the positions of the DNAs formed by chain extension of the primer are located by detecting their β emission by a technique called autoradiography.

[Structural formula: nucleoside triphosphate with base and 3′ substituent X. X = OH: dATP, dTTP, dGTP, dCTP. X = H: ddATP, ddTTP, ddGTP, ddCTP.]

[Margin note: Gel electrophoresis of proteins was described in the boxed essay accompanying Section 27.3.]

As DNA synthesis proceeds, nucleotides from the solution are added to the growing polynucleotide chain. Chain extension takes place without complication as long as the incorporated nucleotides are derived from dATP, dTTP, dGTP, and dCTP. If, however, the incorporated species is derived from a dideoxy analog, chain extension stops. Because the dideoxy species ddA, ddT, ddG, and ddC lack hydroxyl groups at 3′, they cannot engage in the 3′ → 5′ phosphodiester linkage necessary for chain extension. Thus, the first tube (the one containing ddATP) contains a mixture of DNA fragments of different length, all of which terminate in ddA. Similarly, all the polynucleotides in the second tube terminate in ddT, those in the third tube terminate in ddG, and those in the fourth terminate in ddC.
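The four-tube experiment can be mimicked with a short, idealized simulation. In this sketch the function names `chain_terminated_fragments` and `read_gel` are hypothetical, the primer is omitted, every possible premature stop is assumed to occur once, and strand direction is ignored. The template is the ten-base strand used in Figure 27.29. Sorting the fragments by length stands in for electrophoresis, which separates polynucleotides by size.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_terminated_fragments(template, dideoxy_base):
    """Fragments formed in the tube that contains the given dideoxy analog.

    The growing strand is the complement of the template; it stops wherever
    the dideoxy analog is incorporated in place of the normal nucleotide.
    """
    new_strand = "".join(PAIR[base] for base in template)
    return [new_strand[:i + 1]
            for i, base in enumerate(new_strand)
            if base == dideoxy_base]


def read_gel(lanes):
    """Read the sequence from the lanes, taking bands from shortest to longest."""
    bands = sorted((len(fragment), base)
                   for base, fragments in lanes.items()
                   for fragment in fragments)
    return "".join(base for _, base in bands)


template = "ACTGTATGCA"  # original DNA strand of Figure 27.29
lanes = {base: chain_terminated_fragments(template, base) for base in "ATGC"}

synthesized = read_gel(lanes)                       # sequence of the complement
original = "".join(PAIR[b] for b in synthesized)    # deduced original strand

for base in "ATGC":
    print(f"dd{base}: {lanes[base]}")
print(synthesized, original)
```

Each fragment length from 1 to 10 appears in exactly one lane, so reading the lanes in order of increasing length gives the sequence of the synthesized strand, and its complement is the original DNA.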
The contents of each tube are then subjected to electrophoresis in separate lanes on the same sheet of polyacrylamide gel and the DNAs located by autoradiography. A typical electrophoresis gel of a DNA fragment containing 50 nucleotides will exhibit a pattern of 50 bands distributed among the four lanes with no overlaps. Each band corresponds to a polynucleotide that is one nucleotide longer than the one that precedes it (which may be in a different lane). One then simply “reads” the nucleotide sequence according to the lane in which each succeeding band appears.

The Sanger method for DNA sequencing is summarized in Figure 27.29.

This work produced a second Nobel Prize for Sanger. (His first was for protein sequencing in 1958.) Sanger shared the 1980 chemistry prize with Walter Gilbert of Harvard University, who developed a chemical method for DNA sequencing (the Maxam–Gilbert method), and with Paul Berg of Stanford University, who was responsible for many of the most important techniques in nucleic acid chemistry and biology.

A recent modification of Sanger’s method has resulted in the commercial availability of automated DNA sequenators based on Sanger’s use of dideoxy analogs of nucleotides. Instead of tagging a primer with ³²P, however, the purine and pyrimidine base portions of the dideoxynucleotides are each modified to contain a side chain that bears a different fluorescent dye, and all the dideoxy analogs are present in the same reaction. After electrophoretic separation of the products in a single lane, the gel is read by argon–laser irradiation at four different wavelengths. One wavelength causes the modified ddA-containing polynucleotides to fluoresce, another causes modified-ddT fluorescence, and so on. The data are stored and analyzed in a computer and printed out as the DNA sequence. It is claimed that a single instrument can sequence 10,000 nucleotides per day, making the hope of sequencing the 3 billion base pairs in the human genome a not-impossible goal. The present plan is to complete a draft of the DNA sequence of the human genome by 2001 and a refined version by 2003.

[Margin note: In 1995, a team of U.S. scientists announced the complete sequencing of the 1.8 million base genome of a species of influenza bacterium.]

FIGURE 27.29 Sequencing of a short strand of DNA (10 bases) by Sanger’s method using dideoxynucleotides to halt polynucleotide chain extension. Double-stranded DNA is separated, and one of the strands is used to produce complements of itself in four different tubes. All of the tubes contain a primer tagged with ³²P, along with dATP, dTTP, dGTP, and dCTP (see text for abbreviations). The first tube also contains ddATP; the second, ddTTP; the third, ddGTP; and the fourth, ddCTP. All of the DNA fragments in the first tube terminate in A, those in the second terminate in T, those in the third terminate in G, and those in the fourth terminate in C. Location of the zones by autoradiographic detection of ³²P identifies the terminal nucleoside. The original DNA strand is its complement.

[Figure data; lanes ddA, ddT, ddG, ddC; axis: increasing distance from origin. Rows are listed from the shortest fragment to the longest.]

Terminal dideoxynucleoside   Sequence of DNA fragment   Sequence of original DNA
ddT                          T                          A
ddG                          TG                         AC
ddA                          TGA                        ACT
ddC                          TGAC                       ACTG
ddA                          TGACA                      ACTGT
ddT                          TGACAT                     ACTGTA
ddA                          TGACATA                    ACTGTAT
ddC                          TGACATAC                   ACTGTATG
ddG                          TGACATACG                  ACTGTATGC
ddT                          TGACATACGT                 ACTGTATGCA

27.30 SUMMARY

This chapter revolves around proteins. The first third describes the building blocks of proteins, progressing through amino acids and peptides. The middle third deals with proteins themselves. The last third discusses nucleic acids and their role in the biosynthesis of proteins.

Section 27.1 A group of 20 amino acids, listed in Table 27.1, regularly appears as the hydrolysis products of proteins.
All are α-amino acids.

Section 27.2 Except for glycine, which is achiral, all of the α-amino acids present in proteins are chiral and have the L configuration at the α carbon.

[Structural formula: Fischer projection of L-valine in its zwitterionic form]

Section 27.3 The most stable structure of a neutral amino acid is a zwitterion. The pH of an aqueous solution at which the concentration of the zwitterion is a maximum is called the isoelectric point (pI).

Section 27.4 Amino acids are synthesized in the laboratory from
1. α-Halo acids by reaction with ammonia
2. Aldehydes by reaction with ammonia and cyanide ion (the Strecker synthesis)
3. Alkyl halides by reaction with the enolate anion derived from diethyl acetamidomalonate
The amino acids prepared by these methods are formed as racemic mixtures and are optically inactive.

Section 27.5 Amino acids undergo reactions characteristic of the amino group (e.g., amide formation) and the carboxyl group (e.g., esterification). Amino acid side chains undergo reactions characteristic of the functional groups they contain.

Section 27.6 The reactions that amino acids undergo in living systems include transamination and decarboxylation.

Section 27.7 An amide linkage between two α-amino acids is called a peptide bond. The primary structure of a peptide is given by its amino acid sequence plus any disulfide bonds between two cysteine residues. By convention, peptides are named and written beginning at the N terminus.

Section 27.8 The primary structure of a peptide is determined by a systematic approach in which the protein is cleaved to smaller fragments, even individual amino acids. The smaller fragments are sequenced and the main sequence deduced by finding regions of overlap among the smaller peptides.
Section 27.9 Complete hydrolysis of a peptide gives a mixture of amino acids. An amino acid analyzer identifies the individual amino acids and determines their molar ratios.

Section 27.10 Incomplete hydrolysis can be accomplished by using enzymes to catalyze cleavage at specific peptide bonds.

Section 27.11 Carboxypeptidase-catalyzed hydrolysis can be used to identify the C-terminal amino acid. The N terminus is determined by chemical means. One reagent used for this purpose is 1-fluoro-2,4-dinitrobenzene (see Figure 27.8).

Section 27.12 The procedure described in Sections 27.8–27.11 was used to determine the amino acid sequence of insulin.

Section 27.13 Modern methods of peptide sequencing follow a strategy similar to that used to sequence insulin, but are automated and can be carried out on a small scale. A key feature is repetitive N-terminal identification using the Edman degradation.

Section 27.14 Synthesis of a peptide of prescribed sequence requires the use of protecting groups to minimize the number of possible reactions.

Section 27.15 Amino-protecting groups include benzyloxycarbonyl (Z) and tert-butoxycarbonyl (Boc). Hydrogen bromide may be used to remove either the benzyloxycarbonyl or tert-butoxycarbonyl protecting group. The benzyloxycarbonyl protecting group may also be removed by catalytic hydrogenolysis.

Section 27.16 Carboxyl groups are normally protected as benzyl, methyl, or ethyl esters. Hydrolysis in dilute base is normally used to deprotect methyl and ethyl esters. Benzyl protecting groups are removed by hydrogenolysis.

Section 27.17 Peptide bond formation between a protected amino acid having a free carboxyl group and a protected amino acid having a free amino group can be accomplished with the aid of N,N′-dicyclohexylcarbodiimide (DCCI).
[Structural formulas: benzyloxycarbonyl-protected amino acid; tert-butoxycarbonyl-protected amino acid; alanylcysteinylglycine (Ala-Cys-Gly)]

Section 27.18 In the Merrifield method the carboxyl group of an amino acid is anchored to a solid support and the chain extended one amino acid at a time. When all the amino acid residues have been added, the polypeptide is removed from the solid support.

Section 27.19 Two secondary structures of proteins are particularly prominent. The pleated β sheet is stabilized by hydrogen bonds between N–H and C=O groups of adjacent chains. The α helix is stabilized by hydrogen bonds within a single polypeptide chain.

Section 27.20 The folding of a peptide chain is its tertiary structure. The tertiary structure has a tremendous influence on the properties of the peptide and the biological role it plays. The tertiary structure is normally determined by X-ray crystallography. Many globular proteins are enzymes. They accelerate the rates of chemical reactions in biological systems, but the kinds of reactions that take place are the fundamental reactions of organic chemistry. One way in which enzymes accelerate these reactions is by bringing reactive functions together in the presence of catalytically active functions of the protein.

Section 27.21 Often the catalytically active functions of an enzyme are nothing more than proton donors and proton acceptors. In many cases a protein acts in cooperation with a coenzyme, a small molecule having the proper functionality to carry out a chemical change not otherwise available to the protein itself.

Section 27.22 Many proteins consist of two or more chains, and the way in which the various units are assembled in the native state of the protein is called its quaternary structure.
[Equation for Section 27.17: ZNHCH(R)CO₂H + H₂NCH(R′)CO₂CH₃ → ZNHCH(R)C(O)NHCH(R′)CO₂CH₃, with DCCI as the coupling reagent]

Sections 27.23–27.26 Carbohydrate derivatives of purine and pyrimidine are among the most important compounds of biological chemistry. N-Glycosides of D-ribose and 2-deoxy-D-ribose in which the substituent at the anomeric position is a derivative of purine or pyrimidine are called nucleosides. Nucleotides are phosphate esters of nucleosides. Nucleic acids are polymers of nucleotides.

Section 27.27 Nucleic acids derived from 2-deoxy-D-ribose (DNA) are responsible for storing and transmitting genetic information. DNA exists as a double-stranded pair of helices in which hydrogen bonds are responsible for complementary base pairing between adenine (A) and thymine (T), and between guanine (G) and cytosine (C). During cell division the two strands of DNA unwind and are duplicated. Each strand acts as a template on which its complement is constructed.

Section 27.28 In the transcription stage of protein biosynthesis a molecule of messenger RNA (mRNA) having a nucleotide sequence complementary to that of DNA is assembled. Transcription is followed by translation, in which triplets of nucleotides of mRNA called codons are recognized by transfer RNA (tRNA) for a particular amino acid, and that amino acid is added to the growing peptide chain.

Section 27.29 The nucleotide sequence of DNA can be determined by a technique in which a short section of single-stranded DNA is allowed to produce its complement in the presence of dideoxy analogs of ATP, TTP, GTP, and CTP. DNA formation terminates when a dideoxy analog is incorporated into the growing polynucleotide chain. A mixture of polynucleotides differing from one another by an incremental nucleoside is produced and analyzed by electrophoresis.
From the observed sequence of the complementary chain, the sequence of the original DNA is deduced.

PROBLEMS

27.24 The imidazole ring of the histidine side chain acts as a proton acceptor in certain enzyme-catalyzed reactions. Which is the more stable protonated form of the histidine residue, A or B? Why?

[Structures A and B: two protonated forms of the imidazole ring in a histidine residue, NHCH(CH2-imidazole)C(=O). The ring drawings are not recoverable from the source.]

27.25 Acrylonitrile (CH2=CHC≡N) readily undergoes conjugate addition when treated with nucleophilic reagents. Describe a synthesis of β-alanine (+H3NCH2CH2CO2−) that takes advantage of this fact.

27.26 (a) Isoleucine has been prepared by the following sequence of reactions. Give the structure of compounds A through D isolated as intermediates in this synthesis.

CH3CH2CH(Br)CH3 → A (diethyl malonate, sodium ethoxide)
A → B (C7H12O4) (1. KOH; 2. HCl)
B → C (C7H11BrO4) (Br2)
C → D (heat)
D → isoleucine (racemic) (NH3, H2O)

(b) An analogous procedure has been used to prepare phenylalanine. What alkyl halide would you choose as the starting material for this synthesis?

27.27 Hydrolysis of the following compound in concentrated hydrochloric acid for several hours at 100°C gives one of the amino acids in Table 27.1. Which one? Is it optically active?

[Structure not fully recoverable from the source: an N-substituted imide bearing the group C(COOCH2CH3)2CH2COOCH2CH3 on nitrogen.]

27.28 If you synthesized the tripeptide Leu-Phe-Ser from amino acids prepared by the Strecker synthesis, how many stereoisomers would you expect to be formed?

27.29 How many peaks would you expect to see on the strip chart after amino acid analysis of bradykinin?

27.30 Automated amino acid analysis of peptides containing asparagine (Asn) and glutamine (Gln) residues gives a peak corresponding to ammonia. Why?

27.31 What are the products of each of the following reactions? Your answer should account for all the amino acid residues in the starting peptides.
(a) Reaction of Leu-Gly-Ser with 1-fluoro-2,4-dinitrobenzene
(b) Hydrolysis of the compound in part (a) in concentrated hydrochloric acid (100°C)
(c) Treatment of Ile-Glu-Phe with C6H5N=C=S, followed by hydrogen bromide in nitromethane
(d) Reaction of Asn-Ser-Ala with benzyloxycarbonyl chloride
(e) Reaction of the product of part (d) with p-nitrophenol and N,N′-dicyclohexylcarbodiimide
(f) Reaction of the product of part (e) with the ethyl ester of valine
(g) Hydrogenolysis of the product of part (f) over palladium

27.32 Hydrazine cleaves amide bonds to form acylhydrazides according to the general mechanism of nucleophilic acyl substitution discussed in Chapter 20. This reaction forms the basis of one method of terminal residue analysis. A peptide is treated with excess hydrazine in order to cleave all the peptide linkages. One of the terminal amino acids is cleaved as the free amino acid and identified; all the other amino acid residues are converted to acylhydrazides. Which amino acid is identified by hydrazinolysis, the N terminus or the C terminus?

27.33 Somatostatin is a tetradecapeptide of the hypothalamus that inhibits the release of pituitary growth hormone. Its amino acid sequence has been determined by a combination of Edman degradations and enzymic hydrolysis experiments. On the basis of the following data, deduce the primary structure of somatostatin:

1. Edman degradation gave PTH-Ala.
2. Selective hydrolysis gave peptides having the following indicated sequences:
   Phe-Trp
   Thr-Ser-Cys
   Lys-Thr-Phe
   Thr-Phe-Thr-Ser-Cys
   Asn-Phe-Phe-Trp-Lys
   Ala-Gly-Cys-Lys-Asn-Phe
3. Somatostatin has a disulfide bridge.

27.34 What protected amino acid would you anchor to the solid support in the first step of a synthesis of oxytocin (see Figure 27.8) by the Merrifield method?
Equation for Problem 27.32: RC(=O)NHR′ (amide) + H2NNH2 (hydrazine) → RC(=O)NHNH2 (acylhydrazide) + R′NH2 (amine)

Sequence for Problem 27.29: Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg (bradykinin)

27.35 Nebularine is a toxic nucleoside isolated from a species of mushroom. Its systematic name is 9-β-D-ribofuranosylpurine. Write a structural formula for nebularine.

27.36 The nucleoside vidarabine (ara-A) shows promise as an antiviral agent. Its structure is identical with that of adenosine (Section 27.24) except that D-arabinose replaces D-ribose as the carbohydrate component. Write a structural formula for this substance.

27.37 When 6-chloropurine is heated with aqueous sodium hydroxide, it is quantitatively converted to hypoxanthine. Suggest a reasonable mechanism for this reaction.

6-Chloropurine → hypoxanthine (NaOH, H2O, heat)

[Structures of 6-chloropurine and hypoxanthine: ring drawings not recoverable from the source.]

27.38 Treatment of adenosine with nitrous acid gives a nucleoside known as inosine:

Adenosine → inosine (1. HONO, H+; 2. H2O)

[Structures of adenosine and inosine: ring drawings not recoverable from the source.]

Suggest a reasonable mechanism for this reaction.

27.39 (a) The 5′-nucleotide of inosine, inosinic acid (C10H13N4O8P), is added to foods as a flavor enhancer. What is the structure of inosinic acid? (The structure of inosine is given in Problem 27.38.)
(b) The compound 2′,3′-dideoxyinosine (DDI) holds promise as a drug for the treatment of AIDS. What is the structure of DDI?

27.40 In one of the early experiments designed to elucidate the genetic code, Marshall Nirenberg of the U.S. National Institutes of Health (Nobel Prize in physiology or medicine, 1968) prepared a synthetic mRNA in which all the bases were uracil. He added this poly(U) to a cell-free system containing all the necessary materials for protein biosynthesis. A polymer of a single amino acid was obtained. What amino acid was polymerized?