chapter 3
AMINO ACIDS, PEPTIDES,
AND PROTEINS
3.1 Amino Acids
3.2 Peptides and Proteins
3.3 Working with Proteins
3.4 The Covalent Structure of Proteins
3.5 Protein Sequences and Evolution
The word protein that I propose to you . . . I would wish to
derive from proteios, because it appears to be the
primitive or principal substance of animal nutrition that
plants prepare for the herbivores, and which the latter
then furnish to the carnivores.
—J. J. Berzelius, letter to G. J. Mulder, 1838
Proteins are the most abundant biological macromolecules,
occurring in all cells and all parts of cells. Proteins
also occur in great variety; thousands of different
kinds, ranging in size from relatively small peptides to
huge polymers with molecular weights in the millions,
may be found in a single cell. Moreover, proteins exhibit
enormous diversity of biological function and are the
most important final products of the information path-
ways discussed in Part III of this book. Proteins are the
molecular instruments through which genetic informa-
tion is expressed.
Relatively simple monomeric subunits provide the
key to the structure of the thousands of different pro-
teins. All proteins, whether from the most ancient lines
of bacteria or from the most complex forms of life, are
constructed from the same ubiquitous set of 20 amino
acids, covalently linked in characteristic linear sequences.
Because each of these amino acids has a side chain with
distinctive chemical properties, this group of 20 pre-
cursor molecules may be regarded as the alphabet in
which the language of protein structure is written.
What is most remarkable is that cells can produce
proteins with strikingly different properties and activi-
ties by joining the same 20 amino acids in many differ-
ent combinations and sequences. From these building
blocks different organisms can make such widely diverse
products as enzymes, hormones, antibodies, trans-
porters, muscle fibers, the lens protein of the eye, feath-
ers, spider webs, rhinoceros horn, milk proteins, antibi-
otics, mushroom poisons, and myriad other substances
having distinct biological activities (Fig. 3–1). Among
these protein products, the enzymes are the most var-
ied and specialized. Virtually all cellular reactions are
catalyzed by enzymes.
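The scale of this diversity can be made concrete with a short calculation. The sketch below assumes that every position in a chain may hold any one of the 20 common amino acids, independently of its neighbors; the function name is illustrative and does not come from the text.

```python
# Count the distinct linear sequences possible for a chain of a given length,
# assuming each position can hold any one of the 20 common amino acids.
NUM_COMMON_AMINO_ACIDS = 20


def possible_sequences(chain_length: int) -> int:
    """Return 20**chain_length, the number of distinct ordered sequences."""
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    return NUM_COMMON_AMINO_ACIDS ** chain_length


if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, possible_sequences(n))
```

Even a chain of only 10 residues allows more than 10 trillion different sequences, which is why sequence, and not merely composition, carries so much information.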
Protein structure and function are the topics of this
and the next three chapters. We begin with a descrip-
tion of the fundamental chemical properties of amino
acids, peptides, and proteins.
3.1 Amino Acids
Protein Architecture—Amino Acids
Proteins are polymers of amino acids, with each amino
acid residue joined to its neighbor by a specific type
of covalent bond. (The term “residue” reflects the loss
of the elements of water when one amino acid is joined
to another.) Proteins can be broken down (hydrolyzed)
to their constituent amino acids by a variety of methods,
and the earliest studies of proteins naturally focused on
8885d_c03_075 12/23/03 10:16 AM Page 75 mac111 mac111:reb:
the free amino acids derived from them. Twenty differ-
ent amino acids are commonly found in proteins. The
first to be discovered was asparagine, in 1806. The last
of the 20 to be found, threonine, was not identified until
1938. All the amino acids have trivial or common names,
in some cases derived from the source from which they
were first isolated. Asparagine was first found in as-
paragus, and glutamate in wheat gluten; tyrosine was
first isolated from cheese (its name is derived from the
Greek tyros, “cheese”); and glycine (Greek glykos,
“sweet”) was so named because of its sweet taste.
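The meaning of "residue" introduced above (an amino acid minus the elements of water) can be illustrated with simple bookkeeping. The sketch below is a rough estimate only: it assumes an approximate Mr of 18 for water and uses the rounded Mr values listed in Table 3–1; the function name is hypothetical.

```python
# Approximate relative molecular mass of water (assumed value, not from the text).
WATER_MR = 18


def peptide_mr(free_amino_acid_mrs):
    """Estimate the Mr of a linear peptide from the Mr of its free amino acids.

    Each bond formed between neighbors releases one water, so a chain of
    n residues loses (n - 1) waters relative to the n free amino acids.
    """
    mrs = list(free_amino_acid_mrs)
    if not mrs:
        raise ValueError("need at least one amino acid")
    return sum(mrs) - WATER_MR * (len(mrs) - 1)


if __name__ == "__main__":
    # Glycine (Mr 75) joined to alanine (Mr 89), values as listed in Table 3-1.
    print(peptide_mr([75, 89]))
```

Hydrolysis is the same bookkeeping run in reverse: one water is added back for every bond broken.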
Amino Acids Share Common Structural Features
All 20 of the common amino acids are α-amino acids.
They have a carboxyl group and an amino group bonded
to the same carbon atom (the α carbon) (Fig. 3–2). They
differ from each other in their side chains, or R groups,
which vary in structure, size, and electric charge, and
which influence the solubility of the amino acids in wa-
ter. In addition to these 20 amino acids there are many
less common ones. Some are residues modified after a
protein has been synthesized; others are amino acids
present in living organisms but not as constituents of
proteins. The common amino acids of proteins have
been assigned three-letter abbreviations and one-letter
symbols (Table 3–1), which are used as shorthand to in-
dicate the composition and sequence of amino acids
polymerized in proteins.
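The shorthand is easy to apply mechanically. The following sketch converts between the three-letter abbreviations and one-letter symbols of Table 3–1; the dictionary and function names are illustrative choices, not part of any standard library.

```python
# Three-letter abbreviation -> one-letter symbol, as listed in Table 3-1.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Pro": "P", "Val": "V", "Leu": "L",
    "Ile": "I", "Met": "M", "Phe": "F", "Tyr": "Y", "Trp": "W",
    "Ser": "S", "Thr": "T", "Cys": "C", "Asn": "N", "Gln": "Q",
    "Lys": "K", "His": "H", "Arg": "R", "Asp": "D", "Glu": "E",
}
# Reverse lookup, built from the same table.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Turn a list such as ["Gly", "Ala", "Lys"] into the string "GAK"."""
    return "".join(THREE_TO_ONE[r] for r in residues)


def to_three_letter(sequence):
    """Turn a string such as "GAK" into ["Gly", "Ala", "Lys"]."""
    return [ONE_TO_THREE[symbol] for symbol in sequence]


if __name__ == "__main__":
    print(to_one_letter(["Gly", "Ala", "Lys"]))
    print(to_three_letter("GAK"))
```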
Two conventions are used to identify the carbons in
an amino acid—a practice that can be confusing. The
additional carbons in an R group are commonly designated
β, γ, δ, ε, and so forth, proceeding out from the
α carbon. For most other organic molecules, carbon
atoms are simply numbered from one end, giving highest
priority (C-1) to the carbon with the substituent containing
the atom of highest atomic number. Within this
latter convention, the carboxyl carbon of an amino acid
would be C-1 and the α carbon would be C-2. In some
cases, such as amino acids with heterocyclic R groups,
the Greek lettering system is ambiguous and the num-
bering convention is therefore used.
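For an unbranched side chain the two conventions differ only by an offset, as this small sketch shows. It assumes a simple linear chain such as that of lysine and does not attempt the heterocyclic cases mentioned above; the names used are illustrative.

```python
# Greek labels in order, starting at the carbon that bears the amino
# and carboxyl groups.
GREEK_LABELS = ("alpha", "beta", "gamma", "delta", "epsilon")


def carbon_number(greek_label):
    """Map a Greek carbon label to its C-n number in an unbranched amino acid.

    The carboxyl carbon takes C-1, so the alpha carbon is C-2, beta is C-3,
    and so on along the chain.
    """
    return GREEK_LABELS.index(greek_label) + 2


if __name__ == "__main__":
    for label in GREEK_LABELS:
        print(f"{label:8s} -> C-{carbon_number(label)}")
```

In lysine, for example, the side-chain amino group sits on the ε carbon, which is C-6 under the numbering convention.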
For all the common amino acids except glycine, the
α carbon is bonded to four different groups: a carboxyl
group, an amino group, an R group, and a hydrogen atom
(Fig. 3–2; in glycine, the R group is another hydrogen
atom). The α-carbon atom is thus a chiral center
(p. 17). Because of the tetrahedral arrangement of the
bonding orbitals around the α-carbon atom, the four different
groups can occupy two unique spatial arrangements,
and thus amino acids have two possible
stereoisomers. Since they are nonsuperimposable mir-
ror images of each other (Fig. 3–3), the two forms rep-
resent a class of stereoisomers called enantiomers (see
Fig. 1–19). All molecules with a chiral center are also
optically active—that is, they rotate plane-polarized
light (see Box 1–2).
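The "four different groups" test can be written down directly. The sketch below is deliberately crude: it represents each substituent by a text label and treats two groups as identical only when their labels match, which is an assumption adequate for the α carbon of the common amino acids but not a general stereochemistry tool.

```python
def is_chiral_center(substituents):
    """Return True if a tetrahedral carbon carries four different groups."""
    groups = list(substituents)
    if len(groups) != 4:
        raise ValueError("a tetrahedral carbon has exactly four substituents")
    # Four distinct labels means no two substituents are the same.
    return len(set(groups)) == 4


# Substituents on the alpha carbon, written as simple labels.
ALANINE_ALPHA = ("COO-", "NH3+", "CH3", "H")
GLYCINE_ALPHA = ("COO-", "NH3+", "H", "H")

if __name__ == "__main__":
    print("alanine:", is_chiral_center(ALANINE_ALPHA))
    print("glycine:", is_chiral_center(GLYCINE_ALPHA))
```

Glycine fails the test because its R group is a second hydrogen atom, which is why it is the one common amino acid without enantiomers.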
[Structure: lysine, ⁺H₃N–CH₂–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻. Numbering from the carboxyl carbon gives C-1 through C-6; Greek lettering from the α carbon (C-2) gives α, β, γ, δ, ε, so the side-chain amino group sits on the ε carbon (C-6).]
FIGURE 3–1 Some functions of proteins. (a) The light produced by
fireflies is the result of a reaction involving the substrate luciferin and
ATP, catalyzed by the enzyme luciferase (see Box 13–2). (b) Erythrocytes
contain large amounts of the oxygen-transporting protein
hemoglobin. (c) The protein keratin, formed by all vertebrates, is the
chief structural component of hair, scales, horn, wool, nails, and feath-
ers. The black rhinoceros is nearing extinction in the wild because of
the belief prevalent in some parts of the world that a powder derived
from its horn has aphrodisiac properties. In reality, the chemical prop-
erties of powdered rhinoceros horn are no different from those of pow-
dered bovine hooves or human fingernails.
[Structure: ⁺H₃N–C(H)(R)–COO⁻, with the amino group, the carboxyl group, a hydrogen atom, and the R group all bonded to the α carbon.]
FIGURE 3–2 General structure of an amino acid. This structure is
common to all but one of the α-amino acids. (Proline, a cyclic amino
acid, is the exception.) The R group or side chain (red) attached to the
α carbon (blue) is different in each amino acid.
Special nomenclature has been developed to spec-
ify the absolute configuration of the four substituents
of asymmetric carbon atoms. The absolute configura-
tions of simple sugars and amino acids are specified by
the D, L system (Fig. 3–4), based on the absolute con-
figuration of the three-carbon sugar glyceraldehyde, a
convention proposed by Emil Fischer in 1891. (Fischer
knew what groups surrounded the asymmetric carbon
of glyceraldehyde but had to guess at their absolute
configuration; his guess was later confirmed by x-ray
diffraction analysis.) For all chiral compounds, stereo-
isomers having a configuration related to that of
L-glyceraldehyde are designated L, and stereoisomers
related to D-glyceraldehyde are designated D. The func-
tional groups of L-alanine are matched with those of L-
glyceraldehyde by aligning those that can be intercon-
verted by simple, one-step chemical reactions. Thus the
carboxyl group of L-alanine occupies the same position
about the chiral carbon as does the aldehyde group
of L-glyceraldehyde, because an aldehyde is readily
converted to a carboxyl group via a one-step oxidation.
Historically, the similar l and d designations were used
for levorotatory (rotating light to the left) and dextro-
rotatory (rotating light to the right). However, not all
L-amino acids are levorotatory, and the convention
shown in Figure 3–4 was needed to avoid potential am-
biguities about absolute configuration. By Fischer’s con-
vention, L and D refer only to the absolute configura-
tion of the four substituents around the chiral carbon,
not to optical properties of the molecule.
Another system of specifying configuration around
a chiral center is the RS system, which is used in the
systematic nomenclature of organic chemistry and de-
scribes more precisely the configuration of molecules
with more than one chiral center (see p. 18).
The Amino Acid Residues in Proteins
Are L Stereoisomers
Nearly all biological compounds with a chiral center oc-
cur naturally in only one stereoisomeric form, either D
or L. The amino acid residues in protein molecules are
exclusively L stereoisomers. D-Amino acid residues have
been found only in a few, generally small peptides, in-
cluding some peptides of bacterial cell walls and certain
peptide antibiotics.
It is remarkable that virtually all amino acid residues
in proteins are L stereoisomers. When chiral compounds
are formed by ordinary chemical reactions, the result is
a racemic mixture of D and L isomers, which are diffi-
cult for a chemist to distinguish and separate. But to a
living system, D and L isomers are as different as the
right hand and the left. The formation of stable, re-
peating substructures in proteins (Chapter 4) generally
requires that their constituent amino acids be of one
stereochemical series. Cells are able to specifically syn-
thesize the L isomers of amino acids because the active
sites of enzymes are asymmetric, causing the reactions
they catalyze to be stereospecific.
[Figure 3–3 panels (a), (b), and (c): each panel shows L-alanine beside D-alanine.]
FIGURE 3–3 Stereoisomerism in α-amino acids. (a) The two stereoisomers
of alanine, L- and D-alanine, are nonsuperimposable mirror images
of each other (enantiomers). (b, c) Two different conventions for
showing the configurations in space of stereoisomers. In perspective
formulas (b) the solid wedge-shaped bonds project out of the plane
of the paper, the dashed bonds behind it. In projection formulas (c)
the horizontal bonds are assumed to project out of the plane of the
paper, the vertical bonds behind. However, projection formulas are
often used casually and are not always intended to portray a specific
stereochemical configuration.
[Figure 3–4 structures: L-glyceraldehyde and D-glyceraldehyde (CHO at C-1, CH₂OH at C-3) shown with L-alanine and D-alanine (COO⁻ at C-1, CH₃ at C-3).]
FIGURE 3–4 Steric relationship of the stereoisomers of alanine to
the absolute configuration of L- and D-glyceraldehyde. In these perspective
formulas, the carbons are lined up vertically, with the chiral
atom in the center. The carbons in these molecules are numbered beginning
with the terminal aldehyde or carboxyl carbon (red), 1 to 3
from top to bottom as shown. When presented in this way, the R group
of the amino acid (in this case the methyl group of alanine) is always
below the α carbon. L-Amino acids are those with the α-amino group
on the left, and D-amino acids have the α-amino group on the right.
Amino Acids Can Be Classified by R Group
Knowledge of the chemical properties of the common
amino acids is central to an understanding of biochem-
istry. The topic can be simplified by grouping the amino
acids into five main classes based on the properties of
their R groups (Table 3–1), in particular, their polarity,
or tendency to interact with water at biological pH (near
pH 7.0). The polarity of the R groups varies widely, from
nonpolar and hydrophobic (water-insoluble) to highly
polar and hydrophilic (water-soluble).
The structures of the 20 common amino acids are
shown in Figure 3–5, and some of their properties are
listed in Table 3–1. Within each class there are grada-
tions of polarity, size, and shape of the R groups.
Nonpolar, Aliphatic R Groups The R groups in this class of
amino acids are nonpolar and hydrophobic. The side
chains of alanine, valine, leucine, and isoleucine
tend to cluster together within proteins, stabilizing pro-
tein structure by means of hydrophobic interactions.
Glycine has the simplest structure. Although it is for-
mally nonpolar, its very small side chain makes no real
contribution to hydrophobic interactions. Methionine,
one of the two sulfur-containing amino acids, has a nonpolar
thioether group in its side chain. Proline has an
aliphatic side chain with a distinctive cyclic structure. The
secondary amino (imino) group of proline residues is
held in a rigid conformation that reduces the structural
flexibility of polypeptide regions containing proline.

TABLE 3–1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

                             pKa values
Amino acid     Symbol   Mr   pK1      pK2      pKR        pI     Hydropathy  Occurrence in
                             (–COOH)  (–NH₃⁺)  (R group)         index*      proteins (%)†
Nonpolar, aliphatic R groups
Glycine        Gly G    75   2.34      9.60                5.97  −0.4        7.2
Alanine        Ala A    89   2.34      9.69                6.01   1.8        7.8
Proline        Pro P   115   1.99     10.96                6.48   1.6        5.2
Valine         Val V   117   2.32      9.62                5.97   4.2        6.6
Leucine        Leu L   131   2.36      9.60                5.98   3.8        9.1
Isoleucine     Ile I   131   2.36      9.68                6.02   4.5        5.3
Methionine     Met M   149   2.28      9.21                5.74   1.9        2.3
Aromatic R groups
Phenylalanine  Phe F   165   1.83      9.13                5.48   2.8        3.9
Tyrosine       Tyr Y   181   2.20      9.11    10.07       5.66  −1.3        3.2
Tryptophan     Trp W   204   2.38      9.39                5.89  −0.9        1.4
Polar, uncharged R groups
Serine         Ser S   105   2.21      9.15                5.68  −0.8        6.8
Threonine      Thr T   119   2.11      9.62                5.87  −0.7        5.9
Cysteine       Cys C   121   1.96     10.28     8.18       5.07   2.5        1.9
Asparagine     Asn N   132   2.02      8.80                5.41  −3.5        4.3
Glutamine      Gln Q   146   2.17      9.13                5.65  −3.5        4.2
Positively charged R groups
Lysine         Lys K   146   2.18      8.95    10.53       9.74  −3.9        5.9
Histidine      His H   155   1.82      9.17     6.00       7.59  −3.2        2.3
Arginine       Arg R   174   2.17      9.04    12.48      10.76  −4.5        5.1
Negatively charged R groups
Aspartate      Asp D   133   1.88      9.60     3.65       2.77  −3.5        5.3
Glutamate      Glu E   147   2.19      9.67     4.25       3.22  −3.5        6.3

*A scale combining hydrophobicity and hydrophilicity of R groups; it can be used to measure the tendency of an amino acid to seek an aqueous
environment (− values) or a hydrophobic environment (+ values). See Chapter 11. From Kyte, J. & Doolittle, R.F. (1982) A simple method for
displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
†Average occurrence in more than 1,150 proteins. From Doolittle, R.F. (1989) Redundancies in protein sequences. In Prediction of Protein Structure
and the Principles of Protein Conformation (Fasman, G.D., ed.), pp. 599–623, Plenum Press, New York.
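The hydropathy index in Table 3–1 lends itself to a simple calculation: averaging the values over a stretch of sequence gives a rough indication of whether that stretch favors an aqueous or a hydrophobic environment. The sketch below is illustrative only. The values are transcribed from Table 3–1 as printed and should be checked against the Kyte and Doolittle paper before being relied on, and the function name is an assumption of this example.

```python
# Hydropathy index per one-letter symbol, transcribed from Table 3-1 as printed.
HYDROPATHY = {
    "G": -0.4, "A": 1.8, "P": 1.6, "V": 4.2, "L": 3.8, "I": 4.5, "M": 1.9,
    "F": 2.8, "Y": -1.3, "W": -0.9,
    "S": -0.8, "T": -0.7, "C": 2.5, "N": -3.5, "Q": -3.5,
    "K": -3.9, "H": -3.2, "R": -4.5,
    "D": -3.5, "E": -3.5,
}


def mean_hydropathy(sequence):
    """Average hydropathy index over a sequence of one-letter symbols.

    Positive averages suggest a preference for a hydrophobic environment,
    negative averages a preference for an aqueous one.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(HYDROPATHY[symbol] for symbol in sequence) / len(sequence)


if __name__ == "__main__":
    for seq in ("GAVL", "KDE"):
        print(seq, round(mean_hydropathy(seq), 2))
```

A run of aliphatic residues such as Gly-Ala-Val-Leu averages well above zero, whereas a run of charged residues such as Lys-Asp-Glu averages well below it.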
Aromatic R Groups Phenylalanine, tyrosine, and tryp-
tophan, with their aromatic side chains, are relatively
nonpolar (hydrophobic). All can participate in hy-
drophobic interactions. The hydroxyl group of tyrosine
can form hydrogen bonds, and it is an important func-
tional group in some enzymes. Tyrosine and tryptophan
are significantly more polar than phenylalanine, because
of the tyrosine hydroxyl group and the nitrogen of the
tryptophan indole ring.
Tryptophan and tyrosine, and to a much lesser ex-
tent phenylalanine, absorb ultraviolet light (Fig. 3–6;
Box 3–1). This accounts for the characteristic strong ab-
sorbance of light by most proteins at a wavelength of
280 nm, a property exploited by researchers in the char-
acterization of proteins.
[Figure 3–5 structures, grouped by class. Nonpolar, aliphatic R groups: glycine, alanine, proline, valine, leucine, isoleucine, methionine. Aromatic R groups: phenylalanine, tyrosine, tryptophan. Polar, uncharged R groups: serine, threonine, cysteine, asparagine, glutamine. Positively charged R groups: lysine, arginine, histidine. Negatively charged R groups: aspartate, glutamate.]
FIGURE 3–5 The 20 common amino acids of proteins. The structural
formulas show the state of ionization that would predominate at pH
7.0. The unshaded portions are those common to all the amino acids;
the portions shaded in red are the R groups. Although the R group of
histidine is shown uncharged, its pKa (see Table 3–1) is such that a
small but significant fraction of these groups are positively charged at
pH 7.0.
Polar, Uncharged R Groups The R groups of these amino
acids are more soluble in water, or more hydrophilic,
than those of the nonpolar amino acids, because they
contain functional groups that form hydrogen bonds
with water. This class of amino acids includes serine,
threonine, cysteine, asparagine, and glutamine.
The polarity of serine and threonine is contributed by
their hydroxyl groups; that of cysteine by its sulfhydryl
group; and that of asparagine and glutamine by their
amide groups.
Asparagine and glutamine are the amides of two
other amino acids also found in proteins, aspartate and
glutamate, respectively, to which asparagine and gluta-
mine are easily hydrolyzed by acid or base. Cysteine is
readily oxidized to form a covalently linked dimeric
amino acid called cystine, in which two cysteine mole-
cules or residues are joined by a disulfide bond (Fig.
3–7). The disulfide-linked residues are strongly hy-
drophobic (nonpolar). Disulfide bonds play a special
role in the structures of many proteins by forming co-
valent links between parts of a protein molecule or be-
tween two different polypeptide chains.
Positively Charged (Basic) R Groups The most hydrophilic
R groups are those that are either positively or nega-
tively charged. The amino acids in which the R groups
have significant positive charge at pH 7.0 are lysine,
which has a second primary amino group at the ε position
on its aliphatic chain; arginine, which has a positively
charged guanidino group; and histidine, which
has an imidazole group. Histidine is the only common
amino acid having an ionizable side chain with a pKa
near neutrality. In many enzyme-catalyzed reactions, a
His residue facilitates the reaction by serving as a pro-
ton donor/acceptor.
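Why a pKa near neutrality makes histidine a useful proton donor and acceptor can be seen by estimating what fraction of each basic R group carries its proton at pH 7.0. The sketch below assumes the standard Henderson-Hasselbalch relationship from acid-base chemistry and the pKR values of Table 3–1; it treats each group in isolation, which is a simplification.

```python
def fraction_protonated(ph, pka):
    """Fraction of an ionizable group in its protonated form at a given pH.

    Rearranged from pH = pKa + log([A-]/[HA]) (Henderson-Hasselbalch).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# R-group pKa values (pKR) from Table 3-1.
PKR = {"Lys": 10.53, "His": 6.00, "Arg": 12.48}

if __name__ == "__main__":
    for name, pka in PKR.items():
        print(f"{name}: {fraction_protonated(7.0, pka):.4f} protonated at pH 7.0")
```

Under these assumptions lysine and arginine are almost entirely protonated at pH 7.0, whereas roughly one histidine side chain in eleven is, so small shifts in pH or local environment change its state appreciably.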
Negatively Charged (Acidic) R Groups The two amino acids
having R groups with a net negative charge at pH 7.0
are aspartate and glutamate, each of which has a sec-
ond carboxyl group.
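The five classes described above can be collected into a small lookup table. This is only a restatement of the grouping in the text and Table 3–1; the class labels and function name are illustrative.

```python
# The five R-group classes used in this section, with their members.
R_GROUP_CLASSES = {
    "nonpolar, aliphatic": [
        "Glycine", "Alanine", "Proline", "Valine",
        "Leucine", "Isoleucine", "Methionine",
    ],
    "aromatic": ["Phenylalanine", "Tyrosine", "Tryptophan"],
    "polar, uncharged": [
        "Serine", "Threonine", "Cysteine", "Asparagine", "Glutamine",
    ],
    "positively charged": ["Lysine", "Histidine", "Arginine"],
    "negatively charged": ["Aspartate", "Glutamate"],
}


def r_group_class(amino_acid):
    """Return the class name for a common amino acid, by full name."""
    for class_name, members in R_GROUP_CLASSES.items():
        if amino_acid in members:
            return class_name
    raise KeyError(f"{amino_acid!r} is not one of the 20 common amino acids")


if __name__ == "__main__":
    for name in ("Glycine", "Tyrosine", "Cysteine", "Histidine", "Glutamate"):
        print(f"{name}: {r_group_class(name)}")
```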
Uncommon Amino Acids Also Have
Important Functions
In addition to the 20 common amino acids, proteins
may contain residues created by modification of com-
mon residues already incorporated into a polypeptide
(Fig. 3–8a). Among these uncommon amino acids
are 4-hydroxyproline, a derivative of proline, and
5-hydroxylysine, derived from lysine. The former is
found in plant cell wall proteins, and both are found in
collagen, a fibrous protein of connective tissues. 6-N-
Methyllysine is a constituent of myosin, a contractile
protein of muscle. Another important uncommon amino
acid is γ-carboxyglutamate, found in the blood-clotting
protein prothrombin and in certain other proteins
that bind Ca²⁺ as part of their biological function.
More complex is desmosine, a derivative of four Lys
residues, which is found in the fibrous protein elastin.
Selenocysteine is a special case. This rare amino
acid residue is introduced during protein synthesis
rather than created through a postsynthetic modifica-
tion. It contains selenium rather than the sulfur of cys-
teine. Actually derived from serine, selenocysteine is a
constituent of just a few known proteins.
Some 300 additional amino acids have been found
in cells. They have a variety of functions but are not
constituents of proteins. Ornithine and citrulline
(Fig. 3–8b) deserve special note because they are key
intermediates (metabolites) in the biosynthesis of arginine
(Chapter 22) and in the urea cycle (Chapter 18).

[Figure 3–6 plot: absorbance (0 to 6) versus wavelength (230 to 310 nm), with curves for tryptophan and tyrosine.]
FIGURE 3–6 Absorption of ultraviolet light by aromatic amino acids.
Comparison of the light absorption spectra of the aromatic amino acids
tryptophan and tyrosine at pH 6.0. The amino acids are present in
equimolar amounts (10⁻³ M) under identical conditions. The measured
absorbance of tryptophan is as much as four times that of tyrosine.
Note that the maximum light absorption for both tryptophan and
tyrosine occurs near a wavelength of 280 nm. Light absorption by the
third aromatic amino acid, phenylalanine (not shown), generally contributes
little to the spectroscopic properties of proteins.

[Figure 3–7 scheme: 2 cysteine ⇌ cystine + 2H⁺ + 2e⁻; the two –SH groups are joined as one –S–S– bond.]
FIGURE 3–7 Reversible formation of a disulfide bond by the oxidation
of two molecules of cysteine. Disulfide bonds between Cys
residues stabilize the structures of many proteins.

[Figure 3–8 structures: (a) 4-hydroxyproline, 5-hydroxylysine, 6-N-methyllysine, γ-carboxyglutamate, desmosine, selenocysteine; (b) ornithine, citrulline.]
FIGURE 3–8 Uncommon amino acids. (a) Some uncommon amino
acids found in proteins. All are derived from common amino acids.
Extra functional groups added by modification reactions are shown in
red. Desmosine is formed from four Lys residues (the four carbon backbones
are shaded in yellow). Note the use of either numbers or Greek
letters to identify the carbon atoms in these structures. (b) Ornithine
and citrulline, which are not found in proteins, are intermediates in
the biosynthesis of arginine and in the urea cycle.

Amino Acids Can Act as Acids and Bases
When an amino acid is dissolved in water, it exists in solution
as the dipolar ion, or zwitterion (German for
“hybrid ion”), shown in Figure 3–9. A zwitterion can act
as either an acid (proton donor):

\[
\mathrm{{}^{+}H_3N{-}CH(R){-}COO^{-}} \;\rightleftharpoons\; \mathrm{H_2N{-}CH(R){-}COO^{-}} + \mathrm{H^{+}}
\]

or a base (proton acceptor):

\[
\mathrm{{}^{+}H_3N{-}CH(R){-}COO^{-}} + \mathrm{H^{+}} \;\rightleftharpoons\; \mathrm{{}^{+}H_3N{-}CH(R){-}COOH}
\]

Substances having this dual nature are amphoteric
and are often called ampholytes (from “amphoteric
electrolytes”). A simple monoamino monocarboxylic
α-amino acid, such as alanine, is a diprotic acid when fully
protonated; it has two groups, the –COOH group and
the –NH₃⁺ group, that can yield protons:

\[
\underset{\text{net charge } +1}{\mathrm{{}^{+}H_3N{-}CH(R){-}COOH}}
\;\overset{-\mathrm{H^{+}}}{\rightleftharpoons}\;
\underset{0}{\mathrm{{}^{+}H_3N{-}CH(R){-}COO^{-}}}
\;\overset{-\mathrm{H^{+}}}{\rightleftharpoons}\;
\underset{-1}{\mathrm{H_2N{-}CH(R){-}COO^{-}}}
\]

[Figure 3–9 structures: nonionic form, H₂N–CH(R)–COOH; zwitterionic form, ⁺H₃N–CH(R)–COO⁻.]
FIGURE 3–9 Nonionic and zwitterionic forms of amino acids. The
nonionic form does not occur in significant amounts in aqueous solutions.
The zwitterion predominates at neutral pH.
Amino Acids Have Characteristic Titration Curves
Acid-base titration involves the gradual addition or re-
moval of protons (Chapter 2). Figure 3–10 shows the
titration curve of the diprotic form of glycine. The plot
has two distinct stages, corresponding to deprotonation
of two different groups on glycine. Each of the two
stages resembles in shape the titration curve of a
monoprotic acid, such as acetic acid (see Fig. 2–17),
and can be analyzed in the same way. At very low pH,
the predominant ionic species of glycine is the fully protonated
form, ⁺H₃N–CH₂–COOH. At the midpoint in
the first stage of the titration, in which the –COOH
group of glycine loses its proton, equimolar concentrations
of the proton-donor (⁺H₃N–CH₂–COOH) and
proton-acceptor (⁺H₃N–CH₂–COO⁻) species are
present. At the midpoint of any titration, a point of inflection
is reached where the pH is equal to the pKa of
the protonated group being titrated (see Fig. 2–18). For
glycine, the pH at the midpoint is 2.34; thus its –COOH
group has a pKa (labeled pK1 in Fig. 3–10) of 2.34.
(Recall from Chapter 2 that pH and pKa are simply convenient
notations for proton concentration and the
equilibrium constant for ionization, respectively. The
pKa is a measure of the tendency of a group to give up
a proton, with that tendency decreasing tenfold as the
pKa increases by one unit.) As the titration proceeds,
another important point is reached at pH 5.97. Here
there is another point of inflection, at which removal of
the first proton is essentially complete and removal of
the second has just begun. At this pH glycine is
present largely as the dipolar ion ⁺H₃N–CH₂–COO⁻.
We shall return to the significance of this inflection
point in the titration curve (labeled pI in Fig. 3–10)
shortly.
The second stage of the titration corresponds to the
removal of a proton from the –NH₃⁺ group of glycine.
The pH at the midpoint of this stage is 9.60, equal to
the pKa (labeled pK2 in Fig. 3–10) for the –NH₃⁺ group.
The titration is essentially complete at a pH of about 12,
at which point the predominant form of glycine is
H₂N–CH₂–COO⁻.
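The stages of the glycine titration just described can be reproduced numerically. The sketch below is a simplified model, assuming that the two groups ionize independently and that each follows the Henderson-Hasselbalch relationship; the pKa values are those quoted above, and the function names are illustrative.

```python
PK1 = 2.34  # pKa of the -COOH group of glycine
PK2 = 9.60  # pKa of the -NH3+ group of glycine


def fraction_protonated(ph, pka):
    """Fraction of a group that still holds its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def glycine_net_charge(ph):
    """Average net charge of glycine at the given pH.

    The carboxyl group contributes -1 when deprotonated; the amino group
    contributes +1 when protonated.
    """
    carboxyl_charge = -(1.0 - fraction_protonated(ph, PK1))
    amino_charge = fraction_protonated(ph, PK2)
    return carboxyl_charge + amino_charge


if __name__ == "__main__":
    for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
        print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.2f}")
```

In this model the average charge is close to +0.5 at pH 2.34 and close to −0.5 at pH 9.60, the two midpoints, and it passes through zero at pH 5.97, where the dipolar ion predominates.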
BOX 3–1 WORKING IN BIOCHEMISTRY
Absorption of Light by Molecules:
The Lambert-Beer Law
A wide range of biomolecules absorb light at charac-
teristic wavelengths, just as tryptophan absorbs light at
280 nm (see Fig. 3–6). Measurement of light absorp-
tion by a spectrophotometer is used to detect and iden-
tify molecules and to measure their concentration in
solution. The fraction of the incident light absorbed by
a solution at a given wavelength is related to the thick-
ness of the absorbing layer (path length) and the con-
centration of the absorbing species (Fig. 1). These two
relationships are combined into the Lambert-Beer law,
\[
\log \frac{I_0}{I} = \varepsilon c l
\]
where I₀ is the intensity of the incident light, I is the intensity
of the transmitted light, ε is the molar extinction
coefficient (in units of liters per mole-centimeter),
c is the concentration of the absorbing species (in
moles per liter), and l is the path length of the light-absorbing
sample (in centimeters). The Lambert-Beer
law assumes that the incident light is parallel and
monochromatic (of a single wavelength) and that the
solvent and solute molecules are randomly oriented.
The expression log (I₀/I) is called the absorbance,
designated A.
It is important to note that each successive millimeter
of path length of absorbing solution in a 1.0 cm
cell absorbs not a constant amount but a constant fraction
of the light that is incident upon it. However, with
an absorbing layer of fixed path length, the absorbance,
A, is directly proportional to the concentration
of the absorbing solute.
The molar extinction coefficient varies with the
nature of the absorbing compound, the solvent, and
the wavelength, and also with pH if the light-absorbing
species is in equilibrium with an ionization state that
has different absorbance properties.
[Figure 1 schematic: lamp, monochromator, sample cuvette (path length l, containing c moles/liter of absorbing species), detector. I₀ is the intensity of incident light and I the intensity of transmitted light; the example readout shows A = 0.012.]
FIGURE 1 The principal components of a
spectrophotometer. A light source emits
light along a broad spectrum, then the
monochromator selects and transmits light
of a particular wavelength. The monochro-
matic light passes through the sample in a
cuvette of path length l and is absorbed by
the sample in proportion to the concentra-
tion of the absorbing species. The transmit-
ted light is measured by a detector.
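The Lambert-Beer law is most often used in rearranged form, to obtain a concentration from a measured absorbance. The sketch below shows both directions of the calculation. The extinction coefficient in the usage example is a made-up round number chosen for illustration, not a measured value for any compound, and the function names are assumptions of this example.

```python
import math


def absorbance(incident_intensity, transmitted_intensity):
    """A = log10(I0 / I), from incident and transmitted light intensities."""
    if incident_intensity <= 0 or transmitted_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(incident_intensity / transmitted_intensity)


def concentration(absorbance_value, molar_extinction, path_length_cm=1.0):
    """Solve A = epsilon * c * l for c, in moles per liter.

    molar_extinction is in liters per mole-centimeter and path_length_cm
    in centimeters, matching the units given for the law.
    """
    return absorbance_value / (molar_extinction * path_length_cm)


if __name__ == "__main__":
    a = absorbance(100.0, 10.0)  # 90% of the incident light is absorbed
    # Hypothetical extinction coefficient of 5000 L/(mol*cm), for illustration.
    c = concentration(a, molar_extinction=5000.0, path_length_cm=1.0)
    print(f"A = {a:.3f}, c = {c:.2e} M")
```

Because A is a logarithm, doubling the concentration doubles the absorbance but does not double the amount of light absorbed, which is the point made above about constant fractions.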
From the titration curve of glycine we can derive
several important pieces of information. First, it gives a
quantitative measure of the pKa of each of the two ionizing
groups: 2.34 for the –COOH group and 9.60 for
the –NH₃⁺ group. Note that the carboxyl group of
glycine is over 100 times more acidic (more easily ionized)
than the carboxyl group of acetic acid, which, as
we saw in Chapter 2, has a pKa of 4.76, about average
for a carboxyl group attached to an otherwise unsubstituted
aliphatic hydrocarbon. The perturbed pKa of
glycine is caused by repulsion between the departing
proton and the nearby positively charged amino group
on the α-carbon atom, as described in Figure 3–11. The
opposite charges on the resulting zwitterion are stabilizing,
nudging the equilibrium farther to the right. Similarly,
the pKa of the amino group in glycine is perturbed
downward relative to the average pKa of an amino group.
This effect is due partly to the electronegative oxygen
atoms in the carboxyl groups, which tend to pull electrons
toward them, increasing the tendency of the amino
group to give up a proton. Hence, the α-amino group
has a pKa that is lower than that of an aliphatic amine
such as methylamine (Fig. 3–11). In short, the pKa of
any functional group is greatly affected by its chemical
environment, a phenomenon sometimes exploited in the
active sites of enzymes to promote exquisitely adapted
reaction mechanisms that depend on the perturbed pKa
values of proton donor/acceptor groups of specific
residues.
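The "over 100 times more acidic" comparison follows directly from the logarithmic definition of pKa: each unit of difference corresponds to a tenfold difference in the ionization constant. A minimal sketch of that arithmetic, using the pKa values quoted in the text and in Figure 3–11 (the function name is illustrative):

```python
def acidity_ratio(pka_reference, pka_perturbed):
    """Factor by which Ka of the perturbed group exceeds Ka of the reference.

    Since pKa = -log10(Ka), the ratio of Ka values is 10 raised to the
    difference in pKa.
    """
    return 10.0 ** (pka_reference - pka_perturbed)


# Carboxyl group: acetic acid (pKa 4.76) versus glycine (pKa 2.34).
carboxyl_ratio = acidity_ratio(4.76, 2.34)
# Amino group: methylamine (pKa about 10.6) versus glycine (pKa 9.60).
amino_ratio = acidity_ratio(10.6, 9.60)

if __name__ == "__main__":
    print(f"carboxyl group: about {carboxyl_ratio:.0f}-fold more acidic")
    print(f"amino group:    about {amino_ratio:.0f}-fold more acidic")
```

The carboxyl group of glycine comes out roughly 260-fold more acidic than that of acetic acid, and the amino group about 10-fold more acidic than that of methylamine.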
[Figure 3–10 plot: pH (0 to 13) versus OH⁻ added (0 to 2 equivalents) for glycine. Marked points: pK1 = 2.34, pI = 5.97, pK2 = 9.60. Predominant species, in order of increasing pH: ⁺H₃N–CH₂–COOH, ⁺H₃N–CH₂–COO⁻, H₂N–CH₂–COO⁻.]
FIGURE 3–10 Titration of an amino acid. Shown here is the titration
curve of 0.1 M glycine at 25 °C. The ionic species predominating at
key points in the titration are shown above the graph. The shaded
boxes, centered at about pK1 = 2.34 and pK2 = 9.60, indicate the regions
of greatest buffering power.
[Figure 3–11 diagram, drawn against a pKa scale from 2 to 12.
Methyl-substituted carboxyl and amino groups. Acetic acid: CH₃–COOH ⇌ CH₃–COO⁻ + H⁺. The normal pKa for a carboxyl group is about 4.8. Methylamine: CH₃–NH₃⁺ ⇌ CH₃–NH₂ + H⁺. The normal pKa for an amino group is about 10.6.
Carboxyl and amino groups in glycine. α-Amino acid (glycine), pKa = 2.34: repulsion between the amino group and the departing proton lowers the pKa for the carboxyl group, and oppositely charged groups lower the pKa by stabilizing the zwitterion. α-Amino acid (glycine), pKa = 9.60: electronegative oxygen atoms in the carboxyl group pull electrons away from the amino group, lowering its pKa.]
FIGURE 3–11 Effect of the chemical environment on pKa. The pKa
values for the ionizable groups in glycine are lower than those for simple,
methyl-substituted amino and carboxyl groups. These downward
perturbations of pKa are due to intramolecular interactions. Similar effects
can be caused by chemical groups that happen to be positioned
nearby, for example, in the active site of an enzyme.
group in the range of 1.8 to 2.4, and pKa of the –NH₃⁺
group in the range of 8.8 to 11.0 (Table 3–1).
Second, amino acids with an ionizable R group have
more complex titration curves, with three stages
corresponding to the three possible ionization steps; thus
they have three pKa values. The additional stage for the
titration of the ionizable R group merges to some extent
with the other two. The titration curves for two amino
acids of this type, glutamate and histidine, are shown in
Figure 3–12. The isoelectric points reflect the nature of
the ionizing R groups present. For example, glutamate
FIGURE 3–12 Titration curves for (a) glutamate and (b) histidine. The
pKa of the R group is designated here as pKR.
[Panel (a), glutamate: pH (0 to 10) against OH− added (equivalents, 0 to 3.0); pK1 = 2.19, pKR = 4.25, pK2 = 9.67. The four ionic species, from fully protonated to fully deprotonated, are shown above the curve.]
The second piece of information provided by the
titration curve of glycine is that this amino acid has two
regions of buffering power. One of these is the relatively
flat portion of the curve, extending for approximately
1 pH unit on either side of the first pKa of 2.34,
indicating that glycine is a good buffer near this pH. The
other buffering zone is centered around pH 9.60. (Note
that glycine is not a good buffer at the pH of intracel-
lular fluid or blood, about 7.4.) Within the buffering
ranges of glycine, the Henderson-Hasselbalch equation
(see Box 2–3) can be used to calculate the proportions
of proton-donor and proton-acceptor species of glycine
required to make a buffer at a given pH.
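The buffer calculation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming ideal behavior and the glycine pKa values quoted in this section; the function names are hypothetical and chosen only for this example.

```python
def acceptor_to_donor_ratio(ph, pka):
    """Henderson-Hasselbalch: pH = pKa + log10([acceptor]/[donor])."""
    return 10 ** (ph - pka)


def acceptor_fraction(ph, pka):
    """Fraction of the buffering group present as the proton acceptor."""
    ratio = acceptor_to_donor_ratio(ph, pka)
    return ratio / (1 + ratio)


# Example: a glycine buffer at pH 9.0, using the amino group (pK2 = 9.60).
ratio = acceptor_to_donor_ratio(9.0, 9.60)
print(f"acceptor/donor ratio at pH 9.0: {ratio:.3f}")
print(f"acceptor fraction at pH 9.0:   {acceptor_fraction(9.0, 9.60):.3f}")
```

At a pH equal to the pKa the two forms are present in equal amounts, which is why buffering power is greatest there.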
Titration Curves Predict the Electric Charge
of Amino Acids
Another important piece of information derived from
the titration curve of an amino acid is the relationship
between its net electric charge and the pH of the solu-
tion. At pH 5.97, the point of inflection between the
two stages in its titration curve, glycine is present pre-
dominantly as its dipolar form, fully ionized but with no
net electric charge (Fig. 3–10). The characteristic pH
at which the net electric charge is zero is called the
isoelectric point or isoelectric pH, designated pI.
For glycine, which has no ionizable group in its side
chain, the isoelectric point is simply the arithmetic mean
of the two pKa values:
$$\mathrm{pI} = \tfrac{1}{2}(\mathrm{p}K_1 + \mathrm{p}K_2) = \tfrac{1}{2}(2.34 + 9.60) = 5.97$$
As is evident in Figure 3–10, glycine has a net negative
charge at any pH above its pI and will thus move toward
the positive electrode (the anode) when placed in an
electric field. At any pH below its pI, glycine has a net
positive charge and will move toward the negative elec-
trode (the cathode). The farther the pH of a glycine so-
lution is from its isoelectric point, the greater the net
electric charge of the population of glycine molecules.
At pH 1.0, for example, glycine exists almost entirely as
the form ⁺H₃N–CH₂–COOH, with a net positive
charge of 1.0. At pH 2.34, where there is an equal
mixture of ⁺H₃N–CH₂–COOH and ⁺H₃N–CH₂–COO⁻,
the average or net positive charge is 0.5. The sign and
the magnitude of the net charge of any amino acid at
any pH can be predicted in the same way.
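The prediction described above can be sketched numerically. The following is an illustrative model only: it treats the two groups of glycine as independent and ideal, uses the pK1 and pK2 values from Figure 3–10, and the function names are hypothetical.

```python
def protonated_fraction(ph, pka):
    """Fraction of a group that still carries its proton at the given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def glycine_net_charge(ph, pk1=2.34, pk2=9.60):
    """Average net charge of glycine molecules at the given pH."""
    # The amino group is +1 when protonated and 0 when deprotonated.
    amino = protonated_fraction(ph, pk2)
    # The carboxyl group is 0 when protonated and -1 when deprotonated.
    carboxyl = -(1.0 - protonated_fraction(ph, pk1))
    return amino + carboxyl


for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
    print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.2f}")
```

In this model the net charge is close to +1 at pH 1.0, +0.5 at pH 2.34, and zero at the isoelectric point, in line with the text.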
Amino Acids Differ in Their Acid-Base Properties
The shared properties of many amino acids permit some
simplifying generalizations about their acid-base
behaviors. First, all amino acids with a single α-amino group,
a single α-carboxyl group, and an R group that does not
ionize have titration curves resembling that of glycine
(Fig. 3–10). These amino acids have very similar,
although not identical, pKa values: pKa of the –COOH
[Figure 3–12, panel (b), histidine: pH (0 to 10) against OH− added (equivalents, 0 to 3.0); pK1 = 1.82, pKR = 6.0, pK2 = 9.17. The four ionic species, from fully protonated to fully deprotonated, are shown above the curve.]
has a pI of 3.22, considerably lower than that of glycine.
This is due to the presence of two carboxyl groups,
which, at the average of their pKa values (3.22),
contribute a net charge of −1 that balances the +1
contributed by the amino group. Similarly, the pI of
histidine, with two groups that are positively charged when
protonated, is 7.59 (the average of the pKa values of the
amino and imidazole groups), much higher than that of
glycine.
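The averaging rule used for glutamate and histidine can be written as a small helper. This is a sketch under a stated assumption: the fully protonated form carries one positive charge per basic group, so the zero-charge species sits between the ionization steps on either side of that many proton losses. The function name and argument layout are hypothetical; the pKa values are those given in Figures 3–10 and 3–12.

```python
def isoelectric_point(pkas, n_positive_groups):
    """Average the two pKa values that flank the zero-net-charge species.

    pkas: all pKa values of the amino acid, in any order.
    n_positive_groups: number of groups that are positively charged when
        protonated (for example the alpha-amino and imidazole groups).
    """
    ordered = sorted(pkas)
    # After n_positive_groups protons are lost the net charge is zero,
    # so the neutral species lies between these two ionization steps.
    lower = ordered[n_positive_groups - 1]
    upper = ordered[n_positive_groups]
    return (lower + upper) / 2


print("glycine  ", isoelectric_point([2.34, 9.60], 1))
print("glutamate", isoelectric_point([2.19, 4.25, 9.67], 1))
print("histidine", isoelectric_point([1.82, 6.0, 9.17], 2))
```

For glutamate the two carboxyl pKa values are averaged; for histidine the imidazole and amino pKa values are averaged, giving about 7.59.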
Finally, as pointed out earlier, under the general
condition of free and open exposure to the aqueous
environment, only histidine has an R group (pKa = 6.0)
providing significant buffering power near the neutral
pH usually found in the intracellular and extracellular
fluids of most animals and bacteria (Table 3–1).
SUMMARY 3.1 Amino Acids
■ The 20 amino acids commonly found as
residues in proteins contain an α-carboxyl
group, an α-amino group, and a distinctive R
group substituted on the α-carbon atom. The
α-carbon atom of all amino acids except glycine
is asymmetric, and thus amino acids can exist
in at least two stereoisomeric forms. Only the
L stereoisomers, with a configuration related to
the absolute configuration of the reference
molecule L-glyceraldehyde, are found in
proteins.
■ Other, less common amino acids also occur,
either as constituents of proteins (through
modification of common amino acid residues
after protein synthesis) or as free metabolites.
■ Amino acids are classified into five types on the
basis of the polarity and charge (at pH 7) of
their R groups.
■ Amino acids vary in their acid-base properties
and have characteristic titration curves.
Monoamino monocarboxylic amino acids (with
nonionizable R groups) are diprotic acids
(⁺H₃NCH(R)COOH) at low pH and exist in
several different ionic forms as the pH is
increased. Amino acids with ionizable R groups
have additional ionic species, depending on the
pH of the medium and the pKa of the R group.
3.2 Peptides and Proteins
We now turn to polymers of amino acids, the peptides
and proteins. Biologically occurring polypeptides range
in size from small to very large, consisting of two or
three to thousands of linked amino acid residues. Our
focus is on the fundamental chemical properties of these
polymers.
Peptides Are Chains of Amino Acids
Two amino acid molecules can be covalently joined
through a substituted amide linkage, termed a peptide
bond, to yield a dipeptide. Such a linkage is formed by
removal of the elements of water (dehydration) from
the α-carboxyl group of one amino acid and the α-amino
group of another (Fig. 3–13). Peptide bond formation is
an example of a condensation reaction, a common class
of reactions in living cells. Under standard biochemical
conditions, the equilibrium for the reaction shown in Fig-
ure 3–13 favors the amino acids over the dipeptide. To
make the reaction thermodynamically more favorable,
the carboxyl group must be chemically modified or ac-
tivated so that the hydroxyl group can be more readily
eliminated. A chemical approach to this problem is out-
lined later in this chapter. The biological approach to
peptide bond formation is a major topic of Chapter 27.
Three amino acids can be joined by two peptide
bonds to form a tripeptide; similarly, amino acids can be
linked to form tetrapeptides, pentapeptides, and so
forth. When a few amino acids are joined in this fash-
ion, the structure is called an oligopeptide. When many
amino acids are joined, the product is called a polypep-
tide. Proteins may have thousands of amino acid
residues. Although the terms “protein” and “polypep-
tide” are sometimes used interchangeably, molecules re-
ferred to as polypeptides generally have molecular
weights below 10,000, and those called proteins have
higher molecular weights.
Figure 3–14 shows the structure of a pentapeptide.
As already noted, an amino acid unit in a peptide is often
called a residue (the part left over after losing a hydro-
gen atom from its amino group and the hydroxyl moi-
ety from its carboxyl group). In a peptide, the amino
acid residue at the end with a free α-amino group is the
amino-terminal (or N-terminal) residue; the residue
[Reaction scheme: ⁺H₃N–CH(R¹)–CO–OH + H₂N–CH(R²)–COO⁻ ⇌ ⁺H₃N–CH(R¹)–CO–NH–CH(R²)–COO⁻ + H₂O]
FIGURE 3–13 Formation of a peptide bond by condensation. The
α-amino group of one amino acid (with R² group) acts as a nucleophile
to displace the hydroxyl group of another amino acid (with R¹ group),
forming a peptide bond (shaded in yellow). Amino groups are good
nucleophiles, but the hydroxyl group is a poor leaving group and is
not readily displaced. At physiological pH, the reaction shown does
not occur to any appreciable extent.
at the other end, which has a free carboxyl group, is the
carboxyl-terminal (C-terminal) residue.
Although hydrolysis of a peptide bond is an exer-
gonic reaction, it occurs slowly because of its high acti-
vation energy. As a result, the peptide bonds in proteins
are quite stable, with an average half-life (t1/2) of about
7 years under most intracellular conditions.
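To put the half-life in kinetic terms, one can assume simple first-order decay (an assumption made here for illustration; the text gives only the half-life). Under that assumption, a 7-year half-life corresponds to a rate constant of roughly

$$k = \frac{\ln 2}{t_{1/2}} \approx \frac{0.693}{7\ \text{yr}} \approx 0.1\ \text{yr}^{-1}$$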
Peptides Can Be Distinguished by Their
Ionization Behavior
Peptides contain only one free α-amino group and one
free α-carboxyl group, at opposite ends of the chain
(Fig. 3–15). These groups ionize as they do in free amino
acids, although the ionization constants are different
because an oppositely charged group is no longer linked
to the α carbon. The α-amino and α-carboxyl groups of
all nonterminal amino acids are covalently joined in the
peptide bonds, which do not ionize and thus do not con-
tribute to the total acid-base behavior of peptides. How-
ever, the R groups of some amino acids can ionize (Table
3–1), and in a peptide these contribute to the overall
acid-base properties of the molecule (Fig. 3–15). Thus
the acid-base behavior of a peptide can be predicted
from its free α-amino and α-carboxyl groups as well as
the nature and number of its ionizable R groups.
Like free amino acids, peptides have characteristic
titration curves and a characteristic isoelectric pH (pI)
at which they do not move in an electric field. These
properties are exploited in some of the techniques used
to separate peptides and proteins, as we shall see later
in the chapter. It should be emphasized that the pKa
value for an ionizable R group can change somewhat
when an amino acid becomes a residue in a peptide. The
loss of charge in the α-carboxyl and α-amino groups,
the interactions with other peptide R groups, and other
environmental factors can affect the pKa. The pKa
values for R groups listed in Table 3–1 can be a useful guide
to the pH range in which a given group will ionize, but
they cannot be strictly applied to peptides.
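With that caveat in mind, a rough estimate of a peptide's net charge can still be sketched from its ionizable groups. Everything numeric below is an assumption for illustration: the side-chain pKa values are those commonly tabulated for the free amino acids, the terminal pKa values are simply borrowed from glycine, and the names `R_GROUP_PKA`, `group_charge`, and `peptide_net_charge` are hypothetical. Real peptides will deviate from these numbers.

```python
# Assumed pKa values for ionizable R groups (free amino acid values) and the
# sign each group carries when charged. Illustrative only.
R_GROUP_PKA = {
    "D": (3.65, -1), "E": (4.25, -1), "C": (8.18, -1), "Y": (10.07, -1),
    "H": (6.00, +1), "K": (10.53, +1), "R": (12.48, +1),
}


def group_charge(ph, pka, charged_sign):
    """Average charge contributed by one ionizable group at the given pH."""
    protonated = 1.0 / (1.0 + 10 ** (ph - pka))
    # Basic groups (+1) are charged when protonated;
    # acidic groups (-1) are charged when deprotonated.
    return protonated if charged_sign > 0 else -(1.0 - protonated)


def peptide_net_charge(sequence, ph, n_term_pka=9.60, c_term_pka=2.34):
    """Rough net charge of a peptide given as one-letter residue codes."""
    # Only the terminal alpha-amino and alpha-carboxyl groups are free.
    charge = group_charge(ph, n_term_pka, +1) + group_charge(ph, c_term_pka, -1)
    for residue in sequence:
        if residue in R_GROUP_PKA:
            pka, sign = R_GROUP_PKA[residue]
            charge += group_charge(ph, pka, sign)
    return charge


# Ala-Glu-Gly-Lys, the tetrapeptide of Figure 3-15.
for ph in (1.0, 7.0, 12.0):
    print(f"pH {ph:4.1f}: estimated net charge {peptide_net_charge('AEGK', ph):+.2f}")
```

For Ala–Glu–Gly–Lys at pH 7 the two positive groups (amino terminus, Lys) and the two negative groups (carboxyl terminus, Glu) nearly cancel in this model.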
Biologically Active Peptides and Polypeptides
Occur in a Vast Range of Sizes
No generalizations can be made about the molecular
weights of biologically active peptides and proteins in re-
lation to their functions. Naturally occurring peptides
range in length from two to many thousands of amino
acid residues. Even the smallest peptides can have bio-
logically important effects. Consider the commercially
synthesized dipeptide L-aspartyl-L-phenylalanine methyl
ester, the artificial sweetener better known as aspartame
or NutraSweet.
Many small peptides exert their effects at very low
concentrations. For example, a number of vertebrate
hormones (Chapter 23) are small peptides. These in-
clude oxytocin (nine amino acid residues), which is se-
creted by the posterior pituitary and stimulates uterine
contractions; bradykinin (nine residues), which inhibits
inflammation of tissues; and thyrotropin-releasing fac-
tor (three residues), which is formed in the hypothala-
mus and stimulates the release of another hormone,
thyrotropin, from the anterior pituitary gland. Some
extremely toxic mushroom poisons, such as amanitin,
are also small peptides, as are many antibiotics.
Slightly larger are small polypeptides and oligopep-
tides such as the pancreatic hormone insulin, which con-
tains two polypeptide chains, one having 30 amino acid
[Structure: L-Aspartyl-L-phenylalanine methyl ester (aspartame).]
FIGURE 3–14 The pentapeptide serylglycyltyrosylalanylleucine, or
Ser–Gly–Tyr–Ala–Leu. Peptides are named beginning with the
amino-terminal residue, which by convention is placed at the left. The
peptide bonds are shaded in yellow; the R groups are in red.
[Structure labels: amino-terminal end (left), carboxyl-terminal end (right).]
FIGURE 3–15 Alanylglutamylglycyllysine. This tetrapeptide has one
free α-amino group, one free α-carboxyl group, and two ionizable R
groups. The groups ionized at pH 7.0 are in red.
[Structure labels, from the amino terminus: Ala, Glu, Gly, Lys.]
residues and the other 21. Glucagon, another pancre-
atic hormone, has 29 residues; it opposes the action of
insulin. Corticotropin is a 39-residue hormone of the an-
terior pituitary gland that stimulates the adrenal cortex.
How long are the polypeptide chains in proteins? As
Table 3–2 shows, lengths vary considerably. Human cyto-
chrome c has 104 amino acid residues linked in a single
chain; bovine chymotrypsinogen has 245 residues. At
the extreme is titin, a constituent of vertebrate muscle,
which has nearly 27,000 amino acid residues and a mo-
lecular weight of about 3,000,000. The vast majority of
naturally occurring proteins are much smaller than this,
containing fewer than 2,000 amino acid residues.
Some proteins consist of a single polypeptide chain,
but others, called multisubunit proteins, have two or
more polypeptides associated noncovalently (Table
3–2). The individual polypeptide chains in a multisub-
unit protein may be identical or different. If at least two
are identical the protein is said to be oligomeric, and
the identical units (consisting of one or more polypep-
tide chains) are referred to as protomers. Hemoglobin,
for example, has four polypeptide subunits: two
identical α chains and two identical β chains, all four
held together by noncovalent interactions. Each α
subunit is paired in an identical way with a β subunit within
the structure of this multisubunit protein, so that
hemoglobin can be considered either a tetramer of four
polypeptide subunits or a dimer of αβ protomers.
A few proteins contain two or more polypeptide
chains linked covalently. For example, the two polypep-
tide chains of insulin are linked by disulfide bonds. In
such cases, the individual polypeptides are not consid-
ered subunits but are commonly referred to simply as
chains.
We can calculate the approximate number of amino
acid residues in a simple protein containing no other
chemical constituents by dividing its molecular weight
by 110. Although the average molecular weight of the
20 common amino acids is about 138, the smaller amino
acids predominate in most proteins. If we take into ac-
count the proportions in which the various amino acids
occur in proteins (Table 3–1), the average molecular
weight of protein amino acids is nearer to 128. Because
a molecule of water (Mr 18) is removed to create each
peptide bond, the average molecular weight of an amino
acid residue in a protein is about 128 − 18 = 110.
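The rule of thumb above is easy to try against the entries of Table 3–2. This is a minimal sketch; the function name is hypothetical, and the estimate applies only to simple proteins with no non-amino acid constituents.

```python
AVERAGE_RESIDUE_MASS = 110  # 128 (weighted average amino acid) - 18 (water)


def estimate_residue_count(molecular_weight):
    """Approximate number of residues in a simple protein."""
    return round(molecular_weight / AVERAGE_RESIDUE_MASS)


# (protein, molecular weight, residues listed in Table 3-2)
examples = [
    ("Hemoglobin (human)", 64_500, 574),
    ("Serum albumin (human)", 68_500, 609),
    ("Titin (human)", 2_993_000, 26_926),
]
for name, mw, listed in examples:
    print(f"{name}: estimated {estimate_residue_count(mw)}, listed {listed}")
```

The estimates land within a few percent of the listed values for these proteins, which is the level of accuracy the approximation is meant to give.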
Polypeptides Have Characteristic
Amino Acid Compositions
Hydrolysis of peptides or proteins with acid yields a
mixture of free α-amino acids. When completely hydrolyzed,
each type of protein yields a characteristic proportion
or mixture of the different amino acids. The 20 common
amino acids almost never occur in equal amounts in a
protein. Some amino acids may occur only once or not
at all in a given type of protein; others may occur in
large numbers. Table 3–3 shows the composition of the
amino acid mixtures obtained on complete hydrolysis of
bovine cytochrome c and chymotrypsinogen, the inac-
tive precursor of the digestive enzyme chymotrypsin.
These two proteins, with very different functions, also
differ significantly in the relative numbers of each kind
of amino acid they contain.
Complete hydrolysis alone is not sufficient for an
exact analysis of amino acid composition, however, be-
cause some side reactions occur during the procedure.
For example, the amide bonds in the side chains of as-
paragine and glutamine are cleaved by acid treatment,
yielding aspartate and glutamate, respectively. The side
chain of tryptophan is almost completely degraded by
acid hydrolysis, and small amounts of serine, threonine,
TABLE 3–2 Molecular Data on Some Proteins

Protein                             Molecular    Number of    Number of
                                    weight       residues     polypeptide chains
Cytochrome c (human)                   13,000        104       1
Ribonuclease A (bovine pancreas)       13,700        124       1
Lysozyme (chicken egg white)           13,930        129       1
Myoglobin (equine heart)               16,890        153       1
Chymotrypsin (bovine pancreas)         21,600        241       3
Chymotrypsinogen (bovine)              22,000        245       1
Hemoglobin (human)                     64,500        574       4
Serum albumin (human)                  68,500        609       1
Hexokinase (yeast)                    102,000        972       2
RNA polymerase (E. coli)              450,000      4,158       5
Apolipoprotein B (human)              513,000      4,536       1
Glutamine synthetase (E. coli)        619,000      5,628      12
Titin (human)                       2,993,000     26,926       1
and tyrosine are also lost. When a precise amino acid
composition is required, biochemists use additional pro-
cedures to resolve the ambiguities that remain from acid
hydrolysis.
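The ambiguities left by acid hydrolysis can be illustrated with the cytochrome c composition from Table 3–3. The sketch below is a simplification: it merges Asn with Asp and Gln with Glu and drops Trp, but does not model the small partial losses of Ser, Thr, and Tyr. The function name is hypothetical.

```python
def acid_hydrolysis_report(composition):
    """Collapse a true residue count into what acid hydrolysis alone reports."""
    reported = dict(composition)
    # Side-chain amides are cleaved, so Asn is measured as Asp (Asx)
    # and Gln is measured as Glu (Glx).
    reported["Asx"] = reported.pop("Asn", 0) + reported.pop("Asp", 0)
    reported["Glx"] = reported.pop("Gln", 0) + reported.pop("Glu", 0)
    # Tryptophan is almost completely degraded.
    reported.pop("Trp", None)
    return reported


# Bovine cytochrome c, residues per molecule (Table 3-3).
cytochrome_c = {
    "Ala": 6, "Arg": 2, "Asn": 5, "Asp": 3, "Cys": 2, "Gln": 3, "Glu": 9,
    "Gly": 14, "His": 3, "Ile": 6, "Leu": 6, "Lys": 18, "Met": 2, "Phe": 4,
    "Pro": 4, "Ser": 1, "Thr": 8, "Trp": 1, "Tyr": 4, "Val": 3,
}

report = acid_hydrolysis_report(cytochrome_c)
print("Asx:", report["Asx"], "Glx:", report["Glx"])
print("Residues accounted for:", sum(report.values()), "of", sum(cytochrome_c.values()))
```

The merged Asx and Glx totals show why additional procedures are needed to recover the individual Asn, Asp, Gln, Glu, and Trp counts.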
Some Proteins Contain Chemical Groups
Other Than Amino Acids
Many proteins, for example the enzymes ribonuclease
A and chymotrypsinogen, contain only amino acid
residues and no other chemical constituents; these are
considered simple proteins. However, some proteins
contain permanently associated chemical components
in addition to amino acids; these are called conjugated
proteins. The non–amino acid part of a conjugated pro-
tein is usually called its prosthetic group. Conjugated
proteins are classified on the basis of the chemical na-
ture of their prosthetic groups (Table 3–4); for exam-
ple, lipoproteins contain lipids, glycoproteins contain
sugar groups, and metalloproteins contain a specific
metal. A number of proteins contain more than one pros-
thetic group. Usually the prosthetic group plays an im-
portant role in the protein’s biological function.
There Are Several Levels of Protein Structure
For large macromolecules such as proteins, the tasks of
describing and understanding structure are approached
at several levels of complexity, arranged in a kind of con-
ceptual hierarchy. Four levels of protein structure are
commonly defined (Fig. 3–16). A description of all co-
valent bonds (mainly peptide bonds and disulfide
bonds) linking amino acid residues in a polypeptide
chain is its primary structure. The most important el-
ement of primary structure is the sequence of amino
acid residues. Secondary structure refers to particu-
larly stable arrangements of amino acid residues giving
rise to recurring structural patterns. Tertiary struc-
ture describes all aspects of the three-dimensional fold-
ing of a polypeptide. When a protein has two or more
polypeptide subunits, their arrangement in space is re-
ferred to as quaternary structure. Primary structure
is the focus of Section 3.4; the higher levels of structure
are discussed in Chapter 4.
SUMMARY 3.2 Peptides and Proteins
■ Amino acids can be joined covalently through
peptide bonds to form peptides and proteins.
Cells generally contain thousands of different
proteins, each with a different biological activity.
■ Proteins can be very long polypeptide chains of
100 to several thousand amino acid residues.
However, some naturally occurring peptides
have only a few amino acid residues. Some
proteins are composed of several noncovalently
TABLE 3–3 Amino Acid Composition of Two Proteins

              Number of residues per molecule of protein*
Amino acid    Bovine cytochrome c    Bovine chymotrypsinogen
Ala                    6                     22
Arg                    2                      4
Asn                    5                     15
Asp                    3                      8
Cys                    2                     10
Gln                    3                     10
Glu                    9                      5
Gly                   14                     23
His                    3                      2
Ile                    6                     10
Leu                    6                     19
Lys                   18                     14
Met                    2                      2
Phe                    4                      6
Pro                    4                      9
Ser                    1                     28
Thr                    8                     23
Trp                    1                      8
Tyr                    4                      4
Val                    3                     23
Total                104                    245

*In some common analyses, such as acid hydrolysis, Asp and Asn are not readily
distinguished from each other and are together designated Asx (or B). Similarly, when Glu and
Gln cannot be distinguished, they are together designated Glx (or Z). In addition, Trp is
destroyed. Additional procedures must be employed to obtain an accurate assessment of
complete amino acid content.

TABLE 3–4 Conjugated Proteins

Class              Prosthetic group         Example
Lipoproteins       Lipids                   β1-Lipoprotein of blood
Glycoproteins      Carbohydrates            Immunoglobulin G
Phosphoproteins    Phosphate groups         Casein of milk
Hemoproteins       Heme (iron porphyrin)    Hemoglobin
Flavoproteins      Flavin nucleotides       Succinate dehydrogenase
Metalloproteins    Iron                     Ferritin
                   Zinc                     Alcohol dehydrogenase
                   Calcium                  Calmodulin
                   Molybdenum               Dinitrogenase
                   Copper                   Plastocyanin
associated polypeptide chains, called subunits.
Simple proteins yield only amino acids on
hydrolysis; conjugated proteins contain in
addition some other component, such as a
metal or organic prosthetic group.
■ The sequence of amino acids in a protein is
characteristic of that protein and is called its
primary structure. This is one of four generally
recognized levels of protein structure.
3.3 Working with Proteins
Our understanding of protein structure and function has
been derived from the study of many individual proteins.
To study a protein in detail, the researcher must be able
to separate it from other proteins and must have the
techniques to determine its properties. The necessary
methods come from protein chemistry, a discipline as
old as biochemistry itself and one that retains a central
position in biochemical research.
Proteins Can Be Separated and Purified
A pure preparation is essential before a protein’s prop-
erties and activities can be determined. Given that cells
contain thousands of different kinds of proteins, how
can one protein be purified? Methods for separating pro-
teins take advantage of properties that vary from one
protein to the next, including size, charge, and binding
properties.
The source of a protein is generally tissue or mi-
crobial cells. The first step in any protein purification
procedure is to break open these cells, releasing their
proteins into a solution called a crude extract. If nec-
essary, differential centrifugation can be used to pre-
pare subcellular fractions or to isolate specific or-
ganelles (see Fig. 1–8).
Once the extract or organelle preparation is ready,
various methods are available for purifying one or more
of the proteins it contains. Commonly, the extract is sub-
jected to treatments that separate the proteins into dif-
ferent fractions based on a property such as size or
charge, a process referred to as fractionation. Early
fractionation steps in a purification utilize differences in
protein solubility, which is a complex function of pH,
temperature, salt concentration, and other factors. The
solubility of proteins is generally lowered at high salt
concentrations, an effect called “salting out.” The addi-
tion of a salt in the right amount can selectively pre-
cipitate some proteins, while others remain in solution.
Ammonium sulfate ((NH₄)₂SO₄) is often used for this
purpose because of its high solubility in water.
A solution containing the protein of interest often
must be further altered before subsequent purification
steps are possible. For example, dialysis is a procedure
that separates proteins from solvents by taking advan-
tage of the proteins’ larger size. The partially purified
extract is placed in a bag or tube made of a semiper-
meable membrane. When this is suspended in a much
larger volume of buffered solution of appropriate ionic
strength, the membrane allows the exchange of salt and
buffer but not proteins. Thus dialysis retains large pro-
teins within the membranous bag or tube while allow-
ing the concentration of other solutes in the protein
preparation to change until they come into equilibrium
with the solution outside the membrane. Dialysis might
be used, for example, to remove ammonium sulfate from
the protein preparation.
The most powerful methods for fractionating pro-
teins make use of column chromatography, which
takes advantage of differences in protein charge, size,
FIGURE 3–16 Levels of structure in proteins. The primary structure
consists of a sequence of amino acids linked together by peptide bonds
and includes any disulfide bonds. The resulting polypeptide can be
coiled into units of secondary structure, such as an α helix. The
helix is a part of the tertiary structure of the folded polypeptide, which
is itself one of the subunits that make up the quaternary structure of
the multisubunit protein, in this case hemoglobin.
[Figure panels: primary structure (amino acid residues), secondary structure (α helix), tertiary structure (polypeptide chain), quaternary structure (assembled subunits).]
length. And as the length of time spent on the column
increases, the resolution can decline as a result of dif-
fusional spreading within each protein band.
Figure 3–18 shows two other variations of column
chromatography in addition to ion exchange. Size-
exclusion chromatography separates proteins ac-
cording to size. In this method, large proteins emerge
from the column sooner than small ones—a somewhat
counterintuitive result. The solid phase consists of
beads with engineered pores or cavities of a particular
size. Large proteins cannot enter the cavities, and so
take a short (and rapid) path through the column,
around the beads. Small proteins enter the cavities, and
migrate through the column more slowly as a result (Fig.
3–18b). Affinity chromatography is based on the
binding affinity of a protein. The beads in the column
have a covalently attached chemical group. A protein
with affinity for this particular chemical group will bind
to the beads in the column, and its migration will be re-
tarded as a result (Fig. 3–18c).
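The elution behavior described for cation-exchange and size-exclusion columns can be summarized as two sorting rules. The sketch below is purely qualitative: it ignores diffusional spreading, gradient conditions, and protein shape, and the three proteins and their properties are hypothetical values invented for the example.

```python
from typing import NamedTuple


class Protein(NamedTuple):
    name: str
    net_charge: float        # net charge at the pH of the mobile phase
    molecular_weight: float


def cation_exchange_order(proteins):
    """Names in expected elution order from a cation exchanger.

    The matrix is negatively charged, so the most negative proteins are
    retarded least and elute first.
    """
    return [p.name for p in sorted(proteins, key=lambda p: p.net_charge)]


def size_exclusion_order(proteins):
    """Names in expected elution order from a size-exclusion column.

    Large proteins cannot enter the bead cavities and elute first.
    """
    ordered = sorted(proteins, key=lambda p: p.molecular_weight, reverse=True)
    return [p.name for p in ordered]


# Hypothetical sample mixture.
sample = [
    Protein("A", +6.0, 120_000),
    Protein("B", -4.0, 15_000),
    Protein("C", +1.0, 45_000),
]
print("Cation exchange:", cation_exchange_order(sample))
print("Size exclusion: ", size_exclusion_order(sample))
```

The same mixture elutes in a different order on the two columns, which is one reason several methods are usually applied in sequence.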
A modern refinement in chromatographic methods
is HPLC, or high-performance liquid chromatogra-
phy. HPLC makes use of high-pressure pumps that
speed the movement of the protein molecules down the
column, as well as higher-quality chromatographic ma-
terials that can withstand the crushing force of the pres-
surized flow. By reducing the transit time on the col-
umn, HPLC can limit diffusional spreading of protein
bands and thus greatly improve resolution.
The approach to purification of a protein that has
not previously been isolated is guided both by estab-
lished precedents and by common sense. In most cases,
several different methods must be used sequentially to
purify a protein completely. The choice of method is
[Figure 3–17 labels: reservoir; protein sample (mobile phase); solid porous matrix (stationary phase); porous support; effluent; proteins A, B, and C.]
FIGURE 3–17 Column chromatography. The standard elements of a
chromatographic column include a solid, porous material supported
inside a column, generally made of plastic or glass. The solid material
(matrix) makes up the stationary phase through which flows a solu-
tion, the mobile phase. The solution that passes out of the column at
the bottom (the effluent) is constantly replaced by solution supplied
from a reservoir at the top. The protein solution to be separated is lay-
ered on top of the column and allowed to percolate into the solid
matrix. Additional solution is added on top. The protein solution forms
a band within the mobile phase that is initially the depth of the pro-
tein solution applied to the column. As proteins migrate through the
column, they are retarded to different degrees by their different inter-
actions with the matrix material. The overall protein band thus widens
as it moves through the column. Individual types of proteins (such as
A, B, and C, shown in blue, red, and green) gradually separate from
each other, forming bands within the broader protein band. Separa-
tion improves (resolution increases) as the length of the column in-
creases. However, each individual protein band also broadens with
time due to diffusional spreading, a process that decreases resolution.
In this example, protein A is well separated from B and C, but diffu-
sional spreading prevents complete separation of B and C under these
conditions.
binding affinity, and other properties (Fig. 3–17). A
porous solid material with appropriate chemical prop-
erties (the stationary phase) is held in a column, and a
buffered solution (the mobile phase) percolates through
it. The protein-containing solution, layered on the top
of the column, percolates through the solid matrix as an
ever-expanding band within the larger mobile phase
(Fig. 3–17). Individual proteins migrate faster or more
slowly through the column depending on their proper-
ties. For example, in cation-exchange chromatogra-
phy (Fig. 3–18a), the solid matrix has negatively
charged groups. In the mobile phase, proteins with a net
positive charge migrate through the matrix more slowly
than those with a net negative charge, because the mi-
gration of the former is retarded more by interaction
with the stationary phase. The two types of protein can
separate into two distinct bands. The expansion of the
protein band in the mobile phase (the protein solution)
is caused both by separation of proteins with different
properties and by diffusional spreading. As the length
of the column increases, the resolution of two types of
protein with different net charges generally improves.
However, the rate at which the protein solution can flow
through the column usually decreases with column
[Figure 3–18, panel (a): Protein mixture is added to column containing cation exchangers (polymer beads with negatively charged functional groups); fractions 1 to 6 are collected. Proteins shown range from large net positive charge to large net negative charge. Proteins move through the column at rates determined by their net charge at the pH being used. With cation exchangers, proteins with a more negative net charge move faster and elute earlier.]
FIGURE 3–18 Three chromatographic methods used in protein purifi-
cation. (a) Ion-exchange chromatography exploits differences in the
sign and magnitude of the net electric charges of proteins at a given
pH. The column matrix is a synthetic polymer containing bound
charged groups; those with bound anionic groups are called cation
exchangers, and those with bound cationic groups are called anion
exchangers. Ion-exchange chromatography on a cation exchanger is
shown here. The affinity of each protein for the charged groups on the
column is affected by the pH (which determines the ionization state
of the molecule) and the concentration of competing free salt ions in
the surrounding solution. Separation can be optimized by gradually
changing the pH and/or salt concentration of the mobile phase so as
to create a pH or salt gradient. (b) Size-exclusion chromatography,
also called gel filtration, separates proteins according to size. The
column matrix is a cross-linked polymer with pores of selected size.
Larger proteins migrate faster than smaller ones, because they are too
large to enter the pores in the beads and hence take a more direct
route through the column. The smaller proteins enter the pores and
are slowed by their more labyrinthine path through the column.
(c) Affinity chromatography separates proteins by their binding speci-
ficities. The proteins retained on the column are those that bind
specifically to a ligand cross-linked to the beads. (In biochemistry, the
term “ligand” is used to refer to a group or molecule that binds to a
macromolecule such as a protein.) After proteins that do not bind to
the ligand are washed through the column, the bound protein of
particular interest is eluted (washed out of the column) by a solution
containing free ligand.
[Figure 3–18b, panel labels: A protein mixture is added to a column
containing a cross-linked polymer (porous polymer beads). Protein
molecules separate by size; larger molecules pass more freely,
appearing in the earlier fractions.]
[Figure 3–18c, panel labels: A protein mixture is added to a column
containing a polymer-bound ligand specific for the protein of
interest. Unwanted proteins are washed through the column. The
protein of interest is then eluted by a solution of the ligand.]
somewhat empirical, and many protocols may be tried
before the most effective one is found. Trial and error
can often be minimized by basing the procedure on pu-
rification techniques developed for similar proteins.
Published purification protocols are available for many
thousands of proteins. Common sense dictates that in-
expensive procedures such as salting out be used first,
when the total volume and the number of contaminants
are greatest. Chromatographic methods are often im-
practical at early stages, because the amount of chro-
matographic medium needed increases with sample
size. As each purification step is completed, the sample
size generally becomes smaller (Table 3–5), making it
feasible to use more sophisticated (and expensive)
chromatographic procedures at later stages.
Proteins Can Be Separated and Characterized
by Electrophoresis
Another important technique for the separation of pro-
teins is based on the migration of charged proteins in
an electric field, a process called electrophoresis.
These procedures are not generally used to purify pro-
teins in large amounts, because simpler alternatives are
usually available and electrophoretic methods often
adversely affect the structure and thus the function of
proteins. Electrophoresis is, however, especially useful
as an analytical method. Its advantage is that proteins
can be visualized as well as separated, permitting a
researcher to estimate quickly the number of different
proteins in a mixture or the degree of purity of a par-
ticular protein preparation. Also, electrophoresis allows
determination of crucial properties of a protein such as
its isoelectric point and approximate molecular weight.
Electrophoresis of proteins is generally carried out
in gels made up of the cross-linked polymer polyacryl-
amide (Fig. 3–19). The polyacrylamide gel acts as a mo-
lecular sieve, slowing the migration of proteins approx-
imately in proportion to their charge-to-mass ratio.
Migration may also be affected by protein shape. In
electrophoresis, the force moving the macromolecule is the
electrical potential, E. The electrophoretic mobility of
the molecule, μ, is the ratio of the velocity of the
molecule, V, to the electrical potential. Electrophoretic
mobility is also equal to the net charge of
the molecule, Z, divided by the frictional coefficient, f,
which reflects in part a protein’s shape. Thus:
$$\mu = \frac{V}{E} = \frac{Z}{f}$$
The migration of a protein in a gel during electro-
phoresis is therefore a function of its size and its shape.
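The relationship above can be tried out numerically. The Python sketch below is only illustrative: the function names, the arbitrary units, and the example values of Z, f, and E are assumptions chosen for the demonstration, not measured quantities.

```python
def electrophoretic_mobility(net_charge, frictional_coefficient):
    """Return mu = Z / f (arbitrary units)."""
    if frictional_coefficient <= 0:
        raise ValueError("frictional coefficient must be positive")
    return net_charge / frictional_coefficient


def migration_velocity(net_charge, frictional_coefficient, potential):
    """Return V = mu * E, obtained by rearranging mu = V / E."""
    return electrophoretic_mobility(net_charge, frictional_coefficient) * potential


if __name__ == "__main__":
    # Two hypothetical proteins with the same net charge (Z = -10) in the
    # same field (E = 5.0); the second has twice the frictional coefficient.
    compact = migration_velocity(-10, 2.0, 5.0)
    extended = migration_velocity(-10, 4.0, 5.0)
    print(compact, extended)
```

With equal charge, the molecule with the larger frictional coefficient moves more slowly, which is the sense in which shape and size influence migration.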
An electrophoretic method commonly employed for
estimation of purity and molecular weight makes use of
the detergent sodium dodecyl sulfate (SDS).
SDS binds to most proteins in amounts roughly propor-
tional to the molecular weight of the protein, about one
molecule of SDS for every two amino acid residues. The
bound SDS contributes a large net negative charge, ren-
dering the intrinsic charge of the protein insignificant
and conferring on each protein a similar charge-to-mass
ratio. In addition, the native conformation of a protein
is altered when SDS is bound, and most proteins assume
a similar shape. Electrophoresis in the presence of SDS
therefore separates proteins almost exclusively on the
basis of mass (molecular weight), with smaller polypep-
tides migrating more rapidly. After electrophoresis, the
proteins are visualized by adding a dye such as
Coomassie blue, which binds to proteins but not to the
gel itself (Fig. 3–19b). Thus, a researcher can monitor
the progress of a protein purification procedure as the
number of protein bands visible on the gel decreases af-
ter each new fractionation step. When compared with
the positions to which proteins of known molecular
weight migrate in the gel, the position of an
unidentified protein can provide an excellent measure of its
molecular weight (Fig. 3–20). If the protein has two or more
different subunits, the subunits will generally be
separated by the SDS treatment and a separate band will
appear for each.
[Structure shown: sodium dodecyl sulfate (SDS), Na+ −O–SO2–O–(CH2)11CH3]
TABLE 3–5 A Purification Table for a Hypothetical Enzyme

Procedure or step                        Fraction     Total      Activity   Specific
                                         volume (ml)  protein    (units)    activity
                                                      (mg)                  (units/mg)
1. Crude cellular extract                1,400        10,000     100,000        10
2. Precipitation with ammonium sulfate     280         3,000      96,000        32
3. Ion-exchange chromatography              90           400      80,000       200
4. Size-exclusion chromatography            80           100      60,000       600
5. Affinity chromatography                   6             3      45,000    15,000

Note: All data represent the status of the sample after the designated procedure has been carried out. Activity and specific activity are defined on page 94.
[Figure 3–19, panel labels: (a) sample, well, direction of migration,
and the – and + electrodes; (b) stained gel.]
FIGURE 3–19 Electrophoresis. (a) Different samples are loaded in
wells or depressions at the top of the polyacrylamide gel. The proteins
move into the gel when an electric field is applied. The gel minimizes
convection currents caused by small temperature gradients, as well as
protein movements other than those induced by the electric field.
(b) Proteins can be visualized after electrophoresis by treating the gel
with a stain such as Coomassie blue, which binds to the proteins but
not to the gel itself. Each band on the gel represents a different pro-
tein (or protein subunit); smaller proteins move through the gel more
rapidly than larger proteins and therefore are found nearer the bottom
of the gel. This gel illustrates the purification of the enzyme RNA poly-
merase from E. coli. The first lane shows the proteins present in the
crude cellular extract. Successive lanes (left to right) show the proteins
present after each purification step. The purified protein contains four
subunits, as seen in the last lane on the right.
[Figure 3–20a, gel labels: lane 1, M_r standards; lane 2, unknown
protein. Standards: myosin, 200,000; β-galactosidase, 116,250;
glycogen phosphorylase b, 97,400; bovine serum albumin, 66,200;
ovalbumin, 45,000; carbonic anhydrase, 31,000; soybean trypsin
inhibitor, 21,500; lysozyme, 14,400.]
[Figure 3–20b, plot axes: log M_r versus relative migration, with the
position of the unknown protein marked.]
FIGURE 3–20 Estimating the molecular weight of a protein. The
electrophoretic mobility of a protein on an SDS polyacrylamide gel
is related to its molecular weight, M_r. (a) Standard proteins of
known molecular weight are subjected to electrophoresis (lane 1).
These marker proteins can be used to estimate the molecular
weight of an unknown protein (lane 2). (b) A plot of log M_r of the
marker proteins versus relative migration during electrophoresis is
linear, which allows the molecular weight of the unknown protein
to be read from the graph.
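Reading M_r from the graph amounts to fitting a straight line to log M_r versus relative migration and interpolating. The Python sketch below illustrates this under stated assumptions: the M_r values are those of the marker proteins in Figure 3–20, but the relative migration values are hypothetical (the figure does not give numbers), and the function names are invented for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mr(standards, unknown_migration):
    """Estimate M_r from (relative_migration, M_r) pairs for the standards."""
    xs = [migration for migration, _ in standards]
    ys = [math.log10(mr) for _, mr in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * unknown_migration + intercept)


# M_r values from Figure 3-20; the migration values are HYPOTHETICAL.
STANDARDS = [
    (0.10, 200_000),  # myosin
    (0.22, 116_250),  # beta-galactosidase
    (0.27, 97_400),   # glycogen phosphorylase b
    (0.38, 66_200),   # bovine serum albumin
    (0.50, 45_000),   # ovalbumin
    (0.62, 31_000),   # carbonic anhydrase
    (0.74, 21_500),   # soybean trypsin inhibitor
    (0.86, 14_400),   # lysozyme
]

if __name__ == "__main__":
    print(round(estimate_mr(STANDARDS, 0.45)))
```

The estimate is only as good as the linearity of the standards over the range used, so the unknown should migrate between the slowest and fastest markers.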
Isoelectric focusing is a procedure used to de-
termine the isoelectric point (pI) of a protein (Fig.
3–21). A pH gradient is established by allowing a mix-
ture of low molecular weight organic acids and bases
(ampholytes; p. 81) to distribute themselves in an elec-
tric field generated across the gel. When a protein mix-
ture is applied, each protein migrates until it reaches
the pH that matches its pI (Table 3–6). Proteins with
different isoelectric points are thus distributed differ-
ently throughout the gel.
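As a rough illustration of where proteins come to rest, the Python sketch below places the proteins of Table 3–6 along a gel. It assumes a perfectly linear gradient from pH 3 to pH 9 over a 10 cm gel; the linearity, the gel length, and the function names are assumptions made for the example. Pepsin is omitted because the table gives only an upper bound for its pI.

```python
# pI values from Table 3-6 (pepsin omitted: listed only as "<1.0").
PI_VALUES = {
    "Egg albumin": 4.6,
    "Serum albumin": 4.9,
    "Urease": 5.0,
    "beta-Lactoglobulin": 5.2,
    "Hemoglobin": 6.8,
    "Myoglobin": 7.0,
    "Chymotrypsinogen": 9.5,
    "Cytochrome c": 10.7,
    "Lysozyme": 11.0,
}


def focusing_position(pi, ph_low=3.0, ph_high=9.0, gel_length_cm=10.0):
    """Distance (cm) from the low-pH end at which pH equals pI.

    Assumes a linear pH gradient. Returns None when the pI lies outside
    the gradient, so the protein cannot focus within the gel.
    """
    if not ph_low <= pi <= ph_high:
        return None
    return gel_length_cm * (pi - ph_low) / (ph_high - ph_low)


def focus_all(pi_values):
    """Map each protein name to its focusing position (or None)."""
    return {name: focusing_position(pi) for name, pi in pi_values.items()}


if __name__ == "__main__":
    for name, position in focus_all(PI_VALUES).items():
        print(name, position)
```

Proteins whose pI falls outside the chosen gradient, such as lysozyme here, would need a gradient covering a wider or more basic pH range.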
Combining isoelectric focusing and SDS electropho-
resis sequentially in a process called two-dimensional
[Figure 3–21, panel labels: An ampholyte solution is incorporated into
a gel. A stable pH gradient (pH 9 to pH 3, decreasing pH) is
established in the gel after application of an electric field. Protein
solution is added and the electric field is reapplied. After staining,
proteins are shown to be distributed along the pH gradient according
to their pI values.]
FIGURE 3–21 Isoelectric focusing. This
technique separates proteins according to
their isoelectric points. A stable pH gradient
is established in the gel by the addition of
appropriate ampholytes. A protein mixture
is placed in a well on the gel. With an
applied electric field, proteins enter the gel
and migrate until each reaches a pH
equivalent to its pI. Remember that when
pH = pI, the net charge of a protein is zero.
electrophoresis permits the resolution of complex
mixtures of proteins (Fig. 3–22). This is a more sensi-
tive analytical method than either electrophoretic
method alone. Two-dimensional electrophoresis sepa-
rates proteins of identical molecular weight that differ
in pI, or proteins with similar pI values but different mo-
lecular weights.
Unseparated Proteins Can Be Quantified
To purify a protein, it is essential to have a way of de-
tecting and quantifying that protein in the presence of
many other proteins at each stage of the procedure.
Often, purification must proceed in the absence of any
information about the size and physical properties of the
protein or about the fraction of the total protein mass
it represents in the extract. For proteins that are en-
zymes, the amount in a given solution or tissue extract
can be measured, or assayed, in terms of the catalytic
effect the enzyme produces—that is, the increase in
the rate at which its substrate is converted to reaction
products when the enzyme is present. For this purpose
one must know (1) the overall equation of the reaction
catalyzed, (2) an analytical procedure for determining
the disappearance of the substrate or the appearance of
a reaction product, (3) whether the enzyme requires co-
factors such as metal ions or coenzymes, (4) the de-
pendence of the enzyme activity on substrate concen-
tration, (5) the optimum pH, and (6) a temperature
zone in which the enzyme is stable and has high activ-
ity. Enzymes are usually assayed at their optimum pH
and at some convenient temperature within the range
25 to 38 °C. Also, very high substrate concentrations are
generally used so that the initial reaction rate, measured
experimentally, is proportional to enzyme concentration
(Chapter 6).
By international agreement, 1.0 unit of enzyme
activity is defined as the amount of enzyme causing
transformation of 1.0 μmol of substrate per minute at 25 °C
under optimal conditions of measurement. The term
activity refers to the total units of enzyme in a solu-
tion. The specific activity is the number of enzyme
units per milligram of total protein (Fig. 3–23). The spe-
cific activity is a measure of enzyme purity: it increases
during purification of an enzyme and becomes maximal
and constant when the enzyme is pure (Table 3–5).
TABLE 3–6 The Isoelectric Points of Some Proteins

Protein              pI
Pepsin               <1.0
Egg albumin           4.6
Serum albumin         4.9
Urease                5.0
β-Lactoglobulin       5.2
Hemoglobin            6.8
Myoglobin             7.0
Chymotrypsinogen      9.5
Cytochrome c         10.7
Lysozyme             11.0
After each purification step, the activity of the
preparation (in units of enzyme activity) is assayed, the
total amount of protein is determined independently,
and the ratio of the two gives the specific activity. Ac-
tivity and total protein generally decrease with each
step. Activity decreases because some loss always oc-
curs due to inactivation or nonideal interactions with
chromatographic materials or other molecules in the so-
lution. Total protein decreases because the objective is
to remove as much unwanted or nonspecific protein as
possible. In a successful step, the loss of nonspecific pro-
tein is much greater than the loss of activity; therefore,
specific activity increases even as total activity falls. The
data are then assembled in a purification table similar
to Table 3–5. A protein is generally considered pure
when further purification steps fail to increase specific
activity and when only a single protein species can be
detected (for example, by electrophoresis).
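The bookkeeping behind a purification table can be sketched in a few lines of Python. The total protein and activity figures below are taken from Table 3–5; the function name is invented for the example, and the two extra columns (fold purification and percent yield) are commonly reported quantities that are added here as an assumption, since the table itself lists only specific activity.

```python
# (step, total protein in mg, activity in units), from Table 3-5.
STEPS = [
    ("Crude cellular extract", 10_000, 100_000),
    ("Precipitation with ammonium sulfate", 3_000, 96_000),
    ("Ion-exchange chromatography", 400, 80_000),
    ("Size-exclusion chromatography", 100, 60_000),
    ("Affinity chromatography", 3, 45_000),
]


def purification_table(steps):
    """Compute specific activity, fold purification, and yield per step."""
    _, first_protein, first_activity = steps[0]
    first_specific = first_activity / first_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg  # units per mg
        rows.append({
            "step": name,
            "specific_activity": specific,
            # Relative to the crude extract (not a column in Table 3-5).
            "fold_purification": specific / first_specific,
            "yield_percent": 100 * activity_units / first_activity,
        })
    return rows


if __name__ == "__main__":
    for row in purification_table(STEPS):
        print(row)
```

The calculation makes the trade-off explicit: total activity falls at every step, yet specific activity rises because unwanted protein is lost faster than the enzyme.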
For proteins that are not enzymes, other quantifi-
cation methods are required. Transport proteins can be
assayed by their binding to the molecule they transport,
and hormones and toxins by the biological effect they
produce; for example, growth hormones will stimulate
the growth of certain cultured cells. Some structural
proteins represent such a large fraction of a tissue mass
that they can be readily extracted and purified without
a functional assay. The approaches are as varied as the
proteins themselves.
[Figure 3–22a, panel labels: First dimension: isoelectric focusing
(decreasing pI along the gel). The isoelectric focusing gel is placed
on an SDS polyacrylamide gel. Second dimension: SDS polyacrylamide
gel electrophoresis (decreasing M_r in one direction, decreasing pI
in the other).]
FIGURE 3–22 Two-dimensional electrophoresis. (a) Proteins are first
separated by isoelectric focusing in a cylindrical gel. The gel is then
laid horizontally on a second, slab-shaped gel, and the proteins are
separated by SDS polyacrylamide gel electrophoresis. Horizontal sep-
aration reflects differences in pI; vertical separation reflects differences
in molecular weight. (b) More than 1,000 different proteins from E.
coli can be resolved using this technique.
FIGURE 3–23 Activity versus specific activity. The difference between
these two terms can be illustrated by considering two beakers of mar-
bles. The beakers contain the same number of red marbles, but dif-
ferent numbers of marbles of other colors. If the marbles represent
proteins, both beakers contain the same activity of the protein repre-
sented by the red marbles. The second beaker, however, has the higher
specific activity because here the red marbles represent a much higher
fraction of the total.
SUMMARY 3.3 Working with Proteins
■ Proteins are separated and purified by taking
advantage of differences in their properties.
Proteins can be selectively precipitated by
the addition of certain salts. A wide range of
chromatographic procedures makes use of
differences in size, binding affinities, charge,
and other properties. These include ion-
exchange, size-exclusion, affinity, and high-
performance liquid chromatography.
■ Electrophoresis separates proteins on the basis
of mass or charge. SDS gel electrophoresis and
isoelectric focusing can be used separately or
in combination for higher resolution.
■ All purification procedures require a method for
quantifying or assaying the protein of interest
in the presence of other proteins. Purification
can be monitored by assaying specific activity.
3.4 The Covalent Structure of Proteins
Purification of a protein is usually only a prelude to a
detailed biochemical dissection of its structure and
function. What is it that makes one protein an enzyme,
another a hormone, another a structural protein, and
still another an antibody? How do they differ chemically?
The most obvious distinctions are structural, and these
distinctions can be approached at every level of struc-
ture defined in Figure 3–16.
The differences in primary structure can be espe-
cially informative. Each protein has a distinctive num-
ber and sequence of amino acid residues. As we shall
see in Chapter 4, the primary structure of a protein de-
termines how it folds up into a unique three-dimensional
structure, and this in turn determines the function of
the protein. Primary structure is the focus of the re-
mainder of this chapter. We first consider empirical
clues that amino acid sequence and protein function are
closely linked, then describe how amino acid sequence
is determined; finally, we outline the many uses to which
this information can be put.
The Function of a Protein Depends on
Its Amino Acid Sequence
The bacterium Escherichia coli produces more than
3,000 different proteins; a human produces 25,000 to
35,000. In both cases, each type of protein has a unique
three-dimensional structure and this structure confers
a unique function. Each type of protein also has a unique
amino acid sequence. Intuition suggests that the amino
acid sequence must play a fundamental role in deter-
mining the three-dimensional structure of the protein,
and ultimately its function, but is this supposition cor-
rect? A quick survey of proteins and how they vary in
amino acid sequence provides a number of empirical
clues that help substantiate the important relationship
between amino acid sequence and biological function.
First, as we have already noted, proteins with dif-
ferent functions always have different amino acid se-
quences. Second, thousands of human genetic diseases
have been traced to the production of defective pro-
teins. Perhaps one-third of these proteins are defective
because of a single change in their amino acid sequence;
hence, if the primary structure is altered, the function
of the protein may also be changed. Finally, on com-
paring functionally similar proteins from different
species, we find that these proteins often have similar
amino acid sequences. An extreme case is ubiquitin, a
76-residue protein involved in regulating the degrada-
tion of other proteins. The amino acid sequence of ubiq-
uitin is identical in species as disparate as fruit flies and
humans.
Is the amino acid sequence absolutely fixed, or in-
variant, for a particular protein? No; some flexibility is
possible. An estimated 20% to 30% of the proteins in
humans are polymorphic, having amino acid sequence
variants in the human population. Many of these varia-
tions in sequence have little or no effect on the func-
tion of the protein. Furthermore, proteins that carry out
a broadly similar function in distantly related species can
differ greatly in overall size and amino acid sequence.
Although the amino acid sequence in some regions
of the primary structure might vary considerably with-
out affecting biological function, most proteins contain
crucial regions that are essential to their function and
whose sequence is therefore conserved. The fraction of
the overall sequence that is critical varies from protein
to protein, complicating the task of relating sequence to
three-dimensional structure, and structure to function.
Before we can consider this problem further, however,
we must examine how sequence information is obtained.
The Amino Acid Sequences of Millions
of Proteins Have Been Determined
Two major discoveries in 1953 were of crucial importance
in the history of biochemistry. In that year James D.
Watson and Francis Crick deduced the double-helical
structure of DNA and proposed a structural basis for its
precise replication (Chapter 8). Their proposal illumi-
nated the molecular reality behind the idea of a gene.
In that same year, Frederick Sanger worked out the se-
quence of amino acid residues in the polypeptide chains
of the hormone insulin (Fig. 3–24), surprising many
researchers who had long thought that elucidation of
the amino acid sequence of a polypeptide would be a
hopelessly difficult task. It quickly became evident that
the nucleotide sequence in DNA and the amino acid
sequence in proteins were somehow related. Barely a
decade after these discoveries, the role of the nucleotide
sequence of DNA in determining the amino acid se-
quence of protein molecules was revealed (Chapter 27).
An enormous number of protein sequences can now be
derived indirectly from the DNA sequences in the rapidly
growing genome databases. However, many are still de-
duced by traditional methods of polypeptide sequencing.
The amino acid sequences of thousands of different
proteins from many species have been determined us-
ing principles first developed by Sanger. These methods
are still in use, although with many variations and im-
provements in detail. Chemical protein sequencing now
complements a growing list of newer methods, provid-
ing multiple avenues to obtain amino acid sequence
data. Such data are now critical to every area of bio-
chemical investigation.
Short Polypeptides Are Sequenced
Using Automated Procedures
Various procedures are used to analyze protein primary
structure. Several protocols are available to label and
identify the amino-terminal amino acid residue (Fig.
3–25a). Sanger developed the reagent 1-fluoro-2,4-
dinitrobenzene (FDNB) for this purpose; other reagents
used to label the amino-terminal residue, dansyl chlo-
ride and dabsyl chloride, yield derivatives that are more
easily detectable than the dinitrophenyl derivatives. Af-
ter the amino-terminal residue is labeled with one of
these reagents, the polypeptide is hydrolyzed to its con-
stituent amino acids and the labeled amino acid is iden-
tified. Because the hydrolysis stage destroys the
polypeptide, this procedure cannot be used to sequence
a polypeptide beyond its amino-terminal residue. How-
ever, it can help determine the number of chemically
distinct polypeptides in a protein, provided each has a
different amino-terminal residue. For example, two
residues—Phe and Gly—would be labeled if insulin (Fig.
3–24) were subjected to this procedure.
[Photograph: Frederick Sanger]
[Figure 3–24, diagram: the A chain (21 residues, amino-terminal Gly,
carboxyl-terminal Asn) and the B chain (30 residues, amino-terminal
Phe, carboxyl-terminal Ala) of bovine insulin, drawn residue by
residue with their disulfide cross-linkages.]
FIGURE 3–24 Amino acid sequence
of bovine insulin. The two polypeptide
chains are joined by disulfide cross-
linkages. The A chain is identical in
human, pig, dog, rabbit, and sperm
whale insulins. The B chains of the
cow, pig, dog, goat, and horse are
identical.
[Structures shown: dansyl chloride and dabsyl chloride.]
To sequence an entire polypeptide, a chemical
method devised by Pehr Edman is usually employed.
The Edman degradation procedure labels and re-
moves only the amino-terminal residue from a peptide,
leaving all other peptide bonds intact (Fig. 3–25b). The
peptide is reacted with phenylisothiocyanate under
mildly alkaline conditions, which converts the amino-
terminal amino acid to a phenylthiocarbamoyl (PTC)
adduct. The peptide bond next to the PTC adduct is
then cleaved in a step carried out in anhydrous trifluo-
roacetic acid, with removal of the amino-terminal amino
acid as an anilinothiazolinone derivative. The deriva-
tized amino acid is extracted with organic solvents, con-
verted to the more stable phenylthiohydantoin deriva-
tive by treatment with aqueous acid, and then identified.
The use of sequential reactions carried out under first
basic and then acidic conditions provides control over
the entire process. Each reaction with the amino-
terminal amino acid can go essentially to completion
without affecting any of the other peptide bonds in the
peptide. After removal and identification of the amino-
terminal residue, the new amino-terminal residue so
exposed can be labeled, removed, and identified
through the same series of reactions. This procedure is
repeated until the entire sequence is determined. The
Edman degradation is carried out on a machine, called
a sequenator, that mixes reagents in the proper pro-
portions, separates the products, identifies them, and
records the results. These methods are extremely sen-
sitive. Often, the complete amino acid sequence can be
determined starting with only a few micrograms of
protein.
The length of polypeptide that can be accurately
sequenced by the Edman degradation depends on the
[Figure 3–25, reaction scheme. (a) Polypeptide + FDNB →
2,4-dinitrophenyl derivative of polypeptide → (6 M HCl) →
2,4-dinitrophenyl derivative of amino-terminal residue + free amino
acids. Identify amino-terminal residue of polypeptide.
(b) Polypeptide + phenylisothiocyanate (OH−) → PTC adduct →
(CF3COOH) → anilinothiazolinone derivative of amino acid residue +
shortened peptide; anilinothiazolinone derivative → (H+) →
phenylthiohydantoin derivative of amino acid residue. Identify
amino-terminal residue; purify and recycle remaining peptide
fragment through Edman process.]
FIGURE 3–25 Steps in sequencing a polypeptide. (a) Identification of
the amino-terminal residue can be the first step in sequencing a
polypeptide. Sanger’s method for identifying the amino-terminal
residue is shown here. (b) The Edman degradation procedure reveals
the entire sequence of a peptide. For shorter peptides, this method
alone readily yields the entire sequence, and step (a) is often omit-
ted. Step (a) is useful in the case of larger polypeptides, which are of-
ten fragmented into smaller peptides for sequencing (see Fig. 3–27).
efficiency of the individual chemical steps. Consider a
peptide beginning with the sequence Gly–Pro–Lys– at
its amino terminus. If glycine were removed with 97%
efficiency, 3% of the polypeptide molecules in the solu-
tion would retain a Gly residue at their amino terminus.
In the second Edman cycle, 97% of the liberated amino
acids would be proline, and 3% glycine, while 3% of the
polypeptide molecules would retain Gly (0.1%) or Pro
(2.9%) residues at their amino terminus. At each cycle,
peptides that did not react in earlier cycles would con-
tribute amino acids to an ever-increasing background,
eventually making it impossible to determine which
amino acid is next in the original peptide sequence.
Modern sequenators achieve efficiencies of better than
99% per cycle, permitting the sequencing of more than
50 contiguous amino acid residues in a polypeptide. The
primary structure of insulin, worked out by Sanger and
colleagues over a period of 10 years, could now be com-
pletely determined in a day or two.
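The arithmetic in the Gly–Pro–Lys example, and the effect of per-cycle efficiency on read length, can be reproduced with a short Python sketch. The function names are invented for the illustration, and the model assumes the same efficiency at every cycle and no other losses, which is a simplification.

```python
def in_phase_fraction(efficiency, cycles):
    """Fraction of chains that reacted in every one of `cycles` cycles."""
    return efficiency ** cycles


def second_cycle_breakdown(efficiency):
    """Reproduce the Gly-Pro-Lys example for the second Edman cycle."""
    lag = 1.0 - efficiency
    released_pro = efficiency * efficiency  # reacted in cycles 1 and 2
    released_gly = lag * efficiency         # failed in cycle 1, reacted in cycle 2
    released_total = released_pro + released_gly
    return {
        # Composition of the amino acids liberated in cycle 2.
        "liberated_pro": released_pro / released_total,
        "liberated_gly": released_gly / released_total,
        # Chains that did not react in cycle 2, as fractions of all chains.
        "retain_gly": lag * lag,
        "retain_pro": efficiency * lag,
    }


if __name__ == "__main__":
    print(second_cycle_breakdown(0.97))
    for efficiency in (0.97, 0.99):
        print(efficiency, in_phase_fraction(efficiency, 50))
```

Because the in-phase fraction falls geometrically, a small gain in per-cycle efficiency translates into a large gain in the number of residues that can be read before the background obscures the signal.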
Large Proteins Must Be Sequenced
in Smaller Segments
The overall accuracy of amino acid sequencing gener-
ally declines as the length of the polypeptide increases.
The very large polypeptides found in proteins must be
broken down into smaller pieces to be sequenced effi-
ciently. There are several steps in this process. First, the
protein is cleaved into a set of specific fragments by
chemical or enzymatic methods. If any disulfide bonds
are present, they must be broken. Each fragment is pu-
rified, then sequenced by the Edman procedure. Finally,
the order in which the fragments appear in the original
protein is determined and disulfide bonds (if any) are
located.
Breaking Disulfide Bonds Disulfide bonds interfere with
the sequencing procedure. A cystine residue (Fig. 3–7)
that has one of its peptide bonds cleaved by the Edman
procedure may remain attached to another polypeptide
strand via its disulfide bond. Disulfide bonds also inter-
fere with the enzymatic or chemical cleavage of the
polypeptide. Two approaches to irreversible breakage of
disulfide bonds are outlined in Figure 3–26.
Cleaving the Polypeptide Chain Several methods can be
used for fragmenting the polypeptide chain. Enzymes
called proteases catalyze the hydrolytic cleavage of
peptide bonds. Some proteases cleave only the peptide
bond adjacent to particular amino acid residues (Table
3–7) and thus fragment a polypeptide chain in a pre-
dictable and reproducible way. A number of chemical
reagents also cleave the peptide bond adjacent to spe-
cific residues.
Among proteases, the digestive enzyme trypsin cat-
alyzes the hydrolysis of only those peptide bonds in
which the carbonyl group is contributed by either a Lys
or an Arg residue, regardless of the length or amino acid
sequence of the chain. The number of smaller peptides
produced by trypsin cleavage can thus be predicted
[Figure 3–26, reaction scheme: a disulfide bond (cystine) is broken
either by oxidation with performic acid, giving two cysteic acid
residues, or by reduction with dithiothreitol (DTT) followed by
carboxymethylation with iodoacetate, giving carboxymethylated
cysteine residues (–S–CH2–COO−). The structure of dithiothreitol is
also shown.]
FIGURE 3–26 Breaking disulfide bonds in proteins. Two common
methods are illustrated. Oxidation of a cystine residue with performic
acid produces two cysteic acid residues. Reduction by dithiothreitol
to form Cys residues must be followed by further modification of
the reactive –SH groups to prevent re-formation of the disulfide
bond. Carboxymethylation by iodoacetate serves this purpose.
from the total number of Lys or Arg residues in the orig-
inal polypeptide, as determined by hydrolysis of an in-
tact sample (Fig. 3–27). A polypeptide with five Lys
and/or Arg residues will usually yield six smaller pep-
tides on cleavage with trypsin. Moreover, all except one
of these will have a carboxyl-terminal Lys or Arg. The
fragments produced by trypsin (or other enzyme or
chemical) action are then separated by chromato-
graphic or electrophoretic methods.
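The counting rule for trypsin can be illustrated with an in silico digest. The Python sketch below applies only the simple rule stated above (cleavage after every Lys or Arg); the peptide is a hypothetical sequence in one-letter code, the function name is invented, and real digests can deviate from the ideal pattern, which is why the text says "usually".

```python
def trypsin_digest(sequence):
    """Split a one-letter-code sequence after every Lys (K) or Arg (R).

    Simplified rule: every K/R is cleaved on its carboxyl side; a K or R
    at the carboxyl terminus has no following peptide bond to cleave.
    """
    if not sequence:
        return []
    fragments = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in "KR" and index < len(sequence) - 1:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical peptide containing five Lys/Arg residues.
    peptide = "GASKLLMRWTKAEERFVKAADG"
    print(trypsin_digest(peptide))
```

For this peptide the five Lys/Arg residues give six fragments, and every fragment except the carboxyl-terminal one ends in Lys or Arg.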
Sequencing of Peptides Each peptide fragment resulting
from the action of trypsin is sequenced separately by
the Edman procedure.
Ordering Peptide Fragments The order of the “trypsin
fragments” in the original polypeptide chain must now
be determined. Another sample of the intact polypep-
tide is cleaved into fragments using a different enzyme
or reagent, one that cleaves peptide bonds at points
other than those cleaved by trypsin. For example,
cyanogen bromide cleaves only those peptide bonds in
which the carbonyl group is contributed by Met. The
fragments resulting from this second procedure are then
separated and sequenced as before.
The amino acid sequences of each fragment ob-
tained by the two cleavage procedures are examined,
with the objective of finding peptides from the second
procedure whose sequences establish continuity, be-
cause of overlaps, between the fragments obtained by
the first cleavage procedure (Fig. 3–27). Overlapping
peptides obtained from the second fragmentation yield
the correct order of the peptide fragments produced in
the first. If the amino-terminal amino acid has been iden-
tified before the original cleavage of the protein, this in-
formation can be used to establish which fragment is
derived from the amino terminus. The two sets of frag-
ments can be compared for possible errors in deter-
mining the amino acid sequence of each fragment. If
the second cleavage procedure fails to establish conti-
nuity between all peptides from the first cleavage, a
third or even a fourth cleavage method must be used to
obtain a set of peptides that can provide the necessary
overlap(s).
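The overlap logic can be made concrete with a small Python sketch using the fragments listed in Figure 3–27. The function name `consistent_orders` and the brute-force strategy are assumptions made for illustration: every ordering of the trypsin fragments is tried, and an ordering is kept only if each cyanogen bromide fragment appears intact in the resulting sequence. This is practical only for a handful of fragments.

```python
from itertools import permutations

# Fragments from Figure 3-27 (one-letter symbols).
trypsin_fragments = {
    "T-1": "GASMALIK",
    "T-2": "EGAAYHDFEPIDPR",
    "T-3": "DCVHSD",
    "T-4": "YLIACGPMTK",
}
cnbr_fragments = ["EGAAYHDFEPIDPRGASM", "TKDCVHSD", "ALIKYLIACGPM"]

def consistent_orders(fragments, overlapping):
    """Return every ordering of `fragments` whose concatenation
    contains each overlapping peptide as a contiguous stretch."""
    orders = []
    for names in permutations(fragments):
        candidate = "".join(fragments[name] for name in names)
        if all(peptide in candidate for peptide in overlapping):
            orders.append((names, candidate))
    return orders

orders = consistent_orders(trypsin_fragments, cnbr_fragments)
for names, sequence in orders:
    print(" ".join(names), sequence)
```

For these fragments a single ordering survives, and it places T-2 at the amino terminus, in agreement with the independent identification of Glu as the amino-terminal residue.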
Locating Disulfide Bonds If the primary structure in-
cludes disulfide bonds, their locations are determined
in an additional step after sequencing is completed. A
sample of the protein is again cleaved with a reagent
such as trypsin, this time without first breaking the
disulfide bonds. The resulting peptides are separated by
electrophoresis and compared with the original set of
peptides generated by trypsin. For each disulfide bond,
two of the original peptides will be missing and a new,
larger peptide will appear. The two missing peptides
represent the regions of the intact polypeptide that are
linked by the disulfide bond.
Amino Acid Sequences Can Also Be Deduced
by Other Methods
The approach outlined above is not the only way to de-
termine amino acid sequences. New methods based on
mass spectrometry permit the sequencing of short
polypeptides (20 to 30 amino acid residues) in just a
few minutes (Box 3–2). In addition, with the develop-
ment of rapid DNA sequencing methods (Chapter 8),
the elucidation of the genetic code (Chapter 27), and
the development of techniques for isolating genes
(Chapter 9), researchers can deduce the sequence of a
polypeptide by determining the sequence of nucleotides
in the gene that codes for it (Fig. 3–28). The techniques
used to determine protein and DNA sequences are com-
plementary. When the gene is available, sequencing the
DNA can be faster and more accurate than sequencing
the protein. Most proteins are now sequenced in this in-
direct way. If the gene has not been isolated, direct se-
quencing of peptides is necessary, and this can provide
information (the location of disulfide bonds, for exam-
ple) not available in a DNA sequence. In addition, a
knowledge of the amino acid sequence of even a part of
a polypeptide can greatly facilitate the isolation of the
corresponding gene (Chapter 9).
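As a small illustration of deducing a polypeptide sequence from a gene, the following Python sketch translates the six-codon DNA segment shown in Figure 3–28. The codon table is deliberately partial: it lists only the six codons needed for this example, and the function name `translate` is hypothetical. A complete table of the genetic code (Chapter 27) would be required for any real sequence.

```python
# Partial codon table: only the codons that occur in the Figure 3-28 example.
CODONS = {
    "CAG": "Gln",
    "TAT": "Tyr",
    "CCT": "Pro",
    "ACG": "Thr",
    "ATT": "Ile",
    "TGG": "Trp",
}

def translate(dna):
    """Read a coding-strand DNA sequence three nucleotides at a time."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "-".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

peptide = translate("CAGTATCCTACGATTTGG")
print(peptide)
```

Note that the translation recovers only the linear order of residues; features such as disulfide bonds are not encoded in this way and must still be located by direct analysis of the protein.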
The array of methods now available to analyze both
proteins and nucleic acids is ushering in a new discipline
of “whole cell biochemistry.” The complete sequence
of an organism’s DNA, its genome, is now available
for organisms ranging from viruses to bacteria to
multicellular eukaryotes (see Table 1–4). Genes are being
discovered by the millions, including many that encode
proteins with no known function. To describe the
entire protein complement encoded by an organism’s
DNA, researchers have coined the term proteome. As
described in Chapter 9, the new disciplines of genomics
and proteomics are complementing work carried out
on cellular intermediary metabolism and nucleic acid
metabolism to provide a new and increasingly complete
picture of biochemistry at the level of cells and even
organisms.

TABLE 3–7 The Specificity of Some Common Methods for Fragmenting Polypeptide Chains

Reagent (biological source)*                               Cleavage points†
Trypsin (bovine pancreas)                                  Lys, Arg (C)
Submaxillarus protease (mouse submaxillary gland)          Arg (C)
Chymotrypsin (bovine pancreas)                             Phe, Trp, Tyr (C)
Staphylococcus aureus V8 protease (bacterium S. aureus)    Asp, Glu (C)
Asp-N-protease (bacterium Pseudomonas fragi)               Asp, Glu (N)
Pepsin (porcine stomach)                                   Phe, Trp, Tyr (N)
Endoproteinase Lys C (bacterium Lysobacter enzymogenes)    Lys (C)
Cyanogen bromide                                           Met (C)

*All reagents except cyanogen bromide are proteases. All are available from commercial sources.
†Residues furnishing the primary recognition point for the protease or reagent; peptide bond
cleavage occurs on either the carbonyl (C) or the amino (N) side of the indicated amino acid
residues.
[Figure 3–27 diagram, summarized step by step]

Procedure: hydrolyze; separate amino acids.
Result: A 5, C 2, D 4, E 2, F 1, G 3, H 2, I 3, K 2, L 2, M 2, P 3, R 1, S 2, T 1, V 1, Y 2.
Conclusion: Polypeptide has 38 amino acid residues. Trypsin will cleave three times (at one R (Arg) and two K (Lys)) to give four fragments. Cyanogen bromide will cleave at two M (Met) to give three fragments.

Procedure: react with FDNB; hydrolyze; separate amino acids.
Result: 2,4-Dinitrophenylglutamate detected.
Conclusion: E (Glu) is amino-terminal residue.

Procedure: reduce disulfide bonds (if present).

Procedure: cleave with trypsin; separate fragments; sequence by Edman degradation.
Result: T-1 GASMALIK; T-2 EGAAYHDFEPIDPR; T-3 DCVHSD; T-4 YLIACGPMTK.
Conclusion: T-2 placed at amino terminus because it begins with E (Glu). T-3 placed at carboxyl terminus because it does not end with R (Arg) or K (Lys).

Procedure: cleave with cyanogen bromide; separate fragments; sequence by Edman degradation.
Result: C-1 EGAAYHDFEPIDPRGASM; C-2 TKDCVHSD; C-3 ALIKYLIACGPM.
Conclusion: C-3 overlaps with T-1 and T-4, allowing them to be ordered.

Procedure: establish sequence.
Result (amino terminus to carboxyl terminus): EGAAYHDFEPIDPRGASMALIKYLIACGPMTKDCVHSD
Order of trypsin fragments: T-2, T-1, T-4, T-3. Order of cyanogen bromide fragments: C-1, C-3, C-2.
FIGURE 3–27 Cleaving proteins and sequencing and ordering the
peptide fragments. First, the amino acid composition and amino-
terminal residue of an intact sample are determined. Then any disulfide
bonds are broken before fragmenting so that sequencing can proceed
efficiently. In this example, there are only two Cys (C) residues and
thus only one possibility for location of the disulfide bond. In polypep-
tides with three or more Cys residues, the position of disulfide bonds
can be determined as described in the text. (The one-letter symbols
for amino acids are given in Table 3–1.)
Amino acid sequence (protein)   Gln–Tyr–Pro–Thr–Ile–Trp
DNA sequence (gene)             CAGTATCCTACGATTTGG
FIGURE 3–28 Correspondence of DNA and amino acid sequences.
Each amino acid is encoded by a specific sequence of three nucleo-
tides in DNA. The genetic code is described in detail in Chapter 27.
BOX 3–2 WORKING IN BIOCHEMISTRY
Investigating Proteins with Mass Spectrometry
The mass spectrometer has long been an indispensa-
ble tool in chemistry. Molecules to be analyzed, re-
ferred to as analytes, are first ionized in a vacuum.
When the newly charged molecules are introduced
into an electric and/or magnetic field, their paths
through the field are a function of their mass-to-charge
ratio, m/z. This measured property of the ionized
species can be used to deduce the mass (M) of the
analyte with very high precision.
Although mass spectrometry has been in use for
many years, it could not be applied to macromolecules
such as proteins and nucleic acids. The m/z meas-
urements are made on molecules in the gas phase, and
the heating or other treatment needed to transfer a
macromolecule to the gas phase usually caused its
rapid decomposition. In 1988, two different tech-
niques were developed to overcome this problem. In
one, proteins are placed in a light-absorbing matrix.
With a short pulse of laser light, the proteins are ion-
ized and then desorbed from the matrix into the vac-
uum system. This process, known as matrix-assisted
laser desorption/ionization mass spectrometry,
or MALDI MS, has been successfully used to meas-
ure the mass of a wide range of macromolecules. In a
second and equally successful method, macromole-
cules in solution are forced directly from the liquid to
gas phase. A solution of analytes is passed through a
charged needle that is kept at a high electrical po-
tential, dispersing the solution into a fine mist of
charged microdroplets. The solvent surrounding the
macromolecules rapidly evaporates, and the resulting
multiply charged macromolecular ions are thus intro-
duced nondestructively into the gas phase. This tech-
nique is called electrospray ionization mass spec-
trometry, or ESI MS. Protons added during passage
through the needle give additional charge to the
macromolecule. The m/z of the molecule can be ana-
lyzed in the vacuum chamber.
Mass spectrometry provides a wealth of informa-
tion for proteomics research, enzymology, and protein
chemistry in general. The techniques require only
minuscule amounts of sample, so they can be readily
applied to the small amounts of protein that can be
extracted from a two-dimensional electrophoretic gel.
The accurately measured molecular mass of a protein
is one of the critical parameters in its identification.
Once the mass of a protein is accurately known, mass
spectrometry is a convenient and accurate method for
detecting changes in mass due to the presence of
bound cofactors, bound metal ions, covalent modifi-
cations, and so on.
The process for determining the molecular mass
of a protein with ESI MS is illustrated in Figure 1. As
it is injected into the gas phase, a protein acquires a
variable number of protons, and thus positive charges,
from the solvent. This creates a spectrum of species
with different mass-to-charge ratios. Each successive
peak corresponds to a species that differs from that
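[Figure 1 diagram: (a) a sample solution passes through a glass capillary held at high voltage and enters the mass spectrometer through a vacuum interface. (b) Spectrum of relative intensity (%) versus m/z (800 to 1,600), with peaks labeled 50+, 40+, and 30+. Inset: transformed spectrum of relative intensity versus M_r (47,000 to 48,000), showing a single peak at 47,342.]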
FIGURE 1 Electrospray mass spectrometry of a protein. (a) A pro-
tein solution is dispersed into highly charged droplets by passage
through a needle under the influence of a high-voltage electric field.
The droplets evaporate, and the ions (with added protons in this
case) enter the mass spectrometer for m/z measurement. The spec-
trum generated (b) is a family of peaks, with each successive peak
(from right to left) corresponding to a charged species increased by
1 in both mass and charge. A computer-generated transformation of
this spectrum is shown in the inset.
of its neighboring peak by a charge difference of 1 and
a mass difference of 1 (1 proton). The mass of the
protein can be determined from any two neighboring
peaks. The measured m/z of one peak is

\[ (m/z)_2 = \frac{M + n_2 X}{n_2} \]

where M is the mass of the protein, n_2 is the number
of charges, and X is the mass of the added groups
(protons in this case). Similarly for the neighboring
peak,

\[ (m/z)_1 = \frac{M + (n_2 + 1)X}{n_2 + 1} \]

We now have two unknowns (M and n_2) and two equations.
Subtracting X from each expression gives M/n_2 and
M/(n_2 + 1), so we can solve first for n_2 and then for M:

\[ n_2 = \frac{(m/z)_1 - X}{(m/z)_2 - (m/z)_1} \]

\[ M = n_2\,[(m/z)_2 - X] \]

This calculation using the m/z values for any two
neighboring peaks in a spectrum such as that shown in Figure 1b
usually provides the mass of the protein (in this case,
aerolysin k; 47,342 Da) with an error of only ±0.01%.
Generating several sets of peaks, repeating the calculation,
and averaging the results generally provides an
even more accurate value for M. Computer algorithms
can transform the m/z spectrum into a single peak that
also provides a very accurate mass measurement (Fig.
1b, inset).
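The two-peak calculation can be sketched numerically. In the Python example below, peak 2 carries n_2 charges and its neighbor, peak 1, carries n_2 + 1 charges and therefore lies at lower m/z; with that labeling the charge is n_2 = [(m/z)_1 − X]/[(m/z)_2 − (m/z)_1] and the mass is M = n_2[(m/z)_2 − X]. Several things here are assumptions for illustration: the function names are hypothetical, X is taken as 1.00728 Da for a proton, and the two peak positions are synthetic values generated from the 47,342 Da mass quoted in the text rather than read from a measured spectrum.

```python
PROTON = 1.00728  # assumed mass X of each added proton, in Da

def peak_position(mass, charges, x=PROTON):
    """m/z expected for a protein of the given mass carrying `charges` protons."""
    return (mass + charges * x) / charges

def mass_from_adjacent_peaks(mz2, mz1, x=PROTON):
    """Recover (n2, M) from two neighboring peaks.

    mz2 is the peak with n2 charges (higher m/z);
    mz1 is its neighbor with n2 + 1 charges (lower m/z).
    """
    n2 = round((mz1 - x) / (mz2 - mz1))  # charge states are whole numbers
    return n2, n2 * (mz2 - x)

# Synthetic neighboring peaks for a 47,342 Da protein with 49 and 50 charges.
mz2 = peak_position(47342.0, 49)
mz1 = peak_position(47342.0, 50)

n2, M = mass_from_adjacent_peaks(mz2, mz1)
print(n2, round(M, 1))
```

With real data the ratio is not an exact integer, which is why the sketch rounds it; averaging the masses obtained from several peak pairs, as described above, reduces the remaining error.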
Mass spectrometry can also be used to sequence
short stretches of polypeptide, an application that has
emerged as an invaluable tool for quickly identifying
unknown proteins. Sequence information is extracted
using a technique called tandem MS, or MS/MS. A
solution containing the protein under investigation is
first treated with a protease or chemical reagent to
hydrolyze it to a mixture of shorter peptides. The mix-
ture is then injected into a device that is essentially
two mass spectrometers in tandem (Fig. 2a, top). In
the first, the peptide mixture is sorted and the ion-
ized fragments are manipulated so that only one of the
several types of peptides produced by cleavage
emerges at the other end. The sample of the selected
[Figure 2 diagram: (a) Top: electrospray ionization feeds MS-1 (separation), followed by a collision cell (breakage), MS-2, and a detector. Bottom: a peptide with side chains R1 to R5 is shown breaking at a peptide bond to give b-type and y-type ions. (b) Spectrum of relative intensity (%) versus m/z (200 to 1,000), with peaks labeled y1″ through y9″.]
FIGURE 2 Obtaining protein sequence information with tandem
MS. (a) After proteolytic hydrolysis, a protein solution is injected
into a mass spectrometer (MS-1). The different peptides are sorted
so that only one type is selected for further analysis. The selected
peptide is further fragmented in a chamber between the two mass
spectrometers, and m/z for each fragment is measured in the sec-
ond mass spectrometer (MS-2). Many of the ions generated during
this second fragmentation result from breakage of the peptide bond,
as shown. These are called b-type or y-type ions, depending on
whether the charge is retained on the amino- or carboxyl-terminal
side, respectively. (b) A typical spectrum with peaks representing
the peptide fragments generated from a sample of one small pep-
tide (10 residues). The labeled peaks are y-type ions. The large peak
next to y5″ is a doubly charged ion and is not part of the y set. The
successive peaks differ by the mass of a particular amino acid in
the original peptide. In this case, the deduced sequence was
Phe–Pro–Gly–Gln–(Ile/Leu)–Asn–Ala–Asp–(Ile/Leu)–Arg. Note the
ambiguity about Ile and Leu residues, because they have the same
molecular mass. In this example, the set of peaks derived from y-type
ions predominates, and the spectrum is greatly simplified as a re-
sult. This is because an Arg residue occurs at the carboxyl terminus
of the peptide, and most of the positive charges are retained on this
residue.
Small Peptides and Proteins Can Be
Chemically Synthesized
Many peptides are potentially useful as pharmacologic
agents, and their production is of considerable com-
mercial importance. There are three ways to obtain a
peptide: (1) purification from tissue, a task often made
difficult by the vanishingly low concentrations of some
peptides; (2) genetic engineering (Chapter 9); or (3) di-
rect chemical synthesis. Powerful techniques now make
direct chemical synthesis an attractive option in many
cases. In addition to commercial applications, the syn-
thesis of specific peptide portions of larger proteins is
an increasingly important tool for the study of protein
structure and function.
The complexity of proteins makes the traditional
synthetic approaches of organic chemistry impractical
for peptides with more than four or five amino acid
residues. One problem is the difficulty of purifying the
product after each step.
The major breakthrough in this technology was
provided by R. Bruce Merrifield in 1962. His innovation
involved synthesizing a peptide while keeping it at-
tached at one end to a solid support. The support is an
insoluble polymer (resin) contained within a column,
similar to that used for chromatographic procedures.
The peptide is built up on this support one amino acid
at a time using a standard set of reactions in a repeat-
ing cycle (Fig. 3–29). At each successive step in the
cycle, protective chemical groups block unwanted
reactions.
The technology for chemical peptide synthesis is
now automated. As in the sequencing reactions already
considered, the most important limitation of the process
is the efficiency of each chemical cycle, as can be seen
by calculating the overall yields of peptides of various
peptide, each molecule of which has a charge some-
where along its length, then travels through a vacuum
chamber between the two mass spectrometers. In this
collision cell, the peptide is further fragmented by
high-energy impact with a “collision gas,” a small
amount of a noble gas such as helium or argon that is
bled into the vacuum chamber. This procedure is de-
signed to fragment many of the peptide molecules in
the sample, with each individual peptide broken in
only one place, on average. Most breaks occur at pep-
tide bonds. This fragmentation does not involve the
addition of water (it is done in a near-vacuum), so the
products may include molecular ion radicals such as
carbonyl radicals (Fig. 2a, bottom). The charge on the
original peptide is retained on one of the fragments
generated from it.
The second mass spectrometer then measures the
m/z ratios of all the charged fragments (uncharged
fragments are not detected). This generates one or
more sets of peaks. A given set of peaks (Fig. 2b) con-
sists of all the charged fragments that were generated
by breaking the same type of bond (but at different
points in the peptide) and are derived from the same
side of the bond breakage, either the carboxyl- or
amino-terminal side. Each successive peak in a given
set has one less amino acid than the peak before. The
difference in mass from peak to peak identifies the
amino acid that was lost in each case, thus revealing
the sequence of the peptide. The only ambiguities in-
volve leucine and isoleucine, which have the same mass.
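Reading a sequence from the spacing between peaks can be illustrated with a short Python sketch. The peak list is synthetic: it is computed for the peptide whose deduced sequence is given in the legend to Figure 2, arbitrarily writing Leu at the two ambiguous positions, and is not a measured spectrum. The residue masses are standard monoisotopic values; the function names, the 0.01 Da tolerance, and the assumption of a clean, complete, singly charged y-type series are all simplifications introduced for illustration.

```python
# Monoisotopic residue masses (Da); Leu and Ile are identical.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def y_ions(peptide):
    """Synthetic singly charged y-ion m/z values, smallest fragment first."""
    ladder, total = [], WATER + PROTON
    for aa in reversed(peptide):          # grow from the carboxyl terminus
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder

def residues_matching(delta, tol=0.01):
    """All residues whose mass equals `delta` within the tolerance."""
    hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)
    return "/".join(hits)

def read_ladder(ladder):
    """Deduce the sequence (amino to carboxyl terminus) from peak spacings."""
    steps = [ladder[0] - WATER - PROTON]
    steps += [b - a for a, b in zip(ladder, ladder[1:])]
    return [residues_matching(d) for d in reversed(steps)]

deduced = read_ladder(y_ions("FPGQLNADLR"))
print("-".join(deduced))
```

The two positions reported as `I/L` show the ambiguity described above: the mass difference alone cannot distinguish leucine from isoleucine.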
The charge on the peptide can be retained on ei-
ther the carboxyl- or amino-terminal fragment, and
bonds other than the peptide bond can be broken in
the fragmentation process, with the result that multi-
ple sets of peaks are usually generated. The two most
prominent sets generally consist of charged fragments
derived from breakage of the peptide bonds. The set
consisting of the carboxyl-terminal fragments can be
unambiguously distinguished from that consisting of
the amino-terminal fragments. Because the bond
breaks generated between the spectrometers (in the
collision cell) do not yield full carboxyl and amino
groups at the sites of the breaks, the only intact α-amino
and α-carboxyl groups on the peptide fragments
are those at the very ends (Fig. 2a). The two
sets of fragments can thereby be identified by the re-
sulting slight differences in mass. The amino acid se-
quence derived from one set can be confirmed by the
other, improving the confidence in the sequence in-
formation obtained.
Even a short sequence is often enough to permit
unambiguous association of a protein with its gene, if
the gene sequence is known. Sequencing by mass
spectrometry cannot replace the Edman degradation
procedure for the sequencing of long polypeptides,
but it is ideal for proteomics research aimed at cata-
loging the hundreds of cellular proteins that might be
separated on a two-dimensional gel. In the coming
decades, detailed genomic sequence data will be avail-
able from hundreds, eventually thousands, of organ-
isms. The ability to rapidly associate proteins with
genes using mass spectrometry will greatly facilitate
the exploitation of this extraordinary information
resource.
lengths when the yield for addition of each new amino
acid is 96.0% versus 99.8% (Table 3–8). Incomplete re-
action at one stage can lead to formation of an impurity
(in the form of a shorter peptide) in the next. The
chemistry has been optimized to permit the synthesis
of proteins of 100 amino acid residues in a few days in
reasonable yield. A very similar approach is used to
synthesize nucleic acids (see Fig. 8–38). It is worth not-
ing that this technology, impressive as it is, still pales
when compared with biological processes. The same
[Figure 3–29 diagram, summarized by reaction]

Starting materials: amino acid 1 with its α-amino group protected by an Fmoc group, and an insoluble polystyrene bead bearing a reactive Cl-CH2 group.
Reaction 1: Attachment of carboxyl-terminal amino acid to reactive group on resin.
Reaction 2: Protecting group is removed by flushing with solution containing a mild organic base.
Reaction 3: Amino acid 2 with protected α-amino group is activated at carboxyl group by DCC (dicyclohexylcarbodiimide).
Reaction 4: α-Amino group of amino acid 1 attacks activated carboxyl group of amino acid 2 to form peptide bond. Dicyclohexylurea is released as a byproduct.
Reactions 2 to 4 repeated as necessary.
Reaction 5: Completed peptide is deprotected as in reaction 2; HF cleaves ester linkage between peptide and resin.

[Photograph: R. Bruce Merrifield]
FIGURE 3–29 Chemical synthesis of a peptide on an insoluble polymer support.
Reactions 1 through 4 are necessary for the formation of each peptide bond.
The 9-fluorenylmethoxycarbonyl (Fmoc) group (shaded blue) prevents unwanted
reactions at the α-amino group of the residue (shaded red). Chemical synthesis
proceeds from the carboxyl terminus to the amino terminus, the reverse of the
direction of protein synthesis in vivo (Chapter 27).
[Structure shown with Figure 3–29: the Fmoc group attached to the amino group of an amino acid residue.]
100-amino-acid protein would be synthesized with ex-
quisite fidelity in about 5 seconds in a bacterial cell.
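The dependence of overall yield on stepwise yield (Table 3–8) can be explored with a short calculation. The sketch below assumes that a peptide of n residues requires n − 1 coupling cycles, each with the same yield, so that the overall yield is the stepwise yield raised to the power n − 1; the function name `overall_yield` is hypothetical. Under this assumption the tabulated values for 11 to 51 residues are reproduced on rounding, while the 100-residue entry at 96.0% comes out close to, but not exactly at, the tabulated 1.7%, so the table may have been computed or rounded slightly differently.

```python
def overall_yield(n_residues, step_yield):
    """Overall yield (%) after n_residues - 1 couplings of equal stepwise yield."""
    couplings = n_residues - 1
    return 100.0 * step_yield ** couplings

# Peptide lengths listed in Table 3-8, at stepwise yields of 96.0% and 99.8%.
for n in (11, 21, 31, 51, 100):
    low = overall_yield(n, 0.960)
    high = overall_yield(n, 0.998)
    print(f"{n:>3} residues: {low:5.1f}%  {high:5.1f}%")
```

The comparison makes the practical point of the table: a difference of a few percent per cycle is negligible for a short peptide but decides whether a 100-residue synthesis gives a usable amount of product.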
A variety of new methods for the efficient ligation
(joining together) of peptides has made possible the as-
sembly of synthetic peptides into larger proteins. With
these methods, novel forms of proteins can be created
with precisely positioned chemical groups, including
those that might not normally be found in a cellular pro-
tein. These novel forms provide new ways to test theo-
ries of enzyme catalysis, to create proteins with new
chemical properties, and to design protein sequences
that will fold into particular structures. This last appli-
cation provides the ultimate test of our increasing abil-
ity to relate the primary structure of a peptide to the
three-dimensional structure that it takes up in solution.
Amino Acid Sequences Provide Important
Biochemical Information
Knowledge of the sequence of amino acids in a protein
can offer insights into its three-dimensional structure
and its function, cellular location, and evolution. Most
of these insights are derived by searching for similari-
ties with other known sequences. Thousands of se-
quences are known and available in databases accessi-
ble through the Internet. A comparison of a newly
obtained sequence with this large bank of stored se-
quences often reveals relationships both surprising and
enlightening.
Exactly how the amino acid sequence determines
three-dimensional structure is not understood in detail,
nor can we always predict function from sequence.
However, protein families that have some shared struc-
tural or functional features can be readily identified on
the basis of amino acid sequence similarities. Individual
proteins are assigned to families based on the degree of
similarity in amino acid sequence. Members of a family
are usually identical across 25% or more of their se-
quences, and proteins in these families generally share
at least some structural and functional characteristics.
Some families are defined, however, by identities in-
volving only a few amino acid residues that are critical
to a certain function. A number of similar substructures
(to be defined in Chapter 4 as “domains”) occur in many
functionally unrelated proteins. These domains often
fold into structural configurations that have an unusual
degree of stability or that are specialized for a certain
environment. Evolutionary relationships can also be in-
ferred from the structural and functional similarities
within protein families.
Certain amino acid sequences serve as signals that
determine the cellular location, chemical modification,
and half-life of a protein. Special signal sequences, usu-
ally at the amino terminus, are used to target certain
proteins for export from the cell; other proteins are tar-
geted for distribution to the nucleus, the cell surface,
the cytosol, and other cellular locations. Other se-
quences act as attachment sites for prosthetic groups,
such as sugar groups in glycoproteins and lipids in
lipoproteins. Some of these signals are well character-
ized and are easily recognized in the sequence of a newly
characterized protein (Chapter 27).
SUMMARY 3.4 The Covalent Structure of Proteins
■ Differences in protein function result from
differences in amino acid composition and
sequence. Some variations in sequence are
possible for a particular protein, with little or
no effect on function.
■ Amino acid sequences are deduced by
fragmenting polypeptides into smaller peptides
using reagents known to cleave specific peptide
bonds; determining the amino acid sequence
of each fragment by the automated Edman
degradation procedure; then ordering the
peptide fragments by finding sequence overlaps
between fragments generated by different
reagents. A protein sequence can also be
deduced from the nucleotide sequence of its
corresponding gene in DNA.
■ Short proteins and peptides (up to about 100
residues) can be chemically synthesized. The
peptide is built up, one amino acid residue at
a time, while remaining tethered to a solid
support.
3.5 Protein Sequences and Evolution
The simple string of letters denoting the amino acid se-
quence of a given protein belies the wealth of informa-
tion this sequence holds. As more protein sequences
have become available, the development of more pow-
erful methods for extracting information from them has
become a major biochemical enterprise. Each protein’s
function relies on its three-dimensional structure, which
TABLE 3–8 Effect of Stepwise Yield on Overall Yield in Peptide Synthesis

                          Overall yield of final peptide (%)
Number of residues in     when the yield of each step is:
the final polypeptide     96.0%      99.8%
 11                       66         98
 21                       44         96
 31                       29         94
 51                       13         90
100                        1.7       82
in turn is determined largely by its primary structure.
Thus, the biochemical information conveyed by a pro-
tein sequence is in principle limited only by our own un-
derstanding of structural and functional principles. On
a different level of inquiry, protein sequences are be-
ginning to tell us how the proteins evolved and, ulti-
mately, how life evolved on this planet.
Protein Sequences Can Elucidate
the History of Life on Earth
The field of molecular evolution is often traced to Emile
Zuckerkandl and Linus Pauling, whose work in the mid-
1960s advanced the use of nucleotide and protein se-
quences to explore evolution. The premise is deceptively
straightforward. If two organisms are closely related, the
sequences of their genes and proteins should be simi-
lar. The sequences increasingly diverge as the evolu-
tionary distance between two organisms increases. The
promise of this approach began to be realized in the
1970s, when Carl Woese used ribosomal RNA sequences
to define archaebacteria as a group of living organisms
distinct from other bacteria and eukaryotes (see Fig.
1–4). Protein sequences offer an opportunity to greatly
refine the available information. With the advent of
genome projects investigating organisms from bacteria
to humans, the number of available sequences is grow-
ing at an enormous rate. This information can be used
to trace biological history. The challenge is in learning
to read the genetic hieroglyphics.
Evolution has not taken a simple linear path. Com-
plexities abound in any attempt to mine the evolution-
ary information stored in protein sequences. For a given
protein, the amino acid residues essential for the activ-
ity of the protein are conserved over evolutionary time.
The residues that are less important to function may
vary over time—that is, one amino acid may substitute
for another—and these variable residues can provide
the information used to trace evolution. Amino acid sub-
stitutions are not always random, however. At some po-
sitions in the primary structure, the need to maintain
protein function may mean that only particular amino
acid substitutions can be tolerated. Some proteins have
more variable amino acid residues than others. For these
and other reasons, proteins can evolve at different rates.
Another complicating factor in tracing evolutionary
history is the rare transfer of a gene or group of genes
from one organism to another, a process called lateral
gene transfer. The transferred genes may be quite sim-
ilar to the genes they were derived from in the original
organism, whereas most other genes in the same two
organisms may be quite distantly related. An example
of lateral gene transfer is the recent rapid spread of
antibiotic-resistance genes in bacterial populations. The
proteins derived from these transferred genes would not
be good candidates for the study of bacterial evolution,
because they share only a very limited evolutionary his-
tory with their “host” organisms.
The study of molecular evolution generally focuses
on families of closely related proteins. In most cases, the
families chosen for analysis have essential functions in
cellular metabolism that must have been present in the
earliest viable cells, thus greatly reducing the chance
that they were introduced relatively recently by lateral
gene transfer. For example, a protein called EF-1α
(elongation factor 1α) is involved in the synthesis of proteins
in all eukaryotes. A similar protein, EF-Tu, with
the same function, is found in bacteria. Similarities in
sequence and function indicate that EF-1α and EF-Tu
are members of a family of proteins that share a com-
mon ancestor. The members of protein families are
called homologous proteins, or homologs. The con-
cept of a homolog can be further refined. If two proteins
within a family (that is, two homologs) are present in
the same species, they are referred to as paralogs. Ho-
mologs from different species are called orthologs
(see Fig. 1–37). The process of tracing evolution involves
first identifying suitable families of homologous proteins
and then using them to reconstruct evolutionary paths.
Homologs are identified using increasingly power-
ful computer programs that can directly compare two
or more chosen protein sequences, or can search vast
databases to find the evolutionary relatives of one se-
lected protein sequence. The electronic search process
can be thought of as sliding one sequence past the other
until a section with a good match is found. Within this
sequence alignment, a positive score is assigned for each
position where the amino acid residues in the two se-
quences are identical—the value of the score varying
from one program to the next—to provide a measure of
the quality of the alignment. The process has some com-
plications. Sometimes the proteins being compared
match well at, say, two sequence segments, and these
segments are connected by less related sequences of
different lengths. Thus the two matching segments can-
not be aligned at the same time. To handle this, the com-
puter program introduces “gaps” in one of the sequences
to bring the matching segments into register (Fig. 3–30).
FIGURE 3–30 Aligning protein sequences with the use of gaps.
Shown here is the sequence alignment of a short section of the EF-Tu
protein from two well-studied bacterial species, E. coli and Bacillus
subtilis. Introduction of a gap in the B. subtilis sequence allows a better
alignment of amino acid residues on either side of the gap. Identical
amino acid residues are shaded in the original figure; the gap is
shown here as a run of hyphens.

E. coli      TGNRTIAVYDLGGGTFDISIIEIDEVDGEKTFEVLATNGDTHLGGEDFDSRLIHYL
B. subtilis  DEDQTILLYDLGGGTFDVSILELGDG----TFEVRSTAGDNRLGGDDFDQVIIDHL
Of course, if a sufficient number of gaps are introduced,
almost any two sequences could be brought into some
sort of alignment. To avoid uninformative alignments,
the programs include penalties for each gap introduced,
thus lowering the overall alignment score. With elec-
tronic trial and error, the program selects the alignment
with the optimal score that maximizes identical amino
acid residues while minimizing the introduction of gaps.
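The trial-and-error search described above can be sketched with a standard dynamic-programming (Needleman-Wunsch) global alignment. This is a minimal illustration rather than the method of any particular program: the function name `align` and the scoring values (+1 for an identical pair, 0 for a mismatch, −2 for each gap position) are assumptions chosen for the example.

```python
def align(a, b, match=1, mismatch=0, gap=-2):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b); '-' marks a gap position.
    The scoring values are illustrative assumptions, not those of a real program.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = align("TFEVL", "TFVL")
print(best)    # 2  (four identities, one gap position)
print(top)     # TFEVL
print(bottom)  # TF-VL
```

Making the gap penalty more negative discourages gaps; making it zero would let almost any two sequences be forced into register, which is the uninformative case the penalties are meant to prevent.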
Identical amino acids are often inadequate to iden-
tify related proteins or, more importantly, to determine
how closely related the proteins are on an evolutionary
time scale. A more useful analysis includes a consider-
ation of the chemical properties of substituted amino
acids. When amino acid substitutions are found within
a protein family, many of the differences may be con-
servative—that is, an amino acid residue is replaced by
a residue having similar chemical properties. For ex-
ample, a Glu residue may substitute in one family mem-
ber for the Asp residue found in another; both amino
acids are negatively charged. Such a conservative sub-
stitution should logically garner a higher score in a se-
quence alignment than does a nonconservative substi-
tution, such as the replacement of the Asp residue with
a hydrophobic Phe residue.
To determine what scores to assign to the many dif-
ferent amino acid substitutions, Steven Henikoff and
Jorja Henikoff examined the aligned sequences from a
variety of different proteins. They did not analyze en-
tire protein sequences, focusing instead on thousands
of short conserved blocks where the fraction of identi-
cal amino acids was high and the alignments were thus
reliable. Looking at the aligned sequence blocks, the
Henikoffs analyzed the nonidentical amino acid residues
within the blocks. Higher scores were given to non-
identical residues that occurred frequently than to those
that appeared rarely. Even the identical residues were
given scores based on how often they were replaced,
such that amino acids with unique chemical properties
(such as Cys and Trp) received higher scores than those
more conservatively replaced (such as Asp and Glu).
The result of this scoring system is a Blosum (blocks
substitution matrix) table. The table in Figure 3–31 was
generated from sequences that were identical in at least
62% of their amino acid residues, and it is thus referred
to as Blosum62. Similar tables have been generated for
blocks of homologous sequences that are 50% or 80%
identical. When higher levels of identity are required,
the most conservative amino acid substitutions can be
         A   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   Y
A Ala    4
C Cys    0   9
D Asp   −2  −3   6
E Glu   −1  −4   2   5
F Phe   −2  −2  −3  −3   6
G Gly    0  −3  −1  −2  −3   6
H His   −2  −3  −1   0  −1  −2   8
I Ile   −1  −1  −3  −3   0  −4  −3   4
K Lys   −1  −3  −1   1  −3  −2  −1  −3   5
L Leu   −1  −1  −4  −3   0  −4  −3   2  −2   4
M Met   −1  −1  −3  −2   0  −3  −2   1  −1   2   5
N Asn   −2  −3   1   0  −3   0   1  −3   0  −3  −2   6
P Pro   −1  −3  −1  −1  −4  −2  −2  −3  −1  −3  −2  −2   7
Q Gln   −1  −3   0   2  −3  −2   0  −3   1  −2   0   0  −1   5
R Arg   −1  −3  −2   0  −3  −2   0  −3   2  −2  −1   0  −2   1   5
S Ser    1  −1   0   0  −2   0  −1  −2   0  −2  −1   1  −1   0  −1   4
T Thr    0  −1  −1  −1  −2  −2  −2  −1  −1  −1  −1   0  −1  −1  −1   1   5
V Val    0  −1  −3  −2  −1  −3  −3   3  −2   1   1  −3  −2  −2  −3  −2   0   4
W Trp   −3  −2  −4  −3   1  −2  −2  −3  −3  −2  −1  −4  −4  −2  −3  −3  −2  −3  11
Y Tyr   −2  −2  −3  −2   3  −3   2  −1  −2  −1  −1  −2  −3  −1  −2  −2  −2  −1   2   7
FIGURE 3–31 The Blosum62 table. This blocks substitution matrix
was created by comparing thousands of short blocks of aligned se-
quences that were identical in at least 62% of their amino acid
residues. The nonidentical residues were assigned scores based on
how frequently they were replaced by each of the other amino acids.
Each substitution contributes to the score given to a particular align-
ment. Positive numbers (shaded yellow) add to the score for a partic-
ular alignment; negative numbers subtract from the score. Identical
residues in sequences being compared (the shaded diagonal from top
left to bottom right in the matrix) receive scores based on how often
they are replaced, such that amino acids with unique chemical prop-
erties (e.g., Cys and Trp) receive higher scores (9 and 11, respectively)
than those more easily replaced in conservative substitutions (e.g., Asp
(6) and Glu (5)). Many computer programs use Blosum62 to assign
scores to new sequence alignments.
overrepresented, which limits the usefulness of the ma-
trix in identifying homologs that are somewhat distantly
related. Tests have shown that the Blosum62 table pro-
vides the most reliable alignments over a wide range of
protein families, and it is the default table in many se-
quence alignment programs.
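To illustrate how a substitution table changes the score of an alignment, the sketch below scores two ungapped alignments using six entries copied from the Blosum62 table (Fig. 3–31). The names `BLOSUM62_EXCERPT`, `pair_score`, and `ungapped_score` are hypothetical helpers introduced for this example, and a real program would load the full 20 × 20 table.

```python
# Six entries from Blosum62 (Fig. 3-31) for Asp (D), Glu (E), and Phe (F).
BLOSUM62_EXCERPT = {
    ("D", "D"): 6, ("E", "E"): 5, ("F", "F"): 6,
    ("D", "E"): 2,    # conservative: both residues are negatively charged
    ("D", "F"): -3,   # nonconservative: charged versus hydrophobic
    ("E", "F"): -3,
}


def pair_score(x, y, matrix=BLOSUM62_EXCERPT):
    """Score for aligning residue x with residue y (the table is symmetric)."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def ungapped_score(seq_a, seq_b, matrix=BLOSUM62_EXCERPT):
    """Sum the pair scores over two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(seq_a, seq_b))


print(ungapped_score("DEF", "EEF"))  # 13: Asp/Glu is a conservative substitution
print(ungapped_score("DEF", "FEF"))  # 8: Asp/Phe is penalized
```

Both comparisons have two identical positions out of three, so a pure identity count could not tell them apart; the table separates them.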
For most efforts to find homologies and explore evo-
lutionary relationships, protein sequences (derived ei-
ther directly from protein sequencing or from the se-
quencing of the DNA encoding the protein) are superior
to nongenic nucleic acid sequences (those that do not
encode a protein or functional RNA). For a nucleic acid,
with its four different types of residues, random align-
ment of nonhomologous sequences will generally yield
matches for at least 25% of the positions. Introduction
of a few gaps can often increase the fraction of matched
residues to 40% or more, and the probability of chance
alignment of unrelated sequences becomes quite high.
The 20 different amino acid residues in proteins greatly
lower the probability of uninformative chance align-
ments of this type.
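The 25% figure can be checked with a short calculation. For two unrelated sequences compared position by position without gaps, the expected fraction of identical positions is the sum of the squared residue frequencies. The sketch below assumes, for simplicity, that all residue types are equally frequent; real sequences have uneven compositions, which raises the chance-match fraction somewhat. The function name `chance_identity` is introduced here for illustration.

```python
def chance_identity(frequencies):
    """Expected fraction of identical positions between two random sequences
    drawn independently from the same residue frequencies (no gaps)."""
    total = sum(frequencies)
    return sum((f / total) ** 2 for f in frequencies)


nucleotide = chance_identity([1, 1, 1, 1])  # four equally frequent bases
amino_acid = chance_identity([1] * 20)      # twenty equally frequent residues

print(f"nucleic acid: {nucleotide:.2f}")  # 0.25
print(f"protein:      {amino_acid:.2f}")  # 0.05
```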
The programs used to generate a sequence align-
ment are complemented by methods that test the reli-
ability of the alignments. A common computerized test
is to shuffle the amino acid sequence of one of the pro-
teins being compared to produce a random sequence,
then instruct the program to align the shuffled sequence
with the other, unshuffled one. Scores are assigned to
the new alignment, and the shuffling and alignment
process is repeated many times. The original alignment,
before shuffling, should have a score significantly higher
than any of those within the distribution of scores gen-
erated by the random alignments; this increases the con-
fidence that the sequence alignment has identified a pair
of homologs. Note that the absence of a significant align-
ment score does not necessarily mean that no evolu-
tionary relationship exists between two proteins. As we
shall see in Chapter 4, three-dimensional structural sim-
ilarities sometimes reveal evolutionary relationships
where sequence homology has been wiped away by time.
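The shuffling test can be sketched as follows. As a simplifying assumption, the score used here is just the number of identical residues at matching positions in an ungapped comparison; a faithful version would rerun the full alignment program on every shuffled sequence. The function names and the number of trials are illustrative choices, and the two sequences are the first 26 aligned residues from Figure 3–30.

```python
import random


def identity_score(a, b):
    """Count identical residues at matching positions (ungapped comparison)."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_test(seq_a, seq_b, trials=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling seq_a.

    Returns (observed_score, fraction), where fraction estimates how often a
    shuffled sequence scores at least as well as the original one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = identity_score(seq_a, seq_b)
    residues = list(seq_a)
    as_good = 0
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, randomized order
        if identity_score(residues, seq_b) >= observed:
            as_good += 1
    # Add-one estimate, so the reported fraction is never exactly zero.
    return observed, (as_good + 1) / (trials + 1)


e_coli = "TGNRTIAVYDLGGGTFDISIIEIDEV"
b_subtilis = "DEDQTILLYDLGGGTFDVSILELGDG"
observed, fraction = shuffle_test(e_coli, b_subtilis)
print(observed, fraction)
```

A small fraction means that the original score lies well outside the distribution produced by the shuffled sequences, which is the situation that increases confidence in the alignment.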
Using a protein family to explore evolution requires
the identification of family members with similar mo-
lecular functions in the widest possible range of organ-
isms. Information from the family can then be used to
trace the evolution of those organisms. By analyzing the
sequence divergence in selected protein families, in-
vestigators can segregate organisms into classes based
on their evolutionary relationships. This information
must be reconciled with more classical examinations of
the physiology and biochemistry of the organisms.
Certain segments of a protein sequence may be
found in the organisms of one taxonomic group but not
in other groups; these segments can be used as signa-
ture sequences for the group in which they are found.
An example of a signature sequence is an insertion of
12 amino acids near the amino terminus of the EF-1α/EF-Tu
proteins in all archaebacteria and eukaryotes
but not in other types of bacteria (Fig. 3–32). The sig-
nature is one of many biochemical clues that can help
establish the evolutionary relatedness of eukaryotes and
archaebacteria. Similarly, the major taxa of bacteria
can be distinguished by signature sequences in several
different proteins. The β and γ proteobacteria have signature
sequences in the Hsp70 and DNA gyrase protein
families (families of proteins involved in protein folding
and DNA replication, respectively) that are not present
in any other bacteria, including the other proteobacteria.
The other types of proteobacteria (α, δ, ε), along
with the β and γ proteobacteria, have a separate Hsp70
signature sequence and a signature in alanyl-tRNA synthetase
(an enzyme of protein synthesis) that are not
present in other bacteria. The appearance of unique signatures
in the β and γ proteobacteria suggests that the α, δ,
and ε proteobacteria arose before their β and γ cousins.
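A signature insertion of this kind can be located mechanically once the sequences are aligned. The sketch below looks for the longest run of alignment columns in which every sequence of one group has a residue and every sequence of the other group has a gap. The function name `signature_insertion` is hypothetical, and the sequences are the EF-1α/EF-Tu segments of Figure 3–32 as transcribed here, with hyphens marking gap positions.

```python
def signature_insertion(with_group, without_group):
    """Return (start, end) of the longest run of columns where all sequences in
    with_group have a residue and all sequences in without_group have a gap.

    Columns are 0-based and end is exclusive; (0, 0) means no such run exists.
    """
    length = len(with_group[0])
    best = (0, 0)
    start = None
    for col in range(length + 1):
        in_run = (
            col < length
            and all(seq[col] != "-" for seq in with_group)
            and all(seq[col] == "-" for seq in without_group)
        )
        if in_run and start is None:
            start = col
        elif not in_run and start is not None:
            if col - start > best[1] - best[0]:
                best = (start, col)
            start = None
    return best


archaebacteria_and_eukaryotes = [
    "IGHVDHGKSTMVGRLLYETGSVPEHVIEQH",  # Halobacterium halobium
    "IGHVDHGKSTLVGRLLMDRGFIDEKTVKEA",  # Sulfolobus solfataricus
    "IGHVDSGKSTTTGHLIYKCGGIDKRTIEKF",  # Saccharomyces cerevisiae
    "IGHVDSGKSTTTGHLIYKCGGIDKRTIEKF",  # Homo sapiens
]
bacteria = [
    "IGHVDHGKSTMVGR------------ITTV",  # Bacillus subtilis
    "IGHVDHGKTTLTAA------------ITTV",  # Escherichia coli
]

start, end = signature_insertion(archaebacteria_and_eukaryotes, bacteria)
print(start, end, end - start)  # 14 26 12
```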
By considering the entire sequence of a protein, re-
searchers can now construct more elaborate evolution-
ary trees with many species in each taxonomic group.
Figure 3–33 presents one such tree for bacteria, based
on sequence divergence in the protein GroEL (a pro-
tein present in all bacteria that assists in the proper fold-
ing of proteins). The tree can be refined by basing it on
the sequences of multiple proteins and by supplement-
ing the sequence information with data on the unique
biochemical and physiological properties of each
species. There are many methods for generating trees,
each with its own advantages and shortcomings, and
FIGURE 3–32 A signature sequence in the EF-1α/EF-Tu protein
family. The signature sequence (boxed) is a 12-amino-acid insertion
near the amino terminus of the sequence. Residues that align in all
species are shaded yellow. Both archaebacteria and eukaryotes have
the signature, although the sequences of the insertions are quite distinct
for the two groups. The variation in the signature sequence reflects
the significant evolutionary divergence that has occurred at this
site since it first appeared in a common ancestor of both groups.
In the alignment below, the middle block of 12 columns is the
signature sequence; hyphens mark its absence.

                                             Signature
                                             sequence
Archaebacteria
  Halobacterium halobium     IGHVDHGKSTMVGR  LLYETGSVPEHV  IEQH
  Sulfolobus solfataricus    IGHVDHGKSTLVGR  LLMDRGFIDEKT  VKEA
Eukaryotes
  Saccharomyces cerevisiae   IGHVDSGKSTTTGH  LIYKCGGIDKRT  IEKF
  Homo sapiens               IGHVDSGKSTTTGH  LIYKCGGIDKRT  IEKF
Gram-positive bacterium
  Bacillus subtilis          IGHVDHGKSTMVGR  ------------  ITTV
Gram-negative bacterium
  Escherichia coli           IGHVDHGKTTLTAA  ------------  ITTV
many ways to represent the resulting evolutionary rela-
tionships. In Figure 3–33, the free end points of lines
are called “external nodes”; each represents an extant
species, and each is so labeled. The points where two
lines come together, the “internal nodes,” represent ex-
tinct ancestor species. In most representations (includ-
ing Fig. 3–33), the lengths of the lines connecting the
nodes are proportional to the number of amino acid sub-
stitutions separating one species from another. If we
trace two extant species to a common internal node
(representing the common ancestor of the two species),
the length of the branch connecting each external node
to the internal node represents the number of amino
acid substitutions separating one extant species from
this ancestor. The sum of the lengths of all the line seg-
ments that connect an extant species to another extant
species through a common ancestor reflects the num-
ber of substitutions separating the two extant species.
To determine how much time was needed for the vari-
ous species to diverge, the tree must be calibrated by
comparing it with information from the fossil record and
other sources.
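The rule for reading distances from such a tree can be sketched in a few lines. The tree below is entirely hypothetical (invented node names and branch lengths, in substitutions per site); it is stored as a mapping from each node to its parent and the length of the branch joining them. The function names are likewise introduced only for this illustration.

```python
def path_to_root(tree, node):
    """Return {ancestor: distance from node} for node and all of its ancestors."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def leaf_distance(tree, leaf_a, leaf_b):
    """Sum the branch lengths connecting two external nodes through their
    most recent common internal node."""
    from_a = path_to_root(tree, leaf_a)
    node, total = leaf_b, 0.0
    while node not in from_a:  # climb from leaf_b until the paths meet
        parent, length = tree[node]
        total += length
        node = parent
    return from_a[node] + total


# Hypothetical tree: node -> (parent, branch length in substitutions/site).
tree = {
    "species_A": ("ancestor_1", 0.10),
    "species_B": ("ancestor_1", 0.05),
    "ancestor_1": ("root", 0.20),
    "species_C": ("root", 0.30),
}

print(round(leaf_distance(tree, "species_A", "species_B"), 2))  # 0.15
print(round(leaf_distance(tree, "species_A", "species_C"), 2))  # 0.6
```

These distances count substitutions only. Converting them to time still requires the external calibration described above.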
As more sequence information is made available in
databases, we can generate evolutionary trees based on
a variety of different proteins. Some proteins evolve
faster than others, or change faster within one group of
species than another. A large protein, with many vari-
able amino acid residues, may exhibit a few differences
between two closely related species. Another, smaller
protein may be identical in the same two species. For
many reasons, some details of an evolutionary tree
based on the sequences of one protein may differ from
those of a tree based on the sequences of another pro-
tein. Increasingly sophisticated analyses using the se-
quences of many different proteins can provide an ex-
quisitely detailed and accurate picture of evolutionary
relationships. The story is a work in progress, and the
questions being asked and answered are fundamental to
how humans view themselves and the world around
them. The field of molecular evolution promises to be
among the most vibrant of the scientific frontiers in the
twenty-first century.
SUMMARY 3.5 Protein Sequences and Evolution
■ Protein sequences are a rich source of
information about protein structure and
function, as well as the evolution of life on this
planet. Sophisticated methods are being
developed to trace evolution by analyzing the
resultant slow changes in the amino acid
sequences of homologous proteins.
FIGURE 3–33 Evolutionary tree derived from amino acid sequence comparisons. A bacterial
evolutionary tree, based on the sequence divergence observed in the GroEL family of proteins.
Also included in this tree (lower right) are the chloroplasts (chl.) of some nonbacterial species.
[Tree diagram not reproduced. Scale bar: 0.1 substitutions/site. Labeled groups and the species shown:]
Spirochaetes: Leptospira interrogans, Borrelia burgdorferi
Gram-positive bacteria, low G + C: Thermophilic bacterium PS-3, Bacillus subtilis, Staphylococcus aureus, Clostridium acetobutylicum, Clostridium perfringens
Gram-positive bacteria, high G + C: Streptomyces albus [gene], Streptomyces coelicolor, Mycobacterium leprae, Mycobacterium tuberculosis
Cyanobacteria and chloroplasts: Cyanidium caldarium chl., Synechocystis, Ricinus communis chl., Triticum aestivum chl., Brassica napus chl., Arabidopsis thaliana chl.
Proteobacteria (subgroups α, β, γ, and δ/ε): Zymomonas mobilis, Agrobacterium tumefaciens, Bradyrhizobium japonicum, Rickettsia tsutsugamushi, Neisseria gonorrhoeae, Yersinia enterocolitica, Salmonella typhi, Escherichia coli, Pseudomonas aeruginosa, Legionella pneumophila, Helicobacter pylori
Bacteroides: Porphyromonas gingivalis
Chlamydia: Chlamydia trachomatis, Chlamydia psittaci
Key Terms
amino acids 75
R group 76
chiral center 76
enantiomers 76
absolute configuration 77
D, L system 77
polarity 78
zwitterion 81
absorbance, A 82
isoelectric pH (isoelectric point, pI) 84
peptide 85
protein 85
peptide bond 85
oligopeptide 85
polypeptide 85
oligomeric protein 87
protomer 87
conjugated protein 88
prosthetic group 88
primary structure 88
secondary structure 88
tertiary structure 88
quaternary structure 88
crude extract 89
fractionation 89
dialysis 89
column chromatography 89
high-performance liquid chromatography (HPLC) 90
electrophoresis 92
sodium dodecyl sulfate (SDS) 92
isoelectric focusing 93
Edman degradation 98
proteases 99
proteome 101
lateral gene transfer 107
homologous proteins 107
homolog 107
paralog 107
ortholog 107
signature sequence 109
Terms in bold are defined in the glossary.
Problems

1. Absolute Configuration of Citrulline The citrulline
isolated from watermelons has the structure shown below.
Is it a D- or L-amino acid? Explain.

[Structure of citrulline: an α carbon bearing H, a protonated amino
group (⁺NH₃), a carboxylate group (COO⁻), and the side chain
–CH₂–(CH₂)₂–NH–C(=O)–NH₂. The stereochemical drawing is not reproduced.]

2. Relationship between the Titration Curve and the
Acid-Base Properties of Glycine A 100 mL solution of
0.1 M glycine at pH 1.72 was titrated with 2 M NaOH solution.
The pH was monitored and the results were plotted on a
graph, as shown below. The key points in the titration are
designated I to V. For each of the statements (a) to (o), identify
the appropriate key point in the titration and justify your
choice.

[Graph: titration curve of glycine, plotting pH (0 to 12) against
OH⁻ added (0 to 2.0 equivalents). Five key points are marked in order
of increasing pH: (I) at the start of the titration, (II) at pH 2.34,
(III) at pH 5.97, (IV) at pH 9.60, and (V) at pH 11.30.]

(a) Glycine is present predominantly as the species
⁺H₃N–CH₂–COOH.
(b) The average net charge of glycine is +1/2.
(c) Half of the amino groups are ionized.
(d) The pH is equal to the pKₐ of the carboxyl group.
(e) The pH is equal to the pKₐ of the protonated amino
group.
(f) Glycine has its maximum buffering capacity.
(g) The average net charge of glycine is zero.
(h) The carboxyl group has been completely titrated
(first equivalence point).
(i) Glycine is completely titrated (second equivalence
point).
(j) The predominant species is ⁺H₃N–CH₂–COO⁻.
(k) The average net charge of glycine is −1.
(l) Glycine is present predominantly as a 50:50 mixture
of ⁺H₃N–CH₂–COOH and ⁺H₃N–CH₂–COO⁻.
(m) This is the isoelectric point.
(n) This is the end of the titration.
(o) These are the worst pH regions for buffering power.

3. How Much Alanine Is Present as the Completely
Uncharged Species? At a pH equal to the isoelectric point
of alanine, the net charge on alanine is zero. Two structures
can be drawn that have a net charge of zero, but the predominant
form of alanine at its pI is zwitterionic.

Zwitterionic: ⁺H₃N–CH(CH₃)–COO⁻
Uncharged: H₂N–CH(CH₃)–COOH

(a) Why is alanine predominantly zwitterionic rather
than completely uncharged at its pI?
(b) What fraction of alanine is in the completely uncharged
form at its pI? Justify your assumptions.
4. Ionization State of Amino Acids Each ionizable group
of an amino acid can exist in one of two states, charged or
neutral. The electric charge on the functional group is determined
by the relationship between its pKₐ and the pH of
the solution. This relationship is described by the Henderson-
Hasselbalch equation.
(a) Histidine has three ionizable functional groups.
Write the equilibrium equations for its three ionizations and
assign the proper pKₐ for each ionization. Draw the structure
of histidine in each ionization state. What is the net charge
on the histidine molecule in each ionization state?
(b) Draw the structures of the predominant ionization
state of histidine at pH 1, 4, 8, and 12. Note that the ionization
state can be approximated by treating each ionizable
group independently.
(c) What is the net charge of histidine at pH 1, 4, 8, and
12? For each pH, will histidine migrate toward the anode (+)
or cathode (−) when placed in an electric field?
5. Separation of Amino Acids by Ion-Exchange Chro-
matography Mixtures of amino acids are analyzed by first
separating the mixture into its components through ion-
exchange chromatography. Amino acids placed on a cation-
exchange resin containing sulfonate groups (see Fig. 3–18a)
flow down the column at different rates because of two factors
that influence their movement: (1) ionic attraction between
the –SO₃⁻ residues on the column and positively
charged functional groups on the amino acids, and (2) hy-
drophobic interactions between amino acid side chains and
the strongly hydrophobic backbone of the polystyrene resin.
For each pair of amino acids listed, determine which will be
eluted first from an ion-exchange column using a pH 7.0
buffer.
(a) Asp and Lys
(b) Arg and Met
(c) Glu and Val
(d) Gly and Leu
(e) Ser and Ala
6. Naming the Stereoisomers of Isoleucine The structure
of the amino acid isoleucine is

⁺H₃N–CH(COO⁻)–CH(CH₃)–CH₂–CH₃

(a) How many chiral centers does it have?
(b) How many optical isomers?
(c) Draw perspective formulas for all the optical isomers
of isoleucine.
7. Comparing the pKₐ Values of Alanine and Polyalanine
The titration curve of alanine shows the ionization of
two functional groups with pKₐ values of 2.34 and 9.69, corresponding
to the ionization of the carboxyl and the protonated
amino groups, respectively. The titration of di-, tri-, and larger
oligopeptides of alanine also shows the ionization of only two
functional groups, although the experimental pKₐ values are
different. The trend in pKₐ values is summarized in the table.

Amino acid or peptide      pK₁    pK₂
Ala                        2.34   9.69
Ala–Ala                    3.12   8.30
Ala–Ala–Ala                3.39   8.03
Ala–(Ala)ₙ–Ala, n ≥ 4      3.42   7.94

(a) Draw the structure of Ala–Ala–Ala. Identify the functional
groups associated with pK₁ and pK₂.
(b) Why does the value of pK₁ increase with each
addition of an Ala residue to the Ala oligopeptide?
(c) Why does the value of pK₂ decrease with each addition
of an Ala residue to the Ala oligopeptide?
8. The Size of Proteins What is the approximate molecular
weight of a protein with 682 amino acid residues in a single
polypeptide chain?
9. The Number of Tryptophan Residues in Bovine
Serum Albumin A quantitative amino acid analysis reveals
that bovine serum albumin (BSA) contains 0.58% tryptophan
(Mᵣ 204) by weight.
(a) Calculate the minimum molecular weight of BSA
(i.e., assuming there is only one tryptophan residue per protein
molecule).
(b) Gel filtration of BSA gives a molecular weight estimate
of 70,000. How many tryptophan residues are present
in a molecule of serum albumin?
10. Net Electric Charge of Peptides A peptide has the
sequence
Glu–His–Trp–Ser–Gly–Leu–Arg–Pro–Gly
(a) What is the net charge of the molecule at pH 3, 8,
and 11? (Use pKₐ values for side chains and terminal amino
and carboxyl groups as given in Table 3–1.)
(b) Estimate the pI for this peptide.
11. Isoelectric Point of Pepsin Pepsin is the name given
to several digestive enzymes secreted (as larger precursor
proteins) by glands that line the stomach. These glands also
secrete hydrochloric acid, which dissolves the particulate
matter in food, allowing pepsin to enzymatically cleave individual
protein molecules. The resulting mixture of food, HCl,
and digestive enzymes is known as chyme and has a pH near
1.5. What pI would you predict for the pepsin proteins? What
functional groups must be present to confer this pI on pepsin?
Which amino acids in the proteins would contribute such
groups?
12. The Isoelectric Point of Histones Histones are proteins
found in eukaryotic cell nuclei, tightly bound to DNA,
which has many phosphate groups. The pI of histones is very
high, about 10.8. What amino acid residues must be present
in relatively large numbers in histones? In what way do these
residues contribute to the strong binding of histones to DNA?
13. Solubility of Polypeptides One method for separating
polypeptides makes use of their differential solubilities.
The solubility of large polypeptides in water depends upon
the relative polarity of their R groups, particularly on the number
of ionized groups: the more ionized groups there are, the
more soluble the polypeptide. Which of each pair of the
polypeptides that follow is more soluble at the indicated pH?
(a) (Gly)₂₀ or (Glu)₂₀ at pH 7.0
(b) (Lys–Ala)₃ or (Phe–Met)₃ at pH 7.0
(c) (Ala–Ser–Gly)₅ or (Asn–Ser–His)₅ at pH 6.0
(d) (Ala–Asp–Gly)₅ or (Asn–Ser–His)₅ at pH 3.0
14. Purification of an Enzyme A biochemist discovers
and purifies a new enzyme, generating the purification table
below.
(a) From the information given in the table, calculate
the specific activity of the enzyme solution after each purifi-
cation procedure.
(b) Which of the purification procedures used for this
enzyme is most effective (i.e., gives the greatest relative in-
crease in purity)?
(c) Which of the purification procedures is least effective?
(d) Is there any indication based on the results shown
in the table that the enzyme after step 6 is now pure? What
else could be done to estimate the purity of the enzyme prepa-
ration?
15. Sequence Determination of the Brain Peptide
Leucine Enkephalin A group of peptides that influence
nerve transmission in certain parts of the brain has been iso-
lated from normal brain tissue. These peptides are known as
opioids, because they bind to specific receptors that also bind
opiate drugs, such as morphine and naloxone. Opioids thus
mimic some of the properties of opiates. Some researchers
consider these peptides to be the brain’s own pain killers. Us-
ing the information below, determine the amino acid sequence
of the opioid leucine enkephalin. Explain how your structure
is consistent with each piece of information.
(a) Complete hydrolysis by 6 M HCl at 110 °C followed
by amino acid analysis indicated the presence of Gly, Leu,
Phe, and Tyr, in a 2:1:1:1 molar ratio.
(b) Treatment of the peptide with 1-fluoro-2,4-dini-
trobenzene followed by complete hydrolysis and chromatog-
raphy indicated the presence of the 2,4-dinitrophenyl deriv-
ative of tyrosine. No free tyrosine could be found.
(c) Complete digestion of the peptide with pepsin fol-
lowed by chromatography yielded a dipeptide containing Phe
and Leu, plus a tripeptide containing Tyr and Gly in a 1:2 ratio.
16. Structure of a Peptide Antibiotic from Bacillus bre-
vis Extracts from the bacterium Bacillus brevis contain a
peptide with antibiotic properties. This peptide forms com-
plexes with metal ions and apparently disrupts ion transport
across the cell membranes of other bacterial species, killing
them. The structure of the peptide has been determined from
the following observations.
(a) Complete acid hydrolysis of the peptide followed by
amino acid analysis yielded equimolar amounts of Leu, Orn,
Phe, Pro, and Val. Orn is ornithine, an amino acid not present
in proteins but present in some peptides. It has the structure
(b) The molecular weight of the peptide was estimated
as about 1,200.
(c) The peptide failed to undergo hydrolysis when
treated with the enzyme carboxypeptidase. This enzyme cat-
alyzes the hydrolysis of the carboxyl-terminal residue of a
polypeptide unless the residue is Pro or, for some reason,
does not contain a free carboxyl group.
(d) Treatment of the intact peptide with 1-fluoro-2,4-
dinitrobenzene, followed by complete hydrolysis and chro-
matography, yielded only free amino acids and the following
derivative:
(Hint: Note that the 2,4-dinitrophenyl derivative involves the
amino group of a side chain rather than the H9251-amino group.)
(e) Partial hydrolysis of the peptide followed by chro-
matographic separation and sequence analysis yielded the fol-
lowing di- and tripeptides (the amino-terminal amino acid is
always at the left):
Leu–Phe Phe–Pro Orn–Leu Val–Orn
Val–Orn–Leu Phe–Pro–Val Pro–Val–Orn
Given the above information, deduce the amino acid sequence
of the peptide antibiotic. Show your reasoning. When you
have arrived at a structure, demonstrate that it is consistent
with each experimental observation.
17. Efficiency in Peptide Sequencing A peptide with the
primary structure Lys–Arg–Pro–Leu–Ile–Asp–Gly–Ala is se-
quenced by the Edman procedure. If each Edman cycle is
96% efficient, what percentage of the amino acids liberated
in the fourth cycle will be leucine? Do the calculation a sec-
ond time, but assume a 99% efficiency for each cycle.
18. Biochemistry Protocols: Your First Protein Purifi-
cation As the newest and least experienced student in a
biochemistry research lab, your first few weeks are spent
washing glassware and labeling test tubes. You then graduate
to making buffers and stock solutions for use in various lab-
oratory procedures. Finally, you are given responsibility for
purifying a protein. It is a citric acid cycle enzyme, citrate
synthase, located in the mitochondrial matrix. Following a
protocol for the purification, you proceed through the steps
below. As you work, a more experienced student questions
you about the rationale for each procedure. Supply the an-
swers. (Hint: See Chapter 2 for information about osmolar-
ity; see p. 6 for information on separation of organelles from
cells.)
(a) You pick up 20 kg of beef hearts from a nearby
slaughterhouse. You transport the hearts on ice, and perform
each step of the purification on ice or in a walk-in cold room.
You homogenize the beef heart tissue in a high-speed blender
in a medium containing 0.2 M sucrose, buffered to a pH of 7.2.
Why do you use beef heart tissue, and in such large quantity?
What is the purpose of keeping the tissue cold and
suspending it in 0.2 M sucrose, at pH 7.2? What happens
to the tissue when it is homogenized?
[Structures for Problem 16. Ornithine, part (a):
⁺H₃N–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻
2,4-Dinitrophenyl derivative, part (d):
2,4-(O₂N)₂C₆H₃–NH–CH₂–CH₂–CH₂–CH(NH₃⁺)–COO⁻]
Procedure                          Total protein (mg)  Activity (units)
1. Crude extract                               20,000         4,000,000
2. Precipitation (salt)                         5,000         3,000,000
3. Precipitation (pH)                           4,000         1,000,000
4. Ion-exchange chromatography                    200           800,000
5. Affinity chromatography                         50           750,000
6. Size-exclusion chromatography                   45           675,000
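The purification table printed with these problems (crude extract through size-exclusion chromatography) lends itself to a short calculation. The sketch below assumes the usual definitions: specific activity is activity divided by total protein, fold purification is specific activity relative to the crude extract, and yield is activity retained relative to the crude extract. The function and key names are illustrative only.

```python
# Values copied from the purification table:
# (procedure, total protein in mg, activity in units)
steps = [
    ("Crude extract", 20_000, 4_000_000),
    ("Precipitation (salt)", 5_000, 3_000_000),
    ("Precipitation (pH)", 4_000, 1_000_000),
    ("Ion-exchange chromatography", 200, 800_000),
    ("Affinity chromatography", 50, 750_000),
    ("Size-exclusion chromatography", 45, 675_000),
]


def summarize(table):
    """Return one dict per step with specific activity, fold purification, and yield."""
    _, crude_protein, crude_activity = table[0]
    crude_specific = crude_activity / crude_protein
    rows = []
    for name, protein_mg, activity_units in table:
        specific = activity_units / protein_mg  # units per mg protein
        rows.append({
            "procedure": name,
            "specific_activity": specific,
            "fold_purification": specific / crude_specific,
            "yield_percent": 100.0 * activity_units / crude_activity,
        })
    return rows


for row in summarize(steps):
    print(f"{row['procedure']:<32}"
          f"{row['specific_activity']:>10.0f} units/mg"
          f"{row['fold_purification']:>8.2f}x"
          f"{row['yield_percent']:>9.2f}%")
```

With these numbers the specific activity rises from 200 units/mg in the crude extract to 15,000 units/mg after affinity chromatography and does not change in the final step, which is one way to judge whether a step is still removing contaminating protein.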
(b) You subject the resulting heart homogenate, which
is dense and opaque, to a series of differential centrifugation
steps. What does this accomplish?
(c) You proceed with the purification using the super-
natant fraction that contains mostly intact mitochondria. Next
you osmotically lyse the mitochondria. The lysate, which is
less dense than the homogenate, but still opaque, consists
primarily of mitochondrial membranes and internal mito-
chondrial contents. To this lysate you add ammonium sulfate,
a highly soluble salt, to a specific concentration. You cen-
trifuge the solution, decant the supernatant, and discard the
pellet. To the supernatant, which is clearer than the lysate,
you add more ammonium sulfate. Once again, you centrifuge
the sample, but this time you save the pellet because it con-
tains the protein of interest. What is the rationale for the
two-step addition of the salt?
(d) You solubilize the ammonium sulfate pellet contain-
ing the mitochondrial proteins and dialyze it overnight against
large volumes of buffered (pH 7.2) solution. Why isn’t am-
monium sulfate included in the dialysis buffer? Why do
you use the buffer solution instead of water?
(e) You run the dialyzed solution over a size-exclusion
chromatographic column. Following the protocol, you collect
the first protein fraction that exits the column, and discard
the rest of the fractions that elute from the column later. You
detect the protein by measuring UV absorbance (at 280 nm)
in the fractions. What does the instruction to collect the
first fraction tell you about the protein? Why is UV ab-
sorbance at 280 nm a good way to monitor for the pres-
ence of protein in the eluted fractions?
(f) You place the fraction collected in (e) on a cation-
exchange chromatographic column. After discarding the ini-
tial solution that exits the column (the flowthrough), you add
a washing solution of higher pH to the column and collect the
protein fraction that immediately elutes. Explain what you
are doing.
(g) You run a small sample of your fraction, now very
reduced in volume and quite clear (though tinged pink), on
an isoelectric focusing gel. When stained, the gel shows three
sharp bands. According to the protocol, the protein of inter-
est is the one with the pI of 5.6, but you decide to do one
more assay of the protein’s purity. You cut out the pI 5.6 band
and subject it to SDS polyacrylamide gel electrophoresis. The
protein resolves as a single band. Why were you uncon-
vinced of the purity of the “single” protein band on your
isoelectric focusing gel? What did the results of the SDS
gel tell you? Why is it important to do the SDS gel elec-
trophoresis after the isoelectric focusing?
chapter
THE THREE-DIMENSIONAL
STRUCTURE OF PROTEINS
4.1 Overview of Protein Structure
4.2 Protein Secondary Structure
4.3 Protein Tertiary and Quaternary Structures
4.4 Protein Denaturation and Folding
Perhaps the more remarkable features of [myoglobin] are
its complexity and its lack of symmetry. The arrangement
seems to be almost totally lacking in the kind of regulari-
ties which one instinctively anticipates, and it is more
complicated than has been predicted by any theory of
protein structure.
—John Kendrew, article in Nature, 1958
4
The covalent backbone of a typical protein contains
hundreds of individual bonds. Because free rotation
is possible around many of these bonds, the protein can
assume an unlimited number of conformations. How-
ever, each protein has a specific chemical or structural
function, strongly suggesting that each has a unique
three-dimensional structure (Fig. 4–1). By the late
1920s, several proteins had been crystallized, including
hemoglobin (M_r 64,500) and the enzyme urease (M_r
483,000). Given that the ordered array of molecules in
a crystal can generally form only if the molecular units
are identical, the simple fact that many proteins can be
crystallized provides strong evidence that even very
large proteins are discrete chemical entities with unique
structures. This conclusion revolutionized thinking
about proteins and their functions.
In this chapter, we explore the three-dimensional
structure of proteins, emphasizing five themes. First,
the three-dimensional structure of a protein is deter-
mined by its amino acid sequence. Second, the function
of a protein depends on its structure. Third, an isolated
protein usually exists in one or a small number of sta-
ble structural forms. Fourth, the most important forces
stabilizing the specific structures maintained by a given
protein are noncovalent interactions. Finally, amid the
huge number of unique protein structures, we can rec-
ognize some common structural patterns that help us
organize our understanding of protein architecture.
These themes should not be taken to imply that pro-
teins have static, unchanging three-dimensional struc-
tures. Protein function often entails an interconversion
between two or more structural forms. The dynamic as-
pects of protein structure will be explored in Chapters
5 and 6.
The relationship between the amino acid sequence
of a protein and its three-dimensional structure is an in-
tricate puzzle that is gradually yielding to techniques
used in modern biochemistry. An understanding of
structure, in turn, is essential to the discussion of func-
tion in succeeding chapters. We can find and understand
the patterns within the biochemical labyrinth of protein
structure by applying fundamental principles of chem-
istry and physics.
4.1 Overview of Protein Structure
The spatial arrangement of atoms in a protein is called
its conformation. The possible conformations of a pro-
tein include any structural state that can be achieved
without breaking covalent bonds. A change in confor-
mation could occur, for example, by rotation about sin-
gle bonds. Of the numerous conformations that are
theoretically possible in a protein containing hundreds
of single bonds, one or (more commonly) a few gener-
ally predominate under biological conditions. The need
for multiple stable conformations reflects the changes
that must occur in most proteins as they bind to other
molecules or catalyze reactions. The conformations ex-
isting under a given set of conditions are usually the
ones that are thermodynamically the most stable, hav-
ing the lowest Gibbs free energy (G). Proteins in any of
their functional, folded conformations are called native
proteins.
What principles determine the most stable confor-
mations of a protein? An understanding of protein con-
formation can be built stepwise from the discussion of
primary structure in Chapter 3 through a consideration
of secondary, tertiary, and quaternary structures. To this
traditional approach must be added a new emphasis on
supersecondary structures, a growing set of known and
classifiable protein folding patterns that provides an im-
portant organizational context to this complex endeavor.
We begin by introducing some guiding principles.
A Protein’s Conformation Is Stabilized Largely
by Weak Interactions
In the context of protein structure, the term stability
can be defined as the tendency to maintain a native con-
formation. Native proteins are only marginally stable;
the ΔG separating the folded and unfolded states in typical
proteins under physiological conditions is in the
range of only 20 to 65 kJ/mol. A given polypeptide chain
can theoretically assume countless different conforma-
tions, and as a result the unfolded state of a protein is
characterized by a high degree of conformational en-
tropy. This entropy, and the hydrogen-bonding interac-
tions of many groups in the polypeptide chain with sol-
vent (water), tend to maintain the unfolded state. The
chemical interactions that counteract these effects and
stabilize the native conformation include disulfide bonds
and the weak (noncovalent) interactions described in
Chapter 2: hydrogen bonds, and hydrophobic and ionic
interactions. An appreciation of the role of these weak
interactions is especially important to our understand-
ing of how polypeptide chains fold into specific sec-
ondary and tertiary structures, and how they combine
with other polypeptides to form quaternary structures.
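To see what "marginally stable" means in numbers, the sketch below converts the quoted 20 to 65 kJ/mol into an equilibrium ratio of folded to unfolded molecules. It assumes a simple two-state equilibrium (folded and unfolded only) and a temperature of 310 K (37 °C); both are simplifying assumptions made here, not statements from the text, and the function names are illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def folded_to_unfolded_ratio(delta_g_kj_per_mol, temperature_k=310.0):
    """Two-state estimate of [folded]/[unfolded].

    delta_g_kj_per_mol is the free-energy difference by which the folded
    state is favored (a positive number, in kJ/mol).
    """
    return math.exp(delta_g_kj_per_mol * 1000.0 / (R * temperature_k))


def fraction_unfolded(delta_g_kj_per_mol, temperature_k=310.0):
    """Fraction of molecules unfolded at equilibrium under the same assumptions."""
    ratio = folded_to_unfolded_ratio(delta_g_kj_per_mol, temperature_k)
    return 1.0 / (1.0 + ratio)


for dg in (20.0, 65.0):
    print(f"dG = {dg:4.0f} kJ/mol: folded/unfolded = "
          f"{folded_to_unfolded_ratio(dg):.3g}, "
          f"fraction unfolded = {fraction_unfolded(dg):.3g}")
```

Even at the low end of the range the folded state is strongly favored, yet the total is comparable to the energy of just a few weak interactions, each of which can be disrupted by 4 to 30 kJ/mol.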
About 200 to 460 kJ/mol are required to break a sin-
gle covalent bond, whereas weak interactions can be dis-
rupted by a mere 4 to 30 kJ/mol. Individual covalent
bonds that contribute to the native conformations of
proteins, such as disulfide bonds linking separate parts
of a single polypeptide chain, are clearly much stronger
than individual weak interactions. Yet, because they are
so numerous, it is weak interactions that predominate
as a stabilizing force in protein structure. In general, the
protein conformation with the lowest free energy (that
is, the most stable conformation) is the one with the
maximum number of weak interactions.
The stability of a protein is not simply the sum of
the free energies of formation of the many weak inter-
actions within it. Every hydrogen-bonding group in a
folded polypeptide chain was hydrogen-bonded to wa-
ter prior to folding, and for every hydrogen bond formed
in a protein, a hydrogen bond (of similar strength) be-
tween the same group and water was broken. The net
stability contributed by a given weak interaction, or the
difference in free energies of the folded and unfolded
states, may be close to zero. We must therefore look
elsewhere to explain why the native conformation of a
protein is favored.
We find that the contribution of weak interactions
to protein stability can be understood in terms of the
properties of water (Chapter 2). Pure water contains a
network of hydrogen-bonded H₂O molecules. No other
molecule has the hydrogen-bonding potential of water,
and other molecules present in an aqueous solution dis-
rupt the hydrogen bonding of water. When water sur-
rounds a hydrophobic molecule, the optimal arrange-
ment of hydrogen bonds results in a highly structured
shell, or solvation layer, of water in the immediate
vicinity. The increased order of the water molecules in
the solvation layer correlates with an unfavorable de-
crease in the entropy of the water. However, when non-
polar groups are clustered together, there is a decrease
in the extent of the solvation layer because each group
no longer presents its entire surface to the solution. The
result is a favorable increase in entropy. As described in
Chapter 2, this entropy term is the major thermodynamic
driving force for the association of hydrophobic
groups in aqueous solution. Hydrophobic amino acid
side chains therefore tend to be clustered in a protein’s
interior, away from water.
FIGURE 4–1 Structure of the enzyme chymotrypsin, a globular
protein. Proteins are large molecules and, as we shall see, each has a
unique structure. A molecule of glycine (blue) is shown for size
comparison. The known three-dimensional structures of proteins are
archived in the Protein Data Bank, or PDB (www.rcsb.org/pdb). Each
structure is assigned a unique four-character identifier, or PDB ID.
Where appropriate, we will provide the PDB IDs for molecular graphic
images in the figure captions. The image shown here was made using
data from the PDB file 6GCH. The data from the PDB files provide
only a series of coordinates detailing the location of atoms and their
connectivity. Viewing the images requires easy-to-use graphics
programs such as RasMol and Chime that convert the coordinates into
an image and allow the viewer to manipulate the structure in three
dimensions. You will find instructions for downloading Chime with
the structure tutorials on the textbook website
(www.whfreeman.com/lehninger). The PDB website has instructions for
downloading other viewers. We encourage all students to take advantage
of the resources of the PDB and the free molecular graphics programs.
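The caption to Figure 4–1 notes that a PDB file is essentially a list of atomic coordinates. As a rough illustration, the sketch below reads the fixed-width ATOM and HETATM records of the legacy PDB text format. The column positions follow that format's published layout; the helper make_atom_line and the two sample atoms (including their coordinates) are hypothetical and exist only to make the example self-contained.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by its fixed columns (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),       # columns 7-11
        name=line[12:16].strip(),     # columns 13-16
        res_name=line[17:20].strip(), # columns 18-20
        chain_id=line[21],            # column 22
        res_seq=int(line[22:26]),     # columns 23-26
        x=float(line[30:38]),         # columns 31-38, in angstroms
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )


def read_atoms(pdb_text: str) -> List[Atom]:
    """Collect every ATOM/HETATM record, skipping all other record types."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


def make_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Hypothetical helper: build a record with the same column layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain_id}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates, for illustration only.
sample = "\n".join([
    "REMARK  example records, not taken from a real entry",
    make_atom_line(1, "N", "GLY", "A", 1, 13.200, 20.100, 22.300),
    make_atom_line(2, "CA", "GLY", "A", 1, 14.500, 20.700, 22.900),
])

for atom in read_atoms(sample):
    print(atom)
```

Real entries contain many more record types (for example header, connectivity, and crystallographic records), which this sketch simply ignores.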
Under physiological conditions, the formation of
hydrogen bonds and ionic interactions in a protein is
driven largely by this same entropic effect. Polar groups
can generally form hydrogen bonds with water and
hence are soluble in water. However, the number of hy-
drogen bonds per unit mass is generally greater for pure
water than for any other liquid or solution, and there
are limits to the solubility of even the most polar mole-
cules as their presence causes a net decrease in hydro-
gen bonding per unit mass. Therefore, a solvation shell
of structured water will also form to some extent around
polar molecules. Even though the energy of formation
of an intramolecular hydrogen bond or ionic interaction
between two polar groups in a macromolecule is largely
canceled out by the elimination of such interactions be-
tween the same groups and water, the release of struc-
tured water when the intramolecular interaction is
formed provides an entropic driving force for folding.
Most of the net change in free energy that occurs when
weak interactions are formed within a protein is there-
fore derived from the increased entropy in the sur-
rounding aqueous solution resulting from the burial of
hydrophobic surfaces. This more than counterbalances
the large loss of conformational entropy as a polypep-
tide is constrained into a single folded conformation.
Hydrophobic interactions are clearly important in
stabilizing a protein conformation; the interior of a pro-
tein is generally a densely packed core of hydrophobic
amino acid side chains. It is also important that any po-
lar or charged groups in the protein interior have suit-
able partners for hydrogen bonding or ionic interactions.
One hydrogen bond seems to contribute little to the
stability of a native structure, but the presence of
hydrogen-bonding or charged groups without partners
in the hydrophobic core of a protein can be so destabi-
lizing that conformations containing these groups are
often thermodynamically untenable. The favorable free-
energy change realized by combining such a group with
a partner in the surrounding solution can be greater than
the difference in free energy between the folded and
unfolded states. In addition, hydrogen bonds between
groups in proteins form cooperatively. Formation of one
hydrogen bond facilitates the formation of additional hy-
drogen bonds. The overall contribution of hydrogen
bonds and other noncovalent interactions to the stabi-
lization of protein conformation is still being evaluated.
The interaction of oppositely charged groups that form
an ion pair (salt bridge) may also have a stabilizing effect
on one or more native conformations of some proteins.
Most of the structural patterns outlined in this chap-
ter reflect two simple rules: (1) hydrophobic residues
are largely buried in the protein interior, away from wa-
ter; and (2) the number of hydrogen bonds within the
protein is maximized. Insoluble proteins and proteins
within membranes (which we examine in Chapter 11)
follow somewhat different rules because of their func-
tion or their environment, but weak interactions are still
critical structural elements.
The Peptide Bond Is Rigid and Planar
Covalent bonds also
place important constraints on the conformation of a
polypeptide. In the late 1930s, Linus Pauling and Robert
Corey embarked on a series of studies that laid the
foundation for our present understanding of protein
structure. They began with a careful analysis of the peptide
bond. The α carbons of adjacent amino acid residues
are separated by three covalent bonds, arranged as
Cα–C–N–Cα. X-ray diffraction studies of crystals of
amino acids and of simple dipeptides and tripeptides
demonstrated that the peptide C–N bond is somewhat
shorter than the C–N bond in a simple amine and that
the atoms associated with the peptide bond are co-
planar. This indicated a resonance or partial sharing of
two pairs of electrons between the carbonyl oxygen and
the amide nitrogen (Fig. 4–2a). The oxygen has a par-
tial negative charge and the nitrogen a partial positive
charge, setting up a small electric dipole. The six atoms
of the peptide group lie in a single plane, with the oxy-
gen atom of the carbonyl group and the hydrogen atom
of the amide nitrogen trans to each other. From these
findings Pauling and Corey concluded that the peptide
C–N bonds are unable to rotate freely because of their
partial double-bond character. Rotation is permitted
about the N–Cα and the Cα–C bonds. The backbone
of a polypeptide chain can thus be pictured as a series
of rigid planes with consecutive planes sharing a
common point of rotation at Cα (Fig. 4–2b). The rigid
peptide bonds limit the range of conformations that can be
assumed by a polypeptide chain.
By convention, the dihedral (torsion) angles resulting from
rotations at Cα are labeled φ (phi) for the N–Cα bond
and ψ (psi) for the Cα–C bond. Again by convention,
both φ and ψ are defined as 180° when the polypeptide
is in its fully extended conformation and all peptide
groups are in the same plane (Fig. 4–2b). In principle,
φ and ψ can have any value between −180° and +180°,
but many values are prohibited by steric interference
between atoms in the polypeptide backbone and amino
acid side chains. The conformation in which both φ and
ψ are 0° (Fig. 4–2c) is prohibited for this reason; this
conformation is used merely as a reference point for
describing the angles of rotation. Allowed values for φ and
ψ are graphically revealed when ψ is plotted versus φ in
a Ramachandran plot (Fig. 4–3), introduced by G. N.
Ramachandran.
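Given atomic coordinates, φ and ψ can be computed as the signed angle between two planes that share a bond. The sketch below is one common way to do this, using only the standard library. It assumes the usual atom order: C of the preceding residue, N, Cα, C for φ, and N, Cα, C, N of the following residue for ψ. The function names are illustrative, and the test points in the usage lines are arbitrary coordinates, not data from a protein.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed rotation about the p1-p2 bond, in degrees, in the range -180 to +180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, c_alpha, c):
    """Rotation about the N-C(alpha) bond."""
    return dihedral_degrees(c_prev, n, c_alpha, c)


def psi(n, c_alpha, c, n_next):
    """Rotation about the C(alpha)-C bond."""
    return dihedral_degrees(n, c_alpha, c, n_next)


# Arbitrary test points: the outer atoms on opposite sides of the central
# bond (a planar, extended arrangement) and on the same side.
extended = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
eclipsed = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(abs(extended), eclipsed)
```

The planar, extended arrangement corresponds to the 180° convention described above, and the eclipsed arrangement to the 0° reference point.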
[Figure 4–2 panels: (a) resonance structures of the peptide bond, with
partial charges δ− on the carbonyl oxygen and δ+ on the amide nitrogen.
(b) A polypeptide segment drawn from the amino terminus to the carboxyl
terminus, with the rotations φ (N–Cα) and ψ (Cα–C) marked; bond lengths:
C=O 1.24 Å, C–N 1.32 Å, N–Cα 1.46 Å, Cα–C 1.53 Å. (c) The reference
conformation in which φ and ψ are both 0°.]
The carbonyl oxygen has a partial negative
charge and the amide nitrogen a partial positive
charge, setting up a small electric dipole.
Virtually all peptide bonds in proteins occur in
this trans configuration; an exception is noted in
Figure 4–8b.
FIGURE 4–2 The planar peptide group. (a) Each peptide bond has
some double-bond character due to resonance and cannot rotate.
(b) Three bonds separate sequential α carbons in a polypeptide
chain. The N–Cα and Cα–C bonds can rotate, with dihedral angles
designated φ and ψ, respectively. The peptide C–N bond is not free
to rotate. Other single bonds in the backbone may also be
rotationally hindered, depending on the size and charge of the R
groups. In the conformation shown, φ and ψ are 180° (or −180°).
As one looks out from the α carbon, the ψ and φ angles increase as
the carbonyl or amide nitrogens (respectively) rotate clockwise.
(c) By convention, both φ and ψ are defined as 0° when the two
peptide bonds flanking that α carbon are in the same plane and
positioned as shown. In a protein, this conformation is prohibited
by steric overlap between an α-carbonyl oxygen and an α-amino
hydrogen atom. To illustrate the bonds between atoms, the balls
representing each atom are smaller than the van der Waals radii for
this scale. 1 Å = 0.1 nm.
[Figure 4–3 plot axes: ψ (degrees) on the vertical axis and φ (degrees)
on the horizontal axis, each running from −180 to +180.]
FIGURE 4–3 Ramachandran plot for L-Ala residues. The
conformations of peptides are defined by the values of φ and ψ.
Conformations deemed possible are those that involve little or no
steric interference, based on calculations using known van der
Waals radii and bond angles. The areas shaded dark blue reflect
conformations that involve no steric overlap and thus are fully
allowed; medium blue indicates conformations allowed at the
extreme limits for unfavorable atomic contacts; the lightest blue
area reflects conformations that are permissible if a little flexibility is
allowed in the bond angles. The asymmetry of the plot results from
the L stereochemistry of the amino acid residues. The plots for other
L-amino acid residues with unbranched side chains are nearly
identical. The allowed ranges for branched amino acid residues
such as Val, Ile, and Thr are somewhat smaller than for Ala. The Gly
residue, which is less sterically hindered, exhibits a much broader
range of allowed conformations. The range for Pro residues is
greatly restricted because φ is limited by the cyclic side chain to the
range of −35° to −85°.
SUMMARY 4.1 Overview of Protein Structure
■ Every protein has a three-dimensional structure
that reflects its function.
■ Protein structure is stabilized by multiple weak
interactions. Hydrophobic interactions are the
major contributors to stabilizing the globular
form of most soluble proteins; hydrogen bonds
and ionic interactions are optimized in the
specific structures that are thermodynamically
most stable.
■ The nature of the covalent bonds in the
polypeptide backbone places constraints on
structure. The peptide bond has a partial double-
bond character that keeps the entire six-atom
peptide group in a rigid planar configuration.
The N–Cα and Cα–C bonds can rotate to
assume dihedral angles of φ and ψ, respectively.
4.2 Protein Secondary Structure
The term secondary structure refers to the local con-
formation of some part of a polypeptide. The discussion
of secondary structure most usefully focuses on com-
mon regular folding patterns of the polypeptide back-
bone. A few types of secondary structure are particu-
larly stable and occur widely in proteins. The most
prominent are the α helix and β conformations
described below. Using fundamental chemical principles
and a few experimental observations, Pauling and Corey
predicted the existence of these secondary structures
in 1951, several years before the first complete protein
structure was elucidated.
The α Helix Is a Common Protein
Secondary Structure
Pauling and Corey were
aware of the importance of hydrogen bonds in orienting
polar chemical groups such as the C=O and N–H
groups of the peptide bond. They also had the experimental
results of William Astbury, who in the 1930s had
conducted pioneering x-ray studies of proteins. Astbury
demonstrated that the protein that makes up hair and
porcupine quills (the fibrous protein α-keratin) has a
regular structure that repeats every 5.15 to 5.2 Å. (The
angstrom, Å, named after the physicist Anders J.
Ångström, is equal to 0.1 nm. Although not an SI unit,
it is used universally by structural biologists to describe
atomic distances.) With this information and their data
on the peptide bond, and with the help of precisely
constructed models, Pauling and Corey set out to determine
the likely conformations of protein molecules.
The simplest arrangement the polypeptide chain
could assume with its rigid peptide bonds (but other
single bonds free to rotate) is a helical structure, which
Pauling and Corey called the α helix (Fig. 4–4). In this
structure the polypeptide backbone is tightly wound
around an imaginary axis drawn longitudinally through
the middle of the helix, and the R groups of the amino
acid residues protrude outward from the helical backbone.
The repeating unit is a single turn of the helix,
which extends about 5.4 Å along the long axis, slightly
greater than the periodicity Astbury observed on x-ray
analysis of hair keratin. The amino acid residues in an
α helix have conformations with ψ = −45° to −50° and
φ = −60°, and each helical turn includes 3.6 amino acid
residues. The helical twist of the α helix found in all
proteins is right-handed (Box 4–1). The α helix proved to
be the predominant structure in α-keratins. More generally,
about one-fourth of all amino acid residues in
polypeptides are found in α helices, the exact fraction
varying greatly from one protein to the next.
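The two numbers given above, 5.4 Å per turn and 3.6 residues per turn, fix the rest of the helix geometry. The short sketch below derives the rise and the rotation per residue and uses them to estimate the length of a helical segment along its axis. The constant and function names are illustrative, and the 20-residue segment is an arbitrary example.

```python
RESIDUES_PER_TURN = 3.6   # from the text
PITCH_ANGSTROM = 5.4      # length of one turn along the helix axis, from the text

# Distance advanced along the axis per residue (about 1.5 angstroms).
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN

# Rotation about the axis per residue (about 100 degrees).
ROTATION_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN


def helix_length_angstrom(n_residues):
    """Approximate length along the axis of an ideal alpha-helical segment."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of helical turns in an ideal segment of n_residues."""
    return n_residues / RESIDUES_PER_TURN


print(f"rise per residue: {RISE_PER_RESIDUE:.2f} angstroms")
print(f"rotation per residue: {ROTATION_PER_RESIDUE:.0f} degrees")
print(f"20 residues: {helix_length_angstrom(20):.1f} angstroms, "
      f"{helix_turns(20):.2f} turns")
```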
Why does the α helix form more readily than many
other possible conformations? The answer is, in part,
that an α helix makes optimal use of internal hydrogen
bonds. The structure is stabilized by a hydrogen bond
between the hydrogen atom attached to the electronegative
nitrogen atom of a peptide linkage and the
electronegative carbonyl oxygen atom of the fourth
amino acid on the amino-terminal side of that peptide
bond (Fig. 4–4b). Within the α helix, every peptide bond
(except those close to each end of the helix) participates
in such hydrogen bonding. Each successive turn
of the α helix is held to adjacent turns by three to four
hydrogen bonds. All the hydrogen bonds combined give
the entire helical structure considerable stability.
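The pairing rule described above can be written down directly: the N–H of residue n donates a hydrogen bond to the C=O of residue n − 4. The sketch below lists these pairs for an idealized helical segment. It assumes residues are numbered from 1 within the segment and ignores any capping interactions with residues outside it; the function name is illustrative.

```python
def helix_hydrogen_bonds(n_residues):
    """List (donor, acceptor) residue numbers for an ideal alpha-helical segment.

    The donor is the residue whose backbone N-H participates; the acceptor is
    the residue four positions toward the amino terminus, whose backbone C=O
    participates. Residues are numbered from 1 within the segment.
    """
    return [(donor, donor - 4) for donor in range(5, n_residues + 1)]


for donor, acceptor in helix_hydrogen_bonds(10):
    print(f"N-H of residue {donor:2d}  ->  C=O of residue {acceptor:2d}")
```

In this idealized count, the first four N–H groups and the last four C=O groups of the segment have no partner within the helix, which is why residues near the ends do not participate fully in the hydrogen-bonding pattern.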
Further model-building experiments have shown
that an α helix can form in polypeptides consisting of
either L- or D-amino acids. However, all residues must
be of one stereoisomeric series; a D-amino acid will
disrupt a regular structure consisting of L-amino acids, and
vice versa. Naturally occurring L-amino acids can form
either right- or left-handed α helices, but extended
left-handed helices have not been observed in proteins.
Linus Pauling, 1901–1994 Robert Corey, 1897–1971
Amino Acid Sequence Affects α Helix Stability
Not all polypeptides can form a stable α helix.
Interactions between amino acid side chains can stabilize or
destabilize this structure. For example, if a polypeptide
chain has a long block of Glu residues, this segment of
the chain will not form an α helix at pH 7.0. The
negatively charged carboxyl groups of adjacent Glu residues
repel each other so strongly that they prevent
formation of the α helix. For the same reason, if there are
many adjacent Lys and/or Arg residues, which have
positively charged R groups at pH 7.0, they will also repel
each other and prevent formation of the α helix. The
bulk and shape of Asn, Ser, Thr, and Cys residues can
also destabilize an α helix if they are close together in
the chain.
The twist of an α helix ensures that critical
interactions occur between an amino acid side chain and the
side chain three (and sometimes four) residues away on
either side of it (Fig. 4–5). Positively charged amino
acids are often found three residues away from nega-
tively charged amino acids, permitting the formation of
an ion pair. Two aromatic amino acid residues are often
similarly spaced, resulting in a hydrophobic interaction.
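The spacing of three or four residues follows from the helix geometry: at 3.6 residues per turn, each residue is rotated about 100° around the axis from its neighbor. The sketch below, which assumes an ideal helix with exactly that rotation, computes how far around the helix residue i + k sits relative to residue i. The function name is illustrative.

```python
def angular_offset(k, degrees_per_residue=100.0):
    """Angle of residue i+k around the helix axis relative to residue i.

    The result is folded into the range -180 to +180 degrees, so values near
    zero mean the two side chains project from the same face of the helix.
    """
    angle = (k * degrees_per_residue) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


for k in range(1, 8):
    print(f"i+{k}: {angular_offset(k):+6.0f} degrees")
```

Residues i + 3 and i + 4 fall within 60° of residue i, on the same face of the helix, whereas i + 1 and i + 2 point in quite different directions. This is consistent with the ion pairs and aromatic pairs described above.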
[Figure 4–4 labels: panels (a) through (d); atom key: carbon, hydrogen,
oxygen, nitrogen, R group; amino terminus and carboxyl terminus marked;
one turn of the helix spans 5.4 Å (3.6 residues).]
FIGURE 4–4 Four models of the α helix, showing different aspects
of its structure. (a) Formation of a right-handed α helix. The planes
of the rigid peptide bonds are parallel to the long axis of the helix,
depicted here as a vertical rod. (b) Ball-and-stick model of a
right-handed α helix, showing the intrachain hydrogen bonds. The repeat
unit is a single turn of the helix, 3.6 residues. (c) The α helix as viewed
from one end, looking down the longitudinal axis (derived from PDB
ID 4TNC). Note the positions of the R groups, represented by purple
spheres. This ball-and-stick model, used to emphasize the helical
arrangement, gives the false impression that the helix is hollow,
because the balls do not represent the van der Waals radii of the
individual atoms. As the space-filling model (d) shows, the atoms in the
center of the α helix are in very close contact.
FIGURE 4–5 Interactions between R groups of amino acids three
residues apart in an α helix. An ionic interaction between Asp¹⁰⁰ and
Arg¹⁰³ in an α-helical region of the protein troponin C, a
calcium-binding protein associated with muscle, is shown in this
space-filling model (derived from PDB ID 4TNC). The polypeptide backbone
(carbons, α-amino nitrogens, and α-carbonyl oxygens) is shown in gray
for a helix segment 13 residues long. The only side chains represented
here are the interacting Asp (red) and Arg (blue) side chains.
A constraint on the formation of the α helix is the
presence of Pro or Gly residues. In proline, the nitrogen
atom is part of a rigid ring (see Fig. 4–8b), and rotation
about the N–Cα bond is not possible. Thus, a Pro
residue introduces a destabilizing kink in an α helix. In
addition, the nitrogen atom of a Pro residue in peptide
linkage has no substituent hydrogen to participate in
hydrogen bonds with other residues. For these reasons,
proline is only rarely found within an α helix. Glycine
occurs infrequently in α helices for a different reason:
it has more conformational flexibility than the other
amino acid residues. Polymers of glycine tend to take
up coiled structures quite different from an α helix.
A final factor affecting the stability of an α helix in
a polypeptide is the identity of the amino acid residues
near the ends of the α-helical segment. A small electric
dipole exists in each peptide bond (Fig. 4–2a). These
dipoles are connected through the hydrogen bonds of
the helix, resulting in a net dipole extending along the
helix that increases with helix length (Fig. 4–6). The
four amino acid residues at each end of the helix do not
participate fully in the helix hydrogen bonds. The partial
positive and negative charges of the helix dipole actually
reside on the peptide amino and carbonyl groups
near the amino-terminal and carboxyl-terminal ends of
the helix, respectively. For this reason, negatively
charged amino acids are often found near the amino
terminus of the helical segment, where they have a
stabilizing interaction with the positive charge of the helix
dipole; a positively charged amino acid at the
amino-terminal end is destabilizing. The opposite is true at the
carboxyl-terminal end of the helical segment.
Thus, five different kinds of constraints affect the
stability of an α helix: (1) the electrostatic repulsion (or
attraction) between successive amino acid residues with
charged R groups, (2) the bulkiness of adjacent R
groups, (3) the interactions between R groups spaced
three (or four) residues apart, (4) the occurrence of Pro
and Gly residues, and (5) the interaction between amino
acid residues at the ends of the helical segment and the
electric dipole inherent to the α helix. The tendency of
a given segment of a polypeptide chain to fold up as an
α helix therefore depends on the identity and sequence
of amino acid residues within the segment.
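As an illustration only, four of the five constraints listed above can be turned into a rough sequence screen. The sketch below is not a structure-prediction method. The function name helix_flags, the one-letter residue codes, the choice to treat only Asp/Glu and Lys/Arg as charged, and the four-residue end window are all assumptions made for this example. Constraint (2), side-chain bulk, is omitted because it would need side-chain size data not given here.

```python
NEGATIVE = set("DE")  # Asp, Glu (one-letter codes)
POSITIVE = set("KR")  # Lys, Arg; His is left out because its charge depends on pH


def helix_flags(segment, end_window=4):
    """Count sequence features that the text links to alpha-helix stability.

    Hypothetical helper for illustration; it reports features and does not
    predict whether the segment actually forms a helix.
    """
    seg = segment.upper()
    n = len(seg)

    def opposite(a, b):
        return (a in NEGATIVE and b in POSITIVE) or (a in POSITIVE and b in NEGATIVE)

    def alike(a, b):
        return (a in NEGATIVE and b in NEGATIVE) or (a in POSITIVE and b in POSITIVE)

    n_end, c_end = seg[:end_window], seg[-end_window:]
    return {
        # (1) successive residues carrying the same charge repel each other
        "like_charged_neighbors": sum(alike(a, b) for a, b in zip(seg, seg[1:])),
        # (3) opposite charges three or four residues apart can form an ion pair
        "ion_pairs": [(i, j) for i in range(n) for j in (i + 3, i + 4)
                      if j < n and opposite(seg[i], seg[j])],
        # (4) 0-based positions of Pro and Gly residues
        "pro_gly_positions": [i for i, r in enumerate(seg) if r in "PG"],
        # (5) helix dipole: negative charge is favorable near the amino terminus
        #     and positive charge near the carboxyl terminus; the reverse is not
        "dipole_favorable": sum(r in NEGATIVE for r in n_end)
                            + sum(r in POSITIVE for r in c_end),
        "dipole_unfavorable": sum(r in POSITIVE for r in n_end)
                              + sum(r in NEGATIVE for r in c_end),
    }


flags = helix_flags("EAAKAAAPR")
print(flags)
```

For the made-up segment EAAKAAAPR, the screen reports one possible ion pair (Glu at position 0 with Lys at position 3) and one Pro at position 7. Whether such counts correlate with real helix stability would have to be checked against known structures.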
BOX 4–1 WORKING IN BIOCHEMISTRY
Knowing the Right Hand from the Left
There is a simple method for determining whether a
helical structure is right-handed or left-handed. Make
fists of your two hands with thumbs outstretched and
pointing straight up. Looking at your right hand, think
of a helix spiraling up your right thumb in the direc-
tion in which the other four fingers are curled as
shown (counterclockwise). The resulting helix is
right-handed. Your left hand will demonstrate a left-
handed helix, which rotates in the clockwise direction
as it spirals up your thumb.
[Figure 4–6 labels: amino terminus (δ+); carboxyl terminus (δ−); + and − mark the amino and carbonyl groups of each peptide bond along the helix]
FIGURE 4–6 Helix dipole. The electric dipole of a peptide bond (see
Fig. 4–2a) is transmitted along an α-helical segment through the
intrachain hydrogen bonds, resulting in an overall helix dipole. In this
illustration, the amino and carbonyl constituents of each peptide bond
are indicated by + and − symbols, respectively. Non-hydrogen-bonded
amino and carbonyl constituents in the peptide bonds near
each end of the α-helical region are shown in red.
The β Conformation Organizes Polypeptide Chains
into Sheets
Protein Architecture: β Sheet. Pauling and Corey predicted
a second type of repetitive structure, the β conformation.
This is a more extended conformation of polypeptide
chains, and its structure has been confirmed by
x-ray analysis. In the β conformation, the backbone of
the polypeptide chain is extended into a zigzag rather
than helical structure (Fig. 4–7). The zigzag polypeptide
chains can be arranged side by side to form a structure
resembling a series of pleats. In this arrangement,
called a β sheet, hydrogen bonds are formed between
adjacent segments of polypeptide chain. The individual
segments that form a β sheet are usually nearby on the
polypeptide chain, but can also be quite distant from
each other in the linear sequence of the polypeptide;
they may even be segments in different polypeptide
chains. The R groups of adjacent amino acids protrude
from the zigzag structure in opposite directions, creating
the alternating pattern seen in the side views in
Figure 4–7.
The adjacent polypeptide chains in a β sheet can
be either parallel or antiparallel (having the same or
opposite amino-to-carboxyl orientations, respectively).
The structures are somewhat similar, although the
repeat period is shorter for the parallel conformation
(6.5 Å, versus 7 Å for antiparallel) and the
hydrogen-bonding patterns are different.
Some protein structures limit the kinds of amino
acids that can occur in the β sheet. When two or more
β sheets are layered close together within a protein, the
R groups of the amino acid residues on the touching
surfaces must be relatively small. β-Keratins such as silk
fibroin and the fibroin of spider webs have a very high
content of Gly and Ala residues, the two amino acids
with the smallest R groups. Indeed, in silk fibroin Gly
and Ala alternate over large parts of the sequence.
β Turns Are Common in Proteins
Protein Architecture: β Turn. In globular proteins, which
have a compact folded structure, nearly one-third of the
amino acid residues are in turns or loops where the
polypeptide chain reverses direction (Fig. 4–8). These
are the connecting elements that link successive runs
of α helix or β conformation. Particularly common are
β turns that connect the ends of two adjacent segments
of an antiparallel β sheet. The structure is a 180° turn
involving four amino acid residues, with the carbonyl
oxygen of the first residue forming a hydrogen bond with
the amino-group hydrogen of the fourth. The peptide
groups of the central two residues do not participate in
any interresidue hydrogen bonding. Gly and Pro
residues often occur in β turns, the former because it
is small and flexible, the latter because peptide bonds
involving the imino nitrogen of proline readily assume
the cis configuration (Fig. 4–8b), a form that is
particularly amenable to a tight turn. Of the several types of
β turns, the two shown in Figure 4–8a are the most
common. β turns are often found near the surface of a
protein, where the peptide groups of the central two
amino acid residues in the turn can hydrogen-bond with
water. Considerably less common is the γ turn, a
three-residue turn with a hydrogen bond between the first and
third residues.
FIGURE 4–7 The β conformation of polypeptide chains. These top
and side views reveal the R groups extending out from the β sheet
and emphasize the pleated shape described by the planes of the
peptide bonds. (An alternative name for this structure is β-pleated sheet.)
Hydrogen-bond cross-links between adjacent chains are also shown.
(a) Antiparallel β sheet, in which the amino-terminal to
carboxyl-terminal orientation of adjacent chains (arrows) is inverse. (b) Parallel
β sheet.
[Figure 4–8 drawings: (a) β turns, type I and type II, with residues numbered 1 to 4 and Gly labeled in the type II turn; (b) proline isomers, trans and cis]
Common Secondary Structures Have Characteristic
Bond Angles and Amino Acid Content
The α helix and the β conformation are the major
repetitive secondary structures in a wide variety of proteins,
although other repetitive structures do exist in some
specialized proteins (an example is collagen; see Fig.
4–13 on page 128). Every type of secondary structure
can be completely described by the bond angles φ and
ψ at each residue. As shown by a Ramachandran plot,
the α helix and β conformation fall within a relatively
restricted range of sterically allowed structures (Fig.
4–9a). Most values of φ and ψ taken from known protein
structures fall into the expected regions, with high
concentrations near the α helix and β conformation values
as predicted (Fig. 4–9b). The only amino acid residue
often found in a conformation outside these regions is
glycine. Because its side chain, a single hydrogen atom,
is small, a Gly residue can take part in many
conformations that are sterically forbidden for other amino acids.
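Because φ and ψ fully describe a residue's backbone conformation, assigning a residue to a region of the Ramachandran plot reduces to a distance test in angle space. The following is a minimal sketch under stated assumptions. The reference angles are commonly quoted idealized values and are not taken from this section. The 40° tolerance, the dictionary REFERENCE, and the function names are hypothetical choices for illustration.

```python
import math

# Idealized (phi, psi) values in degrees. Assumed textbook-style reference
# points, not values given in this section.
REFERENCE = {
    "right-handed alpha helix": (-57.0, -47.0),
    "antiparallel beta sheet": (-139.0, 135.0),
    "parallel beta sheet": (-119.0, 113.0),
    "collagen triple helix": (-51.0, 153.0),
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees.

    Handles the wraparound at +180/-180, so 179 and -179 differ by 2.
    """
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_structure(phi, psi, tolerance=40.0):
    """Name the reference conformation closest to (phi, psi).

    Returns "unassigned" when no reference point lies within the tolerance.
    """
    best_name, best_dist = "unassigned", float("inf")
    for name, (ref_phi, ref_psi) in REFERENCE.items():
        dist = math.hypot(angular_diff(phi, ref_phi), angular_diff(psi, ref_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tolerance else "unassigned"


print(nearest_structure(-60.0, -45.0))
print(nearest_structure(-140.0, 130.0))
print(nearest_structure(60.0, 45.0))
```

A residue near φ = 60°, ψ = 45° is left unassigned by this sketch. That region is mostly populated by glycine, which fits the observation that Gly often falls outside the expected ranges.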
Some amino acids are accommodated better than
others in the different types of secondary structures. An
overall summary is presented in Figure 4–10. Some
biases, such as the common presence of Pro and Gly
residues in β turns and their relative absence in α helices,
are readily explained by the known constraints on
the different secondary structures. Other evident biases
may be explained by taking into account the sizes or
charges of side chains, but not all the trends shown in
Figure 4–10 are understood.
SUMMARY 4.2 Protein Secondary Structure
■ Secondary structure is the regular arrangement
of amino acid residues in a segment of a
polypeptide chain, in which each residue is
spatially related to its neighbors in the same
way.
■ The most common secondary structures are
the α helix, the β conformation, and β turns.
■ The secondary structure of a polypeptide
segment can be completely defined if the φ
and ψ angles are known for all amino acid
residues in that segment.
FIGURE 4–8 Structures of β turns. (a) Type I and type II β turns are
most common; type I turns occur more than twice as frequently as
type II. Type II β turns always have Gly as the third residue. Note the
hydrogen bond between the peptide groups of the first and fourth
residues of the bends. (Individual amino acid residues are framed by
large blue circles.) (b) The trans and cis isomers of a peptide bond
involving the imino nitrogen of proline. Of the peptide bonds between
amino acid residues other than Pro, over 99.95% are in the trans
configuration. For peptide bonds involving the imino nitrogen of proline,
however, about 6% are in the cis configuration; many of these occur
at β turns.
4.3 Protein Tertiary and Quaternary
Structures
Protein Architecture—Introduction to Tertiary Structure The
overall three-dimensional arrangement of all atoms in a
protein is referred to as the protein’s tertiary struc-
ture. Whereas the term “secondary structure” refers to
the spatial arrangement of amino acid residues that are
adjacent in the primary structure, tertiary structure in-
cludes longer-range aspects of amino acid sequence.
Amino acids that are far apart in the polypeptide se-
quence and that reside in different types of secondary
structure may interact within the completely folded
structure of a protein. The location of bends (including
β turns) in the polypeptide chain and the direction and
angle of these bends are determined by the number and
location of specific bend-producing residues, such as
Pro, Thr, Ser, and Gly. Interacting segments of polypep-
tide chains are held in their characteristic tertiary posi-
tions by different kinds of weak bonding interactions
(and sometimes by covalent bonds such as disulfide
cross-links) between the segments.
Some proteins contain two or more separate
polypeptide chains, or subunits, which may be identical
or different. The arrangement of these protein subunits
in three-dimensional complexes constitutes quater-
nary structure.
In considering these higher levels of structure, it is
useful to classify proteins into two major groups: fi-
brous proteins, having polypeptide chains arranged in
long strands or sheets, and globular proteins, having
polypeptide chains folded into a spherical or globular
shape. The two groups are structurally distinct: fibrous
proteins usually consist largely of a single type of sec-
ondary structure; globular proteins often contain sev-
eral types of secondary structure. The two groups dif-
fer functionally in that the structures that provide
support, shape, and external protection to vertebrates
are made of fibrous proteins, whereas most enzymes and
regulatory proteins are globular proteins. Certain fi-
brous proteins played a key role in the development of
our modern understanding of protein structure and pro-
vide particularly clear examples of the relationship be-
tween structure and function. We begin our discussion
with fibrous proteins, before turning to the more com-
plex folding patterns observed in globular proteins.
[Figure 4–9 plots, panels (a) and (b): ψ (degrees) on the vertical axis and φ (degrees) on the horizontal axis, each running from −180 to +180. Regions labeled in (a): antiparallel β sheets, parallel β sheets, right-twisted β sheets, collagen triple helix, right-handed α helix, left-handed α helix.]
FIGURE 4–9 Ramachandran plots for a variety of structures. (a) The
values of φ and ψ for various allowed secondary structures are
overlaid on the plot from Figure 4–3. Although left-handed α helices
extending over several amino acid residues are theoretically possible,
they have not been observed in proteins. (b) The values of φ and ψ
for all the amino acid residues except Gly in the enzyme pyruvate
kinase (isolated from rabbit) are overlaid on the plot of theoretically
allowed conformations (Fig. 4–3). The small, flexible Gly residues were
excluded because they frequently fall outside the expected ranges
(blue).
FIGURE 4–10 Relative probabilities that a given amino acid will
occur in the three common types of secondary structure.
[Bar chart not reproduced. Columns: α helix, β conformation, β turn. Rows, in order: Glu, Met, Ala, Leu, Lys, Phe, Gln, Trp, Ile, Val, Asp, His, Arg, Thr, Ser, Cys, Tyr, Asn, Pro, Gly.]
Fibrous Proteins Are Adapted for
a Structural Function
Protein Architecture—Tertiary Structure of Fibrous Proteins
α-Keratin, collagen, and silk fibroin nicely illustrate the
relationship between protein structure and biological
function (Table 4–1). Fibrous proteins share properties
that give strength and/or flexibility to the structures in
which they occur. In each case, the fundamental struc-
tural unit is a simple repeating element of secondary
structure. All fibrous proteins are insoluble in water, a
property conferred by a high concentration of hy-
drophobic amino acid residues both in the interior of
the protein and on its surface. These hydrophobic sur-
faces are largely buried by packing many similar
polypeptide chains together to form elaborate supramol-
ecular complexes. The underlying structural simplicity
of fibrous proteins makes them particularly useful for
illustrating some of the fundamental principles of pro-
tein structure discussed above.
α-Keratin. The α-keratins have evolved for strength.
Found in mammals, these proteins constitute almost
the entire dry weight of hair, wool, nails, claws, quills,
horns, hooves, and much of the outer layer of skin. The
α-keratins are part of a broader family of proteins called
intermediate filament (IF) proteins. Other IF proteins
are found in the cytoskeletons of animal cells. All IF
proteins have a structural function and share structural
features exemplified by the α-keratins.
The α-keratin helix is a right-handed α helix, the
same helix found in many other proteins. Francis Crick
and Linus Pauling in the early 1950s independently
suggested that the α helices of keratin were arranged as a
coiled coil. Two strands of α-keratin, oriented in parallel
(with their amino termini at the same end), are wrapped
about each other to form a supertwisted coiled coil. The
supertwisting amplifies the strength of the overall
structure, just as strands are twisted to make a strong rope
(Fig. 4–11). The twisting of the axis of an α helix to
form a coiled coil explains the discrepancy between the
5.4 Å per turn predicted for an α helix by Pauling and
Corey and the 5.15 to 5.2 Å repeating structure observed
in the x-ray diffraction of hair (p. 120). The helical path
of the supertwists is left-handed, opposite in sense to
the α helix. The surfaces where the two α helices touch
are made up of hydrophobic amino acid residues, their
R groups meshed together in a regular interlocking
pattern. This permits a close packing of the polypeptide
chains within the left-handed supertwist. Not
surprisingly, α-keratin is rich in the hydrophobic residues Ala,
Val, Leu, Ile, Met, and Phe.
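One way to see how supertwisting can shorten the observed repeat is a simple projection. If the axis of each α helix is tilted by some angle relative to the fiber axis, a 5.4 Å turn projects onto the fiber axis as 5.4 Å × cos(tilt). The sketch below solves for the tilt that would give the observed 5.15 to 5.2 Å repeat. Treating the coiled coil as a uniformly tilted straight helix is a simplifying assumption made here, not a statement from the text, and the function name is hypothetical.

```python
import math


def tilt_angle_deg(pitch_straight, repeat_observed):
    """Tilt (degrees) at which a helix of the given pitch projects to the observed repeat.

    Assumes repeat_observed = pitch_straight * cos(tilt), a simple projection
    onto the fiber axis.
    """
    return math.degrees(math.acos(repeat_observed / pitch_straight))


for observed in (5.15, 5.2):
    print(f"{observed} Å repeat -> tilt of about {tilt_angle_deg(5.4, observed):.1f} degrees")
```

Under this assumption the required tilt comes out at roughly 16° to 18°, a modest inclination of each helix axis relative to the fiber axis.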
An individual polypeptide in the α-keratin coiled
coil has a relatively simple tertiary structure, dominated
by an α-helical secondary structure with its helical axis
twisted in a left-handed superhelix. The intertwining of
the two α-helical polypeptides is an example of
quaternary structure. Coiled coils of this type are common
structural elements in filamentous proteins and in the
muscle protein myosin (see Fig. 5–29). The quaternary
structure of α-keratin can be quite complex. Many coiled
coils can be assembled into large supramolecular
complexes, such as the arrangement of α-keratin to form
the intermediate filament of hair (Fig. 4–11b).
FIGURE 4–11 Structure of hair. (a) Hair α-keratin is an elongated α helix with
somewhat thicker elements near the amino and carboxyl termini. Pairs of these
helices are interwound in a left-handed sense to form two-chain coiled coils.
These then combine in higher-order structures called protofilaments and
protofibrils. About four protofibrils (32 strands of α-keratin altogether) combine
to form an intermediate filament. The individual two-chain coiled coils in the
various substructures also appear to be interwound, but the handedness of the
interwinding and other structural details are unknown. (b) A hair is an array of
many α-keratin filaments, made up of the substructures shown in (a).
[Figure labels, panel (a): keratin α helix; two-chain coiled coil; protofilament; protofibril; dimension label 20–30 Å. Panel (b): cross section of a hair; cells; intermediate filament; protofibril; protofilament; two-chain coiled coil; α helix.]
The strength of fibrous proteins is enhanced by
covalent cross-links between polypeptide chains within
the multihelical “ropes” and between adjacent chains in
a supramolecular assembly. In α-keratins, the cross-links
stabilizing quaternary structure are disulfide bonds
(Box 4–2). In the hardest and toughest α-keratins, such
as those of rhinoceros horn, up to 18% of the residues
are cysteines involved in disulfide bonds.
Collagen. Like the α-keratins, collagen has evolved to
provide strength. It is found in connective tissue such
as tendons, cartilage, the organic matrix of bone, and
the cornea of the eye. The collagen helix is a unique
secondary structure quite distinct from the α helix. It
is left-handed and has three amino acid residues per
turn (Fig. 4–12). Collagen is also a coiled coil, but one
with distinct tertiary and quaternary structures: three
separate polypeptides, called α chains (not to be
confused with α helices), are supertwisted about each other
(Fig. 4–12c). The superhelical twisting is right-handed
in collagen, opposite in sense to the left-handed helix
of the α chains.
There are many types of vertebrate collagen. Typi-
cally they contain about 35% Gly, 11% Ala, and 21% Pro
and 4-Hyp (4-hydroxyproline, an uncommon amino
acid; see Fig. 3–8a). The food product gelatin is derived
TABLE 4–1 Secondary Structures and Properties of Fibrous Proteins
Structure | Characteristics | Examples of occurrence
α Helix, cross-linked by disulfide bonds | Tough, insoluble protective structures of varying hardness and flexibility | α-Keratin of hair, feathers, and nails
β Conformation | Soft, flexible filaments | Silk fibroin
Collagen triple helix | High tensile strength, without stretch | Collagen of tendons, bone matrix
BOX 4–2 THE WORLD OF BIOCHEMISTRY
Permanent Waving Is Biochemical Engineering
When hair is exposed to moist heat, it can be
stretched. At the molecular level, the α helices in the
α-keratin of hair are stretched out until they arrive at
the fully extended β conformation. On cooling they
spontaneously revert to the α-helical conformation.
The characteristic “stretchability” of α-keratins, and
their numerous disulfide cross-linkages, are the basis
of permanent waving. The hair to be waved or curled
is first bent around a form of appropriate shape. A
solution of a reducing agent, usually a compound
containing a thiol or sulfhydryl group (–SH), is then
applied with heat. The reducing agent cleaves the
cross-linkages by reducing each disulfide bond to form
two Cys residues. The moist heat breaks hydrogen
bonds and causes the α-helical structure of the
polypeptide chains to uncoil. After a time the
reducing solution is removed, and an oxidizing agent is
added to establish new disulfide bonds between pairs
of Cys residues of adjacent polypeptide chains, but not
the same pairs as before the treatment. After the hair
is washed and cooled, the polypeptide chains revert
to their α-helical conformation. The hair fibers now
curl in the desired fashion because the new disulfide
cross-linkages exert some torsion or twist on the
bundles of α-helical coils in the hair fibers. A permanent
wave is not truly permanent, because the hair grows;
in the new hair replacing the old, the α-keratin has
the natural, nonwavy pattern of disulfide bonds.
[Box 4–2 figure: disulfide cross-links (S–S) between chains are reduced to free sulfhydryl groups (SH), the hair is curled, and oxidation forms new S–S bonds between different Cys pairs. Steps labeled: reduce, curl, oxidize.]
FIGURE 4–13 Structure of collagen fibrils. Collagen (Mr 300,000) is
a rod-shaped molecule, about 3,000 Å long and only 15 Å thick. Its
three helically intertwined α chains may have different sequences, but
each has about 1,000 amino acid residues. Collagen fibrils are made
up of collagen molecules aligned in a staggered fashion and
cross-linked for strength. The specific alignment and degree of cross-linking
vary with the tissue and produce characteristic cross-striations in an
electron micrograph. In the example shown here, alignment of the
head groups of every fourth molecule produces striations 640 Å apart.
[Figure labels: heads of collagen molecules; section of collagen molecule; cross-striations 640 Å (64 nm); scale 250 nm.]
from collagen; it has little nutritional value as a protein,
because collagen is extremely low in many amino acids
that are essential in the human diet. The unusual amino
acid content of collagen is related to structural con-
straints unique to the collagen helix. The amino acid se-
quence in collagen is generally a repeating tripeptide
unit, Gly–X–Y, where X is often Pro, and Y is often
4-Hyp. Only Gly residues can be accommodated at the
very tight junctions between the individual α chains
(Fig. 4–12d); the Pro and 4-Hyp residues permit the
sharp twisting of the collagen helix. The amino acid se-
quence and the supertwisted quaternary structure of
collagen allow a very close packing of its three polypep-
tides. 4-Hydroxyproline has a special role in the struc-
ture of collagen—and in human history (Box 4–3).
The tight wrapping of the α chains in the collagen
triple helix provides tensile strength greater than that
FIGURE 4–12 Structure of collagen. (Derived from PDB ID 1CGD.)
(a) The α chain of collagen has a repeating secondary structure unique
to this protein. The repeating tripeptide sequence Gly–X–Pro or
Gly–X–4-Hyp adopts a left-handed helical structure with three residues
per turn. The repeating sequence used to generate this model is
Gly–Pro–4-Hyp. (b) Space-filling model of the same α chain. (c) Three
of these helices (shown here in gray, blue, and purple) wrap around
one another with a right-handed twist. (d) The three-stranded
collagen superhelix shown from one end, in a ball-and-stick
representation. Gly residues are shown in red. Glycine, because of its small size,
is required at the tight junction where the three chains are in contact.
The balls in this illustration do not represent the van der Waals radii
of the individual atoms. The center of the three-stranded superhelix is
not hollow, as it appears here, but is very tightly packed.
[Structure drawing: dehydrohydroxylysinonorleucine, a cross-link joining a Lys residue minus its ε-amino group (norleucine) on one polypeptide chain to a HyLys residue on another polypeptide chain]
of a steel wire of equal cross section. Collagen fibrils
(Fig. 4–13) are supramolecular assemblies consisting of
triple-helical collagen molecules (sometimes referred to
as tropocollagen molecules) associated in a variety of
ways to provide different degrees of tensile strength.
The α chains of collagen molecules and the collagen molecules
of fibrils are cross-linked by unusual types of covalent
bonds involving Lys, HyLys (5-hydroxylysine; see
Fig. 3–8a), or His residues that are present at a few of
the X and Y positions in collagens. These links create
uncommon amino acid residues such as dehydrohy-
droxylysinonorleucine. The increasingly rigid and brit-
tle character of aging connective tissue results from ac-
cumulated covalent cross-links in collagen fibrils.
A typical mammal has more than 30 structural
variants of collagen, particular to certain tissues
and each somewhat different in sequence and function.
Some human genetic defects in collagen structure il-
lustrate the close relationship between amino acid se-
quence and three-dimensional structure in this protein.
Osteogenesis imperfecta is characterized by abnormal
bone formation in babies; Ehlers-Danlos syndrome is
characterized by loose joints. Both conditions can be
lethal, and both result from the substitution of an amino
acid residue with a larger R group (such as Cys or Ser)
for a single Gly residue in each α chain (a different Gly
residue in each disorder). These single-residue substi-
tutions have a catastrophic effect on collagen function
because they disrupt the Gly–X–Y repeat that gives col-
lagen its unique helical structure. Given its role in the
collagen triple helix (Fig. 4–12d), Gly cannot be re-
placed by another amino acid residue without substan-
tial deleterious effects on collagen structure. ■
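The Gly–X–Y rule is simple enough to check mechanically. As a rough sketch, the function below lists the positions where a sequence departs from the expected glycine at every third residue, which is the kind of single-residue substitution described above. The one-letter code O for 4-Hyp, the toy sequences, and the function names are assumptions for illustration; the sequences are not real collagen data.

```python
def gly_repeat_breaks(sequence, frame=0):
    """Return 0-based positions where the Gly-X-Y repeat expects Gly but finds another residue.

    frame is the index of the first expected Gly (0, 1, or 2).
    """
    seq = sequence.upper()
    return [i for i in range(frame, len(seq), 3) if seq[i] != "G"]


def gly_fraction(sequence):
    """Fraction of residues that are Gly."""
    seq = sequence.upper()
    return seq.count("G") / len(seq) if seq else 0.0


normal = "GPOGPOGPOGPO"  # four Gly-Pro-Hyp repeats, with O standing for 4-Hyp
mutant = "GPOGPOCPOGPO"  # hypothetical Gly -> Cys substitution at position 6

print(gly_repeat_breaks(normal))  # []
print(gly_repeat_breaks(mutant))  # [6]
print(round(gly_fraction(normal), 3))
```

A perfect Gly–X–Y repeat is exactly one-third glycine, which is close to the roughly 35% Gly content quoted for typical collagens.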
Silk Fibroin. Fibroin, the protein of silk, is produced by
insects and spiders. Its polypeptide chains are
predominantly in the β conformation. Fibroin is rich in Ala and
Gly residues, permitting a close packing of β sheets and
an interlocking arrangement of R groups (Fig. 4–14).
The overall structure is stabilized by extensive
hydrogen bonding between all peptide linkages in the
polypeptides of each β sheet and by the optimization of
van der Waals interactions between sheets. Silk does not
stretch, because the β conformation is already highly
extended (Fig. 4–7; see also Fig. 4–15). However, the
structure is flexible because the sheets are held together
by numerous weak interactions rather than by covalent
bonds such as the disulfide bonds in α-keratins.
Structural Diversity Reflects Functional Diversity
in Globular Proteins
In a globular protein, different segments of a polypep-
tide chain (or multiple polypeptide chains) fold back on
each other. As illustrated in Figure 4–15, this folding
generates a compact form relative to polypeptides in a
fully extended conformation. The folding also provides
the structural diversity necessary for proteins to carry
out a wide array of biological functions. Globular proteins
include enzymes, transport proteins, motor proteins,
regulatory proteins, immunoglobulins, and proteins with
many other functions.
As a new millennium begins, the number of known
three-dimensional protein structures is in the thousands
and more than doubles every two years. This wealth of
structural information is revolutionizing our under-
standing of protein structure, the relation of structure
FIGURE 4–14 Structure of silk. The fibers used to make silk cloth or
a spider web are made up of the protein fibroin. (a) Fibroin consists
of layers of antiparallel β sheets rich in Ala (purple) and Gly (yellow)
residues. The small side chains interdigitate and allow close packing
of each layered sheet, as shown in this side view. (b) Strands of
fibroin (blue) emerge from the spinnerets of a spider in this colorized
electron micrograph.
[Figure labels, panel (a): Ala side chain; Gly side chain; spacings 3.5 Å and 5.7 Å. Panel (b): scale bar 70 μm.]
FIGURE 4–15 Globular protein structures are compact and varied.
Human serum albumin (Mr 64,500) has 585 residues in a single chain.
Given here are the approximate dimensions its single polypeptide
chain would have if it occurred entirely in extended β conformation
or as an α helix. Also shown is the size of the protein in its native
globular form, as determined by X-ray crystallography; the polypeptide
chain must be very compactly folded to fit into these dimensions.
[Figure labels: β conformation, 2,000 × 5 Å; α helix, 900 × 11 Å; native globular form, 100 × 60 Å.]
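The lengths quoted in Figure 4–15 can be approximated from numbers given earlier in the chapter: 5.4 Å per turn and 3.6 residues per turn for the α helix, and a 7 Å repeat for the antiparallel β conformation. The sketch below assumes that one β repeat spans two residues, which is not stated in this section, and uses hypothetical variable and function names.

```python
HELIX_PITCH = 5.4             # Å per turn of an alpha helix
RESIDUES_PER_TURN = 3.6       # residues per turn of an alpha helix
BETA_REPEAT = 7.0             # Å, repeat period of the antiparallel beta conformation
RESIDUES_PER_BETA_REPEAT = 2  # assumption: one zigzag repeat covers two residues


def extended_length(n_residues, rise_per_residue):
    """Length in Å of a chain in which every residue advances by the same rise."""
    return n_residues * rise_per_residue


alpha_rise = HELIX_PITCH / RESIDUES_PER_TURN        # about 1.5 Å per residue
beta_rise = BETA_REPEAT / RESIDUES_PER_BETA_REPEAT  # 3.5 Å per residue

n = 585  # residues in human serum albumin
alpha_len = extended_length(n, alpha_rise)
beta_len = extended_length(n, beta_rise)
print(f"alpha helix: about {alpha_len:.0f} Å; beta conformation: about {beta_len:.0f} Å")
```

The results, close to 880 Å and 2,050 Å, agree with the approximate 900 Å and 2,000 Å figure labels. Both are far longer than the 100 × 60 Å native form.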
BOX 4–3 BIOCHEMISTRY IN MEDICINE
Why Sailors, Explorers, and College Students
Should Eat Their Fresh Fruits and Vegetables
. . . from this misfortune, together with the unhealthiness
of the country, where there never falls a drop of rain, we
were stricken with the “camp-sickness,” which was such
that the flesh of our limbs all shrivelled up, and the skin
of our legs became all blotched with black, mouldy
patches, like an old jack-boot, and proud flesh came
upon the gums of those of us who had the sickness,
and none escaped from this sickness save through the
jaws of death. The signal was this: when the nose began
to bleed, then death was at hand . . .
—from The Memoirs of the Lord of Joinville, ca. 1300
This excerpt describes the plight of Louis IX’s army
toward the end of the Seventh Crusade (1248–1254),
immediately preceding the battle of Fariskur, where
the scurvy-weakened Crusader army was destroyed by
the Egyptians. What was the nature of the malady af-
flicting these thirteenth-century soldiers?
Scurvy is caused by lack of vitamin C, or ascorbic
acid (ascorbate). Vitamin C is required for, among other
things, the hydroxylation of proline and lysine in colla-
gen; scurvy is a deficiency disease characterized by
general degeneration of connective tissue. Manifesta-
tions of advanced scurvy include numerous small hem-
orrhages caused by fragile blood vessels, tooth loss,
poor wound healing and the reopening of old wounds,
bone pain and degeneration, and eventually heart fail-
ure. Despondency and oversensitivity to stimuli of
many kinds are also observed. Milder cases of vitamin
C deficiency are accompanied by fatigue, irritability,
and an increased severity of respiratory tract infections.
Most animals make large amounts of vitamin C, con-
verting glucose to ascorbate in four enzymatic steps.
But in the course of evolution, humans and some other
animals—gorillas, guinea pigs, and fruit bats—have lost
the last enzyme in this pathway and must obtain ascor-
bate in their diet. Vitamin C is available in a wide range
of fruits and vegetables. Until 1800, however, it was of-
ten absent in the dried foods and other food supplies
stored for winter or for extended travel.
Scurvy was recorded by the Egyptians in 1500 BCE,
and it is described in the fifth century BCE writings of
Hippocrates. Although scurvy played a critical role in
medieval wars and made regular winter appearances
in northern climates, it did not come to wide public
notice until the European voyages of discovery from
1500 to 1800. The first circumnavigation of the globe,
led by Ferdinand Magellan (1520), was accomplished
only with the loss of more than 80% of his crew to
scurvy. Vasco da Gama lost two-thirds of his crew to
the same fate during his first exploration of trade
routes to India (1499). During Jacques Cartier’s sec-
ond voyage to explore the St. Lawrence River (1535–
1536), his band suffered numerous fatalities and was
threatened with complete disaster until the native
Americans taught the men to make a cedar tea that
cured and prevented scurvy (it contained vitamin C)
(Fig. 1). It is estimated that a million sailors died of
scurvy in the years 1600 to 1800. Winter outbreaks of
scurvy in Europe were gradually eliminated in the
nineteenth century as the cultivation of the potato, in-
troduced from South America, became widespread.
In 1747, James Lind, a Scottish surgeon in the Royal
Navy (Fig. 2), carried out the first controlled clinical
study in recorded history. During an extended voyage
on the 50-gun warship HMS Salisbury, Lind selected
12 sailors suffering from scurvy and separated them into
groups of two. All 12 received
the same diet, except that each
group was given a different rem-
edy for scurvy from among those
recommended at the time. The
sailors given lemons and oranges
recovered and returned to duty.
The sailors given boiled apple
juice improved slightly. The re-
mainder continued to deterio-
rate. Lind’s Treatise on the
Scurvy was published in 1753,
but inaction persisted in the
Royal Navy for another 40 years.
FIGURE 1 Iroquois showing Jacques Cartier how to make cedar
tea as a remedy for scurvy.
FIGURE 2 James Lind,
1716–1794; naval sur-
geon, 1739–1748.
In 1795 the British admiralty finally mandated a ration
of concentrated lime or lemon juice for all British sailors
(hence the name “limeys”). Scurvy continued to be a
problem in some other parts of the world until 1932,
when Hungarian scientist Albert Szent-Györgyi, and W.
A. Waugh and C. G. King at the University of Pittsburgh,
isolated and synthesized ascorbic acid.
L-Ascorbic acid (vitamin C) is a white, odorless,
crystalline powder. It is freely soluble in water and rel-
atively insoluble in organic solvents. In a dry state,
away from light, it is stable for a considerable length
of time. The appropriate daily intake of this vitamin is
still in dispute. The recommended daily allowance in
the United States is 60 mg (Australia and the United
Kingdom recommend 30 to 40 mg; Russia recom-
mends 100 mg). Higher doses of vitamin C are some-
times recommended, although the benefit of such a
regimen is disputed. Notably, animals that synthesize
their own vitamin C maintain levels that humans would
reach only by consuming hundreds of times the recommended
daily allowance. Along with citrus fruits and
almost all other fresh fruits, other good sources of vi-
tamin C include peppers, tomatoes, potatoes, and
broccoli. The vitamin C of fruits and vegetables is de-
stroyed by overcooking or prolonged storage.
So why is ascorbate so necessary to good health?
Of particular interest to us here is its role in the for-
mation of collagen. The proline derivative 4(R)-L-
hydroxyproline (4-Hyp) plays an essential role in the
folding of collagen and in maintaining its structure. As
noted in the text, collagen is constructed of the re-
peating tripeptide unit Gly–X–Y, where X and Y are
generally Pro or 4-Hyp. A constructed peptide with 10
Gly–Pro–Pro repeats will fold to form a collagen triple
helix, but the structure melts at 41 °C. If the 10 repeats
are changed to Gly–Pro–4-Hyp, the melting temperature
jumps to 69 °C. The stability of collagen
arises from the detailed structure of the collagen helix,
determined independently by Helen Berman and
Adriana Zagari and their colleagues. The proline ring
is normally found as a mixture of two puckered conformations,
called Cγ-endo and Cγ-exo (Fig. 3). The
collagen helix structure requires the Pro residue in
the Y positions to be in the Cγ-exo conformation, and
it is this conformation that is enforced by the hydroxyl
substitution at C-4 in 4-hydroxyproline. However, the
collagen structure requires the Pro residue in the X
positions to have the Cγ-endo conformation, and introduction
of 4-Hyp here can destabilize the helix. The
inability to hydroxylate the Pro at the Y positions when
vitamin C is absent leads to collagen instability and
the connective tissue problems seen in scurvy.
The hydroxylation of specific Pro residues in pro-
collagen, the precursor of collagen, requires the ac-
tion of the enzyme prolyl 4-hydroxylase. This enzyme
(Mᵣ 240,000) is an α₂β₂ tetramer in all vertebrate
sources. The proline-hydroxylating activity is found in
the α subunits. (Researchers were surprised to find
that the β subunits are identical to the enzyme protein
disulfide isomerase (PDI; p. 152); these subunits
do not participate in the prolyl hydroxylation activity.)
Each α subunit contains one atom of nonheme iron
(Fe²⁺), and the enzyme is one of a class of hydroxylases
that require α-ketoglutarate in their reactions.
In the normal prolyl 4-hydroxylase reaction (Fig.
4a), one molecule of α-ketoglutarate and one of O₂
bind to the enzyme. The α-ketoglutarate is oxidatively
decarboxylated to form CO₂ and succinate. The remaining
oxygen atom is then used to hydroxylate an
appropriate Pro residue in procollagen. No ascorbate is
needed in this reaction. However, prolyl 4-hydroxylase
also catalyzes an oxidative decarboxylation of α-ketoglutarate
that is not coupled to proline hydroxylation,
and this is the reaction that requires ascorbate
(Fig. 4b). During this reaction, the enzyme-bound nonheme
Fe²⁺ becomes oxidized, and the oxidized form of the enzyme
is inactive, unable to hydroxylate proline. The ascorbate
consumed in the reaction presumably functions
to reduce the iron and restore enzyme activity.
But there is more to the vitamin C story than pro-
line hydroxylation. Very similar hydroxylation reac-
tions generate the less abundant 3-hydroxyproline and
5-hydroxylysine residues that also occur in collagen.
The enzymes that catalyze these reactions are members
of the same α-ketoglutarate-dependent dioxygenase
family, and for all these enzymes ascorbate plays
the same role. These dioxygenases are just a few
of the dozens of closely related enzymes that play
a variety of metabolic roles in different classes of
organisms. Ascorbate serves other roles too. It is an
antioxidant, reacting enzymatically and nonenzymati-
cally with reactive oxygen species, which in mammals
play an important role in aging and cancer.
FIGURE 3 The Cγ-endo conformation of proline and the Cγ-exo
conformation of 4-hydroxyproline.
to function, and even the evolutionary paths by which
proteins arrived at their present state, which can be
glimpsed in the family resemblances that are revealed
as protein databases are sifted and sorted. The sheer
variety of structures can seem daunting. Yet as new pro-
tein structures become available it is becoming increas-
ingly clear that they are manifestations of a finite set of
recognizable, stable folding patterns.
Our discussion of globular protein structure begins
with the principles gleaned from the earliest protein
structures to be elucidated. This is followed by a de-
tailed description of protein substructure and compar-
ative categorization. Such discussions are possible only
because of the vast amount of information available over
the Internet from resources such as the Protein Data
Bank (PDB; www.rcsb.org/pdb), an archive of experi-
mentally determined three-dimensional structures of
biological macromolecules.
Myoglobin Provided Early Clues about the Complexity
of Globular Protein Structure
The first breakthrough in understanding
the three-dimensional structure of a globular protein
came from x-ray diffraction studies of myoglobin
carried out by John Kendrew and his colleagues in the
1950s. Myoglobin is a relatively small (Mᵣ 16,700),
oxygen-binding protein of muscle cells. It functions both
to store oxygen and to facilitate oxygen diffusion in rap-
idly contracting muscle tissue. Myoglobin contains a sin-
gle polypeptide chain of 153 amino acid residues of
known sequence and a single iron protoporphyrin, or
heme, group. The same heme group is found in hemo-
globin, the oxygen-binding protein of erythrocytes, and
is responsible for the deep red-brown color of both myo-
globin and hemoglobin. Myoglobin is particularly abun-
In plants, ascorbate is required as a substrate for
the enzyme ascorbate peroxidase, which converts
H₂O₂ to water. The peroxide is generated from the O₂
produced in photosynthesis, an unavoidable consequence
of generating O₂ in a compartment laden with
powerful oxidation-reduction systems (Chapter 19).
Ascorbate is also a precursor of oxalate and tartrate
in plants, and is involved in the hydroxylation of Pro
residues in cell wall proteins called extensins. Ascor-
bate is found in all subcellular compartments of plants,
at concentrations of 2 to 25 mM—which is why plants
are such good sources of vitamin C.
Scurvy remains a problem today. The malady is still
encountered not only in remote regions where nutri-
tious food is scarce but, surprisingly, on U.S. college
campuses. The only vegetables consumed by some stu-
dents are those in tossed salads, and days go by with-
out these young adults consuming fruit. A 1998 study
of 230 students at Arizona State University revealed
that 10% had serious vitamin C deficiencies, and 2 stu-
dents had vitamin C levels so low that they probably
had scurvy. Only half the students in the study con-
sumed the recommended daily allowance of vitamin C.
Eat your fresh fruit and vegetables.
FIGURE 4 The reactions catalyzed by prolyl 4-hydroxylase.
(a) The normal reaction, coupled to proline hydroxylation, which
does not require ascorbate:
Pro residue + α-ketoglutarate + O₂ → 4-Hyp residue + succinate + CO₂ (Fe²⁺)
The fate of the two oxygen atoms from O₂ is shown in red.
(b) The uncoupled reaction, in which α-ketoglutarate is oxidatively
decarboxylated without hydroxylation of proline:
α-ketoglutarate + O₂ + ascorbate → succinate + CO₂ + dehydroascorbate (Fe²⁺)
Ascorbate is consumed stoichiometrically in this process as it is
converted to dehydroascorbate.
dant in the muscles of diving mammals such as the
whale, seal, and porpoise, whose muscles are so rich in
this protein that they are brown. Storage and distribu-
tion of oxygen by muscle myoglobin permit these ani-
mals to remain submerged for long periods of time. The
activities of myoglobin and other globin molecules are
investigated in greater detail in Chapter 5.
Figure 4–16 shows several structural representa-
tions of myoglobin, illustrating how the polypeptide
chain is folded in three dimensions—its tertiary struc-
ture. The red group surrounded by protein is heme. The
backbone of the myoglobin molecule is made up of eight
relatively straight segments of α helix interrupted by
bends, some of which are β turns. The longest α helix
has 23 amino acid residues and the shortest only 7; all
helices are right-handed. More than 70% of the residues
in myoglobin are in these α-helical regions. X-ray analysis
has revealed the precise position of each of the R
groups, which occupy nearly all the space within the
folded chain.
Many important conclusions were drawn from the
structure of myoglobin. The positioning of amino acid
side chains reflects a structure that derives much of its
stability from hydrophobic interactions. Most of the hy-
drophobic R groups are in the interior of the myoglobin
molecule, hidden from exposure to water. All but two
of the polar R groups are located on the outer surface
of the molecule, and all are hydrated. The myoglobin
molecule is so compact that its interior has room for
only four molecules of water. This dense hydrophobic
core is typical of globular proteins. The fraction of space
occupied by atoms in an organic liquid is 0.4 to 0.6; in
a typical crystal the fraction is 0.70 to 0.78, near the
theoretical maximum. In a globular protein the fraction
is about 0.75, comparable to that in a crystal. In this
packed environment, weak interactions strengthen and
reinforce each other. For example, the nonpolar side
chains in the core are so close together that short-range
van der Waals interactions make a significant contribu-
tion to stabilizing hydrophobic interactions.
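As a rough point of reference for the "theoretical maximum" mentioned above, the densest possible packing of identical spheres fills a fraction π/(3√2) of space. The sketch below computes that value and sets the quoted ranges beside it. Treating atoms as identical hard spheres is an assumption made here purely for illustration; real crystals contain atoms of unequal size and can deviate from this figure.

```python
import math

def close_packing_fraction() -> float:
    """Fraction of space filled by the densest packing of identical spheres.

    Idealized hard-sphere model, assumed here for illustration only.
    """
    return math.pi / (3 * math.sqrt(2))

# Ranges quoted in the text for the fraction of space occupied by atoms.
QUOTED = {
    "organic liquid": (0.40, 0.60),
    "typical crystal": (0.70, 0.78),
    "globular protein": (0.75, 0.75),
}

def describe() -> list[str]:
    """Return one summary line per material, relative to the sphere limit."""
    limit = close_packing_fraction()
    lines = [f"identical-sphere close packing: {limit:.3f}"]
    for name, (low, high) in QUOTED.items():
        midpoint = (low + high) / 2
        lines.append(f"{name}: midpoint {midpoint:.2f} ({midpoint / limit:.0%} of that limit)")
    return lines

print("\n".join(describe()))
```

The comparison only illustrates why a packing fraction near 0.75 is described as crystal-like; it is not a model of how protein interiors actually pack.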
FIGURE 4–16 Tertiary structure of sperm whale myoglobin. (PDB ID
1MBO) The orientation of the protein is similar in all panels; the heme
group is shown in red. In addition to illustrating the myoglobin struc-
ture, this figure provides examples of several different ways to display
protein structure. (a) The polypeptide backbone, shown in a ribbon
representation of a type introduced by Jane Richardson, which highlights
regions of secondary structure. The α-helical regions are evident.
(b) A “mesh” image emphasizes the protein surface. (c) A surface
contour image is useful for visualizing pockets in the protein
where other molecules might bind. (d) A ribbon representation, in-
cluding side chains (blue) for the hydrophobic residues Leu, Ile, Val,
and Phe. (e) A space-filling model with all amino acid side chains.
Each atom is represented by a sphere encompassing its van der Waals
radius. The hydrophobic residues are again shown in blue; most are
not visible, because they are buried in the interior of the protein.
Deduction of the structure of myoglobin confirmed
some expectations and introduced some new elements
of secondary structure. As predicted by Pauling and
Corey, all the peptide bonds are in the planar trans configuration.
The α helices in myoglobin provided the first
direct experimental evidence for the existence of this
type of secondary structure. Three of the four Pro
residues of myoglobin are found at bends (recall that
proline, with its fixed φ bond angle and lack of a peptide-bond
N–H group for participation in hydrogen bonds,
is largely incompatible with α-helical structure). The
fourth Pro residue occurs within an α helix, where it creates
a kink necessary for tight helix packing. Other bends
contain Ser, Thr, and Asn residues, which are among the
amino acids whose bulk and shape tend to make them
incompatible with α-helical structure if they are in close
proximity in the amino acid sequence (p. 121).
The flat heme group rests in a crevice, or pocket, in
the myoglobin molecule. The iron atom in the center of
the heme group has two bonding (coordination) posi-
tions perpendicular to the plane of the heme (Fig. 4–17).
One of these is bound to the R group of the His residue
at position 93; the other is the site at which an O₂ molecule
binds. Within this pocket, the accessibility of the
heme group to solvent is highly restricted. This is important
for function, because free heme groups in an oxygenated
solution are rapidly oxidized from the ferrous
(Fe²⁺) form, which is active in the reversible binding of
O₂, to the ferric (Fe³⁺) form, which does not bind O₂.
Knowledge of the structure of myoglobin allowed
researchers for the first time to understand in detail the
correlation between the structure and function of a pro-
tein. Many different myoglobin structures have been
elucidated, allowing investigators to see how the struc-
ture changes when oxygen or other molecules bind to
it. Hundreds of proteins have been subjected to similar
analysis since then. Today, techniques such as NMR
spectroscopy supplement x-ray diffraction data, pro-
viding more information on a protein’s structure (Box
4–4). The ongoing sequencing of genomic DNA from
many organisms (Chapter 9) has identified thousands
of genes that encode proteins of known sequence but
unknown function. Our first insight into what these pro-
teins do often comes from our still-limited understand-
ing of how primary structure determines tertiary struc-
ture, and how tertiary structure determines function.
Globular Proteins Have a Variety
of Tertiary Structures
With elucidation of the tertiary structures of hundreds
of other globular proteins by x-ray analysis, it became
clear that myoglobin illustrates only one of many ways
in which a polypeptide chain can be folded. In Figure
4–18 the structures of cytochrome c, lysozyme, and
ribonuclease are compared. These proteins have differ-
ent amino acid sequences and different tertiary struc-
tures, reflecting differences in function. All are relatively
small and easy to work with, facilitating structural analy-
sis. Cytochrome c is a component of the respiratory
chain of mitochondria (Chapter 19). Like myoglobin, cy-
tochrome c is a heme protein. It contains a single
polypeptide chain of about 100 residues (Mᵣ 12,400) and
a single heme group. In this case, the protoporphyrin of
the heme group is covalently attached to the polypeptide.
Only about 40% of the polypeptide is in α-helical
segments, compared with 70% of the myoglobin chain.
The rest of the cytochrome c chain contains β turns and
irregularly coiled and extended segments.
Lysozyme (Mᵣ 14,600) is an enzyme abundant in egg
white and human tears that catalyzes the hydrolytic
cleavage of polysaccharides in the protective cell walls
of some families of bacteria. Lysozyme, because it can
lyse, or degrade, bacterial cell walls, serves as a bactericidal
agent. As in cytochrome c, about 40% of its 129
amino acid residues are in α-helical segments, but the
arrangement is different and some β-sheet structure is
also present (Fig. 4–18). Four disulfide bonds contribute
stability to this structure. The α helices line a
long crevice in the side of the molecule, called the active
site, which is the site of substrate binding and catalysis.
The bacterial polysaccharide that is the substrate
for lysozyme fits into this crevice.
Ribonuclease, another small globular protein (Mᵣ
13,700), is an enzyme secreted by the pancreas into the
small intestine, where it catalyzes the hydrolysis of cer-
tain bonds in the ribonucleic acids present in ingested
FIGURE 4–17 The heme group. This group is present in myoglobin,
hemoglobin, cytochromes, and many other heme proteins. (a) Heme
consists of a complex organic ring structure, protoporphyrin, to which
is bound an iron atom in its ferrous (Fe²⁺) state. The iron atom has six
coordination bonds, four in the plane of, and bonded to, the flat porphyrin
molecule and two perpendicular to it. (b) In myoglobin and
hemoglobin, one of the perpendicular coordination bonds is bound
to a nitrogen atom of a His residue. The other is “open” and serves as
the binding site for an O₂ molecule.
food. Its tertiary structure, determined by x-ray analysis,
shows that little of its 124 amino acid polypeptide
chain is in an α-helical conformation, but it contains
many segments in the β conformation (Fig. 4–18). Like
lysozyme, ribonuclease has four disulfide bonds be-
tween loops of the polypeptide chain.
In small proteins, hydrophobic residues are less
likely to be sheltered in a hydrophobic interior—simple
geometry dictates that the smaller the protein, the lower
the ratio of volume to surface area. Small proteins also
have fewer potential weak interactions available to sta-
bilize them. This explains why many smaller proteins
such as those in Figure 4–18 are stabilized by a number
of covalent bonds. Lysozyme and ribonuclease, for ex-
ample, have disulfide linkages, and the heme group in
cytochrome c is covalently linked to the protein on two
sides, providing significant stabilization of the entire
protein structure.
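The geometric argument above can be made concrete with a small sketch. It assumes, purely for illustration, that a protein can be approximated as a sphere; the radii used are hypothetical and are not taken from the text. For a sphere the ratio of volume to surface area is r/3, so it falls in direct proportion to the radius.

```python
import math

def volume_to_surface_ratio(radius: float) -> float:
    """Volume divided by surface area for a sphere of the given radius.

    Algebraically this reduces to radius / 3.
    """
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    surface = 4.0 * math.pi * radius ** 2
    return volume / surface

# Hypothetical radii in angstroms, chosen only to show the trend.
for radius in (10.0, 20.0, 40.0):
    ratio = volume_to_surface_ratio(radius)
    print(f"radius {radius:5.1f} A -> volume/surface = {ratio:5.2f} A")
```

Halving the radius halves the ratio, which is one way to see why a smaller protein has proportionally less interior in which to bury hydrophobic side chains.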
Table 4–2 shows the proportions of α helix and β
conformation (expressed as percentage of residues in
each secondary structure) in several small, single-chain,
globular proteins. Each of these proteins has a distinct
structure, adapted for its particular biological function,
but together they share several important properties.
Each is folded compactly, and in each case the hydro-
phobic amino acid side chains are oriented toward the
interior (away from water) and the hydrophilic side
chains are on the surface. The structures are also sta-
bilized by a multitude of hydrogen bonds and some ionic
interactions.
FIGURE 4–18 Three-dimensional structures of some small proteins.
Shown here are cytochrome c (PDB ID 1CCR), lysozyme (PDB ID
3LYM), and ribonuclease (PDB ID 3RN3). Each protein is shown in
surface contour and in a ribbon representation, in the same orientation.
In the ribbon depictions, regions in the β conformation are
represented by flat arrows and the α helices are represented by spiral
ribbons. Key functional groups (the heme in cytochrome c; amino acid
side chains in the active site of lysozyme and ribonuclease) are shown
in red. Disulfide bonds are shown (in the ribbon representations) in
yellow.
TABLE 4–2 Approximate Amounts of α Helix and β Conformation in Some Single-Chain Proteins

                                  Residues (%)*
Protein (total residues)      α Helix    β Conformation
Chymotrypsin (247)               14            45
Ribonuclease (124)               26            35
Carboxypeptidase (307)           38            17
Cytochrome c (104)               39             0
Lysozyme (129)                   40            12
Myoglobin (153)                  78             0

Source: Data from Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry, Part I: The Conformation
of Biological Macromolecules, p. 100, W. H. Freeman and Company, New York.
*Portions of the polypeptide chains that are not accounted for by α helix or β conformation consist
of bends and irregularly coiled or extended stretches. Segments of α helix and β conformation
sometimes deviate slightly from their normal dimensions and geometry.
BOX 4–4 WORKING IN BIOCHEMISTRY
Methods for Determining the Three-Dimensional
Structure of a Protein
X-Ray Diffraction
The spacing of atoms in a crystal lattice can be de-
termined by measuring the locations and intensities
of spots produced on photographic film by a beam of
x rays of given wavelength, after the beam has been
diffracted by the electrons of the atoms. For example,
x-ray analysis of sodium chloride crystals shows that
Na⁺ and Cl⁻ ions are arranged in a simple cubic lattice.
The spacing of the different kinds of atoms in
complex organic molecules, even very large ones such
as proteins, can also be analyzed by x-ray diffraction
methods. However, the technique for analyzing crys-
tals of complex molecules is far more laborious than
for simple salt crystals. When the repeating pattern of
the crystal is a molecule as large as, say, a protein, the
numerous atoms in the molecule yield thousands of
diffraction spots that must be analyzed by computer.
The process may be understood at an elementary
level by considering how images are generated in a
light microscope. Light from a point source is focused
on an object. The light waves are scattered by the ob-
ject, and these scattered waves are recombined by a
series of lenses to generate an enlarged image of the
object. The smallest object whose structure can be
determined by such a system—that is, the resolv-
ing power of the microscope—is determined by the
wavelength of the light, in this case visible light, with
wavelengths in the range of 400 to 700 nm. Objects
smaller than half the wavelength of the incident light
cannot be resolved. To resolve objects as small as pro-
teins we must use x rays, with wavelengths in the
range of 0.7 to 1.5 Å (0.07 to 0.15 nm). However, there
are no lenses that can recombine x rays to form an
image; instead the pattern of diffracted x rays is col-
lected directly and an image is reconstructed by math-
ematical techniques.
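The half-wavelength rule stated above can be applied directly to the two wavelength ranges quoted in the text. The following is a minimal sketch; the function names are illustrative choices, and the rule is used exactly as the paragraph states it, without any instrument-specific corrections.

```python
def smallest_resolvable_nm(wavelength_nm: float) -> float:
    """Smallest resolvable object size, taken as half the wavelength."""
    return wavelength_nm / 2.0

def angstrom_to_nm(value_angstrom: float) -> float:
    """Convert angstroms to nanometers (1 nm = 10 angstroms)."""
    return value_angstrom / 10.0

# Wavelength ranges quoted in the text.
visible_nm = (400.0, 700.0)
xray_nm = (angstrom_to_nm(0.7), angstrom_to_nm(1.5))

for label, (short, long_) in (("visible light", visible_nm), ("x rays", xray_nm)):
    low = smallest_resolvable_nm(short)
    high = smallest_resolvable_nm(long_)
    print(f"{label}: limit between {low:g} nm and {high:g} nm")
```

By this rule visible light cannot resolve anything much smaller than a few hundred nanometers, whereas x rays reach well below a single angstrom, the scale of distances between bonded atoms.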
The amount of information obtained from x-ray
crystallography depends on the degree of structural
order in the sample. Some important structural pa-
rameters were obtained from early studies of the dif-
fraction patterns of the fibrous proteins arranged in
fairly regular arrays in hair and wool. However, the or-
derly bundles formed by fibrous proteins are not
crystals—the molecules are aligned side by side, but
not all are oriented in the same direction. More de-
tailed three-dimensional structural information about
proteins requires a highly ordered protein crystal. Pro-
tein crystallization is something of an empirical sci-
ence, and the structures of many important proteins
are not yet known, simply because they have proved
difficult to crystallize. Practitioners have compared
making protein crystals to holding together a stack of
bowling balls with cellophane tape.
Operationally, there are several steps in x-ray
structural analysis (Fig. 1). Once a crystal is obtained,
it is placed in an x-ray beam between the x-ray source
and a detector, and a regular array of spots called re-
flections is generated. The spots are created by the
diffracted x-ray beam, and each atom in a molecule
makes a contribution to each spot. An electron-density
map of the protein is reconstructed from the overall
diffraction pattern of spots by using a mathematical
technique called a Fourier transform. In effect, the
computer acts as a “computational lens.” A model for
the structure is then built that is consistent with the
electron-density map.
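The idea of the computer as a "computational lens" can be illustrated with a one-dimensional toy. The sketch below takes a made-up density profile, converts it into a set of Fourier terms, and then recombines those terms to recover the original profile. It assumes that both the amplitude and the phase of every term are available, and the density values are hypothetical; actual crystallographic calculations are three-dimensional and considerably more involved.

```python
import cmath

def fourier_terms(density):
    """Discrete Fourier transform: one complex term per spatial frequency."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]

def rebuild_density(terms):
    """Inverse transform: recombine the terms into a density profile."""
    n = len(terms)
    return [
        (sum(terms[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]

# Hypothetical 1-D "unit cell" with two peaks of different heights.
density = [0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 8.0, 0.0]

terms = fourier_terms(density)
recovered = rebuild_density(terms)
print([round(value, 6) for value in recovered])
```

The recombination step plays the role that a lens plays in a light microscope: scattered contributions are summed to give back an image of the object.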
John Kendrew found that the x-ray diffraction
pattern of crystalline myoglobin (isolated from mus-
cles of the sperm whale) is very complex, with nearly
25,000 reflections. Computer analysis of these reflec-
tions took place in stages. The resolution improved at
each stage, until in 1959 the positions of virtually all
the non-hydrogen atoms in the protein had been de-
termined. The amino acid sequence of the protein, ob-
tained by chemical analysis, was consistent with the
molecular model. The structures of thousands of pro-
teins, many of them much more complex than myo-
globin, have since been determined to a similar level
of resolution.
The physical environment within a crystal, of
course, is not identical to that in solution or in a liv-
ing cell. A crystal imposes a space and time average
on the structure deduced from its analysis, and x-ray
diffraction studies provide little information about mo-
lecular motion within the protein. The conformation
of proteins in a crystal could in principle also be af-
fected by nonphysiological factors such as incidental
protein-protein contacts within the crystal. However,
when structures derived from the analysis of crystals
are compared with structural information obtained by
other means (such as NMR, as described below), the
crystal-derived structure almost always represents a
functional conformation of the protein. X-ray crystal-
lography can be applied successfully to proteins too
large to be structurally analyzed by NMR.
Nuclear Magnetic Resonance
An important complementary method for determining
the three-dimensional structures of macromolecules is
nuclear magnetic resonance (NMR). Modern NMR
techniques are being used to determine the structures
of ever-larger macromolecules, including carbohy-
drates, nucleic acids, and small to average-sized pro-
teins. An advantage of NMR studies is that they are
FIGURE 1 Steps in the determination of the structure of sperm whale
myoglobin by x-ray crystallography. (a) X-ray diffraction patterns are
generated from a crystal of the protein. (b) Data extracted from the
diffraction patterns are used to calculate a three-dimensional elec-
tron-density map of the protein. The electron density of only part of
the structure, the heme, is shown. (c) Regions of greatest electron
density reveal the location of atomic nuclei, and this information is
used to piece together the final structure. Here, the heme structure
is modeled into its electron-density map. (d) The completed struc-
ture of sperm whale myoglobin, including the heme (PDB ID
2MBW).
carried out on macromolecules in solution, whereas x-
ray crystallography is limited to molecules that can be
crystallized. NMR can also illuminate the dynamic side
of protein structure, including conformational changes,
protein folding, and interactions with other molecules.
NMR is a manifestation of nuclear spin angular
momentum, a quantum mechanical property of atomic
nuclei. Only certain atoms, including ¹H, ¹³C, ¹⁵N, ¹⁹F,
and ³¹P, possess the kind of nuclear spin that gives
rise to an NMR signal. Nuclear spin generates a mag-
netic dipole. When a strong, static magnetic field is
applied to a solution containing a single type of macro-
molecule, the magnetic dipoles are aligned in the field
in one of two orientations, parallel (low energy) or
antiparallel (high energy). A short (~10 μs) pulse of
electromagnetic energy of suitable frequency (the res-
onant frequency, which is in the radio frequency
range) is applied at right angles to the nuclei aligned
in the magnetic field. Some energy is absorbed as nu-
clei switch to the high-energy state, and the absorp-
tion spectrum that results contains information about
the identity of the nuclei and their immediate chemi-
cal environment. The data from many such experi-
ments performed on a sample are averaged, increas-
ing the signal-to-noise ratio, and an NMR spectrum
such as that in Figure 2 is generated.
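The benefit of averaging many repeated experiments can be seen in a small simulation. This sketch assumes Gaussian noise that is independent from one scan to the next; the peak position, peak height, noise level, and number of scans are all hypothetical values chosen for illustration, and the function names are not from any NMR software.

```python
import random
import statistics

def simulate_scan(rng, true_signal, noise_sd):
    """One noisy acquisition: the true signal plus Gaussian noise at each point."""
    return [value + rng.gauss(0.0, noise_sd) for value in true_signal]

def average_scans(scans):
    """Point-by-point mean of several acquisitions."""
    return [sum(points) / len(points) for points in zip(*scans)]

def residual_noise(measured, true_signal):
    """Standard deviation of the difference between measurement and truth."""
    return statistics.pstdev([m - t for m, t in zip(measured, true_signal)])

rng = random.Random(0)        # fixed seed so the run is repeatable
true_signal = [0.0] * 200
true_signal[100] = 5.0        # a single hypothetical resonance peak

single = simulate_scan(rng, true_signal, noise_sd=1.0)
averaged = average_scans([simulate_scan(rng, true_signal, 1.0) for _ in range(64)])

print(f"noise in one scan:    {residual_noise(single, true_signal):.2f}")
print(f"noise after 64 scans: {residual_noise(averaged, true_signal):.2f}")
```

For uncorrelated noise the residual falls roughly as one over the square root of the number of scans, while the signal itself is unchanged by averaging.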
¹H is particularly important in NMR experiments
because of its high sensitivity and natural abundance.
For macromolecules, ¹H NMR spectra can become
quite complicated. Even a small protein has hundreds
of ¹H atoms, typically resulting in a one-dimensional
NMR spectrum too complex for analysis. Structural
analysis of proteins became possible with the advent
of two-dimensional NMR techniques (Fig. 3). These
methods allow measurement of distance-dependent
coupling of nuclear spins in nearby atoms through
space (the nuclear Overhauser effect (NOE), in a
method dubbed NOESY) or the coupling of nuclear
spins in atoms connected by covalent bonds (total cor-
relation spectroscopy, or TOCSY).
Translating a two-dimensional NMR spectrum into
a complete three-dimensional structure can be a labo-
rious process. The NOE signals provide some informa-
tion about the distances between individual atoms, but
for these distance constraints to be useful, the atoms
giving rise to each signal must be identified. Comple-
mentary TOCSY experiments can help identify which
NOE signals reflect atoms that are linked by covalent
bonds. Certain patterns of NOE signals have been associated
with secondary structures such as α helices.
Modern genetic engineering (Chapter 9) can be used
to prepare proteins that contain the rare isotopes ¹³C
or ¹⁵N. The new NMR signals produced by these atoms,
and the coupling with ¹H signals resulting from these
substitutions, help in the assignment of individual ¹H
NOE signals. The process is also aided by a knowledge
of the amino acid sequence of the polypeptide.
To generate a three-dimensional structure, re-
searchers feed the distance constraints into a com-
puter along with known geometric constraints such as
chirality, van der Waals radii, and bond lengths and
angles. The computer generates a family of closely re-
lated structures that represent the range of confor-
mations consistent with the NOE distance constraints
(Fig. 3c). The uncertainty in structures generated by
NMR is in part a reflection of the molecular vibrations
(breathing) within a protein structure in solution, dis-
cussed in more detail in Chapter 5. Normal experi-
mental uncertainty can also play a role.
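The idea of testing a candidate structure against NOE distance constraints can be sketched in a few lines. This is only an illustration, not the procedure used by any particular structure-calculation program: the atom names, coordinates, and the 5 Å upper bound are hypothetical values chosen for the example, and real calculations also apply the geometric constraints listed above.

```python
import math


def violated_constraints(coords, constraints):
    """Return the constraints whose atom pair is farther apart than allowed.

    coords: dict mapping an atom name to (x, y, z) coordinates in angstroms.
    constraints: list of (atom_a, atom_b, upper_bound_in_angstroms).
    """
    violations = []
    for atom_a, atom_b, upper_bound in constraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper_bound:
            violations.append((atom_a, atom_b, round(distance, 2)))
    return violations


# Hypothetical model and constraints, for illustration only.
model = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 8.0, 0.0)}
noe_constraints = [("H1", "H2", 5.0), ("H1", "H3", 5.0)]

print(violated_constraints(model, noe_constraints))  # [('H1', 'H3', 8.0)]
```

A conformation that returns an empty list is consistent with every constraint supplied; the family of structures described above corresponds to the many conformations that pass such a test.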
When a protein structure has been determined by
both x-ray crystallography and NMR, the structures
generally agree well. In some cases, the precise locations
of particular amino acid side chains on the protein
exterior are different, often because of effects related
to the packing of adjacent protein molecules in
a crystal. The two techniques together are at the heart
of the rapid increase in the availability of structural
information about the macromolecules of living cells.
FIGURE 2 A one-dimensional NMR spectrum of a globin from a
marine blood worm. This protein and sperm whale myoglobin are
very close structural analogs, belonging to the same protein
structural family and sharing an oxygen-transport function.
[Horizontal axis: ¹H chemical shift (ppm), from 10.0 to –2.0.]
FIGURE 3 The use of two-dimensional NMR to generate a three-
dimensional structure of a globin, the same protein used to
generate the data in Figure 2. The diagonal in a two-dimensional
NMR spectrum is equivalent to a one-dimensional spectrum. The
off-diagonal peaks are NOE signals generated by close-range
interactions of ¹H atoms that may generate signals quite distant in
the one-dimensional spectrum. Two such interactions are identified
in (a), and their identities are shown with blue lines in (b) (PDB
ID 1VRF). Three lines are drawn for interaction 2 between a
methyl group in the protein and a hydrogen on the heme. The
methyl group rotates rapidly such that each of its three hydrogens
contributes equally to the interaction and the NMR signal. Such
information is used to determine the complete three-dimensional
structure (PDB ID 1VRE), as in (c). The multiple lines shown for
the protein backbone represent the family of structures consistent
with the distance constraints in the NMR data. The structural
similarity with myoglobin (Fig. 1) is evident. The proteins are
oriented in the same way in both figures.
[Panel (a): two-dimensional spectrum; both axes show ¹H chemical
shift (ppm), from 10.0 to –2.0, with interactions 1 and 2 marked.
Panels (b) and (c): structural views.]
Analysis of Many Globular Proteins Reveals
Common Structural Patterns
Protein Architecture—Tertiary Structure of Large Globular Proteins
For the beginning student, the very complex tertiary
structures of globular proteins much larger than
those shown in Figure 4–18 are best approached by focusing
on structural patterns that recur in different and
often unrelated proteins. The three-dimensional structure
of a typical globular protein can be considered an
assemblage of polypeptide segments in the α-helix and
β-sheet conformations, linked by connecting segments.
The structure can then be described to a first approximation
by defining how these segments stack on one
another and how the segments that connect them are
arranged. This formalism has led to the development of
databases that allow informative comparisons of protein
structures, complementing other databases that permit
comparisons of protein sequences.
An understanding of a complete three-dimensional
structure is built upon an analysis of its parts. We begin
by defining terms used to describe protein substruc-
tures, then turn to the folding rules elucidated from
analysis of the structures of many proteins.
Supersecondary structures, also called motifs
or simply folds, are particularly stable arrangements of
several elements of secondary structure and the con-
nections between them. There is no universal agreement
among biochemists on the application of the three
terms, and they are often used interchangeably. The
terms are also applied to a wide range of structures.
Recognized motifs range from simple to complex, some-
times appearing in repeating units or combinations. A
single large motif may comprise the entire protein. We
have already encountered one well-studied motif, the
coiled coil of α-keratin, also found in a number of other
proteins.
Polypeptides with more than a few hundred amino
acid residues often fold into two or more stable, globu-
lar units called domains. In many cases, a domain from
a large protein will retain its correct three-dimensional
structure even when it is separated (for example, by
proteolytic cleavage) from the remainder of the
polypeptide chain. A protein with multiple domains may
appear to have a distinct globular lobe for each domain
(Fig. 4–19), but, more commonly, extensive contacts be-
tween domains make individual domains hard to dis-
cern. Different domains often have distinct functions,
such as the binding of small molecules or interaction
with other proteins. Small proteins usually have only one
domain (the domain is the protein).
Folding of polypeptides is subject to an array of
physical and chemical constraints. A sampling of the
prominent folding rules that have emerged provides an
opportunity to introduce some simple motifs.
1. Hydrophobic interactions make a large contribution
to the stability of protein structures. Burial of
hydrophobic amino acid R groups so as to exclude
water requires at least two layers of secondary
structure. Two simple motifs, the β-α-β loop and
the α-α corner (Fig. 4–20a), create two layers.
2. Where they occur together in proteins, α helices
and β sheets generally are found in different
structural layers. This is because the backbone of
a polypeptide segment in the β conformation (Fig.
4–7) cannot readily hydrogen-bond to an α helix
aligned with it.
3. Polypeptide segments adjacent to each other in
the primary sequence are usually stacked adjacent
to each other in the folded structure. Although
distant segments of a polypeptide may come
together in the tertiary structure, this is not the
norm.
4. Connections between elements of secondary
structure cannot cross or form knots (Fig. 4–20b).
5. The β conformation is most stable when the
individual segments are twisted slightly in a right-handed
sense. This influences both the arrangement
of β sheets relative to one another and the
path of the polypeptide connection between them.
Two parallel β strands, for example, must be
connected by a crossover strand (Fig. 4–20c). In
principle, this crossover could have a right- or left-handed
conformation, but in proteins it is almost
always right-handed. Right-handed connections
tend to be shorter than left-handed connections
and tend to bend through smaller angles, making
them easier to form. The twisting of β sheets also
leads to a characteristic twisting of the structure
formed when many segments are put together.
Two examples of resulting structures are the β
barrel and twisted β sheet (Fig. 4–20d), which
form the core of many larger structures.
FIGURE 4–19 Structural domains in the polypeptide troponin C.
(PDB ID 4TNC) This calcium-binding protein associated with muscle
has separate calcium-binding domains, indicated in blue and purple.
FIGURE 4–20 Stable folding patterns in proteins. (a) Two simple and
common motifs that provide two layers of secondary structure. Amino
acid side chains at the interface between elements of secondary
structure are shielded from water. Note that the β strands in the β-α-β loop
tend to twist in a right-handed fashion. (b) Connections between β
strands in layered β sheets. The strands are shown from one end, with
no twisting included in the schematic. Thick lines represent connections
at the ends nearest the viewer; thin lines are connections at the
far ends of the β strands. The connections on a given end (e.g., near
the viewer) do not cross each other. (c) Because of the twist in β
strands, connections between strands are generally right-handed.
Left-handed connections must traverse sharper angles and are harder to
form. (d) Two arrangements of β strands stabilized by the tendency of
the strands to twist. This β barrel is a single domain of α-hemolysin
(a pore-forming toxin that kills a cell by creating a hole in its
membrane) from the bacterium Staphylococcus aureus (derived from PDB
ID 7AHL). The twisted β sheet is from a domain of photolyase (a protein
that repairs certain types of DNA damage) from E. coli (derived
from PDB ID 1DNP).
[Figure 4–20 panel labels: (a) β-α-β loop; α-α corner. (b) Typical
connections in an all-β motif; crossover connection (not observed).
(c) Right-handed connection between β strands; left-handed connection
between β strands (very rare). (d) β barrel; twisted β sheet.]
Following these rules, complex motifs can be built up
from simple ones. For example, a series of β-α-β loops,
arranged so that the β strands form a barrel, creates a
particularly stable and common motif called the α/β
barrel (Fig. 4–21). In this structure, each parallel β segment
is attached to its neighbor by an α-helical segment.
All connections are right-handed. The α/β barrel is
found in many enzymes, often with a binding site for a
cofactor or substrate in the form of a pocket near one
end of the barrel. Note that domains exhibiting similar
folding patterns are said to have the same motif even
though their constituent α helices and β sheets may differ
in length.
Protein Motifs Are the Basis for Protein Structural
Classification
Protein Architecture—Tertiary Structure of Large Globular Proteins, IV. Structural Classification of Proteins
As we have seen,
the complexities of tertiary structure are decreased by
considering substructures. Taking this idea further, re-
searchers have organized the complete contents of
databases according to hierarchical levels of structure.
The Structural Classification of Proteins (SCOP) data-
base offers a good example of this very important trend
in biochemistry. At the highest level of classification, the
SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop)
borrows a scheme already in common use, in which protein
structures are divided into four classes: all α, all β,
α/β (in which the α and β segments are interspersed or
alternate), and α + β (in which the α and β regions are
somewhat segregated) (Fig. 4–22). Within each class are
tens to hundreds of different folding arrangements, built
up from increasingly identifiable substructures. Some of
the substructure arrangements are very common; others
have been found in just one protein. Figure 4–22 displays
a variety of motifs arrayed among the four classes
of protein structure. Those illustrated are just a minute
sample of the hundreds of known motifs. The number
of folding patterns is not infinite, however. As the rate
at which new protein structures are elucidated has in-
creased, the fraction of those structures containing a
new motif has steadily declined. Fewer than 1,000 dif-
ferent folds or motifs may exist in all proteins. Figure
4–22 also shows how proteins can be organized based
on the presence of the various motifs. The top two lev-
els of organization, class and fold, are purely structural.
Below the fold level, categorization is based on evolu-
tionary relationships.
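The hierarchy just described (class, fold, superfamily, family, protein) maps naturally onto a simple record type. The sketch below is illustrative only: the `ScopEntry` type and `folds_by_class` function are hypothetical names, not part of any SCOP software, and the five records are entries from Figure 4–22 with class labels written out as plain text.

```python
from collections import defaultdict
from typing import NamedTuple


class ScopEntry(NamedTuple):
    """One structure, described at each level of the hierarchy."""
    pdb_id: str
    scop_class: str
    fold: str
    superfamily: str
    family: str
    protein: str


# A few entries from Figure 4-22 (hypothetical in-memory representation).
ENTRIES = [
    ScopEntry("1AO6", "all alpha", "Serum albumin", "Serum albumin",
              "Serum albumin", "Serum albumin"),
    ScopEntry("1CD8", "all beta", "Immunoglobulin-like beta sandwich",
              "Immunoglobulin", "V set domains", "CD8"),
    ScopEntry("1DEH", "alpha/beta", "NAD(P)-binding Rossmann-fold domains",
              "NAD(P)-binding Rossmann-fold domains",
              "Alcohol/glucose dehydrogenases, carboxyl-terminal domain",
              "Alcohol dehydrogenase"),
    ScopEntry("1PFK", "alpha/beta", "Phosphofructokinase", "Phosphofructokinase",
              "Phosphofructokinase", "ATP-dependent phosphofructokinase"),
    ScopEntry("1EMA", "alpha + beta", "GFP-like", "GFP-like",
              "Fluorescent proteins", "Green fluorescent protein, GFP"),
]


def folds_by_class(entries):
    """Group fold names under their class, the two purely structural levels."""
    grouped = defaultdict(set)
    for entry in entries:
        grouped[entry.scop_class].add(entry.fold)
    return {cls: sorted(folds) for cls, folds in grouped.items()}


for cls, folds in folds_by_class(ENTRIES).items():
    print(cls, "->", folds)
```

Grouping on the first two fields uses only structural information; grouping on superfamily or family would instead reflect the evolutionary relationships assigned below the fold level.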
Many examples of recurring domain or motif struc-
tures are available, and these reveal that protein terti-
ary structure is more reliably conserved than primary
sequence. The comparison of protein structures can
thus provide much information about evolution. Pro-
teins with significant primary sequence similarity,
and/or with demonstrably similar structure and func-
tion, are said to be in the same protein family. A strong
evolutionary relationship is usually evident within a pro-
tein family. For example, the globin family has many dif-
ferent proteins with both structural and sequence sim-
ilarity to myoglobin (as seen in the proteins used as
examples in Box 4–4 and again in the next chapter).
FIGURE 4–21 Constructing large motifs from smaller ones. The α/β
barrel is a common motif constructed from repetitions of the simpler
β-α-β loop motif. This α/β barrel is a domain of the pyruvate kinase
(a glycolytic enzyme) from rabbit (derived from PDB ID 1PKN).
[Figure 4–21 labels: β-α-β loop; α/β barrel.]
FIGURE 4–22 Organization of proteins based on motifs. Shown here
are just a small number of the hundreds of known stable motifs. They
are divided into four classes: all α, all β, α/β, and α + β. Structural
classification data from the SCOP (Structural Classification of Proteins)
database (http://scop.mrc-lmb.cam.ac.uk/scop) are also provided. The
PDB identifier is the unique number given to each structure archived
in the Protein Data Bank (www.rcsb.org/pdb). The α/β barrel, shown
in Figure 4–21, is another particularly common α/β motif.
[Figure 4–22 labels. Each entry lists, in order: PDB identifier; fold;
superfamily; family; protein; species.]
All α
1AO6; Serum albumin; Serum albumin; Serum albumin; Serum albumin; Human (Homo sapiens)
1BCF; Ferritin-like; Ferritin-like; Ferritin; Bacterioferritin (cytochrome b₁); Escherichia coli
1GAI; α/α toroid; Six-hairpin glycosyltransferase; Glucoamylase; Glucoamylase; Aspergillus awamori, variant x100
1ENH; DNA/RNA-binding 3-helical bundle; Homeodomain-like; Homeodomain; engrailed Homeodomain; Drosophila melanogaster
All β
1LXA; Single-stranded left-handed β helix; Trimeric LpxA-like enzymes; UDP N-acetylglucosamine acyltransferase; UDP N-acetylglucosamine acyltransferase; Escherichia coli
1PEX; Four-bladed β propeller; Hemopexin-like domain; Hemopexin-like domain; Collagenase-3 (MMP-13), carboxyl-terminal domain; Human (Homo sapiens)
1JPC; β-Prism II; α-D-Mannose-specific plant lectins; α-D-Mannose-specific plant lectins; Lectin (agglutinin); Snowdrop (Galanthus nivalis)
1HOE; α-Amylase inhibitor tendamistat; α-Amylase inhibitor tendamistat; α-Amylase inhibitor tendamistat; α-Amylase inhibitor tendamistat; Streptomyces tendae
1CD8; Immunoglobulin-like β sandwich; Immunoglobulin; V set domains (antibody variable domain-like); CD8; Human (Homo sapiens)
α/β
1DEH; NAD(P)-binding Rossmann-fold domains; NAD(P)-binding Rossmann-fold domains; Alcohol/glucose dehydrogenases, carboxyl-terminal domain; Alcohol dehydrogenase; Human (Homo sapiens)
1DUB; ClpP/crotonase; ClpP/crotonase; Crotonase-like; Enoyl-CoA hydratase (crotonase); Rat (Rattus norvegicus)
1PFK; Phosphofructokinase; Phosphofructokinase; Phosphofructokinase; ATP-dependent phosphofructokinase; Escherichia coli
α + β
2PIL; Pilin; Pilin; Pilin; Pilin; Neisseria gonorrhoeae
1SYN; Thymidylate synthase/dCMP hydroxymethylase; Thymidylate synthase/dCMP hydroxymethylase; Thymidylate synthase/dCMP hydroxymethylase; Thymidylate synthase; Escherichia coli
1EMA; GFP-like; GFP-like; Fluorescent proteins; Green fluorescent protein, GFP; Jellyfish (Aequorea victoria)
1U9A; UBC-like; UBC-like; Ubiquitin-conjugating enzyme, UBC; Ubiquitin-conjugating enzyme, UBC; Human (Homo sapiens) ubc9
Two or more families with little primary sequence similarity
sometimes make use of the same major structural
motif and have functional similarities; these families are
grouped as superfamilies. An evolutionary relationship
between the families in a superfamily is considered
probable, even though time and functional distinc-
tions—hence different adaptive pressures—may have
erased many of the telltale sequence relationships. A
protein family may be widespread in all three domains
of cellular life, the Bacteria, Archaea, and Eukarya, sug-
gesting a very ancient origin. Other families may be pres-
ent in only a small group of organisms, indicating that
the structure arose more recently. Tracing the natural
history of structural motifs, using structural classifica-
tions in databases such as SCOP, provides a powerful
complement to sequence analyses in tracing many evo-
lutionary relationships.
The SCOP database is curated manually, with the
objective of placing proteins in the correct evolutionary
framework based on conserved structural features. Two
similar enterprises, the CATH (class, architecture,
topology, and homologous superfamily) and FSSP ( fold
classification based on structure-structure alignment of
proteins) databases, make use of more automated meth-
ods and can provide additional information.
Structural motifs become especially important in
defining protein families and superfamilies. Improved
classification and comparison systems for proteins lead
inevitably to the elucidation of new functional relation-
ships. Given the central role of proteins in living sys-
tems, these structural comparisons can help illuminate
every aspect of biochemistry, from the evolution of in-
dividual proteins to the evolutionary history of complete
metabolic pathways.
Protein Quaternary Structures Range from Simple
Dimers to Large Complexes
Protein Architecture—Quaternary Structure
Many proteins
have multiple polypeptide subunits. The association of
polypeptide chains can serve a variety of functions.
Many multisubunit proteins have regulatory roles; the
binding of small molecules may affect the interaction
between subunits, causing large changes in the protein’s
activity in response to small changes in the concentra-
tion of substrate or regulatory molecules (Chapter 6).
In other cases, separate subunits can take on separate
but related functions, such as catalysis and regulation.
Some associations, such as the fibrous proteins consid-
ered earlier in this chapter and the coat proteins of
viruses, serve primarily structural roles. Some very large
protein assemblies are the site of complex, multistep re-
actions. One example is the ribosome, site of protein
synthesis, which incorporates dozens of protein sub-
units along with a number of RNA molecules.
A multisubunit protein is also referred to as a mul-
timer. Multimeric proteins can have from two to hun-
dreds of subunits. A multimer with just a few subunits
is often called an oligomer. If a multimer is composed
of a number of nonidentical subunits, the overall struc-
ture of the protein can be asymmetric and quite com-
plicated. However, most multimers have identical sub-
units or repeating groups of nonidentical subunits,
usually in symmetric arrangements. As noted in Chap-
ter 3, the repeating structural unit in such a multimeric
protein, whether it is a single subunit or a group of sub-
units, is called a protomer.
The first oligomeric protein for which the three-dimensional
structure was determined was hemoglobin
(Mᵣ 64,500), which contains four polypeptide chains and
four heme prosthetic groups, in which the iron atoms
are in the ferrous (Fe²⁺) state (Fig. 4–17). The protein
portion, called globin, consists of two α chains (141
residues each) and two β chains (146 residues each).
Note that in this case α and β do not refer to secondary
structures. Because hemoglobin is four times as
large as myoglobin, much more time and effort were required
to solve its three-dimensional structure by x-ray
analysis, finally achieved by Max Perutz, John Kendrew,
and their colleagues in 1959. The subunits of hemoglobin
are arranged in symmetric pairs (Fig. 4–23), each
pair having one α and one β subunit. Hemoglobin can
therefore be described either as a tetramer or as a dimer
of αβ protomers.
Identical subunits of multimeric proteins are gen-
erally arranged in one or a limited set of symmetric pat-
terns. A description of the structure of these proteins
requires an understanding of conventions used to de-
fine symmetries. Oligomers can have either rotational
symmetry or helical symmetry; that is, individual
subunits can be superimposed on others (brought to co-
incidence) by rotation about one or more rotational
axes, or by a helical rotation. In proteins with rotational
symmetry, the subunits pack about the rotational axes
to form closed structures. Proteins with helical symmetry
tend to form structures that are more open-ended,
with subunits added in a spiraling array.
Max Perutz, 1914–2002 (left)
John Kendrew, 1917–1997 (right)
There are several forms of rotational symmetry. The
simplest is cyclic symmetry, involving rotation about a
single axis (Fig. 4–24a). If subunits can be superimposed
by rotation about a single axis, the protein has a symmetry
defined by convention as Cₙ (C for cyclic, n for
the number of subunits related by the axis). The axis
itself is described as an n-fold rotational axis. The αβ
protomers of hemoglobin (Fig. 4–23) are related by C₂
symmetry. A somewhat more complicated rotational
symmetry is dihedral symmetry, in which a twofold
rotational axis intersects an n-fold axis at right angles.
The symmetry is defined as Dₙ (Fig. 4–24b). A protein
with dihedral symmetry has 2n protomers.
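The relationship between these symmetry operations and the number of protomers can be made concrete with a short geometric sketch. It assumes the n-fold axis lies along z and the twofold axis along x (an arbitrary choice of coordinates), and it places each protomer at a single point; the function names and the starting coordinates are hypothetical.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate a point about the z axis (taken here as the n-fold axis)."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def twofold_about_x(point):
    """Rotate a point by 180 degrees about the x axis, at right angles to z."""
    x, y, z = point
    return (x, -y, -z)


def cyclic_positions(point, n):
    """Positions of n protomers related by C_n symmetry."""
    return [rotate_about_z(point, 360.0 * k / n) for k in range(n)]


def dihedral_positions(point, n):
    """Positions of 2n protomers related by D_n symmetry."""
    ring = cyclic_positions(point, n)
    return ring + [twofold_about_x(p) for p in ring]


start = (1.0, 0.2, 0.5)  # arbitrary position that lies on neither axis
print(len(cyclic_positions(start, 3)))    # 3 protomers for C3
print(len(dihedral_positions(start, 2)))  # 4 protomers for D2
```

Because the starting point lies on neither axis, every operation produces a distinct position, which is why a protein with Dₙ symmetry contains 2n protomers.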
FIGURE 4–23 Quaternary structure of deoxyhemoglobin. (PDB ID
2HHB) X-ray diffraction analysis of deoxyhemoglobin (hemoglobin
without oxygen molecules bound to the heme groups) shows how the
four polypeptide subunits are packed together. (a) A ribbon
representation. (b) A space-filling model. The α subunits are shown in gray and
light blue; the β subunits in pink and dark blue. Note that the heme
groups (red) are relatively far apart.
FIGURE 4–24 Rotational symmetry in proteins. (a) In cyclic symmetry,
subunits are related by rotation about a single n-fold axis, where
n is the number of subunits so related. The axes are shown as black
lines; the numbers are values of n. Only two of many possible Cₙ
arrangements are shown. (b) In dihedral symmetry, all subunits can
be related by rotation about one or both of two axes, one of which is
twofold. D₂ symmetry is most common. (c) Icosahedral symmetry.
Relating all 20 triangular faces of an icosahedron requires rotation about
one or more of three separate rotational axes: twofold, threefold, and
fivefold. An end-on view of each of these axes is shown at the right.
[Figure 4–24 panel labels: (a) Two types of cyclic symmetry: C₂
(twofold axis) and C₃ (threefold axis). (b) Two types of dihedral
symmetry: D₂ (twofold axes) and D₄ (one fourfold axis and twofold
axes). (c) Icosahedral symmetry: fivefold, threefold, and twofold axes.]
Proteins with cyclic or dihedral symmetry are particularly
common. More complex rotational symmetries
are possible, but only a few are regularly encountered.
One example is icosahedral symmetry. An icosahedron
is a regular 12-cornered polyhedron having 20
equilateral triangular faces (Fig. 4–24c). Each face can
be brought to coincidence with another by rotation
about one or more of three rotational axes. This is a
common structure in virus coats, or capsids. The human
poliovirus has an icosahedral capsid (Fig. 4–25a). Each
triangular face is made up of three protomers, each pro-
tomer containing single copies of four different polypep-
tide chains, three of which are accessible at the outer
surface. Sixty protomers form the 20 faces of the icosa-
hedral shell enclosing the genetic material (RNA).
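The counts quoted for the icosahedral capsid follow from simple arithmetic, worked through below. The use of Euler's polyhedron formula (V − E + F = 2) to recover the 12 corners is an added check that does not appear in the text; the variable names are illustrative.

```python
faces = 20                      # triangular faces of an icosahedron
edges = faces * 3 // 2          # each face has 3 edges; each edge is shared by 2 faces
vertices = 2 - faces + edges    # Euler's formula: V - E + F = 2

protomers_per_face = 3          # poliovirus: three protomers per triangular face
chains_per_protomer = 4         # each protomer holds single copies of four polypeptides

protomers = faces * protomers_per_face
polypeptide_chains = protomers * chains_per_protomer

print(vertices)            # 12 corners
print(protomers)           # 60 protomers
print(polypeptide_chains)  # 240 polypeptide chains in the shell
```

The whole shell is therefore assembled from 240 polypeptide chains, yet only four different polypeptides need to be encoded.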
The other major type of symmetry found in
oligomers, helical symmetry, also occurs in capsids. To-
bacco mosaic virus is a right-handed helical filament
made up of 2,130 identical subunits (Fig. 4–25b). This
cylindrical structure encloses the viral RNA. Proteins
with subunits arranged in helical filaments can also form
long, fibrous structures such as the actin filaments of
muscle (see Fig. 5–30).
There Are Limits to the Size of Proteins
The relatively large size of proteins reflects their func-
tions. The function of an enzyme, for example, requires
a stable structure containing a pocket large enough to
bind its substrate and catalyze a reaction. Protein size
has limits, however, imposed by two factors: the genetic
coding capacity of nucleic acids and the accuracy of the
protein biosynthetic process. The use of many copies of
one or a few proteins to make a large enclosing struc-
ture (capsid) is important for viruses because this strat-
egy conserves genetic material. Remember that there is
a linear correspondence between the sequence of a gene
in the nucleic acid and the amino acid sequence of the
protein for which it codes (see Fig. 1–31). The nucleic
acids of viruses are much too small to encode the in-
formation required for a protein shell made of a single
polypeptide. By using many copies of much smaller
polypeptides, a much shorter nucleic acid is needed for
coding the capsid subunits, and this nucleic acid can be
efficiently used over and over again. Cells also use large
complexes of polypeptides in muscle, cilia, the cyto-
skeleton, and other structures. It is simply more effi-
cient to make many copies of a small polypeptide than
one copy of a very large protein. In fact, most proteins
with a molecular weight greater than 100,000 have mul-
tiple subunits, identical or different.
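A rough calculation shows how much coding capacity is saved by building a large structure from many copies of a small subunit. This sketch assumes three nucleotides per amino acid residue and ignores start and stop signals and any noncoding sequence. The subunit count for tobacco mosaic virus (2,130) comes from the text; the subunit length of 158 residues is supplied here for illustration and does not appear in the chapter.

```python
NUCLEOTIDES_PER_RESIDUE = 3  # one codon per amino acid residue


def coding_nucleotides(residues):
    """Minimum number of nucleotides needed to encode a polypeptide."""
    return NUCLEOTIDES_PER_RESIDUE * residues


subunits_in_capsid = 2130   # identical subunits in tobacco mosaic virus (from the text)
residues_per_subunit = 158  # assumed subunit length, for illustration only

one_gene = coding_nucleotides(residues_per_subunit)
single_giant_chain = coding_nucleotides(subunits_in_capsid * residues_per_subunit)

print(one_gene)                        # 474
print(single_giant_chain)              # 1009620
print(single_giant_chain // one_gene)  # 2130
```

Under these assumptions, one short gene used repeatedly replaces a coding sequence some two thousand times longer.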
The second factor limiting the size of proteins is the
error frequency during protein biosynthesis. The error
frequency is low (about 1 mistake per 10,000 amino acid
residues added), but even this low rate results in a high
probability of a damaged protein if the protein is very
large. Simply put, the potential for incorporating a
“wrong” amino acid in a protein is greater for a large
protein than for a small one.
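The effect of chain length on the chance of making an error-free protein can be estimated directly from the error frequency given above. This is a minimal sketch, assuming that errors at different positions are independent (a simplification); the function name is hypothetical.

```python
ERROR_RATE = 1.0 / 10_000  # about 1 mistake per 10,000 residues added (from the text)


def fraction_error_free(residues, error_rate=ERROR_RATE):
    """Probability that every residue in a chain of this length is correct."""
    return (1.0 - error_rate) ** residues


for length in (100, 1_000, 10_000):
    print(f"{length:6d} residues: {fraction_error_free(length):.3f} error-free")
```

In this model about 99% of 100-residue chains are error-free, about 90% of 1,000-residue chains, and only about 37% of 10,000-residue chains.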
FIGURE 4–25 Viral capsids. (a) Poliovirus (derived from PDB ID
2PLV). The coat proteins of poliovirus assemble into an icosahedron
300 Å in diameter. Icosahedral symmetry is a type of rotational
symmetry (see Fig. 4–24c). On the left is a surface contour image of the
poliovirus capsid. In the image on the right, lines have been
superimposed to show the axes of symmetry. (b) Tobacco mosaic virus
(derived from PDB ID 1VTM). This rod-shaped virus (as shown in the
electron micrograph) is 3,000 Å long and 180 Å in diameter; it has
helical symmetry.
[Figure 4–25 panel labels: (a) poliovirus; (b) tobacco mosaic virus,
with a protein subunit and the RNA indicated.]
SUMMARY 4.3 Protein Tertiary and Quaternary
Structures
■ Tertiary structure is the complete three-
dimensional structure of a polypeptide chain.
There are two general classes of proteins based
on tertiary structure: fibrous and globular.
■ Fibrous proteins, which serve mainly structural
roles, have simple repeating elements of
secondary structure.
■ Globular proteins have more complicated
tertiary structures, often containing several
types of secondary structure in the same
polypeptide chain. The first globular protein
structure to be determined, using x-ray
diffraction methods, was that of myoglobin.
■ The complex structures of globular proteins
can be analyzed by examining stable
substructures called supersecondary structures,
motifs, or folds. The thousands of known
protein structures are generally assembled
from a repertoire of only a few hundred motifs.
Regions of a polypeptide chain that can fold
stably and independently are called domains.
■ Quaternary structure results from interactions
between the subunits of multisubunit
(multimeric) proteins or large protein assemblies.
Some multimeric proteins have a repeated unit
consisting of a single subunit or a group of
subunits referred to as a protomer. Protomers are
usually related by rotational or helical symmetry.
4.4 Protein Denaturation and Folding
All proteins begin their existence on a ribosome as a lin-
ear sequence of amino acid residues (Chapter 27). This
polypeptide must fold during and following synthesis to
take up its native conformation. We have seen that a na-
tive protein conformation is only marginally stable. Mod-
est changes in the protein’s environment can bring about
structural changes that can affect function. We now ex-
plore the transition that occurs between the folded and
unfolded states.
Loss of Protein Structure Results in Loss of Function
Protein structures have evolved to function in particu-
lar cellular environments. Conditions different from those
in the cell can result in protein structural changes, large
and small. A loss of three-dimensional structure suffi-
cient to cause loss of function is called denaturation.
The denatured state does not necessarily equate with
complete unfolding of the protein and randomization of
conformation. Under most conditions, denatured pro-
teins exist in a set of partially folded states that are
poorly understood.
Most proteins can be denatured by heat, which af-
fects the weak interactions in a protein (primarily hy-
drogen bonds) in a complex manner. If the temperature
is increased slowly, a protein’s conformation generally
remains intact until an abrupt loss of structure (and
function) occurs over a narrow temperature range (Fig.
4–26). The abruptness of the change suggests that un-
folding is a cooperative process: loss of structure in one
part of the protein destabilizes other parts. The effects
of heat on proteins are not readily predictable. The very
heat-stable proteins of thermophilic bacteria have
evolved to function at the temperature of hot springs
(~100 °C). Yet the structures of these proteins often differ
only slightly from those of homologous proteins derived
from bacteria such as Escherichia coli. How these
small differences promote structural stability at high
temperatures is not yet understood.
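The abruptness of thermal unfolding can be illustrated with a simple two-state model, in which a protein is either folded or unfolded and the two forms are in equilibrium. This is a sketch under stated assumptions, not a description of how the curves in Figure 4–26 were analyzed: it assumes a temperature-independent enthalpy of unfolding, and the midpoint temperature (333 K) and enthalpy (400 kJ/mol) are hypothetical values chosen for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def fraction_unfolded(temp_k, midpoint_k, delta_h):
    """Two-state estimate of the unfolded fraction at temperature temp_k.

    Assumes delta_g = delta_h * (1 - T / Tm), so that delta_g is zero at the
    midpoint and the equilibrium constant for unfolding is exp(-delta_g / RT).
    """
    delta_g = delta_h * (1.0 - temp_k / midpoint_k)
    k_unfold = math.exp(-delta_g / (R * temp_k))
    return k_unfold / (1.0 + k_unfold)


MIDPOINT_K = 333.0   # hypothetical midpoint of the transition (60 degrees C)
DELTA_H = 400_000.0  # hypothetical enthalpy of unfolding, J/mol

for temp_c in (40, 50, 55, 60, 65, 70, 80):
    f = fraction_unfolded(temp_c + 273.0, MIDPOINT_K, DELTA_H)
    print(f"{temp_c:3d} C: {100 * f:5.1f}% unfolded")
```

With these values the protein goes from almost fully folded to almost fully unfolded within about 20 degrees, and a larger enthalpy (greater cooperativity) makes the transition sharper still.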
Proteins can be denatured not only by heat but by
extremes of pH, by certain miscible organic solvents
such as alcohol or acetone, by certain solutes such as
urea and guanidine hydrochloride, or by detergents.
Each of these denaturing agents represents a relatively
mild treatment in the sense that no covalent bonds in
the polypeptide chain are broken. Organic solvents,
urea, and detergents act primarily by disrupting the hy-
drophobic interactions that make up the stable core of
globular proteins; extremes of pH alter the net charge
on the protein, causing electrostatic repulsion and the
disruption of some hydrogen bonding. The denatured
states obtained with these various treatments need not
be equivalent.
4.4 Protein Denaturation and Folding 147
[Figure 4–26 plots. (a) Percent of maximum signal versus temperature (°C, 0 to 100) for ribonuclease A and apomyoglobin. (b) Percent unfolded versus [GdnHCl] (M, 0 to 5) for ribonuclease A. Transition midpoints are marked T_m.]
FIGURE 4–26 Protein denaturation. Results are shown for proteins de-
natured by two different environmental changes. In each case, the tran-
sition from the folded to unfolded state is fairly abrupt, suggesting co-
operativity in the unfolding process. (a) Thermal denaturation of horse
apomyoglobin (myoglobin without the heme prosthetic group) and ri-
bonuclease A (with its disulfide bonds intact; see Fig. 4–27). The mid-
point of the temperature range over which denaturation occurs is called
the melting temperature, or T_m. The denaturation of apomyoglobin was
monitored by circular dichroism, a technique that measures the amount
of helical structure in a macromolecule. Denaturation of ribonuclease
A was tracked by monitoring changes in the intrinsic fluorescence of
the protein, which is affected by changes in the environment of Trp
residues. (b) Denaturation of disulfide-intact ribonuclease A by guani-
dine hydrochloride (GdnHCl), monitored by circular dichroism.
Amino Acid Sequence Determines Tertiary Structure
The tertiary structure of a globular protein is deter-
mined by its amino acid sequence. The most important
proof of this came from experiments showing that de-
naturation of some proteins is reversible. Certain glob-
ular proteins denatured by heat, extremes of pH, or de-
naturing reagents will regain their native structure and
their biological activity if returned to conditions in which
the native conformation is stable. This process is called
renaturation.
A classic example is the denaturation and renatu-
ration of ribonuclease. Purified ribonuclease can be
completely denatured by exposure to a concentrated
urea solution in the presence of a reducing agent. The
reducing agent cleaves the four disulfide bonds to yield
eight Cys residues, and the urea disrupts the stabiliz-
ing hydrophobic interactions, thus freeing the entire
polypeptide from its folded conformation. Denaturation
of ribonuclease is accompanied by a complete loss of
catalytic activity. When the urea and the reducing agent
are removed, the randomly coiled, denatured ribonu-
clease spontaneously refolds into its correct tertiary
structure, with full restoration of its catalytic activity
(Fig. 4–27). The refolding of ribonuclease is so accurate
that the four intrachain disulfide bonds are re-formed
in the same positions in the renatured molecule as in
the native ribonuclease. As calculated mathematically,
the eight Cys residues could recombine at random to
form up to four disulfide bonds in 105 different ways.
In fact, an essentially random distribution of disulfide
bonds is obtained when the disulfides are allowed to re-
form in the presence of denaturant, indicating that weak
bonding interactions are required for correct position-
ing of disulfide bonds and assumption of the native
conformation.
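The figure of 105 can be checked directly. The sketch below counts the ways of pairing eight Cys residues into four disulfide bonds, once with the closed-form expression (2n)!/(2^n n!) and once by explicit enumeration. The function names are hypothetical, and the residue numbers are simply the Cys positions labeled in Figure 4–27.

```python
from math import factorial

def count_pairings(n_residues):
    """Number of ways to pair n_residues (even) into n_residues/2 disulfide bonds."""
    if n_residues % 2:
        raise ValueError("need an even number of Cys residues")
    pairs = n_residues // 2
    return factorial(n_residues) // (2 ** pairs * factorial(pairs))

def enumerate_pairings(residues):
    """Yield every complete pairing explicitly, as a check on the formula."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    # The first residue must bond to one of the others; recurse on what is left
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail

cys = [26, 40, 58, 65, 72, 84, 95, 110]  # Cys positions shown in Fig. 4-27
print(count_pairings(len(cys)))                  # 7 x 5 x 3 x 1 = 105
print(sum(1 for _ in enumerate_pairings(cys)))   # same count by enumeration
```

Only one of these 105 pairings is the native one, so random re-formation of disulfide bonds would be expected to yield the native arrangement in fewer than 1% of molecules.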
This classic experiment, carried out by Christian
Anfinsen in the 1950s, provided the first evidence that
the amino acid sequence of a polypeptide chain contains
all the information required to fold the chain into its na-
tive, three-dimensional structure. Later, similar results
were obtained using chemically synthesized, catalyti-
cally active ribonuclease. This eliminated the possibility
that some minor contaminant in Anfinsen’s purified
ribonuclease preparation might have contributed to
the renaturation of the enzyme, thus dispelling any re-
maining doubt that this enzyme folds spontaneously.
Polypeptides Fold Rapidly by a Stepwise Process
In living cells, proteins are assembled from amino acids
at a very high rate. For example, E. coli cells can make
a complete, biologically active protein molecule con-
taining 100 amino acid residues in about 5 seconds at
37 °C. How does such a polypeptide chain arrive at its
native conformation? Let’s assume conservatively that
each of the amino acid residues could take up 10 different
conformations on average, giving 10^100 different
conformations for the polypeptide. Let’s also assume
that the protein folds itself spontaneously by a random
process in which it tries out all possible conformations
around every single bond in its backbone until it finds
its native, biologically active form. If each conformation
were sampled in the shortest possible time (~10^-13 second,
or the time required for a single molecular vibration),
it would take about 10^79 years to sample all possible
conformations. Thus protein folding cannot be a
completely random, trial-and-error process. There must
be shortcuts. This problem was first pointed out by
Cyrus Levinthal in 1968 and is sometimes called
Levinthal’s paradox.
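The arithmetic behind this argument is easy to reproduce. The following sketch uses only the assumptions stated above (100 residues, 10 conformations per residue, 10^-13 second per conformation); the function and parameter names are hypothetical. With these inputs the search time evaluates to roughly 3 × 10^79 years, and the conclusion does not depend on the exact exponent: any exhaustive search is astronomically slower than the few seconds a cell requires.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s

def levinthal_search_time_years(n_residues=100,
                                states_per_residue=10,
                                seconds_per_state=1e-13):
    """Years needed to sample every conformation once, under the stated assumptions."""
    conformations = states_per_residue ** n_residues  # exact integer arithmetic
    total_seconds = conformations * seconds_per_state
    return total_seconds / SECONDS_PER_YEAR

years = levinthal_search_time_years()
print(f"{years:.1e} years to sample all conformations")

# For comparison: the observed synthesis-plus-folding time is about 5 seconds
print(f"ratio to 5 s: {years * SECONDS_PER_YEAR / 5:.1e}")
```

Changing the assumed number of conformations per residue from 10 to 3 still leaves a search time many orders of magnitude longer than the age of the universe, which is why the paradox is robust to the details of the estimate.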
The folding pathway of a large polypeptide chain is
unquestionably complicated, and not all the principles
that guide the process have been worked out. However,
extensive study has led to the development of several
[Figure 4–27 diagram. Native state; catalytically active (disulfide cross-links among Cys residues 26, 40, 58, 65, 72, 84, 95, and 110). Addition of urea and mercaptoethanol gives the unfolded state; inactive, with disulfide cross-links reduced to yield Cys residues (–SH). Removal of urea and mercaptoethanol gives the native, catalytically active state, with disulfide cross-links correctly re-formed.]
FIGURE 4–27 Renaturation of unfolded, denatured ribonuclease.
Urea is used to denature ribonuclease, and mercaptoethanol
(HOCH2CH2SH) to reduce and thus cleave the disulfide bonds to yield
eight Cys residues. Renaturation involves reestablishment of the cor-
rect disulfide cross-links.
plausible models. In one, the folding process is envi-
sioned as hierarchical. Local secondary structures form
first. Certain amino acid sequences fold readily into α
helices or β sheets, guided by constraints we have reviewed
in our discussion of secondary structure. This is
followed by longer-range interactions between, say, two
α helices that come together to form stable supersecondary
structures. The process continues until complete
domains form and the entire polypeptide is folded (Fig.
4–28). In an alternative model, folding is initiated by a
spontaneous collapse of the polypeptide into a compact
state, mediated by hydrophobic interactions among non-
polar residues. The state resulting from this “hy-
drophobic collapse” may have a high content of sec-
ondary structure, but many amino acid side chains are
not entirely fixed. The collapsed state is often referred
to as a molten globule. Most proteins probably fold by
a process that incorporates features of both models. In-
stead of following a single pathway, a population of pep-
tide molecules may take a variety of routes to the same
end point, with the number of different partly folded
conformational species decreasing as folding nears
completion.
Thermodynamically, the folding process can be
viewed as a kind of free-energy funnel (Fig. 4–29). The
unfolded states are characterized by a high degree of
conformational entropy and relatively high free energy.
As folding proceeds, the narrowing of the funnel repre-
FIGURE 4–28 A simulated folding pathway. The folding pathway of
a 36-residue segment of the protein villin (an actin-binding protein
found principally in the microvilli lining the intestine) was simulated
by computer. The process started with the randomly coiled peptide
and 3,000 surrounding water molecules in a virtual “water box.” The
molecular motions of the peptide and the effects of the water mole-
cules were taken into account in mapping the most likely paths to
the final structure among the countless alternatives. The simulated
folding took place in a theoretical time span of 1 μs; however, the
calculation required half a billion integration steps on two Cray
supercomputers, each running for two months.
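[Figure 4–29 diagram: a funnel whose width represents entropy and whose depth represents energy and the percentage of residues of protein in native conformation (0 to 100). Labels: beginning of helix formation and collapse; molten globule states; discrete folding intermediates; native structure.]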
FIGURE 4–29 The thermodynamics of protein folding depicted as a
free-energy funnel. At the top, the number of conformations, and
hence the conformational entropy, is large. Only a small fraction of
the intramolecular interactions that will exist in the native conforma-
tion are present. As folding progresses, the thermodynamic path down
the funnel reduces the number of states present (decreases entropy),
increases the amount of protein in the native conformation, and de-
creases the free energy. Depressions on the sides of the funnel repre-
sent semistable folding intermediates, which may, in some cases, slow
the folding process.
sents a decrease in the number of conformational
species present. Small depressions along the sides of the
free-energy funnel represent semistable intermediates
that can briefly slow the folding process. At the bottom
of the funnel, an ensemble of folding intermediates has
been reduced to a single native conformation (or one of
a small set of native conformations).
Defects in protein folding may be the molecular
basis for a wide range of human genetic disorders.
For example, cystic fibrosis is caused by defects in a
membrane-bound protein called cystic fibrosis trans-
membrane conductance regulator (CFTR), which acts as
a channel for chloride ions. The most common cystic
fibrosis–causing mutation is the deletion of a Phe
residue at position 508 in CFTR, which causes improper
protein folding (see Box 11–3). Many of the disease-
related mutations in collagen (p. 129) also cause de-
fective folding. An improved understanding of protein
folding may lead to new therapies for these and many
other diseases (Box 4–5). ■
Thermodynamic stability is not evenly distributed
over the structure of a protein—the molecule has re-
gions of high and low stability. For example, a protein
BOX 4–5 BIOCHEMISTRY IN MEDICINE
Death by Misfolding: The Prion Diseases
A misfolded protein appears to be the causative agent
of a number of rare degenerative brain diseases in
mammals. Perhaps the best known of these is mad
cow disease (bovine spongiform encephalopathy,
BSE), an outbreak of which made international head-
lines in the spring of 1996. Related diseases include
kuru and Creutzfeldt-Jakob disease in humans, scrapie
in sheep, and chronic wasting disease in deer and elk.
These diseases are also referred to as spongiform en-
cephalopathies, because the diseased brain frequently
becomes riddled with holes (Fig. 1). Typical symptoms
include dementia and loss of coordination. The dis-
eases are fatal.
In the 1960s, investigators found that prepara-
tions of the disease-causing agents appeared to lack
nucleic acids. At this time, Tikvah Alper suggested
that the agent was a protein. Initially, the idea seemed
heretical. All disease-causing agents known up to that
time—viruses, bacteria, fungi, and so on—contained
nucleic acids, and their virulence was related to ge-
netic reproduction and propagation. However, four
decades of investigations, pursued most notably by
Stanley Prusiner, have provided evidence that spongi-
form encephalopathies are different.
The infectious agent has been traced to a single
protein (M_r 28,000), which Prusiner dubbed prion
(from proteinaceous infectious only) protein (PrP).
Prion protein is a normal constituent of brain tissue
in all mammals. Its role in the mammalian brain is not
known in detail, but it appears to have a molecular
signaling function. Strains of mice lacking the gene for
PrP (and thus the protein itself) suffer no obvious ill
effects. Illness occurs only when the normal cellular
PrP, or PrP^C, occurs in an altered conformation called
PrP^Sc (Sc denotes scrapie). The interaction of PrP^Sc
with PrP^C converts the latter to PrP^Sc, initiating a
domino effect in which more and more of the brain
protein converts to the disease-causing form. The
mechanism by which the presence of PrP^Sc leads to
spongiform encephalopathy is not understood.
In inherited forms of prion diseases, a mutation in
the gene encoding PrP produces a change in one amino
acid residue that is believed to make the conversion of
PrP^C to PrP^Sc more likely. A complete understanding of
prion diseases awaits new information about how prion
protein affects brain function. Structural information
about PrP is beginning to provide insights into the mo-
lecular process that allows the prion proteins to inter-
act so as to alter their conformation (Fig. 2).
FIGURE 1 A stained section of the cerebral cortex from a patient
with Creutzfeldt-Jakob disease shows spongiform (vacuolar) degen-
eration, the most characteristic neurohistological feature. The yel-
lowish vacuoles are intracellular and occur mostly in pre- and post-
synaptic processes of neurons. The vacuoles in this section vary in
diameter from 20 to 100 μm.
FIGURE 2 The structure of the globular domain of human PrP in
monomeric (left) and dimeric (right) forms. The second subunit is
gray to highlight the dramatic conformational change in the green
α helix when the dimer is formed.
may have two stable domains joined by a segment with
lower structural stability, or one small part of a domain
may have a lower stability than the remainder. The re-
gions of low stability allow a protein to alter its confor-
mation between two or more states. As we shall see in
the next two chapters, variations in the stability of re-
gions within a given protein are often essential to pro-
tein function.
Some Proteins Undergo Assisted Folding
Not all proteins fold spontaneously as they are synthe-
sized in the cell. Folding for many proteins is facilitated
by the action of specialized proteins. Molecular chap-
erones are proteins that interact with partially folded
or improperly folded polypeptides, facilitating correct
folding pathways or providing microenvironments in
which folding can occur. Two classes of molecular chap-
erones have been well studied. Both are found in or-
ganisms ranging from bacteria to humans. The first
class, a family of proteins called Hsp70, generally have
a molecular weight near 70,000 and are more abundant
in cells stressed by elevated temperatures (hence, heat
shock proteins of M_r 70,000, or Hsp70). Hsp70 proteins
bind to regions of unfolded polypeptides that are rich
in hydrophobic residues, preventing inappropriate
aggregation. These chaperones thus “protect” proteins
that have been denatured by heat and peptides that are
being synthesized (and are not yet folded). Hsp70
proteins also block the folding of certain proteins that
must remain unfolded until they have been translocated
across membranes (as described in Chapter 27). Some
chaperones also facilitate the quaternary assembly of
oligomeric proteins. The Hsp70 proteins bind to and
release polypeptides in a cycle that also involves sev-
eral other proteins (including a class called Hsp40) and
ATP hydrolysis. Figure 4–30 illustrates chaperone-
assisted folding as elucidated for the chaperones DnaK
and DnaJ in E. coli, homologs of the eukaryotic Hsp70
and Hsp40. DnaK and DnaJ were first identified as pro-
teins required for in vitro replication of certain viral DNA
molecules (hence the “Dna” designation).
[Figure 4–30 diagram: the DnaK/DnaJ cycle.
1 DnaJ binds to the unfolded or partially folded protein and then to DnaK.
2 DnaJ stimulates ATP hydrolysis by DnaK. DnaK–ADP binds tightly to the unfolded protein.
3 In bacteria, the nucleotide-exchange factor GrpE stimulates release of ADP.
4 ATP binds to DnaK and the protein dissociates.
The released protein is either folded (native conformation) or partially folded; partially folded protein re-enters the cycle or passes to the GroEL system.]
FIGURE 4–30 Chaperones in protein folding. The cyclic pathway by
which chaperones bind and release polypeptides is illustrated for the
E. coli chaperone proteins DnaK and DnaJ, homologs of the eukary-
otic chaperones Hsp70 and Hsp40. The chaperones do not actively
promote the folding of the substrate protein, but instead prevent ag-
gregation of unfolded peptides. For a population of polypeptides, some
fraction of the polypeptides released at the end of the cycle are in the
native conformation. The remainder are rebound by DnaK or are di-
verted to the chaperonin system (GroEL; see Fig. 4–31). In bacteria, a
protein called GrpE interacts transiently with DnaK late in the cycle
(step 3), promoting dissociation of ADP and possibly DnaJ. No eukaryotic
analog of GrpE is known.
The second class of chaperones is called chaper-
onins. These are elaborate protein complexes required
for the folding of a number of cellular proteins that do
not fold spontaneously. In E. coli an estimated 10% to
15% of cellular proteins require the resident chaperonin
system, called GroEL/GroES, for folding under normal
conditions (up to 30% require this assistance when the
cells are heat stressed). These proteins first became
known when they were found to be necessary for the
growth of certain bacterial viruses (hence the designa-
tion “Gro”). Unfolded proteins are bound within pock-
ets in the GroEL complex, and the pockets are capped
transiently by the GroES “lid” (Fig. 4–31). GroEL un-
dergoes substantial conformational changes, coupled to
ATP hydrolysis and the binding and release of GroES,
which promote folding of the bound polypeptide. Al-
though the structure of the GroEL/GroES chaperonin is
known, many details of its mechanism of action remain
unresolved.
Finally, the folding pathways of a number of pro-
teins require two enzymes that catalyze isomerization
reactions. Protein disulfide isomerase (PDI) is a
widely distributed enzyme that catalyzes the inter-
change or shuffling of disulfide bonds until the bonds of
the native conformation are formed. Among its func-
tions, PDI catalyzes the elimination of folding interme-
[Figure 4–31 diagram. (a) The GroEL/GroES cycle.
1 Unfolded protein binds to the GroEL pocket not blocked by GroES.
2 ATP binds to each subunit of the GroEL heptamer.
3 ATP hydrolysis leads to release of 14 ADP and GroES.
4 7 ATP and GroES bind to GroEL with a filled pocket.
5 Protein folds inside the enclosure.
6 The released protein is fully folded or in a partially folded state that is committed to adopt the native conformation.
7 Proteins not folded when released are rapidly bound again.
(b) Images of the GroEL/GroES complex.]
FIGURE 4–31 Chaperonins in protein folding. (a) A proposed
pathway for the action of the E. coli chaperonins GroEL (a member
of the Hsp60 protein family) and GroES. Each GroEL complex
consists of two large pockets formed by two heptameric rings (each
subunit M_r 57,000). GroES, also a heptamer (subunits M_r 10,000),
blocks one of the GroEL pockets. (b) Surface and cut-away images
of the GroEL/GroES complex (PDB ID 1AON). The cut-away (right)
illustrates the large interior space within which other proteins are
bound.
diates with inappropriate disulfide cross-links. Peptide
prolyl cis-trans isomerase (PPI) catalyzes the in-
terconversion of the cis and trans isomers of Pro pep-
tide bonds (Fig. 4–8b), which can be a slow step in the
folding of proteins that contain some Pro residue pep-
tide bonds in the cis conformation.
Protein folding is likely to be a more complex
process in the densely packed cellular environment than
in the test tube. More classes of proteins that facilitate
protein folding may be discovered as the biochemical
dissection of the folding process continues.
SUMMARY 4.4 Protein Denaturation and Folding
■ The three-dimensional structure and the
function of proteins can be destroyed by
denaturation, demonstrating a relationship
between structure and function. Some
denatured proteins can renature spontaneously
to form biologically active protein, showing that
protein tertiary structure is determined by
amino acid sequence.
■ Protein folding in cells probably involves
multiple pathways. Initially, regions of
secondary structure may form, followed by
folding into supersecondary structures. Large
ensembles of folding intermediates are rapidly
brought to a single native conformation.
■ For many proteins, folding is facilitated by
Hsp70 chaperones and by chaperonins.
Disulfide bond formation and the cis-trans
isomerization of Pro peptide bonds are
catalyzed by specific enzymes.
Key Terms
conformation 116
native conformation 117
solvation layer 117
peptide group 118
Ramachandran plot 118
secondary structure 120
α helix 120
β conformation 123
β sheet 123
β turn 123
tertiary structure 125
quaternary structure 125
fibrous proteins 125
globular proteins 125
α-keratin 126
collagen 127
silk fibroin 129
supersecondary structures 139
motif 139
fold 139
domain 140
protein family 141
multimer 144
oligomer 144
protomer 144
symmetry 144
denaturation 147
molten globule 149
prion 150
molecular chaperone 151
Hsp70 151
chaperonin 152
Terms in bold are defined in the glossary.
Further Reading
General
Anfinsen, C.B. (1973) Principles that govern the folding of
protein chains. Science 181, 223–230.
The author reviews his classic work on ribonuclease.
Branden, C. & Tooze, J. (1991) Introduction to Protein
Structure, Garland Publishing, Inc., New York.
Creighton, T.E. (1993) Proteins: Structures and Molecular
Properties, 2nd edn, W. H. Freeman and Company, New York.
A comprehensive and authoritative source.
Evolution of Catalytic Function. (1987) Cold Spring Harb. Symp.
Quant. Biol. 52.
A collection of excellent articles on many topics, including
protein structure, folding, and function.
Kendrew, J.C. (1961) The three-dimensional structure of a
protein molecule. Sci. Am. 205 (December), 96–111.
Describes how the structure of myoglobin was determined and
what was learned from it.
Richardson, J.S. (1981) The anatomy and taxonomy of protein
structure. Adv. Prot. Chem. 34, 167–339.
An outstanding summary of protein structural patterns and
principles; the author originated the very useful “ribbon”
representations of protein structure.
Secondary, Tertiary, and Quaternary Structures
Berman, H.M. (1999) The past and future of structure databases.
Curr. Opin. Biotechnol. 10, 76–80.
A broad summary of the different approaches being used to
catalog protein structures.
Brenner, S.E., Chothia, C., & Hubbard, T.J.P. (1997)
Population statistics of protein structures: lessons from structural
classifications. Curr. Opin. Struct. Biol. 7, 369–376.
Fuchs, E. & Cleveland, D.W. (1998) A structural scaffolding of
intermediate filaments in health and disease. Science 279,
514–519.
McPherson, A. (1989) Macromolecular crystals. Sci. Am. 260
(March), 62–69.
A description of how macromolecules such as proteins are
crystallized.
Ponting, C.P. & Russell, R.R. (2002) The natural history of
protein domains. Annu. Rev. Biophys. Biomol. Struct. 31,
45–71.
An explanation of how structural databases can be used to
explore evolution.
Prockop, D.J. & Kivirikko, K.I. (1995) Collagens, molecular
biology, diseases, and potentials for therapy. Annu. Rev. Biochem.
64, 403–434.
Protein Denaturation and Folding
Baldwin, R.L. (1994) Matching speed and stability. Nature 369,
183–184.
Bukau, B., Deuerling, E., Pfund, C., & Craig, E.A. (2000)
Getting newly synthesized proteins into shape. Cell 101, 119–122.
A good summary of chaperone mechanisms.
Collinge, J. (2001) Prion diseases of humans and animals: their
causes and molecular basis. Annu. Rev. Neurosci. 24, 519–550.
Creighton, T.E., Darby, N.J., & Kemmink, J. (1996) The roles of
partly folded intermediates in protein folding. FASEB J. 10, 110–118.
Daggett, V., & Fersht, A.R. (2003) Is there a unifying mecha-
nism for protein folding? Trends Biochem. Sci. 28, 18–25.
Dill, K.A. & Chan, H.S. (1997) From Levinthal to pathways to
funnels. Nat. Struct. Biol. 4, 10–19.
Luque, I., Leavitt, S.A., & Freire, E. (2002) The linkage
between protein folding and functional cooperativity: two sides
of the same coin? Annu. Rev. Biophys. Biomol. Struct. 31,
235–256.
A review of how variations in structural stability within one
protein contribute to function.
Nicotera, P. (2001) A route for prion neuroinvasion. Neuron 31,
345–348.
Prusiner, S.B. (1995) The prion diseases. Sci. Am. 272
(January), 48–57.
A good summary of the evidence leading to the prion
hypothesis.
Richardson, A., Landry, S.J., & Georgopoulos, C. (1998) The
ins and outs of a molecular chaperone machine. Trends Biochem.
Sci. 23, 138–143.
Thomas, P.J., Qu, B.-H., & Pedersen, P.L. (1995) Defective
protein folding as a basis of human disease. Trends Biochem. Sci.
20, 456–459.
Westaway, D. & Carlson, G.A. (2002) Mammalian prion
proteins: enigma, variation and vaccination. Trends Biochem. Sci.
27, 301–307.
A good update.
Problems
1. Properties of the Peptide Bond In x-ray studies of
crystalline peptides, Linus Pauling and Robert Corey found
that the C–N bond in the peptide link is intermediate in
length (1.32 Å) between a typical C–N single bond (1.49 Å)
and a C=N double bond (1.27 Å). They also found that the
peptide bond is planar (all four atoms attached to the C–N
group are located in the same plane) and that the two
α-carbon atoms attached to the C–N are always trans to each
other (on opposite sides of the peptide bond):
[Structure: the planar peptide group Cα–C(=O)–N(H)–Cα, with the two α-carbons trans across the C–N bond.]
(a) What does the length of the C–N bond in the peptide
linkage indicate about its strength and its bond order
(i.e., whether it is single, double, or triple)?
(b) What do the observations of Pauling and Corey tell
us about the ease of rotation about the C–N peptide bond?
2. Structural and Functional Relationships in Fibrous
Proteins William Astbury discovered that the x-ray pattern
of wool shows a repeating structural unit spaced about 5.2 Å
along the length of the wool fiber. When he steamed and
stretched the wool, the x-ray pattern showed a new repeating
structural unit at a spacing of 7.0 Å. Steaming and stretching
the wool and then letting it shrink gave an x-ray pattern
consistent with the original spacing of about 5.2 Å. Although these
observations provided important clues to the molecular structure
of wool, Astbury was unable to interpret them at the time.
(a) Given our current understanding of the structure of
wool, interpret Astbury’s observations.
(b) When wool sweaters or socks are washed in hot water
or heated in a dryer, they shrink. Silk, on the other hand,
does not shrink under the same conditions. Explain.
3. Rate of Synthesis of Hair α-Keratin Hair grows at
a rate of 15 to 20 cm/yr. All this growth is concentrated at
the base of the hair fiber, where α-keratin filaments are
synthesized inside living epidermal cells and assembled into
ropelike structures (see Fig. 4–11). The fundamental structural
element of α-keratin is the α helix, which has 3.6 amino acid
residues per turn and a rise of 5.4 Å per turn (see Fig. 4–4b).
Assuming that the biosynthesis of α-helical keratin chains is
the rate-limiting factor in the growth of hair, calculate the
rate at which peptide bonds of α-keratin chains must be
synthesized (peptide bonds per second) to account for the
observed yearly growth of hair.
4. Effect of pH on the Conformation of α-Helical Secondary
Structures The unfolding of the α helix of a
polypeptide to a randomly coiled conformation is accompanied
by a large decrease in a property called its specific rotation, a
measure of a solution’s capacity to rotate plane-polarized light.
Polyglutamate, a polypeptide made up of only L-Glu residues,
has the α-helical conformation at pH 3. When the pH is raised
to 7, there is a large decrease in the specific rotation of the
solution. Similarly, polylysine (L-Lys residues) is an α helix at pH
10, but when the pH is lowered to 7 the specific rotation also
decreases, as shown by the following graph.
What is the explanation for the effect of the pH changes on
the conformations of poly(Glu) and poly(Lys)? Why does the
transition occur over such a narrow range of pH?
5. Disulfide Bonds Determine the Properties of Many
Proteins A number of natural proteins are very rich in
disulfide bonds, and their mechanical properties (tensile
strength, viscosity, hardness, etc.) are correlated with the de-
gree of disulfide bonding. For example, glutenin, a wheat pro-
tein rich in disulfide bonds, is responsible for the cohesive
and elastic character of dough made from wheat flour. Simi-
larly, the hard, tough nature of tortoise shell is due to the
extensive disulfide bonding in its α-keratin.
(a) What is the molecular basis for the correlation be-
tween disulfide-bond content and mechanical properties of
the protein?
(b) Most globular proteins are denatured and lose their
activity when briefly heated to 65 °C. However, globular proteins
that contain multiple disulfide bonds often must be
heated longer at higher temperatures to denature them. One
such protein is bovine pancreatic trypsin inhibitor (BPTI),
which has 58 amino acid residues in a single chain and con-
tains three disulfide bonds. On cooling a solution of dena-
tured BPTI, the activity of the protein is restored. What is
the molecular basis for this property?
6. Amino Acid Sequence and Protein Structure Our
growing understanding of how proteins fold allows re-
searchers to make predictions about protein structure based
on primary amino acid sequence data.
(a) In the amino acid sequence given for this problem, where
would you predict that bends or β turns would occur?
(b) Where might intrachain disulfide cross-linkages be
formed?
(c) Assuming that this sequence is part of a larger glob-
ular protein, indicate the probable location (the external sur-
face or interior of the protein) of the following amino acid
residues: Asp, Ile, Thr, Ala, Gln, Lys. Explain your reasoning.
(Hint: See the hydropathy index in Table 3–1.)
7. Bacteriorhodopsin in Purple Membrane Proteins
Under the proper environmental conditions, the salt-loving
bacterium Halobacterium halobium synthesizes a membrane
protein (M_r 26,000) known as bacteriorhodopsin, which is purple
because it contains retinal (see Fig. 10–21). Molecules of
this protein aggregate into “purple patches” in the cell mem-
brane. Bacteriorhodopsin acts as a light-activated proton pump
that provides energy for cell functions. X-ray analysis of this
protein reveals that it consists of seven parallel α-helical segments,
each of which traverses the bacterial cell membrane
(thickness 45 Å). Calculate the minimum number of amino acid
residues necessary for one segment of α helix to traverse the
membrane completely. Estimate the fraction of the bacteri-
orhodopsin protein that is involved in membrane-spanning he-
lices. (Use an average amino acid residue weight of 110.)
8. Pathogenic Action of Bacteria That Cause Gas
Gangrene The highly pathogenic anaerobic bacterium
Clostridium perfringens is responsible for gas gangrene, a
condition in which animal tissue structure is destroyed. This
bacterium secretes an enzyme that efficiently catalyzes the
hydrolysis of the peptide bond between X and Gly in the sequence –X–Gly–Pro–Y–,
where X and Y are any of the 20 common amino acids. How
does the secretion of this enzyme contribute to the invasive-
ness of this bacterium in human tissues? Why does this en-
zyme not affect the bacterium itself?
9. Number of Polypeptide Chains in a Multisubunit
Protein A sample (660 mg) of an oligomeric protein of
Mr 132,000 was treated with an excess of
1-fluoro-2,4-dinitrobenzene (Sanger’s reagent) under slightly
alkaline conditions until the chemical reaction was complete.
The peptide bonds of the protein were then completely hydrolyzed
by heating it with concentrated HCl. The hydrolysate was
found to contain 5.5 mg of the following compound:
2,4-Dinitrophenyl derivatives of the α-amino groups of other
amino acids could not be found.
(a) Explain how this information can be used to deter-
mine the number of polypeptide chains in an oligomeric
protein.
(b) Calculate the number of polypeptide chains in this
protein.
(c) What other protein analysis technique could you
employ to determine whether the polypeptide chains in this
protein are similar or different?
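The mole bookkeeping behind part (b) can be sketched as follows. The sketch assumes that the compound shown is 2,4-dinitrophenylvaline (C11H13N3O6, Mr about 283.2), as suggested by the isopropyl side chain in the structure, and that each polypeptide chain contributes exactly one labeled amino-terminal residue. The variable names are illustrative.

```python
PROTEIN_MASS_MG = 660.0   # from the problem statement
PROTEIN_MR = 132_000      # from the problem statement
DNP_MASS_MG = 5.5         # from the problem statement
DNP_VALINE_MR = 283.2     # assumption: the compound is DNP-valine, C11H13N3O6

# mg divided by g/mol gives mmol; multiply by 1000 for micromoles.
protein_umol = PROTEIN_MASS_MG / PROTEIN_MR * 1000
dnp_umol = DNP_MASS_MG / DNP_VALINE_MR * 1000

# Each chain contributes one labeled amino-terminal residue.
chains_per_protein = dnp_umol / protein_umol

print(round(protein_umol, 2), round(dnp_umol, 1), round(chains_per_protein))
```

The ratio comes out slightly below a whole number because real recoveries of the labeled derivative are rarely quantitative; rounding to the nearest integer gives the chain count.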
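[Structure for Problem 9: a 2,4-dinitrophenyl ring (two NO2 groups) joined through NH to the α carbon of an amino acid that carries a COOH group and a CH(CH3)2 side chain, consistent with 2,4-dinitrophenylvaline.]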
[Reaction scheme for Problem 8: X-Gly-Pro-Y + H2O → X-COO⁻ + H3N⁺-Gly-Pro-Y (the bond hydrolyzed is the peptide bond between X and Gly).]
  1   2   3   4   5   6   7   8   9  10
Ile Ala His Thr Tyr Gly Pro Phe Glu Ala
 11  12  13  14  15  16  17  18  19  20
Ala Met Cys Lys Trp Glu Ala Gln Pro Asp
 21  22  23  24  25  26  27  28
Gly Met Glu Cys Ala Phe His Arg
[Figure: specific rotation versus pH (0 to 14) for poly(Glu) and poly(Lys); each curve shows a transition between α helix and random conformation.]
Biochemistry on the Internet
10. Protein Modeling on the Internet A group of pa-
tients suffering from Crohn’s disease (an inflammatory bowel
disease) underwent biopsies of their intestinal mucosa in an
attempt to identify the causative agent. A protein was iden-
tified that was expressed at higher levels in patients with
Crohn’s disease than in patients with an unrelated inflamma-
tory bowel disease or in unaffected controls. The protein was
isolated and the following partial amino acid sequence was
obtained (reads left to right):
EAELCPDRCI HSFQNLGIQC VKKRDLEQAI
SQRIQTNNNP FQVPIEEQRG DYDLNAVRLC
FQVTVRDPSG RPLRLPPVLP HPIFDNRAPN
TAELKICRVN RNSGSCLGGD EIFLLCDKVQ
KEDIEVYFTG PGWEARGSFS QADVHRQVAI
VFRTPPYADP SLQAPVRVSM QLRRPSDREL
SEPMEFQYLP DTDDRHRIEE KRKRTYETFK
SIMKKSPFSG PTDPRPPPRR IAVPSRSSAS
VPKPAPQPYP
(a) You can identify this protein using a protein database
on the Internet. Some good places to start include
Protein Information Resource (PIR; pir.georgetown.edu/pirwww),
Structural Classification of Proteins (SCOP;
http://scop.berkeley.edu), and Prosite (http://us.expasy.org/prosite).
At your selected database site, follow links to locate the
sequence comparison engine. Enter about 30 residues from
the sequence of the protein in the appropriate search field
and submit it for analysis. What does this analysis tell you
about the identity of the protein?
(b) Try using different portions of the protein amino acid
sequence. Do you always get the same result?
(c) A variety of websites provide information about the
three-dimensional structure of proteins. Find information
about the protein’s secondary, tertiary, and quaternary struc-
ture using database sites such as the Protein Data Bank (PDB;
www.rcsb.org/pdb) or SCOP.
(d) In the course of your Web searches try to find in-
formation about the cellular function of the protein.
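For parts (a) and (b), it helps to prepare clean query fragments before pasting them into a search field. The sketch below joins the blocks of ten residues into one string and cuts 30-residue windows from different regions. The window size of 30 follows the problem; the step of 60 and the helper name `windows` are arbitrary illustrative choices, and no database is contacted.

```python
# Partial sequence from the problem, in blocks of ten (one-letter codes).
BLOCKS = """
EAELCPDRCI HSFQNLGIQC VKKRDLEQAI
SQRIQTNNNP FQVPIEEQRG DYDLNAVRLC
FQVTVRDPSG RPLRLPPVLP HPIFDNRAPN
TAELKICRVN RNSGSCLGGD EIFLLCDKVQ
KEDIEVYFTG PGWEARGSFS QADVHRQVAI
VFRTPPYADP SLQAPVRVSM QLRRPSDREL
SEPMEFQYLP DTDDRHRIEE KRKRTYETFK
SIMKKSPFSG PTDPRPPPRR IAVPSRSSAS
VPKPAPQPYP
"""

# Remove spaces and line breaks so the fragments paste cleanly.
sequence = "".join(BLOCKS.split())


def windows(seq, size=30, step=60):
    """Return (start, fragment) pairs; start positions are 1-based."""
    return [
        (i + 1, seq[i:i + size])
        for i in range(0, len(seq) - size + 1, step)
    ]


for start, fragment in windows(sequence):
    print(f"residues {start}-{start + len(fragment) - 1}: {fragment}")
```

Submitting fragments from several regions, as part (b) asks, shows whether every part of the sequence points to the same protein.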