Chapter 6 The Three-dimensional
Structure of Proteins
1. General studies of the peptide bond
1.1 The peptide (O=C-N-H) bond was found to
be shorter than the C-N bond in a simple
amine, and the atoms attached to it are coplanar.
1.1.1 This was revealed by X-ray
diffraction studies of amino acids and of simple
dipeptides and tripeptides.
1.1.2 The peptide (amide) bond was
found to be about 1.32 Å long (C-N single bond, 1.49 Å;
C=N double bond, 1.27 Å), and thus has partial
double-bond character (it is rigid and
unable to rotate freely).
1.1.3 The partial double-bond character is
a result of resonance: a pair of electrons is
partially shared between the carbonyl oxygen and
the amide nitrogen.
1.1.4 The atoms attached to the peptide
bond are coplanar with the oxygen and
hydrogen atom in trans positions.
1.2 X-ray studies of α-keratin (the fibrous
protein making up hair and wool) revealed a
repeating unit of 5.4 Å (Astbury, in the 1930s).
1.2 The backbone conformation of a peptide can be
defined by two sets of rotation angles.
1.2.1 The rotation angles around the N-Cα bonds
are labeled phi (φ), and those around the Cα-C bonds psi
(ψ).
1.2.2 By convention, both φ and ψ are defined
as 0° in the conformation in which the two peptide
planes connected to the same α carbon lie in the same
plane.
1.2.3 In principle, φ and ψ can have any value
between -180° and +180°.
1.2.4 The conformation of the main chain is
completely defined when phi and psi are specified for
each residue in the chain.
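The definition of φ and ψ as rotation angles about the N-Cα and Cα-C bonds can be made concrete with a small torsion-angle calculation. The sketch below is only an illustration: the function names and the toy coordinates are assumptions, and it uses the common signed-torsion convention in which a cis arrangement gives 0° and a trans (fully extended) arrangement gives ±180°.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)

# Toy coordinates (hypothetical, not taken from a real structure):
# the first and last atoms lie on opposite sides of the central bond (trans).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applying phi and psi to every residue of a chain yields the list of angle pairs that, according to 1.2.4, fixes the main-chain conformation.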
The peptide bond is rigid and planar
b) The conformation corresponding to φ = 180°, ψ = 180°,
in which the peptide is in its fully
extended conformation.
c) The conformation corresponding
to φ = 0°, ψ = 0°, which is disallowed by
the steric overlap between the H and O
atoms of adjacent peptide planes.
Ramachandran plot for L-Ala residues. Dark blue areas reflect conformations that involve
no steric overlap and thus are fully allowed; medium blue indicates conformations allowed
at the extreme limits for unfavorable atomic contacts; the lightest blue area reflects
conformations that are permissible if a little flexibility is allowed in the bond angles.
1.3 Protein structures have conventionally been
understood at four different levels.
1.3.1 The primary structure is the amino acid
sequence (including the locations of disulfide bonds).
1.3.2 The secondary structure refers to the regular,
recurring arrangements of adjacent residues resulting
mainly from hydrogen bonding between backbone
groups, with α-helices and β-pleated sheets as the two
most common ones.
1.3.3 The tertiary structure refers to the spatial
relationship among all amino acid residues in a
polypeptide chain, that is, the complete
three-dimensional structure.
1.3.4 The quaternary structure refers to the spatial
arrangement of the subunits in a multisubunit protein,
including the nature of their contacts.
2. Protein Secondary Structure
2.1 The likely regular conformations of protein
molecules were proposed before they were
actually observed!
This was accomplished by building precise
molecular models.
2.1.1 Experimental data (from X-ray
studies) were closely adhered to and carefully interpreted.
2.1.2 Single bonds other than the peptide
bond in the backbone chain are free to rotate.
2.2 The simplest arrangement of the polypeptide chain
was proposed to be a helical structure called the α-helix
(Pauling and Corey, 1951).
2.2.1 The polypeptide backbone is tightly wound
around the long axis (rodlike).
2.2.2 R groups protrude outward from the
helical backbone.
2.2.3 A single turn of the helix (corresponding to
the repeating unit in α-keratin) extends about 5.4
Å and includes 3.6 residues (each residue rises
1.5 Å and rotates 100 degrees about the helix axis).
2.2.4 The model made optimal use of internal
hydrogen bonding for structure stabilization.
2.2.5 The carbonyl oxygen of residue n is
hydrogen bonded to the NH group of residue (n+4).
2.2.6 The residues forming one α-helix must all
be one type of stereoisomer (either L- or D-).
2.2.7 L amino acids can in principle be used to build either
right- or left-handed α-helices (the helix spiraling
away clockwise or counterclockwise, respectively), but
extended left-handed α-helices are not observed in proteins.
Transmission of the electric dipole in the α-helix.
2.2.8 Five constraints affecting the stability of the α-helix
1) The electrostatic repulsion (or attraction) between
successive AA residues with charged R groups;
2) The bulkiness of adjacent R groups;
3) The interactions between AA side chains spaced
three (or four) residues apart;
4) The occurrence of Pro and Gly residues;
5) The interaction between AA residues at the ends of
the helical segment and the electric dipole inherent
to the α-helix.
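The helix parameters quoted above are linked by simple arithmetic: 3.6 residues per turn at a rise of 1.5 Å per residue gives a turn length of 3.6 × 1.5 = 5.4 Å, and 360° / 3.6 gives 100° of rotation per residue. The sketch below works through this and lists the n to n+4 hydrogen-bond pairs. The function names are hypothetical, and the Cα radius of 2.3 Å is an assumed value used only to generate illustrative coordinates.

```python
import math

RESIDUES_PER_TURN = 3.6   # from the text
RISE_PER_RESIDUE = 1.5    # Å per residue, from the text
CA_RADIUS = 2.3           # Å; assumed value, for illustration only

def pitch():
    """Length of one helical turn along the axis, in Å."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def rotation_per_residue():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN

def ca_trace(n_residues):
    """Idealized C-alpha positions of a right-handed helix along +z."""
    step = math.radians(rotation_per_residue())
    coords = []
    for i in range(n_residues):
        angle = i * step
        coords.append((CA_RADIUS * math.cos(angle),
                       CA_RADIUS * math.sin(angle),
                       i * RISE_PER_RESIDUE))
    return coords

def hbond_pairs(n_residues):
    """(n, n+4) pairs: C=O of residue n bonded to N-H of residue n+4."""
    return [(n, n + 4) for n in range(1, n_residues - 3)]

print(round(pitch(), 2), round(rotation_per_residue(), 1))
print(hbond_pairs(8))
```

Note that the first four NH groups and the last four C=O groups of a helical segment have no partner within the helix, which is one reason the ends of the segment behave differently from its interior.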
2.3 The β-pleated sheet (or β conformation) was
proposed to be the more extended conformation
of the polypeptide chain.
2.3.1 The conformation is formed when
two or more almost fully extended polypeptide
chains are brought together side by side.
2.3.2 Regular hydrogen bonds are formed
between the carbonyl oxygens and amide
hydrogens of adjacent chains (resembling a
zipper).
2.3.3 The axial distance between
adjacent amino acid residues is ~3.5 Å.
2.3.4 The planes of the peptide bonds
are arranged as a pleated sheet.
2.3.5 The R groups of adjacent residues
protrude in opposite directions.
2.3.6 The adjacent polypeptide chains can
be either parallel (the same direction) or
antiparallel (the opposite direction).
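The statement that the β conformation is "more extended" can be quantified with the per-residue distances given in these notes: about 1.5 Å per residue along the axis of an α-helix and about 3.5 Å per residue in a β strand. The following is a rough sketch; the function name is hypothetical and the lengths are idealized estimates, not measurements.

```python
ALPHA_RISE = 1.5  # Å per residue in an α-helix (from the text)
BETA_RISE = 3.5   # Å per residue in a β strand (from the text)

def segment_length(n_residues, rise_per_residue):
    """Approximate axial length (Å) of a segment of n residues."""
    return n_residues * rise_per_residue

n = 20
helix = segment_length(n, ALPHA_RISE)
strand = segment_length(n, BETA_RISE)
print(f"{n} residues: helix ~{helix:.0f} Å, strand ~{strand:.0f} Å, "
      f"ratio {strand / helix:.2f}")
```

The same 20 residues therefore span more than twice the distance in a β strand as in an α-helix.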
2.4 The β turn (hairpin turn) is also a common secondary
structure, found where a polypeptide chain abruptly
reverses its direction.
2.4.1 It often connects the ends of two adjacent
segments of an antiparallel β-pleated sheet.
2.4.2 It is a tight turn of ~180 degrees involving
four amino acid residues.
2.4.3 The essence of the structure is the hydrogen
bonding between the C=O group of residue n and the NH
group of the residue n+3.
2.4.4 Gly and Pro are often found in β turns. Gly is
there (as the 3rd residue in type II turns) because it is small and
flexible; Pro is there because the peptide bond involving
Pro can assume the cis configuration, which in turn
generates a tight turn in the polypeptide chain.
2.4.5 β turns are often found near the surface of a
protein.
With different φ, ψ angles.
2.5 Some amino acid residues are accommodated
in the different types of secondary structure
better than others.
2.5.1 The probability is calculated from
known protein structures. It is used in predicting
secondary structures.
2.5.2 Some biases or propensities can be
explained easily.
2.5.3 Others are not yet understood.
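One common way to express such a probability is as a propensity: the fraction of a given residue type found in a structure, divided by the fraction of all residues found in that structure. A value above 1 means the residue is over-represented there. The sketch below assumes this definition; the function name and the counts are made up for illustration and are not real statistics.

```python
def propensity(counts, residue, structure):
    """Relative propensity of `residue` for `structure`.

    counts maps residue -> {structure: number of occurrences}.
    """
    residue_total = sum(counts[residue].values())
    grand_total = sum(sum(c.values()) for c in counts.values())
    structure_total = sum(c.get(structure, 0) for c in counts.values())
    frac_residue = counts[residue].get(structure, 0) / residue_total
    frac_overall = structure_total / grand_total
    return frac_residue / frac_overall

# Hypothetical counts, invented for this example only.
toy_counts = {
    "Ala": {"helix": 60, "sheet": 20, "turn": 20},
    "Val": {"helix": 30, "sheet": 50, "turn": 20},
    "Gly": {"helix": 20, "sheet": 20, "turn": 60},
}

for res in toy_counts:
    row = {s: round(propensity(toy_counts, res, s), 2)
           for s in ("helix", "sheet", "turn")}
    print(res, row)
```

In a real analysis the counts would be tallied from a set of experimentally determined structures, and a simple predictor would scan a sequence for runs of residues with high propensity for the same structure.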
Ramachandran plots for a variety of structures.
Relative probabilities that a given amino acid will occur
in the three common types of secondary structure.
2.6 Supersecondary structures and domains
2.6.1 Supersecondary structures, also called motifs
or simply folds, are clusters of secondary structures that
appear repeatedly.
2.6.2 The supersecondary structures identified so far
include mainly the βαβ motif, the Greek key motif, the β-hairpin loop,
and the four-helix bundle.
2.6.3 Supersecondary structure motifs are usually also
folding motifs of proteins (a conjecture, not completely
established experimentally).
2.6.4 A compact region (usually of fewer than
200 to 400 residues) that is a distinct structural unit within a
larger polypeptide chain is called a domain.
2.6.5 Many domains fold independently into
thermodynamically stable structures and sometimes have
separate functions.
Structural domains in the polypeptide troponin C:
two separate calcium-binding domains.
Repeated usage of a pattern
3. Protein Tertiary Structure
3.1 Each protein usually has one native conformation
3.1.1 Under physiological conditions of solvent
and temperature, each protein folds spontaneously into
one three-dimensional conformation, called the native
conformation.
3.1.2 This conformation is usually
thermodynamically the most stable (having the lowest
Gibbs free energy), and predominates among the
innumerable theoretically possible ones.
3.1.3 Usually only the native conformation is
functional.
3.2 Proteins are classified into two major groups:
3.2.1 Fibrous proteins have polypeptide chains
arranged in long strands or sheets
3.2.2 Globular proteins have polypeptide chains
folded into a spherical or globular shape.
3.3 α-Keratins contain coiled coils of α-helices
3.3.1 α-Keratins are rich in hydrophobic residues,
including Phe, Ile, Val, Met, and Ala, that make the
protein insoluble in water.
3.3.2 Two helical strands wrap together in parallel to
form a supertwisted coiled coil in α-keratin.
Each strand is an α-helix.
The superhelical twisting is left-handed in
α-keratins (opposite in sense to the individual strands).
The surfaces where the two α-helices touch are
made up of hydrophobic AA residues, their R groups
meshed together in a regular interlocking pattern.
3.3.3 α-Keratins are the main components of skin
and many skin derivatives in vertebrate animals.
These include, e.g., hair, wool, feathers, nails,
claws, quills, scales, horns, hooves, tortoise shell,
and much of the outer layer of skin.
Usually, harder α-keratins contain a higher
proportion of Cys (18% of the residues are Cys in
tortoise shells and rhinoceros horns) involved in
disulfide bonds.
α-Keratins can be stretched (to about twice their
original length) because of the springiness of their structure.
While α-keratins have a relatively simple
tertiary structure, their quaternary
structure can be quite complex.
Permanent waving of hair is biochemical engineering:
disulfide bonds between individual chains are
reduced (while the hair is heated), the hair is curled, and
the bonds are reoxidized (while the hair is cooled).
3.4 Collagen has left-handed polypeptide
chains wrapped together to form a right-handed triple helix
3.4.1 The amino acid sequence of collagen is
remarkably regular.
Nearly every third residue is Gly (Gly-X-Pro or
Gly-X-Hyp, where Hyp is hydroxyproline), and the sequence
Gly-Pro-Hyp recurs frequently.
Only the small Gly can fit into the crowded
interior of the triple helix, while Pro permits the sharp
twisting of the collagen helix.
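The "every third residue is Gly" regularity is easy to check on a sequence. The sketch below is a minimal illustration with hypothetical function names; it assumes one-letter codes with G for Gly, P for Pro, and O for Hyp (a convention used for collagen-like peptides), and the sequences are toy examples rather than real collagen.

```python
def gly_fraction(seq, frame=0):
    """Fraction of positions frame, frame+3, frame+6, ... that are Gly."""
    positions = range(frame, len(seq), 3)
    if len(positions) == 0:
        return 0.0
    return sum(1 for i in positions if seq[i] == "G") / len(positions)

def best_gly_frame(seq):
    """Reading frame (0, 1, or 2) with the highest Gly fraction."""
    return max(range(3), key=lambda f: gly_fraction(seq, f))

toy = "GPOGPOGAOGPOGPK"  # hypothetical collagen-like sequence
frame = best_gly_frame(toy)
print(frame, gly_fraction(toy, frame))
```

A Gly fraction well below 1 in the best frame flags positions where the repeat is interrupted, the kind of single-residue change that cannot be accommodated in the crowded interior of the triple helix.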
3.4.2 The helical motif of its three α chains is entirely
different from that of the α-helix.
Intrachain hydrogen bonds are absent, while interchain
hydrogen bonds are formed.
The rise per residue is 2.9 Å and there are nearly 3
residues per turn.
Superhelical twisting is also present, but it is right-handed, opposite to
that of α-keratin.
The collagen polypeptide chains within or between the
triple helices are covalently cross-linked through Lys or hydroxylysine (Hyl)
side chains.
The superhelix provides great tensile strength with no
capacity to stretch.
Collagen fibers have a tensile strength similar to that of a steel
wire of equal cross section.
3.4.3 Collagen is the most abundant protein in
mammals.
About 25% of the total protein mass in
mammals is collagen.
It is a major component of tendons,the
extracellular matrix of the connective tissues (skin,
bone matrix),and the cornea of the eye.
Collagen triple helices (also called
tropocollagen) self-assemble in the extracellular
space to form much larger collagen fibrils that
further aggregate into collagen fibers.
3.4.4 Collagen’s significance
The collagen triple helices are regularly
staggered in the fibril, giving rise to the striated
appearance in negatively stained electron
micrographs.
Fibril formation involves many enzymatic
steps. Defects in fibril formation underlie many
genetic diseases (e.g., osteogenesis imperfecta and
Ehlers-Danlos syndrome, both of which can result from a single
amino acid replacement of a Gly).
3.5 Fibroin is predominantly in the β conformation
3.5.1 Fibroin, the protein of silk, is produced by insects and
spiders.
3.5.2 Fibroin is rich in Ala and Gly, permitting a
close packing of β sheets and an interlocking
arrangement of R groups.
3.5.3 The overall structure is stabilized by extensive
hydrogen bonds and by optimization of van der
Waals interactions between sheets.
3.5.4 Silk does not stretch since its β conformation
is already highly extended.
3.5.5 The structure is flexible since the sheets are
held together by weak interactions rather
than by disulfide bonds as in α-keratins.
3.6 Globular protein structures are compact and varied
3.7 Sperm whale myoglobin,
the oxygen carrier in muscle, was the
first protein to be seen in atomic detail by X-ray
analysis (John Kendrew, 1950s)
3.7.1 The existence of α-helices was for the first time
directly observed in a protein.
The myoglobin molecule contains eight
α-helices.
All the α-helices are right-handed.
All the peptide bonds are in the planar trans
configuration.
No β-pleated sheets are observed in the
molecule.
3.7.2 The myoglobin molecule has a dense
hydrophobic core.
Many hydrophobic R groups (e.g., Leu, Val,
Met, Phe) are found in the interior of the
myoglobin molecule.
Hydrophobic interactions are important for the
stability of the protein structure.
Only two hydrophilic histidine residues were
found in the interior of the protein.
Short-range van der Waals interactions make
a significant contribution to the stabilizing
hydrophobic interactions because of the solid-like
compactness of the molecule (residues that differ
subtly in size and shape fill the interior of the protein
neatly and thus maximize van der Waals
interactions).
3.7.3 All but two of the polar R groups are located on
the outer surface and hydrated.
Nonpolar residues are also present on the
outer surface!
3.7.4 Bends are made of residues or sequences that
are incompatible with α-helical structure.
All Pro residues are found at bends.
3.7.5 The flat heme group (the prosthetic group) was
revealed to rest in a crevice (pocket).
The heme group consists of a complex organic
ring structure, protoporphyrin, which is bound to an
iron atom in its ferrous (Fe2+) state.
The iron atom has six coordination bonds:
four in the plane of, and bonded to, the flat porphyrin
molecule and two perpendicular to it.
One of the perpendicular coordination bonds is
bound to a nitrogen atom of an interior His residue.
The sixth coordination position serves as the binding site
for O2.
The accessibility of the heme group to solvent is
highly restricted, thus preventing the oxidation of
Fe2+ to the ferric ion (Fe3+), which is unable to bind
O2.
Sir John Kendrew and Max Perutz won the
Nobel Prize in Chemistry in 1962 for
determining the complete atomic structures of
myoglobin and hemoglobin.
Tertiary structure of sperm whale myoglobin: a)
backbone; b) “mesh” image; c) surface contour image;
d) ribbon; e) space-filling model.
The heme group.
3.8 Three-dimensional structures of proteins can be
determined by several methods
3.8.1 X-Ray Diffraction
3.8.2 Nuclear Magnetic Resonance (NMR)
3.8.3 The structures of many proteins have since been determined, such
as those of cytochrome c, lysozyme, and ribonuclease A.
3.8.4 Some common characteristics of proteins
Water-soluble globular proteins have a
hydrophobic core and a mainly hydrophilic outer
surface.
Proteins are stabilized mainly by noncovalent
interactions (sometimes by disulfide bonds).
Each protein has a unique structure to perform
its unique function.
X-Ray Diffraction
One-dimensional NMR
Two-dimensional NMR
Cytochrome c contains only about 40% α-helices, but
many irregularly coiled and extended segments.
Lysozyme, an enzyme that catalyzes the hydrolytic cleavage of
polysaccharides in some bacterial cell walls,
contains about 40% α-helices and about 12% β-sheets.
Ribonuclease A contains more β-sheet structures.
3.9 Protein Structure Modeling
Applications of Structure Modeling
With experimentally determined three-dimensional structures
accumulating in various databases such as the PDB, the study of
the structure of new proteins becomes more feasible.
– Comparative modeling:
  - Homology-based.
  - Better accuracy with higher sequence identity (>30%).
– Threading:
  - Fold assignment.
– Ab initio prediction:
  - No homology required.
  - Computationally intensive (e.g., IBM’s “Blue Gene”).
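The sequence-identity criterion for comparative modeling (better accuracy above about 30% identity) can be illustrated with a short calculation on a pairwise alignment. The sketch below is an assumption-laden example: the function names are hypothetical, identity is computed over aligned columns without gaps (other denominators are also in use), and the aligned sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def modeling_route(identity, threshold=30.0):
    """Rough choice of approach; the 30% threshold is from the notes."""
    if identity > threshold:
        return "comparative modeling"
    return "threading or ab initio prediction"

# Hypothetical aligned fragments, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"
ident = percent_identity(target, template)
print(round(ident, 1), modeling_route(ident))
```

In practice the alignment itself would come from a dedicated alignment program, and the threshold is a rule of thumb rather than a sharp boundary.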
deoxyhemoglobin
4. Quaternary structure, macromolecular assembly
Human poliovirus has an
icosahedral capsid (the protein coat of the virus).
The capsid of tobacco mosaic virus, the cylindrical structure
that encapsulates the virus, is a right-handed helical filament
made up of 2,130 identical subunits.
5. Proteins denature under stress
5.1 Loss of tertiary structure (native conformation) is
accompanied by loss of function.
5.1.1 Proteins lose their tertiary structures
relatively easily because their marginal stability
is maintained by noncovalent interactions.
5.1.2 The process of total loss or randomization
of three-dimensional structure of proteins is called
denaturation.
5.1.3 Protein denaturation results from a change
in the solvent environment that is sufficiently large to
upset the forces that keep the protein structure intact.
5.1.4 Many agents can cause proteins to denature:
Heating: unbalances the compensating
enthalpic and entropic contributions; thermal motion
causes melting.
Extreme pH: upsets the balance of a
protein’s charge interactions.
Miscible organic solvents (alcohol and
acetone): alter solvent polarity and hydrogen bonding.
Solutes (urea, guanidine): provide
alternative hydrogen bonding.
Detergents (SDS): introduce their
hydrophobic tails into the protein’s interior.
5.1.5 The mechanisms of many of these denaturing
processes are not yet fully understood.
5.2 Denaturation of some proteins is reversible.
5.2.1 Some denatured globular proteins will regain
their native structure and their biological activity once
returned to conditions in which the native conformation
is stable. This process is called renaturation.
5.2.2 The denaturation and renaturation
phenomena were originally observed with ribonuclease A,
by chance, by Christian Anfinsen (1950s).
5.2.3 Ribonuclease A became reduced and
randomly coiled (denatured) in 8 M urea plus
β-mercaptoethanol, with a loss of the enzymatic activity.
5.2.4 When urea and β-mercaptoethanol were
removed, the enzymatic activity was slowly regained
until it was fully recovered under stable conditions, in the
presence of trace amounts of β-mercaptoethanol.
5.2.5 All the physical and chemical properties of
the refolded enzyme were virtually identical with those
of the native enzyme.
5.2.6 Conclusion: the information needed to
specify the complex tertiary structure of ribonuclease A
is all contained in its amino acid sequence.
5.2.7 Subsequent studies have established the
generality of this central principle of molecular biology:
sequence specifies conformation.
Anfinsen was awarded the Nobel Prize in Chemistry in 1972.
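The full recovery of activity is striking once the number of wrong disulfide pairings is counted. Ribonuclease A has eight Cys residues that form four disulfide bonds (a fact not stated in these notes, added here as background). The sketch below counts the ways an even number of cysteines can be paired at random; the function name is hypothetical.

```python
def disulfide_pairings(n_cys):
    """Number of ways to pair n_cys cysteines into n_cys/2 disulfides.

    For 2k cysteines this is (2k-1) * (2k-3) * ... * 3 * 1.
    """
    if n_cys < 0 or n_cys % 2 != 0:
        raise ValueError("need a non-negative even number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k
    return count

print(disulfide_pairings(8))  # eight Cys, four disulfide bonds
```

Only one of these pairings is the native one, so random reoxidation would be expected to yield mostly inactive, scrambled molecules; the sequence evidently steers the chain toward the correct pairing.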
5.3 The tertiary structures of proteins are not rigid.
5.3.1 Many studies have found that globular
proteins have a certain amount of flexibility in their
backbones and undergo short-range internal
fluctuations.
5.3.2 Many proteins undergo small
conformational changes in the course of their
biological function (e.g., O2-bound hemoglobin
differs from O2-free hemoglobin; substrate binding to
enzymes often causes conformational changes).
6. The polypeptide chain of a protein folds rapidly in
vivo
6.1 The protein folding problem is one of the
most challenging and important areas of inquiry
in biochemistry.
6.1.1 How does the amino acid sequence of a
protein specify its three-dimensional structure?
6.1.2 How does an unfolded polypeptide
chain acquire the form of its native
conformation?
6.2 Are all possible conformations searched to find
the energetically most favorable one?
6.2.1 Levinthal’s paradox: the huge
difference between the calculated (theoretical) time
it would take for a polypeptide to fold by random
searching and the actual time it takes.
6.2.2 Cumulative selection, in which
partially correct intermediates are retained
because they are somewhat stable, makes
the searching process much more efficient.
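The size of the gap in Levinthal's paradox can be estimated with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name is hypothetical, and the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are assumed round numbers, not values given in these notes.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def random_search_years(n_residues,
                        conformations_per_residue=3,     # assumption
                        samples_per_second=1e13):        # assumption
    """Years needed to sample every conformation once, one at a time."""
    total_conformations = conformations_per_residue ** n_residues
    seconds = total_conformations / samples_per_second
    return seconds / SECONDS_PER_YEAR

years = random_search_years(100)
print(f"random search for 100 residues: about {years:.1e} years")
```

Even with these modest assumptions the estimate exceeds the age of the universe by many orders of magnitude, whereas real proteins fold in seconds or less, which is why an undirected search cannot be the mechanism.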
6.3 Protein folding is an intriguing problem for both
theoreticians and experimentalists.
6.3.1 Proteins are only marginally stable. The
free energy difference between the folded and unfolded
states of a typical 100-residue protein is only about 10
kcal/mol, meaning that correct intermediates can be
easily lost.
6.3.2 The criterion of correctness is the total free
energy of the transient species, not a residue-by-residue
scrutiny of conformation.
6.3.3 Some intermediates, called kinetic traps,
have a favorable free energy but are not on the path to
the final folded form.
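The 10 kcal/mol figure can be put in perspective by converting it to an equilibrium constant. The sketch below assumes a simple two-state model (folded and unfolded only) at 25 °C; both the model and the temperature are assumptions made for illustration, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def unfolding_constant(dg_unfold_kcal, temp_k=298.15):
    """K = [unfolded]/[folded] for a two-state model."""
    return math.exp(-dg_unfold_kcal / (R_KCAL * temp_k))

def fraction_unfolded(dg_unfold_kcal, temp_k=298.15):
    k = unfolding_constant(dg_unfold_kcal, temp_k)
    return k / (1.0 + k)

dg = 10.0          # kcal/mol, from the text
n_residues = 100   # from the text
print(f"per residue: {dg / n_residues:.2f} kcal/mol")
print(f"fraction unfolded: {fraction_unfolded(dg):.1e}")
```

Under these assumptions the native state is overwhelmingly populated, yet the stabilization averages only about 0.1 kcal/mol per residue, far less than the strength of a single noncovalent contact, so a partially folded intermediate holding only a fraction of the native contacts has very little stability to spare.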
6.4 Molten globules are formed early in folding.
6.4.1 The molten globule state contains native
secondary structure but not native tertiary structure (an
experimental observation).
6.4.2 Hydrophobic collapse and acquisition
of stable secondary structure are mutually
reinforcing (synergistic) events in the formation of molten
globules.
6.5 Partially folded intermediates can be detected,
trapped,and characterized.
6.5.1 Rapid-kinetics studies, in which protein
secondary structures are monitored by
spectroscopic methods (e.g., fluorescence, circular
dichroism), can reveal the progression of distinctive
intermediates during refolding processes.
6.5.2 Disulfide-bonded intermediates can be
trapped covalently by blocking uncombined
cysteines with iodoacetate.
6.5.3 Pulsed hydrogen-deuterium exchange
can be used to monitor the acquisition of secondary
structures in protein folding.
6.5.4 Our understanding of protein folding can be
stringently tested by designing novel proteins with
distinctive functions. For example, encouraging starts
have been made in synthesizing new scaffolds,
metal-binding proteins, channels, and catalysts.
6.6 Protein folding in vivo is sometimes catalyzed by
isomerases and chaperone proteins.
6.6.1 The formation of correct disulfide pairing in
nascent proteins is catalyzed by protein disulfide
isomerase (PDI),which is especially important for
accelerating disulfide interchange in kinetically trapped
folding intermediates.
6.6.2 Peptidyl prolyl isomerases (PPIases)
accelerate cis-trans isomerization of Pro residues
during protein folding.
6.6.3 Molecular chaperones in cells facilitate
the correct assembly (including folding, refolding,
and formation of oligomeric complexes) of
other polypeptides but are not themselves part of
the assembled, functional structure. Are they
enzymes? Do they facilitate the search (e.g., by avoiding
aggregates), and if so, how? They certainly do not specify
the final structure, but do they specify the path?
They may specify certain properties of the paths and/or
intermediates, or provide an appropriate
environment for folding. Many questions are yet
to be answered.
Summary:
Five themes of protein three-dimensional
structure
1) The three-dimensional structure of a protein is
determined by its amino acid sequence.
2) The function of protein depends on its structure.
3) An isolated protein has a unique,or nearly unique,
structure.
4) The most important forces stabilizing the specific
structure of a protein are non-covalent interactions.
5) Amid the huge number of unique protein
structures, we can recognize some common
structural patterns to improve our understanding of
protein architecture.