The third and final part of this book explores the biochemical
mechanisms underlying the apparently contradictory requirements
for both genetic continuity and the evolution of living organisms.
What is the molecular nature of genetic material? How is genetic
information transmitted from one generation to the next with high
fidelity? How do the rare changes in genetic material that are the
raw material of evolution arise? How is genetic information
ultimately expressed in the amino acid sequences of the astonishing
variety of protein molecules in a living cell?
The fundamental unit of information in living sys-
tems is the gene. A gene can be defined biochemically
as a segment of DNA (or, in a few cases, RNA) that en-
codes the information required to produce a functional
biological product. The final product is usually a pro-
tein, so much of the material in Part III concerns genes
that encode proteins. A functional gene product might
also be one of several classes of RNA molecules. The
storage, maintenance, and metabolism of these infor-
mational units form the focal points of our discussion in
Part III.
Modern biochemical research on gene structure and
function has brought to biology a revolution compara-
ble to that stimulated by the publication of Darwin’s the-
ory on the origin of species nearly 150 years ago. An un-
derstanding of how information is stored and used in
cells has brought penetrating new insights to some of
the most fundamental questions about cellular structure
and function. A comprehensive conceptual framework
for biochemistry is now unfolding.
Today’s understanding of information pathways has
arisen from the convergence of genetics, physics, and
chemistry in modern biochemistry. This was epitomized
by the discovery of the double-helical structure of DNA,
postulated by James Watson and Francis Crick in 1953
(see Fig. 8–15). Genetic theory contributed the concept
of coding by genes. Physics permitted the determina-
tion of molecular structure by x-ray diffraction analysis.
Chemistry revealed the composition of DNA. The pro-
found impact of the Watson-Crick hypothesis arose from
its ability to account for a wide range of observations
derived from studies in these diverse disciplines.
This revolution in our understanding of the struc-
ture of DNA inevitably stimulated questions about its
function. The double-helical structure itself clearly sug-
gested how DNA might be copied so that the informa-
tion it contains can be transmitted from one generation
to the next. Clarification of how the information in DNA
is converted into functional proteins came with the dis-
covery of both messenger RNA and transfer RNA and
with the deciphering of the genetic code.
These and other major advances gave rise to the
central dogma of molecular biology, comprising the
three major processes in the cellular utilization of ge-
netic information. The first is replication, the copying
of parental DNA to form daughter DNA molecules with
identical nucleotide sequences. The second is tran-
scription, the process by which parts of the genetic
message encoded in DNA are copied precisely into RNA.
The third is translation, whereby the genetic message
encoded in messenger RNA is translated on the ribo-
somes into a polypeptide with a particular sequence of
amino acids.
PART III
INFORMATION PATHWAYS

24 Genes and Chromosomes
25 DNA Metabolism
26 RNA Metabolism
27 Protein Metabolism
28 Regulation of Gene Expression
Part III explores these and related processes. In
Chapter 24 we examine the structure, topology, and
packaging of chromosomes and genes. The processes
underlying the central dogma are elaborated in Chap-
ters 25 through 27. Finally, we turn to regulation, ex-
amining how the expression of genetic information is
controlled (Chapter 28).
A major theme running through these chapters is
the added complexity inherent in the biosynthesis of
macromolecules that contain information. Assembling
nucleic acids and proteins with particular sequences of
nucleotides and amino acids represents nothing less
than preserving the faithful expression of the template
upon which life itself is based. We might expect the for-
mation of phosphodiester bonds in DNA or peptide
bonds in proteins to be a trivial feat for cells, given the
arsenal of enzymatic and chemical tools described in
Part II. However, the framework of patterns and rules
established in our examination of metabolic pathways
thus far must be enlarged considerably to take into
account molecular information. Bonds must be formed
between particular subunits in informational biopoly-
mers, avoiding either the occurrence or the persistence
of sequence errors. This has an enormous impact on the
thermodynamics, chemistry, and enzymology of the
biosynthetic processes. Formation of a peptide bond re-
quires an energy input of only about 21 kJ/mol of bonds
and can be catalyzed by relatively simple enzymes. But
to synthesize a bond between two specific amino acids
at a particular point in a polypeptide, the cell invests
about 125 kJ/mol while making use of more than 200
enzymes, RNA molecules, and specialized proteins. The
chemistry involved in peptide bond formation does not
change because of this requirement, but additional
processes are layered over the basic reaction to ensure
that the peptide bond is formed between particular
amino acids. Information is expensive.
The dynamic interaction between nucleic acids and
proteins is another central theme of Part III. With the
important exception of a few catalytic RNA molecules
(discussed in Chapters 26 and 27), the processes that
make up the pathways of cellular information flow are
catalyzed and regulated by proteins. An understanding
of these enzymes and other proteins can have practical
as well as intellectual rewards, because they form the
basis of recombinant DNA technology (introduced in
Chapter 9).
[Figure: DNA, with a replication loop, is transcribed into RNA, which is translated into protein.]
The central dogma of molecular biology, showing the general pathways
of information flow via replication, transcription, and translation.
The term “dogma” is a misnomer. Introduced by Francis Crick at a time
when little evidence supported these ideas, the dogma has become a
well-established principle.
CHAPTER 24
GENES AND CHROMOSOMES

24.1 Chromosomal Elements
24.2 DNA Supercoiling
24.3 The Structure of Chromosomes

DNA topoisomerases are the magicians of the DNA world.
By allowing DNA strands or double helices to pass
through each other, they can solve all of the topological
problems of DNA in replication, transcription and other
cellular transactions.
    James Wang, article in Nature Reviews in Molecular Cell Biology, 2002

Supercoiling, in fact, does more for DNA than act as an
executive enhancer; it keeps the unruly, spreading DNA
inside the cramped confines that the cell has provided
for it.
    Nicholas Cozzarelli, Harvey Lectures, 1993

Almost every cell of a multicellular organism contains the same
complement of genetic material: its genome. Just look at any human
individual for a hint of the wealth of information contained in each
human cell. Chromosomes, the nucleic acid molecules that are the
repository of an organism’s genetic information, are the largest
molecules in a cell and may contain thousands of genes as well as
considerable tracts of intergenic DNA. The 16 chromosomes in the
relatively small genome of the yeast Saccharomyces cerevisiae have
molecular masses ranging from $1.5 \times 10^{8}$ to $1 \times 10^{9}$
daltons, corresponding to DNA molecules with 230,000 to 1,532,000
contiguous base pairs (bp). Human chromosomes range up to 279 million bp.

The very size of DNA molecules presents an interesting biological
puzzle, given that they are generally much longer than the cells or
viral packages that contain them (Fig. 24–1). In this chapter we shift
our focus from the secondary structure of DNA, considered in Chapter 8,
to the extraordinary degree of organization required for the tertiary
packaging of DNA into chromosomes. We first examine the elements within
viral and cellular chromosomes, then assess their size and organization.
We next consider DNA topology, providing a description of the coiling
of DNA molecules. Finally, we discuss the protein-DNA interactions that
organize chromosomes into compact structures.

FIGURE 24–1 Bacteriophage T2 protein coat surrounded by its single,
linear molecule of DNA. The DNA was released by lysing the
bacteriophage particle in distilled water and allowing the DNA to
spread on the water surface. An undamaged T2 bacteriophage particle
consists of a head structure that tapers to a tail by which the
bacteriophage attaches itself to the outer surface of a bacterial cell.
All the DNA shown in this electron micrograph is normally packaged
inside the phage head. (Scale bar: 0.5 μm.)

24.1 Chromosomal Elements
Cellular DNA contains genes and intergenic regions,
both of which may serve functions vital to the cell. The
more complex genomes, such as those of eukaryotic
cells, demand increased levels of chromosomal organi-
zation, and this is reflected in the chromosome’s struc-
tural features. We begin by considering the different
types of DNA sequences and structural elements within
chromosomes.
Genes Are Segments of DNA That Code
for Polypeptide Chains and RNAs
Our understanding of genes has evolved tremendously
over the last century. Classically, a gene was defined as
a portion of a chromosome that determines or affects a
single character or phenotype (visible property), such
as eye color. George Beadle and Edward Tatum proposed
a molecular definition of a gene in 1940. After exposing
spores of the fungus Neurospora crassa to x rays and
other agents known to damage DNA and cause alterations
in DNA sequence (mutations), they detected mutant
fungal strains that lacked one or another specific en-
zyme, sometimes resulting in the failure of an entire
metabolic pathway. Beadle and Tatum concluded that a
gene is a segment of genetic material that determines
or codes for one enzyme: the one gene–one enzyme
hypothesis. Later this concept was broadened to one
gene–one polypeptide, because many genes code for
proteins that are not enzymes or for one polypeptide of
a multisubunit protein.
The modern biochemical definition of a gene is even
more precise. A gene is all the DNA that encodes the
primary sequence of some final gene product, which can
be either a polypeptide or an RNA with a structural or
catalytic function. DNA also contains other segments or
sequences that have a purely regulatory function. Reg-
ulatory sequences provide signals that may denote the
beginning or the end of genes, or influence the tran-
scription of genes, or function as initiation points for
replication or recombination (Chapter 28). Some genes
can be expressed in different ways to generate multiple
gene products from one segment of DNA. The special
transcriptional and translational mechanisms that allow
this are described in Chapters 26 through 28.
We can make direct estimations of the minimum
overall size of genes that encode proteins. As described
in detail in Chapter 27, each amino acid of a polypep-
tide chain is coded for by a sequence of three consec-
utive nucleotides in a single strand of DNA (Fig. 24–2),
with these “codons” arranged in a sequence that corre-
sponds to the sequence of amino acids in the polypep-
tide that the gene encodes. A polypeptide chain of 350
amino acid residues (an average-size chain) corresponds to 1,050 bp.
Many genes in eukaryotes and a few in prokaryotes are interrupted by
noncoding DNA segments and are therefore considerably longer than this
simple calculation would suggest.

[Portraits: George W. Beadle, 1903–1989; Edward L. Tatum, 1909–1975]

[Figure 24–2 diagram: the two DNA strands (one labeled as the template strand) and the mRNA, with 5′ and 3′ ends marked, aligned with a polypeptide whose residues are labeled Arg, Gly, Tyr, Thr, Phe, Ala, Val, and Ser and whose amino and carboxyl termini are marked.]
FIGURE 24–2 Colinearity of the coding nucleotide sequences of
DNA and mRNA and the amino acid sequence of a polypeptide chain.
The triplets of nucleotide units in DNA determine the amino acids in
a protein through the intermediary mRNA. One of the DNA strands
serves as a template for synthesis of mRNA, which has nucleotide
triplets (codons) complementary to those of the DNA. In some bacterial
and many eukaryotic genes, coding sequences are interrupted at
intervals by regions of noncoding sequences (called introns).
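
The colinearity summarized in Figure 24–2 can be traced in a few lines of Python. This is only a minimal sketch: the 24-nucleotide coding sequence is an assumption chosen to yield the eight residues labeled in the figure, the function names are hypothetical, and the codon table is deliberately partial (it holds just the eight codons needed here), so it is not a general translator.

```python
# Partial codon table (mRNA codon -> amino acid); only the codons used below.
CODONS = {
    "CGU": "Arg", "GGA": "Gly", "UAC": "Tyr", "ACU": "Thr",
    "UUU": "Phe", "GCC": "Ala", "GUU": "Val", "UCU": "Ser",
}

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_strand(coding_5to3):
    """Return the template strand, written 3'->5', for a coding strand written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in coding_5to3)


def transcribe(template_3to5):
    """Build the mRNA (5'->3') complementary to a template strand written 3'->5'."""
    rna_pair = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(rna_pair[base] for base in template_3to5)


def translate(mrna):
    """Read consecutive, non-overlapping triplets and look each one up."""
    return [CODONS[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


# Hypothetical coding strand (5'->3'), assumed for illustration only.
coding = "CGTGGATACACTTTTGCCGTTTCT"
template = template_strand(coding)
mrna = transcribe(template)
peptide = translate(mrna)

print(mrna)
print("-".join(peptide))
```

Each residue in `peptide` corresponds to exactly three nucleotides of `coding`, which is the relationship behind the 350-residue, 1,050-bp estimate.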
How many genes are in a single chromosome? The
Escherichia coli chromosome, one of the prokaryotic
genomes that has been completely sequenced, is a cir-
cular DNA molecule (in the sense of an endless loop
rather than a perfect circle) with 4,639,221 bp. These
base pairs encode about 4,300 genes for proteins and
another 115 genes for stable RNA molecules. Among eu-
karyotes, the approximately 3.2 billion base pairs of the
human genome include 30,000 to 35,000 genes on 24
different chromosomes.
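
The minimum-size estimate can be combined with the E. coli gene count as a rough check on how much of that chromosome is devoted to protein-coding sequence. The sketch below assumes, purely for illustration, that the 350-residue average applies to every E. coli protein gene (the text does not say so) and ignores regulatory sequences, stop codons, and the genes for stable RNAs; the function and variable names are hypothetical.

```python
CODON_LENGTH = 3  # nucleotides per amino acid residue


def min_gene_size_bp(residues):
    """Minimum coding length in base pairs, ignoring introns and regulatory DNA."""
    return residues * CODON_LENGTH


avg_gene_bp = min_gene_size_bp(350)  # the average-size chain used in the text

# Values quoted in the text for the E. coli chromosome.
ecoli_genome_bp = 4_639_221
ecoli_protein_genes = 4_300

# Assumed extrapolation: every protein gene is treated as average-size.
coding_fraction = ecoli_protein_genes * avg_gene_bp / ecoli_genome_bp

print(f"{avg_gene_bp} bp per average gene")
print(f"estimated protein-coding fraction: {coding_fraction:.0%}")
```

Under these assumptions the estimate comes out close to the whole chromosome, which is at least consistent with the later statement that genes and regulatory sequences account for almost all the DNA in prokaryotes.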
DNA Molecules Are Much Longer Than the Cellular
Packages That Contain Them
Chromosomal DNAs are often many orders of magni-
tude longer than the cells or viruses in which they are
found (Fig. 24–1; Table 24–1). This is true of every class
of organism or parasite.
Viruses Viruses are not free-living organisms; rather,
they are infectious parasites that use the resources of a
host cell to carry out many of the processes they re-
quire to propagate. Many viral particles consist of no
more than a genome (usually a single RNA or DNA mol-
ecule) surrounded by a protein coat.
Almost all plant viruses and some bacterial and an-
imal viruses have RNA genomes. These genomes tend
to be particularly small. For example, the genomes of
mammalian retroviruses such as HIV are about 9,000 nucleotides
long, and that of the bacteriophage Qβ has
4,220 nucleotides. Both types of viruses have single-
stranded RNA genomes.
The genomes of DNA viruses vary greatly in size
(Table 24–1). Many viral DNAs are circular for at least
part of their life cycle. During viral replication within a
host cell, specific types of viral DNA called replicative
forms may appear; for example, many linear DNAs be-
come circular and all single-stranded DNAs become
double-stranded. A typical medium-sized DNA virus is
bacteriophage λ (lambda), which infects E. coli. In its
replicative form inside cells, λ DNA is a circular double
helix. This double-stranded DNA contains 48,502 bp and
has a contour length of 17.5 μm. Bacteriophage φX174
is a much smaller DNA virus; the DNA in the viral par-
ticle is a single-stranded circle, and the double-stranded
replicative form contains 5,386 bp. Although viral
genomes are small, the contour lengths of their DNAs
are much greater than the long dimensions of the viral
particles that contain them. The DNA of bacteriophage
T4, for example, is about 290 times longer than the vi-
ral particle itself (Table 24–1).
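
The contour lengths in Table 24–1 follow from multiplying the number of base pairs by the axial rise per base pair. The sketch below (hypothetical names `contour_length_nm` and `PHAGES`) recomputes them. One observation from the numbers themselves: the tabulated lengths correspond to a rise of about 0.36 nm per base pair, whereas the table note quotes 3.4 Å (0.34 nm). The code simply reports both and does not try to decide which value was intended.

```python
def contour_length_nm(base_pairs, rise_nm_per_bp):
    """Contour length of duplex DNA: number of base pairs times the rise per pair."""
    return base_pairs * rise_nm_per_bp


# (DNA size in bp, tabulated DNA length in nm, long dimension of particle in nm)
PHAGES = {
    "phiX174": (5_386, 1_939, 25),
    "T7": (39_936, 14_377, 78),
    "lambda": (48_502, 17_460, 190),
    "T4": (168_889, 60_800, 210),
}

for name, (bp, table_nm, particle_nm) in PHAGES.items():
    at_034 = contour_length_nm(bp, 0.34)  # 3.4 angstroms per bp, as in the table note
    implied_rise = table_nm / bp          # rise that reproduces the tabulated length
    ratio = table_nm / particle_nm        # DNA length relative to the viral particle
    print(f"{name:8s} {at_034:9.0f} nm at 0.34 nm/bp; "
          f"table implies {implied_rise:.3f} nm/bp; DNA/particle = {ratio:.0f}x")
```

For T4 the ratio of tabulated DNA length to particle size is about 290, matching the figure quoted in the text.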
Bacteria A single E. coli cell contains almost 100 times
as much DNA as a bacteriophage λ particle. The chromosome
of an E. coli cell is a single double-stranded
circular DNA molecule. Its 4,639,221 bp have a contour
length of about 1.7 mm, some 850 times the length of
the E. coli cell (Fig. 24–3). In addition to the very large,
circular DNA chromosome in their nucleoid, many bac-
teria contain one or more small circular DNA molecules
that are free in the cytosol. These extrachromosomal
elements are called plasmids (Fig. 24–4; see also
p. 311). Most plasmids are only a few thousand base
pairs long, but some contain more than 10,000 bp. They
carry genetic information and undergo replication to
yield daughter plasmids, which pass into the daughter
cells at cell division. Plasmids have been found in yeast
and other fungi as well as in bacteria.
In many cases plasmids confer no obvious advan-
tage on their host, and their sole function appears to be
self-propagation. However, some plasmids carry genes
that are useful to the host bacterium. For example,
some plasmid genes make a host bacterium resistant
to antibacterial agents. Plasmids carrying the gene for
the enzyme β-lactamase confer resistance to β-lactam
antibiotics such as penicillin and amoxicillin (see Box
20–1). These and similar plasmids may pass from an
antibiotic-resistant cell to an antibiotic-sensitive cell of the
same or another bacterial species, making the recipient
cell antibiotic resistant. The extensive use of antibiotics
in some human populations has served as a strong selective force,
encouraging the spread of antibiotic resistance–coding plasmids (as
well as transposable elements, described below, that harbor similar
genes) in disease-causing bacteria and creating bacterial strains that
are resistant to several antibiotics. Physicians are becoming
increasingly reluctant to prescribe antibiotics unless a clear clinical
need is confirmed. For similar reasons, the widespread use of
antibiotics in animal feeds is being curbed.

TABLE 24–1 The Sizes of DNA and Viral Particles for Some Bacterial Viruses (Bacteriophages)

Virus         Size of viral DNA (bp)   Length of viral DNA (nm)   Long dimension of viral particle (nm)
φX174                          5,386                      1,939                                      25
T7                            39,936                     14,377                                      78
λ (lambda)                    48,502                     17,460                                     190
T4                           168,889                     60,800                                     210

Note: Data on size of DNA are for the replicative form (double-stranded).
The contour length is calculated assuming that each base pair occupies
a length of 3.4 Å (see Fig. 8–15).

[Figure 24–3 diagram: the E. coli DNA drawn to scale beside an E. coli cell.]
FIGURE 24–3 The length of the E. coli chromosome (1.7 mm) depicted in
linear form relative to the length of a typical E. coli cell (2 μm).

FIGURE 24–4 DNA from a lysed E. coli cell. In this electron micrograph
several small, circular plasmid DNAs are indicated by white arrows. The
black spots and white specks are artifacts of the preparation.

Eukaryotes A yeast cell, one of the simplest eukaryotes, has 2.6 times
more DNA in its genome than an E. coli cell (Table 24–2). Cells of
Drosophila, the fruit fly used in classical genetic studies, contain
more than 35 times as much DNA as E. coli cells, and human cells have
almost 700 times as much. The cells of many plants and amphibians
contain even more. The genetic material of eukaryotic cells is
apportioned into chromosomes, the diploid (2n) number depending on the
species (Table 24–2). A human somatic cell, for example, has 46
chromosomes (Fig. 24–5). Each chromosome of a eukaryotic cell, such as
that shown in Figure 24–5a, contains a single, very large, duplex DNA
molecule. The DNA molecules in the 24 different types of human
chromosomes (22 matching pairs plus the X and Y sex chromosomes) vary
in length over a 25-fold range. Each type of chromosome in eukaryotes
carries a characteristic set of genes. Interestingly, the number of
genes does not vary nearly as much as does genome size (see Chapter 9
for a discussion of the types of sequences, besides genes, that
contribute to genome size).

The DNA of one human genome (22 chromosomes plus X and Y or two X
chromosomes), placed end to end, would extend for about a meter. Most
human cells are diploid and each cell contains a total of 2 m of DNA.
An adult human body contains approximately $10^{14}$ cells and thus a
total DNA length of $2 \times 10^{11}$ km. Compare this with the
circumference of the earth ($4 \times 10^{4}$ km) or the distance
between the earth and the sun ($1.5 \times 10^{8}$ km): a dramatic
illustration of the extraordinary degree of DNA compaction in our cells.
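
The arithmetic behind the total length of DNA in a human body can be written out explicitly. This is a small sketch using only the approximate figures quoted in the text; the constant names are hypothetical.

```python
GENOME_LENGTH_M = 1.0    # one human genome laid end to end (about a meter)
PLOIDY = 2               # most human cells are diploid
CELLS_PER_ADULT = 1e14   # approximate cell count used in the text

dna_per_cell_m = GENOME_LENGTH_M * PLOIDY
total_km = dna_per_cell_m * CELLS_PER_ADULT / 1_000  # meters -> kilometers

EARTH_CIRCUMFERENCE_KM = 4e4
EARTH_SUN_KM = 1.5e8

print(f"total DNA length: {total_km:.1e} km")
print(f"earth circumferences: {total_km / EARTH_CIRCUMFERENCE_KM:.1e}")
print(f"earth-sun distances: {total_km / EARTH_SUN_KM:.0f}")
```

On these figures the DNA of one adult would span the earth-sun distance more than a thousand times, which is the scale of compaction the comparison is meant to convey.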
Eukaryotic cells also have organelles, mitochondria
(Fig. 24–6) and chloroplasts, that contain DNA. Mito-
chondrial DNA (mtDNA) molecules are much smaller
than the nuclear chromosomes. In animal cells, mtDNA
contains fewer than 20,000 bp (16,569 bp in human
mtDNA) and is a circular duplex. Each mitochondrion
typically has two to ten copies of this mtDNA molecule,
and the number can rise to hundreds in certain cells
when an embryo is undergoing cell differentiation. In a
few organisms (trypanosomes, for example) each mito-
chondrion contains thousands of copies of mtDNA, or-
ganized into a complex and interlinked matrix known as
a kinetoplast. Plant cell mtDNA ranges in size from
200,000 to 2,500,000 bp. Chloroplast DNA (cpDNA) also
exists as circular duplexes and ranges in size from
120,000 to 160,000 bp. The evolutionary origin of mito-
chondrial and chloroplast DNAs has been the subject of
much speculation. A widely accepted view is that they
are vestiges of the chromosomes of ancient bacteria that
gained access to the cytoplasm of host cells and became
the precursors of these organelles (see Fig. 1–36).
FIGURE 24–6 A dividing mitochondrion. Some mitochondrial
proteins and RNAs are encoded by one of the copies of the mito-
chondrial DNA (none of which are visible here). The DNA (mtDNA)
is replicated each time the mitochondrion divides, before cell division.
FIGURE 24–5 Eukaryotic chromosomes. (a) A pair of linked and condensed
sister chromatids from a human chromosome. Eukaryotic chromosomes are
in this state after replication and at metaphase during mitosis. (b) A complete
set of chromosomes from a leukocyte from one of the authors. There are 46
chromosomes in every normal human somatic cell.
Mitochondrial DNA codes for the mitochondrial tRNAs
and rRNAs and for a few mitochondrial proteins. More
than 95% of mitochondrial proteins are encoded by nu-
clear DNA. Mitochondria and chloroplasts divide when
the cell divides. Their DNA is replicated before and dur-
ing division, and the daughter DNA molecules pass into
the daughter organelles.
Eukaryotic Genes and Chromosomes
Are Very Complex
Many bacterial species have only one chromosome per
cell and, in nearly all cases, each chromosome contains
only one copy of each gene. A very few genes, such as
those for rRNAs, are repeated several times. Genes and
regulatory sequences account for almost all the DNA in
prokaryotes. Moreover, almost every gene is precisely
colinear with the amino acid sequence (or RNA se-
quence) for which it codes (Fig. 24–2).
The organization of genes in eukaryotic DNA is
structurally and functionally much more complex. The
study of eukaryotic chromosome structure, and more
recently the sequencing of entire eukaryotic genomes,
has yielded many surprises. Many, if not most, eukary-
otic genes have a distinctive and puzzling structural
feature: their nucleotide sequences contain one or more
intervening segments of DNA that do not code for the
amino acid sequence of the polypeptide product. These
nontranslated inserts interrupt the otherwise colinear
relationship between the nucleotide sequence of the
gene and the amino acid sequence of the polypeptide it
encodes. Such nontranslated DNA segments in genes
are called intervening sequences or introns, and the
coding segments are called exons. Few prokaryotic
genes contain introns.
In higher eukaryotes, the typical gene has much
more intron sequence than sequences devoted to ex-
ons. For example, in the gene coding for the single
polypeptide chain of the avian egg protein ovalbumin
(Fig. 24–7), the introns are much longer than the ex-
ons; altogether, seven introns make up 85% of the gene’s
DNA. In the gene for the β subunit of hemoglobin, a single
intron contains more than half of the gene’s DNA.
The gene for the muscle protein titin is the intron cham-
pion, with 178 introns. Genes for histones appear to have
no introns. In most cases the function of introns is not
clear. In total, only about 1.5% of human DNA is “cod-
ing” or exon DNA, carrying information for protein or
RNA products. However, when the much larger introns
are included in the count, as much as 30% of the hu-
man genome consists of genes.
The relative paucity of genes in the human genome
leaves a lot of DNA unaccounted for. Figure 24–8
provides a summary of sequence types. Much of the
nongene DNA is in the form of repeated sequences of
several kinds. Perhaps most surprising, about half the
human genome is made up of moderately repeated se-
quences that are derived from transposable elements—
segments of DNA, ranging from a few hundred to sev-
eral thousand base pairs long, that can move from one
location to another in the genome. Transposable ele-
ments (transposons) are a kind of molecular parasite,
efficiently making a home within the host genome. Many
have genes encoding proteins that catalyze the trans-
position process, described in more detail in Chapters
25 and 26. Some transposons in the human genome are
active, moving at a low frequency, but most are inactive
relics, evolutionarily altered by mutations. Although
these elements generally do not encode proteins or
RNAs that are used in human cells, they have played a
major role in human evolution: movement of transposons can lead to the
redistribution of other genomic sequences.

TABLE 24–2 DNA, Gene, and Chromosome Content in Some Genomes

Organism                                 Total DNA (bp)   Number of chromosomes*   Approximate number of genes
Bacterium (Escherichia coli)                  4,639,221   1                        4,405
Yeast (Saccharomyces cerevisiae)             12,068,000   16†                      6,200
Nematode (Caenorhabditis elegans)            97,000,000   12‡                      19,000
Plant (Arabidopsis thaliana)                125,000,000   10                       25,500
Fruit fly (Drosophila melanogaster)         180,000,000   18                       13,600
Plant (Oryza sativa; rice)                  480,000,000   24                       57,000
Mouse (Mus musculus)                      2,500,000,000   40                       30,000–35,000
Human (Homo sapiens)                      3,200,000,000   46                       30,000–35,000

Note: This information is constantly being refined. For the most current
information, consult the websites for the individual genome projects.
* The diploid chromosome number is given for all eukaryotes except yeast.
† Haploid chromosome number. Wild yeast strains generally have eight
(octoploid) or more sets of these chromosomes.
‡ Number for females, with two X chromosomes. Males have an X but no Y,
thus 11 chromosomes in all.
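
The genome-size comparisons made earlier in the chapter, and the remark that gene number varies far less than genome size, can both be checked directly against Table 24–2. The sketch below uses a subset of the table; the dictionary and function names are hypothetical, and the human gene count is taken as the upper end of the tabulated range.

```python
# (total DNA in bp, approximate number of genes) from Table 24-2
GENOMES = {
    "E. coli": (4_639_221, 4_405),
    "S. cerevisiae": (12_068_000, 6_200),
    "D. melanogaster": (180_000_000, 13_600),
    "H. sapiens": (3_200_000_000, 35_000),  # upper end of the 30,000-35,000 range
}

ECOLI_BP = GENOMES["E. coli"][0]


def relative_size(name):
    """Genome size as a multiple of the E. coli genome."""
    return GENOMES[name][0] / ECOLI_BP


def genes_per_mbp(name):
    """Gene density in genes per million base pairs."""
    bp, genes = GENOMES[name]
    return genes / (bp / 1e6)


for name in GENOMES:
    print(f"{name:16s} {relative_size(name):7.1f}x E. coli, "
          f"{genes_per_mbp(name):6.1f} genes/Mbp")
```

The human genome is nearly 700 times the size of the E. coli genome but carries fewer than 10 times as many genes, so gene density falls steeply as genome size rises.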
Another 3% or so of the human genome consists of
highly repetitive sequences, also referred to as
simple-sequence DNA or simple sequence repeats
(SSR). These short sequences, generally less than
10 bp long, are sometimes repeated millions of times per
cell. The simple-sequence DNA has also been called
satellite DNA, so named because its unusual base com-
position often causes it to migrate as “satellite” bands
(separated from the rest of the DNA) when fragmented
cellular DNA samples are centrifuged in a cesium chlo-
ride density gradient. Studies suggest that simple-
sequence DNA does not encode proteins or RNAs. Un-
like the transposable elements, the highly repetitive
DNA can have identifiable functional importance in
human cellular metabolism, because much of it is asso-
ciated with two defining features of eukaryotic chro-
mosomes: centromeres and telomeres.
[Figure 24–7 diagram: the ovalbumin gene, with introns A to G separating exons L and 1 to 7, and the hemoglobin β subunit gene, with exons 1 (90 bp), 2 (222 bp), and 3 (126 bp) separated by introns A (131 bp) and B (851 bp).]
FIGURE 24–7 Introns in two eukaryotic genes. The gene for ovalbumin
has seven introns (A to G), splitting the coding sequences into
eight exons (L, and 1 to 7). The gene for the β subunit of hemoglobin
has two introns and three exons, including one intron that alone
contains more than half the base pairs of the gene.
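
The segment lengths labeled in Figure 24–7 for the hemoglobin β subunit gene make it easy to verify that one intron alone exceeds half the gene. The sketch below assumes that the five labeled segments account for the whole gene as drawn; the names are hypothetical.

```python
# Segment lengths (bp) for the hemoglobin beta-subunit gene, as labeled in Figure 24-7
EXONS = {"1": 90, "2": 222, "3": 126}
INTRONS = {"A": 131, "B": 851}

gene_length = sum(EXONS.values()) + sum(INTRONS.values())
intron_fraction = sum(INTRONS.values()) / gene_length
largest_intron_fraction = max(INTRONS.values()) / gene_length

print(f"gene length as drawn: {gene_length} bp")
print(f"all introns: {intron_fraction:.0%} of the gene")
print(f"largest intron alone: {largest_intron_fraction:.0%} of the gene")
```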
[Figure 24–8 pie chart: transposons 45% (LINEs 21%, SINEs 13%, retroviruslike 8%); genes 30% (exons 1.5%, introns and noncoding segments 28.5%); miscellaneous 25% (SSR 3%, SD 5%, unlisted “?” 17%).]
FIGURE 24–8 Types of sequences in the human genome. This pie
chart divides the genome into transposons (transposable elements),
genes, and miscellaneous sequences. There are four main classes of
transposons. Long interspersed elements (LINEs), 6 to 8 kbp long
(1 kbp = 1,000 bp), typically include a few genes encoding proteins
that catalyze transposition. The genome has about 850,000 LINEs. Short
interspersed elements (SINEs) are about 100 to 300 bp long. Of the 1.5
million in the human genome more than 1 million are Alu elements,
so called because they generally include one copy of the recognition
sequence for AluI, a restriction endonuclease (see Fig. 9–3). The
genome also contains 450,000 copies of retroviruslike transposons,
1.5 to 11 kbp long. Although these are “trapped” in the genome and
cannot move from one cell to another, they are evolutionarily related
to the retroviruses (Chapter 26), which include HIV. A final class of
transposons (making up <1% and not shown here) consists of a variety
of transposon remnants that differ greatly in length.
About 30% of the genome consists of sequences included in genes
for proteins, but only a small fraction of this DNA is in exons (coding
sequences). Miscellaneous sequences include simple-sequence repeats
(SSR) and large segmental duplications (SD), the latter being segments
that appear more than once in different locations. Among the
unlisted sequence elements (denoted by a question mark) are genes
encoding RNAs (which can be harder to identify than genes for proteins)
and remnants of transposons that have been evolutionarily altered
so that they are now hard to identify.
The centromere (Fig. 24–9) is a sequence of DNA
that functions during cell division as an attachment
point for proteins that link the chromosome to the mi-
totic spindle. This attachment is essential for the equal
and orderly distribution of chromosome sets to daugh-
ter cells. The centromeres of Saccharomyces cere-
visiae have been isolated and studied. The sequences
essential to centromere function are about 130 bp long
and are very rich in A=T pairs. The centromeric sequences
of higher eukaryotes are much longer and, unlike those
of yeast, generally contain simple-sequence
DNA, which consists of thousands of tandem copies of
one or a few short sequences of 5 to 10 bp, in the same
orientation. The precise role of simple-sequence DNA
in centromere function is not yet understood.
Telomeres (Greek telos, “end”) are sequences at
the ends of eukaryotic chromosomes that help stabilize
the chromosome. The best-characterized telomeres are
those of the simpler eukaryotes. Yeast telomeres end
with about 100 bp of imprecisely repeated sequences of
the form
$$
\begin{array}{l}
(5')\,(\mathrm{T}_x\mathrm{G}_y)_n \\
(3')\,(\mathrm{A}_x\mathrm{C}_y)_n
\end{array}
$$
where x and y are generally between 1 and 4. The num-
ber of telomere repeats, n, is in the range of 20 to 100
for most single-celled eukaryotes and generally more
than 1,500 in mammals. The ends of a linear DNA mol-
ecule cannot be routinely replicated by the cellular repli-
cation machinery (which may be one reason why bac-
terial DNA molecules are circular). Repeated telomeric
sequences are added to eukaryotic chromosome ends
primarily by the enzyme telomerase (see Fig. 26–35).
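
The repeat notation for telomeres can be made concrete by expanding it into sequence. The sketch below is an idealization: real yeast repeats are imprecise, whereas this code (with a hypothetical function name) builds perfectly regular repeats. The values of x and y are arbitrary choices within the quoted range, and n is kept far smaller than in real telomeres for readability.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def telomere_strands(x, y, n):
    """Return the (TxGy)n strand (5'->3') and its base-paired partner written 3'->5'."""
    unit = "T" * x + "G" * y
    tg_strand = unit * n
    ac_strand = "".join(COMPLEMENT[base] for base in tg_strand)
    return tg_strand, ac_strand


# Illustrative values only: x and y within 1 to 4, n unrealistically small.
tg, ac = telomere_strands(x=2, y=4, n=3)
print("5' " + tg + " 3'")
print("3' " + ac + " 5'")
```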
Artificial chromosomes (Chapter 9) have been con-
structed as a means of better understanding the func-
tional significance of many structural features of eukar-
yotic chromosomes. A reasonably stable artificial linear
chromosome requires only three components: a centro-
mere, telomeres at each end, and sequences that allow
the initiation of DNA replication. Yeast artificial chromo-
somes (YACs; see Fig. 9–8) have been developed as a
research tool in biotechnology. Similarly, human artificial
chromosomes (HACs) are being developed for the treat-
ment of genetic diseases by somatic gene therapy.
SUMMARY 24.1 Chromosomal Elements
■ Genes are segments of a chromosome that
contain the information for a functional
polypeptide or RNA molecule. In addition to
genes, chromosomes contain a variety of
regulatory sequences involved in replication,
transcription, and other processes.
■ Genomic DNA and RNA molecules are
generally orders of magnitude longer than the
viral particles or cells that contain them.
■ Many genes in eukaryotic cells, and a few in
bacteria, are interrupted by noncoding
sequences called introns. The coding segments
separated by introns are called exons.
■ Less than one-third of human genomic DNA
consists of genes. Much of the remainder
consists of repeated sequences of various
types. Nucleic acid parasites known as
transposons account for about half of the
human genome.
■ Eukaryotic chromosomes have two important
special-function repetitive DNA sequences:
centromeres, which are attachment points for
the mitotic spindle, and telomeres, located at
the ends of chromosomes.
24.2 DNA Supercoiling
Cellular DNA, as we have seen, is extremely compacted,
implying a high degree of structural organization. The
folding mechanism must not only pack the DNA but also
permit access to the information in the DNA. Before
considering how this is accomplished in processes such
as replication and transcription, we need to examine an
important property of DNA structure known as super-
coiling.
Supercoiling means the coiling of a coil. A telephone
cord, for example, is typically a coiled wire. The path
taken by the wire between the base of the phone and
the receiver often includes one or more supercoils (Fig.
24–10). DNA is coiled in the form of a double helix, with
both strands of the DNA coiling around an axis. The
further coiling of that axis upon itself (Fig. 24–11) pro-
duces DNA supercoiling. As detailed below, DNA
supercoiling is generally a manifestation of structural
strain. When there is no net bending of the DNA axis
upon itself, the DNA is said to be in a relaxed state.
We might have predicted that DNA compaction in-
volved some form of supercoiling. Perhaps less pre-
dictable is that replication and transcription of DNA also
affect and are affected by supercoiling. Both processes
require a separation of DNA strands, a process complicated
by the helical interwinding of the strands (as
demonstrated in Fig. 24–12).
FIGURE 24–9 Important structural elements of a yeast chromosome.
Labeled elements: a telomere at each end, the centromere, and
unique sequences (genes), dispersed repeats, and multiple
replication origins.
That DNA would bend on itself and become super-
coiled in tightly packaged cellular DNA would seem log-
ical, then, and perhaps even trivial, were it not for one
additional fact: many circular DNA molecules remain
highly supercoiled even after they are extracted and pu-
rified, freed from protein and other cellular components.
This indicates that supercoiling is an intrinsic property
of DNA tertiary structure. It occurs in all cellular DNAs
and is highly regulated by each cell.
A number of measurable properties of supercoiling
have been established, and the study of supercoiling has
provided many insights into DNA structure and func-
tion. This work has drawn heavily on concepts derived
from a branch of mathematics called topology, the
study of the properties of an object that do not change
under continuous deformations. For DNA, continuous
deformations include conformational changes due to
thermal motion or an interaction with proteins or other
molecules; discontinuous deformations involve DNA
strand breakage. For circular DNA molecules, a topological
property is one that is unaffected by deformations
of the DNA strands as long as no breaks are introduced.
Topological properties are changed only by breakage
and rejoining of the backbone of one or both DNA
strands.
FIGURE 24–10 Supercoils. A typical phone cord is coiled like a DNA
helix, and the coiled cord can itself coil in a supercoil. The illustration
is especially appropriate because an examination of phone cords
helped lead Jerome Vinograd and his colleagues to the insight that
many properties of small circular DNAs can be explained by supercoiling.
They first detected DNA supercoiling, in small circular viral
DNAs, in 1965.
FIGURE 24–11 Supercoiling of DNA. When the axis of the DNA double
helix (a coil) is coiled on itself, it forms a new helix (superhelix). The DNA
superhelix is usually called a supercoil.
FIGURE 24–12 Supercoiling induced by separating the strands of a
helical structure. Twist two linear strands of rubber band into a right-handed
double helix as shown. Fix one end by having a friend hold
onto it, then pull apart the two strands at the other end. The resulting
strain will produce supercoiling.
We now examine the fundamental properties and
physical basis of supercoiling.
Most Cellular DNA Is Underwound
To understand supercoiling we must first focus on the
properties of small circular DNAs such as plasmids and
small viral DNAs. When these DNAs have no breaks in
either strand, they are referred to as closed-circular
DNAs. If the DNA of a closed-circular molecule con-
forms closely to the B-form structure (the Watson-Crick
structure; see Fig. 8–15), with one turn of the double
helix per 10.5 bp, the DNA is relaxed rather than su-
percoiled (Fig. 24–13). Supercoiling results when DNA
is subject to some form of structural strain. Purified
closed-circular DNA is rarely relaxed, regardless of its
biological origin. Furthermore, DNAs derived from a
given cellular source have a characteristic degree of su-
percoiling. DNA structure is therefore strained in a man-
ner that is regulated by the cell to induce the super-
coiling.
In almost every instance, the strain is a result of un-
derwinding of the DNA double helix in the closed cir-
cle. In other words, the DNA has fewer helical turns
than would be expected for the B-form structure. The
effects of underwinding are summarized in Figure
24–14. An 84 bp segment of a circular DNA in the re-
laxed state would contain eight double-helical turns, or
one for every 10.5 bp. If one of these turns were removed,
there would be (84 bp)/7 = 12.0 bp per turn,
rather than the 10.5 found in B-DNA (Fig. 24–14b). This
is a deviation from the most stable DNA form, and the
molecule is thermodynamically strained as a result. Gen-
erally, much of this strain would be accommodated by
coiling the axis of the DNA on itself to form a supercoil
(Fig. 24–14c; some of the strain in this 84 bp segment
would simply become dispersed in the untwisted struc-
ture of the larger DNA molecule). In principle, the strain
could also be accommodated by separating the two DNA
strands over a distance of about 10 bp (Fig. 24–14d). In
isolated closed-circular DNA, strain introduced by un-
derwinding is generally accommodated by supercoiling
rather than strand separation, because coiling the axis
of the DNA usually requires less energy than breaking
the hydrogen bonds that stabilize paired bases. Note,
however, that the underwinding of DNA in vivo makes
it easier to separate DNA strands, giving access to the
information they contain.
FIGURE 24–13 Relaxed and supercoiled plasmid DNAs. The molecule in the leftmost
electron micrograph is relaxed; the degree of supercoiling increases from left to right.
(Scale bar: 0.2 μm.)
FIGURE 24–14 Effects of DNA underwinding. (a) A segment of DNA
within a closed-circular molecule, 84 bp long, in its relaxed form with
eight helical turns. (b) Removal of one turn induces structural strain
(seven turns). (c) The strain is generally accommodated by formation of a supercoil.
(d) DNA underwinding also makes the separation of strands somewhat
easier. In principle, each turn of underwinding should facilitate
strand separation over about 10 bp, as shown. However, the hydrogen-bonded
base pairs would generally preclude strand separation over
such a short distance, and the effect becomes important only for longer
DNAs and higher levels of DNA underwinding.
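The arithmetic of the 84 bp example can be checked with a minimal sketch. It assumes the relaxed B-form repeat of 10.5 bp per turn given in the text; the function name `bp_per_turn` and its parameters are illustrative only.

```python
B_FORM_BP_PER_TURN = 10.5  # helical repeat of relaxed B-DNA, as stated in the text


def bp_per_turn(length_bp: float, turns_removed: float = 0.0) -> float:
    """Average helical repeat of a DNA segment after removing some turns.

    Assumption: the segment starts relaxed, with length_bp / 10.5 turns,
    and all of the underwinding is expressed as a change in helical repeat.
    """
    relaxed_turns = length_bp / B_FORM_BP_PER_TURN
    remaining_turns = relaxed_turns - turns_removed
    if remaining_turns <= 0:
        raise ValueError("cannot remove all helical turns")
    return length_bp / remaining_turns


print(bp_per_turn(84))     # relaxed segment: 8 turns
print(bp_per_turn(84, 1))  # one turn removed: 7 turns
```

For the 84 bp segment this reproduces the values in the text: 10.5 bp per turn when relaxed and 12.0 bp per turn after one turn is removed.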
Every cell actively underwinds its DNA with the aid
of enzymatic processes (described below), and the
resulting strained state represents a form of stored en-
ergy. Cells maintain DNA in an underwound state to fa-
cilitate its compaction by coiling. The underwinding of
DNA is also important to enzymes of DNA metabolism
that must bring about strand separation as part of their
function.
The underwound state can be maintained only if the
DNA is a closed circle or if it is bound and stabilized by
proteins so that the strands are not free to rotate about
each other. If there is a break in one strand of an iso-
lated, protein-free circular DNA, free rotation at that
point will cause the underwound DNA to revert spon-
taneously to the relaxed state. In a closed-circular DNA
molecule, however, the number of helical turns cannot
be changed without at least transiently breaking one of
the DNA strands. The number of helical turns in a DNA
molecule therefore provides a precise description of
supercoiling.
DNA Underwinding Is Defined by Topological
Linking Number
The field of topology provides a number of ideas that
are useful to this discussion, particularly the concept of
linking number. Linking number is a topological prop-
erty of double-stranded DNA, because it does not vary
when the DNA is bent or deformed, as long as both DNA
strands remain intact. Linking number (Lk) is illustrated
in Figure 24–15.
Let’s begin by visualizing the separation of the two
strands of a double-stranded circular DNA. If the two
strands are linked as shown in Figure 24–15a, they are
effectively joined by what can be described as a
topological bond. Even if all hydrogen bonds and base-
stacking interactions were abolished such that the
strands were not in physical contact, this topological
bond would still link the two strands. Visualize one of
the circular strands as the boundary of a surface (such
as a soap film spanning the space framed by a circular
wire before you blow a soap bubble). The linking num-
ber can be defined as the number of times the second
strand pierces this surface. For the molecule in Figure
24–15a, Lk = 1; for that in Figure 24–15b, Lk = 6. The
linking number for a closed-circular DNA is always an
integer. By convention, if the links between two DNA
strands are arranged so that the strands are interwound
in a right-handed helix, the linking number is defined
as positive (+); for strands interwound in a left-handed
helix, the linking number is negative (−). Negative linking
numbers are, for all practical purposes, not encountered
in DNA.
We can now extend these ideas to a closed-circular
DNA with 2,100 bp (Fig. 24–16a). When the molecule
is relaxed, the linking number is simply the number of
base pairs divided by the number of base pairs per turn,
which is close to 10.5; so in this case, Lk = 200. For a
circular DNA molecule to have a topological property
such as linking number, neither strand may contain a
break. If there is a break in either strand, the strands
can, in principle, be unraveled and separated com-
pletely. In this case, no topological bond exists and Lk
is undefined (Fig. 24–16b).
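As a rough sketch of this definition, the relaxed linking number can be computed from the length of the DNA. The function name and the `nicked` flag are hypothetical, and rounding to the nearest integer is an assumption made here because Lk must be an integer in a closed circle.

```python
from typing import Optional


def relaxed_linking_number(
    length_bp: int, bp_per_turn: float = 10.5, nicked: bool = False
) -> Optional[int]:
    """Linking number of a relaxed circular DNA, or None if Lk is undefined.

    Assumptions: B-form DNA with about 10.5 bp per turn; the result is
    rounded to the nearest integer; a break in either strand (nicked=True)
    leaves Lk undefined, which is represented here as None.
    """
    if nicked:
        return None
    return round(length_bp / bp_per_turn)


print(relaxed_linking_number(2100))               # closed circle
print(relaxed_linking_number(2100, nicked=True))  # nicked circle
```

For the 2,100 bp molecule this gives 200 for the closed circle and no value for the nicked form, matching Figure 24–16a and b.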
We can now describe DNA underwinding in terms
of changes in the linking number. The linking number
in relaxed DNA, $Lk_0$, is used as a reference. For the molecule
shown in Figure 24–16a, $Lk_0 = 200$; if two turns
are removed from this molecule, Lk = 198. The change
can be described by the equation
$$\Delta Lk = Lk - Lk_0 = 198 - 200 = -2$$
It is often convenient to express the change in linking
number in terms of a quantity that is independent of the
length of the DNA molecule. This quantity, called the
specific linking difference (σ), or superhelical
density, is a measure of the number of turns removed
relative to the number present in relaxed DNA:
$$\sigma = \frac{\Delta Lk}{Lk_0}$$
FIGURE 24–15 Linking number, Lk. Here, as usual, each blue ribbon
represents one strand of a double-stranded DNA molecule. For the
molecule in (a), Lk = 1. For the molecule in (b), Lk = 6. One of the
strands in (b) is kept untwisted for illustrative purposes, to define
the border of an imaginary surface (shaded blue). The number of
times the twisting strand penetrates this surface provides a rigorous
definition of linking number.
In the example in Figure 24–16c, σ = −0.01, which
means that 1% (2 of 200) of the helical turns present
in the DNA (in its B form) have been removed. The degree
of underwinding in cellular DNAs generally falls in
the range of 5% to 7%; that is, σ = −0.05 to −0.07. The
negative sign indicates that the change in linking num-
ber is due to underwinding of the DNA. The supercoil-
ing induced by underwinding is therefore defined as
negative supercoiling. Conversely, under some condi-
tions DNA can be overwound, resulting in positive su-
percoiling. Note that the twisting path taken by the axis
of the DNA helix when the DNA is underwound (nega-
tive supercoiling) is the mirror image of that taken when
the DNA is overwound (positive supercoiling) (Fig.
24–17). Supercoiling is not a random process; the path
of the supercoiling is largely prescribed by the torsional
strain imparted to the DNA by decreasing or increasing
the linking number relative to B-DNA.
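The relationship between linking number and superhelical density can be sketched in a few lines. The function names are illustrative, and `turns_removed_for` simply inverts the definition, rounding to a whole number of turns (an assumption, since Lk is an integer).

```python
def superhelical_density(lk: int, lk0: int) -> float:
    """Specific linking difference: sigma = (Lk - Lk0) / Lk0."""
    return (lk - lk0) / lk0


def supercoil_sign(lk: int, lk0: int) -> str:
    """Classify the supercoiling implied by Lk relative to relaxed Lk0."""
    if lk < lk0:
        return "negative (underwound)"
    if lk > lk0:
        return "positive (overwound)"
    return "relaxed"


def turns_removed_for(sigma: float, lk0: int) -> int:
    """Delta Lk (negative for underwinding) needed to reach a given sigma."""
    return round(sigma * lk0)


print(superhelical_density(198, 200), supercoil_sign(198, 200))
print(turns_removed_for(-0.06, 200))  # mid-range value for cellular DNA
```

For the 2,100 bp example, Lk = 198 gives σ = −0.01. Under these assumptions, a typical cellular value of σ = −0.06 corresponds to removing 12 of the 200 turns.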
Linking number can be changed by ±1 by breaking
one DNA strand, rotating one of the ends 360° about the
unbroken strand, and rejoining the broken ends. This
change has no effect on the number of base pairs or the
number of atoms in the circular DNA molecule. Two
forms of a circular DNA that differ only in a topological
property such as linking number are referred to as
topoisomers.
Linking number can be broken down into two struc-
tural components called writhe (Wr) and twist (Tw)
(Fig. 24–18). These are more difficult to describe than
linking number, but writhe may be thought of as a meas-
ure of the coiling of the helix axis and twist as deter-
mining the local twisting or spatial relationship of neigh-
boring base pairs. When the linking number changes,
some of the resulting strain is usually compensated for
by writhe (supercoiling) and some by changes in twist,
giving rise to the equation
$$Lk = Tw + Wr$$
Tw and Wr need not be integers. Twist and writhe are
geometric rather than topological properties, because
they may be changed by deformation of a closed-circular
DNA molecule.
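A small sketch can illustrate how a change in linking number is shared between twist and writhe. The split is controlled by a hypothetical parameter, `writhe_fraction`, which is not a quantity defined in the text. The sketch also assumes that the relaxed molecule has Tw = Lk0 and Wr = 0.

```python
def partition_strain(lk: int, lk0: int, writhe_fraction: float) -> tuple[float, float]:
    """Split a linking-number change between twist and writhe.

    Assumptions (illustrative only): the relaxed molecule has Tw = Lk0 and
    Wr = 0, and writhe_fraction of the change Lk - Lk0 appears as writhe,
    with the remainder appearing as a change in twist.
    Returns (Tw, Wr), which always satisfy Lk = Tw + Wr.
    """
    if not 0.0 <= writhe_fraction <= 1.0:
        raise ValueError("writhe_fraction must be between 0 and 1")
    delta_lk = lk - lk0
    wr = delta_lk * writhe_fraction
    tw = lk - wr
    return tw, wr


tw, wr = partition_strain(lk=198, lk0=200, writhe_fraction=0.75)
print(tw, wr, tw + wr)
```

Whatever fraction is chosen, the two components sum to the integer Lk, while Tw and Wr themselves need not be integers.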
FIGURE 24–16 Linking number applied to closed-circular DNA molecules.
A 2,100 bp circular DNA is shown in three forms: (a) relaxed,
Lk = 200 = $Lk_0$; (b) relaxed with a nick (break) in one strand, Lk undefined;
and (c) underwound by two turns, Lk = 198 (ΔLk = −2). The underwound molecule
generally exists as a supercoiled molecule, but underwinding also
facilitates the separation of DNA strands.
FIGURE 24–17 Negative and positive supercoils. For the relaxed DNA
molecule of Figure 24–16a (Lk = 200), underwinding or overwinding by two
helical turns (Lk = 198 or 202; ΔLk = −2 or +2) will produce negative or
positive supercoiling, respectively. Note that the DNA axis twists in opposite
directions in the two cases.
FIGURE 24–18 Ribbon model for illustrating twist and writhe. The
pink ribbon represents the axis of a relaxed DNA molecule. Strain
introduced by twisting the ribbon (underwinding the DNA) can be
manifested as writhe or twist. Changes in linking number are usually
accompanied by changes in both writhe and twist. Panels: straight ribbon
(relaxed DNA); zero writhe, large change in twist; large writhe, small
change in twist.
In addition to causing supercoiling and making
strand separation somewhat easier, the underwinding of
DNA facilitates a number of structural changes in the
molecule. These are of less physiological importance but
help illustrate the effects of underwinding. Recall that
a cruciform (see Fig. 8–21) generally contains a few un-
paired bases; DNA underwinding helps to maintain the
required strand separation (Fig. 24–19). Underwinding
of a right-handed DNA helix also facilitates the forma-
tion of short stretches of left-handed Z-DNA in regions
where the base sequence is consistent with the Z form
(Chapter 8).
Topoisomerases Catalyze Changes in the Linking
Number of DNA
DNA supercoiling is a precisely regulated process that
influences many aspects of DNA metabolism. Every cell
has enzymes with the sole function of underwinding
and/or relaxing DNA. The enzymes that increase or decrease
the extent of DNA underwinding are topoisomerases;
the property of DNA that they change is the
linking number. These enzymes play an especially important
role in processes such as replication and DNA
packaging. There are two classes of topoisomerases.
Type I topoisomerases act by transiently breaking one
of the two DNA strands, passing the unbroken strand
through the break, and rejoining the broken ends; they
change Lk in increments of 1. Type II topoisomerases
break both DNA strands and change Lk in increments
of 2.
FIGURE 24–19 Promotion of cruciform structures by DNA underwinding.
In principle, cruciforms can form at palindromic sequences
(see Fig. 8–21), but they seldom occur in relaxed DNA because the
linear DNA accommodates more paired bases than does the cruciform
structure. Underwinding of the DNA facilitates the partial strand
separation needed to promote cruciform formation at appropriate
sequences.
FIGURE 24–20 Visualization of topoisomers. In this experiment, all
DNA molecules have the same number of base pairs but exhibit some
range in the degree of supercoiling. Because supercoiled DNA molecules
are more compact than relaxed molecules, they migrate more
rapidly during gel electrophoresis. The gels shown here separate topoisomers
(moving from top to bottom, with Lk decreasing from relaxed DNA
to highly supercoiled DNA) over a limited range of superhelical
density. In lane 1, highly supercoiled DNA migrates in a single
band, even though different topoisomers are probably present. Lanes
2 and 3 illustrate the effect of treating the supercoiled DNA with a
type I topoisomerase; the DNA in lane 3 was treated for a longer time
than that in lane 2. As the superhelical density of the DNA is reduced
to the point where it corresponds to the range in which the gel can
resolve individual topoisomers, distinct bands appear. Individual bands
in the region indicated by the bracket next to lane 3 each contain
DNA circles with the same linking number; the linking number
changes by 1 from one band to the next.
The effects of these enzymes can be demonstrated
using agarose gel electrophoresis (Fig. 24–20). A pop-
ulation of identical plasmid DNAs with the same linking
number migrates as a discrete band during electro-
phoresis. Topoisomers with Lk values differing by as
little as 1 can be separated by this method, so changes
in linking number induced by topoisomerases are read-
ily detected.
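The step sizes of the two enzyme classes can be illustrated with a simple sketch that lists the topoisomers visited on the way from one linking number to another. It is a bookkeeping model only: it ignores enzyme directionality, ATP requirements, and the fact that a real sample is a population of molecules. The function names are hypothetical.

```python
def topoisomer_ladder(lk_start: int, lk_target: int, topo_type: int) -> list[int]:
    """Linking numbers visited when Lk changes by 1 (type I) or 2 (type II) per event."""
    step = {1: 1, 2: 2}[topo_type]  # KeyError for any other enzyme type
    difference = lk_target - lk_start
    if difference % step != 0:
        raise ValueError("target Lk is not reachable in steps of this size")
    direction = 1 if difference >= 0 else -1
    return list(range(lk_start, lk_target + direction, direction * step))


def catalytic_events(lk_start: int, lk_target: int, topo_type: int) -> int:
    """Number of catalytic events needed to go from lk_start to lk_target."""
    return len(topoisomer_ladder(lk_start, lk_target, topo_type)) - 1


print(topoisomer_ladder(198, 200, 1))  # each value would be one band on the gel
print(catalytic_events(188, 200, 1), catalytic_events(188, 200, 2))
```

In this model, relaxing a plasmid from Lk = 188 to Lk = 200 takes 12 type I events or 6 type II events, and each intermediate Lk value corresponds to one resolvable band.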
MECHANISM FIGURE 24–21 Bacterial type I topoisomerases alter
linking number. A proposed reaction sequence for the bacterial topoisomerase
I is illustrated. The enzyme has closed and open conformations.
(a) A DNA molecule binds to the closed conformation and one
DNA strand is cleaved. (b) The enzyme changes to its open conformation,
and the other DNA strand moves through the break in the first
strand. (c) In the closed conformation, the DNA strand is religated.
Step annotations: After DNA binds (step 1), an active-site Tyr attacks
a phosphodiester bond on one DNA strand in step 2, cleaving it, creating
a covalent 5′-P–Tyr protein-DNA linkage, and liberating the 3′-hydroxyl
group of the adjacent nucleotide. In step 3 the enzyme switches to its
open conformation, and the unbroken DNA strand passes through the break
in the first strand. With the enzyme in the closed conformation, the
liberated 3′-hydroxyl group attacks the 5′-P–Tyr protein-DNA linkage in
step 4 to religate the cleaved DNA strand. The enzyme then releases the
DNA or begins a new cycle.
E. coli has at least four different individual topo-
isomerases (I through IV). Those of type I (topoiso-
merases I and III) generally relax DNA by removing
negative supercoils (increasing Lk). The way in which
bacterial type I topoisomerases change linking number
is illustrated in Figure 24–21. A bacterial type II enzyme,
called either topoisomerase II or DNA gyrase, can in-
troduce negative supercoils (decrease Lk). It uses the
energy of ATP to accomplish this. To alter DNA linking
number, type II topoisomerases cleave both strands of
a DNA molecule and pass another duplex through the
break. The degree of supercoiling of bacterial DNA is
maintained by regulation of the net activity of topoiso-
merases I and II.
Eukaryotic cells also have type I and type II topo-
isomerases. The type I enzymes are topoisomerases I and
III; the type II enzymes are topoisomerases IIα and IIβ.
The eukaryotic type II topoisomerases cannot under-
wind DNA (introduce negative supercoils), but they can
relax both positive and negative supercoils. We consider
one probable origin of negative supercoils in eukaryotic
cells in our discussion of chromatin in Section 24.3. The
process catalyzed by eukaryotic type II topoisomerases
is illustrated in Figure 24–22.
DNA Compaction Requires a Special Form
of Supercoiling
Supercoiled DNA molecules are uniform in a number of
respects. The supercoils are right-handed in a negatively
supercoiled DNA molecule (Fig. 24–17), and they tend
to be extended and narrow rather than compacted, of-
ten with multiple branches (Fig. 24–23). At the super-
helical densities normally encountered in cells, the
length of the supercoil axis, including branches, is about
40% of the length of the DNA. This type of supercoiling
is referred to as plectonemic (from the Greek plektos,
“twisted,” and nema, “thread”). This term can be ap-
plied to any structure with strands intertwined in some
simple and regular way, and it is a good description of
the general structure of supercoiled DNA in solution.
FIGURE 24–22 Proposed mechanism for the alteration of linking
number by eukaryotic type IIA topoisomerases. (1) The multisubunit
enzyme binds one DNA molecule (blue). Gated cavities above and
below the bound DNA are called the N-gate and the C-gate. (2) A
second segment of the same DNA molecule (red) is bound at the N-gate
and (3) trapped. Both strands of the first DNA are now cleaved
(the chemistry is similar to that in Fig. 24–21), and (4) the second
DNA segment is passed through the break. (5) The broken DNA is religated,
and the second DNA segment is released through the C-gate.
Two ATPs are bound and hydrolyzed during this cycle; it is likely that
one is hydrolyzed in the step leading to the complex in step 4.
Additional details of the ATP hydrolysis component of the reaction
remain to be worked out.
FIGURE 24–23 Plectonemic supercoiling.
(a) Electron micrograph of plectonemically
supercoiled plasmid DNA and (b) an
interpretation of the observed structure.
The purple lines show the axis of the
supercoil; note the branching of the
supercoil (branch points). (c) An idealized
representation of this structure.
Plectonemic supercoiling, the form observed in
isolated DNAs in the laboratory, does not produce sufficient
compaction to package DNA in the cell. A second
form of supercoiling, solenoidal (Fig. 24–24), can
be adopted by an underwound DNA. Instead of the
extended right-handed supercoils characteristic of the
plectonemic form, solenoidal supercoiling involves tight
left-handed turns, similar to the shape taken up by a
garden hose neatly wrapped on a reel. Although their
structures are dramatically different, plectonemic and
solenoidal supercoiling are two forms of negative super-
coiling that can be taken up by the same segment of
underwound DNA. The two forms are readily intercon-
vertible. Although the plectonemic form is more stable
in solution, the solenoidal form can be stabilized by
protein binding and is the form found in chromatin. It
provides a much greater degree of compaction (Fig.
24–24b). Solenoidal supercoiling is the mechanism by
which underwinding contributes to DNA compaction.
SUMMARY 24.2 DNA Supercoiling
■ Most cellular DNAs are supercoiled. Under-
winding decreases the total number of helical
turns in the DNA relative to the relaxed, B form.
To maintain an underwound state, DNA must
be either a closed circle or bound to protein.
Underwinding is quantified by a topological
parameter called linking number, Lk.
■ Underwinding is measured in terms of specific
linking difference, σ (also called superhelical
density), which is $(Lk - Lk_0)/Lk_0$. For cellular
DNAs, σ is typically −0.05 to −0.07, which
means that approximately 5% to 7% of the
helical turns in the DNA have been removed.
DNA underwinding facilitates strand separation
by enzymes of DNA metabolism.
■ DNAs that differ only in linking number are
called topoisomers. Enzymes that underwind
and/or relax DNA, the topoisomerases, catalyze
changes in linking number. The two classes of
topoisomerases, type I and type II, change Lk
in increments of 1 or 2, respectively, per
catalytic event.
24.3 The Structure of Chromosomes
The term “chromosome” is used to refer to a nucleic
acid molecule that is the repository of genetic informa-
tion in a virus, a bacterium, a eukaryotic cell, or an or-
ganelle. It also refers to the densely colored bodies seen
in the nuclei of dye-stained eukaryotic cells, as visual-
ized using a light microscope.
Chromatin Consists of DNA and Proteins
The eukaryotic cell cycle (see Fig. 12–41) produces re-
markable changes in the structure of chromosomes (Fig.
24–25). In nondividing eukaryotic cells (in G0) and
those in interphase (G1, S, and G2), the chromosomal
material, chromatin, is amorphous and appears to be
randomly dispersed in certain parts of the nucleus. In
the S phase of interphase the DNA in this amorphous
state replicates, each chromosome producing two sister
chromosomes (called sister chromatids) that remain as-
sociated with each other after replication is complete.
The chromosomes become much more condensed dur-
ing prophase of mitosis, taking the form of a species-
specific number of well-defined pairs of sister chro-
matids (Fig. 24–5).
Chromatin consists of fibers containing protein and
DNA in approximately equal masses, along with a small
amount of RNA. The DNA in the chromatin is very
tightly associated with proteins called histones, which
package and order the DNA into structural units called
nucleosomes (Fig. 24–26). Also found in chromatin are
many nonhistone proteins, some of which help maintain
chromosome structure, others that regulate the ex-
pression of specific genes (Chapter 28). Beginning with
nucleosomes, eukaryotic chromosomal DNA is packaged
into a succession of higher-order structures that ulti-
mately yield the compact chromosome seen with the
light microscope. We now turn to a description of this
structure in eukaryotes and compare it with the pack-
aging of DNA in bacterial cells.
FIGURE 24–24 Plectonemic and solenoidal supercoiling. (a) Plec-
tonemic supercoiling takes the form of extended right-handed coils.
Solenoidal negative supercoiling takes the form of tight left-handed
turns about an imaginary tubelike structure. The two forms are read-
ily interconverted, although the solenoidal form is generally not ob-
served unless certain proteins are bound to the DNA. (b) Plectonemic
(top) and solenoidal supercoiling of the same DNA molecule, drawn
to scale. Solenoidal supercoiling provides a much greater degree of
compaction.
Histones Are Small, Basic Proteins
Found in the chromatin of all eukaryotic cells, histones
have molecular weights between 11,000 and 21,000 and
are very rich in the basic amino acids arginine and ly-
sine (together these make up about one-fourth of the
amino acid residues). All eukaryotic cells have five ma-
jor classes of histones, differing in molecular weight and
amino acid composition (Table 24–3). The H3 histones
are nearly identical in amino acid sequence in all
eukaryotes, as are the H4 histones, suggesting strict
conservation of their functions. For example, only 2 of
102 amino acid residues differ between the H4 histone
molecules of peas and cows, and only 8 differ between
the H4 histones of humans and yeast. Histones H1, H2A,
and H2B show less sequence similarity among eukary-
otic species.
Each type of histone has variant forms, because cer-
tain amino acid side chains are enzymatically modified
by methylation, ADP-ribosylation, phosphorylation, gly-
cosylation, or acetylation. Such modifications affect the
net electric charge, shape, and other properties of
histones, as well as the structural and functional prop-
erties of the chromatin, and they play a role in the reg-
ulation of transcription (Chapter 28).
FIGURE 24–25 Changes in chromosome structure during
the eukaryotic cell cycle. Cellular DNA is uncondensed
throughout interphase. The interphase period can be
subdivided (see Fig. 12–41) into the G1 (gap) phase; the S
(synthesis) phase, when the DNA is replicated; and the G2
phase, in which the replicated chromosomes cohere to one
another. The DNA undergoes condensation in the prophase
of mitosis. Cohesins (green) and condensins (red) are
proteins involved in cohesion and condensation (discussed
later in the chapter). The architecture of the cohesin-
condensin-DNA complex is not yet established, and the
interactions shown here are figurative, simply suggesting
their role in condensation of the chromosome. During
metaphase, the condensed chromosomes line up along a
plane halfway between the spindle poles. One chromosome
of each pair is linked to each spindle pole via microtubules
that extend between the spindle and the centromere. The
sister chromatids separate at anaphase, each drawn toward
the spindle pole to which it is connected. After cell division
is complete, the chromosomes decondense and the cycle
begins anew.
[Figure 24–26 labels: histone core of nucleosome; linker DNA of nucleosome; scale bar, 50 nm.]
FIGURE 24–26 Nucleosomes. Regularly spaced nucleosomes consist
of histone complexes bound to DNA. (a) Schematic illustration and
(b) electron micrograph.
Nucleosomes Are the Fundamental Organizational
Units of Chromatin
The eukaryotic chromosome depicted in Figure 24–5
represents the compaction of a DNA molecule about
10^5 μm long into a cell nucleus that is typically 5 to
10 μm in diameter. This compaction involves several
levels of highly organized folding. Subjection of chromo-
somes to treatments that partially unfold them reveals
a structure in which the DNA is bound tightly to beads
of protein, often regularly spaced (Fig. 24–26). The
beads in this “beads-on-a-string” arrangement are com-
plexes of histones and DNA. The bead plus the con-
necting DNA that leads to the next bead form the nu-
cleosome, the fundamental unit of organization upon
which the higher-order packing of chromatin is built.
The bead of each nucleosome contains eight histone
molecules: two copies each of H2A, H2B, H3, and H4.
The spacing of the nucleosome beads provides a re-
peating unit typically of about 200 bp, of which 146 bp
are bound tightly around the eight-part histone core and
the remainder serve as linker DNA between nucleosome
beads. Histone H1 binds to the linker DNA. Brief treat-
ment of chromatin with enzymes that digest DNA causes
preferential degradation of the linker DNA, releasing his-
tone particles containing 146 bp of bound DNA that have
been protected from digestion. Researchers have crys-
tallized nucleosome cores obtained in this way, and
x-ray diffraction analysis reveals a particle made up of
the eight histone molecules with the DNA wrapped
around it in the form of a left-handed solenoidal super-
coil (Fig. 24–27).
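The repeat-length figures above lend themselves to a quick calculation. The following is a minimal sketch, assuming the round numbers quoted in the text (a 200 bp repeat, 146 bp on the core, eight core histones per bead); the function names and the 1,000,000 bp example segment are hypothetical and chosen only for illustration.

```python
# Back-of-envelope nucleosome arithmetic using the figures quoted in the text.
REPEAT_BP = 200          # typical nucleosome repeat length
CORE_BP = 146            # DNA bound tightly around the histone core
HISTONES_PER_CORE = 8    # two copies each of H2A, H2B, H3, and H4


def linker_bp(repeat_bp: int = REPEAT_BP, core_bp: int = CORE_BP) -> int:
    """Linker DNA per repeat: the part of the repeat not wrapped on the core."""
    return repeat_bp - core_bp


def nucleosome_count(dna_bp: int, repeat_bp: int = REPEAT_BP) -> int:
    """Approximate number of nucleosomes on a DNA segment of the given length."""
    return dna_bp // repeat_bp


def core_histone_molecules(n_nucleosomes: int) -> int:
    """Core histone molecules needed to package that many nucleosomes."""
    return n_nucleosomes * HISTONES_PER_CORE


# Hypothetical example: a 1,000,000 bp stretch of chromosomal DNA.
segment_bp = 1_000_000
n = nucleosome_count(segment_bp)
print(f"linker per repeat: {linker_bp()} bp")
print(f"nucleosomes on {segment_bp:,} bp: {n:,}")
print(f"core histone molecules: {core_histone_molecules(n):,}")
```

Because the repeat length is only "typically" about 200 bp, the linker length computed here should be read as an average, not a fixed value.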
A close inspection of this structure reveals why eu-
karyotic DNA is underwound even though eukaryotic
cells lack enzymes that underwind DNA. Recall that the
solenoidal wrapping of DNA in nucleosomes is but one
form of supercoiling that can be taken up by under-
wound (negatively supercoiled) DNA. The tight wrap-
ping of DNA around the histone core requires the re-
moval of about one helical turn in the DNA. When the
protein core of a nucleosome binds in vitro to a relaxed,
closed-circular DNA, the binding introduces a negative
supercoil. Because this binding process does not break
the DNA or change the linking number, the formation
of a negative solenoidal supercoil must be accompanied
by a compensatory positive supercoil in the unbound re-
gion of the DNA (Fig. 24–28). As mentioned earlier, eu-
karyotic topoisomerases can relax positive supercoils.
Relaxing the unbound positive supercoil leaves the neg-
ative supercoil fixed (through its binding to the nucle-
osome histone core) and results in an overall decrease
in linking number. Indeed, topoisomerases have proved
necessary for assembling chromatin from purified his-
tones and closed-circular DNA in vitro.
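The linking-number argument in this paragraph can be followed as simple bookkeeping. The sketch below is an illustration only, not a physical model: it assumes each supercoil counts as +1 or −1 toward the change in linking number, and the class and method names (ClosedCircularDNA, bind_histone_core, relax_unbound) are hypothetical labels introduced here.

```python
from dataclasses import dataclass


@dataclass
class ClosedCircularDNA:
    """Supercoil bookkeeping for a closed-circular DNA, relative to the relaxed state."""
    bound_solenoidal: int = 0   # supercoils fixed on histone cores (negative)
    unbound: int = 0            # supercoils in the free, unbound DNA

    @property
    def delta_lk(self) -> int:
        """Net change in linking number relative to relaxed DNA."""
        return self.bound_solenoidal + self.unbound

    def bind_histone_core(self) -> None:
        # No strand is broken, so delta_lk cannot change:
        # the bound negative supercoil is offset by an unbound positive one.
        self.bound_solenoidal -= 1
        self.unbound += 1

    def relax_unbound(self) -> None:
        # A topoisomerase acts only on the accessible, unbound supercoils;
        # the supercoil held on the histone core stays fixed.
        self.unbound = 0


dna = ClosedCircularDNA()
dna.bind_histone_core()
print("after histone binding, delta Lk =", dna.delta_lk)
dna.relax_unbound()
print("after topoisomerase action, delta Lk =", dna.delta_lk)
```

Binding alone leaves the linking number unchanged; only the topoisomerase step, which breaks and reseals the DNA, lowers it by one per nucleosome.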
Another factor that affects the binding of DNA to
histones in nucleosome cores is the sequence of the
FIGURE 24–27 DNA wrapped around a nucleosome core. (a) Space-
filling representation of the nucleosome protein core, with different
colors for the different histones (PDB ID 1AOI). (b) Top and (c) side
views of the crystal structure of a nucleosome with 146 bp of bound
DNA. The protein is depicted as a gray surface contour, with the bound
DNA in blue. The DNA binds in a left-handed solenoidal supercoil
that circumnavigates the histone complex 1.8 times. A schematic draw-
ing is included in (c) for comparison with other figures depicting
nucleosomes.
bound DNA. Histone cores do not bind randomly to
DNA; rather, they tend to position themselves at certain
locations. This positioning is not fully understood but in
some cases appears to depend on a local abundance of
A=T base pairs in the DNA helix where it is in contact
with the histones (Fig. 24–29). The tight wrapping of
the DNA around the nucleosome’s histone core requires
compression of the minor groove of the helix at these
points, and a cluster of two or three A=T base pairs
makes this compression more likely.
Other proteins are required for the positioning of
some nucleosome cores on DNA. In several organisms,
certain proteins bind to a specific DNA sequence and
then facilitate the formation of a nucleosome core
nearby. Precise positioning of nucleosome cores can
play a role in the expression of some eukaryotic genes
(Chapter 28).
TABLE 24–3 Types and Properties of Histones

                                  Number of        Content of basic amino
            Molecular             amino acid       acids (% of total)
Histone     weight                residues         Lys        Arg
H1*         21,130                223              29.5        1.3
H2A*        13,960                129              10.9        9.3
H2B*        13,774                125              16.0        6.4
H3          15,273                135               9.6       13.3
H4          11,236                102              10.8       13.7

*The sizes of these histones vary somewhat from species to species. The numbers given here are for bovine histones.
FIGURE 24–28 Chromatin assembly. (a) Relaxed, closed-circular
DNA. (b) Binding of a histone core to form a nucleosome induces one
negative supercoil. In the absence of any strand breaks, a positive
supercoil must form elsewhere in the DNA (ΔLk = 0). (c) Relaxation
of this positive supercoil by a topoisomerase leaves one net negative
supercoil (ΔLk = −1).
[Figure 24–28 labels: (a) DNA and histone core; (b) bound negative supercoil (solenoidal) and unbound positive supercoil (plectonemic), ΔLk = 0; (c) after topoisomerase action, one (net) negative supercoil, ΔLk = −1. Figure 24–29 labels: DNA; histone core; A=T pairs abundant.]
FIGURE 24–29 Positioning of a nucleosome to make optimal use of
A=T base pairs where the histone core is in contact with the minor
groove of the DNA helix.
Nucleosomes Are Packed into Successively
Higher Order Structures
Wrapping of DNA around a nucleosome core compacts
the DNA length about sevenfold. The overall compaction
in a chromosome, however, is greater than 10,000-fold—
ample evidence for even higher orders of structural or-
ganization. In chromosomes isolated by very gentle
methods, nucleosome cores appear to be organized into
a structure called the 30 nm fiber (Fig. 24–30). This
packing requires one molecule of histone H1 per nucle-
osome core. Organization into 30 nm fibers does not ex-
tend over the entire chromosome but is punctuated by
regions bound by sequence-specific (nonhistone) DNA-
binding proteins. The 30 nm structure also appears to
depend on the transcriptional activity of the particular
region of DNA. Regions in which genes are being tran-
scribed are apparently in a less-ordered state that con-
tains little, if any, histone H1.
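The compaction figures quoted here can be checked with rough arithmetic. This is a sketch under stated assumptions: the rise of 0.34 nm per base pair for B-form DNA and the nucleosome bead diameter of about 10 nm are values assumed for the estimate, not figures given in this section; the 200 bp repeat, the 10^5 μm DNA length, and the 5 to 10 μm nuclear diameter are taken from the text.

```python
RISE_NM_PER_BP = 0.34          # assumed rise per base pair of B-form DNA
NUCLEOSOME_REPEAT_BP = 200     # repeat length quoted in the text
BEAD_DIAMETER_NM = 10.0        # assumed approximate diameter of a nucleosome bead


def contour_length_nm(n_bp: int) -> float:
    """Length of fully extended duplex DNA, in nanometers."""
    return n_bp * RISE_NM_PER_BP


def compaction_ratio(extended: float, packed: float) -> float:
    """How many times shorter the packed form is (same units for both)."""
    return extended / packed


# First level: one 200 bp repeat folded into one bead.
nucleosome_fold = compaction_ratio(
    contour_length_nm(NUCLEOSOME_REPEAT_BP), BEAD_DIAMETER_NM
)

# Whole chromosome: about 1e5 um of DNA in a nucleus 5 to 10 um across.
chromosome_fold_low = compaction_ratio(1e5, 10.0)
chromosome_fold_high = compaction_ratio(1e5, 5.0)

print(f"nucleosome level: about {nucleosome_fold:.1f}-fold")
print(f"whole chromosome: {chromosome_fold_low:,.0f}- to {chromosome_fold_high:,.0f}-fold")
```

The gap between the roughly sevenfold first level and the 10,000-fold or greater overall figure is the quantitative reason further levels of folding must exist.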
The 30 nm fiber, a second level of chromatin or-
ganization, provides an approximately 100-fold com-
paction of the DNA. The higher levels of folding are not
yet understood, but it appears that certain regions of
DNA associate with a nuclear scaffold (Fig. 24–31). The
scaffold-associated regions are separated by loops of
DNA with perhaps 20 to 100 kbp. The DNA in a loop
may contain a set of related genes. For example, in
Drosophila complete sets of histone-coding genes seem
to cluster together in loops that are bounded by scaf-
fold attachment sites (Fig. 24–32). The scaffold itself
appears to contain several proteins, notably large
amounts of histone H1 (located in the interior of the
fiber) and topoisomerase II. The presence of topoiso-
merase II further emphasizes the relationship between
DNA underwinding and chromatin structure. Topoiso-
merase II is so important to the maintenance of chro-
matin structure that inhibitors of this enzyme can kill
FIGURE 24–30 The 30 nm fiber, a higher-order organization of nu-
cleosomes. (a) Schematic illustration of the probable structure of the
fiber, showing nucleosome packing. (b) Electron micrograph.
FIGURE 24–31 A partially unraveled human chromosome, revealing
numerous loops of DNA attached to a scaffoldlike structure.
[Figure 24–30 label: 30 nm. Figure 24–32 labels: 30 nm fiber; nuclear scaffold; histone genes H1, H3, H4, H2B, H2A.]
FIGURE 24–32 Loops of chromosomal DNA attached to a nuclear
scaffold. The DNA in the loops is packaged as 30 nm fibers, so the
loops are the next level of organization. Loops often contain groups
of genes with related functions. Complete sets of histone-coding genes,
as shown in this schematic illustration, appear to be clustered in loops
of this kind. Unlike most genes, histone genes occur in multiple copies
in many eukaryotic genomes.
rapidly dividing cells. Several drugs used in cancer
chemotherapy are topoisomerase II inhibitors that allow
the enzyme to promote strand breakage but not the re-
sealing of the breaks.
Evidence exists for additional layers of organization
in eukaryotic chromosomes, each dramatically enhanc-
ing the degree of compaction. One model for achieving
this compaction is illustrated in Figure 24–33. Higher-
order chromatin structure probably varies from chro-
mosome to chromosome, from one region to the next in
a single chromosome, and from moment to moment in
the life of a cell. No single model can adequately de-
scribe these structures. Nevertheless, the principle is
clear: DNA compaction in eukaryotic chromosomes is
likely to involve coils upon coils upon coils . . .
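The model in Figure 24–33 assigns round numbers to each level: about 75,000 bp per loop, 6 loops per rosette, 30 rosettes per coil, and 10 coils per chromatid. Multiplying up the hierarchy shows how much DNA each structure would hold. This is only a sketch of that one model; the function and variable names are hypothetical, and real higher-order structures are unlikely to be this uniform.

```python
# Levels of the coils-upon-coils model, smallest unit first.
# Each entry: (structure, how many of the previous unit it contains).
LEVELS = [
    ("loop", 75_000),     # base pairs per loop (approximate)
    ("rosette", 6),       # loops per rosette
    ("coil", 30),         # rosettes per coil
    ("chromatid", 10),    # coils per chromatid
]


def bp_per_structure(levels):
    """Return {structure: base pairs contained}, multiplying up the hierarchy."""
    totals = {}
    running = 1
    for name, count in levels:
        running *= count
        totals[name] = running
    return totals


totals = bp_per_structure(LEVELS)
for name, bp in totals.items():
    print(f"{name:>9}: {bp:,} bp")
```

With these numbers a single chromatid in the model holds about 1.35 × 10^8 bp.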
Condensed Chromosome Structures Are Maintained
by SMC Proteins
A third major class of chromatin proteins, in addition to
the histones and topoisomerases, is the SMC proteins
(structural maintenance of chromosomes). The primary
structure of SMC proteins consists of five distinct do-
mains (Fig. 24–34a). The amino- and carboxyl-terminal
globular domains, N and C, each of which has part of
an ATP hydrolytic site, are connected by two regions of
α-helical coiled-coil motifs (see Fig. 4–11) that are joined
by a hinge domain. The proteins are generally dimeric,
forming a V-shaped complex that is thought to be tied
together through their hinge domains (Fig. 24–34b). One
N and one C domain come together to form a complete
ATP hydrolytic site at each end of the V.
Proteins in the SMC family are found in all types of
organisms, from bacteria to humans. Eukaryotes have
two major types, cohesins and condensins (Fig. 24–25).
The cohesins play a substantial role in linking together
sister chromatids immediately after replication and
keeping them together as the chromosomes condense
to metaphase. This linkage is essential if chromosomes
are to segregate properly at cell division. The detailed
mechanism by which cohesins link sister chromatids,
and the role of ATP hydrolysis, are not yet understood.
The condensins are essential to the condensation of
chromosomes as cells enter mitosis. In the laboratory,
condensins bind to DNA in a manner that creates pos-
itive supercoils; that is, condensin binding causes the
DNA to become overwound, in contrast to the under-
winding induced by the binding of nucleosomes. It is not
yet clear how this helps to compact the chromatin, al-
though one possibility is presented in Figure 24–35.
Bacterial DNA Is Also Highly Organized
We now turn briefly to the structure of bacterial chro-
mosomes. Bacterial DNA is compacted in a structure
called the nucleoid, which can occupy a significant
[Figure 24–33 labels, from largest to smallest: two chromatids (10 coils each); one coil (30 rosettes); one rosette (6 loops); one loop (~75,000 bp); 30 nm fiber; “beads-on-a-string” form of chromatin; DNA. The nuclear scaffold is also labeled.]
FIGURE 24–33 Compaction of DNA in a eukaryotic chromosome.
Model for levels of organization that could provide DNA compaction
in the chromosomes of eukaryotes. The levels take the form of coils
upon coils. In cells, the higher-order structures (above the 30 nm fibers)
are unlikely to be as uniform as depicted here.
fraction of the cell volume (Fig. 24–36). The DNA ap-
pears to be attached at one or more points to the
inner surface of the plasma membrane. Much less is
known about the structure of the nucleoid than of eu-
karyotic chromatin. In E. coli, a scaffoldlike structure
appears to organize the circular chromosome into a
series of looped domains, as described above for chro-
matin. Bacterial DNA does not seem to have any struc-
ture comparable to the local organization provided by
nucleosomes in eukaryotes. Histonelike proteins are
abundant in E. coli (the best-characterized example
is a two-subunit protein called HU, M_r 19,000), but
these proteins bind and dissociate within minutes, and
no regular, stable DNA-histone structure has been
found. The bacterial chromosome is a relatively dy-
namic molecule, possibly reflecting a requirement for
more ready access to its genetic information. The bac-
terial cell division cycle can be as short as 15 min,
whereas a typical eukaryotic cell may not divide for
hours or even months. In addition, a much greater
fraction of prokaryotic DNA is used to encode RNA
and/or protein products. Higher rates of cellular me-
tabolism in bacteria mean that a much higher propor-
tion of the DNA is being transcribed or replicated at
a given time than in most eukaryotic cells.
FIGURE 24–35 Model for the effect of condensins on DNA super-
coiling. Binding of condensins to a closed-circular DNA in the pres-
ence of topoisomerase I leads to the production of positive supercoils
(+). Wrapping of the DNA about the condensin introduces positive
supercoils because it wraps in the opposite sense to a solenoidal
supercoil (see Fig. 24–24). The compensating negative supercoils (−) that
appear elsewhere in the DNA are then relaxed by topoisomerase I. In
the chromosome, it is the wrapping of the DNA about condensin that
may contribute to DNA condensation.
[Figure 24–36 scale bar: 2 μm.]
FIGURE 24–36 E. coli cells showing nucleoids. The DNA is stained
with a dye that fluoresces when exposed to UV light. The light area
defines the nucleoid. Note that some cells have replicated their DNA
but have not yet undergone cell division and hence have multiple
nucleoids.
[Figure 24–34 labels: N, coiled coil, hinge, coiled coil, C; ATP (two sites); scale bar, 50 nm.]
FIGURE 24–34 Structure of SMC proteins. (a) The five domains of
the SMC primary structure. N and C denote the amino-terminal and
carboxyl-terminal domains, respectively. (b) Each polypeptide is
folded so that the two coiled-coil domains wrap around each other
and the N and C domains come together to form a complete ATP-
binding site. Two of these domains are linked at the hinge region to
form the dimeric V-shaped molecule. (c) Electron micrograph of SMC
proteins from Bacillus subtilis.
With this overview of the complexity of DNA struc-
ture, we are now ready to turn, in the next chapter, to
a discussion of DNA metabolism.
SUMMARY 24.3 The Structure of Chromosomes
■ The fundamental unit of organization in the
chromatin of eukaryotic cells is the
nucleosome, which consists of histones and a
200 bp segment of DNA. A core protein
particle containing eight histones (two copies
each of histones H2A, H2B, H3, and H4) is
encircled by a segment of DNA (about 146 bp)
in the form of a left-handed solenoidal
supercoil.
■ Nucleosomes are organized into 30 nm fibers,
and the fibers are extensively folded to provide
the 10,000-fold compaction required to fit a
typical eukaryotic chromosome into a cell
nucleus. The higher-order folding involves
attachment to a nuclear scaffold that contains
histone H1, topoisomerase II, and SMC
proteins.
■ Bacterial chromosomes are also extensively
compacted into the nucleoid, but the
chromosome appears to be much more
dynamic and irregular in structure than
eukaryotic chromatin, reflecting the shorter cell
cycle and very active metabolism of a bacterial
cell.
Key Terms
gene
genome
chromosome
phenotype
mutation
regulatory sequence
plasmid
intron
exon
simple-sequence DNA
satellite DNA
centromere
telomere
supercoil
relaxed DNA
topology
underwinding
linking number
specific linking difference (σ)
superhelical density
topoisomers
topoisomerases
plectonemic
solenoidal
chromatin
histones
nucleosome
30 nm fiber
SMC proteins
cohesins
condensins
nucleoid
Terms in bold are defined in the glossary.
Problems
1. Packaging of DNA in a Virus Bacteriophage T2 has
a DNA of molecular weight 120 × 10^6 contained in a head
about 210 nm long. Calculate the length of the DNA (assume
the molecular weight of a nucleotide pair is 650) and compare
it with the length of the T2 head.
2. The DNA of Phage M13 The base composition of
phage M13 DNA is A, 23%; T, 36%; G, 21%; C, 20%. What
does this tell you about the DNA of phage M13?
3. The Mycoplasma Genome The complete genome of
the simplest bacterium known, Mycoplasma genitalium, is
a circular DNA molecule with 580,070 bp. Calculate the
molecular weight and contour length (when relaxed) of this
molecule. What is Lk_0 for the Mycoplasma chromosome? If
σ = −0.06, what is Lk?
4. Size of Eukaryotic Genes An enzyme isolated from
rat liver has 192 amino acid residues and is coded for by a
gene with 1,440 bp. Explain the relationship between the
number of amino acid residues in the enzyme and the number
of nucleotide pairs in its gene.
5. Linking Number A closed-circular DNA molecule in
its relaxed form has an Lk of 500. Approximately how many
base pairs are in this DNA? How is the linking number altered
(increases, decreases, doesn’t change, becomes undefined)
when (a) a protein complex is bound to form a nucleosome,
(b) one DNA strand is broken, (c) DNA gyrase and ATP are
added to the DNA solution, or (d) the double helix is denatured
by heat?
6. Superhelical Density Bacteriophage λ infects E. coli
by integrating its DNA into the bacterial chromosome. The
success of this recombination depends on the topology of the
E. coli DNA. When the superhelical density (σ) of the E. coli
DNA is greater than −0.045, the probability of integration is
<20%; when σ is less than −0.06, the probability is >70%.
Plasmid DNA isolated from an E. coli culture is found to have
a length of 13,800 bp and an Lk of 1,222. Calculate σ for this
DNA and predict the likelihood that bacteriophage λ will be
able to infect this culture.
7. Altering Linking Number (a) What is the Lk of a
5,000 bp circular duplex DNA molecule with a nick in one
strand? (b) What is the Lk of the molecule in (a) when the
nick is sealed (relaxed)? (c) How would the Lk of the molecule
in (b) be affected by the action of a single molecule of
E. coli topoisomerase I? (d) What is the Lk of the molecule
in (b) after eight enzymatic turnovers by a single molecule of
DNA gyrase in the presence of ATP? (e) What is the Lk of the
molecule in (d) after four enzymatic turnovers by a single
molecule of bacterial type I topoisomerase? (f) What is the Lk of
the molecule in (d) after binding of one nucleosome?
8. Chromatin Early evidence that helped researchers
define nucleosome structure is illustrated by the agarose gel
below, in which the thick bands represent DNA. It was gen-
erated by briefly treating chromatin with an enzyme that
degrades DNA, then removing all protein and subjecting the
purified DNA to electrophoresis. Numbers at the side of the
gel denote the position to which a linear DNA of the indicated
size would migrate. What does this gel tell you about chro-
matin structure? Why are the DNA bands thick and spread
out rather than sharply defined?
9. DNA Structure Explain how the underwinding of a B-
DNA helix might facilitate or stabilize the formation of Z-DNA.
10. Maintaining DNA Structure (a) Describe two struc-
tural features required for a DNA molecule to maintain a neg-
atively supercoiled state. (b) List three structural changes
that become more favorable when a DNA molecule is nega-
tively supercoiled. (c) What enzyme, with the aid of ATP, can
generate negative superhelicity in DNA? (d) Describe the
physical mechanism by which this enzyme acts.
11. Yeast Artificial Chromosomes (YACs) YACs are
used to clone large pieces of DNA in yeast cells. What three
types of DNA sequences are required to ensure proper repli-
cation and propagation of a YAC in a yeast cell?
[Figure for Problem 8: agarose gel with size markers at 200, 400, 600, 800, and 1,000 bp.]
another set of subunits, a clamp-loading complex, or γ
complex, consisting of five subunits of four different
types, τ₂γδδ′. The core polymerases are linked through
the τ (tau) subunits. Two additional subunits, χ (chi) and
ψ (psi), are bound to the clamp-loading complex. The
entire assembly of 13 protein subunits (nine different
types) is called DNA polymerase III* (Fig. 25–10a).
DNA polymerase III* can polymerize DNA, but with
a much lower processivity than one would expect for
the organized replication of an entire chromosome. The
necessary increase in processivity is provided by the
addition of the β subunits, four of which complete the DNA
polymerase III holoenzyme. The β subunits associate in
pairs to form donut-shaped structures that encircle the
DNA and act like clamps (Fig. 25–10b). Each dimer
associates with a core subassembly of polymerase III* (one
dimeric clamp per core subassembly) and slides along
the DNA as replication proceeds. The β sliding clamp
prevents the dissociation of DNA polymerase III from
DNA, dramatically increasing processivity, to greater
than 500,000 (Table 25–1).
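The subunit counts quoted above (13 subunits of nine types in DNA polymerase III*, plus four β subunits in the holoenzyme) can be verified with a short tally. The sketch below assumes that each of the two core polymerases consists of one α, one ε, and one θ subunit; that composition is not stated in this passage and is supplied here as an assumption. The names CORE, CLAMP_LOADER, ACCESSORY, pol3_star, and pol3_holoenzyme are hypothetical.

```python
from collections import Counter

# Assumed composition of one core polymerase (not stated in this passage).
CORE = Counter({"alpha": 1, "epsilon": 1, "theta": 1})
# Clamp-loading (gamma) complex: tau2-gamma-delta-delta', five subunits of four types.
CLAMP_LOADER = Counter({"tau": 2, "gamma": 1, "delta": 1, "delta_prime": 1})
# Two additional subunits bound to the clamp-loading complex.
ACCESSORY = Counter({"chi": 1, "psi": 1})


def pol3_star() -> Counter:
    """Subunit tally for DNA polymerase III*: two cores, clamp loader, chi and psi."""
    total = Counter()
    for _ in range(2):          # two core polymerases, linked through tau
        total.update(CORE)
    total.update(CLAMP_LOADER)
    total.update(ACCESSORY)
    return total


def pol3_holoenzyme() -> Counter:
    """Holoenzyme: polymerase III* plus four beta subunits (two dimeric clamps)."""
    total = pol3_star()
    total.update({"beta": 4})
    return total


star = pol3_star()
holo = pol3_holoenzyme()
print(f"Pol III*: {sum(star.values())} subunits, {len(star)} types")
print(f"Holoenzyme: {sum(holo.values())} subunits, {len(holo)} types")
```

Under this assumption the tally reproduces the 13 subunits and nine types given in the text, and the holoenzyme comes to 17 subunits of ten types.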
DNA Replication Requires Many Enzymes
and Protein Factors
Replication in E. coli requires not just a single DNA
polymerase but 20 or more different enzymes and pro-
teins, each performing a specific task. The entire com-
plex has been termed the DNA replicase system or
replisome. The enzymatic complexity of replication re-
flects the constraints imposed by the structure of DNA
and by the requirements for accuracy. The main classes
of replication enzymes are considered here in terms of
the problems they overcome.
Access to the DNA strands that are to act as tem-
plates requires separation of the two parent strands.
This is generally accomplished by helicases, enzymes
that move along the DNA and separate the strands, us-
ing chemical energy from ATP. Strand separation cre-
ates topological stress in the helical DNA structure (see
Fig. 24–12), which is relieved by the action of topo-
isomerases. The separated strands are stabilized by
DNA-binding proteins. As noted earlier, before DNA
polymerases can begin synthesizing DNA, primers must
be present on the template—generally short segments
[Figure 25–9 labels: RNA or DNA; template DNA strand; nick; DNA polymerase I; dNTPs; dNMPs or rNMPs; (PP_i)_n; strand ends marked 5′ and 3′, with OH and P groups at the nick.]
FIGURE 25–8 Large (Klenow) fragment of DNA polymerase I. This
polymerase is widely distributed in bacteria. The Klenow fragment,
produced by proteolytic treatment of the polymerase, retains the poly-
merization and proofreading activities of the enzyme. The Klenow
fragment shown here is from the thermophilic bacterium Bacillus
stearothermophilus (PDB ID 3BDP). The active site for addition of nu-
cleotides is deep in the crevice at the far end of the bound DNA. The
dark blue strand is the template.
FIGURE 25–9 Nick translation. In this process, an RNA or DNA strand
paired to a DNA template is simultaneously degraded by the 5′→3′
exonuclease activity of DNA polymerase I and replaced by the polymerase
activity of the same enzyme. These activities have a role in
both DNA repair and the removal of RNA primers during replication
(both described later). The strand of nucleic acid to be removed (either
DNA or RNA) is shown in green, the replacement strand in red.
DNA synthesis begins at a nick (a broken phosphodiester bond, leaving
a free 3′ hydroxyl and a free 5′ phosphate). Polymerase I extends
the nontemplate DNA strand and moves the nick along the DNA, a
process called nick translation. A nick remains where DNA polymerase
I dissociates, and is later sealed by another enzyme.
A Word about Terminology Before beginning to look
closely at replication, we must make a short digression
into the use of abbreviations in naming genes and pro-
teins. By convention, bacterial genes generally are
named using three lowercase, italicized letters that of-
ten reflect their apparent function. For example, the
dna, uvr, and rec genes affect DNA replication, resist-
ance to the damaging effects of UV radiation, and re-
combination, respectively. Where several genes affect
the same process, the letters A, B, C, and so forth, are
added—as in dnaA, dnaB, dnaQ, for example—usually
reflecting their order of discovery rather than their or-
der in a reaction sequence.
During genetic investigations, the protein product
of each gene is usually isolated and characterized. Many
bacterial genes have been identified and named before
the roles of their protein products are understood in
detail. Sometimes the gene product is found to be a pre-
viously isolated protein, and some renaming occurs.
Often the product turns out to be an as yet unknown
protein, with an activity not easily described by a sim-
ple enzyme name. In a practice that can be confusing,
[Figure 25–1 map labels. Positions are marked in minutes (0, 25, 50, 75, 100/0). Labeled genes and elements include: oriC (replication origin); Ter (replication termination); dnaB (helicase); dnaG (primase); dnaC (primosome component); ssb (single-stranded DNA–binding protein); polB (DNA polymerase II); dinB (DNA polymerase IV); dnaQ, polC (dnaE), holA, holB, holC, holD, and holE (DNA polymerase III subunits); lig (DNA ligase); gyrA and gyrB (DNA gyrase subunits); dam (methylation); ung (uracil glycosylase); nfo and xthA (AP endonucleases); phr (DNA photolyase); sbcB (exonuclease I); mutL (mismatch repair protein); uvrA, uvrB, and uvrC (DNA repair); uvrD (DNA helicase/mismatch repair); recF and recR (recombinational repair).]
FIGURE 25–1 Map of the E. coli chromosome. The map shows the
relative positions of genes encoding many of the proteins important
in DNA metabolism. The number of genes known to be involved pro-
vides a hint of the complexity of these processes. The numbers 0 to
100 inside the circular chromosome denote a genetic measurement
called minutes. Each minute corresponds to ~40,000 bp along the
DNA molecule of E. coli. The three-letter names of genes and other
elements generally reflect some aspect of their function. These include
mut, mutagenesis; dna, DNA replication; pol, DNA polymerase; rpo,
RNA polymerase; uvr, UV resistance; rec, recombination; dam, DNA
adenine methylation; lig, DNA ligase; Ter, termination of replication;
and ori, origin of replication.
8885d_c25_948-994 2/11/04 1:57 PM Page 949 mac76 mac76:385_reb:
these bacterial proteins often retain the name of their
genes. When referring to the protein, roman type is used
and the first letter is capitalized: for example, the dnaA
and recA gene products are called the DnaA and RecA
proteins, respectively. You will encounter many such ex-
amples in this chapter.
Similar conventions exist for the naming of eukary-
otic genes, although the exact form of the abbreviations
may vary with the species and no single convention ap-
plies to all eukaryotic systems.
25.1 DNA Replication
Long before the structure of DNA was known, scientists
wondered at the ability of organisms to create faithful
copies of themselves and, later, at the ability of cells to
produce many identical copies of large and complex
macromolecules. Speculation about these problems cen-
tered around the concept of a template, a structure
that would allow molecules to be lined up in a specific
order and joined, to create a macromolecule with a
unique sequence and function. The 1940s brought the
revelation that DNA was the genetic molecule, but not
until James Watson and Francis Crick deduced its struc-
ture did the way in which DNA could act as a template
for the replication and transmission of genetic informa-
tion become clear: one strand is the complement of
the other. The strict base-pairing rules mean that each
strand provides the template for a sister strand with a
predictable and complementary sequence (see Figs
8–16, 8–17).
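The template idea can be made concrete with a short sketch. The Python below is an illustration only; the function name complementary_strand and the example sequence are assumptions made for this sketch. It applies the base-pairing rules (A with T, G with C) and reverses the result, because the two strands of a duplex run antiparallel, so that both input and output are written 5′→3′.

```python
# Base-pairing rules for DNA.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the strand complementary to `template`, written 5'->3'.

    `template` is given 5'->3'. Because the strands are antiparallel,
    the template is walked from its 3' end while bases are paired.
    """
    template = template.upper()
    unknown = set(template) - set(PAIRING)
    if unknown:
        raise ValueError(f"not a DNA base: {sorted(unknown)}")
    return "".join(PAIRING[base] for base in reversed(template))


if __name__ == "__main__":
    parent = "ATGGCA"  # hypothetical example sequence
    daughter = complementary_strand(parent)
    print(daughter)  # TGCCAT
    # Copying the copy regenerates the original sequence.
    print(complementary_strand(daughter) == parent)  # True
```

Because the complement of a complement is the original sequence, either strand carries enough information to rebuild the other.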
The fundamental properties of the DNA replication
process and the mechanisms used by the enzymes that
catalyze it have proved to be essentially identical in all
species. This mechanistic unity is a major theme as we
proceed from general properties of the replication
process, to E. coli replication enzymes, and, finally, to
replication in eukaryotes.
DNA Replication Follows a Set of Fundamental Rules
Early research on bacterial DNA replication and its en-
zymes helped to establish several basic properties that
have proven applicable to DNA synthesis in every
organism.
DNA Replication Is Semiconservative Each DNA strand
serves as a template for the synthesis of a new strand,
producing two new DNA molecules, each with one new
strand and one old strand. This is semiconservative
replication.
Watson and Crick proposed the hypothesis of semi-
conservative replication soon after publication of their
1953 paper on the structure of DNA, and the hypothe-
sis was proved by ingeniously designed experiments car-
ried out by Matthew Meselson and Franklin Stahl in
1957. Meselson and Stahl grew E. coli cells for many
generations in a medium in which the sole nitrogen
source (NH₄Cl) contained ¹⁵N, the “heavy” isotope of
nitrogen, instead of the normal, more abundant “light”
isotope, ¹⁴N. The DNA isolated from these cells had a
density about 1% greater than that of normal [¹⁴N]DNA
(Fig. 25–2a). Although this is only a small difference, a
mixture of heavy [¹⁵N]DNA and light [¹⁴N]DNA can be
separated by centrifugation to equilibrium in a cesium
chloride density gradient.
The E. coli cells grown in the ¹⁵N medium were
transferred to a fresh medium containing only the ¹⁴N
isotope, where they were allowed to grow until the cell
population had just doubled. The DNA isolated from
these first-generation cells formed a single band in the
CsCl gradient at a position indicating that the
double-helical DNA molecules of the daughter cells were
hybrids containing one new ¹⁴N strand and one parent ¹⁵N
strand (Fig. 25–2b).
FIGURE 25–2 The Meselson-Stahl experiment. (a) Cells were grown
for many generations in a medium containing only heavy nitrogen,
¹⁵N, so that all the nitrogen in their DNA was ¹⁵N, as shown by a
single band (blue) when centrifuged in a CsCl density gradient. (b) Once
the cells had been transferred to a medium containing only light
nitrogen, ¹⁴N, cellular DNA isolated after one generation equilibrated
at a higher position in the density gradient (purple band).
(c) Continuation of replication for a second generation yielded two
hybrid DNAs and two light DNAs (red), confirming semiconservative
replication.
This result argued against conservative replication,
an alternative hypothesis in which one progeny DNA
molecule would consist of two newly synthesized DNA
strands and the other would contain the two parent
strands; this would not yield hybrid DNA molecules in
the Meselson-Stahl experiment. The semiconservative
replication hypothesis was further supported in the next
step of the experiment (Fig. 25–2c). Cells were again
allowed to double in number in the ¹⁴N medium. The
isolated DNA product of this second cycle of replication
exhibited two bands in the density gradient, one with a
density equal to that of light DNA and the other with
the density of the hybrid DNA observed after the first
cell doubling.
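The band pattern predicted by semiconservative replication can be reproduced with a small bookkeeping sketch. The Python below is illustrative only: the strand labels and the function names replicate_semiconservatively and band_counts are assumptions made for this sketch. It starts from one fully heavy duplex and tracks which strands end up together after each doubling in light medium.

```python
from collections import Counter


def replicate_semiconservatively(molecules):
    """Carry out one round of replication in 14N medium.

    Each molecule is a pair of strand labels ("15N" or "14N"). Every
    parent strand ends up in a different daughter molecule, paired
    with a newly synthesized light (14N) strand.
    """
    daughters = []
    for strand_a, strand_b in molecules:
        daughters.append((strand_a, "14N"))
        daughters.append((strand_b, "14N"))
    return daughters


def band_counts(molecules):
    """Count molecules by density class: heavy, hybrid, or light."""
    names = {2: "heavy", 1: "hybrid", 0: "light"}
    return Counter(names[molecule.count("15N")] for molecule in molecules)


if __name__ == "__main__":
    population = [("15N", "15N")]  # one fully heavy parent duplex
    for generation in range(3):
        print(generation, dict(band_counts(population)))
        population = replicate_semiconservatively(population)
```

Under these assumptions the first generation is entirely hybrid, and the second contains equal numbers of hybrid and light molecules, matching the two bands described above. A conservative scheme would instead keep a heavy band and never produce hybrids.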
Replication Begins at an Origin and Usually Proceeds Bidirec-
tionally Following the confirmation of a semiconserva-
tive mechanism of replication, a host of questions arose.
Are the parent DNA strands completely unwound be-
fore each is replicated? Does replication begin at ran-
dom places or at a unique point? After initiation at any
point in the DNA, does replication proceed in one di-
rection or both?
An early indication that replication is a highly co-
ordinated process in which the parent strands are si-
multaneously unwound and replicated was provided by
John Cairns, using autoradiography. He made E. coli
DNA radioactive by growing cells in a medium containing
thymidine labeled with tritium (³H). When the DNA
was carefully isolated, spread, and overlaid with a pho-
tographic emulsion for several weeks, the radioactive
thymidine residues generated “tracks” of silver grains in
the emulsion, producing an image of the DNA molecule.
These tracks revealed that the intact chromosome of
E. coli is a single huge circle, 1.7 mm long. Radioactive
DNA isolated from cells during replication showed an
extra loop (Fig. 25–3a). Cairns concluded that the loop
resulted from the formation of two radioactive daugh-
ter strands, each complementary to a parent strand. One
or both ends of the loop are dynamic points, termed
replication forks, where parent DNA is being un-
wound and the separated strands quickly replicated.
Cairns’s results demonstrated that both DNA strands are
replicated simultaneously, and a variation on his exper-
iment (Fig. 25–3b) indicated that replication of bacter-
ial chromosomes is bidirectional: both ends of the loop
have active replication forks.
FIGURE 25–3 Visualization of bidirectional DNA replication.
Replication of a circular chromosome produces a structure resembling the
Greek letter theta (θ). (a) Labeling with tritium (³H) shows that both
strands are replicated at the same time (new strands shown in red).
The electron micrographs illustrate the replication of a circular E. coli
plasmid as visualized by autoradiography. (b) Addition of ³H for a
short period just before the reaction is stopped allows a distinction to
be made between unidirectional and bidirectional replication, by
determining whether label (red) is found at one or both replication forks
in autoradiograms. This technique has revealed bidirectional
replication in E. coli, Bacillus subtilis, and other bacteria.
The determination of whether the replication loops
originate at a unique point in the DNA required
landmarks along the DNA molecule. These were provided
by a technique called denaturation mapping, developed
by Ross Inman and colleagues. Using the 48,502
bp chromosome of bacteriophage λ, Inman showed that
DNA could be selectively denatured at sequences
unusually rich in A=T base pairs, generating a
reproducible pattern of single-strand bubbles (see Fig. 8–31).
Isolated DNA containing replication loops can be
partially denatured in the same way. This allows the
position and progress of the replication forks to be
measured and mapped, using the denatured regions as points
of reference. The technique revealed that in this system
the replication loops always initiate at a unique point,
which was termed an origin. It also confirmed the
earlier observation that replication is usually bidirectional.
For circular DNA molecules, the two replication forks
meet at a point on the side of the circle opposite to the
origin. Specific origins of replication have since been
identified and characterized in bacteria and lower
eukaryotes.
DNA Synthesis Proceeds in a 5′→3′ Direction and Is
Semidiscontinuous A new strand of DNA is always synthesized
in the 5′→3′ direction, with the free 3′ OH as the point
at which the DNA is elongated (the 5′ and 3′ ends of a
DNA strand are defined in Fig. 8–7). Because the two
DNA strands are antiparallel, the strand serving as the
template is read from its 3′ end toward its 5′ end.
If synthesis always proceeds in the 5′→3′ direction,
how can both strands be synthesized simultaneously? If
both strands were synthesized continuously while the
replication fork moved, one strand would have to
undergo 3′→5′ synthesis. This problem was resolved by
Reiji Okazaki and colleagues in the 1960s. Okazaki found
that one of the new DNA strands is synthesized in short
pieces, now called Okazaki fragments. This work
ultimately led to the conclusion that one strand is
synthesized continuously and the other discontinuously
(Fig. 25–4). The continuous strand, or leading strand,
is the one in which 5′→3′ synthesis proceeds in the
same direction as replication fork movement. The
discontinuous strand, or lagging strand, is the one in
which 5′→3′ synthesis proceeds in the direction
opposite to the direction of fork movement. Okazaki
fragments range in length from a few hundred to a few
thousand nucleotides, depending on the cell type. As we shall
see later, leading and lagging strand syntheses are
tightly coordinated.
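A rough count shows how much discontinuous synthesis this implies. The Python below is a back-of-the-envelope sketch, not a measurement. It assumes an E. coli chromosome of about 4.6 × 10⁶ bp, two forks that each copy half of the circle, and bacterial Okazaki fragments of 1,000 to 2,000 nucleotides (the range given in Fig. 25–4). The function name okazaki_fragment_count is hypothetical.

```python
import math

GENOME_BP = 4_600_000  # approximate E. coli chromosome size (assumed)
FORKS = 2              # bidirectional replication from a single origin


def okazaki_fragment_count(fragment_length_nt: int) -> int:
    """Estimate the fragments needed to copy the chromosome once.

    Each fork copies half of the circular chromosome, and at each fork
    one of the two new strands (the lagging strand) is made in pieces.
    """
    lagging_nt_per_fork = GENOME_BP // FORKS
    per_fork = math.ceil(lagging_nt_per_fork / fragment_length_nt)
    return per_fork * FORKS


if __name__ == "__main__":
    for length in (1_000, 2_000):
        print(length, okazaki_fragment_count(length))
```

Under these assumptions a single round of replication requires a few thousand fragments, each of which must be started and later joined to its neighbor.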
DNA Is Degraded by Nucleases
To explain the enzymology of DNA replication, we first
introduce the enzymes that degrade DNA rather than
synthesize it. These enzymes are known as nucleases,
or DNases if they are specific for DNA rather than RNA.
Every cell contains several different nucleases,
belonging to two broad classes: exonucleases and
endonucleases. Exonucleases degrade nucleic acids from one
end of the molecule. Many operate in only the 5′→3′ or
the 3′→5′ direction, removing nucleotides only from the
5′ or the 3′ end, respectively, of one strand of a
double-stranded nucleic acid or of a single-stranded DNA.
Endonucleases can begin to degrade at specific internal
sites in a nucleic acid strand or molecule, reducing it to
smaller and smaller fragments. A few exonucleases and
endonucleases degrade only single-stranded DNA.
There are a few important classes of endonucleases that
cleave only at specific nucleotide sequences (such as
the restriction endonucleases that are so important in
biotechnology; see Chapter 9, Fig. 9–3). You will en-
counter many types of nucleases in this and subsequent
chapters.
DNA Is Synthesized by DNA Polymerases
The search for an enzyme that could synthesize DNA began
in 1955. Work by Arthur Kornberg and colleagues led to the
purification and characterization of DNA polymerase from
E. coli cells, a single-polypeptide enzyme now called DNA
polymerase I (Mᵣ 103,000; encoded by the polA gene). Much
later, investigators found that E. coli contains at least
four other distinct DNA polymerases, described below.
FIGURE 25–4 Defining DNA strands at the replication fork. A new
DNA strand (red) is always synthesized in the 5′→3′ direction. The
template is read in the opposite direction, 3′→5′. The leading strand
is continuously synthesized in the direction taken by the replication
fork. The other strand, the lagging strand, is synthesized
discontinuously in short pieces (Okazaki fragments) in a direction opposite to
that in which the replication fork moves. The Okazaki fragments are
spliced together by DNA ligase. In bacteria, Okazaki fragments are
~1,000 to 2,000 nucleotides long. In eukaryotic cells, they are 150 to
200 nucleotides long.
Detailed studies of DNA polymerase I revealed
features of the DNA synthetic process that are now known
to be common to all DNA polymerases. The fundamental
reaction is a phosphoryl group transfer. The
nucleophile is the 3′-hydroxyl group of the nucleotide at the
3′ end of the growing strand. Nucleophilic attack occurs
at the α phosphorus of the incoming deoxynucleoside
5′-triphosphate (Fig. 25–5). Inorganic pyrophosphate is
released in the reaction. The general reaction is
\[
\underbrace{(\mathrm{dNMP})_n}_{\text{DNA}} + \mathrm{dNTP} \longrightarrow \underbrace{(\mathrm{dNMP})_{n+1}}_{\text{Lengthened DNA}} + \mathrm{PP_i} \tag{25–1}
\]
where dNMP and dNTP are deoxynucleoside
5′-monophosphate and 5′-triphosphate, respectively. The
reaction appears to proceed with only a minimal change in
free energy, given that one phosphodiester bond is
formed at the expense of a somewhat less stable
phosphate anhydride. However, noncovalent base-stacking
and base-pairing interactions provide additional
stabilization to the lengthened DNA product relative to the
free nucleotide. Also, the formation of products is
facilitated in the cell by the 19 kJ/mol generated in the
subsequent hydrolysis of the pyrophosphate product by the
enzyme pyrophosphatase.
MECHANISM FIGURE 25–5 Elongation of a DNA chain. (a) DNA polymerase I activity requires a
single unpaired strand to act as template and a primer strand to provide a free hydroxyl group at
the 3′ end, to which a new nucleotide unit is added. Each incoming nucleotide is selected in part
by base pairing to the appropriate nucleotide in the template strand. The reaction product has a
new free 3′ hydroxyl, allowing the addition of another nucleotide. (b) The catalytic mechanism
likely involves two Mg²⁺ ions, coordinated to the phosphate groups of the incoming nucleotide
triphosphate and to three Asp residues, two of which are highly conserved in all DNA polymerases.
The top Mg²⁺ ion in the figure facilitates attack of the 3′-hydroxyl group of the primer on the α
phosphate of the nucleotide triphosphate; the lower Mg²⁺ ion facilitates displacement of the
pyrophosphate. Both ions stabilize the structure of the pentacovalent transition state. RNA
polymerases use a similar mechanism (see Fig. 26–1b).
Early work on DNA polymerase I led to the defini-
tion of two central requirements for DNA polymeriza-
tion. First, all DNA polymerases require a template.
The polymerization reaction is guided by a template
DNA strand according to the base-pairing rules pre-
dicted by Watson and Crick: where a guanine is present
in the template, a cytosine deoxynucleotide is added to
the new strand, and so on. This was a particularly im-
portant discovery, not only because it provided a chem-
ical basis for accurate semiconservative DNA replication
but also because it represented the first example of the
use of a template to guide a biosynthetic reaction.
Second, the polymerases require a primer. A primer
is a strand segment (complementary to the template)
with a free 3′-hydroxyl group to which a nucleotide can
be added; the free 3′ end of the primer is called the
primer terminus. In other words, part of the new
strand must already be in place: all DNA polymerases
can only add nucleotides to a preexisting strand. Most
primers are oligonucleotides of RNA rather than DNA,
and specialized enzymes synthesize primers when and
where they are required.
After adding a nucleotide to a growing DNA strand,
a DNA polymerase either dissociates or moves along the
template and adds another nucleotide. Dissociation and
reassociation of the polymerase can limit the overall
polymerization rate—the process is generally faster
when a polymerase adds more nucleotides without dis-
sociating from the template. The average number of nu-
cleotides added before a polymerase dissociates defines
its processivity. DNA polymerases vary greatly in
processivity; some add just a few nucleotides before
dissociating, others add many thousands.
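The template and primer requirements, together with processivity, can be sketched as a toy model. The Python below is a simplified illustration, not a description of any real enzyme's kinetics. The function name extend_primer, the max_additions parameter (standing in for dissociation after a limited number of additions), and the example sequences are all assumptions made for this sketch. The template is written 3′→5′ and the primer 5′→3′, so that paired positions line up.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, primer_5to3, max_additions=None):
    """Extend a primer along a template, one nucleotide at a time.

    Position i of `template_3to5` pairs with position i of
    `primer_5to3`. Returns (extended_strand, pyrophosphates_released).
    """
    if not primer_5to3:
        raise ValueError("a primer with a free 3' end is required")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for t_base, p_base in zip(template_3to5, primer_5to3):
        if PAIRING[t_base] != p_base:
            raise ValueError("primer is not complementary to the template")

    strand = primer_5to3
    released_ppi = 0
    while len(strand) < len(template_3to5):
        if max_additions is not None and released_ppi >= max_additions:
            break  # the polymerase dissociates before finishing
        # The template base opposite the primer terminus selects the
        # incoming nucleotide, which is added at the 3' end.
        strand += PAIRING[template_3to5[len(strand)]]
        released_ppi += 1  # one PPi per nucleotide incorporated
    return strand, released_ppi


if __name__ == "__main__":
    print(extend_primer("TACGGT", "AT"))                   # ('ATGCCA', 4)
    print(extend_primer("TACGGT", "AT", max_additions=2))  # ('ATGC', 2)
```

In this sketch an empty primer raises an error, mirroring the rule that nucleotides can only be added to a preexisting strand, and a small max_additions value mimics a polymerase of low processivity that must rebind to finish the job.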
Replication Is Very Accurate
Replication proceeds with an extraordinary degree of
fidelity. In E. coli, a mistake is made only once for every
10⁹ to 10¹⁰ nucleotides added. For the E. coli
chromosome of ~4.6 × 10⁶ bp, this means that an error occurs
only once per 1,000 to 10,000 replications. During
polymerization, discrimination between correct and
incorrect nucleotides relies not just on the hydrogen bonds
that specify the correct pairing between complementary
bases but also on the common geometry of the standard
A=T and G≡C base pairs (Fig. 25–6). The active site
of DNA polymerase I accommodates only base pairs with
this geometry. An incorrect nucleotide may be able to
hydrogen-bond with a base in the template, but it gen-
erally will not fit into the active site. Incorrect bases can
be rejected before the phosphodiester bond is formed.
FIGURE 25–6 Contribution of base-pair geometry to the fidelity of
DNA replication. (a) The standard A=T and G≡C base pairs have very
similar geometries, and an active site sized to fit one (blue shading)
will generally accommodate the other. (b) The geometry of incorrectly
paired bases can exclude them from the active site, as occurs on DNA
polymerase I.
The accuracy of the polymerization reaction itself,
however, is insufficient to account for the high degree
of fidelity in replication. Careful measurements in vitro
have shown that DNA polymerases insert one incorrect
nucleotide for every 10⁴ to 10⁵ correct ones. These
mistakes sometimes occur because a base is briefly in
an unusual tautomeric form (see Fig. 8–9), allowing it to
hydrogen-bond with an incorrect partner. In vivo, the
error rate is reduced by additional enzymatic mechanisms.
One mechanism intrinsic to virtually all DNA
polymerases is a separate 3′→5′ exonuclease activity that
double-checks each nucleotide after it is added. This
nuclease activity permits the enzyme to remove a newly
added nucleotide and is highly specific for mismatched
base pairs (Fig. 25–7). If the polymerase has added the
wrong nucleotide, translocation of the enzyme to the
position where the next nucleotide is to be added is
inhibited. This kinetic pause provides the opportunity for
a correction. The 3′→5′ exonuclease activity removes
the mispaired nucleotide, and the polymerase begins
again. This activity, known as proofreading, is not
simply the reverse of the polymerization reaction (Eqn
25–1), because pyrophosphate is not involved. The
polymerizing and proofreading activities of a DNA
polymerase can be measured separately. Proofreading
improves the inherent accuracy of the polymerization
reaction 10²- to 10³-fold. In the monomeric DNA
polymerase I, the polymerizing and proofreading
activities have separate active sites within the same
polypeptide.
When base selection and proofreading are combined,
DNA polymerase leaves behind one net error for
every 10⁶ to 10⁸ bases added. Yet the measured
accuracy of replication in E. coli is higher still. The
additional accuracy is provided by a separate enzyme
system that repairs the mismatched base pairs remaining
after replication. We describe this mismatch repair,
along with other DNA repair processes, in Section 25.2.
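The way these accuracy figures combine is simple multiplication, shown in the sketch below. The Python is illustrative only, and the function name combined_accuracy is an assumption. It treats base selection and proofreading as independent filters, so that one error per N nucleotides improved F-fold becomes one error per N × F nucleotides. Integers are used to keep the powers of ten exact.

```python
def combined_accuracy(selection_one_in: int, proofreading_fold: int) -> int:
    """Nucleotides added per net error after selection plus proofreading.

    `selection_one_in` is the number of correct insertions per wrong
    one from base selection alone; `proofreading_fold` is the factor
    by which proofreading improves on that.
    """
    return selection_one_in * proofreading_fold


if __name__ == "__main__":
    low = combined_accuracy(10**4, 10**2)
    high = combined_accuracy(10**5, 10**3)
    print(f"one net error per {low:,} to {high:,} nucleotides")
```

The result, one error per 10⁶ to 10⁸ nucleotides, still falls short of the measured figure of one per 10⁹ to 10¹⁰, which is the gap that mismatch repair closes.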
E. coli Has at Least Five DNA Polymerases
More than 90% of the DNA polymerase activity observed
in E. coli extracts can be accounted for by DNA poly-
merase I. Soon after the isolation of this enzyme in 1955,
however, evidence began to accumulate that it is not
suited for replication of the large E. coli chromosome.
First, the rate at which it adds nucleotides (600 nu-
cleotides/min) is too slow (by a factor of 100 or more)
to account for the rates at which the replication fork
moves in the bacterial cell. Second, DNA polymerase I
has a relatively low processivity. Third, genetic studies
have demonstrated that many genes, and therefore
many proteins, are involved in replication: DNA poly-
merase I clearly does not act alone. Fourth, and most
important, in 1969 John Cairns isolated a bacterial strain
with an altered gene for DNA polymerase I that pro-
duced an inactive enzyme. Although this strain was ab-
normally sensitive to agents that damaged DNA, it was
nevertheless viable!
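The first objection can be checked with simple arithmetic. The Python below is a rough sketch under stated assumptions: a chromosome of about 4.6 × 10⁶ bp, two forks that each copy half of it without interruption, the rate of 600 nucleotides/min quoted above for DNA polymerase I, and an illustrative fork rate 100 times faster (1,000 nucleotides/s). The function name replication_time_s is hypothetical, and processivity limits are ignored.

```python
GENOME_BP = 4_600_000  # approximate E. coli chromosome size (assumed)
FORKS = 2              # bidirectional replication


def replication_time_s(rate_nt_per_s: float) -> float:
    """Seconds needed to copy the chromosome with two forks,
    each moving at `rate_nt_per_s` and copying half the circle."""
    return (GENOME_BP / FORKS) / rate_nt_per_s


if __name__ == "__main__":
    pol_i = replication_time_s(600 / 60)  # 600 nucleotides/min
    fast = replication_time_s(1_000)      # assumed 100-fold faster fork
    print(f"at the Pol I rate: about {pol_i / 3600:.0f} h")   # about 64 h
    print(f"at 1,000 nt/s:     about {fast / 60:.0f} min")    # about 38 min
```

Under these assumptions the slower rate would need more than two days to copy the chromosome once, whereas the faster rate brings the time down to well under an hour.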
FIGURE 25–7 An example of error correction by the 3′→5′
exonuclease activity of DNA polymerase I. Structural analysis has located
the exonuclease activity ahead of the polymerase activity as the
enzyme is oriented in its movement along the DNA. A mismatched base
(here, a C–A mismatch) impedes translocation of DNA polymerase I
to the next site. Sliding backward, the enzyme corrects the mistake
with its 3′→5′ exonuclease activity, then resumes its polymerase
activity in the 5′→3′ direction. Steps shown in the figure: a rare
tautomeric form of cytosine (C*) that pairs with A is incorporated into
the growing strand. Before the polymerase moves on, the cytosine
undergoes a tautomeric shift from C* to C, and the new nucleotide is
now mispaired. The mispaired 3′-OH end of the growing strand blocks
further elongation, and DNA polymerase slides back to position the
mispaired base in the 3′→5′ exonuclease active site. The mispaired
nucleotide is removed. DNA polymerase slides forward and resumes its
polymerization activity.
A search for other DNA polymerases led to the
discovery of E. coli DNA polymerase II and DNA
polymerase III in the early 1970s. DNA polymerase II
is an enzyme involved in one type of DNA repair
(Section 25.3). DNA polymerase III is the principal
replication enzyme in E. coli. The properties of these three
DNA polymerases are compared in Table 25–1. DNA
polymerases IV and V, identified in 1999, are involved
in an unusual form of DNA repair (Section 25.2).
DNA polymerase I, then, is not the primary enzyme
of replication; instead it performs a host of clean-up
functions during replication, recombination, and repair.
The polymerase’s special functions are enhanced by its
5′→3′ exonuclease activity. This activity, distinct from
the 3′→5′ proofreading exonuclease (Fig. 25–7), is
located in a structural domain that can be separated from
the enzyme by mild protease treatment. When the
5′→3′ exonuclease domain is removed, the remaining
fragment (Mᵣ 68,000), the large fragment or Klenow
fragment (Fig. 25–8), retains the polymerization and
proofreading activities. The 5′→3′ exonuclease activity
of intact DNA polymerase I can replace a segment of
DNA (or RNA) paired to the template strand, in a
process known as nick translation (Fig. 25–9). Most
other DNA polymerases lack a 5′→3′ exonuclease
activity.
TABLE 25–1 Comparison of DNA Polymerases of E. coli
                                              DNA polymerase
                                              I          II         III
Structural gene*                              polA       polB       polC (dnaE)
Subunits (number of different types)          1          7          ≥10
Mᵣ                                            103,000    88,000†    791,500
3′→5′ Exonuclease (proofreading)              Yes        Yes        Yes
5′→3′ Exonuclease                             Yes        No         No
Polymerization rate (nucleotides/s)           16–20      40         250–1,000
Processivity (nucleotides added
  before polymerase dissociates)              3–200      1,500      ≥500,000
*For enzymes with more than one subunit, the gene listed here encodes the subunit with polymerization activity. Note that dnaE
is an earlier designation for the gene now referred to as polC.
†Polymerization subunit only. DNA polymerase II shares several subunits with DNA polymerase III, including the β, γ, δ, δ′, χ,
and ψ subunits (see Table 25–2).
TABLE 25–2 Subunits of DNA Polymerase III of E. coli
          Number of
          subunits per
Subunit   holoenzyme     Mᵣ of subunit   Gene           Function of subunit
α         2              129,900         polC (dnaE)    Polymerization activity
ε         2              27,500          dnaQ (mutD)    3′→5′ Proofreading exonuclease
θ         2              8,600           holE
τ         2              71,100          dnaX           Stable template binding; core enzyme dimerization
γ         1              47,500          dnaX*          Clamp loader
δ         1              38,700          holA           Clamp opener
δ′        1              36,900          holB           Clamp loader
χ         1              16,600          holC           Interaction with SSB
ψ         1              15,200          holD           Interaction with γ and χ
β         4              40,600          dnaN           DNA clamp required for optimal processivity
Grouping: α, ε, and θ form the core polymerase; τ, γ, δ, δ′, χ, and ψ form the clamp-loading (γ) complex that loads β subunits
on the lagging strand at each Okazaki fragment.
*The γ subunit is encoded by a portion of the gene for the τ subunit, such that the amino-terminal 66% of the τ subunit has
the same amino acid sequence as the γ subunit. The γ subunit is generated by a translational frameshifting mechanism (see
Box 27–1) that leads to premature translational termination.
DNA polymerase III is much more complex than
DNA polymerase I, having ten types of subunits (Table
25–2). Its polymerization and proofreading activities
reside in its α and ε (epsilon) subunits, respectively. The
θ subunit associates with α and ε to form a core
polymerase, which can polymerize DNA but with limited
processivity. Two core polymerases can be linked by
another set of subunits, a clamp-loading complex, or γ
complex, consisting of five subunits of four different
types, τ₂γδδ′. The core polymerases are linked through
the τ (tau) subunits. Two additional subunits, χ (chi) and
ψ (psi), are bound to the clamp-loading complex. The
entire assembly of 13 protein subunits (nine different
types) is called DNA polymerase III* (Fig. 25–10a).
DNA polymerase III* can polymerize DNA, but with
a much lower processivity than one would expect for
the organized replication of an entire chromosome. The
necessary increase in processivity is provided by the
addition of the β subunits, four of which complete the DNA
polymerase III holoenzyme. The β subunits associate in
pairs to form donut-shaped structures that encircle the
DNA and act like clamps (Fig. 25–10b). Each dimer
associates with a core subassembly of polymerase III* (one
dimeric clamp per core subassembly) and slides along
the DNA as replication proceeds. The β sliding clamp
prevents the dissociation of DNA polymerase III from
DNA, dramatically increasing processivity, to greater
than 500,000 (Table 25–1).
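The subunit bookkeeping can be checked against Table 25–2 with a short tally. The Python below is only a consistency check on the tabulated numbers. The list layout, the spelled-out subunit names, and the function name tally are assumptions made for this sketch. Copy numbers and Mᵣ values are those listed in Table 25–2.

```python
# (subunit, copies per holoenzyme, Mr of subunit), from Table 25-2
SUBUNITS = [
    ("alpha", 2, 129_900),
    ("epsilon", 2, 27_500),
    ("theta", 2, 8_600),
    ("tau", 2, 71_100),
    ("gamma", 1, 47_500),
    ("delta", 1, 38_700),
    ("delta_prime", 1, 36_900),
    ("chi", 1, 16_600),
    ("psi", 1, 15_200),
    ("beta", 4, 40_600),
]


def tally(exclude=()):
    """Return (subunit types, total copies, summed Mr) for the assembly,
    leaving out any subunit names listed in `exclude`."""
    rows = [row for row in SUBUNITS if row[0] not in exclude]
    copies = sum(n for _, n, _ in rows)
    mass = sum(n * mr for _, n, mr in rows)
    return len(rows), copies, mass


if __name__ == "__main__":
    print("Pol III* (no beta):", tally(exclude=("beta",)))
    print("Holoenzyme:        ", tally())
```

Leaving out β gives 13 subunits of nine types, as stated for DNA polymerase III*. Adding the four β subunits gives 17 subunits with a summed Mᵣ of 791,500, the value listed for DNA polymerase III in Table 25–1.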
DNA Replication Requires Many Enzymes
and Protein Factors
Replication in E. coli requires not just a single DNA
polymerase but 20 or more different enzymes and pro-
teins, each performing a specific task. The entire com-
plex has been termed the DNA replicase system or
replisome. The enzymatic complexity of replication re-
flects the constraints imposed by the structure of DNA
and by the requirements for accuracy. The main classes
of replication enzymes are considered here in terms of
the problems they overcome.
Access to the DNA strands that are to act as tem-
plates requires separation of the two parent strands.
This is generally accomplished by helicases, enzymes
that move along the DNA and separate the strands, us-
ing chemical energy from ATP. Strand separation cre-
ates topological stress in the helical DNA structure (see
Fig. 24–12), which is relieved by the action of topo-
isomerases. The separated strands are stabilized by
DNA-binding proteins. As noted earlier, before DNA
polymerases can begin synthesizing DNA, primers must
be present on the template—generally short segments
25.1 DNA Replication 957
3H110325H11032
5H110323H11032
OH
P
RNA or DNA
Template
DNA strand
(PP
i
)
n
3H110325H11032
5H110323H11032
OH
P
dNTPs
dNMPs
or
rNMPs
3H110325H11032
5H110323H11032
OH
P
3H110325H11032
5H110323H11032
OH
P
Nick
Nick
DNA
polymerase I
FIGURE 25–8 Large (Klenow) fragment of DNA polymerase I. This
polymerase is widely distributed in bacteria. The Klenow fragment,
produced by proteolytic treatment of the polymerase, retains the poly-
merization and proofreading activities of the enzyme. The Klenow
fragment shown here is from the thermophilic bacterium Bacillus
stearothermophilus (PDB ID 3BDP). The active site for addition of nu-
cleotides is deep in the crevice at the far end of the bound DNA. The
dark blue strand is the template.
FIGURE 25–9 Nick translation. In this process, an RNA or DNA strand
paired to a DNA template is simultaneously degraded by the 5′→3′
exonuclease activity of DNA polymerase I and replaced by the polymerase
activity of the same enzyme. These activities have a role in
both DNA repair and the removal of RNA primers during replication
(both described later). The strand of nucleic acid to be removed (either
DNA or RNA) is shown in green, the replacement strand in red.
DNA synthesis begins at a nick (a broken phosphodiester bond, leaving
a free 3′ hydroxyl and a free 5′ phosphate). Polymerase I extends
the nontemplate DNA strand and moves the nick along the DNA, a
process called nick translation. A nick remains where DNA polymerase
I dissociates, and is later sealed by another enzyme.
of RNA synthesized by enzymes known as primases.
Ultimately, the RNA primers are removed and replaced
by DNA; in E. coli, this is one of the many functions of
DNA polymerase I. After an RNA primer is removed and
the gap is filled in with DNA, a nick remains in the DNA
backbone in the form of a broken phosphodiester bond.
These nicks are sealed by DNA ligases. All these
processes require coordination and regulation, an in-
terplay best characterized in the E. coli system.
Replication of the E. coli Chromosome
Proceeds in Stages
The synthesis of a DNA molecule can be divided into
three stages: initiation, elongation, and termination,
distinguished both by the reactions taking place and by
the enzymes required. As you will find here and in the
next two chapters, synthesis of the major information-
containing biological polymers—DNAs, RNAs, and pro-
teins—can be understood in terms of these same three
stages, with the stages of each pathway having unique
characteristics. The events described below reflect in-
formation derived primarily from in vitro experiments
using purified E. coli proteins, although the principles
are highly conserved in all replication systems.
Initiation The E. coli replication origin, oriC, consists
of 245 bp; it bears DNA sequence elements that are
highly conserved among bacterial replication origins.
The general arrangement of the conserved sequences is
FIGURE 25–10 DNA polymerase III. (a) Architecture of bacterial
DNA polymerase III. Two core domains, composed of subunits α, ε,
and θ, are linked by a five-subunit γ complex (also known as the
clamp-loading complex) with the composition τ₂γδδ′. The γ and τ
subunits are encoded by the same gene. The γ subunit is a shortened
version of τ; the τ subunit thus contains a domain identical to γ, along
with an additional segment that interacts with the core polymerase.
The other two subunits of DNA polymerase III*, χ and ψ (not shown),
also bind to the γ complex. Two β clamps interact with the two-core
subassembly, each clamp a dimer of the β subunit. The complex interacts
with the DnaB helicase through the τ subunit. (b) Two β subunits
of E. coli polymerase III form a circular clamp that surrounds the
DNA. The clamp slides along the DNA molecule, increasing the processivity
of the polymerase III holoenzyme to greater than 500,000 by
preventing its dissociation from the DNA. The end-on view shows the
two β subunits as gray and light-blue ribbon structures surrounding a
space-filling model of DNA. In the side view, surface contour models
of the β subunits (gray) surround a stick representation of a DNA double
helix (light and dark blue) (derived from PDB ID 2POL).
illustrated in Figure 25–11. The key sequences of in-
terest here are two series of short repeats: three repeats
of a 13 bp sequence and four repeats of a 9 bp sequence.
At least nine different enzymes or proteins (sum-
marized in Table 25–3) participate in the initiation phase
of replication. They open the DNA helix at the origin
and establish a prepriming complex for subsequent re-
actions. The crucial component in the initiation process
is the DnaA protein. A single complex of four to five
DnaA protein molecules binds to the four 9 bp repeats
in the origin (Fig. 25–12, step 1), then recognizes and
successively denatures the DNA in the region of the
three 13 bp repeats, which are rich in A=T pairs (step 2).
This process requires ATP and the bacterial histonelike
protein HU. The DnaC protein then loads the
DnaB protein onto the unwound region. Two ring-
shaped hexamers of DnaB, one loaded onto each DNA
strand, act as helicases, unwinding the DNA bidirec-
tionally and creating two potential replication forks. If
the E. coli single-stranded DNA–binding protein (SSB)
and DNA gyrase (DNA topoisomerase II) are now added
in vitro, thousands of base pairs are rapidly unwound
by the DnaB helicase, proceeding out from the origin.
Many molecules of SSB bind cooperatively to single-
stranded DNA, stabilizing the separated strands and
preventing renaturation while gyrase relieves the topo-
logical stress produced by the DnaB helicase. When ad-
ditional replication proteins are included in the in vitro
system, the DNA unwinding mediated by DnaB is cou-
pled to replication, as described below.
Initiation is the only phase of DNA replication that
is known to be regulated, and it is regulated such that
replication occurs only once in each cell cycle. The
mechanism of regulation is not yet well understood, but
genetic and biochemical studies have provided a few
insights.
The timing of replication initiation is affected by
DNA methylation and interactions with the bacterial
plasma membrane. The oriC DNA is methylated by the
Dam methylase (Table 25–3), which methylates the N⁶
position of adenine within the palindromic sequence
(5′)GATC. (Dam is not a biochemical expletive; it stands
for DNA adenine methylation.) The oriC region of E. coli
is highly enriched in GATC sequences—it has 11 of them
in its 245 bp, whereas the average frequency of GATC in
the E. coli chromosome as a whole is 1 in 256 bp.
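
The enrichment quoted above can be checked with a short calculation. The sketch below assumes, purely for illustration, that the four bases are equally frequent, in which case a specific 4-nucleotide site is expected once every 4⁴ = 256 bp, the same figure the text gives for the chromosome as a whole. The variable names are ours, not part of any library.

```python
# Arithmetic sketch of GATC enrichment at oriC.
# Assumption (ours): equal base frequencies, so a 4-nt site occurs about once per 4**4 bp.
SITE = "GATC"
expected_spacing_bp = 4 ** len(SITE)   # 256 bp, matching the chromosome-wide figure in the text

ORIC_LENGTH_BP = 245    # from the text
ORIC_GATC_COUNT = 11    # from the text

expected_in_oric = ORIC_LENGTH_BP / expected_spacing_bp   # sites expected by chance in oriC
enrichment = ORIC_GATC_COUNT / expected_in_oric           # observed / expected

print(f"expected GATC sites in oriC: {expected_in_oric:.2f}")
print(f"observed / expected: {enrichment:.1f}-fold")
```

Under these assumptions oriC carries roughly eleven to twelve times as many GATC sites as a random 245 bp stretch would.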
[Figure 25–11 labels: tandem array of three 13 bp sequences, consensus sequence GATCTNTTNTTTT; binding sites for DnaA protein, four 9 bp sequences, consensus sequence TTATCCACA]
FIGURE 25–11 Arrangement of sequences in the E. coli replication
origin, oriC. Although the repeated sequences (shaded in color) are
not identical, certain nucleotides are particularly common in each po-
sition, forming a consensus sequence. In positions where there is no
consensus, N represents any of the four nucleotides. The arrows indi-
cate the orientations of the nucleotide sequences.
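
The idea of a consensus sequence with N as a wildcard can be made concrete with a small scanning routine. The following is only a sketch: the function names and the toy sequence are invented for illustration (the toy sequence is not the real oriC), and allowing a fixed number of mismatches is one simple way to model repeats that are "not identical" to the consensus.

```python
# Sketch: scan a DNA string for matches to a consensus in which N matches any base.
CONSENSUS_13 = "GATCTNTTNTTTT"   # 13 bp repeat consensus (Fig. 25-11)
CONSENSUS_9 = "TTATCCACA"        # 9 bp DnaA-binding consensus (Fig. 25-11)

def mismatches(window, consensus):
    """Count positions where window differs from consensus; N positions never count."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)

def find_sites(seq, consensus, max_mismatches=0):
    """Return 0-based start positions of windows within max_mismatches of the consensus."""
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1)
            if mismatches(seq[i:i + k], consensus) <= max_mismatches]

# Hypothetical toy sequence (not real oriC): one 13-mer, one exact 9-mer,
# and one 9-mer that differs from the consensus at a single position.
toy = "CC" + "GATCTATTGTTTT" + "AA" + "TTATCCACA" + "GG" + "TTATCCAGA"

print(find_sites(toy, CONSENSUS_13))                  # [2]
print(find_sites(toy, CONSENSUS_9))                   # [17]
print(find_sites(toy, CONSENSUS_9, max_mismatches=1)) # [17, 28]
```

Raising `max_mismatches` trades specificity for sensitivity, which mirrors the point in the caption that individual repeats only approximate the consensus.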
FIGURE 25–12 Model for initiation of replication at the E. coli origin,
oriC. (1) About 20 DnaA protein molecules, each with a bound
ATP, bind at the four 9 bp repeats. The DNA is wrapped around this
complex. (2) The three A=T-rich 13 bp repeats are denatured sequentially.
(3) Hexamers of the DnaB protein bind to each strand,
with the aid of DnaC protein. The DnaB helicase activity further unwinds
the DNA in preparation for priming and DNA synthesis.
Immediately after replication, the DNA is hemi-
methylated: the parent strands have methylated oriC
sequences but the newly synthesized strands do not. The
hemimethylated oriC sequences are now sequestered
for a period by interaction with the plasma membrane
(the mechanism is unknown). After a time, oriC is re-
leased from the plasma membrane, and it must be fully
methylated by Dam methylase before it can again bind
DnaA. Regulation of initiation also involves the slow hy-
drolysis of ATP by DnaA protein, which cycles the pro-
tein between active (with bound ATP) and inactive (with
bound ADP) forms on a timescale of 20 to 40 minutes.
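
The methylation states described above follow directly from semiconservative replication, and a toy model makes the bookkeeping explicit. This is a deliberately minimal sketch: the class and function names are hypothetical, each strand is reduced to a single flag, and membrane sequestration and timing are not modeled.

```python
# Toy model of oriC methylation states across one round of replication.
from dataclasses import dataclass

@dataclass(frozen=True)
class Strand:
    methylated: bool   # True if the GATC adenines on this strand are N6-methylated

def replicate(duplex):
    """Semiconservative replication: each parent strand pairs with a new, unmethylated strand."""
    top, bottom = duplex
    return [(top, Strand(False)), (Strand(False), bottom)]

def dam_methylate(duplex):
    """Dam methylase acting to completion: both strands end up methylated."""
    return tuple(Strand(True) for _ in duplex)

def state(duplex):
    """Name the methylation state from the number of methylated strands."""
    n = sum(s.methylated for s in duplex)
    return {2: "fully methylated", 1: "hemimethylated", 0: "unmethylated"}[n]

parent = (Strand(True), Strand(True))
daughters = replicate(parent)
print(state(parent), "->", [state(d) for d in daughters])
print("after Dam:", state(dam_methylate(daughters[0])))
```

In this picture, the hemimethylated state is the transient window in which, according to the text, oriC is sequestered and cannot yet bind DnaA again.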
Elongation The elongation phase of replication includes
two distinct but related operations: leading strand syn-
thesis and lagging strand synthesis. Several enzymes at
the replication fork are important to the synthesis of both
strands. Parent DNA is first unwound by DNA helicases,
and the resulting topological stress is relieved by topo-
isomerases. Each separated strand is then stabilized by
TABLE 25–3 Proteins Required to Initiate Replication at the E. coli Origin

Protein                                     Mr        Number of subunits   Function
DnaA protein                                52,000    1                    Recognizes ori sequence; opens duplex at specific sites in origin
DnaB protein (helicase)                     300,000   6*                   Unwinds DNA
DnaC protein                                29,000    1                    Required for DnaB binding at origin
HU                                          19,000    2                    Histonelike protein; DNA-binding protein; stimulates initiation
Primase (DnaG protein)                      60,000    1                    Synthesizes RNA primers
Single-stranded DNA–binding protein (SSB)   75,600    4*                   Binds single-stranded DNA
RNA polymerase                              454,000   5                    Facilitates DnaA activity
DNA gyrase (DNA topoisomerase II)           400,000   4                    Relieves torsional strain generated by DNA unwinding
Dam methylase                               32,000    1                    Methylates (5′)GATC sequences at oriC

*Subunits in these cases are identical.

FIGURE 25–13 Synthesis of Okazaki fragments. (a) At intervals, primase
synthesizes an RNA primer for a new Okazaki fragment. Note that if we
consider the two template strands as lying side by side, lagging strand
synthesis formally proceeds in the opposite direction from fork movement.
(b) Each primer is extended by DNA polymerase III. (c) DNA synthesis
continues until the fragment extends as far as the primer of the previously
added Okazaki fragment. A new primer is synthesized near the replication
fork to begin the process again.
SSB. From this point, synthesis of leading and lagging
strands is sharply different.
Leading strand synthesis, the more straightforward
of the two, begins with the synthesis by primase (DnaG
protein) of a short (10 to 60 nucleotide) RNA primer at
the replication origin. Deoxyribonucleotides are added
to this primer by DNA polymerase III. Leading strand
synthesis then proceeds continuously, keeping pace
with the unwinding of DNA at the replication fork.
Lagging strand synthesis, as we have noted, is ac-
complished in short Okazaki fragments. First, an RNA
primer is synthesized by primase and, as in leading
strand synthesis, DNA polymerase III binds to the RNA
primer and adds deoxyribonucleotides (Fig. 25–13). On
this level, the synthesis of each Okazaki fragment seems
straightforward, but the reality is quite complex. The
complexity lies in the coordination of leading and lag-
ging strand synthesis: both strands are produced by a
single asymmetric DNA polymerase III dimer, which is
accomplished by looping the DNA of the lagging strand
as shown in Figure 25–14, bringing together the two
points of polymerization.
[Figure 25–14 panel labels: (a) Continuous synthesis on the leading strand proceeds as DNA is unwound by the DnaB helicase. (b) DNA primase binds to DnaB, synthesizes a new primer, then dissociates. (c)–(e) New β clamp is loaded onto new template primer; synthesis of new Okazaki fragment is completed; discarded β clamp; the next β clamp is readied. Other labels: DnaB, core, clamp-loading complex with open β sliding clamp, leading strand, lagging strand, RNA primer of previous Okazaki fragment, primase, new RNA primer, primer of previous Okazaki fragment approaches core subunits.]
FIGURE 25–14 DNA synthesis on the leading and lagging strands. Events
at the replication fork are coordinated by a single DNA polymerase III
dimer, in an integrated complex with DnaB helicase. This figure shows the
replication process already underway (parts (a) through (e) are discussed
in the text). The lagging strand is looped so that DNA synthesis proceeds
steadily on both the leading and lagging strand templates at the same
time. Red arrows indicate the 3′ end of the two new strands and the
direction of DNA synthesis. Black arrows show the direction of movement
of the parent DNA through the complex. An Okazaki fragment is being
synthesized on the lagging strand.
The synthesis of Okazaki fragments on the lagging
strand entails some elegant enzymatic choreography.
The DnaB helicase and DnaG primase constitute a func-
tional unit within the replication complex, the primo-
some. DNA polymerase III uses one set of its core sub-
units (the core polymerase) to synthesize the leading
strand continuously, while the other set of core subunits
cycles from one Okazaki fragment to the next on the
looped lagging strand. The DnaB helicase unwinds the
DNA at the replication fork (Fig. 25–14a) as it travels
along the lagging strand template in the 5′→3′ direction.
DNA primase occasionally associates with DnaB
helicase and synthesizes a short RNA primer (Fig.
25–14b). A new β sliding clamp is then positioned at the
primer by the clamp-loading complex of DNA poly-
merase III (Fig. 25–14c). When synthesis of an Okazaki
fragment has been completed, replication halts, and the
core subunits of DNA polymerase III dissociate from
their β sliding clamp (and from the completed Okazaki
fragment) and associate with the new clamp (Fig.
25–14d, e). This initiates synthesis of a new Okazaki
fragment. As noted earlier, the entire complex respon-
sible for coordinated DNA synthesis at a replication fork
is a replisome. The proteins acting at the replication
fork are summarized in Table 25–4.
The replisome promotes rapid DNA synthesis,
adding ~1,000 nucleotides/s to each strand (leading and
lagging). Once an Okazaki fragment has been com-
pleted, its RNA primer is removed and replaced with
DNA by DNA polymerase I, and the remaining nick is
sealed by DNA ligase (Fig. 25–15).
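
The quoted synthesis rate gives a feel for how long one round of replication takes. The sketch below is a back-of-envelope estimate, not a measured value: the genome size is our assumption (the E. coli chromosome is roughly 4.6 million bp, a figure not stated in this section), and the function name is illustrative.

```python
# Back-of-envelope estimate of E. coli chromosome replication time.
GENOME_BP = 4_600_000        # assumption: approximate E. coli chromosome size
FORK_RATE_NT_PER_S = 1_000   # from the text: ~1,000 nucleotides/s
N_FORKS = 2                  # bidirectional replication from oriC

def replication_time_s(genome_bp, rate_nt_per_s, forks):
    """Seconds needed if each fork copies an equal share of the genome at a constant rate."""
    return genome_bp / forks / rate_nt_per_s

seconds = replication_time_s(GENOME_BP, FORK_RATE_NT_PER_S, N_FORKS)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")
```

With these inputs each fork copies about 2.3 million bp, which takes a little under 40 minutes at a constant rate.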
DNA ligase catalyzes the formation of a phosphodiester
bond between a 3′ hydroxyl at the end of one
DNA strand and a 5′ phosphate at the end of another
strand. The phosphate must be activated by adenylylation.
DNA ligases isolated from viruses and eukaryotes
use ATP for this purpose. DNA ligases from bacteria are
unusual in that they generally use NAD+, a cofactor
that normally functions in hydride transfer reactions
(see Fig. 13–15), as the source of the AMP activating
group (Fig. 25–16). DNA ligase is another enzyme of
DNA metabolism that has become an important reagent
in recombinant DNA experiments (see Fig. 9–1).
Termination Eventually, the two replication forks of the
circular E. coli chromosome meet at a terminus region
containing multiple copies of a 20 bp sequence called
Ter (for terminus) (Fig. 25–17a). The Ter sequences are
arranged on the chromosome to create a sort of trap
that a replication fork can enter but cannot leave. The
Ter sequences function as binding sites for a protein
called Tus (terminus utilization substance). The Tus-Ter
complex can arrest a replication fork from only one di-
rection. Only one Tus-Ter complex functions per repli-
cation cycle—the complex first encountered by either
TABLE 25–4 Proteins at the E. coli Replication Fork

Protein                             Mr        Number of subunits   Function
SSB                                 75,600    4                    Binding to single-stranded DNA
DnaB protein (helicase)             300,000   6                    DNA unwinding; primosome constituent
Primase (DnaG protein)              60,000    1                    RNA primer synthesis; primosome constituent
DNA polymerase III                  791,500   17                   New strand elongation
DNA polymerase I                    103,000   1                    Filling of gaps; excision of primers
DNA ligase                          74,000    1                    Ligation
DNA gyrase (DNA topoisomerase II)   400,000   4                    Supercoiling

Modified from Kornberg, A. (1982) Supplement to DNA Replication, Table S11–2, W. H. Freeman and Company, New York.
[Figure 25–15 labels: lagging strand; DNA polymerase I, dNTPs, rNMPs; nick; DNA ligase, ATP (or NAD+), AMP + PPi (or NMN)]
FIGURE 25–15 Final steps in the synthesis of lagging strand segments.
RNA primers in the lagging strand are removed by the 5′→3′
exonuclease activity of DNA polymerase I and replaced with DNA by
the same enzyme. The remaining nick is sealed by DNA ligase. The
role of ATP or NAD+ is shown in Figure 25–16.
[Figure 25–16 step labels: (1) adenylylation of DNA ligase; (2) activation of 5′ phosphate in nick; (3) displacement of AMP seals nick. Species shown: enzyme–AMP, AMP from ATP (R = PPi) or NAD+ (R = NMN), nick in DNA, sealed DNA.]
replication fork. Given that opposing replication forks
generally halt when they collide, Ter sequences do not
seem essential, but they may prevent overreplication by
one replication fork in the event that the other is de-
layed or halted by an encounter with DNA damage or
some other obstacle.
So, when either replication fork encounters a func-
tional Tus-Ter complex, it halts; the other fork halts
when it meets the first (arrested) fork. The final few
hundred base pairs of DNA between these large protein
complexes are then replicated (by an as yet unknown
mechanism), completing two topologically interlinked
(catenated) circular chromosomes (Fig. 25–17b). DNA
circles linked in this way are known as catenanes. Sep-
aration of the catenated circles in E. coli requires topoi-
somerase IV (a type II topoisomerase). The separated
chromosomes then segregate into daughter cells at cell
division. The terminal phase of replication of other cir-
cular chromosomes, including many of the DNA viruses
that infect eukaryotic cells, is similar.
Bacterial Replication Is Organized in Membrane-
Bound Replication Factories
The replication of a circular bacterial chromosome is
highly organized. Once bidirectional replication is initi-
ated at the origin, the two replisomes do not travel away
from each other along the DNA. Instead, the replisomes
are linked together and tethered to one point on the
bacterial inner membrane, and the DNA substrate is fed
through this “replication factory” (Fig. 25–18a). The
tethering point is at the center of the elongated bacte-
rial cell. After initiation, each of the two newly synthe-
sized replication origins is partitioned into one half of
FIGURE 25–16 Mechanism of the DNA ligase reaction. In each of
the three steps, one phosphodiester bond is formed at the expense of
another. Steps (1) and (2) lead to activation of the 5′ phosphate in
the nick. An AMP group is transferred first to a Lys residue on the enzyme
and then to the 5′ phosphate in the nick. In step (3), the 3′-hydroxyl
group attacks this phosphate and displaces AMP, producing a
phosphodiester bond to seal the nick. In the E. coli DNA ligase reaction,
AMP is derived from NAD+. The DNA ligases isolated from a
number of viral and eukaryotic sources use ATP rather than NAD+,
and they release pyrophosphate rather than nicotinamide mononucleotide
(NMN) in step (1).
[Figure 25–17 labels: (a) origin; clockwise fork and counterclockwise fork; clockwise fork trap and counterclockwise fork trap; Ter sites TerA, TerB, TerC, TerD, TerF, TerG. (b) clockwise fork, counterclockwise fork; completion of replication; catenated chromosomes; DNA topoisomerase IV; separated chromosomes.]
the cell, and continuing replication extrudes each new
chromosome into that half (Fig. 25–18b). The elaborate
spatial organization of the newly replicated chromo-
somes is orchestrated and maintained by many proteins,
including bacterial homologs of the SMC proteins and
topoisomerases (Chapter 24). Once replication is ter-
minated, the cell divides, and the chromosomes se-
questered in the two halves of the original cell are ac-
curately partitioned into the daughter cells. When
replication commences in the daughter cells, the origin
of replication is sequestered in new replication factories
formed at a point on the membrane at the center of the
cell, and the entire process is repeated.
Replication in Eukaryotic Cells Is More Complex
The DNA molecules in eukaryotic cells are considerably
larger than those in bacteria and are organized into com-
plex nucleoprotein structures (chromatin; p. 938). The
essential features of DNA replication are the same in
eukaryotes and prokaryotes, and many of the protein
complexes are functionally and structurally conserved.
However, some interesting variations on the general
principles discussed above promise new insights into the
regulation of replication and its link with the cell cycle.
Origins of replication, called autonomously repli-
cating sequences (ARS) or replicators, have been
identified and best studied in yeast. Yeast replicators
span ~150 bp and contain several essential conserved
sequences. About 400 replicators are distributed among
the 16 chromosomes in a haploid yeast genome. Initia-
tion of replication in all eukaryotes requires a multi-
subunit protein, the origin recognition complex (ORC),
which binds to several sequences within the replicator.
ORC interacts with and is regulated by a number of
other proteins involved in control of the eukaryotic cell
cycle. Two other proteins, CDC6 (discovered in a screen
for genes affecting the cell division cycle) and CDT1
(Cdc10-dependent transcript 1), bind to ORC and me-
diate the loading of a heterohexamer of minichromo-
some maintenance proteins (MCM2 to MCM7). The
MCM complex is a ring-shaped replicative helicase, anal-
ogous to the bacterial DnaB helicase. The CDC6 and
CDT1 proteins have a role comparable to that of the
bacterial DnaC protein, loading the MCM helicase onto
the DNA near the replication origin.
The rate of replication fork movement in eukary-
otes (~50 nucleotides/s) is only one-twentieth that ob-
served in E. coli. At this rate, replication of an average
human chromosome proceeding from a single origin
FIGURE 25–17 Termination of chromosome replication in
E. coli. (a) The Ter sequences are positioned on the chromo-
some in two clusters with opposite orientations. (b) Replication
of the DNA separating the opposing replication forks leaves the
completed chromosomes joined as catenanes, or topologically
interlinked circles. The circles are not covalently linked, but
because they are interwound and each is covalently closed,
they cannot be separated—except by the action of topoiso-
merases. In E. coli, a type II topoisomerase known as DNA
topoisomerase IV plays the primary role in the separation of
catenated chromosomes, transiently breaking both DNA strands
of one chromosome and allowing the other chromosome to pass
through the break.
would take more than 500 hours. Replication of human
chromosomes in fact proceeds bidirectionally from
many origins, spaced 30,000 to 300,000 bp apart. Eu-
karyotic chromosomes are almost always much larger
than bacterial chromosomes, so multiple origins are
probably a universal feature in eukaryotic cells.
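
The case for multiple origins rests on simple arithmetic, sketched below using only the rate and spacings given in the text. Two simplifications are ours and are labeled in the code: fork speed is treated as constant, and neighboring origins are assumed to fire at the same time so that converging forks meet midway. The function names are illustrative.

```python
# Arithmetic sketch: single-origin versus multi-origin replication in eukaryotes.
FORK_RATE_NT_PER_S = 50   # eukaryotic fork rate quoted in the text

def bp_copied_per_fork(hours, rate=FORK_RATE_NT_PER_S):
    """Base pairs one fork can copy in the given time at a constant rate."""
    return int(hours * 3600 * rate)

def minutes_to_fill_gap(origin_spacing_bp, rate=FORK_RATE_NT_PER_S):
    """Time for two converging forks to copy the DNA between neighboring origins.

    Assumption (ours): both origins fire simultaneously, so each fork covers half the gap.
    """
    return origin_spacing_bp / 2 / rate / 60

print(bp_copied_per_fork(500))              # 90000000 bp per fork in 500 hours
for spacing in (30_000, 300_000):           # origin spacings from the text
    print(spacing, minutes_to_fill_gap(spacing))   # 5.0 and 50.0 minutes
```

At this rate a single fork needs about 500 hours to copy 90 million bp, whereas origins spaced 30,000 to 300,000 bp apart leave gaps that converging forks can close in minutes to under an hour.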
Like bacteria, eukaryotes have several types of
DNA polymerases. Some have been linked to particu-
lar functions, such as the replication of mitochondrial
DNA. The replication of nuclear chromosomes involves
DNA polymerase α, in association with DNA polymerase δ.
DNA polymerase α is typically a multisubunit
enzyme with similar structure and properties in all
eukaryotic cells. One subunit has a primase activity, and
the largest subunit (Mr ~180,000) contains the polymerization
activity. However, this polymerase has no
proofreading 3′→5′ exonuclease activity, making it unsuitable
for high-fidelity DNA replication. DNA polymerase α
is believed to function only in the synthesis
of short primers (containing either RNA or DNA) for
Okazaki fragments on the lagging strand. These primers
are then extended by the multisubunit DNA polymerase δ.
This enzyme is associated with and stimulated
by a protein called proliferating cell nuclear antigen
(PCNA; Mr 29,000), found in large amounts in the
nuclei of proliferating cells. The three-dimensional
structure of PCNA is remarkably similar to that of the
β subunit of E. coli DNA polymerase III (Fig. 25–10b),
although primary sequence homology is not evident.
PCNA has a function analogous to that of the β subunit,
forming a circular clamp that greatly enhances the
processivity of the polymerase. DNA polymerase δ has
a 3′→5′ proofreading exonuclease activity and appears
to carry out both leading and lagging strand synthesis
in a complex comparable to the dimeric bacterial DNA
polymerase III.
Yet another polymerase, DNA polymerase ε, replaces
DNA polymerase δ in some situations, such as in
DNA repair. DNA polymerase ε may also function at the
replication fork, perhaps playing a role analogous to that
of the bacterial DNA polymerase I, removing the primers
of Okazaki fragments on the lagging strand.
[Figure 25–18 labels: bacterium, origin, terminator, replisome, chromosome; stages shown: replication begins, origins separate, cell elongates as replication continues, chromosomes separate, cells divide.]
FIGURE 25–18 Chromosome partitioning
in bacteria. (a) All replication is carried
out at a central replication factory that
includes two complete replication forks.
(b) The two replicated copies of the
bacterial chromosome are extruded from
the replication factory into the two halves
of the cell, possibly with each newly
synthesized origin bound separately to
different points on the plasma membrane.
Sequestering the two chromosome copies
in separate cell halves facilitates their
proper segregation at cell division.
Many DNA viruses encode their own DNA poly-
merases, and some of these have become targets for
pharmaceuticals. For example, the DNA polymerase of
the herpes simplex virus is inhibited by acyclovir, a com-
pound developed by Gertrude Elion (p. 876). Acyclovir
consists of guanine attached to an incomplete ribose
ring. It is phosphorylated by a virally encoded thymi-
dine kinase; acyclovir binds to this viral enzyme with an
affinity 200-fold greater than its binding to the cellular
thymidine kinase. This ensures that phosphorylation oc-
curs mainly in virus-infected cells. Cellular kinases con-
vert the resulting acyclo-GMP to acyclo-GTP, which is
both an inhibitor and a substrate of DNA polymerases,
and which competitively inhibits the herpes DNA poly-
merase more strongly than cellular DNA polymerases.
Because it lacks a 3′ hydroxyl, acyclo-GTP also acts as
a chain terminator when incorporated into DNA. Thus
viral replication is inhibited at several steps.
Two other protein complexes also function in eu-
karyotic DNA replication. RPA (replication protein A)
is a eukaryotic single-stranded DNA–binding protein,
equivalent in function to the E. coli SSB protein. RFC
(replication factor C) is a clamp loader for PCNA and
facilitates the assembly of active replication complexes.
The subunits of the RFC complex have significant sequence
similarity to the subunits of the bacterial clamp-loading
(γ) complex.
The termination of replication on linear eukaryotic
chromosomes involves the synthesis of special struc-
tures called telomeres at the ends of each chromo-
some, as discussed in the next chapter.
SUMMARY 25.1 DNA Replication
■ Replication of DNA occurs with very high
fidelity and at a designated time in the cell
cycle. Replication is semiconservative, each
strand acting as template for a new daughter
strand. It is carried out in three identifiable
phases: initiation, elongation, and termination.
The reaction starts at the origin and usually
proceeds bidirectionally.
■ DNA is synthesized in the 5′→3′ direction by
DNA polymerases. At the replication fork, the
leading strand is synthesized continuously in
the same direction as replication fork
movement; the lagging strand is synthesized
discontinuously as Okazaki fragments, which
are subsequently ligated.
■ The fidelity of DNA replication is maintained
by (1) base selection by the polymerase, (2) a
3′→5′ proofreading exonuclease activity that is
part of most DNA polymerases, and (3) specific
repair systems for mismatches left behind after
replication.
■ Most cells have several DNA polymerases. In
E. coli, DNA polymerase III is the primary
replication enzyme. DNA polymerase I is
responsible for special functions during
replication, recombination, and repair.
■ Replication of the E. coli chromosome involves
many enzymes and protein factors organized in
replication factories, in which template DNA is
spooled through two replisomes tethered to the
bacterial plasma membrane.
■ Replication is similar in eukaryotic cells, but
eukaryotic chromosomes have many replication
origins.
25.2 DNA Repair
A cell generally has only one or two sets of genomic
DNA. Damaged proteins and RNA molecules can be
quickly replaced by using information encoded in the
DNA, but DNA molecules themselves are irreplaceable.
Maintaining the integrity of the information in DNA is a
cellular imperative, supported by an elaborate set of
DNA repair systems. DNA can become damaged by a
variety of processes, some spontaneous, others cat-
alyzed by environmental agents (Chapter 8). Replica-
tion itself can very occasionally damage the information
content in DNA when errors introduce mismatched base
pairs (such as G paired with T).
The chemistry of DNA damage is diverse and com-
plex. The cellular response to this damage includes a
wide range of enzymatic systems that catalyze some of
the most interesting chemical transformations in DNA
metabolism. We first examine the effects of alterations
in DNA sequence and then consider specific repair
systems.
Mutations Are Linked to Cancer
The best way to illustrate the importance of DNA repair
is to consider the effects of unrepaired DNA damage
(a lesion). The most serious outcome is a change in the
base sequence of the DNA, which, if replicated and
transmitted to future cell generations, becomes perma-
nent. A permanent change in the nucleotide sequence
of DNA is called a mutation. Mutations can involve the
replacement of one base pair with another (substitution
mutation) or the addition or deletion of one or more
base pairs (insertion or deletion mutations). If the mu-
tation affects nonessential DNA or if it has a negligible
effect on the function of a gene, it is known as a silent
mutation. Rarely, a mutation confers some biological
advantage. Most nonsilent mutations, however, are
deleterious.
In mammals there is a strong correlation between
the accumulation of mutations and cancer. A simple test
developed by Bruce Ames measures the potential of a
given chemical compound to promote certain easily de-
tected mutations in a specialized bacterial strain (Fig.
25–19). Few of the chemicals that we encounter in daily
life score as mutagens in this test. However, of the com-
pounds known to be carcinogenic from extensive animal
trials, more than 90% are also found to be mutagenic in
the Ames test. Because of this strong correlation be-
tween mutagenesis and carcinogenesis, the Ames test
for bacterial mutagens is widely used as a rapid and in-
expensive screen for potential human carcinogens.
The genome of a typical mammalian cell accumulates
many thousands of lesions during a 24-hour period.
However, as a result of DNA repair, fewer than 1 in 1,000
becomes a mutation. DNA is a relatively stable mole-
cule, but in the absence of repair systems, the cumula-
tive effect of many infrequent but damaging reactions
would make life impossible.
All Cells Have Multiple DNA Repair Systems
The number and diversity of repair systems reflect both
the importance of DNA repair to cell survival and the
diverse sources of DNA damage (Table 25–5). Some
common types of lesions, such as pyrimidine dimers
(see Fig. 8–34), can be repaired by several distinct sys-
tems. Many DNA repair processes also appear to be ex-
traordinarily inefficient energetically—an exception to
FIGURE 25–19 Ames test for carcinogens, based on their muta-
genicity. A strain of Salmonella typhimurium having a mutation that
inactivates an enzyme of the histidine biosynthetic pathway is plated
on a histidine-free medium. Few cells grow. (a) The few small colonies
of S. typhimurium that do grow on a histidine-free medium carry spon-
taneous back-mutations that permit the histidine biosynthetic pathway
to operate. Three identical nutrient plates (b), (c), and (d) have been
inoculated with an equal number of cells. Each plate then receives a
disk of filter paper containing progressively lower concentrations of a
mutagen. The mutagen greatly increases the rate of back-mutation and
hence the number of colonies. The clear areas around the filter paper
indicate where the concentration of mutagen is so high that it is lethal
to the cells. As the mutagen diffuses away from the filter paper, it is
diluted to sublethal concentrations that promote back-mutation. Mu-
tagens can be compared on the basis of their effect on mutation rate.
Because many compounds undergo a variety of chemical transforma-
tions after entering a cell, compounds are sometimes tested for mu-
tagenicity after first incubating them with a liver extract. Some sub-
stances have been found to be mutagenic only after this treatment.
TABLE 25–5 Types of DNA Repair Systems in E. coli

Enzymes/proteins                     Type of damage

Mismatch repair
  Dam methylase                      Mismatches
  MutH, MutL, MutS proteins
  DNA helicase II
  SSB
  DNA polymerase III
  Exonuclease I
  Exonuclease VII
  RecJ nuclease
  Exonuclease X
  DNA ligase

Base-excision repair
  DNA glycosylases                   Abnormal bases (uracil, hypoxanthine, xanthine);
                                     alkylated bases; in some other organisms,
                                     pyrimidine dimers
  AP endonucleases
  DNA polymerase I
  DNA ligase

Nucleotide-excision repair
  ABC excinuclease                   DNA lesions that cause large structural changes
                                     (e.g., pyrimidine dimers)
  DNA polymerase I
  DNA ligase

Direct repair
  DNA photolyases                    Pyrimidine dimers
  O⁶-Methylguanine-DNA               O⁶-Methylguanine
    methyltransferase
  AlkB protein                       1-Methylguanine, 3-methylcytosine
the pattern observed in the metabolic pathways, where
every ATP is generally accounted for and used optimally.
When the integrity of the genetic information is at stake,
the amount of chemical energy invested in a repair
process seems almost irrelevant.
DNA repair is possible largely because the DNA mol-
ecule consists of two complementary strands. DNA
damage in one strand can be removed and accurately
replaced by using the undamaged complementary strand
as a template. We consider here the principal types of
repair systems, beginning with those that repair the
rare nucleotide mismatches that are left behind by repli-
cation.
Mismatch Repair Correction of the rare mismatches left
after replication in E. coli improves the overall fidelity
of replication by an additional factor of 10² to 10³. The
mismatches are nearly always corrected to reflect the
information in the old (template) strand, so the repair
system must somehow discriminate between the tem-
plate and the newly synthesized strand. The cell ac-
complishes this by tagging the template DNA with
methyl groups to distinguish it from newly synthesized
strands. The mismatch repair system of E. coli includes
at least 12 protein components (Table 25–5) that func-
tion either in strand discrimination or in the repair
process itself.
The strand discrimination mechanism has not been
worked out for most bacteria or eukaryotes, but is well
understood for E. coli and some closely related bacte-
ria. In these prokaryotes, strand discrimination is based
on the action of Dam methylase (Table 25–3), which, as
you will recall, methylates DNA at the N⁶ position of all
adenines within (5′)GATC sequences. Immediately after
passage of the replication fork, there is a short period
(a few seconds or minutes) during which the template
strand is methylated but the newly synthesized
strand is not (Fig. 25–20). The transient unmethylated
state of GATC sequences in the newly synthesized
strand permits the new strand to be distinguished from
the template strand. Replication mismatches in the
vicinity of a hemimethylated GATC sequence are then
repaired according to the information in the methylated
parent (template) strand. Tests in vitro show that if both
strands are methylated at a GATC sequence, few mis-
matches are repaired; if neither strand is methylated,
repair occurs but does not favor either strand. The cell’s
[Figure 25–20 diagram. A GATC/CTAG duplex methylated (CH₃) on both strands undergoes replication, giving hemimethylated DNA: for a short period following replication, the template strand is methylated and the new strand is not. Dam methylase then acts; after a few minutes the new strand is methylated and the two strands can no longer be distinguished.]
FIGURE 25–20 Methylation and mismatch repair. Methylation of
DNA strands can serve to distinguish parent (template) strands from
newly synthesized strands in E. coli DNA, a function that is critical to
mismatch repair (see Fig. 25–21). The methylation occurs at the N⁶ of
adenines in (5′)GATC sequences. This sequence is a palindrome (see
Fig. 8–20), present in opposite orientations on the two strands.
[Figure 25–21 diagram. Labels: MutS, MutL, MutL-MutS complex, MutH, ATP → ADP + Pᵢ, methylated (CH₃) GATC/CTAG sites; "MutH cleaves the unmodified strand."]
methyl-directed mismatch repair system efficiently re-
pairs mismatches up to 1,000 bp from a hemimethylated
GATC sequence. For many bacterial species, the mech-
anism of strand discrimination during mismatch repair
has not been determined.
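The strand-discrimination rule described above can be sketched in a few lines of Python. This is an illustrative toy, not a model of the actual proteins: the function names, the 0-based coordinates, and the use of a set of positions to record methylation on the new strand are all assumptions made for this example. The template strand is taken to be fully methylated, so a GATC site is hemimethylated whenever it is still unmethylated on the new strand, and the 1,000 bp limit is the approximate range quoted in the text.

```python
from typing import List, Set

MAX_DISTANCE_BP = 1000  # approximate repair range quoted in the text


def gatc_sites(new_strand: str) -> List[int]:
    """Return 0-based start positions of every GATC in a strand read 5'->3'."""
    return [i for i in range(len(new_strand) - 3) if new_strand[i:i + 4] == "GATC"]


def usable_gatc_sites(new_strand: str, mismatch_pos: int,
                      methylated_on_new: Set[int]) -> List[int]:
    """GATC sites that could direct repair of a mismatch at mismatch_pos.

    A site qualifies if it is not yet methylated on the new strand
    (hypothetical bookkeeping: positions listed in methylated_on_new are
    treated as fully methylated) and lies within MAX_DISTANCE_BP.
    """
    sites = []
    for site in gatc_sites(new_strand):
        if site in methylated_on_new:
            continue  # both strands methylated: strands no longer distinguishable
        if abs(site - mismatch_pos) <= MAX_DISTANCE_BP:
            sites.append(site)
    return sites


# Toy new strand with one GATC at position 2 and another at position 2008.
strand = "AAGATCTT" + "A" * 2000 + "GATC"
print(usable_gatc_sites(strand, 20, set()))  # [2]  (the site at 2008 is too far away)
print(usable_gatc_sites(strand, 20, {2}))    # []   (the nearby site is already methylated)
```

Because GATC is a palindrome, a position found on the new strand marks the same site on the template strand, which is why a single coordinate per site is enough in this sketch.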
How is the mismatch correction process directed
by relatively distant GATC sequences? A mechanism is
illustrated in Figure 25–21. MutL protein forms a com-
plex with MutS protein, and the complex binds to all
mismatched base pairs (except C–C). MutH protein
binds to MutL and to GATC sequences encountered by
the MutL-MutS complex. DNA on both sides of the mis-
match is threaded through the MutL-MutS complex, cre-
ating a DNA loop; simultaneous movement of both legs
of the loop through the complex is equivalent to the
complex moving in both directions at once along the
DNA. MutH has a site-specific endonuclease activity that
is inactive until the complex encounters a hemimethylated
GATC sequence. At this site, MutH catalyzes cleavage
of the unmethylated strand on the 5′ side of the G
in GATC, which marks the strand for repair. Further
steps in the pathway depend on where the mismatch is
located relative to this cleavage site (Fig. 25–22).
When the mismatch is on the 5′ side of the cleavage
site, the unmethylated strand is unwound and degraded
in the 3′→5′ direction from the cleavage site
through the mismatch, and this segment is replaced with
new DNA. This process requires the combined action of
DNA helicase II, SSB, exonuclease I or exonuclease X
(both of which degrade strands of DNA in the 3′→5′
direction), DNA polymerase III, and DNA ligase. The
pathway for repair of mismatches on the 3′ side of the
cleavage site is similar, except that the exonuclease is
either exonuclease VII (which degrades single-stranded
DNA in the 5′→3′ or 3′→5′ direction) or RecJ nuclease
(which degrades single-stranded DNA in the 5′→3′
direction).
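The choice of exonuclease described in this paragraph amounts to a small lookup keyed on where the mismatch lies relative to the MutH cleavage site. The Python sketch below simply encodes that lookup; the dictionary and function names are invented for illustration, and the string labels "5'" and "3'" are an assumed way of naming the two cases.

```python
from typing import Dict, List

# Components the text lists for both orientations of the mismatch.
COMMON_COMPONENTS: List[str] = [
    "DNA helicase II", "SSB", "DNA polymerase III", "DNA ligase",
]

# Exonucleases used, keyed by the side of the cleavage site on which the mismatch lies.
EXONUCLEASES: Dict[str, List[str]] = {
    "5'": ["exonuclease I", "exonuclease X"],    # both degrade DNA 3'->5'
    "3'": ["exonuclease VII", "RecJ nuclease"],  # both can degrade DNA 5'->3'
}

# Direction in which the unmethylated strand is degraded from the cleavage site.
DEGRADATION_DIRECTION: Dict[str, str] = {"5'": "3'->5'", "3'": "5'->3'"}


def repair_components(mismatch_side: str) -> List[str]:
    """Return the proteins named in the text for a mismatch on the given side."""
    if mismatch_side not in EXONUCLEASES:
        raise ValueError("mismatch_side must be \"5'\" or \"3'\"")
    return COMMON_COMPONENTS + EXONUCLEASES[mismatch_side]


for side in ("5'", "3'"):
    print(side, DEGRADATION_DIRECTION[side], repair_components(side))
```

In each case only one of the two listed exonucleases is needed; the sketch returns both as alternatives.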
Mismatch repair is a particularly expensive process
for E. coli in terms of energy expended. The mismatch
may be 1,000 bp or more from the GATC sequence. The
degradation and replacement of a strand segment of this
length require an enormous investment in activated de-
oxynucleotide precursors to repair a single mismatched
base. This again underscores the importance to the cell
of genomic integrity.
All eukaryotic cells have several proteins struc-
turally and functionally analogous to the bacterial MutS
and MutL (but not MutH) proteins. Alterations in hu-
man genes encoding proteins of this type produce some
of the most common inherited cancer-susceptibility syn-
dromes (Box 25–1), further demonstrating the value to
the organism of DNA repair systems. The main MutS ho-
mologs in most eukaryotes, from yeast to humans, are
MSH2 (MutS homolog 2), MSH3, and MSH6. Heterodimers
of MSH2 and MSH6 generally bind to single
FIGURE 25–21 A model for the early steps of methyl-directed mis-
match repair. The proteins involved in this process in E. coli have been
purified (see Table 25–5). Recognition of the sequence (5′)GATC and
of the mismatch are specialized functions of the MutH and MutS pro-
teins, respectively. The MutL protein forms a complex with MutS at
the mismatch. DNA is threaded through this complex such that the
complex moves simultaneously in both directions along the DNA until
it encounters a MutH protein bound at a hemimethylated GATC sequence.
MutH cleaves the unmethylated strand on the 5′ side of the
G in this sequence. A complex consisting of DNA helicase II and one
of several exonucleases then degrades the unmethylated DNA strand
from that point toward the mismatch (see Fig. 25–22).
BOX 25–1 BIOCHEMISTRY IN MEDICINE
DNA Repair and Cancer
Human cancer develops when certain genes that reg-
ulate normal cell division (oncogenes and tumor
suppressor genes; Chapter 12) fail to function, are ac-
tivated at the wrong time, or are altered. As a conse-
quence, cells may grow out of control and form a
tumor. The genes controlling cell division can be dam-
aged by spontaneous mutation or overridden by the
invasion of a tumor virus (Chapter 26). Not surpris-
ingly, alterations in DNA-repair genes that result in an
increase in the rate of mutation can greatly increase
an individual’s susceptibility to cancer. Defects in the
genes encoding the proteins involved in nucleotide-
excision repair, mismatch repair, recombinational re-
pair, and error-prone translesion synthesis have all
been linked to human cancers. Clearly, DNA repair can
be a matter of life and death.
Nucleotide-excision repair requires a larger num-
ber of proteins in humans than in bacteria, although
the overall pathways are very similar. Genetic defects
that inactivate nucleotide-excision repair have been
associated with several genetic diseases, the best-
studied of which is xeroderma pigmentosum, or XP.
Because nucleotide-excision repair is the sole repair
pathway for pyrimidine dimers in humans, people with
XP are extremely light sensitive and readily develop
sunlight-induced skin cancers. Most people with XP
also have neurological abnormalities, presumably be-
cause of their inability to repair certain lesions caused
by the high rate of oxidative metabolism in neurons.
Defects in the genes encoding any of at least seven
different protein components of the nucleotide-
excision repair system can result in XP, giving rise
to seven different genetic groups denoted XPA to
XPG. Several of these proteins (notably XPB, XPD,
and XPG) also play roles in transcription-coupled
base-excision repair of oxidative lesions, described in
Chapter 26.
Most microorganisms have redundant pathways
for the repair of cyclobutane pyrimidine dimers—
making use of DNA photolyase and sometimes base-
excision repair as alternatives to nucleotide-excision
repair—but humans and other placental mammals do
not. This lack of a back-up to nucleotide-excision
repair for the removal of pyrimidine dimers has led to
speculation that early mammalian evolution involved
small, furry, nocturnal animals with little need to repair
UV damage. However, mammals do have a pathway
for the translesion bypass of cyclobutane pyrimidine
dimers, which involves DNA polymerase η. This
enzyme preferentially inserts two A residues opposite
a T–T pyrimidine dimer, minimizing mutations. Peo-
ple with a genetic condition in which DNA polymerase
η function is missing exhibit an XP-like illness known
as XP-variant or XP-V. Clinical manifestations of XP-
V are similar to those of the classic XP diseases, al-
though mutation levels are higher when cells are ex-
posed to UV light. Apparently, the nucleotide-excision
repair system works in concert with DNA polymerase
η in normal human cells, repairing and/or bypassing
pyrimidine dimers as needed to keep cell growth and
DNA replication going. Exposure to UV light intro-
duces a heavy load of pyrimidine dimers, requiring
that some be bypassed by translesion synthesis to
keep replication on track. When either system is miss-
ing, it is partly compensated for by the other. A loss
of polymerase η activity leads to stalled replication
forks and bypass of UV lesions by different, and more
mutagenic, translesion synthesis (TLS) polymerases.
As when other DNA repair systems are absent, the
resulting increase in mutations often leads to cancer.
One of the most common inherited cancer-sus-
ceptibility syndromes is hereditary nonpolyposis colon
cancer, or HNPCC. This syndrome has been traced to
defects in mismatch repair. Human and other eukary-
otic cells have several proteins analogous to the bac-
terial MutL and MutS proteins (see Fig. 25–21). De-
fects in at least five different mismatch repair genes
can give rise to HNPCC. The most prevalent are de-
fects in the hMLH1 (human MutL homolog 1) and
hMSH2 (human MutS homolog 2) genes. In individu-
als with HNPCC, cancer generally develops at an early
age, with colon cancers being most common.
Most human breast cancer occurs in women with
no known predisposition. However, about 10% of cases
are associated with inherited defects in two genes,
BRCA1 and BRCA2. BRCA1 and BRCA2 are large pro-
teins (human BRCA1 and BRCA2 are 1834 and 3418
amino acid residues long, respectively). They both in-
teract with a wide range of other proteins involved in
transcription, chromosome maintenance, DNA repair,
and control of the cell cycle. However, the precise
molecular function of BRCA1 and BRCA2 in these
various cellular processes is not yet clear. Women with
defects in either the BRCA1 or BRCA2 gene have a
greater than 80% chance of developing breast cancer.
base-pair mismatches, and bind less well to slightly
longer mispaired loops. In many organisms the longer
mismatches (2 to 6 bp) may be bound instead by a het-
erodimer of MSH2 and MSH3, or are bound by both
types of heterodimers in tandem. Homologs of MutL,
predominantly a heterodimer of MLH1 and PMS1 (post-
meiotic segregation), bind to and stabilize the MSH com-
plexes. Many details of the subsequent events in eu-
karyotic mismatch repair remain to be worked out. In
particular, we do not know the mechanism by which
newly synthesized DNA strands are identified, although
research has revealed that this strand identification
does not involve GATC sequences.
Base-Excision Repair Every cell has a class of enzymes
called DNA glycosylases that recognize particularly
common DNA lesions (such as the products of cytosine
and adenine deamination; see Fig. 8–33a) and remove
the affected base by cleaving the N-glycosyl bond. This
cleavage creates an apurinic or apyrimidinic site in the
DNA, commonly referred to as an AP site or abasic
site. Each DNA glycosylase is generally specific for one
type of lesion.
Uracil DNA glycosylases, for example, found in most
cells, specifically remove from DNA the uracil that re-
sults from spontaneous deamination of cytosine. Mutant
cells that lack this enzyme have a high rate of G≡C to
A=T mutations. This glycosylase does not remove uracil
residues from RNA or thymine residues from DNA. The
capacity to distinguish thymine from uracil, the product
of cytosine deamination—necessary for the selective re-
pair of the latter—may be one reason why DNA evolved
to contain thymine instead of uracil (p. 293).
Bacteria generally have just one type of uracil DNA
glycosylase, whereas humans have at least four types,
with different specificities—an indicator of the impor-
tance of uracil removal from DNA. The most abundant
human uracil glycosylase, UNG, is associated with the
human replisome, where it eliminates the occasional U
residue inserted in place of a T during replication. The
deamination of C residues is 100-fold faster in single-
stranded DNA than in double-stranded DNA, and
[Figure 25–22 diagram. Labels: MutS, MutL, MutH, ATP → ADP + Pᵢ. One branch: MutL-MutS, DNA helicase II, and exonuclease VII or RecJ nuclease. Other branch: MutL-MutS, DNA helicase II, and exonuclease I or exonuclease X. Both branches: DNA polymerase III and SSB.]
FIGURE 25–22 Completing methyl-directed mismatch repair. The
combined action of DNA helicase II, SSB, and one of four different
exonucleases removes a segment of the new strand between the MutH
cleavage site and a point just beyond the mismatch. The exonuclease
that is used depends on the location of the cleavage site relative to
the mismatch. The resulting gap is filled in by DNA polymerase III,
and the nick is sealed by DNA ligase (not shown).
humans have the enzyme hSMUG1, which removes any
U residues that occur in single-stranded DNA during
replication or transcription. Two other human DNA
glycosylases, TDG and MBD4, remove either U or T
residues paired with G, generated by deamination of
cytosine or 5-methylcytosine, respectively.
Other DNA glycosylases recognize and remove a va-
riety of damaged bases, including formamidopyrimidine
and 8-hydroxyguanine (both arising from purine oxida-
tion), hypoxanthine (arising from adenine deamina-
tion), and alkylated bases such as 3-methyladenine and
7-methylguanine. Glycosylases that recognize other le-
sions, including pyrimidine dimers, have also been iden-
tified in some classes of organisms. Remember that AP
sites also arise from the slow, spontaneous hydrolysis of
the N-glycosyl bonds in DNA (see Fig. 8–33b).
Once an AP site has formed, another group of en-
zymes must repair it. The repair is not made by simply
inserting a new base and re-forming the N-glycosyl
bond. Instead, the deoxyribose 5′-phosphate left behind
is removed and replaced with a new nucleotide. This
process begins with AP endonucleases, enzymes that
cut the DNA strand containing the AP site. The position
of the incision relative to the AP site (5′ or 3′ to the
site) varies with the type of AP endonuclease. A seg-
ment of DNA including the AP site is then removed,
DNA polymerase I replaces the DNA, and DNA ligase
seals the remaining nick (Fig. 25–23). In eukaryotes, nu-
cleotide replacement is carried out by specialized poly-
merases, as described below.
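The sequence of steps in base-excision repair can be illustrated with a deliberately simplified Python sketch. Everything here is an assumption for the sake of illustration: strands are plain strings aligned position by position (strand polarity is ignored), "U" marks a deaminated cytosine, "_" marks an AP site, and the AP endonuclease, DNA polymerase I, and DNA ligase steps are collapsed into a single function that replaces only the abasic position, whereas the real polymerase replaces a short stretch of DNA.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
AP_SITE = "_"  # placeholder character for an abasic (AP) site


def uracil_glycosylase(strand: str) -> str:
    """Step 1: remove each uracil base, leaving an AP site in its place."""
    return strand.replace("U", AP_SITE)


def fill_ap_sites(strand: str, partner: str) -> str:
    """Steps 2 to 4, collapsed: restore each AP site from the undamaged partner strand."""
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length")
    return "".join(
        COMPLEMENT[p] if s == AP_SITE else s
        for s, p in zip(strand, partner)
    )


def base_excision_repair(strand: str, partner: str) -> str:
    """Run the toy pathway: glycosylase first, then gap filling from the partner."""
    return fill_ap_sites(uracil_glycosylase(strand), partner)


damaged = "ATGUCA"  # the C originally at index 3 has been deaminated to U
partner = "TACGGT"  # undamaged complementary strand, aligned position by position
print(uracil_glycosylase(damaged))             # ATG_CA
print(base_excision_repair(damaged, partner))  # ATGCCA
```

The point of the sketch is that the information used to restore the base comes entirely from the undamaged partner strand, which still carries G opposite the lesion.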
Nucleotide-Excision Repair DNA lesions that cause large
distortions in the helical structure of DNA generally are
repaired by the nucleotide-excision system, a repair
pathway critical to the survival of all free-living organ-
isms. In nucleotide-excision repair (Fig. 25–24), a mul-
tisubunit enzyme hydrolyzes two phosphodiester bonds,
one on either side of the distortion caused by the lesion.
In E. coli and other prokaryotes, the enzyme system hydrolyzes
the fifth phosphodiester bond on the 3′ side
and the eighth phosphodiester bond on the 5′ side to
generate a fragment of 12 to 13 nucleotides (depending
on whether the lesion involves one or two bases). In humans
and other eukaryotes, the enzyme system hydrolyzes
the sixth phosphodiester bond on the 3′ side
and the twenty-second phosphodiester bond on the 5′
side, producing a fragment of 27 to 29 nucleotides. Following
the dual incision, the excised oligonucleotides
are released from the duplex and the resulting gap is
filled, by DNA polymerase I in E. coli and by DNA polymerase
ε in humans. DNA ligase seals the nick.
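The fragment sizes quoted above follow from counting nucleotides between the two incisions. The short Python sketch below works through that count under one stated assumption: the "nth phosphodiester bond" is counted outward from the lesion, so cutting the nth bond leaves n − 1 undamaged nucleotides on that side of the lesion in the excised fragment. The function name and its parameters are invented for this example.

```python
def excised_fragment_length(bond_5prime: int, bond_3prime: int, lesion_bases: int) -> int:
    """Nucleotides in the excised fragment.

    bond_5prime / bond_3prime: which phosphodiester bond (counted outward
    from the lesion) is cut on each side; lesion_bases: 1 or 2 damaged bases.
    """
    return (bond_5prime - 1) + lesion_bases + (bond_3prime - 1)


# E. coli: eighth bond on the 5' side, fifth bond on the 3' side.
print(excised_fragment_length(8, 5, 1))   # 12
print(excised_fragment_length(8, 5, 2))   # 13

# Humans: twenty-second bond on the 5' side, sixth bond on the 3' side.
print(excised_fragment_length(22, 6, 1))  # 27
print(excised_fragment_length(22, 6, 2))  # 28
```

For E. coli this reproduces the 12 to 13 nucleotides given in the text. For humans the idealized count gives 27 or 28, which falls inside the quoted range of 27 to 29; the sketch does not attempt to account for the upper end of that range.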
In E. coli, the key enzymatic complex is the ABC
excinuclease, which has three subunits, UvrA (Mᵣ 104,000),
UvrB (Mᵣ 78,000), and UvrC (Mᵣ 68,000). The
FIGURE 25–23 DNA repair by the base-excision repair pathway.
(1) A DNA glycosylase recognizes a damaged base and cleaves
between the base and deoxyribose in the backbone. (2) An AP endonuclease
cleaves the phosphodiester backbone near the AP site.
(3) DNA polymerase I initiates repair synthesis from the free 3′ hydroxyl
at the nick, removing (with its 5′→3′ exonuclease activity) a
portion of the damaged strand and replacing it with undamaged DNA.
(4) The nick remaining after DNA polymerase I has dissociated is
sealed by DNA ligase.
[Figure 25–23 diagram. Labels, in step order 1 to 4: damaged base; DNA glycosylase; AP endonuclease; DNA polymerase I (NTPs in; deoxyribose phosphate + dNMPs out); new DNA; nick; DNA ligase.]
[Figure 25–24 diagram. Labels, in step order 1 to 4 for each pathway: DNA lesion; E. coli excinuclease (13 mer) and human excinuclease (29 mer); DNA helicase; DNA polymerase I (E. coli) and DNA polymerase ε (human); DNA ligase.]
FIGURE 25–24 Nucleotide-excision repair in E. coli and humans. The general pathway of
nucleotide-excision repair is similar in all organisms. (1) An excinuclease binds to DNA at the site
of a bulky lesion and cleaves the damaged DNA strand on either side of the lesion. (2) The DNA
segment, of 13 nucleotides (13 mer) or 29 nucleotides (29 mer), is removed with the aid of a
helicase. (3) The gap is filled in by DNA polymerase, and (4) the remaining nick is sealed with
DNA ligase.
term “excinuclease” is used to describe the unique ca-
pacity of this enzyme complex to catalyze two specific
endonucleolytic cleavages, distinguishing this activity
from that of standard endonucleases. A complex of the
UvrA and UvrB proteins (A₂B) scans the DNA and binds
to the site of a lesion. The UvrA dimer then dissociates,
leaving a tight UvrB-DNA complex. UvrC protein then
binds to UvrB, and UvrB makes an incision at the fifth
phosphodiester bond on the 3′ side of the lesion. This
is followed by a UvrC-mediated incision at the eighth
phosphodiester bond on the 5′ side. The resulting 12 to
13 nucleotide fragment is removed by UvrD helicase.
The short gap thus created is filled in by DNA poly-
merase I and DNA ligase. This pathway is a primary
repair route for many types of lesions, including cyclo-
butane pyrimidine dimers, 6-4 photoproducts (see Fig.
8–34), and several other types of base adducts includ-
ing benzo[a]pyrene-guanine, which is formed in DNA by
exposure to cigarette smoke. The nucleolytic activity of
the ABC excinuclease is novel in the sense that two cuts
are made in the DNA (Fig. 25–24).
The mechanism of eukaryotic excinucleases is quite
similar to that of the bacterial enzyme, although 16 poly-
peptides with no similarity to the E. coli excinuclease
subunits are required for the dual incision. As described
in Chapter 26, some of the nucleotide-excision repair
and base-excision repair in eukaryotes is closely tied
to transcription. Genetic deficiencies in nucleotide-
excision repair in humans give rise to a variety of serious
diseases (Box 25–1).
age (Fig. 25–25). Photolyases generally contain two
cofactors that serve as light-absorbing agents, or chromophores.
One of the chromophores is always FADH⁻.
In E. coli and yeast, the other chromophore is a folate.
The reaction mechanism entails the generation of free
radicals. DNA photolyases are not present in the cells
of placental mammals (which include humans).
[Figure 25–25 diagram. Labels, steps 1 to 5: light; MTHFpolyGlu and excited *MTHFpolyGlu; FADH⁻ and excited *FADH⁻; flavin radical FADH•; electron (e⁻) transfers; cyclobutane pyrimidine dimer; monomeric pyrimidines in repaired DNA.]
MECHANISM FIGURE 25–25 Repair of pyrimidine dimers with photolyase.
Energy derived from absorbed light is used to reverse the photoreaction
that caused the lesion. The two chromophores in E. coli
photolyase (Mᵣ 54,000), N⁵,N¹⁰-methenyltetrahydrofolylpolyglutamate
(MTHFpolyGlu) and FADH⁻, perform complementary functions.
On binding of photolyase to a pyrimidine dimer, repair proceeds as
follows. (1) A blue-light photon (300 to 500 nm wavelength) is absorbed
by the MTHFpolyGlu, which functions as a photoantenna.
(2) The excitation energy passes to FADH⁻ in the active site of the
enzyme. (3) The excited flavin (*FADH⁻) donates an electron to the
pyrimidine dimer (shown here in a simplified representation) to generate
an unstable dimer radical. (4) Electronic rearrangement restores
the monomeric pyrimidines, and (5) the electron is transferred back
to the flavin radical to regenerate FADH⁻.
Direct Repair Several types of damage are repaired
without removing a base or nucleotide. The best-char-
acterized example is direct photoreactivation of cy-
clobutane pyrimidine dimers, a reaction promoted by
DNA photolyases. Pyrimidine dimers result from an
ultraviolet light–induced reaction, and photolyases use
energy derived from absorbed light to reverse the dam-
[Figure 25–26 diagram. (a) Structures of guanine, O⁶-methylguanine, cytosine, and thymine, showing O⁶-methylguanine paired with thymine. (b) A G–C pair is methylated; replication of the methylated pair yields a methyl-G–T pair and a correctly paired daughter (no mutations); further replication yields an A–T pair.]
FIGURE 25–26 Example of how DNA damage results in mutations.
(a) The methylation product O⁶-methylguanine pairs with thymine
rather than cytosine. (b) If not repaired, this leads to a G≡C to A=T
mutation after replication.
Additional examples can be seen in the repair of
nucleotides with alkylation damage. The modified nucleotide
O⁶-methylguanine forms in the presence of
alkylating agents and is a common and highly mutagenic
lesion (p. 295). It tends to pair with thymine rather than
cytosine during replication, and therefore causes G≡C
to A=T mutations (Fig. 25–26). Direct repair of
O⁶-methylguanine is carried out by O⁶-methylguanine-DNA
methyltransferase, a protein that catalyzes transfer of
the methyl group of O⁶-methylguanine to one of its own
Cys residues. This methyltransferase is not strictly an
enzyme, because a single methyl transfer event perma-
nently methylates the protein, making it inactive in this
pathway. The consumption of an entire protein mole-
cule to correct a single damaged base is another vivid
illustration of the priority given to maintaining the in-
tegrity of cellular DNA.
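The way an unrepaired O⁶-methylguanine becomes a permanent G≡C to A=T mutation can be traced through two rounds of replication with a small Python sketch. This is a toy model under stated assumptions: "mG" is an invented label for O⁶-methylguanine, the lesion is taken to pair with thymine every time (the text says only that it tends to), strands are aligned position by position with polarity ignored, and the function names are hypothetical.

```python
from typing import List, Tuple

Strand = Tuple[str, ...]
Duplex = Tuple[Strand, Strand]

# "mG" stands for O6-methylguanine; here it is assumed always to pair with T.
PAIRS_WITH = {"A": "T", "T": "A", "G": "C", "C": "G", "mG": "T"}


def copy_strand(template: Strand) -> Strand:
    """Synthesize the partner strand dictated by the pairing rules above."""
    return tuple(PAIRS_WITH[base] for base in template)


def replicate(duplex: Duplex) -> List[Duplex]:
    """Semiconservative replication: each parent strand templates one new strand."""
    first, second = duplex
    return [(first, copy_strand(first)), (copy_strand(second), second)]


damaged = (("mG",), ("C",))          # a G-C pair whose G has been methylated
round_one = replicate(damaged)
round_two = replicate(round_one[0])  # follow the daughter that kept the lesion

print(round_one)  # [(('mG',), ('T',)), (('G',), ('C',))]
print(round_two)  # [(('mG',), ('T',)), (('A',), ('T',))]
```

After the first round, one daughter duplex is correctly paired and the other carries the lesion opposite T. After the second round, the T-containing strand has templated an A, giving the A=T pair in place of the original G≡C.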
[Reaction scheme: O⁶-methylguanine nucleotide + methyltransferase (active, Cys–SH) → guanine nucleotide + methyltransferase (inactive, Cys–S–CH₃).]
A very different but equally direct mechanism is
used to repair 1-methyladenine and 3-methylcytosine.
The amino groups of A and C residues are sometimes
methylated when the DNA is single-stranded, and the
methylation directly affects proper base pairing. In
E. coli, oxidative demethylation of these alkylated nu-
cleotides is mediated by the AlkB protein, a member of
the α-ketoglutarate-Fe²⁺–dependent dioxygenase superfamily
(Fig. 25–27). (See Box 4–3 for a description
of another member of this enzyme family.)
The Interaction of Replication Forks with DNA
Damage Can Lead to Error-Prone Translesion
DNA Synthesis
The repair pathways considered to this point generally
work only for lesions in double-stranded DNA, the un-
damaged strand providing the correct genetic informa-
tion to restore the damaged strand to its original state.
However, in certain types of lesions, such as double-
strand breaks, double-strand cross-links, or lesions in a
single-stranded DNA, the complementary strand is it-
self damaged or is absent. Double-strand breaks and le-
sions in single-stranded DNA most often arise when a
replication fork encounters an unrepaired DNA lesion
(Fig. 25–28). Such lesions and DNA cross-links can also
result from ionizing radiation and oxidative reactions.
At a stalled bacterial replication fork, there are two
avenues for repair. In the absence of a second strand,
the information required for accurate repair must come
from a separate, homologous chromosome. The repair
system thus involves homologous genetic recombina-
tion. This recombinational DNA repair is considered
in detail in Section 25.3. Under some conditions, a sec-
ond repair pathway, error-prone translesion DNA
synthesis (often abbreviated TLS), becomes available.
When this pathway is active, DNA repair becomes sig-
nificantly less accurate and a high mutation rate can re-
sult. In bacteria, error-prone translesion DNA synthesis
is part of a cellular stress response to extensive DNA
damage known, appropriately enough, as the SOS re-
sponse. Some SOS proteins, such as the UvrA and UvrB
proteins already described (Table 25–6), are normally
present in the cell but are induced to higher levels as
part of the SOS response. Additional SOS proteins par-
ticipate in the pathway for error-prone repair; these in-
clude the UmuC and UmuD proteins (“Umu” from un-
mutable; lack of the umu gene function eliminates
error-prone repair). The UmuD protein is cleaved in an
SOS-regulated process to a shorter form called UmuD′,
which forms a complex with UmuC to create a special-
ized DNA polymerase (DNA polymerase V) that can
replicate past many of the DNA lesions that would nor-
mally block replication. Proper base pairing is often im-
possible at the site of such a lesion, so this translesion
replication is error-prone.
Given the emphasis on the importance of genomic
integrity throughout this chapter, the existence of a sys-
tem that increases the rate of mutation may seem in-
congruous. However, we can think of this system as a
desperation strategy. The umuC and umuD genes are
fully induced only late in the SOS response, and they
are not activated for translesion synthesis initiated by
UmuD cleavage unless the levels of DNA damage are
particularly high and all replication forks are blocked.
The mutations resulting from DNA polymerase V–
mediated replication kill some cells and create deleteri-
ous mutations in others, but this is the biological price
an organism pays to overcome an otherwise insur-
mountable barrier to replication, as it permits at least a
few mutant cells to survive.
FIGURE 25–27 Direct repair of alkylated bases by AlkB. The AlkB
protein is an α-ketoglutarate-Fe²⁺–dependent dioxygenase. It
catalyzes the oxidative demethylation of 1-methyladenine and
3-methylcytosine residues. [Reaction scheme: 1-methyladenine is
converted to adenine and 3-methylcytosine to cytosine; in each
reaction AlkB, with Fe²⁺, consumes α-ketoglutarate and O₂ and
releases succinate, CO₂, and formaldehyde.]
In addition to DNA polymerase V, translesion replication
requires the RecA protein, SSB, and some subunits derived from
DNA polymerase III. Yet another DNA polymerase, DNA polymerase
IV, is also induced during the SOS response. Replication by DNA
polymerase IV, a product of the dinB gene, is also highly
error-prone. The bacterial DNA polymerases IV and V are part of
a family of TLS polymerases found in all organisms. These
enzymes lack a proofreading exonuclease activity, and the
fidelity of replicative base selection can be reduced by a
factor of 10², lowering overall replication fidelity to one
error in ~1,000 nucleotides.
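The arithmetic behind the fidelity estimate for TLS polymerases can be sketched in a few lines of Python. This is only an illustration: the baseline of one base-selection error per 10⁵ nucleotides is an assumption taken from typical figures for replicative polymerases, and the function names and patch lengths are hypothetical.

```python
def tls_error_rate(base_selection_error: float = 1e-5,
                   fidelity_reduction: float = 1e2) -> float:
    """Per-nucleotide error rate for a polymerase with no proofreading.

    base_selection_error is an assumed baseline (about one error per
    10^5 nucleotides); fidelity_reduction is the factor of 10^2 quoted
    in the text for TLS polymerases.
    """
    return base_selection_error * fidelity_reduction


def expected_errors(patch_length_nt: int, error_rate: float) -> float:
    """Expected number of misincorporations in a stretch of new DNA."""
    return patch_length_nt * error_rate


rate = tls_error_rate()
nucleotides_per_error = round(1 / rate)
# Hypothetical tract lengths, chosen only to contrast short and long synthesis.
errors_in_short_patch = expected_errors(10, rate)
errors_in_long_tract = expected_errors(5_000, rate)
print(nucleotides_per_error, errors_in_short_patch, errors_in_long_tract)
```

Under these assumptions, a short lesion-bypass patch is expected to carry far fewer errors than a long tract copied by the same enzyme, which is why the length of DNA synthesized matters as much as the per-nucleotide error rate.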
FIGURE 25–28 DNA damage and its effect on DNA replication. If
the replication fork encounters an unrepaired lesion or strand
break, replication generally halts and the fork may collapse. A
lesion is left behind in an unreplicated, single-stranded
segment of the DNA; a strand break becomes a double-strand
break. In each case, the damage to one strand cannot be
repaired by mechanisms described earlier in this chapter,
because the complementary strand required to direct accurate
repair is damaged or absent. There are two possible avenues for
repair: recombinational DNA repair (described in Fig. 25–37)
or, when lesions are unusually numerous, error-prone repair.
The latter mechanism involves a novel DNA polymerase (DNA
polymerase V, encoded by the umuC and umuD genes) that can
replicate, albeit inaccurately, over many types of lesions. The
repair mechanism is referred to as error-prone because
mutations often result. [Figure labels: an unrepaired lesion
leaves single-stranded DNA, repaired by recombinational DNA
repair or error-prone repair; an unrepaired break becomes a
double-strand break, repaired by recombinational DNA repair.]
TABLE 25–6 Genes Induced as Part of the SOS Response in E. coli
Gene name       Protein encoded and/or role in DNA repair
Genes of known function
  polB (dinA)   Encodes polymerization subunit of DNA polymerase II,
                required for replication restart in recombinational
                DNA repair
  uvrA, uvrB    Encode ABC excinuclease subunits UvrA and UvrB
  umuC, umuD    Encode DNA polymerase V
  sulA          Encodes protein that inhibits cell division, possibly
                to allow time for DNA repair
  recA          Encodes RecA protein, required for error-prone repair
                and recombinational repair
  dinB          Encodes DNA polymerase IV
Genes involved in DNA metabolism, but role in DNA repair unknown
  ssb           Encodes single-stranded DNA–binding protein (SSB)
  uvrD          Encodes DNA helicase II (DNA-unwinding protein)
  himA          Encodes subunit of integration host factor (IHF),
                involved in site-specific recombination, replication,
                transposition, regulation of gene expression
  recN          Required for recombinational repair
Genes of unknown function
  dinD, dinF
Note: Some of these genes and their functions are further discussed in Chapter 28.
Mammals have many low-fidelity DNA polymerases of the TLS
polymerase family. However, the presence of these enzymes does
not necessarily translate into an unacceptable mutational
burden, because most of the enzymes also have specialized
functions in DNA repair. DNA polymerase η (eta), for example,
is a TLS polymerase found in all eukaryotes. It promotes
translesion synthesis primarily across cyclobutane T–T dimers.
Few mutations result in this case, because the enzyme
preferentially inserts two A residues across from the linked T
residues. Several other low-fidelity polymerases, including DNA
polymerases β, ι (iota), and λ, have specialized roles in
eukaryotic base-excision repair. Each of these enzymes has a
5′-deoxyribose phosphate lyase activity in addition to its
polymerase activity. After base removal by a glycosylase and
backbone cleavage by an AP endonuclease, these enzymes remove
the abasic site (a 5′-deoxyribose phosphate) and fill in the
very short gap with their polymerase activity. The frequency of
mutations due to DNA polymerase η activity is minimized by the
very short lengths (often one nucleotide) of DNA synthesized.
What emerges from research into cellular DNA re-
pair systems is a picture of a DNA metabolism that main-
tains genomic integrity with multiple and often redun-
dant systems. In the human genome, more than 130
genes encode proteins dedicated to the repair of DNA.
In many cases, the loss of function of one of these pro-
teins results in genomic instability and an increased oc-
currence of oncogenesis (Box 25–1). These repair sys-
tems are often integrated with the DNA replication
systems and are complemented by the recombination
systems that we turn to next.
SUMMARY 25.2 DNA Repair
■ Cells have many systems for DNA repair.
Mismatch repair in E. coli is directed by
transient nonmethylation of (5′)GATC
sequences on the newly synthesized strand.
■ Base-excision repair systems recognize and
repair damage caused by environmental agents
(such as radiation and alkylating agents) and
spontaneous reactions of nucleotides. Some
repair systems recognize and excise only
damaged or incorrect bases, leaving an AP
(abasic) site in the DNA. This is repaired by
excision and replacement of the DNA segment
containing the AP site.
■ Nucleotide-excision repair systems recognize
and remove a variety of bulky lesions and
pyrimidine dimers. They excise a segment of
the DNA strand including the lesion, leaving a
gap that is filled in by DNA polymerase and
ligase activities.
■ Some DNA damage is repaired by direct
reversal of the reaction causing the damage:
pyrimidine dimers are directly converted to
monomeric pyrimidines by a photolyase, and
the methyl group of O⁶-methylguanine is
removed by a methyltransferase.
■ In bacteria, error-prone translesion DNA
synthesis, involving TLS DNA polymerases,
occurs in response to very heavy DNA damage.
In eukaryotes, similar polymerases have
specialized roles in DNA repair that minimize
the introduction of mutations.
25.3 DNA Recombination
The rearrangement of genetic information within and
among DNA molecules encompasses a variety of proc-
esses, collectively placed under the heading of genetic
recombination. The practical applications of DNA re-
arrangements in altering the genomes of increasing num-
bers of organisms are now being explored (Chapter 9).
Genetic recombination events fall into at least three
general classes. Homologous genetic recombination
(also called general recombination) involves genetic ex-
changes between any two DNA molecules (or segments
of the same molecule) that share an extended region of
nearly identical sequence. The actual sequence of bases
is irrelevant, as long as it is similar in the two DNAs. In
site-specific recombination, the exchanges occur
only at a particular DNA se-
quence. DNA transposition
is distinct from both other
classes in that it usually in-
volves a short segment of DNA
with the remarkable capacity
to move from one location in a
chromosome to another. These
“jumping genes” were first ob-
served in maize in the 1940s by
Barbara McClintock. There is
in addition a wide range of un-
usual genetic rearrangements
for which no mechanism or
purpose has yet been proposed. Here we focus on the
three general classes.
The functions of genetic recombination systems are
as varied as their mechanisms. They include roles in spe-
cialized DNA repair systems, specialized activities in
DNA replication, regulation of expression of certain
genes, facilitation of proper chromosome segregation
during eukaryotic cell division, maintenance of genetic
diversity, and implementation of programmed genetic
rearrangements during embryonic development. In
most cases, genetic recombination is closely integrated
with other processes in DNA metabolism, and this be-
comes a theme of our discussion.
Barbara McClintock,
1902–1992
Homologous Genetic Recombination
Has Several Functions
In bacteria, homologous genetic recombination is pri-
marily a DNA repair process and in this context (as
noted in Section 25.2) is referred to as recombina-
tional DNA repair. It is usually directed at the recon-
struction of replication forks stalled at the site of DNA
damage. Homologous genetic recombination can also
occur during conjugation (mating), when chromosomal
DNA is transferred from a donor to a recipient bacter-
ial cell. Recombination during conjugation, although
rare in wild bacterial populations, contributes to genetic
diversity.
In eukaryotes, homologous genetic recombination
can have several roles in replication and cell division,
including the repair of stalled replication forks. Recom-
bination occurs with the highest frequency during meio-
sis, the process by which diploid germ-line cells with
two sets of chromosomes divide to produce haploid ga-
metes—sperm cells or ova in higher eukaryotes—each
gamete having only one member of each chromosome
pair (Fig. 25–29). Meiosis begins with replication of the
DNA in the germ-line cell so that each DNA molecule is
present in four copies. The cell then goes through two
rounds of cell division without an intervening round of
DNA replication. This reduces the DNA content to the
haploid level in each gamete.
After the DNA is replicated during prophase of the
first meiotic division, the resulting sister chromatids re-
main associated at their centromeres. At this stage, each
set of four homologous chromosomes exists as two pairs
of chromatids. Genetic information is now exchanged
between the closely associated homologous chromatids
by homologous genetic recombination, a process in-
volving the breakage and rejoining of DNA (Fig. 25–30).
This exchange, also referred to as crossing over, can be
observed with the light microscope. Crossing over links
the two pairs of sister chromatids together at points
called chiasmata (singular, chiasma).
[Figure 25–29 labels: diploid germ-line cell → replication → prophase I → separation of homologous pairs → first meiotic division → second meiotic division → haploid gametes]
FIGURE 25–29 Meiosis in eukaryotic germ-line cells. The chromo-
somes of a hypothetical diploid germ-line cell (six chromosomes; three
homologous pairs) replicate and are held together at their centromeres.
Each replicated double-stranded DNA molecule is called a chromatid
(sister chromatid). In prophase I, just before the first meiotic division,
the three homologous sets of chromatids align to form tetrads, held
together by covalent links at homologous junctions (chiasmata).
Crossovers occur within the chiasmata (see Fig. 25–30). These tran-
sient associations between homologs ensure that the two tethered
chromosomes segregate properly in the next step, when they migrate
toward opposite poles of the dividing cell in the first meiotic division.
The products of this division are two daughter cells, each with three
pairs of chromatids. The pairs now line up across the equator of the
cell in preparation for separation of the chromatids (now called chro-
mosomes). The second meiotic division produces four haploid daugh-
ter cells that can serve as gametes. Each has three chromosomes, half
the number of the diploid germ-line cell. The chromosomes have re-
sorted and recombined.
Crossing over effectively links together all four ho-
mologous chromatids, a linkage that is essential to the
proper segregation of chromosomes in the subsequent
meiotic cell divisions. Crossing over is not an entirely
random process, and “hot spots” have been identified
on many eukaryotic chromosomes. However, the as-
sumption that crossing over can occur with equal prob-
ability at almost any point along the length of two
homologous chromosomes remains a reasonable ap-
proximation in many cases, and it is this assumption that
permits the genetic mapping of genes. The frequency of
homologous recombination in any region separating two
points on a chromosome is roughly proportional to the
distance between the points, and this allows determi-
nation of the relative positions of and distances between
different genes.
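The proportionality between recombination frequency and distance can be illustrated with a short Python sketch. It assumes the common convention that 1% recombinant offspring corresponds to one map unit, which is not stated in the text; the gene names, offspring counts, and function names are hypothetical.

```python
def map_distance(recombinants: int, total: int) -> float:
    """Recombination frequency expressed as a percentage (map units)."""
    if total <= 0:
        raise ValueError("total offspring must be positive")
    return 100.0 * recombinants / total


def order_genes(distances: dict) -> list:
    """Infer the linear order of three genes from pairwise distances.

    distances maps frozenset({gene1, gene2}) to a map distance. The pair
    separated by the largest distance is taken to be the outer pair.
    """
    outer_pair = max(distances, key=distances.get)
    all_genes = set().union(*distances)
    middle = (all_genes - outer_pair).pop()
    first, last = sorted(outer_pair)
    return [first, middle, last]


# Hypothetical cross data: (recombinant offspring, total offspring scored)
counts = {
    frozenset({"a", "b"}): (50, 1000),
    frozenset({"b", "c"}): (120, 1000),
    frozenset({"a", "c"}): (165, 1000),
}
distances = {pair: map_distance(r, n) for pair, (r, n) in counts.items()}
gene_order = order_genes(distances)
print(distances, gene_order)
```

In this made-up data set the a–c distance is slightly less than the sum of the a–b and b–c distances. That is the kind of deviation expected when the equal-probability assumption is only approximate or when double crossovers go undetected.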
Homologous recombination thus serves at least
three identifiable functions: (1) it contributes to the re-
pair of several types of DNA damage; (2) it provides, in
eukaryotic cells, a transient physical link between chro-
matids that promotes the orderly segregation of chro-
mosomes at the first meiotic cell division; and (3) it
enhances genetic diversity in a population.
Recombination during Meiosis Is Initiated
with Double-Strand Breaks
A likely pathway for homologous recombination during
meiosis is outlined in Figure 25–31a. The model has
four key features. First, homologous chromosomes are
aligned. Second, a double-strand break in a DNA molecule is
enlarged by an exonuclease, leaving a single-strand extension
with a free 3′-hydroxyl group at the broken end (step 1). Third,
the exposed 3′ ends invade the intact duplex DNA, and this is
followed by branch migration (Fig. 25–32) and/or replication to
create a pair of crossover structures, called Holliday
junctions (Fig. 25–31a, steps 2 to 4). Fourth, cleavage of the
two crossovers creates two complete recombinant products
(step 5).
In this double-strand break repair model for recombination,
the 3′ ends are used to initiate the genetic exchange. Once a
3′ end has paired with the complementary strand on the intact
homolog, a region of hybrid DNA is created that contains
complementary strands from two different parent DNAs (the
product of step 2 in Fig. 25–31a). Each of the 3′ ends can then
act as a primer for DNA replication. The structures thus
formed, Holliday intermediates (Fig. 25–31b), are a feature of
homologous genetic recombination pathways in all organisms.
Homologous recombination can vary in many details
from one species to another, but most of the steps out-
lined above are generally present in some form. There
are two ways to cleave, or “resolve,” the Holliday inter-
mediate so that the two recombinant products carry
genes in the same linear order as in the substrates—the
original, unrecombined chromosomes (step 5 of Fig.
25–31a). If cleaved one way, the DNA flanking the re-
gion containing the hybrid DNA is not recombined; if
cleaved the other way, the flanking DNA is recombined.
Both outcomes are observed in vivo in eukaryotes and
prokaryotes.
[Figure 25–30 labels: (a) homologous pair, homolog, sister chromatids, centromere, tetrad, crossover point (chiasma); (b) centromeres, chromatids; scale bar, 2 μm]
FIGURE 25–30 Crossing over. (a) Crossing over often produces an
exchange of genetic material. (b) The homologous chromosomes of a
grasshopper are shown during prophase I of meiosis. Many points of
joining (chiasmata) are evident between the two homologous pairs of
chromatids. These chiasmata are the physical manifestation of prior
homologous recombination (crossing over) events.
FIGURE 25–31 Recombination during meiosis. (a) Model of
double-strand break repair for homologous genetic
recombination. The two homologous chromosomes involved in this
recombination event have similar sequences. Each of the two
genes shown (gene A and gene B) has different alleles on the
two chromosomes. The DNA strands and alleles are colored
differently so that their fate is evident. The steps are
described in the text. (b) A Holliday intermediate formed
between two bacterial plasmids in vivo, as seen with the
electron microscope. The intermediates are named for Robin
Holliday, who first proposed their existence in 1964.
Step 1: A double-strand break in one of two homologs is
converted to a double-strand gap by the action of exonucleases.
Strands with 3′ ends are degraded less than those with 5′ ends,
producing 3′ single-strand extensions.
Step 2: An exposed 3′ end pairs with its complement in the
intact homolog. The other strand of the duplex is displaced.
Step 3: The invading 3′ end is extended by DNA polymerase plus
branch migration, eventually generating a DNA molecule with two
crossovers called Holliday intermediates.
Step 4: Further DNA replication replaces the DNA missing from
the site of the original double-strand break.
Step 5: Cleavage of the Holliday intermediates by specialized
nucleases generates either of the two recombination products
(product set 1 or product set 2). In product set 2, the DNA on
either side of the region undergoing repair is recombined.
The homologous recombination illustrated in Figure 25–31 is a
very elaborate process with subtle molecular consequences for
the generation of genetic diversity. To understand how this
process contributes to diversity, we should keep in mind that
the two homologous chromosomes that undergo recombination are
not necessarily identical. The linear array of genes may be the
same, but the base sequences in some of the genes may differ
slightly (in different alleles). In a human, for example, one
chromosome may contain the allele for hemoglobin A (normal
hemoglobin) while the other contains the allele for hemoglobin
S (the sickle-cell mutation). The difference may consist of no
more than one base pair among millions. Homologous
recombination does not change the linear array of genes, but it
can determine which alleles become linked together on a single
chromosome.
Recombination Requires a Host of Enzymes
and Other Proteins
Enzymes that promote various steps of homologous re-
combination have been isolated from both prokaryotes
and eukaryotes. In E. coli, the recB, recC, and recD
genes encode the RecBCD enzyme, which has both he-
licase and nuclease activities. The RecA protein pro-
motes all the central steps in the homologous recombi-
nation process: the pairing of two DNAs, formation of
Holliday intermediates, and branch migration (as de-
scribed below). The RuvA and RuvB proteins (repair of
UV damage) form a complex that binds to Holliday in-
termediates, displaces RecA protein, and promotes
branch migration at higher rates than does RecA. Nu-
cleases that specifically cleave Holliday intermediates,
often called resolvases, have been isolated from bacte-
ria and yeast. The RuvC protein is one of at least two
such nucleases in E. coli.
The RecBCD enzyme binds to linear DNA at a free
(broken) end and moves inward along the double helix,
unwinding and degrading the DNA in a reaction coupled
to ATP hydrolysis (Fig. 25–33). The activity of the enzyme is
altered when it interacts with a sequence referred to as chi,
(5′)GCTGGTGG. From that point, degradation of the strand with a
3′ terminus is greatly reduced, but degradation of the
5′-terminal strand is increased. This process creates a
single-stranded DNA with a 3′ end, which is used during
subsequent steps in
recombination (Fig. 25–31). The 1,009 chi sequences
scattered throughout the E. coli genome enhance the
frequency of recombination about five- to tenfold within
1,000 bp of the chi site. The enhancement declines as
the distance from the site increases. Sequences that en-
hance recombination frequency have also been identi-
fied in several other organisms.
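Locating chi sequences in a DNA sequence is a simple string search, sketched below in Python. The sketch assumes the input is the top strand written 5′→3′ and reports which strand carries each (5′)GCTGGTGG, because a site on the bottom strand appears as its reverse complement in the top-strand text. The function names and the test sequence are hypothetical, and the code says nothing about which orientation RecBCD would actually respond to at a given break.

```python
CHI = "GCTGGTGG"  # chi sequence from the text, written 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(top_strand: str) -> list:
    """Return sorted (position, strand) tuples for every chi occurrence.

    position is the 0-based index, on the top strand, of the leftmost base
    of the match; strand is "+" if chi reads 5'->3' on the top strand and
    "-" if it reads 5'->3' on the bottom strand.
    """
    seq = top_strand.upper()
    hits = []
    for motif, strand in ((CHI, "+"), (reverse_complement(CHI), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical sequence with one chi site on each strand.
example = "ATGCTGGTGGTTACCACCAGCAA"
sites = find_chi_sites(example)
print(sites)
```

Reporting the strand is a deliberate choice: the text describes chi as a sequence met by an enzyme moving inward from a broken end, so the same eight bases on opposite strands are not interchangeable.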
FIGURE 25–32 Branch migration. When a template strand pairs
with two different complementary strands, a branch is formed at
the point where the three complementary strands meet. The
branch “migrates” when base pairing to one of the two
complementary strands is broken and replaced with base pairing
to the other complementary strand. In the absence of an enzyme
to direct it, this process can move the branch spontaneously in
either direction. Spontaneous branch migration is blocked
wherever one of the otherwise complementary strands has a
sequence nonidentical to the other strand.
FIGURE 25–33 Helicase and nuclease activities of the RecBCD
enzyme. Entering at a double-stranded end, RecBCD unwinds and
degrades the DNA until it encounters a chi sequence. The
interaction with chi alters the activity of RecBCD so that it
generates a single-stranded DNA with a 3′ end, suitable for
subsequent steps in recombination. Movement of the enzyme
requires ATP hydrolysis. This enzyme is believed to help
initiate homologous genetic recombination in E. coli. It is
also involved in the repair of double-strand breaks at
collapsed replication forks. [Figure annotations: Helicase and
nuclease activities of RecBCD degrade the DNA, with ATP
hydrolyzed to ADP + Pi. On reaching a chi sequence, nuclease
activity on the strand with the 3′ end is suppressed. The other
strand continues to be degraded, generating a 3′-terminal
single-stranded end.]
RecA is unusual among the proteins of DNA metabolism in that
its active form is an ordered, helical filament of up to
several thousand RecA monomers that assemble cooperatively on
DNA (Fig. 25–34). This filament normally forms on
single-stranded DNA, such as that produced by the RecBCD
enzyme. The filament will also form on a duplex DNA with a
single-strand gap; in this case, the first RecA monomers bind
to the single-stranded DNA in the gap, after which the
assembled filament rapidly envelops the neighboring duplex. The
RecF, RecO, and RecR proteins regulate the assembly and
disassembly of RecA filaments.
A useful model to illustrate the recombination ac-
tivities of the RecA filament is the in vitro DNA strand
exchange reaction (Fig. 25–35). A single strand of DNA
is first bound by RecA to establish the nucleoprotein fil-
ament. The RecA filament then takes up a homologous
duplex DNA and aligns it with the bound single strand.
Strands are then exchanged between the two DNAs to
create hybrid DNA. The exchange occurs at a rate of
6 bp/s and progresses in the 5′→3′ direction relative to
the single-stranded DNA within the RecA filament. This
reaction can involve either three or four strands (Fig.
25–35); in the latter case, a Holliday intermediate forms
during the process.
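To give a feel for the quoted rate, the time needed for strand exchange across a region of a given length can be estimated directly. This is a back-of-the-envelope sketch that assumes the rate stays constant at 6 bp/s over the whole region; the 3,000-bp length and the function name are hypothetical.

```python
RECA_EXCHANGE_RATE_BP_PER_S = 6.0  # in vitro rate quoted in the text


def strand_exchange_time(length_bp: float,
                         rate_bp_per_s: float = RECA_EXCHANGE_RATE_BP_PER_S) -> float:
    """Seconds needed to exchange strands over length_bp at a constant rate."""
    if rate_bp_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_bp / rate_bp_per_s


seconds = strand_exchange_time(3000)  # hypothetical 3,000-bp homologous region
minutes = seconds / 60
print(seconds, minutes)
```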
As the duplex DNA is incorporated within the RecA
filament and aligned with the bound single-stranded
DNA over regions of hundreds of base pairs, one strand
of the duplex switches pairing partners (Fig. 25–36,
step 2). Because DNA is a helical structure, continued
strand exchange requires an ordered rotation of the two
aligned DNAs. This brings about a spooling action (steps
3 and 4) that shifts the branch point along the helix.
ATP is hydrolyzed by RecA protein during this reaction.
FIGURE 25–34 RecA. (a) Nucleoprotein filament of RecA protein
on single-stranded DNA, as seen with the electron microscope.
The striations indicate the right-handed helical structure of
the filament. (b) Surface contour model of a 24-subunit RecA
filament. The filament has six subunits per turn. One subunit
is colored red to provide perspective (derived from PDB ID
2REB).
FIGURE 25–35 DNA strand-exchange reactions promoted by RecA
protein in vitro. Strand exchange involves the separation of
one strand of a duplex DNA from its complement and transfer of
the strand to an alternative complementary strand to form a new
duplex (heteroduplex) DNA. The transfer forms a branched
intermediate. Formation of the final product depends on branch
migration, which is facilitated by RecA. The reaction can
involve three strands (left) or a reciprocal exchange between
two homologous duplexes, four strands in all (right). When four
strands are involved, the branched intermediate that results is
a Holliday intermediate. RecA protein promotes the
branch-migration phases of these reactions, using energy
derived from ATP hydrolysis. [Figure annotations: The
substrates are a circular single-stranded DNA (left) or a
circular duplex DNA with a single-strand gap (right), each
combined with RecA protein and a homologous linear duplex DNA.
RecA protein binds to single-stranded or gapped DNA. The
complementary strand of the linear DNA pairs with a circular
single strand. The other linear strand is displaced (left) or
pairs with its complement in the circular duplex to yield a
Holliday structure (right), giving branched intermediates. With
hydrolysis of ATP to ADP + Pi, continued branch migration
yields a circular duplex with a nick and a displaced linear
strand (left) or a partially single-stranded linear duplex
(right).]
Once a Holliday intermediate has formed, a host of enzymes
(topoisomerases, the RuvAB branch migration protein, a
resolvase, other nucleases, DNA polymerase I or III, and DNA
ligase) are required to complete recombination. The RuvC
protein (Mᵣ 20,000) of E. coli cleaves Holliday intermediates
to generate full-length, unbranched chromosome products.
All Aspects of DNA Metabolism Come Together
to Repair Stalled Replication Forks
Like all cells, bacteria sustain high levels of DNA dam-
age even under normal growth conditions. Most DNA
lesions are repaired rapidly by base-excision repair,
nucleotide-excision repair, and the other pathways de-
scribed earlier. Nevertheless, almost every bacterial
replication fork encounters an unrepaired DNA lesion
or break at some point in its journey from the replica-
tion origin to the terminus (Fig. 25–28). DNA poly-
merase III cannot proceed past many types of DNA le-
sions, and these encounters tend to leave the lesion in
a single-strand gap. An encounter with a DNA strand
break creates a double-strand break. Both situations re-
quire recombinational DNA repair (Fig. 25–37). Under
normal growth conditions, stalled replication forks are
reactivated by an elaborate repair pathway encompass-
ing recombinational DNA repair, the restart of replica-
tion, and the repair of any lesions left behind. All as-
pects of DNA metabolism come together in this process.
After a replication fork has been halted, it can be
restored by at least two major paths, both of which re-
quire the RecA protein. The repair pathway for lesion-
containing DNA gaps also requires the RecF, RecO, and
RecR proteins. Repair of double-strand breaks requires
the RecBCD enzyme (Fig. 25–37). Additional recom-
bination steps are followed by a process called origin-
independent restart of replication, in which the
replication fork reassembles with the aid of a complex
of seven proteins (PriA, B, and C, and DnaB, C, G, and
T). This complex, originally discovered as a component
required for the replication of φX174 DNA in vitro,
is now termed the replication restart primosome.
Restart of the replication fork also requires DNA poly-
merase II, in a role not yet defined; this polymerase II
activity gives way to DNA polymerase III for the ex-
tensive replication generally required to complete the
chromosome.
The repair of stalled replication forks entails a coor-
dinated transition from replication to recombination and
back to replication. The recombination steps function to
fill the DNA gap or rejoin the broken DNA branch to recre-
ate the branched DNA structure at the replication fork.
Lesions left behind in what is now duplex DNA are
repaired by pathways such as base-excision or nucleotide-
excision repair. Thus a wide range of enzymes encom-
passing every aspect of DNA metabolism ultimately take
part in the repair of a stalled replication fork. This type
of repair process is clearly a primary function of the ho-
mologous recombination system of every cell, and defects
in recombinational DNA repair play an important role in
human disease (Box 25–1).
Site-Specific Recombination Results in Precise
DNA Rearrangements
FIGURE 25–36 Model for DNA strand exchange mediated by RecA
protein. A three-strand reaction is shown. The balls
representing RecA protein are undersized relative to the
thickness of DNA to clarify the fate of the DNA strands.
(1) RecA protein forms a filament on the single-stranded DNA.
(2) A homologous duplex incorporates into this complex. (3) As
spooling shifts the three-stranded region from left to right,
one of the strands in the duplex is transferred to the single
strand originally bound in the filament. The other strand of
the duplex is displaced, and a new duplex forms within the
filament. As rotation continues (4 and 5), the displaced strand
separates entirely. In this model, hydrolysis of ATP by RecA
protein rotates the two DNA molecules relative to each other
and thus directs the strand exchange from left to right as
shown. [Figure labels: RecA protein; homologous duplex DNA;
three-stranded pairing intermediate; rotation spools DNA;
branch point; branch migration; ATP → ADP + Pi]
FIGURE 25–37 Models for recombinational DNA repair of stalled
replication forks. The replication fork collapses on
encountering a DNA lesion (left) or strand break (right).
Recombination enzymes promote the DNA strand transfers needed
to repair the branched DNA structure at the replication fork. A
lesion in a single-strand gap is repaired in a reaction
requiring the RecF, RecO, and RecR proteins. Double-strand
breaks are repaired in a pathway requiring the RecBCD enzyme.
Both pathways require RecA. Recombination intermediates are
processed by additional enzymes (e.g., RuvA, RuvB, and RuvC,
which process Holliday intermediates). Lesions in
double-stranded DNA are repaired by nucleotide-excision repair
or other pathways. The replication fork re-forms with the aid
of enzymes catalyzing origin-independent replication restart,
and chromosomal replication is completed. The overall process
requires an elaborate coordination of all aspects of bacterial
DNA metabolism. [Figure labels: DNA lesion (RecA, RecFOR); DNA
nick (RecA, RecBCD); fork regression; strand invasion;
replication (Pol I); branch migration; reverse branch
migration; RuvAB; resolution of Holliday junction (RuvC);
origin-independent replication restart]
Homologous genetic recombination can involve any two
homologous sequences. The second general type of
recombination, site-specific recombination, is a very different
type of process: recombination is limited to specific
sequences. Recombination reactions of this type occur in
virtually every cell, filling specialized roles that vary
greatly from one species to another. Examples include
regulation of the expression of certain genes and promotion of
programmed DNA rearrangements in embryonic development or in
the replication cycles of some viral and plasmid DNAs. Each
site-specific recombination system consists of an enzyme called
a recombinase and a short (20 to 200 bp), unique DNA sequence
where the recombinase acts (the recombination site). One or
more auxiliary proteins may regulate the timing or outcome of
the reaction.
In vitro studies of many site-specific recombination
systems have elucidated some general principles, in-
cluding the fundamental reaction pathway (Fig. 25–38a).
A separate recombinase recognizes and binds to each
of two recombination sites on two different DNA mole-
cules or within the same DNA. One DNA strand in each
site is cleaved at a specific point within the site, and the
recombinase becomes covalently linked to the DNA at
the cleavage site through a phosphotyrosine (or phosphoserine)
bond (step 1). The transient protein-DNA linkage conserves the
energy of the phosphodiester bond that is broken in cleaving the
DNA, so high-energy cofactors such as ATP are unnecessary in
subsequent steps. The cleaved DNA strands are rejoined to new
partners to form a Holliday intermediate, with new phosphodiester
bonds created at the expense of the protein-DNA linkage (step 2).
To complete the reaction, the process must be repeated at a second
point within each of the two recombination sites (steps 3 and 4).
In some systems, both
strands of each recombination site are cut concurrently
and rejoined to new partners without the Holliday inter-
mediate. The exchange is always reciprocal and precise,
regenerating the recombination sites when the reaction
is complete. We can view a recombinase as a site-specific
endonuclease and ligase in one package.
The sequences of the recombination sites recognized
by site-specific recombinases are partially asymmetric
(nonpalindromic), and the two recombining sites align in
the same orientation during the recombinase reaction.
The outcome depends on the location and orientation of
the recombination sites (Fig. 25–39). If the two sites are
on the same DNA molecule, the reaction either inverts
or deletes the intervening DNA, determined by whether
the recombination sites have the opposite or the same
orientation, respectively. If the sites are on different
DNAs, the recombination is intermolecular; if one or both
DNAs are circular, the result is an insertion. Some recombinase
systems are highly specific for one of these reaction types and
act only on sites with particular orientations.

FIGURE 25–38 A site-specific recombination reaction. (a) The reaction
shown here is for a common class of site-specific recombinases
called integrase-class recombinases (named after bacteriophage λ
integrase, the first recombinase characterized). The reaction is carried
out within a tetramer of identical subunits. Recombinase subunits
bind to a specific sequence, often called simply the recombination
site. Step 1: One strand in each DNA is cleaved at particular points
within the sequence. The nucleophile is the OH group of an active-site
Tyr residue, and the product is a covalent phosphotyrosine link between
protein and DNA. Step 2: The cleaved strands join to new partners,
producing a Holliday intermediate. Steps 3 and 4 complete the reaction
by a process similar to the first two steps. The original sequence
of the recombination site is regenerated after recombining the DNA
flanking the site. These steps occur within a complex of multiple
recombinase subunits that sometimes includes other proteins not
shown here. (b) A surface contour model of a four-subunit integrase-class
recombinase called the Cre recombinase, bound to a Holliday
intermediate (shown with light blue and dark blue helix strands). The
protein has been rendered transparent so that the bound DNA is visible
(derived from PDB ID 3CRX).
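The dependence of the outcome on site location and orientation can be made concrete with a small toy model. The sketch below is only an illustration under stated assumptions: a DNA molecule is a Python list of labeled segments, a recombination site is the token ">" or "<" (its orientation), and an inverted segment is marked with a prime. The function names (invert, recombine_within, insert_circle) are hypothetical and do not correspond to any real library or to the enzymology itself.

```python
SITES = (">", "<")  # a recombination site, written as its orientation


def invert(segments):
    """Reverse a run of segments, marking each flipped segment with a prime."""
    return [s[:-1] if s.endswith("'") else s + "'" for s in reversed(segments)]


def recombine_within(dna):
    """Recombine the two sites carried by one linear DNA molecule.

    Returns (linear_product, excised_circle); the circle is None
    when the reaction is an inversion.
    """
    positions = [k for k, tok in enumerate(dna) if tok in SITES]
    if len(positions) != 2:
        raise ValueError("expected exactly two recombination sites")
    i, j = positions
    between = dna[i + 1:j]
    if dna[i] != dna[j]:
        # Opposite orientation: the intervening DNA is inverted in place.
        return dna[:i + 1] + invert(between) + dna[j:], None
    # Same orientation: the intervening DNA is deleted as a circle that
    # carries one site; the other site stays in the parent molecule.
    return dna[:i + 1] + dna[j + 1:], between + [dna[j]]


def insert_circle(linear, circle):
    """Recombine a one-site circle with a one-site linear DNA (insertion).

    By the convention used above, the circle is written so that its
    site is the last token.
    """
    positions = [k for k, tok in enumerate(linear) if tok in SITES]
    if len(positions) != 1 or circle[-1] != linear[positions[0]]:
        raise ValueError("need one site on each DNA, in the same orientation")
    i = positions[0]
    return linear[:i + 1] + circle + linear[i + 1:]
```

In this model, deletion and insertion are the reverse of each other: passing the two products of a deletion to insert_circle regenerates the starting molecule, which mirrors the reciprocal and precise nature of the exchange.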
The first site-specific recombination system studied
in vitro was that encoded by bacteriophage λ. When λ
phage DNA enters an E. coli cell, a complex series of
regulatory events commits the DNA to one of two fates.
The λ DNA either replicates and produces more bacteriophages
(destroying the host cell) or integrates into
the host chromosome, replicating passively along with
the chromosome for many cell generations. Integration
is accomplished by a phage-encoded recombinase (λ integrase)
that acts at recombination sites on the phage
and bacterial DNAs—at attachment sites attP and attB,
respectively (Fig. 25–40). The role of site-specific re-
combination in regulating gene expression is considered
in Chapter 28.
FIGURE 25–39 Effects of site-specific recombination. The outcome
of site-specific recombination depends on the location and orientation
of the recombination sites (red and green) in a double-stranded DNA
molecule. Orientation here (shown by arrowheads) refers to the order
of nucleotides in the recombination site, not the 5′→3′ direction.
(a) Recombination sites with opposite orientation in the same DNA
molecule. The result is an inversion. (b) Recombination sites with the
same orientation, either on one DNA molecule, producing a deletion,
or on two DNA molecules, producing an insertion.

FIGURE 25–40 Integration and excision of bacteriophage λ DNA at
the chromosomal target site. The attachment site on the λ phage DNA
(attP) shares only 15 bp of complete homology with the bacterial site
(attB) in the region of the crossover. The reaction generates two new
attachment sites (attR and attL) flanking the integrated phage DNA
(the prophage). The recombinase is the λ integrase (or INT protein).
Integration and excision use different attachment sites and different
auxiliary proteins. Excision uses the proteins XIS, encoded by the
bacteriophage, and FIS, encoded by the bacterium. Both reactions
require the protein IHF (integration host factor), encoded by the
bacterium.
Transposable Genetic Elements Move from One
Location to Another
We now consider the third general type of recombina-
tion system: recombination that allows the movement
of transposable elements, or transposons. These seg-
ments of DNA, found in virtually all cells, move, or
“jump,” from one place on a chromosome (the donor
site) to another on the same or a different chromosome
(the target site). DNA sequence homology is not usu-
ally required for this movement, called transposition;
the new location is determined more or less randomly.
Insertion of a transposon in an essential gene could kill
the cell, so transposition is tightly regulated and usually
very infrequent. Transposons are perhaps the simplest
of molecular parasites, adapted to replicate passively
within the chromosomes of host cells. In some cases
they carry genes that are useful to the host cell, and
thus exist in a kind of symbiosis with the host.
Bacteria have two classes of transposons. Insertion
sequences (simple transposons) contain only the se-
quences required for transposition and the genes for
proteins (transposases) that promote the process.
Complex transposons contain one or more genes in
addition to those needed for transposition. These extra
genes might, for example, confer resistance to antibi-
otics and thus enhance the survival chances of the host
cell. The spread of antibiotic-resistance elements among
disease-causing bacterial populations that is rendering
some antibiotics ineffectual (pp. 925–926) is mediated
in part by transposition.
Bacterial transposons vary in structure, but most
have short repeated sequences at each end that serve
as binding sites for the transposase. When transposition
occurs, a short sequence at the target site (5 to 10 bp)
is duplicated to form an additional short repeated se-
quence that flanks each end of the inserted transposon
(Fig. 25–42). These duplicated segments result from the
cutting mechanism used to insert a transposon into the
DNA at a new location.
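The origin of the duplicated target sequence can be sketched in a few lines of code. This is a minimal illustration, assuming the target is written as its top strand only and that the staggered cuts lie a fixed number of base pairs apart; the function name insert_transposon and its parameters are hypothetical, chosen for this example.

```python
def insert_transposon(target, transposon, cut, stagger=5):
    """Return the top strand after insertion at a staggered cut.

    target     -- top-strand sequence of the target DNA, 5' to 3'
    transposon -- sequence of the transposon (treated as an opaque block)
    cut        -- index at which the top strand is cut
    stagger    -- offset (bp) of the cut in the bottom strand
    """
    if cut < 0 or cut + stagger > len(target):
        raise ValueError("staggered cut does not fit in the target")
    # The bases between the two cuts become single-stranded on each side
    # of the inserted transposon.
    site = target[cut:cut + stagger]
    # Gap-filling replication copies that stretch on both sides, so the
    # short target sequence ends up flanking the transposon as a repeat.
    return target[:cut] + site + transposon + site + target[cut + stagger:]


# Example: a 5 bp target site (GATTC) is duplicated around the insert.
product = insert_transposon("AAAGATTCCCC", "<Tn>", cut=3, stagger=5)
```

The product is longer than the sum of target and transposon by exactly the length of the stagger, which is the 5 to 10 bp duplication described above.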
There are two general pathways for transposition in
bacteria. In direct or simple transposition (Fig. 25–43,
left), cuts on each side of the transposon excise it, and
the transposon moves to a new location. This leaves a
double-strand break in the donor DNA that must be
repaired. At the target site, a staggered cut is made (as
in Fig. 25–42), the transposon is inserted into the break,
and DNA replication fills in the gaps to duplicate the
target site sequence. In replicative transposition (Fig.
25–43, right), the entire transposon is replicated, leaving
a copy behind at the donor location. A cointegrate
is an intermediate in this process, consisting of the
donor region covalently linked to DNA at the target site.
Two complete copies of the transposon are present in
the cointegrate, both having the same relative orientation
in the DNA. In some well-characterized transposons,
the cointegrate intermediate is converted to
products by site-specific recombination, in which specialized
recombinases promote the required deletion
reaction.

Complete Chromosome Replication Can Require
Site-Specific Recombination
Recombinational DNA repair of a circular bacterial chromosome,
while essential, sometimes generates deleterious
byproducts. The resolution of a Holliday junction at
a replication fork by a nuclease such as RuvC, followed
by completion of replication, can give rise to one of two
products: the usual two monomeric chromosomes or a
contiguous dimeric chromosome (Fig. 25–41). In the latter
case, the covalently linked chromosomes cannot be
segregated to daughter cells at cell division and the dividing
cells become “stuck.” A specialized site-specific
recombination system in E. coli, the XerCD system, converts
the dimeric chromosomes to monomeric chromosomes
so that cell division can proceed. The reaction is
a site-specific deletion reaction (Fig. 25–39b). This is
another example of the close coordination between DNA
recombination processes and other aspects of DNA
metabolism.

FIGURE 25–41 DNA deletion to undo a deleterious effect of recombinational
DNA repair. The resolution of a Holliday intermediate
during recombinational DNA repair (if cut at the points indicated by
red arrows) can generate a contiguous dimeric chromosome. A specialized
site-specific recombinase in E. coli, XerCD, converts the dimer
to monomers, allowing chromosome segregation and cell division to
proceed.
FIGURE 25–42 Duplication of the DNA sequence at a target site
when a transposon is inserted. The duplicated sequences are shown
in red. These sequences are generally only a few base pairs long, so
their size (compared with that of a typical transposon) is greatly ex-
aggerated in this drawing.
[Figure 25–42 labels: transposon with terminal repeats; target DNA. Transposase makes staggered cuts in the target site. The transposon is inserted at the site of the cuts. Replication fills in the gaps, duplicating the sequences flanking the transposon.]
[Figure 25–43 labels, in order: cleavage; free ends of transposons attack target DNA; gaps filled (left, direct transposition) or entire transposon replicated (right, replicative transposition); DNA polymerase, DNA ligase; replication; cointegrate; site-specific recombination (within transposon).]
FIGURE 25–43 Two general pathways for transposition: direct
(simple) and replicative. Step 1: The DNA is first cleaved on each side
of the transposon, at the sites indicated by arrows. Step 2: The liberated
3′-hydroxyl groups at the ends of the transposon act as nucleophiles in
a direct attack on phosphodiester bonds in the target DNA. The target
phosphodiester bonds are staggered (not directly across from each
other) in the two DNA strands. Step 3: The transposon is now linked to
the target DNA. In direct transposition, replication fills in gaps at each
end. In replicative transposition, the entire transposon is replicated to
create a cointegrate intermediate. Step 4: The cointegrate is often resolved
later, with the aid of a separate site-specific recombination system.
The cleaved host DNA left behind after direct transposition is either
repaired by DNA end-joining or degraded (not shown). The latter
outcome can be lethal to an organism.
[Figure 25–44 diagram, top to bottom: germ-line DNA with V segments (V1 to ~V300), J segments (J1, J2, J4, J5), and the C segment; recombination resulting in deletion of DNA between V and J segments; DNA of B lymphocyte carrying the mature light-chain gene (V1, V2, V3, V84, J4, J5, C); transcription; primary transcript (V84, J4, J5, C); removal of sequences between J4 and C by mRNA splicing; processed mRNA (V84, J4, C); translation; light-chain polypeptide (variable region, constant region); protein folding and assembly with the heavy chain; antibody molecule.]
FIGURE 25–44 Recombination of the V and J
gene segments of the human IgG kappa light
chain. This process is designed to generate
antibody diversity. At the top is shown the
arrangement of IgG-coding sequences in a bone
marrow stem cell. Recombination deletes the DNA
between a particular V segment and a J segment. After
transcription, the transcript is processed by RNA splicing,
as described in Chapter 26; translation produces the
light-chain polypeptide. The light chain can combine
with any of 5,000 possible heavy chains to produce an
antibody molecule.
Eukaryotes also have transposons, structurally sim-
ilar to bacterial transposons, and some use similar trans-
position mechanisms. In other cases, however, the
mechanism of transposition appears to involve an RNA
intermediate. Evolution of these transposons is inter-
twined with the evolution of certain classes of RNA
viruses. Both are described in the next chapter.
Immunoglobulin Genes Assemble by Recombination
Some DNA rearrangements are a programmed part of
development in eukaryotic organisms. An important ex-
ample is the generation of complete immunoglobulin
genes from separate gene segments in vertebrate
genomes. A human (like other mammals) is capable of
producing millions of different immunoglobulins (anti-
bodies) with distinct binding specificities, even though
the human genome contains only ~35,000 genes. Recom-
bination allows an organism to produce an extraordinary
diversity of antibodies from a limited DNA-coding
capacity. Studies of the recombination mechanism
reveal a close relationship to DNA transposition and
suggest that this system for generating antibody diver-
sity may have evolved from an ancient cellular invasion
of transposons.
We can use the human genes that encode proteins
of the immunoglobulin G (IgG) class to illustrate how
antibody diversity is generated. Immunoglobulins con-
sist of two heavy and two light polypeptide chains (see
Fig. 5–23). Each chain has two regions, a variable re-
gion, with a sequence that differs greatly from one im-
munoglobulin to another, and a region that is virtually
constant within a class of immunoglobulins. There are
also two distinct families of light chains, kappa and
lambda, which differ somewhat in the sequences of their
constant regions. For all three types of polypeptide
chain (heavy chain, and kappa and lambda light chains),
diversity in the variable regions is generated by a simi-
lar mechanism. The genes for these polypeptides are di-
vided into segments, and the genome contains clusters
with multiple versions of each segment. The joining of
one version of each of the segments creates a complete
gene.
Figure 25–44 depicts the organization of the DNA
encoding the kappa light chains of human IgG and shows
how a mature kappa light chain is generated. In undif-
ferentiated cells, the coding information for this
polypeptide chain is separated into three segments. The
V (variable) segment encodes the first 95 amino acid
residues of the variable region, the J (joining) segment
encodes the remaining 12 residues of the variable re-
gion, and the C segment encodes the constant region.
The genome contains ~300 different V segments, 4 dif-
ferent J segments, and 1 C segment.
As a stem cell in the bone marrow differentiates to
form a mature B lymphocyte, one V segment and one J
segment are brought together by a specialized recom-
bination system (Fig. 25–44). During this programmed
DNA deletion, the intervening DNA is discarded. There
are about 300 × 4 = 1,200 possible V–J combinations.
The recombination process is not as precise as the site-
specific recombination described earlier, so additional
variation occurs in the sequence at the V–J junction. This
increases the overall variation by a factor of at least 2.5,
so the cells can generate about 2.5 × 1,200 = 3,000
different V–J combinations. The final joining of the V–J
combination to the C region is accomplished by an RNA-
splicing reaction after transcription, a process described
in Chapter 26.
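The two processing steps just described, DNA deletion between one V and one J segment followed by RNA splicing to the C region, can be traced with a short sketch. It is a toy model under stated assumptions: the germ-line locus is a list of segment names, the J labels (J1, J2, J4, J5) follow the labels in Figure 25–44, and the function names join_v_j and spliced_message are hypothetical. The imprecision at the V–J junction is deliberately not modeled.

```python
def join_v_j(germline, v, j):
    """Delete the DNA between the chosen V and J segments."""
    i, k = germline.index(v), germline.index(j)
    if i >= k:
        raise ValueError("the V segment must lie upstream of the J segment")
    return germline[:i + 1] + germline[k:]


def spliced_message(rearranged, v, j, c="C"):
    """Return the coding segments left after transcription and splicing.

    Splicing removes everything between the joined J segment and C.
    """
    i = rearranged.index(v)
    if rearranged[i + 1] != j:
        raise ValueError("V and J are not joined in this DNA")
    if c not in rearranged[i + 2:]:
        raise ValueError("no C segment downstream of the V-J joint")
    return [v, j, c]


# Germ-line kappa locus: ~300 V segments, 4 J segments, 1 C segment.
germline = [f"V{n}" for n in range(1, 301)] + ["J1", "J2", "J4", "J5", "C"]

rearranged = join_v_j(germline, "V84", "J4")       # B-lymphocyte DNA
message = spliced_message(rearranged, "V84", "J4")  # processed mRNA
```

The upstream V segments and the downstream J5 segment remain in the B-lymphocyte DNA, as in the figure; only the segments between V84 and J4 are discarded.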
The recombination mechanism for joining the V and
J segments is illustrated in Figure 25–45. Just beyond
each V segment and just before each J segment lie re-
combination signal sequences (RSS). These are bound
by proteins called RAG1 and RAG2 (recombination ac-
tivating gene). The RAG proteins catalyze the formation
of a double-strand break between the signal sequences
and the V (or J) segments to be joined. The V and J
segments are then joined with the aid of a second com-
plex of proteins.
The genes for the heavy chains and the lambda light
chains form by similar processes. Heavy chains have more
gene segments than light chains, with more than 5,000
possible combinations. Because any heavy chain can com-
bine with any light chain to generate an immunoglobulin,
each human has at least 3,000 × 5,000 = 1.5 × 10⁷
possible IgGs. Additional diversity is generated by
high mutation rates (of unknown mechanism) in the
V sequences during B-lymphocyte differentiation. Each
mature B lymphocyte produces only one type of anti-
body, but the range of antibodies produced by different
cells is clearly enormous.
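The arithmetic behind these estimates is simple enough to check directly. The sketch below only restates the numbers given in the text (about 300 V segments, 4 J segments, a junctional factor of at least 2.5, and about 5,000 heavy-chain combinations); the function name antibody_diversity and its parameter names are chosen for this example.

```python
def antibody_diversity(v_segments=300, j_segments=4,
                       junction_factor=2.5, heavy_chains=5000):
    """Return (V-J combinations, kappa light chains, possible IgGs)."""
    vj_combinations = v_segments * j_segments        # one V joined to one J
    light_chains = junction_factor * vj_combinations  # imprecise V-J joining
    iggs = light_chains * heavy_chains                # any heavy with any light
    return vj_combinations, light_chains, iggs


vj, light, igg = antibody_diversity()
print(f"{vj:,} V-J combinations, {light:,.0f} light chains, {igg:.1e} IgGs")
```

Because every factor multiplies the others, a modest increase in any one class of segments raises the total in proportion, which is how a limited coding capacity yields so many antibodies.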
Did the immune system evolve in part from ancient
transposons? The mechanism for generation of the
double-strand breaks by RAG1 and RAG2 does mirror
several reaction steps in transposition (Fig. 25–45). In
addition, the deleted DNA, with its terminal RSS, has a
sequence structure found in most transposons. In the test
tube, RAG1 and RAG2 can associate with this deleted
DNA and insert it, transposonlike, into other DNA
molecules (probably a rare reaction in B lymphocytes).
Although we cannot know for certain, the properties of
the immunoglobulin gene rearrangement system suggest
an intriguing origin in which the distinction between
host and parasite has become blurred by evolution.
SUMMARY 25.3 DNA Recombination
■ DNA sequences are rearranged in recombina-
tion reactions, usually in processes tightly
coordinated with DNA replication or repair.
■ Homologous genetic recombination can take
place between any two DNA molecules that
share sequence homology. In meiosis (in
eukaryotes), this type of recombination helps
to ensure accurate chromosomal segregation
and create genetic diversity. In both bacteria
and eukaryotes it serves in the repair of stalled
replication forks. A Holliday intermediate forms
during homologous recombination.
■ Site-specific recombination occurs only at
specific target sequences, and this process can
also involve a Holliday intermediate.
Recombinases cleave the DNA at specific
points and ligate the strands to new partners.
This type of recombination is found in virtually
all cells, and its many functions include DNA
integration and regulation of gene expression.
■ In virtually all cells, transposons use
recombination to move within or between
chromosomes. In vertebrates, a programmed
recombination reaction related to transposition
joins immunoglobulin gene segments to form
immunoglobulin genes during B-lymphocyte
differentiation.
FIGURE 25–45 Mechanism of immunoglobulin gene rearrangement.
The RAG1 and RAG2 proteins bind to the recombination signal sequences
(RSS) and cleave one DNA strand between the RSS and the
V (or J) segments to be joined. The liberated 3′ hydroxyl then acts as
a nucleophile, attacking a phosphodiester bond in the other strand
(an intramolecular transesterification) to create a double-strand break.
The resulting hairpin bends on the V and J segments are cleaved, and
the ends are covalently linked by a complex of proteins specialized for
end-joining repair of double-strand breaks. The steps in the generation
of the double-strand break catalyzed by RAG1 and RAG2 are chemically
related to steps in transposition reactions.
Key Terms
template
semiconservative replication
replication fork
origin
Okazaki fragments
leading strand
lagging strand
nucleases
exonuclease
endonuclease
DNA polymerase I
primer
primer terminus
processivity
proofreading
DNA polymerase III
replisome
helicases
topoisomerases
primases
DNA ligase
primosome
catenane
DNA polymerase α
DNA polymerase δ
DNA polymerase ε
mutation
base-excision repair
DNA glycosylases
AP site
AP endonucleases
DNA photolyases
recombinational DNA repair
error-prone translesion DNA synthesis
SOS response
homologous genetic recombination
site-specific recombination
DNA transposition
meiosis
branch migration
double-strand break repair model
Holliday intermediate
transposons
transposition
insertion sequence
cointegrate
Terms in bold are defined in the glossary.
Problems

1. Conclusions from the Meselson-Stahl Experiment
The Meselson-Stahl experiment (see Fig. 25–2) proved that
DNA undergoes semiconservative replication in E. coli. In the
“dispersive” model of DNA replication, the parent DNA
strands are cleaved into pieces of random size, then joined
with pieces of newly replicated DNA to yield daughter duplexes.
In the Meselson-Stahl experiment, each strand would
contain random segments of heavy and light DNA. Explain
how the results of Meselson and Stahl’s experiment ruled out
such a model.

2. Heavy Isotope Analysis of DNA Replication A culture
of E. coli growing in a medium containing ¹⁵NH₄Cl is
switched to a medium containing ¹⁴NH₄Cl for three generations
(an eightfold increase in population). What is the molar
ratio of hybrid DNA (¹⁵N–¹⁴N) to light DNA (¹⁴N–¹⁴N) at
this point?

3. Replication of the E. coli Chromosome The E. coli
chromosome contains 4,639,221 bp.
(a) How many turns of the double helix must be unwound
during replication of the E. coli chromosome?
(b) From the data in this chapter, how long would it take
to replicate the E. coli chromosome at 37 °C if two replication
forks proceeded from the origin? Assume replication occurs
at a rate of 1,000 bp/s. Under some conditions E. coli
cells can divide every 20 min. How might this be possible?
(c) In the replication of the E. coli chromosome, about
how many Okazaki fragments would be formed? What factors
guarantee that the numerous Okazaki fragments are assembled
in the correct order in the new DNA?

4. Base Composition of DNAs Made from Single-Stranded
Templates Predict the base composition of the
total DNA synthesized by DNA polymerase on templates provided
by an equimolar mixture of the two complementary
strands of bacteriophage φX174 DNA (a circular DNA molecule).
The base composition of one strand is A, 24.7%; G,
24.1%; C, 18.5%; and T, 32.7%. What assumption is necessary
to answer this problem?

5. DNA Replication Kornberg and his colleagues incubated
soluble extracts of E. coli with a mixture of dATP, dTTP,
dGTP, and dCTP, all labeled with ³²P in the α-phosphate
group. After a time, the incubation mixture was treated with
trichloroacetic acid, which precipitates the DNA but not the
nucleotide precursors. The precipitate was collected, and the
extent of precursor incorporation into DNA was determined
from the amount of radioactivity present in the precipitate.
(a) If any one of the four nucleotide precursors were
omitted from the incubation mixture, would radioactivity be
found in the precipitate? Explain.
(b) Would ³²P be incorporated into the DNA if only
dTTP were labeled? Explain.
(c) Would radioactivity be found in the precipitate if ³²P
labeled the β or γ phosphate rather than the α phosphate of
the deoxyribonucleotides? Explain.
6. Leading and Lagging Strands Prepare a table that
lists the names and compares the functions of the precursors,
enzymes, and other proteins needed to make the leading versus
lagging strands during DNA replication in E. coli.

7. Function of DNA Ligase Some E. coli mutants contain
defective DNA ligase. When these mutants are exposed
to ³H-labeled thymine and the DNA produced is sedimented
on an alkaline sucrose density gradient, two radioactive bands
appear. One corresponds to a high molecular weight fraction,
the other to a low molecular weight fraction. Explain.

8. Fidelity of Replication of DNA What factors promote
the fidelity of replication during the synthesis of the
leading strand of DNA? Would you expect the lagging strand
to be made with the same fidelity? Give reasons for your
answers.
19. Importance of DNA Topoisomerases in DNA Repli-
cation DNA unwinding, such as that occurring in replica-
tion, affects the superhelical density of DNA. In the absence
of topoisomerases, the DNA would become overwound ahead
of a replication fork as the DNA is unwound behind it. A bac-
terial replication fork will stall when the superhelical density
(σ) of the DNA ahead of the fork reaches +0.14 (see Chapter 24).
Bidirectional replication is initiated at the origin of a
6,000 bp plasmid in vitro, in the absence of topoisomerases.
The plasmid initially has a σ of −0.06. How many base pairs
will be unwound and replicated by each replication fork be-
fore the forks stall? Assume that each fork travels at the same
rate and that each includes all components necessary for
elongation except topoisomerase.
10. The Ames Test In a nutrient medium that lacks histidine,
a thin layer of agar containing ~10⁹ Salmonella typhimurium
histidine auxotrophs (mutant cells that require
histidine to survive) produces ~13 colonies over a two-day
incubation period at 37 °C (see Fig. 25–19). How do these
colonies arise in the absence of histidine? The experiment is
repeated in the presence of 0.4 μg of 2-aminoanthracene. The
number of colonies produced over two days exceeds 10,000.
What does this indicate about 2-aminoanthracene? What can
you surmise about its carcinogenicity?
11. DNA Repair Mechanisms Vertebrate and plant cells
often methylate cytosine in DNA to form 5-methylcytosine
(see Fig. 8–5a). In these same cells, a specialized repair system
recognizes G–T mismatches and repairs them to G≡C base
pairs. How might this repair system be advantageous to the
cell? (Explain in terms of the presence of 5-methylcytosine in
the DNA.)
12. DNA Repair in People with Xeroderma Pig-
mentosum The condition known as xeroderma pig-
mentosum (XP) arises from mutations in at least seven dif-
ferent human genes. The deficiencies are generally in genes
encoding enzymes involved in some part of the pathway for
human nucleotide-excision repair. The various types of XP
are labeled A through G (XPA, XPB, etc.), with a few addi-
tional variants lumped under the label XPV.
Cultures of cells from healthy individuals and from pa-
tients with XPG are irradiated with ultraviolet light. The DNA
is isolated and denatured, and the resulting single-stranded
DNA is characterized by analytical ultracentrifugation.
(a) Samples from the normal fibroblasts show a signifi-
cant reduction in the average molecular weight of the single-
stranded DNA after irradiation, but samples from the XPG fi-
broblasts show no such reduction. Why might this be?
(b) If you assume that a nucleotide-excision repair sys-
tem is operative, which step might be defective in the fi-
broblasts from the patients with XPG? Explain.
13. Holliday Intermediates How does the formation of
Holliday intermediates in homologous genetic recombination
differ from their formation in site-specific recombination?

Expression of the information in a gene generally involves
production of an RNA molecule transcribed
from a DNA template. Strands of RNA and DNA may
seem quite similar at first glance, differing only in that
RNA has a hydroxyl group at the 2′ position of the aldopentose
and uracil instead of thymine. However, unlike DNA, most RNAs
carry out their functions as single strands,
strands that fold back on themselves and
have the potential for much greater structural diversity
than DNA (Chapter 8). RNA is thus suited to a variety
of cellular functions.
RNA is the only macromolecule known to have a
role both in the storage and transmission of information
and in catalysis, which has led to much speculation
about its possible role as an essential chemical inter-
mediate in the development of life on this planet. The
discovery of catalytic RNAs, or ribozymes, has changed
the very definition of an enzyme, extending it beyond
the domain of proteins. Proteins nevertheless remain es-
sential to RNA and its functions. In the modern cell, all
nucleic acids, including RNAs, are complexed with pro-
teins. Some of these complexes are quite elaborate, and
RNA can assume both structural and catalytic roles
within complicated biochemical machines.
All RNA molecules except the RNA genomes of cer-
tain viruses are derived from information permanently
stored in DNA. During transcription, an enzyme sys-
tem converts the genetic information in a segment of
double-stranded DNA into an RNA strand with a base
sequence complementary to one of the DNA strands.
Three major kinds of RNA are produced. Messenger
RNAs (mRNAs) encode the amino acid sequence of
one or more polypeptides specified by a gene or set of
genes. Transfer RNAs (tRNAs) read the information
encoded in the mRNA and transfer the appropriate
amino acid to a growing polypeptide chain during pro-
tein synthesis. Ribosomal RNAs (rRNAs) are con-
stituents of ribosomes, the intricate cellular machines
that synthesize proteins. Many additional specialized
RNAs have regulatory or catalytic functions or are pre-
cursors to the three main classes of RNA.
During replication the entire chromosome is usually
copied, but transcription is more selective. Only partic-
ular genes or groups of genes are transcribed at any one
time, and some portions of the DNA genome are never
transcribed. The cell restricts the expression of genetic
information to the formation of gene products needed
at any particular moment. Specific regulatory sequences
mark the beginning and end of the DNA segments to be
transcribed and designate which strand in duplex DNA
is to be used as the template. The regulation of tran-
scription is described in detail in Chapter 28.
In this chapter we examine the synthesis of RNA on
a DNA template and the postsynthetic processing and
turnover of RNA molecules. In doing so we encounter
many of the specialized functions of RNA, including cat-
alytic functions. Interestingly, the substrates for RNA
enzymes are often other RNA molecules. We also de-
scribe systems in which RNA is the template and DNA
the product, rather than vice versa. The information
pathways thus come full circle, revealing that template-dependent
nucleic acid synthesis has standard rules
regardless of the nature of template or product (RNA
or DNA). This examination of the biological interconversion
of DNA and RNA as information carriers leads
to a discussion of the evolutionary origin of biological
information.

26 RNA METABOLISM
26.1 DNA-Dependent Synthesis of RNA
26.2 RNA Processing
26.3 RNA-Dependent Synthesis of RNA and DNA

The RNA of the cell is partly in the nucleus, partly in
particles in the cytoplasm and partly as the “soluble” RNA
of the cell sap; many workers have shown that all these
three fractions turn over differently. It is very important to
realize in any discussion of the role of RNA in the cell
that it is very inhomogeneous metabolically, and probably
of more than one type.
Francis H. C. Crick, article in Symposia of the
Society for Experimental Biology, 1958

26.1 DNA-Dependent Synthesis of RNA
Our discussion of RNA synthesis begins with a compar-
ison between transcription and DNA replication (Chap-
ter 25). Transcription resembles replication in its fun-
damental chemical mechanism, its polarity (direction of
synthesis), and its use of a template. And like replica-
tion, transcription has initiation, elongation, and termi-
nation phases—though in the literature on transcrip-
tion, initiation is further divided into discrete phases of
DNA binding and initiation of RNA synthesis. Tran-
scription differs from replication in that it does not
require a primer and, generally, involves only limited
segments of a DNA molecule. Additionally, within
transcribed segments only one DNA strand serves as a
template.
RNA Is Synthesized by RNA Polymerases
The discovery of DNA polymerase and its dependence
on a DNA template spurred a search for an enzyme that
synthesizes RNA complementary to a DNA strand. By
1960, four research groups had independently detected
an enzyme in cellular extracts that could form an RNA
polymer from ribonucleoside 5′-triphosphates. Subsequent
work on the purified Escherichia coli RNA polymerase
helped to define the fundamental properties of
transcription (Fig. 26–1). DNA-dependent RNA polymerase
requires, in addition to a DNA template, all four
ribonucleoside 5′-triphosphates (ATP, GTP, UTP, and
CTP) as precursors of the nucleotide units of RNA, as
well as Mg²⁺. The protein also binds one Zn²⁺. The
chemistry and mechanism of RNA synthesis closely resemble
those used by DNA polymerases (see Fig. 25–5).
RNA polymerase elongates an RNA strand by adding
ribonucleotide units to the 3′-hydroxyl end, building RNA
in the 5′→3′ direction. The 3′-hydroxyl group acts as a
nucleophile, attacking the α phosphate of the incoming
ribonucleoside triphosphate (Fig. 26–1b) and releasing
pyrophosphate. The overall reaction is

$$\underset{\text{RNA}}{(\mathrm{NMP})_n} + \mathrm{NTP} \longrightarrow \underset{\text{Lengthened RNA}}{(\mathrm{NMP})_{n+1}} + \mathrm{PP_i}$$

RNA polymerase requires DNA for activity and is most
active when bound to a double-stranded DNA. As noted
above, only one of the two DNA strands serves as a template.
The template DNA strand is copied in the 3′→5′
direction (antiparallel to the new RNA strand), just as
in DNA replication. Each nucleotide in the newly formed
RNA is selected by Watson-Crick base-pairing interactions;
U residues are inserted in the RNA to pair with A
residues in the DNA template, G residues are inserted
to pair with C residues, and so on. Base-pair geometry
(see Fig. 25–6) may also play a role in base selection.
Unlike DNA polymerase, RNA polymerase does not
require a primer to initiate synthesis. Initiation occurs
when RNA polymerase binds at specific DNA sequences
called promoters (described below). The 5′-triphosphate
group of the first residue in a nascent (newly
formed) RNA molecule is not cleaved to release PPᵢ, but
instead remains intact throughout the transcription
process. During the elongation phase of transcription,
the growing end of the new RNA strand base-pairs temporarily
with the DNA template to form a short hybrid
RNA-DNA double helix, estimated to be 8 bp long (Fig.
26–1a). The RNA in this hybrid duplex “peels off” shortly
after its formation, and the DNA duplex re-forms.
To enable RNA polymerase to synthesize an RNA
strand complementary to one of the DNA strands, the
DNA duplex must unwind over a short distance, forming
a transcription “bubble.” During transcription, the
E. coli RNA polymerase generally keeps about 17 bp
unwound. The 8 bp RNA-DNA hybrid occurs in this unwound
region. Elongation of a transcript by E. coli RNA
polymerase proceeds at a rate of 50 to 90 nucleotides/s.
Because DNA is a helix, movement of a transcription
bubble requires considerable strand rotation of the nucleic
acid molecules. DNA strand rotation is restricted
in most DNAs by DNA-binding proteins and other structural
barriers. As a result, a moving RNA polymerase
generates waves of positive supercoils ahead of the transcription
bubble and negative supercoils behind (Fig.
26–1c). This has been observed both in vitro and in vivo
(in bacteria). In the cell, the topological problems
caused by transcription are relieved through the action
of topoisomerases (Chapter 24).

MECHANISM FIGURE 26–1 Transcription by RNA polymerase in
E. coli. For synthesis of an RNA strand complementary to one of two
DNA strands in a double helix, the DNA is transiently unwound.
(a) About 17 bp are unwound at any given time. RNA polymerase and
the bound transcription bubble move from left to right along the DNA
as shown, facilitating RNA synthesis. The DNA is unwound ahead and
rewound behind as RNA is transcribed. Red arrows show the direction
in which the DNA must rotate to permit this process. As the DNA
is rewound, the RNA-DNA hybrid is displaced and the RNA strand
extruded. The RNA polymerase is in close contact with the DNA ahead
of the transcription bubble, as well as with the separated DNA strands
and the RNA within and immediately behind the bubble. A channel
in the protein funnels new nucleoside triphosphates (NTPs) to the
polymerase active site. The polymerase footprint encompasses about 35 bp
of DNA during elongation.
(b) Catalytic mechanism of RNA synthesis by RNA polymerase.
Note that this is essentially the same mechanism used by DNA
polymerases (see Fig. 25–5b). The addition of nucleotides
involves an attack by the 3′-hydroxyl
group at the end of the growing RNA molecule on
the α phosphate of the incoming NTP. The reaction involves
two Mg²⁺ ions, coordinated to the phosphate groups of the incoming
NTP and to three Asp residues (Asp⁴⁶⁰, Asp⁴⁶², and Asp⁴⁶⁴ in the β′
subunit of the E. coli RNA polymerase), which are highly conserved
in the RNA polymerases of all species. One Mg²⁺ ion facilitates
attack by the 3′-hydroxyl group on the α phosphate of the NTP; the
other Mg²⁺ ion facilitates displacement of the pyrophosphate; and
both metal ions stabilize the pentacovalent transition state.
(c) Changes in the supercoiling of DNA brought about by transcription.
Movement of an RNA polymerase along DNA tends to create
positive supercoils (overwound DNA) ahead of the transcription
bubble and negative supercoils (underwound DNA) behind it. In a
cell, topoisomerases rapidly eliminate the positive supercoils and
regulate the level of negative supercoiling (Chapter 24).
[Figure labels: (a) unwinding, rewinding, transcription bubble, RNA-DNA
hybrid (~8 bp), template strand, nontemplate strand, active site, NTP
channel, direction of transcription; (b) RNA polymerase, template strand,
Asp residues, Mg²⁺ ions, PPᵢ; (c) positive supercoils, negative supercoils,
direction of transcription.]

The two complementary DNA strands have different
roles in transcription. The strand that serves as template
for RNA synthesis is called the template strand.
The DNA strand complementary to the template, the
nontemplate strand, or coding strand, is identical in
base sequence to the RNA transcribed from the gene,
with U in the RNA in place of T in the DNA (Fig. 26–2).
The coding strand for a particular gene may be located
in either strand of a given chromosome (as shown in
Fig. 26–3 for a virus). The regulatory sequences that
control transcription (described later in this chapter)
are by convention designated by the sequences in the
coding strand.
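
The relationship between the template strand, the coding strand, and the transcript can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, the input is the short template sequence of Figure 26–2 written 3′→5′, and the code applies nothing beyond the Watson-Crick pairing rules stated above.

```python
# Watson-Crick pairing used when a DNA template is read into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Pairing between the two DNA strands of the duplex.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Return the nontemplate (coding) strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


# Template strand of Figure 26-2, written 3'->5'.
template = "GCGATATCGCAAA"
rna = transcribe(template)
coding = coding_strand(template)

print(rna)     # CGCUAUAGCGUUU
print(coding)  # CGCTATAGCGTTT

# The transcript matches the coding strand once T is replaced by U.
assert rna == coding.replace("T", "U")
```

Because the template is written 3′→5′ and RNA grows 5′→3′, reading the template from left to right yields the transcript in its conventional orientation without any reversal step.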
The DNA-dependent RNA polymerase of E. coli is
a large, complex enzyme with five core subunits
(α₂ββ′ω; Mᵣ 390,000) and a sixth subunit, one of a group
designated σ, with variants designated by size (molecular
weight). The σ subunit binds transiently to the
core and directs the enzyme to specific binding sites on
the DNA (described below). These six subunits constitute
the RNA polymerase holoenzyme (Fig. 26–4). The
RNA polymerase holoenzyme of E. coli thus exists in
several forms, depending on the type of σ subunit. The
most common subunit is σ⁷⁰ (Mᵣ 70,000), and the upcoming
discussion focuses on the corresponding RNA
polymerase holoenzyme.
RNA polymerases lack a separate proofreading
3′→5′ exonuclease active site (such as that of many
DNA polymerases), and the error rate for transcription
is higher than that for chromosomal DNA replication:
approximately one error for every 10⁴ to 10⁵ ribonucleotides
incorporated into RNA. Because many copies
of an RNA are generally produced from a single gene
and all RNAs are eventually degraded and replaced, a
mistake in an RNA molecule is of less consequence to
the cell than a mistake in the permanent information
stored in DNA. Many RNA polymerases, including bac-
terial RNA polymerase and the eukaryotic RNA poly-
merase II (discussed below), do pause when a mispaired
base is added during transcription, and they can remove
mismatched nucleotides from the 3′ end of a transcript
by direct reversal of the polymerase reaction. But we
do not yet know whether this activity is a true proof-
reading function and to what extent it may contribute
to the fidelity of transcription.
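
To get a feel for these numbers, the following Python sketch combines the error rate given above (one error per 10⁴ to 10⁵ nucleotides) with the elongation rate quoted earlier in this section (50 to 90 nucleotides/s). The function name and the 1,000-nucleotide transcript are hypothetical, and the estimate assumes a constant elongation rate with no pausing and independent errors at each position, which is a simplification.

```python
def transcription_estimates(length_nt: int) -> dict:
    """Rough time and error estimates for one transcript of length_nt."""
    rate_slow, rate_fast = 50.0, 90.0   # nucleotides/s, range given in the text
    err_low, err_high = 1e-5, 1e-4      # errors per nucleotide, range given in the text
    return {
        "seconds_min": length_nt / rate_fast,
        "seconds_max": length_nt / rate_slow,
        "errors_min": length_nt * err_low,
        "errors_max": length_nt * err_high,
    }


# Hypothetical transcript length chosen only for illustration.
est = transcription_estimates(1000)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions a 1,000-nucleotide transcript takes roughly 11 to 20 s to synthesize and carries between 0.01 and 0.1 expected errors, so most copies of a transcript of this length would be error-free.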
RNA Synthesis Begins at Promoters
Initiation of RNA synthesis at random points in a DNA
molecule would be an extraordinarily wasteful process.
Instead, an RNA polymerase binds to specific sequences
in the DNA called promoters, which direct the tran-
scription of adjacent segments of DNA (genes). The
sequences where RNA polymerases bind can be quite
variable, and much research has focused on identifying
the particular sequences that are critical to promoter
function.
FIGURE 26–2 Template and nontemplate (coding) DNA
strands. The two complementary strands of DNA are defined
by their function in transcription. The RNA transcript is synthesized
on the template strand and is identical in sequence (with
U in place of T) to the nontemplate strand, or coding strand.

(5′) CGCTATAGCGTTT (3′)   DNA nontemplate (coding) strand
(3′) GCGATATCGCAAA (5′)   DNA template strand
(5′) CGCUAUAGCGUUU (3′)   RNA transcript

FIGURE 26–3 Organization of coding information in the adenovirus
genome. The genetic information of the adenovirus genome (a conveniently
simple example) is encoded by a double-stranded DNA molecule
of 36,000 bp, both strands of which encode proteins. The information
for most proteins is encoded by the top strand (by
convention, the strand transcribed from left to right), but some is
encoded by the bottom strand, which is transcribed in the opposite
direction. Synthesis of mRNAs in adenovirus is actually much more
complex than shown here. Many of the mRNAs shown for the upper
strand are initially synthesized as a single, long transcript (25,000
nucleotides), which is then extensively processed to produce the
separate mRNAs. Adenovirus causes upper respiratory tract infections
in some vertebrates.
[Figure labels: DNA, 3.6 × 10⁴ bp; RNA transcripts.]

In E. coli, RNA polymerase binding occurs within a
region stretching from about 70 bp before the transcription
start site to about 30 bp beyond it. By convention,
the DNA base pairs that correspond to the beginning
of an RNA molecule are given positive numbers,
and those preceding the RNA start site are given negative
numbers. The promoter region thus extends between
positions −70 and +30. Analyses and comparisons
of the most common class of bacterial promoters
(those recognized by an RNA polymerase holoenzyme
containing σ⁷⁰) have revealed similarities in two short
sequences centered about positions −10 and −35 (Fig.
26–5). These sequences are important interaction sites
for the σ⁷⁰ subunit. Although the sequences are not
identical for all bacterial promoters in this class, certain
nucleotides that are particularly common at each position
form a consensus sequence (recall the E. coli
oriC consensus sequence; see Fig. 25–11). The consensus
sequence at the −10 region is (5′)TATAAT(3′);
the consensus sequence at the −35 region is
(5′)TTGACA(3′). A third AT-rich recognition element,
called the UP (upstream promoter) element, occurs between
positions −40 and −60 in the promoters of certain
highly expressed genes. The UP element is bound
by the α subunit of RNA polymerase. The efficiency with
which an RNA polymerase binds to a promoter and initiates
transcription is determined in large measure by
these sequences, the spacing between them, and their
distance from the transcription start site.
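
As a rough illustration of how promoters can be compared with the consensus, the Python sketch below counts how many positions of a promoter's −35 and −10 hexamers agree with TTGACA and TATAAT. The hexamers are those listed for the trp, lac, and recA promoters in Figure 26–5; the function names and the simple match count are assumptions made for this example. The count is not a measure used in the text, and it ignores spacer length, the UP element, and distance from the start site, all of which also affect promoter efficiency.

```python
CONSENSUS_35 = "TTGACA"  # -35 region consensus, from the text
CONSENSUS_10 = "TATAAT"  # -10 region consensus, from the text


def matches(hexamer: str, consensus: str) -> int:
    """Count positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def promoter_score(box35: str, box10: str) -> int:
    """Total number of matches (0 to 12) to both consensus hexamers."""
    return matches(box35, CONSENSUS_35) + matches(box10, CONSENSUS_10)


# (-35 hexamer, -10 hexamer) on the nontemplate strand, read 5'->3'.
promoters = {
    "trp": ("TTGACA", "TTAACT"),
    "lac": ("TTTACA", "TATGTT"),
    "recA": ("TTGATA", "TATAAT"),
}

for name, (box35, box10) in promoters.items():
    print(name, promoter_score(box35, box10))
```

With these inputs the trp and lac promoters each match the consensus at 9 of 12 positions and recA at 11 of 12, which shows that naturally occurring promoters resemble the consensus without being identical to it.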
Many independent lines of evidence attest to the
functional importance of the sequences in the −35 and
−10 regions. Mutations that affect the function of a
given promoter often involve a base pair in these re-
gions. Variations in the consensus sequence also affect
the efficiency of RNA polymerase binding and tran-
scription initiation. A change in only one base pair can
decrease the rate of binding by several orders of mag-
nitude. The promoter sequence thus establishes a basal
level of expression that can vary greatly from one E. coli
gene to the next. A method that provides information
about the interaction between RNA polymerase and pro-
moters is illustrated in Box 26–1.
The pathway of transcription initiation is becoming
much better defined (Fig. 26–6a). It consists of two ma-
jor parts, binding and initiation, each with multiple
steps. First, the polymerase binds to the promoter, form-
ing, in succession, a closed complex (in which the bound
DNA is intact) and an open complex (in which the
bound DNA is intact and partially unwound near the
−10 sequence). Second, transcription is initiated within
the complex, leading to a conformational change that
converts the complex to the elongation form, followed
by movement of the transcription complex away from
the promoter (promoter clearance). Any of these steps
can be affected by the specific makeup of the promoter
sequences. The σ subunit dissociates as the polymerase
enters the elongation phase of transcription (Fig. 26–6a).

FIGURE 26–4 Structure of the RNA polymerase holoenzyme of the
bacterium Thermus aquaticus. (Derived from PDB ID 1IW7.) The overall
structure of this enzyme is very similar to that of the E. coli RNA
polymerase; no DNA or RNA is shown here. The β subunit is in gray,
the β′ subunit is white; the two α subunits are different shades of red;
the ω subunit is yellow; the σ subunit is orange. The image on the left
is oriented as in Figure 26–6. When the structure is rotated 180° about
the y axis (right) the small ω subunit is visible.

FIGURE 26–5 Typical E. coli promoters recognized by an RNA polymerase
holoenzyme containing σ⁷⁰. Sequences of the nontemplate
strand are shown, read in the 5′→3′ direction, as is the convention
for representations of this kind. The sequences vary from one promoter
to the next, but comparisons of many promoters reveal similarities,
particularly in the −10 and −35 regions. The sequence element UP,
not present in all E. coli promoters, is shown in the P1 promoter for
the highly expressed rRNA gene rrnB. UP elements, generally occurring
in the region between −40 and −60, strongly stimulate transcription
at the promoters that contain them. The UP element in the
rrnB P1 promoter encompasses the region between −38 and −59.
The consensus sequence for E. coli promoters recognized by σ⁷⁰ is
shown second from the top. Spacer regions contain slightly variable
numbers of nucleotides (N). Only the first nucleotide coding the RNA
transcript (at position +1) is shown.

Promoter            UP element                            −35 Region  Spacer  −10 Region  Spacer  RNA start
Consensus sequence  NNAAA(A/T)(A/T)T(A/T)TTTTNNAAAANNN N  TTGACA      N17     TATAAT      N6      +1
rrnB P1             AGAAAATTATTTTAAATTTCCT N              GTGTCA      N16     TATAAT      N8      A
trp                                                       TTGACA      N17     TTAACT      N7      A
lac                                                       TTTACA      N17     TATGTT      N6      A
recA                                                      TTGATA      N16     TATAAT      N7      A
araBAD                                                    CTGACG      N18     TACTGT      N6      A

FIGURE 26–6 Transcription initiation and elongation by E. coli RNA
polymerase. (a) Initiation of transcription requires several steps generally
divided into two phases, binding and initiation. In the binding
phase, the initial interaction of the RNA polymerase with the promoter
leads to formation of a closed complex, in which the promoter DNA
is stably bound but not unwound. A 12 to 15 bp region of DNA
(from within the −10 region to position +2 or +3) is then unwound
to form an open complex. Additional intermediates (not shown) have
been detected in the pathways leading to the closed and open complexes,
along with several changes in protein conformation. The initiation
phase encompasses transcription initiation and promoter clearance.
Once the first 8 or 9 nucleotides of a new RNA are synthesized,
the σ subunit is released and the polymerase leaves the promoter and
becomes committed to elongation of the RNA.
(b) Structure of the RNA core polymerase from E. coli. RNA and
DNA are included here to illustrate a polymerase in the elongation
phase. Subunit coloring matches Figure 26–4: the β and β′ subunits
are light gray and white; the α subunits, shades of red. The ω subunit
is on the opposite side of the complex and is not visible in this view.
The σ subunit is not present in this complex, having dissociated after
the initiation steps. The top panel shows the entire complex. The active
site for transcription is in a cleft between the β and β′ subunits.
In the middle panel, the β subunit has been removed, exposing the
active site and the DNA-RNA hybrid region. The active site is marked
in part by a Mg²⁺ ion (red). In the bottom panel, all the protein has
been removed to reveal the circuitous path taken by the DNA and
RNA through the complex.
[Panel (a) labels: binding (closed complex, open complex); initiation
(transcription initiation, promoter clearance); elongation form;
promoter positions −35, −10, and +1; σ subunit.]

E. coli has other classes of promoters, bound by
RNA polymerase holoenzymes with different σ subunits.
An example is the promoters of the heat-shock genes.
The products of this set of genes are made at higher levels
when the cell has received an insult, such as a sudden
increase in temperature. RNA polymerase binds to
the promoters of these genes only when σ⁷⁰ is replaced
with the σ³² (Mᵣ 32,000) subunit, which is specific for
the heat-shock promoters (see Fig. 28–3). By using
different σ subunits the cell can coordinate the expression
of sets of genes, permitting major changes in cell
physiology.
Transcription Is Regulated at Several Levels
Requirements for any gene product vary with cellular
conditions or developmental stage, and transcription of
each gene is carefully regulated to form gene products
only in the proportions needed. Regulation can occur at
any step in transcription, including elongation and ter-
mination. However, much of the regulation is directed
at the polymerase binding and transcription initiation
steps outlined in Figure 26–6. Differences in promoter
sequences are just one of several levels of control.
The binding of proteins to sequences both near to
and distant from the promoter can also affect levels of
gene expression. Protein binding can activate tran-
scription by facilitating either RNA polymerase binding
or steps further along in the initiation process, or it can
repress transcription by blocking the activity of the
polymerase. In E. coli, one protein that activates tran-
scription is the cAMP receptor protein (CRP), which
increases the transcription of genes coding for enzymes
that metabolize sugars other than glucose when cells are
grown in the absence of glucose. Repressors are pro-
teins that block the synthesis of RNA at specific genes.
In the case of the Lac repressor (Chapter 28), tran-
scription of the genes for the enzymes of lactose me-
tabolism is blocked when lactose is unavailable.
Transcription is the first step in the complicated and
energy-intensive pathway of protein synthesis, so much
of the regulation of protein levels in both bacterial and
eukaryotic cells is directed at transcription, particularly
its early stages. In Chapter 28 we describe many mech-
anisms by which this regulation is accomplished.
Specific Sequences Signal Termination
of RNA Synthesis
RNA synthesis is processive (that is, the RNA polymer-
ase has high processivity; p. 954)—necessarily so, be-
cause if an RNA polymerase released an RNA transcript
prematurely, it could not resume synthesis of the same
RNA but instead would have to start over. However, an
encounter with certain DNA sequences results in a
pause in RNA synthesis, and at some of these sequences
transcription is terminated. The process of termination
is not yet well understood in eukaryotes, so our focus
is again on bacteria. E. coli has at least two classes of
termination signals: one class relies on a protein factor
called ρ (rho) and the other is ρ-independent.
Most H9267-independent terminators have two distin-
guishing features. The first is a region that produces an
RNA transcript with self-complementary sequences,
permitting the formation of a hairpin structure (see Fig.
8–21a) centered 15 to 20 nucleotides before the pro-
jected end of the RNA strand. The second feature is a
highly conserved string of three A residues in the
template strand that are transcribed into U residues
near the 3H11032 end of the hairpin. When a polymerase ar-
rives at a termination site with this structure, it pauses
(Fig. 26–7). Formation of the hairpin structure in the
RNA disrupts several AUU base pairs in the RNA-DNA
hybrid segment and may disrupt important interactions
Isomerize
Terminate
Escape
Bypass
Pause
3H11032
3H11032
5H11032
5H11032
FIGURE 26–7 Model for H9267-independent termination of transcription
in E. coli. RNA polymerase pauses at a variety of DNA sequences,
some of which are terminators. One of two outcomes is then possible:
the polymerase bypasses the site and continues on its way, or the com-
plex undergoes a conformational change (isomerization). In the latter
case, intramolecular pairing of complementary sequences in the newly
formed RNA transcript may form a hairpin that disrupts the RNA-DNA
hybrid and/or the interactions between the RNA and the polymerase,
resulting in isomerization. An AUU hybrid region at the 3H11032 end of the
new transcript is relatively unstable, and the RNA dissociates completely,
leading to termination and dissociation of the RNA molecule. This is
the usual outcome at terminators. At other pause sites, the complex
may escape after the isomerization step to continue RNA synthesis.
8885d_c26_995-1033 2/12/04 12:39 PM Page 1001 mac34 mac34: kec_420:
BOX 26–1 WORKING IN BIOCHEMISTRY
RNA Polymerase Leaves Its Footprint
on a Promoter
Footprinting, a technique derived from principles
used in DNA sequencing, identifies the DNA se-
quences bound by a particular protein. Researchers
isolate a DNA fragment thought to contain sequences
recognized by a DNA-binding protein and radiolabel
one end of one strand (Fig. 1). They then use chem-
ical or enzymatic reagents to introduce random breaks
in the DNA fragment (averaging about one per mole-
cule). Separation of the labeled cleavage products (bro-
ken fragments of various lengths) by high-resolution
electrophoresis produces a ladder of radioactive
bands. In a separate tube, the cleavage procedure is
repeated on copies of the same DNA frag-
ment in the presence of the DNA-binding
protein. The researchers then subject the
two sets of cleavage products to elec-
trophoresis and compare them side by
side. A gap (“footprint”) in the series of
radioactive bands derived from the DNA-
protein sample, attributable to protection
of the DNA by the bound protein, identi-
fies the sequences that the protein binds.
The precise location of the protein-
binding site can be determined by di-
rectly sequencing (see Fig. 8–37) copies
of the same DNA fragment and including
the sequencing lanes (not shown here)
on the same gel with the footprint. Fig-
ure 2 shows footprinting results for the
binding of RNA polymerase to a DNA
fragment containing a promoter. The
polymerase covers 60 to 80 bp; protec-
tion by the bound enzyme includes the
−10 and −35 regions.
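The lane comparison at the heart of footprinting can be sketched in a few lines of Python. This is an illustration only: the function name `find_footprint`, the representation of each lane as a collection of fragment lengths, and the sample data (a 100-nucleotide fragment with positions 40 to 80 protected) are assumptions made for the example, not experimental values.

```python
def find_footprint(free_lane, bound_lane):
    """Return (start, end) of the longest run of consecutive fragment
    lengths present without protein but missing when protein is bound.
    Returns None if no bands are missing."""
    missing = sorted(set(free_lane) - set(bound_lane))
    if not missing:
        return None
    runs = []
    start = prev = missing[0]
    for length in missing[1:]:
        if length != prev + 1:        # a gap ends the current run
            runs.append((start, prev))
            start = length
        prev = length
    runs.append((start, prev))
    # The footprint is taken to be the longest stretch of missing bands.
    return max(runs, key=lambda run: run[1] - run[0])


# Hypothetical data: every position is cut in the protein-free lane,
# while positions 40-80 are protected in the protein-bound lane.
free_lane = list(range(1, 100))
bound_lane = [n for n in free_lane if not 40 <= n <= 80]
print(find_footprint(free_lane, bound_lane))  # (40, 80)
```

A real gel would require band calling and tolerance for weak or partially protected bands; the sketch treats each band as simply present or absent.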
[Figure 1 labels: Solution of identical DNA fragments radioactively labeled at one end of one strand. Treat with DNase I, in the presence (+) and absence (−) of RNA polymerase, under conditions in which each strand is cut once (on average); no cuts are made in the area where RNA polymerase has bound. Isolate labeled DNA fragments and denature; only labeled strands are detected in the next step. Separate fragments by polyacrylamide gel electrophoresis and visualize radiolabeled bands on x-ray film. Missing bands indicate where RNA polymerase was bound to DNA.]
[Figure 2 labels: lanes −, +, and C; nontemplate strand positions +1, −10, −20, −30, −40, −50; regions bound by RNA polymerase.]
FIGURE 1 Footprint analysis of the RNA polymerase–binding site
on a DNA fragment. Separate experiments are carried out in the
presence (+) and absence (−) of the polymerase.
FIGURE 2 Footprinting results of RNA
polymerase binding to the lac promoter
(see Fig. 26–5). In this experiment, the
5′ end of the nontemplate strand was
radioactively labeled. Lane C is a
control in which the labeled DNA
fragments were cleaved with a
chemical reagent that produces a more
uniform banding pattern.
between RNA and the RNA polymerase, facilitating dis-
sociation of the transcript.
The ρ-dependent terminators lack the sequence of
repeated A residues in the template strand but usually
include a CA-rich sequence called a rut (rho utilization)
element. The ρ protein associates with the RNA at
specific binding sites and migrates in the 5′→3′ direction
until it reaches the transcription complex that is paused
at a termination site. Here it contributes to release of
the RNA transcript. The ρ protein has an ATP-dependent
RNA-DNA helicase activity that promotes translocation
of the protein along the RNA, and ATP is hydrolyzed
by ρ protein during the termination process.
The detailed mechanism by which the protein promotes
the release of the RNA transcript is not known.
Eukaryotic Cells Have Three Kinds of Nuclear
RNA Polymerases
The transcriptional machinery in the nucleus of a eu-
karyotic cell is much more complex than that in bacte-
ria. Eukaryotes have three RNA polymerases, desig-
nated I, II, and III, which are distinct complexes but have
certain subunits in common. Each polymerase has a spe-
cific function and is recruited to a specific promoter
sequence.
RNA polymerase I (Pol I) is responsible for the syn-
thesis of only one type of RNA, a transcript called pre-
ribosomal RNA (or pre-rRNA), which contains the pre-
cursor for the 18S, 5.8S, and 28S rRNAs (see Fig.
26–22). Pol I promoters vary greatly in sequence from
one species to another. The principal function of RNA
polymerase II (Pol II) is synthesis of mRNAs and some
specialized RNAs. This enzyme can recognize thousands
of promoters that vary greatly in sequence. Many Pol II
promoters have a few sequence features in common, in-
cluding a TATA box (eukaryotic consensus sequence
TATAAA) near base pair −30 and an Inr sequence
(initiator) near the RNA start site at +1 (Fig. 26–8).
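As a rough illustration of how such sequence features might be located, the Python sketch below scans a DNA strand for the TATA box consensus (TATAAA) and for the Inr consensus given in Figure 26–8, YYAN(T/A)YY, where Y is a pyrimidine and N is any nucleotide. The function name `find_promoter_elements` and the test sequence are invented for this example; real promoters often deviate from, or lack, these consensus elements, so exact matching is a simplification.

```python
import re

# Consensus patterns taken from the text and Figure 26-8.
TATA_BOX = re.compile(r"TATAAA")
# Y = C or T, N = any base; a lookahead is used so that
# overlapping Inr matches are all reported.
INR = re.compile(r"(?=([CT][CT]A[ACGT][TA][CT][CT]))")


def find_promoter_elements(seq):
    """Return 0-based start positions of exact consensus matches."""
    seq = seq.upper()
    return {
        "TATA": [m.start() for m in TATA_BOX.finditer(seq)],
        "Inr": [m.start() for m in INR.finditer(seq)],
    }


# Hypothetical sequence: a TATA box followed 24 bp later by an Inr.
seq = "GGGC" + "TATAAA" + "G" * 24 + "TCAGTCT" + "GGG"
print(find_promoter_elements(seq))  # {'TATA': [4], 'Inr': [34]}
```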
RNA polymerase III (Pol III) makes tRNAs, the 5S
rRNA, and some other small specialized RNAs. The pro-
moters recognized by Pol III are well characterized. In-
terestingly, some of the sequences required for the reg-
ulated initiation of transcription by Pol III are located
within the gene itself, whereas others are in more con-
ventional locations upstream of the RNA start site
(Chapter 28).
RNA Polymerase II Requires Many Other Protein
Factors for Its Activity
RNA polymerase II is central to eukaryotic gene ex-
pression and has been studied extensively. Although this
polymerase is strikingly more complex than its bacter-
ial counterpart, the complexity masks a remarkable con-
servation of structure, function, and mechanism. Pol II
is a huge enzyme with 12 subunits. The largest subunit
(RPB1) exhibits a high degree of homology to the β′
subunit of bacterial RNA polymerase. Another subunit
(RPB2) is structurally similar to the bacterial β subunit,
and two others (RPB3 and RPB11) show some structural
homology to the two bacterial α subunits. Pol II
must function with genomes that are more complex and
with DNA molecules more elaborately packaged than in
bacteria. The need for protein-protein contacts with the
numerous other protein factors required to navigate this
labyrinth accounts in large measure for the added com-
plexity of the eukaryotic polymerase.
The largest subunit of Pol II also has an unusual fea-
ture, a long carboxyl-terminal tail consisting of many re-
peats of a consensus heptad amino acid sequence
–YSPTSPS–. There are 27 repeats in the yeast enzyme
(18 exactly matching the consensus) and 52 (21 exact)
in the mouse and human enzymes. This carboxyl-
terminal domain (CTD) is separated from the main body
of the enzyme by an unstructured linker sequence. The
CTD has many important roles in Pol II function, as out-
lined below.
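Counting exact and inexact heptad repeats, as in the figures quoted above, can be sketched as follows. The function name `count_heptads` and the short toy sequence are assumptions for illustration; the sketch also assumes the repeats are read in register from the first residue, and it does not reproduce any actual CTD sequence.

```python
CONSENSUS = "YSPTSPS"


def count_heptads(ctd):
    """Split a CTD sequence into in-register heptads and return
    (total heptads, heptads exactly matching the consensus)."""
    usable = len(ctd) - len(ctd) % 7          # ignore a trailing partial repeat
    heptads = [ctd[i:i + 7] for i in range(0, usable, 7)]
    exact = sum(h == CONSENSUS for h in heptads)
    return len(heptads), exact


# Toy example: five repeats, one of which deviates from the consensus.
toy_ctd = "YSPTSPS" * 3 + "YSPTSPA" + "YSPTSPS"
print(count_heptads(toy_ctd))  # (5, 4)
```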
RNA polymerase II requires an array of other pro-
teins, called transcription factors, in order to form
the active transcription complex. The general tran-
scription factors required at every Pol II promoter
[Figure 26–8 labels, 5′ to 3′: various regulatory sequences; TATA box (TATAAA) at −30; Inr (consensus YYAN(T/A)YY) at +1.]
FIGURE 26–8 Common sequences in promoters recognized by eu-
karyotic RNA polymerase II. The TATA box is the major assembly point
for the proteins of the preinitiation complexes of Pol II. The DNA is
unwound at the initiator sequence (Inr), and the transcription start site
is usually within or very near this sequence. In the Inr consensus se-
quence shown here, N represents any nucleotide; Y, a pyrimidine nu-
cleotide. Many additional sequences serve as binding sites for a wide
variety of proteins that affect the activity of Pol II. These sequences are
important in regulating Pol II promoters and vary greatly in type and
number, and in general the eukaryotic promoter is much more com-
plex than suggested here. Many of the sequences are located within
a few hundred base pairs of the TATA box on the 5′ side; others may
be thousands of base pairs away. The sequence elements summarized
here are more variable among the Pol II promoters of eukaryotes than
among the E. coli promoters (see Fig. 26–5). Many Pol II promoters
lack a TATA box or a consensus Inr element or both. Additional se-
quences around the TATA box and downstream (to the right as drawn)
of Inr may be recognized by one or more transcription factors.
(factors usually designated TFII with an additional iden-
tifier) are highly conserved in all eukaryotes (Table
26–1). The process of transcription by Pol II can be de-
scribed in terms of several phases—assembly, initiation,
elongation, termination—each associated with charac-
teristic proteins (Fig. 26–9). The step-by-step pathway
described below leads to active transcription in vitro. In
the cell, many of the proteins may be present in larger,
preassembled complexes, simplifying the pathways for
assembly on promoters. As you read about this process,
consult Figure 26–9 and Table 26–1 to help keep track
of the many participants.
[Figure 26–9 labels: (a) TBP (or TFIID and/or TFIIA) and TFIIB bind at the TATA box (−30) on the DNA; TFIIF–Pol II, TFIIE, and TFIIH then bind to form the closed complex; DNA unwinding produces the open complex (unwound DNA at Inr, +1); phosphorylation of Pol II, initiation, and promoter escape; elongation (with elongation factors); termination, release, and dephosphorylation. (b) TBP bound to DNA.]
FIGURE 26–9 Transcription at RNA polymerase II promoters. (a) The
sequential assembly of TBP (often with TFIIA), TFIIB, TFIIF plus Pol II,
TFIIE, and TFIIH results in a closed complex. TBP often binds as part
of a larger complex, TFIID. Some of the TFIID subunits play a role in
transcription regulation (see Fig. 28–30). Within the complex, the DNA
is unwound at the Inr region by the helicase activity of TFIIH and per-
haps of TFIIE, creating an open complex. The carboxyl-terminal do-
main of the largest Pol II subunit is phosphorylated by TFIIH, and the
polymerase then escapes the promoter and begins transcription. Elon-
gation is accompanied by the release of many transcription factors
and is also enhanced by elongation factors (see Table 26–1). After ter-
mination, Pol II is released, dephosphorylated, and recycled. (b) The
structure of human TBP (gray) bound to DNA (blue and white) (PDB
ID 1TGH).
Assembly of RNA Polymerase and Transcription Factors at a
Promoter The formation of a closed complex begins
when the TATA-binding protein (TBP) binds to the
TATA box (Fig. 26–9b). TBP is bound in turn by the
transcription factor TFIIB, which also binds to DNA on
either side of TBP. TFIIA binding, although not always
essential, can stabilize the TFIIB-TBP complex on the
DNA and can be important at nonconsensus promoters
where TBP binding is relatively weak. The TFIIB-TBP
complex is next bound by another complex consisting
of TFIIF and Pol II. TFIIF helps target Pol II to its pro-
moters, both by interacting with TFIIB and by reducing
the binding of the polymerase to nonspecific sites on
the DNA. Finally, TFIIE and TFIIH bind to create the
closed complex. TFIIH has DNA helicase activity that
promotes the unwinding of DNA near the RNA start site
(a process requiring the hydrolysis of ATP), thereby cre-
ating an open complex. Counting all the subunits of the
various essential factors (excluding TFIIA), this mini-
mal active assembly has more than 30 polypeptides.
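The polypeptide tally can be checked with simple arithmetic. The sketch below uses the subunit numbers listed in Table 26–1; the dictionary and the function name `count_polypeptides` are illustrative. With TBP counted as a single polypeptide, the listed numbers sum to 30 when TFIIA is excluded; if TBP is instead supplied as part of TFIID (described in the text as a complex of about 12 proteins), the total is correspondingly larger.

```python
# Subunit counts as listed in Table 26-1.
SUBUNITS = {
    "Pol II": 12,
    "TBP": 1,
    "TFIIA": 3,
    "TFIIB": 1,
    "TFIIE": 2,
    "TFIIF": 2,
    "TFIIH": 12,
}


def count_polypeptides(factors, exclude=()):
    """Sum the subunits of all factors not listed in `exclude`."""
    return sum(n for name, n in factors.items() if name not in exclude)


minimal = count_polypeptides(SUBUNITS, exclude={"TFIIA"})
with_tfiia = count_polypeptides(SUBUNITS)
# Assumption: TFIID has about 12 proteins, one of which is TBP.
with_tfiid = minimal - SUBUNITS["TBP"] + 12

print(minimal, with_tfiia, with_tfiid)  # 30 33 41
```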
RNA Strand Initiation and Promoter Clearance TFIIH has an
additional function during the initiation phase. A kinase
activity in one of its subunits phosphorylates Pol II at
many places in the CTD (Fig. 26–9). Several other pro-
tein kinases, including CDK9 (cyclin-dependent kinase
9), which is part of the complex pTEFb (positive tran-
scription elongation factor b), also phosphorylate the
CTD. This causes a conformational change in the over-
all complex, initiating transcription. Phosphorylation of
the CTD is also important during the subsequent elon-
gation phase, and it affects the interactions between the
transcription complex and other enzymes involved in
processing the transcript (as described below).
During synthesis of the initial 60 to 70 nucleotides
of RNA, first TFIIE and then TFIIH is released, and Pol
II enters the elongation phase of transcription.
Elongation, Termination, and Release TFIIF remains asso-
ciated with Pol II throughout elongation. During this
stage, the activity of the polymerase is greatly enhanced
by proteins called elongation factors (Table 26–1). The
elongation factors suppress pausing during transcription
and also coordinate interactions between protein com-
plexes involved in the posttranscriptional processing of
mRNAs. Once the RNA transcript is completed, tran-
scription is terminated. Pol II is dephosphorylated and
recycled, ready to initiate another transcript (Fig. 26–9).
Regulation of RNA Polymerase II Activity Regulation of tran-
scription at Pol II promoters is quite elaborate. It in-
volves the interaction of a wide variety of other proteins
with the preinitiation complex. Some of these regula-
tory proteins interact with transcription factors, others
with Pol II itself. Many interact through TFIID, a com-
plex of about 12 proteins, including TBP and certain
TABLE 26–1 Proteins Required for Initiation of Transcription at the RNA Polymerase II (Pol II) Promoters of Eukaryotes

Transcription protein         Number of subunits   Subunit(s) Mr             Function(s)
Initiation
Pol II                        12                   10,000–220,000            Catalyzes RNA synthesis
TBP (TATA-binding protein)    1                    38,000                    Specifically recognizes the TATA box
TFIIA                         3                    12,000, 19,000, 35,000    Stabilizes binding of TFIIB and TBP to the promoter
TFIIB                         1                    35,000                    Binds to TBP; recruits Pol II–TFIIF complex
TFIIE                         2                    34,000, 57,000            Recruits TFIIH; has ATPase and helicase activities
TFIIF                         2                    30,000, 74,000            Binds tightly to Pol II; binds to TFIIB and prevents binding of Pol II to nonspecific DNA sequences
TFIIH                         12                   35,000–89,000             Unwinds DNA at promoter (helicase activity); phosphorylates Pol II (within the CTD); recruits nucleotide-excision repair proteins
Elongation*
ELL†                          1                    80,000
p-TEFb                        2                    43,000, 124,000           Phosphorylates Pol II (within the CTD)
SII (TFIIS)                   1                    38,000
Elongin (SIII)                3                    15,000, 18,000, 110,000

*The function of all elongation factors is to suppress the pausing or arrest of transcription by the Pol II–TFIIF complex.
†Name derived from eleven-nineteen lysine-rich leukemia. The gene for ELL is the site of chromosomal recombination events frequently associated with acute myeloid leukemia.
TBP-associated factors, or TAFs. The regulation of tran-
scription is described in more detail in Chapter 28.
Diverse Functions of TFIIH In eukaryotes, the repair of
damaged DNA (see Table 25–5) is more efficient within
genes that are actively being transcribed than for other
damaged DNA, and the template strand is repaired
somewhat more efficiently than the nontemplate strand.
These remarkable observations are explained by the al-
ternative roles of the TFIIH subunits. Not only does
TFIIH participate in the formation of the closed com-
plex during assembly of a transcription complex (as de-
scribed above), but some of its subunits are also essen-
tial components of the separate nucleotide-excision
repair complex (see Fig. 25–24).
When Pol II transcription halts at the site of a
DNA lesion, TFIIH can interact with the lesion
and recruit the entire nucleotide-excision repair com-
plex. Genetic loss of certain TFIIH subunits can produce
human diseases. Some examples are xeroderma pig-
mentosum (see Box 25–1) and Cockayne’s syndrome,
which is characterized by arrested growth, photosensi-
tivity, and neurological disorders. ■
DNA-Dependent RNA Polymerase Undergoes
Selective Inhibition
The elongation of RNA strands by RNA polymerase in
both bacteria and eukaryotes is inhibited by the antibi-
otic actinomycin D (Fig. 26–10). The planar portion of
this molecule inserts (intercalates) into the double-helical
DNA between successive G≡C base pairs,
deforming the DNA. This prevents movement of the
polymerase along the template. Because actinomycin D
inhibits RNA elongation in intact cells as well as in cell
extracts, it is used to identify cell processes that depend
on RNA synthesis. Acridine inhibits RNA synthesis in
a similar fashion (Fig. 26–10).
Rifampicin inhibits bacterial RNA synthesis by
binding to the β subunit of bacterial RNA polymerases,
preventing the promoter clearance step of transcription
(Fig. 26–6). It is sometimes used as an antibiotic.
The mushroom Amanita phalloides has evolved a
very effective defense mechanism against predators. It
produces α-amanitin, which disrupts mRNA formation
in animal cells by blocking Pol II and, at higher
concentrations, Pol III. Neither Pol I nor bacterial RNA
polymerase is sensitive to α-amanitin, nor is the RNA
polymerase II of A. phalloides itself!
SUMMARY 26.1 DNA-Dependent Synthesis of RNA
■ Transcription is catalyzed by DNA-dependent
RNA polymerases, which use ribonucleoside
5′-triphosphates to synthesize RNA
complementary to the template strand of
duplex DNA. Transcription occurs in several
phases: binding of RNA polymerase to a DNA
site called a promoter, initiation of transcript
synthesis, elongation, and termination.
■ Bacterial RNA polymerase requires a special
subunit to recognize the promoter. As the first
committed step in transcription, binding of
RNA polymerase to the promoter and initiation
of transcription are closely regulated.
Transcription stops at sequences called
terminators.
FIGURE 26–10 Actinomycin D and
acridine, inhibitors of DNA transcription.
(a) The shaded portion of actinomycin D
is planar and intercalates between two
successive G≡C base pairs in duplex
DNA. The two cyclic peptide structures of
actinomycin D bind to the minor groove
of the double helix. Sarcosine (Sar) is
N-methylglycine; meVal is methylvaline.
Acridine also acts by intercalation in
DNA. (b) A complex of actinomycin D
with DNA (PDB ID 1DSC). The DNA
backbone is shown in blue, the bases are
white, the intercalated part of actinomycin
(shaded in (a)) is orange, and the
remainder of the actinomycin is red. The
DNA is bent as a result of the
actinomycin binding.
■ Eukaryotic cells have three types of RNA
polymerases. Binding of RNA polymerase II to
its promoters requires an array of proteins
called transcription factors. Elongation factors
participate in the elongation phase of
transcription. The largest subunit of Pol II has
a long carboxyl-terminal domain, which is
phosphorylated during the initiation and
elongation phases.
26.2 RNA Processing
Many of the RNA molecules in bacteria and virtually all
RNA molecules in eukaryotes are processed to some de-
gree after synthesis. Some of the most interesting mo-
lecular events in RNA metabolism occur during this
postsynthetic processing. Intriguingly, several of the en-
zymes that catalyze these reactions consist of RNA
rather than protein. The discovery of these catalytic
RNAs, or ribozymes, has brought a revolution in think-
ing about RNA function and about the origin of life.
A newly synthesized RNA molecule is called a pri-
mary transcript. Perhaps the most extensive process-
ing of primary transcripts occurs in eukaryotic mRNAs
and in tRNAs of both bacteria and eukaryotes.
The primary transcript for a eukaryotic mRNA typ-
ically contains sequences encompassing one gene, al-
though the sequences encoding the polypeptide may not
be contiguous. Noncoding tracts that break up the cod-
ing region of the transcript are called introns, and the
coding segments are called exons (see the discussion of
introns and exons in DNA in Chapter 24). In a process
called splicing, the introns are removed from the pri-
mary transcript and the exons are joined to form a con-
tinuous sequence that specifies a functional polypep-
tide. Eukaryotic mRNAs are also modified at each end.
A modified residue called a 5′ cap (p. 1008) is added at
the 5′ end. The 3′ end is cleaved, and 80 to 250 A
residues are added to create a poly(A) “tail.” The some-
times elaborate protein complexes that carry out each
of these three mRNA-processing reactions do not oper-
ate independently. They appear to be organized in as-
sociation with each other and with the phosphorylated
CTD of Pol II; each complex affects the function of the
others. Other proteins involved in mRNA transport to
the cytoplasm are also associated with the mRNA in the
nucleus, and the processing of the transcript is coupled
to its transport. In effect, a eukaryotic mRNA, as it is
synthesized, is ensconced in an elaborate complex in-
volving dozens of proteins. The composition of the com-
plex changes as the primary transcript is processed,
transported to the cytoplasm, and delivered to the ri-
bosome for translation. These processes are outlined in
Figure 26–11 and described in more detail below.
The primary transcripts of prokaryotic and eukary-
otic tRNAs are processed by the removal of sequences
from each end (cleavage) and in a few cases by the re-
moval of introns (splicing). Many bases and sugars in
tRNAs are also modified; mature tRNAs are replete with
unusual bases not found in other nucleic acids (see Fig.
26–24).
The ultimate fate of any RNA is its complete and
regulated degradation. The rate of turnover of RNAs
plays a critical role in determining their steady-state lev-
els and the rate at which cells can shut down expres-
sion of a gene whose product is no longer needed. Dur-
ing the development of multicellular organisms, for
example, certain proteins must be expressed at one
stage only, and the mRNA encoding such a protein must
be made and destroyed at the appropriate times.
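The link between turnover and steady-state level can be made concrete with a simple kinetic sketch. The model below is an assumption introduced for illustration, not something stated in the text: it supposes constant synthesis and first-order decay, so the steady-state level is the synthesis rate divided by the decay constant, and the level falls by half every half-life once synthesis stops. The function names and numerical values are hypothetical.

```python
import math


def steady_state_level(synthesis_rate, half_life):
    """Steady-state RNA level for constant synthesis and first-order
    decay (assumed model): level = synthesis_rate / k_decay."""
    k_decay = math.log(2) / half_life
    return synthesis_rate / k_decay


def level_after_shutoff(initial_level, half_life, elapsed):
    """RNA remaining `elapsed` time units after synthesis stops."""
    return initial_level * 0.5 ** (elapsed / half_life)


# Hypothetical mRNA: half-life of 5 min, transcription switched off.
print(level_after_shutoff(100.0, 5.0, 10.0))  # 25.0
```

Under this assumed model, an mRNA with a short half-life reaches a lower steady-state level for the same synthesis rate but disappears faster after transcription stops, which is the trade-off the paragraph describes.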
[Figure 26–11 labels: DNA (exon, intron, noncoding end sequence); transcription and 5′ capping by Pol II; completion of primary transcript (5′ cap to 3′ end); cleavage, polyadenylation, and splicing; mature mRNA (5′ cap to AAA(A)n 3′).]
FIGURE 26–11 Formation of the primary
transcript and its processing during maturation
of mRNA in a eukaryotic cell. The 5′ cap (red) is
added before synthesis of the primary transcript
is complete. A noncoding sequence following
the last exon is shown in orange. Splicing can
occur either before or after the cleavage and
polyadenylation steps. All the processes shown
here take place within the nucleus.
Eukaryotic mRNAs Are Capped at the 5′ End
Most eukaryotic mRNAs have a 5′ cap, a residue of
7-methylguanosine linked to the 5′-terminal residue of the
mRNA through an unusual 5′,5′-triphosphate linkage
(Fig. 26–12). The 5′ cap helps protect mRNA from
ribonucleases. The cap also binds to a specific cap-
binding complex of proteins and participates in binding
of the mRNA to the ribosome to initiate translation
(Chapter 27).
The 5′ cap is formed by condensation of a molecule
of GTP with the triphosphate at the 5′ end of the
transcript. The guanine is subsequently methylated at N-7,
and additional methyl groups are often added at the 2′
hydroxyls of the first and second nucleotides adjacent
to the cap (Fig. 26–12). The methyl groups are derived
from S-adenosylmethionine. All these reactions occur
very early in transcription, after the first 20 to 30
nucleotides of the transcript have been added. All three of
the capping enzymes, and through them the 5′ end of
the transcript itself, are associated with the RNA
polymerase II CTD until the cap is synthesized. The capped
5′ end is then released from the capping enzymes and
bound by the cap-binding complex (Fig. 26–12c).
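The order of the capping reactions can be laid out as a small sketch. The enzyme names and the shorthand for each intermediate (pppNp, ppNp, GpppNp, m7GpppNp, m7GpppmNp) follow the labels of Figure 26–12b; the list `CAPPING_STEPS` and the function `cap_transcript` are hypothetical names, and the code models only the sequence of intermediates, not any chemistry.

```python
# (enzyme, substrate, product), in the order labeled in Figure 26-12b.
CAPPING_STEPS = [
    ("phosphohydrolase", "pppNp", "ppNp"),                    # removes the terminal phosphate
    ("guanylyltransferase", "ppNp", "GpppNp"),                # adds GMP from GTP
    ("guanine-7-methyltransferase", "GpppNp", "m7GpppNp"),    # methyl from adoMet
    ("2'-O-methyltransferase", "m7GpppNp", "m7GpppmNp"),      # methyl from adoMet
]


def cap_transcript(five_prime_end="pppNp"):
    """Apply each capping step in order and return every intermediate."""
    state = five_prime_end
    history = [state]
    for enzyme, substrate, product in CAPPING_STEPS:
        if state != substrate:
            raise ValueError(f"{enzyme} cannot act on {state}")
        state = product
        history.append(state)
    return history


print(" -> ".join(cap_transcript()))
# pppNp -> ppNp -> GpppNp -> m7GpppNp -> m7GpppmNp
```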
Both Introns and Exons Are Transcribed from
DNA into RNA
In bacteria, a polypeptide chain is generally encoded by
a DNA sequence that is colinear with the amino acid se-
quence, continuing along the DNA template without in-
terruption until the information needed to specify the
polypeptide is complete. However, the notion that all
genes are continuous was disproved in 1977 when
[Figure 26–12 labels: (a) 7-Methylguanosine joined through a 5′,5′-triphosphate linkage to the 5′ end of the RNA; the 2′ positions of the first and second nucleotides are sometimes methylated (OCH3). (b) 5′ end of RNA with triphosphate group, pppNp; phosphohydrolase (Pi released) gives ppNp; guanylyltransferase (GTP in, PPi released) gives GpppNp; guanine-7-methyltransferase (adoMet to adoHcy) gives m7GpppNp; 2′-O-methyltransferase (adoMet to adoHcy) gives m7GpppmNp, the 5′ end of RNA with cap. (c) Cap-synthesizing complex, cap, and cap-binding complex (CBC) on the phosphorylated CTD.]
FIGURE 26–12 The 5′ cap of mRNA. (a) 7-Methylguanosine is joined
to the 5′ end of almost all eukaryotic mRNAs in an unusual
5′,5′-triphosphate linkage. Methyl groups (pink) are often found at the 2′
position of the first and second nucleotides. RNAs in yeast cells lack
the 2′-methyl groups. The 2′-methyl group on the second nucleotide
is generally found only in RNAs from vertebrate cells. (b) Generation
of the 5′ cap involves four to five separate steps (adoHcy is
S-adenosylhomocysteine). (c) Synthesis of the cap is carried out by
enzymes tethered to the CTD of Pol II. The cap remains tethered to the
CTD through an association with the cap-binding complex (CBC).
Phillip Sharp and Richard Roberts independently dis-
covered that many genes for polypeptides in eukaryotes
are interrupted by noncoding sequences (introns).
The vast majority of genes in vertebrates contain in-
trons; among the few exceptions are those that encode
histones. The occurrence of introns in other eukaryotes
varies. Many genes in the yeast Saccharomyces cere-
visiae lack introns, although in some other yeast species
introns are more common. Introns are also found in a
few eubacterial and archaebacterial genes. Introns in
DNA are transcribed along with the rest of the gene by
RNA polymerases. The introns in the primary RNA tran-
script are then spliced, and the exons are joined to form
a mature, functional RNA. In eukaryotic mRNAs, most
exons are less than 1,000 nucleotides long, with many in
the 100 to 200 nucleotide size range, encoding stretches
of 30 to 60 amino acids within a longer polypeptide. In-
trons vary in size from 50 to 20,000 nucleotides. Genes
of higher eukaryotes, including humans, typically have
much more DNA devoted to introns than to exons. Many
genes have introns; some genes have dozens of them.
RNA Catalyzes the Splicing of Introns
There are four classes of introns. The first two, the
group I and group II introns, differ in the details of their
splicing mechanisms but share one surprising charac-
teristic: they are self-splicing—no protein enzymes are
involved. Group I introns are found in some nuclear, mi-
tochondrial, and chloroplast genes coding for rRNAs,
mRNAs, and tRNAs. Group II introns are generally found
in the primary transcripts of mitochondrial or chloro-
plast mRNAs in fungi, algae, and plants. Group I and
group II introns are also found among the rarer exam-
ples of introns in bacteria. Neither class requires a high-
energy cofactor (such as ATP) for splicing. The splicing
mechanisms in both groups involve two transesterification
reaction steps (Fig. 26–13). A ribose 2′- or 3′-hydroxyl
group makes a nucleophilic attack on a phosphorus
and, in each step, a new phosphodiester bond is
formed at the expense of the old, maintaining the bal-
ance of energy. These reactions are very similar to the
DNA breaking and rejoining reactions promoted by
topoisomerases (see Fig. 24–21) and site-specific re-
combinases (see Fig. 25–38).
The group I splicing reaction requires a guanine
nucleoside or nucleotide cofactor, but the cofactor is not
used as a source of energy; instead, the 3′-hydroxyl
group of guanosine is used as a nucleophile in the first
step of the splicing pathway. The guanosine 3′-hydroxyl
group forms a normal 3′,5′-phosphodiester bond with
the 5′ end of the intron (Fig. 26–14). The 3′ hydroxyl
of the exon that is displaced in this step then acts as a
nucleophile in a similar reaction at the 3′ end of the
intron. The result is precise excision of the intron and
ligation of the exons.
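The two transesterification steps can be followed with a string-based sketch that also checks the bookkeeping described earlier: each step forms one phosphodiester bond at the expense of another, so the total number of bonds never changes. The function names and the short sequences are invented for illustration, and each molecule is modeled simply as a linear string with one bond between adjacent residues.

```python
def bonds(*molecules):
    """Total phosphodiester bonds in a set of linear molecules."""
    return sum(len(m) - 1 for m in molecules if m)


def splice_group_i(exon5, intron, exon3, cofactor="G"):
    """Model the two transesterification steps of group I splicing.
    Returns (spliced exons, excised intron carrying the added G)."""
    precursor = exon5 + intron + exon3
    start = bonds(precursor, cofactor)

    # Step 1: the 3' OH of guanosine attacks the 5' splice site.
    # The G joins the 5' end of the intron; the 5' exon is released.
    released_exon5 = exon5
    g_intron_exon3 = cofactor + intron + exon3
    assert bonds(released_exon5, g_intron_exon3) == start

    # Step 2: the 3' OH of the 5' exon attacks the 3' splice site.
    # The exons are joined and the intron is released.
    spliced = released_exon5 + exon3
    excised = cofactor + intron
    assert bonds(spliced, excised) == start

    return spliced, excised


# Hypothetical sequences, chosen only to make the output easy to read.
print(splice_group_i("AUGGCU", "AAAUAG", "UAAGGU"))
# ('AUGGCUUAAGGU', 'GAAAUAG')
```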
In group II introns the reaction pattern is similar ex-
cept for the nucleophile in the first step, which in this
case is the 2′-hydroxyl group of an A residue within
the intron (Fig. 26–15). A branched lariat structure is
formed as an intermediate.
Self-splicing of introns was first revealed in 1982 in
studies of the splicing mechanism of the group I rRNA
intron from the ciliated protozoan Tetrahymena ther-
mophila, conducted by Thomas Cech and colleagues.
These workers transcribed isolated Tetrahymena DNA
(including the intron) in vitro using purified bacterial
RNA polymerase. The resulting RNA spliced itself ac-
curately without any protein enzymes from Tetrahy-
mena. The discovery that RNAs could have catalytic
functions was a milestone in our understanding of bio-
logical systems.
FIGURE 26–13 Transesterification reaction. This is the first
step in the splicing of group I introns. Here, the 3′ OH of a
guanosine molecule acts as nucleophile.
[Figure 26–14 labels: Primary transcript (5′ exon, intron, 3′ exon). The 3′ OH of guanosine acts as a nucleophile, attacking the phosphate at the 5′ splice site. Intermediate. The 3′ OH of the 5′ exon becomes the nucleophile, completing the reaction. Spliced RNA; released intron.]
FIGURE 26–14 Splicing mechanism of group I
introns. The nucleophile in the first step may
be guanosine, GMP, GDP, or GTP. The spliced
intron is eventually degraded.
Most introns are not self-splicing, and these types
are not designated with a group number. The third and
largest class of introns includes those found in nuclear
mRNA primary transcripts. These are called spliceo-
somal introns, because their removal occurs within
and is catalyzed by a large protein complex called a
spliceosome. Within the spliceosome, the introns un-
dergo splicing by the same lariat-forming mechanism as
the group II introns. The spliceosome is made up of spe-
cialized RNA-protein complexes, small nuclear ribonu-
cleoproteins (snRNPs, often pronounced “snurps”).
Each snRNP contains one of a class of eukaryotic RNAs,
100 to 200 nucleotides long, known as small nuclear
RNAs (snRNAs). Five snRNAs (U1, U2, U4, U5, and
U6) involved in splicing reactions are generally found in
abundance in eukaryotic nuclei. The RNAs and proteins
in snRNPs are highly conserved in eukaryotes from
yeasts to humans.
Spliceosomal introns generally have the dinucleotide
sequence GU and AG at the 5′ and 3′ ends,
respectively, and these sequences mark the sites where
splicing occurs. The U1 snRNA contains a sequence
complementary to sequences near the 5′ splice site of
nuclear mRNA introns (Fig. 26–16a), and the U1 snRNP
binds to this region in the primary transcript. Addition
of the U2, U4, U5, and U6 snRNPs leads to formation of
the spliceosome (Fig. 26–16b). The snRNPs together
contribute five RNAs and about 50 proteins to the
spliceosome, a supramolecular assembly nearly as complex
as the ribosome (described in Chapter 27). ATP is
required for assembly of the spliceosome, but the RNA
cleavage-ligation reactions do not seem to require ATP.
Some mRNA introns are spliced by a less common type
of spliceosome, in which the U1 and U2 snRNPs are
replaced by the U11 and U12 snRNPs. Whereas U1- and
U2-containing spliceosomes remove introns with (5′)GU
and AG(3′) terminal sequences, as shown in Figure
26–16, the U11- and U12-containing spliceosomes
remove a rare class of introns that have (5′)AU and AC(3′)
terminal sequences to mark the intronic splice sites. The
spliceosomes used in nuclear RNA splicing may have
evolved from more ancient group II introns, with the
snRNPs replacing the catalytic domains of their
self-splicing ancestors.
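The terminal dinucleotides described above suggest a simple way to sort introns by the type of spliceosome expected to remove them. The Python sketch below is illustrative only: the function name `classify_intron`, the category labels, and the example sequences are assumptions, and the terminal dinucleotides alone are treated as sufficient, which ignores the other sequence features a spliceosome recognizes.

```python
def classify_intron(intron):
    """Classify an intron by its terminal dinucleotides.
    Accepts RNA or DNA letters; T is read as U."""
    rna = intron.upper().replace("T", "U")
    if len(rna) < 4:
        return "unclassified"
    ends = (rna[:2], rna[-2:])
    if ends == ("GU", "AG"):
        return "U1/U2-type"      # (5')GU ... AG(3')
    if ends == ("AU", "AC"):
        return "U11/U12-type"    # (5')AU ... AC(3')
    return "unclassified"


# Hypothetical intron sequences.
print(classify_intron("GUAAGUCCCAG"))   # U1/U2-type
print(classify_intron("AUAUCCUUUUAC"))  # U11/U12-type
```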
Some components of the splicing apparatus appear
to be tethered to the CTD of RNA polymerase II, sug-
gesting an interesting model for the splicing reaction.
As the first splice junction is synthesized, it is bound by
a tethered spliceosome. The second splice junction is
then captured by this complex as it passes, facilitating
the juxtaposition of the intron ends and the subsequent
splicing process (Fig. 26–16c). After splicing, the intron
remains in the nucleus and is eventually degraded.
The fourth class of introns, found in certain tRNAs,
is distinguished from the group I and II introns in that
the splicing reaction requires ATP and an endonucle-
ase. The splicing endonuclease cleaves the phosphodi-
ester bonds at both ends of the intron, and the two ex-
ons are joined by a mechanism similar to the DNA ligase
reaction (see Fig. 25–16).
Although spliceosomal introns appear to be limited
to eukaryotes, the other intron classes are not. Genes
with group I and II introns have now been found in both
bacteria and bacterial viruses. Bacteriophage T4, for ex-
ample, has several protein-encoding genes with group I
introns. Introns appear to be more common in archae-
bacteria than in eubacteria.
Eukaryotic mRNAs Have a Distinctive
3′ End Structure
At their 3′ end, most eukaryotic mRNAs have a string
of 80 to 250 A residues, making up the poly(A) tail.
This tail serves as a binding site for one or more spe-
cific proteins. The poly(A) tail and its associated pro-
teins probably help protect mRNA from enzymatic de-
struction. Many prokaryotic mRNAs also acquire
poly(A) tails, but these tails stimulate decay of mRNA
rather than protecting it from degradation.
The poly(A) tail is added in a multistep process.
The transcript is extended beyond the site where the
poly(A) tail is to be added, then is cleaved at the poly(A)
addition site by an endonuclease component of a large
enzyme complex, again associated with the CTD of RNA
polymerase II (Fig. 26–17). The mRNA site where cleav-
age occurs is marked by two sequence elements: the
highly conserved sequence (5′)AAUAAA(3′), 10 to 30
nucleotides on the 5′ side (upstream) of the cleavage
site, and a less well-defined sequence rich in G and U
residues, 20 to 40 nucleotides downstream of the cleavage
site. Cleavage generates the free 3′-hydroxyl group
that defines the end of the mRNA, to which A residues
are immediately added by polyadenylate polymerase,
which catalyzes the reaction

$$\mathrm{RNA} + n\,\mathrm{ATP} \longrightarrow \mathrm{RNA{-}(AMP)}_n + n\,\mathrm{PP_i}$$

where n = 80 to 250. This enzyme does not require a
template but does require the cleaved mRNA as a primer.

FIGURE 26–15 Splicing mechanism of group II
introns. The chemistry is similar to that of group I
intron splicing, except for the identity of the
nucleophile in the first step and formation of a
lariatlike intermediate, in which one branch is a
2′,5′-phosphodiester bond. Figure labels: in the primary
transcript, the 2′ OH of a specific adenosine in the
intron acts as a nucleophile, attacking the 5′ splice site
to form a lariat structure. In the intermediate, the
adenosine in the lariat structure has three phosphodiester
bonds, and the 3′ OH of the 5′ exon acts as a
nucleophile, completing the reaction to give the spliced RNA.

FIGURE 26–16 Splicing mechanism in mRNA primary transcripts. (a) RNA
pairing interactions in the formation of spliceosome complexes. The U1
snRNA has a sequence near its 5′ end that is complementary to the splice
site at the 5′ end of the intron. Base pairing of U1 to this region of the
primary transcript helps define the 5′ splice site during spliceosome
assembly (ψ is pseudouridine; see Fig. 26–24). U2 is paired to the intron at
a position encompassing the A residue (shaded pink) that becomes the
nucleophile during the splicing reaction. Base pairing of U2 snRNA causes
a bulge that displaces and helps to activate the adenylate, whose 2′ OH
will form the lariat structure through a 2′,5′-phosphodiester bond.
(b) Assembly of spliceosomes. The U1 and U2 snRNPs bind, then the
remaining snRNPs (the U4/U6 complex and U5) bind to form an inactive
spliceosome. Internal rearrangements convert this species to an active
spliceosome in which U1 and U4 have been expelled and U6 is paired
with both the 5′ splice site and U2. This is followed by the catalytic steps,
which parallel those of the splicing of group II introns (see Fig. 26–15).
(c) Coordination of splicing with transcription provides an attractive
mechanism for bringing the two splice sites together. See the text for details.
The overall processing of a typical eukaryotic mRNA is
summarized in Figure 26–18. In some cases the polypep-
tide-coding region of the mRNA is also modified by RNA
“editing” (see Box 27–1 for details). This editing includes
processes that add or delete bases in the coding regions
of primary transcripts or that change the sequence (by,
for example, enzymatic deamination of a C residue to
create a U residue). A particularly dramatic example occurs
in trypanosomes, which are parasitic protozoa:
large regions of an mRNA are synthesized without any
uridylate, and the U residues are inserted later by RNA
editing.

FIGURE 26–17 Addition of the poly(A) tail to the primary RNA transcript
of eukaryotes. Pol II synthesizes RNA beyond the segment of
the transcript containing the cleavage signal sequences, including the
highly conserved upstream sequence (5′)AAUAAA. (1) The cleavage
signal sequence is bound by an enzyme complex that includes an
endonuclease, a polyadenylate polymerase, and several other multisubunit
proteins involved in sequence recognition, stimulation of cleavage,
and regulation of the length of the poly(A) tail. (2) The RNA is
cleaved by the endonuclease at a point 10 to 30 nucleotides 3′ to
(downstream of) the sequence AAUAAA. (3) The polyadenylate polymerase
synthesizes a poly(A) tail 80 to 250 nucleotides long, beginning
at the cleavage site.

FIGURE 26–18 Overview of the processing of a eukaryotic mRNA.
The ovalbumin gene, shown here, has introns A to G and exons 1 to
7 and L (L encodes a signal peptide sequence that targets the protein
for export from the cell; see Fig. 27–34). About three-quarters of the
RNA is removed during processing. Pol II extends the primary transcript
well beyond the cleavage and polyadenylation site (“extra RNA”)
before terminating transcription. Termination signals for Pol II have not
yet been defined. Figure labels: the gene (DNA) spans 7,700 bp;
transcription and 5′ capping give the primary transcript; splicing,
cleavage, and polyadenylation give the mature mRNA of 1,872 nucleotides.
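The cleavage and polyadenylation steps described earlier in this section can be mimicked with a short Python sketch. It rests on several simplifying assumptions that are not part of the text: the first AAUAAA found is treated as the signal, the cleavage offset is measured from the 3′ end of that hexamer, the downstream G/U-rich element is ignored, and the function and parameter names are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"  # highly conserved upstream signal


def cleave_and_polyadenylate(transcript: str, offset: int = 20, tail_len: int = 200) -> str:
    """Cleave a transcript downstream of AAUAAA and append a poly(A) tail.

    offset   -- assumed distance (nt) from the end of AAUAAA to the cleavage
                site; the text gives a range of 10 to 30.
    tail_len -- number of A residues added; the text gives 80 to 250.
    """
    if not 10 <= offset <= 30:
        raise ValueError("offset should lie in the 10 to 30 nt range given in the text")
    if not 80 <= tail_len <= 250:
        raise ValueError("tail_len should lie in the 80 to 250 range given in the text")
    rna = transcript.upper().replace("T", "U")
    start = rna.find(POLY_A_SIGNAL)
    if start == -1:
        raise ValueError("no AAUAAA signal found")
    cleavage = start + len(POLY_A_SIGNAL) + offset
    if cleavage > len(rna):
        raise ValueError("transcript ends before the assumed cleavage site")
    # Everything 3' of the cleavage site (the "extra RNA") is discarded,
    # and the free 3'-hydroxyl is extended without a template.
    return rna[:cleavage] + "A" * tail_len


if __name__ == "__main__":
    primary = "GGC" + POLY_A_SIGNAL + "C" * 40  # invented toy transcript
    mature = cleave_and_polyadenylate(primary, offset=20, tail_len=100)
    print(len(primary), len(mature))
```

The range checks simply restate the numbers quoted in the text; a real cell chooses the site through protein factors rather than a fixed offset.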
A Gene Can Give Rise to Multiple Products
by Differential RNA Processing
The transcription of introns seems to consume cellular
resources and energy without returning any benefit to
the organism, but introns may confer an advantage not
yet fully appreciated by scientists. Introns may be ves-
tiges of a molecular parasite not unlike transposons
(Chapter 25). Although the benefits of introns are not
yet clear in most cases, cells have evolved to take ad-
vantage of the splicing pathways to alter the expression
of certain genes.
Most eukaryotic mRNA transcripts produce only one
mature mRNA and one corresponding polypeptide, but
some can be processed in more than one way to produce
different mRNAs and thus different polypeptides. The
primary transcript contains molecular signals for all the
alternative processing pathways, and the pathway favored
in a given cell is determined by processing factors, RNA-
binding proteins that promote one particular path.
Complex transcripts can have either more than one
site for cleavage and polyadenylation or alternative
splicing patterns, or both. If there are two or more sites
for cleavage and polyadenylation, use of the one closest
to the 5′ end will remove more of the primary transcript
sequence (Fig. 26–19a). This mechanism, called poly(A)
site choice, generates diversity in the variable domains
of immunoglobulin heavy chains. Alternative splicing
patterns (Fig. 26–19b) produce, from a common pri-
mary transcript, three different forms of the myosin
heavy chain at different stages of fruit fly development.
Both mechanisms come into play when a single RNA
transcript is processed differently to produce two dif-
ferent hormones: the calcium-regulating hormone cal-
citonin in rat thyroid and calcitonin-gene-related pep-
tide (CGRP) in rat brain (Fig. 26–20).
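The combination of poly(A) site choice and alternative splicing in the calcitonin/CGRP example can be laid out as a small data model. The sketch below is a bookkeeping aid, not a mechanism: the site labels A1 and A2, the class name, and the exon bookkeeping are hypothetical conveniences, with the exon numbering following Figure 26–20 (exon 4 is the calcitonin exon).

```python
from dataclasses import dataclass

EXONS = (1, 2, 3, 4, 5, 6)
# Assumed labels: each poly(A) site maps to the last exon it leaves in the transcript.
POLY_A_SITES = {"A1": 4, "A2": 6}


@dataclass(frozen=True)
class ProcessingChoice:
    poly_a_site: str                    # which cleavage/polyadenylation site is used
    skipped_exons: frozenset = frozenset()  # exons removed along with the introns


def mature_exons(choice: ProcessingChoice) -> list:
    """Return the exons present in the mature mRNA for a given processing choice."""
    last_exon = POLY_A_SITES[choice.poly_a_site]
    return [e for e in EXONS if e <= last_exon and e not in choice.skipped_exons]


THYROID = ProcessingChoice("A1")                 # calcitonin exon retained
BRAIN = ProcessingChoice("A2", frozenset({4}))   # calcitonin exon spliced out

if __name__ == "__main__":
    print("thyroid (calcitonin):", mature_exons(THYROID))
    print("brain (CGRP):", mature_exons(BRAIN))
```

Keeping the two decisions (site choice and exon skipping) as separate fields mirrors the point made in the text that both mechanisms operate on the same primary transcript.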
Ribosomal RNAs and tRNAs Also Undergo Processing
Posttranscriptional processing is not limited to mRNA.
Ribosomal RNAs of both prokaryotic and eukaryotic cells
are made from longer precursors called preribosomal
RNAs, or pre-rRNAs (synthesized by Pol I in eukaryotes). In bacteria,
16S, 23S, and 5S rRNAs (and some tRNAs, although
most tRNAs are encoded elsewhere) arise from a single
30S RNA precursor of about 6,500 nucleotides. RNA at
both ends of the 30S precursor and segments between
the rRNAs are removed during processing (Fig. 26–21).
FIGURE 26–19 Two mechanisms for the alternative processing of
complex transcripts in eukaryotes. (a) Alternative cleavage and
polyadenylation patterns. Two poly(A) sites, A$_1$ and A$_2$, are shown.
(b) Alternative splicing patterns. Two different 3′ splice sites are shown.
In both mechanisms, different mature mRNAs are produced from the
same primary transcript.
FIGURE 26–20 Alternative processing of the calcitonin gene transcript
in rats. The primary transcript has two poly(A) sites; one predominates
in the brain, the other in the thyroid. In the brain, splicing
eliminates the calcitonin exon (exon 4); in the thyroid, this exon is
retained. The resulting peptides are processed further to yield the final
hormone products: calcitonin-gene-related peptide (CGRP) in the
brain and calcitonin in the thyroid. Figure labels: in the thyroid,
cleavage, polyadenylation, and splicing give a mature mRNA with
exons 1 to 4; in the brain they give a mature mRNA with exons 1, 2, 3,
5, and 6. Translation and protease action then yield calcitonin and
CGRP, respectively.
FIGURE 26–21 Processing of pre-rRNA
transcripts in bacteria. (1) Before cleavage,
the 30S RNA precursor is methylated at
specific bases. (2) Cleavage liberates
precursors of rRNAs and tRNA(s). Cleavage at
the points labeled 1, 2, and 3 is carried out
by the enzymes RNase III, RNase P, and
RNase E, respectively. As discussed later in the
text, RNase P is a ribozyme. (3) The final 16S,
23S, and 5S rRNA products result from the
action of a variety of specific nucleases. The
seven copies of the gene for pre-rRNA in the
E. coli chromosome differ in the number,
location, and identity of tRNAs included in
the primary transcript. Some copies of the
gene have additional tRNA gene segments
between the 16S and 23S rRNA segments and
at the far 3′ end of the primary transcript.
The genome of E. coli encodes seven pre-rRNA mol-
ecules. All these genes have essentially identical rRNA-
coding regions, but they differ in the segments between
these regions. The segment between the 16S and 23S
rRNA genes generally encodes one or two tRNAs, with
different tRNAs arising from different pre-rRNA tran-
scripts. Coding sequences for tRNAs are also found on
the 3′ side of the 5S rRNA in some precursor transcripts.
In eukaryotes, a 45S pre-rRNA transcript is
processed in the nucleolus to form the 18S, 28S, and
5.8S rRNAs characteristic of eukaryotic ribosomes (Fig.
26–22). The 5S rRNA of most eukaryotes is made as a
completely separate transcript by a different poly-
merase (Pol III instead of Pol I).
Most cells have 40 to 50 distinct tRNAs, and eu-
karyotic cells have multiple copies of many of the tRNA
genes. Transfer RNAs are derived from longer RNA pre-
cursors by enzymatic removal of nucleotides from the
5′ and 3′ ends (Fig. 26–23). In eukaryotes, introns are
present in a few tRNA transcripts and must be excised.
Where two or more different tRNAs are contained in
a single primary transcript, they are separated by
enzymatic cleavage. The endonuclease RNase P, found
in all organisms, removes RNA at the 5′ end of tRNAs.
This enzyme contains both protein and RNA. The RNA
component is essential for activity, and in bacterial cells
it can carry out its processing function with precision
even without the protein component. RNase P is there-
fore another example of a catalytic RNA, as described
in more detail below. The 3′ end of tRNAs is processed
by one or more nucleases, including the exonuclease
RNase D.
FIGURE 26–22 Processing of pre-rRNA transcripts
in vertebrates. In step (1), the 45S precursor is
methylated at more than 100 of its 14,000
nucleotides, mostly on the 2′-OH groups of ribose
units retained in the final products. (2) A series of
enzymatic cleavages produces the 18S, 5.8S, and
28S rRNAs. The cleavage reactions require RNAs
found in the nucleolus, called small nucleolar RNAs
(snoRNAs), within protein complexes reminiscent of
spliceosomes. The 5S rRNA is produced separately.
FIGURE 26–23 Processing of tRNAs in bacteria and eukaryotes. The
yeast tRNA$^{\mathrm{Tyr}}$ (the tRNA specific for tyrosine binding; see Chapter 27)
is used to illustrate the important steps. The nucleotide sequences
shown in yellow are removed from the primary transcript. The ends
are processed first, the 5′ end before the 3′ end. CCA is then added
to the 3′ end, a necessary step in processing eukaryotic tRNAs and
those bacterial tRNAs that lack this sequence in the primary transcript.
While the ends are being processed, specific bases in the rest of the
transcript are modified (see Fig. 26–24). For the eukaryotic tRNA
shown here, the final step is splicing of the 14-nucleotide intron.
Introns are found in some eukaryotic tRNAs but not in bacterial tRNAs.
Figure labels: primary transcript (RNase P cut, RNase D cut);
intermediate (after base modification, 5′ cleavage, 3′ cleavage, and
CCA addition); mature tRNA$^{\mathrm{Tyr}}$ (after splicing).
Transfer RNA precursors may undergo further posttranscriptional
processing. The 3′-terminal trinucleotide
CCA(3′) to which an amino acid will be attached during
protein synthesis (Chapter 27) is absent from some
bacterial and all eukaryotic tRNA precursors and is
added during processing (Fig. 26–23). This addition is
carried out by tRNA nucleotidyltransferase, an unusual
enzyme that binds the three ribonucleoside triphos-
phate precursors in separate active sites and catalyzes
formation of the phosphodiester bonds to produce the
CCA(3′) sequence. The creation of this defined sequence
of nucleotides is therefore not dependent on a
DNA or RNA template—the template is the binding site
of the enzyme.
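The idea that the enzyme's binding sites, rather than a nucleic acid template, fix the CCA sequence can be caricatured in Python. This is a deliberately crude sketch: the tuple of "active sites" and the function name are hypothetical, and precursors that end in a partial C or CC are not handled.

```python
# Assumed ordering of the three nucleotide-binding sites; this tuple plays the
# role of the "template" that the text attributes to the enzyme itself.
ACTIVE_SITES = ("C", "C", "A")


def add_cca(trna_precursor: str) -> str:
    """Append CCA to the 3' end of a tRNA precursor that lacks it."""
    rna = trna_precursor.upper().replace("T", "U")
    if rna.endswith("CCA"):
        return rna  # some bacterial precursors already carry the sequence
    for nucleotide in ACTIVE_SITES:
        rna += nucleotide  # one phosphodiester bond formed per site
    return rna


if __name__ == "__main__":
    print(add_cca("GCGGAUUUAGCUC"))     # invented 3' fragment lacking CCA
    print(add_cca("GCGGAUUUAGCUCCCA"))  # already complete, returned unchanged
```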
The final type of tRNA processing is the modifica-
tion of some of the bases by methylation, deamination,
or reduction (Fig. 26–24). In the case of pseudouridine
(ψ), the base (uracil) is removed and reattached to the
sugar through C-5. Some of these modified bases occur
at characteristic positions in all tRNAs (Fig. 26–23).
RNA Enzymes Are the Catalysts of Some
Events in RNA Metabolism
The study of posttranscriptional processing of RNA mol-
ecules led to one of the most exciting discoveries in
modern biochemistry—the existence of RNA enzymes.
The best-characterized ribozymes are the self-splicing
group I introns, RNase P, and the hammerhead ribozyme
(discussed below). Most of the activities of these ri-
bozymes are based on two fundamental reactions: trans-
esterification (Fig. 26–13) and phosphodiester bond hy-
drolysis (cleavage). The substrate for ribozymes is often
an RNA molecule, and it may even be part of the ri-
bozyme itself. When its substrate is RNA, an RNA cat-
alyst can make use of base-pairing interactions to align
the substrate for the reaction.
Ribozymes vary greatly in size. A self-splicing group
I intron may have more than 400 nucleotides. The ham-
merhead ribozyme consists of two RNA strands with
only 41 nucleotides in all (Fig. 26–25). As with protein
enzymes, the three-dimensional structure of ribozymes
is important for function. Ribozymes are inactivated by
heating above their melting temperature or by addition
of denaturing agents or complementary oligonu-
cleotides, which disrupt normal base-pairing patterns.
Ribozymes can also be inactivated if essential nu-
cleotides are changed. The secondary structure of a self-
splicing group I intron from the 26S rRNA precursor of
Tetrahymena is shown in detail in Figure 26–26.
Enzymatic Properties of Group I Introns Self-splicing group
I introns share several properties with enzymes besides
accelerating the reaction rate, including their kinetic be-
haviors and their specificity. Binding of the guanosine
cofactor (Fig. 26–13) to the Tetrahymena group I rRNA
intron (Fig. 26–26) is saturable ($K_\mathrm{m} \approx 30\ \mu\mathrm{M}$) and can
be competitively inhibited by 3′-deoxyguanosine. The
intron is very precise in its excision reaction, largely due
to a segment called the internal guide sequence that
can base-pair with exon sequences near the 5′ splice
site (Fig. 26–26). This pairing promotes the alignment
of specific bonds to be cleaved and rejoined.
Because the intron itself is chemically altered dur-
ing the splicing reaction—its ends are cleaved—it may
appear to lack one key enzymatic property: the ability
to catalyze multiple reactions. Closer inspection has
shown that after excision, the 414 nucleotide intron
from Tetrahymena rRNA can, in vitro, act as a true
enzyme (but in vivo it is quickly degraded). A series of
intramolecular cyclization and cleavage reactions in the
excised intron leads to the loss of 19 nucleotides from
its 5′ end. The remaining 395-nucleotide linear RNA,
referred to as L-19 IVS, promotes nucleotidyl transfer
reactions in which some oligonucleotides are lengthened
at the expense of others (Fig. 26–27). The best substrates
are oligonucleotides, such as a synthetic (C)$_5$
oligomer, that can base-pair with the same guanylate-rich
internal guide sequence that held the 5′ exon in
place for self-splicing.

FIGURE 26–24 Some modified bases of tRNAs, produced in posttranscriptional reactions.
The standard symbols (used in Fig. 26–23) are shown in parentheses. Note the unusual
ribose attachment point in pseudouridine. Structures shown: 4-thiouridine (S$^4$U),
inosine (I), 1-methylguanosine (m$^1$G), $N^6$-isopentenyladenosine (i$^6$A),
ribothymidine (T), pseudouridine (ψ), and dihydrouridine (D).

FIGURE 26–25 Hammerhead ribozyme. Certain viruslike elements called virusoids
have small RNA genomes and usually require another virus to assist in their replication
and/or packaging. Some virusoid RNAs include small segments that promote site-specific
RNA cleavage reactions associated with replication. These segments are called
hammerhead ribozymes, because their secondary structures are shaped like the head
of a hammer. Hammerhead ribozymes have been defined and studied separately from
the much larger viral RNAs. (a) The minimal sequences required for catalysis by the
ribozyme. The boxed nucleotides are highly conserved and are required for catalytic
function. The arrow indicates the site of self-cleavage. (b) Three-dimensional structure
(PDB ID 1MME). The strands are colored as in (a). The hammerhead ribozyme is a
metalloenzyme; Mg$^{2+}$ ions are required for activity. The phosphodiester bond at the
site of self-cleavage is indicated by an arrow.

FIGURE 26–26 Secondary structure of the self-splicing
rRNA intron from Tetrahymena. Intron sequences are
shaded yellow, exon sequences green. Each thick yellow
line represents a bond between neighboring nucleotides in
a continuous sequence (a device necessitated by showing
this complex molecule in two dimensions; similarly an
oversize blue line between a C and G residue indicates
normal base pairing); all nucleotides are shown. The
catalytic core of the self-splicing activity is shaded. Some
base-paired regions are labeled (P1, P3, P2.1, P5a, and so
forth) according to an established convention for this RNA
molecule. The P1 region, which contains the internal guide
sequence (boxed), is the location of the 5′ splice site (red
arrow). Part of the internal guide sequence pairs with the
end of the 3′ exon, bringing the 5′ and 3′ splice sites
(red and blue arrows) into close proximity. The three-dimensional
structure of a large segment of this intron is
illustrated in Figure 8–28c.
The enzymatic activity of the L-19 IVS ribozyme re-
sults from a cycle of transesterification reactions mech-
anistically similar to self-splicing. Each ribozyme mole-
cule can process about 100 substrate molecules per hour
and is not altered in the reaction; therefore the intron
acts as a catalyst. It follows Michaelis-Menten kinetics,
is specific for RNA oligonucleotide substrates, and can
be competitively inhibited. The $k_\mathrm{cat}/K_\mathrm{m}$ (specificity
constant) is $10^{3}\ \mathrm{M^{-1}\,s^{-1}}$, lower than that of many enzymes,
but the ribozyme accelerates hydrolysis by a factor of
$10^{10}$ relative to the uncatalyzed reaction. It makes use
of substrate orientation, covalent catalysis, and metal-
ion catalysis—strategies used by protein enzymes.
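The kinetic figures quoted for L-19 IVS can be combined in a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the turnover of about 100 substrate molecules per hour approximates $k_\mathrm{cat}$ under saturating substrate; under that assumption the quoted specificity constant implies a $K_\mathrm{m}$ of roughly $3 \times 10^{-5}$ M. That value is an inference from the two numbers in the text, not a measured constant, and the function names are hypothetical.

```python
TURNOVERS_PER_HOUR = 100.0   # "about 100 substrate molecules per hour" (from the text)
SPECIFICITY_CONSTANT = 1e3   # kcat/Km in M^-1 s^-1 (from the text)


def kcat_per_second(turnovers_per_hour: float = TURNOVERS_PER_HOUR) -> float:
    """Convert an hourly turnover number to s^-1 (assumes saturating substrate)."""
    return turnovers_per_hour / 3600.0


def implied_km_molar(kcat: float, specificity: float = SPECIFICITY_CONSTANT) -> float:
    """Km implied by kcat and kcat/Km; an inference, not a measurement."""
    return kcat / specificity


def michaelis_menten_rate(substrate_molar: float, enzyme_molar: float,
                          kcat: float, km: float) -> float:
    """Initial velocity v0 = kcat [E] [S] / (Km + [S]), in M per second."""
    return kcat * enzyme_molar * substrate_molar / (km + substrate_molar)


if __name__ == "__main__":
    kcat = kcat_per_second()
    km = implied_km_molar(kcat)
    print(f"kcat ~ {kcat:.3g} s^-1, implied Km ~ {km:.3g} M")
    # At [S] = Km the rate is half of Vmax = kcat [E].
    print(michaelis_menten_rate(km, 1e-6, kcat, km))
```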
Characteristics of Other Ribozymes E. coli RNase P has
both an RNA component (the M1 RNA, with 377 nucleotides)
and a protein component ($M_\mathrm{r}$ 17,500). In 1983
Sidney Altman and Norman Pace and their coworkers
discovered that under some conditions, the M1 RNA
alone is capable of catalysis, cleaving tRNA precursors
at the correct position. The protein component appar-
ently serves to stabilize the RNA or facilitate its func-
tion in vivo. The RNase P ribozyme recognizes the three-
dimensional shape of its pre-tRNA substrate, along with
the CCA sequence, and thus can cleave the 5′ leaders
from diverse tRNAs (Fig. 26–23).
The known catalytic repertoire of ribozymes con-
tinues to expand. Some virusoids, small RNAs associ-
ated with plant RNA viruses, include a structure that
promotes a self-cleavage reaction; the hammerhead
ribozyme illustrated in Figure 26–25 is in this class,
catalyzing the hydrolysis of an internal phosphodiester
bond. The splicing reaction that occurs in a spliceosome
seems to rely on a catalytic center formed by the U2,
U5, and U6 snRNAs (Fig. 26–16). And perhaps most im-
portant, an RNA component of ribosomes catalyzes the
synthesis of proteins (Chapter 27).
Exploring catalytic RNAs has provided new insights
into catalytic function in general and has important im-
plications for our understanding of the origin and evo-
lution of life on this planet, a topic discussed in Section
26.3.
FIGURE 26–27 In vitro catalytic activity of
L-19 IVS. (a) L-19 IVS is generated by the
autocatalytic removal of 19 nucleotides from
the 5′ end of the spliced Tetrahymena intron.
The cleavage site is indicated by the arrow in
the internal guide sequence (boxed). The G
residue (shaded pink) added in the first step
of the splicing reaction (see Fig. 26–14) is
part of the removed sequence. A portion of
the internal guide sequence remains at the
5′ end of L-19 IVS. (b) L-19 IVS lengthens
some RNA oligonucleotides at the expense
of others in a cycle of transesterification
reactions (steps 1 through 4). The 3′ OH
of the G residue at the 3′ end of L-19 IVS
plays a key role in this cycle (note that this is
not the G residue added in the splicing
reaction). (C)$_5$ is one of the ribozyme’s better
substrates because it can base-pair with the
guide sequence remaining in the intron.
Although this catalytic activity is probably
irrelevant to the cell, it has important
implications for current hypotheses on
evolution, discussed at the end of this
chapter.
Cellular mRNAs Are Degraded at Different Rates
The expression of genes is regulated at many levels. A
crucial factor governing a gene’s expression is the cel-
lular concentration of its associated mRNA. The con-
centration of any molecule depends on two factors: its
rate of synthesis and its rate of degradation. When syn-
thesis and degradation of an mRNA are balanced, the
concentration of the mRNA remains in a steady state.
A change in either rate will lead to net accumulation or
depletion of the mRNA. Degradative pathways ensure
that mRNAs do not build up in the cell and direct the
synthesis of unnecessary proteins.
The rates of degradation vary greatly for mRNAs
from different eukaryotic genes. For a gene product that
is needed only briefly, the half-life of its mRNA may be
only minutes or even seconds. Gene products needed
constantly by the cell may have mRNAs that are stable
over many cell generations. The average half-life of a
vertebrate cell mRNA is about 3 hours, with the pool of
each type of mRNA turning over about ten times per
cell generation. The half-life of bacterial mRNAs is much
shorter—only about 1.5 min—perhaps because of reg-
ulatory requirements.
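The balance between synthesis and degradation described above can be made quantitative under two assumptions that go beyond the text: synthesis proceeds at a constant rate, and degradation is first order with rate constant $k = \ln 2 / t_{1/2}$. The steady-state level is then the synthesis rate divided by $k$. The function names below are illustrative only.

```python
import math


def decay_constant(half_life: float) -> float:
    """First-order decay constant k = ln 2 / t_half (same time unit as half_life)."""
    return math.log(2) / half_life


def steady_state_level(synthesis_rate: float, half_life: float) -> float:
    """Level at which synthesis (molecules per unit time) equals degradation."""
    return synthesis_rate / decay_constant(half_life)


def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of an mRNA pool left after `elapsed` time with no new synthesis."""
    return 0.5 ** (elapsed / half_life)


if __name__ == "__main__":
    # Half-lives from the text, both in minutes: ~3 h (vertebrate) and ~1.5 min (bacterial).
    vertebrate, bacterial = 180.0, 1.5
    ratio = steady_state_level(1.0, vertebrate) / steady_state_level(1.0, bacterial)
    print(f"same synthesis rate, steady-state ratio: {ratio:.0f}")
    print(f"bacterial mRNA left after 5 min: {fraction_remaining(5.0, bacterial):.3f}")
```

With equal synthesis rates, the steady-state level scales directly with half-life, which is why changing either rate shifts the mRNA concentration.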
Messenger RNA is degraded by ribonucleases pres-
ent in all cells. In E. coli, the process begins with one
or a few cuts by an endoribonuclease, followed by 3′→5′
degradation by exoribonucleases. In lower eukaryotes,
the major pathway involves first shortening the poly(A)
tail, then decapping the 5′ end and degrading the mRNA
in the 5′→3′ direction. A 3′→5′ degradative pathway
also exists and may be the major path in higher eukaryotes.
All eukaryotes have a complex of up to ten
conserved 3′→5′ exoribonucleases, called the exosome,
which is involved in the processing of the 3′ end of
rRNAs and tRNAs as well as the degradation of mRNAs.
A hairpin structure in bacterial mRNAs with a ρ-independent
terminator (Fig. 26–7) confers stability
against degradation. Similar hairpin structures can make
some parts of a primary transcript more stable, leading
to nonuniform degradation of transcripts. In eukaryotic
cells, both the 3′ poly(A) tail and the 5′ cap are important
to the stability of many mRNAs.
Polynucleotide Phosphorylase Makes Random
RNA-like Polymers
In 1955, Marianne Grunberg-Manago and Severo Ochoa
discovered the bacterial enzyme polynucleotide phos-
phorylase, which in vitro catalyzes the reaction
$$(\mathrm{NMP})_n + \mathrm{NDP} \rightleftharpoons \underbrace{(\mathrm{NMP})_{n+1}}_{\text{lengthened polynucleotide}} + \mathrm{P_i}$$
Polynucleotide phosphorylase was the first nucleic acid–
synthesizing enzyme discovered (Arthur Kornberg’s dis-
covery of DNA polymerase followed soon thereafter).
The reaction catalyzed by polynucleotide phosphorylase
differs fundamentally from the polymerase activities dis-
cussed so far in that it is not template-dependent. The
enzyme uses the 5′-diphosphates of ribonucleosides as
substrates and cannot act on the homologous 5′-triphosphates
or on deoxyribonucleoside 5′-diphosphates. The
RNA polymer formed by polynucleotide phosphorylase
contains the usual 3′,5′-phosphodiester linkages, which
can be hydrolyzed by ribonuclease. The reaction is read-
ily reversible and can be pushed in the direction of
breakdown of the polyribonucleotide by increasing the
phosphate concentration. The probable function of this
enzyme in the cell is the degradation of mRNAs to nu-
cleoside diphosphates.
Because the polynucleotide phosphorylase reaction
does not use a template, the polymer it forms does not
have a specific base sequence. The reaction proceeds
equally well with any or all of the four nucleoside diphosphates,
and the base composition of the resulting polymer
reflects nothing more than the relative concentrations
of the 5′-diphosphate substrates in the medium.
Polynucleotide phosphorylase can be used in the
laboratory to prepare RNA polymers with many differ-
ent base sequences and frequencies. Synthetic RNA
polymers of this sort were critical for deducing the ge-
netic code for the amino acids (Chapter 27).
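The dependence of polymer composition on substrate concentrations can be illustrated with a short simulation. The sketch below is a toy model only: it assumes that each added residue is chosen independently, with a probability proportional to the concentration of its nucleoside diphosphate, and the 5:1 mixture of UDP and GDP is a hypothetical value chosen for illustration. Under that assumption, the expected frequency of any triplet in the random polymer is the product of the frequencies of its three bases.

```python
import random
from itertools import product

def base_frequencies(ndp_conc):
    """Convert relative NDP concentrations into incorporation probabilities."""
    total = sum(ndp_conc.values())
    return {base: conc / total for base, conc in ndp_conc.items()}

def random_polymer(ndp_conc, length, rng):
    """Build a template-independent polymer: each residue is drawn independently."""
    freqs = base_frequencies(ndp_conc)
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

def expected_triplet_frequencies(ndp_conc):
    """Expected frequency of each triplet = product of its base frequencies."""
    freqs = base_frequencies(ndp_conc)
    return {
        "".join(t): freqs[t[0]] * freqs[t[1]] * freqs[t[2]]
        for t in product(freqs, repeat=3)
    }

# Hypothetical substrate mixture: five parts UDP to one part GDP.
pool = {"U": 5.0, "G": 1.0}
rng = random.Random(0)  # fixed seed so the run is repeatable
polymer = random_polymer(pool, 3000, rng)
triplets = expected_triplet_frequencies(pool)

print(polymer[:30])
print(round(triplets["UUU"], 3), round(triplets["UUG"], 3))
```

With this mixture the triplet UUU is expected five times as often as UUG, which is the kind of predictable bias that made random polymers of known base composition informative.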
SUMMARY 26.2 RNA Processing
■ Eukaryotic mRNAs are modified by addition of
a 7-methylguanosine residue at the 5′ end and
by cleavage and polyadenylation at the 3′ end
to form a long poly(A) tail.
■ Many primary mRNA transcripts contain introns
(noncoding regions), which are removed by
splicing. Excision of the group I introns found
in some rRNAs requires a guanosine cofactor.
Some group I and group II introns are capable of
self-splicing; no protein enzymes are required.
Nuclear mRNA precursors have a third class
(the largest class) of introns, which are spliced
with the aid of RNA-protein complexes called
snRNPs, assembled into spliceosomes. A fourth
class of introns, found in some tRNAs, is the only
class known to be spliced by protein enzymes.
■ Ribosomal RNAs and transfer RNAs are derived
from longer precursor RNAs, trimmed by
nucleases. Some bases are modified
enzymatically during the maturation process.
■ The self-splicing introns and the RNA
component of RNase P (which cleaves the 5′
end of tRNA precursors) are two examples of
ribozymes. These biological catalysts have the
properties of true enzymes. They generally pro-
mote hydrolytic cleavage and transesterification,
using RNA as substrate. Combinations of these
reactions can be promoted by the excised
group I intron of Tetrahymena rRNA, resulting
in a type of RNA polymerization reaction.
■ Polynucleotide phosphorylase reversibly forms
RNA-like polymers from ribonucleoside
5′-diphosphates, adding or removing
ribonucleotides at the 3′-hydroxyl end of the
polymer. The enzyme degrades RNA in vivo.
26.3 RNA-Dependent Synthesis
of RNA and DNA
In our discussion of DNA and RNA synthesis up to this
point, the role of the template strand has been reserved
for DNA. However, some enzymes use an RNA template
for nucleic acid synthesis. With the very important ex-
ception of viruses with an RNA genome, these enzymes
play only a modest role in information pathways. RNA
viruses are the source of most RNA-dependent poly-
merases characterized so far.
The existence of RNA replication requires an elab-
oration of the central dogma (Fig. 26–28; contrast this
with the diagram on p. 922). The enzymes involved in
RNA replication have profound implications for investi-
gations into the nature of self-replicating molecules that
may have existed in prebiotic times.
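The elaborated central dogma can be summarized as a small lookup table. The sketch below is an illustration only, and the function names are hypothetical. Each entry maps a template and a product to the process and the catalyst named in this chapter, so that the two RNA-templated syntheses of nucleic acid stand out from the DNA-templated ones.

```python
# (template, product) -> (process, catalyst), using the names in this chapter.
INFORMATION_FLOWS = {
    ("DNA", "DNA"): ("DNA replication", "DNA-dependent DNA polymerase"),
    ("DNA", "RNA"): ("transcription", "DNA-dependent RNA polymerase"),
    ("RNA", "protein"): ("translation", "ribosome"),
    # The two RNA-templated additions to the central dogma:
    ("RNA", "DNA"): ("reverse transcription", "RNA-dependent DNA polymerase"),
    ("RNA", "RNA"): ("RNA replication", "RNA-dependent RNA polymerase"),
}

def processes_with_template(template):
    """List the processes (in table order) that read the given template."""
    return [process for (tmpl, _), (process, _) in INFORMATION_FLOWS.items()
            if tmpl == template]

def nucleic_acid_syntheses_from_rna():
    """RNA-templated syntheses whose product is itself a nucleic acid."""
    return [process for (tmpl, prod), (process, _) in INFORMATION_FLOWS.items()
            if tmpl == "RNA" and prod in ("DNA", "RNA")]

print(processes_with_template("RNA"))
print(nucleic_acid_syntheses_from_rna())
```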
Reverse Transcriptase Produces DNA from Viral RNA
Certain RNA viruses that infect animal cells carry within
the viral particle an RNA-dependent DNA polymerase
called reverse transcriptase. On infection, the single-
stranded RNA viral genome (~10,000 nucleotides) and
the enzyme enter the host cell. The reverse transcrip-
tase first catalyzes the synthesis of a DNA strand com-
plementary to the viral RNA (Fig. 26–29), then degrades
the RNA strand of the viral RNA-DNA hybrid and re-
places it with DNA. The resulting duplex DNA often be-
comes incorporated into the genome of the eukaryotic
host cell. These integrated (and dormant) viral genes
can be activated and transcribed, and the gene prod-
ucts—viral proteins and the viral RNA genome itself—
packaged as new viruses. The RNA viruses that contain
reverse transcriptases are known as retroviruses
(retro is the Latin prefix for “backward”).
FIGURE 26–28 Extension of the central dogma to include RNA-dependent
synthesis of RNA and DNA. [Diagram: DNA replication (DNA to DNA),
transcription (DNA to RNA), reverse transcription (RNA to DNA),
RNA replication (RNA to RNA), translation (RNA to protein).]
FIGURE 26–29 Retroviral infection of a mammalian cell and inte-
gration of the retrovirus into the host chromosome. Viral particles
entering the host cell carry viral reverse transcriptase and a cellular
tRNA (picked up from a former host cell) already base-paired to the
viral RNA. The tRNA facilitates immediate conversion of viral RNA
to double-stranded DNA by the action of reverse transcriptase, as de-
scribed in the text. Once converted to double-stranded DNA, the
DNA enters the nucleus and is integrated into the host genome. The
integration is catalyzed by a virally encoded integrase. Integration of
viral DNA into host DNA is mechanistically similar to the insertion
of transposons in bacterial chromosomes (see Fig. 25–43). For ex-
ample, a few base pairs of host DNA become duplicated at the site
of integration, forming short repeats of 4 to 6 bp at each end of the
inserted retroviral DNA (not shown).
[Figure 26–29 diagram labels: retrovirus with RNA genome; host cell cytoplasm; reverse transcription of RNA to viral DNA; nucleus; integration into the chromosome.]
The existence of reverse transcriptases in RNA
viruses was predicted by Howard Temin in 1962, and the
enzymes were ultimately detected by Temin and, inde-
pendently, by David Baltimore in 1970. Their discovery
aroused much attention as dogma-shaking proof that
genetic information can flow “backward” from RNA to
DNA.
Retroviruses typically have three genes: gag (de-
rived from the historical designation group associated
antigen), pol, and env (Fig. 26–30). The transcript that
contains gag and pol is translated into a long “polypro-
tein,” a single large polypeptide that is cleaved into six
proteins with distinct functions. The proteins derived
from the gag gene make up the interior core of the vi-
ral particle. The pol gene encodes the protease that
cleaves the long polypeptide, an integrase that inserts
the viral DNA into the host chromosomes, and reverse
transcriptase. Many reverse transcriptases have two
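[Figure 26–30 diagram: integrated retroviral DNA flanked by host-cell DNA, in the order LTR, ψ, gag, pol, env, LTR. Transcription yields the primary transcript. Translation yields polyprotein A, which proteolytic cleavage converts to virus structural proteins, integrase, protease, and reverse transcriptase, and polyprotein B, which is cleaved to viral envelope proteins.]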
FIGURE 26–30 Structure and gene products of an integrated retro-
viral genome. The long terminal repeats (LTRs) have sequences needed
for the regulation and initiation of transcription. The sequence denoted
ψ is required for packaging of retroviral RNAs into mature viral particles.
Transcription of the retroviral DNA produces a primary transcript
encompassing the gag, pol, and env genes. Translation (Chapter 27)
produces a polyprotein, a single long polypeptide derived from
the gag and pol genes, which is cleaved into six distinct proteins. Splic-
ing of the primary transcript yields an mRNA derived largely from the
env gene, which is also translated into a polyprotein, then cleaved to
generate viral envelope proteins.
subunits, α and β. The pol gene specifies the β subunit
(M_r 90,000), and the α subunit (M_r 65,000) is simply a
proteolytic fragment of the β subunit. The env gene encodes
the proteins of the viral envelope. At each end of
the linear RNA genome are long terminal repeat (LTR)
sequences of a few hundred nucleotides. Transcribed
into the duplex DNA, these sequences facilitate inte-
gration of the viral chromosome into the host DNA and
contain promoters for viral gene expression.
Reverse transcriptases catalyze three different re-
actions: (1) RNA-dependent DNA synthesis, (2) RNA
degradation, and (3) DNA-dependent DNA synthesis.
Like many DNA and RNA polymerases, reverse transcriptases
contain Zn²⁺. Each transcriptase is most active
with the RNA of its own virus, but each can be used
experimentally to make DNA complementary to a variety
of RNAs. The DNA synthesis and RNA
degradation activities use separate active sites on the
protein. For DNA synthesis to begin, the reverse tran-
scriptase requires a primer, a cellular tRNA obtained
during an earlier infection and carried within the viral
particle. This tRNA is base-paired at its 3′ end with a
complementary sequence in the viral RNA. The new
DNA strand is synthesized in the 5′→3′ direction, as in
all RNA and DNA polymerase reactions. Reverse transcriptases,
like RNA polymerases, do not have 3′→5′
proofreading exonucleases. They generally have error
rates of about 1 per 20,000 nucleotides added. An error
rate this high is extremely unusual in DNA replication
and appears to be a feature of most enzymes that repli-
cate the genomes of RNA viruses. A consequence is a
higher mutation rate and faster rate of viral evolution,
which is a factor in the frequent appearance of new
strains of disease-causing retroviruses.
Reverse transcriptases have become important
reagents in the study of DNA-RNA relationships and in
DNA cloning techniques. They make possible the syn-
thesis of DNA complementary to an mRNA template,
and synthetic DNA prepared in this manner, called com-
plementary DNA (cDNA), can be used to clone cel-
lular genes (see Fig. 9–14).
Some Retroviruses Cause Cancer and AIDS
Retroviruses have featured prominently in recent ad-
vances in the molecular understanding of cancer. Most
retroviruses do not kill their host cells but remain inte-
grated in the cellular DNA, replicating when the cell di-
vides. Some retroviruses, classified as RNA tumor
viruses, contain an oncogene that can cause the cell to
grow abnormally (see Fig. 12–47). The first retrovirus
of this type to be studied was the Rous sarcoma virus
(also called avian sarcoma virus; Fig. 26–31), named for
F. Peyton Rous, who studied chicken tumors now known
to be caused by this virus. Since the initial discovery of
oncogenes by Harold Varmus and Michael Bishop, many
dozens of such genes have been found in retroviruses.
The human immunodeficiency virus (HIV), which
causes acquired immune deficiency syndrome (AIDS),
is a retrovirus. Identified in 1983, HIV has an RNA
genome with standard retroviral genes along with sev-
eral other unusual genes (Fig. 26–32). Unlike many
other retroviruses, HIV kills many of the cells it infects
(principally T lymphocytes) rather than causing tumor
formation. This gradually leads to suppression of the im-
mune system in the host organism. The reverse tran-
scriptase of HIV is even more error prone than other
known reverse transcriptases—ten times more so—
resulting in high mutation rates in this virus. One or
more errors are generally made every time the viral
genome is replicated, so any two viral RNA molecules
are likely to differ.
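The consequence of these error rates can be estimated with simple arithmetic. The sketch below is a rough calculation, not a model of the full replication cycle. It assumes that each nucleotide addition is an independent trial and that the genome is copied in a single pass. The genome sizes (about 10,000 nucleotides for a typical retrovirus, 9.7 × 10³ for HIV) and the error rates (1 per 20,000, and ten times that for HIV) are the values quoted in this chapter.

```python
def expected_errors(genome_length, error_rate):
    """Mean number of misincorporations in one copy of the genome."""
    return genome_length * error_rate

def prob_at_least_one_error(genome_length, error_rate):
    """P(>= 1 error), treating every nucleotide addition as an independent trial."""
    return 1.0 - (1.0 - error_rate) ** genome_length

typical_rate = 1 / 20_000      # error rate quoted for reverse transcriptases
hiv_rate = 10 * typical_rate   # HIV enzyme: ten times more error prone
typical_genome = 10_000        # approximate retroviral genome size
hiv_genome = 9_700             # HIV genome size

for label, n, rate in [("typical retrovirus", typical_genome, typical_rate),
                       ("HIV", hiv_genome, hiv_rate)]:
    print(label,
          round(expected_errors(n, rate), 2),
          round(prob_at_least_one_error(n, rate), 2))
```

Under these assumptions a typical retroviral genome acquires about 0.5 errors per copy, whereas the HIV genome acquires nearly 5, consistent with the statement that one or more errors are generally made each time the HIV genome is replicated.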
Many modern vaccines for viral infections consist
of one or more coat proteins of the virus, produced by
methods described in Chapter 9. These proteins are not
infectious on their own but stimulate the immune sys-
tem to recognize and resist subsequent viral invasions
(Chapter 5). Because of the high error rate of the HIV
reverse transcriptase, the env gene in this virus (along
with the rest of the genome) undergoes very rapid mu-
tation, complicating the development of an effective
vaccine. However, repeated cycles of cell invasion and
replication are needed to propagate an HIV infection,
so inhibition of viral enzymes offers promise as an ef-
fective therapy. The HIV protease is targeted by a class
of drugs called protease inhibitors (see Box 6–3). Re-
verse transcriptase is the target of some additional
drugs widely used to treat HIV-infected individuals
(Box 26–2).
Many Transposons, Retroviruses, and Introns May
Have a Common Evolutionary Origin
Some well-characterized eukaryotic DNA transposons
from sources as diverse as yeast and fruit flies have a
structure very similar to that of retroviruses; these are
sometimes called retrotransposons (Fig. 26–33). Retro-
transposons encode an enzyme homologous to the retro-
viral reverse transcriptase, and their coding regions are
flanked by LTR sequences. They transpose from one po-
sition to another in the cellular genome by means of an
RNA intermediate, using reverse transcriptase to make
a DNA copy of the RNA, followed by integration of
the DNA at a new site. Most transposons in eukaryotes
use this mechanism for transposition, distinguishing
them from bacterial transposons, which move as DNA
directly from one chromosomal location to another (see
Fig. 25–43).
[Figure 26–31 diagram: LTR, gag, pol, env, src, LTR.]
FIGURE 26–31 Rous sarcoma virus genome. The src gene encodes a
tyrosine-specific protein kinase, one of a class of enzymes known to
function in systems that affect cell division, cell-cell interactions, and
intercellular communication (Chapter 12). The same gene is found in
chicken DNA (the usual host for this virus) and in the genomes of
many other eukaryotes, including humans. When associated with the
Rous sarcoma virus, this oncogene is often expressed at abnormally
high levels, contributing to unregulated cell division and cancer.
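[Figure 26–32 diagram: HIV genome map flanked by LTRs, showing gag, pol, and env together with the small genes vif, vpr, vpu, tat, rev, and nef; tat and rev each appear as two segments.]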
FIGURE 26–32 The genome of HIV, the virus that causes AIDS. In
addition to the typical retroviral genes, HIV contains several small
genes with a variety of functions (not identified here, and not all
known). Some of these genes overlap (see Box 27–1). Alternative
splicing mechanisms produce many different proteins from this small
(9.7 × 10³ nucleotides) genome.
Retrotransposons lack an env gene and so cannot
form viral particles. They can be thought of as defective
viruses, trapped in cells. Comparisons between retro-
viruses and eukaryotic transposons suggest that reverse
transcriptase is an ancient enzyme that predates the
evolution of multicellular organisms.
Interestingly, many group I and group II introns are
also mobile genetic elements. In addition to their self-
splicing activities, they encode DNA endonucleases that
promote their movement. During genetic exchanges be-
tween cells of the same species, or when DNA is intro-
duced into a cell by parasites or by other means, these
endonucleases promote insertion of the intron into an
identical site in another DNA copy of a homologous gene
that does not contain the intron, in a process termed
homing (Fig. 26–34). Whereas group I intron homing is
DNA-based, group II intron homing occurs through an
RNA intermediate. The endonucleases of the group II
introns have associated reverse transcriptase activity.
The proteins can form complexes with the intron RNAs
themselves, after the introns are spliced from the pri-
mary transcripts. Because the homing process involves
insertion of the RNA intron into DNA and reverse tran-
scription of the intron, the movement of these introns
has been called retrohoming. Over time, every copy of
a particular gene in a population may acquire the intron.
BOX 26–2 BIOCHEMISTRY IN MEDICINE
Fighting AIDS with Inhibitors of HIV
Reverse Transcriptase
Research into the chemistry of template-dependent
nucleic acid biosynthesis, combined with modern
techniques of molecular biology, has elucidated the life
cycle and structure of the human immunodeficiency
virus, the retrovirus that causes AIDS. A few years af-
ter the isolation of HIV, this research resulted in the
development of drugs capable of prolonging the lives
of people infected by HIV.
The first drug to be approved for clinical use was
AZT, a structural analog of deoxythymidine. AZT was
first synthesized in 1964 by Jerome P. Horwitz. It failed
as an anticancer drug (the purpose for which it was
made), but in 1985 it was found to be a useful treat-
ment for AIDS. AZT is taken up by T lymphocytes,
immune system cells that are particularly vulnerable
to HIV infection, and converted to AZT triphosphate.
(AZT triphosphate taken directly would be ineffective,
because it cannot cross the plasma membrane.) HIV’s
reverse transcriptase has a higher affinity for AZT
triphosphate than for dTTP, and binding of AZT
triphosphate to this enzyme competitively inhibits
dTTP binding. When AZT is added to the 3′ end of
the growing DNA strand, lack of a 3′ hydroxyl means
that the DNA strand is terminated prematurely and
viral DNA synthesis grinds to a halt.
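Chain termination by AZT can be pictured with a minimal sketch. The model below is hypothetical and deliberately simple: wherever the RNA template calls for a thymidine nucleotide (opposite each A), AZT is added in place of T with a fixed probability, here called azt_fraction, and synthesis then stops because AZT has no 3′ hydroxyl. The template sequence and the probability are illustrative values, not measurements.

```python
import random

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(template_3to5, azt_fraction, rng):
    """Copy an RNA template (written 3'->5') into DNA (written 5'->3').

    Opposite each template A, AZT (shown as 'Z') is added instead of T with
    probability `azt_fraction`; AZT has no 3' hydroxyl, so synthesis stops.
    """
    product = []
    for base in template_3to5:
        if base == "A" and rng.random() < azt_fraction:
            product.append("Z")
            break
        product.append(RNA_TO_DNA[base])
    return "".join(product)

def full_length_probability(template_3to5, azt_fraction):
    """Chance that every A in the template is copied with T rather than AZT."""
    return (1.0 - azt_fraction) ** template_3to5.count("A")

rng = random.Random(1)           # fixed seed so the run is repeatable
template = "GCAUUAGC"            # hypothetical RNA segment, written 3'->5'
print(reverse_transcribe(template, 0.0, rng))   # no AZT: full-length copy
print(reverse_transcribe(template, 1.0, rng))   # AZT always wins: early stop
print(full_length_probability(template, 0.1))
```

Because a genome of several thousand nucleotides contains many such sites, even a small probability of AZT incorporation at each one makes a full-length DNA copy unlikely in this model.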
AZT triphosphate is not as toxic to the T lym-
phocytes themselves, because cellular DNA poly-
merases have a lower affinity for this compound than
for dTTP. At concentrations of 1 to 5 μM, AZT affects
HIV reverse transcription but not most cellular DNA
replication. Unfortunately, AZT appears to be toxic to
the bone marrow cells that are the progenitors of ery-
throcytes, and many individuals taking AZT develop
anemia. AZT can increase the survival time of people
with advanced AIDS by about a year, and it delays the
onset of AIDS in those who are still in the early stages
of HIV infection. Some other AIDS drugs, such as
dideoxyinosine (DDI), have a similar mechanism of ac-
tion. Newer drugs target and inactivate the HIV pro-
tease. Because of the high error rate of HIV reverse
transcriptase and the resulting rapid evolution of HIV,
the most effective treatments of HIV infections use a
combination of drugs directed at both the protease
and the reverse transcriptase.
FIGURE 26–33 Eukaryotic transposons. The Ty element of the yeast
Saccharomyces and the copia element of the fruit fly Drosophila serve
as examples of eukaryotic transposons, which often have a structure
similar to retroviruses but lack the env gene. The δ sequences of the
Ty element are functionally equivalent to retroviral LTRs. In the copia
element, int and RT are homologous to the integrase and reverse tran-
scriptase segments, respectively, of the pol gene.
[Figure 26–33 diagram: Ty element (Saccharomyces): δ (LTR), TYA (gag), TYB (pol), δ (LTR). Copia element (Drosophila): LTR, gag, int, ?, RT, LTR.]
[Box 26–2 structures: 3′-azido-2′,3′-dideoxythymidine (AZT) and 2′,3′-dideoxyinosine (DDI).]
Much more rarely, the intron may insert itself into a new
location in an unrelated gene. If this event does not kill
the host cell, it can lead to the evolution and distribu-
tion of an intron in a new location. The structures and
mechanisms used by mobile introns support the idea
that at least some introns originated as molecular par-
asites whose evolutionary past can be traced to retro-
viruses and transposons.
Telomerase Is a Specialized Reverse Transcriptase
Telomeres, the structures at the ends of linear eukary-
otic chromosomes (see Fig. 24–9), generally consist of
many tandem copies of a short oligonucleotide sequence.
This sequence usually has the form T_xG_y in one
strand and C_yA_x in the complementary strand, where x
and y are typically in the range of 1 to 4 (p. 930). Telomeres
vary in length from a few dozen base pairs in some
ciliated protozoans to tens of thousands of base pairs in
mammals. The TG strand is longer than its complement,
leaving a region of single-stranded DNA of up to a few
hundred nucleotides at the 3′ end.
The ends of a linear chromosome are not readily
replicated by cellular DNA polymerases. DNA replica-
tion requires a template and primer, and beyond the end
of a linear DNA molecule no template is available for the
pairing of an RNA primer. Without a special mechanism
for replicating the ends, chromosomes would be short-
ened somewhat in each cell generation. The enzyme
telomerase solves this problem by adding telomeres to
chromosome ends.
[Figure 26–34 diagram, three panels: (a) Production of homing endonuclease; (b) Homing; (c) Retrohoming.]
FIGURE 26–34 Introns that move: homing and retrohoming. Certain
introns include a gene (shown in red) for enzymes that promote hom-
ing (type I introns) or retrohoming (type II introns). (a) The gene within
the spliced intron is bound by a ribosome and translated. Type I hom-
ing introns specify a site-specific endonuclease, called a homing en-
donuclease. Type II retrohoming introns specify a protein with both
endonuclease and reverse transcriptase activities.
(b) Homing. Allele a of a gene X containing a type I homing in-
tron is present in a cell containing allele b of the same gene, which
lacks the intron. The homing endonuclease produced by a cleaves b
at the position corresponding to the intron in a, and double-strand
break repair (recombination with allele a; see Fig. 25–31a) then cre-
ates a new copy of the intron in b. (c) Retrohoming. Allele a of gene
Y contains a retrohoming type II intron; allele b lacks the intron. The
spliced intron inserts itself into the coding strand of b in a reaction
that is the reverse of the splicing that excised the intron from the pri-
mary transcript (see Fig. 26–15), except that here the insertion is into
DNA rather than RNA. The noncoding DNA strand of b is then cleaved
by the intron-encoded endonuclease/reverse transcriptase. This same
enzyme uses the inserted RNA as a template to synthesize a comple-
mentary DNA strand. The RNA is then degraded by cellular ribonu-
cleases and replaced with DNA.
Although the existence of this enzyme may not be
surprising, the mechanism by which it acts is remark-
able and unprecedented. Telomerase, like some other
enzymes described in this chapter, contains both RNA
and protein components. The RNA component is about
150 nucleotides long and contains about 1.5 copies of
the appropriate C_yA_x telomere repeat. This region of the
RNA acts as a template for synthesis of the T_xG_y strand
of the telomere. Telomerase thereby acts as a cellular
reverse transcriptase that provides the active site for
RNA-dependent DNA synthesis. Unlike retroviral re-
verse transcriptases, telomerase copies only a small
segment of RNA that it carries within itself. Telomere
synthesis requires the 3′ end of a chromosome as primer
and proceeds in the usual 5′→3′ direction. Having synthesized
one copy of the repeat, the enzyme repositions
to resume extension of the telomere (Fig. 26–35a).
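The cycle of hybridization, polymerization, and repositioning can be sketched in a few lines of code. The example is an illustration under stated assumptions, not a description of any particular telomerase: the template is a hypothetical 12-nucleotide RNA carrying 1.5 copies of a C4A4 repeat, patterned on the T4G4 repeat drawn in Fig. 26–35a, and the function names are invented for this sketch. The DNA 3′ end pairs with the 3′ portion of the template, and the unpaired remainder of the template is copied.

```python
RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}

def telomerase_round(dna_5to3, template_rna_3to5):
    """One cycle of hybridization and polymerization.

    The DNA 3' end pairs with the 3' portion of the RNA template; the
    unpaired remainder of the template is then copied into DNA.
    """
    encoded = "".join(RNA_TO_DNA[b] for b in template_rna_3to5)
    # Longest 3'-terminal stretch of DNA that can pair with the start of the
    # template while leaving at least one template residue to copy.
    overlap = max(k for k in range(len(encoded))
                  if dna_5to3.endswith(encoded[:k]))
    return dna_5to3 + encoded[overlap:]

def extend_telomere(dna_5to3, template_rna_3to5, rounds):
    """Repeat hybridization, polymerization, and repositioning `rounds` times."""
    for _ in range(rounds):
        dna_5to3 = telomerase_round(dna_5to3, template_rna_3to5)
    return dna_5to3

template = "AAAACCCCAAAA"   # hypothetical: 1.5 copies of C4A4, written 3'->5'
primer = "TTTTGGGGTTTT"     # 3' end of the TG strand, written 5'->3'
print(extend_telomere(primer, template, 2))
```

Each round adds one eight-nucleotide repeat in this sketch, because only the part of the template beyond the paired primer end is copied before the enzyme repositions.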
After extension of the T_xG_y strand by telomerase,
the complementary C_yA_x strand is synthesized by cellular
DNA polymerases, starting with an RNA primer
(see Fig. 25–13). The single-stranded region is pro-
tected by specific binding proteins in many lower eu-
karyotes, especially those species with telomeres of less
than a few hundred base pairs. In higher eukaryotes (in-
cluding mammals) with telomeres many thousands of
base pairs long, the single-stranded end is sequestered
in a specialized structure called a T loop. The single-
stranded end is folded back and paired with its com-
plement in the double-stranded portion of the telomere.
The formation of a T loop involves invasion of the 3′ end
[Figure 26–35a diagram: the internal template RNA of telomerase paired with the 3′ end of the DNA TG strand (5′-TTTTGGGGTTTTG-3′, extended to TTTTGGGGTTTTGGGGTTTT); steps: 1, polymerization and hybridization; 2, translocation and rehybridization; 3, further polymerization.]
FIGURE 26–35 The TG strand and T loop of telomeres. (a) The internal
template RNA of telomerase binds to and base-pairs with the DNA’s
TG primer (T_xG_y). (1) Telomerase adds more T and G residues to the
TG primer, then (2) repositions the internal template RNA to allow
(3) the addition of more T and G residues. The complementary strand
is synthesized by cellular DNA polymerases (not shown). (b) Proposed
structure of T loops in telomeres. The single-stranded tail synthesized
by telomerase is folded back and paired with its complement in the
duplex portion of the telomere. The telomere is bound by several
telomere-binding proteins, including TRF1 and TRF2 (telomere repeat
binding factors). (c) Electron micrograph of a T loop at the end of a
chromosome isolated from a mouse hepatocyte. The bar at the bot-
tom of the micrograph represents a length of 5,000 bp.
[Figure 26–35b diagram labels: 5′ and 3′ ends; TG strand; CA strand; TRF1 and TRF2; telomere duplex DNA-binding proteins. Figure 26–35c: electron micrograph.]
of the telomere’s single strand into the duplex DNA, per-
haps by a mechanism similar to the initiation of homol-
ogous genetic recombination (see Fig. 25–31). In mam-
mals, the looped DNA is bound by two proteins, TRF1
and TRF2, with the latter protein involved in formation
of the T loop. T loops protect the 3′ ends of chromosomes,
making them inaccessible to nucleases and the
enzymes that repair double-strand breaks (Fig. 26–35b).
In protozoans (such as Tetrahymena), loss of
telomerase activity results in a gradual shortening of
telomeres with each cell division, ultimately leading to
the death of the cell line. A similar link between telo-
mere length and cell senescence (cessation of cell divi-
sion) has been observed in humans. In germ-line cells,
which contain telomerase activity, telomere lengths are
maintained; in somatic cells, which lack telomerase, they
are not. There is a linear, inverse relationship between
the length of telomeres in cultured fibroblasts and the
age of the individual from whom the fibroblasts were
taken: telomeres in human somatic cells gradually
shorten as an individual ages. If the telomerase reverse
transcriptase is introduced into human somatic cells in
vitro, telomerase activity is restored and the cellular life
span increases markedly.
Is the gradual shortening of telomeres a key to the
aging process? Is our natural life span determined by
the length of the telomeres we are born with? Further
research in this area should yield some fascinating
insights.
Some Viral RNAs Are Replicated by RNA-Dependent
RNA Polymerase
Some E. coli bacteriophages, including f2, MS2, R17,
and Qβ, as well as some eukaryotic viruses (including
influenza and Sindbis viruses, the latter associated with
a form of encephalitis) have RNA genomes. The single-
stranded RNA chromosomes of these viruses, which also
function as mRNAs for the synthesis of viral proteins, are
replicated in the host cell by an RNA-dependent RNA
polymerase (RNA replicase). All RNA viruses—with
the exception of retroviruses—must encode a protein
with RNA-dependent RNA polymerase activity because
the host cells do not possess this enzyme.
The RNA replicase of most RNA bacteriophages has a molecular weight
of ~210,000 and consists of four subunits. One subunit (M_r 65,000)
is the product of the replicase gene encoded by the viral RNA and
has the active site for replication. The other three subunits are
host proteins normally involved in host-cell protein synthesis: the
E. coli elongation factors Tu (M_r 30,000) and Ts (M_r 45,000)
(which ferry aminoacyl-tRNAs to the ribosomes) and the protein S1
(an integral part of the 30S ribosomal subunit). These three host
proteins may help the RNA replicase locate and bind to the 3′ ends
of the viral RNAs.
RNA replicase isolated from Qβ-infected E. coli
cells catalyzes the formation of an RNA complementary
to the viral RNA, in a reaction equivalent to that cat-
alyzed by DNA-dependent RNA polymerases. New RNA
strand synthesis proceeds in the 5′→3′ direction by a
chemical mechanism identical to that used in all other
nucleic acid synthetic reactions that require a template.
RNA replicase requires RNA as its template and will not
function with DNA. It lacks a separate proofreading exonuclease
activity and has an error rate similar to that
of RNA polymerase. Unlike the DNA and RNA poly-
merases, RNA replicases are specific for the RNA of
their own virus; the RNAs of the host cell are generally
not replicated. This explains how RNA viruses are pref-
erentially replicated in the host cell, which contains
many other types of RNA.
RNA Synthesis Offers Important Clues to
Biochemical Evolution
The extraordinary complexity and order that distinguish
living from inanimate systems are key manifestations of
fundamental life processes. Maintaining the living state
requires that selected chemical transformations occur
very rapidly—especially those that use environmental
energy sources and synthesize elaborate or specialized
cellular macromolecules. Life depends on powerful and
selective catalysts—enzymes—and on informational
systems capable of both securely storing the blueprint
for these enzymes and accurately reproducing the blue-
print for generation after generation. Chromosomes en-
code the blueprint not for the cell but for the enzymes
that construct and maintain the cell. The parallel de-
mands for information and catalysis present a classic co-
nundrum: what came first, the information needed to
specify structure or the enzymes needed to maintain
and transmit the information?
The unveiling of the structural and functional com-
plexity of RNA led Carl Woese, Francis Crick, and Leslie
Orgel to propose in the 1960s that this macromolecule
might serve as both information carrier and catalyst.
The discovery of catalytic RNAs took this proposal from
conjecture to hypothesis and has led to widespread
speculation that an “RNA world” might have been im-
portant in the transition from prebiotic chemistry to life
(see Fig. 1–34). The parent of all life on this planet, in
the sense that it could reproduce itself across the gen-
erations from the origin of life to the present, might
have been a self-replicating RNA or a polymer with
equivalent chemical characteristics.
How might a self-replicating polymer come to be?
How might it maintain itself in an environment where
the precursors for polymer synthesis are scarce? How
could evolution progress from such a polymer to the
modern DNA-protein world? These difficult questions
can be addressed by careful experimentation, providing
clues about how life on Earth began and evolved.
The probable origin of purine and pyrimidine bases
is suggested by experiments designed to test hypothe-
ses about prebiotic chemistry (pp. 32–33). Beginning
with simple molecules thought to be present in the early
atmosphere (CH₄, NH₃, H₂O, H₂), electrical discharges
such as lightning generate, first, more reactive mole-
cules such as HCN and aldehydes, then an array of
amino acids and organic acids (see Fig. 1–33). When
molecules such as HCN become abundant, purine and
pyrimidine bases are synthesized in detectable amounts.
Remarkably, a concentrated solution of ammonium
cyanide, refluxed for a few days, generates adenine in
yields of up to 0.5% (Fig. 26–36). Adenine may well have
been the first and most abundant nucleotide constituent
to appear on Earth. Intriguingly, most enzyme cofactors
contain adenosine as part of their structure, although it
plays no direct role in the cofactor function (see Fig.
8–41). This may suggest an evolutionary relationship,
based on the simple synthesis of adenine from cyanide.
The RNA world hypothesis requires a nucleotide
polymer to reproduce itself. Can a ribozyme bring about
its own synthesis in a template-directed manner? The
self-splicing rRNA intron of Tetrahymena (Fig. 26–26)
catalyzes the reversible attack of a guanosine residue
on the 5′ splice junction (Fig. 26–37). If the 5′ splice
site and the internal guide sequence are removed from
the intron, the rest of the intron can bind RNA strands
paired with short oligonucleotides. Part of the remain-
ing intact intron effectively acts as a template for the
alignment and ligation of the short oligonucleotides. The
reaction is in essence a reversal of the attack of guanosine
on the 5′ splice junction, but the result is the synthesis
of long RNA polymers from short ones, with the
sequence of the product defined by an RNA template.
A self-replicating polymer would quickly use up
available supplies of precursors provided by the rela-
tively slow processes of prebiotic chemistry. Thus, from
an early stage in evolution, metabolic pathways would
be required to generate precursors efficiently, with the
synthesis of precursors presumably catalyzed by ri-
bozymes. The extant ribozymes found in nature have a
limited repertoire of catalytic functions, and of the ri-
bozymes that may once have existed, no trace is left. To
explore the RNA world hypothesis more deeply, we need
to know whether RNA has the potential to catalyze the
many different reactions needed in a primitive system
of metabolic pathways.
The search for RNAs with new catalytic functions
has been aided by the development of a method that
rapidly searches pools of random polymers of RNA and
extracts those with particular activities: SELEX is noth-
ing less than accelerated evolution in a test tube (Box
26–3). It has been used to generate RNA molecules that
bind to amino acids, organic dyes, nucleotides, cyanocobalamin,
and other molecules. Researchers have isolated ribozymes
that catalyze ester and amide bond formation, SN2 reactions,
metallation of (addition of metal
ions to) porphyrins, and carbon–carbon bond formation.
The evolution of enzymatic cofactors with nucleotide
“handles” that facilitate their binding to ribozymes might
have further expanded the repertoire of chemical
processes available to primitive metabolic systems.
As we shall see in the next chapter, some natural
RNA molecules catalyze the formation of peptide bonds,
offering an idea of how the RNA world might have been
transformed by the greater catalytic potential of pro-
teins. The synthesis of proteins would have been a ma-
jor event in the evolution of the RNA world, but would
also have hastened its demise. The information-
carrying role of RNA may have passed to DNA because
DNA is chemically more stable. RNA replicase and re-
verse transcriptase may be modern versions of enzymes
that once played important roles in making the transi-
tion to the modern DNA-based system.
Molecular parasites may also have originated in an
RNA world. With the appearance of the first inefficient
self-replicators, transposition could have been a poten-
tially important alternative to replication as a strategy
for successful reproduction and survival. Early parasitic
RNAs would simply hop into a self-replicating molecule
via catalyzed transesterification, then passively undergo
replication. Natural selection would have driven trans-
position to become site-specific, targeting sequences
that did not interfere with the catalytic activities of the
host RNA. Replicators and RNA transposons could have
existed in a primitive symbiotic relationship, each
contributing to the evolution of the other. Modern introns,
retroviruses, and transposons may all be vestiges of a
“piggy-back” strategy pursued by early parasitic RNAs.
These elements continue to make major contributions
to the evolution of their hosts.
FIGURE 26–36 Possible prebiotic synthesis of adenine from
ammonium cyanide. Adenine is derived from five molecules of
cyanide, denoted by shading.
Although the RNA world remains a hypothesis, with
many gaps yet to be explained, experimental evidence
supports a growing list of its key elements. Further ex-
perimentation should increase our understanding. Im-
portant clues to the puzzle will be found in the work-
ings of fundamental chemistry, in living cells, and
perhaps on other planets.
FIGURE 26–37 RNA-dependent synthesis of an RNA polymer from
oligonucleotide precursors. (a) The first step in the removal of the
self-splicing group I intron of the rRNA precursor of Tetrahymena is
reversible attack of a guanosine residue on the 5′ splice site. Only P1,
the region of the ribozyme that includes the internal guide sequence
(boxed) and the 5′ splice site, is shown in detail; the rest of the
ribozyme is represented as a green blob. The complete secondary
structure of the ribozyme is shown in Figure 26–26. (b) If P1 is removed
(shown as the darker green “hole”), the ribozyme retains both its
three-dimensional shape and its catalytic capacity. A new RNA molecule
added in vitro can bind to the ribozyme in the same manner as does
the internal guide sequence of P1 in (a). This provides a template for
further RNA polymerization reactions when oligonucleotides
complementary to the added RNA base-pair with it. The ribozyme can link
these oligonucleotides in a process equivalent to the reversal of the
reaction in (a). Although only one such reaction is shown in (b),
repeated binding and catalysis can result in the RNA-dependent
synthesis of long RNA polymers.
BOX 26–3 WORKING IN BIOCHEMISTRY
The SELEX Method for Generating RNA Polymers
with New Functions
SELEX (systematic evolution of ligands by exponen-
tial enrichment) is used to generate aptamers,
oligonucleotides selected to tightly bind a specific mo-
lecular target. The process is generally automated to
allow rapid identification of one or more aptamers with
the desired binding specificity.
Figure 1 illustrates how SELEX is used to select
an RNA species that binds tightly to ATP. In step 1,
a random mixture of RNA polymers is subjected to
“unnatural selection” by passing it through a resin to
which ATP is attached. The practical limit for the
complexity of an RNA mixture in SELEX is about 10¹⁵
different sequences, which allows for the complete
randomization of 25 nucleotides (4²⁵ ≈ 10¹⁵). When
longer RNAs are used, the RNA pool used to initiate
the search does not include all possible sequences.
In step 2, RNA polymers that pass through the column are
discarded; in step 3, those that bind to ATP are washed from
the column with salt solution and collected. In step 4, the
collected RNA polymers are amplified by reverse
transcriptase to make many DNA complements to the
selected RNAs; then an RNA polymerase makes many
RNA complements of the resulting DNA molecules.
In step 5, this new pool of RNA is subjected to the same
selection procedure, and the cycle is repeated a dozen
or more times. At the end, only a few aptamers, in this
case RNA sequences with considerable affinity for
ATP, remain.
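The arithmetic behind the 25-nucleotide limit can be checked with a
short Python sketch. It assumes, as an idealization not stated in the
box, that every molecule in the starting pool carries a distinct
sequence; the function and constant names are illustrative only.

```python
# Sketch: compare the number of possible RNA sequences of a given length
# with the practical SELEX pool limit of about 10**15 molecules.
# Assumption (idealized): every molecule in the pool is a distinct sequence.

POOL_LIMIT = 10**15  # practical limit quoted in the text


def sequence_space(length, alphabet_size=4):
    """Number of distinct sequences of `length` nucleotides (A, C, G, U)."""
    return alphabet_size ** length


def max_pool_coverage(length, pool_size=POOL_LIMIT):
    """Upper bound on the fraction of sequence space one pool can sample."""
    return min(1.0, pool_size / sequence_space(length))


if __name__ == "__main__":
    for n in (20, 25, 36):
        print(n, sequence_space(n), max_pool_coverage(n))
```

For 25 nucleotides the sequence space is 4²⁵ = 2⁵⁰, roughly 1.1 × 10¹⁵,
which is of the same order as the pool limit. For a 36-nucleotide RNA
the space is about 4.7 × 10²¹, so a single pool can sample only a very
small fraction of it.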
Critical sequence features of an RNA aptamer that
binds ATP are shown in Figure 2; molecules with this
general structure bind ATP (and other adenosine nucleotides)
with Kd < 50 μM. Figure 3 presents the
three-dimensional structure of a 36 nucleotide RNA
aptamer (shown as a complex with AMP) generated
by SELEX. This RNA has the backbone structure
shown in Figure 2.
In addition to its use in exploring the potential
functionality of RNA, SELEX has an important
practical side in identifying short RNAs with
pharmaceutical uses. Finding an aptamer that binds
specifically to every potential therapeutic target may
be impossible, but the capacity of SELEX to rapidly
select and amplify a specific oligonucleotide sequence
from a highly complex pool of sequences makes this
a promising approach for the generation of new ther-
apies. For example, one could select an RNA that
binds tightly to a receptor protein prominent in the
plasma membrane of cells in a particular cancerous
tumor. Blocking the activity of the receptor, or tar-
geting a toxin to the tumor cells by attaching it to the
aptamer, would kill the cells. SELEX also has been
used to select DNA aptamers that detect anthrax
spores. Many other promising applications are under
development. ■
FIGURE 1 The SELEX procedure.
FIGURE 2 RNA aptamer that binds ATP. The shaded nucleotides are
those required for the binding activity.
FIGURE 3 (Derived from PDB ID 1RAW.) RNA aptamer bound to
AMP. The bases of the conserved nucleotides (forming the binding
pocket) are white; the bound AMP is red.
SUMMARY 26.3 RNA-Dependent Synthesis
of RNA and DNA
■ RNA-dependent DNA polymerases, also called
reverse transcriptases, were first discovered in
retroviruses, which must convert their RNA
genomes into double-stranded DNA as part of
their life cycle. These enzymes transcribe the
viral RNA into DNA, a process that can be used
experimentally to form complementary DNA.
■ Many eukaryotic transposons are related to
retroviruses, and their mechanism of
transposition includes an RNA intermediate.
■ Telomerase, the enzyme that synthesizes the
telomere ends of linear chromosomes, is a
specialized reverse transcriptase that contains
an internal RNA template.
■ RNA-dependent RNA polymerases, such as the
replicases of RNA bacteriophages, are
template-specific for the viral RNA.
■ The existence of catalytic RNAs and pathways
for the interconversion of RNA and DNA has
led to speculation that an important stage in
evolution was the appearance of an RNA
(or an equivalent polymer) that could catalyze
its own replication. The biochemical potential
of RNAs can be explored by SELEX, a method
for rapidly selecting RNA sequences with
particular binding or catalytic properties.
Key Terms
transcription 995
messenger RNA (mRNA) 995
transfer RNA (tRNA) 995
ribosomal RNA (rRNA) 995
DNA-dependent RNA
polymerase 996
promoter 998
consensus sequence 998
cAMP receptor protein (CRP) 1001
repressor 1001
footprinting 1002
transcription factors 1003
ribozymes 1007
primary transcript 1007
RNA splicing 1007
5′ cap 1008
spliceosome 1010
poly(A) tail 1011
reverse transcriptase 1021
retrovirus 1021
complementary DNA (cDNA) 1022
homing 1024
telomerase 1025
RNA-dependent RNA polymerase
(RNA replicase) 1027
aptamer 1030
Terms in bold are defined in the glossary.
Problems
1. RNA Polymerase (a) How long would it take for the
E. coli RNA polymerase to synthesize the primary transcript
for the E. coli genes encoding the enzymes for lactose
metabolism (the 5,300 bp lac operon, considered in Chapter 28)?
(b) How far along the DNA would the transcription “bubble”
formed by RNA polymerase move in 10 seconds?
2. Error Correction by RNA Polymerases DNA polymerases
are capable of editing and error correction, whereas
the capacity for error correction in RNA polymerases appears
to be quite limited. Given that a single base error in either
replication or transcription can lead to an error in protein
synthesis, suggest a possible biological explanation for this
striking difference.
3. RNA Posttranscriptional Processing Predict the
likely effects of a mutation in the sequence (5′)AAUAAA in
a eukaryotic mRNA transcript.
4. Coding versus Template Strands The RNA genome
of phage Qβ is the nontemplate or coding strand, and when
introduced into the cell it functions as an mRNA. Suppose
the RNA replicase of phage Qβ synthesized primarily
template-strand RNA and uniquely incorporated this, rather
than nontemplate strands, into the viral particles. What would
be the fate of the template strands when they entered a new
cell? What enzyme would such a template-strand virus need
to include in the viral particles for successful invasion of a
host cell?
5. The Chemistry of Nucleic Acid Biosynthesis De-
scribe three properties common to the reactions catalyzed by
DNA polymerase, RNA polymerase, reverse transcriptase,
and RNA replicase. How is the enzyme polynucleotide phosphorylase
similar to and different from these four enzymes?
6. RNA Splicing What is the minimum number of trans-
esterification reactions needed to splice an intron from an
mRNA transcript? Explain.
7. RNA Genomes The RNA viruses have relatively small
genomes. For example, the single-stranded RNAs of retroviruses
have about 10,000 nucleotides and the Qβ RNA is only
4,220 nucleotides long. Given the properties of reverse tran-
scriptase and RNA replicase described in this chapter, can
you suggest a reason for the small size of these viral genomes?
8. Screening RNAs by SELEX The practical limit for
the number of different RNA sequences that can be screened
in a SELEX experiment is 10¹⁵. (a) Suppose you are working
with oligonucleotides 32 nucleotides in length. How many
sequences exist in a randomized pool containing every se-
quence possible? (b) What percentage of these can be
screened in a SELEX experiment? (c) Suppose you wish to
select an RNA molecule that catalyzes the hydrolysis of a par-
ticular ester. From what you know about catalysis (Chapter
6), propose a SELEX strategy that might allow you to select
the appropriate catalyst.
9. Slow Death The death cap mushroom, Amanita phalloides,
contains several dangerous substances, including the
lethal α-amanitin. This toxin blocks RNA elongation in
consumers of the mushroom by binding to eukaryotic RNA
polymerase II with very high affinity; it is deadly in
concentrations as low as 10⁻⁸ M. The initial reaction to ingestion
of the mushroom is gastrointestinal distress (caused by some of the
other toxins). These symptoms disappear, but about 48 hours
later, the mushroom-eater dies, usually from liver dysfunction.
Speculate on why it takes this long for α-amanitin to kill.
10. Detection of Rifampicin-Resistant Strains of Tu-
berculosis Rifampicin is an important antibiotic used to
treat tuberculosis, as well as other mycobacterial diseases.
Some strains of Mycobacterium tuberculosis, the causative
agent of tuberculosis, are resistant to rifampicin. These
strains become resistant through mutations that alter the
rpoB gene, which encodes the β subunit of the RNA polymerase.
Rifampicin cannot bind to the mutant RNA
polymerase and so is unable to block the initiation of tran-
scription. DNA sequences from a large number of rifampicin-
resistant M. tuberculosis strains have been found to have
mutations in a specific 69 bp region of rpoB. One well-
characterized strain with rifampicin resistance has a single
base pair alteration in rpoB that results in a single amino acid
substitution in the β subunit: a His residue is replaced by an
Asp residue.
(a) Based on your knowledge of protein chemistry
(Chapters 3 and 4), suggest a technique that would allow de-
tection of the rifampicin-resistant strain containing this par-
ticular mutant protein.
(b) Based on your knowledge of nucleic acid chemistry
(Chapter 8), suggest a technique to identify the mutant form
of rpoB.
Biochemistry on the Internet
11. The Ribonuclease Gene Human pancreatic ribonu-
clease has 128 amino acid residues.
(a) What is the minimum number of nucleotide pairs re-
quired to code for this protein?
(b) The mRNA expressed in human pancreatic cells was
copied with reverse transcriptase to create a “library” of hu-
man DNA. The sequence of the mRNA coding for human pan-
creatic ribonuclease was determined by sequencing the com-
plementary DNA (cDNA) from this library that included an
open reading frame for the protein. Use the Entrez database
system (www.ncbi.nlm.nih.gov/Entrez) to find the published
sequence of this mRNA (search the nucleotide database for
accession number D26129). What is the length of this mRNA?
(c) How can you account for the discrepancy between
the size you calculated in (a) and the actual length of the
mRNA?
chapter 27
PROTEIN METABOLISM
27.1 The Genetic Code
27.2 Protein Synthesis
27.3 Protein Targeting and Degradation
Obviously, Harry [Noller]’s finding doesn’t speak to how
life started, and it doesn’t explain what came before
RNA. But as part of the continually growing body of
circumstantial evidence that there was a life form before
us on this planet, from which we emerged—boy, it’s
very strong!
—Gerald Joyce, quoted in commentary in Science, 1992
Proteins are the end products of most information
pathways. A typical cell requires thousands of different
proteins at any given moment. These must be
synthesized in response to the cell’s current needs,
transported (targeted) to their appropriate cellular
locations, and degraded when no longer needed.
An understanding of protein synthesis, the most
complex biosynthetic process, has been one of the
greatest challenges in biochemistry. Eukaryotic protein
synthesis involves more than 70 different ribosomal
proteins; 20 or more enzymes to activate the amino acid
precursors; a dozen or more auxiliary enzymes and other
protein factors for the initiation, elongation, and
termination of polypeptides; perhaps 100 additional enzymes
for the final processing of different proteins; and 40 or
more kinds of transfer and ribosomal RNAs. Overall,
almost 300 different macromolecules cooperate to
synthesize polypeptides. Many of these macromolecules are
organized into the complex three-dimensional structure
of the ribosome.
To appreciate the central importance of protein
synthesis, consider the cellular resources devoted to this
process. Protein synthesis can account for up to 90% of
the chemical energy used by a cell for all biosynthetic
reactions. Every prokaryotic and eukaryotic cell
contains from several to thousands of copies of many
different proteins and RNAs. The 15,000 ribosomes,
100,000 molecules of protein synthesis–related protein
factors and enzymes, and 200,000 tRNA molecules in a
typical bacterial cell can account for more than 35% of
the cell’s dry weight.
Despite the great complexity of protein synthesis,
proteins are made at exceedingly high rates. A
polypeptide of 100 residues is synthesized in an Escherichia
coli cell (at 37 °C) in about 5 seconds. Synthesis of the
thousands of different proteins in a cell is tightly
regulated, so that just enough copies are made to match the
current metabolic circumstances. To maintain the
appropriate mix and concentration of proteins, the
targeting and degradative processes must keep pace with
synthesis. Research is gradually uncovering the finely
coordinated cellular choreography that guides each
protein to its proper cellular location and selectively
degrades it when it is no longer required.
The study of protein synthesis offers another
important reward: a look at a world of RNA catalysts that
may have existed before the dawn of life “as we know
it.” Researchers have elucidated the structure of
bacterial ribosomes, revealing the workings of cellular
protein synthesis in beautiful molecular detail. And what
did they find? Proteins are synthesized by a gigantic
RNA enzyme!
27.1 The Genetic Code
Three major advances set the stage for our present
knowledge of protein biosynthesis. First, in the early
1950s, Paul Zamecnik and his colleagues designed a set
of experiments to investigate where in the cell proteins
are synthesized. They injected radioactive amino acids
into rats and, at different time intervals after the
injection, removed the liver, homogenized it, fractionated the
homogenate by centrifuga-
tion, and examined the sub-
cellular fractions for the pres-
ence of radioactive protein.
When hours or days were al-
lowed to elapse after injection
of the labeled amino acids, all
the subcellular fractions con-
tained labeled proteins. How-
ever, when only minutes had
elapsed, labeled protein ap-
peared only in a fraction containing small ribonucleo-
protein particles. These particles, visible in animal tis-
sues by electron microscopy, were therefore identified
as the site of protein synthesis from amino acids, and
later were named ribosomes (Fig. 27–1).
The second key advance was made by Mahlon
Hoagland and Zamecnik, when they found that amino
acids were “activated” when incubated with ATP and
the cytosolic fraction of liver cells. The amino acids
became attached to a heat-stable soluble RNA of the
type that had been discovered and characterized by
Robert Holley and later called transfer RNA (tRNA), to
form aminoacyl-tRNAs. The enzymes that catalyze
this process are the aminoacyl-tRNA synthetases.
The third advance resulted from Francis Crick’s rea-
soning on how the genetic information encoded in the 4-
letter language of nucleic acids could be translated into
the 20-letter language of proteins. A small nucleic acid
(perhaps RNA) could serve the role of an adaptor, one
part of the adaptor molecule binding a specific amino acid
and another part recognizing the nucleotide sequence
encoding that amino acid in an mRNA (Fig. 27–2). This
idea was soon verified. The tRNA adaptor “translates”
the nucleotide sequence of an mRNA into the amino
acid sequence of a polypeptide. The overall process of
mRNA-guided protein synthesis is often referred to sim-
ply as translation.
These three developments soon led to recognition
of the major stages of protein synthesis and ultimately
to the elucidation of the genetic code that specifies each
amino acid.
The Genetic Code Was Cracked Using
Artificial mRNA Templates
By the 1960s it had long been apparent that at least
three nucleotide residues of DNA are necessary to
encode each amino acid. The four code letters of DNA (A,
T, G, and C) in groups of two can yield only 4² = 16
different combinations, insufficient to encode 20 amino
acids. Groups of three, however, yield 4³ = 64 different
combinations.
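The counting argument above can be reproduced with a brief Python
sketch. The function names are illustrative, and the sketch assumes
only what the text states: four code letters and 20 amino acids.

```python
# Sketch: enumerate code words over the four DNA letters and find the
# shortest word length that gives at least 20 distinct words.
from itertools import product

BASES = "ATGC"


def code_words(word_length, alphabet=BASES):
    """All possible words of the given length over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=word_length)]


def min_word_length(n_amino_acids=20, alphabet_size=4):
    """Smallest word length whose word count covers all amino acids."""
    length = 1
    while alphabet_size ** length < n_amino_acids:
        length += 1
    return length


if __name__ == "__main__":
    print(len(code_words(2)), len(code_words(3)), min_word_length())
```

Doublets give 16 words and triplets give 64, so three is the smallest
word length that can specify 20 amino acids.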
FIGURE 27–1 Ribosomes and endoplasmic reticulum. Electron
micrograph and schematic drawing of a portion of a pancreatic cell,
showing ribosomes attached to the outer (cytosolic) face of the
endoplasmic reticulum (ER). The ribosomes are the numerous small dots
bordering the parallel layers of membranes.
FIGURE 27–2 Crick’s adaptor hypothesis. Today we know that the
amino acid is covalently bound at the 3′ end of a tRNA molecule and
that a specific nucleotide triplet elsewhere in the tRNA interacts with
a particular triplet codon in mRNA through hydrogen bonding of
complementary bases.
Several key properties of the genetic code were
established in early genetic studies (Figs 27–3, 27–4). A
codon is a triplet of nucleotides that codes for a
specific amino acid. Translation occurs in such a way that
these nucleotide triplets are read in a successive,
nonoverlapping fashion. A specific first codon in the
sequence establishes the reading frame, in which a
new codon begins every three nucleotide residues.
There is no punctuation between codons for successive
amino acid residues. The amino acid sequence of a pro-
tein is defined by a linear sequence of contiguous
triplets. In principle, any given single-stranded DNA or
mRNA sequence has three possible reading frames.
Each reading frame gives a different sequence of codons
(Fig. 27–5), but only one is likely to encode a given pro-
tein. A key question remained: what were the three-
letter code words for each amino acid?
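The idea of three possible reading frames can be illustrated with a
minimal Python sketch. The sequence used here is a short illustrative
mRNA fragment, and the function name is hypothetical; incomplete
triplets at the 3′ end are simply dropped.

```python
# Sketch: split a single-stranded sequence into successive,
# nonoverlapping triplets for each of the three possible reading frames.


def reading_frames(sequence):
    """Return three codon lists, one per frame offset (0, 1, 2)."""
    frames = []
    for offset in range(3):
        codons = [
            sequence[i:i + 3]
            for i in range(offset, len(sequence) - 2, 3)
        ]
        frames.append(codons)
    return frames


if __name__ == "__main__":
    for offset, codons in enumerate(reading_frames("AUACGAGUC")):
        print(offset, codons)
```

Each offset yields a different series of codons from the same
nucleotides, which is why the first codon read fixes the frame for the
rest of the message.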
In 1961 Marshall Nirenberg and Heinrich Matthaei re-
ported the first breakthrough. They incubated synthetic
polyuridylate, poly(U), with an E. coli extract, GTP, ATP,
and a mixture of the 20 amino acids in 20 different tubes,
each tube containing a different radioactively labeled
amino acid. Because poly(U) mRNA is made up of many
successive UUU triplets, it should promote the synthesis
of a polypeptide containing only the amino acid encoded
by the triplet UUU. A radioac-
tive polypeptide was indeed
formed in only one of the 20
tubes, the one containing ra-
dioactive phenylalanine. Niren-
berg and Matthaei therefore
concluded that the triplet
codon UUU encodes phenyl-
alanine. The same approach re-
vealed that polycytidylate,
poly(C), encodes a polypep-
tide containing only proline
(polyproline), and polyadeny-
late, poly(A), encodes polylysine. Polyguanylate did not
generate any polypeptide in this experiment because it
spontaneously forms tetraplexes (see Fig. 8–22) that can-
not be bound by ribosomes.
The synthetic polynucleotides used in such exper-
iments were prepared with polynucleotide phosphory-
lase (p. 1020), which catalyzes the formation of RNA
polymers starting from ADP, UDP, CDP, and GDP. This
enzyme requires no template and makes polymers with
a base composition that directly reflects the relative
concentrations of the nucleoside 5′-diphosphate precursors
in the medium. If polynucleotide phosphorylase
is presented with UDP only, it makes only poly(U). If it
is presented with a mixture of five parts ADP and one
part CDP, it makes a polymer in which about five-sixths
of the residues are adenylate and one-sixth are cytidy-
late. This random polymer is likely to have many triplets
of the sequence AAA, smaller numbers of AAC, ACA,
and CAA triplets, relatively few ACC, CCA, and CAC
triplets, and very few CCC triplets (Table 27–1). Using
a variety of artificial mRNAs made by polynucleotide
phosphorylase from different starting mixtures of ADP,
GDP, UDP, and CDP, investigators soon identified the
base compositions of the triplets coding for almost all
the amino acids. Although these experiments revealed
the base composition of the coding triplets, they could
not reveal the sequence of the bases.
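The expected triplet frequencies in such a random polymer follow from simple probability: if the bases are incorporated independently, the chance of a given triplet is the product of the fractions of its three bases. The sketch below works this out for the 5:1 mixture of ADP and CDP described above. The function name `triplet_frequencies` is a hypothetical helper, and independent incorporation of each residue is an assumption of the calculation.

```python
from itertools import product


def triplet_frequencies(base_fractions):
    """Probability of each triplet in a random polymer with the given base fractions."""
    freqs = {}
    for triplet in product(base_fractions, repeat=3):
        p = 1.0
        for base in triplet:
            p *= base_fractions[base]  # bases are assumed to be incorporated independently
        freqs["".join(triplet)] = p
    return freqs


# Five parts ADP to one part CDP, as in the example in the text.
freqs = triplet_frequencies({"A": 5 / 6, "C": 1 / 6})

# Express every frequency relative to AAA = 100, the convention used in Table 27-1.
relative = {t: 100 * p / freqs["AAA"] for t, p in freqs.items()}
for t in sorted(relative, key=relative.get, reverse=True):
    print(t, round(relative[t], 1))
```

On this scale each triplet with two A's and one C comes out at 20, each with one A and two C's at 4, and CCC at 0.8, the values quoted in the note to Table 27–1.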
Nonoverlapping code    A U A   C G A   G U C
                       codon 1 = AUA, codon 2 = CGA, codon 3 = GUC
Overlapping code       A U A C G A G U C
                       codon 1 = AUA, codon 2 = UAC, codon 3 = ACG

FIGURE 27–3 Overlapping versus nonoverlapping genetic codes. In
a nonoverlapping code, codons (numbered consecutively) do not
share nucleotides. In an overlapping code, some nucleotides in the
mRNA are shared by different codons. In a triplet code with maximum
overlap, many nucleotides, such as the third nucleotide from the
left (A), are shared by three codons. Note that in an overlapping code,
the triplet sequence of the first codon limits the possible sequences
for the second codon. A nonoverlapping code provides much more
flexibility in the triplet sequence of neighboring codons and therefore
in the possible amino acid sequences designated by the code. The
genetic code used in all living systems is now known to be
nonoverlapping.

mRNA                           5′ GUA GCC UAC GGA U 3′
Insertion (+)                     GUA GCC UCA CGG AU
Deletion (−)                      GUA CCU ACG GAU
Insertion and deletion (+)(−)     GUA AGC CAC GGA U    (reading frame restored)

FIGURE 27–4 The triplet, nonoverlapping code. Evidence for
the general nature of the genetic code came from many types
of experiments, including genetic experiments on the effects
of deletion and insertion mutations. Inserting or deleting one
base pair (shown here in the mRNA transcript) alters the
sequence of triplets in a nonoverlapping code; all amino acids
coded by the mRNA following the change are affected.
Combining insertion and deletion mutations affects some
amino acids but can eventually restore the correct amino acid
sequence. Adding or subtracting three nucleotides (not shown)
leaves the remaining triplets intact, providing evidence that a
codon has three, rather than four or five, nucleotides. The
triplet codons shaded in gray are those transcribed from the
original gene; codons shaded in blue are new codons
resulting from the insertion or deletion mutations.
In 1964 Nirenberg and Philip Leder achieved an-
other experimental breakthrough. Isolated E. coli ribo-
somes would bind a specific aminoacyl-tRNA in the
presence of the corresponding synthetic polynucleotide
messenger. (By convention, the identity of a tRNA is
indicated by a superscript, such as tRNA^Ala, and the
aminoacylated tRNA by a hyphenated name:
alanyl-tRNA^Ala or Ala-tRNA^Ala.) For example, ribosomes
incubated with poly(U) and phenylalanyl-tRNA^Phe
(Phe-tRNA^Phe) bind both RNAs, but if the ribosomes are
incubated with poly(U) and some other aminoacyl-
tRNA, the aminoacyl-tRNA is not bound, because it does
not recognize the UUU triplets in poly(U) (Table 27–2).
Even trinucleotides could promote specific binding of
appropriate tRNAs, so these experiments could be car-
ried out with chemically synthesized small oligonu-
cleotides. With this technique researchers determined
which aminoacyl-tRNA bound to about 50 of the 64 pos-
sible triplet codons. For some codons, either no amino-
acyl-tRNA or more than one would bind. Another
method was needed to complete and confirm the entire
genetic code.
TABLE 27–1 Incorporation of Amino Acids into Polypeptides in Response to
Random Polymers of RNA

              Observed          Tentative assignment      Expected frequency
              frequency of      for nucleotide            of incorporation
              incorporation     composition* of           based on assignment
Amino acid    (Lys = 100)       corresponding codon       (Lys = 100)
Asparagine     24               A₂C                        20
Glutamine      24               A₂C                        20
Histidine       6               AC₂                         4
Lysine        100               AAA                       100
Proline         7               AC₂, CCC                    4.8
Threonine      26               A₂C, AC₂                   24

Note: Presented here is a summary of data from one of the early experiments designed to elucidate the genetic code. A synthetic RNA containing
only A and C residues in a 5:1 ratio directed polypeptide synthesis, and both the identity and the quantity of incorporated amino acids were
determined. Based on the relative abundance of A and C residues in the synthetic RNA, and assigning the codon AAA (the most likely codon) a
frequency of 100, there should be three different codons of composition A₂C, each at a relative frequency of 20; three of composition AC₂,
each at a relative frequency of 4.0; and CCC at a relative frequency of 0.8. The CCC assignment was based on information derived from prior
studies with poly(C). Where two tentative codon assignments are made, both are proposed to code for the same amino acid.
*These designations of nucleotide composition contain no information on nucleotide sequence (except, of course, AAA and CCC).
Reading frame 1   5′  UUC UCG GAC CUG GAG AUU CAC AGU  3′
Reading frame 2       U UCU CGG ACC UGG AGA UUC ACA GU
Reading frame 3       UU CUC GGA CCU GGA GAU UCA CAG U

FIGURE 27–5 Reading frames in the genetic code. In a triplet, nonoverlapping code, all mRNAs have
three potential reading frames, shaded here in different colors. The triplets, and hence the amino acids
specified, are different in each reading frame.
TABLE 27–2 Trinucleotides That Induce Specific
Binding of Aminoacyl-tRNAs to Ribosomes

                 Relative increase in ¹⁴C-labeled
                 aminoacyl-tRNA bound to ribosome*
Trinucleotide    Phe-tRNA^Phe    Lys-tRNA^Lys    Pro-tRNA^Pro
UUU              4.6             0               0
AAA              0               7.7             0
CCC              0               0               3.1

Source: Modified from Nirenberg, M. & Leder, P. (1964) RNA code words and protein
synthesis. Science 145, 1399.
*Each number represents the factor by which the amount of bound ¹⁴C increased
when the indicated trinucleotide was present, relative to a control with no trinucleotide.
At about this time, a com-
plementary approach was pro-
vided by H. Gobind Khorana,
who developed chemical
methods to synthesize poly-
ribonucleotides with defined,
repeating sequences of two to
four bases. The polypeptides
produced by these mRNAs
had one or a few amino acids
in repeating patterns. These
patterns, when combined with
information from the random
polymers used by Nirenberg and colleagues, permitted
unambiguous codon assignments. The copolymer
(AC)n, for example, has alternating ACA and CAC
codons: ACACACACACACACA. The polypeptide
synthesized on this messenger contained equal amounts of
threonine and histidine. Given that a histidine codon has
one A and two Cs (Table 27–1), CAC must code for his-
tidine and ACA for threonine.
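The logic of the repeating-copolymer experiments can be checked by listing the codons that a repeating sequence presents in each reading frame. The following is a minimal sketch; `codons_in_repeat` is a hypothetical helper name, and the polymer length is an arbitrary choice made only so that every frame contains enough complete triplets.

```python
def codons_in_repeat(unit, n_codons=12):
    """Distinct codons read from a repeating polymer of `unit` in each of the three frames."""
    # Build a polymer comfortably longer than the stretch that will be read.
    polymer = unit * (3 * n_codons + 3)
    result = []
    for offset in range(3):
        codons = [polymer[i:i + 3] for i in range(offset, offset + 3 * n_codons, 3)]
        result.append(sorted(set(codons)))
    return result


print(codons_in_repeat("AC"))  # the (AC)n copolymer discussed in the text
print(codons_in_repeat("U"))   # poly(U), the homopolymer used by Nirenberg and Matthaei
```

For (AC)n every frame contains only ACA and CAC, so the polypeptide product is the same wherever translation begins. That is why the amino acid composition of the product, combined with the base-composition data of Table 27–1, is enough to assign the two codons.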
Consolidation of the results from many experiments
permitted the assignment of 61 of the 64 possible
codons. The other three were identified as termination
codons, in part because they disrupted amino acid
coding patterns when they occurred in a synthetic
RNA polymer (Fig. 27–6). Meanings for all the triplet
codons (tabulated in Fig. 27–7) were established by
1966 and have been verified in many different ways. The
cracking of the genetic code is regarded as one of the
most important scientific discoveries of the twentieth
century.
Codons are the key to the translation of genetic in-
formation, directing the synthesis of specific proteins.
The reading frame is set when translation of an mRNA
molecule begins, and it is maintained as the synthetic
machinery reads sequentially from one triplet to the
next. If the initial reading frame is off by one or two
bases, or if translation somehow skips a nucleotide in
the mRNA, all the subsequent codons will be out of reg-
ister; the result is usually a “missense” protein with a
garbled amino acid sequence. There are a few unusual
but interesting exceptions to this rule (Box 27–1).
Several codons serve special functions (Fig. 27–7).
The initiation codon AUG is the most common signal
for the beginning of a polypeptide in all cells (some rare
alternatives are discussed in Box 27–2), in addition to
coding for Met residues in internal positions of polypep-
tides. The termination codons (UAA, UAG, and UGA),
also called stop codons or nonsense codons, normally
signal the end of polypeptide synthesis and do not code
for any known amino acids.
As described in Section 27.2, initiation of protein
synthesis in the cell is an elaborate process that relies
on initiation codons and other signals in the mRNA. In
retrospect, the experiments of Nirenberg and Khorana
to identify codon function should not have worked in the
absence of initiation codons. Serendipitously, experi-
mental conditions caused the normal initiation require-
Reading frame 1   5′  GUA AGU AAG UAA GUA AGU AA  3′
Reading frame 2       G UAA GUA AGU AAG UAA GUA A
Reading frame 3       GU AAG UAA GUA AGU AAG UAA

FIGURE 27–6 Effect of a termination codon in a repeating tetranucleotide. Termination codons
(pink) are encountered every fourth codon in three different reading frames (shown in different colors).
Dipeptides or tripeptides are synthesized, depending on where the ribosome initially binds.
First letter                Second letter of codon
of codon
(5′ end)       U            C            A            G

U           UUU Phe      UCU Ser      UAU Tyr      UGU Cys
            UUC Phe      UCC Ser      UAC Tyr      UGC Cys
            UUA Leu      UCA Ser      UAA Stop     UGA Stop
            UUG Leu      UCG Ser      UAG Stop     UGG Trp

C           CUU Leu      CCU Pro      CAU His      CGU Arg
            CUC Leu      CCC Pro      CAC His      CGC Arg
            CUA Leu      CCA Pro      CAA Gln      CGA Arg
            CUG Leu      CCG Pro      CAG Gln      CGG Arg

A           AUU Ile      ACU Thr      AAU Asn      AGU Ser
            AUC Ile      ACC Thr      AAC Asn      AGC Ser
            AUA Ile      ACA Thr      AAA Lys      AGA Arg
            AUG Met      ACG Thr      AAG Lys      AGG Arg

G           GUU Val      GCU Ala      GAU Asp      GGU Gly
            GUC Val      GCC Ala      GAC Asp      GGC Gly
            GUA Val      GCA Ala      GAA Glu      GGA Gly
            GUG Val      GCG Ala      GAG Glu      GGG Gly

FIGURE 27–7 “Dictionary” of amino acid code words in mRNAs.
The codons are written in the 5′→3′ direction. The third base of each
codon (in bold type) plays a lesser role in specifying an amino acid
than the first two. The three termination codons are shaded in pink,
the initiation codon AUG in green. All the amino acids except
methionine and tryptophan have more than one codon. In most cases,
codons that specify the same amino acid differ only at the third base.
ments for protein synthesis to be relaxed. Diligence
combined with chance to produce a breakthrough—a
common occurrence in the history of biochemistry.
In a random sequence of nucleotides, 1 in every 20
codons in each reading frame is, on average, a termina-
tion codon. In general, a reading frame without a ter-
mination codon among 50 or more codons is referred to
as an open reading frame (ORF). Long open reading
frames usually correspond to genes that encode pro-
teins. In the analysis of sequence databases, sophisti-
cated programs are used to search for open reading
frames in order to find genes among the often huge
background of nongenic DNA. An uninterrupted gene
coding for a typical protein with a molecular weight of
60,000 would require an open reading frame with 500
or more codons.
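A rough sketch of how a program might scan for open reading frames follows. It is not the method used by any particular database tool: it only looks for stretches free of termination codons on one strand, ignores the requirement for an initiation codon, and does not examine the complementary strand. The names `stop_codon_fraction` and `open_reading_frames` are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_codon_fraction():
    """Fraction of the 64 possible triplets that are termination codons."""
    return len(STOP_CODONS) / 64


def open_reading_frames(seq, min_codons=50):
    """Yield (frame, start_index, n_codons) for stop-free runs of at least min_codons."""
    for offset in range(3):
        run_start, run_len = offset, 0
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # A termination codon closes the current run.
                if run_len >= min_codons:
                    yield offset + 1, run_start, run_len
                run_start, run_len = i + 3, 0
            else:
                run_len += 1
        # Report a run that extends to the end of the sequence.
        if run_len >= min_codons:
            yield offset + 1, run_start, run_len


print(stop_codon_fraction())  # 3/64, or roughly 1 codon in 20
```

Because 3 of the 64 triplets are termination codons, a stop-free run of 50 or more codons is unlikely to arise by chance in random sequence, which is what makes long open reading frames a useful signature of protein-coding genes.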
A striking feature of the genetic code is that an
amino acid may be specified by more than one codon,
so the code is described as degenerate. This does not
suggest that the code is flawed: although an amino acid
may have two or more codons, each codon specifies only
one amino acid. The degeneracy of the code is not uni-
form. Whereas methionine and tryptophan have single
codons, for example, three amino acids (Leu, Ser, Arg)
have six codons, five amino acids have four, isoleucine
has three, and nine amino acids have two (Table 27–3).
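The degeneracy figures quoted above can be verified by counting codons in the standard code. The sketch below builds the codon table from a compact 64-letter string of one-letter amino acid codes, a common shorthand in which `*` marks a termination codon. The variable names are illustrative choices, not taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in the order UUU, UUC, UUA, UUG, UCU, ...; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons for each amino acid, ignoring the three termination codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# How many amino acids have 1, 2, 3, 4, or 6 codons.
degeneracy = Counter(codons_per_amino_acid.values())
print(sorted(degeneracy.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```

The tally reproduces the distribution in Table 27–3: two amino acids with a single codon, nine with two, one with three, five with four, and three with six, for a total of 61 sense codons.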
The genetic code is nearly universal. With the in-
triguing exception of a few minor variations in mito-
chondria, some bacteria, and some single-celled eu-
karyotes (Box 27–2), amino acid codons are identical in
all species examined so far. Human beings, E. coli, to-
bacco plants, amphibians, and viruses share the same
genetic code. Thus it would appear that all life forms
have a common evolutionary ancestor, whose genetic
code has been preserved throughout biological evolu-
tion. Even the variations (Box 27–2) reinforce this
theme.
Wobble Allows Some tRNAs to Recognize
More than One Codon
When several different codons specify one amino acid,
the difference between them usually lies at the third
base position (at the 3′ end). For example, alanine is
coded by the triplets GCU, GCC, GCA, and GCG. The
codons for most amino acids can be symbolized by
XY(A/G) or XY(U/C). The first two letters of each codon are the
primary determinants of specificity, a feature that has some
interesting consequences.
Transfer RNAs base-pair with mRNA codons at a
three-base sequence on the tRNA called the anticodon.
The first base of the codon in mRNA (read in the 5′→3′
direction) pairs with the third base of the anticodon
(Fig. 27–8a). If the anticodon triplet of a tRNA recog-
nized only one codon triplet through Watson-Crick base
pairing at all three positions, cells would have a differ-
ent tRNA for each amino acid codon. This is not the
case, however, because the anticodons in some tRNAs
include the nucleotide inosinate (designated I), which
contains the uncommon base hypoxanthine (see Fig.
8–5b). Inosinate can form hydrogen bonds with three
different nucleotides (U, C, and A; Fig. 27–8b), although
TABLE 27–3 Degeneracy of the Genetic Code

              Number                      Number
Amino acid    of codons    Amino acid     of codons
Met           1            Tyr            2
Trp           1            Ile            3
Asn           2            Ala            4
Asp           2            Gly            4
Cys           2            Pro            4
Gln           2            Thr            4
Glu           2            Val            4
His           2            Arg            6
Lys           2            Leu            6
Phe           2            Ser            6
(a) [Diagram: a tRNA cloverleaf aligned antiparallel with an mRNA; the paired
    triplets shown are CUA and GAU, with codon positions numbered 1 2 3 and
    anticodon positions numbered 3 2 1.]

(b)                  3 2 1              3 2 1              3 2 1
    Anticodon   (3′) G–C–I (5′)    (3′) G–C–I (5′)    (3′) G–C–I (5′)
    Codon       (5′) C–G–A (3′)    (5′) C–G–U (3′)    (5′) C–G–C (3′)
                     1 2 3              1 2 3              1 2 3

FIGURE 27–8 Pairing relationship of codon and anticodon. (a) Alignment
of the two RNAs is antiparallel. The tRNA is shown in the traditional
cloverleaf configuration. (b) Three different codon pairing relationships
are possible when the tRNA anticodon contains inosinate.
these pairings are much weaker than the hydrogen
bonds of Watson-Crick base pairs (G≡C and A=U). In
yeast, one tRNA^Arg has the anticodon (5′)ICG, which
recognizes three arginine codons: (5′)CGA, (5′)CGU,
and (5′)CGC. The first two bases are identical (CG) and
form strong Watson-Crick base pairs with the corre-
sponding bases of the anticodon, but the third base
(A, U, or C) forms rather weak hydrogen bonds with the
I residue at the first position of the anticodon.
Examination of these and other codon-anticodon
pairings led Crick to conclude that the third base of most
codons pairs rather loosely with the corresponding base
of its anticodon; to use his picturesque word, the third
base of such codons (and the first base of their corre-
BOX 27–1 WORKING IN BIOCHEMISTRY
Changing Horses in Midstream: Translational
Frameshifting and mRNA Editing
Once the reading frame has been set during protein
synthesis, codons are translated without overlap or
punctuation until the ribosomal complex encounters
a termination codon. The other two possible reading
frames usually contain no useful genetic information,
but a few genes are structured so that ribosomes
“hiccup” at a certain point in the translation of their
mRNAs, changing the reading frame from that point
on. This appears to be a mechanism either to allow
two or more related but distinct proteins to be pro-
duced from a single transcript or to regulate the syn-
thesis of a protein.
One of the best-documented examples occurs in
translation of the mRNA for the overlapping gag and
pol genes of the Rous sarcoma virus (see Fig. 26–31).
The reading frame for pol is offset to the left by one
base pair (−1 reading frame) relative to the reading
frame for gag (Fig. 1).
The product of the pol gene (reverse transcrip-
tase) is translated as a larger polyprotein, on the same
mRNA that is used for the gag protein alone (see Fig.
26–30). The polyprotein, or gag-pol protein, is then
trimmed to the mature reverse transcriptase by pro-
teolytic digestion. Production of the polyprotein re-
quires a translational frameshift in the overlap region
to allow the ribosome to bypass the UAG termination
codon at the end of the gag gene (shaded pink in
Fig. 1).
Frameshifts occur during about 5% of translations
of this mRNA, and the gag-pol polyprotein (and ulti-
mately reverse transcriptase) is synthesized at about
one-twentieth the frequency of the gag protein, a level
that suffices for efficient reproduction of the virus. In
some retroviruses, another translational frameshift al-
lows translation of an even larger polyprotein that in-
cludes the product of the env gene fused to the gag
and pol gene products (see Fig. 26–30). A similar
mechanism produces both the τ and γ subunits of
E. coli DNA polymerase III from a single dnaX gene
transcript (see Table 25–2).
This mechanism also occurs in the gene for E. coli
release factor 2 (RF-2), discussed in Section 27.2,
which is required for termination of protein synthesis
at the termination codons UAA and UGA. The twenty-
sixth codon in the transcript of the gene for RF-2 is
UGA, which would normally halt protein synthesis.
The remainder of the gene is in the +1 reading frame
(offset one base pair to the right) relative to this UGA
codon. Translation pauses at this codon, but termina-
tion does not occur unless RF-2 is bound to the codon
(the lower the level of RF-2, the less likely the bind-
ing). The absence of bound RF-2 prevents the termi-
nation of protein synthesis at UGA and allows time for
a frameshift to occur. The UGA plus the C that follows
it (UGAC) is therefore read as GAC, which translates
to Asp. Translation then proceeds in the new reading
frame to complete synthesis of RF-2. In this way, RF-2
regulates its own synthesis in a feedback loop.
Some mRNAs are edited before translation. The
initial transcripts of the genes that encode cytochrome
oxidase subunit II in some protist mitochondria do not
correspond precisely to the sequence needed at the
gag reading frame
5′  CUA GGG CUC CGC UUG ACA AAU UUA UAG GGA GGG CCA  3′
    Leu Gly Leu Arg Leu Thr Asn Leu Stop

pol reading frame
5′  CUAGGGCUCCGCUUGACAAAUUU AUA GGG AGG GCC A  3′
                            Ile Gly Arg Ala

FIGURE 1 The gag-pol overlap region in Rous sarcoma virus RNA.
sponding anticodons) “wobbles.” Crick proposed a set
of four relationships called the wobble hypothesis:
1. The first two bases of an mRNA codon always
form strong Watson-Crick base pairs with the
corresponding bases of the tRNA anticodon and
confer most of the coding specificity.
2. The first base of the anticodon (reading in the
5′→3′ direction; this pairs with the third base of
the codon) determines the number of codons
recognized by the tRNA. When the first base of
the anticodon is C or A, base pairing is specific
and only one codon is recognized by that tRNA.
When the first base is U or G, binding is less
carboxyl terminus of the protein product. A posttran-
scriptional editing process inserts four U residues that
shift the translational reading frame of the transcript.
Figure 2a shows the added U residues in the small part
of the transcript that is affected by editing. Neither
the function nor the mechanism of this editing process
is understood. Investigators have detected a special
class of RNA molecules encoded by these mitochon-
dria, with sequences complementary to the edited
mRNAs. These so-called guide RNAs (Fig. 2b) appear
to act as templates for the editing process. Note that
the base pairing involves a number of G=U base pairs
(blue dots), which are common in RNA molecules.
A distinct form of RNA editing occurs in the gene
for the apolipoprotein B component of low-density
lipoprotein in vertebrates. One form of apolipoprotein B,
apoB-100 (Mᵣ 513,000), is synthesized in the
liver; a second form, apoB-48 (Mᵣ 250,000), is synthesized
in the intestine. Both are encoded by an
mRNA produced from the gene for apoB-100. A cy-
tosine deaminase enzyme found only in the intestine
binds to the mRNA at the codon for amino acid
residue 2,153 (CAA = Gln) and converts the C to a
U, to introduce the termination codon UAA. The
apoB-48 produced in the intestine from this modified
mRNA is simply an abbreviated form (corresponding
to the amino-terminal half) of apoB-100 (Fig. 3). This
reaction permits tissue-specific synthesis of two dif-
ferent proteins from one gene.
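The effect of this single-base edit can be traced with a short sketch. The sequence below is the stretch of apoB-100 mRNA covering residues 2,146 to 2,157 as it can be read from Fig. 3 of this box; treating it as exact is an assumption. The helper names `translate` and `deaminate_c_to_u` are hypothetical, and the codon table is built from the standard 64-letter shorthand in which `*` marks a termination codon.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate from the first base, stopping at the first termination codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def deaminate_c_to_u(mrna, codon_index):
    """Convert the first base of the given codon (0-based index) from C to U."""
    pos = 3 * codon_index
    if mrna[pos] != "C":
        raise ValueError("expected a C at the edited position")
    return mrna[:pos] + "U" + mrna[pos + 1:]


# Codons for residues 2,146 to 2,157, as read from Fig. 3 (assumed exact).
liver = "CAACUGCAGACAUAUAUGAUACAAUUUGAUCAGUAU"
intestine = deaminate_c_to_u(liver, 7)  # residue 2,153 is the eighth codon shown

print(translate(liver))      # QLQTYMIQFDQY
print(translate(intestine))  # QLQTYMI
```

Changing CAA to UAA at residue 2,153 ends translation at that point, so the intestinal product stops after the isoleucine at residue 2,152.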
(a)
DNA coding strand   5′  AAA GTA GAG AAC CTG GTA  3′
                        Lys Val Glu Asn Leu Val
Edited mRNA         5′  AAA GUA GAu uGu AuA CCU GGU  3′
                        Lys Val Asp Cys Ile Pro Gly
                        (inserted U residues shown in lowercase)

(b) [Diagram: the edited mRNA (5′→3′) aligned with a complementary guide RNA
    (3′→5′).]

FIGURE 2 RNA editing of the transcript of the cytochrome oxidase
subunit II gene from Trypanosoma brucei mitochondria. (a) Insertion
of four U residues (pink) produces a revised reading frame. (b) A special
class of guide RNAs, complementary to the edited product, may act
as templates for the editing process.

Residue number          2,146   2,148   2,150   2,152   2,154   2,156
Human liver         5′  CAA CUG CAG ACA UAU AUG AUA CAA UUU GAU CAG UAU  3′
(apoB-100)              Gln Leu Gln Thr Tyr Met Ile Gln Phe Asp Gln Tyr
Human intestine     5′  CAA CUG CAG ACA UAU AUG AUA UAA UUU GAU CAG UAU  3′
(apoB-48)               Gln Leu Gln Thr Tyr Met Ile Stop

FIGURE 3 RNA editing of the transcript of the gene for the
apolipoprotein B-100 component of LDL. Deamination, which occurs
only in the intestine, converts a specific cytosine to uracil,
changing a Gln codon to a stop codon and producing a truncated
protein.
specific and two different codons may be read.
When inosine (I) is the first (wobble) nucleotide
of an anticodon, three different codons can
be recognized—the maximum number for any
tRNA. These relationships are summarized in
Table 27–4.
3. When an amino acid is specified by several
different codons, the codons that differ in either
of the first two bases require different tRNAs.
4. A minimum of 32 tRNAs are required to translate
all 61 codons (31 to encode the amino acids and
1 for initiation).
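The wobble relationships can be expressed as a small lookup. In the sketch below, the number of codons read by each wobble base (one for C or A, two for U or G, three for I) comes from the hypothesis as stated above; the specific partners assumed for U (A or G) and G (C or U) are the standard wobble pairings and are not spelled out in this passage. The name `codons_recognized` is a hypothetical helper.

```python
# Codon third bases that can pair with each anticodon first (wobble) base.
# Partners for U and G are the standard wobble pairings (an assumption here).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "AUC"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_recognized(anticodon):
    """Codons (written 5'->3') read by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon
    # Pairing is antiparallel: the third anticodon base pairs with the first
    # codon base, and the middle bases pair with each other.
    first_two = COMPLEMENT[last] + COMPLEMENT[middle]
    return [first_two + third for third in WOBBLE[wobble_base]]


print(codons_recognized("ICG"))  # the yeast tRNA-Arg anticodon discussed in the text
print(codons_recognized("CAU"))  # a C in the wobble position reads a single codon
```

For the anticodon (5′)ICG the function returns CGA, CGU, and CGC, the three arginine codons named in the text, whereas an anticodon with C at the wobble position reads only one codon.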
BOX 27–2 WORKING IN BIOCHEMISTRY
Exceptions That Prove the Rule: Natural
Variations in the Genetic Code
In biochemistry, as in other disciplines, exceptions to
general rules can be problematic for instructors and
frustrating for students. At the same time, though, they
teach us that life is complex and inspire us to search
for more surprises. Understanding the exceptions can
even reinforce the original rule in surprising ways.
One would expect little room for variation in the
genetic code. Even a single amino acid substitution
can have profoundly deleterious effects on the struc-
ture of a protein. Nevertheless, variations in the code
do occur in some organisms, and they are both inter-
esting and instructive. The types of variation and their
rarity provide powerful evidence for a common evo-
lutionary origin of all living things.
To alter the code, changes must occur in one or
more tRNAs, with the obvious target for alteration be-
ing the anticodon. Such a change would lead to the
systematic insertion of an amino acid at a codon that,
according to the normal code (see Fig. 27–7), does
not specify that amino acid. The genetic code, in ef-
fect, is defined by two elements: (1) the anticodons
on tRNAs (which determine where an amino acid is
placed in a growing polypeptide) and (2) the speci-
ficity of the enzymes—the aminoacyl-tRNA syn-
thetases—that charge the tRNAs, which determines
the identity of the amino acid attached to a given
tRNA.
Most sudden changes in the code would have cat-
astrophic effects on cellular proteins, so code alter-
ations are more likely where relatively few proteins
would be affected—such as in small genomes encod-
ing only a few proteins. The biological consequences
of a code change could also be limited by restricting
changes to the three termination codons, which do not
generally occur within genes (see Box 27–4 for ex-
ceptions to this rule). This pattern is in fact observed.
Of the very few variations in the genetic code that
we know of, most occur in mitochondrial DNA
(mtDNA), which encodes only 10 to 20 proteins. Mi-
tochondria have their own tRNAs, so their code vari-
ations do not affect the much larger cellular genome.
The most common changes in mitochondria (and the
only code changes that have been observed in cellu-
lar genomes) involve termination codons. These
changes affect termination in the products of only a
subset of genes, and sometimes the effects are minor
because the genes have multiple (redundant) termi-
nation codons.
In mitochondria, these changes can be viewed as
a kind of genomic streamlining. Vertebrate mtDNAs
have genes that encode 13 proteins, 2 rRNAs, and 22
tRNAs (see Fig. 19–32). An unusual set of wobble
rules allows the 22 tRNAs to decode all 64 possible
codon triplets; not all of the 32 tRNAs required for the
normal code are needed. Four codon families (in
which the amino acid is determined entirely by the
first two nucleotides) are decoded by a single tRNA
with a U residue in the first (or wobble) position in
the anticodon. Either the U pairs somehow with any
of the four possible bases in the third position of the
codon or a “two out of three” mechanism is used—
that is, no base pairing is needed at the third position.
Other tRNAs recognize codons with either A or G in
the third position, and yet others recognize U or C, so
that virtually all the tRNAs recognize either two or
four codons.
In the normal code, only two amino acids are spec-
ified by single codons: methionine and tryptophan
(see Table 27–3). If all mitochondrial tRNAs recognize
two codons, we would expect additional Met and Trp
codons in mitochondria. And we find that the single
most common code variation is the normal termination
codon UGA specifying tryptophan. The tRNA^Trp
recognizes and inserts a Trp residue at either UGA or
the normal Trp codon, UGG. The second most com-
mon variation is conversion of AUA from an Ile codon
to a Met codon; the normal Met codon is AUG, and a
single tRNA recognizes both codons. The known cod-
ing variations in mitochondria are summarized in
Table 1.
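The consequence of a variant code can be illustrated by translating one message with two tables. The sketch below applies the vertebrate mitochondrial reassignments listed in Table 1 (UGA to Trp, AUA to Met, AGA and AGG to Stop) on top of the standard code. The message itself is a hypothetical sequence chosen only to include the variant codons, and the names `STANDARD`, `VERTEBRATE_MITO`, and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial reassignments from Table 1 of this box.
VERTEBRATE_MITO = dict(STANDARD, UGA="W", AUA="M", AGA="*", AGG="*")


def translate(mrna, table):
    """Translate from the first base with the given table, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


message = "AUGUGAAUAAGA"  # hypothetical sequence chosen to include the variant codons
print(translate(message, STANDARD))         # M
print(translate(message, VERTEBRATE_MITO))  # MWM
```

With the standard table, translation stops at UGA after a single methionine; with the mitochondrial table, UGA is read as tryptophan, AUA as methionine, and AGA terminates the chain.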
Turning to the much rarer changes in the codes
for cellular (as distinct from mitochondrial) genomes,
we find that the only known variation in a prokaryote
is again the use of UGA to encode Trp residues, oc-
The wobble (or third) base of the codon con-
tributes to specificity, but, because it pairs only loosely
with its corresponding base in the anticodon, it per-
mits rapid dissociation of the tRNA from its codon dur-
ing protein synthesis. If all three bases of a codon
engaged in strong Watson-Crick pairing with the three
bases of the anticodon, tRNAs would dissociate too
slowly and this would severely limit the rate of protein
synthesis. Codon-anticodon interactions balance the
requirements for accuracy and speed.
The genetic code tells us how protein sequence in-
formation is stored in nucleic acids and provides some
curring in the simplest free-living cell, Mycoplasma
capricolum. Among eukaryotes, the only known ex-
tramitochondrial coding changes occur in a few
species of ciliated protists, in which both termination
codons UAA and UAG can specify glutamine.
Changes in the code need not be absolute; a codon
might not always encode the same amino acid. In E.
coli we find two examples of amino acids being in-
serted at positions not specified in the normal code.
The first is the occasional use of GUG (Val) as an ini-
tiation codon. This occurs only for those genes in
which the GUG is properly located relative to partic-
ular mRNA sequences that affect the initiation of
translation (as discussed in Section 27.2).
The second E. coli example also involves contex-
tual signals that alter coding patterns. A few proteins
in all cells (such as formate dehydrogenase in bacte-
ria and glutathione peroxidase in mammals) require
the element selenium for their activity, generally in
the form of the modified amino acid selenocysteine.
Although modified amino acids are generally produced
in posttranslational reactions (described in Section
27.3), in E. coli selenocysteine is introduced into for-
mate dehydrogenase during translation, in response
to an in-frame UGA codon. A special type of serine
tRNA, present at lower levels than other Ser-tRNAs,
recognizes UGA and no other codons. This tRNA is
charged with serine, and the serine is enzymatically
converted to selenocysteine before its use at the ri-
bosome. The charged tRNA does not recognize just
any UGA codon; some contextual signal in the mRNA,
still to be identified, ensures that this tRNA recognizes
only the few UGA codons, within certain genes, that
specify selenocysteine. In effect, E. coli has 21 com-
mon amino acids, and UGA doubles as a codon for both
termination and (sometimes) selenocysteine.
These variations tell us that the code is not quite
as universal as once believed, but that its flexibility is
severely constrained. The variations are obviously de-
rivatives of the normal code, and no example of a com-
pletely different code has been found. The limited
scope of code variants strengthens the principle that
all life on this planet evolved on the basis of a single
(slightly flexible) genetic code.
TABLE 1 Known Variant Codon Assignments in Mitochondria

                                              Codons*
                                                 AGA
                               UGA     AUA       AGG     CUN     CGG
Normal code assignment         Stop    Ile       Arg     Leu     Arg
Animals
  Vertebrates                  Trp     Met       Stop    +       +
  Drosophila                   Trp     Met       Ser     +       +
Yeasts
  Saccharomyces cerevisiae     Trp     Met       +       Thr     +
  Torulopsis glabrata          Trp     Met       +       Thr     ?
  Schizosaccharomyces pombe    Trp     +         +       +       +
Filamentous fungi              Trp     +         +       +       +
Trypanosomes                   Trp     +         +       +       +
Higher plants                  +       +         +       +       Trp
Chlamydomonas reinhardtii      ?       +         +       +       ?

*N indicates any nucleotide; +, codon has the same meaning as in the normal code; ?, codon not observed in this mitochondrial genome.
Selenocysteine: ⁺H₃N–CH(COO⁻)–CH₂–SeH
clues about how that information is translated into pro-
tein. We now turn to the molecular mechanisms of the
translation process.
SUMMARY 27.1 The Genetic Code
■ The particular amino acid sequence of a
protein is constructed through the translation
of information encoded in mRNA. This process
is carried out by ribosomes.
■ Amino acids are specified by mRNA codons
consisting of nucleotide triplets. Translation
requires adaptor molecules, the tRNAs, that
recognize codons and insert amino acids into
their appropriate sequential positions in the
polypeptide.
■ The base sequences of the codons were
deduced from experiments using synthetic
mRNAs of known composition and sequence.
■ The codon AUG signals initiation of translation.
The triplets UAA, UAG, and UGA are signals
for termination.
■ The genetic code is degenerate: it has multiple
code words for almost every amino acid.
■ The standard genetic code words are universal
in all species, with some minor deviations in
mitochondria and a few single-celled organisms.
■ The third position in each codon is much less
specific than the first and second and is said to
wobble.
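The wobble rules summarized in Table 27–4 can be sketched as a lookup on the 5′ base of the anticodon. This is a minimal illustration, not a full model of decoding: the names `WOBBLE_PARTNERS`, `WATSON_CRICK`, and `codons_recognized` are hypothetical, anticodons are written 5′→3′, and only the five wobble bases listed in the table are handled. The example anticodon IGC for yeast tRNA^Ala is an assumption inferred from the three alanine codons that tRNA is stated to recognize.

```python
# Codon 3' bases that each anticodon 5' (wobble) base can pair with,
# following Table 27-4.
WOBBLE_PARTNERS = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UAC"}

# Strict Watson-Crick partners for the other two anticodon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_recognized(anticodon_5to3):
    """Return the set of codons (5'->3') read by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon_5to3.upper()
    # Pairing is antiparallel: the 3' base of the anticodon pairs with the
    # 5' base of the codon, and the wobble base pairs with the codon's 3' base.
    first = WATSON_CRICK[last]
    second = WATSON_CRICK[middle]
    return {first + second + third for third in WOBBLE_PARTNERS[wobble]}
```

For instance, `codons_recognized("IGC")` gives the three alanine codons GCU, GCA, and GCC, while `codons_recognized("CAU")` gives only AUG.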
27.2 Protein Synthesis
As we have seen for DNA and RNA (Chapters 25 and
26), the synthesis of polymeric biomolecules can be con-
sidered in terms of initiation, elongation, and termina-
tion stages. These fundamental processes are typically
bracketed by two additional stages: activation of pre-
cursors before synthesis and postsynthetic processing
of the completed polymer. Protein synthesis follows the
same pattern. The activation of amino acids before their
incorporation into polypeptides and the posttransla-
tional processing of the completed polypeptide play par-
ticularly important roles in ensuring both the fidelity of
synthesis and the proper function of the protein prod-
uct. The cellular components involved in the five stages
of protein synthesis in E. coli and other bacteria are
listed in Table 27–5; the requirements in eukaryotic cells
are quite similar, although the components are in some
cases more numerous. An initial overview of the stages
of protein synthesis provides a useful outline for the dis-
cussion that follows.
Protein Biosynthesis Takes Place in Five Stages
Stage 1: Activation of Amino Acids For the synthesis of a
polypeptide with a defined sequence, two fundamental
chemical requirements must be met: (1) the carboxyl
group of each amino acid must be activated to facilitate
formation of a peptide bond, and (2) a link must be es-
tablished between each new amino acid and the infor-
mation in the mRNA that encodes it. Both these re-
quirements are met by attaching the amino acid to a
tRNA in the first stage of protein synthesis. Attaching
the right amino acid to the right tRNA is critical. This
reaction takes place in the cytosol, not on the ribosome.
Each of the 20 amino acids is covalently attached to a
specific tRNA at the expense of ATP energy, using Mg²⁺-dependent
activating enzymes known as aminoacyl-tRNA
synthetases. When attached to their amino acid
(aminoacylated) the tRNAs are said to be “charged.”
Stage 2: Initiation The mRNA bearing the code for the
polypeptide to be made binds to the smaller of two ri-
bosomal subunits and to the initiating aminoacyl-tRNA.
The large ribosomal subunit then binds to form an ini-
tiation complex. The initiating aminoacyl-tRNA base-
pairs with the mRNA codon AUG that signals the be-
ginning of the polypeptide. This process, which requires
GTP, is promoted by cytosolic proteins called initiation
factors.
Stage 3: Elongation The nascent polypeptide is length-
ened by covalent attachment of successive amino acid
units, each carried to the ribosome and correctly posi-
tioned by its tRNA, which base-pairs to its correspon-
ding codon in the mRNA. Elongation requires cytosolic
proteins known as elongation factors. The binding of
each incoming aminoacyl-tRNA and the movement of
the ribosome along the mRNA are facilitated by the
hydrolysis of GTP as each residue is added to the
growing polypeptide.

TABLE 27–4 How the Wobble Base of the Anticodon Determines the Number of Codons a tRNA Can Recognize

1. One codon recognized:
     Anticodon  (3′) X–Y–C (5′)        (3′) X–Y–A (5′)
     Codon      (5′) X′–Y′–G (3′)      (5′) X′–Y′–U (3′)

2. Two codons recognized:
     Anticodon  (3′) X–Y–U (5′)        (3′) X–Y–G (5′)
     Codon      (5′) X′–Y′–A/G (3′)    (5′) X′–Y′–C/U (3′)

3. Three codons recognized:
     Anticodon  (3′) X–Y–I (5′)
     Codon      (5′) X′–Y′–A/U/C (3′)

Note: X and Y denote bases complementary to and capable of strong Watson-Crick base pairing with X′ and Y′, respectively. Wobble bases are those in the 3′ position of codons and the 5′ position of anticodons.
Stage 4: Termination and Release Completion of the poly-
peptide chain is signaled by a termination codon in the
mRNA. The new polypeptide is released from the ribo-
some, aided by proteins called release factors.
Stage 5: Folding and Posttranslational Processing In order
to achieve its biologically active form, the new polypep-
tide must fold into its proper three-dimensional confor-
mation. Before or after folding, the new polypeptide may
undergo enzymatic processing, including removal of one
or more amino acids (usually from the amino terminus);
addition of acetyl, phosphoryl, methyl, carboxyl, or
other groups to certain amino acid residues; proteolytic
cleavage; and/or attachment of oligosaccharides or pros-
thetic groups.
Before looking at these five stages in detail, we must ex-
amine two key components in protein biosynthesis: the
ribosome and tRNAs.
The Ribosome Is a Complex Supramolecular Machine
Each E. coli cell contains 15,000 or more ribosomes,
making up almost a quarter of the dry weight of the cell.
Bacterial ribosomes contain about 65% rRNA and 35%
protein; they have a diameter of about 18 nm and are
composed of two unequal subunits with sedimentation
coefficients of 30S and 50S and a combined sedimenta-
tion coefficient of 70S. Both subunits contain dozens of
ribosomal proteins and at least one large rRNA (Table
27–6).
Following Zamecnik’s discovery that ribosomes are
the complexes responsible for protein synthesis, and fol-
lowing elucidation of the genetic code, the study of ri-
bosomes accelerated. In the late 1960s Masayasu No-
mura and colleagues demonstrated that both ribosomal
subunits can be broken down into their RNA and pro-
tein components, then reconstituted in vitro. Under ap-
propriate experimental conditions, the RNA and protein
spontaneously reassemble to form 30S or 50S subunits
nearly identical in structure and activity to native sub-
units. This breakthrough fueled decades of research into
the function and structure of ribosomal RNAs and proteins.
At the same time, increasingly sophisticated structural
methods revealed more and more details about
ribosome structure.

TABLE 27–5 Components Required for the Five Major Stages of Protein Synthesis in E. coli

Stage                                Essential components
1. Activation of amino acids         20 amino acids
                                     20 aminoacyl-tRNA synthetases
                                     32 or more tRNAs
                                     ATP
                                     Mg²⁺
2. Initiation                        mRNA
                                     N-Formylmethionyl-tRNA^fmet
                                     Initiation codon in mRNA (AUG)
                                     30S ribosomal subunit
                                     50S ribosomal subunit
                                     Initiation factors (IF-1, IF-2, IF-3)
                                     GTP
                                     Mg²⁺
3. Elongation                        Functional 70S ribosome (initiation complex)
                                     Aminoacyl-tRNAs specified by codons
                                     Elongation factors (EF-Tu, EF-Ts, EF-G)
                                     GTP
                                     Mg²⁺
4. Termination and release           Termination codon in mRNA
                                     Release factors (RF-1, RF-2, RF-3)
5. Folding and posttranslational     Specific enzymes, cofactors, and other components for
   processing                        removal of initiating residues and signal sequences,
                                     additional proteolytic processing, modification of
                                     terminal residues, and attachment of phosphate,
                                     methyl, carboxyl, carbohydrate, or prosthetic groups

[Photograph: Masayasu Nomura]
The dawn of a new millennium brought with it the
elucidation of the first high-resolution structures of bacterial
ribosomal subunits. The bacterial ribosome is complex,
with a combined molecular weight of ~2.7 million,
and it is providing a wealth of surprises (Fig. 27–9). First,
the traditional focus on the protein components of ribo-
somes was shifted. The ribosomal subunits are huge RNA
molecules. In the 50S subunit, the 5S and 23S rRNAs form
the structural core. The proteins are secondary elements
in the complex, decorating the surface. Second and most
important, there is no protein within 18 Å of the active
site for peptide bond formation. The high-resolution
structure thus confirms what many had suspected for
more than a decade: the ribosome is a ribozyme. In addition
to the insight they provide into the mechanism of
protein synthesis (as elaborated below), the detailed
structures of the ribosome and its subunits have stimulated
a new look at the evolution of life (Box 27–3).

FIGURE 27–9 Ribosomes. Our understanding of ribosome structure
took a giant step forward with the publication in 2000 of the
high-resolution structure of the 50S ribosomal subunit of the bacterium
Haloarcula marismortui by Thomas Steitz, Peter Moore, and
their colleagues. This was followed by additional high-resolution
structures of the ribosomal subunits from several different
bacterial species, and models of the corresponding complete
ribosomes. A sampling of that progress is presented here.
(a) The 50S and 30S bacterial subunits, split apart to visualize
the surfaces that interact in the active ribosome. The structure
on the left is the 50S subunit (derived from PDB ID 1JJ2 and 1GIY),
with tRNAs (purple, mauve, and gray) bound to sites E, P, and A,
described later in the text; the tRNA anticodons are in orange.
Proteins appear as blue wormlike structures; the rRNA as a blended
space-filling representation designed to highlight surface features,
with the bases in white and the backbone in green. The structure on
the right is the 30S subunit (derived from PDB ID 1J5E and 1JGO).
Proteins are yellow and the rRNA white. The part of the mRNA that
interacts with the tRNA anticodons is shown in red. The rest of the
mRNA winds through grooves or channels on the 30S subunit surface.
(b) A model of a complete active bacterial ribosome (derived from
PDB ID 1J5E, 1JJ2, 1JGO, and 1GIY). All components are colored as
in (a). This is a view down into the groove separating the subunits.
A second view (inset) is from the same angle, but with
the tRNAs removed to give a better sense of the cleft where
protein synthesis occurs.
The two irregularly shaped ribosomal subunits fit
together to form a cleft through which the mRNA passes
as the ribosome moves along it during translation (Fig.
27–9b). The 55 proteins in bacterial ribosomes vary
enormously in size and structure. Molecular weights
range from about 6,000 to 75,000. Most of the proteins
have globular domains arranged on the ribosome sur-
face. Some also have snakelike protein extensions that
protrude into the rRNA core of the ribosome, stabiliz-
ing its structure. The functions of some of these pro-
teins have not yet been elucidated in detail, although a
structural role seems evident for many of them.
The sequences of the rRNAs of many organisms are
now known. Each of the three single-stranded rRNAs of
TABLE 27–6 RNA and Protein Components of the E. coli Ribosome

          Number of            Total number   Protein        Number and
Subunit   different proteins   of proteins    designations   type of rRNAs
30S       21                   21             S1–S21         1 (16S rRNA)
50S       33                   36             L1–L36*        2 (5S and 23S rRNAs)

*The L1 to L36 protein designations do not correspond to 36 different proteins. The protein originally designated L7 is in fact a modified form of
L12, and L8 is a complex of three other proteins. Also, L26 proved to be the same protein as S20 (and not part of the 50S subunit). This gives 33
different proteins in the large subunit. There are four copies of the L7/L12 protein, with the three extra copies bringing the total protein count to 36.
[Figure 27–9d, composition and mass of ribosomes]

Bacterial ribosome: 70S, Mr 2.7 × 10⁶
  50S subunit: Mr 1.8 × 10⁶; 5S rRNA (120 nucleotides); 23S rRNA (3,200 nucleotides); 36 proteins
  30S subunit: Mr 0.9 × 10⁶; 16S rRNA (1,540 nucleotides); 21 proteins
Eukaryotic ribosome: 80S, Mr 4.2 × 10⁶
  60S subunit: Mr 2.8 × 10⁶; 5S rRNA (120 nucleotides); 28S rRNA (4,700 nucleotides); 5.8S rRNA (160 nucleotides); ~49 proteins
  40S subunit: Mr 1.4 × 10⁶; 18S rRNA (1,900 nucleotides); ~33 proteins
(c) Structure of the 50S bacterial ribosome subunit (PDB ID 1Q7Y).
The subunit is again viewed from the side that attaches to the 30S sub-
unit, but is tilted down slightly compared to its orientation in (a). The
active site for peptide bond formation (the peptidyl transferase activ-
ity), deep within a surface groove and far away from any protein, is
marked by a bound inhibitor, puromycin (red).
(d) Summary of the composition and mass of ribosomes in prokary-
otes and eukaryotes. Ribosomal subunits are identified by their S (Sved-
berg unit) values, sedimentation coefficients that refer to their rate of
sedimentation in a centrifuge. The S values are not necessarily addi-
tive when subunits are combined, because rates of sedimentation are
affected by shape as well as mass.
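The point about S values can be checked numerically with the figures given for the two ribosome types: the subunit masses add up to the mass of the whole ribosome, but the sedimentation coefficients do not. The sketch below is a rough illustration only. The names `RIBOSOMES`, `additivity`, and `sphere_scaled_s` are hypothetical, and the scaling s ∝ M^(2/3) is an added assumption (it holds for compact spheres of equal density, which ribosomes only approximate), not something stated in the text.

```python
# (S value, Mr) for each particle, taken from the composition summary.
RIBOSOMES = {
    "bacterial": {"large": (50, 1.8e6), "small": (30, 0.9e6), "whole": (70, 2.7e6)},
    "eukaryotic": {"large": (60, 2.8e6), "small": (40, 1.4e6), "whole": (80, 4.2e6)},
}


def additivity(kind):
    """Compare summed subunit values with those of the intact ribosome."""
    parts = RIBOSOMES[kind]
    (s_large, m_large), (s_small, m_small) = parts["large"], parts["small"]
    s_whole, m_whole = parts["whole"]
    return {
        "mass_sum": m_large + m_small,
        "mass_whole": m_whole,
        "s_sum": s_large + s_small,
        "s_whole": s_whole,
    }


def sphere_scaled_s(s_ref, m_ref, m_new):
    """Estimate an S value by assuming s scales as M**(2/3) (equal-density spheres)."""
    return s_ref * (m_new / m_ref) ** (2.0 / 3.0)
```

Under the spherical assumption, scaling the 50S subunit up to the mass of the complete bacterial ribosome gives an estimate in the mid-60s, much closer to the observed 70S than the naive sum of 80.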
BOX 27–3 THE WORLD OF BIOCHEMISTRY
From an RNA World to a Protein World
Extant ribozymes generally promote one of two types
of reactions: hydrolytic cleavage of phosphodiester
bonds or phosphoryl transfers (Chapter 26). In both
cases, the substrates of the reactions are also RNA
molecules. The ribosomal RNAs provide an important
expansion of the catalytic range of known ribozymes.
Coupled to the laboratory exploration of potential
RNA catalytic function (see Box 26–3), the idea of an
RNA world as a precursor to current life forms be-
comes increasingly attractive.
A viable RNA world would require an RNA capa-
ble of self-replication, a primitive metabolism to gen-
erate the needed ribonucleotide precursors, and a cell
boundary to aid in concentrating the precursors and
sequestering them from the environment. The re-
quirements for catalysis of reactions involving a grow-
ing range of metabolites and macromolecules could
have led to larger and more complex RNA catalysts.
The many negatively charged phosphoryl groups in
the RNA backbone limit the stability of very large RNA
molecules. In an RNA world, divalent cations or other
positively charged groups could be incorporated into
the structures to augment stability.
Certain peptides could stabilize large RNA mol-
ecules. For example, many ribosomal proteins in
modern eukaryotic cells have long extensions, lack-
ing secondary structure, that snake into the rRNAs
and help stabilize them (Fig. 1). Ribozyme-catalyzed
synthesis of peptides could thus initially have evolved
as part of a general solution to the structural main-
tenance of large RNA molecules. The synthesis of
peptides may have helped stabilize large ribozymes,
but this advance also marked the beginning of the
end for the RNA world. Once peptide synthesis was
possible, the greater catalytic potential of proteins
would have set in motion an irreversible transition to
a protein-dominated metabolic system.
Most enzymatic processes, then, were eventually
surrendered to the proteins—but not all. In every or-
ganism, the critical task of synthesizing the proteins
remains, even now, a ribozyme-catalyzed process.
There appears to be only one good arrangement (or
just a very few) of nucleotide residues in a ribozyme
active site that can catalyze peptide synthesis. The
rRNA residues that seem to be involved in the pep-
tidyl transferase activity of ribosomes are highly con-
served in the large-subunit rRNAs of all species. Using
in vitro evolution (SELEX; see Box 26–3), investiga-
tors have isolated artificial ribozymes that promote
peptide synthesis. Intriguingly, most of them include
the ribonucleotide octet (5′)AUAACAGG(3′), a highly
conserved sequence found at the peptidyl transferase
active site in the ribosomes of all cells. There may be
just one optimal solution to the overall chemical prob-
lem of ribozyme-catalyzed synthesis of proteins of de-
fined sequence. Evolution found this solution once,
and no life form has notably improved on it.
FIGURE 1 The 50S subunit of a bacterial ribosome (PDB ID 1NKW).
The protein backbones are shown as blue wormlike structures; the
rRNA components are transparent. The unstructured extensions of
many of the ribosomal proteins snake into the rRNA structures, help-
ing to stabilize them.
E. coli has a specific three-dimensional conformation fea-
turing extensive intrachain base pairing. The predicted
secondary structure of the rRNAs (Fig. 27–10) has
largely been confirmed in the high-resolution models,
but fails to convey the extensive network of tertiary in-
teractions evident in the complete structure.
The ribosomes of eukaryotic cells (other than mi-
tochondrial and chloroplast ribosomes) are larger and
more complex than bacterial ribosomes (Fig. 27–9d),
with a diameter of about 23 nm and a sedimentation co-
efficient of about 80S. They also have two subunits,
which vary in size among species but on average are 60S
and 40S. Altogether, eukaryotic ribosomes contain more
than 80 different proteins. The ribosomes of mitochon-
dria and chloroplasts are somewhat smaller and simpler
than bacterial ribosomes. Nevertheless, ribosomal struc-
ture and function are strikingly similar in all organisms
and organelles.
Transfer RNAs Have Characteristic
Structural Features
To understand how tRNAs can serve as adaptors in
translating the language of nucleic acids into the lan-
guage of proteins, we must first examine their structure
in more detail. Transfer RNAs are relatively small and
consist of a single strand of RNA folded into a precise
three-dimensional structure (see Fig. 8–28a). The tRNAs
in bacteria and in the cytosol of eukaryotes have be-
tween 73 and 93 nucleotide residues, corresponding to
molecular weights of 24,000 to 31,000. Mitochondria and
chloroplasts contain distinctive, somewhat smaller
tRNAs. Cells have at least one kind of tRNA for each
amino acid; at least 32 tRNAs are required to recognize
all the amino acid codons (some recognize more than
one codon), but some cells use more than 32.
Yeast alanine tRNA (tRNA^Ala), the first nucleic acid
to be completely sequenced (Fig. 27–11), contains 76
nucleotide residues, 10 of which have modified bases.
Comparisons of tRNAs from various species have re-
vealed many common denominators of structure (Fig.
27–12). Eight or more of the nucleotide residues have
modified bases and sugars, many of which are methylated
derivatives of the principal bases. Most tRNAs have
a guanylate (pG) residue at the 5′ end, and all have the
trinucleotide sequence CCA(3′) at the 3′ end. When
drawn in two dimensions, the hydrogen-bonding pattern
of all tRNAs forms a cloverleaf structure with four arms;
the longer tRNAs have a short fifth arm, or extra arm
(Fig. 27–12). In three dimensions, a tRNA has the form
of a twisted L (Fig. 27–13).

FIGURE 27–10 Bacterial rRNAs. Diagrams of the secondary structure
of E. coli 16S and 5S rRNAs. The first (5′ end, residue 1) and final
(3′ end, residue 1,542) ribonucleotide residues of the 16S rRNA are numbered.

FIGURE 27–11 Nucleotide sequence of yeast tRNA^Ala. This structure
was deduced in 1965 by Robert W. Holley and his colleagues; it is
shown in the cloverleaf conformation in which intrastrand base pairing
is maximal. The following symbols are used for the modified
nucleotides: ψ, pseudouridine; I, inosine; T, ribothymidine;
D, 5,6-dihydrouridine; m¹I, 1-methylinosine; m¹G, 1-methylguanosine;
m²G, N²-dimethylguanosine (see Fig. 26–24). Blue lines between
parallel sections indicate Watson-Crick base pairs. The anticodon
can recognize three codons for alanine (GCA, GCU, and GCC). Other
features of tRNA structure are shown in Figures 27–12 and 27–13.
Note the presence of two G=U base pairs, signified by a blue
dot to indicate non-Watson-Crick pairing. In RNAs, guanosine is often
base-paired with uridine, although the G=U pair is not as stable as
the Watson-Crick G≡C pair (Chapter 8).
[Cloverleaf sequence diagram; labels: 5′ end (pG), 3′ end, site for amino acid attachment, anticodon triplet]
[Photograph: Robert W. Holley, 1922–1993]
Two of the arms of a tRNA are critical for its adaptor
function. The amino acid arm can carry a specific
amino acid esterified by its carboxyl group to the 2′- or
3′-hydroxyl group of the A residue at the 3′ end of the
tRNA. The anticodon arm contains the anticodon. The
other major arms are the D arm, which contains the unusual
nucleotide dihydrouridine (D), and the TψC arm,
which contains ribothymidine (T), not usually present
in RNAs, and pseudouridine (ψ), which has an unusual
carbon–carbon bond between the base and ribose (see
Fig. 26–24). The D and TψC arms contribute important
interactions for the overall folding of tRNA molecules,
and the TψC arm interacts with the large-subunit rRNA.

FIGURE 27–12 General cloverleaf secondary structure
of tRNAs. The large dots on the backbone represent
nucleotide residues; the blue lines represent base pairs.
Characteristic and/or invariant residues common to all
tRNAs are shaded in pink. Transfer RNAs vary in length
from 73 to 93 nucleotides. Extra nucleotides occur in
the extra arm or in the D arm. At the end of the anticodon
arm is the anticodon loop, which always
contains seven unpaired nucleotides. The D arm
contains two or three D (5,6-dihydrouridine) residues,
depending on the tRNA. In some tRNAs, the D arm has
only three hydrogen-bonded base pairs. In addition to
the symbols explained in Figure 27–11: Pu, purine
nucleotide; Py, pyrimidine nucleotide; G*, guanylate or
2′-O-methylguanylate.
[Diagram labels: amino acid arm (5′ pG end and 3′ CCA end); D arm (contains two or three D residues at different positions); anticodon arm (anticodon, wobble position); extra arm (variable in size, not present in all tRNAs); TψC arm]

FIGURE 27–13 Three-dimensional structure of yeast tRNA^Phe
deduced from x-ray diffraction analysis. The shape resembles a twisted
L. (a) Schematic diagram with the various arms identified in Figure
27–12 shaded in different colors. (b) A space-filling model, with the
same color coding (PDB ID 4TRA). The CCA sequence at the 3′ end
(orange) is the attachment point for the amino acid.
[Diagram labels: D arm (residues 10–25), anticodon arm, anticodon, amino acid arm, TψC arm, 5′ and 3′ ends]
Having looked at the structures of ribosomes and tRNAs,
we now consider in detail the five stages of protein syn-
thesis.
Stage 1: Aminoacyl-tRNA Synthetases Attach the
Correct Amino Acids to Their tRNAs
During the first stage of protein synthesis, taking place
in the cytosol, aminoacyl-tRNA synthetases esterify the
20 amino acids to their corresponding tRNAs. Each en-
zyme is specific for one amino acid and one or more cor-
responding tRNAs. Most organisms have one aminoacyl-
tRNA synthetase for each amino acid. For amino acids
with two or more corresponding tRNAs, the same en-
zyme usually aminoacylates all of them.
The structures of all the aminoacyl-tRNA syn-
thetases of E. coli have been determined. Researchers
have divided them into two classes (Table 27–7) based
on substantial differences in primary and tertiary
structure and in reaction mechanism (Fig. 27–14);
these two classes are the same in all organisms. There
is no evidence for a common ancestor, and the bio-
logical, chemical, or evolutionary reasons for two en-
zyme classes for essentially identical processes remain
obscure.
The reaction catalyzed by an aminoacyl-tRNA synthetase is

\[
\text{Amino acid} + \text{tRNA} + \text{ATP} \;\overset{\mathrm{Mg^{2+}}}{\rightleftharpoons}\; \text{aminoacyl-tRNA} + \text{AMP} + \mathrm{PP_i}
\]

This reaction occurs in two steps in the enzyme’s active
site. In step 1 (Fig. 27–14) an enzyme-bound intermediate,
aminoacyl adenylate (aminoacyl-AMP), forms
when the carboxyl group of the amino acid reacts with
the α-phosphoryl group of ATP to form an anhydride
linkage, with displacement of pyrophosphate. In the
second step the aminoacyl group is transferred from
enzyme-bound aminoacyl-AMP to its corresponding
specific tRNA. The course of this second step depends
on the class to which the enzyme belongs, as shown by
pathways 2a and 2b in Figure 27–14. The resulting es-
ter linkage between the amino acid and the tRNA (Fig.
27–15) has a highly negative standard free energy of
hydrolysis (ΔG′° = −29 kJ/mol). The pyrophosphate
formed in the activation reaction undergoes hydrolysis
to phosphate by inorganic pyrophosphatase. Thus two
high-energy phosphate bonds are ultimately expended
for each amino acid molecule activated, rendering the
overall reaction for amino acid activation essentially
irreversible:
\[
\text{Amino acid} + \text{tRNA} + \text{ATP} \;\overset{\mathrm{Mg^{2+}}}{\longrightarrow}\; \text{aminoacyl-tRNA} + \text{AMP} + 2\,\mathrm{P_i}
\qquad \Delta G'^{\circ} \approx -29\ \text{kJ/mol}
\]
Proofreading by Aminoacyl-tRNA Synthetases The amino-
acylation of tRNA accomplishes two ends: (1) activation
of an amino acid for peptide bond formation and (2) at-
tachment of the amino acid to an adaptor tRNA that en-
sures appropriate placement of the amino acid in a grow-
ing polypeptide. The identity of the amino acid attached
to a tRNA is not checked on the ribosome, so attach-
ment of the correct amino acid to the tRNA is essential
to the fidelity of protein synthesis.
As you will recall from Chapter 6, enzyme speci-
ficity is limited by the binding energy available from en-
zyme-substrate interactions. Discrimination between
two similar amino acid substrates has been studied in
detail in the case of Ile-tRNA synthetase, which distin-
guishes between valine and isoleucine, amino acids that
differ by only a single methylene group (–CH₂–). Ile-tRNA
synthetase favors activation of isoleucine (to form
Ile-AMP) over valine by a factor of 200—as we would
expect, given the amount by which a methylene group
(in Ile) could enhance substrate binding. Yet valine is
erroneously incorporated into proteins in positions nor-
mally occupied by an Ile residue at a frequency of only
about 1 in 3,000. How is this greater than tenfold in-
crease in accuracy brought about? Ile-tRNA synthetase,
like some other aminoacyl-tRNA synthetases, has a
proofreading function.
Recall a general principle from the discussion of
proofreading by DNA polymerases (p. 955): if available
binding interactions do not provide sufficient discrimi-
nation between two substrates, the necessary specificity
can be achieved by substrate-specific binding in two
successive steps. The effect of forcing the system
through two successive filters is multiplicative. In the
case of Ile-tRNA synthetase, the first filter is the initial
binding of the amino acid to the enzyme and its activation
to aminoacyl-AMP. The second is the binding of any
incorrect aminoacyl-AMP products to a separate active
site on the enzyme; a substrate that binds in this second
active site is hydrolyzed. The R group of valine is
slightly smaller than that of isoleucine, so Val-AMP fits
the hydrolytic (proofreading) site of the Ile-tRNA synthetase
but Ile-AMP does not. Thus Val-AMP is hydrolyzed
to valine and AMP in the proofreading active
site, and tRNA bound to the synthetase does not become
aminoacylated to the wrong amino acid.

TABLE 27–7 The Two Classes of Aminoacyl-tRNA Synthetases

Class I          Class II
Arg    Leu       Ala    Lys
Cys    Met       Asn    Phe
Gln    Trp       Asp    Pro
Glu    Tyr       Gly    Ser
Ile    Val       His    Thr

Note: Here, Arg represents arginyl-tRNA synthetase, and so forth. The classification applies
to all organisms for which tRNA synthetases have been analyzed and is based on protein
structural distinctions and on the mechanistic distinction outlined in Figure 27–14.

MECHANISM FIGURE 27–14 Aminoacylation
of tRNA by aminoacyl-tRNA synthetases. Step
1 is formation of an aminoacyl adenylate,
which remains bound to the active site. In the
second step the aminoacyl group is transferred
to the tRNA. The mechanism of this step is
somewhat different for the two classes of
aminoacyl-tRNA synthetases (see Table 27–7).
For class I enzymes, (2a) the aminoacyl group
is transferred initially to the 2′-hydroxyl group
of the 3′-terminal A residue, then (3a) to the
3′-hydroxyl group by a transesterification
reaction. For class II enzymes, (2b) the
aminoacyl group is transferred directly to the
3′-hydroxyl group of the terminal adenylate.
[Reaction scheme: amino acid + ATP → 5′-aminoacyl adenylate (aminoacyl-AMP) + PPi (step 1); class I enzymes, transfer to the 2′-hydroxyl at the 3′ end of tRNA with release of AMP (step 2a), then transesterification to the 3′-hydroxyl (step 3a); class II enzymes, direct transfer to the 3′-hydroxyl with release of AMP (step 2b); product, aminoacyl-tRNA]
In addition to proofreading after formation of the
aminoacyl-AMP intermediate, most aminoacyl-tRNA
synthetases can also hydrolyze the ester linkage be-
tween amino acids and tRNAs in the aminoacyl-tRNAs.
This hydrolysis is greatly accelerated for incorrectly
charged tRNAs, providing yet a third filter to enhance
the fidelity of the overall process. The few aminoacyl-
tRNA synthetases that activate amino acids with no
close structural relatives (Cys-tRNA synthetase, for ex-
ample) demonstrate little or no proofreading activity; in
these cases, the active site for aminoacylation can suf-
ficiently discriminate between the proper substrate and
any incorrect amino acid.
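The multiplicative effect of successive filters can be made concrete with the two numbers quoted for Ile-tRNA synthetase: a 200-fold preference for isoleucine at the activation step and an observed misincorporation of valine of about 1 in 3,000. The following is a simplified sketch, assuming the filters act independently and that the overall error rate is simply the reciprocal of the combined selectivity; the function name `required_second_filter` is hypothetical.

```python
from fractions import Fraction


def required_second_filter(first_filter_selectivity, observed_error_rate):
    """Selectivity the later filter(s) must supply if selectivities multiply.

    overall selectivity = 1 / observed error rate
    overall selectivity = first filter x later filter(s)
    """
    overall_selectivity = 1 / Fraction(observed_error_rate)
    return overall_selectivity / Fraction(first_filter_selectivity)


# Values quoted in the text for Ile-tRNA synthetase (valine vs. isoleucine).
first_filter = 200                   # preference for Ile at the activation step
observed_error = Fraction(1, 3000)   # Val found at Ile positions

proofreading_factor = required_second_filter(first_filter, observed_error)
print(proofreading_factor)  # 15
```

On these assumptions, proofreading needs to contribute only a further 15-fold discrimination to account for the observed accuracy, consistent with the "greater than tenfold" improvement described above.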
[Structures: valine, ⁺H₃N–CH(COO⁻)–CH(CH₃)₂; isoleucine, ⁺H₃N–CH(COO⁻)–CH(CH₃)–CH₂–CH₃]

The overall error rate of protein synthesis (~1 mistake
per 10⁴ amino acids incorporated) is not nearly as
low as that of DNA replication. Because flaws in a protein
are eliminated when the protein is degraded and
are not passed on to future generations, they have less
biological significance. The degree of fidelity in protein
synthesis is sufficient to ensure that most proteins contain
no mistakes and that the large amount of energy
required to synthesize a protein is rarely wasted. One
defective protein molecule is usually unimportant when
many correct copies of the same protein are present.
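A quick calculation shows why an error rate of about 1 in 10⁴ still leaves most proteins free of mistakes. This sketch assumes that errors occur independently at every position with the same probability, which is an idealization; the function name `fraction_error_free` and the chain lengths used are illustrative choices, not values from the text.

```python
def fraction_error_free(n_residues, error_rate=1e-4):
    """Probability that a polypeptide of n_residues contains no misincorporation,
    assuming independent errors with the same rate at every position."""
    return (1.0 - error_rate) ** n_residues


# Illustrative chain lengths (hypothetical examples).
for length in (100, 300, 1000):
    print(length, round(fraction_error_free(length), 3))
```

For a chain of a few hundred residues, roughly 97% of the molecules made would carry no error at all under these assumptions, and even a 1,000-residue chain would be error-free about 90% of the time.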
Interaction between an Aminoacyl-tRNA Synthetase and a tRNA:
A “Second Genetic Code” An individual aminoacyl-tRNA
synthetase must be specific not only for a single amino
acid but for certain tRNAs as well. Discriminating among
dozens of tRNAs is just as important for the overall fi-
delity of protein biosynthesis as is distinguishing among
amino acids. The interaction between aminoacyl-tRNA
synthetases and tRNAs has been referred to as the “sec-
ond genetic code,” reflecting its critical role in main-
taining the accuracy of protein synthesis. The “coding”
rules appear to be more complex than those in the “first”
code.
Figure 27–16 summarizes what we know about the
nucleotides involved in recognition by some aminoacyl-
tRNA synthetases. Some nucleotides are conserved in all
tRNAs and therefore cannot be used for discrimination.

FIGURE 27–15 General structure of aminoacyl-tRNAs. The aminoacyl
group is esterified to the 3′ position of the terminal A residue.
The ester linkage that both activates the amino acid and joins it to the
tRNA is shaded pink.

FIGURE 27–16 Nucleotide positions in tRNAs that are recognized
by aminoacyl-tRNA synthetases. Some positions (blue dots) are the
same in all tRNAs and therefore cannot be used to discriminate one
from another. Other positions are known recognition points for one
(orange) or more (green) aminoacyl-tRNA synthetases. Structural
features other than sequence are important for recognition by some of
the synthetases.

By observing changes in nucleotides that alter substrate
specificity, researchers have identified nucleotide posi-
tions that are involved in discrimination by the amino-
acyl-tRNA synthetases. These nucleotide positions seem
to be concentrated in the amino acid arm and the anti-
codon arm, including the nucleotides of the anticodon
itself, but are also located in other parts of the tRNA
molecule. Determination of the crystal structures of
aminoacyl-tRNA synthetases complexed with their cog-
nate tRNAs and ATP has added a great deal to our un-
derstanding of these interactions (Fig. 27–17).
Ten or more specific nucleotides may be involved
in recognition of a tRNA by its specific aminoacyl-tRNA
synthetase. But in a few cases the recognition mecha-
nism is quite simple. Across a range of organisms from
bacteria to humans, the primary determinant of tRNA
recognition by the Ala-tRNA synthetases is a single
G=U base pair in the amino acid arm of tRNA^Ala (Fig.
27–18a). A short RNA with as few as 7 bp arranged in
a simple hairpin minihelix is efficiently aminoacylated
by the Ala-tRNA synthetase, as long as the RNA contains
the critical G=U (Fig. 27–18b). This relatively simple
alanine system may be an evolutionary relic of a period
when RNA oligonucleotides, ancestors to tRNA,
were aminoacylated in a primitive system for protein
synthesis.
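
The single-base-pair rule for alanine can be sketched as a simple check on an acceptor stem. This is a minimal sketch under stated assumptions: the stem is given as two equal-length strings, the pair tested is the third pair of the stem (taken from the G and U70 labels of Figure 27–18, not from the text itself), and the example stem sequences are illustrative rather than quoted from the source.

```python
def has_alanine_identity_pair(five_prime_strand: str,
                              partner_strand: str,
                              pair_index: int = 2) -> bool:
    """Return True if the stem pair at pair_index is G opposite U.

    five_prime_strand: stem nucleotides read 5'->3', starting at position 1.
    partner_strand: the pairing partner of each of those nucleotides, listed
        in the same order (partner of position 1 first).
    pair_index: 0-based index of the pair to test; 2 means the third pair
        (an assumption explained in the lead-in).
    """
    if len(five_prime_strand) != len(partner_strand):
        raise ValueError("strands must have the same length")
    if not 0 <= pair_index < len(five_prime_strand):
        return False
    return (five_prime_strand[pair_index].upper() == "G"
            and partner_strand[pair_index].upper() == "U")


if __name__ == "__main__":
    # Illustrative 7-bp stems: one with G opposite U at the third pair, one without.
    print(has_alanine_identity_pair("GGGGCUA", "CCUCGAU"))
    print(has_alanine_identity_pair("GGGGCUA", "CCCCGAU"))
```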
Stage 2: A Specific Amino Acid Initiates
Protein Synthesis
Protein synthesis begins at the amino-terminal end and
proceeds by the stepwise addition of amino acids to the
carboxyl-terminal end of the growing polypeptide, as de-
termined by Howard Dintzis in 1961 (Fig. 27–19). The
AUG initiation codon thus specifies an amino-terminal
methionine residue. Although methionine has only one
codon, (5′)AUG, all organisms have two tRNAs for methionine.
One is used exclusively when (5′)AUG is the
initiation codon for protein synthesis. The other is used
to code for a Met residue in an internal position in a
polypeptide.
The distinction between an initiating (5′)AUG and
an internal one is straightforward. In bacteria, the two
types of tRNA specific for methionine are designated
tRNA^Met and tRNA^fMet. The amino acid incorporated in
response to the (5′)AUG initiation codon is N-formylmethionine
(fMet). It arrives at the ribosome as
N-formylmethionyl-tRNA^fMet (fMet-tRNA^fMet), which is
formed in two successive reactions. First, methionine is
attached to tRNA^fMet by the Met-tRNA synthetase (which
in E. coli aminoacylates both tRNA^fMet and tRNA^Met):

Methionine + tRNA^fMet + ATP → Met-tRNA^fMet + AMP + PP_i

FIGURE 27–17 Aminoacyl-tRNA synthetases. Both synthetases are
complexed with their cognate tRNAs (green stick structures). Bound
ATP (red) pinpoints the active site near the end of the aminoacyl arm.
(a) Gln-tRNA synthetase from E. coli, a typical monomeric type I
synthetase (PDB ID 1QRT). (b) Asp-tRNA synthetase from yeast, a
typical dimeric type II synthetase (PDB ID 1ASZ).

Next, a transformylase transfers a formyl group from
N^10-formyltetrahydrofolate to the amino group of the
Met residue:

N^10-Formyltetrahydrofolate + Met-tRNA^fMet → tetrahydrofolate + fMet-tRNA^fMet

The transformylase is more selective than the Met-tRNA
synthetase; it is specific for Met residues attached to
tRNA^fMet, presumably recognizing some unique structural
feature of that tRNA. By contrast, Met-tRNA^Met
inserts methionine in interior positions in polypeptides.
Addition of the N-formyl group to the amino group
of methionine by the transformylase prevents fMet from
entering interior positions in a polypeptide while also
allowing fMet-tRNA^fMet to be bound at a specific ribosomal
initiation site that accepts neither Met-tRNA^Met
nor any other aminoacyl-tRNA.

[Structural formula: N-formylmethionine]

FIGURE 27–18 Structural elements of tRNA^Ala that
are required for recognition by Ala-tRNA synthetase.
(a) The tRNA^Ala structural elements recognized by the
Ala-tRNA synthetase are unusually simple. A single
G=U base pair (pink) is the only element needed for
specific binding and aminoacylation. (b) A short
synthetic RNA minihelix, with the critical G=U base
pair but lacking most of the remaining tRNA structure.
This is specifically aminoacylated with alanine almost
as efficiently as the complete tRNA^Ala.

FIGURE 27–19 Proof that polypeptides grow by addition of amino
acid residues to the carboxyl end: the Dintzis experiment. Reticulocytes
(immature erythrocytes) actively synthesizing hemoglobin were
incubated with radioactive leucine (selected because it occurs
frequently in both the α- and β-globin chains). Samples of completed α
chains were isolated from the reticulocytes at various times afterward
(4, 7, 16, and 60 min), and the distribution of radioactivity was
determined. The dark red zones show the portions of completed α-globin
chains containing radioactive Leu residues. At 4 min, only a few
residues at the carboxyl end of α-globin were labeled, because the only
complete globin chains with incorporated label after 4 min were those
that had nearly completed synthesis at the time the label was added.
With longer incubation times, successively longer segments of the
polypeptide contained labeled residues, always in a block at the
carboxyl end of the chain. The unlabeled end of the polypeptide (the
amino terminus) was thus defined as the initiating end, which means
that polypeptides grow by successive addition of amino acids to the
carboxyl end.

In eukaryotic cells, all polypeptides synthesized by
cytosolic ribosomes begin with a Met residue (rather than
fMet), but, again, the cell uses a specialized initiating
tRNA that is distinct from the tRNA^Met used at (5′)AUG
codons at interior positions in the mRNA. Polypeptides
synthesized by mitochondrial and chloroplast ribo-
somes, however, begin with N-formylmethionine. This
strongly supports the view that mitochondria and
chloroplasts originated from bacterial ancestors that
were symbiotically incorporated into precursor eukary-
otic cells at an early stage of evolution (see Fig. 1–36).
How can the single (5′)AUG codon distinguish between
the starting N-formylmethionine (or methionine,
in eukaryotes) and interior Met residues? The details of
the initiation process provide the answer.
The Three Steps of Initiation The initiation of polypeptide
synthesis in bacteria requires (1) the 30S ribosomal
subunit, (2) the mRNA coding for the polypeptide to be
made, (3) the initiating fMet-tRNA^fMet, (4) a set of three
proteins called initiation factors (IF-1, IF-2, and IF-3),
(5) GTP, (6) the 50S ribosomal subunit, and (7) Mg^2+.
Formation of the initiation complex takes place in three
steps (Fig. 27–20).
In step 1 the 30S ribosomal subunit binds two initiation
factors, IF-1 and IF-3. Factor IF-3 prevents the
30S and 50S subunits from combining prematurely. The
mRNA then binds to the 30S subunit. The initiating
(5′)AUG is guided to its correct position by the
Shine-Dalgarno sequence (named for Australian researchers
John Shine and Lynn Dalgarno, who identified it) in the
mRNA. This consensus sequence is an initiation signal
of four to nine purine residues, 8 to 13 bp to the 5′ side
of the initiation codon (Fig. 27–21a). The sequence
base-pairs with a complementary pyrimidine-rich sequence
near the 3′ end of the 16S rRNA of the 30S ribosomal
subunit (Fig. 27–21b). This mRNA-rRNA interaction
positions the initiating (5′)AUG sequence of
the mRNA in the precise position on the 30S subunit
where it is required for initiation of translation. The
particular (5′)AUG where fMet-tRNA^fMet is to be bound is
distinguished from other methionine codons by its proximity
to the Shine-Dalgarno sequence in the mRNA.
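
The proximity rule described above can be sketched as a search for AUG codons that have a purine-rich stretch a short distance upstream. This is a minimal sketch, not a model of the ribosome. It assumes that "four to nine purines" can be approximated by any window of four consecutive purines, that the 8 to 13 nucleotide spacing is measured from the first nucleotide of that window to the A of the AUG, and that complementarity to the 16S rRNA and non-AUG start codons (such as the GUG of lacI) can be ignored. The function and parameter names are hypothetical. The two test sequences are the trpA and araB fragments shown in Figure 27–21.

```python
from typing import List

PURINES = frozenset("AG")


def find_initiation_codons(mrna: str,
                           min_purines: int = 4,
                           min_offset: int = 8,
                           max_offset: int = 13) -> List[int]:
    """Return 0-based positions of AUG codons that have a run of at least
    min_purines purines starting min_offset..max_offset nucleotides upstream."""
    mrna = mrna.upper()
    starts = []
    i = mrna.find("AUG")
    while i != -1:
        for offset in range(min_offset, max_offset + 1):
            j = i - offset
            if j < 0:          # ran off the 5' end; larger offsets also fail
                break
            window = mrna[j:j + min_purines]
            if len(window) == min_purines and set(window) <= PURINES:
                starts.append(i)
                break
        i = mrna.find("AUG", i + 1)
    return starts


if __name__ == "__main__":
    trpA = "AGCACGAGGGGAAAUCUGAUGGAACGCUAC"
    araB = "UUUGGAUGGAGUGAAACGAUGGCGAUUGCA"
    print(find_initiation_codons(trpA))
    print(find_initiation_codons(araB))
```

In the araB fragment the AUG near the 5′ end has no purine-rich stretch upstream and is skipped, while the AUG preceded by GGAG is reported.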
Bacterial ribosomes have three sites that bind
aminoacyl-tRNAs, the aminoacyl (A) site, the pep-
tidyl (P) site, and the exit (E) site. Both the 30S
and the 50S subunits contribute to the characteristics
of the A and P sites, whereas the E site is largely confined
to the 50S subunit. The initiating (5′)AUG is
positioned at the P site, the only site to which
fMet-tRNA^fMet can bind (Fig. 27–20). The fMet-tRNA^fMet is
the only aminoacyl-tRNA that binds first to the P site;
during the subsequent elongation stage, all other incoming
aminoacyl-tRNAs (including the Met-tRNA^Met
that binds to interior AUG codons) bind first to the A
site and only subsequently to the P and E sites. The E
site is the site from which the “uncharged” tRNAs leave
during elongation. Factor IF-1 binds at the A site and
prevents tRNA binding at this site during initiation.

FIGURE 27–20 Formation of the initiation complex in bacteria. The
complex forms in three steps (described in the text) at the expense of
the hydrolysis of GTP to GDP and P_i. IF-1, IF-2, and IF-3 are
initiation factors. P designates the peptidyl site, A the aminoacyl site,
and E the exit site. Here the anticodon of the tRNA is oriented 3′ to 5′,
left to right, as in Figure 27–8 but opposite to the orientation in
Figures 27–16 and 27–18.

In step 2 of the initiation process (Fig. 27–20), the
complex consisting of the 30S ribosomal subunit, IF-3,
and mRNA is joined by both GTP-bound IF-2 and the
initiating fMet-tRNA^fMet. The anticodon of this tRNA
now pairs correctly with the mRNA’s initiation codon.
In step 3 this large complex combines with the 50S
ribosomal subunit; simultaneously, the GTP bound to
IF-2 is hydrolyzed to GDP and P_i, which are released
from the complex. All three initiation factors depart
from the ribosome at this point.
Completion of the steps in Figure 27–20 produces
a functional 70S ribosome called the initiation complex,
containing the mRNA and the initiating fMet-tRNA^fMet.
The correct binding of the fMet-tRNA^fMet to
the P site in the complete 70S initiation complex is assured
by at least three points of recognition and attachment:
the codon-anticodon interaction involving the
initiation AUG fixed in the P site; interaction between
the Shine-Dalgarno sequence in the mRNA and the 16S
rRNA; and binding interactions between the ribosomal
P site and the fMet-tRNA^fMet. The initiation complex is
now ready for elongation.

FIGURE 27–21 Messenger RNA sequences that serve as signals for
initiation of protein synthesis in bacteria. (a) Alignment of the
initiating AUG (shaded in green) at its correct location on the 30S
ribosomal subunit depends in part on upstream Shine-Dalgarno sequences
(pink). Portions of the mRNA transcripts of five prokaryotic genes are
shown. Note the unusual example of the E. coli LacI protein, which
initiates with a GUG (Val) codon (see Box 27–2). (b) The Shine-Dalgarno
sequence of the mRNA pairs with a sequence near the 3′
end of the 16S rRNA.

(a) mRNA sequences, written (5′) to (3′); the initiation codon is
nucleotides 19 to 21 of each line:
E. coli trpA             AGCACGAGGGGAAAUCUGAUGGAACGCUAC
E. coli araB             UUUGGAUGGAGUGAAACGAUGGCGAUUGCA
E. coli lacI             CAAUUCAGGGUGGUGAAUGUGAAACCAGUA
φX174 phage A protein    AAUCUUGGAGGCUUUUUUAUGGUUCGUUCU
λ phage cro              AUGUACUAAGGAGGUUGUAUGGAACAACGC

(b) Prokaryotic mRNA with consensus Shine-Dalgarno sequence:
(5′) GAUUCCUAGGAGGUUUGACCUAUGCGAGCUUUUAGU (3′)
3′ end of 16S rRNA:
(3′) HO-AUUCCUCCACUAG (5′)

FIGURE 27–22 Protein complexes in the formation of a eukaryotic
initiation complex. The 3′ and 5′ ends of eukaryotic mRNAs are linked
by a complex of proteins that includes several initiation factors and
the poly(A) binding protein (PAB). The factors eIF4E and eIF4G are
part of a larger complex called eIF4F. This complex binds to the 40S
ribosomal subunit.

Initiation in Eukaryotic Cells Translation is generally similar
in eukaryotic and bacterial cells; most of the significant
differences are in the mechanism of initiation. Eukaryotic
mRNAs are bound to the ribosome as a complex
with a number of specific binding proteins. Several of
these tie together the 5′ and 3′ ends of the message. At
the 3′ end, the mRNA is bound by the poly(A) binding
protein (PAB). Eukaryotic cells have at least nine initiation
factors. A complex called eIF4F, which includes
the proteins eIF4E, eIF4G, and eIF4A, binds to the 5′
cap (see Fig. 26–12) through eIF4E. The protein eIF4G
binds to both eIF4E and PAB, effectively tying them together
(Fig. 27–22). The protein eIF4A has an RNA helicase
activity. It is the eIF4F complex that associates
with another factor, eIF3, and with the 40S ribosomal
subunit. The efficiency of translation is affected by many
properties of the mRNA and proteins in this complex,
including the length of the 3′ poly(A) tract (in most
cases, longer is better). The end-to-end arrangement of
the eukaryotic mRNA facilitates translational regulation
of gene expression, considered in Chapter 28.
The initiating (5′)AUG is detected within the mRNA
not by its proximity to a Shine-Dalgarno-like sequence
but by a scanning process: a scan of the mRNA from the
5′ end until the first AUG is encountered, signaling the
beginning of the reading frame. The eIF4F complex is
probably involved in this process, perhaps using the
RNA helicase activity of eIF4A to eliminate secondary
structure in the 5′ untranslated portion of the mRNA.
Scanning is also facilitated by another protein, eIF4B.
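
The scanning rule lends itself to a very small sketch: read from the 5′ end, stop at the first AUG, and take codons in that frame. The code below assumes the simplest form of the rule as stated in the text and ignores everything else that influences start-site choice. The function names and the example sequence are hypothetical.

```python
from typing import List


def scan_for_start(mrna: str) -> int:
    """Return the 0-based index of the first AUG from the 5' end, or -1."""
    return mrna.upper().find("AUG")


def reading_frame(mrna: str) -> List[str]:
    """Split the mRNA into complete codons starting at the first AUG."""
    start = scan_for_start(mrna)
    if start == -1:
        return []
    mrna = mrna.upper()
    return [mrna[k:k + 3] for k in range(start, len(mrna) - 2, 3)]


if __name__ == "__main__":
    # Hypothetical message; the second AUG lies inside the frame set by the first.
    print(reading_frame("GCCACCAUGGCUAUGA"))
```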
The roles of the various bacterial and eukaryotic ini-
tiation factors in the overall process are summarized in
Table 27–8. The mechanism by which these proteins act
is an important area of investigation.
Stage 3: Peptide Bonds Are Formed in the
Elongation Stage
The third stage of protein synthesis is elongation.
Again, our initial focus is on bacterial cells. Elongation
requires (1) the initiation complex described above, (2)
aminoacyl-tRNAs, (3) a set of three soluble cytosolic
proteins called elongation factors (EF-Tu, EF-Ts, and
EF-G in bacteria), and (4) GTP. Cells use three steps to
add each amino acid residue, and the steps are repeated
as many times as there are residues to be added.
Elongation Step 1: Binding of an Incoming Aminoacyl-tRNA In
the first step of the elongation cycle (Fig. 27–23), the
appropriate incoming aminoacyl-tRNA binds to a com-
plex of GTP-bound EF-Tu. The resulting aminoacyl-
tRNA–EF-Tu–GTP complex binds to the A site of the
70S initiation complex. The GTP is hydrolyzed and an
EF-Tu–GDP complex is released from the 70S ribosome.
The EF-Tu–GTP complex is regenerated in a process in-
volving EF-Ts and GTP.
Elongation Step 2: Peptide Bond Formation A peptide bond
is now formed between the two amino acids bound by
their tRNAs to the A and P sites on the ribosome. This
occurs by the transfer of the initiating N-formylme-
thionyl group from its tRNA to the amino group of the
second amino acid, now in the A site (Fig. 27–24). The
α-amino group of the amino acid in the A site acts as a
nucleophile, displacing the tRNA in the P site to form
the peptide bond. This reaction produces a dipeptidyl-tRNA
in the A site, and the now “uncharged” (deacylated)
tRNA^fMet remains bound to the P site. The tRNAs
then shift to a hybrid binding state, with elements of
each spanning two different sites on the ribosome, as
shown in Figure 27–24.
The enzymatic activity that catalyzes peptide bond
formation has historically been referred to as peptidyl
transferase and was widely assumed to be intrinsic to
one or more of the proteins in the large ribosomal sub-
unit. We now know that this reaction is catalyzed by the
23S rRNA (Fig. 27–9), adding to the known catalytic
repertoire of ribozymes. This discovery has interesting
implications for the evolution of life (Box 27–3).

TABLE 27–8 Protein Factors Required for Initiation of Translation in Bacterial and Eukaryotic Cells

Factor         Function
Bacterial
IF-1           Prevents premature binding of tRNAs to A site
IF-2           Facilitates binding of fMet-tRNA^fMet to 30S ribosomal subunit
IF-3           Binds to 30S subunit; prevents premature association of 50S
               subunit; enhances specificity of P site for fMet-tRNA^fMet
Eukaryotic*
eIF2           Facilitates binding of initiating Met-tRNA^Met to 40S ribosomal subunit
eIF2B, eIF3    First factors to bind 40S subunit; facilitate subsequent steps
eIF4A          RNA helicase activity removes secondary structure in the mRNA to
               permit binding to 40S subunit; part of the eIF4F complex
eIF4B          Binds to mRNA; facilitates scanning of mRNA to locate the first AUG
eIF4E          Binds to the 5′ cap of mRNA; part of the eIF4F complex
eIF4G          Binds to eIF4E and to poly(A) binding protein (PAB); part of the
               eIF4F complex
eIF5           Promotes dissociation of several other initiation factors from 40S
               subunit as a prelude to association of 60S subunit to form 80S
               initiation complex
eIF6           Facilitates dissociation of inactive 80S ribosome into 40S and 60S
               subunits

*The prefix “e” identifies these as eukaryotic factors.

FIGURE 27–23 First elongation step in bacteria: binding of the second
aminoacyl-tRNA. The second aminoacyl-tRNA enters the A site
of the ribosome bound to EF-Tu (shown here as Tu), which also contains
GTP. Binding of the second aminoacyl-tRNA to the A site is accompanied
by hydrolysis of the GTP to GDP and P_i and release of the
EF-Tu–GDP complex from the ribosome. The bound GDP is released
when the EF-Tu–GDP complex binds to EF-Ts, and EF-Ts is subsequently
released when another molecule of GTP binds to EF-Tu. This
recycles EF-Tu and makes it available to repeat the cycle.

FIGURE 27–24 Second elongation step in bacteria: formation of the
first peptide bond. The peptidyl transferase catalyzing this reaction is
the 23S rRNA ribozyme. The N-formylmethionyl group is transferred
to the amino group of the second aminoacyl-tRNA in the A site, forming
a dipeptidyl-tRNA. At this stage, both tRNAs bound to the ribosome
shift position in the 50S subunit to take up a hybrid binding
state. The uncharged tRNA shifts so that its 3′ and 5′ ends are in the
E site. Similarly, the 3′ and 5′ ends of the peptidyl tRNA shift to the
P site. The anticodons remain in the A and P sites.

Elongation Step 3: Translocation In the final step of the
elongation cycle, translocation, the ribosome moves
one codon toward the 3′ end of the mRNA (Fig. 27–25a).
This movement shifts the anticodon of the dipeptidyl-
tRNA, which is still attached to the second codon of the
mRNA, from the A site to the P site, and shifts the de-
acylated tRNA from the P site to the E site, from where
the tRNA is released into the cytosol. The third codon
of the mRNA now lies in the A site and the second codon
in the P site. Movement of the ribosome along the mRNA
requires EF-G (also known as translocase) and the en-
ergy provided by hydrolysis of another molecule of GTP.
A change in the three-dimensional conformation of the
entire ribosome results in its movement along the mRNA.
Because the structure of EF-G mimics the structure of
the EF-Tu–tRNA complex (Fig. 27–25b), EF-G can bind
the A site and presumably displace the peptidyl-tRNA.
The ribosome, with its attached dipeptidyl-tRNA and
mRNA, is now ready for the next elongation cycle and
attachment of a third amino acid residue. This process
occurs in the same way as addition of the second residue
(as shown in Figs 27–23, 27–24, and 27–25). For each
amino acid residue correctly added to the growing
polypeptide, two GTPs are hydrolyzed to GDP and P_i as
the ribosome moves from codon to codon along the
mRNA toward the 3′ end.
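
The three-step cycle and its GTP cost can be summarized as a bookkeeping sketch. This is a toy model under stated assumptions: the class and attribute names are hypothetical, the caller supplies the amino acid for each codon (no codon table or proofreading is modeled), and initiation and termination are left out. The example codons are those that follow the start codon of the trpA fragment in Figure 27–21.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElongatingRibosome:
    """Toy bookkeeping model of a bacterial ribosome after initiation."""
    codons: List[str]                  # reading frame, beginning with AUG
    p_site: int = 0                    # index of the codon in the P site
    peptide: List[str] = field(default_factory=lambda: ["fMet"])
    gtp_hydrolyzed: int = 0
    trnas_released: int = 0

    def add_residue(self, amino_acid: str) -> None:
        """Run one elongation cycle for the codon currently in the A site."""
        a_site = self.p_site + 1
        if a_site >= len(self.codons):
            raise ValueError("no codon left in the A site")
        # Step 1: aminoacyl-tRNA binds the A site with EF-Tu-GTP; GTP is hydrolyzed.
        self.gtp_hydrolyzed += 1
        # Step 2: the peptidyl group is transferred to the A-site aminoacyl-tRNA.
        self.peptide.append(amino_acid)
        # Step 3: EF-G-driven translocation; a second GTP is hydrolyzed, the
        # ribosome moves one codon toward the 3' end, and the deacylated tRNA
        # leaves through the E site.
        self.gtp_hydrolyzed += 1
        self.p_site = a_site
        self.trnas_released += 1


if __name__ == "__main__":
    ribosome = ElongatingRibosome(["AUG", "GAA", "CGC", "UAC"])
    for residue in ("Glu", "Arg", "Tyr"):
        ribosome.add_residue(residue)
    print(ribosome.peptide, ribosome.gtp_hydrolyzed)
```

Each call adds one residue and two hydrolyzed GTPs, which is the per-residue count given in the text.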
The polypeptide remains attached to the tRNA of
the most recent amino acid to be inserted. This associ-
ation maintains the functional connection between the
information in the mRNA and its decoded polypeptide
output. At the same time, the ester linkage between this
tRNA and the carboxyl terminus of the growing polypep-
tide activates the terminal carboxyl group for nucleo-
philic attack by the incoming amino acid to form a new
peptide bond (Fig. 27–24). As the existing ester linkage
between the polypeptide and tRNA is broken during
peptide bond formation, the linkage between the polypeptide
and the information in the mRNA persists, because
each newly added amino acid is still attached to
its tRNA.

FIGURE 27–25 Third elongation step in bacteria: translocation.
(a) The ribosome moves one codon toward the 3′ end of the mRNA,
using energy provided by hydrolysis of GTP bound to EF-G (translocase).
The dipeptidyl-tRNA is now entirely in the P site, leaving the A
site open for the incoming (third) aminoacyl-tRNA. The uncharged
tRNA dissociates from the E site, and the elongation cycle begins again.
(b) The structure of EF-G mimics the structure of EF-Tu complexed with
tRNA. Shown here are (left) EF-Tu complexed with tRNA (green) (PDB
ID 1B23) and (right) EF-G complexed with GDP (red) (PDB ID 1DAR).
The carboxyl-terminal part of EF-G (dark gray) mimics the structure of
the anticodon loop of tRNA in both shape and charge distribution.

The elongation cycle in eukaryotes is quite simi-
lar to that in prokaryotes. Three eukaryotic elongation
factors (eEF1α, eEF1βγ, and eEF2) have functions
analogous to those of the bacterial elongation factors
(EF-Tu, EF-Ts, and EF-G, respectively). Eukaryotic
ribosomes do not have an E site; uncharged tRNAs are
expelled directly from the P site.
Proofreading on the Ribosome The GTPase activity of EF-
Tu during the first step of elongation in bacterial cells
(Fig. 27–23) makes an important contribution to the
rate and fidelity of the overall biosynthetic process. Both
the EF-Tu–GTP and EF-Tu–GDP complexes exist for a
few milliseconds before they dissociate. These two in-
tervals provide opportunities for the codon-anticodon
interactions to be proofread. Incorrect aminoacyl-tRNAs
normally dissociate from the A site during one of these
periods. If the GTP analog guanosine 5′-O-(3-thiotriphosphate)
(GTPγS) is used in place of GTP, hydrolysis
is slowed, improving the fidelity (by increasing the
proofreading intervals) but reducing the rate of protein
synthesis.
The process of protein synthesis (including the
characteristics of codon-anticodon pairing already de-
scribed) has clearly been optimized through evolution
to balance the requirements of both speed and fidelity.
Improved fidelity might diminish speed, whereas in-
creases in speed would probably compromise fidelity.
Note that the proofreading mechanism on the ribosome
establishes only that the proper codon-anticodon pair-
ing has taken place. The identity of the amino acid at-
tached to a tRNA is not checked on the ribosome. If a
tRNA is successfully aminoacylated with the wrong
amino acid (as can be done experimentally), this incor-
rect amino acid is efficiently incorporated into a protein
in response to whatever codon is normally recognized
by the tRNA.
Stage 4: Termination of Polypeptide Synthesis
Requires a Special Signal
Elongation continues until the ribosome adds the last
amino acid coded by the mRNA. Termination, the
fourth stage of polypeptide synthesis, is signaled by the
presence of one of three termination codons in the
mRNA (UAA, UAG, UGA), immediately following the fi-
nal coded amino acid. Mutations in a tRNA anticodon
that allow an amino acid to be inserted at a termination
codon are generally deleterious to the cell (Box 27–4).
In bacteria, once a termination codon occupies the
ribosomal A site, three termination factors, or re-
lease factors—the proteins RF-1, RF-2, and RF-3—
contribute to (1) hydrolysis of the terminal peptidyl-
tRNA bond; (2) release of the free polypeptide and the
last tRNA, now uncharged, from the P site; and (3) dis-
sociation of the 70S ribosome into its 30S and 50S sub-
units, ready to start a new cycle of polypeptide synthe-
sis (Fig. 27–26). RF-1 recognizes the termination
codons UAG and UAA, and RF-2 recognizes UGA and
UAA. Either RF-1 or RF-2 (depending on which codon
is present) binds at a termination codon and induces
peptidyl transferase to transfer the growing polypeptide
to a water molecule rather than to another amino acid.
The release factors have domains thought to mimic the
structure of tRNA, as shown for the elongation factor
EF-G in Figure 27–25b. The specific function of RF-3
has not been firmly established, although it is thought
to release the ribosomal subunit. In eukaryotes, a sin-
gle release factor, eRF, recognizes all three termination
codons.
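
The codon specificities of the bacterial release factors stated above can be written as a small lookup. This is only a restatement of the text in code; the dictionary and function names are hypothetical, and RF-3 is omitted because the text assigns it no codon.

```python
from typing import List

# Termination codons recognized by each bacterial release factor (from the text).
RELEASE_FACTOR_CODONS = {
    "RF-1": {"UAG", "UAA"},
    "RF-2": {"UGA", "UAA"},
}


def release_factors_for(codon: str) -> List[str]:
    """Return the release factors that recognize codon (empty list for sense codons)."""
    codon = codon.upper()
    return sorted(rf for rf, codons in RELEASE_FACTOR_CODONS.items()
                  if codon in codons)


if __name__ == "__main__":
    for codon in ("UAA", "UAG", "UGA", "AUG"):
        print(codon, release_factors_for(codon))
```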
Energy Cost of Fidelity in Protein Synthesis Synthesis of a
protein true to the information specified in its mRNA
requires energy. Formation of each aminoacyl-tRNA
uses two high-energy phosphate groups. An additional
ATP is consumed each time an incorrectly activated
amino acid is hydrolyzed by the deacylation activity of
an aminoacyl-tRNA synthetase, as part of its proofreading
activity. A GTP is cleaved to GDP and P_i during
the first elongation step, and another during the translocation
step. Thus, on average, the energy derived from
the hydrolysis of more than four NTPs to NDPs is re-
quired for the formation of each peptide bond of a
polypeptide.
This represents an exceedingly large thermodynamic
“push” in the direction of synthesis: at least 4 ×
30.5 kJ/mol = 122 kJ/mol of phosphoanhydride bond energy
to generate a peptide bond, which has a standard
free energy of hydrolysis of only about −21 kJ/mol. The
net free-energy change during peptide bond synthesis is
thus −122 + 21 = −101 kJ/mol. Proteins are information-containing
polymers. The biochemical goal is not simply the for-
mation of a peptide bond but the formation of a peptide
bond between two specified amino acids. Each of the
high-energy phosphate compounds expended in this
process plays a critical role in maintaining proper align-
ment between each new codon in the mRNA and its as-
sociated amino acid at the growing end of the polypep-
tide. This energy permits very high fidelity in the
biological translation of the genetic message of mRNA
into the amino acid sequence of proteins.
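
The arithmetic behind the net free-energy figure can be checked directly. The sketch below uses only the values quoted in the text (30.5 kJ/mol per NTP hydrolyzed to NDP, about −21 kJ/mol for hydrolysis of a peptide bond, and a minimum of four NTPs per bond). The constant and function names are hypothetical.

```python
NTP_HYDROLYSIS_KJ_PER_MOL = -30.5           # free energy of NTP -> NDP hydrolysis
PEPTIDE_BOND_HYDROLYSIS_KJ_PER_MOL = -21.0  # free energy of peptide bond hydrolysis


def net_free_energy_per_peptide_bond(ntp_count: int = 4) -> float:
    """Net free-energy change (kJ/mol) for forming one peptide bond when
    ntp_count NTPs are hydrolyzed to NDPs."""
    energy_from_ntps = ntp_count * NTP_HYDROLYSIS_KJ_PER_MOL
    # Forming the bond is the reverse of hydrolysis, so the sign flips.
    bond_formation = -PEPTIDE_BOND_HYDROLYSIS_KJ_PER_MOL
    return energy_from_ntps + bond_formation


if __name__ == "__main__":
    print(net_free_energy_per_peptide_bond())
```

With the default of four NTPs this gives −122 + 21 = −101 kJ/mol. Any extra ATP spent on proofreading makes the value more negative.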
Rapid Translation of a Single Message by Polysomes Large
clusters of 10 to 100 ribosomes that are very active in
protein synthesis can be isolated from both eukaryotic
and bacterial cells. Electron micrographs show a fiber
between adjacent ribosomes in the cluster, which is
called a polysome (Fig. 27–27). The connecting strand
is a single molecule of mRNA that is being translated si-
multaneously by many closely spaced ribosomes, allow-
ing the highly efficient use of the mRNA.
In bacteria, transcription and translation are tightly
coupled. Messenger RNAs are synthesized and translated
in the same 5′→3′ direction. Ribosomes begin
translating the 5′ end of the mRNA before transcription
is complete (Fig. 27–28). The situation is quite differ-
ent in eukaryotic cells, where newly transcribed mRNAs
must leave the nucleus before they can be translated.
Bacterial mRNAs generally exist for just a few min-
utes (p. 1020) before they are degraded by nucleases.
In order to maintain high rates of protein synthesis, the
mRNA for a given protein or set of proteins must be
made continuously and translated with maximum effi-
ciency. The short lifetime of mRNAs in bacteria allows
a rapid cessation of synthesis when the protein is no
longer needed.
Stage 5: Newly Synthesized Polypeptide Chains
Undergo Folding and Processing
In the final stage of protein synthesis, the nascent
polypeptide chain is folded and processed into its bio-
logically active form. During or after its synthesis, the
polypeptide progressively assumes its native conforma-
tion, with the formation of appropriate hydrogen bonds
and van der Waals, ionic, and hydrophobic interactions.
In this way the linear, or one-dimensional, genetic
message in the mRNA is converted into the three-
dimensional structure of the protein. Some newly made
proteins, both prokaryotic and eukaryotic, do not attain
their final biologically active conformation until they
have been altered by one or more processing reactions
called posttranslational modifications.
Amino-Terminal and Carboxyl-Terminal Modifications The first
residue inserted in all polypeptides is N-formylmethio-
nine (in bacteria) or methionine (in eukaryotes). How-
ever, the formyl group, the amino-terminal Met residue,
and often additional amino-terminal (and, in some cases,
carboxyl-terminal) residues may be removed enzymat-
ically in formation of the final functional protein. In as
many as 50% of eukaryotic proteins, the amino group
of the amino-terminal residue is N-acetylated after
translation. Carboxyl-terminal residues are also some-
times modified.
Loss of Signal Sequences As we shall see in Section 27.3,
the 15 to 30 residues at the amino-terminal end of some
proteins play a role in directing the protein to its ulti-
mate destination in the cell. Such signal sequences are
ultimately removed by specific peptidases.
Modification of Individual Amino Acids The hydroxyl
groups of certain Ser, Thr, and Tyr residues of some pro-
teins are enzymatically phosphorylated by ATP (Fig.
FIGURE 27–26 Termination of protein synthesis in bacteria. Termi-
nation occurs in response to a termination codon in the A site. First,
a release factor, RF (RF-1 or RF-2, depending on which termination
codon is present), binds to the A site. This leads to hydrolysis of the
ester linkage between the nascent polypeptide and the tRNA in the P
site and release of the completed polypeptide. Finally, the mRNA, de-
acylated tRNA, and release factor leave the ribosome, and the ribo-
some dissociates into its 30S and 50S subunits.
27–29a); the phosphate groups add negative charges to
these polypeptides. The functional significance of this
modification varies from one protein to the next. For
example, the milk protein casein has many phosphoserine
groups that bind Ca²⁺. Calcium, phosphate, and
amino acids are all valuable to suckling young, so casein
efficiently provides three essential nutrients. And as we
have seen in numerous instances, phosphorylation-
dephosphorylation cycles regulate the activity of many
enzymes and regulatory proteins.
Extra carboxyl groups may be added to Glu residues
of some proteins. For example, the blood-clotting protein
prothrombin contains a number of γ-carboxyglutamate
residues (Fig. 27–29b) in its amino-terminal region,
introduced by an enzyme that requires vitamin K.
These carboxyl groups bind Ca²⁺, which is required to
initiate the clotting mechanism.
FIGURE 27–27 Polysome. (a) Four ribosomes translating a eukaryotic
mRNA molecule simultaneously, moving from the 5′ end to the 3′
end and synthesizing a polypeptide from the amino terminus to the
carboxyl terminus. (b) Electron micrograph and explanatory diagram
of a polysome from the silk gland of a silkworm larva. The mRNA is
being translated by many ribosomes simultaneously. The nascent
polypeptides become longer as the ribosomes move toward the 3′ end
of the mRNA. The final product of this process is silk fibroin.
FIGURE 27–28 Coupling of transcription and translation in bacte-
ria. The mRNA is translated by ribosomes while it is still being tran-
scribed from DNA by RNA polymerase. This is possible because the
mRNA in bacteria does not have to be transported from a nucleus to
the cytoplasm before encountering ribosomes. In this schematic dia-
gram the ribosomes are depicted as smaller than the RNA polymerase.
In reality the ribosomes (Mᵣ 2.7 × 10⁶) are an order of magnitude
larger than the RNA polymerase (Mᵣ 3.9 × 10⁵).
Monomethyl- and dimethyllysine residues (Fig.
27–29c) occur in some muscle proteins and in cy-
tochrome c. The calmodulin of most species contains
one trimethyllysine residue at a specific position. In
other proteins, the carboxyl groups of some Glu residues
undergo methylation, removing their negative charge.
Attachment of Carbohydrate Side Chains The carbohydrate
side chains of glycoproteins are attached covalently dur-
ing or after synthesis of the polypeptide. In some gly-
coproteins, the carbohydrate side chain is attached en-
zymatically to Asn residues (N-linked oligosaccharides),
in others to Ser or Thr residues (O-linked oligosaccha-
rides) (see Fig. 7–31). Many proteins that function ex-
tracellularly, as well as the lubricating proteoglycans
that coat mucous membranes, contain oligosaccharide
side chains (see Fig. 7–29).
Addition of Isoprenyl Groups A number of eukaryotic pro-
teins are modified by the addition of groups derived
from isoprene (isoprenyl groups). A thioether bond is
formed between the isoprenyl group and a Cys residue
of the protein (see Fig. 11–14). The isoprenyl groups
are derived from pyrophosphorylated intermediates of
the cholesterol biosynthetic pathway (see Fig. 21–33),
such as farnesyl pyrophosphate (Fig. 27–30). Proteins
modified in this way include the Ras proteins, products
of the ras oncogenes and proto-oncogenes, and G pro-
teins (both discussed in Chapter 12), and lamins, pro-
teins found in the nuclear matrix. The isoprenyl group
helps to anchor the protein in a membrane. The trans-
forming (carcinogenic) activity of the ras oncogene is
lost when isoprenylation of the Ras protein is blocked,
a finding that has stimulated interest in identifying in-
hibitors of this posttranslational modification pathway
for use in cancer chemotherapy.
Addition of Prosthetic Groups Many prokaryotic and eu-
karyotic proteins require for their activity covalently
bound prosthetic groups. Two examples are the biotin
[Figure 27–29 structures: (a) phosphoserine, phosphothreonine, phosphotyrosine; (b) γ-carboxyglutamate; (c) methyllysine, dimethyllysine, trimethyllysine, methylglutamate.]
FIGURE 27–29 Some modified amino acid residues. (a) Phosphorylated amino
acids. (b) A carboxylated amino acid. (c) Some methylated amino acids.
[Figure 27–30 reaction: Ras protein (Cys-SH) + farnesyl pyrophosphate → farnesylated Ras protein + PPᵢ.]
FIGURE 27–30 Farnesylation of a Cys residue. The thioether linkage
is shown in red. The Ras protein is the product of the ras oncogene.
molecule of acetyl-CoA carboxylase and the heme group
of hemoglobin or cytochrome c.
Proteolytic Processing Many proteins are initially syn-
thesized as large, inactive precursor polypeptides that
are proteolytically trimmed to form their smaller, active
forms. Examples include proinsulin, some viral proteins,
and proteases such as chymotrypsinogen and trypsino-
gen (see Fig. 6–33).
Formation of Disulfide Cross-Links After folding into their
native conformations, some proteins form intrachain or
interchain disulfide bridges between Cys residues. In eu-
karyotes, disulfide bonds are common in proteins to be
exported from cells. The cross-links formed in this way
help to protect the native conformation of the protein
molecule from denaturation in the extracellular envi-
ronment, which can differ greatly from intracellular con-
ditions and is generally oxidizing.
Protein Synthesis Is Inhibited by Many Antibiotics
and Toxins
Protein synthesis is a central function in cellular phys-
iology and is the primary target of many naturally oc-
curring antibiotics and toxins. Except as noted, these
antibiotics inhibit protein synthesis in bacteria. The dif-
ferences between bacterial and eukaryotic protein syn-
thesis, though in some cases subtle, are sufficient that
most of the compounds discussed below are relatively
harmless to eukaryotic cells. Natural selection has fa-
vored the evolution of compounds that exploit minor
differences in order to affect bacterial systems selec-
tively, such that these biochemical weapons are syn-
thesized by some microorganisms and are extremely
toxic to others. Because nearly every step in protein
synthesis can be specifically inhibited by one antibiotic
or another, antibiotics have become valuable tools in the
study of protein biosynthesis.
BOX 27–4 WORKING IN BIOCHEMISTRY
Induced Variation in the Genetic Code:
Nonsense Suppression
When a mutation introduces a termination codon in
the interior of a gene, translation is prematurely halted
and the incomplete polypeptide is usually inactive.
These are called nonsense mutations. The gene can
be restored to normal function if a second mutation
either (1) converts the misplaced termination codon
to a codon specifying an amino acid or (2) suppresses
the effects of the termination codon. Such restorative
mutations are called nonsense suppressors; they
generally involve mutations in tRNA genes to produce
altered (suppressor) tRNAs that can recognize the
termination codon and insert an amino acid at that po-
sition. Most known suppressor tRNAs have single base
substitutions in their anticodons.
Suppressor tRNAs constitute an experimentally
induced variation in the genetic code to allow the read-
ing of what are usually termination codons, much like
the naturally occurring code variations described in
Box 27–2. Nonsense suppression does not completely
disrupt normal information transfer in a cell, because
the cell usually has several copies of each tRNA gene;
some of these duplicate genes are weakly expressed
and account for only a minor part of the cellular pool
of a particular tRNA. Suppressor mutations usually in-
volve a “minor” tRNA, leaving the major tRNA to read
its codon normally.
For example, E. coli has three identical genes for
tRNAᵀʸʳ, each producing a tRNA with the anticodon
(5′)GUA. One of these genes is expressed at relatively
high levels and thus its product represents the major
tRNAᵀʸʳ species; the other two genes are transcribed
in only small amounts. A change in the anticodon of
the tRNA product of one of these duplicate tRNAᵀʸʳ
genes, from (5′)GUA to (5′)CUA, produces a minor
tRNAᵀʸʳ species that will insert tyrosine at UAG stop
codons. This insertion of tyrosine at UAG is carried
out inefficiently, but it can produce enough full-length
protein from a gene with a nonsense mutation to allow
the cell to survive. The major tRNAᵀʸʳ continues
to translate the genetic code normally for the majority
of proteins.
The mutation that leads to creation of a sup-
pressor tRNA does not always occur in the anticodon.
The suppression of UGA nonsense codons generally
involves the tRNAᵀʳᵖ that normally recognizes UGG.
The alteration that allows it to read UGA (and insert
Trp residues at these positions) is a G to A change at
position 24 (in an arm of the tRNA somewhat removed
from the anticodon); this tRNA can now recognize
both UGG and UGA. A similar change is found
in tRNAs involved in the most common naturally occurring
variation in the genetic code (UGA = Trp; see
Box 27–2).
Suppression should lead to many abnormally long
proteins, but this does not always occur. We under-
stand only a few details of the molecular events in
translation termination and nonsense suppression.
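The anticodon change described in this box, (5′)GUA to (5′)CUA in a minor tyrosine tRNA, can be traced with a short sketch. The code below is illustrative only: the function names, the tiny codon table, and the example message are hypothetical, pairing is treated as strict Watson-Crick (no wobble), and suppression is modeled as fully efficient, which the text notes it is not.

```python
from typing import List, Optional

# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Deliberately tiny codon table: only the codons used in this example.
CODON_TABLE = {
    "AUG": "Met", "UAC": "Tyr", "GGC": "Gly",
    "UAG": "Stop", "UAA": "Stop",
}


def codon_read_by(anticodon: str) -> str:
    """Codon (5'->3') that pairs with an anticodon written 5'->3'.

    Pairing is antiparallel, so the anticodon is reversed before it is
    complemented. Wobble pairing is ignored (simplifying assumption).
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def translate(mrna: str, suppressor_anticodon: Optional[str] = None,
              suppressor_residue: str = "Tyr") -> List[str]:
    """Translate in frame from the first base until an unsuppressed stop codon."""
    suppressed = codon_read_by(suppressor_anticodon) if suppressor_anticodon else None
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE[codon]
        if residue == "Stop":
            if codon != suppressed:
                break  # ordinary termination
            residue = suppressor_residue  # suppressor tRNA reads through the stop
        peptide.append(residue)
    return peptide


# Hypothetical message: an internal UAG nonsense codon, then a normal UAA stop.
mrna = "AUGUACUAGGGCUAA"
print(codon_read_by("GUA"))    # codon read by the normal tyrosine tRNA
print(codon_read_by("CUA"))    # codon read by the suppressor tRNA
print(translate(mrna))         # truncated product without a suppressor
print(translate(mrna, "CUA"))  # read-through product with the suppressor
```

Because the suppressor in this sketch recognizes only UAG, a gene that ends in UAA or UGA still terminates normally, which is one simple reason suppression need not produce abnormally long proteins.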
[Structures shown: tetracycline and chloramphenicol.]
Puromycin, made by the bacterium Streptomyces alboniger,
is one of the best-understood inhibitory antibiotics.
Its structure is very similar to the 3′ end of an
aminoacyl-tRNA, enabling it to bind to the ribosomal A
site and participate in peptide bond formation, producing
peptidyl-puromycin (Fig. 27–31). However, because
puromycin resembles only the 3′ end of the tRNA, it
does not engage in translocation and dissociates from
the ribosome shortly after it is linked to the carboxyl
terminus of the peptide. This prematurely terminates
polypeptide synthesis.
Tetracyclines inhibit protein synthesis in bacteria
by blocking the A site on the ribosome, preventing the
binding of aminoacyl-tRNAs. Chloramphenicol in-
hibits protein synthesis by bacterial (and mitochondrial
FIGURE 27–31 Disruption of peptide bond formation by puromycin.
(a) The antibiotic puromycin resembles the aminoacyl end of a charged
tRNA, and it can bind to the ribosomal A site and participate in pep-
tide bond formation. The product of this reaction, instead of being
translocated to the P site, dissociates from the ribosome, causing pre-
mature chain termination. (b) Peptidyl puromycin.
[Structures shown: cycloheximide and streptomycin.]
and chloroplast) ribosomes by blocking peptidyl trans-
fer; it does not affect cytosolic protein synthesis in eu-
karyotes. Conversely, cycloheximide blocks the pep-
tidyl transferase of 80S eukaryotic ribosomes but not
that of 70S bacterial (and mitochondrial and chloro-
plast) ribosomes. Streptomycin, a basic trisaccharide,
causes misreading of the genetic code (in bacteria) at
relatively low concentrations and inhibits initiation at
higher concentrations.
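The selectivity described above can be collected into a small lookup table. The sketch below is only a reading aid: the record type, the labels for ribosome classes, and the helper name are hypothetical, and each entry records only the systems this passage names as affected, so absence from a set should not be read as evidence of resistance. Puromycin is left out because the passage does not state its range.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Inhibitor(NamedTuple):
    targets: FrozenSet[str]  # ribosome classes the passage names as affected
    effect: str              # step blocked, as described in the passage


INHIBITORS: Dict[str, Inhibitor] = {
    "tetracycline": Inhibitor(
        frozenset({"bacterial 70S"}),
        "blocks the A site, preventing aminoacyl-tRNA binding"),
    "chloramphenicol": Inhibitor(
        frozenset({"bacterial 70S", "mitochondrial", "chloroplast"}),
        "blocks peptidyl transfer"),
    "cycloheximide": Inhibitor(
        frozenset({"eukaryotic 80S"}),
        "blocks peptidyl transferase"),
    "streptomycin": Inhibitor(
        frozenset({"bacterial 70S"}),
        "causes misreading at low concentration, inhibits initiation at high"),
}


def inhibitors_for(ribosome: str) -> List[str]:
    """Names of listed inhibitors that the passage says act on this ribosome class."""
    return sorted(name for name, inh in INHIBITORS.items() if ribosome in inh.targets)


print(inhibitors_for("bacterial 70S"))
print(inhibitors_for("eukaryotic 80S"))
print(inhibitors_for("mitochondrial"))
```

Laid out this way, the reciprocal behavior of chloramphenicol and cycloheximide is easy to see: both act at peptidyl transfer, but on non-overlapping sets of ribosomes.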
Several other inhibitors of protein synthesis are notable
because of their toxicity to humans and other
mammals. Diphtheria toxin (Mᵣ 58,330) catalyzes the
ADP-ribosylation of a diphthamide (a modified histidine)
residue of eukaryotic elongation factor eEF2,
thereby inactivating it. Ricin (Mᵣ 29,895), an extremely
toxic protein of the castor bean, inactivates the 60S subunit
of eukaryotic ribosomes by depurinating a specific
adenosine in 28S rRNA.
SUMMARY 27.2 Protein Synthesis
■ Protein synthesis occurs on the ribosomes,
which consist of protein and rRNA. Bacteria
have 70S ribosomes, with a large (50S) and a
small (30S) subunit. Eukaryotic ribosomes are
significantly larger (80S) and contain more
proteins.
■ Transfer RNAs have 73 to 93 nucleotide
residues, some of which have modified bases.
Each tRNA has an amino acid arm with the
terminal sequence CCA(3′) to which an amino
acid is esterified, an anticodon arm, a TψC arm,
and a D arm; some tRNAs have a fifth arm. The
anticodon is responsible for the specificity of
interaction between the aminoacyl-tRNA and
the complementary mRNA codon.
■ The growth of polypeptides on ribosomes
begins with the amino-terminal amino acid and
proceeds by successive additions of new
residues to the carboxyl-terminal end.
■ Protein synthesis occurs in five stages.
1. Amino acids are activated by specific
aminoacyl-tRNA synthetases in the cytosol.
These enzymes catalyze the formation of
aminoacyl-tRNAs, with simultaneous cleavage
of ATP to AMP and PPᵢ. The fidelity of protein
synthesis depends on the accuracy of this
reaction, and some of these enzymes carry out
proofreading steps at separate active sites. In
bacteria, the initiating aminoacyl-tRNA in all
proteins is N-formylmethionyl-tRNAᶠᴹᵉᵗ.
2. Initiation of protein synthesis involves
formation of a complex between the 30S
ribosomal subunit, mRNA, GTP, fMet-tRNAᶠᴹᵉᵗ,
three initiation factors, and the 50S subunit;
GTP is hydrolyzed to GDP and Pᵢ.
3. In the elongation steps, GTP and elongation
factors are required for binding the incoming
aminoacyl-tRNA to the A site on the ribosome.
In the first peptidyl transfer reaction, the fMet
residue is transferred to the amino group of the
incoming aminoacyl-tRNA. Movement of the
ribosome along the mRNA then translocates
the dipeptidyl-tRNA from the A site to the P
site, a process requiring hydrolysis of GTP.
Deacylated tRNAs dissociate from the
ribosomal E site.
4. After many such elongation cycles, synthesis
of the polypeptide is terminated with the aid of
release factors. At least four high-energy
phosphate equivalents (from ATP and GTP) are
required to generate each peptide bond, an
energy investment required to guarantee
fidelity of translation.
5. Polypeptides fold into their active,
three-dimensional forms. Many proteins are
further processed by posttranslational
modification reactions.
■ Many well-studied antibiotics and toxins inhibit
some aspect of protein synthesis.
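The summary's figure of at least four high-energy phosphate equivalents per peptide bond can be turned into a rough lower bound for a whole polypeptide. The sketch below is a back-of-the-envelope estimate, not a full energy budget: the constant and function names are hypothetical, the split into 2 + 1 + 1 follows the steps named in the summary (ATP cleaved to AMP and PPᵢ during activation, one GTP for aminoacyl-tRNA binding, one GTP for translocation), and initiation, termination, and proofreading costs are ignored.

```python
# Per-bond costs in high-energy phosphate equivalents (see assumptions above).
ACTIVATION = 2         # ATP -> AMP + PPi when the amino acid is esterified to its tRNA
AA_TRNA_BINDING = 1    # GTP used when the aminoacyl-tRNA binds the A site
TRANSLOCATION = 1      # GTP used when the ribosome moves along the mRNA

PER_PEPTIDE_BOND = ACTIVATION + AA_TRNA_BINDING + TRANSLOCATION


def min_phosphate_equivalents(n_residues: int) -> int:
    """Lower bound on phosphate equivalents spent forming the peptide bonds
    of a polypeptide with n_residues residues (n_residues - 1 bonds)."""
    if n_residues < 2:
        raise ValueError("a polypeptide needs at least two residues")
    return PER_PEPTIDE_BOND * (n_residues - 1)


for n in (2, 101, 301):
    print(n, "residues ->", min_phosphate_equivalents(n), "phosphate equivalents")
```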
27.3 Protein Targeting and Degradation
The eukaryotic cell is made up of many structures, com-
partments, and organelles, each with specific functions
that require distinct sets of proteins and enzymes. These
proteins (with the exception of those produced in mi-
tochondria and plastids) are synthesized on ribosomes
in the cytosol, so how are they directed to their final
cellular destinations?
We are now beginning to understand this complex
and fascinating process. Proteins destined for secretion,
integration in the plasma membrane, or inclusion in lyso-
somes generally share the first few steps of a pathway
that begins in the endoplasmic reticulum. Proteins des-
tined for mitochondria, chloroplasts, or the nucleus use
three separate mechanisms. And proteins destined for
the cytosol simply remain where they are synthesized.
The most important element in many of these tar-
geting pathways is a short sequence of amino acids
called a signal sequence, whose function was first pos-
tulated by Günter Blobel and colleagues in 1970. The
signal sequence directs a protein to its appropriate lo-
cation in the cell and, for many proteins, is removed dur-
ing transport or after the protein has reached its final
destination. In proteins slated for transport into mito-
chondria, chloroplasts, or the ER, the signal sequence
is at the amino terminus of a newly synthesized polypep-
tide. In many cases, the targeting capacity of particular
signal sequences has been confirmed by fusing the sig-
nal sequence from one protein to a second protein and
showing that the signal directs the second protein to the
location where the first protein is normally found. The
selective degradation of proteins no longer needed by
the cell also relies largely on a set of molecular signals
embedded in each protein’s structure.
In this concluding section we examine protein tar-
geting and degradation, emphasizing the underlying sig-
nals and molecular regulation that are so crucial to cel-
lular metabolism. Except where noted, the focus is now
on eukaryotic cells.
Posttranslational Modification of Many Eukaryotic
Proteins Begins in the Endoplasmic Reticulum
Perhaps the best-characterized targeting system begins
in the ER. Most lysosomal, membrane, or secreted pro-
teins have an amino-terminal signal sequence (Fig.
27–32) that marks them for translocation into the lu-
men of the ER; hundreds of such signal sequences have
been determined. The carboxyl terminus of the signal
sequence is defined by a cleavage site, where protease
action removes the sequence after the protein is im-
ported into the ER. Signal sequences vary in length from
13 to 36 amino acid residues, but all have the following
features: (1) about 10 to 15 hydrophobic amino acid
residues; (2) one or more positively charged residues,
usually near the amino terminus, preceding the hy-
drophobic sequence; and (3) a short sequence at the
carboxyl terminus (near the cleavage site) that is rela-
tively polar, typically having amino acid residues with
short side chains (especially Ala) at the positions clos-
est to the cleavage site.
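The three features just listed can be expressed as a simple checklist. The sketch below is a crude heuristic under stated assumptions, not a signal-peptide predictor: the residue classes (which amino acids count as hydrophobic or as having short side chains) and all function names are choices made for illustration, and the hydrophobic residues are scored as one contiguous run. The test peptide is the first 15 residues of the influenza virus A sequence listed in Figure 27–32; treating residue 15 as the end of the signal sequence is an assumption.

```python
from typing import Dict, Tuple

HYDROPHOBIC = set("AVLIMFWYC")   # assumption: a deliberately broad hydrophobic class
BASIC = set("KR")                # positively charged residues
SHORT_SIDE_CHAIN = set("AGS")    # assumption: Ala, Gly, Ser counted as short


def longest_hydrophobic_run(peptide: str) -> Tuple[int, int]:
    """Return (start index, length) of the longest run of hydrophobic residues."""
    best_start, best_len = 0, 0
    run_start = None
    for i, aa in enumerate(peptide + "-"):   # "-" is a sentinel that closes a final run
        if aa in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def signal_features(peptide: str) -> Dict[str, object]:
    """Score a candidate signal sequence (one-letter code, ending at the
    putative cleavage site) against the features listed in the text."""
    core_start, core_len = longest_hydrophobic_run(peptide)
    return {
        "length_ok": 13 <= len(peptide) <= 36,
        "hydrophobic_count": sum(aa in HYDROPHOBIC for aa in peptide),
        "longest_hydrophobic_run": core_len,
        "basic_before_core": any(aa in BASIC for aa in peptide[:core_start]),
        "short_residue_at_end": peptide[-1] in SHORT_SIDE_CHAIN,
    }


# Met Lys Ala Lys Leu Leu Val Leu Leu Tyr Ala Phe Val Ala Gly
for feature, value in signal_features("MKAKLLVLLYAFVAG").items():
    print(feature, value)
```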
As originally demonstrated by George Palade, pro-
teins with these signal sequences are synthesized on ri-
bosomes attached to the ER. The signal sequence itself
helps to direct the ribosome to the ER, as illustrated by
Human influenza virus A:  Met Lys Ala Lys Leu Leu Val Leu Leu Tyr Ala Phe Val Ala Gly Asp Gln
Human preproinsulin:      Met Ala Leu Trp Met Arg Leu Leu Pro Leu Leu Ala Leu Leu Ala Leu Trp Gly Pro Asp Pro Ala Ala Ala Phe Val
Bovine growth hormone:    Met Met Ala Ala Gly Pro Arg Thr Ser Leu Leu Leu Ala Phe Ala Leu Leu Cys Leu Pro Trp Thr Gln Val Val Gly Ala Phe
Bee promellitin:          Met Lys Phe Leu Val Asn Val Ala Leu Val Phe Met Val Val Tyr Ile Ser Tyr Ile Tyr Ala Ala Pro
Drosophila glue protein:  Met Lys Leu Leu Val Val Ala Val Ile Ala Cys Met Leu Ile Gly Phe Ala Asp Pro Ala Ser Gly Cys Lys
FIGURE 27–32 Translocation into the ER directed by amino-terminal signal sequences of
some eukaryotic proteins. The hydrophobic core (yellow) is preceded by one or more basic
residues (blue). Note the polar and short-side-chain residues immediately preceding (to the
left of, as shown here) the cleavage sites (indicated by red arrows).
steps (1) through (8) in Figure 27–33. (1) The targeting
pathway begins with initiation of protein synthesis on
free ribosomes. (2) The signal sequence appears early in
the synthetic process, because it is at the amino terminus,
which as we have seen is synthesized first. (3) As it
emerges from the ribosome, the signal sequence, along
with the ribosome itself, is bound by the large signal
recognition particle (SRP); SRP then binds GTP and
halts elongation of the polypeptide when it is about 70
amino acids long and the signal sequence has completely
emerged from the ribosome. (4) The GTP-bound SRP
now directs the ribosome (still bound to the mRNA) and
the incomplete polypeptide to GTP-bound SRP receptors
in the cytosolic face of the ER; the nascent polypeptide
is delivered to a peptide translocation complex
in the ER, which may interact directly with the ribosome.
(5) SRP dissociates from the ribosome, accompanied
by hydrolysis of GTP in both SRP and the SRP receptor.
(6) Elongation of the polypeptide now resumes,
with the ATP-driven translocation complex feeding the
growing polypeptide into the ER lumen until the complete
protein has been synthesized. (7) The signal sequence
is removed by a signal peptidase within the ER
lumen; (8) the ribosome dissociates and is recycled.
Glycosylation Plays a Key Role in Protein Targeting
In the ER lumen, newly synthesized proteins are further
modified in several ways. Following the removal of sig-
nal sequences, polypeptides are folded, disulfide bonds
formed, and many proteins glycosylated to form glyco-
proteins. In many glycoproteins the linkage to their
oligosaccharides is through Asn residues. These N-
linked oligosaccharides are diverse (Chapter 7), but the
pathways by which they form have a common first step.
A 14 residue core oligosaccharide is built up in a step-
wise fashion, then transferred from a dolichol phosphate
donor molecule to certain Asn residues in the protein
(Fig. 27–34). The transferase is on the lumenal face of
the ER and thus cannot catalyze glycosylation of cyto-
solic proteins. After transfer, the core oligosaccharide is
trimmed and elaborated in different ways on different
FIGURE 27–33 Directing eukaryotic proteins with the appropriate
signals to the endoplasmic reticulum. This process involves the SRP
cycle and translocation and cleavage of the nascent polypeptide. The
steps are described in the text. SRP is a rod-shaped complex containing
a 300 nucleotide RNA (7SL-RNA) and six different proteins
(combined Mᵣ 325,000). One protein subunit of SRP binds directly to
the signal sequence, inhibiting elongation by sterically blocking the
entry of aminoacyl-tRNAs and inhibiting peptidyl transferase. Another
protein subunit binds and hydrolyzes GTP. The SRP receptor is a heterodimer
of α (Mᵣ 69,000) and β (Mᵣ 30,000) subunits, both of which
bind and hydrolyze multiple GTP molecules during this process.
[Figure 27–33 labels: signal sequence, SRP, SRP receptor, peptide translocation complex, ribosome receptor, signal peptidase; SRP cycle and ribosome cycle; GTP → GDP + Pᵢ.]
[Structure shown: dolichol phosphate (n = 9–22).]
proteins, but all N-linked oligosaccharides retain a pen-
tasaccharide core derived from the original 14 residue
oligosaccharide. Several antibiotics act by interfering
with one or more steps in this process and have aided
in elucidating the steps of protein glycosylation. The
best-characterized is tunicamycin, which mimics the
structure of UDP-N-acetylglucosamine and blocks the
first step of the process (Fig. 27–34, step 1). A few proteins
are O-glycosylated in the ER, but most O-glycosylation
occurs in the Golgi complex or in the cytosol
(for proteins that do not enter the ER).
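The residue counts quoted above (a 14 residue core oligosaccharide, of which a pentasaccharide is always retained) can be checked with simple bookkeeping. In the sketch below, the donor stoichiometry follows the labels of Figure 27–34 (2 GlcNAc and 5 Man added on the cytosolic face, 4 Man and 3 Glc added in the lumen); the composition of the retained pentasaccharide (2 GlcNAc + 3 Man) is the standard one and is supplied here as outside knowledge, and all names in the code are hypothetical.

```python
from typing import Dict, List, Tuple

# (location, sugar, number of residues added) for assembly of the core oligosaccharide
ASSEMBLY_STEPS: List[Tuple[str, str, int]] = [
    ("cytosolic face", "GlcNAc", 2),
    ("cytosolic face", "Man", 5),
    ("ER lumen", "Man", 4),
    ("ER lumen", "Glc", 3),
]

# Pentasaccharide retained in all N-linked oligosaccharides (assumed composition).
RETAINED_CORE: Dict[str, int] = {"GlcNAc": 2, "Man": 3}


def composition(steps: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Total residues of each sugar after all assembly steps."""
    totals: Dict[str, int] = {}
    for _location, sugar, count in steps:
        totals[sugar] = totals.get(sugar, 0) + count
    return totals


core = composition(ASSEMBLY_STEPS)
print(core, "total:", sum(core.values()))
print("retained:", sum(RETAINED_CORE.values()))
print("added on cytosolic face:",
      sum(n for where, _sugar, n in ASSEMBLY_STEPS if where == "cytosolic face"))
```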
Suitably modified proteins can now be moved to a
variety of intracellular destinations. Proteins travel from
the ER to the Golgi complex in transport vesicles (Fig.
27–35). In the Golgi complex, oligosaccharides are O-
linked to some proteins, and N-linked oligosaccharides
are further modified. By mechanisms not yet fully un-
derstood, the Golgi complex also sorts proteins and
sends them to their final destinations. The processes
that segregate proteins targeted for secretion from
those targeted for the plasma membrane or lysosomes
must distinguish among these proteins on the basis of
structural features other than signal sequences, which
were removed in the ER lumen.
FIGURE 27–34 Synthesis of the core oligosaccharide of glycoproteins.
The core oligosaccharide is built up by the successive addition
of monosaccharide units. (1), (2) The first steps occur on the cytosolic
face of the ER. (3) Translocation moves the incomplete oligosaccharide
across the membrane (mechanism not shown), and (4) completion
of the core oligosaccharide occurs within the lumen of the ER.
The precursors that contribute additional mannose and glucose
residues to the growing oligosaccharide in the lumen are dolichol
phosphate derivatives. In the first step in the construction of the N-linked
oligosaccharide moiety of a glycoprotein, (5), (6) the core
oligosaccharide is transferred from dolichol phosphate to an Asn
residue of the protein within the ER lumen. The core oligosaccharide
is then further modified in the ER and the Golgi complex in pathways
that differ for different proteins. The five sugar residues shown surrounded
by a beige screen (after step 7) are retained in the final
structure of all N-linked oligosaccharides. (8) The released dolichol
pyrophosphate is again translocated so that the pyrophosphate is on
the cytosolic face of the ER, then (9) a phosphate is hydrolytically removed
to regenerate dolichol phosphate.
[Structure shown: tunicamycin, composed of N-acetylglucosamine, tunicamine, uracil, and a fatty acyl side chain (n = 8–11).]
[Figure 27–34 labels: 2 UDP-GlcNAc and 5 GDP-Man are the sugar donors on the cytosolic face; 4 dolichol-P-Man and 3 dolichol-P-Glc are the donors in the ER lumen; tunicamycin blocks step 1; dolichol phosphate is recycled. Key: N-acetylglucosamine (GlcNAc), mannose (Man), glucose (Glc).]
This sorting process is best understood in the case
of hydrolases destined for transport to lysosomes. On
arrival of a hydrolase (a glycoprotein) in the Golgi com-
plex, an as yet undetermined feature (sometimes called
a signal patch) of the three-dimensional structure of the
hydrolase is recognized by a phosphotransferase, which
phosphorylates certain mannose residues in the oligo-
saccharide (Fig. 27–36). The presence of one or more
mannose 6-phosphate residues in its N-linked oligosac-
charide is the structural signal that targets the protein
to lysosomes. A receptor protein in the membrane of
the Golgi complex recognizes the mannose 6-phosphate
signal and binds the hydrolase so marked. Vesicles con-
taining these receptor-hydrolase complexes bud from
the trans side of the Golgi complex and make their way
to sorting vesicles. Here, the receptor-hydrolase com-
plex dissociates in a process facilitated by the lower pH
in the vesicle and by phosphatase-catalyzed removal of
phosphate groups from the mannose 6-phosphate
residues. The receptor is then recycled to the Golgi com-
plex, and vesicles containing the hydrolases bud from
the sorting vesicles and move to the lysosomes. In cells
treated with tunicamycin (Fig. 27–34, step 1), hydrolases
that should be targeted for lysosomes are instead
secreted, confirming that the N-linked oligosaccharide
plays a key role in targeting these enzymes to lysosomes.
The pathways that target proteins to mitochondria
and chloroplasts also rely on amino-terminal signal se-
quences. Although mitochondria and chloroplasts con-
tain DNA, most of their proteins are encoded by nuclear
DNA and must be targeted to the appropriate organelle.
Unlike other targeting pathways, however, the mito-
chondrial and chloroplast pathways begin only after a
precursor protein has been completely synthesized and
released from the ribosome. Precursor proteins destined
for mitochondria or chloroplasts are bound by cytosolic
chaperone proteins and delivered to receptors on the
exterior surface of the target organelle. Specialized
translocation mechanisms then transport the protein to
its final destination in the organelle, after which the sig-
nal sequence is removed.
Signal Sequences for Nuclear Transport
Are Not Cleaved
Molecular communication between the nucleus and the
cytosol requires the movement of macromolecules
through nuclear pores. RNA molecules synthesized in
the nucleus are exported to the cytosol. Ribosomal pro-
teins synthesized on cytosolic ribosomes are imported
into the nucleus and assembled into 60S and 40S ribo-
somal subunits in the nucleolus; completed subunits are
then exported back to the cytosol. A variety of nuclear
proteins (RNA and DNA polymerases, histones, topo-
isomerases, proteins that regulate gene expression, and
so forth) are synthesized in the cytosol and imported
into the nucleus. This traffic is modulated by a complex
system of molecular signals and transport proteins that
is gradually being elucidated.
In most multicellular eukaryotes, the nuclear enve-
lope breaks down at each cell division, and once divi-
sion is completed and the nuclear envelope reestab-
lished, the dispersed nuclear proteins must be
reimported. To allow this repeated nuclear importation,
the signal sequence that targets a protein to the nu-
cleus—the nuclear localization sequence, NLS—is not
removed after the protein arrives at its destination. An
NLS, unlike other signal sequences, may be located al-
most anywhere along the primary sequence of the pro-
tein. NLSs can vary considerably, but many consist of
four to eight amino acid residues and include several
consecutive basic (Arg or Lys) residues.
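The defining feature mentioned here, a short stretch containing several consecutive basic residues, lends itself to a simple sequence scan. The sketch below is a minimal illustration, not an NLS predictor: the function name is hypothetical, and the threshold of three consecutive Arg or Lys residues is an assumed reading of "several". The example PKKKRKV is the widely cited NLS of the SV40 large T antigen, which is not discussed in this passage; the second sequence is an arbitrary negative control.

```python
import re
from typing import List, Tuple


def basic_runs(sequence: str, min_run: int = 3) -> List[Tuple[int, str]]:
    """Return (start index, run) for every stretch of at least min_run
    consecutive Lys (K) or Arg (R) residues in a one-letter sequence."""
    pattern = re.compile(r"[KR]{%d,}" % min_run)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


print(basic_runs("PKKKRKV"))     # SV40 large T antigen NLS
print(basic_runs("MASGLLVAG"))   # arbitrary sequence with no basic run
```

Because an NLS may lie almost anywhere along the primary sequence, the scan covers the whole chain rather than only the amino terminus.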
Nuclear importation is mediated by a number of proteins
that cycle between the cytosol and the nucleus
(Fig. 27–37), including importin α and β and a small
GTPase known as Ran. A heterodimer of importin α and
β functions as a soluble receptor for proteins targeted
to the nucleus, with the α subunit binding NLS-bearing
FIGURE 27–35 Pathway taken by proteins destined for lysosomes,
the plasma membrane, or secretion. Proteins are moved from the ER
to the cis side of the Golgi complex in transport vesicles. Sorting oc-
curs primarily in the trans side of the Golgi complex.
proteins in the cytosol. The complex of the NLS-
bearing protein and the importin docks at a nuclear pore
and is translocated through the pore by an energy-
dependent mechanism that requires the Ran GTPase.
The two importin subunits separate during the translocation,
and the NLS-bearing protein dissociates from importin α
inside the nucleus. Importin α and β are then
exported from the nucleus to repeat the process. How
importin α remains dissociated from the many NLS-bearing
proteins inside the nucleus is not yet clear.
Bacteria Also Use Signal Sequences
for Protein Targeting
Bacteria can target proteins to their inner or outer mem-
branes, to the periplasmic space between these mem-
branes, or to the extracellular medium. They use signal
sequences at the amino terminus of the proteins (Fig.
27–38), much like those on eukaryotic proteins targeted
to the ER, mitochondria, and chloroplasts.
Most proteins exported from E. coli make use of
the pathway shown in Figure 27–39. Following transla-
tion, a protein to be exported may fold only slowly, the
amino-terminal signal sequence impeding the folding.
The soluble chaperone protein SecB binds to the pro-
tein’s signal sequence or other features of its incom-
pletely folded structure. The bound protein is then de-
livered to SecA, a protein associated with the inner
surface of the plasma membrane. SecA acts as both a
receptor and a translocating ATPase. Released from
SecB and bound to SecA, the protein is delivered to a
translocation complex in the membrane, made up of
SecY, E, and G, and is translocated stepwise through the
membrane at the SecYEG complex in lengths of about
20 amino acid residues. Each step is facilitated by the
hydrolysis of ATP, catalyzed by SecA.
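The stepwise mechanism lends itself to a back-of-the-envelope estimate. The sketch below assumes exactly 20 residues per step and one ATP hydrolyzed per step; the text says only "about 20", and the contribution of the membrane potential is ignored. The function name `estimate_seca_steps` is hypothetical.

```python
import math

RESIDUES_PER_STEP = 20  # "about 20" in the text; treated as exact here

def estimate_seca_steps(protein_length: int,
                        residues_per_step: int = RESIDUES_PER_STEP) -> int:
    """Estimate the number of SecA translocation steps (and, assuming one
    ATP per step, the ATP hydrolyzed) to export a protein of the given length."""
    if protein_length < 0 or residues_per_step <= 0:
        raise ValueError("lengths must be non-negative and step size positive")
    return math.ceil(protein_length / residues_per_step)

# A hypothetical 325-residue precursor needs 17 steps under these assumptions.
print(estimate_seca_steps(325))
```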
FIGURE 27–36 Phosphorylation of mannose residues
on lysosome-targeted enzymes. N-Acetylglucosamine
phosphotransferase recognizes some as yet unidentified
structural feature of hydrolases destined for lysosomes.
[Reaction scheme: N-acetylglucosamine phosphotransferase transfers
GlcNAc phosphate from UDP N-acetylglucosamine (UDP-GlcNAc) to a
mannose residue of the oligosaccharide on the hydrolase, releasing
UMP; a phosphodiesterase then removes GlcNAc, leaving a mannose
6-phosphate residue.]
FIGURE 27–37 Targeting of nuclear proteins. (a) ① A protein with an appropriate nuclear
localization signal (NLS) is bound by a complex of importin α and β. ② The resulting complex
binds to a nuclear pore, and ③ translocation is mediated by the Ran GTPase. ④ Inside the
nucleus, importin β dissociates from importin α, and ⑤ importin α then releases the nuclear
protein. ⑥ Importin α and β are transported out of the nucleus and recycled. (b) Scanning
electron micrograph of the surface of the nuclear envelope, showing numerous nuclear pores
(scale bar, 0.2 μm).
[Figure 27–38 shows the amino-terminal signal sequences, with cleavage sites, of the following bacterial proteins.]
Inner membrane proteins: phage fd, major coat protein; phage fd, minor coat protein
Periplasmic proteins: alkaline phosphatase; leucine-specific binding protein; β-lactamase of pBR322
Outer membrane proteins: lipoprotein; LamB; OmpA
Sequences that remain legible (signal sequence plus the first residues of the mature protein):
Phage fd, major coat protein: Met Lys Lys Ser Leu Val Leu Lys Ala Ser Val Ala Val Ala Thr Leu Val Pro Met Leu Ser Phe Ala Ala Glu
Phage fd, minor coat protein: Met Lys Lys Leu Leu Phe Ala Ile Pro Leu Val Val Pro Phe Tyr Ser His Ser Ala Glu
Alkaline phosphatase: Met Lys Gln Ser Thr Ile Leu Ala Leu Leu Pro Leu Leu Phe Thr Pro Val Thr Lys Ala Arg Thr
FIGURE 27–38 Signal sequences that target proteins to different lo-
cations in bacteria. Basic amino acids (blue) near the amino termi-
nus and hydrophobic core amino acids (yellow) are highlighted. The
cleavage sites marking the ends of the signal sequences are indicated
by red arrows. Note that the inner bacterial cell membrane (see Fig.
1–6) is where phage fd coat proteins and DNA are assembled into
phage particles. OmpA is outer membrane protein A; LamB is a cell
surface receptor protein for bacteriophage lambda.
FIGURE 27–39 Model for protein export in
bacteria. ① A newly translated polypeptide binds
to the cytosolic chaperone protein SecB, which ②
delivers it to SecA, a protein associated with the
translocation complex (SecYEG) in the bacterial cell
membrane. ③ SecB is released, and SecA inserts
itself into the membrane, forcing about 20 amino
acid residues of the protein to be exported through
the translocation complex. ④ Hydrolysis of an ATP
by SecA provides the energy for a conformational
change that causes SecA to withdraw from the
membrane, releasing the polypeptide. ⑤ SecA
binds another ATP, and the next stretch of 20 amino
acid residues is pushed across the membrane
through the translocation complex. Steps ④ and
⑤ are repeated until ⑥ the entire protein has
passed through and is released to the periplasm.
The electrochemical potential across the membrane
(denoted by + and −) also provides some of the
driving force required for protein translocation.
FIGURE 27–40 Clathrin. (a) Three light (L) chains (Mr 35,000) and
three heavy (H) chains (Mr 180,000) of the (HL)3 clathrin unit,
organized as a three-legged structure called a triskelion. (b) Triskelions
tend to assemble into polyhedral lattices. (c) Electron micrograph
of a coated pit on the cytosolic face of the plasma membrane of a
fibroblast (scale bar, 0.1 μm). Dimension marked in the drawing: ~80 nm.
An exported protein is thus pushed through the
membrane by a SecA protein located on the cytoplas-
mic surface, rather than being pulled through the mem-
brane by a protein on the periplasmic surface. This dif-
ference may simply reflect the need for the translocating
ATPase to be where the ATP is. The transmembrane
electrochemical potential can also provide energy for
translocation of the protein, by an as yet unknown
mechanism.
Although most exported bacterial proteins use this
pathway, some follow an alternative pathway that uses
signal recognition and receptor proteins homologous to
components of the eukaryotic SRP and SRP receptor
(Fig. 27–33).
Cells Import Proteins by Receptor-Mediated
Endocytosis
Some proteins are imported into cells from the sur-
rounding medium; examples in eukaryotes include low-
density lipoprotein (LDL), the iron-carrying protein
transferrin, peptide hormones, and circulating proteins
destined for degradation. The proteins bind to recep-
tors in invaginations of the membrane called coated
pits, which concentrate endocytic receptors in prefer-
ence to other cell-surface proteins. The pits are coated
on their cytosolic side with a lattice of the protein
clathrin, which forms closed polyhedral structures
(Fig. 27–40). The clathrin lattice grows as more recep-
tors are occupied by target proteins, until a complete
membrane-bounded endocytic vesicle buds off the
plasma membrane and enters the cytoplasm. The
clathrin is quickly removed by uncoating enzymes, and
the vesicle fuses with an endosome. ATPase activity
in the endosomal membranes reduces the pH therein,
facilitating dissociation of receptors from their target
proteins.
The imported proteins and receptors then go their
separate ways, their fates varying with the cell and pro-
tein type. Transferrin and its receptor are eventually re-
cycled. Some hormones, growth factors, and immune
complexes, after eliciting the appropriate cellular re-
sponse, are degraded along with their receptors. LDL is
degraded after the associated cholesterol has been de-
livered to its destination, but the LDL receptor is recy-
cled (see Fig. 21–42).
Receptor-mediated endocytosis is exploited by
some toxins and viruses to gain entry to cells. Influenza
virus (see Fig. 11–24), diphtheria toxin, and cholera
toxin all enter cells in this way.
Protein Degradation Is Mediated by Specialized
Systems in All Cells
Protein degradation prevents the buildup of abnormal
or unwanted proteins and permits the recycling of amino
acids. The half-lives of eukaryotic proteins vary from 30
seconds to many days. Most proteins turn over rapidly
relative to the lifetime of a cell, although a few (such as
hemoglobin) can last for the life of the cell (about 110
days for an erythrocyte). Rapidly degraded proteins in-
clude those that are defective because of incorrectly in-
serted amino acids or because of damage accumulated
during normal functioning. And enzymes that act at key
regulatory points in metabolic pathways often turn over
rapidly.
Defective proteins and those with characteristically
short half-lives are generally degraded in both bacterial
and eukaryotic cells by selective ATP-dependent cy-
tosolic systems. A second system in vertebrates, oper-
ating in lysosomes, recycles the amino acids of mem-
brane proteins, extracellular proteins, and proteins with
characteristically long half-lives.
In E. coli, many proteins are degraded by an ATP-
dependent protease called Lon (the name refers to the
“long form” of proteins, observed only when this pro-
tease is absent). The protease is activated in the pres-
ence of defective proteins or those slated for rapid
turnover; two ATP molecules are hydrolyzed for every
peptide bond cleaved. The precise role of this ATP hy-
drolysis is not yet clear. Once a protein has been reduced
to small inactive peptides, other ATP-independent pro-
teases complete the degradation process.
The ATP-dependent pathway in eukaryotic cells is
quite different, involving the protein ubiquitin, which,
as its name suggests, occurs throughout the eukaryotic
kingdoms. One of the most highly conserved proteins
known, ubiquitin (76 amino acid residues) is essentially
identical in organisms as different as yeasts and humans.
Ubiquitin is covalently linked to proteins slated for de-
struction via an ATP-dependent pathway involving three
separate enzymes (E1, E2, and E3 in Fig. 27–41).
[Reaction scheme for Figure 27–41:
Ubiquitin–COO− + HS–E1 + ATP → Ubiquitin–CO–S–E1 + AMP + PPi
Ubiquitin–CO–S–E1 + HS–E2 → Ubiquitin–CO–S–E2 + HS–E1
Ubiquitin–CO–S–E2 + H2N–Lys–(target protein) → Ubiquitin–CO–NH–Lys–(target protein) + HS–E2 (with E3)
Repeated cycles lead to attachment of additional ubiquitin.]
FIGURE 27–41 Three-step cascade pathway by which ubiquitin is attached
to a protein. Two different enzyme-ubiquitin intermediates are
involved. The free carboxyl group of ubiquitin’s carboxyl-terminal Gly
residue is ultimately linked through an amide (isopeptide) bond to an
ε-amino group of a Lys residue of the target protein. Additional cycles
produce polyubiquitin, a covalent polymer of ubiquitin subunits that
targets the attached protein for destruction in eukaryotes.
Ubiquitinated proteins are degraded by a large complex
known as the 26S proteasome (Mr 2.5 × 10⁶)
(Fig. 27–42). The proteasome consists of two copies
each of at least 32 different subunits, most of which are
highly conserved from yeasts to humans. The proteasome
contains two main types of subcomplexes, a barrel-like
core particle and regulatory particles on either end
of the barrel. The 20S core particle consists of four rings;
the outer rings are formed from seven α subunits, and
the inner rings from seven β subunits. Three of the
seven subunits in each β ring have protease activities,
each with different substrate specificities. The stacked
rings of the core particle form the barrel-like structure
within which target proteins are degraded. The 19S reg-
ulatory particle on each end of the core particle con-
tains 18 subunits, including some that recognize and
bind to ubiquitinated proteins. Six of the subunits are
ATPases that probably function in unfolding the ubiq-
uitinated proteins and translocating the unfolded
polypeptide into the core particle for degradation.
Although we do not yet understand all the signals
that trigger ubiquitination, one simple signal has been
found. For many proteins, the identity of the first residue
that remains after removal of the amino-terminal Met
residue, and any other posttranslational proteolytic pro-
cessing of the amino-terminal end, has a profound in-
fluence on half-life (Table 27–9). These amino-terminal
signals have been conserved over billions of years of evo-
lution, and are the same in bacterial protein degradation
systems and in the human ubiquitination pathway. More
complex signals, such as the destruction box discussed
in Chapter 12 (see Fig. 12–44), are also being identified.
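The practical meaning of the half-lives in Table 27–9 becomes clearer with a small calculation. The sketch below assumes simple first-order decay, which the table itself does not state, and uses the approximate yeast values from the table. For stabilizing residues only a lower bound (>20 h) is reported, so the code uses 20 h as a conservative placeholder. All names are illustrative.

```python
# Approximate half-lives in minutes for destabilizing residues (Table 27-9).
DESTABILIZING_HALF_LIFE_MIN = {
    "Ile": 30, "Gln": 30,
    "Tyr": 10, "Glu": 10,
    "Pro": 7,
    "Leu": 3, "Phe": 3, "Asp": 3, "Lys": 3,
    "Arg": 2,
}
STABILIZING = {"Met", "Gly", "Ala", "Ser", "Thr", "Val"}
STABILIZING_LOWER_BOUND_MIN = 20 * 60  # table reports only ">20 h"

def half_life_minutes(residue: str) -> float:
    """Half-life for a given amino-terminal residue (three-letter code).
    Raises KeyError for residues not listed in the table."""
    if residue in STABILIZING:
        return float(STABILIZING_LOWER_BOUND_MIN)
    return float(DESTABILIZING_HALF_LIFE_MIN[residue])

def fraction_remaining(residue: str, minutes: float) -> float:
    """Fraction of protein left after `minutes`, assuming first-order decay."""
    return 0.5 ** (minutes / half_life_minutes(residue))

# After 6 min, a protein with amino-terminal Leu (t1/2 ~3 min) is down to 25%.
print(fraction_remaining("Leu", 6))
```

Under these assumptions, a protein with an amino-terminal Arg is half gone in about 2 min, whereas one beginning with Met is essentially untouched over the same period.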
Ubiquitin-dependent proteolysis is as important for
the regulation of cellular processes as for the elimina-
tion of defective proteins. Many proteins required at
only one stage of the eukaryotic cell cycle are rapidly
degraded by the ubiquitin-dependent pathway after
completing their function. The same pathway also
processes and presents class I MHC antigens (see Fig.
5–22). Ubiquitin-dependent destruction of cyclin is crit-
ical to cell-cycle regulation (see Fig. 12–44). The E2 and
E3 components of the ubiquitination cascade pathway
TABLE 27–9 Relationship between Protein Half-Life and Amino-Terminal Amino Acid Residue

Amino-terminal residue            Half-life*
Stabilizing
  Met, Gly, Ala, Ser, Thr, Val    >20 h
Destabilizing
  Ile, Gln                        ~30 min
  Tyr, Glu                        ~10 min
  Pro                             ~7 min
  Leu, Phe, Asp, Lys              ~3 min
  Arg                             ~2 min

Source: Modified from Bachmair, A., Finley, D., & Varshavsky, A. (1986) In vivo half-life of a
protein is a function of its amino-terminal residue. Science 234, 179–186.
*Half-lives were measured in yeast for the β-galactosidase protein modified so that in each
experiment it had a different amino-terminal residue. (See Chapter 9 for a discussion of
techniques used to engineer proteins with altered amino acid sequences.) Half-lives may
vary for different proteins and in different organisms, but this general pattern appears to
hold for all organisms.
FIGURE 27–42 Three-dimensional structure of the eukaryotic proteasome.
The 26S proteasome is highly conserved in all eukaryotes.
The two subassemblies are the 20S core particle and the 19S regulatory
particle. (a) 20S core particle (PDB ID 1IRU). The core particle consists of four rings
arranged to form a barrel-like structure. Each of the inner rings has
seven different β subunits (light blue), three of which have protease
activities (dark blue). The outer rings each have seven different α subunits
(gray). (b) Complete proteasome. A regulatory particle forms a cap on each end of the
core particle. The core particle is colored as in (a). The base and lid
segments of each regulatory particle are presented in different shades
of red. Polyubiquitin attached to the substrate protein interacts with the
proteasome; the regulatory particle unfolds ubiquitinated proteins (blue)
and translocates them into the core particle, as shown.
(Fig. 27–41) are in fact two large families of proteins.
Different E2 and E3 enzymes exhibit different speci-
ficities for target proteins and thus regulate different
cellular processes. Some E2 and E3 enzymes are highly
localized in certain cellular compartments, reflecting a
specialized function.
Not surprisingly, defects in the ubiquitination
pathway have been implicated in a wide range of
disease states. An inability to degrade certain proteins
that activate cell division (the products of oncogenes)
can lead to tumor formation, whereas a too-rapid degra-
dation of proteins that act as tumor suppressors can
have the same effect. The ineffective or overly rapid
degradation of cellular proteins also appears to play a
role in a range of other conditions: renal diseases,
asthma, neurodegenerative disorders such as Alz-
heimer’s and Parkinson’s diseases (associated with the
formation of characteristic proteinaceous structures in
neurons), cystic fibrosis (caused in some cases by a too-rapid
degradation of a chloride ion channel, with resultant
loss of function; see Box 11–3), Liddle’s syndrome
(in which a sodium channel in the kidney is not
degraded, leading to excessive Na⁺ absorption and
early-onset hypertension), and many other disorders.
Drugs designed to inhibit proteasome function are be-
ing developed as potential treatments for some of these
conditions. In a changing metabolic environment, pro-
tein degradation is as important to a cell’s survival as is
protein synthesis, and much remains to be learned about
these interesting pathways. ■
SUMMARY 27.3 Protein Targeting and Degradation
■ After synthesis, many proteins are directed to
particular locations in the cell. One targeting
mechanism involves a peptide signal sequence,
generally found at the amino terminus of a
newly synthesized protein.
■ In eukaryotic cells, one class of signal sequences
is recognized by the signal recognition particle
(SRP), which binds the signal sequence as
soon as it appears on the ribosome and
transfers the entire ribosome and incomplete
polypeptide to the ER. Polypeptides with
these signal sequences are moved into the ER
lumen as they are synthesized; once in the
lumen they may be modified and moved to the
Golgi complex, then sorted and sent to
lysosomes, the plasma membrane, or transport
vesicles.
■ Proteins targeted to mitochondria and
chloroplasts in eukaryotic cells, and those
destined for export in bacteria, also make use
of an amino-terminal signal sequence.
■ Proteins targeted to the nucleus have an
internal signal sequence that is not cleaved
once the protein is successfully targeted.
■ Some eukaryotic cells import proteins by
receptor-mediated endocytosis.
■ All cells eventually degrade proteins, using
specialized proteolytic systems. Defective
proteins and those slated for rapid turnover are
generally degraded by an ATP-dependent
system. In eukaryotic cells, the proteins are
first tagged by linkage to ubiquitin, a highly
conserved protein. Ubiquitin-dependent
proteolysis is carried out by proteasomes, also
highly conserved, and is critical to the
regulation of many cellular processes.
Key Terms
aminoacyl-tRNA
aminoacyl-tRNA synthetases
translation
codon
reading frame
initiation codon
termination codons
open reading frame (ORF)
anticodon
wobble
initiation
Shine-Dalgarno sequence
aminoacyl (A) site
peptidyl (P) site
exit (E) site
initiation complex
elongation
elongation factors
peptidyl transferase
translocation
termination
release factors
polysome
posttranslational modification
nonsense suppressor
puromycin
tetracyclines
chloramphenicol
cycloheximide
streptomycin
diphtheria toxin
ricin
signal recognition particle (SRP)
signal sequence
tunicamycin
coated pits
clathrin
ubiquitin
proteasome
Terms in bold are defined in the glossary.
Problems
1. Messenger RNA Translation Predict the amino acid
sequences of peptides formed by ribosomes in response to
the following mRNA sequences, assuming that the reading
frame begins with the first three bases in each sequence.
(a) GGUCAGUCGCUCCUGAUU
(b) UUGGAUGCGCCAUAAUUUGCU
(c) CAUGAUGCCUGUUGCUAC
(d) AUGGACGAA
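Answers to problems of this kind can be checked with a short script. The sketch below builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, `*` for a termination codon) and translates in the frame that begins with the first base. The function name `translate` and the example sequence are illustrative and not taken from the problem.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') starting at its first base; stop at the
    first termination codon or when fewer than three bases remain."""
    mrna = mrna.upper()
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)

# Hypothetical example: AUG GCU UGG UAA GGC -> Met-Ala-Trp, then stop.
print(translate("AUGGCUUGGUAAGGC"))
```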
2. How Many Different mRNA Sequences Can Specify
One Amino Acid Sequence? Write all the possible mRNA
sequences that can code for the simple tripeptide segment
Leu–Met–Tyr. Your answer will give you some idea about the
number of possible mRNAs that can code for one polypeptide.
3. Can the Base Sequence of an mRNA Be Predicted
from the Amino Acid Sequence of Its Polypeptide
Product? A given sequence of bases in an mRNA will code
for one and only one sequence of amino acids in a polypep-
tide, if the reading frame is specified. From a given sequence
of amino acid residues in a protein such as cytochrome c, can
we predict the base sequence of the unique mRNA that coded
it? Give reasons for your answer.
4. Coding of a Polypeptide by Duplex DNA The tem-
plate strand of a segment of double-helical DNA contains the
sequence
(5′)CTTAACACCCCTGACTTCGCGCCGTCG(3′)
(a) What is the base sequence of the mRNA that can be
transcribed from this strand?
(b) What amino acid sequence could be coded by the
mRNA in (a), starting from the 5′ end?
(c) If the complementary (nontemplate) strand of this
DNA were transcribed and translated, would the resulting
amino acid sequence be the same as in (b)? Explain the bi-
ological significance of your answer.
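The relationship between a template strand and its transcript can be sketched in a few lines: the mRNA is complementary and antiparallel to the template, with U in place of T. The function name `transcribe` and the short example strand are illustrative assumptions, deliberately different from the sequence in the problem.

```python
# DNA template base -> complementary RNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3: str) -> str:
    """Return the mRNA (written 5'->3') transcribed from a DNA template
    strand that is also written 5'->3'."""
    # Reading the template 3'->5' (reversed) gives the mRNA 5'->3'.
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3.upper()))

# Hypothetical template (5')TTACGCCAT(3') gives mRNA (5')AUGGCGUAA(3').
print(transcribe("TTACGCCAT"))
```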
5. Methionine Has Only One Codon Methionine is one
of two amino acids with only one codon. How does the single
codon for methionine specify both the initiating residue and
interior Met residues of polypeptides synthesized by E. coli?
6. Synthetic mRNAs The genetic code was elucidated
with polyribonucleotides synthesized either enzymatically or
chemically in the laboratory. Given what we now know about
the genetic code, how would you make a polyribonucleotide
that could serve as an mRNA coding predominantly for many
Phe residues and a small number of Leu and Ser residues?
What other amino acid(s) would be coded for by this polyri-
bonucleotide, but in smaller amounts?
7. Energy Cost of Protein Biosynthesis Determine
the minimum energy cost, in terms of ATP equivalents expended,
required for the biosynthesis of the β-globin chain
of hemoglobin (146 residues), starting from a pool including
all necessary amino acids, ATP, and GTP. Compare your
answer with the direct energy cost of the biosynthesis of a
linear glycogen chain of 146 glucose residues in (α1→4) linkage,
starting from a pool including glucose, UTP, and ATP
(Chapter 15). From your data, what is the extra energy cost
of making a protein, in which all the residues are ordered in
a specific sequence, compared with the cost of making a poly-
saccharide containing the same number of residues but lack-
ing the informational content of the protein?
In addition to the direct energy cost for the synthesis of
a protein, there are indirect energy costs—those required for
the cell to make the necessary enzymes for protein synthesis.
Compare the magnitude of the indirect costs to a eukaryotic
cell of the biosynthesis of linear (α1→4) glycogen
chains and the biosynthesis of polypeptides, in terms of the
enzymatic machinery involved.
8. Predicting Anticodons from Codons Most amino
acids have more than one codon and attach to more than one
tRNA, each with a different anticodon. Write all possible anticodons
for the four codons of glycine: (5′)GGU, GGC, GGA,
and GGG.
(a) From your answer, which of the positions in the an-
ticodons are primary determinants of their codon specificity
in the case of glycine?
(b) Which of these anticodon-codon pairings has/have a
wobbly base pair?
(c) In which of the anticodon-codon pairings do all three
positions exhibit strong Watson-Crick hydrogen bonding?
9. Effect of Single-Base Changes on Amino Acid Se-
quence Much important confirmatory evidence on the ge-
netic code has come from assessing changes in the amino acid
sequence of mutant proteins after a single base has been
changed in the gene that encodes the protein. Which of the
following amino acid replacements would be consistent with
the genetic code if the replacements were caused by a single
base change? Which cannot be the result of a single-base mu-
tation? Why?
(a) Phe→Leu    (e) Ile→Leu
(b) Lys→Ala    (f) His→Glu
(c) Ala→Thr    (g) Pro→Ser
(d) Phe→Lys
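One way to reason about such replacements systematically is to ask whether any codon for the first amino acid differs at exactly one position from any codon for the second. The sketch below does this with the standard genetic code. The function name `single_base_apart` is hypothetical, one-letter amino acid codes are used, and the example pairs are not among those listed in the problem.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

def single_base_apart(aa1: str, aa2: str) -> bool:
    """True if some codon for aa1 and some codon for aa2 differ at exactly
    one of the three positions."""
    codons1 = [c for c, a in CODONS.items() if a == aa1]
    codons2 = [c for c, a in CODONS.items() if a == aa2]
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in codons1
        for c2 in codons2
    )

print(single_base_apart("M", "I"))  # AUG -> AUA is a single change
print(single_base_apart("M", "W"))  # AUG vs UGG differ at two positions
```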
10. Basis of the Sickle-Cell Mutation Sickle-cell hemoglobin
has a Val residue at position 6 of the β-globin chain,
instead of the Glu residue found in normal hemoglobin A. Can
you predict what change took place in the DNA codon for glu-
tamate to account for replacement of the Glu residue by Val?
11. Importance of the “Second Genetic Code” Some
aminoacyl-tRNA synthetases do not recognize and bind the
anticodon of their cognate tRNAs but instead use other struc-
tural features of the tRNAs to impart binding specificity. The
tRNAs for alanine apparently fall into this category.
(a) What features of tRNA^Ala are recognized by Ala-tRNA
synthetase?
(b) Describe the consequences of a C→G mutation in
the third position of the anticodon of tRNA^Ala.
(c) What other kinds of mutations might have similar ef-
fects?
(d) Mutations of these types are never found in natural
populations of organisms. Why? (Hint: Consider what might
happen both to individual proteins and to the organism as a
whole.)
12. Maintaining the Fidelity of Protein Synthesis The
chemical mechanisms used to avoid errors in protein syn-
thesis are different from those used during DNA replication.
DNA polymerases use a 3′→5′ exonuclease proofreading activity
to remove mispaired nucleotides incorrectly inserted
into a growing DNA strand. There is no analogous proof-
reading function on ribosomes and, in fact, the identity of an
amino acid attached to an incoming tRNA and added to the
growing polypeptide is never checked. A proofreading step
that hydrolyzed the previously formed peptide bond after an
incorrect amino acid had been inserted into a growing
polypeptide (analogous to the proofreading step of DNA poly-
merases) would be impractical. Why? (Hint: Consider how
the link between the growing polypeptide and the mRNA is
maintained during elongation; see Figs 27–24 and 27–25.)
13. Predicting the Cellular Location of a Protein The
gene for a eukaryotic polypeptide 300 amino acid residues
long is altered so that a signal sequence recognized by SRP
occurs at the polypeptide’s amino terminus and a nuclear lo-
calization signal (NLS) occurs internally, beginning at residue
150. Where is the protein likely to be found in the cell?
14. Requirements for Protein Translocation across a
Membrane The secreted bacterial protein OmpA has a
precursor, ProOmpA, which has the amino-terminal signal
sequence required for secretion. If purified ProOmpA is dena-
tured with 8 M urea and the urea is then removed (such as
by running the protein solution rapidly through a gel filtra-
tion column) the protein can be translocated across isolated
bacterial inner membranes in vitro. However, translocation
becomes impossible if ProOmpA is first allowed to incubate
for a few hours in the absence of urea. Furthermore, the ca-
pacity for translocation is maintained for an extended period
if ProOmpA is first incubated in the presence of another bac-
terial protein called trigger factor. Describe the probable func-
tion of this factor.
15. Protein-Coding Capacity of a Viral DNA The 5,386 bp
genome of bacteriophage φX174 includes genes for 10 proteins,
designated A to K, with sizes given in the table below.
How much DNA would be required to encode these 10 proteins?
How can you reconcile the size of the φX174 genome
with its protein-coding capacity?

           Number of amino               Number of amino
Protein    acid residues      Protein    acid residues
A          455                F          427
B          120                G          175
C           86                H          328
D          152                J           38
E           91                K           56
Chapter 28
REGULATION OF GENE EXPRESSION
The fundamental problem of chemical physiology and of
embryology is to understand why tissue cells do not all
express, all the time, all the potentialities inherent in their
genome.
François Jacob and Jacques Monod,
article in Journal of Molecular Biology, 1961
28.1 Principles of Gene Regulation
28.2 Regulation of Gene Expression in Prokaryotes
28.3 Regulation of Gene Expression in Eukaryotes
Of the 4,000 or so genes in the typical bacterial
genome, or the perhaps 35,000 genes in the human
genome, only a fraction are expressed in a cell at any
given time. Some gene products are present in very large
amounts: the elongation factors required for protein
synthesis, for example, are among the most abundant
proteins in bacteria, and ribulose 1,5-bisphosphate
carboxylase/oxygenase (rubisco) of plants and photosyn-
thetic bacteria is, as far as we know, the most abundant
enzyme in the biosphere. Other gene products occur in
much smaller amounts; for instance, a cell may contain
only a few molecules of the enzymes that repair rare
DNA lesions. Requirements for some gene products
change over time. The need for enzymes in certain meta-
bolic pathways may wax and wane as food sources
change or are depleted. During development of a mul-
ticellular organism, some proteins that influence cellu-
lar differentiation are present for just a brief time in only
a few cells. Specialization of cellular function can dra-
matically affect the need for various gene products; an
example is the uniquely high concentration of a single
protein, hemoglobin, in erythrocytes. Given the high
cost of protein synthesis, regulation of gene expression
is essential to making optimal use of available energy.
The cellular concentration of a protein is deter-
mined by a delicate balance of at least seven processes,
each having several potential points of regulation:
1. Synthesis of the primary RNA transcript
(transcription)
2. Posttranscriptional modification of mRNA
3. Messenger RNA degradation
4. Protein synthesis (translation)
5. Posttranslational modification of proteins
6. Protein targeting and transport
7. Protein degradation
These processes are summarized in Figure 28–1. We
have examined several of these mechanisms in previous
chapters. Posttranscriptional modification of mRNA, by
processes such as alternative splicing patterns (see
Fig. 26–19b) or RNA editing (see Box 27–1), can affect
which proteins are produced from an mRNA transcript
and in what amounts. A variety of nucleotide sequences
in an mRNA can affect the rate of its degradation (p.
1020). Many factors affect the rate at which an mRNA
is translated into a protein, as well as the posttransla-
tional modification, targeting, and eventual degradation
of that protein (Chapter 27).
This chapter focuses primarily on the regulation of
transcription initiation, although aspects of posttran-
scriptional and translational regulation are also de-
scribed. Of the regulatory processes illustrated in Fig-
ure 28–1, those operating at the level of transcription
initiation are the best documented and probably the most
common. As in all biochemical processes, an efficient
place for regulation is at the beginning of the pathway.
Because synthesis of informational macromolecules is
so extraordinarily expensive in terms of energy, elabo-
rate mechanisms have evolved to regulate the process.
Researchers continue to discover complex and some-
times surprising regulatory mechanisms. Increasingly,
posttranscriptional and translational regulation are
proving to be among the more important of these
processes, especially in eukaryotes. In fact, the regula-
tory processes themselves can involve a considerable in-
vestment of chemical energy.
Control of transcription initiation permits the syn-
chronized regulation of multiple genes encoding prod-
ucts with interdependent activities. For example, when
their DNA is heavily damaged, bacterial cells require a
coordinated increase in the levels of the many DNA re-
pair enzymes. And perhaps the most sophisticated form
of coordination occurs in the complex regulatory circuits
that guide the development of multicellular eukaryotes,
which can involve many types of regulatory mechanisms.
We begin by examining the interactions between
proteins and DNA that are the key to transcriptional reg-
ulation. We next discuss the specific proteins that in-
fluence the expression of specific genes, first in prokary-
otic and then in eukaryotic cells. Information about
posttranscriptional and translational regulation is in-
cluded in the discussion, where relevant, to provide a
more complete overview of the rich complexity of reg-
ulatory mechanisms.
28.1 Principles of Gene Regulation
Genes for products that are required at all times, such
as those for the enzymes of central metabolic path-
ways, are expressed at a more or less constant level in
virtually every cell of a species or organism. Such genes
are often referred to as housekeeping genes. Un-
varying expression of a gene is called constitutive
gene expression.
For other gene products, cellular levels rise and fall
in response to molecular signals; this is regulated gene
expression. Gene products that increase in concen-
tration under particular molecular circumstances are re-
ferred to as inducible; the process of increasing their
expression is induction. The expression of many of the
genes encoding DNA repair enzymes, for example, is in-
duced by high levels of DNA damage. Conversely, gene
products that decrease in concentration in response to
a molecular signal are referred to as repressible, and
the process is called repression. For example, in bac-
teria, ample supplies of tryptophan lead to repression
of the genes for the enzymes that catalyze tryptophan
biosynthesis.
Transcription is mediated and regulated by protein-
DNA interactions, especially those involving the protein
components of RNA polymerase (Chapter 26). We first
consider how the activity of RNA polymerase is regu-
lated, and proceed to a general description of the pro-
teins participating in this process. We then examine the
molecular basis for the recognition of specific DNA se-
quences by DNA-binding proteins.
RNA Polymerase Binds to DNA at Promoters
RNA polymerases bind to DNA and initiate transcrip-
tion at promoters (see Fig. 26–5), sites generally found
near points at which RNA synthesis begins on the DNA
template. The regulation of transcription initiation of-
ten entails changes in how RNA polymerase interacts
with a promoter.
The nucleotide sequences of promoters vary consid-
erably, affecting the binding affinity of RNA polymerases
and thus the frequency of transcription initiation. Some
[Figure 28–1 flow diagram: DNA gene, transcription, primary transcript, posttranscriptional processing, mature mRNA, translation, protein (inactive), posttranslational processing, modified protein (active), protein targeting and transport; mRNA degradation yields nucleotides and protein degradation yields amino acids.]
FIGURE 28–1 Seven processes that affect the steady-state concen-
tration of a protein. Each process has several potential points of
regulation.
Escherichia coli genes are transcribed once per second,
others less than once per cell generation. Much of this
variation is due to differences in promoter sequence. In
the absence of regulatory proteins, differences in pro-
moter sequences may affect the frequency of transcrip-
tion initiation by a factor of 1,000 or more. Most E. coli
promoters have a sequence close to a consensus (Fig.
28–2). Mutations that result in a shift away from the con-
sensus sequence usually decrease promoter function;
conversely, mutations toward consensus usually enhance
promoter function.
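The relationship between a promoter and the consensus can be made concrete with a simple match count. The Python sketch below is a minimal illustration, not a predictor of promoter strength: it scores the −35 and −10 hexamers of a promoter against the consensus hexamers TTGACA and TATAAT. The function names and the scoring scheme are assumptions introduced here; the example hexamers TTTACA and TATGTT are those of the lac promoter. Spacing between the hexamers, the UP element, and regulatory proteins all affect real promoters and are ignored.

```python
CONSENSUS_MINUS_35 = "TTGACA"  # consensus -35 hexamer
CONSENSUS_MINUS_10 = "TATAAT"  # consensus -10 hexamer


def count_matches(hexamer, consensus):
    """Count positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must be the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(minus_35, minus_10):
    """Total matches (0 to 12) across the -35 and -10 hexamers."""
    return (count_matches(minus_35, CONSENSUS_MINUS_35)
            + count_matches(minus_10, CONSENSUS_MINUS_10))


# lac promoter hexamers: 5 of 6 matches at -35 and 4 of 6 at -10.
lac_score = consensus_score("TTTACA", "TATGTT")
print(lac_score)  # 9
```

In this scheme a mutation toward the consensus raises the score by one and a mutation away lowers it by one, mirroring the usual direction of the effect on promoter function but not its magnitude.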
Although housekeeping genes are expressed con-
stitutively, the cellular concentrations of the proteins
they encode vary widely. For these genes, the RNA
polymerase–promoter interaction strongly influences
the rate of transcription initiation; differences in pro-
moter sequence allow the cell to synthesize the appro-
priate level of each housekeeping gene product.
The basal rate of transcription initiation at the pro-
moters of nonhousekeeping genes is also determined by
the promoter sequence, but expression of these genes
is further modulated by regulatory proteins. Many of
these proteins work by enhancing or interfering with the
interaction between RNA polymerase and the promoter.
The sequences of eukaryotic promoters are more
variable than their prokaryotic counterparts (see
Fig. 26–8). The three eukaryotic RNA polymerases usu-
ally require an array of general transcription factors in
order to bind to a promoter. Yet, as with prokaryotic
gene expression, the basal level of transcription is de-
termined by the effect of promoter sequences on the
function of RNA polymerase and its associated tran-
scription factors.
Transcription Initiation Is Regulated by Proteins That
Bind to or near Promoters
At least three types of proteins regulate transcription
initiation by RNA polymerase: specificity factors alter
the specificity of RNA polymerase for a given promoter
or set of promoters; repressors impede access of RNA
polymerase to the promoter; and activators enhance
the RNA polymerase–promoter interaction.
We introduced prokaryotic specificity factors in
Chapter 26 (see Fig. 26–5), although we did not refer to
them by that name. The σ subunit of the E. coli RNA
polymerase holoenzyme is a specificity factor that medi-
ates promoter recognition and binding. Most E. coli pro-
moters are recognized by a single σ subunit (M_r 70,000),
σ⁷⁰. Under some conditions, some of the σ⁷⁰ subunits are
replaced by another specificity factor. One notable case
arises when the bacteria are subjected to heat stress,
leading to the replacement of σ⁷⁰ by σ³² (M_r 32,000).
When bound to σ³², RNA polymerase is directed to a spe-
cialized set of promoters with a different consensus
sequence (Fig. 28–3). These promoters control the ex-
pression of a set of genes that encode the heat-shock
response proteins. Thus, through changes in the binding
affinity of the polymerase that direct it to different pro-
moters, a set of genes involved in related processes is co-
ordinately regulated. In eukaryotic cells, some of the gen-
eral transcription factors, in particular the TATA-binding
protein (TBP; see Fig. 26–8), may be considered speci-
ficity factors.
Repressors bind to specific sites on the DNA. In
prokaryotic cells, such binding sites, called operators,
are generally near a promoter. RNA polymerase binding,
[Figure 28–2 schematic, 5′ to 3′ on the DNA: UP element, TTGACA (−35 region), N₁₇, TATAAT (−10 region), N₅₋₉, RNA start site, mRNA.]
FIGURE 28–2 Consensus sequence for many E. coli promoters. Most
base substitutions in the −10 and −35 regions have a negative effect
on promoter function. Some promoters also include the UP (upstream
promoter) element (see Fig. 26–5). By convention, DNA sequences
are shown as they exist in the nontemplate strand, with the 5′ termi-
nus on the left. Nucleotides are numbered from the transcription start
site, with positive numbers to the right (in the direction of transcrip-
tion) and negative numbers to the left. N indicates any nucleotide.
[Figure 28–3 schematic, 5′ to 3′ on the DNA: TNTCNCCCTTGAA, N₁₃₋₁₅, CCCCATTTA, N₇, RNA start site, mRNA.]
FIGURE 28–3 Consensus sequence for promoters that regulate expression of the E. coli heat-
shock genes. This system responds to temperature increases as well as some other environmental
stresses, resulting in the induction of a set of proteins. Binding of RNA polymerase to heat-shock
promoters is mediated by a specialized σ subunit of the polymerase, σ³², which replaces σ⁷⁰ in
the RNA polymerase initiation complex.
or its movement along the DNA after binding, is blocked
when the repressor is present. Regulation by means of
a repressor protein that blocks transcription is referred
to as negative regulation. Repressor binding to DNA
is regulated by a molecular signal (or effector), usually
a small molecule or a protein, that binds to the repres-
sor and causes a conformational change. The interaction
between repressor and signal molecule either increases
or decreases transcription. In some cases, the confor-
mational change results in dissociation of a DNA-bound
repressor from the operator (Fig. 28–4a). Transcription
initiation can then proceed unhindered. In other cases,
interaction between an inactive repressor and the signal
molecule causes the repressor to bind to the operator
(Fig. 28–4b). In eukaryotic cells, the binding site for a
repressor may be some distance from the promoter;
binding has the same effect as in bacterial cells: inhibit-
ing the assembly or activity of a transcription complex
at the promoter.
Activators provide a molecular counterpoint to re-
pressors; they bind to DNA and enhance the activity of
RNA polymerase at a promoter; this is positive regu-
lation. Activator binding sites are often adjacent to
promoters that are bound weakly or not at all by RNA
polymerase alone, such that little transcription occurs
in the absence of the activator. Some eukaryotic acti-
vators bind to DNA sites, called enhancers, that are
quite distant from the promoter, affecting the rate of
transcription at a promoter that may be located thou-
sands of base pairs away. Some activators are normally
bound to DNA, enhancing transcription until dissociation
of the activator is triggered by the binding of a signal
molecule (Fig. 28–4c). In other cases the activator binds
to DNA only after interaction with a signal molecule
[Figure 28–4 panels: negative regulation (bound repressor inhibits transcription) in (a) and (b); positive regulation (bound activator facilitates transcription) in (c) and (d). In (a) and (c) the molecular signal causes dissociation of the regulatory protein from DNA; in (b) and (d) it causes binding of the regulatory protein to DNA.]
FIGURE 28–4 Common patterns of regulation of transcription initi-
ation. Two types of negative regulation are illustrated. (a) Repressor
(pink) binds to the operator in the absence of the molecular signal;
the external signal causes dissociation of the repressor to permit tran-
scription. (b) Repressor binds in the presence of the signal; the re-
pressor dissociates and transcription ensues when the signal is re-
moved. Positive regulation is mediated by gene activators. Again, two
types are shown. (c) Activator (green) binds in the absence of the mo-
lecular signal and transcription proceeds; when the signal is added,
the activator dissociates and transcription is inhibited. (d) Activator
binds in the presence of the signal; it dissociates only when the sig-
nal is removed. Note that “positive” and “negative” regulation refer to
the type of regulatory protein involved: the bound protein either fa-
cilitates or inhibits transcription. In either case, addition of the mo-
lecular signal may increase or decrease transcription, depending on
its effect on the regulatory protein.
(Fig. 28–4d). Signal molecules can therefore increase or
decrease transcription, depending on how they affect
the activator. Positive regulation is particularly common
in eukaryotes, as we shall see.
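The four patterns of Figure 28–4 reduce to a small truth table, sketched below in Python as an illustration. Two pieces of information determine the outcome: whether the regulatory protein is a repressor or an activator, and whether it binds DNA in the presence or in the absence of the signal. The function and parameter names are assumptions made for this sketch, and binding is treated as all-or-none, which real systems are not.

```python
def regulator_bound(binds_with_signal, signal_present):
    """A regulator that binds DNA only with the signal is bound when the
    signal is present; one that binds only without it is bound when the
    signal is absent."""
    return binds_with_signal == signal_present


def transcription_on(regulation, binds_with_signal, signal_present):
    """Idealized outcome for the four patterns of Figure 28-4.

    regulation: "negative" (repressor) or "positive" (activator).
    """
    bound = regulator_bound(binds_with_signal, signal_present)
    if regulation == "negative":
        return not bound  # a bound repressor inhibits transcription
    if regulation == "positive":
        return bound      # a bound activator facilitates transcription
    raise ValueError("regulation must be 'negative' or 'positive'")


# Pattern (a): repressor that binds only in the absence of the signal.
print(transcription_on("negative", binds_with_signal=False, signal_present=True))
```

The same signal can therefore switch transcription on or off; what matters is the type of regulatory protein and how the signal alters its binding to DNA.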
Many Prokaryotic Genes Are Clustered and
Regulated in Operons
Bacteria have a simple general mechanism for coordi-
nating the regulation of genes encoding products that
participate in a set of related processes: these genes are
clustered on the chromosome and are transcribed to-
gether. Many prokaryotic mRNAs are polycistronic—
multiple genes on a single transcript—and the single
promoter that initiates transcription of the cluster is the
site of regulation for expression of all the genes in the
cluster. The gene cluster and promoter, plus additional
sequences that function together in regulation, are
called an operon (Fig. 28–5). Operons that include two
to six genes transcribed as a unit are common; some
operons contain 20 or more genes.
Many of the principles of prokaryotic gene expres-
sion were first defined by studies of lactose metabolism
in E. coli, which can use lactose as its sole carbon source.
In 1960, François Jacob and Jacques Monod published
a short paper in the Proceedings of the French Acad-
emy of Sciences that described how two adjacent genes
involved in lactose metabolism were coordinately regu-
lated by a genetic element located at one end of the
gene cluster. The genes were those for β-galactosidase,
which cleaves lactose to galactose and glucose, and
galactoside permease, which transports lactose into the
cell (Fig. 28–6). The terms “operon” and “operator”
were first introduced in this paper. With the operon
model, gene regulation could, for the first time, be con-
sidered in molecular terms.
The lac Operon Is Subject to Negative Regulation
The lactose (lac) operon (Fig. 28–7a) includes the
genes for β-galactosidase (Z), galactoside permease
(Y), and thiogalactoside transacetylase (A). The last of
these enzymes appears to modify toxic galactosides to
facilitate their removal from the cell. Each of the three
genes is preceded by a ribosome binding site (not shown
in Fig. 28–7) that independently directs the translation
[Figure 28–5 schematic of the DNA: regulatory sequences (activator binding site, promoter, repressor binding site or operator) followed by genes A, B, and C, transcribed as a unit.]
FIGURE 28–5 Representative prokaryotic operon. Genes A, B, and
C are transcribed on one polycistronic mRNA. Typical regulatory se-
quences include binding sites for proteins that either activate or re-
press transcription from the promoter.
[Photographs: François Jacob; Jacques Monod, 1910–1976]
[Figure 28–6 reaction scheme: lactose outside the cell is transported inside by galactoside permease; β-galactosidase converts lactose to galactose + glucose, and also to allolactose.]
FIGURE 28–6 Lactose metabolism in E. coli. Uptake and metabolism
of lactose require the activities of galactoside permease and β-
galactosidase. Conversion of lactose to allolactose by transglycosyla-
tion is a minor reaction also catalyzed by β-galactosidase.
of that gene (Chapter 27). Regulation of the lac operon
by the lac repressor protein (Lac) follows the pattern
outlined in Figure 28–4a.
The study of lac operon mutants has revealed some
details of the workings of the operon’s regulatory sys-
tem. In the absence of lactose, the lac operon genes are
repressed. Mutations in the operator or in another gene,
the I gene, result in constitutive synthesis of the gene
products. When the I gene is defective, repression can
be restored by introducing a functional I gene into the
cell on another DNA molecule, demonstrating that the
I gene encodes a diffusible molecule that causes gene
repression. This molecule proved to be a protein, now
called the Lac repressor, a tetramer of identical
monomers. The operator to which it binds most tightly
(O₁) abuts the transcription start site (Fig. 28–7a). The
I gene is transcribed from its own promoter (P_I) inde-
pendent of the lac operon genes. The lac operon has
two secondary binding sites for the Lac repressor. One
(O₂) is centered near position +410, within the gene
encoding β-galactosidase (Z); the other (O₃) is near po-
sition −90, within the I gene. To repress the operon, the
Lac repressor appears to bind to both the main opera-
tor and one of the two secondary sites, with the inter-
vening DNA looped out (Fig. 28–7b, c). Either binding
arrangement blocks transcription initiation.
FIGURE 28–7 The lac operon. (a) The lac operon in the repressed
state. The I gene encodes the Lac repressor. The lac Z, Y, and A genes
encode β-galactosidase, galactoside permease, and thiogalactoside
transacetylase, respectively. P is the promoter for the lac genes, and
P_I is the promoter for the I gene. O₁ is the main operator for the lac
operon; O₂ and O₃ are secondary operator sites of lesser affinity for
the Lac repressor. (b) The Lac repressor binds to the main operator
and O₂ or O₃, apparently forming a loop in the DNA that might wrap
around the repressor as shown. (c) Lac repressor bound to DNA (de-
rived from PDB ID 1LBG). This shows the protein (gray) bound to short,
discontinuous segments of DNA (blue). (d) Conformational change in
the Lac repressor caused by binding of the artificial inducer iso-
propylthiogalactoside, IPTG (derived from PDB ID 1LBH and 1LBG).
The structure of the tetrameric repressor is shown without IPTG bound
(transparent image) and with IPTG bound (overlaid solid image; IPTG
not shown). The DNA bound when IPTG is absent (transparent struc-
ture) is not shown. When IPTG is bound and DNA is not bound, the
repressor’s DNA-binding domains are too disordered to be defined in
the crystal structure.
Despite this elaborate binding complex, repression
is not absolute. Binding of the Lac repressor reduces
the rate of transcription initiation by a factor of 10³. If
the O₂ and O₃ sites are eliminated by deletion or muta-
tion, the binding of repressor to O₁ alone reduces tran-
scription by a factor of about 10². Even in the repressed
state, each cell has a few molecules of β-galactosidase
and galactoside permease, presumably synthesized on
the rare occasions when the repressor transiently dis-
sociates from the operators. This basal level of tran-
scription is essential to operon regulation.
When cells are provided with lactose, the lac operon
is induced. An inducer (signal) molecule binds to a spe-
cific site on the Lac repressor, causing a conformational
change (Fig. 28–7d) that results in dissociation of the
repressor from the operator. The inducer in the lac
operon system is not lactose itself but allolactose, an
isomer of lactose (Fig. 28–6). After entry into the E.
coli cell (via the few existing molecules of permease),
lactose is converted to allolactose by one of the few ex-
isting β-galactosidase molecules. Release of the opera-
tor by Lac repressor, triggered as the repressor binds to
allolactose, allows expression of the lac operon genes
and leads to a 10³-fold increase in the concentration of
β-galactosidase.
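The repression factors quoted in the text can be collected into a short numerical sketch. The Python below is an idealization written for illustration: it treats the induced operon as fully unrepressed (relative rate 1.0) and the repressed operon as reduced by a factor of 10³ when all three operators are present, or about 10² when only the main operator remains. The function name and parameters are assumptions, and the activator discussed in Section 28.2 is ignored.

```python
REPRESSION_ALL_OPERATORS = 1e3  # main operator plus both secondary sites
REPRESSION_MAIN_ONLY = 1e2      # secondary sites deleted or mutated


def relative_transcription(inducer_present, secondary_operators_intact=True):
    """Idealized rate of lac transcription initiation, relative to the
    fully unrepressed operon (1.0)."""
    if inducer_present:
        return 1.0  # repressor released from the operator
    if secondary_operators_intact:
        return 1.0 / REPRESSION_ALL_OPERATORS
    return 1.0 / REPRESSION_MAIN_ONLY


for inducer in (False, True):
    print(inducer, relative_transcription(inducer))
```

The nonzero value returned for the repressed state corresponds to the basal transcription that supplies the few molecules of permease and β-galactosidase needed for induction to begin.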
Several β-galactosides structurally related to allo-
lactose are inducers of the lac operon but are not sub-
strates for β-galactosidase; others are substrates but not
inducers. One particularly effective and nonmetabolizable
inducer of the lac operon that is often used experimentally
is isopropylthiogalactoside (IPTG).
[Structure of isopropylthiogalactoside (IPTG)]
An inducer that cannot be metabolized allows researchers
to explore the physiological function of lactose as a car-
bon source for growth, separate from its function in the
regulation of gene expression.
In addition to the multitude of operons now known
in bacteria, a few polycistronic operons have been found
in the cells of lower eukaryotes. In the cells of higher
eukaryotes, however, almost all protein-encoding genes
are transcribed separately.
The mechanisms by which operons are regulated
can vary significantly from the simple model presented
in Figure 28–7. Even the lac operon is more complex
than indicated here, with an activator also contributing
to the overall scheme, as we shall see in Section 28.2.
Before any further discussion of the layers of regulation
of gene expression, however, we examine the critical
molecular interactions between DNA-binding proteins
(such as repressors and activators) and the DNA se-
quences to which they bind.
Regulatory Proteins Have Discrete
DNA-Binding Domains
Regulatory proteins generally bind to specific DNA se-
quences. Their affinity for these target sequences is
roughly 10⁴ to 10⁶ times higher than their affinity for
any other DNA sequences. Most regulatory proteins
have discrete DNA-binding domains containing sub-
structures that interact closely and specifically with the
DNA. These binding domains usually include one or
more of a relatively small group of recognizable and
characteristic structural motifs.
To bind specifically to DNA sequences, regulatory
proteins must recognize surface features on the DNA.
Most of the chemical groups that differ among the four
bases and thus permit discrimination between base pairs
are hydrogen-bond donor and acceptor groups exposed
in the major groove of DNA (Fig. 28–8), and most of the
protein-DNA contacts that impart specificity are hydro-
gen bonds. A notable exception is the nonpolar surface
[Figure 28–8 structures: the four base-pair orientations (adenine–thymine, thymine–adenine, guanine–cytosine, cytosine–guanine), each showing the groups exposed in the major groove and in the minor groove.]
FIGURE 28–8 Groups in DNA available for protein binding. Shown
here are functional groups on all four base pairs that are displayed in
the major and minor grooves of DNA. Groups that can be used for
base-pair recognition by proteins are shown in red.
near C-5 of pyrimidines, where thymine is readily dis-
tinguished from cytosine by its protruding methyl group.
Protein-DNA contacts are also possible in the minor
groove of the DNA, but the hydrogen-bonding patterns
here generally do not allow ready discrimination be-
tween base pairs.
Within regulatory proteins, the amino acid side
chains most often hydrogen-bonding to bases in the
DNA are those of Asn, Gln, Glu, Lys, and Arg residues.
Is there a simple recognition code in which a particular
amino acid always pairs with a particular base? The two
hydrogen bonds that can form between Gln or Asn and
the N⁶ and N-7 positions of adenine cannot form with
any other base. And an Arg residue can form two hy-
drogen bonds with N-7 and O⁶ of guanine (Fig. 28–9).
Examination of the structures of many DNA-binding
proteins, however, has shown that a protein can recog-
nize each base pair in more than one way, leading to the
conclusion that there is no simple amino acid–base code.
For some proteins, the Gln-adenine interaction can
specify A=T base pairs, but in others a van der Waals
pocket for the methyl group of thymine can recognize
A=T base pairs. Researchers cannot yet examine the
structure of a DNA-binding protein and infer the DNA
sequence to which it binds.
To interact with bases in the major groove of DNA,
a protein requires a relatively small structure that can
stably protrude from the protein surface. The DNA-
binding domains of regulatory proteins tend to be small
(60 to 90 amino acid residues), and the structural mo-
tifs within these domains that are actually in contact
with the DNA are smaller still. Many small proteins are
unstable because of their limited capacity to form lay-
ers of structure to bury hydrophobic groups (p. 118).
The DNA-binding motifs provide either a very compact
stable structure or a way of allowing a segment of pro-
tein to protrude from the protein surface.
The DNA-binding sites for regulatory proteins are
often inverted repeats of a short DNA sequence (a palin-
drome) at which multiple (usually two) subunits of a
regulatory protein bind cooperatively. The Lac repres-
sor is unusual in that it functions as a tetramer, with two
dimers tethered together at the end distant from the
DNA-binding sites (Fig. 28–7b). An E. coli cell normally
contains about 20 tetramers of the Lac repressor. Each
of the tethered dimers separately binds to a palindromic
operator sequence, in contact with 17 bp of a 22 bp re-
gion in the lac operon (Fig. 28–10). And each of the
tethered dimers can independently bind to an operator
sequence, with one generally binding to O₁ and the other
to O₂ or O₃ (as in Fig. 28–7b). The symmetry of the O₁
operator sequence corresponds to the twofold axis of
symmetry of two paired Lac repressor subunits. The
tetrameric Lac repressor binds to its operator sequences
in vivo with an estimated dissociation constant of about
10⁻¹⁰ M. The repressor discriminates between the op-
erators and other sequences by a factor of about 10⁶, so
binding to these few base pairs among the 4.6 million
or so of the E. coli chromosome is highly specific.
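The numbers in this paragraph can be combined in a rough calculation of how fully the operator is occupied. The Python sketch below rests on stated assumptions and is not a measured result: it takes a cell volume of about 10⁻¹⁵ L (an assumed round figure, not given in the text), treats all 20 tetramers as free in solution, and applies simple single-site equilibrium binding with the dissociation constant of about 10⁻¹⁰ M. Nonspecific binding of the repressor to other DNA, which lowers the free concentration in a real cell, is ignored.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molar_concentration(molecules, cell_volume_liters):
    """Convert a number of molecules per cell to a molar concentration."""
    return molecules / (AVOGADRO * cell_volume_liters)


def fraction_bound(repressor_molar, kd_molar):
    """Fraction of operator sites occupied at equilibrium for simple
    single-site binding: [R] / ([R] + Kd)."""
    return repressor_molar / (repressor_molar + kd_molar)


CELL_VOLUME_LITERS = 1e-15  # assumption: approximate E. coli cell volume
TETRAMERS_PER_CELL = 20     # from the text
KD_MOLAR = 1e-10            # from the text

repressor = molar_concentration(TETRAMERS_PER_CELL, CELL_VOLUME_LITERS)
occupancy = fraction_bound(repressor, KD_MOLAR)
print(f"repressor concentration: {repressor:.1e} M")
print(f"operator occupancy: {occupancy:.3f}")
```

Under these assumptions the repressor concentration exceeds the dissociation constant by more than two orders of magnitude, which is consistent with an operator that is occupied nearly all the time yet still released on rare occasions.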
Several DNA-binding motifs have been described,
but here we focus on two that play prominent roles in
the binding of DNA by regulatory proteins: the helix-
turn-helix and the zinc finger. We also consider a type
of DNA-binding domain—the homeodomain—found in
some eukaryotic proteins.
Helix-Turn-Helix This DNA-binding motif is crucial to the
interaction of many prokaryotic regulatory proteins with
DNA, and similar motifs occur in some eukaryotic reg-
ulatory proteins. The helix-turn-helix motif comprises
about 20 amino acids in two short α-helical segments,
[Figure 28–9 structures: glutamine (or asparagine) hydrogen-bonded to N⁶ and N-7 of adenine in a thymine–adenine base pair; arginine hydrogen-bonded to N-7 and O⁶ of guanine in a cytosine–guanine base pair.]
FIGURE 28–9 Two examples of specific amino acid–base pair inter-
actions that have been observed in DNA-protein binding.
[Figure 28–10 sequence, labeled with the promoter (bound by RNA polymerase; −35 and −10 regions), the operator (bound by Lac repressor), the RNA start site, and the mRNA:]
DNA TAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC
FIGURE 28–10 Relationship between the lac operator sequence O₁
and the lac promoter. The bases shaded beige exhibit twofold (palin-
dromic) symmetry about the axis indicated by the dashed vertical line.
each seven to nine amino acid residues long, separated
by a H9252 turn (Fig. 28–11). This structure generally is not
stable by itself; it is simply the reactive portion of a
somewhat larger DNA-binding domain. One of the two
H9251-helical segments is called the recognition helix, be-
cause it usually contains many of the amino acids that
interact with the DNA in a sequence-specific way. This
H9251 helix is stacked on other segments of the protein
structure so that it protrudes from the protein surface.
When bound to DNA, the recognition helix is positioned
in or nearly in the major groove. The Lac repressor has
this DNA-binding motif (Fig. 28–11).
28.1 Principles of Gene Regulation 1089
FIGURE 28–11 Helix-turn-helix. (a) DNA-binding domain of the Lac
repressor (PDB ID 1LCC). The helix-turn-helix motif is shown in red
and orange; the DNA recognition helix is red. (b) Entire Lac repres-
sor (derived from PDB ID 1LBG). The DNA-binding domains are gray,
and the α helices involved in tetramerization are red. The remainder
of the protein (shades of green) has the binding sites for allolactose.
The allolactose-binding domains are linked to the DNA-binding do-
mains through linker helices (yellow). (c) Surface rendering of the
DNA-binding domain of the Lac repressor (gray) bound to DNA (blue).
(d) The same DNA-binding domain as in (c), but separated from the
DNA, with the binding interaction surfaces shown. Some groups on
the protein and DNA that interact through hydrogen-bonding are
shown in red; some groups that interact through hydrophobic inter-
actions are in orange. This model shows only a few of the groups in-
volved in sequence recognition. The complementary nature of the two
surfaces is evident.
Zinc Finger In a zinc finger, about 30 amino acid
residues form an elongated loop held together at the
base by a single Zn²⁺ ion, which is coordinated to four
of the residues (four Cys, or two Cys and two His). The
zinc does not itself interact with DNA; rather, the coor-
dination of zinc with the amino acid residues stabilizes
this small structural motif. Several hydrophobic side
chains in the core of the structure also lend stability.
Figure 28–12 shows the interaction between DNA and
three zinc fingers of a single polypeptide from the mouse
regulatory protein Zif268.
Many eukaryotic DNA-binding proteins contain zinc
fingers. The interaction of a single zinc finger with DNA
is typically weak, and many DNA-binding proteins, like
Zif268, have multiple zinc fingers that substantially en-
hance binding by interacting simultaneously with the
DNA. One DNA-binding protein of the frog Xenopus has
37 zinc fingers. There are few known examples of the
zinc finger motif in prokaryotic proteins.
The precise manner in which proteins with zinc fin-
gers bind to DNA differs from one protein to the next.
Some zinc fingers contain the amino acid residues that
are important in sequence discrimination, whereas oth-
ers appear to bind DNA nonspecifically (the amino acids
required for specificity are located elsewhere in the
protein). Zinc fingers can also function as RNA-binding
motifs—for example, in certain proteins that bind eu-
karyotic mRNAs and act as translational repressors. We
discuss this role later (Section 28.3).
Homeodomain Another type of DNA-binding domain has
been identified in a number of proteins that function as
transcriptional regulators, especially during eukaryotic
development. This domain of 60 amino acids—called the
homeodomain, because it was discovered in homeotic
genes (genes that regulate the development of body pat-
terns)—is highly conserved and has now been identified
in proteins from a wide variety of organisms, including
humans (Fig. 28–13). The DNA-binding segment of the
domain is related to the helix-turn-helix motif. The DNA
sequence that encodes this domain is known as the
homeobox.
Regulatory Proteins Also Have Protein-Protein
Interaction Domains
Regulatory proteins contain domains not only for DNA
binding but also for protein-protein interactions—with
RNA polymerase, other regulatory proteins, or other sub-
units of the same regulatory protein. Examples include
many eukaryotic transcription factors that function as
gene activators, which often bind as dimers to the DNA,
using DNA-binding domains that contain zinc fingers.
Some structural domains are devoted to the interactions
required for dimer formation, which is generally a pre-
requisite for DNA binding. Like DNA-binding motifs, the
structural motifs that mediate protein-protein interac-
tions tend to fall within one of a few common categories.
Two important examples are the leucine zipper and
the basic helix-loop-helix. Structural motifs such as
these are the basis for classifying some regulatory
proteins into structural families.
FIGURE 28–12 Zinc fingers. Three zinc fingers (gray) of the
regulatory protein Zif268, complexed with DNA (blue and white)
(PDB ID 1A1L). Each Zn²⁺ (maroon) coordinates with two His and
two Cys residues (not shown).
FIGURE 28–13 Homeodomain. Shown here is a homeodomain
bound to DNA; one of the α helices (red), stacked on two others,
can be seen protruding into the major groove (PDB ID 1B8I). This
is only a small part of the much larger protein Ultrabithorax
(Ubx), active in the regulation of development in fruit flies.
Leucine Zipper This motif is an amphipathic α helix with
a series of hydrophobic amino acid residues concen-
trated on one side (Fig. 28–14), with the hydrophobic
surface forming the area of contact between the two
polypeptides of a dimer. A striking feature of these
α helices is the occurrence of Leu residues at every
seventh position, forming a straight line along the
hydrophobic surface. Although researchers initially
thought the Leu residues interdigitated (hence the
name “zipper”), we now know that they line up side by
side as the interacting α helices coil around each other
(forming a coiled coil; Fig. 28–14b). Regulatory proteins
with leucine zippers often have a separate DNA-binding
domain with a high concentration of basic (Lys or Arg)
residues that can interact with the negatively charged
phosphates of the DNA backbone. Leucine zippers have
been found in many eukaryotic and a few prokaryotic
proteins.
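The "Leu at every seventh position" pattern is simple to search for in a sequence. The following is a minimal sketch, not a validated motif finder: the function name, the `min_repeats` threshold, and the toy sequence are all hypothetical and chosen only to illustrate the spacing rule.

```python
def leucine_heptad_starts(seq, min_repeats=4):
    """Return 0-based start positions from which Leu (L) recurs at every
    seventh residue at least `min_repeats` times in a row."""
    starts = []
    for start in range(len(seq)):
        positions = [start + 7 * k for k in range(min_repeats)]
        if positions[-1] < len(seq) and all(seq[p] == "L" for p in positions):
            starts.append(start)
    return starts


# HYPOTHETICAL toy sequence (not a real protein): Leu at indices 4, 11, 18, 25.
toy_zipper = "AAAAL" + "AAAAAAL" * 3

print(leucine_heptad_starts(toy_zipper))
```

A real search would also examine the other hydrophobic positions of the heptad and the basic region that precedes the zipper; this sketch checks only the Leu spacing described above.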
Basic Helix-Loop-Helix Another common structural motif
occurs in some eukaryotic regulatory proteins implicated
in the control of gene expression during the develop-
ment of multicellular organisms. These proteins share a
conserved region of about 50 amino acid residues im-
portant in both DNA binding and protein dimerization.
This region can form two short amphipathic α helices
linked by a loop of variable length, the helix-loop-helix
(distinct from the helix-turn-helix motif associated
with DNA binding). The helix-loop-helix motifs of two
polypeptides interact to form dimers (Fig. 28–15). In
these proteins, DNA binding is mediated by an adjacent
short amino acid sequence rich in basic residues, simi-
lar to the separate DNA-binding region in proteins con-
taining leucine zippers.
Subunit Mixing in Eukaryotic Regulatory Proteins Several
families of eukaryotic transcription factors have been
defined based on close structural similarities. Within
each family, dimers can sometimes form between two
identical proteins (a homodimer) or between two dif-
ferent members of the family (a heterodimer). A hypo-
thetical family of four different leucine-zipper proteins
could thus form up to ten different dimeric species. In
many cases, the different combinations appear to have
distinct regulatory and functional properties.
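The count of ten dimeric species follows from choosing two subunits with repetition allowed: n(n + 1)/2 combinations for a family of n members. A short sketch, using hypothetical family members labeled A to D, enumerates them:

```python
from itertools import combinations_with_replacement


def dimer_species(proteins):
    """List every distinct dimer (homo- and heterodimers) in a family."""
    return list(combinations_with_replacement(proteins, 2))


family = ["A", "B", "C", "D"]   # HYPOTHETICAL leucine-zipper family members
dimers = dimer_species(family)
homodimers = [d for d in dimers if d[0] == d[1]]
heterodimers = [d for d in dimers if d[0] != d[1]]

print(len(dimers), "dimers:", len(homodimers), "homodimers and",
      len(heterodimers), "heterodimers")
```

For four proteins this gives four homodimers and six heterodimers. The count is an upper bound, because not every pair within a real family necessarily dimerizes.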
FIGURE 28–14 Leucine zippers. (a) Comparison of
amino acid sequences of several leucine zipper
proteins. Note the Leu (L) residues at every seventh
position in the zipper region, and the number of Lys
(K) and Arg (R) residues in the DNA-binding region.
(b) Leucine zipper from the yeast activator protein
GCN4 (PDB ID 1YSA). Only the “zippered” α helices
(gray and light blue), derived from different subunits of
the dimeric protein, are shown. The two helices wrap
around each other in a gently coiled coil. The
interacting Leu residues are shown in red.
[Panel (a): sequence alignment of the mammalian proteins
C/EBP, Jun, and Fos and the yeast protein GCN4, with a
consensus line; the DNA-binding region, a six-amino-acid
connector, the leucine zipper, and an invariant Asn are
marked.]
In addition to structural domains devoted to DNA
binding and dimerization (or oligomerization), many
regulatory proteins must interact with RNA polymerase,
with unrelated regulatory proteins, or with both. At least
three different types of additional domains for protein-
protein interaction have been characterized (primarily
in eukaryotes): glutamine-rich, proline-rich, and acidic
domains, the names reflecting the amino acid residues
that are especially abundant.
Protein-DNA binding interactions are the basis of
the intricate regulatory circuits fundamental to gene
function. We now turn to a closer examination of these
gene regulatory schemes, first in prokaryotic, then in
eukaryotic systems.
SUMMARY 28.1 Principles of Gene Regulation
■ The expression of genes is regulated by
processes that affect the rates at which gene
products are synthesized and degraded. Much
of this regulation occurs at the level of
transcription initiation, mediated by regulatory
proteins that either repress transcription
(negative regulation) or activate transcription
(positive regulation) at specific promoters.
■ In bacteria, genes that encode products with
interdependent functions are often clustered in
an operon, a single transcriptional unit.
Transcription of the genes is generally blocked
by binding of a specific repressor protein at a
DNA site called an operator. Dissociation of the
repressor from the operator is mediated by a
specific small molecule, an inducer. These
principles were first elucidated in studies of the
lactose (lac) operon. The Lac repressor
dissociates from the lac operator when the
repressor binds to its inducer, allolactose.
■ Regulatory proteins are DNA-binding proteins
that recognize specific DNA sequences; most
have distinct DNA-binding domains. Within
these domains, common structural motifs that
bind DNA are the helix-turn-helix, zinc finger,
and homeodomain.
■ Regulatory proteins also contain domains for
protein-protein interactions, including the
leucine zipper and helix-loop-helix, which are
involved in dimerization, and other motifs
involved in activation of transcription.
28.2 Regulation of Gene Expression
in Prokaryotes
As in many other areas of biochemical investigation, the
study of the regulation of gene expression advanced ear-
lier and faster in bacteria than in other experimental or-
ganisms. The examples of bacterial gene regulation pre-
sented here are chosen from among scores of
well-studied systems, partly for their historical signifi-
cance, but primarily because they provide a good
overview of the range of regulatory mechanisms em-
ployed in prokaryotes. Many of the principles of prokary-
otic gene regulation are also relevant to understanding
gene expression in eukaryotic cells.
We begin by examining the lactose and tryptophan
operons; each system has regulatory proteins, but the
overall mechanisms of regulation are very different. This
is followed by a short discussion of the SOS response in
E. coli, illustrating how genes scattered throughout the
genome can be coordinately regulated. We then describe
two prokaryotic systems of quite different types, illus-
trating the diversity of gene regulatory mechanisms:
regulation of ribosomal protein synthesis at the level of
translation, with many of the regulatory proteins bind-
ing to RNA (rather than DNA), and regulation of a
process called phase variation in Salmonella, which re-
sults from genetic recombination. First, we return to the
lac operon to examine its features in greater detail.
FIGURE 28–15 Helix-loop-helix. The human transcription factor Max,
bound to its DNA target site (PDB ID 1HLO). The protein is dimeric;
one subunit is colored. The DNA-binding segment (pink) merges with
the first helix of the helix-loop-helix (red). The second helix merges
with the carboxyl-terminal end of the subunit (purple). Interaction of
the carboxyl-terminal helices of the two subunits describes a coiled
coil very similar to that of a leucine zipper (see Fig. 28–14b), but with
only one pair of interacting Leu residues (red side chains near the top)
in this particular example. The overall structure is sometimes called a
helix-loop-helix/leucine zipper motif.
The lac Operon Undergoes Positive Regulation
The operator-repressor-inducer interactions described
earlier for the lac operon (Fig. 28–7) provide an intu-
itively satisfying model for an on/off switch in the reg-
ulation of gene expression. In truth, operon regulation
is rarely so simple. A bacterium’s environment is too
complex for its genes to be controlled by one signal.
Other factors besides lactose affect the expression of
the lac genes, such as the availability of glucose. Glu-
cose, metabolized directly by glycolysis, is E. coli’s pre-
ferred energy source. Other sugars can serve as the main
or sole nutrient, but extra steps are required to prepare
them for entry into glycolysis, necessitating the syn-
thesis of additional enzymes. Clearly, expressing the
genes for proteins that metabolize sugars such as lac-
tose or arabinose is wasteful when glucose is abundant.
What happens to the expression of the lac operon
when both glucose and lactose are present? A regula-
tory mechanism known as catabolite repression re-
stricts expression of the genes required for catabolism
of lactose, arabinose, and other sugars in the presence
of glucose, even when these secondary sugars are also
present. The effect of glucose is mediated by cAMP, as
a coactivator, and an activator protein known as cAMP
receptor protein, or CRP (the protein is sometimes
called CAP, for catabolite gene activator protein). CRP
is a homodimer (subunit Mᵣ 22,000) with binding sites
for DNA and cAMP. Binding is mediated by a helix-turn-
helix motif within the protein’s DNA-binding domain
(Fig. 28–16). When glucose is absent, CRP-cAMP binds
to a site near the lac promoter (Fig. 28–17a) and stim-
ulates RNA transcription 50-fold. CRP-cAMP is there-
fore a positive regulatory element responsive to glucose
levels, whereas the Lac repressor is a negative regula-
tory element responsive to lactose. The two act in con-
cert. CRP-cAMP has little effect on the lac operon when
the Lac repressor is blocking transcription, and dissoci-
ation of the repressor from the lac operator has little
effect on transcription of the lac operon unless CRP-
cAMP is present to facilitate transcription; when CRP is
not bound, the wild-type lac promoter is a relatively
weak promoter (Fig. 28–17b). The open complex of
RNA polymerase and the promoter (see Fig. 26–6) does
not form readily unless CRP-cAMP is present. CRP inter-
acts directly with RNA polymerase (at the region shown
in Fig. 28–16) through the polymerase’s α subunit.
FIGURE 28–16 CRP homodimer. (PDB ID 1RUN) Bound molecules
of cAMP are shown in red. Note the bending of the DNA around the
protein. The region that interacts with RNA polymerase is shaded
yellow.
FIGURE 28–17 Activation of transcription of the lac operon by CRP.
(a) The binding site for CRP-cAMP is near the promoter. As in the case
of the lac operator, the CRP site has twofold symmetry (bases shaded
beige) about the axis indicated by the dashed line. (b) Sequence of
the lac promoter compared with the promoter consensus sequence.
The differences mean that RNA polymerase binds relatively weakly to
the lac promoter until the polymerase is activated by CRP-cAMP.
[Panel (a): DNA, 5′ to 3′:
ATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACAC,
with the CRP site, the −35 and −10 regions (bound by RNA
polymerase), the operator, and the mRNA start marked.]
[Panel (b):
                              −35 region   −10 region
lac promoter                  TTTACA       TATGTT
Promoter consensus sequence   TTGACA       TATAAT]
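The comparison in Fig. 28–17b can be made explicit by listing the positions at which the lac promoter departs from the consensus. The sketch below uses the four hexamers given in the figure; the helper name `mismatches` is hypothetical.

```python
def mismatches(seq, consensus):
    """Return (position, base_in_seq, base_in_consensus) for each difference.
    Positions are 1-based within the hexamer."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq, consensus)) if a != b]


LAC_PROMOTER = {"-35": "TTTACA", "-10": "TATGTT"}   # from Fig. 28-17b
CONSENSUS = {"-35": "TTGACA", "-10": "TATAAT"}      # from Fig. 28-17b

diffs = {region: mismatches(LAC_PROMOTER[region], CONSENSUS[region])
         for region in LAC_PROMOTER}

for region, found in diffs.items():
    print(region, "region:", found)
```

The lac promoter differs from the consensus at one position in the −35 region and two in the −10 region, the differences the caption associates with relatively weak binding of RNA polymerase in the absence of CRP-cAMP.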
The effect of glucose on CRP is mediated by the
cAMP interaction (Fig. 28–18). CRP binds to DNA most
avidly when cAMP concentrations are high. In the pres-
ence of glucose, the synthesis of cAMP is inhibited and
efflux of cAMP from the cell is stimulated. As [cAMP]
declines, CRP binding to DNA declines, thereby
decreasing the expression of the lac operon. Strong
induction of the lac operon therefore requires both
lactose (to inactivate the Lac repressor) and a lowered
concentration of glucose (to trigger an increase in
[cAMP] and increased binding of cAMP to CRP).
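The combined effect of the two signals can be summarized as a small truth table. The sketch below is a qualitative model only: it treats glucose and lactose as simply present or absent, and the labels "repressed", "low", and "high" are illustrative, not measured transcription levels.

```python
def lac_operon_state(glucose_present, lactose_present):
    """Qualitative expression state of the lac operon."""
    # Lactose (via allolactose) releases the Lac repressor from the operator.
    repressor_bound = not lactose_present
    # Low glucose -> high [cAMP] -> CRP-cAMP binds near the promoter.
    crp_camp_bound = not glucose_present

    if repressor_bound:
        return "repressed"      # CRP-cAMP has little effect while repressed
    if crp_camp_bound:
        return "high"           # induced and activated
    return "low"                # induced, but the promoter alone is weak


for glucose in (True, False):
    for lactose in (True, False):
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> "
              f"{lac_operon_state(glucose, lactose)}")
```

Only one of the four combinations, lactose present with glucose absent, gives strong expression, matching the requirement stated above.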
CRP and cAMP are involved in the coordinated reg-
ulation of many operons, primarily those that encode
enzymes for the metabolism of secondary sugars such
as lactose and arabinose. A network of operons with a
common regulator is called a regulon. This arrange-
ment, which allows for coordinated shifts in cellular
functions that can require the action of hundreds of
genes, is a major theme in the regulated expression of
dispersed networks of genes in eukaryotes. Other bac-
terial regulons include the heat-shock gene system that
responds to changes in temperature (p. 1083) and the
genes induced in E. coli as part of the SOS response to
DNA damage, described later.
Many Genes for Amino Acid Biosynthetic Enzymes Are
Regulated by Transcription Attenuation
The 20 common amino acids are required in large
amounts for protein synthesis, and E. coli can synthe-
size all of them. The genes for the enzymes needed to
synthesize a given amino acid are generally clustered in
an operon and are expressed whenever existing supplies
of that amino acid are inadequate for cellular require-
ments. When the amino acid is abundant, the biosyn-
thetic enzymes are not needed and the operon is
repressed.
The E. coli tryptophan (trp) operon (Fig. 28–19)
includes five genes for the enzymes required to convert
chorismate to tryptophan. Note that two of the enzymes
catalyze more than one step in the pathway. The mRNA
from the trp operon has a half-life of only about 3 min,
allowing the cell to respond rapidly to changing needs
for this amino acid. The Trp repressor is a homodimer,
each subunit containing 107 amino acid residues (Fig.
28–20). When tryptophan is abundant it binds to the
Trp repressor, causing a conformational change that
permits the repressor to bind to the trp operator and
inhibit expression of the trp operon. The trp operator
site overlaps the promoter, so binding of the repressor
blocks binding of RNA polymerase.
Once again, this simple on/off circuit mediated by a
repressor is not the entire regulatory story. Different
cellular concentrations of tryptophan can vary the rate
of synthesis of the biosynthetic enzymes over a 700-fold
range. Once repression is lifted and transcription be-
gins, the rate of transcription is fine-tuned by a second
regulatory process, called transcription attenuation,
in which transcription is initiated normally but is
abruptly halted before the operon genes are transcribed.
The frequency with which transcription is attenuated is
regulated by the availability of tryptophan and relies on
the very close coupling of transcription and translation
in bacteria.
FIGURE 28–18 Combined effects of glucose and lactose on expression
of the lac operon. (a) High levels of transcription take place only
when glucose concentrations are low (so cAMP levels are high and
CRP-cAMP is bound) and lactose concentrations are high (so the Lac
repressor is not bound). (b) Without bound activator (CRP-cAMP),
the lac promoter is poorly transcribed even when lactose
concentrations are high and the Lac repressor is not bound.
The trp operon attenuation mechanism uses signals
encoded in four sequences within a 162 nucleotide
leader region at the 5′ end of the mRNA, preceding the
initiation codon of the first gene (Fig. 28–21a). Within
the leader lies a region known as the attenuator, made
up of sequences 3 and 4. These sequences base-pair to
form a G≡C-rich stem-and-loop structure closely
followed by a series of U residues. The attenuator
structure acts as a transcription terminator (Fig. 28–21b).
Sequence 2 is an alternative complement for sequence
3 (Fig. 28–21c). If sequences 2 and 3 base-pair, the
attenuator structure cannot form and transcription
continues into the trp biosynthetic genes; the loop formed
by the pairing of sequences 2 and 3 does not obstruct
transcription.
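The features of the attenuator (a G≡C-rich stem, a loop, and a trailing run of U residues) can be checked directly on the 3′ part of the leader sequence shown in Fig. 28–21. In the sketch below, the stem boundaries were chosen by inspection for illustration and are an assumption; they are not the figure's exact definition of sequences 3 and 4.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


# 3' portion of the trp leader transcript, as read from Fig. 28-21.
region = "ACCCAGCCCGCCUAAUGAGCGGGCUUUUUUUU"

# ASSUMED stem arms, picked by inspection for this sketch.
stem_5p = "AGCCCGC"
stem_3p = "GCGGGCU"

i = region.find(stem_5p)
j = region.find(stem_3p)
loop = region[i + len(stem_5p):j]           # unpaired bases between the arms
tail = region[j + len(stem_3p):]            # residues following the stem
forms_hairpin = reverse_complement(stem_5p) == stem_3p
gc_fraction = sum(base in "GC" for base in stem_5p) / len(stem_5p)

print("arms complementary:", forms_hairpin)
print("loop:", loop, "| tail:", tail, "| stem G+C fraction:",
      round(gc_fraction, 2))
```

The two arms are exact reverse complements, the stem is mostly G and C, and the stem is followed by U residues, which is the signature of an intrinsic transcription terminator described in the text.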
Regulatory sequence 1 is crucial for a tryptophan-
sensitive mechanism that determines whether sequence
3 pairs with sequence 2 (allowing transcription to con-
tinue) or with sequence 4 (attenuating transcription).
Formation of the attenuator stem-and-loop structure
depends on events that occur during translation of reg-
ulatory sequence 1, which encodes a leader peptide (so
called because it is encoded by the leader region of the
mRNA) of 14 amino acids, two of which are Trp residues.
The leader peptide has no other known cellular func-
tion; its synthesis is simply an operon regulatory device.
FIGURE 28–19 The trp operon. This operon is regulated by two
mechanisms: when tryptophan levels are high, (1) the repressor
(upper left) binds to its operator and (2) transcription of trp mRNA
is attenuated (see Fig. 28–21). The biosynthesis of tryptophan by the
enzymes encoded in the trp operon is diagrammed at the bottom
(see also Fig. 22–17).
[Gene map: trpR; regulatory region with P, O, leader (trpL), and
attenuator; regulated genes trpE, trpD, trpC, trpB, trpA.
Transcripts: attenuated mRNA (high tryptophan levels) and trp mRNA
(low tryptophan levels). Gene products: anthranilate synthase
component I (trpE) and component II (trpD), forming anthranilate
synthase (I₂, II₂); N-(5′-phosphoribosyl)-anthranilate isomerase and
indole-3-glycerol phosphate synthase (trpC); tryptophan synthase
β subunit (trpB) and α subunit (trpA), forming tryptophan synthase
(α₂β₂). Pathway: chorismate → anthranilate →
N-(5′-phosphoribosyl)-anthranilate →
enol-1-o-carboxyphenylamino-1-deoxyribulose phosphate →
indole-3-glycerol phosphate → L-tryptophan. Cosubstrates and
by-products shown: glutamine, glutamate + pyruvate, PRPP, PPᵢ,
CO₂ + H₂O, L-serine, glyceraldehyde 3-phosphate.]
FIGURE 28–20 Trp repressor. The repressor is a dimer, with both
subunits (gray and light blue) binding the DNA at helix-turn-helix
motifs (PDB ID 1TRO). Bound molecules of tryptophan are in red.
FIGURE 28–21 Transcriptional attenuation in the trp operon.
Transcription is initiated at the beginning of the 162 nucleotide mRNA
leader encoded by a DNA region called trpL (see Fig. 28–19). A
regulatory mechanism determines whether transcription is attenuated at
the end of the leader or continues into the structural genes. (a) The
trp mRNA leader (trpL). The attenuation mechanism in the trp operon
involves sequences 1 to 4 (highlighted). (b) Sequence 1 encodes a
small peptide, the leader peptide, containing two Trp residues (W); it
is translated immediately after transcription begins. Sequences 2 and
3 are complementary, as are sequences 3 and 4. The attenuator
structure forms by the pairing of sequences 3 and 4 (top). Its
structure and function are similar to those of a transcription
terminator (see Fig. 26–7). Pairing of sequences 2 and 3 (bottom)
prevents the attenuator structure from forming. Note that the leader
peptide has no other cellular function. Translation of its open
reading frame has a purely regulatory role that determines which
complementary sequences (2 and 3 or 3 and 4) are paired. (c)
Base-pairing schemes for the complementary regions of the trp mRNA
leader.
[Panel (a) sequence, 5′ to 3′:
pppAAGUUCACGUAAAAAGGGUAUCGACAAUGAAAGCAAUUUUCGUACUGAAAGGUUGGUGGCGCACUUCCUGAAACGGGCAGUGUAUUCACCAUGCGUAAAGCAAUCAGAUACCCAGCCCGCCUAAUGAGCGGGCUUUUUUUUGAACAAAAUUAGAGAAUAACAAUGCAAACA
The leader peptide is Met-Lys-Ala-Ile-Phe-Val-Leu-Lys-Gly-Trp-Trp-
Arg-Thr-Ser-(stop). Marked positions: site of transcription
attenuation (139), end of leader region trpL (162), and the start
of the TrpE polypeptide (Met-Gln-Thr).]
[Panel (b), top: When tryptophan levels are high, the ribosome
quickly translates sequence 1 (open reading frame encoding leader
peptide) and blocks sequence 2 before sequence 3 is transcribed.
Continued transcription leads to attenuation at the terminator-like
attenuator structure formed by sequences 3 and 4.]
[Panel (b), bottom: When tryptophan levels are low, the ribosome
pauses at the Trp codons in sequence 1. Formation of the paired
structure between sequences 2 and 3 prevents attenuation, because
sequence 3 is no longer available to form the attenuator structure
with sequence 4. The 2:3 structure, unlike the 3:4 attenuator, does
not prevent transcription.]
[Panel (c): the 3:4 pair (attenuator) and the alternative 2:3 pair.]
This peptide is translated immediately after it is tran-
scribed, by a ribosome that follows closely behind RNA
polymerase as transcription proceeds.
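The leader peptide can be recovered by translating the first open reading frame of the leader transcript. The sketch below uses the sequence as read from Fig. 28–21a (reassembled here from the figure, so treat the exact string as an assumption) and the standard genetic code; the function name is hypothetical.

```python
# Standard genetic code, built in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# trp leader transcript plus the first three trpE codons (Fig. 28-21a).
LEADER = (
    "AAGUUCACGUAAAAAGGGUAUCGACAAUGAAAGCAAUUUUCGUAC"
    "UGAAAGG"
    "UUGGUGGCGCACUUCCUGAAACGGGCAGUGUAUUCACCAUGCGUAAAG"
    "CAAUCAGAU"
    "ACCCAGCCCGCCUAAUGAGCGGGCUUUUUUUUGAACAAAAUUAGAGAAUAACAAUGCAAACA"
)


def translate_first_orf(rna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = rna.find("AUG")
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":          # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


leader_peptide = translate_first_orf(LEADER)
print(leader_peptide, len(leader_peptide), leader_peptide.count("W"))
```

The translation gives a 14-residue peptide with two adjacent Trp residues, in agreement with the description above. The ribosome's progress through those two codons is what reports the supply of charged tRNAᵀʳᵖ.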
When tryptophan concentrations are high,
concentrations of charged tryptophan tRNA (Trp-tRNAᵀʳᵖ) are
also high. This allows translation to proceed rapidly past
the two Trp codons of sequence 1 and into sequence 2,
before sequence 3 is synthesized by RNA polymerase.
In this situation, sequence 2 is covered by the ribosome
and unavailable for pairing to sequence 3 when se-
quence 3 is synthesized; the attenuator structure (se-
quences 3 and 4) forms and transcription halts (Fig.
28–21b, top). When tryptophan concentrations are low,
however, the ribosome stalls at the two Trp codons in
sequence 1, because charged tRNAᵀʳᵖ is less available.
Sequence 2 remains free while sequence 3 is synthe-
sized, allowing these two sequences to base-pair and
permitting transcription to proceed (Fig. 28–21b, bot-
tom). In this way, the proportion of transcripts that
are attenuated declines as tryptophan concentration
declines.
Many other amino acid biosynthetic operons use a
similar attenuation strategy to fine-tune biosynthetic en-
zymes to meet the prevailing cellular requirements. The
15 amino acid leader peptide produced by the phe
operon contains seven Phe residues. The leu operon
leader peptide has four contiguous Leu residues. The
leader peptide for the his operon contains seven con-
tiguous His residues. In fact, in the his operon and a
number of others, attenuation is sufficiently sensitive to
be the only regulatory mechanism.
Induction of the SOS Response Requires Destruction
of Repressor Proteins
Extensive DNA damage in the bacterial chromosome
triggers the induction of many distantly located genes.
This response, called the SOS response (p. 976), pro-
vides another good example of coordinated gene regu-
lation. Many of the induced genes are involved in DNA
repair (see Table 25–6). The key regulatory proteins are
the RecA protein and the LexA repressor.
The LexA repressor (Mᵣ 22,700) inhibits transcription
of all the SOS genes (Fig. 28–22), and induction
of the SOS response requires removal of LexA. This is
not a simple dissociation from DNA in response to bind-
ing of a small molecule, as in the regulation of the lac
operon described above. Instead, the LexA repressor is
inactivated when it catalyzes its own cleavage at a
specific Ala–Gly peptide bond, producing two roughly
equal protein fragments. At physiological pH, this
autocleavage reaction requires the RecA protein. RecA is
not a protease in the classical sense, but its interaction
with LexA facilitates the repressor’s self-cleavage
reaction. This function of RecA is sometimes called a
coprotease activity.
FIGURE 28–22 SOS response in E. coli. See Table
25–6 for the functions of many of these proteins.
The LexA protein is the repressor in this system,
which has an operator site (red) near each gene.
Because the recA gene is not entirely repressed by
the LexA repressor, the normal cell contains about
1,000 RecA monomers. (1) When DNA is extensively
damaged (e.g., by UV light), DNA replication
is halted and the number of single-strand gaps in
the DNA increases. (2) RecA protein binds to this
damaged, single-stranded DNA, activating the
protein’s coprotease activity. (3) While bound to
DNA, the RecA protein facilitates cleavage and
inactivation of the LexA repressor. When the
repressor is inactivated, the SOS genes, including
recA, are induced; RecA levels increase 50- to
100-fold.
[Genes shown on the E. coli chromosome: lexA, dinF, uvrA, polB,
dinB, uvrB, sulA, umuC,D, recA.]
The RecA protein provides the functional link be-
tween the biological signal (DNA damage) and induc-
tion of the SOS genes. Heavy DNA damage leads to nu-
merous single-strand gaps in the DNA, and only RecA
that is bound to single-stranded DNA can facilitate
cleavage of the LexA repressor (Fig. 28–22, bottom).
Binding of RecA at the gaps eventually activates its co-
protease activity, leading to cleavage of the LexA re-
pressor and SOS induction.
During induction of the SOS response in a severely
damaged cell, RecA also facilitates the cleavage, and
thus the inactivation, of the repressors that otherwise
allow propagation of certain viruses in a dormant
lysogenic state within the bacterial host. This provides
a remarkable illustration of evolutionary adaptation.
These repressors, like LexA, also
undergo self-cleavage at a specific Ala–Gly peptide
bond, so induction of the SOS response permits repli-
cation of the virus and lysis of the cell, releasing new
viral particles. Thus the bacteriophage can make a hasty
exit from a compromised bacterial host cell.
Synthesis of Ribosomal Proteins Is Coordinated
with rRNA Synthesis
In bacteria, an increased cellular demand for protein
synthesis is met by increasing the number of ribosomes
rather than altering the activity of individual ribosomes.
In general, the number of ribosomes increases as the
cellular growth rate increases. At high growth rates, ri-
bosomes make up approximately 45% of the cell’s dry
weight. The proportion of cellular resources devoted to
making ribosomes is so large, and the function of ribo-
somes so important, that cells must coordinate the syn-
thesis of the ribosomal components: the ribosomal pro-
teins (r-proteins) and RNAs (rRNAs). This regulation is
distinct from the mechanisms described so far, because
it occurs largely at the level of translation.
The 52 genes that encode the r-proteins occur in at
least 20 operons, each with 1 to 11 genes. Some of these
operons also contain the genes for the subunits of
DNA primase (see Fig. 25–13), RNA polymerase (see
Fig. 26–4), and protein synthesis elongation factors (see
Fig. 27–23)—revealing the close coupling of replication,
transcription, and protein synthesis during cell growth.
The r-protein operons are regulated primarily
through a translational feedback mechanism. One
r-protein encoded by each operon also functions as a
translational repressor, which binds to the mRNA
transcribed from that operon and blocks translation of
all the genes the messenger encodes (Fig. 28–23). In
general, the r-protein that plays the role of repressor
also binds directly to an rRNA. Each translational re-
pressor r-protein binds with higher affinity to the ap-
propriate rRNA than to its mRNA, so the mRNA is bound
and translation repressed only when the level of the
r-protein exceeds that of the rRNA. This ensures that
translation of the mRNAs encoding r-proteins is re-
pressed only when synthesis of these r-proteins exceeds
that needed to make functional ribosomes. In this way,
the rate of r-protein synthesis is kept in balance with
rRNA availability.
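The feedback logic described above can be illustrated with a small numerical sketch. This is not a quantitative model from the text. It assumes 1:1 binding, treats rRNA binding as effectively saturating before any mRNA is bound, and uses arbitrary concentration units. The function names and the `kd_mrna` parameter are hypothetical.

```python
def free_repressor(r_protein: float, rrna: float) -> float:
    """r-protein left over once rRNA, the higher-affinity partner, is saturated.

    Assumption: 1:1 binding and rRNA binding so tight that it always wins.
    """
    return max(0.0, r_protein - rrna)


def fraction_mrna_repressed(r_protein: float, rrna: float, kd_mrna: float = 1.0) -> float:
    """Fraction of operon mRNA bound by leftover repressor.

    Assumption: simple hyperbolic binding with a hypothetical dissociation
    constant kd_mrna, in the same arbitrary units as the concentrations.
    """
    free = free_repressor(r_protein, rrna)
    return free / (free + kd_mrna)


if __name__ == "__main__":
    # r-protein below, equal to, and above the amount of available rRNA
    for r_protein, rrna in [(50.0, 100.0), (100.0, 100.0), (150.0, 100.0)]:
        repressed = fraction_mrna_repressed(r_protein, rrna)
        print(f"r-protein={r_protein:6.1f} rRNA={rrna:6.1f} mRNA repressed={repressed:.3f}")
```

In this sketch no mRNA is repressed until the r-protein exceeds the available rRNA, after which repression rises steeply. That is the qualitative behavior the paragraph describes.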
The mRNA binding site for the translational re-
pressor is near the translational start site of one of the
genes in the operon, usually the first gene (Fig. 28–23).
In other operons this would affect only that one gene,
because in bacterial polycistronic mRNAs most genes
have independent translation signals. In the r-protein
operons, however, the translation of one gene depends
on the translation of all the others. The mechanism of
this translational coupling is not yet understood in de-
tail. However, in some cases the translation of multiple
genes appears to be blocked by folding of the mRNA
into an elaborate three-dimensional structure that is sta-
bilized both by internal base-pairing (as in Fig. 8–26)
and by binding of the translational repressor protein.
When the translational repressor is absent, ribosome
binding and translation of one or more of the genes dis-
rupts the folded structure of the mRNA and allows all
the genes to be translated.
Because the synthesis of r-proteins is coordinated
with the available rRNA, the regulation of ribosome pro-
duction reflects the regulation of rRNA synthesis. In E.
coli, rRNA synthesis from the seven rRNA operons re-
sponds to cellular growth rate and to changes in the
availability of crucial nutrients, particularly amino acids.
The regulation coordinated with amino acid concentra-
tions is known as the stringent response (Fig. 28–24).
When amino acid concentrations are low, rRNA synthe-
sis is halted. Amino acid starvation leads to the binding
of uncharged tRNAs to the ribosomal A site; this trig-
gers a sequence of events that begins with the binding
of an enzyme called stringent factor (RelA protein)
to the ribosome. When bound to the ribosome, stringent
factor catalyzes formation of the unusual nucleotide
guanosine tetraphosphate (ppGpp; see Fig. 8–42); it
adds pyrophosphate to the 3′ position of GTP, in the
reaction
GTP + ATP → pppGpp + AMP
then a phosphohydrolase cleaves off one phosphate to
form ppGpp. The abrupt rise in ppGpp level in response
to amino acid starvation results in a great reduction in
rRNA synthesis, mediated at least in part by the bind-
ing of ppGpp to RNA polymerase.
[Figure 28–23 diagram: gene order (5′ to 3′) in five r-protein operons, with the translational repressor for each operon in brackets.
β operon: L10, L7/L12, β, β′ [repressor: L10]
str operon: S12, S7, EF-G, EF-Tu [repressor: S7]
S10 operon: S10, L3, L4, L23, L2, (L22, S19), S3, L16, L29, S17 [repressor: L4]
spc operon: L14, L24, L5, S14, S8, L6, L18, S5, L30, L15 [repressor: S8]
α operon: S13, S11, S4, α, L17 [repressor: S4]]
FIGURE 28–23 Translational feedback in some ribosomal
protein operons. The r-proteins that act as translational
repressors are shaded pink. Each translational repressor
blocks the translation of all genes in that operon by binding
to the indicated site on the mRNA. Genes that encode
subunits of RNA polymerase are shaded yellow; genes that
encode elongation factors are blue. The r-proteins of the
large (50S) ribosomal subunit are designated L1 to L34;
those of the small (30S) subunit, S1 to S21.
FIGURE 28–24 Stringent response in E. coli. This response
to amino acid starvation is triggered by binding of an
uncharged tRNA in the ribosomal A site. A protein called
stringent factor binds to the ribosome and catalyzes the
synthesis of pppGpp, which is converted by a phosphohy-
drolase to ppGpp. The signal ppGpp reduces transcription
of some genes and increases that of others, in part by
binding to the β subunit of RNA polymerase and altering
the enzyme’s promoter specificity. Synthesis of rRNA is
reduced when ppGpp levels increase.
[Figure 28–24 labels: ribosome on mRNA (5′ to 3′) with E, P, and A sites; growing polypeptide; uncharged tRNA (free 3′ OH) in the A site; stringent factor (RelA protein) catalyzing GTP + ATP → (p)ppGpp + AMP; RNA polymerase.]
The nucleotide ppGpp, along with cAMP, belongs to
a class of modified nucleotides that act as cellular sec-
ond messengers (p. 302). In E. coli, these two nu-
cleotides serve as starvation signals; they cause large
changes in cellular metabolism by increasing or de-
creasing the transcription of hundreds of genes. In eu-
karyotic cells, similar nucleotide second messengers
also have multiple regulatory functions. The coordina-
tion of cellular metabolism with cell growth is highly
complex, and further regulatory mechanisms undoubt-
edly remain to be discovered.
Some Genes Are Regulated
by Genetic Recombination
Salmonella typhimurium, which inhabits the mam-
malian intestine, moves by rotating the flagella on its
cell surface (Fig. 28–25). The many copies of the pro-
tein flagellin (Mr 53,000) that make up the flagella are
prominent targets of mammalian immune systems. But
Salmonella cells have a mechanism that evades the im-
mune response: they switch between two distinct fla-
gellin proteins (FljB and FliC) roughly once every 1,000
generations, using a process called phase variation.
The switch is accomplished by periodic inversion of
a segment of DNA containing the promoter for a fla-
gellin gene. The inversion is a site-specific recombina-
tion reaction (see Fig. 25–39) mediated by the Hin re-
combinase at specific 14 bp sequences (hix sequences)
at either end of the DNA segment. When the DNA seg-
ment is in one orientation, the gene for FljB flagellin and
the gene encoding a repressor (FljA) are expressed
(Fig. 28–26a); the repressor shuts down expression of
the gene for FliC flagellin. When the DNA segment is
inverted (Fig. 28–26b), the fljA and fljB genes are no
longer transcribed, and the fliC gene is induced as the
repressor becomes depleted. The Hin recombinase, en-
coded by the hin gene in the DNA segment that un-
dergoes inversion, is expressed when the DNA segment
is in either orientation, so the cell can always switch
from one state to the other.
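The switch logic described above can be written out as a short sketch. It models only the steady-state outcome for each orientation and ignores the lag while existing FljA repressor is depleted. The function names are hypothetical, and treating inversion as an independent event with a fixed chance of 1 in 1,000 per generation is a simplifying assumption based on the rough frequency quoted in the text.

```python
def flagellin_state(promoter_toward_fljB: bool) -> dict:
    """Steady-state expression for one orientation of the invertible segment."""
    fljB = promoter_toward_fljB   # fljB and fljA are transcribed together
    fljA = promoter_toward_fljB
    fliC = not fljA               # FljA protein represses the fliC gene
    # hin is expressed in either orientation, so the switch stays reversible
    return {"hin": True, "fljB": fljB, "fljA": fljA, "fliC": fliC}


def invert(promoter_toward_fljB: bool) -> bool:
    """Hin-mediated recombination at the hix sites flips the segment."""
    return not promoter_toward_fljB


def prob_switched(generations: int, rate_per_generation: float = 1 / 1000) -> float:
    """Chance that a lineage has inverted at least once.

    Assumption: each generation inverts independently with the same probability.
    """
    return 1 - (1 - rate_per_generation) ** generations


if __name__ == "__main__":
    orientation = True
    print(flagellin_state(orientation))           # FljB flagellin made, FliC repressed
    print(flagellin_state(invert(orientation)))   # FliC flagellin made
    print(round(prob_switched(1000), 3))          # chance of at least one inversion
```

Because the recombinase is produced in both orientations, applying `invert` twice returns the cell to its starting state.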
FIGURE 28–25 Salmonella typhimurium, with flagella evident.
FIGURE 28–26 Regulation of flagellin genes
in Salmonella: phase variation. The products
of genes fliC and fljB are different flagellins.
The hin gene encodes the recombinase that
catalyzes inversion of the DNA segment
containing the fljB promoter and the hin gene.
The recombination sites (inverted repeats) are
called hix (yellow). (a) In one orientation, fljB
is expressed along with a repressor protein
(product of the fljA gene) that represses tran-
scription of the fliC gene. (b) In the opposite
orientation only the fliC gene is expressed; the
fljA and fljB genes cannot be transcribed. The
interconversion between these two states,
known as phase variation, also requires two
other nonspecific DNA-binding proteins (not
shown), HU (histonelike protein from U13, a
strain of E. coli) and FIS (factor for inversion
stimulation).
[Figure 28–26 labels: (a) DNA with hin, fljB, fljA, and fliC; inverted repeats (hix); promoter for FljB and repressor; promoter for FliC; hin mRNA and Hin recombinase; fljB and fljA mRNA; FljB flagellin; FljA protein (repressor). (b) Transposed segment; hin mRNA and Hin recombinase; fliC mRNA; FliC flagellin.]
This type of regulatory mechanism has the advan-
tage of being absolute: gene expression is impossible
when the gene is physically separated from its promoter
(note the position of the fljB promoter in Fig. 28–26b).
An absolute on/off switch may be important in this sys-
tem (even though it affects only one of the two flagellin
genes), because a flagellum with just one copy of the
wrong flagellin might be vulnerable to host antibodies
against that protein. The Salmonella system is by no
means unique. Similar regulatory systems occur in a num-
ber of other bacteria and in some bacteriophages, and
recombination systems with similar functions have been
found in eukaryotes (Table 28–1). Gene regulation by
DNA rearrangements that move genes and/or promot-
ers is particularly common in pathogens that benefit by
changing their host range or by changing their surface
proteins, thereby staying ahead of host immune systems.
SUMMARY 28.2 Regulation of Gene Expression
in Prokaryotes
■ In addition to repression by the Lac repressor,
the E. coli lac operon undergoes positive
regulation by the cAMP receptor protein
(CRP). When [glucose] is low, [cAMP] is high
and CRP-cAMP binds to a specific site on the
DNA, stimulating transcription of the lac
operon and production of lactose-metabolizing
enzymes. The presence of glucose depresses
[cAMP], decreasing expression of lac and other
genes involved in metabolism of secondary
sugars. A group of coordinately regulated
operons is referred to as a regulon.
■ Operons that produce the enzymes of amino
acid synthesis have a regulatory circuit called
attenuation, which uses a transcription
termination site (the attenuator) in the mRNA.
Formation of the attenuator is modulated by a
mechanism that couples transcription and
translation while responding to small changes
in amino acid concentration.
■ In the SOS system, multiple unlinked genes
repressed by a single repressor are induced
simultaneously when DNA damage triggers
RecA protein–facilitated autocatalytic
proteolysis of the repressor.
■ In the synthesis of ribosomal proteins, one
protein in each r-protein operon acts as a
translational repressor. The mRNA is bound by
the repressor, and translation is blocked only
when the r-protein is present in excess of
available rRNA.
■ Some genes are regulated by
genetic recombination processes that move
promoters relative to the genes being
regulated. Regulation can also take place at the
level of translation. These diverse mechanisms
permit very sensitive cellular responses to
environmental change.
TABLE 28–1 Examples of Gene Regulation by Recombination

System: Phase variation (Salmonella)
Recombinase/recombination site: Hin/hix
Type of recombination: Site-specific
Function: Alternative expression of two flagellin genes allows evasion of host immune response.

System: Host range (bacteriophage μ)
Recombinase/recombination site: Gin/gix
Type of recombination: Site-specific
Function: Alternative expression of two sets of tail fiber genes affects host range.

System: Mating-type switch (yeast)
Recombinase/recombination site: HO endonuclease, RAD52 protein, other proteins/MAT
Type of recombination: Nonreciprocal gene conversion*
Function: Alternative expression of two mating types of yeast, a and α, creates cells of different mating types that can mate and undergo meiosis.

System: Antigenic variation (trypanosomes)†
Recombinase/recombination site: Varies
Type of recombination: Nonreciprocal gene conversion*
Function: Successive expression of different genes encoding the variable surface glycoproteins (VSGs) allows evasion of host immune response.

* In nonreciprocal gene conversion (a class of recombination events not discussed in Chapter 25), genetic information is moved from one part of
the genome (where it is silent) to another (where it is expressed). The reaction is similar to replicative transposition (see Fig. 25–43).
† Trypanosomes cause African sleeping sickness and other diseases (see Box 22–2). The outer surface of a trypanosome is made up of multiple
copies of a single VSG, the major surface antigen. A cell can change surface antigens to more than 100 different forms, precluding an effective
defense by the host immune system.
28.3 Regulation of Gene Expression
in Eukaryotes
Initiation of transcription is a crucial regulation point for
both prokaryotic and eukaryotic gene expression. Al-
though some of the same regulatory mechanisms are
used in both systems, there is a fundamental difference
in the regulation of transcription in eukaryotes and
bacteria.
We can define a transcriptional ground state as the
inherent activity of promoters and transcriptional ma-
chinery in vivo in the absence of regulatory sequences.
In bacteria, RNA polymerase generally has access to
every promoter and can bind and initiate transcription
at some level of efficiency in the absence of activators
or repressors; the transcriptional ground state is there-
fore nonrestrictive. In eukaryotes, however, strong pro-
moters are generally inactive in vivo in the absence of
regulatory proteins; that is, the transcriptional ground
state is restrictive. This fundamental difference gives
rise to at least four important features that distinguish
the regulation of gene expression in eukaryotes from
that in bacteria.
First, access to eukaryotic promoters is restricted
by the structure of chromatin, and activation of tran-
scription is associated with many changes in chromatin
structure in the transcribed region. Second, although
eukaryotic cells have both positive and negative regula-
tory mechanisms, positive mechanisms predominate in
all systems characterized so far. Thus, given that the
transcriptional ground state is restrictive, virtually every
eukaryotic gene requires activation to be transcribed.
Third, eukaryotic cells have larger, more complex mul-
timeric regulatory proteins than do bacteria. Finally,
transcription in the eukaryotic nucleus is separated
from translation in the cytoplasm in both space and
time.
The complexity of regulatory circuits in eukaryotic
cells is extraordinary, as the following discussion shows.
We conclude the section with an illustrated description
of one of the most elaborate circuits: the regulatory cas-
cade that controls development in fruit flies.
Transcriptionally Active Chromatin Is Structurally
Distinct from Inactive Chromatin
The effects of chromosome structure on gene regula-
tion in eukaryotes have no clear parallel in prokaryotes.
In the eukaryotic cell cycle, interphase chromosomes
appear, at first viewing, to be dispersed and amorphous
(see Figs 12–41, 24–25). Nevertheless, several forms of
chromatin can be found along these chromosomes.
About 10% of the chromatin in a typical eukaryotic cell
is in a more condensed form than the rest of the chro-
matin. This form, heterochromatin, is transcription-
ally inactive. Heterochromatin is generally associated
with particular chromosome structures—the cen-
tromeres, for example. The remaining, less condensed
chromatin is called euchromatin.
Transcription of a eukaryotic gene is strongly re-
pressed when its DNA is condensed within heterochro-
matin. Some, but not all, of the euchromatin is
transcriptionally active. Transcriptionally active chro-
mosomal regions can be detected based on their in-
creased sensitivity to nuclease-mediated degradation.
Nucleases such as DNase I tend to cleave the DNA of
carefully isolated chromatin into fragments of multiples
of about 200 bp, reflecting the regular repeating struc-
ture of the nucleosome (see Fig. 24–26). In actively tran-
scribed regions, the fragments produced by nuclease ac-
tivity are smaller and more heterogeneous in size. These
regions contain hypersensitive sites, sequences es-
pecially sensitive to DNase I, which consist of about 100
to 200 bp within the 1,000 bp flanking the 5′ ends of
transcribed genes. In some genes, hypersensitive sites
are found farther from the 5′ end, near the 3′ end, or
even within the gene itself.
Many hypersensitive sites correspond to binding
sites for known regulatory proteins, and the relative ab-
sence of nucleosomes in these regions may allow the
binding of these proteins. Nucleosomes are entirely ab-
sent in some regions that are very active in transcrip-
tion, such as the rRNA genes. Transcriptionally active
chromatin tends to be deficient in histone H1, which
binds to the linker DNA between nucleosome particles.
Histones within transcriptionally active chromatin
and heterochromatin also differ in their patterns of co-
valent modification. The core histones of nucleosome
particles (H2A, H2B, H3, H4; see Fig. 24–27) are mod-
ified by irreversible methylation of Lys residues, phos-
phorylation of Ser or Thr residues, acetylation (see be-
low), or attachment of ubiquitin (see Fig. 27–41). Each
of the core histones has two distinct structural domains.
A central domain is involved in histone-histone interac-
tion and the wrapping of DNA around the nucleosome.
A second, lysine-rich amino-terminal domain is gener-
ally positioned near the exterior of the assembled nu-
cleosome particle; the covalent modifications occur at
specific residues concentrated in this amino-terminal
domain. The patterns of modification have led some re-
searchers to propose the existence of a histone code, in
which modification patterns are recognized by enzymes
that alter the structure of chromatin. Modifications as-
sociated with transcriptional activation would be recog-
nized by enzymes that make the chromatin more ac-
cessible to the transcription machinery.
5-Methylation of cytosine residues of CpG se-
quences is common in eukaryotic DNA (p. 296), but
DNA in transcriptionally active chromatin tends to be
undermethylated. Furthermore, CpG sites in particular
genes are more often undermethylated in cells from tis-
sues where the genes are expressed than in those where
the genes are not expressed. The overall pattern sug-
gests that active chromatin is prepared for transcription
by the removal of potential structural barriers.
Chromatin Is Remodeled by Acetylation and
Nucleosomal Displacements
The detailed mechanisms for transcription-associated
structural changes in chromatin, called chromatin re-
modeling, are now coming to light, including identifi-
cation of a variety of enzymes directly implicated in the
process. These include enzymes that covalently modify
the core histones of the nucleosome and others that use
the chemical energy of ATP to remodel nucleosomes on
the DNA (Table 28–2).
The acetylation and deacetylation of histones figure
prominently in the processes that activate chromatin
for transcription. As noted above, the amino-terminal
domains of the core histones are generally rich in Lys
residues. Particular Lys residues are acetylated by
histone acetyltransferases (HATs). Cytosolic (type B)
HATs acetylate newly synthesized histones before the
histones are imported into the nucleus. The subsequent
assembly of the histones into chromatin is facilitated by
additional proteins: CAF1 for H3 and H4, and NAP1 for
H2A and H2B. (See Table 28–2 for an explanation of
some of these abbreviated names.)
Where chromatin is being activated for transcrip-
tion, the nucleosomal histones are further acetylated by
nuclear (type A) HATs. The acetylation of multiple Lys
residues in the amino-terminal domains of histones H3
and H4 can reduce the affinity of the entire nucleosome
for DNA. Acetylation may also prevent or promote in-
teractions with other proteins involved in transcription
or its regulation. When transcription of a gene is no
longer required, the acetylation of nucleosomes in that
vicinity is reduced by the activity of histone deacety-
lases, as part of a general gene-silencing process that
restores the chromatin to a transcriptionally inactive
state. In addition to the removal of certain acetyl groups,
new covalent modification of histones marks chromatin
as transcriptionally inactive. As an example, the Lys
residue at position 9 in histone H3 is often methylated
in heterochromatin.
Chromatin remodeling also requires protein com-
plexes that actively move or displace nucleosomes, hy-
drolyzing ATP in the process (Table 28–2). The enzyme
complex SWI/SNF, found in all eukaryotic cells, contains
11 polypeptides (total Mr 2 × 10⁶) that together create
hypersensitive sites in the chromatin and stimulate the
binding of transcription factors. SWI/SNF is not required
for the transcription of every gene. NURF is another
ATP-dependent enzyme complex that remodels chro-
matin in ways that complement and overlap the activ-
ity of SWI/SNF. These enzyme complexes play an im-
portant role in preparing a region of chromatin for active
transcription.
TABLE 28–2 Some Enzyme Complexes Catalyzing Chromatin Structural Changes Associated with Transcription

Enzyme complex*: GCN5-ADA2-ADA3
Oligomeric structure (number of polypeptides): 3
Source: Yeast
Activities: GCN5 has type A HAT activity

Enzyme complex*: SAGA/PCAF
Oligomeric structure (number of polypeptides): >20
Source: Eukaryotes
Activities: Includes GCN5-ADA2-ADA3

Enzyme complex*: SWI/SNF
Oligomeric structure (number of polypeptides): 11; total Mr 2 × 10⁶
Source: Eukaryotes
Activities: ATP-dependent nucleosome remodeling

Enzyme complex*: NURF
Oligomeric structure (number of polypeptides): 4; total Mr 500,000
Source: Drosophila
Activities: ATP-dependent nucleosome remodeling

Enzyme complex*: CAF1
Oligomeric structure (number of polypeptides): >2
Source: Humans; Drosophila
Activities: Responsible for binding histones H3 and H4 to DNA

Enzyme complex*: NAP1
Oligomeric structure (number of polypeptides): 1; Mr 125,000
Source: Widely distributed in eukaryotes
Activities: Responsible for binding histones H2A and H2B to DNA

* The abbreviations for eukaryotic genes and proteins are often more confusing or obscure than those used for bacteria. The complex of GCN5
(general control nonderepressible) and ADA (alteration/deficiency activation) proteins was discovered during investigation of the regulation of
nitrogen metabolism genes in yeast. These proteins can be part of the larger SAGA complex (SPT, ADA2,3, GCN5, acetyltransferase) in yeasts.
The equivalent of SAGA in humans is PCAF (p300/CBP-associated factor). SWI (switching) was discovered as a protein required for expression
of certain genes involved in mating-type switching in yeast, and SNF (sucrose nonfermenting) as a factor for expression of the yeast gene for
sucrase. Subsequent studies revealed multiple SWI and SNF proteins that acted in a complex. The SWI/SNF complex has a role in the
expression of a wide range of genes and has been found in many eukaryotes, including humans. NURF is nucleosome remodeling factor; CAF1,
chromatin assembly factor; and NAP1, nucleosome assembly protein.
Many Eukaryotic Promoters Are Positively Regulated
As already noted, eukaryotic RNA polymerases have lit-
tle or no intrinsic affinity for their promoters; initiation
of transcription is almost always dependent on the
action of multiple activator proteins. One important
reason for the apparent predominance of positive regu-
lation seems obvious: the storage of DNA within chro-
matin effectively renders most promoters inaccessible,
so genes are normally silent in the absence of other reg-
ulation. The structure of chromatin affects access to
some promoters more than others, but repressors that
bind to DNA so as to preclude access of RNA polymerase
(negative regulation) would often be simply redundant.
Other factors are at play in the use of positive regula-
tion, and speculation generally centers around two: the
large size of eukaryotic genomes and the greater effi-
ciency of positive regulation.
First, nonspecific DNA binding of regulatory pro-
teins becomes a more important problem in the much
larger genomes of higher eukaryotes. And the chance
that a single specific binding sequence will occur ran-
domly at an inappropriate site also increases with
genome size. Specificity for transcriptional activation
can be improved if each of several positive-regulatory
proteins must bind specific DNA sequences and then
form a complex in order to become active. The average
number of regulatory sites for a gene in a multicellular
organism is probably at least five. The requirement for
binding of several positive-regulatory proteins to spe-
cific DNA sequences vastly reduces the probability of
the random occurrence of a functional juxtaposition of
all the necessary binding sites. In principle, a similar
strategy could be used by multiple negative-regulatory
elements, but this brings us to the second reason for the
use of positive regulation: it is simply more efficient. If
the 30,000 to 35,000 genes in the human genome were
negatively regulated, each cell would have to synthe-
size, at all times, this same number of different repres-
sors (or many times this number if multiple regulatory
elements were used at each promoter) in concentra-
tions sufficient to permit specific binding to each “un-
wanted” gene. In positive regulation, most of the genes
are normally inactive (that is, RNA polymerases do not
bind to the promoters) and the cell synthesizes only the
activator proteins needed to promote transcription of
the subset of genes required in the cell at that time.
These arguments notwithstanding, there are examples
of negative regulation in eukaryotes, from yeast to hu-
mans, as we shall see.
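The genome-size argument above can be made concrete with a rough calculation. The sketch below estimates how often a specific binding sequence would turn up by chance. It assumes equal base frequencies, independent positions, and both DNA strands searched. The genome sizes are approximate round figures supplied for illustration and are not taken from the text, and the site length of 10 bp is hypothetical. Treating five required sites as one fixed 50 bp arrangement is a deliberately crude stand-in for "a functional juxtaposition of all the necessary binding sites."

```python
ECOLI_GENOME_BP = 4.6e6   # approximate size, assumed for illustration
HUMAN_GENOME_BP = 3.2e9   # approximate haploid size, assumed for illustration


def expected_random_sites(genome_bp: float, site_bp: int) -> float:
    """Expected chance occurrences of one specific sequence of length site_bp.

    Assumptions: each base has probability 1/4, positions are independent,
    and both strands are counted (hence the factor of 2).
    """
    return 2 * genome_bp * 0.25 ** site_bp


if __name__ == "__main__":
    site_bp = 10  # hypothetical length of a single recognition sequence
    for name, size in [("E. coli", ECOLI_GENOME_BP), ("human", HUMAN_GENOME_BP)]:
        single = expected_random_sites(size, site_bp)
        # Crude model: five sites in one fixed arrangement act like a 50 bp site
        combined = expected_random_sites(size, 5 * site_bp)
        print(f"{name}: one site {single:.3g}, five sites together {combined:.3g}")
```

Under these assumptions a single short site is expected to occur by chance far more often in the larger genome, while a required combination of several sites is vanishingly unlikely to arise at random in either.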
DNA-Binding Transactivators and Coactivators
Facilitate Assembly of the General
Transcription Factors
To continue our exploration of the regulation of gene
expression in eukaryotes, we return to the interactions
between promoters and RNA polymerase II (Pol II), the
enzyme responsible for the synthesis of eukaryotic
mRNAs. Although most (but not all) Pol II promoters
include the TATA box and Inr (initiator) sequences, with
their standard spacing (see Fig. 26–8), they vary greatly
in both the number and the location of additional se-
quences required for the regulation of transcription.
These additional regulatory sequences are usually called
enhancers in higher eukaryotes and upstream acti-
vator sequences (UASs) in yeast. A typical enhancer
may be found hundreds or even thousands of base pairs
upstream from the transcription start site, or may even
be downstream, within the gene itself. When bound by
the appropriate regulatory proteins, an enhancer in-
creases transcription at nearby promoters regardless of
its orientation in the DNA. The UASs of yeast function
in a similar way, although generally they must be posi-
tioned upstream and within a few hundred base pairs of
the transcription start site. An average Pol II promoter
may be affected by a half-dozen regulatory sequences
of this type, and even more complex promoters are quite
common.
Successful binding of active RNA polymerase II
holoenzyme at one of its promoters usually requires
the action of other proteins (Fig. 28–27), of three types:
(1) basal transcription factors (see Fig. 26–9, Table
26–1), required at every Pol II promoter; (2) DNA-
binding transactivators, which bind to enhancers or
UASs and facilitate transcription; and (3) coactivators.
The latter group act indirectly—not by binding to the
DNA—and are required for essential communication be-
tween the DNA-binding transactivators and the complex
composed of Pol II and the general transcription factors.
Furthermore, a variety of repressor proteins can inter-
fere with communication between the RNA polymerase
and the DNA-binding transactivators, resulting in re-
pression of transcription (Fig. 28–27b). Here we focus
on the protein complexes shown in Figure 28–27 and
on how they interact to activate transcription.
TATA-Binding Protein The first component to bind in the
assembly of a preinitiation complex at the TATA box of
a typical Pol II promoter is the TATA-binding protein
(TBP). The complete complex includes the basal
(or general) transcription factors TFIIB, TFIIE, TFIIF,
TFIIH; Pol II; and perhaps TFIIA (not all of the factors
are shown in Fig. 28–27). This minimal preinitiation
complex, however, is often insufficient for the initiation
of transcription and generally does not form at all if the
promoter is obscured within chromatin. Positive regu-
lation leading to transcription is imposed by the trans-
activators and coactivators.
DNA-Binding Transactivators The requirements for trans-
activators vary greatly from one promoter to another. A
few transactivators are known to facilitate transcription
at hundreds of promoters, whereas others are specific
for a few promoters. Many transactivators are sensitive
to the binding of signal molecules, providing the capac-
ity to activate or deactivate transcription in response to
a changing cellular environment. Some enhancers bound
by DNA-binding transactivators are quite distant from
the promoter’s TATA box. How do the transactivators
function at a distance? The answer in most cases seems
to be that, as indicated earlier, the intervening DNA is
looped so that the various protein complexes can inter-
act directly. The looping is promoted by certain non-
histone proteins that are abundant in chromatin and
bind nonspecifically to DNA. These high mobility group
(HMG) proteins (Fig. 28–27; “high mobility” refers to
their electrophoretic mobility in polyacrylamide gels)
play an important structural role in chromatin remod-
eling and transcriptional activation.
Coactivator Protein Complexes Most transcription re-
quires the presence of additional protein complexes.
Some major regulatory protein complexes that interact
with Pol II have been defined both genetically and bio-
chemically. These coactivator complexes act as inter-
mediaries between the DNA-binding transactivators and
the Pol II complex.
The best-characterized coactivator is the transcrip-
tion factor TFIID (Fig. 28–27). In eukaryotes, TFIID is
a large complex that includes TBP and ten or more TBP-
associated factors (TAFs). Some TAFs resemble his-
tones and may play a role in displacing nucleosomes dur-
ing the activation of transcription. Many DNA-binding
transactivators aid in transcription initiation by inter-
acting with one or more TAFs. The requirement for
TAFs to initiate transcription can vary greatly from one
gene to another. Some promoters require TFIID, some
do not, and some require only subsets of the TFIID TAF
subunits.
Another important coactivator consists of 20 or
more polypeptides in a protein complex called media-
tor (Fig. 28–27); the 20 core polypeptides are highly
conserved from fungi to humans. Mediator binds tightly
to the carboxyl-terminal domain (CTD) of the largest
subunit of Pol II. The mediator complex is required for
both basal and regulated transcription at promoters
used by Pol II, and it also stimulates the phosphoryla-
tion of the CTD by TFIIH. Both mediator and TFIID are
required at some promoters. As with TFIID, some DNA-
binding transactivators interact with one or more com-
ponents of the mediator complex. Coactivator com-
plexes function at or near the promoter’s TATA box.
Choreography of Transcriptional Activation We can now be-
gin to piece together the sequence of transcriptional ac-
tivation events at a typical Pol II promoter. First, cru-
cial remodeling of the chromatin takes place in stages.
Some DNA-binding transactivators have significant
affinity for their binding sites even when the sites are
within condensed chromatin. Binding of one transacti-
vator may facilitate the binding of others, gradually dis-
placing some nucleosomes.
The bound transactivators can then interact di-
rectly with HATs or enzyme complexes such as
SWI/SNF (or both), accelerating the remodeling of the
surrounding chromatin. In this way a bound transac-
tivator can draw in other components necessary for
further chromatin remodeling to permit transcription
of specific genes. The bound transactivators, gener-
ally acting through complexes such as TFIID or me-
diator (or both), stabilize the binding of Pol II and its
associated transcription factors and greatly facilitate
formation of the preinitiation transcription complex.
Complexity in these regulatory circuits is the rule
rather than the exception, with multiple DNA-bound
transactivators promoting transcription.
[Figure 28–27 diagram labels: (a) DNA with enhancers, UAS, TATA, and Inr; HMG proteins; DNA-binding transactivators; coactivators (TFIID with TBP, mediator); RNA polymerase II complex with its CTD; transcription. (b) The same elements with a repressor.]
FIGURE 28–27 Eukaryotic promoters and regulatory proteins. RNA
polymerase II and its associated general transcription factors form a
preinitiation complex at the TATA box and Inr site of the cognate pro-
moters, a process facilitated by DNA-binding transactivators, acting
through TFIID and/or mediator. (a) A composite promoter with typi-
cal sequence elements and protein complexes found in both yeast and
higher eukaryotes. The carboxyl-terminal domain (CTD) of Pol II (see
Fig. 26–9) is an important point of interaction with mediator and other
protein complexes. Not shown are the protein complexes required for
histone acetylation and chromatin remodeling. For the DNA-binding
transactivators, DNA-binding domains are shown in green, activation
domains in pink. The interactions symbolized by blue arrows are dis-
cussed in the text. (b) A wide variety of eukaryotic transcriptional re-
pressors function by a range of mechanisms. Some bind directly to
DNA, displacing a protein complex required for activation; others in-
teract with various parts of the transcription or activation complexes
to prevent activation. Possible points of interaction are indicated with
red arrows.
The script can change from one promoter to an-
other, but most promoters seem to require a precisely
ordered assembly of components to initiate transcrip-
tion. The assembly process is not always fast. At some
genes it may take minutes; at certain genes in higher
eukaryotes the process can take days.
Reversible Transcriptional Activation Although rarer, some
eukaryotic regulatory proteins that bind to Pol II pro-
moters can act as repressors, inhibiting the formation
of active preinitiation complexes (Fig. 28–27b). Some
transactivators can adopt different conformations, en-
abling them to serve as transcriptional activators or re-
pressors. For example, some steroid hormone receptors
(described later) function in the nucleus as DNA-
binding transactivators, stimulating the transcription of
certain genes when a particular steroid hormone signal
is present. When the hormone is absent, the receptor
proteins revert to a repressor conformation, prevent-
ing the formation of preinitiation complexes. In some
cases this repression involves interaction with histone
deacetylases and other proteins that help restore the
surrounding chromatin to its transcriptionally inactive
state.
The Genes of Galactose Metabolism in Yeast Are
Subject to Both Positive and Negative Regulation
Some of the general principles described above can be
illustrated by one well-studied eukaryotic regulatory
circuit (Fig. 28–28). The enzymes required for the im-
portation and metabolism of galactose in yeast are en-
coded by genes scattered over several chromosomes
(Table 28–3). Each of the GAL genes is transcribed sep-
arately, and yeast cells have no operons like those in
bacteria. However, all the GAL genes have similar pro-
moters and are regulated coordinately by a common set
of proteins. The promoters for the GAL genes consist
of the TATA box and Inr sequences, as well as an upstream
activator sequence (UASG) recognized by a
DNA-binding transcriptional activator known as Gal4
protein (Gal4p). Regulation of gene expression by galac-
tose entails an interplay between Gal4p and two other
proteins, Gal80p and Gal3p (Fig. 28–28). Gal80p forms
a complex with Gal4p, preventing Gal4p from function-
ing as an activator of the GAL promoters. When galac-
tose is present, it binds Gal3p, which then interacts with
Gal80p, allowing Gal4p to function as an activator at the
various GAL promoters.
Other protein complexes also have a role in acti-
vating transcription of the GAL genes. These may in-
clude the SAGA complex for histone acetylation, the
SWI/SNF complex for nucleosome remodeling, and the
mediator complex. Figure 28–29 provides an idea of the
complexity of protein interactions in the overall process
of transcriptional activation in eukaryotic cells.
Glucose is the preferred carbon source for yeast, as
it is for bacteria. When glucose is present, most of the
GAL genes are repressed—whether galactose is present
or not. The GAL regulatory system described above is
effectively overridden by a complex catabolite repres-
sion system that includes several proteins (not depicted
in Fig. 28–29).
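
The on/off logic of the GAL circuit described above can be condensed into a short Boolean sketch. This is a qualitative illustration only: the function names and arguments are hypothetical, expression is treated as all-or-none rather than graded (compare Table 28–3), and the catabolite repression system is reduced to a single override.

```python
def gal4p_is_free(galactose: bool, gal3p: bool = True, gal80p: bool = True) -> bool:
    """Return True when Gal4p is not held inactive by Gal80p.

    Assumption of this sketch: Gal80p inhibits Gal4p unless
    galactose-bound Gal3p is available to interact with Gal80p.
    """
    if not gal80p:
        return True  # no inhibitor present (hypothetical strain lacking Gal80p)
    return galactose and gal3p


def gal_genes_on(glucose: bool, galactose: bool, gal4p: bool = True,
                 gal3p: bool = True, gal80p: bool = True) -> bool:
    """All-or-none prediction for transcription from a GAL promoter."""
    if glucose:
        return False  # simplification: glucose overrides the GAL system entirely
    return gal4p and gal4p_is_free(galactose, gal3p, gal80p)


if __name__ == "__main__":
    for glucose in (False, True):
        for galactose in (False, True):
            state = "on" if gal_genes_on(glucose, galactose) else "off"
            print(f"glucose={glucose!s:5} galactose={galactose!s:5} -> GAL genes {state}")
```

In this sketch, removing Gal80p makes expression independent of galactose. That outcome is simply the logical consequence of modeling Gal80p as the only brake on Gal4p, not an additional experimental claim.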
DNA-Binding Transactivators Have a
Modular Structure
DNA-binding transactivators typically have a distinct
structural domain for specific DNA binding and one or
more additional domains for transcriptional activation
or for interaction with other regulatory proteins. Inter-
action of two regulatory proteins is often mediated by
domains containing leucine zippers (Fig. 28–14) or helix-
loop-helix motifs (Fig. 28–15). We consider here three
[Figure 28–28 diagram labels: UASG, TATA, Inr, TBP, Gal4p, Gal80p, Gal3p + galactose, HMG proteins, intermediary complex (TFIID or mediator), RNA polymerase II complex.]
FIGURE 28–28 Regulation of transcription at genes of galactose
metabolism in yeast. Galactose is imported into the cell and converted
to glucose 6-phosphate by a pathway involving six enzymes whose
genes are scattered over three chromosomes (see Table 28–3). Tran-
scription of these genes is regulated by the combined actions of the
proteins Gal4p, Gal80p, and Gal3p, with Gal4p playing the central
role of DNA-binding transactivator. The Gal4p-Gal80p complex is in-
active in gene activation. Binding of galactose to Gal3p and its inter-
action with Gal80p produce a conformational change in Gal80p that
allows Gal4p to function in transcription activation.
distinct types of structural domains used in activation
by DNA-binding transactivators (Fig. 28–30a): Gal4p,
Sp1, and CTF1.
Gal4p contains a zinc fingerlike structure in its
DNA-binding domain, near the amino terminus; this domain
has six Cys residues that coordinate two Zn²⁺. The
protein functions as a homodimer (with dimerization
mediated by interactions between two coiled coils) and
binds to UASG, a palindromic DNA sequence about 17 bp
long. Gal4p has a separate activation domain with many
acidic amino acid residues. Experiments that substitute
a variety of different peptide sequences for the acidic
activation domain of Gal4p suggest that the acidic na-
ture of this domain is critical to its function, although
its precise amino acid sequence can vary considerably.
Sp1 (Mr 80,000) is a DNA-binding transactivator
for a large number of genes in higher eukaryotes. Its
DNA binding site, the GC box (consensus sequence
FIGURE 28–29 Protein complexes involved in transcription activa-
tion of a group of related eukaryotic genes. The GAL system illustrates
the complexity of this process, but not all these protein complexes are
yet known to affect GAL gene transcription. Note that many of the
complexes (such as SWI/SNF, GCN5-ADA2-ADA3, and mediator) af-
fect the transcription of many genes. The complexes assemble step-
wise. First the DNA-binding transactivators bind, then the additional
protein complexes needed to remodel the chromatin and allow tran-
scription to begin.
TABLE 28–3 Genes of Galactose Metabolism in Yeast

                                                                             Relative protein expression
                                                                             in different carbon sources
Gene   Protein function                           Chromosomal  Protein size  Glucose  Glycerol  Galactose
                                                  location     (residues)
Regulated genes
GAL1   Galactokinase                              II           528           −        −         +++
GAL2   Galactose permease                         XII          574           −        −         +++
PGM2   Phosphoglucomutase                         XIII         569           +        +         ++
GAL7   Galactose 1-phosphate uridylyltransferase  II           365           −        −         +++
GAL10  UDP-glucose 4-epimerase                    II           699           −        −         +++
MEL1   α-Galactosidase                            II           453           −        +         ++
Regulatory genes
GAL3   Inducer                                    IV           520           −        +         ++
GAL4   Transcriptional activator                  XVI          881           +/−      +         +
GAL80  Transcriptional inhibitor                  XIII         435           +        +         ++

Source: Adapted from Reece, R. & Platt, A. (1997) Signaling activation and repression of RNA polymerase II transcription in yeast. Bioessays
19, 1001–1010.
[Figure 28–29 diagram labels: UASG, TATA, Gal4p, HMG proteins, SWI/SNF, GCN5-ADA2-ADA3, mediator, TBP, TFIIA, TFIIB, TFIIE, TFIIF, TFIIH, RNA polymerase II complex.]
GGGCGG), is usually quite near the TATA box. The
DNA-binding domain of the Sp1 protein is near its car-
boxyl terminus and contains three zinc fingers. Two
other domains in Sp1 function in activation, and are no-
table in that 25% of their amino acid residues are Gln.
A wide variety of other activator proteins also have these
glutamine-rich domains.
CCAAT-binding transcription factor 1 (CTF1) be-
longs to a family of DNA-binding transactivators that
bind a sequence called the CCAAT site (its consensus
sequence is TGGN₆GCCAA, where N is any nucleotide).
The DNA-binding domain of CTF1 contains many basic
amino acid residues, and the binding region is probably
arranged as an α helix. This protein has neither a helix-turn-helix
nor a zinc finger motif; its DNA-binding mechanism
is not yet clear. CTF1 has a proline-rich activation
domain, with Pro accounting for more than 20%
of the amino acid residues.
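
The two consensus sequences quoted above, the GC box (GGGCGG) and the CCAAT site (TGGN₆GCCAA), translate directly into pattern searches. The following is a minimal sketch under stated assumptions: the names `GC_BOX`, `CCAAT_SITE`, and `find_sites` are hypothetical, only exact matches to the consensus are reported, and only the strand supplied is scanned. Real binding sites tolerate deviations from the consensus, so this illustrates the notation and is not a site-prediction method.

```python
import re

# Consensus sequences as given in the text; N (any nucleotide) becomes [ACGT].
GC_BOX = "GGGCGG"
CCAAT_SITE = "TGG[ACGT]{6}GCCAA"


def find_sites(consensus: str, dna: str) -> list:
    """Return 0-based start positions of every exact match, including overlaps."""
    # A zero-width lookahead lets overlapping occurrences all be reported.
    lookahead = re.compile("(?=" + consensus + ")")
    return [match.start() for match in lookahead.finditer(dna.upper())]


if __name__ == "__main__":
    # Made-up test sequence containing one copy of each element.
    dna = "AATGGCTTAAGGCCAATTGGGCGGTA"
    print("CCAAT site starts:", find_sites(CCAAT_SITE, dna))
    print("GC box starts:", find_sites(GC_BOX, dna))
```

Searching the reverse complement as well would be needed to find sites on the other strand. It is left out here to keep the sketch short.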
The discrete activation and DNA-binding domains
of regulatory proteins often act completely independ-
ently, as has been demonstrated in “domain-swapping”
experiments. Genetic engineering techniques (Chap-
ter 9) can join the proline-rich activation domain of
CTF1 to the DNA-binding domain of Sp1 to create a pro-
tein that, like normal Sp1, binds to GC boxes on the DNA
and activates transcription at a nearby promoter (as in
Fig. 28–30b). The DNA-binding domain of Gal4p has
similarly been replaced experimentally with the DNA-
binding domain of the prokaryotic LexA repressor (of
the SOS response; Fig. 28–22). This chimeric protein
neither binds at UASG nor activates the yeast GAL genes
(as would normal Gal4p) unless the UASG sequence in
the DNA is replaced by the LexA recognition site.
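
The logic of these domain-swapping experiments can be expressed as a small data-structure sketch: the DNA-binding domain decides where a protein binds, and the activation domain decides whether binding leads to activation. The class and function names below are hypothetical, and each protein is reduced to two labels. Nothing about affinity, dimerization, or chromatin context is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegulatoryProtein:
    name: str
    binding_site: str                 # DNA element recognized by the DNA-binding domain
    activation_domain: Optional[str]  # None for a protein with no activation domain

    def activates(self, promoter_elements: set) -> bool:
        """Activate only if the recognized site is present and an activation domain exists."""
        return self.activation_domain is not None and self.binding_site in promoter_elements


def chimera(dna_binding_from: RegulatoryProtein,
            activation_from: RegulatoryProtein) -> RegulatoryProtein:
    """Join the DNA-binding domain of one protein to the activation domain of another."""
    return RegulatoryProtein(
        name=f"{dna_binding_from.name}(DNA-binding)-{activation_from.name}(activation)",
        binding_site=dna_binding_from.binding_site,
        activation_domain=activation_from.activation_domain,
    )


# Proteins and domain types as described in the text.
SP1 = RegulatoryProtein("Sp1", "GC box", "glutamine-rich")
CTF1 = RegulatoryProtein("CTF1", "CCAAT site", "proline-rich")
GAL4P = RegulatoryProtein("Gal4p", "UASG", "acidic")
LEXA = RegulatoryProtein("LexA", "LexA site", None)  # a repressor; no activation domain

if __name__ == "__main__":
    sp1_ctf1 = chimera(dna_binding_from=SP1, activation_from=CTF1)
    lexa_gal4 = chimera(dna_binding_from=LEXA, activation_from=GAL4P)
    print(sp1_ctf1.name, "at a GC box:", sp1_ctf1.activates({"GC box", "TATA"}))
    print(lexa_gal4.name, "at UASG:", lexa_gal4.activates({"UASG", "TATA"}))
    print(lexa_gal4.name, "at a LexA site:", lexa_gal4.activates({"LexA site", "TATA"}))
```

Treating the two domains as independent fields mirrors the experimental observation. It would not describe proteins in which the domains influence each other.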
Eukaryotic Gene Expression Can Be Regulated
by Intercellular and Intracellular Signals
The effects of steroid hormones (and of thyroid and
retinoid hormones, which have the same mode of ac-
tion) provide additional well-studied examples of the
modulation of eukaryotic regulatory proteins by direct
interaction with molecular signals (see Fig. 12–40). Un-
like other types of hormones, steroid hormones do not
have to bind to plasma membrane receptors. Instead,
they can interact with intracellular receptors that are
themselves transcriptional transactivators. Steroid hor-
mones too hydrophobic to dissolve readily in the blood
(estrogen, progesterone, and cortisol, for example)
travel on specific carrier proteins from their point of re-
lease to their target tissues. In the target tissue, the hor-
mone passes through the plasma membrane by simple
diffusion and binds to its specific receptor protein in the
nucleus. The hormone-receptor complex acts by bind-
ing to highly specific DNA sequences called hormone
response elements (HREs), thereby altering gene ex-
pression. Hormone binding triggers changes in the con-
formation of the receptor proteins so that they become
capable of interacting with additional transcription fac-
tors. The bound hormone-receptor complex can either
enhance or suppress the expression of adjacent genes.
The DNA sequences (HREs) to which hormone-
receptor complexes bind are similar in length and
arrangement, but differ in sequence, for the various
steroid hormones. Each receptor has a consensus HRE
sequence (Table 28–4) to which the hormone-receptor
complex binds well, with each consensus consisting of
two six-nucleotide sequences, either contiguous or sep-
arated by three nucleotides, in tandem or in a palindromic
arrangement. The hormone receptors have a highly
conserved DNA-binding domain with two zinc fingers
[Figure 28–30 diagram labels: (a) DNA with UASG, CCAAT, GC, TATA, and Inr elements; Gal4p (− − −), CTF1 (P P P), Sp1 (Q Q Q); HMG proteins, TBP, TFIID, TFIIH. (b) DNA with GC, TATA, and Inr elements; Sp1 joined to CTF1 (P P P); TBP, TFIID, TFIIH.]
FIGURE 28–30 DNA-binding transactivators. (a) Typical DNA-binding
transactivators such as CTF1, Gal4p, and Sp1 have a DNA-binding
domain and an activation domain. The nature of the activation domain
is indicated by symbols: − − −, acidic; Q Q Q, glutamine-rich;
P P P, proline-rich. Some or all of these proteins may activate tran-
scription by interacting with intermediary complexes such as TFIID or
mediator. Note that the binding sites illustrated here are not generally
found together near a single gene. (b) A chimeric protein containing
the DNA-binding domain of Sp1 and the activation domain of CTF1
activates transcription if a GC box is present.
(Fig. 28–31). The hormone-receptor complex binds to
the DNA as a dimer, with the zinc finger domains of each
monomer recognizing one of the six-nucleotide se-
quences. The ability of a given hormone to act through
the hormone-receptor complex to alter the expression
of a specific gene depends on the exact sequence of the
HRE, its position relative to the gene, and the number
of HREs associated with the gene.
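
The consensus HREs of Table 28–4 share a simple architecture: two six-nucleotide half-sites separated by a spacer of defined length. That architecture can be written as patterns and searched for. The sketch below rests on several assumptions: the dictionary `HRE_CONSENSUS` and the function `receptors_matching` are hypothetical names, only exact consensus matches on the supplied strand are found, and the multi-repeat RX element is omitted.

```python
import re

# Consensus sequences transcribed from Table 28-4; N (any nucleotide) becomes [ACGT].
HRE_CONSENSUS = {
    "androgen": "GG[AT]ACA[ACGT]{2}TGTTCT",
    "glucocorticoid": "GGTACA[ACGT]{3}TGTTCT",
    "retinoic acid (some)": "AGGTCA[ACGT]{5}AGGTCA",
    "vitamin D": "AGGTCA[ACGT]{3}AGGTCA",
    "thyroid hormone": "AGGTCA[ACGT]{3}AGGTCA",
}


def receptors_matching(dna: str) -> list:
    """Names of receptors whose consensus HRE occurs in the given strand."""
    dna = dna.upper()
    return [name for name, pattern in HRE_CONSENSUS.items() if re.search(pattern, dna)]


if __name__ == "__main__":
    # Made-up sequences, each carrying one consensus element.
    print(receptors_matching("ccGGTACAcagTGTTCTaa"))
    print(receptors_matching("AGGTCAtgaAGGTCA"))
```

The second example matches more than one receptor, because two entries in the table share the same consensus. This is consistent with the point made above: the response at a particular gene depends on the exact HRE sequence, its position, and the number of HREs, not on the consensus alone.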
Unlike the DNA-binding domain, the ligand-binding
region of the receptor protein—always at the carboxyl
terminus—is quite specific to the particular receptor. In
the ligand-binding region, the glucocorticoid receptor is
only 30% similar to the estrogen receptor and 17% sim-
ilar to the thyroid hormone receptor. The size of the lig-
and-binding region varies dramatically; in the vitamin D
receptor it has only 25 amino acid residues, whereas in
the mineralocorticoid receptor it has 603 residues. Mu-
tations that change one amino acid in these regions can
result in loss of responsiveness to a specific hormone.
Some humans unable to respond to cortisol, testos-
terone, vitamin D, or thyroxine have mutations of this
type.
Regulation Can Result from Phosphorylation
of Nuclear Transcription Factors
We noted in Chapter 12 that the effects of insulin on
gene expression are mediated by a series of steps lead-
ing ultimately to the activation of a protein kinase in the
nucleus that phosphorylates specific DNA-binding pro-
teins and thereby alters their ability to act as tran-
scription factors (see Fig. 12–6). This general mecha-
nism mediates the effects of many nonsteroid hormones.
For example, the β-adrenergic pathway that leads to elevated
levels of cytosolic cAMP, which acts as a second
messenger in eukaryotes as well as in prokaryotes (see
Figs 12–12, 28–18), also affects the transcription of a
set of genes, each of which is located near a specific
DNA sequence called a cAMP response element (CRE).
The catalytic subunit of protein kinase A, released when
cAMP levels rise (see Fig. 12–15), enters the nucleus
and phosphorylates a nuclear protein, the CRE-binding
protein (CREB). When phosphorylated, CREB binds to
CREs near certain genes and acts as a transcription fac-
tor, turning on the expression of these genes.
Many Eukaryotic mRNAs Are Subject
to Translational Repression
Regulation at the level of translation assumes a much
more prominent role in eukaryotes than in bacteria and
is observed in a range of cellular situations. In contrast to
the tight coupling of transcription and translation in bac-
teria, the transcripts generated in a eukaryotic nucleus
TABLE 28–4 Hormone Response Elements (HREs) Bound by Steroid-Type Hormone Receptors

Receptor               Consensus sequence bound*
Androgen               GG(A/T)ACAN₂TGTTCT
Glucocorticoid         GGTACAN₃TGTTCT
Retinoic acid (some)   AGGTCAN₅AGGTCA
Vitamin D              AGGTCAN₃AGGTCA
Thyroid hormone        AGGTCAN₃AGGTCA
RX†                    AGGTCANAGGTCANAGGTCANAGGTCA

*N represents any nucleotide.
†Forms a dimer with the retinoic acid receptor or vitamin D receptor.
[Figure 28–31 diagram labels: polypeptide running from H3N+ to COO−, residues numbered 10 to 80; transcription activation region (variable sequence and length); DNA-binding region (66–68 residues, highly conserved), drawn as two zinc fingers, each with one Zn; hormone-binding region (variable sequence and length).]
FIGURE 28–31 Typical steroid hormone receptors.
These receptor proteins have a binding site for the
hormone, a DNA-binding domain, and a region that
activates transcription of the regulated gene. The highly
conserved DNA-binding domain has two zinc fingers.
The sequence shown here is that for the estrogen
receptor, but the residues in bold type are common to
all steroid hormone receptors.
must be processed and transported to the cytoplasm be-
fore translation. This can impose a significant delay on
the appearance of a protein. When a rapid increase in
protein production is needed, a translationally repressed
mRNA already in the cytoplasm can be activated for
translation without delay. Translational regulation may
play an especially important role in regulating certain
very long eukaryotic genes (a few are measured in the
millions of base pairs), for which transcription and
mRNA processing can require many hours. Some genes
are regulated at both the transcriptional and transla-
tional stages, with the latter playing a role in the fine-
tuning of cellular protein levels. In some anucleate cells,
such as reticulocytes (immature erythrocytes), tran-
scriptional control is entirely unavailable and transla-
tional control of stored mRNAs becomes essential. As
described below, translational controls can also have
spatial significance during development, when the reg-
ulated translation of prepositioned mRNAs creates a
local gradient of the protein product.
Eukaryotes have at least three main mechanisms of
translational regulation.
1. Initiation factors are subject to phosphorylation by
a number of protein kinases. The phosphorylated
forms are often less active and cause a general
depression of translation in the cell.
2. Some proteins bind directly to mRNA and act as
translational repressors, many of them binding at
specific sites in the 3′ untranslated region
(3′UTR). So positioned, these proteins interact
with other translation initiation factors bound to
the mRNA or with the 40S ribosomal subunit to
prevent translation initiation (Fig. 28–32; compare
this with Fig. 27–22).
3. Binding proteins, present in eukaryotes from yeast
to mammals, disrupt the interaction between
eIF4E and eIF4G (see Fig. 27–22). The mammalian
versions are known as 4E-BPs (eIF4E binding
proteins). When cell growth is slow, these proteins
limit translation by binding to the site on eIF4E
that normally interacts with eIF4G. When cell
growth resumes or increases in response to
growth factors or other stimuli, the binding
proteins are inactivated by protein kinase–
dependent phosphorylation.
The variety of translational regulation mechanisms pro-
vides flexibility, allowing focused repression of a few
mRNAs or global regulation of all cellular translation.
Translational regulation has been particularly well
studied in reticulocytes. One such mechanism in these
cells involves eIF2, the initiation factor that binds to the
initiator tRNA and conveys it to the ribosome; when
Met-tRNA has bound to the P site, the factor eIF2B
binds to eIF2, recycling it with the aid of GTP binding
and hydrolysis. The maturation of reticulocytes includes
destruction of the cell nucleus, leaving behind a plasma
membrane packed with hemoglobin. Messenger RNAs
deposited in the cytoplasm before the loss of the nu-
cleus allow for the replacement of hemoglobin. When
reticulocytes become deficient in iron or heme, the
translation of globin mRNAs is repressed. A protein ki-
nase called HCR (hemin-controlled repressor) is acti-
vated, catalyzing the phosphorylation of eIF2. In its
phosphorylated form, eIF2 forms a stable complex with
eIF2B that sequesters the eIF2, making it unavailable
for participation in translation. In this way, the reticu-
locyte coordinates the synthesis of globin with the avail-
ability of heme.
Many additional examples of translational regula-
tion have been found in studies of the development of
multicellular organisms, as discussed in more detail
below.
Posttranscriptional Gene Silencing Is Mediated
by RNA Interference
In higher eukaryotes, including nematodes, fruit flies,
plants, and mammals, a class of small RNAs has been
discovered that mediates the silencing of particular
genes. The RNAs function by interacting with mRNAs,
often in the 3′UTR, resulting in either mRNA degradation
or translation inhibition. In either case, the mRNA,
and thus the gene that produces it, is silenced. This form
of gene regulation controls developmental timing in at
least some organisms. It is also used as a mechanism
to protect against invading RNA viruses (particularly
FIGURE 28–32 Translational regulation of eukaryotic mRNA. One of
the most important mechanisms for translational regulation in eu-
karyotes involves the binding of translational repressors (RNA-binding
proteins) to specific sites in the 3′ untranslated region (3′UTR) of the
mRNA. These proteins interact with eukaryotic initiation factors or with
the ribosome (see Fig. 27–22) to prevent or slow translation.
[Figure 28–32 diagram labels: 5′ cap, eIF4E, eIF4G, eIF3, 40S ribosomal subunit, AUG, translational repressors, 3′ untranslated region (3′UTR), 3′ poly(A) binding protein, AAAA(A)n.]
important in plants, which lack an immune system) and
to control the activity of transposons. In addition, small
RNA molecules may play a critical (but still undefined)
role in the formation of heterochromatin.
The small RNAs are sometimes called micro-RNAs
(miRNAs). Many are present only transiently during
development, and these are sometimes referred to as
small temporal RNAs (stRNAs). Hundreds of different
miRNAs have been identified in higher eukaryotes. They
are transcribed as precursor RNAs about 70 nucleotides
long, with internally complementary sequences that
form hairpinlike structures (Fig. 28–33). The precursors
are cleaved by endonucleases to form short duplexes
about 20 to 25 nucleotides long. The best-characterized
nuclease goes by the delightfully suggestive name Dicer;
endonucleases in the Dicer family are widely distributed
in higher eukaryotes. One strand of the processed
miRNA is transferred to the target mRNA (or to a viral
or transposon RNA), leading to inhibition of translation
or degradation of the RNA (Fig. 28–33a).
This gene regulation mechanism has an interesting
and very useful practical side. If an investigator intro-
duces into an organism a duplex RNA molecule corre-
sponding in sequence to virtually any mRNA, the Dicer
endonuclease cleaves the duplex into short segments,
called small interfering RNAs (siRNAs). These bind to
the mRNA and silence it (Fig. 28–33b). The process is
known as RNA interference (RNAi). In plants, virtu-
ally any gene can be effectively shut down in this way.
In nematodes, simply introducing the duplex RNA into
the worm’s diet produces very effective suppression of
the target gene. The technique has rapidly become an
important tool in the ongoing efforts to study gene func-
tion, because it can disrupt gene function without cre-
ating a mutant organism. The procedure can be applied
to humans as well. Laboratory-produced siRNAs have
already been used to block HIV and poliovirus infections
in cultured human cells for a week or so at a time. Al-
though this work is in its infancy, the rapid progress
makes RNA interference a field to watch for future med-
ical advances.
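
The sequence logic behind RNA interference can be sketched in a few lines: a duplex RNA matching part of an mRNA is cut into short pieces, and a strand from each piece finds its target by base pairing. The code below is a simplified illustration under explicit assumptions. All function names are hypothetical, the duplex is cut into consecutive fragments of exactly 21 nucleotides, and only perfect complementarity counts as a hit. The text gives about 20 to 25 nucleotides for the Dicer products; real fragments are not uniform in length, and silencing does not always require a perfect match.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def dice(sense_strand: str, length: int = 21) -> list:
    """Cut the antisense strand of a duplex into consecutive guide-length pieces.

    Assumption: fixed-length, non-overlapping fragments; any remainder
    shorter than `length` is discarded.
    """
    antisense = reverse_complement(sense_strand)
    return [antisense[i:i + length]
            for i in range(0, len(antisense) - length + 1, length)]


def targeted_positions(mrna: str, guides: list) -> list:
    """0-based mRNA positions at which a guide strand can pair perfectly."""
    mrna = mrna.upper()
    hits = []
    for guide in guides:
        site = reverse_complement(guide)  # the mRNA sequence this guide pairs with
        position = mrna.find(site)
        if position != -1:
            hits.append(position)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up 42-nucleotide target region embedded in a made-up mRNA.
    target = "AUGGCUAGCUAGGAUCCGAUC" + "GUACGAUCGGCUAAUCGGAUA"
    mrna = "GGGAAACCC" + target + "UUUAAA"
    guides = dice(target)
    print("guides:", guides)
    print("paired at mRNA positions:", targeted_positions(mrna, guides))
```

Because the guides are derived from the target sequence itself, they pair only with mRNAs containing that sequence, which is the basis for the gene specificity described above.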
Development Is Controlled by Cascades
of Regulatory Proteins
For sheer complexity and intricacy of coordination, the
patterns of gene regulation that bring about develop-
ment of a zygote into a multicellular animal or plant have
no peer. Development requires transitions in morphol-
ogy and protein composition that depend on tightly co-
ordinated changes in expression of the genome. More
genes are expressed during early development than in
any other part of the life cycle. For example, in the sea
urchin, an oocyte has about 18,500 different mRNAs,
compared with about 6,000 different mRNAs in the cells
of a typical differentiated tissue. The mRNAs in the
oocyte give rise to a cascade of events that regulate the
expression of many genes across both space and time.
Several organisms have emerged as important model
systems for the study of development, because they are
easy to maintain in a laboratory and have relatively short
generation times. These include nematodes, fruit flies,
zebra fish, mice, and the plant Arabidopsis. This dis-
cussion focuses on the development of fruit flies. Our
understanding of the molecular events during develop-
ment of Drosophila melanogaster is particularly well
advanced and can be used to illustrate patterns and
principles of general significance.
The life cycle of the fruit fly includes complete
metamorphosis during its progression from an embryo
to an adult (Fig. 28–34). Among the most important
characteristics of the embryo are its polarity (the an-
terior and posterior parts of the animal are readily dis-
tinguished, as are its dorsal and ventral parts) and its
metamerism (the embryo body is made up of serially
repeating segments, each with characteristic features).
During development, these segments become organized
into a head, thorax, and abdomen. Each segment of the
adult thorax has a different set of appendages. Devel-
opment of this complex pattern is under genetic con-
trol, and a variety of pattern-regulating genes have been
[Figure 28–33 diagram labels: (a) precursor, Dicer, stRNA; (b) duplex RNA, Dicer, siRNA; silenced mRNA, AAA(A)n; degradation; translation inhibition.]
FIGURE 28–33 Gene silencing by RNA interference. (a) Small tem-
poral RNAs (stRNAs) are generated by Dicer-mediated cleavage of
longer precursors that fold to create duplex regions. The stRNAs then
bind to mRNAs, leading to degradation of mRNA or inhibition of trans-
lation. (b) Double-stranded RNAs can be constructed and introduced
into a cell. Dicer processes the duplex RNAs into small interfering
RNAs (siRNAs), which interact with the target mRNA. Again, the mRNA
is either degraded or its translation inhibited.
discovered that dramatically affect the organization of
the body.
The Drosophila egg, along with 15 nurse cells, is
surrounded by a layer of follicle cells (Fig. 28–35). As
the egg cell forms (before fertilization), mRNAs and pro-
teins originating in the nurse and follicle cells are de-
posited in the egg cell, where some play a critical role
in development. Once a fertilized egg is laid, its nucleus
divides and the nuclear descendants continue to divide
in synchrony every 6 to 10 min. Plasma membranes are
not formed around the nuclei, which are distributed
within the egg cytoplasm (or syncytium). Between the
eighth and eleventh rounds of nuclear division, the nu-
clei migrate to the outer layer of the egg, forming a
monolayer of nuclei surrounding the common yolk-rich
cytoplasm; this is the syncytial blastoderm. After a few
additional divisions, membrane invaginations surround
the nuclei to create a layer of cells that form the cellu-
lar blastoderm. At this stage, the mitotic cycles in the
various cells lose their synchrony. The developmental
fate of the cells is determined by the mRNAs and pro-
teins originally deposited in the egg by the nurse and
follicle cells.
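
The numbers in this paragraph imply a rapid expansion of the nuclear population, which a short calculation makes concrete. The sketch assumes that every nucleus divides in every round and that none is lost; the function names are hypothetical, and the 6 and 10 min figures are the bounds quoted above.

```python
def nuclei_after(rounds: int) -> int:
    """Nuclei descended from the single zygote nucleus after synchronous divisions."""
    return 2 ** rounds


def elapsed_minutes(rounds: int, minutes_per_division: int) -> int:
    """Time needed for the given number of rounds at a constant division interval."""
    return rounds * minutes_per_division


if __name__ == "__main__":
    for rounds in (8, 11):
        fastest = elapsed_minutes(rounds, 6)
        slowest = elapsed_minutes(rounds, 10)
        print(f"{rounds} rounds: {nuclei_after(rounds)} nuclei, "
              f"{fastest} to {slowest} min of division time")
```

Under these assumptions, the eighth round yields 256 nuclei and the eleventh yields 2,048, so the syncytial blastoderm stage is reached within roughly one to two hours of division time.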
Proteins that, through changes in local concentra-
tion or activity, cause the surrounding tissue to take up
a particular shape or structure are sometimes referred
to as morphogens; they are the products of pattern-
regulating genes. As defined by Christiane Nüsslein-
Volhard, Edward B. Lewis, and Eric F. Wieschaus, three
major classes of pattern-regulating genes—maternal,
segmentation, and homeotic genes—function in suc-
cessive stages of development to specify the basic fea-
tures of the Drosophila embryo’s body. Maternal
genes are expressed in the unfertilized egg, and the
resulting maternal mRNAs remain dormant until fer-
tilization. These provide most of the proteins needed in
very early development, until the cellular blastoderm is
formed. Some of the proteins encoded by maternal
mRNAs direct the spatial organization of the develop-
ing embryo at early stages, establishing its polarity.
Segmentation genes, transcribed after fertilization,
direct the formation of the proper number of body seg-
ments. At least three subclasses of segmentation genes
act at successive stages: gap genes divide the devel-
oping embryo into several broad regions, and pair-rule
genes together with segment polarity genes define
14 stripes that become the 14 segments of a normal em-
bryo. Homeotic genes are expressed still later; they
specify which organs and appendages will develop in
particular body segments.
The many regulatory genes in these three classes
direct the development of an adult fly, with a head, tho-
rax, and abdomen, with the proper number of segments,
and with the correct appendages on each segment. Al-
though embryogenesis takes about a day to complete,
all these genes are activated during the first four hours.
Some mRNAs and proteins are present for only a few
minutes at specific points during this period. Some of
the genes code for transcription factors that affect the
expression of other genes in a kind of developmental
cascade. Regulation at the level of translation also occurs,
and many of the regulatory genes encode translational
repressors, most of which bind to the 3′ UTR of
the mRNA (Fig. 28–32). Because many mRNAs are
[Figure 28–34 diagram labels: oocyte; egg (Day 0, fertilization); early embryo, no segments; embryonic development; late embryo, segmented; Day 1, hatching; larva (three larval stages, separated by molts); Day 5, pupation; pupa; metamorphosis; adult (Day 9) with head, thorax (T1 to T3), and abdomen (A1 to A7). Scale bar: 1 mm.]
FIGURE 28–34 Life cycle of the fruit fly Drosophila
melanogaster. Drosophila undergoes a complete
metamorphosis, which means that the adult insect is
radically different in form from its immature stages, a
transformation that requires extensive alterations
during development. By the late embryonic stage,
segments have formed, each containing specialized
structures from which the various appendages and
other features of the adult fly will develop.
[Figure 28–35 diagram labels: egg chamber with nurse cells, follicle cells, and oocyte; egg with bicoid mRNA (anterior) and nanos mRNA (posterior); fertilization; fertilized egg; nuclear divisions; syncytium; nuclear migration; syncytial blastoderm; membrane invagination; cellular blastoderm; pole cells.]
deposited in the egg long before their translation is
required, translational repression provides an especially
important avenue for regulation in developmental
pathways.
Maternal Genes Some maternal genes are expressed
within the nurse and follicle cells, and some in the egg
itself. Within the unfertilized Drosophila egg, the mater-
nal gene products establish two axes—anterior-posterior
and dorsal-ventral—and thus define which regions of the
radially symmetric egg will develop into the head and ab-
domen and the top and bottom of the adult fly. A key
event in very early development is establishment of
mRNA and protein gradients along the body axes. Some
maternal mRNAs have protein products that diffuse
through the cytoplasm to create an asymmetric distribu-
tion in the egg. Different cells in the cellular blastoderm
therefore inherit different amounts of these proteins,
setting the cells on different developmental paths. The
products of the maternal mRNAs include transcriptional
activators or repressors as well as translational rep-
ressors, all regulating the expression of other pattern-
regulating genes. The resulting specific patterns and
sequences of gene expression therefore differ between
cell lineages, ultimately orchestrating the development of
each adult structure.
The anterior-posterior axis in Drosophila is defined
at least in part by the products of the bicoid and nanos
genes. The bicoid gene product is a major anterior
morphogen, and the nanos gene product is a major
posterior morphogen. The mRNA from the bicoid gene
is synthesized by nurse cells
and deposited in the unfertil-
ized egg near its anterior pole.
Nüsslein-Volhard found that
this mRNA is translated soon
after fertilization, and the Bi-
coid protein diffuses through
FIGURE 28–35 Early development in Drosophila. During develop-
ment of the egg, maternal mRNAs (including the bicoid and nanos
gene transcripts, discussed in the text) and proteins are deposited in
the developing oocyte (unfertilized egg cell) by nurse cells and folli-
cle cells. After fertilization, the two nuclei of the fertilized egg divide
in synchrony within the common cytoplasm (syncytium), then migrate
to the periphery. Membrane invaginations surround the nuclei to cre-
ate a monolayer of cells at the periphery; this is the cellular blasto-
derm stage. During the early nuclear divisions, several nuclei at the
far posterior become pole cells, which later become the germ-line
cells.
the cell to create, by the seventh nuclear division, a
concentration gradient radiating out from the anterior
pole (Fig. 28–36a). The Bicoid protein is a transcription
factor that activates the expression of a number of seg-
mentation genes; the protein contains a homeodomain
(p. 1090). Bicoid is also a translational repressor that in-
activates certain mRNAs. The amounts of Bicoid protein
in various parts of the embryo affect the subsequent ex-
pression of a number of other genes in a threshold-
dependent manner. Genes are transcriptionally activated
or translationally repressed only where the Bicoid protein
concentration exceeds the threshold. Changes in the
shape of the Bicoid concentration gradient have dramatic
effects on the body pattern. Lack of Bicoid protein results
in development of an embryo with two abdomens but nei-
ther head nor thorax (Fig. 28–36b); however, embryos
without Bicoid will develop normally if an adequate
amount of bicoid mRNA is injected into the egg at the ap-
propriate end. The nanos gene has an analogous role, but
its mRNA is deposited at the posterior end of the egg and
the anterior-posterior protein gradient peaks at the pos-
terior pole. The Nanos protein is a translational repressor.
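The threshold-dependent readout described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the Bicoid concentration falls off exponentially from the anterior pole; the decay length, the peak value, and both thresholds are hypothetical numbers, not measurements from the text.

```python
import math

def bicoid_level(position, peak=100.0, decay_length=20.0):
    """Relative Bicoid concentration at a position given as % of egg length
    from the anterior end. The exponential shape is an assumption, and
    peak and decay_length are illustrative values."""
    return peak * math.exp(-position / decay_length)

def expression_boundary(threshold, peak=100.0, decay_length=20.0):
    """Most posterior position (% of egg length) at which Bicoid still meets
    the threshold; a target gene responds only anterior to this point.
    Returns 0.0 if the threshold is never reached."""
    if threshold >= peak:
        return 0.0
    return decay_length * math.log(peak / threshold)

# Two hypothetical target genes that respond at different thresholds
high_threshold_gene = expression_boundary(threshold=50.0)
low_threshold_gene = expression_boundary(threshold=10.0)

# Halving the maternal dose (peak) shifts the boundary toward the anterior
shifted = expression_boundary(threshold=10.0, peak=50.0)
```

Under these assumed numbers the high-threshold gene responds only in about the anterior 14% of the egg and the low-threshold gene in about the anterior 46%; halving the peak moves the latter boundary forward to about 32%. This is one way to picture why changes in the shape of the gradient alter the body pattern.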
A broader look at the effects of maternal genes re-
veals the outline of a developmental circuit. In addition
to the bicoid and nanos mRNAs, which are deposited
in the egg asymmetrically, a number of other maternal
mRNAs are deposited uniformly throughout the egg cy-
toplasm. Three of these mRNAs encode the Pumilio,
Hunchback, and Caudal proteins, all affected by nanos
and bicoid (Fig. 28–37). Caudal and Pumilio are in-
volved in development of the posterior end of the fly.
Caudal is a transcriptional activator with a home-
odomain; Pumilio is a translational repressor. Hunch-
back protein plays an important role in the development
of the anterior end and is also a transcriptional regula-
tor of a variety of genes, in some cases a positive regu-
lator, in other cases negative. Bicoid suppresses trans-
lation of caudal in the anterior and also acts as a
transcriptional activator of hunchback in the cellular
blastoderm. Because hunchback is expressed both from
maternal mRNAs and from genes in the developing egg,
it is considered both a maternal and a segmentation
gene. The result of the activities of Bicoid is an increased
concentration of Hunchback at the anterior end of the
[Figure 28–36 panels: (a) normal egg and normal larva; (b) bcd⁻/bcd⁻ mutant egg and double-posterior larva. Each graph plots relative concentration of Bicoid (Bcd) protein (0 to 100) against distance from anterior end (% of egg length, 0 to 100).]
FIGURE 28–36 Distribution of a maternal gene product in a
Drosophila egg. (a) Micrograph of an immunologically stained egg,
showing distribution of the bicoid (bcd) gene product. The graph measures
stain intensity. This distribution is essential for normal development
of the anterior structures of the animal. (b) If the bcd gene is not
expressed by the mother (bcd⁻/bcd⁻ mutant) and thus no bicoid
mRNA is deposited in the egg, the resulting embryo has two posteriors
(and soon dies).
egg. The Nanos and Pumilio proteins act as translational
repressors of hunchback, suppressing synthesis of its
protein near the posterior end of the egg. Pumilio does
not function in the absence of the Nanos protein, and
the gradient of Nanos expression confines the activity
of both proteins to the posterior region. Translational
repression of the hunchback gene leads to degradation
of hunchback mRNA near the posterior end. However,
lack of Bicoid protein in the posterior leads to expres-
sion of caudal. In this way, the Hunchback and Caudal
proteins become asymmetrically distributed in the egg.
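The logic of this circuit can be summarized in a few rules. The sketch below is a deliberate simplification: it treats each regulator as simply present or absent at a given position, ignores the later transcription of hunchback in the embryo, and uses hypothetical function and key names.

```python
def ap_circuit(bicoid_present, nanos_present):
    """Qualitative outcome at one position along the anterior-posterior axis.

    The inputs say whether Bicoid or Nanos protein is above its effective
    threshold at that position. The hunchback, caudal, and pumilio mRNAs
    are taken to be uniform throughout the egg, as stated in the text."""
    pumilio_active = nanos_present            # Pumilio does not function without Nanos
    hunchback_translated = not (nanos_present and pumilio_active)
    caudal_translated = not bicoid_present    # Bicoid suppresses caudal translation
    return {"Hunchback": hunchback_translated, "Caudal": caudal_translated}

anterior = ap_circuit(bicoid_present=True, nanos_present=False)
posterior = ap_circuit(bicoid_present=False, nanos_present=True)
```

Evaluated at the two poles, the rules reproduce the asymmetry described above: Hunchback without Caudal at the anterior end, and Caudal without Hunchback at the posterior end.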
Segmentation Genes Gap genes, pair-rule genes, and
segment polarity genes, three subclasses of segmenta-
tion genes in Drosophila, are activated at successive
stages of embryonic development. Expression of the gap
genes is generally regulated by the products of one or
more maternal genes. At least some of the gap genes
encode transcription factors that affect the expression
of other segmentation or (later) homeotic genes.
One well-characterized segmentation gene is fushi
tarazu (ftz), of the pair-rule subclass. When ftz is
deleted, the embryo develops 7 segments instead of the
normal 14, each segment twice the normal width. The
Fushi-tarazu protein (Ftz) is a transcriptional activator
with a homeodomain. The mRNAs and proteins derived
from the normal ftz gene accumulate in a striking pat-
tern of seven stripes that encircle the posterior two-
thirds of the embryo (Fig. 28–38). The stripes demar-
cate the positions of segments that develop later; these
segments are eliminated if ftz function is lost. The Ftz
protein and a few similar regulatory proteins directly or
indirectly regulate the expression of vast numbers of
genes in the continuing developmental cascade.
[Figure 28–37 diagram labels: egg cytoplasm with localized bicoid mRNA (anterior) and localized nanos mRNA (posterior); translation of mRNA and diffusion of product creates concentration gradients of Bicoid protein and Nanos protein; translation suppression/activation of uniformly distributed mRNAs (caudal, hunchback, pumilio) reflects gradient of regulator; Caudal protein, Hunchback protein, Pumilio protein.]
FIGURE 28–37 Regulatory circuits of the anterior-posterior axis in
a Drosophila egg. The bicoid and nanos mRNAs are localized near
the anterior and posterior poles, respectively. The caudal, hunchback,
and pumilio mRNAs are distributed throughout the egg cytoplasm. The
gradients of Bicoid (Bcd) and Nanos proteins lead to accumulation of
Hunchback protein in the anterior and Caudal protein in the poste-
rior of the egg. Because Pumilio protein requires Nanos protein for its
activity as a translational repressor of hunchback, it functions only at
the posterior end.
[Figure 28–38 panels: (a), (b), (c). Scale bar: 100 μm.]
FIGURE 28–38 Distribution of the fushi tarazu (ftz) gene product in
early Drosophila embryos. (a) In the normal embryo, the gene prod-
uct can be detected in seven bands around the circumference of the
embryo (shown schematically). These bands (b) appear as dark spots
(generated by a radioactive label) in a cross-sectional autoradiograph
and (c) demarcate the anterior margins of the segments in the late em-
bryo (marked in red).
Homeotic Genes Loss of homeotic genes by mutation or
deletion causes the appearance of a normal appendage
or body structure at an inappropriate body position. An
important example is the ultrabithorax (ubx) gene.
When Ubx function is lost, the first abdominal segment
develops incorrectly, having the structure of the third
thoracic segment. Other known homeotic mutations
cause the formation of an extra set of wings, or two legs
at the position in the head where the antennae are nor-
mally found (Fig. 28–39).
The homeotic genes often span long regions of DNA.
The ubx gene, for example, is 77,000 bp long. More than
73,000 bp of this gene are in introns, one of which is
more than 50,000 bp long. Transcription of the ubx gene
takes nearly an hour. The delay this imposes on ubx
gene expression is believed to be a timing mechanism
involved in the temporal regulation of subsequent steps
in development. The Ubx protein is yet another tran-
scriptional activator with a homeodomain (Fig. 28–13).
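The figures quoted for ubx imply a rough elongation rate, which a short calculation makes explicit. This is only a back-of-the-envelope sketch: it rounds "nearly an hour" up to 60 minutes (an assumption) and treats "more than 73,000 bp" as a lower bound on intron length.

```python
UBX_GENE_BP = 77_000      # gene length given in the text
UBX_INTRON_BP = 73_000    # "more than 73,000 bp", so a lower bound
TRANSCRIPTION_MIN = 60    # "nearly an hour", rounded up (assumption)

intron_fraction = UBX_INTRON_BP / UBX_GENE_BP
implied_rate_nt_per_s = UBX_GENE_BP / (TRANSCRIPTION_MIN * 60)

print(f"introns: at least {intron_fraction:.0%} of the gene")
print(f"implied elongation rate: about {implied_rate_nt_per_s:.0f} nt/s")
```

Because the actual transcription time is somewhat under an hour, the value of about 21 nucleotides per second is best read as a lower estimate of the average rate.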
Many of the principles of development outlined
above apply to eukaryotes from nematodes to humans.
Some of the regulatory proteins themselves are con-
served. For example, the products of the homeobox-
containing genes HOX 1.1 in mouse and antennapedia
in fruit fly differ in only one amino acid residue. Of
course, although the molecular regulatory mechanisms
may be similar, many of the ultimate developmental
events are not conserved (humans do not have wings
or antennae). The discovery of structural determinants
with identifiable molecular functions is the first step in
understanding the molecular events underlying devel-
opment. As more genes and their protein products are
discovered, the biochemical side of this vast puzzle will
be elucidated in increasingly rich detail.
SUMMARY 28.3 Regulation of Gene Expression
in Eukaryotes
■ In eukaryotes, positive regulation is more
common than negative regulation, and
transcription is accompanied by large changes
in chromatin structure. Promoters for Pol II
typically have a TATA box and Inr sequence, as
well as multiple binding sites for DNA-binding
transactivators. The latter sites, sometimes
located hundreds or thousands of base pairs
away from the TATA box, are called upstream
activator sequences in yeast and enhancers in
higher eukaryotes.
■ Large complexes of proteins are generally
required to regulate transcriptional activity.
The effects of DNA-binding transactivators on
Pol II are mediated by coactivator protein
complexes such as TFIID or mediator. The
modular structures of the transactivators have
distinct activation and DNA-binding domains.
Other protein complexes, including histone
acetyltransferases such as GCN5-ADA2-ADA3
and ATP-dependent complexes such as
SWI/SNF and NURF, reversibly remodel
chromatin structure.
■ Hormones affect the regulation of gene
expression in one of two ways. Steroid
hormones interact directly with intracellular
receptors that are DNA-binding regulatory
proteins; binding of the hormone has either
positive or negative effects on the transcription
of genes targeted by the hormone. Nonsteroid
FIGURE 28–39 Effects of mutations in homeotic genes in Drosophila. (a) Normal head.
(b) Homeotic mutant (antennapedia) in which antennae are replaced by legs. (c) Normal
body structure. (d) Homeotic mutant (bithorax) in which a segment has developed incor-
rectly to produce an extra set of wings.
hormones bind to cell-surface receptors,
triggering a signaling pathway that can lead to
phosphorylation of a regulatory protein,
affecting its activity.
■ Development of a multicellular organism
presents the most complex regulatory
challenge. The fate of cells in the early embryo
is determined by establishment of
anterior-posterior and dorsal-ventral gradients
of proteins that act as transcriptional
transactivators or translational repressors,
regulating the genes required for the
development of structures appropriate to a
particular part of the organism. Sets of
regulatory genes operate in temporal and
spatial succession, transforming given areas of
an egg cell into predictable structures in the
adult organism.
Key Terms
housekeeping genes 1082
induction 1082
repression 1082
specificity factor 1083
repressor 1083
activator 1083
operator 1083
negative regulation 1084
positive regulation 1084
operon 1085
helix-turn-helix 1088
zinc finger 1088
homeodomain 1090
homeobox 1090
leucine zipper 1090
basic helix-loop-helix 1090
catabolite repression 1093
cAMP receptor protein (CRP) 1093
regulon 1094
transcription attenuation 1094
translational repressor 1098
stringent response 1098
phase variation 1100
hypersensitive sites 1102
chromatin remodeling 1103
enhancers 1104
upstream activator sequences (UASs) 1104
basal transcription factors 1104
DNA-binding transactivators 1104
coactivators 1104
TATA-binding protein (TBP) 1104
mediator 1105
hormone response elements (HREs) 1108
RNA interference (RNAi) 1111
polarity 1111
metamerism 1111
morphogens 1112
maternal genes 1112
maternal mRNAs 1112
segmentation genes 1112
gap genes 1112
pair-rule genes 1112
segment polarity genes 1112
homeotic genes 1112
Terms in bold are defined in the glossary.
Problems
1. Effect of mRNA and Protein Stability on Regulation E. coli
cells are growing in a medium with glucose as
the sole carbon source. Tryptophan is suddenly added. The
cells continue to grow, and divide every 30 min. Describe
(qualitatively) how the amount of tryptophan synthase
activity in the cells changes with time under the following
conditions:
(a) The trp mRNA is stable (degraded slowly over many
hours).
(b) The trp mRNA is degraded rapidly, but tryptophan
synthase is stable.
(c) The trp mRNA and tryptophan synthase are both
degraded rapidly.
2. Negative Regulation Describe the probable effects on
gene expression in the lac operon of a mutation in (a) the
lac operator that deletes most of O₁; (b) the lacI gene that
inactivates the repressor; and (c) the promoter that alters
the region around position −10.
3. Specific DNA Binding by Regulatory Proteins A
typical prokaryotic repressor protein discriminates between
its specific DNA binding site (operator) and nonspecific DNA
by a factor of 10⁴ to 10⁶. About 10 molecules of repressor per
cell are sufficient to ensure a high level of repression. Assume
that a very similar repressor existed in a human cell, with a
similar specificity for its binding site. How many copies of the
repressor would be required to elicit a level of repression
similar to that in the prokaryotic cell? (Hint: The E. coli genome
contains about 4.6 million bp; the human haploid genome has
about 3.2 billion bp.)
4. Repressor Concentration in E. coli The dissociation
constant for a particular repressor-operator complex is very
low, about 10⁻¹³ M. An E. coli cell (volume 2 × 10⁻¹² mL)
contains 10 copies of the repressor. Calculate the cellular
concentration of the repressor protein. How does this value
compare with the dissociation constant of the repressor-operator
complex? What is the significance of this result?
5. Catabolite Repression E. coli cells are growing in a
medium containing lactose but no glucose. Indicate whether
each of the following changes or conditions would increase,
decrease, or not change the expression of the lac operon. It
may be helpful to draw a model depicting what is happening
in each situation.
(a) Addition of a high concentration of glucose
(b) A mutation that prevents dissociation of the Lac re-
pressor from the operator
(c) A mutation that completely inactivates β-galactosidase
(d) A mutation that completely inactivates galactoside
permease
(e) A mutation that prevents binding of CRP to its bind-
ing site near the lac promoter
6. Transcription Attenuation How would transcription
of the E. coli trp operon be affected by the following manip-
ulations of the leader region of the trp mRNA?
(a) Increasing the distance (number of bases) between
the leader peptide gene and sequence 2
(b) Increasing the distance between sequences 2 and 3
(c) Removing sequence 4
(d) Changing the two Trp codons in the leader peptide
gene to His codons
(e) Eliminating the ribosome-binding site for the gene
that encodes the leader peptide
(f) Changing several nucleotides in sequence 3 so that
it can base-pair with sequence 4 but not with sequence 2
7. Repressors and Repression How would the SOS re-
sponse in E. coli be affected by a mutation in the lexA gene
that prevented autocatalytic cleavage of the LexA protein?
8. Regulation by Recombination In the phase variation
system of Salmonella, what would happen to the cell if the
Hin recombinase became more active and promoted re-
combination (DNA inversion) several times in each cell
generation?
9. Initiation of Transcription in Eukaryotes A new
RNA polymerase activity is discovered in crude extracts of
cells derived from an exotic fungus. The RNA polymerase ini-
tiates transcription only from a single, highly specialized pro-
moter. As the polymerase is purified its activity declines, and
the purified enzyme is completely inactive unless crude ex-
tract is added to the reaction mixture. Suggest an explana-
tion for these observations.
10. Functional Domains in Regulatory Proteins A bio-
chemist replaces the DNA-binding domain of the yeast Gal4
protein with the DNA-binding domain from the Lac repres-
sor, and finds that the engineered protein no longer regulates
transcription of the GAL genes in yeast. Draw a diagram of
the different functional domains you would expect to find in
the Gal4 protein and in the engineered protein. Why does the
engineered protein no longer regulate transcription of the
GAL genes? What might be done to the DNA-binding site rec-
ognized by this chimeric protein to make it functional in ac-
tivating transcription of GAL genes?
11. Inheritance Mechanisms in Development A
Drosophila egg that is bcd⁻/bcd⁻ may develop normally but
as an adult will not be able to produce viable offspring.
Explain.
Biochemistry on the Internet
12. TATA Binding Protein and the TATA Box To ex-
amine the interactions between transcription factors and
DNA, go to the Protein Data Bank (www.rcsb.org/pdb) and
download the PDB file 1TGH. This file models the interac-
tions between a human TATA-binding protein and a segment
of double-stranded DNA. Use the Noncovalent Bond Finder
at the Chime Resources website (www.umass.edu/microbio/chime)
to examine the roles of hydrogen bonds and hydrophobic
interactions involved in the binding of this transcription
factor to the TATA box.
Within the Noncovalent Bond Finder program, load the
PDB file and display the protein in Spacefill mode and the
DNA in Wireframe mode.
(a) Which of the base pairs in the DNA form hydrogen
bonds with the protein? Which of these contribute to the specific
recognition of the TATA box by this protein? (Hydrogen-bond
length between hydrogen donor and hydrogen acceptor
ranges from 2.5 to 3.3 Å.)
(b) Which amino acid residues in the protein interact
with these base pairs? On what basis did you make this de-
termination? Do these observations agree with the informa-
tion presented in the text?
(c) What is the sequence of the DNA in this model and
which portions of the sequence are recognized by the TATA-
binding protein?
(d) Can you identify any hydrophobic interactions in
this complex? (Hydrophobic interactions usually occur with
interatomic distances of 3.3 to 4.0 Å.)
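The distance criterion quoted in part (a) can also be applied directly to the atomic coordinates. The following is a minimal sketch, not a substitute for the Noncovalent Bond Finder: it assumes standard fixed-column ATOM records, assumes the DNA residues carry one of the names listed in DNA_RESIDUES, and treats any nitrogen or oxygen atom as a possible donor or acceptor. All function names are hypothetical.

```python
import math

# Residue names treated as DNA; this naming is an assumption and may need
# adjusting for the particular coordinate file.
DNA_RESIDUES = {"A", "C", "G", "T", "DA", "DC", "DG", "DT"}

def parse_atoms(pdb_lines):
    """Collect ATOM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM records such as water and ions
        atoms.append({
            "name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            "chain": line[21],
            "res_seq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms

def is_polar(atom):
    """Crude donor/acceptor test: the atom name begins with N or O."""
    return atom["name"][:1] in ("N", "O")

def candidate_hbonds(atoms, d_min=2.5, d_max=3.3):
    """Protein-DNA N/O atom pairs separated by d_min to d_max angstroms."""
    protein = [a for a in atoms if a["res_name"] not in DNA_RESIDUES and is_polar(a)]
    dna = [a for a in atoms if a["res_name"] in DNA_RESIDUES and is_polar(a)]
    pairs = []
    for p in protein:
        for d in dna:
            distance = math.dist(p["xyz"], d["xyz"])
            if d_min <= distance <= d_max:
                pairs.append((p, d, distance))
    return pairs
```

Passing the lines of a locally saved copy of the coordinate file to parse_atoms, and the result to candidate_hbonds, yields a list of candidate contacts. Bond angles are ignored, so the list is a starting point for inspection rather than a definitive set of hydrogen bonds.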