The third and final part of this book explores the biochemical mechanisms underlying the apparently contradictory requirements for both genetic continuity and the evolution of living organisms. What is the molecular nature of genetic material? How is genetic information transmitted from one generation to the next with high fidelity? How do the rare changes in genetic material that are the raw material of evolution arise? How is genetic information ultimately expressed in the amino acid sequences of the astonishing variety of protein molecules in a living cell?

The fundamental unit of information in living systems is the gene. A gene can be defined biochemically as a segment of DNA (or, in a few cases, RNA) that encodes the information required to produce a functional biological product. The final product is usually a protein, so much of the material in Part III concerns genes that encode proteins. A functional gene product might also be one of several classes of RNA molecules. The storage, maintenance, and metabolism of these informational units form the focal points of our discussion in Part III.

Modern biochemical research on gene structure and function has brought to biology a revolution comparable to that stimulated by the publication of Darwin's theory on the origin of species nearly 150 years ago. An understanding of how information is stored and used in cells has brought penetrating new insights to some of the most fundamental questions about cellular structure and function. A comprehensive conceptual framework for biochemistry is now unfolding.

Today's understanding of information pathways has arisen from the convergence of genetics, physics, and chemistry in modern biochemistry. This was epitomized by the discovery of the double-helical structure of DNA, postulated by James Watson and Francis Crick in 1953 (see Fig. 8–15). Genetic theory contributed the concept of coding by genes.
Physics permitted the determination of molecular structure by x-ray diffraction analysis. Chemistry revealed the composition of DNA. The profound impact of the Watson-Crick hypothesis arose from its ability to account for a wide range of observations derived from studies in these diverse disciplines.

This revolution in our understanding of the structure of DNA inevitably stimulated questions about its function. The double-helical structure itself clearly suggested how DNA might be copied so that the information it contains can be transmitted from one generation to the next. Clarification of how the information in DNA is converted into functional proteins came with the discovery of both messenger RNA and transfer RNA and with the deciphering of the genetic code.

These and other major advances gave rise to the central dogma of molecular biology, comprising the three major processes in the cellular utilization of genetic information. The first is replication, the copying of parental DNA to form daughter DNA molecules with identical nucleotide sequences. The second is transcription, the process by which parts of the genetic message encoded in DNA are copied precisely into RNA. The third is translation, whereby the genetic message encoded in messenger RNA is translated on the ribosomes into a polypeptide with a particular sequence of amino acids.

PART III: INFORMATION PATHWAYS
24 Genes and Chromosomes 923
25 DNA Metabolism 948
26 RNA Metabolism 995
27 Protein Metabolism 1034
28 Regulation of Gene Expression 1081

8885d_c24_920-947 2/11/04 1:36 PM Page 921 mac76 mac76:385_reb:

Part III explores these and related processes. In Chapter 24 we examine the structure, topology, and packaging of chromosomes and genes. The processes underlying the central dogma are elaborated in Chapters 25 through 27. Finally, we turn to regulation, examining how the expression of genetic information is controlled (Chapter 28).
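The three processes of the central dogma can be mimicked with simple string operations. The Python sketch below is a toy model under stated assumptions: sequences are uppercase strings written 5′→3′, base pairing is idealized complementarity, and `CODON_TABLE` is a hypothetical excerpt of the standard genetic code holding only the codons the example needs. The function names `replicate`, `transcribe`, and `translate` are illustrative, not taken from any library.

```python
# Toy model of the central dogma: replication, transcription, translation.
# All sequences are uppercase strings written 5'->3'.

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA base -> paired RNA base

# Hypothetical excerpt of the standard genetic code (mRNA codons only).
CODON_TABLE = {
    "UCU": "Ser", "CGU": "Arg", "GGA": "Gly", "UAC": "Tyr",
    "ACU": "Thr", "UUU": "Phe", "GCC": "Ala", "GUU": "Val",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def replicate(template: str) -> str:
    """New DNA strand synthesized on `template`, returned 5'->3'."""
    # The new strand is antiparallel, so complement each base and reverse.
    return "".join(DNA_PAIR[base] for base in reversed(template))


def transcribe(template: str) -> str:
    """mRNA copied from a DNA template strand, returned 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(template))


def translate(mrna: str) -> list[str]:
    """Read codons 5'->3' from the first base until a stop codon.

    Raises KeyError for any codon missing from the excerpt above.
    """
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        residues.append(amino_acid)
    return residues


if __name__ == "__main__":
    coding = "TCTCGTGGATACACTTTTGCCGTT"  # illustrative coding strand
    template = replicate(coding)        # its complementary (template) strand
    mrna = transcribe(template)         # same as coding strand, U in place of T
    print(template)                     # AACGGCAAAAGTGTATCCACGAGA
    print(mrna)                         # UCUCGUGGAUACACUUUUGCCGUU
    print("-".join(translate(mrna)))    # Ser-Arg-Gly-Tyr-Thr-Phe-Ala-Val
```

The example shows colinearity in miniature: each consecutive triplet of the mRNA maps to one residue, in order. Real translation begins at a start codon and draws on all 64 codons; the table is kept small on purpose.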
A major theme running through these chapters is the added complexity inherent in the biosynthesis of macromolecules that contain information. Assembling nucleic acids and proteins with particular sequences of nucleotides and amino acids represents nothing less than preserving the faithful expression of the template upon which life itself is based. We might expect the formation of phosphodiester bonds in DNA or peptide bonds in proteins to be a trivial feat for cells, given the arsenal of enzymatic and chemical tools described in Part II. However, the framework of patterns and rules established in our examination of metabolic pathways thus far must be enlarged considerably to take into account molecular information. Bonds must be formed between particular subunits in informational biopolymers, avoiding either the occurrence or the persistence of sequence errors. This has an enormous impact on the thermodynamics, chemistry, and enzymology of the biosynthetic processes. Formation of a peptide bond requires an energy input of only about 21 kJ/mol of bonds and can be catalyzed by relatively simple enzymes. But to synthesize a bond between two specific amino acids at a particular point in a polypeptide, the cell invests about 125 kJ/mol, roughly six times as much, while making use of more than 200 enzymes, RNA molecules, and specialized proteins. The chemistry involved in peptide bond formation does not change because of this requirement, but additional processes are layered over the basic reaction to ensure that the peptide bond is formed between particular amino acids. Information is expensive.

The dynamic interaction between nucleic acids and proteins is another central theme of Part III. With the important exception of a few catalytic RNA molecules (discussed in Chapters 26 and 27), the processes that make up the pathways of cellular information flow are catalyzed and regulated by proteins.
An understanding of these enzymes and other proteins can have practical as well as intellectual rewards, because they form the basis of recombinant DNA technology (introduced in Chapter 9).

[Diagram: DNA (replication) → transcription → RNA → translation → protein]
The central dogma of molecular biology, showing the general pathways of information flow via replication, transcription, and translation. The term "dogma" is a misnomer. Introduced by Francis Crick at a time when little evidence supported these ideas, the dogma has become a well-established principle.

CHAPTER 24

Almost every cell of a multicellular organism contains the same complement of genetic material: its genome. Just look at any human individual for a hint of the wealth of information contained in each human cell. Chromosomes, the nucleic acid molecules that are the repository of an organism's genetic information, are the largest molecules in a cell and may contain thousands of genes as well as considerable tracts of intergenic DNA. The 16 chromosomes in the relatively small genome of the yeast Saccharomyces cerevisiae have molecular masses ranging from 1.5 × 10⁸ to 1 × 10⁹ daltons, corresponding to DNA molecules with 230,000 to 1,532,000 contiguous base pairs (bp). Human chromosomes range up to 279 million bp.

The very size of DNA molecules presents an interesting biological puzzle, given that they are generally much longer than the cells or viral packages that contain them (Fig. 24–1). In this chapter we shift our focus from the secondary structure of DNA, considered in Chapter 8, to the extraordinary degree of organization required for the tertiary packaging of DNA into chromosomes. We first examine the elements within viral and cellular chromosomes, then assess their size and organization.
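The yeast chromosome figures quoted above tie molecular mass to length in base pairs. A minimal sketch of that conversion, assuming an average of about 650 daltons per base pair (a common rule of thumb; the true value depends on base composition and counterions). The names `DALTONS_PER_BP`, `bp_to_daltons`, and `daltons_to_bp` are hypothetical.

```python
# Convert between duplex DNA length (bp) and approximate molecular mass (Da).
# Assumption: ~650 Da per base pair, a rule-of-thumb average.
DALTONS_PER_BP = 650.0


def bp_to_daltons(bp: float) -> float:
    """Approximate mass of a duplex DNA of `bp` base pairs."""
    return bp * DALTONS_PER_BP


def daltons_to_bp(daltons: float) -> float:
    """Approximate length in base pairs of a duplex DNA of given mass."""
    return daltons / DALTONS_PER_BP


# Smallest and largest S. cerevisiae chromosomes, as quoted in the text.
for bp in (230_000, 1_532_000):
    print(f"{bp:>9,} bp  ~  {bp_to_daltons(bp):.2e} Da")
```

With this assumption the two yeast chromosomes come out at roughly 1.5 × 10⁸ and 1 × 10⁹ Da, matching the range given in the text.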
We next consider DNA topology, providing a description of the coiling of DNA molecules. Finally, we discuss the protein-DNA interactions that organize chromosomes into compact structures.

24 GENES AND CHROMOSOMES
24.1 Chromosomal Elements 924
24.2 DNA Supercoiling 930
24.3 The Structure of Chromosomes 938

DNA topoisomerases are the magicians of the DNA world. By allowing DNA strands or double helices to pass through each other, they can solve all of the topological problems of DNA in replication, transcription and other cellular transactions.
James Wang, article in Nature Reviews in Molecular Cell Biology, 2002

Supercoiling, in fact, does more for DNA than act as an executive enhancer; it keeps the unruly, spreading DNA inside the cramped confines that the cell has provided for it.
Nicholas Cozzarelli, Harvey Lectures, 1993

[Electron micrograph; scale bar 0.5 μm]
FIGURE 24–1 Bacteriophage T2 protein coat surrounded by its single, linear molecule of DNA. The DNA was released by lysing the bacteriophage particle in distilled water and allowing the DNA to spread on the water surface. An undamaged T2 bacteriophage particle consists of a head structure that tapers to a tail by which the bacteriophage attaches itself to the outer surface of a bacterial cell. All the DNA shown in this electron micrograph is normally packaged inside the phage head.

24.1 Chromosomal Elements
Cellular DNA contains genes and intergenic regions, both of which may serve functions vital to the cell. The more complex genomes, such as those of eukaryotic cells, demand increased levels of chromosomal organization, and this is reflected in the chromosome's structural features. We begin by considering the different types of DNA sequences and structural elements within chromosomes.

Genes Are Segments of DNA That Code for Polypeptide Chains and RNAs
Our understanding of genes has evolved tremendously over the last century.
Classically, a gene was defined as a portion of a chromosome that determines or affects a single character or phenotype (visible property), such as eye color. George Beadle and Edward Tatum proposed a molecular definition of a gene in 1940. After exposing spores of the fungus Neurospora crassa to x rays and other agents known to damage DNA and cause alterations in DNA sequence (mutations), they detected mutant fungal strains that lacked one or another specific enzyme, sometimes resulting in the failure of an entire metabolic pathway. Beadle and Tatum concluded that a gene is a segment of genetic material that determines or codes for one enzyme: the one gene–one enzyme hypothesis. Later this concept was broadened to one gene–one polypeptide, because many genes code for proteins that are not enzymes or for one polypeptide of a multisubunit protein.

[Photographs: George W. Beadle, 1903–1989; Edward L. Tatum, 1909–1975]

The modern biochemical definition of a gene is even more precise. A gene is all the DNA that encodes the primary sequence of some final gene product, which can be either a polypeptide or an RNA with a structural or catalytic function. DNA also contains other segments or sequences that have a purely regulatory function. Regulatory sequences provide signals that may denote the beginning or the end of genes, influence the transcription of genes, or function as initiation points for replication or recombination (Chapter 28). Some genes can be expressed in different ways to generate multiple gene products from one segment of DNA. The special transcriptional and translational mechanisms that allow this are described in Chapters 26 through 28.

We can make direct estimations of the minimum overall size of genes that encode proteins. As described in detail in Chapter 27, each amino acid of a polypeptide chain is coded for by a sequence of three consecutive nucleotides in a single strand of DNA (Fig. 24–2), with these "codons" arranged in a sequence that corresponds to the sequence of amino acids in the polypeptide that the gene encodes. A polypeptide chain of 350 amino acid residues (an average-size chain) corresponds to 1,050 bp. Many genes in eukaryotes and a few in prokaryotes are interrupted by noncoding DNA segments and are therefore considerably longer than this simple calculation would suggest.

[Figure 24–2 diagram: DNA template strand and its complement, mRNA, and the encoded polypeptide from amino terminus to carboxyl terminus]
FIGURE 24–2 Colinearity of the coding nucleotide sequences of DNA and mRNA and the amino acid sequence of a polypeptide chain. The triplets of nucleotide units in DNA determine the amino acids in a protein through the intermediary mRNA. One of the DNA strands serves as a template for synthesis of mRNA, which has nucleotide triplets (codons) complementary to those of the DNA. In some bacterial and many eukaryotic genes, coding sequences are interrupted at intervals by regions of noncoding sequences (called introns).

How many genes are in a single chromosome? The Escherichia coli chromosome, one of the prokaryotic genomes that have been completely sequenced, is a circular DNA molecule (in the sense of an endless loop rather than a perfect circle) with 4,639,221 bp. These base pairs encode about 4,300 genes for proteins and another 115 genes for stable RNA molecules. Among eukaryotes, the approximately 3.2 billion base pairs of the human genome include 30,000 to 35,000 genes on 24 different chromosomes.
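The estimates in the last two paragraphs can be reproduced with a few lines of Python. This is a sketch under simple assumptions: three base pairs per codon, no introns, and a hypothetical midpoint of 32,500 for the human estimate of 30,000 to 35,000 genes. Dividing genome size by gene number gives only the average stretch of genome per gene, not the length of any particular gene.

```python
# Back-of-envelope gene arithmetic with the numbers quoted in the text.
BP_PER_CODON = 3  # one codon = three consecutive nucleotides


def min_coding_bp(n_residues: int, include_stop: bool = False) -> int:
    """Shortest uninterrupted coding sequence for an n-residue polypeptide."""
    n_codons = n_residues + (1 if include_stop else 0)
    return n_codons * BP_PER_CODON


def bp_per_gene(genome_bp: float, n_genes: float) -> float:
    """Average stretch of genome available per gene."""
    return genome_bp / n_genes


print(min_coding_bp(350))                          # 1050
print(min_coding_bp(350, include_stop=True))       # 1053
print(round(bp_per_gene(4_639_221, 4_300 + 115)))  # E. coli: 1051
print(round(bp_per_gene(3.2e9, 32_500)))           # human (midpoint): 98462
```

The E. coli figure, about 1,050 bp per gene, is almost exactly the length of an average coding sequence, consistent with a compact bacterial genome; the human figure is roughly 90 times larger.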
DNA Molecules Are Much Longer Than the Cellular Packages That Contain Them
Chromosomal DNAs are often many orders of magnitude longer than the cells or viruses in which they are found (Fig. 24–1; Table 24–1). This is true of every class of organism or parasite.

Viruses Viruses are not free-living organisms; rather, they are infectious parasites that use the resources of a host cell to carry out many of the processes they require to propagate. Many viral particles consist of no more than a genome (usually a single RNA or DNA molecule) surrounded by a protein coat.

Almost all plant viruses and some bacterial and animal viruses have RNA genomes. These genomes tend to be particularly small. For example, the genomes of mammalian retroviruses such as HIV are about 9,000 nucleotides long, and that of the bacteriophage Qβ has 4,220 nucleotides. Both types of viruses have single-stranded RNA genomes.

The genomes of DNA viruses vary greatly in size (Table 24–1). Many viral DNAs are circular for at least part of their life cycle. During viral replication within a host cell, specific types of viral DNA called replicative forms may appear; for example, many linear DNAs become circular and all single-stranded DNAs become double-stranded. A typical medium-sized DNA virus is bacteriophage λ (lambda), which infects E. coli. In its replicative form inside cells, λ DNA is a circular double helix. This double-stranded DNA contains 48,502 bp and has a contour length of 17.5 μm. Bacteriophage φX174 is a much smaller DNA virus; the DNA in the viral particle is a single-stranded circle, and the double-stranded replicative form contains 5,386 bp. Although viral genomes are small, the contour lengths of their DNAs are much greater than the long dimensions of the viral particles that contain them. The DNA of bacteriophage T4, for example, is about 290 times longer than the viral particle itself (Table 24–1).

TABLE 24–1 The Sizes of DNA and Viral Particles for Some Bacterial Viruses (Bacteriophages)
Virus | Size of viral DNA (bp) | Length of viral DNA (nm) | Long dimension of viral particle (nm)
φX174 | 5,386 | 1,939 | 25
T7 | 39,936 | 14,377 | 78
λ (lambda) | 48,502 | 17,460 | 190
T4 | 168,889 | 60,800 | 210
Note: Data on size of DNA are for the replicative form (double-stranded). The contour length is calculated assuming that each base pair occupies a length of 3.4 Å (see Fig. 8–15).

Bacteria A single E. coli cell contains almost 100 times as much DNA as a bacteriophage λ particle. The chromosome of an E. coli cell is a single double-stranded circular DNA molecule. Its 4,639,221 bp have a contour length of about 1.7 mm, some 850 times the length of the E. coli cell (Fig. 24–3). In addition to the very large, circular DNA chromosome in their nucleoid, many bacteria contain one or more small circular DNA molecules that are free in the cytosol. These extrachromosomal elements are called plasmids (Fig. 24–4; see also p. 311). Most plasmids are only a few thousand base pairs long, but some contain more than 10,000 bp. They carry genetic information and undergo replication to yield daughter plasmids, which pass into the daughter cells at cell division. Plasmids have been found in yeast and other fungi as well as in bacteria.

In many cases plasmids confer no obvious advantage on their host, and their sole function appears to be self-propagation. However, some plasmids carry genes that are useful to the host bacterium. For example, some plasmid genes make a host bacterium resistant to antibacterial agents. Plasmids carrying the gene for the enzyme β-lactamase confer resistance to β-lactam antibiotics such as penicillin and amoxicillin (see Box 20–1). These and similar plasmids may pass from an antibiotic-resistant cell to an antibiotic-sensitive cell of the same or another bacterial species, making the recipient cell antibiotic resistant. The extensive use of antibiotics in some human populations has served as a strong selective force, encouraging the spread of antibiotic resistance–coding plasmids (as well as transposable elements, described below, that harbor similar genes) in disease-causing bacteria and creating bacterial strains that are resistant to several antibiotics. Physicians are becoming increasingly reluctant to prescribe antibiotics unless a clear clinical need is confirmed. For similar reasons, the widespread use of antibiotics in animal feeds is being curbed.

FIGURE 24–3 The length of the E. coli chromosome (1.7 mm) depicted in linear form relative to the length of a typical E. coli cell (2 μm).

FIGURE 24–4 DNA from a lysed E. coli cell. In this electron micrograph several small, circular plasmid DNAs are indicated by white arrows. The black spots and white specks are artifacts of the preparation.

Eukaryotes A yeast cell, one of the simplest eukaryotes, has 2.6 times more DNA in its genome than an E. coli cell (Table 24–2). Cells of Drosophila, the fruit fly used in classical genetic studies, contain more than 35 times as much DNA as E. coli cells, and human cells have almost 700 times as much. The cells of many plants and amphibians contain even more. The genetic material of eukaryotic cells is apportioned into chromosomes, the diploid (2n) number depending on the species (Table 24–2). A human somatic cell, for example, has 46 chromosomes (Fig. 24–5). Each chromosome of a eukaryotic cell, such as that shown in Figure 24–5a, contains a single, very large, duplex DNA molecule. The DNA molecules in the 24 different types of human chromosomes (22 matching pairs plus the X and Y sex chromosomes) vary in length over a 25-fold range. Each type of chromosome in eukaryotes carries a characteristic set of genes. Interestingly, the number of genes does not vary nearly as much as does genome size (see Chapter 9 for a discussion of the types of sequences, besides genes, that contribute to genome size). The DNA of one human genome (22 chromosomes plus X and Y or two X chromosomes), placed end to end, would extend for about a meter. Most human cells are diploid and each cell contains a total of 2 m of DNA. An adult human body contains approximately 10¹⁴ cells and thus a total DNA length of 2 × 10¹¹ km. Compare this with the circumference of the earth (4 × 10⁴ km) or the distance between the earth and the sun (1.5 × 10⁸ km), a dramatic illustration of the extraordinary degree of DNA compaction in our cells.

FIGURE 24–5 Eukaryotic chromosomes. (a) A pair of linked and condensed sister chromatids from a human chromosome. Eukaryotic chromosomes are in this state after replication and at metaphase during mitosis. (b) A complete set of chromosomes from a leukocyte from one of the authors. There are 46 chromosomes in every normal human somatic cell.

Eukaryotic cells also have organelles, mitochondria (Fig. 24–6) and chloroplasts, that contain DNA. Mitochondrial DNA (mtDNA) molecules are much smaller than the nuclear chromosomes. In animal cells, mtDNA contains fewer than 20,000 bp (16,569 bp in human mtDNA) and is a circular duplex. Each mitochondrion typically has two to ten copies of this mtDNA molecule, and the number can rise to hundreds in certain cells when an embryo is undergoing cell differentiation. In a few organisms (trypanosomes, for example) each mitochondrion contains thousands of copies of mtDNA, organized into a complex and interlinked matrix known as a kinetoplast. Plant cell mtDNA ranges in size from 200,000 to 2,500,000 bp. Chloroplast DNA (cpDNA) also exists as circular duplexes and ranges in size from 120,000 to 160,000 bp. The evolutionary origin of mitochondrial and chloroplast DNAs has been the subject of much speculation. A widely accepted view is that they are vestiges of the chromosomes of ancient bacteria that gained access to the cytoplasm of host cells and became the precursors of these organelles (see Fig. 1–36).

FIGURE 24–6 A dividing mitochondrion. Some mitochondrial proteins and RNAs are encoded by one of the copies of the mitochondrial DNA (none of which are visible here). The DNA (mtDNA) is replicated each time the mitochondrion divides, before cell division.

Mitochondrial DNA codes for the mitochondrial tRNAs and rRNAs and for a few mitochondrial proteins. More than 95% of mitochondrial proteins are encoded by nuclear DNA. Mitochondria and chloroplasts divide when the cell divides. Their DNA is replicated before and during division, and the daughter DNA molecules pass into the daughter organelles.

Eukaryotic Genes and Chromosomes Are Very Complex
Many bacterial species have only one chromosome per cell and, in nearly all cases, each chromosome contains only one copy of each gene. A very few genes, such as those for rRNAs, are repeated several times. Genes and regulatory sequences account for almost all the DNA in prokaryotes. Moreover, almost every gene is precisely colinear with the amino acid sequence (or RNA sequence) for which it codes (Fig. 24–2).

The organization of genes in eukaryotic DNA is structurally and functionally much more complex. The study of eukaryotic chromosome structure, and more recently the sequencing of entire eukaryotic genomes, has yielded many surprises. Many, if not most, eukaryotic genes have a distinctive and puzzling structural feature: their nucleotide sequences contain one or more intervening segments of DNA that do not code for the amino acid sequence of the polypeptide product.
These nontranslated inserts interrupt the otherwise colinear relationship between the nucleotide sequence of the gene and the amino acid sequence of the polypeptide it encodes. Such nontranslated DNA segments in genes are called intervening sequences or introns, and the coding segments are called exons. Few prokaryotic genes contain introns.

In higher eukaryotes, the typical gene has much more intron sequence than exon sequence. For example, in the gene coding for the single polypeptide chain of the avian egg protein ovalbumin (Fig. 24–7), the introns are much longer than the exons; altogether, seven introns make up 85% of the gene's DNA. In the gene for the β subunit of hemoglobin, a single intron contains more than half of the gene's DNA. The gene for the muscle protein titin is the intron champion, with 178 introns. Genes for histones appear to have no introns. In most cases the function of introns is not clear. In total, only about 1.5% of human DNA is "coding" or exon DNA, carrying information for protein or RNA products. However, when the much larger introns are included in the count, as much as 30% of the human genome consists of genes.

The relative paucity of genes in the human genome leaves a lot of DNA unaccounted for. Figure 24–8 provides a summary of sequence types. Much of the nongene DNA is in the form of repeated sequences of several kinds. Perhaps most surprising, about half the human genome is made up of moderately repeated sequences that are derived from transposable elements: segments of DNA, ranging from a few hundred to several thousand base pairs long, that can move from one location to another in the genome. Transposable elements (transposons) are a kind of molecular parasite, efficiently making a home within the host genome. Many have genes encoding proteins that catalyze the transposition process, described in more detail in Chapters 25 and 26.
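The percentages quoted in this discussion and in Figure 24–8 translate into absolute amounts of DNA. A sketch, assuming the 3.2 × 10⁹ bp human genome size used earlier; the categories overlap (exons are part of gene DNA), so they are not meant to sum to 100%. `FRACTIONS` and `megabases` are hypothetical names.

```python
# Rough partition of the human genome using percentages quoted in the text
# and Figure 24-8. Categories overlap: exons are part of gene DNA.
HUMAN_GENOME_BP = 3.2e9

FRACTIONS = {
    "exons (coding)": 0.015,
    "genes incl. introns": 0.30,
    "transposon-derived": 0.45,
}


def megabases(fraction: float, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Amount of DNA, in millions of base pairs, for a genome fraction."""
    return fraction * genome_bp / 1e6


for label, fraction in FRACTIONS.items():
    print(f"{label:<22}{fraction:>6.1%}  ~{megabases(fraction):,.0f} Mbp")

# Exons make up only a small share of the DNA lying within genes.
exon_share = FRACTIONS["exons (coding)"] / FRACTIONS["genes incl. introns"]
print(f"exon share of gene DNA: {exon_share:.0%}")
```

With these inputs the sketch gives about 48 Mbp of exon DNA, about 960 Mbp within genes, and about 1,440 Mbp of transposon-derived sequence; exons are roughly 5% of the DNA within genes.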
Some transposons in the human genome are active, moving at a low frequency, but most are inactive relics, evolutionarily altered by mutations. Although these elements generally do not encode proteins or RNAs that are used in human cells, they have played a major role in human evolution: movement of transposons can lead to the redistribution of other genomic sequences.

TABLE 24–2 DNA, Gene, and Chromosome Content in Some Genomes
Organism | Total DNA (bp) | Number of chromosomes* | Approximate number of genes
Bacterium (Escherichia coli) | 4,639,221 | 1 | 4,405
Yeast (Saccharomyces cerevisiae) | 12,068,000 | 16† | 6,200
Nematode (Caenorhabditis elegans) | 97,000,000 | 12‡ | 19,000
Plant (Arabidopsis thaliana) | 125,000,000 | 10 | 25,500
Fruit fly (Drosophila melanogaster) | 180,000,000 | 18 | 13,600
Plant (Oryza sativa; rice) | 480,000,000 | 24 | 57,000
Mouse (Mus musculus) | 2,500,000,000 | 40 | 30,000–35,000
Human (Homo sapiens) | 3,200,000,000 | 46 | 30,000–35,000
Note: This information is constantly being refined. For the most current information, consult the websites for the individual genome projects.
* The diploid chromosome number is given for all eukaryotes except yeast.
† Haploid chromosome number. Wild yeast strains generally have eight (octoploid) or more sets of these chromosomes.
‡ Number for females, with two X chromosomes. Males have an X but no Y, thus 11 chromosomes in all.

Another 3% or so of the human genome consists of highly repetitive sequences, also referred to as simple-sequence DNA or simple sequence repeats (SSR). These short sequences, generally less than 10 bp long, are sometimes repeated millions of times per cell. The simple-sequence DNA has also been called satellite DNA, so named because its unusual base composition often causes it to migrate as "satellite" bands (separated from the rest of the DNA) when fragmented cellular DNA samples are centrifuged in a cesium chloride density gradient.
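Why should an unusual base composition put satellite DNA in its own band? A widely used empirical relation links the buoyant density of duplex DNA in CsCl to its G+C content: ρ ≈ 1.660 + 0.098·f_GC g/cm³, where f_GC is the mole fraction of G+C. The sketch below applies that relation; the 40% G+C value for bulk DNA and the AT-rich repeat unit "AATAT" are hypothetical inputs chosen only for illustration.

```python
# Buoyant density of duplex DNA in CsCl from its G+C content, using the
# empirical relation rho = 1.660 + 0.098 * f_GC (g/cm^3).


def gc_fraction(seq: str) -> float:
    """Mole fraction of G+C in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def buoyant_density(f_gc: float) -> float:
    """Predicted buoyant density (g/cm^3) for a given G+C fraction."""
    if not 0.0 <= f_gc <= 1.0:
        raise ValueError("f_gc must lie between 0 and 1")
    return 1.660 + 0.098 * f_gc


bulk = buoyant_density(0.40)                    # hypothetical bulk genomic DNA
repeat = buoyant_density(gc_fraction("AATAT"))  # hypothetical AT-rich repeat unit
print(f"bulk {bulk:.3f} g/cm^3, repeat {repeat:.3f} g/cm^3")
```

A tandem array has the same composition as its repeat unit, so the whole array bands at a single density; in this example it sits about 0.04 g/cm³ away from the bulk DNA.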
Studies suggest that simple-sequence DNA does not encode proteins or RNAs. Unlike the transposable elements, the highly repetitive DNA can have identifiable functional importance in human cellular metabolism, because much of it is associated with two defining features of eukaryotic chromosomes: centromeres and telomeres.

[Figure 24–7 diagram: ovalbumin gene with introns A to G and exons L and 1 to 7; hemoglobin β-subunit gene with exons 1 (90 bp), 2 (222 bp), and 3 (126 bp) separated by introns A (131 bp) and B (851 bp)]
FIGURE 24–7 Introns in two eukaryotic genes. The gene for ovalbumin has seven introns (A to G), splitting the coding sequences into eight exons (L, and 1 to 7). The gene for the β subunit of hemoglobin has two introns and three exons, including one intron that alone contains more than half the base pairs of the gene.

[Figure 24–8 pie chart: Genes 30% (exons 1.5%; introns and noncoding segments 28.5%); Transposons 45% (LINEs 21%, SINEs 13%, retroviruslike 8%); Miscellaneous 25% (SSR 3%, SD 5%, unlisted elements "?" 17%)]
FIGURE 24–8 Types of sequences in the human genome. This pie chart divides the genome into transposons (transposable elements), genes, and miscellaneous sequences. There are four main classes of transposons. Long interspersed elements (LINEs), 6 to 8 kbp long (1 kbp = 1,000 bp), typically include a few genes encoding proteins that catalyze transposition. The genome has about 850,000 LINEs. Short interspersed elements (SINEs) are about 100 to 300 bp long. Of the 1.5 million in the human genome, more than 1 million are Alu elements, so called because they generally include one copy of the recognition sequence for AluI, a restriction endonuclease (see Fig. 9–3). The genome also contains 450,000 copies of retroviruslike transposons, 1.5 to 11 kbp long. Although these are "trapped" in the genome and cannot move from one cell to another, they are evolutionarily related to the retroviruses (Chapter 26), which include HIV.
A final class of transposons (making up <1% and not shown here) consists of a variety of transposon remnants that differ greatly in length. About 30% of the genome consists of sequences included in genes for proteins, but only a small fraction of this DNA is in exons (coding sequences). Miscellaneous sequences include simple-sequence repeats (SSR) and large segmental duplications (SD), the latter being segments that appear more than once in different locations. Among the unlisted sequence elements (denoted by a question mark) are genes encoding RNAs (which can be harder to identify than genes for proteins) and remnants of transposons that have been evolutionarily altered so that they are now hard to identify.

The centromere (Fig. 24–9) is a sequence of DNA that functions during cell division as an attachment point for proteins that link the chromosome to the mitotic spindle. This attachment is essential for the equal and orderly distribution of chromosome sets to daughter cells. The centromeres of Saccharomyces cerevisiae have been isolated and studied. The sequences essential to centromere function are about 130 bp long and are very rich in A=T pairs. The centromeric sequences of higher eukaryotes are much longer and, unlike those of yeast, generally contain simple-sequence DNA, which consists of thousands of tandem copies of one or a few short sequences of 5 to 10 bp, in the same orientation. The precise role of simple-sequence DNA in centromere function is not yet understood.

Telomeres (Greek telos, "end") are sequences at the ends of eukaryotic chromosomes that help stabilize the chromosome. The best-characterized telomeres are those of the simpler eukaryotes. Yeast telomeres end with about 100 bp of imprecisely repeated sequences of the form

$$\begin{aligned} &5'\ (\mathrm{T}_x\mathrm{G}_y)_n\ 3' \\ &3'\ (\mathrm{A}_x\mathrm{C}_y)_n\ 5' \end{aligned}$$

where x and y are generally between 1 and 4.
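The general form of the yeast telomeric strand, (T_x G_y)_n with x and y between 1 and 4, can be written as a regular expression. A sketch, assuming the 1 to 4 range is read strictly; because real yeast repeats are imprecise, this is a rough filter rather than a telomere detector. The function `telomere_repeat_count` is a hypothetical name.

```python
import re

# One repeat unit of the 5'->3' telomeric strand: 1-4 T's then 1-4 G's.
UNIT = re.compile(r"T{1,4}G{1,4}")
WHOLE = re.compile(r"(?:T{1,4}G{1,4})+")


def telomere_repeat_count(strand: str) -> int:
    """Return n if `strand` is entirely of the form (T_x G_y)_n, else 0."""
    if WHOLE.fullmatch(strand) is None:
        return 0
    # Every unit starts with T and ends with G, so each unit is a maximal
    # T-run followed by a maximal G-run; the greedy scan finds the same units.
    return len(UNIT.findall(strand))


print(telomere_repeat_count("TGGTGTTGGGTGG"))  # 4
print(telomere_repeat_count("TGGACTG"))        # 0: contains A and C
```

Counting units in this way gives the n in the formula directly, the quantity discussed next.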
The number of telomere repeats, n, is in the range of 20 to 100 for most single-celled eukaryotes and generally more than 1,500 in mammals. The ends of a linear DNA molecule cannot be routinely replicated by the cellular replication machinery (which may be one reason why bacterial DNA molecules are circular). Repeated telomeric sequences are added to eukaryotic chromosome ends primarily by the enzyme telomerase (see Fig. 26–35).

Artificial chromosomes (Chapter 9) have been constructed as a means of better understanding the functional significance of many structural features of eukaryotic chromosomes. A reasonably stable artificial linear chromosome requires only three components: a centromere, telomeres at each end, and sequences that allow the initiation of DNA replication. Yeast artificial chromosomes (YACs; see Fig. 9–8) have been developed as a research tool in biotechnology. Similarly, human artificial chromosomes (HACs) are being developed for the treatment of genetic diseases by somatic gene therapy.

SUMMARY 24.1 Chromosomal Elements
■ Genes are segments of a chromosome that contain the information for a functional polypeptide or RNA molecule. In addition to genes, chromosomes contain a variety of regulatory sequences involved in replication, transcription, and other processes.
■ Genomic DNA and RNA molecules are generally orders of magnitude longer than the viral particles or cells that contain them.
■ Many genes in eukaryotic cells, and a few in bacteria, are interrupted by noncoding sequences called introns. The coding segments separated by introns are called exons.
■ Less than one-third of human genomic DNA consists of genes. Much of the remainder consists of repeated sequences of various types. Nucleic acid parasites known as transposons account for about half of the human genome.
■ Eukaryotic chromosomes have two important special-function repetitive DNA sequences: centromeres, which are attachment points for the mitotic spindle, and telomeres, located at the ends of chromosomes.

24.2 DNA Supercoiling

Cellular DNA, as we have seen, is extremely compacted, implying a high degree of structural organization. The folding mechanism must not only pack the DNA but also permit access to the information in the DNA. Before considering how this is accomplished in processes such as replication and transcription, we need to examine an important property of DNA structure known as supercoiling.

Supercoiling means the coiling of a coil. A telephone cord, for example, is typically a coiled wire. The path taken by the wire between the base of the phone and the receiver often includes one or more supercoils (Fig. 24–10). DNA is coiled in the form of a double helix, with both strands of the DNA coiling around an axis. The further coiling of that axis upon itself (Fig. 24–11) produces DNA supercoiling. As detailed below, DNA supercoiling is generally a manifestation of structural strain. When there is no net bending of the DNA axis upon itself, the DNA is said to be in a relaxed state.

We might have predicted that DNA compaction involved some form of supercoiling. Perhaps less predictable is that replication and transcription of DNA also affect and are affected by supercoiling. Both processes require a separation of DNA strands, a process complicated by the helical interwinding of the strands (as demonstrated in Fig. 24–12).

FIGURE 24–9 Important structural elements of a yeast chromosome. (Labeled elements: a telomere at each end, a centromere, and unique sequences (genes), dispersed repeats, and multiple replication origins.)
That DNA would bend on itself and become supercoiled in tightly packaged cellular DNA would seem logical, then, and perhaps even trivial, were it not for one additional fact: many circular DNA molecules remain highly supercoiled even after they are extracted and purified, freed from protein and other cellular components. This indicates that supercoiling is an intrinsic property of DNA tertiary structure. It occurs in all cellular DNAs and is highly regulated by each cell.

A number of measurable properties of supercoiling have been established, and the study of supercoiling has provided many insights into DNA structure and function. This work has drawn heavily on concepts derived from a branch of mathematics called topology, the study of the properties of an object that do not change under continuous deformations. For DNA, continuous deformations include conformational changes due to thermal motion or an interaction with proteins or other molecules; discontinuous deformations involve DNA strand breakage. For circular DNA molecules, a topological property is one that is unaffected by deformations of the DNA strands as long as no breaks are introduced. Topological properties are changed only by breakage and rejoining of the backbone of one or both DNA strands.

FIGURE 24–10 Supercoils. A typical phone cord is coiled like a DNA helix, and the coiled cord can itself coil in a supercoil. The illustration is especially appropriate because an examination of phone cords helped lead Jerome Vinograd and his colleagues to the insight that many properties of small circular DNAs can be explained by supercoiling. They first detected DNA supercoiling, in small circular viral DNAs, in 1965.

FIGURE 24–11 Supercoiling of DNA. When the axis of the DNA double helix (the coil) is coiled on itself, it forms a new helix (superhelix). The DNA superhelix is usually called a supercoil.

FIGURE 24–12 Supercoiling induced by separating the strands of a helical structure. Twist two linear strands of rubber band into a right-handed double helix as shown. Fix one end by having a friend hold onto it, then pull apart the two strands at the other end. The resulting strain will produce supercoiling.

We now examine the fundamental properties and physical basis of supercoiling.

Most Cellular DNA Is Underwound

To understand supercoiling we must first focus on the properties of small circular DNAs such as plasmids and small viral DNAs. When these DNAs have no breaks in either strand, they are referred to as closed-circular DNAs. If the DNA of a closed-circular molecule conforms closely to the B-form structure (the Watson-Crick structure; see Fig. 8–15), with one turn of the double helix per 10.5 bp, the DNA is relaxed rather than supercoiled (Fig. 24–13). Supercoiling results when DNA is subject to some form of structural strain. Purified closed-circular DNA is rarely relaxed, regardless of its biological origin. Furthermore, DNAs derived from a given cellular source have a characteristic degree of supercoiling. DNA structure is therefore strained in a manner that is regulated by the cell to induce the supercoiling.

In almost every instance, the strain is a result of underwinding of the DNA double helix in the closed circle. In other words, the DNA has fewer helical turns than would be expected for the B-form structure. The effects of underwinding are summarized in Figure 24–14. An 84 bp segment of a circular DNA in the relaxed state would contain eight double-helical turns, or one for every 10.5 bp. If one of these turns were removed, there would be (84 bp)/7 = 12.0 bp per turn, rather than the 10.5 found in B-DNA (Fig. 24–14b). This is a deviation from the most stable DNA form, and the molecule is thermodynamically strained as a result.
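The arithmetic of the 84 bp example can be checked directly. The Python sketch below is illustrative only: `bp_per_turn` is a hypothetical helper, and the code assumes, as the text does, a relaxed B-DNA helical repeat of 10.5 bp per turn.

```python
# Illustrative check of the 84 bp underwinding example (hypothetical helper).
B_DNA_BP_PER_TURN = 10.5  # relaxed B-DNA helical repeat, as given in the text


def bp_per_turn(n_bp: int, n_turns: int) -> float:
    """Average number of base pairs per helical turn when a segment of
    n_bp base pairs is constrained to exactly n_turns double-helical turns."""
    if n_turns <= 0:
        raise ValueError("n_turns must be positive")
    return n_bp / n_turns


segment_bp = 84
relaxed_turns = round(segment_bp / B_DNA_BP_PER_TURN)  # 84 / 10.5 = 8 turns
strained_turns = relaxed_turns - 1                      # remove one turn

print(relaxed_turns, bp_per_turn(segment_bp, relaxed_turns))    # 8 10.5
print(strained_turns, bp_per_turn(segment_bp, strained_turns))  # 7 12.0
```

The jump from 10.5 to 12.0 bp per turn is the departure from B-DNA geometry that leaves the segment strained.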
Generally, much of this strain would be accommodated by coiling the axis of the DNA on itself to form a supercoil (Fig. 24–14c; some of the strain in this 84 bp segment would simply become dispersed in the untwisted structure of the larger DNA molecule). In principle, the strain could also be accommodated by separating the two DNA strands over a distance of about 10 bp (Fig. 24–14d). In isolated closed-circular DNA, strain introduced by underwinding is generally accommodated by supercoiling rather than strand separation, because coiling the axis of the DNA usually requires less energy than breaking the hydrogen bonds that stabilize paired bases. Note, however, that the underwinding of DNA in vivo makes it easier to separate DNA strands, giving access to the information they contain.

FIGURE 24–13 Relaxed and supercoiled plasmid DNAs. The molecule in the leftmost electron micrograph is relaxed; the degree of supercoiling increases from left to right. (Scale bar: 0.2 μm.)

FIGURE 24–14 Effects of DNA underwinding. (a) A segment of DNA within a closed-circular molecule, 84 bp long, in its relaxed form with eight helical turns. (b) Removal of one turn (leaving seven) induces structural strain. (c) The strain is generally accommodated by formation of a supercoil. (d) DNA underwinding also makes the separation of strands somewhat easier. In principle, each turn of underwinding should facilitate strand separation over about 10 bp, as shown. However, the hydrogen-bonded base pairs would generally preclude strand separation over such a short distance, and the effect becomes important only for longer DNAs and higher levels of DNA underwinding.

Every cell actively underwinds its DNA with the aid of enzymatic processes (described below), and the resulting strained state represents a form of stored energy.
Cells maintain DNA in an underwound state to facilitate its compaction by coiling. The underwinding of DNA is also important to enzymes of DNA metabolism that must bring about strand separation as part of their function.

The underwound state can be maintained only if the DNA is a closed circle or if it is bound and stabilized by proteins so that the strands are not free to rotate about each other. If there is a break in one strand of an isolated, protein-free circular DNA, free rotation at that point will cause the underwound DNA to revert spontaneously to the relaxed state. In a closed-circular DNA molecule, however, the number of helical turns cannot be changed without at least transiently breaking one of the DNA strands. The number of helical turns in a DNA molecule therefore provides a precise description of supercoiling.

DNA Underwinding Is Defined by Topological Linking Number

The field of topology provides a number of ideas that are useful to this discussion, particularly the concept of linking number. Linking number is a topological property of double-stranded DNA, because it does not vary when the DNA is bent or deformed, as long as both DNA strands remain intact. Linking number (Lk) is illustrated in Figure 24–15.

Let’s begin by visualizing the separation of the two strands of a double-stranded circular DNA. If the two strands are linked as shown in Figure 24–15a, they are effectively joined by what can be described as a topological bond. Even if all hydrogen bonds and base-stacking interactions were abolished such that the strands were not in physical contact, this topological bond would still link the two strands. Visualize one of the circular strands as the boundary of a surface (such as a soap film spanning the space framed by a circular wire before you blow a soap bubble). The linking number can be defined as the number of times the second strand pierces this surface.
For the molecule in Figure 24–15a, Lk = 1; for that in Figure 24–15b, Lk = 6. The linking number for a closed-circular DNA is always an integer. By convention, if the links between two DNA strands are arranged so that the strands are interwound in a right-handed helix, the linking number is defined as positive (+); for strands interwound in a left-handed helix, the linking number is negative (−). Negative linking numbers are, for all practical purposes, not encountered in DNA.

We can now extend these ideas to a closed-circular DNA with 2,100 bp (Fig. 24–16a). When the molecule is relaxed, the linking number is simply the number of base pairs divided by the number of base pairs per turn, which is close to 10.5; so in this case, Lk = 200. For a circular DNA molecule to have a topological property such as linking number, neither strand may contain a break. If there is a break in either strand, the strands can, in principle, be unraveled and separated completely. In this case, no topological bond exists and Lk is undefined (Fig. 24–16b).

We can now describe DNA underwinding in terms of changes in the linking number. The linking number in relaxed DNA, $Lk_0$, is used as a reference. For the molecule shown in Figure 24–16a, $Lk_0 = 200$; if two turns are removed from this molecule, Lk = 198. The change can be described by the equation

$$\Delta Lk = Lk - Lk_0 = 198 - 200 = -2$$

It is often convenient to express the change in linking number in terms of a quantity that is independent of the length of the DNA molecule.
This quantity, called the specific linking difference (σ), or superhelical density, is a measure of the number of turns removed relative to the number present in relaxed DNA:

$$\sigma = \frac{\Delta Lk}{Lk_0}$$

In the example in Figure 24–16c, σ = −0.01, which means that 1% (2 of 200) of the helical turns present in the DNA (in its B form) have been removed. The degree of underwinding in cellular DNAs generally falls in the range of 5% to 7%; that is, σ = −0.05 to −0.07. The negative sign indicates that the change in linking number is due to underwinding of the DNA. The supercoiling induced by underwinding is therefore defined as negative supercoiling. Conversely, under some conditions DNA can be overwound, resulting in positive supercoiling. Note that the twisting path taken by the axis of the DNA helix when the DNA is underwound (negative supercoiling) is the mirror image of that taken when the DNA is overwound (positive supercoiling) (Fig. 24–17). Supercoiling is not a random process; the path of the supercoiling is largely prescribed by the torsional strain imparted to the DNA by decreasing or increasing the linking number relative to B-DNA.

FIGURE 24–15 Linking number, Lk. Here, as usual, each blue ribbon represents one strand of a double-stranded DNA molecule. For the molecule in (a), Lk = 1. For the molecule in (b), Lk = 6. One of the strands in (b) is kept untwisted for illustrative purposes, to define the border of an imaginary surface (shaded blue). The number of times the twisting strand penetrates this surface provides a rigorous definition of linking number.

Linking number can be changed by ±1 by breaking one DNA strand, rotating one of the ends 360° about the unbroken strand, and rejoining the broken ends.
This change has no effect on the number of base pairs or the number of atoms in the circular DNA molecule. Two forms of a circular DNA that differ only in a topological property such as linking number are referred to as topoisomers.

Linking number can be broken down into two structural components called writhe (Wr) and twist (Tw) (Fig. 24–18). These are more difficult to describe than linking number, but writhe may be thought of as a measure of the coiling of the helix axis and twist as determining the local twisting or spatial relationship of neighboring base pairs. When the linking number changes, some of the resulting strain is usually compensated for by writhe (supercoiling) and some by changes in twist, giving rise to the equation

$$Lk = Tw + Wr$$

Tw and Wr need not be integers. Twist and writhe are geometric rather than topological properties, because they may be changed by deformation of a closed-circular DNA molecule.

In addition to causing supercoiling and making strand separation somewhat easier, the underwinding of DNA facilitates a number of structural changes in the molecule. These are of less physiological importance but help illustrate the effects of underwinding. Recall that a cruciform (see Fig. 8–21) generally contains a few unpaired bases; DNA underwinding helps to maintain the required strand separation (Fig. 24–19). Underwinding of a right-handed DNA helix also facilitates the formation of short stretches of left-handed Z-DNA in regions where the base sequence is consistent with the Z form (Chapter 8).

FIGURE 24–16 Linking number applied to closed-circular DNA molecules. A 2,100 bp circular DNA is shown in three forms: (a) relaxed, Lk = 200 = $Lk_0$; (b) relaxed with a nick (break) in one strand, Lk undefined; and (c) underwound by two turns, Lk = 198 (ΔLk = −2). The underwound molecule generally exists as a supercoiled molecule, but underwinding also facilitates the separation of DNA strands.

FIGURE 24–17 Negative and positive supercoils. For the relaxed DNA molecule of Figure 24–16a (Lk = 200), underwinding or overwinding by two helical turns (Lk = 198, ΔLk = −2; or Lk = 202, ΔLk = +2) will produce negative or positive supercoiling, respectively. Note that the DNA axis twists in opposite directions in the two cases.

FIGURE 24–18 Ribbon model for illustrating twist and writhe. The pink ribbon represents the axis of a relaxed DNA molecule (straight ribbon). Strain introduced by twisting the ribbon (underwinding the DNA) can be manifested as writhe or twist: one extreme shows zero writhe with a large change in twist, the other a large writhe with a small change in twist. Changes in linking number are usually accompanied by changes in both writhe and twist.

FIGURE 24–19 Promotion of cruciform structures by DNA underwinding. In principle, cruciforms can form at palindromic sequences (see Fig. 8–21), but they seldom occur in relaxed DNA because the linear DNA accommodates more paired bases than does the cruciform structure. Underwinding of the DNA facilitates the partial strand separation needed to promote cruciform formation at appropriate sequences. (Panels: relaxed DNA, underwound DNA, cruciform DNA.)

Topoisomerases Catalyze Changes in the Linking Number of DNA

DNA supercoiling is a precisely regulated process that influences many aspects of DNA metabolism. Every cell has enzymes with the sole function of underwinding and/or relaxing DNA. The enzymes that increase or decrease the extent of DNA underwinding are topoisomerases; the property of DNA that they change is the linking number. These enzymes play an especially important role in processes such as replication and DNA packaging. There are two classes of topoisomerases. Type I topoisomerases act by transiently breaking one of the two DNA strands, passing the unbroken strand through the break, and rejoining the broken ends; they change Lk in increments of 1. Type II topoisomerases break both DNA strands and change Lk in increments of 2.

The effects of these enzymes can be demonstrated using agarose gel electrophoresis (Fig. 24–20). A population of identical plasmid DNAs with the same linking number migrates as a discrete band during electrophoresis. Topoisomers with Lk values differing by as little as 1 can be separated by this method, so changes in linking number induced by topoisomerases are readily detected.

FIGURE 24–20 Visualization of topoisomers. In this experiment, all DNA molecules have the same number of base pairs but exhibit some range in the degree of supercoiling. Because supercoiled DNA molecules are more compact than relaxed molecules, they migrate more rapidly during gel electrophoresis. The gels shown here separate topoisomers (moving from top to bottom) over a limited range of superhelical density. In lane 1, highly supercoiled DNA migrates in a single band, even though different topoisomers are probably present. Lanes 2 and 3 illustrate the effect of treating the supercoiled DNA with a type I topoisomerase; the DNA in lane 3 was treated for a longer time than that in lane 2. As the superhelical density of the DNA is reduced to the point where it corresponds to the range in which the gel can resolve individual topoisomers, distinct bands appear. Individual bands in the region indicated by the bracket next to lane 3 each contain DNA circles with the same linking number; the linking number changes by 1 from one band to the next. (Gel labels: relaxed DNA toward the top, highly supercoiled DNA toward the bottom; Lk decreases downward.)
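The bookkeeping behind linking number, specific linking difference, and topoisomerase action can be summarized in a few lines of Python. This is a sketch under stated assumptions, not a model of any particular enzyme: the function names are hypothetical; $Lk_0$ is taken as base pairs divided by 10.5 and rounded to the nearest integer, since Lk of a closed circle is always an integer; and each catalytic event is modeled simply as a change of 1 (type I) or 2 (type II) in Lk. Which direction a given enzyme can move Lk (bacterial gyrase underwinding with ATP, eukaryotic type II enzymes only relaxing) is described in the text and is not enforced by the code.

```python
B_DNA_BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


def relaxed_linking_number(n_bp: int) -> int:
    """Lk0 of a relaxed closed-circular DNA. Lk of a closed circle is an
    integer, so bp / 10.5 is rounded to the nearest whole number."""
    return round(n_bp / B_DNA_BP_PER_TURN)


def specific_linking_difference(lk: int, lk0: int) -> float:
    """sigma = (Lk - Lk0) / Lk0; negative for underwound DNA."""
    return (lk - lk0) / lk0


def topoisomerase_event(lk: int, enzyme_type: str, direction: int) -> int:
    """One catalytic event: type I changes Lk by 1, type II by 2.
    direction is +1 (raise Lk, e.g. relaxing negative supercoils) or
    -1 (lower Lk, i.e. underwinding)."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    step = {"I": 1, "II": 2}[enzyme_type]
    return lk + direction * step


lk0 = relaxed_linking_number(2100)  # 200, as in Figure 24-16a
lk = lk0 - 2                        # underwound by two turns (Fig. 24-16c)
print(lk - lk0, specific_linking_difference(lk, lk0))  # -2 -0.01

# Relaxing with a type I enzyme, one event at a time, walks through
# topoisomers that differ by 1: the spacing between adjacent gel bands.
ladder = [lk]
while ladder[-1] < lk0:
    ladder.append(topoisomerase_event(ladder[-1], "I", +1))
print(ladder)  # [198, 199, 200]

# A single type II event lowers Lk of the relaxed circle by 2.
print(topoisomerase_event(lk0, "II", -1))  # 198
```

The same functions show why σ is the more portable measure: it depends only on the ratio ΔLk/$Lk_0$, not on the length of the circle.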
MECHANISM FIGURE 24–21 Bacterial type I topoisomerases alter linking number. A proposed reaction sequence for the bacterial topoisomerase I is illustrated. The enzyme has closed and open conformations. (a) A DNA molecule binds to the closed conformation and one DNA strand is cleaved. (b) The enzyme changes to its open conformation, and the other DNA strand moves through the break in the first strand. (c) In the closed conformation, the DNA strand is religated. In detail: after DNA binds (step 1), an active-site Tyr attacks a phosphodiester bond on one DNA strand in step 2, cleaving it, creating a covalent 5′-P–Tyr protein-DNA linkage, and liberating the 3′-hydroxyl group of the adjacent nucleotide. In step 3 the enzyme switches to its open conformation, and the unbroken DNA strand passes through the break in the first strand. With the enzyme in the closed conformation, the liberated 3′-hydroxyl group attacks the 5′-P–Tyr protein-DNA linkage in step 4 to religate the cleaved DNA strand. The DNA is then released, or a new cycle begins.

E. coli has at least four different individual topoisomerases (I through IV). Those of type I (topoisomerases I and III) generally relax DNA by removing negative supercoils (increasing Lk). The way in which bacterial type I topoisomerases change linking number is illustrated in Figure 24–21. A bacterial type II enzyme, called either topoisomerase II or DNA gyrase, can introduce negative supercoils (decrease Lk).
It uses the energy of ATP to accomplish this. To alter DNA linking number, type II topoisomerases cleave both strands of a DNA molecule and pass another duplex through the break. The degree of supercoiling of bacterial DNA is maintained by regulation of the net activity of topoisomerases I and II.

Eukaryotic cells also have type I and type II topoisomerases. The type I enzymes are topoisomerases I and III; the type II enzymes are topoisomerases IIα and IIβ. The eukaryotic type II topoisomerases cannot underwind DNA (introduce negative supercoils), but they can relax both positive and negative supercoils. We consider one probable origin of negative supercoils in eukaryotic cells in our discussion of chromatin in Section 24.3. The process catalyzed by eukaryotic type II topoisomerases is illustrated in Figure 24–22.

FIGURE 24–22 Proposed mechanism for the alteration of linking number by eukaryotic type IIA topoisomerases. (1) The multisubunit enzyme binds one DNA molecule (blue). Gated cavities above and below the bound DNA are called the N-gate and the C-gate. (2) A second segment of the same DNA molecule (red) is bound at the N-gate and (3) trapped. Both strands of the first DNA are now cleaved (the chemistry is similar to that in Fig. 24–21), and (4) the second DNA segment is passed through the break. (5) The broken DNA is religated, and the second DNA segment is released through the C-gate. Two ATPs are bound and hydrolyzed during this cycle; it is likely that one is hydrolyzed in the step leading to the complex in step 4. Additional details of the ATP hydrolysis component of the reaction remain to be worked out.

DNA Compaction Requires a Special Form of Supercoiling

Supercoiled DNA molecules are uniform in a number of respects. The supercoils are right-handed in a negatively supercoiled DNA molecule (Fig. 24–17), and they tend to be extended and narrow rather than compacted, often with multiple branches (Fig. 24–23). At the superhelical densities normally encountered in cells, the length of the supercoil axis, including branches, is about 40% of the length of the DNA. This type of supercoiling is referred to as plectonemic (from the Greek plektos, “twisted,” and nema, “thread”). This term can be applied to any structure with strands intertwined in some simple and regular way, and it is a good description of the general structure of supercoiled DNA in solution.

FIGURE 24–23 Plectonemic supercoiling. (a) Electron micrograph of plectonemically supercoiled plasmid DNA and (b) an interpretation of the observed structure. The purple lines show the axis of the supercoil; note the branching of the supercoil at branch points. (c) An idealized representation of this structure.

Plectonemic supercoiling, the form observed in isolated DNAs in the laboratory, does not produce sufficient compaction to package DNA in the cell. A second form of supercoiling, solenoidal (Fig. 24–24), can be adopted by an underwound DNA. Instead of the extended right-handed supercoils characteristic of the plectonemic form, solenoidal supercoiling involves tight left-handed turns, similar to the shape taken up by a garden hose neatly wrapped on a reel. Although their structures are dramatically different, plectonemic and solenoidal supercoiling are two forms of negative supercoiling that can be taken up by the same segment of underwound DNA. The two forms are readily interconvertible. Although the plectonemic form is more stable in solution, the solenoidal form can be stabilized by protein binding and is the form found in chromatin. It provides a much greater degree of compaction (Fig. 24–24b). Solenoidal supercoiling is the mechanism by which underwinding contributes to DNA compaction.

SUMMARY 24.2 DNA Supercoiling

■ Most cellular DNAs are supercoiled. Underwinding decreases the total number of helical turns in the DNA relative to the relaxed, B form. To maintain an underwound state, DNA must be either a closed circle or bound to protein. Underwinding is quantified by a topological parameter called linking number, Lk.
■ Underwinding is measured in terms of specific linking difference, σ (also called superhelical density), which is $(Lk - Lk_0)/Lk_0$. For cellular DNAs, σ is typically −0.05 to −0.07, which means that approximately 5% to 7% of the helical turns in the DNA have been removed. DNA underwinding facilitates strand separation by enzymes of DNA metabolism.
■ DNAs that differ only in linking number are called topoisomers. Enzymes that underwind and/or relax DNA, the topoisomerases, catalyze changes in linking number. The two classes of topoisomerases, type I and type II, change Lk in increments of 1 or 2, respectively, per catalytic event.

24.3 The Structure of Chromosomes

The term “chromosome” is used to refer to a nucleic acid molecule that is the repository of genetic information in a virus, a bacterium, a eukaryotic cell, or an organelle. It also refers to the densely colored bodies seen in the nuclei of dye-stained eukaryotic cells, as visualized using a light microscope.

Chromatin Consists of DNA and Proteins

The eukaryotic cell cycle (see Fig. 12–41) produces remarkable changes in the structure of chromosomes (Fig. 24–25). In nondividing eukaryotic cells (in G0) and those in interphase (G1, S, and G2), the chromosomal material, chromatin, is amorphous and appears to be randomly dispersed in certain parts of the nucleus. In the S phase of interphase the DNA in this amorphous state replicates, each chromosome producing two sister chromosomes (called sister chromatids) that remain associated with each other after replication is complete.
The chromosomes become much more condensed during prophase of mitosis, taking the form of a species-specific number of well-defined pairs of sister chromatids (Fig. 24–5).

Chromatin consists of fibers containing protein and DNA in approximately equal masses, along with a small amount of RNA. The DNA in the chromatin is very tightly associated with proteins called histones, which package and order the DNA into structural units called nucleosomes (Fig. 24–26). Also found in chromatin are many nonhistone proteins, some of which help maintain chromosome structure and others of which regulate the expression of specific genes (Chapter 28). Beginning with nucleosomes, eukaryotic chromosomal DNA is packaged into a succession of higher-order structures that ultimately yield the compact chromosome seen with the light microscope. We now turn to a description of this structure in eukaryotes and compare it with the packaging of DNA in bacterial cells.

FIGURE 24–24 Plectonemic and solenoidal supercoiling. (a) Plectonemic supercoiling takes the form of extended right-handed coils. Solenoidal negative supercoiling takes the form of tight left-handed turns about an imaginary tubelike structure. The two forms are readily interconverted, although the solenoidal form is generally not observed unless certain proteins are bound to the DNA. (b) Plectonemic (top) and solenoidal supercoiling of the same DNA molecule, drawn to scale. Solenoidal supercoiling provides a much greater degree of compaction.

Histones Are Small, Basic Proteins

Found in the chromatin of all eukaryotic cells, histones have molecular weights between 11,000 and 21,000 and are very rich in the basic amino acids arginine and lysine (together these make up about one-fourth of the amino acid residues).
All eukaryotic cells have five major classes of histones, differing in molecular weight and amino acid composition (Table 24–3). The H3 histones are nearly identical in amino acid sequence in all eukaryotes, as are the H4 histones, suggesting strict conservation of their functions. For example, only 2 of 102 amino acid residues differ between the H4 histone molecules of peas and cows, and only 8 differ between the H4 histones of humans and yeast. Histones H1, H2A, and H2B show less sequence similarity among eukaryotic species.

Each type of histone has variant forms, because certain amino acid side chains are enzymatically modified by methylation, ADP-ribosylation, phosphorylation, glycosylation, or acetylation. Such modifications affect the net electric charge, shape, and other properties of histones, as well as the structural and functional properties of the chromatin, and they play a role in the regulation of transcription (Chapter 28).

FIGURE 24–25 Changes in chromosome structure during the eukaryotic cell cycle. Cellular DNA is uncondensed throughout interphase. The interphase period can be subdivided (see Fig. 12–41) into the G1 (gap) phase; the S (synthesis) phase, when the DNA is replicated; and the G2 phase, in which the replicated chromosomes cohere to one another. Replication occurs from multiple origins of replication, and the daughter chromatids are linked by cohesins. The DNA undergoes condensation in the prophase of mitosis. Cohesins (green) and condensins (red) are proteins involved in cohesion and condensation (discussed later in the chapter). The architecture of the cohesin-condensin-DNA complex is not yet established, and the interactions shown here are figurative, simply suggesting their role in condensation of the chromosome. During metaphase, the condensed chromosomes line up along a plane halfway between the spindle poles. One chromosome of each pair is linked to each spindle pole via microtubules that extend between the spindle and the centromere. The sister chromatids separate at anaphase, each drawn toward the spindle pole to which it is connected. After cell division is complete, the chromosomes decondense and the cycle begins anew.

FIGURE 24–26 Nucleosomes. Regularly spaced nucleosomes consist of histone complexes bound to DNA. (a) Schematic illustration, showing the histone core and the linker DNA of each nucleosome, and (b) electron micrograph (scale bar: 50 nm).

Nucleosomes Are the Fundamental Organizational Units of Chromatin

The eukaryotic chromosome depicted in Figure 24–5 represents the compaction of a DNA molecule about $10^5$ μm long into a cell nucleus that is typically 5 to 10 μm in diameter. This compaction involves several levels of highly organized folding. Subjection of chromosomes to treatments that partially unfold them reveals a structure in which the DNA is bound tightly to beads of protein, often regularly spaced (Fig. 24–26). The beads in this “beads-on-a-string” arrangement are complexes of histones and DNA. The bead plus the connecting DNA that leads to the next bead form the nucleosome, the fundamental unit of organization upon which the higher-order packing of chromatin is built. The bead of each nucleosome contains eight histone molecules: two copies each of H2A, H2B, H3, and H4. The spacing of the nucleosome beads provides a repeating unit typically of about 200 bp, of which 146 bp are bound tightly around the eight-part histone core and the remainder serve as linker DNA between nucleosome beads.
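The repeat-length and size figures above lend themselves to quick estimates. The Python sketch below is a rough, hypothetical calculation rather than a model of real chromatin: it assumes a perfectly uniform 200 bp repeat with 146 bp wrapped per core, ignores the variation in repeat length the text implies with “typically,” and simply compares the approximate DNA length and nuclear diameter given in the text.

```python
REPEAT_BP = 200  # assumed uniform nucleosome repeat ("about 200 bp" in the text)
CORE_BP = 146    # DNA bound tightly around the histone octamer


def nucleosome_estimate(dna_bp: int):
    """Rough nucleosome count for a DNA of dna_bp base pairs under the
    uniform-repeat assumption, plus linker length and fraction in cores."""
    n_nucleosomes = dna_bp // REPEAT_BP
    linker_bp = REPEAT_BP - CORE_BP
    fraction_in_cores = CORE_BP / REPEAT_BP
    return n_nucleosomes, linker_bp, fraction_in_cores


def compaction_ratio(dna_length_um: float, nucleus_diameter_um: float) -> float:
    """How many times longer the DNA is than the nucleus is wide."""
    return dna_length_um / nucleus_diameter_um


print(nucleosome_estimate(2100))  # (10, 54, 0.73)
print(compaction_ratio(1e5, 10))  # 10000.0
print(compaction_ratio(1e5, 5))   # 20000.0
```

By this crude measure the DNA is roughly 10,000 to 20,000 times longer than the nucleus is wide, and only about three-quarters of it sits in nucleosome cores.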
Histone H1 binds to the linker DNA. Brief treatment of chromatin with enzymes that digest DNA causes preferential degradation of the linker DNA, releasing histone particles containing 146 bp of bound DNA that have been protected from digestion. Researchers have crystallized nucleosome cores obtained in this way, and x-ray diffraction analysis reveals a particle made up of the eight histone molecules with the DNA wrapped around it in the form of a left-handed solenoidal supercoil (Fig. 24–27).

A close inspection of this structure reveals why eukaryotic DNA is underwound even though eukaryotic cells lack enzymes that underwind DNA. Recall that the solenoidal wrapping of DNA in nucleosomes is but one form of supercoiling that can be taken up by underwound (negatively supercoiled) DNA. The tight wrapping of DNA around the histone core requires the removal of about one helical turn in the DNA. When the protein core of a nucleosome binds in vitro to a relaxed, closed-circular DNA, the binding introduces a negative supercoil. Because this binding process does not break the DNA or change the linking number, the formation of a negative solenoidal supercoil must be accompanied by a compensatory positive supercoil in the unbound region of the DNA (Fig. 24–28). As mentioned earlier, eukaryotic topoisomerases can relax positive supercoils. Relaxing the unbound positive supercoil leaves the negative supercoil fixed (through its binding to the nucleosome histone core) and results in an overall decrease in linking number. Indeed, topoisomerases have proved necessary for assembling chromatin from purified histones and closed-circular DNA in vitro.

Another factor that affects the binding of DNA to histones in nucleosome cores is the sequence of the bound DNA. Histone cores do not bind randomly to DNA; rather, they tend to position themselves at certain locations. This positioning is not fully understood but in some cases appears to depend on a local abundance of A=T base pairs in the DNA helix where it is in contact with the histones (Fig. 24–29). The tight wrapping of the DNA around the nucleosome’s histone core requires compression of the minor groove of the helix at these points, and a cluster of two or three A=T base pairs makes this compression more likely.

Other proteins are required for the positioning of some nucleosome cores on DNA. In several organisms, certain proteins bind to a specific DNA sequence and then facilitate the formation of a nucleosome core nearby. Precise positioning of nucleosome cores can play a role in the expression of some eukaryotic genes (Chapter 28).

FIGURE 24–27 DNA wrapped around a nucleosome core. (a) Space-filling representation of the nucleosome protein core, with different colors for the different histones (H2A, H2B, H3, and H4; PDB ID 1AOI). (b) Top and (c) side views of the crystal structure of a nucleosome with 146 bp of bound DNA. The protein is depicted as a gray surface contour, with the bound DNA in blue. The DNA binds in a left-handed solenoidal supercoil that circumnavigates the histone complex 1.8 times. A schematic drawing is included in (c) for comparison with other figures depicting nucleosomes.

TABLE 24–3 Types and Properties of Histones

Histone   Molecular weight   Number of amino acid residues   Lys (% of total)   Arg (% of total)
H1*       21,130             223                             29.5               11.3
H2A*      13,960             129                             10.9               19.3
H2B*      13,774             125                             16.0               16.4
H3        15,273             135                             19.6               13.3
H4        11,236             102                             10.8               13.7

* The sizes of these histones vary somewhat from species to species. The numbers given here are for bovine histones.

FIGURE 24–28 Chromatin assembly. (a) Relaxed, closed-circular DNA.
(b) Binding of a histone core to form a nucleosome induces one negative (solenoidal) supercoil in the bound DNA. In the absence of any strand breaks, a positive (plectonemic) supercoil must form elsewhere, in the unbound DNA (ΔLk = 0). (c) Relaxation of this positive supercoil by a topoisomerase leaves one net negative supercoil (ΔLk = −1).

FIGURE 24–29 Positioning of a nucleosome to make optimal use of A=T base pairs where the histone core is in contact with the minor groove of the DNA helix.

Nucleosomes Are Packed into Successively Higher Order Structures

Wrapping of DNA around a nucleosome core compacts the DNA length about sevenfold. The overall compaction in a chromosome, however, is greater than 10,000-fold, ample evidence for even higher orders of structural organization. In chromosomes isolated by very gentle methods, nucleosome cores appear to be organized into a structure called the 30 nm fiber (Fig. 24–30). This packing requires one molecule of histone H1 per nucleosome core. Organization into 30 nm fibers does not extend over the entire chromosome but is punctuated by regions bound by sequence-specific (nonhistone) DNA-binding proteins. The 30 nm structure also appears to depend on the transcriptional activity of the particular region of DNA. Regions in which genes are being transcribed are apparently in a less-ordered state that contains little, if any, histone H1.

The 30 nm fiber, a second level of chromatin organization, provides an approximately 100-fold compaction of the DNA. The higher levels of folding are not yet understood, but it appears that certain regions of DNA associate with a nuclear scaffold (Fig. 24–31).
The scaffold-associated regions are separated by loops of DNA of perhaps 20 to 100 kbp. The DNA in a loop may contain a set of related genes. For example, in Drosophila complete sets of histone-coding genes seem to cluster together in loops that are bounded by scaffold attachment sites (Fig. 24–32). The scaffold itself appears to contain several proteins, notably large amounts of histone H1 (located in the interior of the fiber) and topoisomerase II. The presence of topoisomerase II further emphasizes the relationship between DNA underwinding and chromatin structure. Topoisomerase II is so important to the maintenance of chromatin structure that inhibitors of this enzyme can kill rapidly dividing cells. Several drugs used in cancer chemotherapy are topoisomerase II inhibitors that allow the enzyme to promote strand breakage but not the resealing of the breaks.

FIGURE 24–30 The 30 nm fiber, a higher-order organization of nucleosomes. (a) Schematic illustration of the probable structure of the fiber, showing nucleosome packing. (b) Electron micrograph.

FIGURE 24–31 A partially unraveled human chromosome, revealing numerous loops of DNA attached to a scaffoldlike structure.

FIGURE 24–32 Loops of chromosomal DNA attached to a nuclear scaffold. The DNA in the loops is packaged as 30 nm fibers, so the loops are the next level of organization. Loops often contain groups of genes with related functions. Complete sets of histone-coding genes, as shown in this schematic illustration, appear to be clustered in loops of this kind. Unlike most genes, histone genes occur in multiple copies in many eukaryotic genomes.

Evidence exists for additional layers of organization in eukaryotic chromosomes, each dramatically enhancing the degree of compaction.
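The fold-compaction figures quoted in this section can be combined in a short calculation. The Python sketch below treats the sevenfold (nucleosome) and ~100-fold (30 nm fiber) values as cumulative relative to naked DNA, which is an assumption, and uses 10,000-fold as a lower bound for a condensed chromosome. It also tallies the nested loop, rosette, and coil model of Figure 24–33 (loops of ~75,000 bp, 6 loops per rosette, 30 rosettes per coil, 10 coils per chromatid); that model is a proposal, not a measured structure, and the function names are illustrative.

```python
from math import prod

# Fold-compaction values from the text, treated here (an assumption) as
# cumulative relative to naked DNA.
STAGES = {
    "nucleosomes": 7,
    "30 nm fiber": 100,
    "condensed chromosome (lower bound)": 10_000,
}

# Nesting in the Figure 24-33 model: loop -> rosette -> coil -> chromatid.
LOOP_BP = 75_000
LOOPS_PER_ROSETTE = 6
ROSETTES_PER_COIL = 30
COILS_PER_CHROMATID = 10


def remaining_fold(total_fold: float, achieved_fold: float) -> float:
    """Extra compaction still needed after an earlier stage."""
    return total_fold / achieved_fold


def compacted_length_um(length_um: float, fold: float) -> float:
    """End-to-end length of a DNA molecule after a given overall compaction."""
    return length_um / fold


def bp_per_chromatid(loop_bp: int = LOOP_BP) -> int:
    """Base pairs packaged in one chromatid of the nested model."""
    return loop_bp * prod((LOOPS_PER_ROSETTE, ROSETTES_PER_COIL, COILS_PER_CHROMATID))


if __name__ == "__main__":
    for stage, fold in STAGES.items():
        print(f"{stage}: {compacted_length_um(1e5, fold):,.1f} um")
    print("beyond the 30 nm fiber:", remaining_fold(10_000, 100), "fold")
    print(f"model chromatid: {bp_per_chromatid():,} bp")
```

Under these assumptions, higher-order folding must contribute at least another 100-fold beyond the 30 nm fiber, bringing a 10⁵ μm molecule down to about 10 μm, and the Figure 24–33 model packages 1.35 × 10⁸ bp per chromatid.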
One model for achieving this compaction is illustrated in Figure 24–33. Higher-order chromatin structure probably varies from chromosome to chromosome, from one region to the next in a single chromosome, and from moment to moment in the life of a cell. No single model can adequately describe these structures. Nevertheless, the principle is clear: DNA compaction in eukaryotic chromosomes is likely to involve coils upon coils upon coils . . .

Condensed Chromosome Structures Are Maintained by SMC Proteins

A third major class of chromatin proteins, in addition to the histones and topoisomerases, is the SMC proteins (structural maintenance of chromosomes). The primary structure of SMC proteins consists of five distinct domains (Fig. 24–34a). The amino- and carboxyl-terminal globular domains, N and C, each of which has part of an ATP hydrolytic site, are connected by two regions of α-helical coiled-coil motifs (see Fig. 4–11) that are joined by a hinge domain. The proteins are generally dimeric, forming a V-shaped complex that is thought to be tied together through their hinge domains (Fig. 24–34b). One N and one C domain come together to form a complete ATP hydrolytic site at each end of the V.

Proteins in the SMC family are found in all types of organisms, from bacteria to humans. Eukaryotes have two major types, cohesins and condensins (Fig. 24–25). The cohesins play a substantial role in linking together sister chromatids immediately after replication and keeping them together as the chromosomes condense to metaphase. This linkage is essential if chromosomes are to segregate properly at cell division. The detailed mechanism by which cohesins link sister chromosomes, and the role of ATP hydrolysis, are not yet understood. The condensins are essential to the condensation of chromosomes as cells enter mitosis.
In the laboratory, condensins bind to DNA in a manner that creates positive supercoils; that is, condensin binding causes the DNA to become overwound, in contrast to the underwinding induced by the binding of nucleosomes. It is not yet clear how this helps to compact the chromatin, although one possibility is presented in Figure 24–35.

FIGURE 24–33 Compaction of DNA in a eukaryotic chromosome. Model for levels of organization that could provide DNA compaction in the chromosomes of eukaryotes. The levels take the form of coils upon coils: DNA; the “beads-on-a-string” form of chromatin; the 30 nm fiber; one loop (~75,000 bp); one rosette (6 loops); one coil (30 rosettes); and two chromatids (10 coils each), organized on a nuclear scaffold. In cells, the higher-order structures (above the 30 nm fibers) are unlikely to be as uniform as depicted here.

Bacterial DNA Is Also Highly Organized

We now turn briefly to the structure of bacterial chromosomes. Bacterial DNA is compacted in a structure called the nucleoid, which can occupy a significant fraction of the cell volume (Fig. 24–36). The DNA appears to be attached at one or more points to the inner surface of the plasma membrane. Much less is known about the structure of the nucleoid than of eukaryotic chromatin. In E. coli, a scaffoldlike structure appears to organize the circular chromosome into a series of looped domains, as described above for chromatin. Bacterial DNA does not seem to have any structure comparable to the local organization provided by nucleosomes in eukaryotes. Histonelike proteins are abundant in E. coli (the best-characterized example is a two-subunit protein called HU, Mr 19,000), but these proteins bind and dissociate within minutes, and no regular, stable DNA-histone structure has been found.
The bacterial chromosome is a relatively dynamic molecule, possibly reflecting a requirement for more ready access to its genetic information. The bacterial cell division cycle can be as short as 15 min, whereas a typical eukaryotic cell may not divide for hours or even months. In addition, a much greater fraction of prokaryotic DNA is used to encode RNA and/or protein products. Higher rates of cellular metabolism in bacteria mean that a much higher proportion of the DNA is being transcribed or replicated at a given time than in most eukaryotic cells.

FIGURE 24–35 Model for the effect of condensins on DNA supercoiling. Binding of condensins to a closed-circular DNA in the presence of topoisomerase I leads to the production of positive supercoils (+). Wrapping of the DNA about the condensin introduces positive supercoils because it wraps in the opposite sense to a solenoidal supercoil (see Fig. 24–24). The compensating negative supercoils (−) that appear elsewhere in the DNA are then relaxed by topoisomerase I. In the chromosome, it is the wrapping of the DNA about condensin that may contribute to DNA condensation.

FIGURE 24–36 E. coli cells showing nucleoids (scale bar, 2 μm). The DNA is stained with a dye that fluoresces when exposed to UV light. The light area defines the nucleoid. Note that some cells have replicated their DNA but have not yet undergone cell division and hence have multiple nucleoids.

FIGURE 24–34 Structure of SMC proteins. (a) The five domains of the SMC primary structure. N and C denote the amino-terminal and carboxyl-terminal domains, respectively. (b) Each polypeptide is folded so that the two coiled-coil domains wrap around each other and the N and C domains come together to form a complete ATP-binding site.
Two of these domains are linked at the hinge region to form the dimeric V-shaped molecule. (c) Electron micrograph of SMC proteins from Bacillus subtilis.

With this overview of the complexity of DNA structure, we are now ready to turn, in the next chapter, to a discussion of DNA metabolism.

SUMMARY 24.3 The Structure of Chromosomes

■ The fundamental unit of organization in the chromatin of eukaryotic cells is the nucleosome, which consists of histones and a segment of DNA of about 200 bp. A core protein particle containing eight histones (two copies each of histones H2A, H2B, H3, and H4) is encircled by a segment of DNA (about 146 bp) in the form of a left-handed solenoidal supercoil.

■ Nucleosomes are organized into 30 nm fibers, and the fibers are extensively folded to provide the 10,000-fold compaction required to fit a typical eukaryotic chromosome into a cell nucleus. The higher-order folding involves attachment to a nuclear scaffold that contains histone H1, topoisomerase II, and SMC proteins.

■ Bacterial chromosomes are also extensively compacted into the nucleoid, but the chromosome appears to be much more dynamic and irregular in structure than eukaryotic chromatin, reflecting the shorter cell cycle and very active metabolism of a bacterial cell.

Key Terms

gene, genome, chromosome, phenotype, mutation, regulatory sequence, plasmid, intron, exon, simple-sequence DNA, satellite DNA, centromere, telomere, supercoil, relaxed DNA, topology, underwinding, linking number, specific linking difference (σ), superhelical density, topoisomers, topoisomerases, plectonemic, solenoidal, chromatin, histones, nucleosome, 30 nm fiber, SMC proteins, cohesins, condensins, nucleoid

Terms in bold are defined in the glossary.
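The linking-number argument behind chromatin assembly (Fig. 24–28) and its mirror image for condensins (Fig. 24–35) can be expressed as simple bookkeeping. In this minimal Python sketch, each bound protein traps exactly one supercoil (−1 for a nucleosome, +1 for condensin), protein binding alone leaves Lk unchanged, and a topoisomerase then removes the compensating supercoils in the unbound DNA. The 10.5 bp per helical turn used for Lk₀ and the relation σ = ΔLk/Lk₀ are standard values assumed here rather than taken from this section; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


@dataclass
class ClosedCircle:
    """Hypothetical linking-number bookkeeping for closed-circular DNA.

    Binding a wrapping protein moves one supercoil into the bound DNA and a
    compensating one into the unbound DNA without changing Lk. Only the
    topoisomerase step changes Lk.
    """

    n_bp: int
    lk: float = field(init=False, default=0.0)
    bound_writhe: int = 0    # supercoils trapped by bound proteins
    unbound_writhe: int = 0  # compensating supercoils in free DNA

    def __post_init__(self) -> None:
        self.lk = self.lk0  # start from the relaxed state

    @property
    def lk0(self) -> float:
        return self.n_bp / BP_PER_TURN

    def wrap(self, sign: int) -> None:
        """sign = -1 for a nucleosome core, +1 for condensin."""
        self.bound_writhe += sign
        self.unbound_writhe -= sign  # no strand is broken, so Lk is unchanged

    def relax_unbound(self) -> None:
        """A topoisomerase removes the unbound supercoils, changing Lk."""
        self.lk -= self.unbound_writhe
        self.unbound_writhe = 0

    @property
    def delta_lk(self) -> float:
        return self.lk - self.lk0

    @property
    def sigma(self) -> float:
        """Superhelical density (specific linking difference)."""
        return self.delta_lk / self.lk0


if __name__ == "__main__":
    dna = ClosedCircle(n_bp=5_250)  # Lk0 = 500
    dna.wrap(-1)                    # nucleosome binds: Lk unchanged
    print(dna.delta_lk)
    dna.relax_unbound()             # topoisomerase relaxes the positive supercoil
    print(dna.delta_lk, dna.sigma)
```

Storing the trapped and compensating supercoils separately keeps the single Lk-changing step (topoisomerase action) explicit, mirroring panels (b) and (c) of Figure 24–28.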
Problems

1. Packaging of DNA in a Virus Bacteriophage T2 has a DNA of molecular weight 120 × 10⁶ contained in a head about 210 nm long. Calculate the length of the DNA (assume the molecular weight of a nucleotide pair is 650) and compare it with the length of the T2 head.

2. The DNA of Phage M13 The base composition of phage M13 DNA is A, 23%; T, 36%; G, 21%; C, 20%. What does this tell you about the DNA of phage M13?

3. The Mycoplasma Genome The complete genome of the simplest bacterium known, Mycoplasma genitalium, is a circular DNA molecule with 580,070 bp. Calculate the molecular weight and contour length (when relaxed) of this molecule. What is Lk₀ for the Mycoplasma chromosome? If σ = −0.06, what is Lk?

4. Size of Eukaryotic Genes An enzyme isolated from rat liver has 192 amino acid residues and is coded for by a gene with 1,440 bp. Explain the relationship between the number of amino acid residues in the enzyme and the number of nucleotide pairs in its gene.

5. Linking Number A closed-circular DNA molecule in its relaxed form has an Lk of 500. Approximately how many base pairs are in this DNA? How is the linking number altered (increases, decreases, doesn’t change, becomes undefined) when (a) a protein complex is bound to form a nucleosome, (b) one DNA strand is broken, (c) DNA gyrase and ATP are added to the DNA solution, or (d) the double helix is denatured by heat?

6. Superhelical Density Bacteriophage λ infects E. coli by integrating its DNA into the bacterial chromosome. The success of this recombination depends on the topology of the E. coli DNA. When the superhelical density (σ) of the E. coli DNA is greater than −0.045, the probability of integration is <20%; when σ is less than −0.06, the probability is >70%. Plasmid DNA isolated from an E. coli culture is found to have a length of 13,800 bp and an Lk of 1,222. Calculate σ for this DNA and predict the likelihood that bacteriophage λ will be able to infect this culture.

7. Altering Linking Number (a) What is the Lk of a 5,000 bp circular duplex DNA molecule with a nick in one strand? (b) What is the Lk of the molecule in (a) when the nick is sealed (relaxed)? (c) How would the Lk of the molecule in (b) be affected by the action of a single molecule of E. coli topoisomerase I? (d) What is the Lk of the molecule in (b) after eight enzymatic turnovers by a single molecule of DNA gyrase in the presence of ATP? (e) What is the Lk of the molecule in (d) after four enzymatic turnovers by a single molecule of bacterial type I topoisomerase? (f) What is the Lk of the molecule in (d) after binding of one nucleosome?

8. Chromatin Early evidence that helped researchers define nucleosome structure is illustrated by the agarose gel below, in which the thick bands represent DNA.
It was generated by briefly treating chromatin with an enzyme that degrades DNA, then removing all protein and subjecting the purified DNA to electrophoresis. Numbers at the side of the gel denote the position to which a linear DNA of the indicated size would migrate (200, 400, 600, 800, and 1,000 bp). What does this gel tell you about chromatin structure? Why are the DNA bands thick and spread out rather than sharply defined?

9. DNA Structure Explain how the underwinding of a B-DNA helix might facilitate or stabilize the formation of Z-DNA.

10. Maintaining DNA Structure (a) Describe two structural features required for a DNA molecule to maintain a negatively supercoiled state. (b) List three structural changes that become more favorable when a DNA molecule is negatively supercoiled. (c) What enzyme, with the aid of ATP, can generate negative superhelicity in DNA? (d) Describe the physical mechanism by which this enzyme acts.

11. Yeast Artificial Chromosomes (YACs) YACs are used to clone large pieces of DNA in yeast cells. What three types of DNA sequences are required to ensure proper replication and propagation of a YAC in a yeast cell?

another set of subunits, a clamp-loading complex, or γ complex, consisting of five subunits of four different types, τ₂γδδ′. The core polymerases are linked through the τ (tau) subunits. Two additional subunits, χ (chi) and ψ (psi), are bound to the clamp-loading complex. The entire assembly of 13 protein subunits (nine different types) is called DNA polymerase III* (Fig. 25–10a).

DNA polymerase III* can polymerize DNA, but with a much lower processivity than one would expect for the organized replication of an entire chromosome. The necessary increase in processivity is provided by the addition of the β subunits, four of which complete the DNA polymerase III holoenzyme.
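The subunit counts given above can be cross-checked with a tally. In this Python sketch, the composition of each core subassembly (one α, one ε, and one θ subunit) is an assumption, because the description of the core falls outside this excerpt; the clamp-loading complex (τ₂γδδ′), the χ and ψ subunits, and the four β subunits come from the text. The function names are illustrative.

```python
from collections import Counter

# Assumption: each core subassembly is one alpha, one epsilon, one theta subunit
# (the core is described in a part of the chapter not included here).
CORE = Counter({"alpha": 1, "epsilon": 1, "theta": 1})
# From the text: the clamp-loading (gamma) complex, tau2 gamma delta delta'.
CLAMP_LOADER = Counter({"tau": 2, "gamma": 1, "delta": 1, "delta_prime": 1})
# From the text: chi and psi, bound to the clamp-loading complex.
CHI_PSI = Counter({"chi": 1, "psi": 1})


def pol_iii_star() -> Counter:
    """Two cores (linked through tau) plus the clamp loader and chi/psi."""
    return CORE + CORE + CLAMP_LOADER + CHI_PSI


def holoenzyme() -> Counter:
    """Pol III* plus four beta subunits (two dimeric sliding clamps)."""
    return pol_iii_star() + Counter({"beta": 4})


def describe(assembly: Counter) -> tuple[int, int]:
    """Return (total subunits, distinct subunit types)."""
    return sum(assembly.values()), len(assembly)


if __name__ == "__main__":
    print("Pol III*:", describe(pol_iii_star()))
    print("holoenzyme:", describe(holoenzyme()))
```

With two cores, the tally reproduces the 13 subunits of nine types stated for DNA polymerase III*, and adding the four β subunits gives 17 subunits of ten types for the holoenzyme.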
The β subunits associate in pairs to form donut-shaped structures that encircle the DNA and act like clamps (Fig. 25–10b). Each dimer associates with a core subassembly of polymerase III* (one dimeric clamp per core subassembly) and slides along the DNA as replication proceeds. The β sliding clamp prevents the dissociation of DNA polymerase III from DNA, dramatically increasing processivity to greater than 500,000 (Table 25–1).

DNA Replication Requires Many Enzymes and Protein Factors

Replication in E. coli requires not just a single DNA polymerase but 20 or more different enzymes and proteins, each performing a specific task. The entire complex has been termed the DNA replicase system or replisome. The enzymatic complexity of replication reflects the constraints imposed by the structure of DNA and by the requirements for accuracy. The main classes of replication enzymes are considered here in terms of the problems they overcome.

Access to the DNA strands that are to act as templates requires separation of the two parent strands. This is generally accomplished by helicases, enzymes that move along the DNA and separate the strands, using chemical energy from ATP. Strand separation creates topological stress in the helical DNA structure (see Fig. 24–12), which is relieved by the action of topoisomerases. The separated strands are stabilized by DNA-binding proteins. As noted earlier, before DNA polymerases can begin synthesizing DNA, primers must be present on the template: generally short segments

FIGURE 25–8 Large (Klenow) fragment of DNA polymerase I. This polymerase is widely distributed in bacteria.
The Klenow fragment, produced by proteolytic treatment of the polymerase, retains the polymerization and proofreading activities of the enzyme. The Klenow fragment shown here is from the thermophilic bacterium Bacillus stearothermophilus (PDB ID 3BDP). The active site for addition of nucleotides is deep in the crevice at the far end of the bound DNA. The dark blue strand is the template.

FIGURE 25–9 Nick translation. In this process, an RNA or DNA strand paired to a DNA template is simultaneously degraded by the 5′→3′ exonuclease activity of DNA polymerase I and replaced by the polymerase activity of the same enzyme. These activities have a role in both DNA repair and the removal of RNA primers during replication (both described later). The strand of nucleic acid to be removed (either DNA or RNA) is shown in green, the replacement strand in red. DNA synthesis begins at a nick (a broken phosphodiester bond, leaving a free 3′ hydroxyl and a free 5′ phosphate). Polymerase I extends the nontemplate DNA strand and moves the nick along the DNA, a process called nick translation. A nick remains where DNA polymerase I dissociates, and is later sealed by another enzyme.

A Word about Terminology

Before beginning to look closely at replication, we must make a short digression into the use of abbreviations in naming genes and proteins. By convention, bacterial genes generally are named using three lowercase, italicized letters that often reflect their apparent function. For example, the dna, uvr, and rec genes affect DNA replication, resistance to the damaging effects of UV radiation, and recombination, respectively. Where several genes affect the same process, the letters A, B, C, and so forth, are added (as in dnaA, dnaB, dnaQ, for example), usually reflecting their order of discovery rather than their order in a reaction sequence.
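The naming convention can be illustrated with a small parser. This Python sketch is hypothetical: it recognizes only the pattern described here (three lowercase letters, optionally followed by one capital letter), maps just the three prefixes named in the text, and applies the protein-naming rule stated later in this box (first letter capitalized). Plain strings cannot carry the italic/roman distinction, and real gene nomenclature has exceptions that this pattern does not cover.

```python
import re

# Only the three process prefixes named in the text; illustrative, not exhaustive.
PREFIXES = {
    "dna": "DNA replication",
    "uvr": "resistance to UV radiation",
    "rec": "recombination",
}

# Three lowercase letters, optionally followed by one capital letter (e.g. dnaA).
GENE_PATTERN = re.compile(r"([a-z]{3})([A-Z]?)")


def parse_gene(name: str) -> tuple[str, str, str]:
    """Split a bacterial gene name into (prefix, letter, process)."""
    match = GENE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not in the three-letter gene format: {name!r}")
    prefix, letter = match.groups()
    return prefix, letter, PREFIXES.get(prefix, "not listed")


def protein_name(gene: str) -> str:
    """Protein named after its gene: first letter capitalized (roman type)."""
    parse_gene(gene)  # reject names outside the convention
    return gene[0].upper() + gene[1:]


if __name__ == "__main__":
    for gene in ("dnaA", "dnaQ", "recA", "uvrD"):
        print(gene, parse_gene(gene), protein_name(gene))
```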
During genetic investigations, the protein product of each gene is usually isolated and characterized. Many bacterial genes have been identified and named before the roles of their protein products are understood in detail. Sometimes the gene product is found to be a previously isolated protein, and some renaming occurs. Often the product turns out to be an as yet unknown protein, with an activity not easily described by a simple enzyme name. In a practice that can be confusing, these bacterial proteins often retain the name of their genes. When referring to the protein, roman type is used and the first letter is capitalized: for example, the dnaA and recA gene products are called the DnaA and RecA proteins, respectively. You will encounter many such examples in this chapter.

Similar conventions exist for the naming of eukaryotic genes, although the exact form of the abbreviations may vary with the species and no single convention applies to all eukaryotic systems.

FIGURE 25–1 Map of the E. coli chromosome. The map shows the relative positions of genes encoding many of the proteins important in DNA metabolism. The number of genes known to be involved provides a hint of the complexity of these processes. The numbers 0 to 100 inside the circular chromosome denote a genetic measurement called minutes. Each minute corresponds to ~40,000 bp along the DNA molecule of E. coli. The three-letter names of genes and other elements generally reflect some aspect of their function. These include mut, mutagenesis; dna, DNA replication; pol, DNA polymerase; rpo, RNA polymerase; uvr, UV resistance; rec, recombination; dam, DNA adenine methylation; lig, DNA ligase; Ter, termination of replication; and ori, origin of replication.

25.1 DNA Replication

Long before the structure of DNA was known, scientists wondered at the ability of organisms to create faithful copies of themselves and, later, at the ability of cells to produce many identical copies of large and complex macromolecules. Speculation about these problems centered around the concept of a template, a structure that would allow molecules to be lined up in a specific order and joined, to create a macromolecule with a unique sequence and function. The 1940s brought the revelation that DNA was the genetic molecule, but not until James Watson and Francis Crick deduced its structure did the way in which DNA could act as a template for the replication and transmission of genetic information become clear: one strand is the complement of the other.
The strict base-pairing rules mean that each strand provides the template for a sister strand with a predictable and complementary sequence (see Figs 8–16, 8–17).

The fundamental properties of the DNA replication process and the mechanisms used by the enzymes that catalyze it have proved to be essentially identical in all species. This mechanistic unity is a major theme as we proceed from general properties of the replication process, to E. coli replication enzymes, and, finally, to replication in eukaryotes.

DNA Replication Follows a Set of Fundamental Rules

Early research on bacterial DNA replication and its enzymes helped to establish several basic properties that have proven applicable to DNA synthesis in every organism.

DNA Replication Is Semiconservative

Each DNA strand serves as a template for the synthesis of a new strand, producing two new DNA molecules, each with one new strand and one old strand. This is semiconservative replication.

Watson and Crick proposed the hypothesis of semiconservative replication soon after publication of their 1953 paper on the structure of DNA, and the hypothesis was proved by ingeniously designed experiments carried out by Matthew Meselson and Franklin Stahl in 1957. Meselson and Stahl grew E. coli cells for many generations in a medium in which the sole nitrogen source (NH₄Cl) contained ¹⁵N, the “heavy” isotope of nitrogen, instead of the normal, more abundant “light” isotope, ¹⁴N. The DNA isolated from these cells had a density about 1% greater than that of normal [¹⁴N]DNA (Fig. 25–2a). Although this is only a small difference, a mixture of heavy [¹⁵N]DNA and light [¹⁴N]DNA can be separated by centrifugation to equilibrium in a cesium chloride density gradient. The E. coli cells grown in the ¹⁵N medium were transferred to a fresh medium containing only the ¹⁴N isotope, where they were allowed to grow until the cell population had just doubled.
The DNA isolated from these first-generation cells formed a single band in the CsCl gradient at a position indicating that the double-helical DNA molecules of the daughter cells were hybrids containing one new ¹⁴N strand and one parent ¹⁵N strand (Fig. 25–2b). This result argued against conservative replication, an alternative hypothesis in which one progeny DNA molecule would consist of two newly synthesized DNA strands and the other would contain the two parent strands; this would not yield hybrid DNA molecules in the Meselson-Stahl experiment. The semiconservative replication hypothesis was further supported in the next step of the experiment (Fig. 25–2c). Cells were again allowed to double in number in the ¹⁴N medium. The isolated DNA product of this second cycle of replication exhibited two bands in the density gradient, one with a density equal to that of light DNA and the other with the density of the hybrid DNA observed after the first cell doubling.

[Figure 25–2: The Meselson-Stahl experiment. DNA is extracted and centrifuged to equilibrium in a CsCl density gradient. (a) Cells were grown for many generations in a medium containing only heavy nitrogen, ¹⁵N, so that all the nitrogen in their DNA was ¹⁵N, as shown by a single band (blue) when centrifuged in a CsCl density gradient (original parent molecule; heavy DNA). (b) Once the cells had been transferred to a medium containing only light nitrogen, ¹⁴N, cellular DNA isolated after one generation equilibrated at a higher position in the density gradient (purple band; first-generation daughter molecules, hybrid ¹⁵N–¹⁴N DNA). (c) Continuation of replication for a second generation yielded two hybrid DNAs and two light DNAs (red), confirming semiconservative replication.]
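The band patterns that distinguish semiconservative from conservative replication can be reproduced with a toy simulation. In this sketch (the names `band_fractions`, `replicate_semiconservative`, and `replicate_conservative` are hypothetical), a duplex is a pair of strand labels, each either ¹⁵N or ¹⁴N, and its CsCl band is set by how many of its strands are heavy. Cells start fully labeled and then grow in ¹⁴N medium. Only the two hypotheses discussed here are modeled.

```python
from collections import Counter

HEAVY, LIGHT = "15N", "14N"


def band(duplex):
    """Classify a duplex by how many of its two strands carry 15N."""
    heavy_strands = sum(strand == HEAVY for strand in duplex)
    return {2: "heavy", 1: "hybrid", 0: "light"}[heavy_strands]


def replicate_semiconservative(duplexes):
    # Each parent strand stays paired with one newly made 14N strand.
    return [(strand, LIGHT) for duplex in duplexes for strand in duplex]


def replicate_conservative(duplexes):
    # The parent duplex stays intact; the daughter duplex is entirely new.
    result = []
    for duplex in duplexes:
        result.extend([duplex, (LIGHT, LIGHT)])
    return result


def band_fractions(replicate, generations):
    """Fraction of DNA in each CsCl band after growth in 14N medium."""
    duplexes = [(HEAVY, HEAVY)]  # start: DNA fully labeled with 15N
    for _ in range(generations):
        duplexes = replicate(duplexes)
    counts = Counter(band(d) for d in duplexes)
    return {name: counts[name] / len(duplexes) for name in sorted(counts)}


for generation in (1, 2):
    print(generation,
          band_fractions(replicate_semiconservative, generation),
          band_fractions(replicate_conservative, generation))
```

Under the semiconservative model, one generation gives a single hybrid band and two generations give equal hybrid and light bands, as Meselson and Stahl observed; the conservative model instead predicts heavy and light bands and never a hybrid one.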
Replication Begins at an Origin and Usually Proceeds Bidirectionally

Following the confirmation of a semiconservative mechanism of replication, a host of questions arose. Are the parent DNA strands completely unwound before each is replicated? Does replication begin at random places or at a unique point? After initiation at any point in the DNA, does replication proceed in one direction or both?

An early indication that replication is a highly coordinated process in which the parent strands are simultaneously unwound and replicated was provided by John Cairns, using autoradiography. He made E. coli DNA radioactive by growing cells in a medium containing thymidine labeled with tritium (³H). When the DNA was carefully isolated, spread, and overlaid with a photographic emulsion for several weeks, the radioactive thymidine residues generated “tracks” of silver grains in the emulsion, producing an image of the DNA molecule. These tracks revealed that the intact chromosome of E. coli is a single huge circle, 1.7 mm long. Radioactive DNA isolated from cells during replication showed an extra loop (Fig. 25–3a). Cairns concluded that the loop resulted from the formation of two radioactive daughter strands, each complementary to a parent strand. One or both ends of the loop are dynamic points, termed replication forks, where parent DNA is being unwound and the separated strands quickly replicated. Cairns’s results demonstrated that both DNA strands are replicated simultaneously, and a variation on his experiment (Fig. 25–3b) indicated that replication of bacterial chromosomes is bidirectional: both ends of the loop have active replication forks.

The determination of whether the replication loops originate at a unique point in the DNA required landmarks along the DNA molecule.
These were provided by a technique called denaturation mapping, developed by Ross Inman and colleagues. Using the 48,502 bp chromosome of bacteriophage λ, Inman showed that DNA could be selectively denatured at sequences unusually rich in A=T base pairs, generating a reproducible pattern of single-strand bubbles (see Fig. 8–31). Isolated DNA containing replication loops can be partially denatured in the same way. This allows the position and progress of the replication forks to be measured and mapped, using the denatured regions as points of reference. The technique revealed that in this system the replication loops always initiate at a unique point, which was termed an origin. It also confirmed the earlier observation that replication is usually bidirectional. For circular DNA molecules, the two replication forks meet at a point on the side of the circle opposite to the origin. Specific origins of replication have since been identified and characterized in bacteria and lower eukaryotes.

[Figure 25–3: Visualization of bidirectional DNA replication. Replication of a circular chromosome produces a structure resembling the Greek letter theta (θ). (a) Labeling with tritium (³H) shows that both strands are replicated at the same time (new strands shown in red). The electron micrographs illustrate the replication of a circular E. coli plasmid as visualized by autoradiography. (b) Addition of ³H for a short period just before the reaction is stopped allows a distinction to be made between unidirectional and bidirectional replication, by determining whether label (red) is found at one or both replication forks in autoradiograms. This technique has revealed bidirectional replication in E. coli, Bacillus subtilis, and other bacteria.]
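Denaturation mapping works because A=T-rich segments melt before G≡C-rich ones (an A=T pair has two hydrogen bonds, a G≡C pair three). As a crude sketch that uses base composition alone (real melting also depends on base stacking, salt, and temperature), one can scan a sequence for windows of high A+T content. The function names, the window size, the threshold, and the toy sequence are all hypothetical and chosen only for illustration.

```python
def at_fraction(segment):
    """Fraction of positions in segment occupied by A or T."""
    return sum(base in "AT" for base in segment) / len(segment)


def at_rich_windows(sequence, window, threshold):
    """0-based start positions of windows whose A+T fraction >= threshold."""
    sequence = sequence.upper()
    return [
        start
        for start in range(len(sequence) - window + 1)
        if at_fraction(sequence[start:start + window]) >= threshold
    ]


# Hypothetical toy sequence: a short A=T-rich stretch flanked by G-C pairs.
toy = "GCGCATATATGCGC"
print(at_rich_windows(toy, window=6, threshold=1.0))  # [4]
print(at_rich_windows(toy, window=6, threshold=0.8))  # [3, 4, 5]
```

Overlapping hits mark a single candidate "bubble"; in a mapping experiment such reproducible positions serve as the reference points against which replication forks are located.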
DNA Synthesis Proceeds in a 5′→3′ Direction and Is Semidiscontinuous

A new strand of DNA is always synthesized in the 5′→3′ direction, with the free 3′ OH as the point at which the DNA is elongated (the 5′ and 3′ ends of a DNA strand are defined in Fig. 8–7). Because the two DNA strands are antiparallel, the strand serving as the template is read from its 3′ end toward its 5′ end. If synthesis always proceeds in the 5′→3′ direction, how can both strands be synthesized simultaneously? If both strands were synthesized continuously while the replication fork moved, one strand would have to undergo 3′→5′ synthesis. This problem was resolved by Reiji Okazaki and colleagues in the 1960s. Okazaki found that one of the new DNA strands is synthesized in short pieces, now called Okazaki fragments. This work ultimately led to the conclusion that one strand is synthesized continuously and the other discontinuously (Fig. 25–4). The continuous strand, or leading strand, is the one in which 5′→3′ synthesis proceeds in the same direction as replication fork movement. The discontinuous strand, or lagging strand, is the one in which 5′→3′ synthesis proceeds in the direction opposite to the direction of fork movement. Okazaki fragments range in length from a few hundred to a few thousand nucleotides, depending on the cell type. As we shall see later, leading and lagging strand syntheses are tightly coordinated.

DNA Is Degraded by Nucleases

To explain the enzymology of DNA replication, we first introduce the enzymes that degrade DNA rather than synthesize it. These enzymes are known as nucleases, or DNases if they are specific for DNA rather than RNA. Every cell contains several different nucleases, belonging to two broad classes: exonucleases and endonucleases. Exonucleases degrade nucleic acids from one end of the molecule.
Many operate in only the 5′→3′ or the 3′→5′ direction, removing nucleotides only from the 5′ or the 3′ end, respectively, of one strand of a double-stranded nucleic acid or of a single-stranded DNA. Endonucleases can begin to degrade at specific internal sites in a nucleic acid strand or molecule, reducing it to smaller and smaller fragments. A few exonucleases and endonucleases degrade only single-stranded DNA. There are a few important classes of endonucleases that cleave only at specific nucleotide sequences (such as the restriction endonucleases that are so important in biotechnology; see Chapter 9, Fig. 9–3). You will encounter many types of nucleases in this and subsequent chapters.

[Figure 25–4: Defining DNA strands at the replication fork. A new DNA strand (red) is always synthesized in the 5′→3′ direction. The template is read in the opposite direction, 3′→5′. The leading strand is continuously synthesized in the direction taken by the replication fork. The other strand, the lagging strand, is synthesized discontinuously in short pieces (Okazaki fragments) in a direction opposite to that in which the replication fork moves. The Okazaki fragments are spliced together by DNA ligase. In bacteria, Okazaki fragments are ~1,000 to 2,000 nucleotides long. In eukaryotic cells, they are 150 to 200 nucleotides long.]

DNA Is Synthesized by DNA Polymerases

The search for an enzyme that could synthesize DNA began in 1955. Work by Arthur Kornberg and colleagues led to the purification and characterization of DNA polymerase from E. coli cells, a single-polypeptide enzyme now called DNA polymerase I (Mr 103,000; encoded by the polA gene). Much later, investigators found that E. coli contains at least four other distinct DNA polymerases, described below.

Detailed studies of DNA polymerase I revealed features of the DNA synthetic process that are now known to be common to all DNA polymerases. The fundamental reaction is a phosphoryl group transfer. The nucleophile is the 3′-hydroxyl group of the nucleotide at the 3′ end of the growing strand. Nucleophilic attack occurs at the α phosphorus of the incoming deoxynucleoside 5′-triphosphate (Fig. 25–5). Inorganic pyrophosphate is released in the reaction. The general reaction is

$$\underbrace{(\mathrm{dNMP})_n}_{\text{DNA}} + \mathrm{dNTP} \longrightarrow \underbrace{(\mathrm{dNMP})_{n+1}}_{\text{lengthened DNA}} + \mathrm{PP_i} \tag{25–1}$$

where dNMP and dNTP are deoxynucleoside 5′-monophosphate and 5′-triphosphate, respectively. The reaction appears to proceed with only a minimal change in free energy, given that one phosphodiester bond is formed at the expense of a somewhat less stable phosphate anhydride. However, noncovalent base-stacking and base-pairing interactions provide additional stabilization to the lengthened DNA product relative to the free nucleotide. Also, the formation of products is facilitated in the cell by the 19 kJ/mol generated in the subsequent hydrolysis of the pyrophosphate product by the enzyme pyrophosphatase.
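To make Eqn 25–1 concrete, here is a minimal Python sketch of template-directed extension. It ignores chemistry and mechanism entirely (no Mg²⁺, no proofreading) and simply records that each added dNMP releases one PPi. The function name `extend_primer` and its input conventions are hypothetical: the template is written 3′→5′, the order in which the polymerase reads it, so each new nucleotide is appended to the 3′ end of the growing strand. The function refuses to start without a primer, reflecting the requirement for a free 3′-hydroxyl.

```python
# Watson-Crick pairing used to choose each incoming dNTP.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, primer_5to3):
    """Extend a primer 5'->3' along a template, one nucleotide per step.

    template_3to5: template bases listed from its 3' end.
    primer_5to3: primer already paired to the first len(primer_5to3)
        template bases; its last base carries the free 3'-OH.
    Returns (lengthened strand written 5'->3', number of PPi released).
    """
    if not primer_5to3:
        raise ValueError("no primer: a free 3'-OH is required for extension")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if PAIRS[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")

    strand = list(primer_5to3)
    pyrophosphate = 0
    for template_base in template_3to5[len(primer_5to3):]:
        # (dNMP)n + dNTP -> (dNMP)n+1 + PPi, as in Eqn 25-1
        strand.append(PAIRS[template_base])
        pyrophosphate += 1
    return "".join(strand), pyrophosphate


print(extend_primer("TACGGT", "AT"))  # ('ATGCCA', 4)
```

The count of PPi equals the number of nucleotides added, one per phosphodiester bond formed.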
[Figure 25–5 (Mechanism Figure): Elongation of a DNA chain. (a) DNA polymerase I activity requires a single unpaired strand to act as template and a primer strand to provide a free hydroxyl group at the 3′ end, to which a new nucleotide unit is added. Each incoming nucleotide is selected in part by base pairing to the appropriate nucleotide in the template strand. The reaction product has a new free 3′ hydroxyl, allowing the addition of another nucleotide. (b) The catalytic mechanism likely involves two Mg²⁺ ions, coordinated to the phosphate groups of the incoming nucleotide triphosphate and to three Asp residues, two of which are highly conserved in all DNA polymerases. The top Mg²⁺ ion in the figure facilitates attack of the 3′-hydroxyl group of the primer on the α phosphate of the nucleotide triphosphate; the lower Mg²⁺ ion facilitates displacement of the pyrophosphate. Both ions stabilize the structure of the pentacovalent transition state. RNA polymerases use a similar mechanism (see Fig. 26–1b).]

Early work on DNA polymerase I led to the definition of two central requirements for DNA polymerization. First, all DNA polymerases require a template.
The polymerization reaction is guided by a template DNA strand according to the base-pairing rules predicted by Watson and Crick: where a guanine is present in the template, a cytosine deoxynucleotide is added to the new strand, and so on. This was a particularly important discovery, not only because it provided a chemical basis for accurate semiconservative DNA replication but also because it represented the first example of the use of a template to guide a biosynthetic reaction.

Second, the polymerases require a primer. A primer is a strand segment (complementary to the template) with a free 3′-hydroxyl group to which a nucleotide can be added; the free 3′ end of the primer is called the primer terminus. In other words, part of the new strand must already be in place: all DNA polymerases can only add nucleotides to a preexisting strand. Most primers are oligonucleotides of RNA rather than DNA, and specialized enzymes synthesize primers when and where they are required.

After adding a nucleotide to a growing DNA strand, a DNA polymerase either dissociates or moves along the template and adds another nucleotide. Dissociation and reassociation of the polymerase can limit the overall polymerization rate: the process is generally faster when a polymerase adds more nucleotides without dissociating from the template. The average number of nucleotides added before a polymerase dissociates defines its processivity. DNA polymerases vary greatly in processivity; some add just a few nucleotides before dissociating, others add many thousands.

Replication Is Very Accurate

Replication proceeds with an extraordinary degree of fidelity. In E. coli, a mistake is made only once for every 10⁹ to 10¹⁰ nucleotides added. Each round of replication of the ~4.6 × 10⁶ bp E. coli chromosome adds ~9.2 × 10⁶ nucleotides (counting both new strands), so an error occurs only about once per 100 to 1,000 replications.
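The link between a per-nucleotide error rate and the number of error-free chromosome copies is simple arithmetic. A minimal sketch, assuming errors are independent and that each round adds one nucleotide opposite every base of both template strands; the function and constant names are hypothetical.

```python
def errors_per_replication(genome_bp, error_rate, strands=2):
    """Expected number of errors in one complete round of replication.

    One nucleotide is added opposite every base of each template strand,
    so a round adds strands * genome_bp nucleotides.
    """
    return strands * genome_bp * error_rate


E_COLI_GENOME_BP = 4.6e6            # ~4.6 x 10^6 bp
for rate in (1e-9, 1e-10):          # errors per nucleotide added
    per_round = errors_per_replication(E_COLI_GENOME_BP, rate)
    print(f"error rate {rate:g}: {per_round:.2g} errors per round, "
          f"about 1 error every {1 / per_round:,.0f} replications")
```

With both strands counted this gives roughly 109 and 1,087 replications per error; passing `strands=1` (counting base pairs rather than nucleotides added) doubles both figures.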
During polymerization, discrimination between correct and incorrect nucleotides relies not just on the hydrogen bonds that specify the correct pairing between complementary bases but also on the common geometry of the standard A=T and G≡C base pairs (Fig. 25–6). The active site of DNA polymerase I accommodates only base pairs with this geometry. An incorrect nucleotide may be able to hydrogen-bond with a base in the template, but it generally will not fit into the active site. Incorrect bases can be rejected before the phosphodiester bond is formed.

[Figure 25–6: Contribution of base-pair geometry to the fidelity of DNA replication. (a) The standard A=T and G≡C base pairs have very similar geometries, and an active site sized to fit one (blue shading) will generally accommodate the other. (b) The geometry of incorrectly paired bases can exclude them from the active site, as occurs on DNA polymerase I.]

The accuracy of the polymerization reaction itself, however, is insufficient to account for the high degree of fidelity in replication. Careful measurements in vitro have shown that DNA polymerases insert one incorrect nucleotide for every 10⁴ to 10⁵ correct ones. These mistakes sometimes occur because a base is briefly in an unusual tautomeric form (see Fig. 8–9), allowing it to hydrogen-bond with an incorrect partner. In vivo, the error rate is reduced by additional enzymatic mechanisms.

One mechanism intrinsic to virtually all DNA polymerases is a separate 3′→5′ exonuclease activity that double-checks each nucleotide after it is added. This nuclease activity permits the enzyme to remove a newly added nucleotide and is highly specific for mismatched base pairs (Fig. 25–7).
If the polymerase has added the wrong nucleotide, translocation of the enzyme to the position where the next nucleotide is to be added is inhibited. This kinetic pause provides the opportunity for a correction. The 3′→5′ exonuclease activity removes the mispaired nucleotide, and the polymerase begins again. This activity, known as proofreading, is not simply the reverse of the polymerization reaction (Eqn 25–1), because pyrophosphate is not involved. The polymerizing and proofreading activities of a DNA polymerase can be measured separately. Proofreading improves the inherent accuracy of the polymerization reaction 10²- to 10³-fold. In the monomeric DNA polymerase I, the polymerizing and proofreading activities have separate active sites within the same polypeptide.

When base selection and proofreading are combined, DNA polymerase leaves behind one net error for every 10⁶ to 10⁸ bases added. Yet the measured accuracy of replication in E. coli is higher still. The additional accuracy is provided by a separate enzyme system that repairs the mismatched base pairs remaining after replication. We describe this mismatch repair, along with other DNA repair processes, in Section 25.2.

E. coli Has at Least Five DNA Polymerases

More than 90% of the DNA polymerase activity observed in E. coli extracts can be accounted for by DNA polymerase I. Soon after the isolation of this enzyme in 1955, however, evidence began to accumulate that it is not suited for replication of the large E. coli chromosome. First, the rate at which it adds nucleotides (600 nucleotides/min) is too slow (by a factor of 100 or more) to account for the rates at which the replication fork moves in the bacterial cell. Second, DNA polymerase I has a relatively low processivity. Third, genetic studies have demonstrated that many genes, and therefore many proteins, are involved in replication: DNA polymerase I clearly does not act alone.
Fourth, and most important, in 1969 John Cairns isolated a bacterial strain with an altered gene for DNA polymerase I that produced an inactive enzyme. Although this strain was abnormally sensitive to agents that damaged DNA, it was nevertheless viable!

A search for other DNA polymerases led to the discovery of E. coli DNA polymerase II and DNA polymerase III in the early 1970s. DNA polymerase II is an enzyme involved in one type of DNA repair (Section 25.3). DNA polymerase III is the principal replication enzyme in E. coli. The properties of these three DNA polymerases are compared in Table 25–1. DNA polymerases IV and V, identified in 1999, are involved in an unusual form of DNA repair (Section 25.2).

[Figure 25–7: An example of error correction by the 3′→5′ exonuclease activity of DNA polymerase I. Structural analysis has located the exonuclease activity ahead of the polymerase activity as the enzyme is oriented in its movement along the DNA. A mismatched base (here, a C–A mismatch) impedes translocation of DNA polymerase I to the next site. Sliding backward, the enzyme corrects the mistake with its 3′→5′ exonuclease activity, then resumes its polymerase activity in the 5′→3′ direction. Steps shown: a rare tautomeric form of cytosine (C*) that pairs with A is incorporated into the growing strand; before the polymerase moves on, the cytosine undergoes a tautomeric shift from C* to C, so the new nucleotide is now mispaired; the mispaired 3′-OH end of the growing strand blocks further elongation; DNA polymerase slides back to position the mispaired base in the 3′→5′ (proofreading) exonuclease active site; the mispaired nucleotide is removed; DNA polymerase slides forward and resumes its polymerization activity.]

DNA polymerase I, then, is not the primary enzyme of replication; instead it performs a host of clean-up functions during replication, recombination, and repair. The polymerase’s special functions are enhanced by its 5′→3′ exonuclease activity. This activity, distinct from the 3′→5′ proofreading exonuclease (Fig. 25–7), is located in a structural domain that can be separated from the enzyme by mild protease treatment. When the 5′→3′ exonuclease domain is removed, the remaining fragment (Mr 68,000), the large fragment or Klenow fragment (Fig. 25–8), retains the polymerization and proofreading activities. The 5′→3′ exonuclease activity of intact DNA polymerase I can replace a segment of DNA (or RNA) paired to the template strand, in a process known as nick translation (Fig. 25–9). Most other DNA polymerases lack a 5′→3′ exonuclease activity.

TABLE 25–1 Comparison of DNA Polymerases of E. coli

| | DNA polymerase I | DNA polymerase II | DNA polymerase III |
| Structural gene* | polA | polB | polC (dnaE) |
| Subunits (number of different types) | 1 | 7 | ≥10 |
| Mr | 103,000 | 88,000† | 791,500 |
| 3′→5′ exonuclease (proofreading) | Yes | Yes | Yes |
| 5′→3′ exonuclease | Yes | No | No |
| Polymerization rate (nucleotides/s) | 16–20 | 40 | 250–1,000 |
| Processivity (nucleotides added before polymerase dissociates) | 3–200 | 1,500 | ≥500,000 |

* For enzymes with more than one subunit, the gene listed here encodes the subunit with polymerization activity. Note that dnaE is an earlier designation for the gene now referred to as polC.
† Polymerization subunit only. DNA polymerase II shares several subunits with DNA polymerase III, including the β, γ, δ, δ′, χ, and ψ subunits (see Table 25–2).

TABLE 25–2 Subunits of DNA Polymerase III of E. coli

| Subunit | Number of subunits per holoenzyme | Mr of subunit | Gene | Function of subunit |
| α | 2 | 129,900 | polC (dnaE) | Polymerization activity |
| ε | 2 | 27,500 | dnaQ (mutD) | 3′→5′ proofreading exonuclease |
| θ | 2 | 8,600 | holE | |
| τ | 2 | 71,100 | dnaX | Stable template binding; core enzyme dimerization |
| γ | 1 | 47,500 | dnaX* | Clamp loader |
| δ | 1 | 38,700 | holA | Clamp opener |
| δ′ | 1 | 36,900 | holB | Clamp loader |
| χ | 1 | 16,600 | holC | Interaction with SSB |
| ψ | 1 | 15,200 | holD | Interaction with γ and χ |
| β | 4 | 40,600 | dnaN | DNA clamp required for optimal processivity |

Groupings: α, ε, and θ form the core polymerase; the clamp-loading (γ) complex loads β subunits on the lagging strand at each Okazaki fragment.
* The γ subunit is encoded by a portion of the gene for the τ subunit, such that the amino-terminal 66% of the τ subunit has the same amino acid sequence as the γ subunit. The γ subunit is generated by a translational frameshifting mechanism (see Box 27–1) that leads to premature translational termination.

DNA polymerase III is much more complex than DNA polymerase I, having ten types of subunits (Table 25–2). Its polymerization and proofreading activities reside in its α and ε (epsilon) subunits, respectively. The θ subunit associates with α and ε to form a core polymerase, which can polymerize DNA but with limited processivity. Two core polymerases can be linked by another set of subunits, a clamp-loading complex, or γ complex, consisting of five subunits of four different types, τ₂γδδ′. The core polymerases are linked through the τ (tau) subunits.
Two additional subunits, χ (chi) and ψ (psi), are bound to the clamp-loading complex. The entire assembly of 13 protein subunits (nine different types) is called DNA polymerase III* (Fig. 25–10a).

DNA polymerase III* can polymerize DNA, but with a much lower processivity than one would expect for the organized replication of an entire chromosome. The necessary increase in processivity is provided by the addition of the β subunits, four of which complete the DNA polymerase III holoenzyme. The β subunits associate in pairs to form donut-shaped structures that encircle the DNA and act like clamps (Fig. 25–10b). Each dimer associates with a core subassembly of polymerase III* (one dimeric clamp per core subassembly) and slides along the DNA as replication proceeds. The β sliding clamp prevents the dissociation of DNA polymerase III from DNA, dramatically increasing processivity to greater than 500,000 (Table 25–1).

DNA Replication Requires Many Enzymes and Protein Factors

Replication in E. coli requires not just a single DNA polymerase but 20 or more different enzymes and proteins, each performing a specific task. The entire complex has been termed the DNA replicase system or replisome. The enzymatic complexity of replication reflects the constraints imposed by the structure of DNA and by the requirements for accuracy. The main classes of replication enzymes are considered here in terms of the problems they overcome.

Access to the DNA strands that are to act as templates requires separation of the two parent strands. This is generally accomplished by helicases, enzymes that move along the DNA and separate the strands, using chemical energy from ATP. Strand separation creates topological stress in the helical DNA structure (see Fig. 24–12), which is relieved by the action of topoisomerases. The separated strands are stabilized by DNA-binding proteins.
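The rate and processivity figures for DNA polymerases I and III (Table 25–1) show why the β clamp matters. The back-of-envelope Python sketch below assumes one polymerase per fork copying continuously at a constant rate, uses the upper ends of the quoted ranges, and ignores the time lost to dissociation and rebinding (which would make a low-processivity enzyme slower still). The names `POLYMERASES` and `replication_estimate` are hypothetical.

```python
import math

# Upper ends of the ranges quoted for these enzymes (Table 25-1).
POLYMERASES = {
    "DNA polymerase I": {"rate_nt_per_s": 20, "processivity_nt": 200},
    "DNA polymerase III holoenzyme": {"rate_nt_per_s": 1000,
                                      "processivity_nt": 500_000},
}


def replication_estimate(length_nt, rate_nt_per_s, processivity_nt):
    """Binding events and minutes needed to copy length_nt nucleotides."""
    binding_events = math.ceil(length_nt / processivity_nt)
    minutes = length_nt / rate_nt_per_s / 60
    return binding_events, minutes


# Bidirectional replication: each fork copies about half of the 4.6e6 bp circle.
HALF_CHROMOSOME_NT = 4.6e6 / 2
for name, props in POLYMERASES.items():
    events, minutes = replication_estimate(HALF_CHROMOSOME_NT, **props)
    print(f"{name}: {events:,} binding events, {minutes:,.0f} min per fork")
```

Under these assumptions the holoenzyme finishes a fork in tens of minutes after only a handful of binding events, whereas polymerase I would need thousands of rebinding events and more than a day.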
[Figure 25–8: Large (Klenow) fragment of DNA polymerase I. This polymerase is widely distributed in bacteria. The Klenow fragment, produced by proteolytic treatment of the polymerase, retains the polymerization and proofreading activities of the enzyme. The Klenow fragment shown here is from the thermophilic bacterium Bacillus stearothermophilus (PDB ID 3BDP). The active site for addition of nucleotides is deep in the crevice at the far end of the bound DNA. The dark blue strand is the template.]

[Figure 25–9: Nick translation. In this process, an RNA or DNA strand paired to a DNA template is simultaneously degraded by the 5′→3′ exonuclease activity of DNA polymerase I and replaced by the polymerase activity of the same enzyme. These activities have a role in both DNA repair and the removal of RNA primers during replication (both described later). The strand of nucleic acid to be removed (either DNA or RNA) is shown in green, the replacement strand in red. DNA synthesis begins at a nick (a broken phosphodiester bond, leaving a free 3′ hydroxyl and a free 5′ phosphate). Polymerase I extends the nontemplate DNA strand and moves the nick along the DNA, a process called nick translation. A nick remains where DNA polymerase I dissociates, and is later sealed by another enzyme.]

As noted earlier, before DNA polymerases can begin synthesizing DNA, primers must be present on the template, generally short segments of RNA synthesized by enzymes known as primases. Ultimately, the RNA primers are removed and replaced by DNA; in E. coli, this is one of the many functions of DNA polymerase I.
After an RNA primer is removed and the gap is filled in with DNA, a nick remains in the DNA backbone in the form of a broken phosphodiester bond. These nicks are sealed by DNA ligases. All these processes require coordination and regulation, an interplay best characterized in the E. coli system.

Replication of the E. coli Chromosome Proceeds in Stages

The synthesis of a DNA molecule can be divided into three stages: initiation, elongation, and termination, distinguished both by the reactions taking place and by the enzymes required. As you will find here and in the next two chapters, synthesis of the major information-containing biological polymers (DNAs, RNAs, and proteins) can be understood in terms of these same three stages, with the stages of each pathway having unique characteristics. The events described below reflect information derived primarily from in vitro experiments using purified E. coli proteins, although the principles are highly conserved in all replication systems.

FIGURE 25–10 DNA polymerase III. (a) Architecture of bacterial DNA polymerase III. Two core domains, composed of subunits α, ε, and θ, are linked by a five-subunit γ complex (also known as the clamp-loading complex) with the composition τ2γδδ′. The γ and τ subunits are encoded by the same gene. The γ subunit is a shortened version of τ; the τ subunit thus contains a domain identical to γ, along with an additional segment that interacts with the core polymerase. The other two subunits of DNA polymerase III*, χ and ψ (not shown), also bind to the γ complex. Two β clamps interact with the two-core subassembly, each clamp a dimer of the β subunit. The complex interacts with the DnaB helicase through the τ subunit. (b) Two β subunits of E. coli polymerase III form a circular clamp that surrounds the DNA. The clamp slides along the DNA molecule, increasing the processivity of the polymerase III holoenzyme to greater than 500,000 by preventing its dissociation from the DNA. The end-on view shows the two β subunits as gray and light-blue ribbon structures surrounding a space-filling model of DNA. In the side view, surface contour models of the β subunits (gray) surround a stick representation of a DNA double helix (light and dark blue) (derived from PDB ID 2POL).

Initiation

The E. coli replication origin, oriC, consists of 245 bp; it bears DNA sequence elements that are highly conserved among bacterial replication origins. The general arrangement of the conserved sequences is illustrated in Figure 25–11. The key sequences of interest here are two series of short repeats: three repeats of a 13 bp sequence and four repeats of a 9 bp sequence.

FIGURE 25–11 Arrangement of sequences in the E. coli replication origin, oriC. The origin contains a tandem array of three 13 bp sequences (consensus sequence GATCTNTTNTTTT) and four 9 bp sequences that serve as binding sites for DnaA protein (consensus sequence TTATCCACA). Although the repeated sequences (shaded in color) are not identical, certain nucleotides are particularly common in each position, forming a consensus sequence. In positions where there is no consensus, N represents any of the four nucleotides. The arrows indicate the orientations of the nucleotide sequences.

At least nine different enzymes or proteins (summarized in Table 25–3) participate in the initiation phase of replication. They open the DNA helix at the origin and establish a prepriming complex for subsequent reactions. The crucial component in the initiation process is the DnaA protein. A single complex of four to five DnaA protein molecules binds to the four 9 bp repeats in the origin (Fig. 25–12, step 1), then recognizes and successively denatures the DNA in the region of the three 13 bp repeats, which are rich in A=T pairs (step 2). This process requires ATP and the bacterial histonelike protein HU. The DnaC protein then loads the DnaB protein onto the unwound region. Two ring-shaped hexamers of DnaB, one loaded onto each DNA strand, act as helicases, unwinding the DNA bidirectionally and creating two potential replication forks. If the E. coli single-stranded DNA–binding protein (SSB) and DNA gyrase (DNA topoisomerase II) are now added in vitro, thousands of base pairs are rapidly unwound by the DnaB helicase, proceeding out from the origin. Many molecules of SSB bind cooperatively to single-stranded DNA, stabilizing the separated strands and preventing renaturation while gyrase relieves the topological stress produced by the DnaB helicase. When additional replication proteins are included in the in vitro system, the DNA unwinding mediated by DnaB is coupled to replication, as described below.

Initiation is the only phase of DNA replication that is known to be regulated, and it is regulated such that replication occurs only once in each cell cycle. The mechanism of regulation is not yet well understood, but genetic and biochemical studies have provided a few insights. The timing of replication initiation is affected by DNA methylation and interactions with the bacterial plasma membrane. The oriC DNA is methylated by the Dam methylase (Table 25–3), which methylates the N6 position of adenine within the palindromic sequence (5′)GATC. (Dam is not a biochemical expletive; it stands for DNA adenine methylation.) The oriC region of E. coli is highly enriched in GATC sequences: it has 11 of them in its 245 bp, whereas the average frequency of GATC in the E. coli chromosome as a whole is 1 in 256 bp.
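The GATC enrichment of oriC can be checked with simple arithmetic. If the four bases occurred with equal, independent frequencies, a given 4 bp site would appear once every 4⁴ = 256 bp, so a 245 bp stretch would be expected to contain about one GATC rather than 11. The hypothetical Python helpers below compute that expectation, confirm that GATC is its own reverse complement (the palindrome Dam methylase recognizes on both strands), and scan a sequence for the 9 bp and 13 bp consensus sequences of Figure 25–11, treating N as any base. The equal-base-frequency model and the short example sequences are assumptions for illustration, not E. coli data.

```python
def site_hits(seq: str, pattern: str, max_mismatches: int = 0) -> list[int]:
    """Start positions where `pattern` matches `seq`.

    'N' in the pattern matches any base; at the other positions up to
    `max_mismatches` differences are tolerated, since the oriC repeats are
    not identical to their consensus.
    """
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        mismatches = sum(1 for p, b in zip(pattern, window) if p != "N" and p != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def expected_hits(length: int, site_len: int = 4) -> float:
    """Expected occurrences of one specific site in random sequence, assuming
    equal, independent base frequencies: (1/4)**site_len per start position."""
    return (length - site_len + 1) / 4 ** site_len


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print(4 ** 4)                                       # 256 bp per GATC, on average
print(round(expected_hits(245), 2))                 # 0.95 expected in 245 bp, vs 11 observed
print(reverse_complement("GATC") == "GATC")         # True: GATC is palindromic
print(site_hits("AATTATCCACAGG", "TTATCCACA"))      # [2]: a 9 bp DnaA-binding consensus
print(site_hits("GATCTATTGTTTT", "GATCTNTTNTTTT"))  # [0]: a 13 bp repeat consensus
```

The `max_mismatches` threshold is a free choice of this sketch, not a property of DnaA binding; it simply lets a search find repeats that deviate from the consensus at a few positions.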
FIGURE 25–12 Model for initiation of replication at the E. coli origin, oriC. (1) About 20 DnaA protein molecules, each with a bound ATP, bind at the four 9 bp repeats. The DNA is wrapped around this complex. (2) The three A=T-rich 13 bp repeats are denatured sequentially. (3) Hexamers of the DnaB protein bind to each strand, with the aid of DnaC protein. The DnaB helicase activity further unwinds the DNA in preparation for priming and DNA synthesis.

Immediately after replication, the DNA is hemimethylated: the parent strands have methylated oriC sequences but the newly synthesized strands do not. The hemimethylated oriC sequences are now sequestered for a period by interaction with the plasma membrane (the mechanism is unknown). After a time, oriC is released from the plasma membrane, and it must be fully methylated by Dam methylase before it can again bind DnaA. Regulation of initiation also involves the slow hydrolysis of ATP by DnaA protein, which cycles the protein between active (with bound ATP) and inactive (with bound ADP) forms on a timescale of 20 to 40 minutes.

TABLE 25–3 Proteins Required to Initiate Replication at the E. coli Origin
Protein | Mr | Number of subunits | Function
DnaA protein | 52,000 | 1 | Recognizes ori sequence; opens duplex at specific sites in origin
DnaB protein (helicase) | 300,000 | 6* | Unwinds DNA
DnaC protein | 29,000 | 1 | Required for DnaB binding at origin
HU | 19,000 | 2 | Histonelike protein; DNA-binding protein; stimulates initiation
Primase (DnaG protein) | 60,000 | 1 | Synthesizes RNA primers
Single-stranded DNA–binding protein (SSB) | 75,600 | 4* | Binds single-stranded DNA
RNA polymerase | 454,000 | 5 | Facilitates DnaA activity
DNA gyrase (DNA topoisomerase II) | 400,000 | 4 | Relieves torsional strain generated by DNA unwinding
Dam methylase | 32,000 | 1 | Methylates (5′)GATC sequences at oriC
* Subunits in these cases are identical.

FIGURE 25–13 Synthesis of Okazaki fragments. (a) At intervals, primase synthesizes an RNA primer for a new Okazaki fragment. Note that if we consider the two template strands as lying side by side, lagging strand synthesis formally proceeds in the opposite direction from fork movement. (b) Each primer is extended by DNA polymerase III. (c) DNA synthesis continues until the fragment extends as far as the primer of the previously added Okazaki fragment. A new primer is synthesized near the replication fork to begin the process again.

Elongation

The elongation phase of replication includes two distinct but related operations: leading strand synthesis and lagging strand synthesis. Several enzymes at the replication fork are important to the synthesis of both strands. Parent DNA is first unwound by DNA helicases, and the resulting topological stress is relieved by topoisomerases. Each separated strand is then stabilized by SSB. From this point, synthesis of leading and lagging strands is sharply different. Leading strand synthesis, the more straightforward of the two, begins with the synthesis by primase (DnaG protein) of a short (10 to 60 nucleotide) RNA primer at the replication origin. Deoxyribonucleotides are added to this primer by DNA polymerase III.
Leading strand synthesis then proceeds continuously, keeping pace with the unwinding of DNA at the replication fork.

Lagging strand synthesis, as we have noted, is accomplished in short Okazaki fragments. First, an RNA primer is synthesized by primase and, as in leading strand synthesis, DNA polymerase III binds to the RNA primer and adds deoxyribonucleotides (Fig. 25–13). On this level, the synthesis of each Okazaki fragment seems straightforward, but the reality is quite complex. The complexity lies in the coordination of leading and lagging strand synthesis: both strands are produced by a single asymmetric DNA polymerase III dimer, which is accomplished by looping the DNA of the lagging strand as shown in Figure 25–14, bringing together the two points of polymerization.

FIGURE 25–14 DNA synthesis on the leading and lagging strands. Events at the replication fork are coordinated by a single DNA polymerase III dimer, in an integrated complex with DnaB helicase. This figure shows the replication process already underway (parts (a) through (e) are discussed in the text). Panel (a): continuous synthesis on the leading strand proceeds as DNA is unwound by the DnaB helicase. Panel (b): DNA primase binds to DnaB, synthesizes a new primer, then dissociates. The lagging strand is looped so that DNA synthesis proceeds steadily on both the leading and lagging strand templates at the same time. Red arrows indicate the 3′ end of the two new strands and the direction of DNA synthesis. Black arrows show the direction of movement of the parent DNA through the complex. An Okazaki fragment is being synthesized on the lagging strand.

The synthesis of Okazaki fragments on the lagging strand entails some elegant enzymatic choreography. The DnaB helicase and DnaG primase constitute a functional unit within the replication complex, the primosome. DNA polymerase III uses one set of its core subunits (the core polymerase) to synthesize the leading strand continuously, while the other set of core subunits cycles from one Okazaki fragment to the next on the looped lagging strand. The DnaB helicase unwinds the DNA at the replication fork (Fig. 25–14a) as it travels along the lagging strand template in the 5′→3′ direction. DNA primase occasionally associates with DnaB helicase and synthesizes a short RNA primer (Fig. 25–14b). A new β sliding clamp is then positioned at the primer by the clamp-loading complex of DNA polymerase III (Fig. 25–14c). When synthesis of an Okazaki fragment has been completed, replication halts, and the core subunits of DNA polymerase III dissociate from their β sliding clamp (and from the completed Okazaki fragment) and associate with the new clamp (Fig. 25–14d, e). This initiates synthesis of a new Okazaki fragment. As noted earlier, the entire complex responsible for coordinated DNA synthesis at a replication fork is a replisome. The proteins acting at the replication fork are summarized in Table 25–4.

The replisome promotes rapid DNA synthesis, adding ~1,000 nucleotides/s to each strand (leading and lagging). Once an Okazaki fragment has been completed, its RNA primer is removed and replaced with DNA by DNA polymerase I, and the remaining nick is sealed by DNA ligase (Fig. 25–15).

DNA ligase catalyzes the formation of a phosphodiester bond between a 3′ hydroxyl at the end of one DNA strand and a 5′ phosphate at the end of another strand. The phosphate must be activated by adenylylation.
DNA ligases isolated from viruses and eukaryotes use ATP for this purpose. DNA ligases from bacteria are unusual in that they generally use NAD⁺ (a cofactor that normally functions in hydride transfer reactions; see Fig. 13–15) as the source of the AMP activating group (Fig. 25–16). DNA ligase is another enzyme of DNA metabolism that has become an important reagent in recombinant DNA experiments (see Fig. 9–1).

TABLE 25–4 Proteins at the E. coli Replication Fork
Protein | Mr | Number of subunits | Function
SSB | 75,600 | 4 | Binding to single-stranded DNA
DnaB protein (helicase) | 300,000 | 6 | DNA unwinding; primosome constituent
Primase (DnaG protein) | 60,000 | 1 | RNA primer synthesis; primosome constituent
DNA polymerase III | 791,500 | 17 | New strand elongation
DNA polymerase I | 103,000 | 1 | Filling of gaps; excision of primers
DNA ligase | 74,000 | 1 | Ligation
DNA gyrase (DNA topoisomerase II) | 400,000 | 4 | Supercoiling
Modified from Kornberg, A. (1982) Supplement to DNA Replication, Table S11–2, W. H. Freeman and Company, New York.

FIGURE 25–15 Final steps in the synthesis of lagging strand segments. RNA primers in the lagging strand are removed by the 5′→3′ exonuclease activity of DNA polymerase I and replaced with DNA by the same enzyme. The remaining nick is sealed by DNA ligase. The role of ATP or NAD⁺ is shown in Figure 25–16.

Termination

Eventually, the two replication forks of the circular E. coli chromosome meet at a terminus region containing multiple copies of a 20 bp sequence called Ter (for terminus) (Fig. 25–17a). The Ter sequences are arranged on the chromosome to create a sort of trap that a replication fork can enter but cannot leave. The Ter sequences function as binding sites for a protein called Tus (terminus utilization substance). The Tus-Ter complex can arrest a replication fork from only one direction. Only one Tus-Ter complex functions per replication cycle: the complex first encountered by either replication fork. Given that opposing replication forks generally halt when they collide, Ter sequences do not seem essential, but they may prevent overreplication by one replication fork in the event that the other is delayed or halted by an encounter with DNA damage or some other obstacle.

So, when either replication fork encounters a functional Tus-Ter complex, it halts; the other fork halts when it meets the first (arrested) fork. The final few hundred base pairs of DNA between these large protein complexes are then replicated (by an as yet unknown mechanism), completing two topologically interlinked (catenated) circular chromosomes (Fig. 25–17b). DNA circles linked in this way are known as catenanes. Separation of the catenated circles in E. coli requires topoisomerase IV (a type II topoisomerase). The separated chromosomes then segregate into daughter cells at cell division. The terminal phase of replication of other circular chromosomes, including many of the DNA viruses that infect eukaryotic cells, is similar.
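The one-way trap formed by Tus-Ter complexes can be illustrated with a toy simulation of a circular chromosome. In this hypothetical Python sketch the circle has 100 positions with the origin at 0, a single Ter site at position 60 arrests only the clockwise fork, and a single Ter site at position 40 arrests only the counterclockwise fork. The real chromosome has clusters of Ter sites (Fig. 25–17a); all positions and fork rates here are invented for illustration.

```python
def replicate_circle(length, ter_sites, cw_rate=1, ccw_rate=1):
    """Toy model of bidirectional replication of a circular chromosome with
    polar Tus-Ter barriers (all positions and rates are hypothetical).

    length    -- number of positions on the circle; the origin is position 0
    ter_sites -- maps a position to the fork ("cw" or "ccw") that a Tus-Ter
                 complex there arrests; the other fork passes through freely
    cw_rate, ccw_rate -- positions each fork advances per time step
    Returns (clockwise fork position, counterclockwise fork position, arrests).
    """
    # Unwrapped coordinates: the clockwise fork counts up from 0 and the
    # counterclockwise fork counts down from `length`; they meet when equal.
    pos = {"cw": 0, "ccw": length}
    step = {"cw": 1, "ccw": -1}
    rate = {"cw": cw_rate, "ccw": ccw_rate}
    halted = {"cw": False, "ccw": False}
    arrests = []
    while pos["ccw"] > pos["cw"]:
        moved = False
        for fork in ("cw", "ccw"):
            for _ in range(rate[fork]):
                if halted[fork] or pos["ccw"] == pos["cw"]:
                    break  # fork is arrested, or the two forks have met
                pos[fork] += step[fork]
                moved = True
                # A Tus-Ter complex halts a fork arriving from one direction only.
                if ter_sites.get(pos[fork] % length) == fork:
                    halted[fork] = True
                    arrests.append((fork, pos[fork] % length))
        if not moved:
            break  # neither fork can advance (arrested or stalled)
    return pos["cw"] % length, pos["ccw"] % length, arrests


ter = {40: "ccw", 60: "cw"}
print(replicate_circle(100, ter))             # (50, 50, [])
print(replicate_circle(100, ter, cw_rate=3))  # (60, 60, [('cw', 60)])
```

With equal fork rates the forks meet at position 50 and no Tus-Ter complex is used. When the counterclockwise fork is relatively slow (`cw_rate=3`), the clockwise fork passes the Ter site at 40, is arrested at 60, and the slower fork halts when it reaches the arrested fork, so only one Tus-Ter complex acts, as described above.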
FIGURE 25–16 Mechanism of the DNA ligase reaction. In each of the three steps, one phosphodiester bond is formed at the expense of another. Steps 1 and 2 lead to activation of the 5′ phosphate in the nick. An AMP group is transferred first to a Lys residue on the enzyme and then to the 5′ phosphate in the nick. In step 3, the 3′-hydroxyl group attacks this phosphate and displaces AMP, producing a phosphodiester bond to seal the nick. In the E. coli DNA ligase reaction, AMP is derived from NAD⁺. The DNA ligases isolated from a number of viral and eukaryotic sources use ATP rather than NAD⁺, and they release pyrophosphate rather than nicotinamide mononucleotide (NMN) in step 1.

Bacterial Replication Is Organized in Membrane-Bound Replication Factories

The replication of a circular bacterial chromosome is highly organized. Once bidirectional replication is initiated at the origin, the two replisomes do not travel away from each other along the DNA. Instead, the replisomes are linked together and tethered to one point on the bacterial inner membrane, and the DNA substrate is fed through this "replication factory" (Fig. 25–18a). The tethering point is at the center of the elongated bacterial cell. After initiation, each of the two newly synthesized replication origins is partitioned into one half of the cell, and continuing replication extrudes each new chromosome into that half (Fig. 25–18b).
The elaborate spatial organization of the newly replicated chromosomes is orchestrated and maintained by many proteins, including bacterial homologs of the SMC proteins and topoisomerases (Chapter 24). Once replication is terminated, the cell divides, and the chromosomes sequestered in the two halves of the original cell are accurately partitioned into the daughter cells. When replication commences in the daughter cells, the origin of replication is sequestered in new replication factories formed at a point on the membrane at the center of the cell, and the entire process is repeated.

Replication in Eukaryotic Cells Is More Complex

The DNA molecules in eukaryotic cells are considerably larger than those in bacteria and are organized into complex nucleoprotein structures (chromatin; p. 938). The essential features of DNA replication are the same in eukaryotes and prokaryotes, and many of the protein complexes are functionally and structurally conserved. However, some interesting variations on the general principles discussed above promise new insights into the regulation of replication and its link with the cell cycle.

Origins of replication, called autonomously replicating sequences (ARS) or replicators, have been identified and best studied in yeast. Yeast replicators span ~150 bp and contain several essential conserved sequences. About 400 replicators are distributed among the 16 chromosomes in a haploid yeast genome. Initiation of replication in all eukaryotes requires a multisubunit protein, the origin recognition complex (ORC), which binds to several sequences within the replicator. ORC interacts with and is regulated by a number of other proteins involved in control of the eukaryotic cell cycle. Two other proteins, CDC6 (discovered in a screen for genes affecting the cell division cycle) and CDT1 (Cdc10-dependent transcript 1), bind to ORC and mediate the loading of a heterohexamer of minichromosome maintenance proteins (MCM2 to MCM7).
The MCM complex is a ring-shaped replicative helicase, analogous to the bacterial DnaB helicase. The CDC6 and CDT1 proteins have a role comparable to that of the bacterial DnaC protein, loading the MCM helicase onto the DNA near the replication origin.

The rate of replication fork movement in eukaryotes (~50 nucleotides/s) is only one-twentieth that observed in E. coli. At this rate, replication of an average human chromosome proceeding from a single origin would take more than 500 hours. Replication of human chromosomes in fact proceeds bidirectionally from many origins, spaced 30,000 to 300,000 bp apart. Eukaryotic chromosomes are almost always much larger than bacterial chromosomes, so multiple origins are probably a universal feature in eukaryotic cells.

FIGURE 25–17 Termination of chromosome replication in E. coli. (a) The Ter sequences are positioned on the chromosome in two clusters with opposite orientations. (b) Replication of the DNA separating the opposing replication forks leaves the completed chromosomes joined as catenanes, or topologically interlinked circles. The circles are not covalently linked, but because they are interwound and each is covalently closed, they cannot be separated except by the action of topoisomerases. In E. coli, a type II topoisomerase known as DNA topoisomerase IV plays the primary role in the separation of catenated chromosomes, transiently breaking both DNA strands of one chromosome and allowing the other chromosome to pass through the break.

Like bacteria, eukaryotes have several types of DNA polymerases. Some have been linked to particular functions, such as the replication of mitochondrial DNA. The replication of nuclear chromosomes involves DNA polymerase α, in association with DNA polymerase δ.
DNA polymerase α is typically a multisubunit enzyme with similar structure and properties in all eukaryotic cells. One subunit has a primase activity, and the largest subunit (Mr ~180,000) contains the polymerization activity. However, this polymerase has no proofreading 3′→5′ exonuclease activity, making it unsuitable for high-fidelity DNA replication. DNA polymerase α is believed to function only in the synthesis of short primers (containing either RNA or DNA) for Okazaki fragments on the lagging strand. These primers are then extended by the multisubunit DNA polymerase δ. This enzyme is associated with and stimulated by a protein called proliferating cell nuclear antigen (PCNA; Mr 29,000), found in large amounts in the nuclei of proliferating cells. The three-dimensional structure of PCNA is remarkably similar to that of the β subunit of E. coli DNA polymerase III (Fig. 25–10b), although primary sequence homology is not evident. PCNA has a function analogous to that of the β subunit, forming a circular clamp that greatly enhances the processivity of the polymerase. DNA polymerase δ has a 3′→5′ proofreading exonuclease activity and appears to carry out both leading and lagging strand synthesis in a complex comparable to the dimeric bacterial DNA polymerase III.

Yet another polymerase, DNA polymerase ε, replaces DNA polymerase δ in some situations, such as in DNA repair. DNA polymerase ε may also function at the replication fork, perhaps playing a role analogous to that of the bacterial DNA polymerase I, removing the primers of Okazaki fragments on the lagging strand.

FIGURE 25–18 Chromosome partitioning in bacteria. (a) All replication is carried out at a central replication factory that includes two complete replication forks. (b) The two replicated copies of the bacterial chromosome are extruded from the replication factory into the two halves of the cell, possibly with each newly synthesized origin bound separately to different points on the plasma membrane. Sequestering the two chromosome copies in separate cell halves facilitates their proper segregation at cell division. (Stages shown in panel (b): replication begins; origins separate; cell elongates as replication continues; chromosomes separate; cells divide.)

Many DNA viruses encode their own DNA polymerases, and some of these have become targets for pharmaceuticals. For example, the DNA polymerase of the herpes simplex virus is inhibited by acyclovir, a compound developed by Gertrude Elion (p. 876). Acyclovir consists of guanine attached to an incomplete ribose ring. It is phosphorylated by a virally encoded thymidine kinase; acyclovir binds to this viral enzyme with an affinity 200-fold greater than its binding to the cellular thymidine kinase. This ensures that phosphorylation occurs mainly in virus-infected cells. Cellular kinases convert the resulting acyclo-GMP to acyclo-GTP, which is both an inhibitor and a substrate of DNA polymerases, and which competitively inhibits the herpes DNA polymerase more strongly than cellular DNA polymerases. Because it lacks a 3′ hydroxyl, acyclo-GTP also acts as a chain terminator when incorporated into DNA. Thus viral replication is inhibited at several steps.

Two other protein complexes also function in eukaryotic DNA replication. RPA (replication protein A) is a eukaryotic single-stranded DNA–binding protein, equivalent in function to the E. coli SSB protein. RFC (replication factor C) is a clamp loader for PCNA and facilitates the assembly of active replication complexes. The subunits of the RFC complex have significant sequence similarity to the subunits of the bacterial clamp-loading (γ) complex.
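The fork rates quoted in this section translate into rough replication times. The back-of-the-envelope Python sketch below rests on stated assumptions: every origin fires at the same moment, each origin sends out two forks moving at a constant rate, and the E. coli chromosome is taken to be about 4.6 × 10⁶ bp (a figure not given in this section). It uses ~1,000 nucleotides/s for the E. coli replisome and ~50 nucleotides/s for eukaryotic forks, as in the text; the function name `replication_time_s` is hypothetical.

```python
def replication_time_s(length_bp: float, fork_rate_nt_s: float,
                       n_origins: int = 1) -> float:
    """Seconds to copy `length_bp`, assuming evenly spaced origins that all fire
    at once, each launching two forks (bidirectional) at `fork_rate_nt_s`."""
    forks = 2 * n_origins
    return length_bp / (forks * fork_rate_nt_s)


ECOLI_RATE = 1_000  # nt/s per fork for the E. coli replisome (from the text)
EUK_RATE = 50       # nt/s per fork in eukaryotes (from the text)

print(ECOLI_RATE / EUK_RATE)  # 20.0, i.e. eukaryotic forks move at one-twentieth the rate

# Assumption: a ~4.6 x 10^6 bp E. coli chromosome with one bidirectional origin.
print(replication_time_s(4.6e6, ECOLI_RATE) / 60)  # roughly 38 minutes

# One eukaryotic origin and its two forks covering one interorigin spacing.
print(replication_time_s(30_000, EUK_RATE) / 60)   # 5.0 minutes
print(replication_time_s(300_000, EUK_RATE) / 60)  # 50.0 minutes
```

Under these simplifying assumptions, the twentyfold slower eukaryotic forks are offset by the many origins: with origins 30,000 to 300,000 bp apart, each stretch of DNA between neighboring origins is copied in roughly 5 to 50 minutes.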
The termination of replication on linear eukaryotic chromosomes involves the synthesis of special structures called telomeres at the ends of each chromosome, as discussed in the next chapter.

SUMMARY 25.1 DNA Replication
■ Replication of DNA occurs with very high fidelity and at a designated time in the cell cycle. Replication is semiconservative, each strand acting as template for a new daughter strand. It is carried out in three identifiable phases: initiation, elongation, and termination. The reaction starts at the origin and usually proceeds bidirectionally.
■ DNA is synthesized in the 5′→3′ direction by DNA polymerases. At the replication fork, the leading strand is synthesized continuously in the same direction as replication fork movement; the lagging strand is synthesized discontinuously as Okazaki fragments, which are subsequently ligated.
■ The fidelity of DNA replication is maintained by (1) base selection by the polymerase, (2) a 3′→5′ proofreading exonuclease activity that is part of most DNA polymerases, and (3) specific repair systems for mismatches left behind after replication.
■ Most cells have several DNA polymerases. In E. coli, DNA polymerase III is the primary replication enzyme. DNA polymerase I is responsible for special functions during replication, recombination, and repair.
■ Replication of the E. coli chromosome involves many enzymes and protein factors organized in replication factories, in which template DNA is spooled through two replisomes tethered to the bacterial plasma membrane.
■ Replication is similar in eukaryotic cells, but eukaryotic chromosomes have many replication origins.

25.2 DNA Repair

A cell generally has only one or two sets of genomic DNA. Damaged proteins and RNA molecules can be quickly replaced by using information encoded in the DNA, but DNA molecules themselves are irreplaceable.
Maintaining the integrity of the information in DNA is a cellular imperative, supported by an elaborate set of DNA repair systems. DNA can become damaged by a variety of processes, some spontaneous, others catalyzed by environmental agents (Chapter 8). Replication itself can very occasionally damage the information content in DNA when errors introduce mismatched base pairs (such as G paired with T).

The chemistry of DNA damage is diverse and complex. The cellular response to this damage includes a wide range of enzymatic systems that catalyze some of the most interesting chemical transformations in DNA metabolism. We first examine the effects of alterations in DNA sequence and then consider specific repair systems.

Mutations Are Linked to Cancer

The best way to illustrate the importance of DNA repair is to consider the effects of unrepaired DNA damage (a lesion). The most serious outcome is a change in the base sequence of the DNA, which, if replicated and transmitted to future cell generations, becomes permanent. A permanent change in the nucleotide sequence of DNA is called a mutation. Mutations can involve the replacement of one base pair with another (substitution mutation) or the addition or deletion of one or more base pairs (insertion or deletion mutations). If the mutation affects nonessential DNA or if it has a negligible effect on the function of a gene, it is known as a silent mutation. Rarely, a mutation confers some biological advantage. Most nonsilent mutations, however, are deleterious.

In mammals there is a strong correlation between the accumulation of mutations and cancer. A simple test developed by Bruce Ames measures the potential of a given chemical compound to promote certain easily detected mutations in a specialized bacterial strain (Fig. 25–19). Few of the chemicals that we encounter in daily life score as mutagens in this test.
However, of the compounds known to be carcinogenic from extensive animal trials, more than 90% are also found to be mutagenic in the Ames test. Because of this strong correlation between mutagenesis and carcinogenesis, the Ames test for bacterial mutagens is widely used as a rapid and inexpensive screen for potential human carcinogens.

The genome of a typical mammalian cell accumulates many thousands of lesions during a 24-hour period. However, as a result of DNA repair, fewer than 1 in 1,000 becomes a mutation. DNA is a relatively stable molecule, but in the absence of repair systems, the cumulative effect of many infrequent but damaging reactions would make life impossible.

All Cells Have Multiple DNA Repair Systems

The number and diversity of repair systems reflect both the importance of DNA repair to cell survival and the diverse sources of DNA damage (Table 25–5). Some common types of lesions, such as pyrimidine dimers (see Fig. 8–34), can be repaired by several distinct systems. Many DNA repair processes also appear to be extraordinarily inefficient energetically, an exception to the pattern observed in the metabolic pathways, where every ATP is generally accounted for and used optimally. When the integrity of the genetic information is at stake, the amount of chemical energy invested in a repair process seems almost irrelevant.

FIGURE 25–19 Ames test for carcinogens, based on their mutagenicity. A strain of Salmonella typhimurium having a mutation that inactivates an enzyme of the histidine biosynthetic pathway is plated on a histidine-free medium. Few cells grow. (a) The few small colonies of S. typhimurium that do grow on a histidine-free medium carry spontaneous back-mutations that permit the histidine biosynthetic pathway to operate. Three identical nutrient plates (b), (c), and (d) have been inoculated with an equal number of cells. Each plate then receives a disk of filter paper containing progressively lower concentrations of a mutagen. The mutagen greatly increases the rate of back-mutation and hence the number of colonies. The clear areas around the filter paper indicate where the concentration of mutagen is so high that it is lethal to the cells. As the mutagen diffuses away from the filter paper, it is diluted to sublethal concentrations that promote back-mutation. Mutagens can be compared on the basis of their effect on mutation rate. Because many compounds undergo a variety of chemical transformations after entering a cell, compounds are sometimes tested for mutagenicity after first incubating them with a liver extract. Some substances have been found to be mutagenic only after this treatment.

TABLE 25–5 Types of DNA Repair Systems in E. coli
Repair system | Enzymes/proteins | Type of damage
Mismatch repair | Dam methylase; MutH, MutL, MutS proteins; DNA helicase II; SSB; DNA polymerase III; Exonuclease I; Exonuclease VII; RecJ nuclease; Exonuclease X; DNA ligase | Mismatches
Base-excision repair | DNA glycosylases; AP endonucleases; DNA polymerase I; DNA ligase | Abnormal bases (uracil, hypoxanthine, xanthine); alkylated bases; in some other organisms, pyrimidine dimers
Nucleotide-excision repair | ABC excinuclease; DNA polymerase I; DNA ligase | DNA lesions that cause large structural changes (e.g., pyrimidine dimers)
Direct repair | DNA photolyases | Pyrimidine dimers
Direct repair | O6-Methylguanine-DNA methyltransferase | O6-Methylguanine
Direct repair | AlkB protein | 1-Methylguanine, 3-methylcytosine

DNA repair is possible largely because the DNA molecule consists of two complementary strands. DNA damage in one strand can be removed and accurately replaced by using the undamaged complementary strand as a template. We consider here the principal types of repair systems, beginning with those that repair the rare nucleotide mismatches that are left behind by replication.
Mismatch Repair

Correction of the rare mismatches left after replication in E. coli improves the overall fidelity of replication by an additional factor of 10² to 10³. The mismatches are nearly always corrected to reflect the information in the old (template) strand, so the repair system must somehow discriminate between the template and the newly synthesized strand. The cell accomplishes this by tagging the template DNA with methyl groups to distinguish it from newly synthesized strands. The mismatch repair system of E. coli includes at least 12 protein components (Table 25–5) that function either in strand discrimination or in the repair process itself.

The strand discrimination mechanism has not been worked out for most bacteria or eukaryotes, but is well understood for E. coli and some closely related bacteria. In these prokaryotes, strand discrimination is based on the action of Dam methylase (Table 25–3), which, as you will recall, methylates DNA at the N6 position of all adenines within (5′)GATC sequences. Immediately after passage of the replication fork, there is a short period (a few seconds or minutes) during which the template strand is methylated but the newly synthesized strand is not (Fig. 25–20). The transient unmethylated state of GATC sequences in the newly synthesized strand permits the new strand to be distinguished from the template strand. Replication mismatches in the vicinity of a hemimethylated GATC sequence are then repaired according to the information in the methylated parent (template) strand. Tests in vitro show that if both strands are methylated at a GATC sequence, few mismatches are repaired; if neither strand is methylated, repair occurs but does not favor either strand.
The cell’s methyl-directed mismatch repair system efficiently repairs mismatches up to 1,000 bp from a hemimethylated GATC sequence.

FIGURE 25–20 Methylation and mismatch repair. Methylation of DNA strands can serve to distinguish parent (template) strands from newly synthesized strands in E. coli DNA, a function that is critical to mismatch repair (see Fig. 25–21). The methylation occurs at the N6 of adenines in (5′)GATC sequences. This sequence is a palindrome (see Fig. 8–20), present in opposite orientations on the two strands.
[Figure panels: a fully methylated GATC/CTAG duplex is replicated to give hemimethylated DNA (for a short period following replication, the template strand is methylated and the new strand is not); Dam methylase then acts (after a few minutes the new strand is methylated and the two strands can no longer be distinguished).]

How is the mismatch correction process directed by relatively distant GATC sequences? A mechanism is illustrated in Figure 25–21. MutL protein forms a complex with MutS protein, and the complex binds to all mismatched base pairs (except C–C). MutH protein binds to MutL and to GATC sequences encountered by the MutL-MutS complex.
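The strand-discrimination rules above lend themselves to a small sketch. The Python code below is our own illustration, not a model from the text. It checks that GATC is its own reverse complement (which is why Dam methylase marks the site on both strands), finds GATC sites on one strand, and encodes the in vitro strand-choice results as a function. All function names and the labels "new", "template", and "either" are hypothetical. The text does not describe the case in which only the new strand is methylated; the sketch handles it by extrapolating the rule that the unmethylated strand is the one corrected.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gatc_sites(seq: str) -> list[int]:
    """0-based start positions of every GATC on one strand (the sites Dam methylase targets)."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


def strand_to_correct(template_methylated: bool, new_methylated: bool) -> Optional[str]:
    """Which strand is corrected at a mismatch near a GATC site, following the in vitro results."""
    if template_methylated and new_methylated:
        return None  # fully methylated: few mismatches are repaired
    if template_methylated != new_methylated:
        # hemimethylated: the unmethylated strand is corrected, using the methylated one as template
        return "new" if template_methylated else "template"
    return "either"  # neither strand methylated: repair occurs, but with no strand preference


print(reverse_complement("GATC"))     # GATC
print(gatc_sites("TTGATCAAGATC"))     # [2, 8]
print(strand_to_correct(True, False)) # new
```

Returning `None` for the fully methylated case mirrors the observation that few mismatches are repaired there, rather than claiming that none are.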
DNA on both sides of the mismatch is threaded through the MutL-MutS complex, creating a DNA loop; simultaneous movement of both legs of the loop through the complex is equivalent to the complex moving in both directions at once along the DNA. MutH has a site-specific endonuclease activity that is inactive until the complex encounters a hemimethylated GATC sequence. At this site, MutH catalyzes cleavage of the unmethylated strand on the 5′ side of the G in GATC, which marks the strand for repair. Further steps in the pathway depend on where the mismatch is located relative to this cleavage site (Fig. 25–22).

When the mismatch is on the 5′ side of the cleavage site, the unmethylated strand is unwound and degraded in the 3′→5′ direction from the cleavage site through the mismatch, and this segment is replaced with new DNA. This process requires the combined action of DNA helicase II, SSB, exonuclease I or exonuclease X (both of which degrade strands of DNA in the 3′→5′ direction), DNA polymerase III, and DNA ligase. The pathway for repair of mismatches on the 3′ side of the cleavage site is similar, except that the exonuclease is either exonuclease VII (which degrades single-stranded DNA in the 5′→3′ or 3′→5′ direction) or RecJ nuclease (which degrades single-stranded DNA in the 5′→3′ direction).

Mismatch repair is a particularly expensive process for E. coli in terms of energy expended. The mismatch may be 1,000 bp or more from the GATC sequence. The degradation and replacement of a strand segment of this length require an enormous investment in activated deoxynucleotide precursors to repair a single mismatched base. This again underscores the importance to the cell of genomic integrity.

All eukaryotic cells have several proteins structurally and functionally analogous to the bacterial MutS and MutL (but not MutH) proteins.
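The position-dependent choice of exonuclease in E. coli mismatch repair, and the size of the segment that must be resynthesized, can be sketched as follows. This is our own simplification. The coordinate convention, in which `nick` is the index of the nucleotide just 3′ of the MutH cut, and the function name `excision_plan` are hypothetical. The count returned is a minimum, because the text says removal extends to a point just beyond the mismatch. Each nucleotide counted must be replaced from an activated deoxynucleotide precursor, which is the energetic cost noted above.

```python
def excision_plan(nick: int, mismatch: int) -> tuple[str, tuple[str, str], int]:
    """Sketch of the mismatch-position logic of Fig. 25-22.

    Positions are 0-based indices along the unmethylated strand, numbered 5'->3'.
    `nick` is the index of the nucleotide just 3' of the MutH cut (the G of GATC),
    so the cut lies between nick - 1 and nick. Returns the direction of degradation,
    the two alternative exonucleases, and the minimum number of nucleotides removed
    and resynthesized.
    """
    if mismatch < nick:
        # Mismatch on the 5' side of the cut: degrade 3'->5' from the cut through the mismatch.
        return "3'->5'", ("exonuclease I", "exonuclease X"), nick - mismatch
    # Mismatch on the 3' side of the cut: degrade 5'->3' from the cut through the mismatch.
    return "5'->3'", ("exonuclease VII", "RecJ nuclease"), mismatch - nick + 1


direction, exonucleases, removed = excision_plan(nick=1000, mismatch=0)
print(direction, exonucleases, removed)  # 3'->5' ('exonuclease I', 'exonuclease X') 1000
```

With the mismatch 1,000 nucleotides from the cut, roughly 1,000 nucleotides are removed and replaced to correct a single base.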
Alterations in human genes encoding these MutS and MutL homologs produce some of the most common inherited cancer-susceptibility syndromes (Box 25–1), further demonstrating the value to the organism of DNA repair systems. The main MutS homologs in most eukaryotes, from yeast to humans, are MSH2 (MutS homolog 2), MSH3, and MSH6. Heterodimers of MSH2 and MSH6 generally bind to single base-pair mismatches, and bind less well to slightly longer mispaired loops. In many organisms the longer mismatches (2 to 6 bp) may be bound instead by a heterodimer of MSH2 and MSH3, or are bound by both types of heterodimers in tandem. Homologs of MutL, predominantly a heterodimer of MLH1 and PMS1 (post-meiotic segregation), bind to and stabilize the MSH complexes. Many details of the subsequent events in eukaryotic mismatch repair remain to be worked out. In particular, we do not know the mechanism by which newly synthesized DNA strands are identified, although research has revealed that this strand identification does not involve GATC sequences.

FIGURE 25–21 A model for the early steps of methyl-directed mismatch repair. The proteins involved in this process in E. coli have been purified (see Table 25–5). Recognition of the sequence (5′)GATC and of the mismatch are specialized functions of the MutH and MutS proteins, respectively. The MutL protein forms a complex with MutS at the mismatch. DNA is threaded through this complex such that the complex moves simultaneously in both directions along the DNA until it encounters a MutH protein bound at a hemimethylated GATC sequence. MutH cleaves the unmethylated strand on the 5′ side of the G in this sequence. A complex consisting of DNA helicase II and one of several exonucleases then degrades the unmethylated DNA strand from that point toward the mismatch (see Fig. 25–22).

BOX 25–1 BIOCHEMISTRY IN MEDICINE: DNA Repair and Cancer

Human cancer develops when certain genes that regulate normal cell division (oncogenes and tumor suppressor genes; Chapter 12) fail to function, are activated at the wrong time, or are altered. As a consequence, cells may grow out of control and form a tumor. The genes controlling cell division can be damaged by spontaneous mutation or overridden by the invasion of a tumor virus (Chapter 26). Not surprisingly, alterations in DNA-repair genes that result in an increase in the rate of mutation can greatly increase an individual’s susceptibility to cancer. Defects in the genes encoding the proteins involved in nucleotide-excision repair, mismatch repair, recombinational repair, and error-prone translesion synthesis have all been linked to human cancers. Clearly, DNA repair can be a matter of life and death.

Nucleotide-excision repair requires a larger number of proteins in humans than in bacteria, although the overall pathways are very similar. Genetic defects that inactivate nucleotide-excision repair have been associated with several genetic diseases, the best-studied of which is xeroderma pigmentosum, or XP. Because nucleotide-excision repair is the sole repair pathway for pyrimidine dimers in humans, people with XP are extremely light-sensitive and readily develop sunlight-induced skin cancers. Most people with XP also have neurological abnormalities, presumably because of their inability to repair certain lesions caused by the high rate of oxidative metabolism in neurons. Defects in the genes encoding any of at least seven different protein components of the nucleotide-excision repair system can result in XP, giving rise to seven different genetic groups denoted XPA to XPG. Several of these proteins (notably XPB, XPD, and XPG) also play roles in transcription-coupled base-excision repair of oxidative lesions, described in Chapter 26.

Most microorganisms have redundant pathways for the repair of cyclobutane pyrimidine dimers, making use of DNA photolyase and sometimes base-excision repair as alternatives to nucleotide-excision repair, but humans and other placental mammals do not. This lack of a back-up to nucleotide-excision repair for the removal of pyrimidine dimers has led to speculation that early mammalian evolution involved small, furry, nocturnal animals with little need to repair UV damage. However, mammals do have a pathway for the translesion bypass of cyclobutane pyrimidine dimers, which involves DNA polymerase η (eta). This enzyme preferentially inserts two A residues opposite a T–T pyrimidine dimer, minimizing mutations. People with a genetic condition in which DNA polymerase η function is missing exhibit an XP-like illness known as XP-variant, or XP-V. Clinical manifestations of XP-V are similar to those of the classic XP diseases, although mutation levels are higher when cells are exposed to UV light. Apparently, the nucleotide-excision repair system works in concert with DNA polymerase η in normal human cells, repairing and/or bypassing pyrimidine dimers as needed to keep cell growth and DNA replication going. Exposure to UV light introduces a heavy load of pyrimidine dimers, requiring that some be bypassed by translesion synthesis to keep replication on track. When either system is missing, it is partly compensated for by the other. A loss of polymerase η activity leads to stalled replication forks and bypass of UV lesions by different, and more mutagenic, translesion synthesis (TLS) polymerases. As when other DNA repair systems are absent, the resulting increase in mutations often leads to cancer.

One of the most common inherited cancer-susceptibility syndromes is hereditary nonpolyposis colon cancer, or HNPCC. This syndrome has been traced to defects in mismatch repair. Human and other eukaryotic cells have several proteins analogous to the bacterial MutL and MutS proteins (see Fig. 25–21). Defects in at least five different mismatch repair genes can give rise to HNPCC. The most prevalent are defects in the hMLH1 (human MutL homolog 1) and hMSH2 (human MutS homolog 2) genes. In individuals with HNPCC, cancer generally develops at an early age, with colon cancers being most common.

Most human breast cancer occurs in women with no known predisposition. However, about 10% of cases are associated with inherited defects in two genes, BRCA1 and BRCA2. The BRCA1 and BRCA2 proteins are large (human BRCA1 and BRCA2 are 1834 and 3418 amino acid residues long, respectively). They both interact with a wide range of other proteins involved in transcription, chromosome maintenance, DNA repair, and control of the cell cycle. However, the precise molecular function of BRCA1 and BRCA2 in these various cellular processes is not yet clear. Women with defects in either the BRCA1 or BRCA2 gene have a greater than 80% chance of developing breast cancer.

Base-Excision Repair

Every cell has a class of enzymes called DNA glycosylases that recognize particularly common DNA lesions (such as the products of cytosine and adenine deamination; see Fig. 8–33a) and remove the affected base by cleaving the N-glycosyl bond. This cleavage creates an apurinic or apyrimidinic site in the DNA, commonly referred to as an AP site or abasic site. Each DNA glycosylase is generally specific for one type of lesion.

Uracil DNA glycosylases, for example, found in most cells, specifically remove from DNA the uracil that results from spontaneous deamination of cytosine. Mutant cells that lack this enzyme have a high rate of G≡C to A=T mutations. This glycosylase does not remove uracil residues from RNA or thymine residues from DNA.
The capacity to distinguish thymine from uracil, the product of cytosine deamination, is necessary for the selective repair of the latter and may be one reason why DNA evolved to contain thymine instead of uracil (p. 293).

FIGURE 25–22 Completing methyl-directed mismatch repair. The combined action of DNA helicase II, SSB, and one of four different exonucleases removes a segment of the new strand between the MutH cleavage site and a point just beyond the mismatch. The exonuclease that is used depends on the location of the cleavage site relative to the mismatch. The resulting gap is filled in by DNA polymerase III, and the nick is sealed by DNA ligase (not shown).

Bacteria generally have just one type of uracil DNA glycosylase, whereas humans have at least four types, with different specificities, an indicator of the importance of uracil removal from DNA. The most abundant human uracil glycosylase, UNG, is associated with the human replisome, where it eliminates the occasional U residue inserted in place of a T during replication. The deamination of C residues is 100-fold faster in single-stranded DNA than in double-stranded DNA, and humans have the enzyme hSMUG1, which removes any U residues that occur in single-stranded DNA during replication or transcription. Two other human DNA glycosylases, TDG and MBD4, remove either U or T residues paired with G, generated by deamination of cytosine or 5-methylcytosine, respectively.
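The G≡C to A=T mutations seen in cells lacking uracil DNA glycosylase follow from simple base pairing: U pairs with A, as T does. The Python sketch below is our own illustration. It follows a single base pair through two rounds of semiconservative replication, with and without uracil removal. The function `remove_uracil` is hypothetical: it collapses the glycosylase, AP endonuclease, polymerase, and ligase steps into one replacement. It works only because U is never a normal DNA base, which is the point the text makes about why DNA contains thymine. TDG and MBD4 substrates (T paired with G) are not modeled.

```python
# Watson-Crick partners; U pairs with A just as T does.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}

Pair = tuple[str, str]  # (base on one strand, base on the other)


def replicate(pair: Pair) -> list[Pair]:
    """Semiconservative replication: each old base templates a new partner."""
    top, bottom = pair
    return [(top, PARTNER[top]), (PARTNER[bottom], bottom)]


def descendants(pair: Pair, rounds: int) -> list[Pair]:
    """All base pairs at this position after several rounds of replication."""
    pairs = [pair]
    for _ in range(rounds):
        pairs = [child for p in pairs for child in replicate(p)]
    return pairs


def remove_uracil(pair: Pair) -> Pair:
    """Glycosylase plus AP-site repair, collapsed into one step:
    any U is replaced by the partner of the base on the other strand."""
    top, bottom = pair
    if top == "U":
        top = PARTNER[bottom]
    if bottom == "U":
        bottom = PARTNER[top]
    return (top, bottom)


deaminated = ("G", "U")  # a G-C pair whose C has deaminated to U
print(descendants(deaminated, 2))                 # [('G', 'C'), ('G', 'C'), ('A', 'T'), ('A', 'U')]
print(descendants(remove_uracil(deaminated), 2))  # [('G', 'C'), ('G', 'C'), ('G', 'C'), ('G', 'C')]
```

Without repair, one of the four descendants already carries an A–T pair where the original G–C pair stood.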
Other DNA glycosylases recognize and remove a variety of damaged bases, including formamidopyrimidine and 8-hydroxyguanine (both arising from purine oxidation), hypoxanthine (arising from adenine deamination), and alkylated bases such as 3-methyladenine and 7-methylguanine. Glycosylases that recognize other lesions, including pyrimidine dimers, have also been identified in some classes of organisms. Remember that AP sites also arise from the slow, spontaneous hydrolysis of the N-glycosyl bonds in DNA (see Fig. 8–33b).

Once an AP site has formed, another group of enzymes must repair it. The repair is not made by simply inserting a new base and re-forming the N-glycosyl bond. Instead, the deoxyribose 5′-phosphate left behind is removed and replaced with a new nucleotide. This process begins with AP endonucleases, enzymes that cut the DNA strand containing the AP site. The position of the incision relative to the AP site (5′ or 3′ to the site) varies with the type of AP endonuclease. A segment of DNA including the AP site is then removed, DNA polymerase I replaces the DNA, and DNA ligase seals the remaining nick (Fig. 25–23). In eukaryotes, nucleotide replacement is carried out by specialized polymerases, as described below.

Nucleotide-Excision Repair

DNA lesions that cause large distortions in the helical structure of DNA generally are repaired by the nucleotide-excision system, a repair pathway critical to the survival of all free-living organisms. In nucleotide-excision repair (Fig. 25–24), a multisubunit enzyme hydrolyzes two phosphodiester bonds, one on either side of the distortion caused by the lesion. In E. coli and other prokaryotes, the enzyme system hydrolyzes the fifth phosphodiester bond on the 3′ side and the eighth phosphodiester bond on the 5′ side to generate a fragment of 12 to 13 nucleotides (depending on whether the lesion involves one or two bases).
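The 12-to-13-nucleotide figure for E. coli follows from counting bonds outward from the lesion. The sketch below assumes a counting convention of our own choosing: bond 1 joins the lesion to its nearest neighbor, so cutting bond n on one side releases n − 1 undamaged nucleotides from that side. We picked this convention because it reproduces the E. coli numbers in the text. We have checked it only against those numbers and do not apply it to the human incision sites. The function name is hypothetical.

```python
def excised_length(lesion_nt: int, bond_5prime: int, bond_3prime: int) -> int:
    """Length of the oligonucleotide released by a dual incision.

    Convention assumed here: bonds are counted outward from the lesion, bond 1
    joining the lesion to its nearest neighbour, so cutting bond n on one side
    releases n - 1 undamaged nucleotides from that side along with the lesion.
    """
    if lesion_nt < 1 or bond_5prime < 1 or bond_3prime < 1:
        raise ValueError("lesion size and bond positions must be positive")
    return (bond_5prime - 1) + lesion_nt + (bond_3prime - 1)


# E. coli ABC excinuclease: eighth bond on the 5' side, fifth bond on the 3' side.
for lesion_nt in (1, 2):  # one damaged base, or two (e.g., a pyrimidine dimer)
    print(lesion_nt, excised_length(lesion_nt, bond_5prime=8, bond_3prime=5))
# 1 12
# 2 13
```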
In humans and other eukaryotes, the enzyme system hydrolyzes the sixth phosphodiester bond on the 3′ side and the twenty-second phosphodiester bond on the 5′ side, producing a fragment of 27 to 29 nucleotides. Following the dual incision, the excised oligonucleotides are released from the duplex and the resulting gap is filled, by DNA polymerase I in E. coli and by DNA polymerase ε in humans. DNA ligase seals the nick.

FIGURE 25–23 DNA repair by the base-excision repair pathway. (1) A DNA glycosylase recognizes a damaged base and cleaves between the base and deoxyribose in the backbone. (2) An AP endonuclease cleaves the phosphodiester backbone near the AP site. (3) DNA polymerase I initiates repair synthesis from the free 3′ hydroxyl at the nick, removing (with its 5′→3′ exonuclease activity) a portion of the damaged strand and replacing it with undamaged DNA. (4) The nick remaining after DNA polymerase I has dissociated is sealed by DNA ligase.

FIGURE 25–24 Nucleotide-excision repair in E. coli and humans. The general pathway of nucleotide-excision repair is similar in all organisms. (1) An excinuclease binds to DNA at the site of a bulky lesion and cleaves the damaged DNA strand on either side of the lesion. (2) The DNA segment, of 13 nucleotides (13 mer) or 29 nucleotides (29 mer), is removed with the aid of a helicase. (3) The gap is filled in by DNA polymerase, and (4) the remaining nick is sealed with DNA ligase.

In E. coli, the key enzymatic complex is the ABC excinuclease, which has three subunits, UvrA (Mr 104,000), UvrB (Mr 78,000), and UvrC (Mr 68,000). The term “excinuclease” is used to describe the unique capacity of this enzyme complex to catalyze two specific endonucleolytic cleavages, distinguishing this activity from that of standard endonucleases. A complex of the UvrA and UvrB proteins (A2B) scans the DNA and binds to the site of a lesion. The UvrA dimer then dissociates, leaving a tight UvrB-DNA complex. UvrC protein then binds to UvrB, and UvrB makes an incision at the fifth phosphodiester bond on the 3′ side of the lesion. This is followed by a UvrC-mediated incision at the eighth phosphodiester bond on the 5′ side. The resulting 12 to 13 nucleotide fragment is removed by UvrD helicase. The short gap thus created is filled in by DNA polymerase I and DNA ligase. This pathway is a primary repair route for many types of lesions, including cyclobutane pyrimidine dimers, 6-4 photoproducts (see Fig. 8–34), and several other types of base adducts including benzo[a]pyrene-guanine, which is formed in DNA by exposure to cigarette smoke. The nucleolytic activity of the ABC excinuclease is novel in the sense that two cuts are made in the DNA (Fig. 25–24).

The mechanism of eukaryotic excinucleases is quite similar to that of the bacterial enzyme, although 16 polypeptides with no similarity to the E. coli excinuclease subunits are required for the dual incision. As described in Chapter 26, some of the nucleotide-excision repair and base-excision repair in eukaryotes is closely tied to transcription.
Genetic deficiencies in nucleotide-excision repair in humans give rise to a variety of serious diseases (Box 25–1).

Direct Repair

Several types of damage are repaired without removing a base or nucleotide. The best-characterized example is direct photoreactivation of cyclobutane pyrimidine dimers, a reaction promoted by DNA photolyases. Pyrimidine dimers result from an ultraviolet light–induced reaction, and photolyases use energy derived from absorbed light to reverse the damage (Fig. 25–25). Photolyases generally contain two cofactors that serve as light-absorbing agents, or chromophores. One of the chromophores is always FADH−. In E. coli and yeast, the other chromophore is a folate. The reaction mechanism entails the generation of free radicals. DNA photolyases are not present in the cells of placental mammals (which include humans).

FIGURE 25–25 Repair of pyrimidine dimers with photolyase. Energy derived from absorbed light is used to reverse the photoreaction that caused the lesion. The two chromophores in E. coli photolyase (Mr 54,000), N5,N10-methenyltetrahydrofolylpolyglutamate (MTHFpolyGlu) and FADH−, perform complementary functions. On binding of photolyase to a pyrimidine dimer, repair proceeds as follows. (1) A blue-light photon (300 to 500 nm wavelength) is absorbed by the MTHFpolyGlu, which functions as a photoantenna. (2) The excitation energy passes to FADH− in the active site of the enzyme. (3) The excited flavin (*FADH−) donates an electron to the pyrimidine dimer (shown here in a simplified representation) to generate an unstable dimer radical. (4) Electronic rearrangement restores the monomeric pyrimidines, and (5) the electron is transferred back to the flavin radical to regenerate FADH−.

FIGURE 25–26 Example of how DNA damage results in mutations. (a) The methylation product O6-methylguanine pairs with thymine rather than cytosine. (b) If not repaired, this leads to a G≡C to A=T mutation after replication.

Additional examples can be seen in the repair of nucleotides with alkylation damage. The modified nucleotide O6-methylguanine forms in the presence of alkylating agents and is a common and highly mutagenic lesion (p. 295). It tends to pair with thymine rather than cytosine during replication, and therefore causes G≡C to A=T mutations (Fig. 25–26). Direct repair of O6-methylguanine is carried out by O6-methylguanine-DNA methyltransferase, a protein that catalyzes transfer of the methyl group of O6-methylguanine to one of its own Cys residues. This methyltransferase is not strictly an enzyme, because a single methyl transfer event permanently methylates the protein, making it inactive in this pathway.
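Because each methyltransferase molecule can accept only one methyl group, its repair capacity is stoichiometric rather than catalytic: the number of O6-methylguanine lesions repaired can never exceed the number of protein molecules available. The Python sketch below is our own illustration of that accounting. The class and function names and the molecule counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Methyltransferase:
    """One O6-methylguanine-DNA methyltransferase molecule with a single acceptor Cys."""
    cys_methylated: bool = False

    def accept_methyl(self) -> bool:
        """Take the methyl group from one O6-methylguanine; possible only once."""
        if self.cys_methylated:
            return False  # already methylated: inactive in this pathway
        self.cys_methylated = True
        return True


def unrepaired_after(lesions: int, molecules: int) -> int:
    """Lesions left when each protein molecule can repair at most one lesion."""
    pool = [Methyltransferase() for _ in range(molecules)]
    remaining = lesions
    for protein in pool:
        if remaining == 0:
            break
        if protein.accept_methyl():
            remaining -= 1
    return remaining


single = Methyltransferase()
print(single.accept_methyl(), single.accept_methyl())  # True False
print(unrepaired_after(lesions=5, molecules=3))        # 2
```

A true enzyme would be modeled by letting `accept_methyl` succeed repeatedly. Its one-shot behavior here is what the text means by the protein being "not strictly an enzyme."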
The consumption of an entire protein molecule to correct a single damaged base is another vivid illustration of the priority given to maintaining the integrity of cellular DNA.

[Scheme: O6-methylguanine nucleotide + methyltransferase (active, Cys–SH) → guanine nucleotide + methyltransferase (inactive, Cys–S–CH3)]

A very different but equally direct mechanism is used to repair 1-methyladenine and 3-methylcytosine. The N1 of A residues and the N3 of C residues, ring nitrogens that participate in base pairing, are sometimes methylated when the DNA is single-stranded, and the methylation directly affects proper base pairing. In E. coli, oxidative demethylation of these alkylated nucleotides is mediated by the AlkB protein, a member of the α-ketoglutarate-Fe2+–dependent dioxygenase superfamily (Fig. 25–27). (See Box 4–3 for a description of another member of this enzyme family.)

The Interaction of Replication Forks with DNA Damage Can Lead to Error-Prone Translesion DNA Synthesis

The repair pathways considered to this point generally work only for lesions in double-stranded DNA, the undamaged strand providing the correct genetic information to restore the damaged strand to its original state. However, in certain types of lesions, such as double-strand breaks, double-strand cross-links, or lesions in single-stranded DNA, the complementary strand is itself damaged or is absent. Double-strand breaks and lesions in single-stranded DNA most often arise when a replication fork encounters an unrepaired DNA lesion (Fig. 25–28). Such lesions and DNA cross-links can also result from ionizing radiation and oxidative reactions.

At a stalled bacterial replication fork, there are two avenues for repair. In the absence of a second strand, the information required for accurate repair must come from a separate, homologous chromosome. The repair system thus involves homologous genetic recombination.
This recombinational DNA repair is considered in detail in Section 25.3. Under some conditions, a second repair pathway, error-prone translesion DNA synthesis (often abbreviated TLS), becomes available. When this pathway is active, DNA repair becomes significantly less accurate and a high mutation rate can result. In bacteria, error-prone translesion DNA synthesis is part of a cellular stress response to extensive DNA damage known, appropriately enough, as the SOS response. Some SOS proteins, such as the UvrA and UvrB proteins already described (Table 25–6), are normally present in the cell but are induced to higher levels as part of the SOS response. Additional SOS proteins participate in the pathway for error-prone repair; these include the UmuC and UmuD proteins (“Umu” from unmutable; lack of the umu gene function eliminates error-prone repair). The UmuD protein is cleaved in an SOS-regulated process to a shorter form called UmuD′, which forms a complex with UmuC to create a specialized DNA polymerase (DNA polymerase V) that can replicate past many of the DNA lesions that would normally block replication. Proper base pairing is often impossible at the site of such a lesion, so this translesion replication is error-prone.

Given the emphasis on the importance of genomic integrity throughout this chapter, the existence of a system that increases the rate of mutation may seem incongruous. However, we can think of this system as a desperation strategy. The umuC and umuD genes are fully induced only late in the SOS response, and translesion synthesis initiated by UmuD cleavage is not activated unless the levels of DNA damage are particularly high and all replication forks are blocked.
FIGURE 25–27 Direct repair of alkylated bases by AlkB. The AlkB protein is an α-ketoglutarate-Fe2+–dependent dioxygenase. It catalyzes the oxidative demethylation of 1-methyladenine and 3-methylcytosine residues.
[Reactions shown: 1-methyladenine → adenine and 3-methylcytosine → cytosine, each with α-ketoglutarate + O2 → succinate + CO2 and release of the methyl group as formaldehyde (AlkB, Fe2+).]

The mutations resulting from DNA polymerase V–mediated replication kill some cells and create deleterious mutations in others, but this is the biological price an organism pays to overcome an otherwise insurmountable barrier to replication, as it permits at least a few mutant cells to survive.

In addition to DNA polymerase V, translesion replication requires the RecA protein, SSB, and some subunits derived from DNA polymerase III. Yet another DNA polymerase, DNA polymerase IV, is also induced during the SOS response. Replication by DNA polymerase IV, a product of the dinB gene, is also highly error-prone. The bacterial DNA polymerases IV and V are part of a family of TLS polymerases found in all organisms. These enzymes lack a proofreading exonuclease activity, and the fidelity of replicative base selection can be reduced by a factor of 10², lowering overall replication fidelity to one error in ~1,000 nucleotides. Mammals have many low-fidelity DNA polymerases of the TLS polymerase family.
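The fidelity arithmetic for TLS polymerases in the preceding paragraph can be written out explicitly. One input is an assumption: this section does not give the error rate of replicative base selection on its own, so the sketch takes it to be about one error per 10⁵ nucleotides, the value that, combined with the text's 10² loss of selectivity, yields the stated ~1 error per 1,000 nucleotides. With no proofreading exonuclease, base selection is the only check, so the error rate is simply the baseline multiplied by the loss factor. Names are hypothetical.

```python
def error_rate(base_selection_error: float, fold_reduction_in_fidelity: float = 1.0) -> float:
    """Errors per nucleotide when base selection is the only check (no proofreading exonuclease)."""
    return base_selection_error * fold_reduction_in_fidelity


# ASSUMPTION: a replicative polymerase's base selection alone errs about once per 1e5 nucleotides.
BASE_SELECTION_ERROR = 1e-5

tls = error_rate(BASE_SELECTION_ERROR, fold_reduction_in_fidelity=1e2)  # the text's factor of 10^2
print(f"about 1 error per {round(1 / tls):,} nucleotides")  # about 1 error per 1,000 nucleotides
```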
FIGURE 25–28 DNA damage and its effect on DNA replication. If the replication fork encounters an unrepaired lesion or strand break, replication generally halts and the fork may collapse. A lesion is left behind in an unreplicated, single-stranded segment of the DNA; a strand break becomes a double-strand break. In each case, the damage to one strand cannot be repaired by mechanisms described earlier in this chapter, because the complementary strand required to direct accurate repair is damaged or absent. There are two possible avenues for repair: recombinational DNA repair (described in Fig. 25–37) or, when lesions are unusually numerous, error-prone repair. The latter mechanism involves a novel DNA polymerase (DNA polymerase V, encoded by the umuC and umuD genes) that can replicate, albeit inaccurately, over many types of lesions. The repair mechanism is referred to as error-prone because mutations often result.
[Figure panels: unrepaired lesion → single-stranded DNA → recombinational DNA repair or error-prone repair; unrepaired break → double-strand break → recombinational DNA repair.]

TABLE 25–6 Genes Induced as Part of the SOS Response in E. coli (gene name: protein encoded and/or role in DNA repair)
Genes of known function
  polB (dinA): encodes polymerization subunit of DNA polymerase II, required for replication restart in recombinational DNA repair
  uvrA, uvrB: encode ABC excinuclease subunits UvrA and UvrB
  umuC, umuD: encode DNA polymerase V
  sulA: encodes protein that inhibits cell division, possibly to allow time for DNA repair
  recA: encodes RecA protein, required for error-prone repair and recombinational repair
  dinB: encodes DNA polymerase IV
Genes involved in DNA metabolism, but role in DNA repair unknown
  ssb: encodes single-stranded DNA–binding protein (SSB)
  uvrD: encodes DNA helicase II (DNA-unwinding protein)
  himA: encodes subunit of integration host factor (IHF), involved in site-specific recombination, replication, transposition, regulation of gene expression
  recN: required for recombinational repair
Genes of unknown function
  dinD
  dinF
Note: Some of these genes and their functions are further discussed in Chapter 28.

However, the presence of these low-fidelity TLS polymerases in mammals does not necessarily translate into an unacceptable mutational burden, because most of the enzymes also have specialized functions in DNA repair. DNA polymerase η (eta), for example, is a TLS polymerase found in all eukaryotes. It promotes translesion synthesis primarily across cyclobutane T–T dimers. Few mutations result in this case, because the enzyme preferentially inserts two A residues across from the linked T residues. Several other low-fidelity polymerases, including DNA polymerases β, ι (iota), and λ, have specialized roles in eukaryotic base-excision repair. Each of these enzymes has a 5′-deoxyribose phosphate lyase activity in addition to its polymerase activity. After base removal by a glycosylase and backbone cleavage by an AP endonuclease, these enzymes remove the abasic site (a 5′-deoxyribose phosphate) and fill in the very short gap with their polymerase activity.
The frequency of mutations due to the activity of these low-fidelity polymerases is minimized by the very short lengths (often one nucleotide) of DNA they synthesize.

What emerges from research into cellular DNA repair systems is a picture of a DNA metabolism that maintains genomic integrity with multiple and often redundant systems. In the human genome, more than 130 genes encode proteins dedicated to the repair of DNA. In many cases, the loss of function of one of these proteins results in genomic instability and an increased occurrence of oncogenesis (Box 25–1). These repair systems are often integrated with the DNA replication systems and are complemented by the recombination systems that we turn to next.

SUMMARY 25.2 DNA Repair
■ Cells have many systems for DNA repair. Mismatch repair in E. coli is directed by transient nonmethylation of (5′)GATC sequences on the newly synthesized strand.
■ Base-excision repair systems recognize and repair damage caused by environmental agents (such as radiation and alkylating agents) and by spontaneous reactions of nucleotides. Some repair systems recognize and excise only damaged or incorrect bases, leaving an AP (abasic) site in the DNA. This site is repaired by excision and replacement of the DNA segment containing it.
■ Nucleotide-excision repair systems recognize and remove a variety of bulky lesions and pyrimidine dimers. They excise a segment of the DNA strand including the lesion, leaving a gap that is filled in by DNA polymerase and ligase activities.
■ Some DNA damage is repaired by direct reversal of the reaction causing the damage: pyrimidine dimers are directly converted to monomeric pyrimidines by a photolyase, and the methyl group of O6-methylguanine is removed by a methyltransferase.
■ In bacteria, error-prone translesion DNA synthesis, involving TLS DNA polymerases, occurs in response to very heavy DNA damage. In eukaryotes, similar polymerases have specialized roles in DNA repair that minimize the introduction of mutations.

25.3 DNA Recombination
The rearrangement of genetic information within and among DNA molecules encompasses a variety of processes, collectively placed under the heading of genetic recombination. The practical applications of DNA rearrangements in altering the genomes of increasing numbers of organisms are now being explored (Chapter 9).

Genetic recombination events fall into at least three general classes. Homologous genetic recombination (also called general recombination) involves genetic exchanges between any two DNA molecules (or segments of the same molecule) that share an extended region of nearly identical sequence. The actual sequence of bases is irrelevant, as long as it is similar in the two DNAs. In site-specific recombination, the exchanges occur only at a particular DNA sequence. DNA transposition is distinct from both other classes in that it usually involves a short segment of DNA with the remarkable capacity to move from one location in a chromosome to another. These “jumping genes” were first observed in maize in the 1940s by Barbara McClintock. There is, in addition, a wide range of unusual genetic rearrangements for which no mechanism or purpose has yet been proposed. Here we focus on the three general classes.

The functions of genetic recombination systems are as varied as their mechanisms. They include roles in specialized DNA repair systems, specialized activities in DNA replication, regulation of expression of certain genes, facilitation of proper chromosome segregation during eukaryotic cell division, maintenance of genetic diversity, and implementation of programmed genetic rearrangements during embryonic development. In most cases, genetic recombination is closely integrated with other processes in DNA metabolism, and this becomes a theme of our discussion.
Barbara McClintock, 1902–1992

Homologous Genetic Recombination Has Several Functions
In bacteria, homologous genetic recombination is primarily a DNA repair process and in this context (as noted in Section 25.2) is referred to as recombinational DNA repair. It is usually directed at the reconstruction of replication forks stalled at the site of DNA damage. Homologous genetic recombination can also occur during conjugation (mating), when chromosomal DNA is transferred from a donor to a recipient bacterial cell. Recombination during conjugation, although rare in wild bacterial populations, contributes to genetic diversity.

In eukaryotes, homologous genetic recombination can have several roles in replication and cell division, including the repair of stalled replication forks. Recombination occurs with the highest frequency during meiosis, the process by which diploid germ-line cells with two sets of chromosomes divide to produce haploid gametes (sperm cells or ova in higher eukaryotes), each gamete having only one member of each chromosome pair (Fig. 25–29). Meiosis begins with replication of the DNA in the germ-line cell so that each DNA molecule is present in four copies. The cell then goes through two rounds of cell division without an intervening round of DNA replication. This reduces the DNA content to the haploid level in each gamete.

After the DNA is replicated, the resulting sister chromatids remain associated at their centromeres. During prophase of the first meiotic division, each set of four homologous chromatids exists as two pairs of sister chromatids. Genetic information is now exchanged between the closely associated homologous chromatids by homologous genetic recombination, a process involving the breakage and rejoining of DNA (Fig. 25–30). This exchange, also referred to as crossing over, can be observed with the light microscope.
Crossing over links the two pairs of sister chromatids together at points called chiasmata (singular, chiasma).

FIGURE 25–29 Meiosis in eukaryotic germ-line cells. The chromosomes of a hypothetical diploid germ-line cell (six chromosomes; three homologous pairs) replicate and are held together at their centromeres. Each replicated double-stranded DNA molecule is called a chromatid (sister chromatid). In prophase I, just before the first meiotic division, the three homologous sets of chromatids align to form tetrads, held together by covalent links at homologous junctions (chiasmata). Crossovers occur within the chiasmata (see Fig. 25–30). These transient associations between homologs ensure that the two tethered chromosomes segregate properly in the next step, when they migrate toward opposite poles of the dividing cell in the first meiotic division. The products of this division are two daughter cells, each with three pairs of chromatids. The pairs now line up across the equator of the cell in preparation for separation of the chromatids (now called chromosomes). The second meiotic division produces four haploid daughter cells that can serve as gametes. Each has three chromosomes, half the number of the diploid germ-line cell. The chromosomes have resorted and recombined.

Crossing over effectively links together all four homologous chromatids, a linkage that is essential to the proper segregation of chromosomes in the subsequent meiotic cell divisions. Crossing over is not an entirely random process, and “hot spots” have been identified on many eukaryotic chromosomes.
However, the assumption that crossing over can occur with equal probability at almost any point along the length of two homologous chromosomes remains a reasonable approximation in many cases, and it is this assumption that permits the genetic mapping of genes. The frequency of homologous recombination in any region separating two points on a chromosome is roughly proportional to the distance between the points, and this allows determination of the relative positions of and distances between different genes.

Homologous recombination thus serves at least three identifiable functions: (1) it contributes to the repair of several types of DNA damage; (2) it provides, in eukaryotic cells, a transient physical link between chromatids that promotes the orderly segregation of chromosomes at the first meiotic cell division; and (3) it enhances genetic diversity in a population.

Recombination during Meiosis Is Initiated with Double-Strand Breaks
A likely pathway for homologous recombination during meiosis is outlined in Figure 25–31a. The model has four key features. First, homologous chromosomes are aligned. Second, a double-strand break in a DNA molecule is enlarged by an exonuclease, leaving a single-strand extension with a free 3′-hydroxyl group at the broken end (step 1). Third, the exposed 3′ ends invade the intact duplex DNA, and this is followed by branch migration (Fig. 25–32) and/or replication to create a pair of crossover structures, called Holliday junctions (Fig. 25–31a, steps 2 to 4). Fourth, cleavage of the two crossovers creates two complete recombinant products (step 5).

In this double-strand break repair model for recombination, the 3′ ends are used to initiate the genetic exchange. Once a 3′ end is paired with the complementary strand on the intact homolog, a region of hybrid DNA is created that contains complementary strands from two different parent DNAs (the product of step 2 in Fig. 25–31a).
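Step 1 of this model can be made concrete with a minimal Python sketch of resection at a double-strand break. It is a toy model built on our own assumptions and does not describe any real enzyme. The duplex is a pair of aligned strings, with the top strand read 5′→3′ from left to right. The helper names `make_duplex` and `break_and_resect` are hypothetical. The 3′-ended strands are left completely intact, whereas the text says only that they are degraded less than the 5′-ended strands.

```python
# Toy model of step 1 of the double-strand break repair model (assumptions:
# aligned-string representation, hypothetical helper names, 3' ends untouched).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def make_duplex(top: str) -> tuple[str, str]:
    """Return (top, bottom): `top` reads 5'->3' left to right; `bottom` is its
    complement aligned beneath it, so `bottom` reads 3'->5' left to right."""
    return top, top.translate(COMPLEMENT)


def break_and_resect(top: str, bottom: str, site: int, resect: int):
    """Cut both strands before position `site`, then trim `resect` nucleotides
    from each 5' end at the break ('-' marks degraded DNA). The 3' ends are
    kept, so each fragment carries a 3' single-strand extension."""
    if not 0 <= resect <= site <= len(top) - resect:
        raise ValueError("resection must fit within both fragments")
    # Left fragment: the top strand ends 3' at the break and is kept;
    # the bottom strand ends 5' at the break and is degraded.
    left = (top[:site], bottom[: site - resect] + "-" * resect)
    # Right fragment: the top strand ends 5' at the break and is degraded;
    # the bottom strand ends 3' at the break and is kept.
    right = ("-" * resect + top[site + resect:], bottom[site:])
    return left, right


left, right = break_and_resect(*make_duplex("ATGCATGCATGC"), site=6, resect=3)
print(left[0], right[0])  # ATGCAT ---TGC
print(left[1], right[1])  # TAC--- CGTACG
```

Each fragment now ends in an unpaired 3′ extension. On the left fragment it is the last three bases of the top strand; on the right fragment it is the first three bases of the bottom strand. This is the kind of end that pairs with the intact homolog in step 2.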
Each of the 3′ ends can then act as a primer for DNA replication. The structures thus formed, Holliday intermediates (Fig. 25–31b), are a feature of homologous genetic recombination pathways in all organisms.

Homologous recombination can vary in many details from one species to another, but most of the steps outlined above are generally present in some form. There are two ways to cleave, or “resolve,” the Holliday intermediate so that the two recombinant products carry genes in the same linear order as in the substrates, the original unrecombined chromosomes (step 5 of Fig. 25–31a). If cleaved one way, the DNA flanking the region containing the hybrid DNA is not recombined; if cleaved the other way, the flanking DNA is recombined. Both outcomes are observed in vivo in eukaryotes and prokaryotes.

FIGURE 25–30 Crossing over. (a) Crossing over often produces an exchange of genetic material. (b) The homologous chromosomes of a grasshopper are shown during prophase I of meiosis. Many points of joining (chiasmata) are evident between the two homologous pairs of chromatids. These chiasmata are the physical manifestation of prior homologous recombination (crossing over) events.

The homologous recombination illustrated in Figure 25–31 is a very elaborate process with subtle molecular consequences for the generation of genetic diversity. To understand how this process contributes to diversity, we should keep in mind that the two homologous chromosomes that undergo recombination are not necessarily identical. The linear array of genes may be the same, but the base sequences in some of the genes may differ slightly (in different alleles).
In a human, for example, one chromosome may contain the allele for hemoglobin A (normal hemoglobin) while the other contains the allele for hemoglobin S (the sickle-cell mutation). The difference may consist of no more than one base pair among millions.

FIGURE 25–31 Recombination during meiosis. (a) Model of double-strand break repair for homologous genetic recombination. The two homologous chromosomes involved in this recombination event have similar sequences. Each of the two genes shown (gene A and gene B) has different alleles on the two chromosomes. The DNA strands and alleles are colored differently so that their fate is evident. The steps are described in the text and annotated in the figure as follows. A double-strand break in one of two homologs is converted to a double-strand gap by the action of exonucleases. Strands with 3′ ends are degraded less than those with 5′ ends, producing 3′ single-strand extensions. An exposed 3′ end pairs with its complement in the intact homolog; the other strand of the duplex is displaced. The invading 3′ end is extended by DNA polymerase plus branch migration, eventually generating a DNA molecule with two crossovers called Holliday intermediates. Further DNA replication replaces the DNA missing from the site of the original double-strand break. Cleavage of the Holliday intermediates by specialized nucleases generates either of the two recombination products (product set 1 or product set 2). In product set 2, the DNA on either side of the region undergoing repair is recombined. (b) A Holliday intermediate formed between two bacterial plasmids in vivo, as seen with the electron microscope. The intermediates are named for Robin Holliday, who first proposed their existence in 1964.
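The two ways of resolving a Holliday intermediate, and their effect on which alleles end up on the same chromosome, can be tabulated with a short Python sketch. It is a bookkeeping illustration under our own assumptions. Each homolog is reduced to its alleles at gene A and gene B, which flank the hybrid-DNA region as in Fig. 25–31a. The hybrid DNA itself is ignored, and the function names are hypothetical.

```python
# Bookkeeping sketch: track only the alleles flanking the hybrid-DNA region.
Homolog = tuple[str, str]  # (allele at gene A, allele at gene B)


def resolve_holliday(h1: Homolog, h2: Homolog, flanks_recombined: bool):
    """Return the two products of resolving a Holliday intermediate.

    flanks_recombined=False: flanking DNA keeps its original partners
    (product set 1 in Fig. 25-31a).
    flanks_recombined=True: flanking DNA is exchanged (product set 2).
    """
    (a1, b1), (a2, b2) = h1, h2
    if flanks_recombined:
        return (a1, b2), (a2, b1)
    return (a1, b1), (a2, b2)


def both_resolutions(h1: Homolog, h2: Homolog) -> dict[str, tuple[Homolog, Homolog]]:
    """Enumerate the two possible cleavage outcomes for one intermediate."""
    return {
        "product set 1": resolve_holliday(h1, h2, flanks_recombined=False),
        "product set 2": resolve_holliday(h1, h2, flanks_recombined=True),
    }


for label, products in both_resolutions(("A", "B"), ("a", "b")).items():
    print(label, products)
# product set 1 (('A', 'B'), ('a', 'b'))
# product set 2 (('A', 'b'), ('a', 'B'))
```

In both outcomes the gene order is unchanged. Only product set 2 creates new combinations of alleles (A with b, a with B).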
Homologous recombination does not change the linear array of genes, but it can determine which alleles become linked together on a single chromosome.

Recombination Requires a Host of Enzymes and Other Proteins
Enzymes that promote various steps of homologous recombination have been isolated from both prokaryotes and eukaryotes. In E. coli, the recB, recC, and recD genes encode the RecBCD enzyme, which has both helicase and nuclease activities. The RecA protein promotes all the central steps in the homologous recombination process: the pairing of two DNAs, formation of Holliday intermediates, and branch migration (as described below). The RuvA and RuvB proteins (repair of UV damage) form a complex that binds to Holliday intermediates, displaces RecA protein, and promotes branch migration at higher rates than does RecA. Nucleases that specifically cleave Holliday intermediates, often called resolvases, have been isolated from bacteria and yeast. The RuvC protein is one of at least two such nucleases in E. coli.

The RecBCD enzyme binds to linear DNA at a free (broken) end and moves inward along the double helix, unwinding and degrading the DNA in a reaction coupled to ATP hydrolysis (Fig. 25–33). The activity of the enzyme is altered when it interacts with a sequence referred to as chi, (5′)GCTGGTGG. From that point, degradation of the strand with a 3′ terminus is greatly reduced, but degradation of the 5′-terminal strand is increased. This process creates a single-stranded DNA with a 3′ end, which is used during subsequent steps in recombination (Fig. 25–31). The 1,009 chi sequences scattered throughout the E. coli genome enhance the frequency of recombination about five- to tenfold within 1,000 bp of the chi site. The enhancement declines as the distance from the site increases. Sequences that enhance recombination frequency have also been identified in several other organisms.
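Locating chi sites in a sequence is a simple string search. The following Python sketch uses hypothetical helper names. It reports matches to (5′)GCTGGTGG on the given strand and on its complement, with positions given in the given strand's coordinates. It only finds the sequence; it does not model how RecBCD recognizes chi. The `near_chi` function is a deliberately crude proxy for the 1,000 bp window mentioned above and ignores the decline of enhancement with distance.

```python
CHI = "GCTGGTGG"  # chi as given in the text, written 5'->3'
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return seq.translate(_PAIR)[::-1]


def find_chi(seq: str) -> dict[str, list[int]]:
    """0-based start positions of chi on the given strand ('top') and on its
    complement ('bottom'), both reported in top-strand coordinates."""
    seq = seq.upper()
    rc = reverse_complement(CHI)  # chi on the other strand, read on top
    starts = range(len(seq) - len(CHI) + 1)
    return {
        "top": [i for i in starts if seq.startswith(CHI, i)],
        "bottom": [i for i in starts if seq.startswith(rc, i)],
    }


def near_chi(position: int, hits: dict[str, list[int]], window: int = 1000) -> bool:
    """Crude proxy: True if `position` lies within `window` bp of any chi start."""
    return any(abs(position - h) <= window for strand in hits.values() for h in strand)


hits = find_chi("AAAAGCTGGTGGTTTTCCACCAGCAA")
print(hits)  # {'top': [4], 'bottom': [16]}
```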
FIGURE 25–32 Branch migration. When a template strand pairs with two different complementary strands, a branch is formed at the point where the three complementary strands meet. The branch “migrates” when base pairing to one of the two complementary strands is broken and replaced with base pairing to the other complementary strand. In the absence of an enzyme to direct it, this process can move the branch spontaneously in either direction. Spontaneous branch migration is blocked wherever one of the otherwise complementary strands has a sequence nonidentical to the other strand.

FIGURE 25–33 Helicase and nuclease activities of the RecBCD enzyme. Entering at a double-stranded end, RecBCD unwinds and degrades the DNA until it encounters a chi sequence. On reaching chi, nuclease activity on the strand with the 3′ end is suppressed, while the other strand continues to be degraded. The interaction with chi thus alters the activity of RecBCD so that it generates a single-stranded DNA with a 3′ end, suitable for subsequent steps in recombination. Movement of the enzyme requires ATP hydrolysis. This enzyme is believed to help initiate homologous genetic recombination in E. coli. It is also involved in the repair of double-strand breaks at collapsed replication forks.

RecA is unusual among the proteins of DNA metabolism in that its active form is an ordered, helical filament of up to several thousand RecA monomers that assemble cooperatively on DNA (Fig. 25–34). This filament normally forms on single-stranded DNA, such as that produced by the RecBCD enzyme.
The filament will also form on a duplex DNA with a single-strand gap; in this case, the first RecA monomers bind to the single-stranded DNA in the gap, after which the assembled filament rapidly envelops the neighboring duplex. The RecF, RecO, and RecR proteins regulate the assembly and disassembly of RecA filaments.

A useful model to illustrate the recombination activities of the RecA filament is the in vitro DNA strand exchange reaction (Fig. 25–35). A single strand of DNA is first bound by RecA to establish the nucleoprotein filament. The RecA filament then takes up a homologous duplex DNA and aligns it with the bound single strand. Strands are then exchanged between the two DNAs to create hybrid DNA. The exchange occurs at a rate of 6 bp/s and progresses in the 5′→3′ direction relative to the single-stranded DNA within the RecA filament. This reaction can involve either three or four strands (Fig. 25–35); in the latter case, a Holliday intermediate forms during the process.

As the duplex DNA is incorporated within the RecA filament and aligned with the bound single-stranded DNA over regions of hundreds of base pairs, one strand of the duplex switches pairing partners (Fig. 25–36, step 2). Because DNA is a helical structure, continued strand exchange requires an ordered rotation of the two aligned DNAs. This brings about a spooling action (steps 3 and 4) that shifts the branch point along the helix. ATP is hydrolyzed by RecA protein during this reaction.

Once a Holliday intermediate has formed, a host of enzymes (topoisomerases, the RuvAB branch migration protein, a resolvase, other nucleases, DNA polymerase I or III, and DNA ligase) is required to complete recombination. The RuvC protein (Mr 20,000) of E. coli cleaves Holliday intermediates to generate full-length, unbranched chromosome products.

FIGURE 25–34 RecA. (a) Nucleoprotein filament of RecA protein on single-stranded DNA, as seen with the electron microscope. The striations indicate the right-handed helical structure of the filament. (b) Surface contour model of a 24-subunit RecA filament. The filament has six subunits per turn. One subunit is colored red to provide perspective (derived from PDB ID 2REB).

FIGURE 25–35 DNA strand-exchange reactions promoted by RecA protein in vitro. Strand exchange involves the separation of one strand of a duplex DNA from its complement and transfer of the strand to an alternative complementary strand to form a new duplex (heteroduplex) DNA. The transfer forms a branched intermediate. Formation of the final product depends on branch migration, which is facilitated by RecA. The reaction can involve three strands (left) or a reciprocal exchange between two homologous duplexes, four strands in all (right). RecA protein binds to single-stranded DNA (left) or to a circular duplex DNA with a single-strand gap (right). The complementary strand of a homologous linear duplex DNA pairs with a circular single strand. The other linear strand is displaced (left) or pairs with its complement in the circular duplex to yield a Holliday structure (right). Continued branch migration yields a circular duplex with a nick and a displaced linear strand (left) or a partially single-stranded linear duplex (right). When four strands are involved, the branched intermediate that results is a Holliday intermediate. RecA protein promotes the branch-migration phases of these reactions, using energy derived from ATP hydrolysis.

All Aspects of DNA Metabolism Come Together to Repair Stalled Replication Forks
Like all cells, bacteria sustain high levels of DNA damage even under normal growth conditions.
Most DNA lesions are repaired rapidly by base-excision repair, nucleotide-excision repair, and the other pathways described earlier. Nevertheless, almost every bacterial replication fork encounters an unrepaired DNA lesion or break at some point in its journey from the replication origin to the terminus (Fig. 25–28). DNA polymerase III cannot proceed past many types of DNA lesions, and these encounters tend to leave the lesion in a single-strand gap. An encounter with a DNA strand break creates a double-strand break. Both situations require recombinational DNA repair (Fig. 25–37). Under normal growth conditions, stalled replication forks are reactivated by an elaborate repair pathway encompassing recombinational DNA repair, the restart of replication, and the repair of any lesions left behind. All aspects of DNA metabolism come together in this process.

After a replication fork has been halted, it can be restored by at least two major paths, both of which require the RecA protein. The repair pathway for lesion-containing DNA gaps also requires the RecF, RecO, and RecR proteins. Repair of double-strand breaks requires the RecBCD enzyme (Fig. 25–37). Additional recombination steps are followed by a process called origin-independent restart of replication, in which the replication fork reassembles with the aid of a complex of seven proteins (PriA, B, and C, and DnaB, C, G, and T). This complex, originally discovered as a component required for the replication of φX174 DNA in vitro, is now termed the replication restart primosome. Restart of the replication fork also requires DNA polymerase II, in a role not yet defined; this polymerase II activity gives way to DNA polymerase III for the extensive replication generally required to complete the chromosome.

The repair of stalled replication forks entails a coordinated transition from replication to recombination and back to replication.
The recombination steps function to fill the DNA gap or rejoin the broken DNA branch to recreate the branched DNA structure at the replication fork. Lesions left behind in what is now duplex DNA are repaired by pathways such as base-excision or nucleotide-excision repair. Thus a wide range of enzymes encompassing every aspect of DNA metabolism ultimately takes part in the repair of a stalled replication fork. This type of repair process is clearly a primary function of the homologous recombination system of every cell, and defects in recombinational DNA repair play an important role in human disease (Box 25–1).

FIGURE 25–36 Model for DNA strand exchange mediated by RecA protein. A three-strand reaction is shown. The balls representing RecA protein are undersized relative to the thickness of DNA to clarify the fate of the DNA strands. (1) RecA protein forms a filament on the single-stranded DNA. (2) A homologous duplex incorporates into this complex. (3) As spooling shifts the three-stranded region from left to right, one of the strands in the duplex is transferred to the single strand originally bound in the filament. The other strand of the duplex is displaced, and a new duplex forms within the filament. As rotation continues (4 and 5), the displaced strand separates entirely. In this model, hydrolysis of ATP by RecA protein rotates the two DNA molecules relative to each other and thus directs the strand exchange from left to right as shown.

FIGURE 25–37 Models for recombinational DNA repair of stalled replication forks. The replication fork collapses on encountering a DNA lesion (left) or strand break (right). Recombination enzymes promote the DNA strand transfers needed to repair the branched DNA structure at the replication fork. A lesion in a single-strand gap is repaired in a reaction requiring the RecF, RecO, and RecR proteins. Double-strand breaks are repaired in a pathway requiring the RecBCD enzyme. Both pathways require RecA. Recombination intermediates are processed by additional enzymes (e.g., RuvA, RuvB, and RuvC, which process Holliday intermediates). Lesions in double-stranded DNA are repaired by nucleotide-excision repair or other pathways. The replication fork re-forms with the aid of enzymes catalyzing origin-independent replication restart, and chromosomal replication is completed. The overall process requires an elaborate coordination of all aspects of bacterial DNA metabolism.

Site-Specific Recombination Results in Precise DNA Rearrangements
Homologous genetic recombination can involve any two homologous sequences. The second general type of recombination, site-specific recombination, is a very different type of process: recombination is limited to specific sequences. Recombination reactions of this type occur in virtually every cell, filling specialized roles that vary greatly from one species to another. Examples include regulation of the expression of certain genes and promotion of programmed DNA rearrangements in embryonic development or in the replication cycles of some viral and plasmid DNAs.
Each site-specific recombination system consists of an enzyme called a recombinase and a short (20 to 200 bp), unique DNA sequence where the recombinase acts (the recombination site). One or more auxiliary proteins may regulate the timing or outcome of the reaction.

In vitro studies of many site-specific recombination systems have elucidated some general principles, including the fundamental reaction pathway (Fig. 25–38a). A separate recombinase recognizes and binds to each of two recombination sites on two different DNA molecules or within the same DNA. One DNA strand in each site is cleaved at a specific point within the site, and the recombinase becomes covalently linked to the DNA at the cleavage site through a phosphotyrosine (or phosphoserine) bond (step 1). The transient protein-DNA linkage conserves the energy of the phosphodiester bond that is lost in cleaving the DNA, so high-energy cofactors such as ATP are unnecessary in subsequent steps. The cleaved DNA strands are rejoined to new partners to form a Holliday intermediate, with new phosphodiester bonds created at the expense of the protein-DNA linkage (step 2). To complete the reaction, the process must be repeated at a second point within each of the two recombination sites (steps 3 and 4). In some systems, both strands of each recombination site are cut concurrently and rejoined to new partners without the Holliday intermediate. The exchange is always reciprocal and precise, regenerating the recombination sites when the reaction is complete. We can view a recombinase as a site-specific endonuclease and ligase in one package.

The sequences of the recombination sites recognized by site-specific recombinases are partially asymmetric (nonpalindromic), and the two recombining sites align in the same orientation during the recombinase reaction. The outcome depends on the location and orientation of the recombination sites (Fig. 25–39).
If the two sites are on the same DNA molecule, the reaction either inverts or deletes the intervening DNA, depending on whether the recombination sites have the opposite or the same orientation, respectively. If the sites are on different DNAs, the recombination is intermolecular; if one or both DNAs are circular, the result is an insertion. Some recombinase systems are highly specific for one of these reaction types and act only on sites with particular orientations.

FIGURE 25–38 A site-specific recombination reaction. (a) The reaction shown here is for a common class of site-specific recombinases called integrase-class recombinases (named after bacteriophage λ integrase, the first recombinase characterized). The reaction is carried out within a tetramer of identical subunits. Recombinase subunits bind to a specific sequence, often called simply the recombination site. (1) One strand in each DNA is cleaved at particular points within the sequence. The nucleophile is the OH group of an active-site Tyr residue, and the product is a covalent phosphotyrosine link between protein and DNA. (2) The cleaved strands join to new partners, producing a Holliday intermediate. Steps (3) and (4) complete the reaction by a process similar to the first two steps. The original sequence of the recombination site is regenerated after recombining the DNA flanking the site. These steps occur within a complex of multiple recombinase subunits that sometimes includes other proteins not shown here. (b) A surface contour model of a four-subunit integrase-class recombinase called the Cre recombinase, bound to a Holliday intermediate (shown with light blue and dark blue helix strands). The protein has been rendered transparent so that the bound DNA is visible (derived from PDB ID 3CRX).

The first site-specific recombination system studied in vitro was that encoded by bacteriophage λ. When λ phage DNA enters an E. coli cell, a complex series of regulatory events commits the DNA to one of two fates. The λ DNA either replicates and produces more bacteriophages (destroying the host cell) or integrates into the host chromosome, replicating passively along with the chromosome for many cell generations. Integration is accomplished by a phage-encoded recombinase (λ integrase) that acts at recombination sites on the phage and bacterial DNAs, the attachment sites attP and attB, respectively (Fig. 25–40). The role of site-specific recombination in regulating gene expression is considered in Chapter 28.

FIGURE 25–39 Effects of site-specific recombination. The outcome of site-specific recombination depends on the location and orientation of the recombination sites (red and green) in a double-stranded DNA molecule. Orientation here (shown by arrowheads) refers to the order of nucleotides in the recombination site, not the 5′→3′ direction. (a) Recombination sites with opposite orientation in the same DNA molecule. The result is an inversion. (b) Recombination sites with the same orientation, either on one DNA molecule, producing a deletion, or on two DNA molecules, producing an insertion.
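The dependence of the outcome on the location and orientation of the sites (Fig. 25–39) can be captured in a small Python model. This is a sketch built on simplifying assumptions of our own. A DNA molecule is a list of (name, orientation) elements. Both recombination sites are represented by one identical, hypothetical label `"site"`, although real partners such as attP and attB differ. The function names are illustrative and are not part of any library.

```python
Element = tuple[str, int]  # (name, orientation): +1 or -1 relative to the drawing
SITE = "site"  # hypothetical label for a recombination site


def flip(segment: list[Element]) -> list[Element]:
    """Invert a segment: reverse the element order and each element's orientation."""
    return [(name, -o) for name, o in reversed(segment)]


def site_positions(dna: list[Element]) -> list[int]:
    return [i for i, (name, _) in enumerate(dna) if name == SITE]


def recombine_same_molecule(dna: list[Element]):
    """Recombine two sites on one linear molecule.

    Opposite orientation -> inversion of the intervening DNA.
    Same orientation -> deletion; the intervening DNA leaves as a circle
    carrying one copy of the site.
    """
    positions = site_positions(dna)
    if len(positions) != 2:
        raise ValueError("expected exactly two recombination sites")
    i, j = positions
    left, middle, right = dna[:i], dna[i + 1 : j], dna[j + 1 :]
    if dna[i][1] != dna[j][1]:
        return "inversion", left + [dna[i]] + flip(middle) + [dna[j]] + right
    return "deletion", (left + [dna[i]] + right, [dna[j]] + middle)


def insert_circle(dna: list[Element], circle: list[Element]) -> list[Element]:
    """Intermolecular recombination between one site on `dna` and one site on a
    circular molecule: the circle is opened at its site and inserted."""
    (i,) = site_positions(dna)
    (k,) = site_positions(circle)
    if circle[k][1] != dna[i][1]:
        # A circle can be read from either strand; flip it so the sites align.
        circle = flip(circle)
        (k,) = site_positions(circle)
    opened = circle[k:] + circle[:k]  # starts at the circle's site
    return dna[:i] + opened + [dna[i]] + dna[i + 1 :]


kind, product = recombine_same_molecule([("L", 1), (SITE, 1), ("gene", 1), (SITE, -1), ("R", 1)])
print(kind, product)
# inversion [('L', 1), ('site', 1), ('gene', -1), ('site', -1), ('R', 1)]
```

In this model, insertion of a circle is the exact reverse of deletion. That mirrors the point that the exchange is reciprocal and regenerates the recombination sites.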
FIGURE 25–40 Integration and excision of bacteriophage λ DNA at the chromosomal target site. The attachment site on the λ phage DNA (attP) shares only 15 bp of complete homology with the bacterial site (attB) in the region of the crossover. The reaction generates two new attachment sites (attR and attL) flanking the integrated phage DNA. The recombinase is the λ integrase (or INT protein). Integration and excision use different attachment sites and different auxiliary proteins. Excision uses the proteins XIS, encoded by the bacteriophage, and FIS, encoded by the bacterium. Both reactions require the protein IHF (integration host factor), encoded by the bacterium.

Complete Chromosome Replication Can Require Site-Specific Recombination

Recombinational DNA repair of a circular bacterial chromosome, while essential, sometimes generates deleterious byproducts. The resolution of a Holliday junction at a replication fork by a nuclease such as RuvC, followed by completion of replication, can give rise to one of two products: the usual two monomeric chromosomes or a contiguous dimeric chromosome (Fig. 25–41). In the latter case, the covalently linked chromosomes cannot be segregated to daughter cells at cell division, and the dividing cells become “stuck.” A specialized site-specific recombination system in E. coli, the XerCD system, converts the dimeric chromosomes to monomeric chromosomes so that cell division can proceed. The reaction is a site-specific deletion reaction (Fig. 25–39b). This is another example of the close coordination between DNA recombination processes and other aspects of DNA metabolism.

FIGURE 25–41 DNA deletion to undo a deleterious effect of recombinational DNA repair. The resolution of a Holliday intermediate during recombinational DNA repair (if cut at the points indicated by red arrows) can generate a contiguous dimeric chromosome. A specialized site-specific recombinase in E. coli, XerCD, converts the dimer to monomers, allowing chromosome segregation and cell division to proceed.

Transposable Genetic Elements Move from One Location to Another

We now consider the third general type of recombination system: recombination that allows the movement of transposable elements, or transposons. These segments of DNA, found in virtually all cells, move, or “jump,” from one place on a chromosome (the donor site) to another on the same or a different chromosome (the target site). DNA sequence homology is not usually required for this movement, called transposition; the new location is determined more or less randomly. Insertion of a transposon in an essential gene could kill the cell, so transposition is tightly regulated and usually very infrequent. Transposons are perhaps the simplest of molecular parasites, adapted to replicate passively within the chromosomes of host cells. In some cases they carry genes that are useful to the host cell, and thus exist in a kind of symbiosis with the host.

Bacteria have two classes of transposons. Insertion sequences (simple transposons) contain only the sequences required for transposition and the genes for proteins (transposases) that promote the process. Complex transposons contain one or more genes in addition to those needed for transposition. These extra genes might, for example, confer resistance to antibiotics and thus enhance the survival chances of the host cell. The spread of antibiotic-resistance elements among disease-causing bacterial populations that is rendering some antibiotics ineffectual (pp. 925–926) is mediated in part by transposition.

Bacterial transposons vary in structure, but most have short repeated sequences at each end that serve as binding sites for the transposase. When transposition occurs, a short sequence at the target site (5 to 10 bp) is duplicated to form an additional short repeated sequence that flanks each end of the inserted transposon (Fig. 25–42). These duplicated segments result from the cutting mechanism used to insert a transposon into the DNA at a new location.

There are two general pathways for transposition in bacteria. In direct or simple transposition (Fig. 25–43, left), cuts on each side of the transposon excise it, and the transposon moves to a new location. This leaves a double-strand break in the donor DNA that must be repaired. At the target site, a staggered cut is made (as in Fig. 25–42), the transposon is inserted into the break, and DNA replication fills in the gaps to duplicate the target-site sequence. In replicative transposition (Fig. 25–43, right), the entire transposon is replicated, leaving a copy behind at the donor location. A cointegrate is an intermediate in this process, consisting of the donor region covalently linked to DNA at the target site. Two complete copies of the transposon are present in the cointegrate, both having the same relative orientation in the DNA. In some well-characterized transposons, the cointegrate intermediate is converted to products by site-specific recombination, in which specialized recombinases promote the required deletion reaction.

FIGURE 25–42 Duplication of the DNA sequence at a target site when a transposon is inserted. The duplicated sequences are shown in red. These sequences are generally only a few base pairs long, so their size (compared with that of a typical transposon) is greatly exaggerated in this drawing. (Panels: the transposase makes staggered cuts in the target site; the transposon is inserted at the site of the cuts; replication fills in the gaps, duplicating the sequences flanking the transposon.)

FIGURE 25–43 Two general pathways for transposition: direct (simple) and replicative. (1) The DNA is first cleaved on each side of the transposon, at the sites indicated by arrows. (2) The liberated 3′-hydroxyl groups at the ends of the transposon act as nucleophiles in a direct attack on phosphodiester bonds in the target DNA. The target phosphodiester bonds are staggered (not directly across from each other) in the two DNA strands. (3) The transposon is now linked to the target DNA. In direct transposition, replication fills in gaps at each end. In replicative transposition, the entire transposon is replicated to create a cointegrate intermediate. (4) The cointegrate is often resolved later, with the aid of a separate site-specific recombination system. The cleaved host DNA left behind after direct transposition is either repaired by DNA end-joining or degraded (not shown). The latter outcome can be lethal to an organism.
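The target-site duplication of Fig. 25–42 follows directly from the staggered cut, as this minimal sketch shows. It assumes a single-strand string representation, a top-strand cut just before index `cut_top`, and a bottom-strand cut `stagger` bp further along; the function name is hypothetical and no transposase chemistry is modeled.

```python
def insert_with_target_duplication(target, cut_top, stagger, transposon):
    """Toy model of target-site duplication (Fig. 25-42), one strand shown.

    Assumes the top strand is cut just before index `cut_top` and the bottom
    strand `stagger` bp further along (the text gives 5 to 10 bp). After the
    transposon is joined to the cut ends, gap filling copies the `stagger` bp
    lying between the two cuts, so they appear on both sides of the element
    as short direct repeats.
    """
    if stagger <= 0 or not 0 <= cut_top <= len(target) - stagger:
        raise ValueError("the staggered cut must fall within the target DNA")
    repeat = target[cut_top:cut_top + stagger]
    return target[:cut_top] + repeat + transposon + repeat + target[cut_top + stagger:]
```

With a 5 bp stagger at index 3 of `"GGGATCGTAAA"` and a placeholder element `"CCCCCC"`, the product is `"GGGATCGTCCCCCCATCGTAAA"`: the target sequence `ATCGT` now flanks the insert on both sides, and the product is longer than target plus transposon by exactly the stagger length.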
FIGURE 25–44 Recombination of the V and J gene segments of the human IgG kappa light chain. This process generates antibody diversity. At the top is shown the arrangement of IgG-coding sequences in a bone marrow stem cell. Recombination deletes the DNA between a particular V segment and a J segment. After transcription, the transcript is processed by RNA splicing, as described in Chapter 26; translation produces the light-chain polypeptide. The light chain can combine with any of 5,000 possible heavy chains to produce an antibody molecule. (Panels, top to bottom: germ-line DNA with V segments V1 to ~V300, J segments J1, J2, J4, and J5, and one C segment; B-lymphocyte DNA after recombination joins V84 to J4; primary transcript; processed mRNA containing V84, J4, and C; light-chain polypeptide; assembled antibody molecule.)

Eukaryotes also have transposons, structurally similar to bacterial transposons, and some use similar transposition mechanisms. In other cases, however, the mechanism of transposition appears to involve an RNA intermediate. Evolution of these transposons is intertwined with the evolution of certain classes of RNA viruses. Both are described in the next chapter.

Immunoglobulin Genes Assemble by Recombination

Some DNA rearrangements are a programmed part of development in eukaryotic organisms. An important example is the generation of complete immunoglobulin genes from separate gene segments in vertebrate genomes. A human (like other mammals) is capable of producing millions of different immunoglobulins (antibodies) with distinct binding specificities, even though the human genome contains only ~35,000 genes.
Recombination allows an organism to produce an extraordinary diversity of antibodies from a limited DNA-coding capacity. Studies of the recombination mechanism reveal a close relationship to DNA transposition and suggest that this system for generating antibody diversity may have evolved from an ancient cellular invasion of transposons.

We can use the human genes that encode proteins of the immunoglobulin G (IgG) class to illustrate how antibody diversity is generated. Immunoglobulins consist of two heavy and two light polypeptide chains (see Fig. 5–23). Each chain has two regions: a variable region, with a sequence that differs greatly from one immunoglobulin to another, and a constant region, with a sequence that is virtually constant within a class of immunoglobulins. There are also two distinct families of light chains, kappa and lambda, which differ somewhat in the sequences of their constant regions. For all three types of polypeptide chain (heavy chain, and kappa and lambda light chains), diversity in the variable regions is generated by a similar mechanism. The genes for these polypeptides are divided into segments, and the genome contains clusters with multiple versions of each segment. The joining of one version of each of the segments creates a complete gene.

Figure 25–44 depicts the organization of the DNA encoding the kappa light chains of human IgG and shows how a mature kappa light chain is generated. In undifferentiated cells, the coding information for this polypeptide chain is separated into three segments. The V (variable) segment encodes the first 95 amino acid residues of the variable region, the J (joining) segment encodes the remaining 12 residues of the variable region, and the C segment encodes the constant region. The genome contains ~300 different V segments, 4 different J segments, and 1 C segment.
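The segment layout of Fig. 25–44 and the deletion that joins one V to one J can be sketched as list operations. This is a simplified illustration, assuming a locus made only of named segments in the order shown in the figure (V1 to V300, then J1, J2, J4, J5, then C); the spacer DNA, the recombination signal sequences, and the imprecise junctions discussed below are not modeled, and the function names are hypothetical.

```python
def germline_kappa_locus(n_v=300):
    """Simplified kappa locus of Fig. 25-44: V1..Vn, then J1, J2, J4, J5, then C.

    Only segment names and their order are represented; intervening DNA
    and recombination signal sequences are not modeled.
    """
    return [f"V{i}" for i in range(1, n_v + 1)] + ["J1", "J2", "J4", "J5", "C"]


def join_v_to_j(locus, v_segment, j_segment):
    """Return the locus after deleting all DNA between the chosen V and J."""
    v_index = locus.index(v_segment)
    j_index = locus.index(j_segment)
    if not (v_segment.startswith("V") and j_segment.startswith("J") and v_index < j_index):
        raise ValueError("choose a V segment that lies upstream of a J segment")
    # Keep everything up to and including V, then everything from J onward.
    return locus[: v_index + 1] + locus[j_index:]
```

Joining V84 to J4, as in the figure, keeps V1 through V84 and then J4, J5, and C; V85 through V300, J1, and J2 are deleted. The remaining J5 is removed later from the transcript by RNA splicing, which this sketch does not attempt.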
As a stem cell in the bone marrow differentiates to form a mature B lymphocyte, one V segment and one J segment are brought together by a specialized recombination system (Fig. 25–44). During this programmed DNA deletion, the intervening DNA is discarded. There are about $300 \times 4 = 1{,}200$ possible V–J combinations. The recombination process is not as precise as the site-specific recombination described earlier, so additional variation occurs in the sequence at the V–J junction. This increases the overall variation by a factor of at least 2.5; thus the cells can generate about $2.5 \times 1{,}200 = 3{,}000$ different V–J combinations. The final joining of the V–J combination to the C region is accomplished by an RNA-splicing reaction after transcription, a process described in Chapter 26.

The recombination mechanism for joining the V and J segments is illustrated in Figure 25–45. Just beyond each V segment and just before each J segment lie recombination signal sequences (RSS). These are bound by proteins called RAG1 and RAG2 (recombination activating gene). The RAG proteins catalyze the formation of a double-strand break between the signal sequences and the V (or J) segments to be joined. The V and J segments are then joined with the aid of a second complex of proteins.

The genes for the heavy chains and the lambda light chains form by similar processes. Heavy chains have more gene segments than light chains, with more than 5,000 possible combinations. Because any heavy chain can combine with any light chain to generate an immunoglobulin, each human has at least $3{,}000 \times 5{,}000 = 1.5 \times 10^{7}$ possible IgGs. Additional diversity is generated by high mutation rates (of unknown mechanism) in the V sequences during B-lymphocyte differentiation.
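The diversity estimate above is a chain of multiplications, reproduced here as a small sketch. The constants are the approximate figures quoted in the text; because the text says "at least" and "more than," the results are lower bounds, and the additional diversity from hypermutation of V sequences is deliberately not included.

```python
# Approximate figures quoted in the text; all results are lower bounds.
KAPPA_V_SEGMENTS = 300
KAPPA_J_SEGMENTS = 4
JUNCTIONAL_FACTOR = 2.5          # imprecise V-J joining adds >= 2.5-fold variation
HEAVY_CHAIN_COMBINATIONS = 5_000  # "more than 5,000" heavy-chain combinations


def antibody_diversity():
    """Return the chain of estimates used in the text (hypermutation ignored)."""
    vj = KAPPA_V_SEGMENTS * KAPPA_J_SEGMENTS      # 300 x 4
    light = JUNCTIONAL_FACTOR * vj                # 2.5 x 1,200
    igg = light * HEAVY_CHAIN_COMBINATIONS        # 3,000 x 5,000
    return {"V-J combinations": vj, "light chains": light, "IgGs": igg}
```

The three values are 1,200, 3,000, and 1.5 × 10^7, matching the arithmetic in the paragraph above.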
Each mature B lymphocyte produces only one type of antibody, but the range of antibodies produced by different cells is clearly enormous.

Did the immune system evolve in part from ancient transposons? The mechanism for generation of the double-strand breaks by RAG1 and RAG2 does mirror several reaction steps in transposition (Fig. 25–45). In addition, the deleted DNA, with its terminal RSS, has a sequence structure found in most transposons. In the test tube, RAG1 and RAG2 can associate with this deleted DNA and insert it, transposonlike, into other DNA molecules (probably a rare reaction in B lymphocytes). Although we cannot know for certain, the properties of the immunoglobulin gene rearrangement system suggest an intriguing origin in which the distinction between host and parasite has become blurred by evolution.

SUMMARY 25.3 DNA Recombination
■ DNA sequences are rearranged in recombination reactions, usually in processes tightly coordinated with DNA replication or repair.
■ Homologous genetic recombination can take place between any two DNA molecules that share sequence homology. In meiosis (in eukaryotes), this type of recombination helps to ensure accurate chromosomal segregation and to create genetic diversity. In both bacteria and eukaryotes it serves in the repair of stalled replication forks. A Holliday intermediate forms during homologous recombination.
■ Site-specific recombination occurs only at specific target sequences, and this process can also involve a Holliday intermediate. Recombinases cleave the DNA at specific points and ligate the strands to new partners. This type of recombination is found in virtually all cells, and its many functions include DNA integration and regulation of gene expression.
■ In virtually all cells, transposons use recombination to move within or between chromosomes. In vertebrates, a programmed recombination reaction related to transposition joins immunoglobulin gene segments to form immunoglobulin genes during B-lymphocyte differentiation.

FIGURE 25–45 Mechanism of immunoglobulin gene rearrangement. The RAG1 and RAG2 proteins bind to the recombination signal sequences (RSS) and cleave one DNA strand between the RSS and the V (or J) segments to be joined. The liberated 3′ hydroxyl then acts as a nucleophile, attacking a phosphodiester bond in the other strand to create a double-strand break. The resulting hairpin bends on the V and J segments are cleaved, and the ends are covalently linked by a complex of proteins specialized for end-joining repair of double-strand breaks. The steps in the generation of the double-strand break catalyzed by RAG1 and RAG2 are chemically related to steps in transposition reactions.
Key Terms

template 950; semiconservative replication 950; replication fork 951; origin 952; Okazaki fragments 952; leading strand 952; lagging strand 952; nucleases 952; exonuclease 952; endonuclease 952; DNA polymerase I 952; primer 954; primer terminus 954; processivity 954; proofreading 955; DNA polymerase III 955; replisome 957; helicases 957; topoisomerases 957; primases 958; DNA ligase 958; primosome 962; catenane 963; DNA polymerase α 965; DNA polymerase δ 965; DNA polymerase ε 965; mutation 966; base-excision repair 971; DNA glycosylases 971; AP site 971; AP endonucleases 972; DNA photolyases 974; recombinational DNA repair 976; error-prone translesion DNA synthesis 976; SOS response 976; homologous genetic recombination 978; site-specific recombination 978; DNA transposition 978; meiosis 979; branch migration 980; double-strand break repair model 980; Holliday intermediate 980; transposons 988; transposition 988; insertion sequence 988; cointegrate 989

Terms in bold are defined in the glossary.

Problems

1. Conclusions from the Meselson-Stahl Experiment
The Meselson-Stahl experiment (see Fig. 25–2) proved that DNA undergoes semiconservative replication in E. coli. In the “dispersive” model of DNA replication, the parent DNA strands are cleaved into pieces of random size, then joined with pieces of newly replicated DNA to yield daughter duplexes. In the Meselson-Stahl experiment, each strand would contain random segments of heavy and light DNA. Explain how the results of Meselson and Stahl’s experiment ruled out such a model.

2. Heavy Isotope Analysis of DNA Replication
A culture of E. coli growing in a medium containing ¹⁵NH₄Cl is switched to a medium containing ¹⁴NH₄Cl for three generations (an eightfold increase in population). What is the molar ratio of hybrid DNA (¹⁵N–¹⁴N) to light DNA (¹⁴N–¹⁴N) at this point?

3. Replication of the E. coli Chromosome
The E. coli chromosome contains 4,639,221 bp.
(a) How many turns of the double helix must be unwound during replication of the E. coli chromosome?
(b) From the data in this chapter, how long would it take to replicate the E. coli chromosome at 37 °C if two replication forks proceeded from the origin? Assume replication occurs at a rate of 1,000 bp/s. Under some conditions E. coli cells can divide every 20 min. How might this be possible?
(c) In the replication of the E. coli chromosome, about how many Okazaki fragments would be formed? What factors guarantee that the numerous Okazaki fragments are assembled in the correct order in the new DNA?

4. Base Composition of DNAs Made from Single-Stranded Templates
Predict the base composition of the total DNA synthesized by DNA polymerase on templates provided by an equimolar mixture of the two complementary strands of bacteriophage φX174 DNA (a circular DNA molecule). The base composition of one strand is A, 24.7%; G, 24.1%; C, 18.5%; and T, 32.7%. What assumption is necessary to answer this problem?

5. DNA Replication
Kornberg and his colleagues incubated soluble extracts of E. coli with a mixture of dATP, dTTP, dGTP, and dCTP, all labeled with ³²P in the α-phosphate group. After a time, the incubation mixture was treated with trichloroacetic acid, which precipitates the DNA but not the nucleotide precursors. The precipitate was collected, and the extent of precursor incorporation into DNA was determined from the amount of radioactivity present in the precipitate.
(a) If any one of the four nucleotide precursors were omitted from the incubation mixture, would radioactivity be found in the precipitate? Explain.
(b) Would ³²P be incorporated into the DNA if only dTTP were labeled? Explain.
(c) Would radioactivity be found in the precipitate if ³²P labeled the β or γ phosphate rather than the α phosphate of the deoxyribonucleotides? Explain.

6. Leading and Lagging Strands
Prepare a table that lists the names and compares the functions of the precursors, enzymes, and other proteins needed to make the leading versus lagging strands during DNA replication in E. coli.

7. Function of DNA Ligase
Some E. coli mutants contain defective DNA ligase. When these mutants are exposed to ³H-labeled thymine and the DNA produced is sedimented on an alkaline sucrose density gradient, two radioactive bands appear. One corresponds to a high molecular weight fraction, the other to a low molecular weight fraction. Explain.

8. Fidelity of Replication of DNA
What factors promote the fidelity of replication during the synthesis of the leading strand of DNA? Would you expect the lagging strand to be made with the same fidelity? Give reasons for your answers.

9. Importance of DNA Topoisomerases in DNA Replication
DNA unwinding, such as that occurring in replication, affects the superhelical density of DNA. In the absence of topoisomerases, the DNA would become overwound ahead of a replication fork as the DNA is unwound behind it. A bacterial replication fork will stall when the superhelical density (σ) of the DNA ahead of the fork reaches +0.14 (see Chapter 24). Bidirectional replication is initiated at the origin of a 6,000 bp plasmid in vitro, in the absence of topoisomerases. The plasmid initially has a σ of −0.06. How many base pairs will be unwound and replicated by each replication fork before the forks stall? Assume that each fork travels at the same rate and that each includes all components necessary for elongation except topoisomerase.

10. The Ames Test
In a nutrient medium that lacks histidine, a thin layer of agar containing ~10⁹ Salmonella typhimurium histidine auxotrophs (mutant cells that require histidine to survive) produces ~13 colonies over a two-day incubation period at 37 °C (see Fig. 25–19). How do these colonies arise in the absence of histidine? The experiment is repeated in the presence of 0.4 μg of 2-aminoanthracene. The number of colonies produced over two days exceeds 10,000. What does this indicate about 2-aminoanthracene? What can you surmise about its carcinogenicity?

11. DNA Repair Mechanisms
Vertebrate and plant cells often methylate cytosine in DNA to form 5-methylcytosine (see Fig. 8–5a). In these same cells, a specialized repair system recognizes G–T mismatches and repairs them to G≡C base pairs. How might this repair system be advantageous to the cell? (Explain in terms of the presence of 5-methylcytosine in the DNA.)

12. DNA Repair in People with Xeroderma Pigmentosum
The condition known as xeroderma pigmentosum (XP) arises from mutations in at least seven different human genes. The deficiencies are generally in genes encoding enzymes involved in some part of the pathway for human nucleotide-excision repair. The various types of XP are labeled A through G (XPA, XPB, etc.), with a few additional variants lumped under the label XPV.
Cultures of cells from healthy individuals and from patients with XPG are irradiated with ultraviolet light. The DNA is isolated and denatured, and the resulting single-stranded DNA is characterized by analytical ultracentrifugation.
(a) Samples from the normal fibroblasts show a significant reduction in the average molecular weight of the single-stranded DNA after irradiation, but samples from the XPG fibroblasts show no such reduction. Why might this be?
(b) If you assume that a nucleotide-excision repair system is operative, which step might be defective in the fibroblasts from the patients with XPG? Explain.
13. Holliday Intermediates
How does the formation of Holliday intermediates in homologous genetic recombination differ from their formation in site-specific recombination?

26 RNA METABOLISM

26.1 DNA-Dependent Synthesis of RNA 996
26.2 RNA Processing 1007
26.3 RNA-Dependent Synthesis of RNA and DNA 1021

The RNA of the cell is partly in the nucleus, partly in particles in the cytoplasm and partly as the “soluble” RNA of the cell sap; many workers have shown that all these three fractions turn over differently. It is very important to realize in any discussion of the role of RNA in the cell that it is very inhomogeneous metabolically, and probably of more than one type.
Francis H. C. Crick, article in Symposia of the Society for Experimental Biology, 1958

Expression of the information in a gene generally involves production of an RNA molecule transcribed from a DNA template. Strands of RNA and DNA may seem quite similar at first glance, differing only in that RNA has a hydroxyl group at the 2′ position of the aldopentose and has uracil instead of thymine. However, unlike DNA, most RNAs carry out their functions as single strands, strands that fold back on themselves and have the potential for much greater structural diversity than DNA (Chapter 8). RNA is thus suited to a variety of cellular functions.

RNA is the only macromolecule known to have a role both in the storage and transmission of information and in catalysis, which has led to much speculation about its possible role as an essential chemical intermediate in the development of life on this planet. The discovery of catalytic RNAs, or ribozymes, has changed the very definition of an enzyme, extending it beyond the domain of proteins. Proteins nevertheless remain essential to RNA and its functions. In the modern cell, all nucleic acids, including RNAs, are complexed with proteins. Some of these complexes are quite elaborate, and RNA can assume both structural and catalytic roles within complicated biochemical machines.

All RNA molecules except the RNA genomes of certain viruses are derived from information permanently stored in DNA. During transcription, an enzyme system converts the genetic information in a segment of double-stranded DNA into an RNA strand with a base sequence complementary to one of the DNA strands. Three major kinds of RNA are produced. Messenger RNAs (mRNAs) encode the amino acid sequence of one or more polypeptides specified by a gene or set of genes. Transfer RNAs (tRNAs) read the information encoded in the mRNA and transfer the appropriate amino acid to a growing polypeptide chain during protein synthesis. Ribosomal RNAs (rRNAs) are constituents of ribosomes, the intricate cellular machines that synthesize proteins. Many additional specialized RNAs have regulatory or catalytic functions or are precursors to the three main classes of RNA.

During replication the entire chromosome is usually copied, but transcription is more selective. Only particular genes or groups of genes are transcribed at any one time, and some portions of the DNA genome are never transcribed. The cell restricts the expression of genetic information to the formation of gene products needed at any particular moment. Specific regulatory sequences mark the beginning and end of the DNA segments to be transcribed and designate which strand in duplex DNA is to be used as the template. The regulation of transcription is described in detail in Chapter 28.

In this chapter we examine the synthesis of RNA on a DNA template and the postsynthetic processing and turnover of RNA molecules. In doing so we encounter many of the specialized functions of RNA, including catalytic functions. Interestingly, the substrates for RNA enzymes are often other RNA molecules. We also describe systems in which RNA is the template and DNA the product, rather than vice versa. The information pathways thus come full circle, revealing that template-dependent nucleic acid synthesis has standard rules regardless of the nature of template or product (RNA or DNA). This examination of the biological interconversion of DNA and RNA as information carriers leads to a discussion of the evolutionary origin of biological information.

26.1 DNA-Dependent Synthesis of RNA

Our discussion of RNA synthesis begins with a comparison between transcription and DNA replication (Chapter 25). Transcription resembles replication in its fundamental chemical mechanism, its polarity (direction of synthesis), and its use of a template. And like replication, transcription has initiation, elongation, and termination phases, though in the literature on transcription, initiation is further divided into discrete phases of DNA binding and initiation of RNA synthesis. Transcription differs from replication in that it does not require a primer and, generally, involves only limited segments of a DNA molecule. Additionally, within transcribed segments only one DNA strand serves as a template.

RNA Is Synthesized by RNA Polymerases

The discovery of DNA polymerase and its dependence on a DNA template spurred a search for an enzyme that synthesizes RNA complementary to a DNA strand. By 1960, four research groups had independently detected an enzyme in cellular extracts that could form an RNA polymer from ribonucleoside 5′-triphosphates. Subsequent work on the purified Escherichia coli RNA polymerase helped to define the fundamental properties of transcription (Fig. 26–1).
DNA-dependent RNA polymerase requires, in addition to a DNA template, all four ribonucleoside 5′-triphosphates (ATP, GTP, UTP, and CTP) as precursors of the nucleotide units of RNA, as well as Mg²⁺. The protein also binds one Zn²⁺. The chemistry and mechanism of RNA synthesis closely resemble those used by DNA polymerases (see Fig. 25–5). RNA polymerase elongates an RNA strand by adding ribonucleotide units to the 3′-hydroxyl end, building RNA in the 5′→3′ direction. The 3′-hydroxyl group acts as a nucleophile, attacking the α phosphate of the incoming ribonucleoside triphosphate (Fig. 26–1b) and releasing pyrophosphate. The overall reaction is

$$\underbrace{(\text{NMP})_n}_{\text{RNA}} + \text{NTP} \longrightarrow \underbrace{(\text{NMP})_{n+1}}_{\text{lengthened RNA}} + \text{PP}_i$$

RNA polymerase requires DNA for activity and is most active when bound to a double-stranded DNA. As noted above, only one of the two DNA strands serves as a template. The template DNA strand is copied in the 3′→5′ direction (antiparallel to the new RNA strand), just as in DNA replication. Each nucleotide in the newly formed RNA is selected by Watson-Crick base-pairing interactions; U residues are inserted in the RNA to pair with A residues in the DNA template, G residues are inserted to pair with C residues, and so on. Base-pair geometry (see Fig. 25–6) may also play a role in base selection.

Unlike DNA polymerase, RNA polymerase does not require a primer to initiate synthesis. Initiation occurs when RNA polymerase binds at specific DNA sequences called promoters (described below). The 5′-triphosphate group of the first residue in a nascent (newly formed) RNA molecule is not cleaved to release PPi, but instead remains intact throughout the transcription process.
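Because synthesis starts without a primer and the first residue keeps its 5′-triphosphate, the overall reaction above implies simple bookkeeping for a finished transcript. The Python sketch below is an illustration only (the function name is hypothetical); it assumes a single transcript made de novo, counts one PPi per phosphodiester bond formed, and ignores any later processing of the RNA.

```python
def rna_synthesis_bookkeeping(length_nt: int) -> dict[str, int]:
    """Tally substrates and products for one transcript of `length_nt` nucleotides.

    Assumes de novo initiation (no primer): the first NTP keeps its
    5'-triphosphate, and each later NTP that is added releases one PPi.
    """
    if length_nt < 1:
        raise ValueError("a transcript contains at least one nucleotide")
    ntps_used = length_nt
    phosphodiester_bonds = length_nt - 1   # one bond per addition after the first
    ppi_released = phosphodiester_bonds    # (NMP)n + NTP -> (NMP)n+1 + PPi
    phosphates_supplied = 3 * ntps_used    # alpha, beta, and gamma phosphate of each NTP
    phosphates_in_rna = phosphates_supplied - 2 * ppi_released
    return {
        "ntps_used": ntps_used,
        "phosphodiester_bonds": phosphodiester_bonds,
        "ppi_released": ppi_released,
        "phosphates_in_rna": phosphates_in_rna,
    }


print(rna_synthesis_bookkeeping(10))
```

For a 10-nucleotide transcript this gives 9 PPi released and 12 phosphates left in the RNA: three in the retained 5′-triphosphate plus one in each of the 9 phosphodiester linkages.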
During the elongation phase of transcription, the growing end of the new RNA strand base-pairs temporarily with the DNA template to form a short hybrid RNA-DNA double helix, estimated to be 8 bp long (Fig. 26–1a). The RNA in this hybrid duplex "peels off" shortly after its formation, and the DNA duplex re-forms.

To enable RNA polymerase to synthesize an RNA strand complementary to one of the DNA strands, the DNA duplex must unwind over a short distance, forming a transcription "bubble." During transcription, the E. coli RNA polymerase generally keeps about 17 bp unwound. The 8 bp RNA-DNA hybrid occurs in this unwound region. Elongation of a transcript by E. coli RNA polymerase proceeds at a rate of 50 to 90 nucleotides/s. Because DNA is a helix, movement of a transcription bubble requires considerable strand rotation of the nucleic acid molecules. DNA strand rotation is restricted in most DNAs by DNA-binding proteins and other structural barriers. As a result, a moving RNA polymerase generates waves of positive supercoils ahead of the transcription bubble and negative supercoils behind (Fig. 26–1c). This has been observed both in vitro and in vivo (in bacteria). In the cell, the topological problems caused by transcription are relieved through the action of topoisomerases (Chapter 24).

FIGURE 26–1 Transcription by RNA polymerase in E. coli. For synthesis of an RNA strand complementary to one of two DNA strands in a double helix, the DNA is transiently unwound. (a) About 17 bp are unwound at any given time. RNA polymerase and the bound transcription bubble move from left to right along the DNA as shown, facilitating RNA synthesis. The DNA is unwound ahead and rewound behind as RNA is transcribed. Red arrows show the direction in which the DNA must rotate to permit this process. As the DNA is rewound, the RNA-DNA hybrid is displaced and the RNA strand extruded. The RNA polymerase is in close contact with the DNA ahead of the transcription bubble, as well as with the separated DNA strands and the RNA within and immediately behind the bubble. A channel in the protein funnels new nucleoside triphosphates (NTPs) to the polymerase active site. The polymerase footprint encompasses about 35 bp of DNA during elongation. (b) Catalytic mechanism of RNA synthesis by RNA polymerase. Note that this is essentially the same mechanism used by DNA polymerases (see Fig. 25–5b). The addition of nucleotides involves an attack by the 3′-hydroxyl group at the end of the growing RNA molecule on the α phosphate of the incoming NTP. The reaction involves two Mg²⁺ ions, coordinated to the phosphate groups of the incoming NTP and to three Asp residues (Asp460, Asp462, and Asp464 in the β′ subunit of the E. coli RNA polymerase), which are highly conserved in the RNA polymerases of all species. One Mg²⁺ ion facilitates attack by the 3′-hydroxyl group on the α phosphate of the NTP; the other Mg²⁺ ion facilitates displacement of the pyrophosphate; and both metal ions stabilize the pentacovalent transition state. (c) Changes in the supercoiling of DNA brought about by transcription. Movement of an RNA polymerase along DNA tends to create positive supercoils (overwound DNA) ahead of the transcription bubble and negative supercoils (underwound DNA) behind it. In a cell, topoisomerases rapidly eliminate the positive supercoils and regulate the level of negative supercoiling (Chapter 24).

The two complementary DNA strands have different roles in transcription. The strand that serves as template for RNA synthesis is called the template strand. The DNA strand complementary to the template, the nontemplate strand, or coding strand, is identical in base sequence to the RNA transcribed from the gene, with U in the RNA in place of T in the DNA (Fig. 26–2). The coding strand for a particular gene may be located in either strand of a given chromosome (as shown in Fig. 26–3 for a virus). The regulatory sequences that control transcription (described later in this chapter) are by convention designated by the sequences in the coding strand.

The DNA-dependent RNA polymerase of E. coli is a large, complex enzyme with five core subunits (α2ββ′ω; Mr 390,000) and a sixth subunit, one of a group designated σ, with variants designated by size (molecular weight). The σ subunit binds transiently to the core and directs the enzyme to specific binding sites on the DNA (described below). These six subunits constitute the RNA polymerase holoenzyme (Fig. 26–4). The RNA polymerase holoenzyme of E. coli thus exists in several forms, depending on the type of σ subunit. The most common subunit is σ70 (Mr 70,000), and the upcoming discussion focuses on the corresponding RNA polymerase holoenzyme.
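The relationship among template strand, coding strand, and transcript can be checked on the short sequence of Figure 26–2. In this minimal Python sketch the template strand is taken to be the complement of the transcript shown in that figure (an assumption, since the figure's DNA rows did not survive intact), written 3′→5′ so that pairing base by base yields the RNA written 5′→3′. The function names are hypothetical.

```python
# Base placed in the new strand opposite each template base (Watson-Crick pairing)
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """RNA made on a template strand given 3'->5'; the RNA is returned 5'->3'."""
    return "".join(RNA_PARTNER[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Nontemplate (coding) strand, returned 5'->3', for a template given 3'->5'."""
    return "".join(DNA_PARTNER[base] for base in template_3_to_5.upper())


template = "GCGATATCGCAAA"  # assumed template strand for Fig. 26-2, written 3'->5'
rna = transcribe(template)
coding = coding_strand(template)
print(rna)     # CGCUAUAGCGUUU
print(coding)  # CGCTATAGCGTTT
# The coding strand, with U in place of T, has the same sequence as the RNA.
assert rna == coding.replace("T", "U")
```

The final assertion restates the rule in the text: the coding strand and the transcript differ only in carrying T where the RNA carries U.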
RNA polymerases lack a separate proofreading 3′→5′ exonuclease active site (such as that of many DNA polymerases), and the error rate for transcription is higher than that for chromosomal DNA replication: approximately one error for every 10⁴ to 10⁵ ribonucleotides incorporated into RNA. Because many copies of an RNA are generally produced from a single gene and all RNAs are eventually degraded and replaced, a mistake in an RNA molecule is of less consequence to the cell than a mistake in the permanent information stored in DNA. Many RNA polymerases, including bacterial RNA polymerase and the eukaryotic RNA polymerase II (discussed below), do pause when a mispaired base is added during transcription, and they can remove mismatched nucleotides from the 3′ end of a transcript by direct reversal of the polymerase reaction. But we do not yet know whether this activity is a true proofreading function and to what extent it may contribute to the fidelity of transcription.

RNA Synthesis Begins at Promoters

Initiation of RNA synthesis at random points in a DNA molecule would be an extraordinarily wasteful process. Instead, an RNA polymerase binds to specific sequences in the DNA called promoters, which direct the transcription of adjacent segments of DNA (genes). The sequences where RNA polymerases bind can be quite variable, and much research has focused on identifying the particular sequences that are critical to promoter function.

In E. coli, RNA polymerase binding occurs within a region stretching from about 70 bp before the transcription start site to about 30 bp beyond it. By convention, the DNA base pairs that correspond to the beginning of an RNA molecule are given positive numbers, and those preceding the RNA start site are given negative numbers. The promoter region thus extends between positions −70 and +30.
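In this numbering convention there is no position 0: the first transcribed base pair is +1 and the one just before it is −1. The minimal Python sketch below converts between 0-based indices in a DNA string and promoter coordinates under that assumption; the helper names are hypothetical. It also shows why the −70 to +30 region spans 100 bp rather than 101.

```python
def index_to_position(index: int, start_index: int) -> int:
    """Map a 0-based index in a DNA string to promoter numbering.

    `start_index` is the index of the base pair that encodes the first RNA
    nucleotide (+1). Upstream positions are negative; there is no position 0.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def position_to_index(position: int, start_index: int) -> int:
    """Inverse of index_to_position; position 0 does not exist."""
    if position == 0:
        raise ValueError("promoter numbering skips 0")
    return start_index + position - 1 if position > 0 else start_index + position


def region_length(first: int, last: int) -> int:
    """Number of base pairs from `first` to `last` inclusive, skipping 0."""
    if first == 0 or last == 0 or first > last:
        raise ValueError("invalid region")
    length = last - first + 1
    return length - 1 if first < 0 < last else length


print(region_length(-70, 30))  # 100
```

With the start site at index 70, a 100-character string covers exactly −70 through +30: indices 0 to 69 are −70 to −1, and indices 70 to 99 are +1 to +30.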
Analyses and comparisons of the most common class of bacterial promoters (those recognized by an RNA polymerase holoenzyme containing σ70) have revealed similarities in two short sequences centered about positions −10 and −35 (Fig. 26–5). These sequences are important interaction sites for the σ70 subunit. Although the sequences are not identical for all bacterial promoters in this class, certain nucleotides that are particularly common at each position form a consensus sequence (recall the E. coli oriC consensus sequence; see Fig. 25–11). The consensus sequence at the −10 region is (5′)TATAAT(3′); the consensus sequence at the −35 region is (5′)TTGACA(3′). A third AT-rich recognition element, called the UP (upstream promoter) element, occurs between positions −40 and −60 in the promoters of certain highly expressed genes. The UP element is bound by the α subunit of RNA polymerase. The efficiency with which an RNA polymerase binds to a promoter and initiates transcription is determined in large measure by these sequences, the spacing between them, and their distance from the transcription start site.

[Figure 26–2: the DNA nontemplate (coding) strand, the DNA template strand, and the RNA transcript (5′)CGCUAUAGCGUUU(3′).]

FIGURE 26–2 Template and nontemplate (coding) DNA strands. The two complementary strands of DNA are defined by their function in transcription. The RNA transcript is synthesized on the template strand and is identical in sequence (with U in place of T) to the nontemplate strand, or coding strand.

FIGURE 26–3 Organization of coding information in the adenovirus genome. The genetic information of the adenovirus genome (a conveniently simple example) is encoded by a double-stranded DNA molecule of 36,000 bp (3.6 × 10⁴ bp), both strands of which encode proteins. The information for most proteins is encoded by the top strand (by convention, the strand transcribed from left to right), but some is encoded by the bottom strand, which is transcribed in the opposite direction. Synthesis of mRNAs in adenovirus is actually much more complex than shown here. Many of the mRNAs shown for the upper strand are initially synthesized as a single, long transcript (25,000 nucleotides), which is then extensively processed to produce the separate mRNAs. Adenovirus causes upper respiratory tract infections in some vertebrates.

Many independent lines of evidence attest to the functional importance of the sequences in the −35 and −10 regions. Mutations that affect the function of a given promoter often involve a base pair in these regions. Variations in the consensus sequence also affect the efficiency of RNA polymerase binding and transcription initiation. A change in only one base pair can decrease the rate of binding by several orders of magnitude. The promoter sequence thus establishes a basal level of expression that can vary greatly from one E. coli gene to the next. A method that provides information about the interaction between RNA polymerase and promoters is illustrated in Box 26–1.

The pathway of transcription initiation is becoming much better defined (Fig. 26–6a). It consists of two major parts, binding and initiation, each with multiple steps. First, the polymerase binds to the promoter, forming, in succession, a closed complex (in which the bound DNA is intact) and an open complex (in which the bound DNA is intact and partially unwound near the −10 sequence).
Second, transcription is initiated within the complex, leading to a conformational change that converts the complex to the elongation form, followed by movement of the transcription complex away from the promoter (promoter clearance). Any of these steps can be affected by the specific makeup of the promoter sequences. The σ subunit dissociates as the polymerase enters the elongation phase of transcription (Fig. 26–6a).

FIGURE 26–4 Structure of the RNA polymerase holoenzyme of the bacterium Thermus aquaticus. (Derived from PDB ID 1IW7.) The overall structure of this enzyme is very similar to that of the E. coli RNA polymerase; no DNA or RNA is shown here. The β subunit is in gray, the β′ subunit is white; the two α subunits are different shades of red; the ω subunit is yellow; the σ subunit is orange. The image on the left is oriented as in Figure 26–6. When the structure is rotated 180° about the y axis (right) the small ω subunit is visible.

[Figure 26–5: aligned nontemplate-strand sequences of the trp, lac, recA, araBAD, and rrnB P1 promoters and the σ70 consensus, arranged as UP element, −35 region, spacer, −10 region, spacer, and RNA start.]

FIGURE 26–5 Typical E. coli promoters recognized by an RNA polymerase holoenzyme containing σ70. Sequences of the nontemplate strand are shown, read in the 5′→3′ direction, as is the convention for representations of this kind. The sequences vary from one promoter to the next, but comparisons of many promoters reveal similarities, particularly in the −10 and −35 regions. The sequence element UP, not present in all E. coli promoters, is shown in the P1 promoter for the highly expressed rRNA gene rrnB. UP elements, generally occurring in the region between −40 and −60, strongly stimulate transcription at the promoters that contain them. The UP element in the rrnB P1 promoter encompasses the region between −38 and −59. The consensus sequence for E. coli promoters recognized by σ70 is shown second from the top. Spacer regions contain slightly variable numbers of nucleotides (N). Only the first nucleotide coding the RNA transcript (at position +1) is shown.

FIGURE 26–6 Transcription initiation and elongation by E. coli RNA polymerase. (a) Initiation of transcription requires several steps generally divided into two phases, binding and initiation. In the binding phase, the initial interaction of the RNA polymerase with the promoter leads to formation of a closed complex, in which the promoter DNA is stably bound but not unwound. A 12 to 15 bp region of DNA, from within the −10 region to position +2 or +3, is then unwound to form an open complex. Additional intermediates (not shown) have been detected in the pathways leading to the closed and open complexes, along with several changes in protein conformation. The initiation phase encompasses transcription initiation and promoter clearance. Once the first 8 or 9 nucleotides of a new RNA are synthesized, the σ subunit is released and the polymerase leaves the promoter and becomes committed to elongation of the RNA. (b) Structure of the RNA core polymerase from E. coli. RNA and DNA are included here to illustrate a polymerase in the elongation phase. Subunit coloring matches Figure 26–4: the β and β′ subunits are light gray and white; the α subunits, shades of red. The ω subunit is on the opposite side of the complex and is not visible in this view. The σ subunit is not present in this complex, having dissociated after the initiation steps. The top panel shows the entire complex. The active site for transcription is in a cleft between the β and β′ subunits. In the middle panel, the β subunit has been removed, exposing the active site and the DNA-RNA hybrid region. The active site is marked in part by a Mg²⁺ ion (red). In the bottom panel, all the protein has been removed to reveal the circuitous path taken by the DNA and RNA through the complex.

E. coli has other classes of promoters, bound by RNA polymerase holoenzymes with different σ subunits. One example is the set of promoters for the heat-shock genes. The products of this set of genes are made at higher levels when the cell has received an insult, such as a sudden increase in temperature. RNA polymerase binds to the promoters of these genes only when σ70 is replaced with the σ32 (Mr 32,000) subunit, which is specific for the heat-shock promoters (see Fig. 28–3). By using different σ subunits the cell can coordinate the expression of sets of genes, permitting major changes in cell physiology.

Transcription Is Regulated at Several Levels

Requirements for any gene product vary with cellular conditions or developmental stage, and transcription of each gene is carefully regulated to form gene products only in the proportions needed. Regulation can occur at any step in transcription, including elongation and termination. However, much of the regulation is directed at the polymerase binding and transcription initiation steps outlined in Figure 26–6. Differences in promoter sequences are just one of several levels of control.
The binding of proteins to sequences both near to and distant from the promoter can also affect levels of gene expression. Protein binding can activate transcription by facilitating either RNA polymerase binding or steps further along in the initiation process, or it can repress transcription by blocking the activity of the polymerase. In E. coli, one protein that activates transcription is the cAMP receptor protein (CRP), which increases the transcription of genes coding for enzymes that metabolize sugars other than glucose when cells are grown in the absence of glucose. Repressors are proteins that block the synthesis of RNA at specific genes. In the case of the Lac repressor (Chapter 28), transcription of the genes for the enzymes of lactose metabolism is blocked when lactose is unavailable.

Transcription is the first step in the complicated and energy-intensive pathway of protein synthesis, so much of the regulation of protein levels in both bacterial and eukaryotic cells is directed at transcription, particularly its early stages. In Chapter 28 we describe many mechanisms by which this regulation is accomplished.

Specific Sequences Signal Termination of RNA Synthesis

RNA synthesis is processive (that is, the RNA polymerase has high processivity; p. 954). It must be, because if an RNA polymerase released an RNA transcript prematurely, it could not resume synthesis of the same RNA but instead would have to start over. However, an encounter with certain DNA sequences results in a pause in RNA synthesis, and at some of these sequences transcription is terminated. The process of termination is not yet well understood in eukaryotes, so our focus is again on bacteria. E. coli has at least two classes of termination signals: one class relies on a protein factor called ρ (rho) and the other is ρ-independent.

Most ρ-independent terminators have two distinguishing features.
The first is a region that produces an RNA transcript with self-complementary sequences, permitting the formation of a hairpin structure (see Fig. 8–21a) centered 15 to 20 nucleotides before the projected end of the RNA strand. The second feature is a highly conserved string of three A residues in the template strand that are transcribed into U residues near the 3′ end of the hairpin. When a polymerase arrives at a termination site with this structure, it pauses (Fig. 26–7). Formation of the hairpin structure in the RNA disrupts several A=U base pairs in the RNA-DNA hybrid segment and may disrupt important interactions between RNA and the RNA polymerase, facilitating dissociation of the transcript.

FIGURE 26–7 Model for ρ-independent termination of transcription in E. coli. [Diagram: a paused complex either bypasses the site or isomerizes; an isomerized complex either terminates or escapes.] RNA polymerase pauses at a variety of DNA sequences, some of which are terminators. One of two outcomes is then possible: the polymerase bypasses the site and continues on its way, or the complex undergoes a conformational change (isomerization). In the latter case, intramolecular pairing of complementary sequences in the newly formed RNA transcript may form a hairpin that disrupts the RNA-DNA hybrid and/or the interactions between the RNA and the polymerase, resulting in isomerization. An A=U hybrid region at the 3′ end of the new transcript is relatively unstable, and the RNA dissociates completely, leading to termination and dissociation of the RNA molecule. This is the usual outcome at terminators. At other pause sites, the complex may escape after the isomerization step to continue RNA synthesis.

The ρ-dependent terminators lack the sequence of repeated A residues in the template strand but usually include a CA-rich sequence called a rut (rho utilization) element. The ρ protein associates with the RNA at specific binding sites and migrates in the 5′→3′ direction until it reaches the transcription complex that is paused at a termination site. Here it contributes to release of the RNA transcript. The ρ protein has an ATP-dependent RNA-DNA helicase activity that promotes translocation of the protein along the RNA, and ATP is hydrolyzed by ρ protein during the termination process. The detailed mechanism by which the protein promotes the release of the RNA transcript is not known.

BOX 26–1 WORKING IN BIOCHEMISTRY
RNA Polymerase Leaves Its Footprint on a Promoter

Footprinting, a technique derived from principles used in DNA sequencing, identifies the DNA sequences bound by a particular protein. Researchers isolate a DNA fragment thought to contain sequences recognized by a DNA-binding protein and radiolabel one end of one strand (Fig. 1). They then use chemical or enzymatic reagents to introduce random breaks in the DNA fragment (averaging about one per molecule). Separation of the labeled cleavage products (broken fragments of various lengths) by high-resolution electrophoresis produces a ladder of radioactive bands. In a separate tube, the cleavage procedure is repeated on copies of the same DNA fragment in the presence of the DNA-binding protein. The researchers then subject the two sets of cleavage products to electrophoresis and compare them side by side. A gap ("footprint") in the series of radioactive bands derived from the DNA-protein sample, attributable to protection of the DNA by the bound protein, identifies the sequences that the protein binds.

The precise location of the protein-binding site can be determined by directly sequencing (see Fig. 8–37) copies of the same DNA fragment and including the sequencing lanes (not shown here) on the same gel with the footprint. Figure 2 shows footprinting results for the binding of RNA polymerase to a DNA fragment containing a promoter. The polymerase covers 60 to 80 bp; protection by the bound enzyme includes the −10 and −35 regions.

[Figure 1 panels: identical DNA fragments, radioactively labeled at one end of one strand, are treated with DNase I with and without bound RNA polymerase, under conditions in which each strand is cut once on average; no cuts are made where RNA polymerase has bound. Labeled fragments are isolated, denatured, separated by polyacrylamide gel electrophoresis, and visualized on x-ray film; missing bands indicate where RNA polymerase was bound.]

FIGURE 1 Footprint analysis of the RNA polymerase–binding site on a DNA fragment. Separate experiments are carried out in the presence (+) and absence (−) of the polymerase.

FIGURE 2 Footprinting results of RNA polymerase binding to the lac promoter (see Fig. 26–5). In this experiment, the 5′ end of the nontemplate strand was radioactively labeled. Lane C is a control in which the labeled DNA fragments were cleaved with a chemical reagent that produces a more uniform banding pattern.

Eukaryotic Cells Have Three Kinds of Nuclear RNA Polymerases

The transcriptional machinery in the nucleus of a eukaryotic cell is much more complex than that in bacteria. Eukaryotes have three RNA polymerases, designated I, II, and III, which are distinct complexes but have certain subunits in common. Each polymerase has a specific function and is recruited to a specific promoter sequence.
RNA polymerase I (Pol I) is responsible for the synthesis of only one type of RNA, a transcript called preribosomal RNA (or pre-rRNA), which contains the precursor for the 18S, 5.8S, and 28S rRNAs (see Fig. 26–22). Pol I promoters vary greatly in sequence from one species to another. The principal function of RNA polymerase II (Pol II) is synthesis of mRNAs and some specialized RNAs. This enzyme can recognize thousands of promoters that vary greatly in sequence. Many Pol II promoters have a few sequence features in common, including a TATA box (eukaryotic consensus sequence TATAAA) near base pair −30 and an Inr sequence (initiator) near the RNA start site at +1 (Fig. 26–8).

RNA polymerase III (Pol III) makes tRNAs, the 5S rRNA, and some other small specialized RNAs. The promoters recognized by Pol III are well characterized. Interestingly, some of the sequences required for the regulated initiation of transcription by Pol III are located within the gene itself, whereas others are in more conventional locations upstream of the RNA start site (Chapter 28).

RNA Polymerase II Requires Many Other Protein Factors for Its Activity

RNA polymerase II is central to eukaryotic gene expression and has been studied extensively. Although this polymerase is strikingly more complex than its bacterial counterpart, the complexity masks a remarkable conservation of structure, function, and mechanism. Pol II is a huge enzyme with 12 subunits. The largest subunit (RPB1) exhibits a high degree of homology to the β′ subunit of bacterial RNA polymerase. Another subunit (RPB2) is structurally similar to the bacterial β subunit, and two others (RPB3 and RPB11) show some structural homology to the two bacterial α subunits. Pol II must function with genomes that are more complex and with DNA molecules more elaborately packaged than in bacteria.
The need for protein-protein contacts with the numerous other protein factors required to navigate this labyrinth accounts in large measure for the added complexity of the eukaryotic polymerase.

The largest subunit of Pol II also has an unusual feature, a long carboxyl-terminal tail consisting of many repeats of a consensus heptad amino acid sequence, –YSPTSPS–. There are 27 repeats in the yeast enzyme (18 exactly matching the consensus) and 52 (21 exact) in the mouse and human enzymes. This carboxyl-terminal domain (CTD) is separated from the main body of the enzyme by an unstructured linker sequence. The CTD has many important roles in Pol II function, as outlined below.

RNA polymerase II requires an array of other proteins, called transcription factors, in order to form the active transcription complex. The general transcription factors required at every Pol II promoter (factors usually designated TFII with an additional identifier) are highly conserved in all eukaryotes (Table 26–1). The process of transcription by Pol II can be described in terms of several phases (assembly, initiation, elongation, and termination), each associated with characteristic proteins (Fig. 26–9). The step-by-step pathway described below leads to active transcription in vitro. In the cell, many of the proteins may be present in larger, preassembled complexes, simplifying the pathways for assembly on promoters. As you read about this process, consult Figure 26–9 and Table 26–1 to help keep track of the many participants.

FIGURE 26–8 Common sequences in promoters recognized by eukaryotic RNA polymerase II. [Diagram: various regulatory sequences lie upstream of a TATA box (TATAAA) near −30 and an Inr (consensus YYAN(T/A)YY) spanning +1.] The TATA box is the major assembly point for the proteins of the preinitiation complexes of Pol II. The DNA is unwound at the initiator sequence (Inr), and the transcription start site is usually within or very near this sequence. In the Inr consensus sequence shown here, N represents any nucleotide; Y, a pyrimidine nucleotide. Many additional sequences serve as binding sites for a wide variety of proteins that affect the activity of Pol II. These sequences are important in regulating Pol II promoters and vary greatly in type and number, and in general the eukaryotic promoter is much more complex than suggested here. Many of the sequences are located within a few hundred base pairs of the TATA box on the 5′ side; others may be thousands of base pairs away. The sequence elements summarized here are more variable among the Pol II promoters of eukaryotes than among the E. coli promoters (see Fig. 26–5). Many Pol II promoters lack a TATA box or a consensus Inr element or both. Additional sequences around the TATA box and downstream (to the right as drawn) of Inr may be recognized by one or more transcription factors.

FIGURE 26–9 Transcription at RNA polymerase II promoters. [Diagram (a): assembly at the TATA box of TBP (or TFIID and/or TFIIA), TFIIB, TFIIF–Pol II, TFIIE, and TFIIH into the closed complex; DNA unwinding to the open complex; CTD phosphorylation, initiation, and promoter escape; elongation with elongation factors; termination, with release and dephosphorylation of Pol II. (b): TBP bound to DNA.] (a) The sequential assembly of TBP (often with TFIIA), TFIIB, TFIIF plus Pol II, TFIIE, and TFIIH results in a closed complex. TBP often binds as part of a larger complex, TFIID. Some of the TFIID subunits play a role in transcription regulation (see Fig. 28–30).
Within the complex, the DNA is unwound at the Inr region by the helicase activity of TFIIH and perhaps of TFIIE, creating an open complex. The carboxyl-terminal domain of the largest Pol II subunit is phosphorylated by TFIIH, and the polymerase then escapes the promoter and begins transcription. Elongation is accompanied by the release of many transcription factors and is also enhanced by elongation factors (see Table 26–1). After termination, Pol II is released, dephosphorylated, and recycled. (b) The structure of human TBP (gray) bound to DNA (blue and white) (PDB ID 1TGH).

Assembly of RNA Polymerase and Transcription Factors at a Promoter

The formation of a closed complex begins when the TATA-binding protein (TBP) binds to the TATA box (Fig. 26–9b). TBP is bound in turn by the transcription factor TFIIB, which also binds to DNA on either side of TBP. TFIIA binding, although not always essential, can stabilize the TFIIB-TBP complex on the DNA and can be important at nonconsensus promoters where TBP binding is relatively weak. The TFIIB-TBP complex is next bound by another complex consisting of TFIIF and Pol II. TFIIF helps target Pol II to its promoters, both by interacting with TFIIB and by reducing the binding of the polymerase to nonspecific sites on the DNA. Finally, TFIIE and TFIIH bind to create the closed complex. TFIIH has DNA helicase activity that promotes the unwinding of DNA near the RNA start site (a process requiring the hydrolysis of ATP), thereby creating an open complex. Counting all the subunits of the various essential factors (excluding TFIIA), this minimal active assembly has more than 30 polypeptides.

RNA Strand Initiation and Promoter Clearance

TFIIH has an additional function during the initiation phase. A kinase activity in one of its subunits phosphorylates Pol II at many places in the CTD (Fig. 26–9).
Several other protein kinases, including CDK9 (cyclin-dependent kinase 9), which is part of the complex P-TEFb (positive transcription elongation factor b), also phosphorylate the CTD. This causes a conformational change in the overall complex, initiating transcription. Phosphorylation of the CTD is also important during the subsequent elongation phase, and it affects the interactions between the transcription complex and other enzymes involved in processing the transcript (as described below). During synthesis of the initial 60 to 70 nucleotides of RNA, first TFIIE and then TFIIH is released, and Pol II enters the elongation phase of transcription.

Elongation, Termination, and Release

TFIIF remains associated with Pol II throughout elongation. During this stage, the activity of the polymerase is greatly enhanced by proteins called elongation factors (Table 26–1). The elongation factors suppress pausing during transcription and also coordinate interactions between protein complexes involved in the posttranscriptional processing of mRNAs. Once the RNA transcript is completed, transcription is terminated. Pol II is dephosphorylated and recycled, ready to initiate another transcript (Fig. 26–9).

Regulation of RNA Polymerase II Activity

Regulation of transcription at Pol II promoters is quite elaborate. It involves the interaction of a wide variety of other proteins with the preinitiation complex. Some of these regulatory proteins interact with transcription factors, others with Pol II itself.
Many interact through TFIID, a complex of about 12 proteins, including TBP and certain TBP-associated factors, or TAFs. The regulation of transcription is described in more detail in Chapter 28.

TABLE 26–1 Proteins Required for Initiation of Transcription at the RNA Polymerase II (Pol II) Promoters of Eukaryotes

Transcription protein | Number of subunits | Subunit(s) Mr | Function(s)
Initiation
Pol II | 12 | 10,000–220,000 | Catalyzes RNA synthesis
TBP (TATA-binding protein) | 1 | 38,000 | Specifically recognizes the TATA box
TFIIA | 3 | 12,000, 19,000, 35,000 | Stabilizes binding of TFIIB and TBP to the promoter
TFIIB | 1 | 35,000 | Binds to TBP; recruits Pol II–TFIIF complex
TFIIE | 2 | 34,000, 57,000 | Recruits TFIIH; has ATPase and helicase activities
TFIIF | 2 | 30,000, 74,000 | Binds tightly to Pol II; binds to TFIIB and prevents binding of Pol II to nonspecific DNA sequences
TFIIH | 12 | 35,000–89,000 | Unwinds DNA at promoter (helicase activity); phosphorylates Pol II (within the CTD); recruits nucleotide-excision repair proteins
Elongation*
ELL† | 1 | 80,000 |
P-TEFb | 2 | 43,000, 124,000 | Phosphorylates Pol II (within the CTD)
SII (TFIIS) | 1 | 38,000 |
Elongin (SIII) | 3 | 15,000, 18,000, 110,000 |

*The function of all elongation factors is to suppress the pausing or arrest of transcription by the Pol II–TFIIF complex.
†Name derived from eleven-nineteen lysine-rich leukemia. The gene for ELL is the site of chromosomal recombination events frequently associated with acute myeloid leukemia.

Diverse Functions of TFIIH

In eukaryotes, the repair of damaged DNA (see Table 25–5) is more efficient within genes that are actively being transcribed than for other damaged DNA, and the template strand is repaired somewhat more efficiently than the nontemplate strand. These remarkable observations are explained by the alternative roles of the TFIIH subunits.
Not only does TFIIH participate in the formation of the closed complex during assembly of a transcription complex (as described above), but some of its subunits are also essential components of the separate nucleotide-excision repair complex (see Fig. 25–24). When Pol II transcription halts at the site of a DNA lesion, TFIIH can interact with the lesion and recruit the entire nucleotide-excision repair complex. Genetic loss of certain TFIIH subunits can produce human diseases. Some examples are xeroderma pigmentosum (see Box 25–1) and Cockayne’s syndrome, which is characterized by arrested growth, photosensitivity, and neurological disorders. ■

DNA-Dependent RNA Polymerase Undergoes Selective Inhibition

The elongation of RNA strands by RNA polymerase in both bacteria and eukaryotes is inhibited by the antibiotic actinomycin D (Fig. 26–10). The planar portion of this molecule inserts (intercalates) into the double-helical DNA between successive G≡C base pairs, deforming the DNA. This prevents movement of the polymerase along the template. Because actinomycin D inhibits RNA elongation in intact cells as well as in cell extracts, it is used to identify cell processes that depend on RNA synthesis. Acridine inhibits RNA synthesis in a similar fashion (Fig. 26–10).

Rifampicin inhibits bacterial RNA synthesis by binding to the β subunit of bacterial RNA polymerases, preventing the promoter clearance step of transcription (Fig. 26–6). It is sometimes used as an antibiotic.

The mushroom Amanita phalloides has evolved a very effective defense mechanism against predators. It produces α-amanitin, which disrupts mRNA formation in animal cells by blocking Pol II and, at higher concentrations, Pol III. Neither Pol I nor bacterial RNA polymerase is sensitive to α-amanitin, and neither is the RNA polymerase II of A. phalloides itself!
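The two core Pol II promoter elements introduced at the start of this section (Fig. 26–8), a TATA box near −30 and an Inr spanning +1, are consensus sequences, so they can be located by simple pattern matching. The Python sketch below is illustrative only. It assumes that the Inr consensus reads YYAN(T/A)YY (written YYANWYY in IUPAC code, where W = A or T), that the A of the Inr is the +1 nucleotide, and that search windows of −40 to −20 (TATA) and −5 to +5 (Inr) are reasonable; the function names and the toy promoter are hypothetical.

```python
import re

# IUPAC codes for the symbols in Fig. 26-8 (Y = pyrimidine, N = any base).
# W (A or T) is our reading of the Inr position drawn as "T over A".
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression pattern."""
    return "".join(IUPAC[symbol] for symbol in consensus)


TATA_BOX = consensus_to_regex("TATAAA")
INR = consensus_to_regex("YYANWYY")


def relative_position(index, tss):
    """Convert a 0-based index to promoter coordinates (+1 = TSS, no position 0)."""
    return index - tss + 1 if index >= tss else index - tss


def find_core_elements(seq, tss, tata_window=(-40, -20), inr_window=(-5, 5)):
    """List TATA-box and Inr matches near a transcription start site.

    seq: coding-strand DNA written 5'->3' (assumed input convention)
    tss: 0-based index of the +1 nucleotide in seq
    The window sizes are illustrative assumptions, not measured values.
    """
    hits = []
    # A zero-width lookahead lets overlapping matches be reported.
    for m in re.finditer(f"(?={TATA_BOX})", seq):
        pos = relative_position(m.start(), tss)
        if tata_window[0] <= pos <= tata_window[1]:
            hits.append(("TATA", pos))
    for m in re.finditer(f"(?={INR})", seq):
        # Assume the A at the third position of YYANWYY is the +1 nucleotide.
        pos = relative_position(m.start() + 2, tss)
        if inr_window[0] <= pos <= inr_window[1]:
            hits.append(("Inr", pos))
    return hits


promoter = "G" * 10 + "TATAAA" + "G" * 22 + "CTAGTCC" + "G" * 15
print(find_core_elements(promoter, tss=40))  # [('TATA', -30), ('Inr', 1)]
```

Because many Pol II promoters lack a TATA box, a consensus Inr, or both, an empty result from a scan like this does not mean that a sequence cannot act as a promoter.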
SUMMARY 26.1 DNA-Dependent Synthesis of RNA

■ Transcription is catalyzed by DNA-dependent RNA polymerases, which use ribonucleoside 5′-triphosphates to synthesize RNA complementary to the template strand of duplex DNA. Transcription occurs in several phases: binding of RNA polymerase to a DNA site called a promoter, initiation of transcript synthesis, elongation, and termination.

■ Bacterial RNA polymerase requires a special subunit to recognize the promoter. As the first committed step in transcription, binding of RNA polymerase to the promoter and initiation of transcription are closely regulated. Transcription stops at sequences called terminators.

■ Eukaryotic cells have three types of RNA polymerases. Binding of RNA polymerase II to its promoters requires an array of proteins called transcription factors. Elongation factors participate in the elongation phase of transcription. The largest subunit of Pol II has a long carboxyl-terminal domain, which is phosphorylated during the initiation and elongation phases.

FIGURE 26–10 Actinomycin D and acridine, inhibitors of DNA transcription. [Structures: (a) actinomycin D, with its planar ring system and two cyclic peptides (Sar, L-meVal, L-Pro, D-Val, L-Thr), and acridine; (b) model of the actinomycin D–DNA complex.] (a) The shaded portion of actinomycin D is planar and intercalates between two successive G≡C base pairs in duplex DNA. The two cyclic peptide structures of actinomycin D bind to the minor groove of the double helix. Sarcosine (Sar) is N-methylglycine; meVal is methylvaline. Acridine also acts by intercalation in DNA. (b) A complex of actinomycin D with DNA (PDB ID 1DSC). The DNA backbone is shown in blue, the bases are white, the intercalated part of actinomycin (shaded in (a)) is orange, and the remainder of the actinomycin is red. The DNA is bent as a result of the actinomycin binding.
26.2 RNA Processing

Many of the RNA molecules in bacteria and virtually all RNA molecules in eukaryotes are processed to some degree after synthesis. Some of the most interesting molecular events in RNA metabolism occur during this postsynthetic processing. Intriguingly, several of the enzymes that catalyze these reactions consist of RNA rather than protein. The discovery of these catalytic RNAs, or ribozymes, has brought a revolution in thinking about RNA function and about the origin of life.

A newly synthesized RNA molecule is called a primary transcript. Perhaps the most extensive processing of primary transcripts occurs in eukaryotic mRNAs and in tRNAs of both bacteria and eukaryotes.

The primary transcript for a eukaryotic mRNA typically contains sequences encompassing one gene, although the sequences encoding the polypeptide may not be contiguous. Noncoding tracts that break up the coding region of the transcript are called introns, and the coding segments are called exons (see the discussion of introns and exons in DNA in Chapter 24). In a process called splicing, the introns are removed from the primary transcript and the exons are joined to form a continuous sequence that specifies a functional polypeptide. Eukaryotic mRNAs are also modified at each end. A modified residue called a 5′ cap (p. 1008) is added at the 5′ end. The 3′ end is cleaved, and 80 to 250 A residues are added to create a poly(A) “tail.” The sometimes elaborate protein complexes that carry out each of these three mRNA-processing reactions do not operate independently. They appear to be organized in association with each other and with the phosphorylated CTD of Pol II; each complex affects the function of the others. Other proteins involved in mRNA transport to the cytoplasm are also associated with the mRNA in the nucleus, and the processing of the transcript is coupled to its transport.
In effect, a eukaryotic mRNA, as it is synthesized, is ensconced in an elaborate complex involving dozens of proteins. The composition of the complex changes as the primary transcript is processed, transported to the cytoplasm, and delivered to the ribosome for translation. These processes are outlined in Figure 26–11 and described in more detail below.

The primary transcripts of prokaryotic and eukaryotic tRNAs are processed by the removal of sequences from each end (cleavage) and in a few cases by the removal of introns (splicing). Many bases and sugars in tRNAs are also modified; mature tRNAs are replete with unusual bases not found in other nucleic acids (see Fig. 26–24).

The ultimate fate of any RNA is its complete and regulated degradation. The rate of turnover of RNAs plays a critical role in determining their steady-state levels and the rate at which cells can shut down expression of a gene whose product is no longer needed. During the development of multicellular organisms, for example, certain proteins must be expressed at one stage only, and the mRNA encoding such a protein must be made and destroyed at the appropriate times.

FIGURE 26–11 Formation of the primary transcript and its processing during maturation of mRNA in a eukaryotic cell. [Diagram: DNA with exons, introns, and a noncoding end sequence; transcription and 5′ capping by Pol II; completion of the primary transcript; cleavage, polyadenylation, and splicing to give the mature mRNA, 5′ cap ... AAA(A)n 3′.] The 5′ cap (red) is added before synthesis of the primary transcript is complete. A noncoding sequence following the last exon is shown in orange. Splicing can occur either before or after the cleavage and polyadenylation steps. All the processes shown here take place within the nucleus.
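The link between RNA turnover, steady-state level, and how quickly expression can be shut down can be made quantitative with the simplest possible kinetic model. That model is an assumption here, not something stated in the text: constant synthesis plus first-order (exponential) decay with a fixed half-life t½. Under it, the steady-state level equals the synthesis rate divided by the decay constant k = ln 2 / t½, and once transcription stops the level halves every t½. The function names and rates in this Python sketch are hypothetical.

```python
import math


def steady_state_level(synthesis_rate, half_life):
    """Steady-state mRNA copies for constant synthesis and first-order decay.

    At steady state, synthesis_rate = k_deg * M, so M = synthesis_rate / k_deg,
    where k_deg = ln 2 / half_life (same time units as synthesis_rate).
    """
    k_deg = math.log(2) / half_life
    return synthesis_rate / k_deg


def level_after_shutoff(initial_level, half_life, t):
    """mRNA remaining t time units after transcription stops (first-order decay)."""
    return initial_level * 0.5 ** (t / half_life)


# Hypothetical gene transcribed at 10 mRNAs per minute, with a short-lived
# or a long-lived mRNA.
for half_life in (5, 60):  # minutes
    m_ss = steady_state_level(10, half_life)
    left = level_after_shutoff(m_ss, half_life, 15)
    print(f"t1/2 = {half_life} min: steady state {m_ss:.0f}, after 15 min {left:.0f}")
```

With the same synthesis rate, the short-lived mRNA reaches a much lower steady-state level but falls to one-eighth of it within 15 minutes of shutoff, whereas the long-lived mRNA still retains more than 80% of its level.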
Eukaryotic mRNAs Are Capped at the 5′ End

Most eukaryotic mRNAs have a 5′ cap, a residue of 7-methylguanosine linked to the 5′-terminal residue of the mRNA through an unusual 5′,5′-triphosphate linkage (Fig. 26–12). The 5′ cap helps protect mRNA from ribonucleases. The cap also binds to a specific cap-binding complex of proteins and participates in binding of the mRNA to the ribosome to initiate translation (Chapter 27).

The 5′ cap is formed by condensation of a molecule of GTP with the triphosphate at the 5′ end of the transcript. The guanine is subsequently methylated at N-7, and additional methyl groups are often added at the 2′ hydroxyls of the first and second nucleotides adjacent to the cap (Fig. 26–12). The methyl groups are derived from S-adenosylmethionine. All these reactions occur very early in transcription, after the first 20 to 30 nucleotides of the transcript have been added. All three of the capping enzymes, and through them the 5′ end of the transcript itself, are associated with the RNA polymerase II CTD until the cap is synthesized. The capped 5′ end is then released from the capping enzymes and bound by the cap-binding complex (Fig. 26–12c).

Both Introns and Exons Are Transcribed from DNA into RNA

In bacteria, a polypeptide chain is generally encoded by a DNA sequence that is colinear with the amino acid sequence, continuing along the DNA template without interruption until the information needed to specify the polypeptide is complete.
However, the notion that all genes are continuous was disproved in 1977 when Phillip Sharp and Richard Roberts independently discovered that many genes for polypeptides in eukaryotes are interrupted by noncoding sequences (introns).

The vast majority of genes in vertebrates contain introns; among the few exceptions are those that encode histones. The occurrence of introns in other eukaryotes varies. Many genes in the yeast Saccharomyces cerevisiae lack introns, although in some other yeast species introns are more common. Introns are also found in a few eubacterial and archaebacterial genes. Introns in DNA are transcribed along with the rest of the gene by RNA polymerases. The introns are then spliced out of the primary RNA transcript, and the exons are joined to form a mature, functional RNA. In eukaryotic mRNAs, most exons are less than 1,000 nucleotides long, with many in the 100 to 200 nucleotide size range, encoding stretches of 30 to 60 amino acids within a longer polypeptide. Introns vary in size from 50 to 20,000 nucleotides. Genes of higher eukaryotes, including humans, typically have much more DNA devoted to introns than to exons. Many genes have introns; some genes have dozens of them.

RNA Catalyzes the Splicing of Introns

There are four classes of introns. The first two, the group I and group II introns, differ in the details of their splicing mechanisms but share one surprising characteristic: they are self-splicing, with no protein enzymes involved. Group I introns are found in some nuclear, mitochondrial, and chloroplast genes coding for rRNAs, mRNAs, and tRNAs. Group II introns are generally found in the primary transcripts of mitochondrial or chloroplast mRNAs in fungi, algae, and plants. Group I and group II introns are also found among the rarer examples of introns in bacteria. Neither class requires a high-energy cofactor (such as ATP) for splicing. The splicing mechanisms in both groups involve two transesterification reaction steps (Fig. 26–13). A ribose 2′- or 3′-hydroxyl group makes a nucleophilic attack on a phosphorus and, in each step, a new phosphodiester bond is formed at the expense of the old, maintaining the balance of energy. These reactions are very similar to the DNA breaking and rejoining reactions promoted by topoisomerases (see Fig. 24–21) and site-specific recombinases (see Fig. 25–38).

FIGURE 26–12 The 5′ cap of mRNA. [Diagram: (a) structure of the cap; (b) capping reactions: pppNp → ppNp + Pi (phosphohydrolase); ppNp + GTP → GpppNp + PPi (guanylyltransferase); GpppNp + adoMet → m7GpppNp + adoHcy (guanine-7-methyltransferase); m7GpppNp + adoMet → m7GpppmNp + adoHcy (2′-O-methyltransferase); (c) cap-synthesizing complex and cap-binding complex (CBC) on the CTD of Pol II.] (a) 7-Methylguanosine is joined to the 5′ end of almost all eukaryotic mRNAs in an unusual 5′,5′-triphosphate linkage. Methyl groups (pink) are often found at the 2′ position of the first and second nucleotides. RNAs in yeast cells lack the 2′-methyl groups. The 2′-methyl group on the second nucleotide is generally found only in RNAs from vertebrate cells. (b) Generation of the 5′ cap involves four to five separate steps (adoHcy is S-adenosylhomocysteine). (c) Synthesis of the cap is carried out by enzymes tethered to the CTD of Pol II. The cap remains tethered to the CTD through an association with the cap-binding complex (CBC).
The group I splicing reaction requires a guanine nucleoside or nucleotide cofactor, but the cofactor is not used as a source of energy; instead, the 3′-hydroxyl group of guanosine is used as a nucleophile in the first step of the splicing pathway. The guanosine 3′-hydroxyl group forms a normal 3′,5′-phosphodiester bond with the 5′ end of the intron (Fig. 26–14). The 3′ hydroxyl of the exon that is displaced in this step then acts as a nucleophile in a similar reaction at the 3′ end of the intron. The result is precise excision of the intron and ligation of the exons.

In group II introns the reaction pattern is similar except for the nucleophile in the first step, which in this case is the 2′-hydroxyl group of an A residue within the intron (Fig. 26–15). A branched lariat structure is formed as an intermediate.

Self-splicing of introns was first revealed in 1982 in studies of the splicing mechanism of the group I rRNA intron from the ciliated protozoan Tetrahymena thermophila, conducted by Thomas Cech and colleagues. These workers transcribed isolated Tetrahymena DNA (including the intron) in vitro using purified bacterial RNA polymerase. The resulting RNA spliced itself accurately without any protein enzymes from Tetrahymena. The discovery that RNAs could have catalytic functions was a milestone in our understanding of biological systems.

FIGURE 26–13 Transesterification reaction. [Structures: the 3′ OH of guanosine attacking the phosphorus at the exon–intron junction.] This is the first step in the splicing of group I introns. Here, the 3′ OH of a guanosine molecule acts as nucleophile.
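The two transesterification steps of group I splicing can be followed as simple bookkeeping on RNA strings. In this hypothetical Python sketch, sequences are written 5′→3′, a leading letter G on a strand stands for the exogenous guanosine cofactor, and the toy exon and intron sequences are arbitrary. Counting the phosphodiester bonds that join nucleotides at each stage illustrates the point made above: each step forms one new bond as it breaks an old one, so the total never changes and no high-energy cofactor is consumed.

```python
def phosphodiester_bonds(molecules):
    """Count bonds joining nucleotides in a list of linear RNAs.

    A linear RNA of n nucleotides has n - 1 such bonds; a free guanosine has none.
    """
    return sum(len(m) - 1 for m in molecules)


def group_i_splicing(exon5, intron, exon3, g="G"):
    """Molecules present at each stage of group I self-splicing (5'->3' strings).

    g stands for the exogenous guanosine cofactor.
    """
    start = [g, exon5 + intron + exon3]   # cofactor + primary transcript
    # Step 1: the guanosine 3'-OH attacks the 5' splice site; G becomes
    # joined to the 5' end of the intron, and the 5' exon is released
    # with a free 3'-OH.
    step1 = [exon5, g + intron + exon3]
    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site; the exons
    # are ligated and the intron (carrying G at its 5' end) is released.
    step2 = [exon5 + exon3, g + intron]
    return start, step1, step2


stages = group_i_splicing("GGCU", "AAAUAUG", "UACC")
for name, molecules in zip(("start", "after step 1", "after step 2"), stages):
    print(name, molecules, phosphodiester_bonds(molecules))
```

The same bookkeeping applies to group II introns if the attacking group in step 1 is taken to be an internal A of the intron rather than an external G, which yields the lariat instead of a linear intron.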
[Photograph: Thomas Cech]

FIGURE 26–14 Splicing mechanism of group I introns. [Diagram: primary transcript (5′ exon, intron, 3′ exon); the 3′ OH of guanosine acts as a nucleophile, attacking the phosphate at the 5′ splice site; in the intermediate, the 3′ OH of the 5′ exon becomes the nucleophile, completing the reaction and giving the spliced RNA and the excised intron.] The nucleophile in the first step may be guanosine, GMP, GDP, or GTP. The spliced intron is eventually degraded.

Most introns are not self-splicing, and these types are not designated with a group number. The third and largest class of introns includes those found in nuclear mRNA primary transcripts. These are called spliceosomal introns, because their removal occurs within and is catalyzed by a large ribonucleoprotein complex called a spliceosome. Within the spliceosome, the introns undergo splicing by the same lariat-forming mechanism as the group II introns. The spliceosome is made up of specialized RNA-protein complexes, small nuclear ribonucleoproteins (snRNPs, often pronounced “snurps”). Each snRNP contains one of a class of eukaryotic RNAs, 100 to 200 nucleotides long, known as small nuclear RNAs (snRNAs). Five snRNAs (U1, U2, U4, U5, and U6) involved in splicing reactions are generally found in abundance in eukaryotic nuclei. The RNAs and proteins in snRNPs are highly conserved in eukaryotes from yeasts to humans.

mRNA Splicing

Spliceosomal introns generally have the dinucleotide sequences GU and AG at the 5′ and 3′ ends, respectively, and these sequences mark the sites where splicing occurs. The U1 snRNA contains a sequence complementary to sequences near the 5′ splice site of nuclear mRNA introns (Fig. 26–16a), and the U1 snRNP binds to this region in the primary transcript.
Addition of the U2, U4, U5, and U6 snRNPs leads to formation of the spliceosome (Fig. 26–16b). The snRNPs together contribute five RNAs and about 50 proteins to the spliceosome, a supramolecular assembly nearly as complex as the ribosome (described in Chapter 27). ATP is required for assembly of the spliceosome, but the RNA cleavage-ligation reactions do not seem to require ATP. Some mRNA introns are spliced by a less common type of spliceosome, in which the U1 and U2 snRNPs are replaced by the U11 and U12 snRNPs. Whereas U1- and U2-containing spliceosomes remove introns with (5′)GU and AG(3′) terminal sequences, as shown in Figure 26–16, the U11- and U12-containing spliceosomes remove a rare class of introns that have (5′)AU and AC(3′) terminal sequences to mark the intronic splice sites. The spliceosomes used in nuclear RNA splicing may have evolved from more ancient group II introns, with the snRNPs replacing the catalytic domains of their self-splicing ancestors.

Some components of the splicing apparatus appear to be tethered to the CTD of RNA polymerase II, suggesting an interesting model for the splicing reaction. As the first splice junction is synthesized, it is bound by a tethered spliceosome. The second splice junction is then captured by this complex as it passes, facilitating the juxtaposition of the intron ends and the subsequent splicing process (Fig. 26–16c). After splicing, the intron remains in the nucleus and is eventually degraded.

The fourth class of introns, found in certain tRNAs, is distinguished from the group I and II introns in that the splicing reaction requires ATP and an endonuclease. The splicing endonuclease cleaves the phosphodiester bonds at both ends of the intron, and the two exons are joined by a mechanism similar to the DNA ligase reaction (see Fig. 25–16).
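The terminal-dinucleotide rule for spliceosomal introns described above can be expressed as a short Python sketch. It is a deliberate simplification: real splice sites are defined by longer sequences, by base pairing with the U1 and U2 (or U11 and U12) snRNAs, and by spliceosome assembly, not by two dinucleotides alone, and the intron coordinates here are supplied by hand rather than predicted. The function names and the toy pre-mRNA are hypothetical.

```python
def spliceosome_type(intron):
    """Guess the spliceosome type from an intron's terminal dinucleotides."""
    ends = (intron[:2], intron[-2:])
    if ends == ("GU", "AG"):
        return "U1/U2"       # common class of spliceosomal introns
    if ends == ("AU", "AC"):
        return "U11/U12"     # rare class of spliceosomal introns
    return None


def splice(pre_mrna, introns):
    """Remove introns given as half-open, 0-based (start, end) intervals."""
    pieces, previous_end = [], 0
    for start, end in sorted(introns):
        if spliceosome_type(pre_mrna[start:end]) is None:
            raise ValueError(f"intron {start}-{end} lacks GU...AG or AU...AC ends")
        pieces.append(pre_mrna[previous_end:start])   # keep the preceding exon
        previous_end = end
    pieces.append(pre_mrna[previous_end:])             # keep the final exon
    return "".join(pieces)


pre_mrna = "AUGGCC" + "GUAAGUCUUUCCAG" + "GAAUAA"
print(spliceosome_type(pre_mrna[6:20]))   # U1/U2
print(splice(pre_mrna, [(6, 20)]))        # AUGGCCGAAUAA
```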
Although spliceosomal introns appear to be limited to eukaryotes, the other intron classes are not. Genes with group I and II introns have now been found in both bacteria and bacterial viruses. Bacteriophage T4, for example, has several protein-encoding genes with group I introns. Introns appear to be more common in archaebacteria than in eubacteria.

Eukaryotic mRNAs Have a Distinctive 3′ End Structure

At their 3′ end, most eukaryotic mRNAs have a string of 80 to 250 A residues, making up the poly(A) tail. This tail serves as a binding site for one or more specific proteins. The poly(A) tail and its associated proteins probably help protect mRNA from enzymatic destruction. Many prokaryotic mRNAs also acquire poly(A) tails, but these tails stimulate decay of mRNA rather than protecting it from degradation.

The poly(A) tail is added in a multistep process. The transcript is extended beyond the site where the poly(A) tail is to be added, then is cleaved at the poly(A) addition site by an endonuclease component of a large enzyme complex, again associated with the CTD of RNA polymerase II (Fig. 26–17). The mRNA site where cleavage occurs is marked by two sequence elements: the highly conserved sequence (5′)AAUAAA(3′), 10 to 30 nucleotides on the 5′ side (upstream) of the cleavage site, and a less well-defined sequence rich in G and U residues, 20 to 40 nucleotides downstream of the cleavage site. Cleavage generates the free 3′-hydroxyl group that defines the end of the mRNA, to which A residues are immediately added by polyadenylate polymerase, which catalyzes the reaction

$$\mathrm{RNA} + n\,\mathrm{ATP} \longrightarrow \mathrm{RNA{-}(AMP)}_n + n\,\mathrm{PP_i}$$

where $n$ = 80 to 250. This enzyme does not require a template but does require the cleaved mRNA as a primer.

The overall processing of a typical eukaryotic mRNA is summarized in Figure 26–18. In some cases the polypeptide-coding region of the mRNA is also modified by RNA “editing” (see Box 27–1 for details). This editing includes processes that add or delete bases in the coding regions of primary transcripts or that change the sequence (by, for example, enzymatic deamination of a C residue to create a U residue). A particularly dramatic example occurs in trypanosomes, which are parasitic protozoa: large regions of an mRNA are synthesized without any uridylate, and the U residues are inserted later by RNA editing.

FIGURE 26–15 Splicing mechanism of group II introns. [Diagram: in the primary transcript, the 2′ OH of a specific adenosine in the intron acts as a nucleophile, attacking the 5′ splice site to form a lariat structure, in which the adenosine has three phosphodiester bonds; the 3′ OH of the 5′ exon then acts as a nucleophile, completing the reaction.] The chemistry is similar to that of group I intron splicing, except for the identity of the nucleophile in the first step and formation of a lariatlike intermediate, in which one branch is a 2′,5′-phosphodiester bond.

FIGURE 26–16 Splicing mechanism in mRNA primary transcripts. [Diagram: (a) base pairing of U1 and U2 snRNAs with the primary transcript; (b) ATP-dependent assembly of the spliceosome from the U1 snRNP, the U2 snRNP, and U4/U6 + U5, release of U1 and U4, lariat formation, and intron release; (c) spliceosome, cap, and CBC tethered to the CTD.] (a) RNA pairing interactions in the formation of spliceosome complexes. The U1 snRNA has a sequence near its 5′ end that is complementary to the splice site at the 5′ end of the intron. Base pairing of U1 to this region of the primary transcript helps define the 5′ splice site during spliceosome assembly (Ψ is pseudouridine; see Fig. 26–24). U2 is paired to the intron at a position encompassing the A residue (shaded pink) that becomes the nucleophile during the splicing reaction. Base pairing of U2 snRNA causes a bulge that displaces and helps to activate the adenylate, whose 2′ OH will form the lariat structure through a 2′,5′-phosphodiester bond. (b) Assembly of spliceosomes. The U1 and U2 snRNPs bind, then the remaining snRNPs (the U4/U6 complex and U5) bind to form an inactive spliceosome. Internal rearrangements convert this species to an active spliceosome in which U1 and U4 have been expelled and U6 is paired with both the 5′ splice site and U2. This is followed by the catalytic steps, which parallel those of the splicing of group II introns (see Fig. 26–15). (c) Coordination of splicing with transcription provides an attractive mechanism for bringing the two splice sites together. See the text for details.

FIGURE 26–17 Addition of the poly(A) tail to the primary RNA transcript of eukaryotes. [Diagram: Pol II on template DNA, the capped RNA with its AAUAAA signal, and the enzyme complex, in steps 1 to 3.] Pol II synthesizes RNA beyond the segment of the transcript containing the cleavage signal sequences, including the highly conserved upstream sequence (5′)AAUAAA. (1) The cleavage signal sequence is bound by an enzyme complex that includes an endonuclease, a polyadenylate polymerase, and several other multisubunit proteins involved in sequence recognition, stimulation of cleavage, and regulation of the length of the poly(A) tail. (2) The RNA is cleaved by the endonuclease at a point 10 to 30 nucleotides 3′ to (downstream of) the sequence AAUAAA. (3) The polyadenylate polymerase synthesizes a poly(A) tail 80 to 250 nucleotides long, beginning at the cleavage site.

FIGURE 26–18 Overview of the processing of a eukaryotic mRNA. [Diagram: the ovalbumin gene (7,700 bp) is transcribed and capped to give a primary transcript with extra RNA at its 3′ end; splicing, cleavage, and polyadenylation remove the seven introns and the extra RNA, giving a mature mRNA of 1,872 nucleotides with a 5′ cap and an AAA(A)n tail.] The ovalbumin gene, shown here, has introns A to G and exons 1 to 7 and L (L encodes a signal peptide sequence that targets the protein for export from the cell; see Fig. 27–34). About three-quarters of the RNA is removed during processing. Pol II extends the primary transcript well beyond the cleavage and polyadenylation site (“extra RNA”) before terminating transcription. Termination signals for Pol II have not yet been defined.

A Gene Can Give Rise to Multiple Products by Differential RNA Processing

The transcription of introns seems to consume cellular resources and energy without returning any benefit to the organism, but introns may confer an advantage not yet fully appreciated by scientists. Introns may be vestiges of a molecular parasite not unlike transposons (Chapter 25). Although the benefits of introns are not yet clear in most cases, cells have evolved to take advantage of the splicing pathways to alter the expression of certain genes.
Most eukaryotic mRNA transcripts produce only one mature mRNA and one corresponding polypeptide, but some can be processed in more than one way to produce different mRNAs and thus different polypeptides. The primary transcript contains molecular signals for all the alternative processing pathways, and the pathway favored in a given cell is determined by processing factors, RNA-binding proteins that promote one particular path.

Complex transcripts can have either more than one site for cleavage and polyadenylation or alternative splicing patterns, or both. If there are two or more sites for cleavage and polyadenylation, use of the one closest to the 5′ end will remove more of the primary transcript sequence (Fig. 26–19a). This mechanism, called poly(A) site choice, generates diversity in the variable domains of immunoglobulin heavy chains. Alternative splicing patterns (Fig. 26–19b) produce, from a common primary transcript, three different forms of the myosin heavy chain at different stages of fruit fly development. Both mechanisms come into play when a single RNA transcript is processed differently to produce two different hormones: the calcium-regulating hormone calcitonin in rat thyroid and calcitonin-gene-related peptide (CGRP) in rat brain (Fig. 26–20).

Ribosomal RNAs and tRNAs Also Undergo Processing

Posttranscriptional processing is not limited to mRNA. Ribosomal RNAs of both prokaryotic and eukaryotic cells are made from longer precursors called preribosomal RNAs, or pre-rRNAs; in eukaryotes, these precursors are synthesized by Pol I. In bacteria, 16S, 23S, and 5S rRNAs (and some tRNAs, although most tRNAs are encoded elsewhere) arise from a single 30S RNA precursor of about 6,500 nucleotides. RNA at both ends of the 30S precursor and segments between the rRNAs are removed during processing (Fig. 26–21).
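The bacterial pre-rRNA processing just described, in which several mature RNAs are released from one precursor by removing RNA at both ends and between the coding segments, can be sketched as a coordinate calculation. This is a minimal illustration under stated assumptions: the segment order (16S, a spacer tRNA, 23S, 5S) follows the description of Figure 26–21, but every segment length below is a placeholder, and methylation and the specific nucleases (RNase III, RNase P, RNase E) are not modeled. The `Segment` class and `mature_products` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str      # e.g. "16S rRNA", "spacer"
    length: int    # nucleotides; values below are illustrative only
    mature: bool   # True if the segment becomes a functional RNA


def mature_products(precursor: list[Segment]) -> list[tuple[str, int, int]]:
    """(name, start, end) of each mature RNA, 0-based and end-exclusive.

    Models cleavage at segment boundaries plus removal of leader, spacer,
    and trailer RNA; methylation and the specific nucleases are not modeled.
    """
    products, pos = [], 0
    for seg in precursor:
        if seg.mature:
            products.append((seg.name, pos, pos + seg.length))
        pos += seg.length
    return products


# Hypothetical layout of one 30S precursor (lengths are placeholders).
precursor = [
    Segment("5' leader", 200, False),
    Segment("16S rRNA", 1540, True),
    Segment("spacer", 150, False),
    Segment("tRNA", 76, True),
    Segment("spacer", 150, False),
    Segment("23S rRNA", 2900, True),
    Segment("spacer", 100, False),
    Segment("5S rRNA", 120, True),
    Segment("3' trailer", 100, False),
]
for name, start, end in mature_products(precursor):
    print(f"{name}: nucleotides {start}-{end}")
removed = sum(s.length for s in precursor if not s.mature)
print(f"removed during processing: {removed} nt")
```

Because the seven E. coli pre-rRNA genes differ mainly in their spacer tRNAs, a different operon would be represented here simply by a different list of spacer and tRNA segments.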
FIGURE 26–19 Two mechanisms for the alternative processing of complex transcripts in eukaryotes. [Diagram: (a) a capped primary transcript with two poly(A) sites, A1 and A2, giving two different mature mRNAs after cleavage and polyadenylation at A1 or A2; (b) a capped primary transcript with one 5′ splice site and two 3′ splice sites, giving two different mature mRNAs after splicing.] (a) Alternative cleavage and polyadenylation patterns. Two poly(A) sites, A1 and A2, are shown. (b) Alternative splicing patterns. Two different 3′ splice sites are shown. In both mechanisms, different mature mRNAs are produced from the same primary transcript.

FIGURE 26–20 Alternative processing of the calcitonin gene transcript in rats. [Diagram: a six-exon primary transcript with two poly(A) sites is processed by cleavage and polyadenylation and by splicing to different mature mRNAs in thyroid and brain; translation and protease action then yield calcitonin and CGRP.] The primary transcript has two poly(A) sites; one predominates in the brain, the other in the thyroid. In the brain, splicing eliminates the calcitonin exon (exon 4); in the thyroid, this exon is retained. The resulting peptides are processed further to yield the final hormone products: calcitonin-gene-related peptide (CGRP) in the brain and calcitonin in the thyroid.

FIGURE 26–21 Processing of pre-rRNA transcripts in bacteria. [Diagram: 30S pre-rRNA transcript containing 16S, tRNA (4S), 23S, and 5S segments → methylation → cleavage to intermediates (17S, 25S, 5S, and tRNA precursors) → nuclease trimming to mature 16S, 23S, and 5S rRNAs and tRNA.] (1) Before cleavage, the 30S RNA precursor is methylated at specific bases. (2) Cleavage liberates precursors of rRNAs and tRNA(s).
Cleavage at the points labeled 1, 2, and 3 is carried out by the enzymes RNase III, RNase P, and RNase E, respectively. As discussed later in the text, RNase P is a ribozyme. (3) The final 16S, 23S, and 5S rRNA products result from the action of a variety of specific nucleases. The seven copies of the gene for pre-rRNA in the E. coli chromosome differ in the number, location, and identity of tRNAs included in the primary transcript. Some copies of the gene have additional tRNA gene segments between the 16S and 23S rRNA segments and at the far 3′ end of the primary transcript.

The genome of E. coli encodes seven pre-rRNA molecules. All these genes have essentially identical rRNA-coding regions, but they differ in the segments between these regions. The segment between the 16S and 23S rRNA genes generally encodes one or two tRNAs, with different tRNAs arising from different pre-rRNA transcripts. Coding sequences for tRNAs are also found on the 3′ side of the 5S rRNA in some precursor transcripts.

In eukaryotes, a 45S pre-rRNA transcript is processed in the nucleolus to form the 18S, 28S, and 5.8S rRNAs characteristic of eukaryotic ribosomes (Fig. 26–22). The 5S rRNA of most eukaryotes is made as a completely separate transcript by a different polymerase (Pol III instead of Pol I).

Most cells have 40 to 50 distinct tRNAs, and eukaryotic cells have multiple copies of many of the tRNA genes. Transfer RNAs are derived from longer RNA precursors by enzymatic removal of nucleotides from the 5′ and 3′ ends (Fig. 26–23). In eukaryotes, introns are present in a few tRNA transcripts and must be excised. Where two or more different tRNAs are contained in a single primary transcript, they are separated by enzymatic cleavage. The endonuclease RNase P, found in all organisms, removes RNA at the 5′ end of tRNAs. This enzyme contains both protein and RNA.
The RNA component is essential for activity, and in bacterial cells it can carry out its processing function with precision even without the protein component. RNase P is therefore another example of a catalytic RNA, as described in more detail below. The 3′ end of tRNAs is processed by one or more nucleases, including the exonuclease RNase D.

FIGURE 26–22 Processing of pre-rRNA transcripts in vertebrates. [Diagram: 45S pre-rRNA transcript containing 18S, 5.8S, and 28S segments → methylation → cleavage to mature 18S, 5.8S, and 28S rRNAs.] In step (1), the 45S precursor is methylated at more than 100 of its 14,000 nucleotides, mostly on the 2′-OH groups of ribose units retained in the final products. (2) A series of enzymatic cleavages produces the 18S, 5.8S, and 28S rRNAs. The cleavage reactions require RNAs found in the nucleolus, called small nucleolar RNAs (snoRNAs), within protein complexes reminiscent of spliceosomes. The 5S rRNA is produced separately.

FIGURE 26–23 Processing of tRNAs in bacteria and eukaryotes. [Diagram: nucleotide sequences of the primary transcript (with RNase P and RNase D cut sites), a processing intermediate, and mature $\text{tRNA}^{\text{Tyr}}$; steps shown are 5′ cleavage, 3′ cleavage, CCA addition, base modification, and splicing.] The yeast $\text{tRNA}^{\text{Tyr}}$ (the tRNA specific for tyrosine binding; see Chapter 27) is used to illustrate the important steps. The nucleotide sequences shown in yellow are removed from the primary transcript.
The ends are processed first, the 5′ end before the 3′ end. CCA is then added to the 3′ end, a necessary step in processing eukaryotic tRNAs and those bacterial tRNAs that lack this sequence in the primary transcript. While the ends are being processed, specific bases in the rest of the transcript are modified (see Fig. 26–24). For the eukaryotic tRNA shown here, the final step is splicing of the 14-nucleotide intron. Introns are found in some eukaryotic tRNAs but not in bacterial tRNAs.

Transfer RNA precursors may undergo further posttranscriptional processing. The 3′-terminal trinucleotide CCA(3′), to which an amino acid will be attached during protein synthesis (Chapter 27), is absent from some bacterial and all eukaryotic tRNA precursors and is added during processing (Fig. 26–23). This addition is carried out by tRNA nucleotidyltransferase, an unusual enzyme that binds the three ribonucleoside triphosphate precursors in separate active sites and catalyzes formation of the phosphodiester bonds to produce the CCA(3′) sequence. The creation of this defined sequence of nucleotides is therefore not dependent on a DNA or RNA template; the template is the binding site of the enzyme.

The final type of tRNA processing is the modification of some of the bases by methylation, deamination, or reduction (Fig. 26–24). In the case of pseudouridine (Ψ), the base (uracil) is removed and reattached to the sugar through C-5. Some of these modified bases occur at characteristic positions in all tRNAs (Fig. 26–23).

RNA Enzymes Are the Catalysts of Some Events in RNA Metabolism

The study of posttranscriptional processing of RNA molecules led to one of the most exciting discoveries in modern biochemistry: the existence of RNA enzymes. The best-characterized ribozymes are the self-splicing group I introns, RNase P, and the hammerhead ribozyme (discussed below).
Most of the activities of these ribozymes are based on two fundamental reactions: transesterification (Fig. 26–13) and phosphodiester bond hydrolysis (cleavage). The substrate for ribozymes is often an RNA molecule, and it may even be part of the ribozyme itself. When its substrate is RNA, an RNA catalyst can make use of base-pairing interactions to align the substrate for the reaction.

Ribozymes vary greatly in size. A self-splicing group I intron may have more than 400 nucleotides. The hammerhead ribozyme consists of two RNA strands with only 41 nucleotides in all (Fig. 26–25). As with protein enzymes, the three-dimensional structure of ribozymes is important for function. Ribozymes are inactivated by heating above their melting temperature or by addition of denaturing agents or complementary oligonucleotides, which disrupt normal base-pairing patterns. Ribozymes can also be inactivated if essential nucleotides are changed. The secondary structure of a self-splicing group I intron from the 26S rRNA precursor of Tetrahymena is shown in detail in Figure 26–26.

Enzymatic Properties of Group I Introns. Self-splicing group I introns share several properties with enzymes besides accelerating the reaction rate, including their kinetic behaviors and their specificity. Binding of the guanosine cofactor (Fig. 26–13) to the Tetrahymena group I rRNA intron (Fig. 26–26) is saturable ($K_\text{m} \approx 30\ \mu\text{M}$) and can be competitively inhibited by 3′-deoxyguanosine. The intron is very precise in its excision reaction, largely due to a segment called the internal guide sequence that can base-pair with exon sequences near the 5′ splice site (Fig. 26–26). This pairing promotes the alignment of specific bonds to be cleaved and rejoined.

Because the intron itself is chemically altered during the splicing reaction (its ends are cleaved), it may appear to lack one key enzymatic property: the ability to catalyze multiple reactions.
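Before turning to that question, the saturable, competitively inhibited guanosine binding noted above can be made concrete with the Michaelis-Menten equation extended by a competitive-inhibition term, $v = V_\text{max}[S] / (K_\text{m}(1 + [I]/K_\text{i}) + [S])$. This is a minimal sketch, assuming the $K_\text{m}$ of about 30 μM given in the text; the inhibition constant for 3′-deoxyguanosine (`ki_um`) is a hypothetical value because the text gives none, `vmax` is in arbitrary units, and `mm_rate` is an illustrative name.

```python
from typing import Optional


def mm_rate(s_um: float, vmax: float = 1.0, km_um: float = 30.0,
            i_um: float = 0.0, ki_um: Optional[float] = None) -> float:
    """Michaelis-Menten rate with an optional competitive inhibitor.

    s_um  : guanosine concentration (uM)
    km_um : Km for guanosine; about 30 uM according to the text
    i_um  : inhibitor concentration (uM), e.g. 3'-deoxyguanosine
    ki_um : inhibition constant (uM); HYPOTHETICAL, not given in the text
    """
    if i_um > 0 and ki_um is None:
        raise ValueError("ki_um is required when an inhibitor is present")
    # A competitive inhibitor raises the apparent Km but leaves Vmax unchanged.
    km_app = km_um * (1 + i_um / ki_um) if i_um > 0 else km_um
    return vmax * s_um / (km_app + s_um)


print(mm_rate(30))                      # half of vmax when [G] equals Km
print(mm_rate(30, i_um=50, ki_um=50))   # apparent Km doubles, so rate falls
print(mm_rate(1e6, i_um=50, ki_um=50))  # saturating [G] outcompetes inhibitor
```

The last line illustrates the diagnostic feature of competitive inhibition: at high enough guanosine concentration the rate approaches the uninhibited maximum.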
Closer inspection has shown that after excision, the 414 nucleotide intron from Tetrahymena rRNA can, in vitro, act as a true enzyme (but in vivo it is quickly degraded). A series of intramolecular cyclization and cleavage reactions in the excised intron leads to the loss of 19 nucleotides from its 5′ end. The remaining 395 nucleotide, linear RNA, referred to as L-19 IVS, promotes nucleotidyl transfer reactions in which some oligonucleotides are lengthened at the expense of others (Fig. 26–27). The best substrates are oligonucleotides, such as a synthetic (C)5 oligomer, that can base-pair with the same guanylate-rich internal guide sequence that held the 5′ exon in place for self-splicing.

The enzymatic activity of the L-19 IVS ribozyme results from a cycle of transesterification reactions mechanistically similar to self-splicing. Each ribozyme molecule can process about 100 substrate molecules per hour and is not altered in the reaction; therefore the intron acts as a catalyst. It follows Michaelis-Menten kinetics, is specific for RNA oligonucleotide substrates, and can be competitively inhibited. The $k_\text{cat}/K_\text{m}$ (specificity constant) is $10^3\ \text{M}^{-1}\,\text{s}^{-1}$, lower than that of many enzymes, but the ribozyme accelerates hydrolysis by a factor of $10^{10}$ relative to the uncatalyzed reaction. It makes use of substrate orientation, covalent catalysis, and metal-ion catalysis, strategies used by protein enzymes.

FIGURE 26–24 Some modified bases of tRNAs, produced in posttranscriptional reactions. [Structures shown: inosine (I), 1-methylguanosine (m¹G), N⁶-isopentenyladenosine (i⁶A), 4-thiouridine (s⁴U), dihydrouridine (D), ribothymidine (T), and pseudouridine (Ψ).] The standard symbols (used in Fig. 26–23) are shown in parentheses. Note the unusual ribose attachment point in pseudouridine.

FIGURE 26–26 Secondary structure of the self-splicing rRNA intron from Tetrahymena. [Diagram: complete intron sequence, numbered 20 to 400, folded into base-paired regions P1 through P10, with flanking exon sequences.] Intron sequences are shaded yellow, exon sequences green. Each thick yellow line represents a bond between neighboring nucleotides in a continuous sequence (a device necessitated by showing this complex molecule in two dimensions; similarly, an oversize blue line between a C and G residue indicates normal base pairing); all nucleotides are shown. The catalytic core of the self-splicing activity is shaded. Some base-paired regions are labeled (P1, P3, P2.1, P5a, and so forth) according to an established convention for this RNA molecule. The P1 region, which contains the internal guide sequence (boxed), is the location of the 5′ splice site (red arrow). Part of the internal guide sequence pairs with the end of the 3′ exon, bringing the 5′ and 3′ splice sites (red and blue arrows) into close proximity. The three-dimensional structure of a large segment of this intron is illustrated in Figure 8–28c.

FIGURE 26–25 Hammerhead ribozyme. [Panels: (a) sequence and secondary structure; (b) three-dimensional structure.] Certain viruslike elements called virusoids have small RNA genomes and usually require another virus to assist in their replication and/or packaging. Some virusoid RNAs include small segments that promote site-specific RNA cleavage reactions associated with replication. These segments are called hammerhead ribozymes, because their secondary structures are shaped like the head of a hammer. Hammerhead ribozymes have been defined and studied separately from the much larger viral RNAs. (a) The minimal sequences required for catalysis by the ribozyme. The boxed nucleotides are highly conserved and are required for catalytic function. The arrow indicates the site of self-cleavage. (b) Three-dimensional structure (PDB ID 1MME). The strands are colored as in (a). The hammerhead ribozyme is a metalloenzyme; Mg²⁺ ions are required for activity. The phosphodiester bond at the site of self-cleavage is indicated by an arrow.

Characteristics of Other Ribozymes. E. coli RNase P has both an RNA component (the M1 RNA, with 377 nucleotides) and a protein component ($M_\text{r}$ 17,500). In 1983 Sidney Altman and Norman Pace and their coworkers discovered that under some conditions, the M1 RNA alone is capable of catalysis, cleaving tRNA precursors at the correct position. The protein component apparently serves to stabilize the RNA or facilitate its function in vivo.
The RNase P ribozyme recognizes the three-dimensional shape of its pre-tRNA substrate, along with the CCA sequence, and thus can cleave the 5′ leaders from diverse tRNAs (Fig. 26–23).

The known catalytic repertoire of ribozymes continues to expand. Some virusoids, small RNAs associated with plant RNA viruses, include a structure that promotes a self-cleavage reaction; the hammerhead ribozyme illustrated in Figure 26–25 is in this class, catalyzing the hydrolysis of an internal phosphodiester bond. The splicing reaction that occurs in a spliceosome seems to rely on a catalytic center formed by the U2, U5, and U6 snRNAs (Fig. 26–16). And perhaps most important, an RNA component of ribosomes catalyzes the synthesis of proteins (Chapter 27).

Exploring catalytic RNAs has provided new insights into catalytic function in general and has important implications for our understanding of the origin and evolution of life on this planet, a topic discussed in Section 26.3.

FIGURE 26–27 In vitro catalytic activity of L-19 IVS. [Diagram: (a) the spliced intron, its internal guide sequence, and the 19 nucleotides removed from its 5′ end to give L-19 IVS; (b) a four-step cycle in which (C)5 substrates are converted to (C)4 and (C)6 products.] (a) L-19 IVS is generated by the autocatalytic removal of 19 nucleotides from the 5′ end of the spliced Tetrahymena intron. The cleavage site is indicated by the arrow in the internal guide sequence (boxed). The G residue (shaded pink) added in the first step of the splicing reaction (see Fig. 26–14) is part of the removed sequence. A portion of the internal guide sequence remains at the 5′ end of L-19 IVS.
(b) L-19 IVS lengthens some RNA oligonucleotides at the expense of others in a cycle of transesterification reactions (steps 1 through 4). The 3′ OH of the G residue at the 3′ end of L-19 IVS plays a key role in this cycle (note that this is not the G residue added in the splicing reaction). (C)5 is one of the ribozyme's better substrates because it can base-pair with the guide sequence remaining in the intron. Although this catalytic activity is probably irrelevant to the cell, it has important implications for current hypotheses on evolution, discussed at the end of this chapter.

Cellular mRNAs Are Degraded at Different Rates

The expression of genes is regulated at many levels. A crucial factor governing a gene's expression is the cellular concentration of its associated mRNA. The concentration of any molecule depends on two factors: its rate of synthesis and its rate of degradation. When synthesis and degradation of an mRNA are balanced, the concentration of the mRNA remains in a steady state. A change in either rate will lead to net accumulation or depletion of the mRNA. Degradative pathways ensure that mRNAs do not build up in the cell and direct the synthesis of unnecessary proteins.

The rates of degradation vary greatly for mRNAs from different eukaryotic genes. For a gene product that is needed only briefly, the half-life of its mRNA may be only minutes or even seconds. Gene products needed constantly by the cell may have mRNAs that are stable over many cell generations. The average half-life of a vertebrate cell mRNA is about 3 hours, with the pool of each type of mRNA turning over about ten times per cell generation. The half-life of bacterial mRNAs is much shorter, only about 1.5 min, perhaps because of regulatory requirements.

Messenger RNA is degraded by ribonucleases present in all cells. In E. coli, the process begins with one or a few cuts by an endoribonuclease, followed by 3′→5′ degradation by exoribonucleases. In lower eukaryotes, the major pathway involves first shortening the poly(A) tail, then decapping the 5′ end and degrading the mRNA in the 5′→3′ direction. A 3′→5′ degradative pathway also exists and may be the major path in higher eukaryotes. All eukaryotes have a complex of up to ten conserved 3′→5′ exoribonucleases, called the exosome, which is involved in the processing of the 3′ end of rRNAs and tRNAs as well as the degradation of mRNAs.

A hairpin structure in bacterial mRNAs with a ρ-independent terminator (Fig. 26–7) confers stability against degradation. Similar hairpin structures can make some parts of a primary transcript more stable, leading to nonuniform degradation of transcripts. In eukaryotic cells, both the 3′ poly(A) tail and the 5′ cap are important to the stability of many mRNAs.

Polynucleotide Phosphorylase Makes Random RNA-like Polymers

In 1955, Marianne Grunberg-Manago and Severo Ochoa discovered the bacterial enzyme polynucleotide phosphorylase, which in vitro catalyzes the reaction

$$(\text{NMP})_n + \text{NDP} \rightleftharpoons (\text{NMP})_{n+1} + \text{P}_i$$

where $(\text{NMP})_{n+1}$ is the lengthened polynucleotide. Polynucleotide phosphorylase was the first nucleic acid–synthesizing enzyme discovered (Arthur Kornberg's discovery of DNA polymerase followed soon thereafter). The reaction catalyzed by polynucleotide phosphorylase differs fundamentally from the polymerase activities discussed so far in that it is not template-dependent. The enzyme uses the 5′-diphosphates of ribonucleosides as substrates and cannot act on the homologous 5′-triphosphates or on deoxyribonucleoside 5′-diphosphates. The RNA polymer formed by polynucleotide phosphorylase contains the usual 3′,5′-phosphodiester linkages, which can be hydrolyzed by ribonuclease.
The reaction is readily reversible and can be pushed in the direction of breakdown of the polyribonucleotide by increasing the phosphate concentration. The probable function of this enzyme in the cell is the degradation of mRNAs to nucleoside diphosphates.

Because the polynucleotide phosphorylase reaction does not use a template, the polymer it forms does not have a specific base sequence. The reaction proceeds equally well with any or all of the four nucleoside diphosphates, and the base composition of the resulting polymer reflects nothing more than the relative concentrations of the 5′-diphosphate substrates in the medium.

Polynucleotide phosphorylase can be used in the laboratory to prepare RNA polymers with many different base sequences and frequencies. Synthetic RNA polymers of this sort were critical for deducing the genetic code for the amino acids (Chapter 27).

[Photographs: Marianne Grunberg-Manago; Severo Ochoa, 1905–1993]

SUMMARY 26.2 RNA Processing

■ Eukaryotic mRNAs are modified by addition of a 7-methylguanosine residue at the 5′ end and by cleavage and polyadenylation at the 3′ end to form a long poly(A) tail.
■ Many primary mRNA transcripts contain introns (noncoding regions), which are removed by splicing. Excision of the group I introns found in some rRNAs requires a guanosine cofactor. Some group I and group II introns are capable of self-splicing; no protein enzymes are required. Nuclear mRNA precursors have a third class (the largest class) of introns, which are spliced with the aid of RNA-protein complexes called snRNPs, assembled into spliceosomes. A fourth class of introns, found in some tRNAs, is the only class known to be spliced by protein enzymes.
■ Ribosomal RNAs and transfer RNAs are derived from longer precursor RNAs, trimmed by nucleases. Some bases are modified enzymatically during the maturation process.
■ The self-splicing introns and the RNA component of RNase P (which cleaves the 5′ end of tRNA precursors) are two examples of ribozymes. These biological catalysts have the properties of true enzymes. They generally promote hydrolytic cleavage and transesterification, using RNA as substrate. Combinations of these reactions can be promoted by the excised group I intron of Tetrahymena rRNA, resulting in a type of RNA polymerization reaction.
■ Polynucleotide phosphorylase reversibly forms RNA-like polymers from ribonucleoside 5′-diphosphates, adding or removing ribonucleotides at the 3′-hydroxyl end of the polymer. The enzyme degrades RNA in vivo.

26.3 RNA-Dependent Synthesis of RNA and DNA

In our discussion of DNA and RNA synthesis up to this point, the role of the template strand has been reserved for DNA. However, some enzymes use an RNA template for nucleic acid synthesis. With the very important exception of viruses with an RNA genome, these enzymes play only a modest role in information pathways. RNA viruses are the source of most RNA-dependent polymerases characterized so far.

The existence of RNA replication requires an elaboration of the central dogma (Fig. 26–28; contrast this with the diagram on p. 922). The enzymes involved in RNA replication have profound implications for investigations into the nature of self-replicating molecules that may have existed in prebiotic times.

Reverse Transcriptase Produces DNA from Viral RNA

Certain RNA viruses that infect animal cells carry within the viral particle an RNA-dependent DNA polymerase called reverse transcriptase. On infection, the single-stranded RNA viral genome (~10,000 nucleotides) and the enzyme enter the host cell. The reverse transcriptase first catalyzes the synthesis of a DNA strand complementary to the viral RNA (Fig. 26–29), then degrades the RNA strand of the viral RNA-DNA hybrid and replaces it with DNA.
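The sequence logic of the steps just described (a DNA strand made complementary to the viral RNA, then the RNA strand of the hybrid replaced by DNA) can be sketched with strings written 5′→3′. This is a simplification under stated assumptions: priming by the cellular tRNA, enzyme errors, and all other details of the real process are ignored, the function and table names are hypothetical, and the RNA degradation step is represented only by not carrying the RNA strand forward.

```python
# Base pairing used when each new strand is made (template base -> new DNA base).
RNA_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TEMPLATE_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template: str, pairing: dict[str, str]) -> str:
    """Antiparallel complement of `template`; both strands written 5'->3'."""
    return "".join(pairing[base] for base in reversed(template))


def reverse_transcribe(viral_rna: str) -> tuple[str, str]:
    """Return (plus_strand, minus_strand) of the duplex DNA copy of viral_rna."""
    # (1) RNA-dependent DNA synthesis: DNA strand complementary to the viral RNA.
    minus = new_strand(viral_rna, RNA_TEMPLATE_TO_DNA)
    # (2) RNA degradation: the RNA strand of the hybrid is simply not kept here.
    # (3) DNA-dependent DNA synthesis: a second DNA strand replaces the RNA.
    plus = new_strand(minus, DNA_TEMPLATE_TO_DNA)
    return plus, minus


plus, minus = reverse_transcribe("AUGGCUUAA")  # hypothetical viral RNA fragment
print(minus)  # TTAAGCCAT
print(plus)   # ATGGCTTAA (same sequence as the RNA, with T in place of U)
```

The second DNA strand ends up with the same sequence as the original viral RNA (T in place of U), which is why the duplex DNA can later be transcribed back into viral RNA.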
The resulting duplex DNA often becomes incorporated into the genome of the eukaryotic host cell. These integrated (and dormant) viral genes can be activated and transcribed, and the gene products (viral proteins and the viral RNA genome itself) can be packaged as new viruses. The RNA viruses that contain reverse transcriptases are known as retroviruses (retro is the Latin prefix for "backward").

FIGURE 26–28 Extension of the central dogma to include RNA-dependent synthesis of RNA and DNA. [Diagram: DNA replication; transcription (DNA to RNA); reverse transcription (RNA to DNA); RNA replication; translation (RNA to protein).]

FIGURE 26–29 Retroviral infection of a mammalian cell and integration of the retrovirus into the host chromosome. [Diagram: a retrovirus delivers its RNA genome into the host-cell cytoplasm; reverse transcription produces viral DNA, which enters the nucleus and is integrated into a chromosome.] Viral particles entering the host cell carry viral reverse transcriptase and a cellular tRNA (picked up from a former host cell) already base-paired to the viral RNA. The tRNA facilitates immediate conversion of viral RNA to double-stranded DNA by the action of reverse transcriptase, as described in the text. Once converted to double-stranded DNA, the DNA enters the nucleus and is integrated into the host genome. The integration is catalyzed by a virally encoded integrase. Integration of viral DNA into host DNA is mechanistically similar to the insertion of transposons in bacterial chromosomes (see Fig. 25–43). For example, a few base pairs of host DNA become duplicated at the site of integration, forming short repeats of 4 to 6 bp at each end of the inserted retroviral DNA (not shown).

The existence of reverse transcriptases in RNA viruses was predicted by Howard Temin in 1962, and the enzymes were ultimately detected by Temin and, independently, by David Baltimore in 1970.
Their discovery aroused much attention as dogma-shaking proof that genetic information can flow "backward" from RNA to DNA.

[Photographs: Howard Temin, 1934–1994; David Baltimore]

FIGURE 26–30 Structure and gene products of an integrated retroviral genome. [Diagram: host-cell DNA containing LTR, gag, pol, env, LTR; the primary transcript is translated into polyprotein A, which is cleaved to virus structural proteins, integrase, protease, and reverse transcriptase, and after splicing into polyprotein B, which is cleaved to viral envelope proteins.] The long terminal repeats (LTRs) have sequences needed for the regulation and initiation of transcription. The sequence denoted Ψ is required for packaging of retroviral RNAs into mature viral particles. Transcription of the retroviral DNA produces a primary transcript encompassing the gag, pol, and env genes. Translation (Chapter 27) produces a polyprotein, a single long polypeptide derived from the gag and pol genes, which is cleaved into six distinct proteins. Splicing of the primary transcript yields an mRNA derived largely from the env gene, which is also translated into a polyprotein, then cleaved to generate viral envelope proteins.

Retroviruses typically have three genes: gag (derived from the historical designation group associated antigen), pol, and env (Fig. 26–30). The transcript that contains gag and pol is translated into a long "polyprotein," a single large polypeptide that is cleaved into six proteins with distinct functions. The proteins derived from the gag gene make up the interior core of the viral particle. The pol gene encodes the protease that cleaves the long polypeptide, an integrase that inserts the viral DNA into the host chromosomes, and reverse transcriptase. Many reverse transcriptases have two subunits, α and β. The pol gene specifies the β subunit ($M_\text{r}$ 90,000), and the α subunit ($M_\text{r}$ 65,000) is simply a proteolytic fragment of the β subunit.
The env gene encodes the proteins of the viral envelope. At each end of the linear RNA genome are long terminal repeat (LTR) sequences of a few hundred nucleotides. Transcribed into the duplex DNA, these sequences facilitate integration of the viral chromosome into the host DNA and contain promoters for viral gene expression.

Reverse transcriptases catalyze three different reactions: (1) RNA-dependent DNA synthesis, (2) RNA degradation, and (3) DNA-dependent DNA synthesis. Like many DNA and RNA polymerases, reverse transcriptases contain Zn²⁺. Each transcriptase is most active with the RNA of its own virus, but each can be used experimentally to make DNA complementary to a variety of RNAs. The DNA and RNA synthesis and RNA degradation activities use separate active sites on the protein. For DNA synthesis to begin, the reverse transcriptase requires a primer, a cellular tRNA obtained during an earlier infection and carried within the viral particle. This tRNA is base-paired at its 3′ end with a complementary sequence in the viral RNA. The new DNA strand is synthesized in the 5′→3′ direction, as in all RNA and DNA polymerase reactions. Reverse transcriptases, like RNA polymerases, do not have 3′→5′ proofreading exonucleases. They generally have error rates of about 1 per 20,000 nucleotides added. An error rate this high is extremely unusual in DNA replication and appears to be a feature of most enzymes that replicate the genomes of RNA viruses. A consequence is a higher mutation rate and faster rate of viral evolution, which is a factor in the frequent appearance of new strains of disease-causing retroviruses.

Reverse transcriptases have become important reagents in the study of DNA-RNA relationships and in DNA cloning techniques.
They make possible the synthesis of DNA complementary to an mRNA template, and synthetic DNA prepared in this manner, called complementary DNA (cDNA), can be used to clone cellular genes (see Fig. 9–14).

Some Retroviruses Cause Cancer and AIDS

Retroviruses have featured prominently in recent advances in the molecular understanding of cancer. Most retroviruses do not kill their host cells but remain integrated in the cellular DNA, replicating when the cell divides. Some retroviruses, classified as RNA tumor viruses, contain an oncogene that can cause the cell to grow abnormally (see Fig. 12–47). The first retrovirus of this type to be studied was the Rous sarcoma virus (also called avian sarcoma virus; Fig. 26–31), named for F. Peyton Rous, who studied chicken tumors now known to be caused by this virus. Since the initial discovery of oncogenes by Harold Varmus and Michael Bishop, many dozens of such genes have been found in retroviruses.

The human immunodeficiency virus (HIV), which causes acquired immune deficiency syndrome (AIDS), is a retrovirus. Identified in 1983, HIV has an RNA genome with standard retroviral genes along with several other unusual genes (Fig. 26–32). Unlike many other retroviruses, HIV kills many of the cells it infects (principally T lymphocytes) rather than causing tumor formation. This gradually leads to suppression of the immune system in the host organism. The reverse transcriptase of HIV is even more error-prone than other known reverse transcriptases (ten times more so), resulting in high mutation rates in this virus. One or more errors are generally made every time the viral genome is replicated, so any two viral RNA molecules are likely to differ.

Many modern vaccines for viral infections consist of one or more coat proteins of the virus, produced by methods described in Chapter 9.
These proteins are not infectious on their own but stimulate the immune system to recognize and resist subsequent viral invasions (Chapter 5). Because of the high error rate of the HIV reverse transcriptase, the env gene in this virus (along with the rest of the genome) undergoes very rapid mutation, complicating the development of an effective vaccine. However, repeated cycles of cell invasion and replication are needed to propagate an HIV infection, so inhibition of viral enzymes offers promise as an effective therapy. The HIV protease is targeted by a class of drugs called protease inhibitors (see Box 6–3). Reverse transcriptase is the target of some additional drugs widely used to treat HIV-infected individuals (Box 26–2).

FIGURE 26–31 Rous sarcoma virus genome. The src gene encodes a tyrosine-specific protein kinase, one of a class of enzymes known to function in systems that affect cell division, cell-cell interactions, and intercellular communication (Chapter 12). The same gene is found in chicken DNA (the usual host for this virus) and in the genomes of many other eukaryotes, including humans. When associated with the Rous sarcoma virus, this oncogene is often expressed at abnormally high levels, contributing to unregulated cell division and cancer.

FIGURE 26–32 The genome of HIV, the virus that causes AIDS. In addition to the typical retroviral genes, HIV contains several small genes with a variety of functions (not identified here, and not all known). Some of these genes overlap (see Box 27–1). Alternative splicing mechanisms produce many different proteins from this small (9.7 × 10³ nucleotides) genome.

Many Transposons, Retroviruses, and Introns May Have a Common Evolutionary Origin

Some well-characterized eukaryotic DNA transposons from sources as diverse as yeast and fruit flies have a structure very similar to that of retroviruses; these are sometimes called retrotransposons (Fig. 26–33). Retrotransposons encode an enzyme homologous to the retroviral reverse transcriptase, and their coding regions are flanked by LTR sequences. They transpose from one position to another in the cellular genome by means of an RNA intermediate, using reverse transcriptase to make a DNA copy of the RNA, followed by integration of the DNA at a new site. Most transposons in eukaryotes use this mechanism for transposition, distinguishing them from bacterial transposons, which move as DNA directly from one chromosomal location to another (see Fig. 25–43).

Retrotransposons lack an env gene and so cannot form viral particles. They can be thought of as defective viruses, trapped in cells. Comparisons between retroviruses and eukaryotic transposons suggest that reverse transcriptase is an ancient enzyme that predates the evolution of multicellular organisms.

Interestingly, many group I and group II introns are also mobile genetic elements. In addition to their self-splicing activities, they encode DNA endonucleases that promote their movement. During genetic exchanges between cells of the same species, or when DNA is introduced into a cell by parasites or by other means, these endonucleases promote insertion of the intron into an identical site in another DNA copy of a homologous gene that does not contain the intron, in a process termed homing (Fig. 26–34). Whereas group I intron homing is DNA-based, group II intron homing occurs through an RNA intermediate. The endonucleases of the group II introns have associated reverse transcriptase activity. The proteins can form complexes with the intron RNAs themselves, after the introns are spliced from the primary transcripts.
Because the homing process involves insertion of the RNA intron into DNA and reverse transcription of the intron, the movement of these introns has been called retrohoming. Over time, every copy of a particular gene in a population may acquire the intron.

BOX 26–2 BIOCHEMISTRY IN MEDICINE
Fighting AIDS with Inhibitors of HIV Reverse Transcriptase

Research into the chemistry of template-dependent nucleic acid biosynthesis, combined with modern techniques of molecular biology, has elucidated the life cycle and structure of the human immunodeficiency virus, the retrovirus that causes AIDS. A few years after the isolation of HIV, this research resulted in the development of drugs capable of prolonging the lives of people infected by HIV.

The first drug to be approved for clinical use was AZT, a structural analog of deoxythymidine. AZT was first synthesized in 1964 by Jerome P. Horwitz. It failed as an anticancer drug (the purpose for which it was made), but in 1985 it was found to be a useful treatment for AIDS. AZT is taken up by T lymphocytes, immune system cells that are particularly vulnerable to HIV infection, and converted to AZT triphosphate. (AZT triphosphate taken directly would be ineffective, because it cannot cross the plasma membrane.) HIV’s reverse transcriptase has a higher affinity for AZT triphosphate than for dTTP, and binding of AZT triphosphate to this enzyme competitively inhibits dTTP binding. When AZT is added to the 3′ end of the growing DNA strand, the lack of a 3′ hydroxyl means that the DNA strand is terminated prematurely and viral DNA synthesis grinds to a halt. AZT triphosphate is not as toxic to the T lymphocytes themselves, because cellular DNA polymerases have a lower affinity for this compound than for dTTP. At concentrations of 1 to 5 μM, AZT affects HIV reverse transcription but not most cellular DNA replication.
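The chain-termination mechanism just described can be illustrated with a toy model, offered as a sketch rather than a kinetic description of any real enzyme. It assumes that at each template position calling for T, the polymerase inserts AZT-MP instead of dTMP with probability w/(w + 1), where w is a hypothetical “preference” factor multiplied by the [AZT triphosphate]/[dTTP] ratio. The helper names (`p_azt`, `extend`) and all numbers are illustrative assumptions, not measured values.

```python
import random

# DNA base added opposite each template base (template written 3'->5')
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def p_azt(azt_to_dttp_ratio: float, preference: float) -> float:
    """Chance that a T position receives AZT-MP instead of dTMP (toy competition model)."""
    w = preference * azt_to_dttp_ratio
    return w / (w + 1.0)


def extend(template: str, p_term: float, rng: random.Random) -> str:
    """Copy the template into a new strand; 'Z' marks an incorporated AZT residue.

    AZT carries an azido group instead of a 3'-OH, so nothing can be added after it.
    """
    new_strand = []
    for base in template:
        if base == "A" and rng.random() < p_term:
            new_strand.append("Z")  # chain terminated: no 3'-OH to extend from
            break
        new_strand.append(PAIR[base])
    return "".join(new_strand)


# Hypothetical numbers: same drug ratio, very different enzyme preferences.
ratio = 0.05
p_rt = p_azt(ratio, preference=20.0)     # reverse transcriptase: favors AZT triphosphate
p_host = p_azt(ratio, preference=0.001)  # cellular polymerase: disfavors it
print(f"RT:   p = {p_rt:.3f}, expected T sites reached before termination = {1 / p_rt:.0f}")
print(f"host: p = {p_host:.2e}, expected T sites reached before termination = {1 / p_host:.0f}")
print(extend("GCATGACTAA", p_rt, random.Random(1)))
```

With the same drug ratio, the hypothetical reverse transcriptase terminates at roughly every second T position, whereas the hypothetical cellular enzyme runs on for thousands of T positions, which is the kind of selectivity the box describes.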
Unfortunately, AZT appears to be toxic to the bone marrow cells that are the progenitors of erythrocytes, and many individuals taking AZT develop anemia. AZT can increase the survival time of people with advanced AIDS by about a year, and it delays the onset of AIDS in those who are still in the early stages of HIV infection. Some other AIDS drugs, such as dideoxyinosine (DDI), have a similar mechanism of action. Newer drugs target and inactivate the HIV protease. Because of the high error rate of HIV reverse transcriptase and the resulting rapid evolution of HIV, the most effective treatments of HIV infections use a combination of drugs directed at both the protease and the reverse transcriptase.

[Structures: 3′-azido-2′,3′-dideoxythymidine (AZT) and 2′,3′-dideoxyinosine (DDI)]

FIGURE 26–33 Eukaryotic transposons. The Ty element of the yeast Saccharomyces and the copia element of the fruit fly Drosophila serve as examples of eukaryotic transposons, which often have a structure similar to retroviruses but lack the env gene. The δ sequences of the Ty element are functionally equivalent to retroviral LTRs. In the copia element, int and RT are homologous to the integrase and reverse transcriptase segments, respectively, of the pol gene.

Much more rarely, the intron may insert itself into a new location in an unrelated gene. If this event does not kill the host cell, it can lead to the evolution and distribution of an intron in a new location. The structures and mechanisms used by mobile introns support the idea that at least some introns originated as molecular parasites whose evolutionary past can be traced to retroviruses and transposons.
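To put the reverse transcriptase error rates quoted earlier in this section into numbers, the sketch below estimates how many mistakes are made in one pass over an HIV-sized genome. It assumes errors are independent and uniformly distributed (so a Poisson approximation applies) and uses the text's figures: about 1 error per 20,000 nucleotides for typical reverse transcriptases, ten times that for HIV, and a genome of about 9.7 × 10³ nucleotides (Fig. 26–32). It ignores second-strand synthesis and errors made by the host RNA polymerase when the provirus is transcribed; the function names are illustrative.

```python
import math

TYPICAL_RT_ERROR = 1 / 20_000          # errors per nucleotide added (text: ~1 per 20,000)
HIV_RT_ERROR = 10 * TYPICAL_RT_ERROR   # text: HIV RT is about ten times more error-prone
HIV_GENOME_NT = 9_700                  # ~9.7 x 10^3 nucleotides (Fig. 26-32)


def expected_errors(error_rate: float, length_nt: int) -> float:
    """Mean number of misincorporations when copying length_nt nucleotides once."""
    return error_rate * length_nt


def p_at_least_one_error(error_rate: float, length_nt: int) -> float:
    """Poisson approximation: P(no errors) = exp(-mean), so P(>=1) = 1 - exp(-mean)."""
    return 1.0 - math.exp(-expected_errors(error_rate, length_nt))


for label, rate in [("typical RT", TYPICAL_RT_ERROR), ("HIV RT", HIV_RT_ERROR)]:
    mean = expected_errors(rate, HIV_GENOME_NT)
    p = p_at_least_one_error(rate, HIV_GENOME_NT)
    print(f"{label}: mean errors per copy = {mean:.2f}, P(>=1 error) = {p:.3f}")
```

With these inputs the HIV enzyme averages a little under five errors per genome copy, so nearly every copy carries at least one change, consistent with the statement that any two viral RNA molecules are likely to differ; at the typical reverse transcriptase rate, roughly 60% of copies would still be error-free.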
FIGURE 26–34 Introns that move: homing and retrohoming. Certain introns include a gene (shown in red) for enzymes that promote homing (type I introns) or retrohoming (type II introns). (a) The gene within the spliced intron is bound by a ribosome and translated. Type I homing introns specify a site-specific endonuclease, called a homing endonuclease. Type II retrohoming introns specify a protein with both endonuclease and reverse transcriptase activities. (b) Homing. Allele a of a gene X containing a type I homing intron is present in a cell containing allele b of the same gene, which lacks the intron. The homing endonuclease produced by a cleaves b at the position corresponding to the intron in a, and double-strand break repair (recombination with allele a; see Fig. 25–31a) then creates a new copy of the intron in b. (c) Retrohoming. Allele a of gene Y contains a retrohoming type II intron; allele b lacks the intron. The spliced intron inserts itself into the coding strand of b in a reaction that is the reverse of the splicing that excised the intron from the primary transcript (see Fig. 26–15), except that here the insertion is into DNA rather than RNA. The noncoding DNA strand of b is then cleaved by the intron-encoded endonuclease/reverse transcriptase. This same enzyme uses the inserted RNA as a template to synthesize a complementary DNA strand. The RNA is then degraded by cellular ribonucleases and replaced with DNA.

Telomerase Is a Specialized Reverse Transcriptase

Telomeres, the structures at the ends of linear eukaryotic chromosomes (see Fig. 24–9), generally consist of many tandem copies of a short oligonucleotide sequence. This sequence usually has the form TxGy in one strand and CyAx in the complementary strand, where x and y are typically in the range of 1 to 4 (p. 930). Telomeres vary in length from a few dozen base pairs in some ciliated protozoans to tens of thousands of base pairs in mammals. The TG strand is longer than its complement, leaving a region of single-stranded DNA of up to a few hundred nucleotides at the 3′ end.

The ends of a linear chromosome are not readily replicated by cellular DNA polymerases. DNA replication requires a template and primer, and beyond the end of a linear DNA molecule no template is available for the pairing of an RNA primer. Without a special mechanism for replicating the ends, chromosomes would be shortened somewhat in each cell generation. The enzyme telomerase solves this problem by adding telomeres to chromosome ends. Although the existence of this enzyme may not be surprising, the mechanism by which it acts is remarkable and unprecedented. Telomerase, like some other enzymes described in this chapter, contains both RNA and protein components. The RNA component is about 150 nucleotides long and contains about 1.5 copies of the appropriate CyAx telomere repeat. This region of the RNA acts as a template for synthesis of the TxGy strand of the telomere. Telomerase thereby acts as a cellular reverse transcriptase that provides the active site for RNA-dependent DNA synthesis.
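The template-copying cycle of Figure 26–35a (pair, polymerize, translocate) can be sketched as string manipulation. This is a simplified illustration, not a model of the enzyme: the template sequence is hypothetical, chosen to contain 1.5 copies of the complement of the TTTTGGGG repeat drawn in that figure, and the function names are assumptions. Real telomerase RNAs differ between organisms and contain regions that are not copied.

```python
# DNA nucleotide that telomerase adds opposite each RNA template base
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def anneals(dna: str, template_3to5: str, align_len: int) -> bool:
    """Does the DNA 3' end base-pair with the template's alignment region?"""
    tail = dna[-align_len:]
    return len(tail) == align_len and all(
        PAIR[r] == d for r, d in zip(template_3to5[:align_len], tail)
    )


def telomerase_extend(dna: str, template_3to5: str, align_len: int, rounds: int) -> str:
    """Add repeats to a DNA strand (written 5'->3') using an internal RNA template.

    template_3to5 is the RNA template written 3'->5', so reading it left to right
    follows the direction in which the enzyme copies it.
    """
    for _ in range(rounds):
        # (1) the DNA 3' end base-pairs with the 3'-most part of the template
        if not anneals(dna, template_3to5, align_len):
            raise ValueError("DNA 3' end does not match the template alignment region")
        # (2) polymerization in the 5'->3' direction: copy the rest of the template
        dna += "".join(PAIR[r] for r in template_3to5[align_len:])
        # (3) translocation: the new 3' end re-pairs with the alignment region,
        #     which the check at the top of the next round verifies
    return dna


# Hypothetical template: 1.5 copies of the complement of a TTTTGGGG repeat.
TEMPLATE = "AAAACCCCAAAA"  # written 3'->5'
print(telomerase_extend("GGGGTTTT", TEMPLATE, align_len=4, rounds=3))
```

Each round adds one 8-nucleotide repeat. A 3′ end that cannot pair with the alignment region is rejected, which stands in for the requirement that telomerase use the chromosome’s 3′ end as its primer.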
Unlike retroviral reverse transcriptases, telomerase copies only a small segment of RNA that it carries within itself. Telomere synthesis requires the 3′ end of a chromosome as primer and proceeds in the usual 5′→3′ direction. Having synthesized one copy of the repeat, the enzyme repositions to resume extension of the telomere (Fig. 26–35a).

After extension of the TxGy strand by telomerase, the complementary CyAx strand is synthesized by cellular DNA polymerases, starting with an RNA primer (see Fig. 25–13). The single-stranded region is protected by specific binding proteins in many lower eukaryotes, especially those species with telomeres of less than a few hundred base pairs. In higher eukaryotes (including mammals) with telomeres many thousands of base pairs long, the single-stranded end is sequestered in a specialized structure called a T loop. The single-stranded end is folded back and paired with its complement in the double-stranded portion of the telomere. The formation of a T loop involves invasion of the 3′ end of the telomere’s single strand into the duplex DNA, perhaps by a mechanism similar to the initiation of homologous genetic recombination (see Fig. 25–31). In mammals, the looped DNA is bound by two proteins, TRF1 and TRF2, with the latter protein involved in formation of the T loop. T loops protect the 3′ ends of chromosomes, making them inaccessible to nucleases and the enzymes that repair double-strand breaks (Fig. 26–35b).

FIGURE 26–35 The TG strand and T loop of telomeres. (a) The internal template RNA of telomerase binds to and base-pairs with the DNA’s TG primer (TxGy). ① Telomerase adds more T and G residues to the TG primer, then ② repositions the internal template RNA to allow ③ the addition of more T and G residues. The complementary strand is synthesized by cellular DNA polymerases (not shown). (b) Proposed structure of T loops in telomeres. The single-stranded tail synthesized by telomerase is folded back and paired with its complement in the duplex portion of the telomere. The telomere is bound by several telomere-binding proteins, including TRF1 and TRF2 (telomere repeat binding factors). (c) Electron micrograph of a T loop at the end of a chromosome isolated from a mouse hepatocyte. The bar at the bottom of the micrograph represents a length of 5,000 bp.

In protozoans (such as Tetrahymena), loss of telomerase activity results in a gradual shortening of telomeres with each cell division, ultimately leading to the death of the cell line. A similar link between telomere length and cell senescence (cessation of cell division) has been observed in humans. In germ-line cells, which contain telomerase activity, telomere lengths are maintained; in somatic cells, which lack telomerase, they are not. There is a linear, inverse relationship between the length of telomeres in cultured fibroblasts and the age of the individual from whom the fibroblasts were taken: telomeres in human somatic cells gradually shorten as an individual ages. If the telomerase reverse transcriptase is introduced into human somatic cells in vitro, telomerase activity is restored and the cellular life span increases markedly. Is the gradual shortening of telomeres a key to the aging process?
Is our natural life span determined by the length of the telomeres we are born with? Further research in this area should yield some fascinating insights.

Some Viral RNAs Are Replicated by RNA-Dependent RNA Polymerase

Some E. coli bacteriophages, including f2, MS2, R17, and Qβ, as well as some eukaryotic viruses (including influenza and Sindbis viruses, the latter associated with a form of encephalitis), have RNA genomes. The single-stranded RNA chromosomes of these viruses, which also function as mRNAs for the synthesis of viral proteins, are replicated in the host cell by an RNA-dependent RNA polymerase (RNA replicase). All RNA viruses, with the exception of retroviruses, must encode a protein with RNA-dependent RNA polymerase activity because the host cells do not possess this enzyme. The RNA replicase of most RNA bacteriophages has a molecular weight of ~210,000 and consists of four subunits. One subunit (Mr 65,000) is the product of the replicase gene encoded by the viral RNA and has the active site for replication. The other three subunits are host proteins normally involved in host-cell protein synthesis: the E. coli elongation factors Tu (Mr 45,000) and Ts (Mr 30,000) (which ferry aminoacyl-tRNAs to the ribosomes) and the protein S1 (an integral part of the 30S ribosomal subunit). These three host proteins may help the RNA replicase locate and bind to the 3′ ends of the viral RNAs.

RNA replicase isolated from Qβ-infected E. coli cells catalyzes the formation of an RNA complementary to the viral RNA, in a reaction equivalent to that catalyzed by DNA-dependent RNA polymerases. New RNA strand synthesis proceeds in the 5′→3′ direction by a chemical mechanism identical to that used in all other nucleic acid synthetic reactions that require a template. RNA replicase requires RNA as its template and will not function with DNA.
It lacks a separate proofreading exonuclease activity and has an error rate similar to that of RNA polymerase. Unlike the DNA and RNA polymerases, RNA replicases are specific for the RNA of their own virus; the RNAs of the host cell are generally not replicated. This explains how RNA viruses are preferentially replicated in the host cell, which contains many other types of RNA.

RNA Synthesis Offers Important Clues to Biochemical Evolution

The extraordinary complexity and order that distinguish living from inanimate systems are key manifestations of fundamental life processes. Maintaining the living state requires that selected chemical transformations occur very rapidly, especially those that use environmental energy sources and synthesize elaborate or specialized cellular macromolecules. Life depends on powerful and selective catalysts (enzymes) and on informational systems capable of both securely storing the blueprint for these enzymes and accurately reproducing the blueprint for generation after generation. Chromosomes encode the blueprint not for the cell but for the enzymes that construct and maintain the cell. The parallel demands for information and catalysis present a classic conundrum: what came first, the information needed to specify structure or the enzymes needed to maintain and transmit the information?

[Photographs: Carl Woese, Francis Crick, Leslie Orgel]

The unveiling of the structural and functional complexity of RNA led Carl Woese, Francis Crick, and Leslie Orgel to propose in the 1960s that this macromolecule might serve as both information carrier and catalyst. The discovery of catalytic RNAs took this proposal from conjecture to hypothesis and has led to widespread speculation that an “RNA world” might have been important in the transition from prebiotic chemistry to life (see Fig. 1–34).
The parent of all life on this planet, in the sense that it could reproduce itself across the generations from the origin of life to the present, might have been a self-replicating RNA or a polymer with equivalent chemical characteristics.

How might a self-replicating polymer come to be? How might it maintain itself in an environment where the precursors for polymer synthesis are scarce? How could evolution progress from such a polymer to the modern DNA-protein world? These difficult questions can be addressed by careful experimentation, providing clues about how life on Earth began and evolved.

The probable origin of purine and pyrimidine bases is suggested by experiments designed to test hypotheses about prebiotic chemistry (pp. 32–33). Beginning with simple molecules thought to be present in the early atmosphere (CH₄, NH₃, H₂O, H₂), electrical discharges such as lightning generate, first, more reactive molecules such as HCN and aldehydes, then an array of amino acids and organic acids (see Fig. 1–33). When molecules such as HCN become abundant, purine and pyrimidine bases are synthesized in detectable amounts. Remarkably, a concentrated solution of ammonium cyanide, refluxed for a few days, generates adenine in yields of up to 0.5% (Fig. 26–36). Adenine may well have been the first and most abundant nucleotide constituent to appear on Earth. Intriguingly, most enzyme cofactors contain adenosine as part of their structure, although it plays no direct role in the cofactor function (see Fig. 8–41). This may suggest an evolutionary relationship, based on the simple synthesis of adenine from cyanide.

The RNA world hypothesis requires a nucleotide polymer to reproduce itself. Can a ribozyme bring about its own synthesis in a template-directed manner? The self-splicing rRNA intron of Tetrahymena (Fig. 26–26) catalyzes the reversible attack of a guanosine residue on the 5′ splice junction (Fig. 26–37).
If the 5′ splice site and the internal guide sequence are removed from the intron, the rest of the intron can bind RNA strands paired with short oligonucleotides. Part of the remaining intact intron effectively acts as a template for the alignment and ligation of the short oligonucleotides. The reaction is in essence a reversal of the attack of guanosine on the 5′ splice junction, but the result is the synthesis of long RNA polymers from short ones, with the sequence of the product defined by an RNA template.

A self-replicating polymer would quickly use up available supplies of precursors provided by the relatively slow processes of prebiotic chemistry. Thus, from an early stage in evolution, metabolic pathways would be required to generate precursors efficiently, with the synthesis of precursors presumably catalyzed by ribozymes. The extant ribozymes found in nature have a limited repertoire of catalytic functions, and of the ribozymes that may once have existed, no trace is left. To explore the RNA world hypothesis more deeply, we need to know whether RNA has the potential to catalyze the many different reactions needed in a primitive system of metabolic pathways.

The search for RNAs with new catalytic functions has been aided by the development of a method that rapidly searches pools of random polymers of RNA and extracts those with particular activities: SELEX is nothing less than accelerated evolution in a test tube (Box 26–3). It has been used to generate RNA molecules that bind to amino acids, organic dyes, nucleotides, cyanocobalamin, and other molecules. Researchers have isolated ribozymes that catalyze ester and amide bond formation, SN2 reactions, metallation of (addition of metal ions to) porphyrins, and carbon–carbon bond formation.
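The template-directed ligation carried out by the truncated Tetrahymena intron (Fig. 26–37), described at the start of this discussion, can be caricatured in code: oligonucleotides that base-pair with consecutive segments of a template are joined end to end, so the sequence of the product is dictated by the template. This string-matching sketch rests on simplifying assumptions (perfect Watson-Crick pairing, no G·U pairs, oligonucleotides tried in list order, first fit wins) and ignores the ribozyme’s chemistry entirely; the function names are hypothetical.

```python
# Watson-Crick partners in RNA
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_5to3(template_5to3: str) -> str:
    """Antiparallel complement of an RNA, also written 5'->3'."""
    return "".join(COMP[b] for b in reversed(template_5to3))


def ligate_on_template(template_5to3: str, oligos: list[str]) -> str:
    """Join oligos that pair with successive template segments (first fit wins).

    Returns the ligated product; stops early if no oligo pairs with the next segment.
    """
    target = complement_5to3(template_5to3)
    product = ""
    while len(product) < len(target):
        for oligo in oligos:
            if oligo and target.startswith(oligo, len(product)):
                product += oligo  # aligned by base pairing, then ligated
                break
        else:
            break  # no available oligo fits: synthesis stalls
    return product


print(ligate_on_template("GGCAUCCA", ["CC", "AUG", "UGG"]))
```

Whatever order the oligonucleotides are supplied in, the product comes out as the complement of the template, which is the point the text makes about sequence being defined by an RNA template.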
The evolution of enzymatic cofactors with nucleotide “handles” that facilitate their binding to ribozymes might have further expanded the repertoire of chemical processes available to primitive metabolic systems.

As we shall see in the next chapter, some natural RNA molecules catalyze the formation of peptide bonds, offering an idea of how the RNA world might have been transformed by the greater catalytic potential of proteins. The synthesis of proteins would have been a major event in the evolution of the RNA world, but would also have hastened its demise. The information-carrying role of RNA may have passed to DNA because DNA is chemically more stable. RNA replicase and reverse transcriptase may be modern versions of enzymes that once played important roles in making the transition to the modern DNA-based system.

FIGURE 26–36 Possible prebiotic synthesis of adenine from ammonium cyanide. Adenine is derived from five molecules of cyanide, denoted by shading.

Molecular parasites may also have originated in an RNA world. With the appearance of the first inefficient self-replicators, transposition could have been a potentially important alternative to replication as a strategy for successful reproduction and survival. Early parasitic RNAs would simply hop into a self-replicating molecule via catalyzed transesterification, then passively undergo replication. Natural selection would have driven transposition to become site-specific, targeting sequences that did not interfere with the catalytic activities of the host RNA. Replicators and RNA transposons could have existed in a primitive symbiotic relationship, each contributing to the evolution of the other. Modern introns, retroviruses, and transposons may all be vestiges of a “piggy-back” strategy pursued by early parasitic RNAs.
These elements continue to make major contributions to the evolution of their hosts.

Although the RNA world remains a hypothesis, with many gaps yet to be explained, experimental evidence supports a growing list of its key elements. Further experimentation should increase our understanding. Important clues to the puzzle will be found in the workings of fundamental chemistry, in living cells, and perhaps on other planets.

FIGURE 26–37 RNA-dependent synthesis of an RNA polymer from oligonucleotide precursors. (a) The first step in the removal of the self-splicing group I intron of the rRNA precursor of Tetrahymena is reversible attack of a guanosine residue on the 5′ splice site. Only P1, the region of the ribozyme that includes the internal guide sequence (boxed) and the 5′ splice site, is shown in detail; the rest of the ribozyme is represented as a green blob. The complete secondary structure of the ribozyme is shown in Figure 26–26. (b) If P1 is removed (shown as the darker green “hole”), the ribozyme retains both its three-dimensional shape and its catalytic capacity. A new RNA molecule added in vitro can bind to the ribozyme in the same manner as does the internal guide sequence of P1 in (a). This provides a template for further RNA polymerization reactions when oligonucleotides complementary to the added RNA base-pair with it. The ribozyme can link these oligonucleotides in a process equivalent to the reversal of the reaction in (a). Although only one such reaction is shown in (b), repeated binding and catalysis can result in the RNA-dependent synthesis of long RNA polymers.

BOX 26–3 WORKING IN BIOCHEMISTRY
The SELEX Method for Generating RNA Polymers with New Functions

SELEX (systematic evolution of ligands by exponential enrichment) is used to generate aptamers, oligonucleotides selected to tightly bind a specific molecular target. The process is generally automated to allow rapid identification of one or more aptamers with the desired binding specificity.

Figure 1 illustrates how SELEX is used to select an RNA species that binds tightly to ATP. In step ①, a random mixture of RNA polymers is subjected to “unnatural selection” by passing it through a resin to which ATP is attached. The practical limit for the complexity of an RNA mixture in SELEX is about 10¹⁵ different sequences, which allows for the complete randomization of 25 nucleotides (4²⁵ ≈ 10¹⁵). When longer RNAs are used, the RNA pool used to initiate the search does not include all possible sequences. ② RNA polymers that pass through the column are discarded; ③ those that bind to ATP are washed from the column with salt solution and collected. ④ The collected RNA polymers are amplified by reverse transcriptase to make many DNA complements to the selected RNAs; then an RNA polymerase makes many RNA complements of the resulting DNA molecules. ⑤ This new pool of RNA is subjected to the same selection procedure, and the cycle is repeated a dozen or more times. At the end, only a few aptamers, in this case RNA sequences with considerable affinity for ATP, remain.

Critical sequence features of an RNA aptamer that binds ATP are shown in Figure 2; molecules with this general structure bind ATP (and other adenosine nucleotides) with Kd < 50 μM.
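The enrichment logic of the SELEX cycle (select, discard, elute, amplify, repeat) can be captured in a few lines. This is an idealized model, not a description of any published protocol: it assumes that binders and non-binders are retained on the column with fixed hypothetical probabilities and that amplification copies every retained RNA equally, so each round multiplies the odds of drawing a binder by p_bind/p_nonbind. The function name and parameter values are illustrative assumptions. The first line also checks the arithmetic that 4²⁵ is on the order of 10¹⁵.

```python
def selex_rounds(f0: float, p_bind: float, p_nonbind: float, rounds: int) -> list[float]:
    """Fraction of binders in the pool after each selection + amplification cycle.

    p_bind / p_nonbind: chance that a binder / non-binder is retained on the column.
    Amplification is assumed to copy every retained RNA equally, so it preserves ratios.
    """
    f = f0
    history = [f]
    for _ in range(rounds):
        kept_binders = f * p_bind
        kept_others = (1.0 - f) * p_nonbind
        f = kept_binders / (kept_binders + kept_others)  # renormalize after amplification
        history.append(f)
    return history


print(f"4**25 = {4**25:.3e} sequences")  # complete randomization of 25 nucleotides
for n, f in enumerate(selex_rounds(f0=1e-10, p_bind=0.5, p_nonbind=0.001, rounds=6)):
    print(f"round {n}: binder fraction = {f:.3g}")
```

With a 500-fold retention advantage per cycle, binders starting at one in 10¹⁰ dominate the pool within about five rounds in this idealized model. The box notes that real selections are repeated a dozen or more times; background binding, amplification bias, and a spread of affinities, none of which this model includes, would plausibly slow enrichment.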
Figure 3 presents the three-dimensional structure of a 36 nucleotide RNA aptamer (shown as a complex with AMP) generated by SELEX. This RNA has the backbone structure shown in Figure 2. In addition to its use in exploring the potential functionality of RNA, SELEX has an important practical side in identifying short RNAs with pharmaceutical uses. Finding an aptamer that binds specifically to every potential therapeutic target may be impossible, but the capacity of SELEX to rapidly select and amplify a specific oligonucleotide sequence from a highly complex pool of sequences makes this a promising approach for the generation of new ther- apies. For example, one could select an RNA that binds tightly to a receptor protein prominent in the plasma membrane of cells in a particular cancerous tumor. Blocking the activity of the receptor, or tar- geting a toxin to the tumor cells by attaching it to the aptamer, would kill the cells. SELEX also has been used to select DNA aptamers that detect anthrax spores. Many other promising applications are under development. ■ G G A A A A A C G G U G 5H11032 3H11032 ATP 10 15 random RNA sequences RNA sequences that do not bind ATP (discard) RNA sequences that bind ATP RNA sequences enriched for ATP-binding function amplify ATP coupled to resin 1 5 2 3 4 repeat 3H11032 5H11032 FIGURE 1 The SELEX procedure. FIGURE 2 RNA aptamer that binds ATP. The shaded nucleotides are those required for the binding activity. FIGURE 3 (Derived from PDB ID 1RAW.) RNA aptamer bound to AMP. The bases of the conserved nucleotides (forming the binding pocket) are white; the bound AMP is red. 8885d_c26_995-1035 2/12/04 11:18 AM Page 1030 mac34 mac34: kec_420: SUMMARY 26.3 RNA-Dependent Synthesis of RNA and DNA ■ RNA-dependent DNA polymerases, also called reverse transcriptases, were first discovered in retroviruses, which must convert their RNA genomes into double-stranded DNA as part of their life cycle. 
These enzymes transcribe the viral RNA into DNA, a process that can be used experimentally to form complementary DNA.
■ Many eukaryotic transposons are related to retroviruses, and their mechanism of transposition includes an RNA intermediate.
■ Telomerase, the enzyme that synthesizes the telomere ends of linear chromosomes, is a specialized reverse transcriptase that contains an internal RNA template.
■ RNA-dependent RNA polymerases, such as the replicases of RNA bacteriophages, are template-specific for the viral RNA.
■ The existence of catalytic RNAs and pathways for the interconversion of RNA and DNA has led to speculation that an important stage in evolution was the appearance of an RNA (or an equivalent polymer) that could catalyze its own replication. The biochemical potential of RNAs can be explored by SELEX, a method for rapidly selecting RNA sequences with particular binding or catalytic properties.

Key Terms
transcription 995; messenger RNA (mRNA) 995; transfer RNA (tRNA) 995; ribosomal RNA (rRNA) 995; DNA-dependent RNA polymerase 996; promoter 998; consensus sequence 998; cAMP receptor protein (CRP) 1001; repressor 1001; footprinting 1002; transcription factors 1003; ribozymes 1007; primary transcript 1007; RNA splicing 1007; 5′ cap 1008; spliceosome 1010; poly(A) tail 1011; reverse transcriptase 1021; retrovirus 1021; complementary DNA (cDNA) 1022; homing 1024; telomerase 1025; RNA-dependent RNA polymerase (RNA replicase) 1027; aptamer 1030

Terms in bold in the text are defined in the glossary.

Problems

1. RNA Polymerase
(a) How long would it take for the E. coli RNA polymerase to synthesize the primary transcript for the E. coli genes encoding the enzymes for lactose metabolism (the 5,300 bp lac operon, considered in Chapter 28)?
(b) How far along the DNA would the transcription “bubble” formed by RNA polymerase move in 10 seconds?

2. Error Correction by RNA Polymerases DNA polymerases are capable of editing and error correction, whereas the capacity for error correction in RNA polymerases appears to be quite limited. Given that a single base error in either replication or transcription can lead to an error in protein synthesis, suggest a possible biological explanation for this striking difference.

3. RNA Posttranscriptional Processing Predict the likely effects of a mutation in the sequence (5′)AAUAAA in a eukaryotic mRNA transcript.

4. Coding versus Template Strands The RNA genome of phage Qβ is the nontemplate or coding strand, and when introduced into the cell it functions as an mRNA. Suppose the RNA replicase of phage Qβ synthesized primarily template-strand RNA and uniquely incorporated this, rather than nontemplate strands, into the viral particles. What would be the fate of the template strands when they entered a new cell? What enzyme would such a template-strand virus need to include in the viral particles for successful invasion of a host cell?

5. The Chemistry of Nucleic Acid Biosynthesis Describe three properties common to the reactions catalyzed by DNA polymerase, RNA polymerase, reverse transcriptase, and RNA replicase. How is the enzyme polynucleotide phosphorylase similar to and different from these four enzymes?

6. RNA Splicing What is the minimum number of transesterification reactions needed to splice an intron from an mRNA transcript? Explain.

7. RNA Genomes The RNA viruses have relatively small genomes. For example, the single-stranded RNAs of retroviruses have about 10,000 nucleotides and the Qβ RNA is only 4,220 nucleotides long. Given the properties of reverse transcriptase and RNA replicase described in this chapter, can you suggest a reason for the small size of these viral genomes?

8. Screening RNAs by SELEX The practical limit for the number of different RNA sequences that can be screened in a SELEX experiment is 10^15.
(a) Suppose you are working with oligonucleotides 32 nucleotides in length.
How many sequences exist in a randomized pool containing every sequence possible?
(b) What percentage of these can be screened in a SELEX experiment?
(c) Suppose you wish to select an RNA molecule that catalyzes the hydrolysis of a particular ester. From what you know about catalysis (Chapter 6), propose a SELEX strategy that might allow you to select the appropriate catalyst.

9. Slow Death The death cap mushroom, Amanita phalloides, contains several dangerous substances, including the lethal α-amanitin. This toxin blocks RNA elongation in consumers of the mushroom by binding to eukaryotic RNA polymerase II with very high affinity; it is deadly in concentrations as low as 10^−8 M. The initial reaction to ingestion of the mushroom is gastrointestinal distress (caused by some of the other toxins). These symptoms disappear, but about 48 hours later, the mushroom-eater dies, usually from liver dysfunction. Speculate on why it takes this long for α-amanitin to kill.

10. Detection of Rifampicin-Resistant Strains of Tuberculosis Rifampicin is an important antibiotic used to treat tuberculosis, as well as other mycobacterial diseases. Some strains of Mycobacterium tuberculosis, the causative agent of tuberculosis, are resistant to rifampicin. These strains become resistant through mutations that alter the rpoB gene, which encodes the β subunit of the RNA polymerase. Rifampicin cannot bind to the mutant RNA polymerase and so is unable to block the initiation of transcription. DNA sequences from a large number of rifampicin-resistant M. tuberculosis strains have been found to have mutations in a specific 69 bp region of rpoB. One well-characterized strain with rifampicin resistance has a single base pair alteration in rpoB that results in a single amino acid substitution in the β subunit: a His residue is replaced by an Asp residue.
(a) Based on your knowledge of protein chemistry (Chapters 3 and 4), suggest a technique that would allow detection of the rifampicin-resistant strain containing this particular mutant protein.
(b) Based on your knowledge of nucleic acid chemistry (Chapter 8), suggest a technique to identify the mutant form of rpoB.

Biochemistry on the Internet

11. The Ribonuclease Gene Human pancreatic ribonuclease has 128 amino acid residues.
(a) What is the minimum number of nucleotide pairs required to code for this protein?
(b) The mRNA expressed in human pancreatic cells was copied with reverse transcriptase to create a “library” of human DNA. The sequence of the mRNA coding for human pancreatic ribonuclease was determined by sequencing the complementary DNA (cDNA) from this library that included an open reading frame for the protein. Use the Entrez database system (www.ncbi.nlm.nih.gov/Entrez) to find the published sequence of this mRNA (search the nucleotide database for accession number D26129). What is the length of this mRNA?
(c) How can you account for the discrepancy between the size you calculated in (a) and the actual length of the mRNA?

CHAPTER 27 PROTEIN METABOLISM
27.1 The Genetic Code 1034
27.2 Protein Synthesis 1044
27.3 Protein Targeting and Degradation 1068

Obviously, Harry [Noller]’s finding doesn’t speak to how life started, and it doesn’t explain what came before RNA. But as part of the continually growing body of circumstantial evidence that there was a life form before us on this planet, from which we emerged—boy, it’s very strong!
—Gerald Joyce, quoted in commentary in Science, 1992

Proteins are the end products of most information pathways. A typical cell requires thousands of different proteins at any given moment. These must be synthesized in response to the cell’s current needs, transported (targeted) to their appropriate cellular locations, and degraded when no longer needed.

An understanding of protein synthesis, the most complex biosynthetic process, has been one of the greatest challenges in biochemistry. Eukaryotic protein synthesis involves more than 70 different ribosomal proteins; 20 or more enzymes to activate the amino acid precursors; a dozen or more auxiliary enzymes and other protein factors for the initiation, elongation, and termination of polypeptides; perhaps 100 additional enzymes for the final processing of different proteins; and 40 or more kinds of transfer and ribosomal RNAs. Overall, almost 300 different macromolecules cooperate to synthesize polypeptides. Many of these macromolecules are organized into the complex three-dimensional structure of the ribosome.

To appreciate the central importance of protein synthesis, consider the cellular resources devoted to this process. Protein synthesis can account for up to 90% of the chemical energy used by a cell for all biosynthetic reactions. Every prokaryotic and eukaryotic cell contains from several to thousands of copies of many different proteins and RNAs. The 15,000 ribosomes, 100,000 molecules of protein synthesis–related protein factors and enzymes, and 200,000 tRNA molecules in a typical bacterial cell can account for more than 35% of the cell’s dry weight.

Despite the great complexity of protein synthesis, proteins are made at exceedingly high rates. A polypeptide of 100 residues is synthesized in an Escherichia coli cell (at 37 °C) in about 5 seconds. Synthesis of the thousands of different proteins in a cell is tightly regulated, so that just enough copies are made to match the current metabolic circumstances. To maintain the appropriate mix and concentration of proteins, the targeting and degradative processes must keep pace with synthesis. Research is gradually uncovering the finely coordinated cellular choreography that guides each protein to its proper cellular location and selectively degrades it when it is no longer required.

The study of protein synthesis offers another important reward: a look at a world of RNA catalysts that may have existed before the dawn of life “as we know it.” Researchers have elucidated the structure of bacterial ribosomes, revealing the workings of cellular protein synthesis in beautiful molecular detail. And what did they find? Proteins are synthesized by a gigantic RNA enzyme!

27.1 The Genetic Code

Three major advances set the stage for our present knowledge of protein biosynthesis. First, in the early 1950s, Paul Zamecnik and his colleagues designed a set of experiments to investigate where in the cell proteins are synthesized. They injected radioactive amino acids into rats and, at different time intervals after the injection, removed the liver, homogenized it, fractionated the homogenate by centrifugation, and examined the subcellular fractions for the presence of radioactive protein. When hours or days were allowed to elapse after injection of the labeled amino acids, all the subcellular fractions contained labeled proteins. However, when only minutes had elapsed, labeled protein appeared only in a fraction containing small ribonucleoprotein particles. These particles, visible in animal tissues by electron microscopy, were therefore identified as the site of protein synthesis from amino acids, and later were named ribosomes (Fig. 27–1).
The second key advance was made by Mahlon Hoagland and Zamecnik, when they found that amino acids were “activated” when incubated with ATP and the cytosolic fraction of liver cells. The amino acids became attached to a heat-stable soluble RNA of the type that had been discovered and characterized by Robert Holley and later called transfer RNA (tRNA), to form aminoacyl-tRNAs. The enzymes that catalyze this process are the aminoacyl-tRNA synthetases.

The third advance resulted from Francis Crick’s reasoning on how the genetic information encoded in the 4-letter language of nucleic acids could be translated into the 20-letter language of proteins. A small nucleic acid (perhaps RNA) could serve the role of an adaptor, one part of the adaptor molecule binding a specific amino acid and another part recognizing the nucleotide sequence encoding that amino acid in an mRNA (Fig. 27–2). This idea was soon verified. The tRNA adaptor “translates” the nucleotide sequence of an mRNA into the amino acid sequence of a polypeptide. The overall process of mRNA-guided protein synthesis is often referred to simply as translation.

These three developments soon led to recognition of the major stages of protein synthesis and ultimately to the elucidation of the genetic code that specifies each amino acid.

The Genetic Code Was Cracked Using Artificial mRNA Templates

By the 1960s it had long been apparent that at least three nucleotide residues of DNA are necessary to encode each amino acid. The four code letters of DNA (A, T, G, and C) in groups of two can yield only 4^2 = 16 different combinations, insufficient to encode 20 amino acids. Groups of three, however, yield 4^3 = 64 different combinations.

Several key properties of the genetic code were established in early genetic studies (Figs. 27–3 and 27–4). A codon is a triplet of nucleotides that codes for a specific amino acid.
Translation occurs in such a way that these nucleotide triplets are read in a successive, nonoverlapping fashion. A specific first codon in the sequence establishes the reading frame, in which a new codon begins every three nucleotide residues. There is no punctuation between codons for successive amino acid residues. The amino acid sequence of a protein is defined by a linear sequence of contiguous triplets. In principle, any given single-stranded DNA or mRNA sequence has three possible reading frames. Each reading frame gives a different sequence of codons (Fig. 27–5), but only one is likely to encode a given protein.

A key question remained: what were the three-letter code words for each amino acid?

FIGURE 27–1 Ribosomes and endoplasmic reticulum. Electron micrograph and schematic drawing of a portion of a pancreatic cell, showing ribosomes attached to the outer (cytosolic) face of the endoplasmic reticulum (ER). The ribosomes are the numerous small dots bordering the parallel layers of membranes.

FIGURE 27–2 Crick’s adaptor hypothesis. Today we know that the amino acid is covalently bound at the 3′ end of a tRNA molecule and that a specific nucleotide triplet elsewhere in the tRNA interacts with a particular triplet codon in mRNA through hydrogen bonding of complementary bases. (Figure labels: amino acid binding site; adaptor; nucleotide triplet coding for an amino acid; mRNA.)

In 1961 Marshall Nirenberg and Heinrich Matthaei reported the first breakthrough. They incubated synthetic polyuridylate, poly(U), with an E. coli extract, GTP, ATP, and a mixture of the 20 amino acids in 20 different tubes, each tube containing a different radioactively labeled amino acid.
Because poly(U) mRNA is made up of many successive UUU triplets, it should promote the synthesis of a polypeptide containing only the amino acid encoded by the triplet UUU. A radioactive polypeptide was indeed formed in only one of the 20 tubes, the one containing radioactive phenylalanine. Nirenberg and Matthaei therefore concluded that the triplet codon UUU encodes phenylalanine. The same approach revealed that polycytidylate, poly(C), encodes a polypeptide containing only proline (polyproline), and polyadenylate, poly(A), encodes polylysine. Polyguanylate did not generate any polypeptide in this experiment because it spontaneously forms tetraplexes (see Fig. 8–22) that cannot be bound by ribosomes.

The synthetic polynucleotides used in such experiments were prepared with polynucleotide phosphorylase (p. 1020), which catalyzes the formation of RNA polymers starting from ADP, UDP, CDP, and GDP. This enzyme requires no template and makes polymers with a base composition that directly reflects the relative concentrations of the nucleoside 5′-diphosphate precursors in the medium. If polynucleotide phosphorylase is presented with UDP only, it makes only poly(U). If it is presented with a mixture of five parts ADP and one part CDP, it makes a polymer in which about five-sixths of the residues are adenylate and one-sixth are cytidylate. This random polymer is likely to have many triplets of the sequence AAA, smaller numbers of AAC, ACA, and CAA triplets, relatively few ACC, CCA, and CAC triplets, and very few CCC triplets (Table 27–1). Using a variety of artificial mRNAs made by polynucleotide phosphorylase from different starting mixtures of ADP, GDP, UDP, and CDP, investigators soon identified the base compositions of the triplets coding for almost all the amino acids. Although these experiments revealed the base composition of the coding triplets, they could not reveal the sequence of the bases.
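The expected triplet frequencies in a random copolymer follow directly from the base ratio if each residue is incorporated independently, which is the assumption implicit in saying that composition reflects precursor concentrations. The sketch below (helper names are hypothetical) scales the most common triplet to 100, as in Table 27–1, and reproduces the relative values given in that table’s note: 20 for each A2C triplet, 4 for each AC2 triplet, and 0.8 for CCC.

```python
from collections import defaultdict
from itertools import product


def triplet_frequencies(ratio: dict) -> dict:
    """Expected relative frequency of each triplet in a random copolymer.

    Assumes each position is filled independently in proportion to `ratio`
    (e.g. {"A": 5, "C": 1}); the most common triplet is scaled to 100.
    """
    total = sum(ratio.values())
    p = {base: weight / total for base, weight in ratio.items()}
    raw = {"".join(t): p[t[0]] * p[t[1]] * p[t[2]] for t in product(p, repeat=3)}
    top = max(raw.values())
    return {codon: 100 * value / top for codon, value in raw.items()}


def by_composition(freqs: dict) -> dict:
    """Sum triplet frequencies by base composition, ignoring order (AAC, ACA, CAA -> 'A2C')."""
    grouped = defaultdict(float)
    for codon, value in freqs.items():
        label = "".join(
            base + (str(codon.count(base)) if codon.count(base) > 1 else "")
            for base in sorted(set(codon))
        )
        grouped[label] += value
    return dict(grouped)


freqs = triplet_frequencies({"A": 5, "C": 1})
for codon in ("AAA", "AAC", "ACA", "CAA", "ACC", "CCA", "CAC", "CCC"):
    print(codon, round(freqs[codon], 1))
print(by_composition(freqs))
```

Grouping by composition shows why these experiments could assign only base compositions: AAC, ACA, and CAA are expected at exactly the same frequency, so incorporation data alone cannot tell them apart.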
FIGURE 27–3 Overlapping versus nonoverlapping genetic codes. In a nonoverlapping code, codons (numbered consecutively) do not share nucleotides. In an overlapping code, some nucleotides in the mRNA are shared by different codons. In a triplet code with maximum overlap, many nucleotides, such as the third nucleotide from the left (A), are shared by three codons. Note that in an overlapping code, the triplet sequence of the first codon limits the possible sequences for the second codon. A nonoverlapping code provides much more flexibility in the triplet sequence of neighboring codons and therefore in the possible amino acid sequences designated by the code. The genetic code used in all living systems is now known to be nonoverlapping. (Panels: nonoverlapping code and overlapping code, each illustrated with the sequence AUACGAGUC and codons numbered 1, 2, 3.)

FIGURE 27–4 The triplet, nonoverlapping code. Evidence for the general nature of the genetic code came from many types of experiments, including genetic experiments on the effects of deletion and insertion mutations. Inserting or deleting one base pair (shown here in the mRNA transcript) alters the sequence of triplets in a nonoverlapping code; all amino acids coded by the mRNA following the change are affected. Combining insertion and deletion mutations affects some amino acids but can eventually restore the correct amino acid sequence. Adding or subtracting three nucleotides (not shown) leaves the remaining triplets intact, providing evidence that a codon has three, rather than four or five, nucleotides. The triplet codons shaded in gray are those transcribed from the original gene; codons shaded in blue are new codons resulting from the insertion or deletion mutations.
(Figure 27–4 panels, mRNA 5′→3′: original, GUA GCC UAC GGA U; insertion (+), GUA GCC UCA CGG AU; deletion (−), GUA CCU ACG GAU; insertion (+) and deletion (−), GUA AGC CAC GGA U, reading frame restored.)

In 1964 Nirenberg and Philip Leder achieved another experimental breakthrough. Isolated E. coli ribosomes would bind a specific aminoacyl-tRNA in the presence of the corresponding synthetic polynucleotide messenger. (By convention, the identity of a tRNA is indicated by a superscript, such as tRNA^Ala, and the aminoacylated tRNA by a hyphenated name: alanyl-tRNA^Ala or Ala-tRNA^Ala.) For example, ribosomes incubated with poly(U) and phenylalanyl-tRNA^Phe (Phe-tRNA^Phe) bind both RNAs, but if the ribosomes are incubated with poly(U) and some other aminoacyl-tRNA, the aminoacyl-tRNA is not bound, because it does not recognize the UUU triplets in poly(U) (Table 27–2). Even trinucleotides could promote specific binding of appropriate tRNAs, so these experiments could be carried out with chemically synthesized small oligonucleotides. With this technique researchers determined which aminoacyl-tRNA bound to about 50 of the 64 possible triplet codons. For some codons, either no aminoacyl-tRNA or more than one would bind. Another method was needed to complete and confirm the entire genetic code.

TABLE 27–1 Incorporation of Amino Acids into Polypeptides in Response to Random Polymers of RNA

Amino acid | Observed frequency of incorporation (Lys = 100) | Tentative assignment for nucleotide composition* of corresponding codon | Expected frequency of incorporation based on assignment (Lys = 100)
Asparagine | 24 | A2C | 20
Glutamine | 24 | A2C | 20
Histidine | 6 | AC2 | 4
Lysine | 100 | AAA | 100
Proline | 7 | AC2, CCC | 4.8
Threonine | 26 | A2C, AC2 | 24

Note: Presented here is a summary of data from one of the early experiments designed to elucidate the genetic code. A synthetic RNA containing only A and C residues in a 5:1 ratio directed polypeptide synthesis, and both the identity and the quantity of incorporated amino acids were determined. Based on the relative abundance of A and C residues in the synthetic RNA, and assigning the codon AAA (the most likely codon) a frequency of 100, there should be three different codons of composition A2C, each at a relative frequency of 20; three of composition AC2, each at a relative frequency of 4.0; and CCC at a relative frequency of 0.8. The CCC assignment was based on information derived from prior studies with poly(C). Where two tentative codon assignments are made, both are proposed to code for the same amino acid.
*These designations of nucleotide composition contain no information on nucleotide sequence (except, of course, AAA and CCC).

FIGURE 27–5 Reading frames in the genetic code. In a triplet, nonoverlapping code, all mRNAs have three potential reading frames, shaded here in different colors. The triplets, and hence the amino acids specified, are different in each reading frame. (Panels: the sequence 5′-UUCUCGGACCUGGAGAUUCACAGU-3′ read in reading frame 1 as UUC UCG GAC CUG GAG AUU CAC AGU; in reading frame 2 as U UCU CGG ACC UGG AGA UUC ACA GU; and in reading frame 3 as UU CUC GGA CCU GGA GAU UCA CAG U.)

TABLE 27–2 Trinucleotides That Induce Specific Binding of Aminoacyl-tRNAs to Ribosomes

Relative increase in ¹⁴C-labeled aminoacyl-tRNA bound to ribosome*
Trinucleotide | Phe-tRNA^Phe | Lys-tRNA^Lys | Pro-tRNA^Pro
UUU | 4.6 | 0 | 0
AAA | 0 | 7.7 | 0
CCC | 0 | 0 | 3.1

Source: Modified from Nirenberg, M. & Leder, P. (1964) RNA code words and protein synthesis. Science 145, 1399.
*Each number represents the factor by which the amount of bound ¹⁴C increased when the indicated trinucleotide was present, relative to a control with no trinucleotide.

At about this time, a complementary approach was provided by H. Gobind Khorana, who developed chemical methods to synthesize polyribonucleotides with defined, repeating sequences of two to four bases. The polypeptides produced by these mRNAs had one or a few amino acids in repeating patterns. These patterns, when combined with information from the random polymers used by Nirenberg and colleagues, permitted unambiguous codon assignments. The copolymer (AC)n, for example, has alternating ACA and CAC codons: ACACACACACACACA. The polypeptide synthesized on this messenger contained equal amounts of threonine and histidine. Given that a histidine codon has one A and two Cs (Table 27–1), CAC must code for histidine and ACA for threonine.

Consolidation of the results from many experiments permitted the assignment of 61 of the 64 possible codons. The other three were identified as termination codons, in part because they disrupted amino acid coding patterns when they occurred in a synthetic RNA polymer (Fig. 27–6). Meanings for all the triplet codons (tabulated in Fig. 27–7) were established by 1966 and have been verified in many different ways. The cracking of the genetic code is regarded as one of the most important scientific discoveries of the twentieth century.

Codons are the key to the translation of genetic information, directing the synthesis of specific proteins. The reading frame is set when translation of an mRNA molecule begins, and it is maintained as the synthetic machinery reads sequentially from one triplet to the next. If the initial reading frame is off by one or two bases, or if translation somehow skips a nucleotide in the mRNA, all the subsequent codons will be out of register; the result is usually a “missense” protein with a garbled amino acid sequence.
There are a few unusual but interesting exceptions to this rule (Box 27–1).

Several codons serve special functions (Fig. 27–7). The initiation codon AUG is the most common signal for the beginning of a polypeptide in all cells (some rare alternatives are discussed in Box 27–2), in addition to coding for Met residues in internal positions of polypeptides. The termination codons (UAA, UAG, and UGA), also called stop codons or nonsense codons, normally signal the end of polypeptide synthesis and do not code for any known amino acids. As described in Section 27.2, initiation of protein synthesis in the cell is an elaborate process that relies on initiation codons and other signals in the mRNA. In retrospect, the experiments of Nirenberg and Khorana to identify codon function should not have worked in the absence of initiation codons. Serendipitously, experimental conditions caused the normal initiation requirements for protein synthesis to be relaxed. Diligence combined with chance to produce a breakthrough, a common occurrence in the history of biochemistry.

FIGURE 27–6 Effect of a termination codon in a repeating tetranucleotide. Termination codons are encountered every fourth codon in three different reading frames. Dipeptides or tripeptides are synthesized, depending on where the ribosome initially binds. (Panels: the repeating sequence (GUAA)n, 5′-GUAAGUAAGUAA...-3′, read in reading frames 1, 2, and 3, each containing UAA every fourth codon.)

FIGURE 27–7 “Dictionary” of amino acid code words in mRNAs. The codons are written in the 5′→3′ direction. The third base of each codon plays a lesser role in specifying an amino acid than the first two. The three termination codons are labeled Stop; AUG, which codes for Met, is also the initiation codon. All the amino acids except methionine and tryptophan have more than one codon. In most cases, codons that specify the same amino acid differ only at the third base.

First letter (5′ end) | Second letter U | Second letter C | Second letter A | Second letter G
U | UUU Phe | UCU Ser | UAU Tyr | UGU Cys
U | UUC Phe | UCC Ser | UAC Tyr | UGC Cys
U | UUA Leu | UCA Ser | UAA Stop | UGA Stop
U | UUG Leu | UCG Ser | UAG Stop | UGG Trp
C | CUU Leu | CCU Pro | CAU His | CGU Arg
C | CUC Leu | CCC Pro | CAC His | CGC Arg
C | CUA Leu | CCA Pro | CAA Gln | CGA Arg
C | CUG Leu | CCG Pro | CAG Gln | CGG Arg
A | AUU Ile | ACU Thr | AAU Asn | AGU Ser
A | AUC Ile | ACC Thr | AAC Asn | AGC Ser
A | AUA Ile | ACA Thr | AAA Lys | AGA Arg
A | AUG Met | ACG Thr | AAG Lys | AGG Arg
G | GUU Val | GCU Ala | GAU Asp | GGU Gly
G | GUC Val | GCC Ala | GAC Asp | GGC Gly
G | GUA Val | GCA Ala | GAA Glu | GGA Gly
G | GUG Val | GCG Ala | GAG Glu | GGG Gly

In a random sequence of nucleotides, about 1 in every 20 codons in each reading frame is a termination codon, since 3 of the 64 possible triplets are stop codons. In general, a reading frame without a termination codon among 50 or more codons is referred to as an open reading frame (ORF). Long open reading frames usually correspond to genes that encode proteins. In the analysis of sequence databases, sophisticated programs are used to search for open reading frames in order to find genes among the often huge background of nongenic DNA. An uninterrupted gene coding for a typical protein with a molecular weight of 60,000 would require an open reading frame with 500 or more codons.
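Reading frames, Khorana’s repeating copolymers, and the simple ORF definition above can all be expressed with one codon table. The sketch below encodes the standard code of Figure 27–7 as a 64-character string in UCAG order (one-letter amino acid abbreviations, `*` for stop), a compact encoding chosen here for brevity; the function names are hypothetical. The ORF scan follows only the text’s definition (a stop-free run of 50 or more codons): it does not require AUG and scans only the strand it is given, so it is far simpler than the gene-finding programs mentioned above.

```python
# Standard genetic code, codons in UCAG order (first base varies slowest);
# one-letter amino acid codes, "*" = termination codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOPS = {codon for codon, aa in CODE.items() if aa == "*"}


def codons_in_frame(mrna: str, frame: int) -> list:
    """Nonoverlapping triplets read from offset `frame` (0, 1, or 2); a trailing partial triplet is dropped."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


def translate(codons: list) -> str:
    """One-letter amino acid string for a list of codons ('*' marks a stop)."""
    return "".join(CODE[c] for c in codons)


def open_reading_frames(mrna: str, min_codons: int = 50) -> list:
    """(frame, start index, length in codons) of every stop-free run of at least `min_codons` codons."""
    found = []
    for frame in range(3):
        start, length = frame, 0
        for i in range(frame, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOPS:
                if length >= min_codons:
                    found.append((frame, start, length))
                start, length = i + 3, 0  # next run begins after the stop codon
            else:
                length += 1
        if length >= min_codons:  # run that reaches the end of the sequence
            found.append((frame, start, length))
    return found


if __name__ == "__main__":
    fig_27_5 = "UUCUCGGACCUGGAGAUUCACAGU"
    for f in range(3):
        print(f + 1, codons_in_frame(fig_27_5, f), translate(codons_in_frame(fig_27_5, f)))
    print(translate(codons_in_frame("AC" * 8, 0)))    # (AC)n: alternating Thr (T) and His (H)
    print(translate(codons_in_frame("GUAA" * 6, 0)))  # (GUAA)n: a stop every fourth codon
    print(f"stop codons: {len(STOPS)}/64, about 1 in {64 / len(STOPS):.0f}")
```

With the Figure 27–5 sequence, the three frames translate to FSDLEIHS, SRTWRFT, and LGPGDSQ; (AC)n yields only ACA and CAC codons; and (GUAA)n reaches UAA every fourth codon in whichever frame it is read, as in Figure 27–6.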
A striking feature of the genetic code is that an amino acid may be specified by more than one codon, so the code is described as degenerate. This does not suggest that the code is flawed: although an amino acid may have two or more codons, each codon specifies only one amino acid. The degeneracy of the code is not uniform. Whereas methionine and tryptophan have single codons, for example, three amino acids (Leu, Ser, Arg) have six codons, five amino acids have four, isoleucine has three, and nine amino acids have two (Table 27–3).

The genetic code is nearly universal. With the intriguing exception of a few minor variations in mitochondria, some bacteria, and some single-celled eukaryotes (Box 27–2), amino acid codons are identical in all species examined so far. Human beings, E. coli, tobacco plants, amphibians, and viruses share the same genetic code. Thus it would appear that all life forms have a common evolutionary ancestor, whose genetic code has been preserved throughout biological evolution. Even the variations (Box 27–2) reinforce this theme.

Wobble Allows Some tRNAs to Recognize More than One Codon

When several different codons specify one amino acid, the difference between them usually lies at the third base position (at the 3′ end). For example, alanine is coded by the triplets GCU, GCC, GCA, and GCG. The codons for most amino acids can be symbolized by XY(A/G) or XY(U/C). The first two letters of each codon are the primary determinants of specificity, a feature that has some interesting consequences.

Transfer RNAs base-pair with mRNA codons at a three-base sequence on the tRNA called the anticodon. The first base of the codon in mRNA (read in the 5′→3′ direction) pairs with the third base of the anticodon (Fig. 27–8a). If the anticodon triplet of a tRNA recognized only one codon triplet through Watson-Crick base pairing at all three positions, cells would have a different tRNA for each amino acid codon.
This is not the case, however, because the anticodons in some tRNAs include the nucleotide inosinate (designated I), which contains the uncommon base hypoxanthine (see Fig. 8–5b). Inosinate can form hydrogen bonds with three different nucleotides (U, C, and A; Fig. 27–8b), although these pairings are much weaker than the hydrogen bonds of Watson-Crick base pairs (G≡C and A=U). In yeast, one tRNA^Arg has the anticodon (5′)ICG, which recognizes three arginine codons: (5′)CGA, (5′)CGU, and (5′)CGC. The first two bases are identical (CG) and form strong Watson-Crick base pairs with the corresponding bases of the anticodon, but the third base (A, U, or C) forms rather weak hydrogen bonds with the I residue at the first position of the anticodon.

TABLE 27–3 Degeneracy of the Genetic Code

Amino acid   Number of codons     Amino acid   Number of codons
Met          1                    Tyr          2
Trp          1                    Ile          3
Asn          2                    Ala          4
Asp          2                    Gly          4
Cys          2                    Pro          4
Gln          2                    Thr          4
Glu          2                    Val          4
His          2                    Arg          6
Lys          2                    Leu          6
Phe          2                    Ser          6

FIGURE 27–8 Pairing relationship of codon and anticodon. (a) Alignment of the two RNAs is antiparallel: codon (5′)GAU(3′) in the mRNA pairs with anticodon (3′)CUA(5′) in the tRNA, so codon positions 1, 2, 3 pair with anticodon positions 3, 2, 1. The tRNA is shown in the traditional cloverleaf configuration. (b) Three different codon pairing relationships are possible when the tRNA anticodon contains inosinate: anticodon (3′)GCI(5′) pairs with codons (5′)CGA(3′), (5′)CGU(3′), and (5′)CGC(3′).
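The degeneracy counts in Table 27–3 and the inosine pairing of the yeast tRNA^Arg anticodon can both be checked against the codon dictionary of Fig. 27–7. The sketch below (all helper names are hypothetical) encodes that dictionary and the wobble pairings of Crick's rules as summarized in Table 27–4, then searches, group by group, for the fewest anticodon wobble bases that read every codon of an amino acid without also reading a codon for a different amino acid. Under these simplifying assumptions it reproduces the minimum of 31 tRNAs for the amino acids (32 with the initiator tRNA) cited in the wobble hypothesis; real cells carry other tRNA sets and modified bases that this model ignores.

```python
from collections import Counter
from itertools import combinations, product

BASES = "UCAG"
# Fig. 27-7 read row by row: first base slowest, third base fastest (UUU, UUC, UUA, UUG, UCU, ...).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Watson-Crick partners, and (ASSUMED model of Crick's rules, Table 27-4) the codon
# third bases that each anticodon wobble base (its 5'-most base) can pair with.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def degeneracy() -> Counter:
    """Number of codons per amino acid (Table 27-3)."""
    return Counter(aa for aa in CODE.values() if aa != "Stop")


def codons_read(anticodon: str) -> set[str]:
    """Codons (5'->3') read by an anticodon written 5'->3', e.g. 'ICG'.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    base 2 with base 2, and the wobble base 1 with codon base 3.
    """
    first, second = PAIR[anticodon[2]], PAIR[anticodon[1]]
    return {first + second + third for third in WOBBLE[anticodon[0]]}


def minimum_trnas() -> int:
    """Fewest tRNAs that read all 61 sense codons without misreading any."""
    groups = {}  # (first two bases, amino acid) -> set of codon third bases
    for codon, aa in CODE.items():
        if aa != "Stop":
            groups.setdefault((codon[:2], aa), set()).add(codon[2])
    total = 0
    for thirds in groups.values():
        # A wobble base is usable only if every codon it reads codes for this amino acid.
        usable = [w for w, reads in WOBBLE.items() if set(reads) <= thirds]
        for k in range(1, len(usable) + 1):
            if any(set("".join(WOBBLE[w] for w in combo)) == thirds
                   for combo in combinations(usable, k)):
                total += k
                break
        else:
            raise ValueError(f"no tRNA set covers {thirds}")
    return total


print(sorted(Counter(degeneracy().values()).items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
print(sorted(codons_read("ICG")))                       # ['CGA', 'CGC', 'CGU']
print(minimum_trnas() + 1)                              # 32 = 31 for amino acids + 1 initiator
```

Grouping codons by their first two bases as well as by amino acid is what forces six-codon amino acids such as Leu, Ser, and Arg to use more than one tRNA family, in line with rule 3 of the wobble hypothesis.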
Examination of these and other codon-anticodon pairings led Crick to conclude that the third base of most codons pairs rather loosely with the corresponding base of its anticodon; to use his picturesque word, the third base of such codons (and the first base of their corresponding anticodons) “wobbles.” Crick proposed a set of four relationships called the wobble hypothesis:

1. The first two bases of an mRNA codon always form strong Watson-Crick base pairs with the corresponding bases of the tRNA anticodon and confer most of the coding specificity.

2. The first base of the anticodon (reading in the 5′→3′ direction; this pairs with the third base of the codon) determines the number of codons recognized by the tRNA. When the first base of the anticodon is C or A, base pairing is specific and only one codon is recognized by that tRNA. When the first base is U or G, binding is less specific and two different codons may be read. When inosine (I) is the first (wobble) nucleotide of an anticodon, three different codons can be recognized, the maximum number for any tRNA. These relationships are summarized in Table 27–4.

3. When an amino acid is specified by several different codons, the codons that differ in either of the first two bases require different tRNAs.

4. A minimum of 32 tRNAs is required to translate all 61 codons (31 to encode the amino acids and 1 for initiation).

BOX 27–1 WORKING IN BIOCHEMISTRY
Changing Horses in Midstream: Translational Frameshifting and mRNA Editing

Once the reading frame has been set during protein synthesis, codons are translated without overlap or punctuation until the ribosomal complex encounters a termination codon. The other two possible reading frames usually contain no useful genetic information, but a few genes are structured so that ribosomes “hiccup” at a certain point in the translation of their mRNAs, changing the reading frame from that point on. This appears to be a mechanism either to allow two or more related but distinct proteins to be produced from a single transcript or to regulate the synthesis of a protein.

One of the best-documented examples occurs in translation of the mRNA for the overlapping gag and pol genes of the Rous sarcoma virus (see Fig. 26–31). The reading frame for pol is offset to the left by one base pair (−1 reading frame) relative to the reading frame for gag (Fig. 1). The product of the pol gene (reverse transcriptase) is translated as a larger polyprotein, on the same mRNA that is used for the gag protein alone (see Fig. 26–30). The polyprotein, or gag-pol protein, is then trimmed to the mature reverse transcriptase by proteolytic digestion. Production of the polyprotein requires a translational frameshift in the overlap region to allow the ribosome to bypass the UAG termination codon at the end of the gag gene (Fig. 1).

FIGURE 1 The gag-pol overlap region in Rous sarcoma virus RNA. gag reading frame: 5′-CUA GGG CUC CGC UUG ACA AAU UUA UAG-3′ (Leu Gly Leu Arg Leu Thr Asn Leu Stop). pol reading frame (−1) through the same region: …AUA GGG AGG GCC… (Ile Gly Arg Ala).

Frameshifts occur during about 5% of translations of this mRNA, and the gag-pol polyprotein (and ultimately reverse transcriptase) is synthesized at about one-twentieth the frequency of the gag protein, a level that suffices for efficient reproduction of the virus. In some retroviruses, another translational frameshift allows translation of an even larger polyprotein that includes the product of the env gene fused to the gag and pol gene products (see Fig. 26–30). A similar mechanism produces both the τ and γ subunits of E. coli DNA polymerase III from a single dnaX gene transcript (see Table 25–2).

This mechanism also occurs in the gene for E. coli release factor 2 (RF-2), discussed in Section 27.2, which is required for termination of protein synthesis at the termination codons UAA and UGA. The twenty-sixth codon in the transcript of the gene for RF-2 is UGA, which would normally halt protein synthesis. The remainder of the gene is in the +1 reading frame (offset one base pair to the right) relative to this UGA codon. Translation pauses at this codon, but termination does not occur unless RF-2 is bound to the codon (the lower the level of RF-2, the less likely the binding). The absence of bound RF-2 prevents the termination of protein synthesis at UGA and allows time for a frameshift to occur. The UGA plus the C that follows it (UGAC) is therefore read as GAC, which translates to Asp. Translation then proceeds in the new reading frame to complete synthesis of RF-2. In this way, RF-2 regulates its own synthesis in a feedback loop.

Some mRNAs are edited before translation. The initial transcripts of the genes that encode cytochrome oxidase subunit II in some protist mitochondria do not correspond precisely to the sequence needed at the carboxyl terminus of the protein product. A posttranscriptional editing process inserts four U residues that shift the translational reading frame of the transcript. Figure 2a shows the added U residues in the small part of the transcript that is affected by editing. Neither the function nor the mechanism of this editing process is understood. Investigators have detected a special class of RNA molecules encoded by these mitochondria, with sequences complementary to the edited mRNAs. These so-called guide RNAs (Fig. 2b) appear to act as templates for the editing process. Note that the base pairing involves a number of G=U base pairs, which are common in RNA molecules.

A distinct form of RNA editing occurs in the transcript of the gene for the apolipoprotein B component of low-density lipoprotein in vertebrates. One form of apolipoprotein B, apoB-100 (Mr 513,000), is synthesized in the liver; a second form, apoB-48 (Mr 250,000), is synthesized in the intestine. Both are encoded by an mRNA produced from the gene for apoB-100. A cytosine deaminase enzyme found only in the intestine binds to the mRNA at the codon for amino acid residue 2,153 (CAA = Gln) and converts the C to a U, to introduce the termination codon UAA. The apoB-48 produced in the intestine from this modified mRNA is simply an abbreviated form (corresponding to the amino-terminal half) of apoB-100 (Fig. 3). This reaction permits tissue-specific synthesis of two different proteins from one gene.

FIGURE 2 RNA editing of the transcript of the cytochrome oxidase subunit II gene from Trypanosoma brucei mitochondria. (a) Insertion of four U residues produces a revised reading frame. DNA coding strand: 5′-AAA GTA GAG AAC CTG GTA-3′ (Lys Val Glu Asn Leu Val); edited mRNA: 5′-AAA GUA GAU UGU AUA CCU GGU A…-3′ (Lys Val Asp Cys Ile Pro Gly). (b) A special class of guide RNAs, complementary to the edited product, may act as templates for the editing process.

FIGURE 3 RNA editing of the transcript of the gene for the apolipoprotein B-100 component of LDL. Codons 2,146 to 2,157 in human liver (apoB-100): 5′-CUG CAA CAG ACA UAU AUG AUA CAA UUU GAU CAG UAU-3′ (Leu Gln Gln Thr Tyr Met Ile Gln Phe Asp Gln Tyr). In human intestine (apoB-48), codon 2,153 is UAA (Stop). Deamination, which occurs only in the intestine, converts a specific cytosine to uracil, changing a Gln codon to a stop codon and producing a truncated protein.

BOX 27–2 WORKING IN BIOCHEMISTRY
Exceptions That Prove the Rule: Natural Variations in the Genetic Code

In biochemistry, as in other disciplines, exceptions to general rules can be problematic for instructors and frustrating for students. At the same time, though, they teach us that life is complex and inspire us to search for more surprises. Understanding the exceptions can even reinforce the original rule in surprising ways.

One would expect little room for variation in the genetic code. Even a single amino acid substitution can have profoundly deleterious effects on the structure of a protein. Nevertheless, variations in the code do occur in some organisms, and they are both interesting and instructive. The types of variation and their rarity provide powerful evidence for a common evolutionary origin of all living things.

To alter the code, changes must occur in one or more tRNAs, with the obvious target for alteration being the anticodon. Such a change would lead to the systematic insertion of an amino acid at a codon that, according to the normal code (see Fig. 27–7), does not specify that amino acid. The genetic code, in effect, is defined by two elements: (1) the anticodons on tRNAs (which determine where an amino acid is placed in a growing polypeptide) and (2) the specificity of the enzymes that charge the tRNAs (the aminoacyl-tRNA synthetases), which determines the identity of the amino acid attached to a given tRNA.
Most sudden changes in the code would have catastrophic effects on cellular proteins, so code alterations are more likely where relatively few proteins would be affected, such as in small genomes encoding only a few proteins. The biological consequences of a code change could also be limited by restricting changes to the three termination codons, which do not generally occur within genes (see Box 27–4 for exceptions to this rule). This pattern is in fact observed.

Of the very few variations in the genetic code that we know of, most occur in mitochondrial DNA (mtDNA), which encodes only 10 to 20 proteins. Mitochondria have their own tRNAs, so their code variations do not affect the much larger cellular genome. The most common changes in mitochondria (and the only code changes that have been observed in cellular genomes) involve termination codons. These changes affect termination in the products of only a subset of genes, and sometimes the effects are minor because the genes have multiple (redundant) termination codons.

In mitochondria, these changes can be viewed as a kind of genomic streamlining. Vertebrate mtDNAs have genes that encode 13 proteins, 2 rRNAs, and 22 tRNAs (see Fig. 19–32). An unusual set of wobble rules allows the 22 tRNAs to decode all 64 possible codon triplets; not all of the 32 tRNAs required for the normal code are needed. Four-codon families (in which the amino acid is determined entirely by the first two nucleotides) are decoded by a single tRNA with a U residue in the first (or wobble) position in the anticodon. Either the U pairs somehow with any of the four possible bases in the third position of the codon or a “two out of three” mechanism is used; that is, no base pairing is needed at the third position. Other tRNAs recognize codons with either A or G in the third position, and yet others recognize U or C, so that virtually all the tRNAs recognize either two or four codons.
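As a rough consistency check on this tRNA economy, assume that each mitochondrial tRNA reads exactly one of three kinds of codon group described above: a complete four-codon family, a pair ending in A or G, or a pair ending in U or C. Applying the vertebrate reassignments listed in Table 1 of this box (UGA to Trp, AUA to Met, AGA and AGG to Stop) to the standard dictionary and counting the resulting groups gives 22, the number of tRNA genes in vertebrate mtDNA. The function name is hypothetical, and the model says nothing about which anticodon each real tRNA carries.

```python
from itertools import product
from typing import Optional

BASES = "UCAG"
# Standard code in Fig. 27-7 order (first base slowest, third base fastest).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate row of Table 1: UGA -> Trp, AUA -> Met, AGA/AGG -> Stop.
VERTEBRATE_MITO = {**STANDARD, "UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}

# ASSUMED reading rules: one tRNA reads a whole family (U wobble or "two out of
# three"), an A/G pair, or a U/C pair of codons sharing their first two bases.
READABLE = [set("UCAG"), set("AG"), set("UC")]


def mito_style_trna_count(code: dict[str, str]) -> Optional[int]:
    """One tRNA per (first two bases, amino acid) group of sense codons,
    or None if some group cannot be read by a single tRNA under these rules."""
    groups: dict[tuple[str, str], set[str]] = {}
    for codon, aa in code.items():
        if aa != "Stop":
            groups.setdefault((codon[:2], aa), set()).add(codon[2])
    if any(thirds not in READABLE for thirds in groups.values()):
        return None
    return len(groups)


print(mito_style_trna_count(VERTEBRATE_MITO))  # 22
print(mito_style_trna_count(STANDARD))         # None: AUU/AUC/AUA (Ile) fits no pattern
```

Under the standard code the same rules fail, because isoleucine's three codons and the lone Met and Trp codons do not form A/G or U/C pairs; the mitochondrial reassignments of UGA and AUA are exactly what turn those groups into pairs.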
In the normal code, only two amino acids are specified by single codons: methionine and tryptophan (see Table 27–3). If all mitochondrial tRNAs recognize two codons, we would expect additional Met and Trp codons in mitochondria. And we find that the single most common code variation is the normal termination codon UGA specifying tryptophan. The tRNA^Trp recognizes and inserts a Trp residue at either UGA or the normal Trp codon, UGG. The second most common variation is conversion of AUA from an Ile codon to a Met codon; the normal Met codon is AUG, and a single tRNA recognizes both codons. The known coding variations in mitochondria are summarized in Table 1.

Turning to the much rarer changes in the codes for cellular (as distinct from mitochondrial) genomes, we find that the only known variation in a prokaryote is again the use of UGA to encode Trp residues, occurring in the simplest free-living cell, Mycoplasma capricolum. Among eukaryotes, the only known extramitochondrial coding changes occur in a few species of ciliated protists, in which both termination codons UAA and UAG can specify glutamine.

Changes in the code need not be absolute; a codon might not always encode the same amino acid. In E. coli we find two examples of amino acids being inserted at positions not specified in the normal code. The first is the occasional use of GUG (Val) as an initiation codon. This occurs only for those genes in which the GUG is properly located relative to particular mRNA sequences that affect the initiation of translation (as discussed in Section 27.2).

The second E. coli example also involves contextual signals that alter coding patterns. A few proteins in all cells (such as formate dehydrogenase in bacteria and glutathione peroxidase in mammals) require the element selenium for their activity, generally in the form of the modified amino acid selenocysteine. Although modified amino acids are generally produced in posttranslational reactions (described in Section 27.3), in E. coli selenocysteine is introduced into formate dehydrogenase during translation, in response to an in-frame UGA codon. A special type of serine tRNA, present at lower levels than other Ser-tRNAs, recognizes UGA and no other codons. This tRNA is charged with serine, and the serine is enzymatically converted to selenocysteine before its use at the ribosome. The charged tRNA does not recognize just any UGA codon; some contextual signal in the mRNA, still to be identified, ensures that this tRNA recognizes only the few UGA codons, within certain genes, that specify selenocysteine. In effect, E. coli has 21 common amino acids, and UGA doubles as a codon for both termination and (sometimes) selenocysteine.

[Structure: selenocysteine, ⁺H₃N–CH(COO⁻)–CH₂–SeH]

These variations tell us that the code is not quite as universal as once believed, but that its flexibility is severely constrained. The variations are obviously derivatives of the normal code, and no example of a completely different code has been found. The limited scope of code variants strengthens the principle that all life on this planet evolved on the basis of a single (slightly flexible) genetic code.

TABLE 1 Known Variant Codon Assignments in Mitochondria

Codon*                        UGA    AUA    AGA, AGG    CUN    CGG
Normal code assignment        Stop   Ile    Arg         Leu    Arg
Animals
  Vertebrates                 Trp    Met    Stop        +      +
  Drosophila                  Trp    Met    Ser         +      +
Yeasts
  Saccharomyces cerevisiae    Trp    Met    +           Thr    +
  Torulopsis glabrata         Trp    Met    +           Thr    ?
  Schizosaccharomyces pombe   Trp    +      +           +      +
Filamentous fungi             Trp    +      +           +      +
Trypanosomes                  Trp    +      +           +      +
Higher plants                 +      +      +           +      Trp
Chlamydomonas reinhardtii     ?      +      +           +      ?

* N indicates any nucleotide; +, codon has the same meaning as in the normal code; ?, codon not observed in this mitochondrial genome.

The wobble (or third) base of the codon contributes to specificity, but, because it pairs only loosely with its corresponding base in the anticodon, it permits rapid dissociation of the tRNA from its codon during protein synthesis. If all three bases of a codon engaged in strong Watson-Crick pairing with the three bases of the anticodon, tRNAs would dissociate too slowly and this would severely limit the rate of protein synthesis. Codon-anticodon interactions balance the requirements for accuracy and speed.

The genetic code tells us how protein sequence information is stored in nucleic acids and provides some clues about how that information is translated into protein. We now turn to the molecular mechanisms of the translation process.

SUMMARY 27.1 The Genetic Code
■ The particular amino acid sequence of a protein is constructed through the translation of information encoded in mRNA. This process is carried out by ribosomes.
■ Amino acids are specified by mRNA codons consisting of nucleotide triplets. Translation requires adaptor molecules, the tRNAs, that recognize codons and insert amino acids into their appropriate sequential positions in the polypeptide.
■ The base sequences of the codons were deduced from experiments using synthetic mRNAs of known composition and sequence.
■ The codon AUG signals initiation of translation. The triplets UAA, UAG, and UGA are signals for termination.
■ The genetic code is degenerate: it has multiple code words for almost every amino acid.
■ The standard genetic code words are universal in all species, with some minor deviations in mitochondria and a few single-celled organisms.
■ The third position in each codon is much less specific than the first and second and is said to wobble.
27.2 Protein Synthesis

As we have seen for DNA and RNA (Chapters 25 and 26), the synthesis of polymeric biomolecules can be considered in terms of initiation, elongation, and termination stages. These fundamental processes are typically bracketed by two additional stages: activation of precursors before synthesis and postsynthetic processing of the completed polymer. Protein synthesis follows the same pattern. The activation of amino acids before their incorporation into polypeptides and the posttranslational processing of the completed polypeptide play particularly important roles in ensuring both the fidelity of synthesis and the proper function of the protein product.

The cellular components involved in the five stages of protein synthesis in E. coli and other bacteria are listed in Table 27–5; the requirements in eukaryotic cells are quite similar, although the components are in some cases more numerous. An initial overview of the stages of protein synthesis provides a useful outline for the discussion that follows.

Protein Biosynthesis Takes Place in Five Stages

Stage 1: Activation of Amino Acids
For the synthesis of a polypeptide with a defined sequence, two fundamental chemical requirements must be met: (1) the carboxyl group of each amino acid must be activated to facilitate formation of a peptide bond, and (2) a link must be established between each new amino acid and the information in the mRNA that encodes it. Both these requirements are met by attaching the amino acid to a tRNA in the first stage of protein synthesis. Attaching the right amino acid to the right tRNA is critical. This reaction takes place in the cytosol, not on the ribosome. Each of the 20 amino acids is covalently attached to a specific tRNA at the expense of ATP energy, using Mg²⁺-dependent activating enzymes known as aminoacyl-tRNA synthetases.
When attached to their amino acid (aminoacylated), the tRNAs are said to be “charged.”

Stage 2: Initiation
The mRNA bearing the code for the polypeptide to be made binds to the smaller of two ribosomal subunits and to the initiating aminoacyl-tRNA. The large ribosomal subunit then binds to form an initiation complex. The initiating aminoacyl-tRNA base-pairs with the mRNA codon AUG that signals the beginning of the polypeptide. This process, which requires GTP, is promoted by cytosolic proteins called initiation factors.

Stage 3: Elongation
The nascent polypeptide is lengthened by covalent attachment of successive amino acid units, each carried to the ribosome and correctly positioned by its tRNA, which base-pairs to its corresponding codon in the mRNA. Elongation requires cytosolic proteins known as elongation factors. The binding of each incoming aminoacyl-tRNA and the movement of the ribosome along the mRNA are facilitated by the hydrolysis of GTP as each residue is added to the growing polypeptide.

Stage 4: Termination and Release
Completion of the polypeptide chain is signaled by a termination codon in the mRNA. The new polypeptide is released from the ribosome, aided by proteins called release factors.

Stage 5: Folding and Posttranslational Processing
In order to achieve its biologically active form, the new polypeptide must fold into its proper three-dimensional conformation. Before or after folding, the new polypeptide may undergo enzymatic processing, including removal of one or more amino acids (usually from the amino terminus); addition of acetyl, phosphoryl, methyl, carboxyl, or other groups to certain amino acid residues; proteolytic cleavage; and/or attachment of oligosaccharides or prosthetic groups.

TABLE 27–4 How the Wobble Base of the Anticodon Determines the Number of Codons a tRNA Can Recognize

1. One codon recognized:
   Anticodon (3′) X–Y–C (5′)          (3′) X–Y–A (5′)
   Codon     (5′) X′–Y′–G (3′)        (5′) X′–Y′–U (3′)
2. Two codons recognized:
   Anticodon (3′) X–Y–U (5′)          (3′) X–Y–G (5′)
   Codon     (5′) X′–Y′–A or G (3′)   (5′) X′–Y′–C or U (3′)
3. Three codons recognized:
   Anticodon (3′) X–Y–I (5′)
   Codon     (5′) X′–Y′–U, A, or C (3′)

Note: X and Y denote bases complementary to and capable of strong Watson-Crick base pairing with X′ and Y′, respectively. Wobble bases are those in the 3′ position of codons and the 5′ position of anticodons.

Before looking at these five stages in detail, we must examine two key components in protein biosynthesis: the ribosome and tRNAs.

The Ribosome Is a Complex Supramolecular Machine

Each E. coli cell contains 15,000 or more ribosomes, making up almost a quarter of the dry weight of the cell. Bacterial ribosomes contain about 65% rRNA and 35% protein; they have a diameter of about 18 nm and are composed of two unequal subunits with sedimentation coefficients of 30S and 50S and a combined sedimentation coefficient of 70S. Both subunits contain dozens of ribosomal proteins and at least one large rRNA (Table 27–6).

Following Zamecnik’s discovery that ribosomes are the complexes responsible for protein synthesis, and following elucidation of the genetic code, the study of ribosomes accelerated. In the late 1960s Masayasu Nomura and colleagues demonstrated that both ribosomal subunits can be broken down into their RNA and protein components, then reconstituted in vitro. Under appropriate experimental conditions, the RNA and protein spontaneously reassemble to form 30S or 50S subunits nearly identical in structure and activity to native subunits.
This breakthrough fueled decades of research into the function and structure of ribosomal RNAs and proteins. At the same time, increasingly sophisticated structural methods revealed more and more details about ribosome structure.

TABLE 27–5 Components Required for the Five Major Stages of Protein Synthesis in E. coli

Stage: essential components
1. Activation of amino acids: 20 amino acids; 20 aminoacyl-tRNA synthetases; 32 or more tRNAs; ATP; Mg²⁺
2. Initiation: mRNA; N-formylmethionyl-tRNA^fMet; initiation codon in mRNA (AUG); 30S ribosomal subunit; 50S ribosomal subunit; initiation factors (IF-1, IF-2, IF-3); GTP; Mg²⁺
3. Elongation: functional 70S ribosome (initiation complex); aminoacyl-tRNAs specified by codons; elongation factors (EF-Tu, EF-Ts, EF-G); GTP; Mg²⁺
4. Termination and release: termination codon in mRNA; release factors (RF-1, RF-2, RF-3)
5. Folding and posttranslational processing: specific enzymes, cofactors, and other components for removal of initiating residues and signal sequences, additional proteolytic processing, modification of terminal residues, and attachment of phosphate, methyl, carboxyl, carbohydrate, or prosthetic groups

[Photo: Masayasu Nomura]

The dawn of a new millennium brought with it the elucidation of the first high-resolution structures of bacterial ribosomal subunits. The bacterial ribosome is complex, with a combined molecular weight of ~2.7 million, and it is providing a wealth of surprises (Fig. 27–9). First, the traditional focus on the protein components of ribosomes was shifted. The ribosomal subunits are huge RNA molecules. In the 50S subunit, the 5S and 23S rRNAs form the structural core. The proteins are secondary elements in the complex, decorating the surface. Second and most important, there is no protein within 18 Å of the active site for peptide bond formation.
The high-resolution structure thus confirms what many had suspected for more than a decade: the ribosome is a ribozyme. In ad- dition to the insight they provide into the mechanism of protein synthesis (as elaborated below), the detailed Chapter 27 Protein Metabolism1046 EPA 50S 30S (a) (b) FIGURE 27–9 Ribosomes. Our understanding of ribosome structure took a giant step forward with the publication in 2000 of the high- resolution structure of the 50S ribosomal subunit of the bacterium Haloarcula marismortui by Thomas Steitz, Peter Moore, and their colleagues. This was followed by additional high- resolution structures of the ribosomal subunits from several different bacterial species, and models of the corresponding complete ribosomes. A sampling of that progress is presented here. (a) The 50S and 30S bacterial subunits, split apart to visualize the surfaces that interact in the active ribosome. The structure on the left is the 50S subunit (derived from PDB ID 1JJ2 and 1GIY), with tRNAs (purple, mauve, and gray); bound to sites E, P, and A, described later in the text; the tRNA anti- codons are in orange. Proteins appear as blue wormlike structures; the rRNA as a blended space-filling representation designed to highlight surface features, with the bases in white and the backbone in green. The structure on the right is the 30S subunit (derived from PDB ID 1J5E and 1JGO). Proteins are yellow and the rRNA white. The part of the mRNA that interacts with the tRNA anti- codons is shown in red. The rest of the mRNA winds through grooves or channels on the 30S subunit surface. (b) A model of a complete active bacterial ribosome (derived from PDB ID 1J5E, 1JJ2, 1JGO, and 1GIY). All components are colored as in (a). This is a view down into the groove separating the sub- units. A second view (inset) is from the same angle, but with the tRNAs removed to give a better sense of the cleft where protein synthesis occurs. 
8885d_c27_1034-1080 2/12/04 1:19 PM Page 1046 mac76 mac76:385_reb: structures of the ribosome and its subunits have stimu- lated a new look at the evolution of life (Box 27–3). The two irregularly shaped ribosomal subunits fit together to form a cleft through which the mRNA passes as the ribosome moves along it during translation (Fig. 27–9b). The 55 proteins in bacterial ribosomes vary enormously in size and structure. Molecular weights range from about 6,000 to 75,000. Most of the proteins have globular domains arranged on the ribosome sur- face. Some also have snakelike protein extensions that protrude into the rRNA core of the ribosome, stabiliz- ing its structure. The functions of some of these pro- teins have not yet been elucidated in detail, although a structural role seems evident for many of them. The sequences of the rRNAs of many organisms are now known. Each of the three single-stranded rRNAs of 27.2 Protein Synthesis 1047 TABLE 27–6 RNA and Protein Components of the E. coli Ribosome Number of Total number Protein Number and Subunit different proteins of proteins designations type of rRNAs 30S 21 21 S1–S21 1 (16S rRNA) 50S 33 36 L1–L36* 2 (5S and 23S rRNAs) * The L1 to L36 protein designations do not correspond to 36 different proteins. The protein originally designated L7 is in fact a modified form of L12, and L8 is a complex of three other proteins. Also, L26 proved to be the same protein as S20 (and not part of the 50S subunit). This gives 33 different proteins in the large subunit. There are four copies of the L7/L12 protein, with the three extra copies bringing the total protein count to 36. 
(d) Bacterial ribosome 70S M r 2.7 H11547 10 6 Eukaryotic ribosome 80S M r 4.2 H11547 10 6 50S 60S M r 1.8 H11547 10 6 5S rRNA (120 nucleotides) 23S rRNA (3,200 nucleotides) 36 proteins M r 2.8 H11547 10 6 5S rRNA (120 nucleotides) 28S rRNA (4,700 nucleotides) 5.8S rRNA (160 nucleotides) H11011 49 proteins 30S 40S M r 0.9 H11547 10 6 16S rRNA (1,540 nucleotides) 21 proteins M r 1.4 H11547 10 6 18S rRNA (1,900 nucleotides) H11011 33 proteins (c) Structure of the 50S bacterial ribosome subunit (PDB ID 1Q7Y). The subunit is again viewed from the side that attaches to the 30S sub- unit, but is tilted down slightly compared to its orientation in (a). The active site for peptide bond formation (the peptidyl transferase activ- ity), deep within a surface groove and far away from any protein, is marked by a bound inhibitor, puromycin (red). (d) Summary of the composition and mass of ribosomes in prokary- otes and eukaryotes. Ribosomal subunits are identified by their S (Sved- berg unit) values, sedimentation coefficients that refer to their rate of sedimentation in a centrifuge. The S values are not necessarily addi- tive when subunits are combined, because rates of sedimentation are affected by shape as well as mass. (c) 8885d_c27_1034-1080 2/12/04 1:19 PM Page 1047 mac76 mac76:385_reb: Chapter 27 Protein Metabolism1048 BOX 27–3 THE WORLD OF BIOCHEMISTRY From an RNA World to a Protein World Extant ribozymes generally promote one of two types of reactions: hydrolytic cleavage of phosphodiester bonds or phosphoryl transfers (Chapter 26). In both cases, the substrates of the reactions are also RNA molecules. The ribosomal RNAs provide an important expansion of the catalytic range of known ribozymes. Coupled to the laboratory exploration of potential RNA catalytic function (see Box 26–3), the idea of an RNA world as a precursor to current life forms be- comes increasingly attractive. 
A viable RNA world would require an RNA capable of self-replication, a primitive metabolism to generate the needed ribonucleotide precursors, and a cell boundary to aid in concentrating the precursors and sequestering them from the environment. The requirements for catalysis of reactions involving a growing range of metabolites and macromolecules could have led to larger and more complex RNA catalysts. The many negatively charged phosphoryl groups in the RNA backbone limit the stability of very large RNA molecules. In an RNA world, divalent cations or other positively charged groups could be incorporated into the structures to augment stability. Certain peptides could stabilize large RNA molecules. For example, many ribosomal proteins in modern eukaryotic cells have long extensions, lacking secondary structure, that snake into the rRNAs and help stabilize them (Fig. 1). Ribozyme-catalyzed synthesis of peptides could thus initially have evolved as part of a general solution to the structural maintenance of large RNA molecules.

The synthesis of peptides may have helped stabilize large ribozymes, but this advance also marked the beginning of the end for the RNA world. Once peptide synthesis was possible, the greater catalytic potential of proteins would have set in motion an irreversible transition to a protein-dominated metabolic system.

Most enzymatic processes, then, were eventually surrendered to the proteins, but not all. In every organism, the critical task of synthesizing the proteins remains, even now, a ribozyme-catalyzed process. There appears to be only one good arrangement (or just a very few) of nucleotide residues in a ribozyme active site that can catalyze peptide synthesis. The rRNA residues that seem to be involved in the peptidyl transferase activity of ribosomes are highly conserved in the large-subunit rRNAs of all species.
Using in vitro evolution (SELEX; see Box 26–3), investigators have isolated artificial ribozymes that promote peptide synthesis. Intriguingly, most of them include the ribonucleotide octet (5′)AUAACAGG(3′), a highly conserved sequence found at the peptidyl transferase active site in the ribosomes of all cells. There may be just one optimal solution to the overall chemical problem of ribozyme-catalyzed synthesis of proteins of defined sequence. Evolution found this solution once, and no life form has notably improved on it.

FIGURE 1 The 50S subunit of a bacterial ribosome (PDB ID 1NKW). The protein backbones are shown as blue wormlike structures; the rRNA components are transparent. The unstructured extensions of many of the ribosomal proteins snake into the rRNA structures, helping to stabilize them.

E. coli has a specific three-dimensional conformation featuring extensive intrachain base pairing. The predicted secondary structure of the rRNAs (Fig. 27–10) has largely been confirmed by the high-resolution models, although a secondary-structure diagram cannot convey the extensive network of tertiary interactions evident in the complete structure.

The ribosomes of eukaryotic cells (other than mitochondrial and chloroplast ribosomes) are larger and more complex than bacterial ribosomes (Fig. 27–9d), with a diameter of about 23 nm and a sedimentation coefficient of about 80S. They also have two subunits, which vary in size among species but on average are 60S and 40S. Altogether, eukaryotic ribosomes contain more than 80 different proteins. The ribosomes of mitochondria and chloroplasts are somewhat smaller and simpler than bacterial ribosomes. Nevertheless, ribosomal structure and function are strikingly similar in all organisms and organelles.
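Because sedimentation depends on shape as well as mass, subunit S values do not sum to the S value of the assembled ribosome, whereas the molecular weights do. The sketch below, using only the rounded values summarized in Figure 27–9d, makes that contrast explicit; the dictionary layout and the function name compare are illustrative choices, not part of any standard tool.

```python
# Masses of ribosomal subunits add up; sedimentation coefficients do not.
# Values are those summarized in Figure 27-9d (Mr in units of 10**6).

RIBOSOMES = {
    "bacterial":  {"large": (50, 1.8), "small": (30, 0.9), "whole": (70, 2.7)},
    "eukaryotic": {"large": (60, 2.8), "small": (40, 1.4), "whole": (80, 4.2)},
}

def compare(kind):
    """Return (sum of subunit S values, observed S, sum of subunit Mr, observed Mr)."""
    parts = RIBOSOMES[kind]
    s_sum = parts["large"][0] + parts["small"][0]
    mr_sum = round(parts["large"][1] + parts["small"][1], 1)  # remove float noise
    s_obs, mr_obs = parts["whole"]
    return s_sum, s_obs, mr_sum, mr_obs

for kind in RIBOSOMES:
    print(kind, compare(kind))
# bacterial (80, 70, 2.7, 2.7)
# eukaryotic (100, 80, 4.2, 4.2)
```

In both cases the summed masses match the whole ribosome, but the summed S values overshoot it (80 versus 70, 100 versus 80).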
Transfer RNAs Have Characteristic Structural Features

To understand how tRNAs can serve as adaptors in translating the language of nucleic acids into the language of proteins, we must first examine their structure in more detail. Transfer RNAs are relatively small and consist of a single strand of RNA folded into a precise three-dimensional structure (see Fig. 8–28a). The tRNAs in bacteria and in the cytosol of eukaryotes have between 73 and 93 nucleotide residues, corresponding to molecular weights of 24,000 to 31,000. Mitochondria and chloroplasts contain distinctive, somewhat smaller tRNAs. Cells have at least one kind of tRNA for each amino acid; at least 32 tRNAs are required to recognize all the amino acid codons (some recognize more than one codon), but some cells use more than 32. Yeast alanine tRNA (tRNA^Ala), the first nucleic acid to be completely sequenced (Fig. 27–11), contains 76 nucleotide residues, 10 of which have modified bases.

Comparisons of tRNAs from various species have revealed many common denominators of structure (Fig. 27–12). Eight or more of the nucleotide residues have modified bases and sugars, many of which are methylated derivatives of the principal bases. Most tRNAs have a guanylate (pG) residue at the 5′ end, and all have the trinucleotide sequence CCA(3′) at the 3′ end. When

[Diagram: secondary structures of E. coli 16S rRNA (residue 1 at the 5′ end to residue 1,542 at the 3′ end) and 5S rRNA.]

FIGURE 27–10 Bacterial rRNAs. Diagrams of the secondary structure of E. coli 16S and 5S rRNAs. The first (5′ end) and final (3′ end) ribonucleotide residues of the 16S rRNA are numbered.

FIGURE 27–11 Nucleotide sequence of yeast tRNA^Ala. This structure was deduced in 1965 by Robert W. Holley and his colleagues; it is shown in the cloverleaf conformation in which intrastrand base pairing is maximal.
The following symbols are used for the modified nucleotides (shaded pink): ψ, pseudouridine; I, inosine; T, ribothymidine; D, 5,6-dihydrouridine; m¹I, 1-methylinosine; m¹G, 1-methylguanosine; m²G, N²-dimethylguanosine (see Fig. 26–24). Blue lines between parallel sections indicate Watson-Crick base pairs. The anticodon can recognize three codons for alanine (GCA, GCU, and GCC). Other features of tRNA structure are shown in Figures 27–12 and 27–13. Note the presence of two G·U base pairs, signified by a blue dot to indicate non-Watson-Crick pairing. In RNAs, guanosine is often base-paired with uridine, although the G·U pair is not as stable as the Watson-Crick G≡C pair (Chapter 8).

[Cloverleaf diagram of the yeast tRNA^Ala sequence, from pG at the 5′ end to the site for amino acid attachment at the 3′ end, with the anticodon triplet marked.]

[Photo: Robert W. Holley, 1922–1993]

drawn in two dimensions, the hydrogen-bonding pattern of all tRNAs forms a cloverleaf structure with four arms; the longer tRNAs have a short fifth arm, or extra arm (Fig. 27–12). In three dimensions, a tRNA has the form of a twisted L (Fig. 27–13).

Two of the arms of a tRNA are critical for its adaptor function. The amino acid arm can carry a specific amino acid esterified by its carboxyl group to the 2′- or 3′-hydroxyl group of the A residue at the 3′ end of the tRNA. The anticodon arm contains the anticodon. The other major arms are the D arm, which contains the unusual nucleotide dihydrouridine (D), and the TψC arm, which contains ribothymidine (T), not usually present in RNAs, and pseudouridine (ψ), which has an unusual carbon–carbon bond between the base and ribose (see Fig. 26–24).
The D and TψC arms contribute important interactions for the overall folding of tRNA molecules, and the TψC arm interacts with the large-subunit rRNA. Having looked at the structures of ribosomes and tRNAs, we now consider in detail the five stages of protein synthesis.

[Diagram: generalized tRNA cloverleaf showing the amino acid arm (CCA at the 3′ end), the TψC arm, the extra arm (variable in size, not present in all tRNAs), the anticodon arm with the anticodon and its wobble position, and the D arm (containing two or three D residues at different positions).]

FIGURE 27–12 General cloverleaf secondary structure of tRNAs. The large dots on the backbone represent nucleotide residues; the blue lines represent base pairs. Characteristic and/or invariant residues common to all tRNAs are shaded in pink. Transfer RNAs vary in length from 73 to 93 nucleotides. Extra nucleotides occur in the extra arm or in the D arm. At the end of the anticodon arm is the anticodon loop, which always contains seven unpaired nucleotides. The D arm contains two or three D (5,6-dihydrouridine) residues, depending on the tRNA. In some tRNAs, the D arm has only three hydrogen-bonded base pairs. In addition to the symbols explained in Figure 27–11: Pu, purine nucleotide; Py, pyrimidine nucleotide; G*, guanylate or 2′-O-methylguanylate.

[Diagram: (a) schematic and (b) space-filling model of yeast tRNA^Phe, with the D arm (residues 10–25), anticodon arm, amino acid arm, and TψC arm labeled.]

FIGURE 27–13 Three-dimensional structure of yeast tRNA^Phe deduced from x-ray diffraction analysis. The shape resembles a twisted L. (a) Schematic diagram with the various arms identified in Figure 27–12 shaded in different colors. (b) A space-filling model, with the same color coding (PDB ID 4TRA). The CCA sequence at the 3′ end (orange) is the attachment point for the amino acid.
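The legend of Figure 27–11 notes that the yeast tRNA^Ala anticodon reads three alanine codons. The sketch below shows how that follows from Crick's wobble rules, which are assumed here because they are not stated in this section: the anticodon's 5′ base is the wobble position, and inosine (I) there can pair with U, C, or A. The anticodon IGC is the one in yeast tRNA^Ala; the function name codons_read is hypothetical.

```python
# Which codons can one anticodon read under Crick's wobble rules?
# Sequences are written 5'->3'. The anticodon's 5' (wobble) base pairs with the
# codon's 3' base; the other two positions use strict Watson-Crick pairing.

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon 3' bases that each anticodon 5' base can pair with (I = inosine).
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"A", "C", "U"},
}

def codons_read(anticodon):
    """Return the codons (5'->3') recognized by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon
    first_base = WATSON_CRICK[last]     # anticodon 3' base pairs with codon 5' base
    second_base = WATSON_CRICK[middle]  # middle positions pair with each other
    return sorted(first_base + second_base + third for third in WOBBLE[wobble])

print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']
```

The same function gives a single codon for an anticodon with C in the wobble position, for example codons_read("CAU") returns ['AUG'].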
Stage 1: Aminoacyl-tRNA Synthetases Attach the Correct Amino Acids to Their tRNAs

During the first stage of protein synthesis, taking place in the cytosol, aminoacyl-tRNA synthetases esterify the 20 amino acids to their corresponding tRNAs. Each enzyme is specific for one amino acid and one or more corresponding tRNAs. Most organisms have one aminoacyl-tRNA synthetase for each amino acid. For amino acids with two or more corresponding tRNAs, the same enzyme usually aminoacylates all of them.

The structures of all the aminoacyl-tRNA synthetases of E. coli have been determined. Researchers have divided them into two classes (Table 27–7) based on substantial differences in primary and tertiary structure and in reaction mechanism (Fig. 27–14); these two classes are the same in all organisms. There is no evidence for a common ancestor, and the biological, chemical, or evolutionary reasons for two enzyme classes for essentially identical processes remain obscure.

The reaction catalyzed by an aminoacyl-tRNA synthetase is

$$\text{Amino acid} + \text{tRNA} + \text{ATP} \;\overset{\text{Mg}^{2+}}{\rightleftharpoons}\; \text{aminoacyl-tRNA} + \text{AMP} + \text{PP}_i$$

This reaction occurs in two steps in the enzyme's active site. In step 1 (Fig. 27–14) an enzyme-bound intermediate, aminoacyl adenylate (aminoacyl-AMP), forms when the carboxyl group of the amino acid reacts with the α-phosphoryl group of ATP to form an anhydride linkage, with displacement of pyrophosphate. In the second step the aminoacyl group is transferred from enzyme-bound aminoacyl-AMP to its corresponding specific tRNA. The course of this second step depends on the class to which the enzyme belongs, as shown by pathways 2a and 2b in Figure 27–14. The resulting ester linkage between the amino acid and the tRNA (Fig. 27–15) has a highly negative standard free energy of hydrolysis (ΔG′° = −29 kJ/mol). The pyrophosphate formed in the activation reaction undergoes hydrolysis to phosphate by inorganic pyrophosphatase.
Thus two high-energy phosphate bonds are ultimately expended for each amino acid molecule activated, rendering the overall reaction for amino acid activation essentially irreversible:

$$\text{Amino acid} + \text{tRNA} + \text{ATP} \xrightarrow{\text{Mg}^{2+}} \text{aminoacyl-tRNA} + \text{AMP} + 2\,\text{P}_i \qquad \Delta G'^{\circ} \approx -29\ \text{kJ/mol}$$

Proofreading by Aminoacyl-tRNA Synthetases

The aminoacylation of tRNA accomplishes two ends: (1) activation of an amino acid for peptide bond formation and (2) attachment of the amino acid to an adaptor tRNA that ensures appropriate placement of the amino acid in a growing polypeptide. The identity of the amino acid attached to a tRNA is not checked on the ribosome, so attachment of the correct amino acid to the tRNA is essential to the fidelity of protein synthesis.

As you will recall from Chapter 6, enzyme specificity is limited by the binding energy available from enzyme-substrate interactions. Discrimination between two similar amino acid substrates has been studied in detail in the case of Ile-tRNA synthetase, which distinguishes between valine and isoleucine, amino acids that differ by only a single methylene group (–CH2–). Ile-tRNA synthetase favors activation of isoleucine (to form Ile-AMP) over valine by a factor of 200, as we would expect, given the amount by which a methylene group (in Ile) could enhance substrate binding. Yet valine is erroneously incorporated into proteins in positions normally occupied by an Ile residue at a frequency of only about 1 in 3,000. How is this greater than tenfold increase in accuracy brought about? Ile-tRNA synthetase, like some other aminoacyl-tRNA synthetases, has a proofreading function.

Recall a general principle from the discussion of proofreading by DNA polymerases (p. 955): if available binding interactions do not provide sufficient discrimination between two substrates, the necessary specificity can be achieved by substrate-specific binding in two successive steps.
The effect of forcing the system through two successive filters is multiplicative. In the case of Ile-tRNA synthetase, the first filter is the initial binding of the amino acid to the enzyme and its activation to aminoacyl-AMP. The second is the binding of any

TABLE 27–7 The Two Classes of Aminoacyl-tRNA Synthetases

Class I: Arg, Cys, Gln, Glu, Ile, Leu, Met, Trp, Tyr, Val
Class II: Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr

Note: Here, Arg represents arginyl-tRNA synthetase, and so forth. The classification applies to all organisms for which tRNA synthetases have been analyzed and is based on protein structural distinctions and on the mechanistic distinction outlined in Figure 27–14.

[Reaction scheme: the amino acid and ATP form 5′-aminoacyl adenylate (aminoacyl-AMP) and PPi (step 1). Class I aminoacyl-tRNA synthetases transfer the aminoacyl group to the 2′-hydroxyl group of the 3′-terminal adenosine of the tRNA (2a), from which it moves to the 3′-hydroxyl group by transesterification (3a). Class II synthetases transfer it directly to the 3′-hydroxyl group (2b). Both routes release AMP and yield aminoacyl-tRNA.]

MECHANISM FIGURE 27–14 Aminoacylation of tRNA by aminoacyl-tRNA synthetases. Step 1 is formation of an aminoacyl adenylate, which remains bound to the active site.
In the second step the aminoacyl group is transferred to the tRNA. The mechanism of this step is somewhat different for the two classes of aminoacyl-tRNA synthetases (see Table 27–7). For class I enzymes, (2a) the aminoacyl group is transferred initially to the 2′-hydroxyl group of the 3′-terminal A residue, then (3a) to the 3′-hydroxyl group by a transesterification reaction. For class II enzymes, (2b) the aminoacyl group is transferred directly to the 3′-hydroxyl group of the terminal adenylate.

incorrect aminoacyl-AMP products to a separate active site on the enzyme; a substrate that binds in this second active site is hydrolyzed. The R group of valine is slightly smaller than that of isoleucine, so Val-AMP fits the hydrolytic (proofreading) site of the Ile-tRNA synthetase but Ile-AMP does not. Thus Val-AMP is hydrolyzed to valine and AMP in the proofreading active site, and tRNA bound to the synthetase does not become aminoacylated with the wrong amino acid.

In addition to proofreading after formation of the aminoacyl-AMP intermediate, most aminoacyl-tRNA synthetases can also hydrolyze the ester linkage between amino acids and tRNAs in the aminoacyl-tRNAs. This hydrolysis is greatly accelerated for incorrectly charged tRNAs, providing yet a third filter to enhance the fidelity of the overall process. The few aminoacyl-tRNA synthetases that activate amino acids with no close structural relatives (Cys-tRNA synthetase, for example) demonstrate little or no proofreading activity; in these cases, the active site for aminoacylation can sufficiently discriminate between the proper substrate and any incorrect amino acid.

[Structures: valine and isoleucine, whose side chains differ by one methylene group.]

The overall error rate of protein synthesis (~1 mistake per 10⁴ amino acids incorporated) is not nearly as low as that of DNA replication.
Because flaws in a protein are eliminated when the protein is degraded and are not passed on to future generations, they have less biological significance. The degree of fidelity in protein synthesis is sufficient to ensure that most proteins contain no mistakes and that the large amount of energy required to synthesize a protein is rarely wasted. One defective protein molecule is usually unimportant when many correct copies of the same protein are present.

Interaction between an Aminoacyl-tRNA Synthetase and a tRNA: A “Second Genetic Code”

An individual aminoacyl-tRNA synthetase must be specific not only for a single amino acid but for certain tRNAs as well. Discriminating among dozens of tRNAs is just as important for the overall fidelity of protein biosynthesis as is distinguishing among amino acids. The interaction between aminoacyl-tRNA synthetases and tRNAs has been referred to as the “second genetic code,” reflecting its critical role in maintaining the accuracy of protein synthesis. The “coding” rules appear to be more complex than those in the “first” code.

Figure 27–16 summarizes what we know about the nucleotides involved in recognition by some aminoacyl-tRNA synthetases. Some nucleotides are conserved in all tRNAs and therefore cannot be used for discrimination.

[Diagram: the 3′ end of a tRNA with the aminoacyl group esterified to the terminal adenosine; the amino acid, TψC, anticodon, and D arms are labeled.]

FIGURE 27–15 General structure of aminoacyl-tRNAs. The aminoacyl group is esterified to the 3′ position of the terminal A residue. The ester linkage that both activates the amino acid and joins it to the tRNA is shaded pink.

[Diagram: tRNA cloverleaf with the amino acid, TψC, extra, anticodon, and D arms labeled.]

FIGURE 27–16 Nucleotide positions in tRNAs that are recognized by aminoacyl-tRNA synthetases.
Some positions (blue dots) are the same in all tRNAs and therefore cannot be used to discriminate one from another. Other positions are known recognition points for one (orange) or more (green) aminoacyl-tRNA synthetases. Structural features other than sequence are important for recognition by some of the synthetases.

By observing changes in nucleotides that alter substrate specificity, researchers have identified nucleotide positions that are involved in discrimination by the aminoacyl-tRNA synthetases. These nucleotide positions seem to be concentrated in the amino acid arm and the anticodon arm, including the nucleotides of the anticodon itself, but are also located in other parts of the tRNA molecule. Determination of the crystal structures of aminoacyl-tRNA synthetases complexed with their cognate tRNAs and ATP has added a great deal to our understanding of these interactions (Fig. 27–17).

Ten or more specific nucleotides may be involved in recognition of a tRNA by its specific aminoacyl-tRNA synthetase. But in a few cases the recognition mechanism is quite simple. Across a range of organisms from bacteria to humans, the primary determinant of tRNA recognition by the Ala-tRNA synthetases is a single G·U base pair in the amino acid arm of tRNA^Ala (Fig. 27–18a). A short RNA with as few as 7 bp arranged in a simple hairpin minihelix is efficiently aminoacylated by the Ala-tRNA synthetase, as long as the RNA contains the critical G·U (Fig. 27–18b). This relatively simple alanine system may be an evolutionary relic of a period when RNA oligonucleotides, ancestors to tRNA, were aminoacylated in a primitive system for protein synthesis.
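The proofreading argument for Ile-tRNA synthetase and the overall error rate quoted earlier in this stage reduce to simple arithmetic. This sketch treats successive filters as independent, so that their error rates multiply (an idealization), and uses the figures from the text: 200-fold discrimination at activation, about 1 error in 3,000 for Val in place of Ile, and about 1 error per 10⁴ residues overall. The 300-residue protein is a hypothetical example, and the function names are illustrative. On these assumptions the later filters must supply roughly another 15-fold discrimination, and a protein of that size would be made without any error about 97% of the time, consistent with the statement that most proteins contain no mistakes.

```python
import math

# Multiplicative-filter arithmetic for Ile-tRNA synthetase, treating each
# filter as independent (an idealization).

ACTIVATION_DISCRIMINATION = 200   # Ile favored over Val at activation (from the text)
OBSERVED_ERROR = 1 / 3000         # Val inserted in place of Ile (from the text)

# Error rate that would remain if activation were the only filter.
activation_error = 1 / ACTIVATION_DISCRIMINATION

# Extra discrimination that the proofreading step(s) must supply.
proofreading_factor = activation_error / OBSERVED_ERROR
print(round(proofreading_factor, 1))  # 15.0

def combined_error(*filter_error_rates):
    """Independent successive filters multiply their error rates."""
    return math.prod(filter_error_rates)

def error_free_fraction(n_residues, per_residue_error=1e-4):
    """Probability that a chain of n residues contains no misincorporation."""
    return (1 - per_residue_error) ** n_residues

print(round(error_free_fraction(300), 3))  # 0.97 for a hypothetical 300-residue protein
```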
Stage 2: A Specific Amino Acid Initiates Protein Synthesis

Protein synthesis begins at the amino-terminal end and proceeds by the stepwise addition of amino acids to the carboxyl-terminal end of the growing polypeptide, as determined by Howard Dintzis in 1961 (Fig. 27–19). The AUG initiation codon thus specifies an amino-terminal methionine residue. Although methionine has only one codon, (5′)AUG, all organisms have two tRNAs for methionine. One is used exclusively when (5′)AUG is the initiation codon for protein synthesis. The other is used to code for a Met residue in an internal position in a polypeptide.

The distinction between an initiating (5′)AUG and an internal one is straightforward. In bacteria, the two types of tRNA specific for methionine are designated tRNA^Met and tRNA^fMet. The amino acid incorporated in response to the (5′)AUG initiation codon is N-formylmethionine (fMet). It arrives at the ribosome as N-formylmethionyl-tRNA^fMet (fMet-tRNA^fMet), which is formed in two successive reactions. First, methionine is attached to tRNA^fMet by the Met-tRNA synthetase (which in E. coli aminoacylates both tRNA^fMet and tRNA^Met):

$$\text{Methionine} + \text{tRNA}^{\text{fMet}} + \text{ATP} \longrightarrow \text{Met-tRNA}^{\text{fMet}} + \text{AMP} + \text{PP}_i$$

FIGURE 27–17 Aminoacyl-tRNA synthetases. Both synthetases are complexed with their cognate tRNAs (green stick structures). Bound ATP (red) pinpoints the active site near the end of the aminoacyl arm. (a) Gln-tRNA synthetase from E. coli, a typical monomeric class I synthetase (PDB ID 1QRT). (b) Asp-tRNA synthetase from yeast, a typical dimeric class II synthetase (PDB ID 1ASZ).
Next, a transformylase transfers a formyl group from N¹⁰-formyltetrahydrofolate to the amino group of the Met residue:

$$N^{10}\text{-Formyltetrahydrofolate} + \text{Met-tRNA}^{\text{fMet}} \longrightarrow \text{tetrahydrofolate} + \text{fMet-tRNA}^{\text{fMet}}$$

The transformylase is more selective than the Met-tRNA synthetase; it is specific for Met residues attached to tRNA^fMet, presumably recognizing some unique structural feature of that tRNA. By contrast, Met-tRNA^Met inserts methionine in interior positions in polypeptides. Addition of the N-formyl group to the amino group of methionine by the transformylase prevents fMet from entering interior positions in a polypeptide while also allowing fMet-tRNA^fMet to be bound at a specific ribosomal initiation site that accepts neither Met-tRNA^Met nor any other aminoacyl-tRNA.

In eukaryotic cells, all polypeptides synthesized by cytosolic ribosomes begin with a Met residue (rather than fMet), but, again, the cell uses a specialized initiating

[Structure: N-formylmethionine]

[Diagram for Figure 27–19: completed globin chains, residue 1 (amino terminus) to residue 146 (carboxyl terminus), showing labeled segments after 4, 7, 16, and 60 min and the direction of chain growth.]

[Diagram for Figure 27–18: (a) tRNA^Ala with its G·U base pair at positions 3 and 70; (b) the minihelix derived from it, with the deleted nucleotides indicated.]

FIGURE 27–18 Structural elements of tRNA^Ala that are required for recognition by Ala-tRNA synthetase. (a) The tRNA^Ala structural elements recognized by the Ala-tRNA synthetase are unusually simple. A single G·U base pair (pink) is the only element needed for specific binding and aminoacylation. (b) A short synthetic RNA minihelix that has the critical G·U base pair but lacks most of the remaining tRNA structure is specifically aminoacylated with alanine almost as efficiently as the complete tRNA^Ala.

FIGURE 27–19 Proof that polypeptides grow by addition of amino acid residues to the carboxyl end: the Dintzis experiment.
Reticulocytes (immature erythrocytes) actively synthesizing hemoglobin were incubated with radioactive leucine (selected because it occurs frequently in both the α- and β-globin chains). Samples of completed α chains were isolated from the reticulocytes at various times afterward, and the distribution of radioactivity was determined. The dark red zones show the portions of completed α-globin chains containing radioactive Leu residues. At 4 min, only a few residues at the carboxyl end of α-globin were labeled, because the only complete globin chains with incorporated label after 4 min were those that had nearly completed synthesis at the time the label was added. With longer incubation times, successively longer segments of the polypeptide contained labeled residues, always in a block at the carboxyl end of the chain. The unlabeled end of the polypeptide (the amino terminus) was thus defined as the initiating end, which means that polypeptides grow by successive addition of amino acids to the carboxyl end.

tRNA that is distinct from the tRNA^Met used at (5′)AUG codons at interior positions in the mRNA. Polypeptides synthesized by mitochondrial and chloroplast ribosomes, however, begin with N-formylmethionine. This strongly supports the view that mitochondria and chloroplasts originated from bacterial ancestors that were symbiotically incorporated into precursor eukaryotic cells at an early stage of evolution (see Fig. 1–36).

How can the single (5′)AUG codon distinguish between the starting N-formylmethionine (or methionine, in eukaryotes) and interior Met residues? The details of the initiation process provide the answer.
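The logic of the Dintzis experiment (Fig. 27–19) can be reproduced with a toy model. It assumes, for illustration only, that every chain is elongated from the amino terminus at one constant rate, that label is added at time zero, and that completed chains are collected at a steady rate afterward; chains finished before the label was added carry no label and are ignored. The 146-residue length and the rate of 10 residues per minute are hypothetical parameters, not the experimental values, and labeled_fraction is an illustrative name. Whatever the parameters, label appears first at the carboxyl end and spreads toward the amino terminus as incubation time increases, which is the pattern the experiment revealed.

```python
# Toy model of the Dintzis experiment. Assumptions, not data: all chains grow
# from the amino terminus at one constant rate, label is added at t = 0, and
# completed chains accumulate at a steady rate afterward.

def labeled_fraction(position, chain_length, rate, t):
    """Fraction of chains completed during [0, t] that carry label at `position`.

    position: residue number, 1 = amino terminus
    rate: residues added per minute (hypothetical)
    t: minutes since the label was added
    """
    # In a chain completed at time tau, residue `position` was added at
    # tau - (chain_length - position) / rate; it is labeled if that time is >= 0.
    tau_min = (chain_length - position) / rate
    if tau_min >= t:
        return 0.0
    return 1.0 - tau_min / t

CHAIN_LENGTH = 146  # hypothetical chain length
RATE = 10.0         # hypothetical elongation rate, residues per minute

for minutes in (4, 7, 16, 60):
    profile = [labeled_fraction(p, CHAIN_LENGTH, RATE, minutes) for p in (1, 73, 146)]
    print(minutes, [round(f, 2) for f in profile])
```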
The Three Steps of Initiation

The initiation of polypeptide synthesis in bacteria requires (1) the 30S ribosomal subunit, (2) the mRNA coding for the polypeptide to be made, (3) the initiating fMet-tRNA^fMet, (4) a set of three proteins called initiation factors (IF-1, IF-2, and IF-3), (5) GTP, (6) the 50S ribosomal subunit, and (7) Mg²⁺. Formation of the initiation complex takes place in three steps (Fig. 27–20).

In step 1 the 30S ribosomal subunit binds two initiation factors, IF-1 and IF-3. Factor IF-3 prevents the 30S and 50S subunits from combining prematurely. The mRNA then binds to the 30S subunit. The initiating (5′)AUG is guided to its correct position by the Shine-Dalgarno sequence (named for Australian researchers John Shine and Lynn Dalgarno, who identified it) in the mRNA. This consensus sequence is an initiation signal of four to nine purine residues, 8 to 13 bp to the 5′ side of the initiation codon (Fig. 27–21a). The sequence base-pairs with a complementary pyrimidine-rich sequence near the 3′ end of the 16S rRNA of the 30S ribosomal subunit (Fig. 27–21b). This mRNA-rRNA interaction positions the initiating (5′)AUG sequence of the mRNA in the precise position on the 30S subunit where it is required for initiation of translation. The particular (5′)AUG where fMet-tRNA^fMet is to be bound is distinguished from other methionine codons by its proximity to the Shine-Dalgarno sequence in the mRNA.

Bacterial ribosomes have three sites that bind aminoacyl-tRNAs: the aminoacyl (A) site, the peptidyl (P) site, and the exit (E) site. Both the 30S and the 50S subunits contribute to the characteristics of the A and P sites, whereas the E site is largely confined to the 50S subunit. The initiating (5′)AUG is positioned at the P site, the only site to which fMet-tRNA^fMet can bind (Fig. 27–20).
The fMet-tRNA^fMet is the only aminoacyl-tRNA that binds first to the P site; during the subsequent elongation stage, all other incoming aminoacyl-tRNAs (including the Met-tRNA^Met that binds to interior AUG codons) bind first to the A site and only subsequently to the P and E sites. The E site is the site from which the “uncharged” tRNAs leave during elongation. Factor IF-1 binds at the A site and prevents tRNA binding at this site during initiation.

[Diagram: the three steps of initiation complex formation, showing the 30S subunit with IF-1 and IF-3, the mRNA with its AUG initiation codon, IF-2·GTP with fMet-tRNA^fMet (anticodon (3′)UAC(5′)), joining of the 50S subunit with release of GDP and Pi, and the E, P, and A sites of the completed complex with the next codon in the A site.]

FIGURE 27–20 Formation of the initiation complex in bacteria. The complex forms in three steps (described in the text) at the expense of the hydrolysis of GTP to GDP and Pi. IF-1, IF-2, and IF-3 are initiation factors. P designates the peptidyl site, A the aminoacyl site, and E the exit site. Here the anticodon of the tRNA is oriented 3′ to 5′, left to right, as in Figure 27–8 but opposite to the orientation in Figures 27–16 and 27–18.

In step 2 of the initiation process (Fig. 27–20), the complex consisting of the 30S ribosomal subunit, IF-3, and mRNA is joined by both GTP-bound IF-2 and the initiating fMet-tRNA^fMet. The anticodon of this tRNA now pairs correctly with the mRNA's initiation codon.

In step 3 this large complex combines with the 50S ribosomal subunit; simultaneously, the GTP bound to IF-2 is hydrolyzed to GDP and Pi, which are released from the complex. All three initiation factors depart from the ribosome at this point.
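The rule that the initiating AUG is chosen by its proximity to the Shine-Dalgarno sequence can be sketched as a search. This is a simplified heuristic, not a model of how the ribosome works in detail. It assumes that candidate starts are AUG or GUG (GUG as in lacI, Fig. 27–21a), that the Shine-Dalgarno region lies 4 to 15 nucleotides upstream of the start (a window chosen to bracket the 8 to 13 spacing given above), and that the score is the longest stretch able to pair with the 3′ end of E. coli 16S rRNA, (3′)AUUCCUCCA(5′). The test sequence is the λ phage cro mRNA segment from Figure 27–21a, which contains two AUG codons; all function names are hypothetical.

```python
# Choose a bacterial initiation codon by its proximity to a Shine-Dalgarno-like
# sequence. Simplifying assumptions: starts are AUG or GUG, the Shine-Dalgarno
# region is searched 4 to 15 nt upstream, and the score is the longest run that
# could pair with the 3' end of E. coli 16S rRNA.

# mRNA sequence (5'->3') complementary to the 16S rRNA 3' end, (3')AUUCCUCCA(5').
SD_COMPLEMENT = "UAAGGAGGU"
START_CODONS = {"AUG", "GUG"}

def longest_shared_run(window, target=SD_COMPLEMENT):
    """Length of the longest substring of `window` that also occurs in `target`."""
    best = 0
    for i in range(len(window)):
        for j in range(i + 1, len(window) + 1):
            if window[i:j] not in target:
                break  # any longer extension from i also fails
            best = max(best, j - i)
    return best

def score_starts(mrna, near=4, far=15):
    """Map each candidate start position (0-based) to its Shine-Dalgarno score."""
    scores = {}
    for pos in range(len(mrna) - 2):
        if mrna[pos:pos + 3] in START_CODONS:
            upstream = mrna[max(0, pos - far):max(0, pos - near)]
            scores[pos] = longest_shared_run(upstream)
    return scores

def pick_start(mrna):
    """Highest-scoring candidate; ties go to the 5'-most start."""
    scores = score_starts(mrna)
    return max(scores, key=lambda pos: (scores[pos], -pos))

# lambda phage cro segment from Figure 27-21a, written 5'->3'.
CRO = "AUGUACUAAGGAGGUUGUAUGGAACAACGC"
start = pick_start(CRO)
print(score_starts(CRO))             # {0: 0, 18: 8}
print(start, CRO[start:start + 12])  # 18 AUGGAACAACGC
```

The 5′-most AUG has no upstream room for a Shine-Dalgarno match, so the heuristic selects the downstream AUG that follows UAAGGAGG, the codon the figure marks as the initiation site.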
Completion of the steps in Figure 27–20 produces a functional 70S ribosome called the initiation complex, containing the mRNA and the initiating fMet-tRNA^fMet. The correct binding of the fMet-tRNA^fMet to the P site in the complete 70S initiation complex is assured by at least three points of recognition and attachment: the codon-anticodon interaction involving the initiation AUG fixed in the P site; interaction between the Shine-Dalgarno sequence in the mRNA and the 16S rRNA; and binding interactions between the ribosomal P site and the fMet-tRNA^fMet. The initiation complex is now ready for elongation.

Initiation in Eukaryotic Cells

Translation is generally similar in eukaryotic and bacterial cells; most of the significant differences are in the mechanism of initiation. Eukaryotic mRNAs are bound to the ribosome as a complex with a number of specific binding proteins. Several of these tie together the 5′ and 3′ ends of the message. At the 3′ end, the mRNA is bound by the poly(A) binding protein (PAB). Eukaryotic cells have at least nine initiation factors. A complex called eIF4F, which includes the proteins eIF4E, eIF4G, and eIF4A, binds to the 5′ cap (see Fig. 26–12) through eIF4E. The protein eIF4G binds to both eIF4E and PAB, effectively tying them together (Fig. 27–22). The protein eIF4A has an RNA helicase activity. It is the eIF4F complex that associates

FIGURE 27–21 Messenger RNA sequences that serve as signals for initiation of protein synthesis in bacteria. (a) Alignment of the initiating AUG (shaded in green) at its correct location on the 30S ribosomal subunit depends in part on upstream Shine-Dalgarno sequences (pink). Portions of the mRNA transcripts of five prokaryotic genes are shown. Note the unusual example of the E. coli LacI protein, which initiates with a GUG (Val) codon (see Box 27–2).
(b) The Shine- Dalgarno sequence of the mRNA pairs with a sequence near the 3H11032 end of the 16S rRNA. (5H11032) A G C A C G A G G G G A A A U C U G A U G G A A C G C U A C (3H11032) E. coli trpA E. coli araB E. coli lacI fX174 phage A protein l phage cro U U U G G A U G G A G U G A A A C G A U G G C G A U U G C A C A A U U C A G G G U G G U G A A U G U G A A A C C A G U A A A U C U U G G A G G C U U U U U U A U G G U U C G U U C U A U G U A C U A A G G A G G U U G U A U G G A A C A A C G C Shine-Dalgarno sequence; pairs with 16S rRNA Initiation codon; pairs with fMet-tRNA fMet (a) (5H11032) G A U U C C U A G G A G G U U U Prokaryotic mRNA with consensus Shine-Dalgarno sequence (b) 3H11032 End of 16S rRNA 3H11032 OH A U U C C U C C G A U C A G A C C U A U G C G A G C U U (3H11032)U U A G U A A A A(A) n Gene eIF4G PABeIF4E AUG eIF3 5H11032 cap 3H11032 poly(A) tail 40S Ribosomal subunit 3H11032 Untranslated region FIGURE 27–22 Protein complexes in the formation of a eukaryotic initiation complex. The 3H11032 and 5H11032 ends of eukaryotic mRNAs are linked by a complex of proteins that includes several initiation factors and the poly(A) binding protein (PAB). The factors eIF4E and eIF4G are part of a larger complex called eIF4F. This complex binds to the 40S ribosomal subunit. 8885d_c27_1034-1080 2/12/04 1:19 PM Page 1057 mac76 mac76:385_reb: with another factor, eIF3, and with the 40S ribosomal subunit. The efficiency of translation is affected by many properties of the mRNA and proteins in this complex, including the length of the 3H11032 poly(A) tract (in most cases, longer is better). The end-to-end arrangement of the eukaryotic mRNA facilitates translational regulation of gene expression, considered in Chapter 28. 
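The bacterial start-site signal described above depends on antiparallel base pairing between the Shine-Dalgarno sequence and the 3′ end of the 16S rRNA. The Python sketch below illustrates this pairing. It rests on three assumptions: an anti-Shine-Dalgarno stretch of (5′)ACCUCCU(3′) near the 3′ end of E. coli 16S rRNA, Watson-Crick pairs only, and no attention to spacing from the AUG or to pairing energetics. The function names are illustrative and do not come from any library. Applied to the 18 nucleotides upstream of the cro initiation codon in Figure 27–21a, the sketch finds a full seven-nucleotide match.

```python
from typing import Optional, Tuple

# Assumed anti-Shine-Dalgarno stretch near the 3' end of E. coli 16S rRNA, written 5'->3'.
ANTI_SD = "ACCUCCU"

# Watson-Crick partners for RNA bases (G-U pairs are ignored in this sketch).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' sequence that pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def best_sd_match(upstream: str, min_len: int = 4) -> Optional[Tuple[int, str]]:
    """Find the longest stretch of `upstream` mRNA able to pair with ANTI_SD.

    Returns (index in `upstream`, matched sequence), or None if no stretch of
    at least `min_len` nucleotides pairs.
    """
    sd_target = reverse_complement(ANTI_SD)  # mRNA sequence that would pair: AGGAGGU
    for length in range(len(sd_target), min_len - 1, -1):  # try longest pairings first
        for start in range(len(sd_target) - length + 1):
            kmer = sd_target[start:start + length]
            pos = upstream.find(kmer)
            if pos != -1:
                return pos, kmer
    return None


# Lambda cro mRNA segment from Figure 27-21a, split at the initiation codon.
cro = "AUGUACUAAGGAGGUUGUAUGGAACAACGC"
upstream = cro[:18]                # residues 5' of the initiating AUG
print(cro[18:21])                  # AUG
print(best_sd_match(upstream))     # (8, 'AGGAGGU')
```

Searching from the longest candidate down means a full Shine-Dalgarno match is preferred over a shorter partial one. A real start-site predictor would also weigh the spacing between the match and the AUG.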
The initiating (5′)AUG is not detected by its proximity to a Shine-Dalgarno-like sequence. Instead, the mRNA is scanned from the 5′ end until the first AUG is encountered, which signals the beginning of the reading frame. The eIF4F complex is probably involved in this process, perhaps using the RNA helicase activity of eIF4A to eliminate secondary structure in the 5′ untranslated portion of the mRNA. Another protein, eIF4B, also facilitates scanning.

Table 27–8 summarizes the roles of the various bacterial and eukaryotic initiation factors in the overall process. How these proteins act is an important area of investigation.

Stage 3: Peptide Bonds Are Formed in the Elongation Stage

The third stage of protein synthesis is elongation. Again, our initial focus is on bacterial cells. Elongation requires:
(1) the initiation complex described above;
(2) aminoacyl-tRNAs;
(3) a set of three soluble cytosolic proteins called elongation factors (EF-Tu, EF-Ts, and EF-G in bacteria); and
(4) GTP.
Cells add each amino acid residue in three steps, which are repeated once for every residue to be added.

Elongation Step 1: Binding of an Incoming Aminoacyl-tRNA
In the first step of the elongation cycle (Fig. 27–23), the appropriate incoming aminoacyl-tRNA binds to a complex of GTP-bound EF-Tu. The resulting aminoacyl-tRNA–EF-Tu–GTP complex binds to the A site of the 70S initiation complex. The GTP is hydrolyzed, and an EF-Tu–GDP complex is released from the 70S ribosome. The EF-Tu–GTP complex is then regenerated in a process involving EF-Ts and GTP.

Elongation Step 2: Peptide Bond Formation
A peptide bond now forms between the two amino acids bound by their tRNAs to the A and P sites on the ribosome. The initiating N-formylmethionyl group is transferred from its tRNA to the amino group of the second amino acid, now in the A site (Fig. 27–24).
The α-amino group of the amino acid in the A site acts as a nucleophile. It displaces the tRNA in the P site and forms the peptide bond. This reaction produces a dipeptidyl-tRNA in the A site, while the now “uncharged” (deacylated) tRNA^fMet remains bound to the P site. The tRNAs then shift to a hybrid binding state in which each one spans two different sites on the ribosome, as shown in Figure 27–24.

The enzymatic activity that catalyzes peptide bond formation has historically been called peptidyl transferase and was widely assumed to reside in one or more of the proteins of the large ribosomal subunit. We now know that the 23S rRNA catalyzes this reaction (Fig. 27–9), which adds to the known catalytic repertoire of ribozymes. This discovery has interesting implications for the evolution of life (Box 27–3).

TABLE 27–8 Protein Factors Required for Initiation of Translation in Bacterial and Eukaryotic Cells

Factor         Function
Bacterial
IF-1           Prevents premature binding of tRNAs to A site
IF-2           Facilitates binding of fMet-tRNA^fMet to 30S ribosomal subunit
IF-3           Binds to 30S subunit; prevents premature association of 50S subunit; enhances specificity of P site for fMet-tRNA^fMet
Eukaryotic*
eIF2           Facilitates binding of initiating Met-tRNA^Met to 40S ribosomal subunit
eIF2B, eIF3    First factors to bind 40S subunit; facilitate subsequent steps
eIF4A          RNA helicase activity removes secondary structure in the mRNA to permit binding to 40S subunit; part of the eIF4F complex
eIF4B          Binds to mRNA; facilitates scanning of mRNA to locate the first AUG
eIF4E          Binds to the 5′ cap of mRNA; part of the eIF4F complex
eIF4G          Binds to eIF4E and to poly(A) binding protein (PAB); part of the eIF4F complex
eIF5           Promotes dissociation of several other initiation factors from 40S subunit as a prelude to association of 60S subunit to form 80S initiation complex
eIF6           Facilitates dissociation of inactive 80S ribosome into 40S and 60S subunits

*The prefix “e” identifies these as eukaryotic factors.

FIGURE 27–23 First elongation step in bacteria: binding of the second aminoacyl-tRNA. The second aminoacyl-tRNA enters the A site of the ribosome bound to EF-Tu (shown here as Tu), which also contains GTP. Binding of the second aminoacyl-tRNA to the A site is accompanied by hydrolysis of the GTP to GDP and Pi and release of the EF-Tu–GDP complex from the ribosome. The bound GDP is released when the EF-Tu–GDP complex binds to EF-Ts, and EF-Ts is released in turn when another molecule of GTP binds to EF-Tu. This recycles EF-Tu and makes it available to repeat the cycle.

FIGURE 27–24 Second elongation step in bacteria: formation of the first peptide bond. The peptidyl transferase that catalyzes this reaction is the 23S rRNA ribozyme. The N-formylmethionyl group is transferred to the amino group of the second aminoacyl-tRNA in the A site, forming a dipeptidyl-tRNA. At this stage, both tRNAs bound to the ribosome shift position in the 50S subunit to take up a hybrid binding state. The uncharged tRNA shifts so that its 3′ and 5′ ends are in the E site. Similarly, the 3′ and 5′ ends of the peptidyl-tRNA shift to the P site. The anticodons remain in the A and P sites.
Elongation Step 3: Translocation
In the final step of the elongation cycle, translocation, the ribosome moves one codon toward the 3′ end of the mRNA (Fig. 27–25a). This movement shifts the anticodon of the dipeptidyl-tRNA, which is still attached to the second codon of the mRNA, from the A site to the P site. It also shifts the deacylated tRNA from the P site to the E site, from which the tRNA is released into the cytosol. The third codon of the mRNA now lies in the A site and the second codon in the P site. Movement of the ribosome along the mRNA requires EF-G (also known as translocase) and the energy provided by hydrolysis of another molecule of GTP. A change in the three-dimensional conformation of the entire ribosome moves it along the mRNA. Because the structure of EF-G mimics the structure of the EF-Tu–tRNA complex (Fig. 27–25b), EF-G can bind the A site and presumably displace the peptidyl-tRNA.

The ribosome, with its attached dipeptidyl-tRNA and mRNA, is now ready for the next elongation cycle and the attachment of a third amino acid residue. This residue is added in the same way as the second (Figs. 27–23, 27–24, and 27–25). For each amino acid residue correctly added to the growing polypeptide, two GTPs are hydrolyzed to GDP and Pi as the ribosome moves from codon to codon along the mRNA toward the 3′ end.

The polypeptide remains attached to the tRNA of the most recently inserted amino acid. This association maintains the functional connection between the information in the mRNA and its decoded polypeptide output. The ester linkage between this tRNA and the carboxyl terminus of the growing polypeptide also activates the terminal carboxyl group for nucleophilic attack by the incoming amino acid, which forms a new peptide bond (Fig. 27–24).
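The three-step cycle can be traced with a toy Python model of the bacterial E, P, and A sites. This is a bookkeeping sketch with stated simplifications. Each tRNA is labeled by the amino acid it delivered. Hybrid states and proofreading are ignored. Each cycle is charged one GTP for EF-Tu and one for EF-G, as in the text. The class and method names are hypothetical. The example continues the trpA message from Figure 27–21a, whose codons after the AUG are GAA, CGC, and UAC.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ribosome:
    """Toy bacterial ribosome tracking which tRNA occupies the E, P, and A sites.

    tRNAs are labeled by the amino acid they delivered. The model starts from the
    70S initiation complex, with fMet-tRNA(fMet) in the P site and the A site empty.
    """
    e_site: Optional[str] = None
    p_site: Optional[str] = "fMet"
    a_site: Optional[str] = None
    peptide: List[str] = field(default_factory=lambda: ["fMet"])
    gtp_hydrolyzed: int = 0

    def elongate(self, amino_acid: str) -> None:
        # Step 1: the aminoacyl-tRNA arrives bound to EF-Tu-GTP; GTP is hydrolyzed
        # and EF-Tu-GDP leaves the ribosome.
        self.a_site = amino_acid
        self.gtp_hydrolyzed += 1
        # Step 2: peptidyl transfer. The chain held in the P site is joined to the
        # amino group of the A-site amino acid; the P-site tRNA is now deacylated.
        self.peptide.append(amino_acid)
        # Step 3: translocation by EF-G-GTP. The deacylated tRNA moves P -> E
        # (replacing, i.e. releasing, any tRNA already there), the peptidyl-tRNA
        # moves A -> P, and the A site is left open for the next codon.
        self.e_site, self.p_site, self.a_site = self.p_site, self.a_site, None
        self.gtp_hydrolyzed += 1


ribosome = Ribosome()
for aa in ["Glu", "Arg", "Tyr"]:   # trpA codons GAA, CGC, UAC after the AUG
    ribosome.elongate(aa)

print("-".join(ribosome.peptide))  # fMet-Glu-Arg-Tyr
print(ribosome.gtp_hydrolyzed)     # 6, i.e. two GTPs per residue added
print(ribosome.e_site, ribosome.p_site, ribosome.a_site)  # Arg Tyr None
```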
The existing ester linkage between the polypeptide and tRNA is broken during peptide bond formation. The link between the polypeptide and the information in the mRNA nevertheless persists, because each newly added amino acid is still attached to its tRNA.

FIGURE 27–25 Third elongation step in bacteria: translocation. (a) The ribosome moves one codon toward the 3′ end of the mRNA, using energy provided by hydrolysis of GTP bound to EF-G (translocase). The dipeptidyl-tRNA is now entirely in the P site, leaving the A site open for the incoming (third) aminoacyl-tRNA. The uncharged tRNA dissociates from the E site, and the elongation cycle begins again. (b) The structure of EF-G mimics the structure of EF-Tu complexed with tRNA. Shown here are (left) EF-Tu complexed with tRNA (green) (PDB ID 1B23) and (right) EF-G complexed with GDP (red) (PDB ID 1DAR). The carboxyl-terminal part of EF-G (dark gray) mimics the structure of the anticodon loop of tRNA in both shape and charge distribution.

The elongation cycle in eukaryotes is quite similar to that in prokaryotes. Three eukaryotic elongation factors (eEF1α, eEF1βγ, and eEF2) have functions analogous to those of the bacterial elongation factors (EF-Tu, EF-Ts, and EF-G, respectively). Eukaryotic ribosomes do not have an E site; uncharged tRNAs are expelled directly from the P site.

Proofreading on the Ribosome
The GTPase activity of EF-Tu during the first step of elongation in bacterial cells (Fig. 27–23) contributes substantially to both the rate and the fidelity of the overall biosynthetic process. The EF-Tu–GTP and EF-Tu–GDP complexes each exist for a few milliseconds before they dissociate. These two intervals provide opportunities for the codon-anticodon interactions to be proofread. Incorrect aminoacyl-tRNAs normally dissociate from the A site during one of these periods. If the GTP analog guanosine 5′-O-(3-thiotriphosphate) (GTPγS) is used in place of GTP, hydrolysis is slowed. This improves fidelity, because the proofreading intervals are longer, but it reduces the rate of protein synthesis.

[Structure: guanosine 5′-O-(3-thiotriphosphate) (GTPγS)]

The process of protein synthesis, including the characteristics of codon-anticodon pairing already described, has clearly been optimized through evolution to balance speed against fidelity. Improved fidelity might diminish speed, whereas increases in speed would probably compromise fidelity. Note that proofreading on the ribosome establishes only that the proper codon-anticodon pairing has taken place. The identity of the amino acid attached to a tRNA is not checked on the ribosome. If a tRNA is aminoacylated with the wrong amino acid (as can be done experimentally), the incorrect amino acid is efficiently incorporated into protein in response to whatever codon that tRNA normally recognizes.

Stage 4: Termination of Polypeptide Synthesis Requires a Special Signal

Elongation continues until the ribosome adds the last amino acid coded by the mRNA. Termination, the fourth stage of polypeptide synthesis, is signaled by one of three termination codons in the mRNA (UAA, UAG, UGA), which immediately follows the final coded amino acid.
Mutations in a tRNA anticodon that allow an amino acid to be inserted at a termination codon are generally deleterious to the cell (Box 27–4). In bacteria, once a termination codon occupies the ribosomal A site, three termination factors, or release factors (the proteins RF-1, RF-2, and RF-3), contribute to:
(1) hydrolysis of the terminal peptidyl-tRNA bond;
(2) release of the free polypeptide and the last tRNA, now uncharged, from the P site; and
(3) dissociation of the 70S ribosome into its 30S and 50S subunits, ready to start a new cycle of polypeptide synthesis (Fig. 27–26).
RF-1 recognizes the termination codons UAG and UAA, and RF-2 recognizes UGA and UAA. Either RF-1 or RF-2 (depending on which codon is present) binds at a termination codon. It then induces peptidyl transferase to transfer the growing polypeptide to a water molecule rather than to another amino acid. The release factors have domains thought to mimic the structure of tRNA, as shown for the elongation factor EF-G in Figure 27–25b. The specific function of RF-3 has not been firmly established, although it is thought to release the ribosomal subunit. In eukaryotes, a single release factor, eRF, recognizes all three termination codons.

Energy Cost of Fidelity in Protein Synthesis
Synthesizing a protein true to the information in its mRNA requires energy. Formation of each aminoacyl-tRNA uses two high-energy phosphate groups. An additional ATP is consumed each time an aminoacyl-tRNA synthetase, as part of its proofreading activity, hydrolyzes an incorrectly activated amino acid. One GTP is cleaved to GDP and Pi during the first elongation step, and another during the translocation step. On average, therefore, each peptide bond in a polypeptide requires the energy derived from the hydrolysis of more than four NTPs to NDPs.
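The division of labor among bacterial release factors can be written as a lookup table. The Python sketch below makes three assumptions: the coding sequence begins at the initiation codon, it is read strictly in frame, and no suppressor tRNAs are present. It reports how many residues the released polypeptide contains, which termination codon is reached, and which release factor or factors recognize that codon. The names are hypothetical. In eukaryotes the table would reduce to a single factor, eRF, for all three codons.

```python
from typing import Dict, Tuple

# Bacterial release factors and the termination codons they recognize (from the text).
RELEASE_FACTORS: Dict[str, Tuple[str, ...]] = {
    "UAA": ("RF-1", "RF-2"),
    "UAG": ("RF-1",),
    "UGA": ("RF-2",),
}


def terminate(coding: str) -> Tuple[int, str, Tuple[str, ...]]:
    """Read `coding` codon by codon, starting at its first base (the initiation codon).

    Returns (number of residues in the released polypeptide, termination codon,
    release factors that recognize it). Raises ValueError if no in-frame
    termination codon is found.
    """
    for i in range(0, len(coding) - 2, 3):
        codon = coding[i:i + 3]
        if codon in RELEASE_FACTORS:
            # Each codon before this one added one residue, including the initiator.
            return i // 3, codon, RELEASE_FACTORS[codon]
    raise ValueError("no in-frame termination codon")


print(terminate("AUGGAACGCUACUGA"))  # (4, 'UGA', ('RF-2',))
```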
This represents an exceedingly large thermodynamic “push” in the direction of synthesis: at least 4 × 30.5 kJ/mol = 122 kJ/mol of phosphoanhydride bond energy is spent to generate a peptide bond, whose standard free energy of hydrolysis is only about −21 kJ/mol. The net free-energy change during peptide bond synthesis is therefore −101 kJ/mol. Proteins are information-containing polymers. The biochemical goal is not simply to form a peptide bond but to form a peptide bond between two specified amino acids. Each high-energy phosphate compound expended in this process plays a critical role in keeping each new codon in the mRNA properly aligned with its associated amino acid at the growing end of the polypeptide. This energy allows the genetic message of mRNA to be translated into the amino acid sequence of proteins with very high fidelity.

Rapid Translation of a Single Message by Polysomes
Large clusters of 10 to 100 ribosomes that are very active in protein synthesis can be isolated from both eukaryotic and bacterial cells. Electron micrographs show a fiber between adjacent ribosomes in the cluster, which is called a polysome (Fig. 27–27). The connecting strand is a single molecule of mRNA that many closely spaced ribosomes translate simultaneously, so the mRNA is used very efficiently.

In bacteria, transcription and translation are tightly coupled. Messenger RNAs are synthesized and translated in the same 5′→3′ direction, and ribosomes begin translating the 5′ end of the mRNA before transcription is complete (Fig. 27–28). The situation is quite different in eukaryotic cells, where newly transcribed mRNAs must leave the nucleus before they can be translated. Bacterial mRNAs generally exist for just a few minutes (p. 1020) before nucleases degrade them.
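The energy bookkeeping for each peptide bond can be checked with a few lines of Python. The sketch uses the values given in the text: 30.5 kJ/mol per high-energy phosphate and about −21 kJ/mol for peptide bond hydrolysis. It counts two phosphates for aminoacylation plus one GTP each for EF-Tu and EF-G. The optional proofreading term is an illustrative parameter, not a measured average.

```python
from typing import Tuple

ENERGY_PER_PHOSPHATE = 30.5    # kJ/mol released per high-energy phosphate bond hydrolyzed
PEPTIDE_HYDROLYSIS_DG = -21.0  # kJ/mol, approximate standard free energy of peptide bond hydrolysis


def peptide_bond_energetics(proofreading_atp: int = 0) -> Tuple[float, float]:
    """Return (energy invested, net free-energy change) per peptide bond, in kJ/mol.

    Phosphates counted: 2 for aminoacyl-tRNA formation (ATP -> AMP + PPi),
    1 GTP for EF-Tu, 1 GTP for EF-G, plus any ATP spent on synthetase proofreading.
    """
    phosphates = 2 + 1 + 1 + proofreading_atp
    invested = phosphates * ENERGY_PER_PHOSPHATE  # at least 4 x 30.5 = 122 kJ/mol
    cost_of_bond = -PEPTIDE_HYDROLYSIS_DG         # forming the bond costs about +21 kJ/mol
    net_dg = -invested + cost_of_bond             # -122 + 21 = -101 kJ/mol
    return invested, net_dg


print(peptide_bond_energetics())   # (122.0, -101.0)
print(peptide_bond_energetics(1))  # (152.5, -131.5)
```

Each extra proofreading ATP makes the overall process about 30.5 kJ/mol more favorable. This is consistent with the text's point that the "excess" energy buys fidelity rather than simply driving bond formation.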
To maintain high rates of protein synthesis, the mRNA for a given protein or set of proteins must be made continuously and translated with maximum efficiency. Because bacterial mRNAs are so short-lived, synthesis can stop rapidly when the protein is no longer needed.

Stage 5: Newly Synthesized Polypeptide Chains Undergo Folding and Processing

In the final stage of protein synthesis, the nascent polypeptide chain is folded and processed into its biologically active form. During or after its synthesis, the polypeptide progressively assumes its native conformation as the appropriate hydrogen bonds and van der Waals, ionic, and hydrophobic interactions form. In this way the linear, or one-dimensional, genetic message in the mRNA is converted into the three-dimensional structure of the protein. Some newly made proteins, both prokaryotic and eukaryotic, do not attain their final biologically active conformation until they have been altered by one or more processing reactions called posttranslational modifications.

Amino-Terminal and Carboxyl-Terminal Modifications
The first residue inserted in all polypeptides is N-formylmethionine (in bacteria) or methionine (in eukaryotes). However, the formyl group, the amino-terminal Met residue, and often additional amino-terminal (and, in some cases, carboxyl-terminal) residues may be removed enzymatically as the final functional protein forms. In as many as 50% of eukaryotic proteins, the amino group of the amino-terminal residue is N-acetylated after translation. Carboxyl-terminal residues are also sometimes modified.

Loss of Signal Sequences
As we shall see in Section 27.3, the 15 to 30 residues at the amino-terminal end of some proteins help direct the protein to its ultimate destination in the cell. Specific peptidases ultimately remove these signal sequences.
Modification of Individual Amino Acids
The hydroxyl groups of certain Ser, Thr, and Tyr residues of some proteins are enzymatically phosphorylated by ATP (Fig. 27–29a), and the phosphate groups add negative charges to these polypeptides. The functional significance of this modification varies from one protein to the next. For example, the milk protein casein has many phosphoserine groups that bind Ca²⁺. Calcium, phosphate, and amino acids are all valuable to suckling young, so casein efficiently provides three essential nutrients. As we have seen in numerous instances, phosphorylation-dephosphorylation cycles also regulate the activity of many enzymes and regulatory proteins.

Extra carboxyl groups may be added to Glu residues of some proteins. For example, the blood-clotting protein prothrombin contains a number of γ-carboxyglutamate residues (Fig. 27–29b) in its amino-terminal region, introduced by an enzyme that requires vitamin K. These carboxyl groups bind Ca²⁺, which is required to initiate the clotting mechanism.

FIGURE 27–26 Termination of protein synthesis in bacteria. Termination occurs in response to a termination codon in the A site. First, a release factor, RF (RF-1 or RF-2, depending on which termination codon is present), binds to the A site. This leads to hydrolysis of the ester linkage between the nascent polypeptide and the tRNA in the P site and release of the completed polypeptide. Finally, the mRNA, deacylated tRNA, and release factor leave the ribosome, and the ribosome dissociates into its 30S and 50S subunits.

FIGURE 27–27 Polysome. (a) Four ribosomes translating a eukaryotic mRNA molecule simultaneously, moving from the 5′ end to the 3′ end and synthesizing a polypeptide from the amino terminus to the carboxyl terminus. (b) Electron micrograph (scale bar, 0.25 μm) and explanatory diagram of a polysome from the silk gland of a silkworm larva. The mRNA is being translated by many ribosomes simultaneously. The nascent polypeptides become longer as the ribosomes move toward the 3′ end of the mRNA. The final product of this process is silk fibroin.

FIGURE 27–28 Coupling of transcription and translation in bacteria. The mRNA is translated by ribosomes while RNA polymerase is still transcribing it from DNA. This is possible because the mRNA in bacteria does not have to be transported from a nucleus to the cytoplasm before it encounters ribosomes. In this schematic diagram the ribosomes are drawn smaller than the RNA polymerase. In reality the ribosomes (Mr 2.7 × 10⁶) are an order of magnitude larger than the RNA polymerase (Mr 3.9 × 10⁵).

Monomethyl- and dimethyllysine residues (Fig. 27–29c) occur in some muscle proteins and in cytochrome c. The calmodulin of most species contains one trimethyllysine residue at a specific position. In other proteins, the carboxyl groups of some Glu residues undergo methylation, which removes their negative charge.

Attachment of Carbohydrate Side Chains
The carbohydrate side chains of glycoproteins are attached covalently during or after synthesis of the polypeptide. In some glycoproteins, the carbohydrate side chain is attached enzymatically to Asn residues (N-linked oligosaccharides); in others, it is attached to Ser or Thr residues (O-linked oligosaccharides) (see Fig. 7–31).
Many proteins that function extracellularly contain oligosaccharide side chains, as do the lubricating proteoglycans that coat mucous membranes (see Fig. 7–29).

Addition of Isoprenyl Groups
A number of eukaryotic proteins are modified by the addition of groups derived from isoprene (isoprenyl groups). A thioether bond forms between the isoprenyl group and a Cys residue of the protein (see Fig. 11–14). The isoprenyl groups come from pyrophosphorylated intermediates of the cholesterol biosynthetic pathway (see Fig. 21–33), such as farnesyl pyrophosphate (Fig. 27–30). Proteins modified in this way include the Ras proteins (products of the ras oncogenes and proto-oncogenes), G proteins (both discussed in Chapter 12), and lamins, which are proteins found in the nuclear matrix. The isoprenyl group helps anchor the protein in a membrane. The ras oncogene loses its transforming (carcinogenic) activity when isoprenylation of the Ras protein is blocked. This finding has stimulated interest in identifying inhibitors of this posttranslational modification pathway for use in cancer chemotherapy.

Addition of Prosthetic Groups
Many prokaryotic and eukaryotic proteins require covalently bound prosthetic groups for their activity.
Two examples are the biotin molecule of acetyl-CoA carboxylase and the heme group of hemoglobin or cytochrome c.

FIGURE 27–29 Some modified amino acid residues. (a) Phosphorylated amino acids: phosphoserine, phosphothreonine, and phosphotyrosine. (b) A carboxylated amino acid: γ-carboxyglutamate. (c) Some methylated amino acids: methyllysine, dimethyllysine, trimethyllysine, and methylglutamate.

FIGURE 27–30 Farnesylation of a Cys residue. Farnesyl pyrophosphate reacts with the sulfhydryl group of a Cys residue in the Ras protein, releasing PPi and forming the farnesylated Ras protein. The thioether linkage is shown in red. The Ras protein is the product of the ras oncogene.

Proteolytic Processing
Many proteins are first synthesized as large, inactive precursor polypeptides that are then proteolytically trimmed to form their smaller, active forms. Examples include proinsulin, some viral proteins, and proteases such as chymotrypsinogen and trypsinogen (see Fig. 6–33).

Formation of Disulfide Cross-Links
After folding into their native conformations, some proteins form intrachain or interchain disulfide bridges between Cys residues. In eukaryotes, disulfide bonds are common in proteins that will be exported from cells.
The cross-links formed in this way help protect the native conformation of the protein from denaturation in the extracellular environment, which can differ greatly from intracellular conditions and is generally oxidizing.

Protein Synthesis Is Inhibited by Many Antibiotics and Toxins

Protein synthesis is a central function in cellular physiology and is the primary target of many naturally occurring antibiotics and toxins. Except where noted, the antibiotics discussed here inhibit protein synthesis in bacteria. The differences between bacterial and eukaryotic protein synthesis are in some cases subtle, but they are large enough that most of the compounds discussed below are relatively harmless to eukaryotic cells. Natural selection has favored compounds that exploit these minor differences to act selectively on bacterial systems. As a result, some microorganisms synthesize biochemical weapons that are extremely toxic to other microorganisms. Because nearly every step in protein synthesis can be specifically inhibited by one antibiotic or another, antibiotics have become valuable tools for studying protein biosynthesis.

BOX 27–4 WORKING IN BIOCHEMISTRY
Induced Variation in the Genetic Code: Nonsense Suppression

When a mutation introduces a termination codon into the interior of a gene, translation is halted prematurely and the incomplete polypeptide is usually inactive. Such mutations are called nonsense mutations. The gene can be restored to normal function if a second mutation either (1) converts the misplaced termination codon to a codon specifying an amino acid or (2) suppresses the effects of the termination codon. These restorative mutations are called nonsense suppressors. They generally involve mutations in tRNA genes that produce altered (suppressor) tRNAs, which can recognize the termination codon and insert an amino acid at that position. Most known suppressor tRNAs have single base substitutions in their anticodons.
Suppressor tRNAs are an experimentally induced variation in the genetic code that allows the reading of what are usually termination codons, much like the naturally occurring code variations described in Box 27–2. Nonsense suppression does not completely disrupt normal information transfer in a cell, because the cell usually has several copies of each tRNA gene. Some of these duplicate genes are weakly expressed and account for only a minor part of the cellular pool of a particular tRNA. Suppressor mutations usually involve a “minor” tRNA, leaving the major tRNA to read its codon normally.

For example, E. coli has three identical genes for tRNA^Tyr, each producing a tRNA with the anticodon (5′)GUA. One of these genes is expressed at relatively high levels, so its product is the major tRNA^Tyr species; the other two genes are transcribed in only small amounts. If the anticodon of the tRNA encoded by one of these duplicate tRNA^Tyr genes changes from (5′)GUA to (5′)CUA, the result is a minor tRNA^Tyr species that inserts tyrosine at UAG stop codons. This insertion of tyrosine at UAG is inefficient, but it can produce enough full-length protein from a gene with a nonsense mutation to allow the cell to survive. The major tRNA^Tyr continues to translate the genetic code normally for the majority of proteins.

The mutation that creates a suppressor tRNA does not always occur in the anticodon. Suppression of UGA nonsense codons generally involves the tRNA^Trp that normally recognizes UGG. The alteration that allows it to read UGA (and insert Trp residues at these positions) is a G to A change at position 24, in an arm of the tRNA somewhat removed from the anticodon. This tRNA can now recognize both UGG and UGA. A similar change is found in tRNAs involved in the most common naturally occurring variation in the genetic code (UGA = Trp; see Box 27–2).
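The anticodon changes described in Box 27–4 follow from antiparallel codon-anticodon pairing. The Python sketch below lists the codons an anticodon can read. It applies a simplified version of Crick's wobble rules at the first (5′) anticodon position and strict Watson-Crick pairing at the other two. It ignores modified bases other than inosine, and the function name is hypothetical. The sketch reproduces the shift from reading the tyrosine codons to reading UAG. It cannot capture the tRNA^Trp change at position 24, which lies outside the anticodon.

```python
from typing import Dict, List, Set

WATSON_CRICK: Dict[str, str] = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules: bases at the third codon position that can pair with
# the first (5') anticodon base. "I" is inosine.
WOBBLE: Dict[str, Set[str]] = {
    "A": {"U"},
    "C": {"G"},
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"A", "C", "U"},
}


def codons_read(anticodon: str) -> List[str]:
    """Codons (5'->3') read by an anticodon written 5'->3'.

    Pairing is antiparallel: codon position 1 pairs with anticodon position 3,
    codon position 2 with anticodon position 2, and codon position 3 with
    anticodon position 1 (the wobble position).
    """
    first = WATSON_CRICK[anticodon[2]]
    second = WATSON_CRICK[anticodon[1]]
    return sorted(first + second + third for third in WOBBLE[anticodon[0]])


print(codons_read("GUA"))  # ['UAC', 'UAU']  normal tRNA(Tyr): both Tyr codons
print(codons_read("CUA"))  # ['UAG']         suppressor tRNA(Tyr): UAG stop codon
print(codons_read("CCA"))  # ['UGG']         tRNA(Trp)
```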
Suppression should lead to many abnormally long proteins, but this does not always occur. We understand only a few details of the molecular events in translation termination and nonsense suppression.

[Structures: tetracycline and chloramphenicol]

Puromycin, made by the bacterium Streptomyces alboniger, is one of the best-understood inhibitory antibiotics. Its structure is very similar to the 3′ end of an aminoacyl-tRNA, enabling it to bind to the ribosomal A site and participate in peptide bond formation, producing peptidyl-puromycin (Fig. 27–31). However, because puromycin resembles only the 3′ end of the tRNA, it does not engage in translocation and dissociates from the ribosome shortly after it is linked to the carboxyl terminus of the peptide. This prematurely terminates polypeptide synthesis.

FIGURE 27–31 Disruption of peptide bond formation by puromycin. (a) The antibiotic puromycin resembles the aminoacyl end of a charged tRNA, and it can bind to the ribosomal A site and participate in peptide bond formation. The product of this reaction, instead of being translocated to the P site, dissociates from the ribosome, causing premature chain termination. (b) Peptidyl puromycin.

Tetracyclines inhibit protein synthesis in bacteria by blocking the A site on the ribosome, preventing the binding of aminoacyl-tRNAs. Chloramphenicol inhibits protein synthesis by bacterial (and mitochondrial and chloroplast) ribosomes by blocking peptidyl transfer; it does not affect cytosolic protein synthesis in eukaryotes. Conversely, cycloheximide blocks the peptidyl transferase of 80S eukaryotic ribosomes but not that of 70S bacterial (and mitochondrial and chloroplast) ribosomes. Streptomycin, a basic trisaccharide, causes misreading of the genetic code (in bacteria) at relatively low concentrations and inhibits initiation at higher concentrations.

[Structures: cycloheximide and streptomycin]

Several other inhibitors of protein synthesis are notable because of their toxicity to humans and other mammals. Diphtheria toxin (Mr 58,330) catalyzes the ADP-ribosylation of a diphthamide (a modified histidine) residue of eukaryotic elongation factor eEF2, thereby inactivating it. Ricin (Mr 29,895), an extremely toxic protein of the castor bean, inactivates the 60S subunit of eukaryotic ribosomes by depurinating a specific adenosine in 28S rRNA.

SUMMARY 27.2 Protein Synthesis
■ Protein synthesis occurs on the ribosomes, which consist of protein and rRNA. Bacteria have 70S ribosomes, with a large (50S) and a small (30S) subunit. Eukaryotic ribosomes are significantly larger (80S) and contain more proteins.
■ Transfer RNAs have 73 to 93 nucleotide residues, some of which have modified bases. Each tRNA has an amino acid arm with the terminal sequence CCA(3′) to which an amino acid is esterified, an anticodon arm, a TψC arm, and a D arm; some tRNAs have a fifth arm. The anticodon is responsible for the specificity of interaction between the aminoacyl-tRNA and the complementary mRNA codon.
■ The growth of polypeptides on ribosomes begins with the amino-terminal amino acid and proceeds by successive additions of new residues to the carboxyl-terminal end.
■ Protein synthesis occurs in five stages.
1. Amino acids are activated by specific aminoacyl-tRNA synthetases in the cytosol. These enzymes catalyze the formation of aminoacyl-tRNAs, with simultaneous cleavage of ATP to AMP and PPi. The fidelity of protein synthesis depends on the accuracy of this reaction, and some of these enzymes carry out proofreading steps at separate active sites. In bacteria, the initiating aminoacyl-tRNA in all proteins is N-formylmethionyl-tRNAfMet.
2. Initiation of protein synthesis involves formation of a complex between the 30S ribosomal subunit, mRNA, GTP, fMet-tRNAfMet, three initiation factors, and the 50S subunit; GTP is hydrolyzed to GDP and Pi.
3. In the elongation steps, GTP and elongation factors are required for binding the incoming aminoacyl-tRNA to the A site on the ribosome. In the first peptidyl transfer reaction, the fMet residue is transferred to the amino group of the incoming aminoacyl-tRNA. Movement of the ribosome along the mRNA then translocates the dipeptidyl-tRNA from the A site to the P site, a process requiring hydrolysis of GTP. Deacylated tRNAs dissociate from the ribosomal E site.
4. After many such elongation cycles, synthesis of the polypeptide is terminated with the aid of release factors. At least four high-energy phosphate equivalents (from ATP and GTP) are required to generate each peptide bond, an energy investment required to guarantee fidelity of translation.
5. Polypeptides fold into their active, three-dimensional forms. Many proteins are further processed by posttranslational modification reactions.
■ Many well-studied antibiotics and toxins inhibit some aspect of protein synthesis.
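Stage 4 of the summary sets the minimum cost at four high-energy phosphate equivalents per peptide bond. The Python sketch below shows what that implies for a whole chain. It simply multiplies four by the number of peptide bonds (n − 1 for n residues) and ignores initiation, termination, and proofreading costs. The value of 30.5 kJ/mol per phosphoanhydride bond is an assumed standard free-energy value for ATP (or GTP) hydrolysis; it is not given in this section. The function names are hypothetical.

```python
PHOSPHATES_PER_PEPTIDE_BOND = 4   # minimum stated in the text
KJ_PER_PHOSPHOANHYDRIDE = 30.5    # assumption: approximate standard free energy of ATP/GTP hydrolysis

def min_phosphate_equivalents(n_residues: int) -> int:
    """Lower bound on high-energy phosphate equivalents for a chain of n residues
    (n - 1 peptide bonds). Ignores proofreading, initiation, and termination costs."""
    if n_residues < 1:
        raise ValueError("a polypeptide needs at least one residue")
    return PHOSPHATES_PER_PEPTIDE_BOND * (n_residues - 1)

def min_energy_kj_per_mol(n_residues: int) -> float:
    """Convert the phosphate-equivalent lower bound to kJ/mol using the assumed value."""
    return min_phosphate_equivalents(n_residues) * KJ_PER_PHOSPHOANHYDRIDE

print(min_phosphate_equivalents(300))  # 1196 for 299 peptide bonds
print(min_energy_kj_per_mol(2))        # 122.0 for a single peptide bond
```

Counting per peptide bond follows the wording of the summary. A finer model would charge activation per residue and the GTP costs of initiation and termination separately.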
27.3 Protein Targeting and Degradation
The eukaryotic cell is made up of many structures, compartments, and organelles, each with specific functions that require distinct sets of proteins and enzymes. These proteins (with the exception of those produced in mitochondria and plastids) are synthesized on ribosomes in the cytosol, so how are they directed to their final cellular destinations? We are now beginning to understand this complex and fascinating process.

Proteins destined for secretion, integration in the plasma membrane, or inclusion in lysosomes generally share the first few steps of a pathway that begins in the endoplasmic reticulum. Proteins destined for mitochondria, chloroplasts, or the nucleus use three separate mechanisms. And proteins destined for the cytosol simply remain where they are synthesized.

The most important element in many of these targeting pathways is a short sequence of amino acids called a signal sequence, whose function was first postulated by Günter Blobel and colleagues in 1970. The signal sequence directs a protein to its appropriate location in the cell and, for many proteins, is removed during transport or after the protein has reached its final destination. In proteins slated for transport into mitochondria, chloroplasts, or the ER, the signal sequence is at the amino terminus of a newly synthesized polypeptide. In many cases, the targeting capacity of particular signal sequences has been confirmed by fusing the signal sequence from one protein to a second protein and showing that the signal directs the second protein to the location where the first protein is normally found. The selective degradation of proteins no longer needed by the cell also relies largely on a set of molecular signals embedded in each protein’s structure.
In this concluding section we examine protein targeting and degradation, emphasizing the underlying signals and molecular regulation that are so crucial to cellular metabolism. Except where noted, the focus is now on eukaryotic cells.

Posttranslational Modification of Many Eukaryotic Proteins Begins in the Endoplasmic Reticulum
Perhaps the best-characterized targeting system begins in the ER. Most lysosomal, membrane, or secreted proteins have an amino-terminal signal sequence (Fig. 27–32) that marks them for translocation into the lumen of the ER; hundreds of such signal sequences have been determined. The carboxyl terminus of the signal sequence is defined by a cleavage site, where protease action removes the sequence after the protein is imported into the ER. Signal sequences vary in length from 13 to 36 amino acid residues, but all have the following features: (1) about 10 to 15 hydrophobic amino acid residues; (2) one or more positively charged residues, usually near the amino terminus, preceding the hydrophobic sequence; and (3) a short sequence at the carboxyl terminus (near the cleavage site) that is relatively polar, typically having amino acid residues with short side chains (especially Ala) at the positions closest to the cleavage site.

As originally demonstrated by George Palade, proteins with these signal sequences are synthesized on ribosomes attached to the ER.
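The three shared features of ER signal sequences can be expressed as a crude, rule-based check. The Python sketch below is an illustration only and is not a signal-sequence predictor. Several choices in it are assumptions: the residue groupings (`HYDROPHOBIC`, `BASIC`, `SHORT_SIDE_CHAIN`), the requirement of an uninterrupted hydrophobic run of at least 10 residues, and the function names. It is applied to the human preproinsulin signal sequence from Figure 27–32, written in one-letter code as MALWMRLLPLLALLALWGPDPAAA. Real signal sequences vary more than this allows. For example, a hydrophobic core interrupted by Gly, Ser, or Thr would fail the strict run test.

```python
# Crude heuristic check of the shared features of ER signal sequences.
# Residue classes are illustrative assumptions (one-letter amino acid codes).
HYDROPHOBIC = set("AVLIMFWP")
BASIC = set("KR")
SHORT_SIDE_CHAIN = set("AGSC")

def longest_hydrophobic_run(seq: str) -> tuple[int, int]:
    """Return (start index, length) of the longest uninterrupted hydrophobic stretch."""
    best_start, best_len = 0, 0
    start, length = 0, 0
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            if length == 0:
                start = i
            length += 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            length = 0
    return best_start, best_len

def signal_sequence_features(signal: str) -> dict[str, bool]:
    """`signal` runs from the initiating Met up to the residue just before the cleavage site."""
    core_start, core_len = longest_hydrophobic_run(signal)
    return {
        "length_13_to_36": 13 <= len(signal) <= 36,
        "hydrophobic_core_10_plus": core_len >= 10,
        "basic_before_core": any(aa in BASIC for aa in signal[:core_start]),
        "short_residue_at_cleavage_site": signal[-1] in SHORT_SIDE_CHAIN,
    }

# Human preproinsulin signal sequence, cleaved after the final Ala.
print(signal_sequence_features("MALWMRLLPLLALLALWGPDPAAA"))  # all four features are True
```

For preproinsulin, the Arg at position 6 precedes an 11-residue hydrophobic run (LLPLLALLALW), and the last residue before the cleavage site is Ala.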
[Figure 27–32 artwork: amino-terminal signal sequences of human influenza virus A, human preproinsulin, bovine growth hormone, bee promellitin, and Drosophila glue protein, with cleavage sites marked]
FIGURE 27–32 Translocation into the ER directed by amino-terminal signal sequences of some eukaryotic proteins. The hydrophobic core (yellow) is preceded by one or more basic residues (blue). Note the polar and short-side-chain residues immediately preceding (to the left of, as shown here) the cleavage sites (indicated by red arrows).

The signal sequence itself helps to direct the ribosome to the ER, as illustrated by steps 1 through 8 in Figure 27–33. (1) The targeting pathway begins with initiation of protein synthesis on free ribosomes. (2) The signal sequence appears early in the synthetic process, because it is at the amino terminus, which as we have seen is synthesized first. (3) As it emerges from the ribosome, the signal sequence and the ribosome itself are bound by the large signal recognition particle (SRP); SRP then binds GTP and halts elongation of the polypeptide when it is about 70 amino acids long and the signal sequence has completely emerged from the ribosome.
(4) The GTP-bound SRP now directs the ribosome (still bound to the mRNA) and the incomplete polypeptide to GTP-bound SRP receptors on the cytosolic face of the ER; the nascent polypeptide is delivered to a peptide translocation complex in the ER, which may interact directly with the ribosome. (5) SRP dissociates from the ribosome, accompanied by hydrolysis of GTP in both SRP and the SRP receptor. (6) Elongation of the polypeptide now resumes, with the ATP-driven translocation complex feeding the growing polypeptide into the ER lumen until the complete protein has been synthesized. (7) The signal sequence is removed by a signal peptidase within the ER lumen; (8) the ribosome dissociates and is recycled.

[Figure 27–33 artwork: the ribosome cycle and the SRP cycle at the ER membrane, showing SRP, SRP receptor, ribosome receptor, peptide translocation complex, and signal peptidase, steps 1–8]
FIGURE 27–33 Directing eukaryotic proteins with the appropriate signals to the endoplasmic reticulum. This process involves the SRP cycle and translocation and cleavage of the nascent polypeptide. The steps are described in the text. SRP is a rod-shaped complex containing a 300 nucleotide RNA (7SL-RNA) and six different proteins (combined Mr 325,000). One protein subunit of SRP binds directly to the signal sequence, inhibiting elongation by sterically blocking the entry of aminoacyl-tRNAs and inhibiting peptidyl transferase. Another protein subunit binds and hydrolyzes GTP. The SRP receptor is a heterodimer of α (Mr 69,000) and β (Mr 30,000) subunits, both of which bind and hydrolyze multiple GTP molecules during this process.

Glycosylation Plays a Key Role in Protein Targeting
In the ER lumen, newly synthesized proteins are further modified in several ways. Following the removal of signal sequences, polypeptides are folded, disulfide bonds formed, and many proteins glycosylated to form glycoproteins. In many glycoproteins the linkage to their oligosaccharides is through Asn residues. These N-linked oligosaccharides are diverse (Chapter 7), but the pathways by which they form have a common first step. A 14 residue core oligosaccharide is built up in a stepwise fashion, then transferred from a dolichol phosphate donor molecule to certain Asn residues in the protein (Fig. 27–34). The transferase is on the lumenal face of the ER and thus cannot catalyze glycosylation of cytosolic proteins. After transfer, the core oligosaccharide is trimmed and elaborated in different ways on different proteins, but all N-linked oligosaccharides retain a pentasaccharide core derived from the original 14 residue oligosaccharide. Several antibiotics act by interfering with one or more steps in this process and have aided in elucidating the steps of protein glycosylation. The best-characterized is tunicamycin, which mimics the structure of UDP-N-acetylglucosamine and blocks the first step of the process (Fig. 27–34, step 1). A few proteins are O-glycosylated in the ER, but most O-glycosylation occurs in the Golgi complex or in the cytosol (for proteins that do not enter the ER).

[Structure: dolichol phosphate (n = 9–22)]

Suitably modified proteins can now be moved to a variety of intracellular destinations. Proteins travel from the ER to the Golgi complex in transport vesicles (Fig. 27–35). In the Golgi complex, oligosaccharides are O-linked to some proteins, and N-linked oligosaccharides are further modified. By mechanisms not yet fully understood, the Golgi complex also sorts proteins and sends them to their final destinations.
The processes that segregate proteins targeted for secretion from those targeted for the plasma membrane or lysosomes must distinguish among these proteins on the basis of structural features other than signal sequences, which were removed in the ER lumen.

FIGURE 27–34 Synthesis of the core oligosaccharide of glycoproteins. The core oligosaccharide is built up by the successive addition of monosaccharide units. (1), (2) The first steps occur on the cytosolic face of the ER. (3) Translocation moves the incomplete oligosaccharide across the membrane (mechanism not shown), and (4) completion of the core oligosaccharide occurs within the lumen of the ER. The precursors that contribute additional mannose and glucose residues to the growing oligosaccharide in the lumen are dolichol phosphate derivatives. In the first step in the construction of the N-linked oligosaccharide moiety of a glycoprotein, (5), (6) the core oligosaccharide is transferred from dolichol phosphate to an Asn residue of the protein within the ER lumen. The core oligosaccharide is then further modified in the ER and the Golgi complex in pathways that differ for different proteins. The five sugar residues shown surrounded by a beige screen (after step 7) are retained in the final structure of all N-linked oligosaccharides. (8) The released dolichol pyrophosphate is again translocated so that the pyrophosphate is on the cytosolic face of the ER, then (9) a phosphate is hydrolytically removed to regenerate dolichol phosphate.
[Structure: tunicamycin, showing its N-acetylglucosamine, uracil, and tunicamine moieties and a fatty acyl side chain (n = 8–11)]
[Figure 27–34 artwork: assembly of the core oligosaccharide on dolichol phosphate across the ER membrane, steps 1–9, with the step blocked by tunicamycin marked; key: N-acetylglucosamine (GlcNAc), mannose (Man), glucose (Glc)]

This sorting process is best understood in the case of hydrolases destined for transport to lysosomes. On arrival of a hydrolase (a glycoprotein) in the Golgi complex, an as yet undetermined feature (sometimes called a signal patch) of the three-dimensional structure of the hydrolase is recognized by a phosphotransferase, which phosphorylates certain mannose residues in the oligosaccharide (Fig. 27–36). The presence of one or more mannose 6-phosphate residues in its N-linked oligosaccharide is the structural signal that targets the protein to lysosomes. A receptor protein in the membrane of the Golgi complex recognizes the mannose 6-phosphate signal and binds the hydrolase so marked. Vesicles containing these receptor-hydrolase complexes bud from the trans side of the Golgi complex and make their way to sorting vesicles. Here, the receptor-hydrolase complex dissociates in a process facilitated by the lower pH in the vesicle and by phosphatase-catalyzed removal of phosphate groups from the mannose 6-phosphate residues. The receptor is then recycled to the Golgi complex, and vesicles containing the hydrolases bud from the sorting vesicles and move to the lysosomes. In cells treated with tunicamycin (Fig. 27–34, step 1), hydrolases that should be targeted for lysosomes are instead secreted, confirming that the N-linked oligosaccharide plays a key role in targeting these enzymes to lysosomes.

The pathways that target proteins to mitochondria and chloroplasts also rely on amino-terminal signal sequences. Although mitochondria and chloroplasts contain DNA, most of their proteins are encoded by nuclear DNA and must be targeted to the appropriate organelle. Unlike other targeting pathways, however, the mitochondrial and chloroplast pathways begin only after a precursor protein has been completely synthesized and released from the ribosome. Precursor proteins destined for mitochondria or chloroplasts are bound by cytosolic chaperone proteins and delivered to receptors on the exterior surface of the target organelle. Specialized translocation mechanisms then transport the protein to its final destination in the organelle, after which the signal sequence is removed.

Signal Sequences for Nuclear Transport Are Not Cleaved
Molecular communication between the nucleus and the cytosol requires the movement of macromolecules through nuclear pores. RNA molecules synthesized in the nucleus are exported to the cytosol. Ribosomal proteins synthesized on cytosolic ribosomes are imported into the nucleus and assembled into 60S and 40S ribosomal subunits in the nucleolus; completed subunits are then exported back to the cytosol. A variety of nuclear proteins (RNA and DNA polymerases, histones, topoisomerases, proteins that regulate gene expression, and so forth) are synthesized in the cytosol and imported into the nucleus. This traffic is modulated by a complex system of molecular signals and transport proteins that is gradually being elucidated.

In most multicellular eukaryotes, the nuclear envelope breaks down at each cell division, and once division is completed and the nuclear envelope reestablished, the dispersed nuclear proteins must be reimported.
To allow this repeated nuclear importation, the signal sequence that targets a protein to the nucleus, the nuclear localization sequence (NLS), is not removed after the protein arrives at its destination. An NLS, unlike other signal sequences, may be located almost anywhere along the primary sequence of the protein. NLSs can vary considerably, but many consist of four to eight amino acid residues and include several consecutive basic (Arg or Lys) residues.

Nuclear importation is mediated by a number of proteins that cycle between the cytosol and the nucleus (Fig. 27–37), including importin α and β and a small GTPase known as Ran. A heterodimer of importin α and β functions as a soluble receptor for proteins targeted to the nucleus, with the α subunit binding NLS-bearing proteins in the cytosol. The complex of the NLS-bearing protein and the importin docks at a nuclear pore and is translocated through the pore by an energy-dependent mechanism that requires the Ran GTPase. The two importin subunits separate during the translocation, and the NLS-bearing protein dissociates from importin α inside the nucleus. Importin α and β are then exported from the nucleus to repeat the process. How importin α remains dissociated from the many NLS-bearing proteins inside the nucleus is not yet clear.

FIGURE 27–35 Pathway taken by proteins destined for lysosomes, the plasma membrane, or secretion. Proteins are moved from the ER to the cis side of the Golgi complex in transport vesicles. Sorting occurs primarily in the trans side of the Golgi complex.

Bacteria Also Use Signal Sequences for Protein Targeting
Bacteria can target proteins to their inner or outer membranes, to the periplasmic space between these membranes, or to the extracellular medium.
They use signal sequences at the amino terminus of the proteins (Fig. 27–38), much like those on eukaryotic proteins targeted to the ER, mitochondria, and chloroplasts.

Most proteins exported from E. coli make use of the pathway shown in Figure 27–39. Following translation, a protein to be exported may fold only slowly, the amino-terminal signal sequence impeding the folding. The soluble chaperone protein SecB binds to the protein’s signal sequence or other features of its incompletely folded structure. The bound protein is then delivered to SecA, a protein associated with the inner surface of the plasma membrane. SecA acts as both a receptor and a translocating ATPase. Released from SecB and bound to SecA, the protein is delivered to a translocation complex in the membrane, made up of SecY, E, and G, and is translocated stepwise through the membrane at the SecYEG complex in lengths of about 20 amino acid residues. Each step is facilitated by the hydrolysis of ATP, catalyzed by SecA.

[Figure 27–36 artwork: N-acetylglucosamine phosphotransferase transfers N-acetylglucosamine phosphate from UDP-GlcNAc to a mannose residue of the hydrolase oligosaccharide, releasing UMP; a phosphodiesterase then removes N-acetylglucosamine, leaving a mannose 6-phosphate residue]
FIGURE 27–36 Phosphorylation of mannose residues on lysosome-targeted enzymes. N-Acetylglucosamine phosphotransferase recognizes some as yet unidentified structural feature of hydrolases destined for lysosomes.
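In the stepwise bacterial export described above, SecA moves about 20 residues per step, and each step is driven by ATP hydrolysis. That gives a rough ATP budget for exporting a protein. The Python sketch below rounds up, so that a final partial stretch still costs a full cycle. It assumes exactly one ATP per cycle and ignores any contribution from the transmembrane electrochemical potential. The function name and the exact 20-residue step are simplifying assumptions.

```python
import math

RESIDUES_PER_SECA_CYCLE = 20  # "about 20 amino acid residues" per step, as stated in the text

def seca_atp_cycles(polypeptide_length: int,
                    residues_per_cycle: int = RESIDUES_PER_SECA_CYCLE) -> int:
    """Estimate SecA push cycles (assumed one ATP hydrolyzed per cycle) needed to move
    a polypeptide of the given length through SecYEG. Ignores any help from the
    transmembrane electrochemical potential."""
    if polypeptide_length < 1 or residues_per_cycle < 1:
        raise ValueError("lengths must be positive")
    # A final, partial stretch still needs a full cycle, hence the ceiling.
    return math.ceil(polypeptide_length / residues_per_cycle)

print(seca_atp_cycles(300))  # 15
```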
[Figure 27–37 artwork: (a) the importin cycle at a nuclear pore complex, steps 1–6, with the Ran GTPase; (b) micrograph, scale bar 0.2 μm]
FIGURE 27–37 Targeting of nuclear proteins. (a) (1) A protein with an appropriate nuclear localization signal (NLS) is bound by a complex of importin α and β. (2) The resulting complex binds to a nuclear pore, and (3) translocation is mediated by the Ran GTPase. (4) Inside the nucleus, importin β dissociates from importin α, and (5) importin α then releases the nuclear protein. (6) Importin α and β are transported out of the nucleus and recycled. (b) Scanning electron micrograph of the surface of the nuclear envelope, showing numerous nuclear pores.

[Figure 27–38 artwork: amino-terminal signal sequences of inner membrane proteins (phage fd major and minor coat proteins), periplasmic proteins (alkaline phosphatase, leucine-specific binding protein, β-lactamase of pBR322), and outer membrane proteins (lipoprotein, LamB, OmpA), with cleavage sites marked]
FIGURE 27–38 Signal sequences that target proteins to different locations in bacteria.
Basic amino acids (blue) near the amino terminus and hydrophobic core amino acids (yellow) are highlighted. The cleavage sites marking the ends of the signal sequences are indicated by red arrows. Note that the inner bacterial cell membrane (see Fig. 1–6) is where phage fd coat proteins and DNA are assembled into phage particles. OmpA is outer membrane protein A; LamB is a cell surface receptor protein for bacteriophage lambda.

[Figure 27–39 artwork: SecB, SecA, and the SecYEG complex in the cell membrane between the cytosol and the periplasmic space, steps 1–6]
FIGURE 27–39 Model for protein export in bacteria. (1) A newly translated polypeptide binds to the cytosolic chaperone protein SecB, which (2) delivers it to SecA, a protein associated with the translocation complex (SecYEG) in the bacterial cell membrane. (3) SecB is released, and SecA inserts itself into the membrane, forcing about 20 amino acid residues of the protein to be exported through the translocation complex. (4) Hydrolysis of an ATP by SecA provides the energy for a conformational change that causes SecA to withdraw from the membrane, releasing the polypeptide. (5) SecA binds another ATP, and the next stretch of 20 amino acid residues is pushed across the membrane through the translocation complex. Steps 4 and 5 are repeated until (6) the entire protein has passed through and is released to the periplasm. The electrochemical potential across the membrane (denoted by + and −) also provides some of the driving force required for protein translocation.

[Figure 27–40 artwork: (a) clathrin triskelion with light and heavy chains (~80 nm); (b) polyhedral lattice; (c) electron micrograph]
FIGURE 27–40 Clathrin. (a) Three light (L) chains (Mr 35,000) and three heavy (H) chains (Mr 180,000) of the (HL)3 clathrin unit, organized as a three-legged structure called a triskelion. (b) Triskelions tend to assemble into polyhedral lattices.
(c) Electron micrograph of a coated pit on the cytosolic face of the plasma membrane of a fibroblast.

An exported protein is thus pushed through the membrane by a SecA protein located on the cytoplasmic surface, rather than being pulled through the membrane by a protein on the periplasmic surface. This difference may simply reflect the need for the translocating ATPase to be where the ATP is. The transmembrane electrochemical potential can also provide energy for translocation of the protein, by an as yet unknown mechanism. Although most exported bacterial proteins use this pathway, some follow an alternative pathway that uses signal recognition and receptor proteins homologous to components of the eukaryotic SRP and SRP receptor (Fig. 27–33).

Cells Import Proteins by Receptor-Mediated Endocytosis
Some proteins are imported into cells from the surrounding medium; examples in eukaryotes include low-density lipoprotein (LDL), the iron-carrying protein transferrin, peptide hormones, and circulating proteins destined for degradation. The proteins bind to receptors in invaginations of the membrane called coated pits, which concentrate endocytic receptors in preference to other cell-surface proteins. The pits are coated on their cytosolic side with a lattice of the protein clathrin, which forms closed polyhedral structures (Fig. 27–40). The clathrin lattice grows as more receptors are occupied by target proteins, until a complete membrane-bounded endocytic vesicle buds off the plasma membrane and enters the cytoplasm. The clathrin is quickly removed by uncoating enzymes, and the vesicle fuses with an endosome. ATPase activity in the endosomal membranes reduces the pH therein, facilitating dissociation of receptors from their target proteins. The imported proteins and receptors then go their separate ways, their fates varying with the cell and protein type.
Transferrin and its receptor are eventually recycled. Some hormones, growth factors, and immune complexes, after eliciting the appropriate cellular response, are degraded along with their receptors. LDL is degraded after the associated cholesterol has been delivered to its destination, but the LDL receptor is recycled (see Fig. 21–42).

Receptor-mediated endocytosis is exploited by some toxins and viruses to gain entry to cells. Influenza virus (see Fig. 11–24), diphtheria toxin, and cholera toxin all enter cells in this way.

Protein Degradation Is Mediated by Specialized Systems in All Cells
Protein degradation prevents the buildup of abnormal or unwanted proteins and permits the recycling of amino acids. The half-lives of eukaryotic proteins vary from 30 seconds to many days. Most proteins turn over rapidly relative to the lifetime of a cell, although a few (such as hemoglobin) can last for the life of the cell (about 110 days for an erythrocyte). Rapidly degraded proteins include those that are defective because of incorrectly inserted amino acids or because of damage accumulated during normal functioning. And enzymes that act at key regulatory points in metabolic pathways often turn over rapidly.

Defective proteins and those with characteristically short half-lives are generally degraded in both bacterial and eukaryotic cells by selective ATP-dependent cytosolic systems. A second system in vertebrates, operating in lysosomes, recycles the amino acids of membrane proteins, extracellular proteins, and proteins with characteristically long half-lives.

In E. coli, many proteins are degraded by an ATP-dependent protease called Lon (the name refers to the “long form” of proteins, observed only when this protease is absent). The protease is activated in the presence of defective proteins or those slated for rapid turnover; two ATP molecules are hydrolyzed for every peptide bond cleaved. The precise role of this ATP hydrolysis is not yet clear.
Once a protein has been reduced to small inactive peptides, other ATP-independent proteases complete the degradation process.

The ATP-dependent pathway in eukaryotic cells is quite different, involving the protein ubiquitin, which, as its name suggests, occurs throughout the eukaryotic kingdoms. One of the most highly conserved proteins known, ubiquitin (76 amino acid residues) is essentially identical in organisms as different as yeasts and humans. Ubiquitin is covalently linked to proteins slated for destruction via an ATP-dependent pathway involving three separate enzymes (E1, E2, and E3 in Fig. 27–41).

[Figure 27–41 artwork: E1 activates ubiquitin with ATP (releasing AMP + PPi) and holds it as a thioester; ubiquitin is passed to E2 as a thioester; with E3, ubiquitin is transferred to a Lys residue of the target protein; repeated cycles attach additional ubiquitin]
FIGURE 27–41 Three-step cascade pathway by which ubiquitin is attached to a protein. Two different enzyme-ubiquitin intermediates are involved. The free carboxyl group of ubiquitin’s carboxyl-terminal Gly residue is ultimately linked through an amide (isopeptide) bond to an ε-amino group of a Lys residue of the target protein. Additional cycles produce polyubiquitin, a covalent polymer of ubiquitin subunits that targets the attached protein for destruction in eukaryotes.

Ubiquitinated proteins are degraded by a large complex known as the 26S proteasome (Mr 2.5 × 10⁶) (Fig. 27–42). The proteasome consists of two copies each of at least 32 different subunits, most of which are highly conserved from yeasts to humans. The proteasome contains two main types of subcomplexes, a barrel-like core particle and regulatory particles on either end of the barrel. The 20S core particle consists of four rings; the outer rings are formed from seven α subunits, and the inner rings from seven β subunits.
Three of the seven subunits in each β ring have protease activities, each with different substrate specificities. The stacked rings of the core particle form the barrel-like structure within which target proteins are degraded. The 19S regulatory particle on each end of the core particle contains 18 subunits, including some that recognize and bind to ubiquitinated proteins. Six of the subunits are ATPases that probably function in unfolding the ubiquitinated proteins and translocating the unfolded polypeptide into the core particle for degradation.

Although we do not yet understand all the signals that trigger ubiquitination, one simple signal has been found. For many proteins, the identity of the first residue that remains after removal of the amino-terminal Met residue, and any other posttranslational proteolytic processing of the amino-terminal end, has a profound influence on half-life (Table 27–9). These amino-terminal signals have been conserved over billions of years of evolution and are the same in bacterial protein degradation systems and in the human ubiquitination pathway. More complex signals, such as the destruction box discussed in Chapter 12 (see Fig. 12–44), are also being identified.

Ubiquitin-dependent proteolysis is as important for the regulation of cellular processes as for the elimination of defective proteins. Many proteins required at only one stage of the eukaryotic cell cycle are rapidly degraded by the ubiquitin-dependent pathway after completing their function. The same pathway also processes and presents class I MHC antigens (see Fig. 5–22). Ubiquitin-dependent destruction of cyclin is critical to cell-cycle regulation (see Fig. 12–44).
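The half-lives in Table 27–9 lend themselves to a small calculation. The Python sketch below stores the approximate yeast half-lives from the table. It then estimates how much of a protein remains after a given time, assuming simple first-order (exponential) decay. Three choices here are assumptions, not statements from the source: the first-order model, the dictionary name `HALF_LIFE_MIN`, and entering ">20 h" as exactly 20 h. Because of that last choice, results for stabilizing residues are lower bounds on the amount remaining.

```python
# Approximate yeast half-lives (minutes) from Table 27-9, keyed by amino-terminal residue.
# ">20 h" residues are entered as 1200 min, so their results are lower bounds (assumption).
HALF_LIFE_MIN = {
    **{aa: 20 * 60 for aa in ("Met", "Gly", "Ala", "Ser", "Thr", "Val")},
    "Ile": 30, "Gln": 30,
    "Tyr": 10, "Glu": 10,
    "Pro": 7,
    **{aa: 3 for aa in ("Leu", "Phe", "Asp", "Lys")},
    "Arg": 2,
}

def fraction_remaining(n_terminal: str, minutes: float) -> float:
    """Fraction of a protein left after `minutes`, assuming first-order decay
    with the half-life associated with its amino-terminal residue."""
    t_half = HALF_LIFE_MIN[n_terminal]
    return 0.5 ** (minutes / t_half)

# Compare a stabilizing and two destabilizing amino-terminal residues after 30 minutes.
for residue in ("Met", "Gln", "Arg"):
    print(residue, round(fraction_remaining(residue, 30), 3))
```

Over the same 30 minutes, a protein beginning with Met is barely touched, one beginning with Gln is halved, and one beginning with Arg is essentially gone. That spread is what makes the amino-terminal residue an effective degradation signal.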
TABLE 27–9 Relationship between Protein Half-Life and Amino-Terminal Amino Acid Residue

Amino-terminal residue              Half-life*
Stabilizing
  Met, Gly, Ala, Ser, Thr, Val      >20 h
Destabilizing
  Ile, Gln                          ~30 min
  Tyr, Glu                          ~10 min
  Pro                               ~7 min
  Leu, Phe, Asp, Lys                ~3 min
  Arg                               ~2 min

Source: Modified from Bachmair, A., Finley, D., & Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179–186.
*Half-lives were measured in yeast for the β-galactosidase protein modified so that in each experiment it had a different amino-terminal residue. (See Chapter 9 for a discussion of techniques used to engineer proteins with altered amino acid sequences.) Half-lives may vary for different proteins and in different organisms, but this general pattern appears to hold for all organisms.

FIGURE 27–42 Three-dimensional structure of the eukaryotic proteasome. [Figure panels: (a) 20S core particle; (b) complete proteasome, with a 19S regulatory particle at each end; polyubiquitin attached to the substrate protein interacts with the proteasome.] The 26S proteasome is highly conserved in all eukaryotes. The two subassemblies are the 20S core particle and the 19S regulatory particle. (a) (PDB ID 1IRU) The core particle consists of four rings arranged to form a barrel-like structure. Each of the inner rings has seven different β subunits (light blue), three of which have protease activities (dark blue). The outer rings each have seven different α subunits (gray). (b) A regulatory particle forms a cap on each end of the core particle. The core particle is colored as in (a). The base and lid segments of each regulatory particle are presented in different shades of red. The regulatory particle unfolds ubiquitinated proteins (blue) and translocates them into the core particle, as shown.

The E2 and E3 components of the ubiquitination cascade pathway (Fig. 27–41) are in fact two large families of proteins. Different E2 and E3 enzymes exhibit different specificities for target proteins and thus regulate different cellular processes. Some E2 and E3 enzymes are highly localized in certain cellular compartments, reflecting a specialized function.

Not surprisingly, defects in the ubiquitination pathway have been implicated in a wide range of disease states. An inability to degrade certain proteins that activate cell division (the products of oncogenes) can lead to tumor formation, whereas a too-rapid degradation of proteins that act as tumor suppressors can have the same effect. The ineffective or overly rapid degradation of cellular proteins also appears to play a role in a range of other conditions: renal diseases, asthma, neurodegenerative disorders such as Alzheimer’s and Parkinson’s diseases (associated with the formation of characteristic proteinaceous structures in neurons), cystic fibrosis (caused in some cases by a too-rapid degradation of a chloride ion channel, with resultant loss of function; see Box 11–3), Liddle’s syndrome (in which a sodium channel in the kidney is not degraded, leading to excessive Na⁺ absorption and early-onset hypertension), and many other disorders. Drugs designed to inhibit proteasome function are being developed as potential treatments for some of these conditions. In a changing metabolic environment, protein degradation is as important to a cell’s survival as is protein synthesis, and much remains to be learned about these interesting pathways. ■

SUMMARY 27.3 Protein Targeting and Degradation
■ After synthesis, many proteins are directed to particular locations in the cell. One targeting mechanism involves a peptide signal sequence, generally found at the amino terminus of a newly synthesized protein.
■ In eukaryotic cells, one class of signal sequences is recognized by the signal recognition particle (SRP), which binds the signal sequence as soon as it appears on the ribosome and transfers the entire ribosome and incomplete polypeptide to the ER. Polypeptides with these signal sequences are moved into the ER lumen as they are synthesized; once in the lumen they may be modified and moved to the Golgi complex, then sorted and sent to lysosomes, the plasma membrane, or transport vesicles.
■ Proteins targeted to mitochondria and chloroplasts in eukaryotic cells, and those destined for export in bacteria, also make use of an amino-terminal signal sequence.
■ Proteins targeted to the nucleus have an internal signal sequence that is not cleaved once the protein is successfully targeted.
■ Some eukaryotic cells import proteins by receptor-mediated endocytosis.
■ All cells eventually degrade proteins, using specialized proteolytic systems. Defective proteins and those slated for rapid turnover are generally degraded by an ATP-dependent system. In eukaryotic cells, the proteins are first tagged by linkage to ubiquitin, a highly conserved protein. Ubiquitin-dependent proteolysis is carried out by proteasomes, also highly conserved, and is critical to the regulation of many cellular processes.
Key Terms
aminoacyl-tRNA, aminoacyl-tRNA synthetases, translation, codon, reading frame, initiation codon, termination codons, open reading frame (ORF), anticodon, wobble, initiation, Shine-Dalgarno sequence, aminoacyl (A) site, peptidyl (P) site, exit (E) site, initiation complex, elongation, elongation factors, peptidyl transferase, translocation, termination, release factors, polysome, posttranslational modification, nonsense suppressor, puromycin, tetracyclines, chloramphenicol, cycloheximide, streptomycin, diphtheria toxin, ricin, signal recognition particle (SRP), signal sequence, tunicamycin, coated pits, clathrin, ubiquitin, proteasome
Terms in bold are defined in the glossary.

Problems

1. Messenger RNA Translation Predict the amino acid sequences of peptides formed by ribosomes in response to the following mRNA sequences, assuming that the reading frame begins with the first three bases in each sequence.
(a) GGUCAGUCGCUCCUGAUU
(b) UUGGAUGCGCCAUAAUUUGCU
(c) CAUGAUGCCUGUUGCUAC
(d) AUGGACGAA

2. How Many Different mRNA Sequences Can Specify One Amino Acid Sequence? Write all the possible mRNA sequences that can code for the simple tripeptide segment Leu–Met–Tyr. Your answer will give you some idea about the number of possible mRNAs that can code for one polypeptide.

3. Can the Base Sequence of an mRNA Be Predicted from the Amino Acid Sequence of Its Polypeptide Product? A given sequence of bases in an mRNA will code for one and only one sequence of amino acids in a polypeptide, if the reading frame is specified. From a given sequence of amino acid residues in a protein such as cytochrome c, can we predict the base sequence of the unique mRNA that coded it? Give reasons for your answer.

4. Coding of a Polypeptide by Duplex DNA The template strand of a segment of double-helical DNA contains the sequence
(5′)CTTAACACCCCTGACTTCGCGCCGTCG(3′)
(a) What is the base sequence of the mRNA that can be transcribed from this strand?
(b) What amino acid sequence could be coded by the mRNA in (a), starting from the 5′ end?
(c) If the complementary (nontemplate) strand of this DNA were transcribed and translated, would the resulting amino acid sequence be the same as in (b)? Explain the biological significance of your answer.

5. Methionine Has Only One Codon Methionine is one of two amino acids with only one codon.
How does the single codon for methionine specify both the initiating residue and interior Met residues of polypeptides synthesized by E. coli?

6. Synthetic mRNAs The genetic code was elucidated with polyribonucleotides synthesized either enzymatically or chemically in the laboratory. Given what we now know about the genetic code, how would you make a polyribonucleotide that could serve as an mRNA coding predominantly for many Phe residues and a small number of Leu and Ser residues? What other amino acid(s) would be coded for by this polyribonucleotide, but in smaller amounts?

7. Energy Cost of Protein Biosynthesis Determine the minimum energy cost, in terms of ATP equivalents expended, required for the biosynthesis of the β-globin chain of hemoglobin (146 residues), starting from a pool including all necessary amino acids, ATP, and GTP. Compare your answer with the direct energy cost of the biosynthesis of a linear glycogen chain of 146 glucose residues in (α1→4) linkage, starting from a pool including glucose, UTP, and ATP (Chapter 15). From your data, what is the extra energy cost of making a protein, in which all the residues are ordered in a specific sequence, compared with the cost of making a polysaccharide containing the same number of residues but lacking the informational content of the protein?
In addition to the direct energy cost for the synthesis of a protein, there are indirect energy costs: those required for the cell to make the necessary enzymes for protein synthesis. Compare the magnitude of the indirect costs to a eukaryotic cell of the biosynthesis of linear (α1→4) glycogen chains and the biosynthesis of polypeptides, in terms of the enzymatic machinery involved.

8. Predicting Anticodons from Codons Most amino acids have more than one codon and attach to more than one tRNA, each with a different anticodon. Write all possible anticodons for the four codons of glycine: (5′)GGU, GGC, GGA, and GGG.
(a) From your answer, which of the positions in the anticodons are primary determinants of their codon specificity in the case of glycine?
(b) Which of these anticodon-codon pairings has/have a wobbly base pair?
(c) In which of the anticodon-codon pairings do all three positions exhibit strong Watson-Crick hydrogen bonding?

9. Effect of Single-Base Changes on Amino Acid Sequence Much important confirmatory evidence on the genetic code has come from assessing changes in the amino acid sequence of mutant proteins after a single base has been changed in the gene that encodes the protein. Which of the following amino acid replacements would be consistent with the genetic code if the replacements were caused by a single base change? Which cannot be the result of a single-base mutation? Why?
(a) Phe→Leu
(b) Lys→Ala
(c) Ala→Thr
(d) Phe→Lys
(e) Ile→Leu
(f) His→Glu
(g) Pro→Ser

10. Basis of the Sickle-Cell Mutation Sickle-cell hemoglobin has a Val residue at position 6 of the β-globin chain, instead of the Glu residue found in normal hemoglobin A. Can you predict what change took place in the DNA codon for glutamate to account for replacement of the Glu residue by Val?

11. Importance of the “Second Genetic Code” Some aminoacyl-tRNA synthetases do not recognize and bind the anticodon of their cognate tRNAs but instead use other structural features of the tRNAs to impart binding specificity. The tRNAs for alanine apparently fall into this category.
(a) What features of tRNA^Ala are recognized by Ala-tRNA synthetase?
(b) Describe the consequences of a C→G mutation in the third position of the anticodon of tRNA^Ala.
(c) What other kinds of mutations might have similar effects?
(d) Mutations of these types are never found in natural populations of organisms. Why? (Hint: Consider what might happen both to individual proteins and to the organism as a whole.)
12. Maintaining the Fidelity of Protein Synthesis The chemical mechanisms used to avoid errors in protein synthesis are different from those used during DNA replication. DNA polymerases use a 3′→5′ exonuclease proofreading activity to remove mispaired nucleotides incorrectly inserted into a growing DNA strand. There is no analogous proofreading function on ribosomes and, in fact, the identity of an amino acid attached to an incoming tRNA and added to the growing polypeptide is never checked. A proofreading step that hydrolyzed the previously formed peptide bond after an incorrect amino acid had been inserted into a growing polypeptide (analogous to the proofreading step of DNA polymerases) would be impractical. Why? (Hint: Consider how the link between the growing polypeptide and the mRNA is maintained during elongation; see Figs. 27–24 and 27–25.)

13. Predicting the Cellular Location of a Protein The gene for a eukaryotic polypeptide 300 amino acid residues long is altered so that a signal sequence recognized by SRP occurs at the polypeptide’s amino terminus and a nuclear localization signal (NLS) occurs internally, beginning at residue 150. Where is the protein likely to be found in the cell?

14. Requirements for Protein Translocation across a Membrane The secreted bacterial protein OmpA has a precursor, ProOmpA, which has the amino-terminal signal sequence required for secretion. If purified ProOmpA is denatured with 8 M urea and the urea is then removed (such as by running the protein solution rapidly through a gel filtration column), the protein can be translocated across isolated bacterial inner membranes in vitro. However, translocation becomes impossible if ProOmpA is first allowed to incubate for a few hours in the absence of urea.
Furthermore, the capacity for translocation is maintained for an extended period if ProOmpA is first incubated in the presence of another bacterial protein called trigger factor. Describe the probable function of this factor.

15. Protein-Coding Capacity of a Viral DNA The 5,386 bp genome of bacteriophage φX174 includes genes for 10 proteins, designated A to K, with sizes given in the table below. How much DNA would be required to encode these 10 proteins? How can you reconcile the size of the φX174 genome with its protein-coding capacity?

Protein    Number of amino acid residues
A          455
B          120
C          86
D          152
E          91
F          427
G          175
H          328
J          38
K          56

Chapter 28 Regulation of Gene Expression

28.1 Principles of Gene Regulation
28.2 Regulation of Gene Expression in Prokaryotes
28.3 Regulation of Gene Expression in Eukaryotes

The fundamental problem of chemical physiology and of embryology is to understand why tissue cells do not all express, all the time, all the potentialities inherent in their genome.
François Jacob and Jacques Monod, article in Journal of Molecular Biology, 1961

Of the 4,000 or so genes in the typical bacterial genome, or the perhaps 35,000 genes in the human genome, only a fraction are expressed in a cell at any given time. Some gene products are present in very large amounts: the elongation factors required for protein synthesis, for example, are among the most abundant proteins in bacteria, and ribulose 1,5-bisphosphate carboxylase/oxygenase (rubisco) of plants and photosynthetic bacteria is, as far as we know, the most abundant enzyme in the biosphere. Other gene products occur in much smaller amounts; for instance, a cell may contain only a few molecules of the enzymes that repair rare DNA lesions. Requirements for some gene products change over time. The need for enzymes in certain metabolic pathways may wax and wane as food sources change or are depleted. During development of a multicellular organism, some proteins that influence cellular differentiation are present for just a brief time in only a few cells. Specialization of cellular function can dramatically affect the need for various gene products; an example is the uniquely high concentration of a single protein, hemoglobin, in erythrocytes. Given the high cost of protein synthesis, regulation of gene expression is essential to making optimal use of available energy.

The cellular concentration of a protein is determined by a delicate balance of at least seven processes, each having several potential points of regulation:
1. Synthesis of the primary RNA transcript (transcription)
2. Posttranscriptional modification of mRNA
3. Messenger RNA degradation
4. Protein synthesis (translation)
5. Posttranslational modification of proteins
6. Protein targeting and transport
7. Protein degradation

These processes are summarized in Figure 28–1. We have examined several of these mechanisms in previous chapters. Posttranscriptional modification of mRNA, by processes such as alternative splicing patterns (see Fig. 26–19b) or RNA editing (see Box 27–1), can affect which proteins are produced from an mRNA transcript and in what amounts. A variety of nucleotide sequences in an mRNA can affect the rate of its degradation (p. 1020). Many factors affect the rate at which an mRNA is translated into a protein, as well as the posttranslational modification, targeting, and eventual degradation of that protein (Chapter 27).

This chapter focuses primarily on the regulation of transcription initiation, although aspects of posttranscriptional and translational regulation are also described. Of the regulatory processes illustrated in Figure 28–1, those operating at the level of transcription initiation are the best documented and probably the most common. As in all biochemical processes, an efficient place for regulation is at the beginning of the pathway. Because synthesis of informational macromolecules is so extraordinarily expensive in terms of energy, elaborate mechanisms have evolved to regulate the process. Researchers continue to discover complex and sometimes surprising regulatory mechanisms. Increasingly, posttranscriptional and translational regulation are proving to be among the more important of these processes, especially in eukaryotes. In fact, the regulatory processes themselves can involve a considerable investment of chemical energy.

Control of transcription initiation permits the synchronized regulation of multiple genes encoding products with interdependent activities. For example, when their DNA is heavily damaged, bacterial cells require a coordinated increase in the levels of the many DNA repair enzymes. And perhaps the most sophisticated form of coordination occurs in the complex regulatory circuits that guide the development of multicellular eukaryotes, which can involve many types of regulatory mechanisms.

We begin by examining the interactions between proteins and DNA that are the key to transcriptional regulation. We next discuss the specific proteins that influence the expression of specific genes, first in prokaryotic and then in eukaryotic cells. Information about posttranscriptional and translational regulation is included in the discussion, where relevant, to provide a more complete overview of the rich complexity of regulatory mechanisms.
28.1 Principles of Gene Regulation

Genes for products that are required at all times, such as those for the enzymes of central metabolic pathways, are expressed at a more or less constant level in virtually every cell of a species or organism. Such genes are often referred to as housekeeping genes. Unvarying expression of a gene is called constitutive gene expression.

For other gene products, cellular levels rise and fall in response to molecular signals; this is regulated gene expression. Gene products that increase in concentration under particular molecular circumstances are referred to as inducible; the process of increasing their expression is induction. The expression of many of the genes encoding DNA repair enzymes, for example, is induced by high levels of DNA damage. Conversely, gene products that decrease in concentration in response to a molecular signal are referred to as repressible, and the process is called repression. For example, in bacteria, ample supplies of tryptophan lead to repression of the genes for the enzymes that catalyze tryptophan biosynthesis.

Transcription is mediated and regulated by protein-DNA interactions, especially those involving the protein components of RNA polymerase (Chapter 26). We first consider how the activity of RNA polymerase is regulated, and proceed to a general description of the proteins participating in this process. We then examine the molecular basis for the recognition of specific DNA sequences by DNA-binding proteins.

RNA Polymerase Binds to DNA at Promoters

RNA polymerases bind to DNA and initiate transcription at promoters (see Fig. 26–5), sites generally found near points at which RNA synthesis begins on the DNA template. The regulation of transcription initiation often entails changes in how RNA polymerase interacts with a promoter.
The nucleotide sequences of promoters vary considerably, affecting the binding affinity of RNA polymerases and thus the frequency of transcription initiation. Some Escherichia coli genes are transcribed once per second, others less than once per cell generation. Much of this variation is due to differences in promoter sequence. In the absence of regulatory proteins, differences in promoter sequences may affect the frequency of transcription initiation by a factor of 1,000 or more. Most E. coli promoters have a sequence close to a consensus (Fig. 28–2). Mutations that result in a shift away from the consensus sequence usually decrease promoter function; conversely, mutations toward consensus usually enhance promoter function.

FIGURE 28–1 Seven processes that affect the steady-state concentration of a protein. [Figure: DNA (gene) → transcription → primary transcript → posttranscriptional processing → mature mRNA → translation → protein (inactive) → posttranslational processing → modified protein (active) → protein targeting and transport; mRNA degradation returns nucleotides, and protein degradation returns amino acids.] Each process has several potential points of regulation.

Although housekeeping genes are expressed constitutively, the cellular concentrations of the proteins they encode vary widely. For these genes, the RNA polymerase–promoter interaction strongly influences the rate of transcription initiation; differences in promoter sequence allow the cell to synthesize the appropriate level of each housekeeping gene product.

The basal rate of transcription initiation at the promoters of nonhousekeeping genes is also determined by the promoter sequence, but expression of these genes is further modulated by regulatory proteins. Many of these proteins work by enhancing or interfering with the interaction between RNA polymerase and the promoter.
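The observation that promoters closer to the consensus tend to be stronger can be illustrated with a toy score: count how many bases of a candidate −35 hexamer and −10 hexamer agree with TTGACA and TATAAT, with the two hexamers separated by the 17-nucleotide spacer shown in Figure 28–2. This is a deliberately simplified sketch under stated assumptions, not a model of RNA polymerase binding: it fixes the spacer length, ignores the UP element, and weights every position equally. The names `best_promoter_match` and `matches` are hypothetical.

```python
CONSENSUS_35 = "TTGACA"  # -35 region consensus (Fig. 28-2)
CONSENSUS_10 = "TATAAT"  # -10 region consensus
SPACER = 17              # nucleotides between the two hexamers (assumed fixed)


def matches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_promoter_match(seq: str, spacer: int = SPACER) -> tuple[int, int]:
    """Return (score, start) for the best -35/-10 hexamer pair in `seq`.

    `score` counts agreements with the two consensus hexamers (0 to 12);
    `start` is the index of the candidate -35 hexamer. Returns (-1, -1)
    if `seq` is too short to hold both hexamers and the spacer.
    """
    seq = seq.upper()  # nontemplate strand, 5' to 3'
    len35, len10 = len(CONSENSUS_35), len(CONSENSUS_10)
    span = len35 + spacer + len10
    best = (-1, -1)
    for i in range(len(seq) - span + 1):
        j = i + len35 + spacer  # start of the candidate -10 hexamer
        score = (matches(seq[i:i + len35], CONSENSUS_35)
                 + matches(seq[j:j + len10], CONSENSUS_10))
        if score > best[0]:  # keep the first highest-scoring window
            best = (score, i)
    return best


if __name__ == "__main__":
    strong = "TTGACA" + "C" * 17 + "TATAAT"
    weaker = "TTGACA" + "C" * 17 + "TATGAT"  # one change away from consensus
    print(best_promoter_match(strong), best_promoter_match(weaker))  # (12, 0) (11, 0)
```

The single A→G change in the −10 hexamer lowers the score from 12 to 11, a crude analogue of the observation that mutations away from consensus usually weaken promoter function.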
The sequences of eukaryotic promoters are more variable than their prokaryotic counterparts (see Fig. 26–8). The three eukaryotic RNA polymerases usually require an array of general transcription factors in order to bind to a promoter. Yet, as with prokaryotic gene expression, the basal level of transcription is determined by the effect of promoter sequences on the function of RNA polymerase and its associated transcription factors.

Transcription Initiation Is Regulated by Proteins That Bind to or near Promoters

At least three types of proteins regulate transcription initiation by RNA polymerase: specificity factors alter the specificity of RNA polymerase for a given promoter or set of promoters; repressors impede access of RNA polymerase to the promoter; and activators enhance the RNA polymerase–promoter interaction.

We introduced prokaryotic specificity factors in Chapter 26 (see Fig. 26–5), although we did not refer to them by that name. The σ subunit of the E. coli RNA polymerase holoenzyme is a specificity factor that mediates promoter recognition and binding. Most E. coli promoters are recognized by a single σ subunit (Mr 70,000), σ70. Under some conditions, some of the σ70 subunits are replaced by another specificity factor. One notable case arises when the bacteria are subjected to heat stress, leading to the replacement of σ70 by σ32 (Mr 32,000). When bound to σ32, RNA polymerase is directed to a specialized set of promoters with a different consensus sequence (Fig. 28–3). These promoters control the expression of a set of genes that encode the heat-shock response proteins. Thus, through changes in the binding affinity of the polymerase that direct it to different promoters, a set of genes involved in related processes is coordinately regulated. In eukaryotic cells, some of the general transcription factors, in particular the TATA-binding protein (TBP; see Fig. 26–8), may be considered specificity factors.

FIGURE 28–2 Consensus sequence for many E. coli promoters. [Figure: 5′ DNA: UP element, −35 region (TTGACA), N17, −10 region (TATAAT), N5–9, RNA start site, mRNA.] Most base substitutions in the −10 and −35 regions have a negative effect on promoter function. Some promoters also include the UP (upstream promoter) element (see Fig. 26–5). By convention, DNA sequences are shown as they exist in the nontemplate strand, with the 5′ terminus on the left. Nucleotides are numbered from the transcription start site, with positive numbers to the right (in the direction of transcription) and negative numbers to the left. N indicates any nucleotide.

FIGURE 28–3 Consensus sequence for promoters that regulate expression of the E. coli heat-shock genes. [Figure: 5′ DNA: TNTCNCCCTTGAA, N13–15, CCCCATTTA, N7, RNA start site, mRNA.] This system responds to temperature increases as well as some other environmental stresses, resulting in the induction of a set of proteins. Binding of RNA polymerase to heat-shock promoters is mediated by a specialized σ subunit of the polymerase, σ32, which replaces σ70 in the RNA polymerase initiation complex.

Repressors bind to specific sites on the DNA. In prokaryotic cells, such binding sites, called operators, are generally near a promoter. RNA polymerase binding, or its movement along the DNA after binding, is blocked when the repressor is present. Regulation by means of a repressor protein that blocks transcription is referred to as negative regulation. Repressor binding to DNA is regulated by a molecular signal (or effector), usually a small molecule or a protein, that binds to the repressor and causes a conformational change. The interaction between repressor and signal molecule either increases or decreases transcription.
In some cases, the conformational change results in dissociation of a DNA-bound repressor from the operator (Fig. 28–4a). Transcription initiation can then proceed unhindered. In other cases, interaction between an inactive repressor and the signal molecule causes the repressor to bind to the operator (Fig. 28–4b). In eukaryotic cells, the binding site for a repressor may be some distance from the promoter; binding has the same effect as in bacterial cells: inhibiting the assembly or activity of a transcription complex at the promoter.

Activators provide a molecular counterpoint to repressors: they bind to DNA and enhance the activity of RNA polymerase at a promoter. This is positive regulation. Activator binding sites are often adjacent to promoters that are bound weakly or not at all by RNA polymerase alone, such that little transcription occurs in the absence of the activator. Some eukaryotic activators bind to DNA sites, called enhancers, that are quite distant from the promoter, affecting the rate of transcription at a promoter that may be located thousands of base pairs away. Some activators are normally bound to DNA, enhancing transcription until dissociation of the activator is triggered by the binding of a signal molecule (Fig. 28–4c). In other cases the activator binds to DNA only after interaction with a signal molecule (Fig. 28–4d). Signal molecules can therefore increase or decrease transcription, depending on how they affect the activator. Positive regulation is particularly common in eukaryotes, as we shall see.

[Figure 28–4 diagram: promoter, operator, RNA polymerase, signal molecule, and mRNA (5′ to 3′) shown for four cases. Negative regulation, in which a bound repressor inhibits transcription: (a), (b). Positive regulation, in which a bound activator facilitates transcription: (c), (d). In (a) and (c) the molecular signal causes dissociation of the regulatory protein from DNA; in (b) and (d) it causes binding of the regulatory protein to DNA.]

FIGURE 28–4 Common patterns of regulation of transcription initiation. Two types of negative regulation are illustrated. (a) Repressor (pink) binds to the operator in the absence of the molecular signal; the external signal causes dissociation of the repressor to permit transcription. (b) Repressor binds in the presence of the signal; the repressor dissociates and transcription ensues when the signal is removed. Positive regulation is mediated by gene activators. Again, two types are shown. (c) Activator (green) binds in the absence of the molecular signal and transcription proceeds; when the signal is added, the activator dissociates and transcription is inhibited. (d) Activator binds in the presence of the signal; it dissociates only when the signal is removed. Note that “positive” and “negative” regulation refer to the type of regulatory protein involved: the bound protein either facilitates or inhibits transcription. In either case, addition of the molecular signal may increase or decrease transcription, depending on its effect on the regulatory protein.

Many Prokaryotic Genes Are Clustered and Regulated in Operons

Bacteria have a simple general mechanism for coordinating the regulation of genes encoding products that participate in a set of related processes: these genes are clustered on the chromosome and are transcribed together. Many prokaryotic mRNAs are polycistronic (multiple genes on a single transcript), and the single promoter that initiates transcription of the cluster is the site of regulation for expression of all the genes in the cluster. The gene cluster and promoter, plus additional sequences that function together in regulation, are called an operon (Fig. 28–5). Operons that include two to six genes transcribed as a unit are common; some operons contain 20 or more genes.
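The four patterns of Figure 28–4, together with the operon principle that one promoter controls a whole gene cluster, can be captured in a few lines of logic. The sketch below is a deliberately simplified on/off model (it ignores the incomplete, basal repression described later for the lac operon); the names `Regulator`, `binds_with_signal`, `Operon`, and `PATTERNS` are illustrative inventions, not standard terms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Regulator:
    kind: str                # "repressor" (negative) or "activator" (positive)
    binds_with_signal: bool  # True: binds DNA only when the signal is present

    def is_bound(self, signal_present: bool) -> bool:
        # Bound when the signal state matches the state the protein needs for binding.
        return signal_present == self.binds_with_signal

    def allows_transcription(self, signal_present: bool) -> bool:
        bound = self.is_bound(signal_present)
        if self.kind == "repressor":
            return not bound  # a bound repressor blocks transcription
        if self.kind == "activator":
            return bound      # a bound activator is needed for transcription
        raise ValueError(f"unknown regulator kind: {self.kind}")


@dataclass
class Operon:
    genes: List[str]
    regulator: Regulator

    def transcribe(self, signal_present: bool) -> List[str]:
        # One promoter, one polycistronic mRNA: every gene in the cluster or none.
        if self.regulator.allows_transcription(signal_present):
            return list(self.genes)
        return []


# The four panels of Fig. 28-4.
PATTERNS = {
    "a": Regulator("repressor", binds_with_signal=False),
    "b": Regulator("repressor", binds_with_signal=True),
    "c": Regulator("activator", binds_with_signal=False),
    "d": Regulator("activator", binds_with_signal=True),
}

if __name__ == "__main__":
    for panel, reg in PATTERNS.items():
        off_signal = reg.allows_transcription(signal_present=False)
        on_signal = reg.allows_transcription(signal_present=True)
        print(f"({panel}) {reg.kind:9s} no signal: {off_signal!s:5s} signal: {on_signal}")

    # A hypothetical three-gene operon like Fig. 28-5, regulated as in pattern (a).
    operon = Operon(["A", "B", "C"], PATTERNS["a"])
    print(operon.transcribe(signal_present=False))  # []
    print(operon.transcribe(signal_present=True))   # ['A', 'B', 'C']
```

The printed table restates the caption's point: adding the signal switches transcription on in (a) and (d) but off in (b) and (c), so the direction of the response depends on how the signal affects the regulatory protein, not on whether regulation is positive or negative.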
Many of the principles of prokaryotic gene expression were first defined by studies of lactose metabolism in E. coli, which can use lactose as its sole carbon source. In 1960, François Jacob and Jacques Monod published a short paper in the Proceedings of the French Academy of Sciences that described how two adjacent genes involved in lactose metabolism were coordinately regulated by a genetic element located at one end of the gene cluster. The genes were those for β-galactosidase, which cleaves lactose to galactose and glucose, and galactoside permease, which transports lactose into the cell (Fig. 28–6). The terms “operon” and “operator” were first introduced in this paper. With the operon model, gene regulation could, for the first time, be considered in molecular terms.

[Photographs: François Jacob; Jacques Monod, 1910–1976]

The lac Operon Is Subject to Negative Regulation

The lactose (lac) operon (Fig. 28–7a) includes the genes for β-galactosidase (Z), galactoside permease (Y), and thiogalactoside transacetylase (A). The last of these enzymes appears to modify toxic galactosides to facilitate their removal from the cell. Each of the three genes is preceded by a ribosome binding site (not shown in Fig. 28–7) that independently directs the translation of that gene (Chapter 27). Regulation of the lac operon by the lac repressor protein (Lac) follows the pattern outlined in Figure 28–4a.

The study of lac operon mutants has revealed some details of the workings of the operon’s regulatory system. In the absence of lactose, the lac operon genes are repressed. Mutations in the operator or in another gene, the I gene, result in constitutive synthesis of the gene products. When the I gene is defective, repression can be restored by introducing a functional I gene into the cell on another DNA molecule, demonstrating that the I gene encodes a diffusible molecule that causes gene repression. This molecule proved to be a protein, now called the Lac repressor, a tetramer of identical monomers. The operator to which it binds most tightly (O₁) abuts the transcription start site (Fig. 28–7a). The I gene is transcribed from its own promoter (P_I) independent of the lac operon genes. The lac operon has two secondary binding sites for the Lac repressor. One (O₂) is centered near position +410, within the gene encoding β-galactosidase (Z); the other (O₃) is near position −90, within the I gene.

[Figure 28–5 diagram: DNA with regulatory sequences (promoter, activator binding site, and repressor binding site, or operator) followed by genes A, B, and C, which are transcribed as a unit.]

FIGURE 28–5 Representative prokaryotic operon. Genes A, B, and C are transcribed on one polycistronic mRNA. Typical regulatory sequences include binding sites for proteins that either activate or repress transcription from the promoter.

[Figure 28–6 diagram: lactose crosses from outside to inside the cell via galactoside permease; β-galactosidase cleaves lactose to galactose and glucose and converts some lactose to allolactose.]

FIGURE 28–6 Lactose metabolism in E. coli. Uptake and metabolism of lactose require the activities of galactoside permease and β-galactosidase. Conversion of lactose to allolactose by transglycosylation is a minor reaction also catalyzed by β-galactosidase.
To repress the operon, the Lac repressor appears to bind to both the main operator and one of the two secondary sites, with the intervening DNA looped out (Fig. 28–7b, c). Either binding arrangement blocks transcription initiation.

[Figure 28–7a diagram: DNA with P_I and the I gene (encoding the Lac repressor), followed by P, O₁, and the Z, Y, and A genes, with the mRNA; O₃ lies within I and O₂ within Z. Panels (b) to (d) show the operators, the looped repressor-DNA complex, and the repressor structures described below.]

FIGURE 28–7 The lac operon. (a) The lac operon in the repressed state. The I gene encodes the Lac repressor. The lac Z, Y, and A genes encode β-galactosidase, galactoside permease, and thiogalactoside transacetylase, respectively. P is the promoter for the lac genes, and P_I is the promoter for the I gene. O₁ is the main operator for the lac operon; O₂ and O₃ are secondary operator sites of lesser affinity for the Lac repressor. (b) The Lac repressor binds to the main operator and O₂ or O₃, apparently forming a loop in the DNA that might wrap around the repressor as shown. (c) Lac repressor bound to DNA (derived from PDB ID 1LBG). This shows the protein (gray) bound to short, discontinuous segments of DNA (blue). (d) Conformational change in the Lac repressor caused by binding of the artificial inducer isopropylthiogalactoside, IPTG (derived from PDB IDs 1LBH and 1LBG). The structure of the tetrameric repressor is shown without IPTG bound (transparent image) and with IPTG bound (overlaid solid image; IPTG not shown). The DNA bound when IPTG is absent (transparent structure) is not shown. When IPTG is bound and DNA is not bound, the repressor’s DNA-binding domains are too disordered to be defined in the crystal structure.

Despite this elaborate binding complex, repression is not absolute. Binding of the Lac repressor reduces the rate of transcription initiation by a factor of 10³. If the O₂ and O₃ sites are eliminated by deletion or mutation, the binding of repressor to O₁ alone reduces transcription by a factor of about 10².
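The two repression factors just quoted can be compared directly. Writing r₀ for the rate of initiation with no repressor bound, r_full for the rate when all three operators are available, and r_O1 for the rate when only O₁ remains (symbols introduced here for this calculation only):

```latex
\[
  r_{\mathrm{full}} = \frac{r_0}{10^{3}}, \qquad
  r_{O_1} = \frac{r_0}{10^{2}}, \qquad
  \frac{r_{O_1}}{r_{\mathrm{full}}} = \frac{r_0/10^{2}}{r_0/10^{3}} = 10
\]
```

On these approximate figures, the secondary operators and the looped complex add roughly another tenfold repression beyond binding at O₁ alone, and neither arrangement reduces initiation to zero.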
Even in the repressed state, each cell has a few molecules of β-galactosidase and galactoside permease, presumably synthesized on the rare occasions when the repressor transiently dissociates from the operators. This basal level of transcription is essential to operon regulation.

When cells are provided with lactose, the lac operon is induced. An inducer (signal) molecule binds to a specific site on the Lac repressor, causing a conformational change (Fig. 28–7d) that results in dissociation of the repressor from the operator. The inducer in the lac operon system is not lactose itself but allolactose, an isomer of lactose (Fig. 28–6). After entry into the E. coli cell (via the few existing molecules of permease), lactose is converted to allolactose by one of the few existing β-galactosidase molecules. Release of the operator by Lac repressor, triggered as the repressor binds to allolactose, allows expression of the lac operon genes and leads to a 10³-fold increase in the concentration of β-galactosidase.

Several β-galactosides structurally related to allolactose are inducers of the lac operon but are not substrates for β-galactosidase; others are substrates but not inducers. One particularly effective and nonmetabolizable inducer of the lac operon that is often used experimentally is isopropylthiogalactoside (IPTG):

[Structure of isopropylthiogalactoside (IPTG)]

An inducer that cannot be metabolized allows researchers to explore the physiological function of lactose as a carbon source for growth, separate from its function in the regulation of gene expression.

In addition to the multitude of operons now known in bacteria, a few polycistronic operons have been found in the cells of lower eukaryotes. In the cells of higher eukaryotes, however, almost all protein-encoding genes are transcribed separately.
The mechanisms by which operons are regulated can vary significantly from the simple model presented in Figure 28–7. Even the lac operon is more complex than indicated here, with an activator also contributing to the overall scheme, as we shall see in Section 28.2. Before any further discussion of the layers of regulation of gene expression, however, we examine the critical molecular interactions between DNA-binding proteins (such as repressors and activators) and the DNA sequences to which they bind.

Regulatory Proteins Have Discrete DNA-Binding Domains

Regulatory proteins generally bind to specific DNA sequences. Their affinity for these target sequences is roughly 10⁴ to 10⁶ times higher than their affinity for any other DNA sequences. Most regulatory proteins have discrete DNA-binding domains containing substructures that interact closely and specifically with the DNA. These binding domains usually include one or more of a relatively small group of recognizable and characteristic structural motifs.

To bind specifically to DNA sequences, regulatory proteins must recognize surface features on the DNA. Most of the chemical groups that differ among the four bases and thus permit discrimination between base pairs are hydrogen-bond donor and acceptor groups exposed in the major groove of DNA (Fig. 28–8), and most of the protein-DNA contacts that impart specificity are hydrogen bonds. A notable exception is the nonpolar surface near C-5 of pyrimidines, where thymine is readily distinguished from cytosine by its protruding methyl group. Protein-DNA contacts are also possible in the minor groove of the DNA, but the hydrogen-bonding patterns here generally do not allow ready discrimination between base pairs.

[Figure 28–8 diagram: adenine–thymine, thymine–adenine, guanine–cytosine, and cytosine–guanine base pairs, each with its major-groove and minor-groove edges labeled.]

FIGURE 28–8 Groups in DNA available for protein binding. Shown here are functional groups on all four base pairs that are displayed in the major and minor grooves of DNA. Groups that can be used for base-pair recognition by proteins are shown in red.

Within regulatory proteins, the amino acid side chains most often hydrogen-bonding to bases in the DNA are those of Asn, Gln, Glu, Lys, and Arg residues. Is there a simple recognition code in which a particular amino acid always pairs with a particular base? The two hydrogen bonds that can form between Gln or Asn and the N⁶ and N-7 positions of adenine cannot form with any other base. Likewise, an Arg residue can form two hydrogen bonds with N-7 and O⁶ of guanine (Fig. 28–9). Examination of the structures of many DNA-binding proteins, however, has shown that a protein can recognize each base pair in more than one way, leading to the conclusion that there is no simple amino acid–base code. For some proteins, the Gln-adenine interaction can specify A=T base pairs, but in others a van der Waals pocket for the methyl group of thymine can recognize A=T base pairs. Researchers cannot yet examine the structure of a DNA-binding protein and infer the DNA sequence to which it binds.

To interact with bases in the major groove of DNA, a protein requires a relatively small structure that can stably protrude from the protein surface. The DNA-binding domains of regulatory proteins tend to be small (60 to 90 amino acid residues), and the structural motifs within these domains that are actually in contact with the DNA are smaller still. Many small proteins are unstable because of their limited capacity to form layers of structure to bury hydrophobic groups (p. 118).
The DNA-binding motifs provide either a very compact stable structure or a way of allowing a segment of protein to protrude from the protein surface.

The DNA-binding sites for regulatory proteins are often inverted repeats of a short DNA sequence (a palindrome) at which multiple (usually two) subunits of a regulatory protein bind cooperatively. The Lac repressor is unusual in that it functions as a tetramer, with two dimers tethered together at the end distant from the DNA-binding sites (Fig. 28–7b). An E. coli cell normally contains about 20 tetramers of the Lac repressor. Each of the tethered dimers binds to a palindromic operator sequence, in contact with 17 bp of a 22 bp region in the lac operon (Fig. 28–10), and the two dimers can bind independently, one generally to O₁ and the other to O₂ or O₃ (as in Fig. 28–7b). The symmetry of the O₁ operator sequence corresponds to the twofold axis of symmetry of two paired Lac repressor subunits. The tetrameric Lac repressor binds to its operator sequences in vivo with an estimated dissociation constant of about 10⁻¹⁰ M. The repressor discriminates between the operators and other sequences by a factor of about 10⁶, so binding to these few base pairs among the 4.6 million or so of the E. coli chromosome is highly specific.

Several DNA-binding motifs have been described, but here we focus on two that play prominent roles in the binding of DNA by regulatory proteins: the helix-turn-helix and the zinc finger. We also consider a type of DNA-binding domain, the homeodomain, found in some eukaryotic proteins.

Helix-Turn-Helix

This DNA-binding motif is crucial to the interaction of many prokaryotic regulatory proteins with DNA, and similar motifs occur in some eukaryotic regulatory proteins.
The helix-turn-helix motif comprises about 20 amino acids in two short α-helical segments, each seven to nine amino acid residues long, separated by a β turn (Fig. 28–11). This structure generally is not stable by itself; it is simply the reactive portion of a somewhat larger DNA-binding domain. One of the two α-helical segments is called the recognition helix, because it usually contains many of the amino acids that interact with the DNA in a sequence-specific way. This α helix is stacked on other segments of the protein structure so that it protrudes from the protein surface. When bound to DNA, the recognition helix is positioned in or nearly in the major groove. The Lac repressor has this DNA-binding motif (Fig. 28–11).

[Figure 28–9 diagram: a glutamine (or asparagine) side chain hydrogen-bonded to adenine of an A=T pair; an arginine side chain hydrogen-bonded to guanine of a G≡C pair.]

FIGURE 28–9 Two examples of specific amino acid–base pair interactions that have been observed in DNA-protein binding.

[Figure 28–10 diagram: 5′-TAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC-3′, with the −35 and −10 regions of the promoter (bound by RNA polymerase), the operator (bound by Lac repressor), and the RNA start site marked.]

FIGURE 28–10 Relationship between the lac operator sequence O₁ and the lac promoter. The bases shaded beige exhibit twofold (palindromic) symmetry about the axis indicated by the dashed vertical line.

FIGURE 28–11 Helix-turn-helix. (a) DNA-binding domain of the Lac repressor (PDB ID 1LCC). The helix-turn-helix motif is shown in red and orange; the DNA recognition helix is red. (b) Entire Lac repressor (derived from PDB ID 1LBG). The DNA-binding domains are gray, and the α helices involved in tetramerization are red. The remainder of the protein (shades of green) has the binding sites for allolactose. The allolactose-binding domains are linked to the DNA-binding domains through linker helices (yellow). (c) Surface rendering of the DNA-binding domain of the Lac repressor (gray) bound to DNA (blue). (d) The same DNA-binding domain as in (c), but separated from the DNA, with the binding interaction surfaces shown. Some groups on the protein and DNA that interact through hydrogen-bonding are shown in red; some groups that interact through hydrophobic interactions are in orange. This model shows only a few of the groups involved in sequence recognition. The complementary nature of the two surfaces is evident.

Zinc Finger

In a zinc finger, about 30 amino acid residues form an elongated loop held together at the base by a single Zn²⁺ ion, which is coordinated to four of the residues (four Cys, or two Cys and two His). The zinc does not itself interact with DNA; rather, the coordination of zinc with the amino acid residues stabilizes this small structural motif. Several hydrophobic side chains in the core of the structure also lend stability. Figure 28–12 shows the interaction between DNA and three zinc fingers of a single polypeptide from the mouse regulatory protein Zif268.

Many eukaryotic DNA-binding proteins contain zinc fingers. The interaction of a single zinc finger with DNA is typically weak, and many DNA-binding proteins, like Zif268, have multiple zinc fingers that substantially enhance binding by interacting simultaneously with the DNA. One DNA-binding protein of the frog Xenopus has 37 zinc fingers. There are few known examples of the zinc finger motif in prokaryotic proteins.

The precise manner in which proteins with zinc fingers bind to DNA differs from one protein to the next.
Some zinc fingers contain the amino acid residues that are important in sequence discrimination, whereas others appear to bind DNA nonspecifically (the amino acids required for specificity are located elsewhere in the protein). Zinc fingers can also function as RNA-binding motifs, for example in certain proteins that bind eukaryotic mRNAs and act as translational repressors. We discuss this role later (Section 28.3).

Homeodomain

Another type of DNA-binding domain has been identified in a number of proteins that function as transcriptional regulators, especially during eukaryotic development. This domain of 60 amino acids is called the homeodomain because it was discovered in homeotic genes (genes that regulate the development of body patterns). It is highly conserved and has now been identified in proteins from a wide variety of organisms, including humans (Fig. 28–13). The DNA-binding segment of the domain is related to the helix-turn-helix motif. The DNA sequence that encodes this domain is known as the homeobox.

Regulatory Proteins Also Have Protein-Protein Interaction Domains

Regulatory proteins contain domains not only for DNA binding but also for protein-protein interactions: with RNA polymerase, other regulatory proteins, or other subunits of the same regulatory protein. Examples include many eukaryotic transcription factors that function as gene activators, which often bind as dimers to the DNA, using DNA-binding domains that contain zinc fingers. Some structural domains are devoted to the interactions required for dimer formation, which is generally a prerequisite for DNA binding. Like DNA-binding motifs, the structural motifs that mediate protein-protein interactions tend to fall within one of a few common categories. Two important examples are the leucine zipper and the basic helix-loop-helix. Structural motifs such as these are the basis for classifying some regulatory proteins into structural families.

FIGURE 28–12 Zinc fingers. Three zinc fingers (gray) of the regulatory protein Zif268, complexed with DNA (blue and white) (PDB ID 1A1L). Each Zn²⁺ (maroon) coordinates with two His and two Cys residues (not shown).

FIGURE 28–13 Homeodomain. Shown here is a homeodomain bound to DNA; one of the α helices (red), stacked on two others, can be seen protruding into the major groove (PDB ID 1B8I). This is only a small part of the much larger protein Ultrabithorax (Ubx), active in the regulation of development in fruit flies.

Leucine Zipper

This motif is an amphipathic α helix with a series of hydrophobic amino acid residues concentrated on one side (Fig. 28–14), with the hydrophobic surface forming the area of contact between the two polypeptides of a dimer. A striking feature of these α helices is the occurrence of Leu residues at every seventh position, forming a straight line along the hydrophobic surface. Although researchers initially thought the Leu residues interdigitated (hence the name “zipper”), we now know that they line up side by side as the interacting α helices coil around each other (forming a coiled coil; Fig. 28–14b). Regulatory proteins with leucine zippers often have a separate DNA-binding domain with a high concentration of basic (Lys or Arg) residues that can interact with the negatively charged phosphates of the DNA backbone. Leucine zippers have been found in many eukaryotic and a few prokaryotic proteins.

Basic Helix-Loop-Helix

Another common structural motif occurs in some eukaryotic regulatory proteins implicated in the control of gene expression during the development of multicellular organisms. These proteins share a conserved region of about 50 amino acid residues important in both DNA binding and protein dimerization.
This region can form two short amphipathic α helices linked by a loop of variable length, the helix-loop-helix (distinct from the helix-turn-helix motif associated with DNA binding). The helix-loop-helix motifs of two polypeptides interact to form dimers (Fig. 28–15). In these proteins, DNA binding is mediated by an adjacent short amino acid sequence rich in basic residues, similar to the separate DNA-binding region in proteins containing leucine zippers.

FIGURE 28–14 Leucine zippers. (a) Comparison of amino acid sequences of several leucine zipper proteins. Note the Leu (L) residues at every seventh position in the zipper region, and the number of Lys (K) and Arg (R) residues in the DNA-binding region. (b) Leucine zipper from the yeast activator protein GCN4 (PDB ID 1YSA). Only the “zippered” α helices (gray and light blue), derived from different subunits of the dimeric protein, are shown. The two helices wrap around each other in a gently coiled coil. The interacting Leu residues are shown in red.

Subunit Mixing in Eukaryotic Regulatory Proteins

Several families of eukaryotic transcription factors have been defined based on close structural similarities. Within each family, dimers can sometimes form between two identical proteins (a homodimer) or between two different members of the family (a heterodimer). A hypothetical family of four different leucine-zipper proteins could thus form up to ten different dimeric species. In many cases, the different combinations appear to have distinct regulatory and functional properties.
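The figure of ten dimeric species follows from simple counting: n different proteins give n homodimers plus n(n − 1)/2 heterodimers, or n(n + 1)/2 in total. The sketch below enumerates them explicitly, assuming (as the phrase "up to" implies) that every pairing is possible; the protein names are placeholders.

```python
from itertools import combinations_with_replacement


def possible_dimers(proteins):
    """All unordered pairs, including each protein paired with itself (homodimers)."""
    return list(combinations_with_replacement(proteins, 2))


if __name__ == "__main__":
    family = ["P1", "P2", "P3", "P4"]  # hypothetical four-member leucine-zipper family
    dimers = possible_dimers(family)
    homo = [d for d in dimers if d[0] == d[1]]
    hetero = [d for d in dimers if d[0] != d[1]]
    print(len(homo), len(hetero), len(dimers))  # 4 6 10

    n = len(family)
    assert len(dimers) == n * (n + 1) // 2  # matches the closed-form count
```

Unordered pairs are used because a P1·P2 heterodimer and a P2·P1 heterodimer are the same species.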
[Figure 28–14a: aligned amino acid sequences of the DNA-binding region, a six-amino-acid connector, and the leucine zipper for the mammalian proteins C/EBP, Jun, and Fos and the yeast protein GCN4, with a consensus sequence; an invariant Asn in the DNA-binding region is marked.]

In addition to structural domains devoted to DNA binding and dimerization (or oligomerization), many regulatory proteins must interact with RNA polymerase, with unrelated regulatory proteins, or with both. At least three different types of additional domains for protein-protein interaction have been characterized (primarily in eukaryotes): glutamine-rich, proline-rich, and acidic domains, the names reflecting the amino acid residues that are especially abundant.

Protein-DNA binding interactions are the basis of the intricate regulatory circuits fundamental to gene function. We now turn to a closer examination of these gene regulatory schemes, first in prokaryotic, then in eukaryotic systems.

SUMMARY 28.1 Principles of Gene Regulation

■ The expression of genes is regulated by processes that affect the rates at which gene products are synthesized and degraded. Much of this regulation occurs at the level of transcription initiation, mediated by regulatory proteins that either repress transcription (negative regulation) or activate transcription (positive regulation) at specific promoters.
■ In bacteria, genes that encode products with interdependent functions are often clustered in an operon, a single transcriptional unit. Transcription of the genes is generally blocked by binding of a specific repressor protein at a DNA site called an operator. Dissociation of the repressor from the operator is mediated by a specific small molecule, an inducer. These principles were first elucidated in studies of the lactose (lac) operon. The Lac repressor dissociates from the lac operator when the repressor binds to its inducer, allolactose.
■ Regulatory proteins are DNA-binding proteins that recognize specific DNA sequences; most have distinct DNA-binding domains. Within these domains, common structural motifs that bind DNA are the helix-turn-helix, zinc finger, and homeodomain.
■ Regulatory proteins also contain domains for protein-protein interactions, including the leucine zipper and helix-loop-helix, which are involved in dimerization, and other motifs involved in activation of transcription.

28.2 Regulation of Gene Expression in Prokaryotes

As in many other areas of biochemical investigation, the study of the regulation of gene expression advanced earlier and faster in bacteria than in other experimental organisms. The examples of bacterial gene regulation presented here are chosen from among scores of well-studied systems, partly for their historical significance, but primarily because they provide a good overview of the range of regulatory mechanisms employed in prokaryotes. Many of the principles of prokaryotic gene regulation are also relevant to understanding gene expression in eukaryotic cells.

We begin by examining the lactose and tryptophan operons; each system has regulatory proteins, but the overall mechanisms of regulation are very different. This is followed by a short discussion of the SOS response in E. coli, illustrating how genes scattered throughout the genome can be coordinately regulated.
We then describe two prokaryotic systems of quite different types, illustrating the diversity of gene regulatory mechanisms: regulation of ribosomal protein synthesis at the level of translation, with many of the regulatory proteins binding to RNA (rather than DNA), and regulation of a process called phase variation in Salmonella, which results from genetic recombination. First, we return to the lac operon to examine its features in greater detail.

FIGURE 28–15 Helix-loop-helix. The human transcription factor Max, bound to its DNA target site (PDB ID 1HLO). The protein is dimeric; one subunit is colored. The DNA-binding segment (pink) merges with the first helix of the helix-loop-helix (red). The second helix merges with the carboxyl-terminal end of the subunit (purple). Interaction of the carboxyl-terminal helices of the two subunits describes a coiled coil very similar to that of a leucine zipper (see Fig. 28–14b), but with only one pair of interacting Leu residues (red side chains near the top) in this particular example. The overall structure is sometimes called a helix-loop-helix/leucine zipper motif.

The lac Operon Undergoes Positive Regulation

The operator-repressor-inducer interactions described earlier for the lac operon (Fig. 28–7) provide an intuitively satisfying model for an on/off switch in the regulation of gene expression. In truth, operon regulation is rarely so simple. A bacterium’s environment is too complex for its genes to be controlled by one signal. Other factors besides lactose affect the expression of the lac genes, such as the availability of glucose. Glucose, metabolized directly by glycolysis, is E. coli’s preferred energy source. Other sugars can serve as the main or sole nutrient, but extra steps are required to prepare them for entry into glycolysis, necessitating the synthesis of additional enzymes.
Clearly, expressing the genes for proteins that metabolize sugars such as lactose or arabinose is wasteful when glucose is abundant.

What happens to the expression of the lac operon when both glucose and lactose are present? A regulatory mechanism known as catabolite repression restricts expression of the genes required for catabolism of lactose, arabinose, and other sugars in the presence of glucose, even when these secondary sugars are also present. The effect of glucose is mediated by cAMP, as a coactivator, and an activator protein known as cAMP receptor protein, or CRP (the protein is sometimes called CAP, for catabolite gene activator protein). CRP is a homodimer (subunit Mr 22,000) with binding sites for DNA and cAMP. DNA binding is mediated by a helix-turn-helix motif within the protein's DNA-binding domain (Fig. 28–16). When glucose is absent, CRP-cAMP binds to a site near the lac promoter (Fig. 28–17a) and stimulates RNA transcription 50-fold. CRP-cAMP is therefore a positive regulatory element responsive to glucose levels, whereas the Lac repressor is a negative regulatory element responsive to lactose. The two act in concert. CRP-cAMP has little effect on the lac operon when the Lac repressor is blocking transcription, and dissociation of the repressor from the lac operator has little effect on transcription of the lac operon unless CRP-cAMP is present to facilitate transcription; when CRP is not bound, the wild-type lac promoter is a relatively weak promoter (Fig. 28–17b). The open complex of RNA polymerase and the promoter (see Fig. 26–6) does not form readily unless CRP-cAMP is present. CRP interacts directly with RNA polymerase (at the region shown in Fig. 28–16) through the polymerase's α subunit.

FIGURE 28–16 CRP homodimer. (PDB ID 1RUN) Bound molecules of cAMP are shown in red. Note the bending of the DNA around the protein.
The region that interacts with RNA polymerase is shaded yellow.

FIGURE 28–17 Activation of transcription of the lac operon by CRP. (a) The binding site for CRP-cAMP is near the promoter. As in the case of the lac operator, the CRP site has twofold symmetry (bases shaded beige) about the axis indicated by the dashed line. (b) Sequence of the lac promoter compared with the promoter consensus sequence. The differences mean that RNA polymerase binds relatively weakly to the lac promoter until the polymerase is activated by CRP-cAMP.
[Figure 28–17 sequences. (a) lac regulatory region, 5′→3′, spanning the CRP site, the promoter (−35 and −10 regions), and the operator: ATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACAC. (b) −35 region: TTTACA (lac promoter) vs. TTGACA (consensus); −10 region: TATGTT (lac promoter) vs. TATAAT (consensus).]

The effect of glucose on CRP is mediated by the cAMP interaction (Fig. 28–18). CRP binds to DNA most avidly when cAMP concentrations are high. In the presence of glucose, the synthesis of cAMP is inhibited and efflux of cAMP from the cell is stimulated. As [cAMP] declines, CRP binding to DNA declines, thereby decreasing the expression of the lac operon. Strong induction of the lac operon therefore requires both lactose (to inactivate the Lac repressor) and a lowered concentration of glucose (to trigger an increase in [cAMP] and increased binding of cAMP to CRP).

CRP and cAMP are involved in the coordinated regulation of many operons, primarily those that encode enzymes for the metabolism of secondary sugars such as lactose and arabinose. A network of operons with a common regulator is called a regulon. This arrangement, which allows for coordinated shifts in cellular functions that can require the action of hundreds of genes, is a major theme in the regulated expression of dispersed networks of genes in eukaryotes.
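The combined requirement for lactose and low glucose (Fig. 28–18) behaves like a two-input switch. The sketch below is a toy Boolean model, not a kinetic one: it assumes that lactose fully releases the Lac repressor, that glucose fully lowers [cAMP], and that a repressed operon gives no transcript at all. The only number taken from the text is the roughly 50-fold stimulation by CRP-cAMP; the basal value of 1.0 and the function name `lac_transcription` are hypothetical.

```python
def lac_transcription(lactose: bool, glucose: bool) -> float:
    """Toy relative lac transcription level (hypothetical units; basal = 1.0)."""
    # Lactose (via allolactose) releases the Lac repressor from the operator.
    repressor_bound = not lactose
    # Glucose lowers [cAMP]; CRP binds its site only as the CRP-cAMP complex.
    crp_camp_bound = not glucose
    if repressor_bound:
        return 0.0  # simplification: treat the repressed operon as fully off
    # Weak promoter alone, or about 50-fold more with CRP-cAMP bound.
    return 50.0 if crp_camp_bound else 1.0


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_transcription(lactose, glucose)
            print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only the lactose-present, glucose-absent combination gives strong induction. Returning exactly zero for the repressed state overstates repression, since a repressed operon is not completely silent in the cell.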
Other bacterial regulons include the heat-shock gene system that responds to changes in temperature (p. 1083) and the genes induced in E. coli as part of the SOS response to DNA damage, described later.

Many Genes for Amino Acid Biosynthetic Enzymes Are Regulated by Transcription Attenuation

The 20 common amino acids are required in large amounts for protein synthesis, and E. coli can synthesize all of them. The genes for the enzymes needed to synthesize a given amino acid are generally clustered in an operon and are expressed whenever existing supplies of that amino acid are inadequate for cellular requirements. When the amino acid is abundant, the biosynthetic enzymes are not needed and the operon is repressed.

The E. coli tryptophan (trp) operon (Fig. 28–19) includes five genes for the enzymes required to convert chorismate to tryptophan. Note that two of the enzymes catalyze more than one step in the pathway. The mRNA from the trp operon has a half-life of only about 3 min, allowing the cell to respond rapidly to changing needs for this amino acid. The Trp repressor is a homodimer, each subunit containing 107 amino acid residues (Fig. 28–20). When tryptophan is abundant, it binds to the Trp repressor, causing a conformational change that permits the repressor to bind to the trp operator and inhibit expression of the trp operon. The trp operator site overlaps the promoter, so binding of the repressor blocks binding of RNA polymerase.

Once again, this simple on/off circuit mediated by a repressor is not the entire regulatory story. Different cellular concentrations of tryptophan can vary the rate of synthesis of the biosynthetic enzymes over a 700-fold range. Once repression is lifted and transcription begins, the rate of transcription is fine-tuned by a second regulatory process, called transcription attenuation, in which transcription is initiated normally but is abruptly halted before the operon genes are transcribed.
The frequency with which transcription is attenuated is regulated by the availability of tryptophan and relies on the very close coupling of transcription and translation in bacteria.

The trp operon attenuation mechanism uses signals encoded in four sequences within a 162-nucleotide leader region at the 5′ end of the mRNA, preceding the initiation codon of the first gene (Fig. 28–21a). Within the leader lies a region known as the attenuator, made up of sequences 3 and 4. These sequences base-pair to form a G≡C-rich stem-and-loop structure closely followed by a series of U residues. The attenuator structure acts as a transcription terminator (Fig. 28–21b). Sequence 2 is an alternative complement for sequence 3 (Fig. 28–21c). If sequences 2 and 3 base-pair, the attenuator structure cannot form and transcription continues into the trp biosynthetic genes; the loop formed by the pairing of sequences 2 and 3 does not obstruct transcription.

FIGURE 28–18 Combined effects of glucose and lactose on expression of the lac operon. (a) High levels of transcription take place only when glucose concentrations are low (so cAMP levels are high and CRP-cAMP is bound) and lactose concentrations are high (so the Lac repressor is not bound). (b) Without bound activator (CRP-cAMP), the lac promoter is poorly transcribed even when lactose concentrations are high and the Lac repressor is not bound.

Regulatory sequence 1 is crucial for a tryptophan-sensitive mechanism that determines whether sequence 3 pairs with sequence 2 (allowing transcription to continue) or with sequence 4 (attenuating transcription).
Formation of the attenuator stem-and-loop structure depends on events that occur during translation of regulatory sequence 1, which encodes a leader peptide (so called because it is encoded by the leader region of the mRNA) of 14 amino acids, two of which are Trp residues. The leader peptide has no other known cellular function; its synthesis is simply an operon regulatory device.

[Figure 28–19 diagram: regulatory region (promoter P, operator O, leader trpL with attenuator) and regulated genes trpE, trpD, trpC, trpB, trpA, with trpR shown separately; below, the pathway chorismate → anthranilate → N-(5′-phosphoribosyl)anthranilate → enol-1-o-carboxyphenylamino-1-deoxyribulose phosphate → indole-3-glycerol phosphate → L-tryptophan.]

FIGURE 28–19 The trp operon. This operon is regulated by two mechanisms: when tryptophan levels are high, (1) the repressor (upper left) binds to its operator and (2) transcription of trp mRNA is attenuated (see Fig. 28–21). The biosynthesis of tryptophan by the enzymes encoded in the trp operon is diagrammed at the bottom (see also Fig. 22–17).

FIGURE 28–20 Trp repressor. The repressor is a dimer, with both subunits (gray and light blue) binding the DNA at helix-turn-helix motifs (PDB ID 1TRO). Bound molecules of tryptophan are in red.
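The choice controlled by sequence 1, whether sequence 3 pairs with sequence 2 or with sequence 4, can be written as a two-branch decision. The Python sketch below is a deliberately simplified model. It assumes, as described for Fig. 28–21, that a ribosome moving quickly through the two Trp codons covers sequence 2 before sequence 3 is synthesized, whereas a ribosome stalled at those codons leaves sequence 2 free. Treating charged tRNA^Trp as simply "abundant" or "scarce" is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttenuationResult:
    pairing: str        # which leader sequences base-pair: "2:3" or "3:4"
    transcribed: bool   # True if RNA polymerase continues into trpE...trpA


def ribosome_covers_seq2(trp_trna_abundant: bool) -> bool:
    # Abundant Trp-tRNA^Trp: the ribosome reads through the two Trp codons in
    # sequence 1 and moves onto sequence 2 before sequence 3 is made.
    # Scarce Trp-tRNA^Trp: the ribosome stalls in sequence 1; sequence 2 stays free.
    return trp_trna_abundant


def attenuate(trp_trna_abundant: bool) -> AttenuationResult:
    """Toy outcome for a single transcript (hypothetical helper)."""
    if ribosome_covers_seq2(trp_trna_abundant):
        # Sequence 3 can pair only with sequence 4: the terminator-like attenuator forms.
        return AttenuationResult(pairing="3:4", transcribed=False)
    # Sequence 3 pairs with sequence 2, so the attenuator cannot form.
    return AttenuationResult(pairing="2:3", transcribed=True)


if __name__ == "__main__":
    print("high Trp:", attenuate(True))
    print("low Trp: ", attenuate(False))
```

The function models one transcript. In the cell each transcript makes this choice independently, so what changes with tryptophan level is the fraction of transcripts attenuated, which an all-or-none function does not capture.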
[Figure 28–21 panels. (a) The trp mRNA leader, 5′→3′, with sequences 1 to 4 highlighted: pppAAGUUCACGUAAAAAGGGUAUCGACAAUGAAAGCAAUUUUCGUACUGAAAGGUUGGUGGCGCACUUCCUGAAACGGGCAGUGUAUUCACCAUGCGUAAAGCAAUCAGAUACCCAGCCCGCCUAAUGAGCGGGCUUUUUUUUGAACAAAAUUAGAGAAUAACAAUGCAAACA. Sequence 1 encodes the leader peptide Met-Lys-Ala-Ile-Phe-Val-Leu-Lys-Gly-Trp-Trp-Arg-Thr-Ser (stop); the site of transcription attenuation follows the 3:4 region, and the leader region ends where the TrpE polypeptide (Met-Gln-Thr...) begins. (b) Ribosome, RNA polymerase, and leader mRNA at high tryptophan (the ribosome covers sequence 2, the 3:4 attenuator forms, and transcription stops) and at low tryptophan (the ribosome pauses at the Trp codons, the 2:3 pair forms, and transcription continues into the trp-regulated genes). (c) Base-pairing diagrams of the 2:3 pair and the 3:4 pair (attenuator).]

FIGURE 28–21 Transcriptional attenuation in the trp operon. Transcription is initiated at the beginning of the 162-nucleotide mRNA leader encoded by a DNA region called trpL (see Fig. 28–19).
A regulatory mechanism determines whether transcription is attenuated at the end of the leader or continues into the structural genes. (a) The trp mRNA leader (trpL). The attenuation mechanism in the trp operon involves sequences 1 to 4 (highlighted). (b) Sequence 1 encodes a small peptide, the leader peptide, containing two Trp residues (W); it is translated immediately after transcription begins. Sequences 2 and 3 are complementary, as are sequences 3 and 4. The attenuator structure forms by the pairing of sequences 3 and 4 (top). Its structure and function are similar to those of a transcription terminator (see Fig. 26–7). Pairing of sequences 2 and 3 (bottom) prevents the attenuator structure from forming. Note that the leader peptide has no other cellular function. Translation of its open reading frame has a purely regulatory role that determines which complementary sequences (2 and 3 or 3 and 4) are paired. (c) Base-pairing schemes for the complementary regions of the trp mRNA leader.

The leader peptide is translated immediately after it is transcribed, by a ribosome that follows closely behind RNA polymerase as transcription proceeds.

When tryptophan concentrations are high, concentrations of charged tryptophan tRNA (Trp-tRNA^Trp) are also high. This allows translation to proceed rapidly past the two Trp codons of sequence 1 and into sequence 2, before sequence 3 is synthesized by RNA polymerase. In this situation, sequence 2 is covered by the ribosome and unavailable for pairing to sequence 3 when sequence 3 is synthesized; the attenuator structure (sequences 3 and 4) forms and transcription halts (Fig. 28–21b, top). When tryptophan concentrations are low, however, the ribosome stalls at the two Trp codons in sequence 1, because charged tRNA^Trp is less available.
Sequence 2 remains free while sequence 3 is synthesized, allowing these two sequences to base-pair and permitting transcription to proceed (Fig. 28–21b, bottom). In this way, the proportion of transcripts that are attenuated declines as tryptophan concentration declines.

Many other amino acid biosynthetic operons use a similar attenuation strategy to fine-tune biosynthetic enzymes to meet the prevailing cellular requirements. The 15-amino-acid leader peptide produced by the phe operon contains seven Phe residues. The leu operon leader peptide has four contiguous Leu residues. The leader peptide for the his operon contains seven contiguous His residues. In fact, in the his operon and a number of others, attenuation is sufficiently sensitive to be the only regulatory mechanism.

Induction of the SOS Response Requires Destruction of Repressor Proteins

Extensive DNA damage in the bacterial chromosome triggers the induction of many distantly located genes. This response, called the SOS response (p. 976), provides another good example of coordinated gene regulation. Many of the induced genes are involved in DNA repair (see Table 25–6). The key regulatory proteins are the RecA protein and the LexA repressor.

The LexA repressor (Mr 22,700) inhibits transcription of all the SOS genes (Fig. 28–22), and induction of the SOS response requires removal of LexA. This is not a simple dissociation from DNA in response to binding of a small molecule, as in the regulation of the lac operon described above. Instead, the LexA repressor is inactivated when it catalyzes its own cleavage at a specific Ala–Gly peptide bond, producing two roughly equal protein fragments. At physiological pH, this autocleavage reaction requires the RecA protein. RecA is not a protease in the classical sense, but its interaction with LexA facilitates the repressor's self-cleavage reaction. This function of RecA is sometimes called a coprotease activity.

The RecA protein provides the functional link between the biological signal (DNA damage) and induction of the SOS genes. Heavy DNA damage leads to numerous single-strand gaps in the DNA, and only RecA that is bound to single-stranded DNA can facilitate cleavage of the LexA repressor (Fig. 28–22, bottom). Binding of RecA at the gaps eventually activates its coprotease activity, leading to cleavage of the LexA repressor and SOS induction.

[Figure 28–22 diagram: SOS genes scattered around the E. coli chromosome (lexA, dinF, uvrA, polB, dinB, uvrB, sulA, umuC,D, recA), each with a LexA operator site, shown before and after RecA-facilitated proteolysis of the LexA repressor.]

FIGURE 28–22 SOS response in E. coli. See Table 25–6 for the functions of many of these proteins. The LexA protein is the repressor in this system, which has an operator site (red) near each gene. Because the recA gene is not entirely repressed by the LexA repressor, the normal cell contains about 1,000 RecA monomers. (1) When DNA is extensively damaged (e.g., by UV light), DNA replication is halted and the number of single-strand gaps in the DNA increases. (2) RecA protein binds to this damaged, single-stranded DNA, activating the protein's coprotease activity. (3) While bound to DNA, the RecA protein facilitates cleavage and inactivation of the LexA repressor. When the repressor is inactivated, the SOS genes, including recA, are induced; RecA levels increase 50- to 100-fold.

During induction of the SOS response in a severely damaged cell, RecA also facilitates the cleavage, and thus the inactivation, of the repressors that otherwise allow propagation of certain viruses in a dormant lysogenic state within the bacterial host. This provides a remarkable illustration of evolutionary adaptation.
These repressors, like LexA, also undergo self-cleavage at a specific Ala–Gly peptide bond, so induction of the SOS response permits replication of the virus and lysis of the cell, releasing new viral particles. Thus the bacteriophage can make a hasty exit from a compromised bacterial host cell.

Synthesis of Ribosomal Proteins Is Coordinated with rRNA Synthesis

In bacteria, an increased cellular demand for protein synthesis is met by increasing the number of ribosomes rather than altering the activity of individual ribosomes. In general, the number of ribosomes increases as the cellular growth rate increases. At high growth rates, ribosomes make up approximately 45% of the cell's dry weight. The proportion of cellular resources devoted to making ribosomes is so large, and the function of ribosomes so important, that cells must coordinate the synthesis of the ribosomal components: the ribosomal proteins (r-proteins) and RNAs (rRNAs). This regulation is distinct from the mechanisms described so far, because it occurs largely at the level of translation.

The 52 genes that encode the r-proteins occur in at least 20 operons, each with 1 to 11 genes. Some of these operons also contain the genes for the subunits of DNA primase (see Fig. 25–13), RNA polymerase (see Fig. 26–4), and protein synthesis elongation factors (see Fig. 27–23), revealing the close coupling of replication, transcription, and protein synthesis during cell growth.

The r-protein operons are regulated primarily through a translational feedback mechanism. One r-protein encoded by each operon also functions as a translational repressor, which binds to the mRNA transcribed from that operon and blocks translation of all the genes the messenger encodes (Fig. 28–23). In general, the r-protein that plays the role of repressor also binds directly to an rRNA.
Each translational repressor r-protein binds with higher affinity to the appropriate rRNA than to its mRNA, so the mRNA is bound and translation repressed only when the level of the r-protein exceeds that of the rRNA. This ensures that translation of the mRNAs encoding r-proteins is repressed only when synthesis of these r-proteins exceeds that needed to make functional ribosomes. In this way, the rate of r-protein synthesis is kept in balance with rRNA availability.

The mRNA binding site for the translational repressor is near the translational start site of one of the genes in the operon, usually the first gene (Fig. 28–23). In other operons this would affect only that one gene, because in bacterial polycistronic mRNAs most genes have independent translation signals. In the r-protein operons, however, the translation of one gene depends on the translation of all the others. The mechanism of this translational coupling is not yet understood in detail. However, in some cases the translation of multiple genes appears to be blocked by folding of the mRNA into an elaborate three-dimensional structure that is stabilized both by internal base-pairing (as in Fig. 8–26) and by binding of the translational repressor protein. When the translational repressor is absent, ribosome binding and translation of one or more of the genes disrupt the folded structure of the mRNA and allow all the genes to be translated.

Because the synthesis of r-proteins is coordinated with the available rRNA, the regulation of ribosome production reflects the regulation of rRNA synthesis. In E. coli, rRNA synthesis from the seven rRNA operons responds to cellular growth rate and to changes in the availability of crucial nutrients, particularly amino acids. The regulation coordinated with amino acid concentrations is known as the stringent response (Fig. 28–24). When amino acid concentrations are low, rRNA synthesis is halted.
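The translational feedback rule and the halt in rRNA synthesis interlock: once rRNA stops being made, any newly made repressor r-protein has no rRNA left to bind and turns off its own operon. The Python sketch below is a toy stoichiometric model of that coupling. It assumes one repressor r-protein per rRNA molecule, strictly preferential binding to rRNA, and an all-or-none translational block whenever any repressor r-protein is left unbound; the unit time steps, starting amounts, and function names are hypothetical.

```python
from typing import Tuple


def rprotein_mrna_translated(rprotein: int, rrna: int) -> bool:
    # The repressor r-protein binds rRNA in preference to its own mRNA,
    # so only r-protein in excess of rRNA is free to bind the mRNA.
    free_rprotein = max(0, rprotein - rrna)
    return free_rprotein == 0


def simulate(steps: int, amino_acids_plentiful: bool,
             rprotein: int = 10, rrna: int = 10) -> Tuple[int, int]:
    """Return (r-protein, rRNA) amounts after a number of unit time steps."""
    for _ in range(steps):
        if amino_acids_plentiful:
            rrna += 1  # rRNA synthesis continues; it halts under starvation
        if rprotein_mrna_translated(rprotein, rrna):
            rprotein += 1
    return rprotein, rrna


if __name__ == "__main__":
    print("amino acids plentiful:", simulate(5, True))   # (15, 15)
    print("amino acid starvation:", simulate(5, False))  # (11, 10)
```

Under starvation, r-protein synthesis stops after a single surplus unit, which is the sense in which ribosome production tracks rRNA synthesis. The model ignores the direct effect of an amino acid shortage on translation itself.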
Amino acid starvation leads to the binding of uncharged tRNAs to the ribosomal A site; this triggers a sequence of events that begins with the binding of an enzyme called stringent factor (RelA protein) to the ribosome. When bound to the ribosome, stringent factor catalyzes formation of the unusual nucleotide guanosine tetraphosphate (ppGpp; see Fig. 8–42); it adds pyrophosphate to the 3′ position of GTP, in the reaction

$$\text{GTP} + \text{ATP} \longrightarrow \text{pppGpp} + \text{AMP}$$

A phosphohydrolase then cleaves off one phosphate to form ppGpp. The abrupt rise in ppGpp level in response to amino acid starvation results in a great reduction in rRNA synthesis, mediated at least in part by the binding of ppGpp to RNA polymerase.

[Figure 28–23 diagram: gene maps of five r-protein operons, among them the str, S10, and spc operons.]

FIGURE 28–23 Translational feedback in some ribosomal protein operons. The r-proteins that act as translational repressors are shaded pink. Each translational repressor blocks the translation of all genes in that operon by binding to the indicated site on the mRNA. Genes that encode subunits of RNA polymerase are shaded yellow; genes that encode elongation factors are blue. The r-proteins of the large (50S) ribosomal subunit are designated L1 to L34; those of the small (30S) subunit, S1 to S21.

FIGURE 28–24 Stringent response in E. coli. This response to amino acid starvation is triggered by binding of an uncharged tRNA in the ribosomal A site. A protein called stringent factor binds to the ribosome and catalyzes the synthesis of pppGpp, which is converted by a phosphohydrolase to ppGpp.
The signal ppGpp reduces transcription of some genes and increases that of others, in part by binding to the β subunit of RNA polymerase and altering the enzyme's promoter specificity. Synthesis of rRNA is reduced when ppGpp levels increase.

The nucleotide ppGpp, along with cAMP, belongs to a class of modified nucleotides that act as cellular second messengers (p. 302). In E. coli, these two nucleotides serve as starvation signals; they cause large changes in cellular metabolism by increasing or decreasing the transcription of hundreds of genes. In eukaryotic cells, similar nucleotide second messengers also have multiple regulatory functions. The coordination of cellular metabolism with cell growth is highly complex, and further regulatory mechanisms undoubtedly remain to be discovered.

Some Genes Are Regulated by Genetic Recombination

Salmonella typhimurium, which inhabits the mammalian intestine, moves by rotating the flagella on its cell surface (Fig. 28–25). The many copies of the protein flagellin (Mr 53,000) that make up the flagella are prominent targets of mammalian immune systems. But Salmonella cells have a mechanism that evades the immune response: they switch between two distinct flagellin proteins (FljB and FliC) roughly once every 1,000 generations, using a process called phase variation.

The switch is accomplished by periodic inversion of a segment of DNA containing the promoter for a flagellin gene. The inversion is a site-specific recombination reaction (see Fig. 25–39) mediated by the Hin recombinase at specific 14-bp sequences (hix sequences) at either end of the DNA segment. When the DNA segment is in one orientation, the gene for FljB flagellin and the gene encoding a repressor (FljA) are expressed (Fig. 28–26a); the repressor shuts down expression of the gene for FliC flagellin. When the DNA segment is inverted (Fig. 28–26b), the fljA and fljB genes are no longer transcribed, and the fliC gene is induced as the repressor becomes depleted. The Hin recombinase, encoded by the hin gene in the DNA segment that undergoes inversion, is expressed when the DNA segment is in either orientation, so the cell can always switch from one state to the other.

FIGURE 28–25 Salmonella typhimurium, with flagella evident.

FIGURE 28–26 Regulation of flagellin genes in Salmonella: phase variation. The products of genes fliC and fljB are different flagellins. The hin gene encodes the recombinase that catalyzes inversion of the DNA segment containing the fljB promoter and the hin gene. The recombination sites (inverted repeats) are called hix (yellow). (a) In one orientation, fljB is expressed along with a repressor protein (product of the fljA gene) that represses transcription of the fliC gene. (b) In the opposite orientation only the fliC gene is expressed; the fljA and fljB genes cannot be transcribed. The interconversion between these two states, known as phase variation, also requires two other nonspecific DNA-binding proteins (not shown), HU (histonelike protein from U13, a strain of E. coli) and FIS (factor for inversion stimulation).

This type of regulatory mechanism has the advantage of being absolute: gene expression is impossible when the gene is physically separated from its promoter (note the position of the fljB promoter in Fig. 28–26b).
An absolute on/off switch may be important in this system (even though it affects only one of the two flagellin genes), because a flagellum with just one copy of the wrong flagellin might be vulnerable to host antibodies against that protein. The Salmonella system is by no means unique. Similar regulatory systems occur in a number of other bacteria and in some bacteriophages, and recombination systems with similar functions have been found in eukaryotes (Table 28–1). Gene regulation by DNA rearrangements that move genes and/or promoters is particularly common in pathogens that benefit by changing their host range or by changing their surface proteins, thereby staying ahead of host immune systems.

SUMMARY 28.2 Regulation of Gene Expression in Prokaryotes

■ In addition to repression by the Lac repressor, the E. coli lac operon undergoes positive regulation by the cAMP receptor protein (CRP). When [glucose] is low, [cAMP] is high and CRP-cAMP binds to a specific site on the DNA, stimulating transcription of the lac operon and production of lactose-metabolizing enzymes. The presence of glucose depresses [cAMP], decreasing expression of lac and other genes involved in metabolism of secondary sugars. A group of coordinately regulated operons is referred to as a regulon.
■ Operons that produce the enzymes of amino acid synthesis have a regulatory circuit called attenuation, which uses a transcription termination site (the attenuator) in the mRNA. Formation of the attenuator is modulated by a mechanism that couples transcription and translation while responding to small changes in amino acid concentration.
■ In the SOS system, multiple unlinked genes repressed by a single repressor are induced simultaneously when DNA damage triggers RecA protein–facilitated autocatalytic proteolysis of the repressor.
■ In the synthesis of ribosomal proteins, one protein in each r-protein operon acts as a translational repressor.
The mRNA is bound by the repressor, and translation is blocked only when the r-protein is present in excess of available rRNA.
■ Some genes are regulated by genetic recombination processes that move promoters relative to the genes being regulated. Regulation can also take place at the level of translation. These diverse mechanisms permit very sensitive cellular responses to environmental change.

TABLE 28–1 Examples of Gene Regulation by Recombination

| System | Recombinase/recombination site | Type of recombination | Function |
|---|---|---|---|
| Phase variation (Salmonella) | Hin/hix | Site-specific | Alternative expression of two flagellin genes allows evasion of host immune response. |
| Host range (bacteriophage μ) | Gin/gix | Site-specific | Alternative expression of two sets of tail fiber genes affects host range. |
| Mating-type switch (yeast) | HO endonuclease, RAD52 protein, other proteins/MAT | Nonreciprocal gene conversion* | Alternative expression of two mating types of yeast, a and α, creates cells of different mating types that can mate and undergo meiosis. |
| Antigenic variation (trypanosomes)† | Varies | Nonreciprocal gene conversion* | Successive expression of different genes encoding the variable surface glycoproteins (VSGs) allows evasion of host immune response. |

* In nonreciprocal gene conversion (a class of recombination events not discussed in Chapter 25), genetic information is moved from one part of the genome (where it is silent) to another (where it is expressed). The reaction is similar to replicative transposition (see Fig. 25–43).
† Trypanosomes cause African sleeping sickness and other diseases (see Box 22–2). The outer surface of a trypanosome is made up of multiple copies of a single VSG, the major surface antigen. A cell can change surface antigens to more than 100 different forms, precluding an effective defense by the host immune system.
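The Salmonella phase-variation switch (Fig. 28–26), the first entry in Table 28–1, can be sketched as a two-state stochastic model. This Python toy assumes a fixed per-generation inversion probability of 1/1,000, taken from the "roughly once every 1,000 generations" estimate in the text, and it ignores the lag while FljA repressor is diluted out after an inversion; the function names are hypothetical.

```python
import random
from typing import List, Optional


def expressed_flagellin(fljB_orientation: bool) -> str:
    # Orientation (a): the invertible segment's promoter drives fljB and fljA,
    # and FljA represses fliC. Orientation (b): fljB and fljA are silent, fliC is on.
    return "FljB" if fljB_orientation else "FliC"


def phase_variation(generations: int, switch_prob: float = 1 / 1000,
                    seed: Optional[int] = None) -> List[str]:
    """Flagellin expressed in each generation of a single lineage (toy model)."""
    rng = random.Random(seed)
    fljB_orientation = True  # arbitrary starting state
    history = []
    for _ in range(generations):
        # Hin recombinase is made in both orientations, so inversion at the
        # hix sites can occur in every generation, in either direction.
        if rng.random() < switch_prob:
            fljB_orientation = not fljB_orientation
        history.append(expressed_flagellin(fljB_orientation))
    return history


if __name__ == "__main__":
    lineage = phase_variation(10_000, seed=1)
    switches = sum(a != b for a, b in zip(lineage, lineage[1:]))
    print(f"switches in 10,000 generations: {switches}")
```

With the default probability, a lineage of 10,000 generations is expected to switch on the order of ten times; the exact count depends on the seed. Because Hin works in both directions, the model never gets stuck in one state.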
28.3 Regulation of Gene Expression in Eukaryotes

Initiation of transcription is a crucial regulation point for both prokaryotic and eukaryotic gene expression. Although some of the same regulatory mechanisms are used in both systems, there is a fundamental difference in the regulation of transcription in eukaryotes and bacteria.

We can define a transcriptional ground state as the inherent activity of promoters and transcriptional machinery in vivo in the absence of regulatory sequences. In bacteria, RNA polymerase generally has access to every promoter and can bind and initiate transcription at some level of efficiency in the absence of activators or repressors; the transcriptional ground state is therefore nonrestrictive. In eukaryotes, however, strong promoters are generally inactive in vivo in the absence of regulatory proteins; that is, the transcriptional ground state is restrictive. This fundamental difference gives rise to at least four important features that distinguish the regulation of gene expression in eukaryotes from that in bacteria.

First, access to eukaryotic promoters is restricted by the structure of chromatin, and activation of transcription is associated with many changes in chromatin structure in the transcribed region. Second, although eukaryotic cells have both positive and negative regulatory mechanisms, positive mechanisms predominate in all systems characterized so far. Thus, given that the transcriptional ground state is restrictive, virtually every eukaryotic gene requires activation to be transcribed. Third, eukaryotic cells have larger, more complex multimeric regulatory proteins than do bacteria. Finally, transcription in the eukaryotic nucleus is separated from translation in the cytoplasm in both space and time.

The complexity of regulatory circuits in eukaryotic cells is extraordinary, as the following discussion shows.
We conclude the section with an illustrated description of one of the most elaborate circuits: the regulatory cascade that controls development in fruit flies.

Transcriptionally Active Chromatin Is Structurally Distinct from Inactive Chromatin

The effects of chromosome structure on gene regulation in eukaryotes have no clear parallel in prokaryotes. In the eukaryotic cell cycle, interphase chromosomes appear, at first viewing, to be dispersed and amorphous (see Figs 12–41, 24–25). Nevertheless, several forms of chromatin can be found along these chromosomes. About 10% of the chromatin in a typical eukaryotic cell is in a more condensed form than the rest of the chromatin. This form, heterochromatin, is transcriptionally inactive. Heterochromatin is generally associated with particular chromosome structures, such as the centromeres. The remaining, less condensed chromatin is called euchromatin.

Transcription of a eukaryotic gene is strongly repressed when its DNA is condensed within heterochromatin. Some, but not all, of the euchromatin is transcriptionally active. Transcriptionally active chromosomal regions can be detected by their increased sensitivity to nuclease-mediated degradation. Nucleases such as DNase I tend to cleave the DNA of carefully isolated chromatin into fragments that are multiples of about 200 bp, reflecting the regular repeating structure of the nucleosome (see Fig. 24–26). In actively transcribed regions, the fragments produced by nuclease activity are smaller and more heterogeneous in size. These regions contain hypersensitive sites, sequences especially sensitive to DNase I. Each consists of about 100 to 200 bp within the 1,000 bp flanking the 5′ ends of transcribed genes. In some genes, hypersensitive sites are found farther from the 5′ end, near the 3′ end, or even within the gene itself.
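The ~200-bp ladder follows directly from the repeating nucleosome structure. The toy simulation below is a sketch built on simplifying assumptions that are ours, not the text's. It assumes a perfectly regular array with a 200-bp repeat. It also assumes a nuclease that cuts only in linker DNA, with each linker cut independently with probability `cut_prob`. Under those assumptions every fragment is a whole multiple of the repeat length. The smaller, more heterogeneous fragments from actively transcribed regions are what one would expect once that regular spacing is disrupted.

```python
import random

REPEAT_BP = 200  # assumed nucleosome repeat (core DNA + linker), the text's "about 200 bp"


def digest_fragments(n_nucleosomes: int, cut_prob: float, seed: int = 0) -> list:
    """Return fragment lengths (bp) from a partial digest of a regular nucleosome array.

    Assumptions: nucleosomes are evenly spaced every REPEAT_BP, the nuclease
    cuts only in linker DNA (at multiples of REPEAT_BP), and each linker is cut
    independently with probability cut_prob.
    """
    rng = random.Random(seed)
    total_bp = n_nucleosomes * REPEAT_BP
    cut_sites = [i * REPEAT_BP for i in range(1, n_nucleosomes) if rng.random() < cut_prob]
    boundaries = [0] + cut_sites + [total_bp]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


fragments = digest_fragments(n_nucleosomes=50, cut_prob=0.3)
# Express each fragment as a number of repeats (1 = mononucleosome, 2 = dinucleosome, ...).
ladder = sorted({length // REPEAT_BP for length in fragments})
print("fragment sizes, in nucleosome repeats:", ladder)
```

Raising `cut_prob` toward 1 shifts the ladder toward mononucleosome-sized fragments, much as a longer digestion would.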
Many hypersensitive sites correspond to binding sites for known regulatory proteins, and the relative absence of nucleosomes in these regions may allow the binding of these proteins. Nucleosomes are entirely absent in some regions that are very active in transcription, such as the rRNA genes. Transcriptionally active chromatin tends to be deficient in histone H1, which binds to the linker DNA between nucleosome particles.

Histones within transcriptionally active chromatin and heterochromatin also differ in their patterns of covalent modification. The core histones of nucleosome particles (H2A, H2B, H3, H4; see Fig. 24–27) are modified by irreversible methylation of Lys residues, phosphorylation of Ser or Thr residues, acetylation (see below), or attachment of ubiquitin (see Fig. 27–41). Each of the core histones has two distinct structural domains. A central domain is involved in histone-histone interaction and the wrapping of DNA around the nucleosome. A second, lysine-rich amino-terminal domain is generally positioned near the exterior of the assembled nucleosome particle; the covalent modifications occur at specific residues concentrated in this amino-terminal domain. The patterns of modification have led some researchers to propose the existence of a histone code, in which modification patterns are recognized by enzymes that alter the structure of chromatin. Modifications associated with transcriptional activation would be recognized by enzymes that make the chromatin more accessible to the transcription machinery.

5-Methylation of cytosine residues of CpG sequences is common in eukaryotic DNA (p. 296), but DNA in transcriptionally active chromatin tends to be undermethylated.
Furthermore, CpG sites in particular genes are more often undermethylated in cells from tissues where the genes are expressed than in those where the genes are not expressed. The overall pattern suggests that active chromatin is prepared for transcription by the removal of potential structural barriers.

Chromatin Is Remodeled by Acetylation and Nucleosomal Displacements

The detailed mechanisms for transcription-associated structural changes in chromatin, called chromatin remodeling, are now coming to light. Researchers have identified a variety of enzymes directly implicated in the process. These include enzymes that covalently modify the core histones of the nucleosome and others that use the chemical energy of ATP to remodel nucleosomes on the DNA (Table 28–2).

The acetylation and deacetylation of histones figure prominently in the processes that activate chromatin for transcription. As noted above, the amino-terminal domains of the core histones are generally rich in Lys residues. Particular Lys residues are acetylated by histone acetyltransferases (HATs). Cytosolic (type B) HATs acetylate newly synthesized histones before the histones are imported into the nucleus. The subsequent assembly of the histones into chromatin is facilitated by additional proteins: CAF1 for H3 and H4, and NAP1 for H2A and H2B. (See Table 28–2 for an explanation of some of these abbreviated names.)

Where chromatin is being activated for transcription, the nucleosomal histones are further acetylated by nuclear (type A) HATs. Acetylation neutralizes the positive charge on a Lys side chain, weakening its interaction with the negatively charged DNA backbone. The acetylation of multiple Lys residues in the amino-terminal domains of histones H3 and H4 can therefore reduce the affinity of the entire nucleosome for DNA. Acetylation may also prevent or promote interactions with other proteins involved in transcription or its regulation.
When transcription of a gene is no longer required, histone deacetylases reduce the acetylation of nucleosomes in that vicinity. This deacetylation is part of a general gene-silencing process that restores the chromatin to a transcriptionally inactive state. In addition to the removal of certain acetyl groups, new covalent modification of histones marks chromatin as transcriptionally inactive. For example, the Lys residue at position 9 in histone H3 is often methylated in heterochromatin.

Chromatin remodeling also requires protein complexes that actively move or displace nucleosomes, hydrolyzing ATP in the process (Table 28–2). The enzyme complex SWI/SNF, found in all eukaryotic cells, contains 11 polypeptides (total Mr 2 × 10⁶). Together these polypeptides create hypersensitive sites in the chromatin and stimulate the binding of transcription factors. SWI/SNF is not required for the transcription of every gene. NURF is another ATP-dependent enzyme complex that remodels chromatin in ways that complement and overlap the activity of SWI/SNF. These enzyme complexes play an important role in preparing a region of chromatin for active transcription.

Many Eukaryotic Promoters Are Positively Regulated

As already noted, eukaryotic RNA polymerases have little or no intrinsic affinity for their promoters. Initiation of transcription is almost always dependent on the action of multiple activator proteins. One important reason for the apparent predominance of positive regulation seems obvious. The storage of DNA within chromatin effectively renders most promoters inaccessible, so genes are normally silent in the absence of other regulation.
The structure of chromatin affects access to some promoters more than others. Even so, repressors that bind to DNA so as to preclude access of RNA polymerase (negative regulation) would often be simply redundant.

TABLE 28–2 Some Enzyme Complexes Catalyzing Chromatin Structural Changes Associated with Transcription

| Enzyme complex* | Oligomeric structure (number of polypeptides) | Source | Activities |
|---|---|---|---|
| GCN5-ADA2-ADA3 | 3 | Yeast | GCN5 has type A HAT activity |
| SAGA/PCAF | >20 | Eukaryotes | Includes GCN5-ADA2-ADA3 |
| SWI/SNF | 11; total Mr 2 × 10⁶ | Eukaryotes | ATP-dependent nucleosome remodeling |
| NURF | 4; total Mr 500,000 | Drosophila | ATP-dependent nucleosome remodeling |
| CAF1 | >2 | Humans; Drosophila | Responsible for binding histones H3 and H4 to DNA |
| NAP1 | 1; Mr 125,000 | Widely distributed in eukaryotes | Responsible for binding histones H2A and H2B to DNA |

*The abbreviations for eukaryotic genes and proteins are often more confusing or obscure than those used for bacteria. The complex of GCN5 (general control nonderepressible) and ADA (alteration/deficiency activation) proteins was discovered during investigation of the regulation of nitrogen metabolism genes in yeast. These proteins can be part of the larger SAGA complex (SPT, ADA2,3, GCN5, acetyltransferase) in yeasts. The equivalent of SAGA in humans is PCAF (p300/CBP-associated factor). SWI (switching) was discovered as a protein required for expression of certain genes involved in mating-type switching in yeast. SNF (sucrose nonfermenting) was discovered as a factor for expression of the yeast gene for sucrase. Subsequent studies revealed multiple SWI and SNF proteins that acted in a complex. The SWI/SNF complex has a role in the expression of a wide range of genes and has been found in many eukaryotes, including humans. NURF is nucleosome remodeling factor; CAF1, chromatin assembly factor; and NAP1, nucleosome assembly protein.
Other factors are at play in the use of positive regulation, and speculation generally centers on two of them. The first is the large size of eukaryotic genomes; the second is the greater efficiency of positive regulation.

First, nonspecific DNA binding of regulatory proteins becomes a more important problem in the much larger genomes of higher eukaryotes. The chance that a single specific binding sequence will occur randomly at an inappropriate site also increases with genome size. Specificity for transcriptional activation can be improved if each of several positive-regulatory proteins must bind specific DNA sequences and then form a complex in order to become active. The average number of regulatory sites for a gene in a multicellular organism is probably at least five. The requirement for binding of several positive-regulatory proteins to specific DNA sequences vastly reduces the probability that a functional juxtaposition of all the necessary binding sites will occur at random.

In principle, a similar strategy could be used by multiple negative-regulatory elements. This brings us to the second reason for the use of positive regulation: it is simply more efficient. Suppose the 30,000 to 35,000 genes in the human genome were negatively regulated. Each cell would then have to synthesize, at all times, this same number of different repressors (or many times this number if multiple regulatory elements were used at each promoter). The repressors would also have to be present in concentrations sufficient to permit specific binding to each “unwanted” gene. In positive regulation, most of the genes are normally inactive (that is, RNA polymerases do not bind to the promoters). The cell synthesizes only the activator proteins needed to promote transcription of the subset of genes required in the cell at that time. These arguments notwithstanding, there are examples of negative regulation in eukaryotes, from yeast to humans, as we shall see.
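The genome-size argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a random genome with equal base frequencies and counts both strands. As a deliberate simplification, it treats several required binding sites as if they had to lie at fixed positions relative to one another, so that together they behave like one longer site. Real regulatory sites are degenerate and flexibly spaced, so the absolute numbers are only illustrative. The genome sizes (about 4.6 × 10⁶ bp for E. coli and 3.2 × 10⁹ bp for a haploid human genome) are approximate values supplied for the example.

```python
def expected_random_sites(genome_bp: float, site_bp: int, n_sites: int = 1) -> float:
    """Expected number of chance occurrences of a composite binding site.

    Assumptions: every base is A, C, G, or T with probability 1/4,
    positions are independent, both strands are searched, and the n_sites
    sequences (each site_bp long) must sit at fixed relative positions.
    """
    specified_bases = n_sites * site_bp
    return 2 * genome_bp / 4 ** specified_bases


GENOMES = {"E. coli (~4.6e6 bp)": 4.6e6, "human (~3.2e9 bp)": 3.2e9}  # approximate sizes

for name, size in GENOMES.items():
    for n in (1, 2, 3):
        hits = expected_random_sites(size, site_bp=6, n_sites=n)
        print(f"{name}: {n} required 6-bp site(s) -> ~{hits:.3g} chance matches")
```

Under these assumptions, the number of chance matches for a single 6-bp site grows in proportion to genome size. Each additional required site divides that number by 4⁶ = 4,096.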
DNA-Binding Transactivators and Coactivators Facilitate Assembly of the General Transcription Factors

To continue our exploration of the regulation of gene expression in eukaryotes, we return to the interactions between promoters and RNA polymerase II (Pol II), the enzyme responsible for the synthesis of eukaryotic mRNAs. Most (but not all) Pol II promoters include the TATA box and Inr (initiator) sequences, with their standard spacing (see Fig. 26–8). Beyond these, promoters vary greatly in both the number and the location of additional sequences required for the regulation of transcription. These additional regulatory sequences are usually called enhancers in higher eukaryotes and upstream activator sequences (UASs) in yeast. A typical enhancer may be found hundreds or even thousands of base pairs upstream from the transcription start site. It may even be downstream, within the gene itself. When bound by the appropriate regulatory proteins, an enhancer increases transcription at nearby promoters regardless of its orientation in the DNA. The UASs of yeast function in a similar way, although generally they must be positioned upstream and within a few hundred base pairs of the transcription start site. An average Pol II promoter may be affected by a half-dozen regulatory sequences of this type, and even more complex promoters are quite common.

Successful binding of active RNA polymerase II holoenzyme at one of its promoters usually requires the action of three other types of proteins (Fig. 28–27):

(1) basal transcription factors (see Fig. 26–9, Table 26–1), required at every Pol II promoter;
(2) DNA-binding transactivators, which bind to enhancers or UASs and facilitate transcription; and
(3) coactivators.

Coactivators act indirectly, not by binding to the DNA. They are required for essential communication between the DNA-binding transactivators and the complex composed of Pol II and the general transcription factors.
Furthermore, a variety of repressor proteins can interfere with communication between the RNA polymerase and the DNA-binding transactivators, resulting in repression of transcription (Fig. 28–27b). Here we focus on the protein complexes shown in Figure 28–27 and on how they interact to activate transcription.

TATA-Binding Protein. The first component to bind in the assembly of a preinitiation complex at the TATA box of a typical Pol II promoter is the TATA-binding protein (TBP). The complete complex includes the basal (or general) transcription factors TFIIB, TFIIE, TFIIF, TFIIH; Pol II; and perhaps TFIIA (not all of the factors are shown in Fig. 28–27). This minimal preinitiation complex, however, is often insufficient for the initiation of transcription. It generally does not form at all if the promoter is obscured within chromatin. Positive regulation leading to transcription is imposed by the transactivators and coactivators.

DNA-Binding Transactivators. The requirements for transactivators vary greatly from one promoter to another. A few transactivators are known to facilitate transcription at hundreds of promoters, whereas others are specific for a few promoters. Many transactivators are sensitive to the binding of signal molecules, providing the capacity to activate or deactivate transcription in response to a changing cellular environment. Some enhancers bound by DNA-binding transactivators are quite distant from the promoter's TATA box. How do the transactivators function at a distance? The answer in most cases seems to be that, as indicated earlier, the intervening DNA is looped so that the various protein complexes can interact directly. The looping is promoted by certain nonhistone proteins that are abundant in chromatin and bind nonspecifically to DNA. These high mobility group (HMG) proteins (Fig.
28–27; “high mobility” refers to their electrophoretic mobility in polyacrylamide gels) play an important structural role in chromatin remodeling and transcriptional activation.

Coactivator Protein Complexes. Most transcription requires the presence of additional protein complexes. Some major regulatory protein complexes that interact with Pol II have been defined both genetically and biochemically. These coactivator complexes act as intermediaries between the DNA-binding transactivators and the Pol II complex.

The best-characterized coactivator is the transcription factor TFIID (Fig. 28–27). In eukaryotes, TFIID is a large complex that includes TBP and ten or more TBP-associated factors (TAFs). Some TAFs resemble histones and may play a role in displacing nucleosomes during the activation of transcription. Many DNA-binding transactivators aid in transcription initiation by interacting with one or more TAFs. The requirement for TAFs to initiate transcription can vary greatly from one gene to another. Some promoters require TFIID, some do not, and some require only subsets of the TFIID TAF subunits.

Another important coactivator consists of 20 or more polypeptides in a protein complex called mediator (Fig. 28–27); the 20 core polypeptides are highly conserved from fungi to humans. Mediator binds tightly to the carboxyl-terminal domain (CTD) of the largest subunit of Pol II. The mediator complex is required for both basal and regulated transcription at promoters used by Pol II. It also stimulates the phosphorylation of the CTD by TFIIH. Both mediator and TFIID are required at some promoters. As with TFIID, some DNA-binding transactivators interact with one or more components of the mediator complex. Coactivator complexes function at or near the promoter's TATA box.

Choreography of Transcriptional Activation. We can now begin to piece together the sequence of transcriptional activation events at a typical Pol II promoter.
First, crucial remodeling of the chromatin takes place in stages. Some DNA-binding transactivators have significant affinity for their binding sites even when the sites are within condensed chromatin. Binding of one transactivator may facilitate the binding of others, gradually displacing some nucleosomes. The bound transactivators can then interact directly with HATs or enzyme complexes such as SWI/SNF (or both), accelerating the remodeling of the surrounding chromatin. In this way a bound transactivator can draw in other components necessary for further chromatin remodeling to permit transcription of specific genes. The bound transactivators, generally acting through complexes such as TFIID or mediator (or both), stabilize the binding of Pol II and its associated transcription factors. They also greatly facilitate formation of the preinitiation transcription complex. Complexity in these regulatory circuits is the rule rather than the exception, with multiple DNA-bound transactivators promoting transcription.

[Figure 28–27: (a) a composite promoter (UAS, enhancers, TATA, Inr) bound by HMG proteins, DNA-binding transactivators, coactivators, TBP/TFIID, mediator, and the RNA polymerase II complex with its CTD; (b) the same promoter with a repressor blocking activation.]

FIGURE 28–27 Eukaryotic promoters and regulatory proteins. RNA polymerase II and its associated general transcription factors form a preinitiation complex at the TATA box and Inr site of the cognate promoters, a process facilitated by DNA-binding transactivators, acting through TFIID and/or mediator. (a) A composite promoter with typical sequence elements and protein complexes found in both yeast and higher eukaryotes. The carboxyl-terminal domain (CTD) of Pol II (see Fig. 26–9) is an important point of interaction with mediator and other protein complexes. Not shown are the protein complexes required for histone acetylation and chromatin remodeling.
For the DNA-binding transactivators, DNA-binding domains are shown in green, activation domains in pink. The interactions symbolized by blue arrows are discussed in the text. (b) A wide variety of eukaryotic transcriptional repressors function by a range of mechanisms. Some bind directly to DNA, displacing a protein complex required for activation; others interact with various parts of the transcription or activation complexes to prevent activation. Possible points of interaction are indicated with red arrows.

The script can change from one promoter to another, but most promoters seem to require a precisely ordered assembly of components to initiate transcription. The assembly process is not always fast. At some genes it may take minutes; at certain genes in higher eukaryotes the process can take days.

Reversible Transcriptional Activation. Some eukaryotic regulatory proteins that bind to Pol II promoters can act as repressors, inhibiting the formation of active preinitiation complexes (Fig. 28–27b), although such repressors are less common than activators. Some transactivators can adopt different conformations, enabling them to serve as transcriptional activators or repressors. For example, some steroid hormone receptors (described later) function in the nucleus as DNA-binding transactivators. They stimulate the transcription of certain genes when a particular steroid hormone signal is present. When the hormone is absent, the receptor proteins revert to a repressor conformation, preventing the formation of preinitiation complexes. In some cases this repression involves interaction with histone deacetylases and other proteins that help restore the surrounding chromatin to its transcriptionally inactive state.

The Genes of Galactose Metabolism in Yeast Are Subject to Both Positive and Negative Regulation

Some of the general principles described above can be illustrated by one well-studied eukaryotic regulatory circuit (Fig.
28–28). The enzymes required for the importation and metabolism of galactose in yeast are encoded by genes scattered over several chromosomes (Table 28–3). Each of the GAL genes is transcribed separately, and yeast cells have no operons like those in bacteria. However, all the GAL genes have similar promoters and are regulated coordinately by a common set of proteins. The promoters for the GAL genes consist of the TATA box and Inr sequences. They also include an upstream activator sequence (UASG) recognized by a DNA-binding transcriptional activator known as Gal4 protein (Gal4p). Regulation of gene expression by galactose entails an interplay between Gal4p and two other proteins, Gal80p and Gal3p (Fig. 28–28). Gal80p forms a complex with Gal4p, preventing Gal4p from functioning as an activator of the GAL promoters. When galactose is present, it binds Gal3p, which then interacts with Gal80p, allowing Gal4p to function as an activator at the various GAL promoters.

Other protein complexes also have a role in activating transcription of the GAL genes. These may include the SAGA complex for histone acetylation, the SWI/SNF complex for nucleosome remodeling, and the mediator complex. Figure 28–29 provides an idea of the complexity of protein interactions in the overall process of transcriptional activation in eukaryotic cells.

Glucose is the preferred carbon source for yeast, as it is for bacteria. When glucose is present, most of the GAL genes are repressed, whether galactose is present or not. The GAL regulatory system described above is effectively overridden by a complex catabolite repression system that includes several proteins (not depicted in Fig. 28–29).

[Figure 28–28: a GAL promoter (UASG, TATA, Inr) with Gal4p held inactive by Gal80p; binding of galactose to Gal3p frees Gal4p to act through an intermediary complex (TFIID or mediator) on TBP and the RNA polymerase II complex, with HMG proteins bound to the DNA.]

FIGURE 28–28 Regulation of transcription at genes of galactose metabolism in yeast. Galactose is imported into the cell and converted to glucose 6-phosphate by a pathway involving six enzymes whose genes are scattered over three chromosomes (see Table 28–3). Transcription of these genes is regulated by the combined actions of the proteins Gal4p, Gal80p, and Gal3p, with Gal4p playing the central role of DNA-binding transactivator. The Gal4p-Gal80p complex is inactive in gene activation. Binding of galactose to Gal3p and its interaction with Gal80p produce a conformational change in Gal80p that allows Gal4p to function in transcription activation.

DNA-Binding Transactivators Have a Modular Structure

DNA-binding transactivators typically have a distinct structural domain for specific DNA binding. They also have one or more additional domains for transcriptional activation or for interaction with other regulatory proteins. Interaction of two regulatory proteins is often mediated by domains containing leucine zippers (Fig. 28–14) or helix-loop-helix motifs (Fig. 28–15). We consider here three distinct types of structural domains used in activation by DNA-binding transactivators (Fig. 28–30a): those of Gal4p, Sp1, and CTF1.

Gal4p contains a zinc fingerlike structure in its DNA-binding domain, near the amino terminus; this domain has six Cys residues that coordinate two Zn2+. The protein functions as a homodimer, with dimerization mediated by interactions between two coiled coils. It binds to UASG, a palindromic DNA sequence about 17 bp long. Gal4p has a separate activation domain with many acidic amino acid residues. In some experiments, a variety of different peptide sequences have been substituted for the acidic activation domain of Gal4p. These experiments suggest that the acidic nature of this domain is critical to its function, although its precise amino acid sequence can vary considerably.
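The Gal4p/Gal80p/Gal3p circuit, together with the glucose override, behaves like a small logic gate. The sketch below is a deliberately coarse Boolean model built only from the relationships stated in the text. The class and function names are ours and hypothetical. Treating glucose as an absolute off switch is a simplification, because the text says only that most GAL genes are repressed when glucose is present.

```python
from dataclasses import dataclass


@dataclass
class Medium:
    """Hypothetical description of the carbon sources available to the cell."""
    glucose: bool
    galactose: bool


def gal80p_blocks_gal4p(galactose: bool) -> bool:
    # Galactose binds Gal3p; galactose-bound Gal3p interacts with Gal80p and
    # relieves its inhibition of Gal4p. Without galactose, Gal80p keeps blocking Gal4p.
    gal3p_has_galactose = galactose
    return not gal3p_has_galactose


def gal_genes_on(medium: Medium) -> bool:
    """Coarse Boolean sketch of GAL gene induction in yeast."""
    if medium.glucose:
        # Catabolite repression overrides the Gal4p/Gal80p/Gal3p switch
        # (modeled here as complete repression).
        return False
    # Gal4p, bound at UASG, can activate only when Gal80p is not blocking it.
    return not gal80p_blocks_gal4p(medium.galactose)


for glucose in (False, True):
    for galactose in (False, True):
        state = gal_genes_on(Medium(glucose=glucose, galactose=galactose))
        print(f"glucose={glucose}, galactose={galactose}: GAL genes on = {state}")
```

Only the galactose-without-glucose condition turns the model's GAL genes on. This captures, in a simplified way, the combination of negative control by Gal80p and positive control by Gal4p.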
Sp1 (Mr 80,000) is a DNA-binding transactivator for a large number of genes in higher eukaryotes. Its DNA binding site, the GC box (consensus sequence GGGCGG), is usually quite near the TATA box. The DNA-binding domain of the Sp1 protein is near its carboxyl terminus and contains three zinc fingers. Two other domains in Sp1 function in activation. They are notable in that 25% of their amino acid residues are Gln. A wide variety of other activator proteins also have these glutamine-rich domains.

[Figure 28–29: stepwise assembly at a GAL promoter. Gal4p binds UASG; GCN5-ADA2-ADA3, SWI/SNF, and HMG proteins act on the chromatin; TBP and TFIIA bind the TATA box; mediator joins; and the RNA polymerase II complex assembles with TFIIB, TFIIE, TFIIF, and TFIIH.]

FIGURE 28–29 Protein complexes involved in transcription activation of a group of related eukaryotic genes. The GAL system illustrates the complexity of this process, but not all these protein complexes are yet known to affect GAL gene transcription. Note that many of the complexes (such as SWI/SNF, GCN5-ADA2-ADA3, and mediator) affect the transcription of many genes. The complexes assemble stepwise. First the DNA-binding transactivators bind, then the additional protein complexes needed to remodel the chromatin and allow transcription to begin.

TABLE 28–3 Genes of Galactose Metabolism in Yeast

| Gene | Protein function | Chromosomal location | Protein size (number of residues) | Relative expression, glucose | Relative expression, glycerol | Relative expression, galactose |
|---|---|---|---|---|---|---|
| Regulated genes | | | | | | |
| GAL1 | Galactokinase | II | 528 | − | − | +++ |
| GAL2 | Galactose permease | XII | 574 | − | − | +++ |
| PGM2 | Phosphoglucomutase | XIII | 569 | + | + | ++ |
| GAL7 | Galactose 1-phosphate uridylyltransferase | II | 365 | − | − | +++ |
| GAL10 | UDP-glucose 4-epimerase | II | 699 | − | − | +++ |
| MEL1 | α-Galactosidase | II | 453 | − | + | ++ |
| Regulatory genes | | | | | | |
| GAL3 | Inducer | IV | 520 | − | + | ++ |
| GAL4 | Transcriptional activator | XVI | 881 | +/− | + | + |
| GAL80 | Transcriptional inhibitor | XIII | 435 | + | + | ++ |

Source: Adapted from Reece, R. & Platt, A. (1997) Signaling activation and repression of RNA polymerase II transcription in yeast. BioEssays 19, 1001–1010.

CCAAT-binding transcription factor 1 (CTF1) belongs to a family of DNA-binding transactivators that bind a sequence called the CCAAT site. Its consensus sequence is TGGN6GCCAA, where N is any nucleotide. The DNA-binding domain of CTF1 contains many basic amino acid residues, and the binding region is probably arranged as an α helix. This protein has neither a helix-turn-helix nor a zinc finger motif; its DNA-binding mechanism is not yet clear. CTF1 has a proline-rich activation domain, with Pro accounting for more than 20% of the amino acid residues.

The discrete activation and DNA-binding domains of regulatory proteins often act completely independently, as has been demonstrated in “domain-swapping” experiments. Genetic engineering techniques (Chapter 9) can join the proline-rich activation domain of CTF1 to the DNA-binding domain of Sp1. The resulting protein, like normal Sp1, binds to GC boxes on the DNA and activates transcription at a nearby promoter (as in Fig. 28–30b). The DNA-binding domain of Gal4p has similarly been replaced experimentally with the DNA-binding domain of the prokaryotic LexA repressor (of the SOS response; Fig. 28–22). This chimeric protein neither binds at UASG nor activates the yeast GAL genes (as would normal Gal4p) unless the UASG sequence in the DNA is replaced by the LexA recognition site.
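Consensus notation such as TGGN6GCCAA compresses a family of sequences into one string. The sketch below accepts only the letters A, C, G, T and N, with an optional count after N. It turns that notation into a Python regular expression and scans one strand of a made-up sequence for a GC box and a CCAAT site. Matching the consensus exactly is a simplification, because natural binding sites often deviate from it. A real search would also scan the complementary strand.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert consensus notation (e.g. 'TGGN6GCCAA') into a regex pattern.

    'N' is any nucleotide; 'N6' is six unspecified nucleotides. Only A, C, G,
    T and N (optionally followed by a count) are accepted.
    """
    if not re.fullmatch(r"(?:[ACGT]|N\d*)+", consensus):
        raise ValueError(f"unsupported consensus notation: {consensus!r}")
    pattern = []
    for base, count in re.findall(r"([ACGTN])(\d*)", consensus):
        if base == "N":
            pattern.append("[ACGT]" + (f"{{{count}}}" if count else ""))
        else:
            pattern.append(base)
    return "".join(pattern)


def find_sites(dna: str, consensus: str) -> list:
    """Return (start, matched_sequence) for every match on the given strand, overlaps included."""
    pattern = consensus_to_regex(consensus)
    # A zero-width lookahead lets overlapping matches be reported.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", dna)]


# Hypothetical test sequence containing one GC box and one CCAAT site.
dna = "AAGGGCGGTATGGACGTACGCCAATT"
print("GC box:", find_sites(dna, "GGGCGG"))
print("CCAAT site:", find_sites(dna, "TGGN6GCCAA"))
```

The same converter handles the spaced notation used for hormone response elements (for example, AGGTCAN3AGGTCA). It does not handle bracketed alternatives such as (A/T).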
Eukaryotic Gene Expression Can Be Regulated by Intercellular and Intracellular Signals

The effects of steroid hormones (and of thyroid and retinoid hormones, which have the same mode of action) provide additional well-studied examples of the modulation of eukaryotic regulatory proteins by direct interaction with molecular signals (see Fig. 12–40). Unlike other types of hormones, steroid hormones do not have to bind to plasma membrane receptors. Instead, they can interact with intracellular receptors that are themselves transcriptional transactivators. Steroid hormones too hydrophobic to dissolve readily in the blood (estrogen, progesterone, and cortisol, for example) travel on specific carrier proteins from their point of release to their target tissues. In the target tissue, the hormone passes through the plasma membrane by simple diffusion and binds to its specific receptor protein in the nucleus. The hormone-receptor complex acts by binding to highly specific DNA sequences called hormone response elements (HREs), thereby altering gene expression. Hormone binding triggers changes in the conformation of the receptor proteins so that they become capable of interacting with additional transcription factors. The bound hormone-receptor complex can either enhance or suppress the expression of adjacent genes.

The DNA sequences (HREs) to which hormone-receptor complexes bind are similar in length and arrangement for the various steroid hormones, but differ in sequence. Each receptor has a consensus HRE sequence (Table 28–4) to which the hormone-receptor complex binds well. Each consensus consists of two six-nucleotide sequences, either contiguous or separated by a short spacer of a few nucleotides, in tandem or in a palindromic arrangement.
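The two arrangements of half-sites differ in how the second six-nucleotide sequence relates to the first. In a tandem (direct) repeat, the second half-site is the same sequence read in the same direction. In a palindromic (inverted) repeat, it is the reverse complement of the first, so it reads identically on the opposite strand. The sketch below classifies an element on that basis. It assumes perfect 6-nt half-sites and uses made-up spacers. Real HREs, including several consensus sequences in Table 28–4, match these ideal forms only approximately.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_hre(element: str, half_len: int = 6) -> Tuple[str, int]:
    """Classify half-site + spacer + half-site as a tandem or palindromic repeat.

    Assumes perfectly matching half-sites of length half_len at the two ends
    of the element; returns the arrangement and the spacer length.
    """
    spacer = len(element) - 2 * half_len
    if spacer < 0:
        raise ValueError("element is shorter than two half-sites")
    first, second = element[:half_len], element[half_len + spacer:]
    if second == first:
        return "tandem (direct repeat)", spacer
    if second == reverse_complement(first):
        return "palindromic (inverted repeat)", spacer
    return "unclassified", spacer


# Hypothetical elements built from the half-sites AGGTCA and AGAACA.
print(classify_hre("AGGTCA" + "AGG" + "AGGTCA"))  # same half-site twice, 3-nt spacer
print(classify_hre("AGAACA" + "GCA" + reverse_complement("AGAACA")))  # inverted half-sites
print(classify_hre("AGGTCA" + "AGGTCA"))  # contiguous half-sites, no spacer
```

Because each monomer of the receptor dimer recognizes one half-site, the arrangement and spacer length help determine which receptor dimer can bind.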
The hormone receptors have a highly conserved DNA-binding domain with two zinc fingers (Fig. 28–31). The hormone-receptor complex binds to the DNA as a dimer, with the zinc finger domains of each monomer recognizing one of the six-nucleotide sequences. Whether a given hormone can alter the expression of a specific gene through its hormone-receptor complex depends on several features of the HRE. These are its exact sequence, its position relative to the gene, and the number of HREs associated with the gene.

Unlike the DNA-binding domain, the ligand-binding region of the receptor protein is quite specific to the particular receptor. This region is always at the carboxyl terminus. In the ligand-binding region, the glucocorticoid receptor is only 30% similar to the estrogen receptor and 17% similar to the thyroid hormone receptor. The size of the amino-terminal region (the transcription activation region of Fig. 28–31) varies dramatically; in the vitamin D receptor it has only 25 amino acid residues, whereas in the mineralocorticoid receptor it has 603 residues.

[Figure 28–30: (a) CTF1 bound at a CCAAT site, Gal4p at UASG, and Sp1 at a GC box, their activation domains acting on TFIID, TBP, and TFIIH at the TATA/Inr region, with HMG proteins bound to the DNA; (b) a chimeric Sp1/CTF1 protein bound at a GC box.]

FIGURE 28–30 DNA-binding transactivators. (a) Typical DNA-binding transactivators such as CTF1, Gal4p, and Sp1 have a DNA-binding domain and an activation domain. The nature of the activation domain is indicated by symbols: − − −, acidic; Q Q Q, glutamine-rich; P P P, proline-rich. Some or all of these proteins may activate transcription by interacting with intermediary complexes such as TFIID or mediator. Note that the binding sites illustrated here are not generally found together near a single gene. (b) A chimeric protein containing the DNA-binding domain of Sp1 and the activation domain of CTF1 activates transcription if a GC box is present.
Mu- tations that change one amino acid in these regions can result in loss of responsiveness to a specific hormone. Some humans unable to respond to cortisol, testos- terone, vitamin D, or thyroxine have mutations of this type. Regulation Can Result from Phosphorylation of Nuclear Transcription Factors We noted in Chapter 12 that the effects of insulin on gene expression are mediated by a series of steps lead- ing ultimately to the activation of a protein kinase in the nucleus that phosphorylates specific DNA-binding pro- teins and thereby alters their ability to act as tran- scription factors (see Fig. 12–6). This general mecha- nism mediates the effects of many nonsteroid hormones. For example, the H9252-adrenergic pathway that leads to el- evated levels of cytosolic cAMP, which acts as a second messenger in eukaryotes as well as in prokaryotes (see Figs 12–12, 28–18), also affects the transcription of a set of genes, each of which is located near a specific DNA sequence called a cAMP response element (CRE). The catalytic subunit of protein kinase A, released when cAMP levels rise (see Fig. 12–15), enters the nucleus and phosphorylates a nuclear protein, the CRE-binding protein (CREB). When phosphorylated, CREB binds to CREs near certain genes and acts as a transcription fac- tor, turning on the expression of these genes. Many Eukaryotic mRNAs Are Subject to Translational Repression Regulation at the level of translation assumes a much more prominent role in eukaryotes than in bacteria and is observed in a range of cellular situations. In contrast to the tight coupling of transcription and translation in bac- teria, the transcripts generated in a eukaryotic nucleus 28.3 Regulation of Gene Expression in Eukaryotes 1109 Receptor Consensus sequence bound * Androgen GG(A/T)ACAN 2 TGTTCT Glucocorticoid GGTACAN 3 TGTTCT Retinoic acid (some) AGGTCAN 5 AGGTCA Vitamin D AGGTCAN 3 AGGTCA Thyroid hormone AGGTCAN 3 AGGTCA RX ? 
AGGTCANAGGTCANAGGTCANAGGTCA * N represents any nucleotide. ? Forms a dimer with the retinoic acid receptor or vitamin D receptor. TABLE 28–4 Hormone Response Elements (HREs) Bound by Steroid-Type Hormone Receptors 60 N K D I T C C Zn R R K S C C MKETRY KAFFKRSIQGHNDYM RLRKCYEVGMMKGGIRKDRRGG Y G S A Y D N C V A C Zn H Y G V W S C E G C Q N T A P Q A COO H11002 H 3 N H11001 20 10 50 70 8030 40 Hormone binding (variable sequence and length) DNA binding (66–68 residues, highly conserved) Transcription activation (variable sequence and length) FIGURE 28–31 Typical steroid hormone receptors. These receptor proteins have a binding site for the hormone, a DNA-binding domain, and a region that activates transcription of the regulated gene. The highly conserved DNA-binding domain has two zinc fingers. The sequence shown here is that for the estrogen receptor, but the residues in bold type are common to all steroid hormone receptors. 8885d_c28_1081-1119 2/12/04 2:28 PM Page 1109 mac76 mac76:385_reb: must be processed and transported to the cytoplasm be- fore translation. This can impose a significant delay on the appearance of a protein. When a rapid increase in protein production is needed, a translationally repressed mRNA already in the cytoplasm can be activated for translation without delay. Translational regulation may play an especially important role in regulating certain very long eukaryotic genes (a few are measured in the millions of base pairs), for which transcription and mRNA processing can require many hours. Some genes are regulated at both the transcriptional and transla- tional stages, with the latter playing a role in the fine- tuning of cellular protein levels. In some anucleate cells, such as reticulocytes (immature erythrocytes), tran- scriptional control is entirely unavailable and transla- tional control of stored mRNAs becomes essential. 
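To put a rough number on "many hours," the estimate below borrows the ubx figures given later in this chapter (a 77,000 bp gene transcribed in nearly an hour). It is a back-of-the-envelope sketch, not a measurement: it assumes, hypothetically, that the same elongation rate holds for a 1,000,000 bp gene, and it ignores mRNA processing and export entirely.

```latex
\begin{aligned}
v &\approx \frac{77{,}000\ \text{nt}}{60\ \text{min}} \approx 1{,}300\ \text{nt/min} \approx 21\ \text{nt/s},\\[4pt]
t_{1\,\text{Mbp}} &\approx \frac{1{,}000{,}000\ \text{nt}}{1{,}300\ \text{nt/min}} \approx 770\ \text{min} \approx 13\ \text{h}.
\end{aligned}
```

Even before processing is counted, such a gene would need about half a day to yield its first complete transcript, whereas a stored, translationally repressed mRNA can be put to work as soon as the repression is lifted.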
As described below, translational controls can also have spatial significance during development, when the regulated translation of prepositioned mRNAs creates a local gradient of the protein product.
Eukaryotes have at least three main mechanisms of translational regulation.
1. Initiation factors are subject to phosphorylation by a number of protein kinases. The phosphorylated forms are often less active and cause a general depression of translation in the cell.
2. Some proteins bind directly to mRNA and act as translational repressors, many of them binding at specific sites in the 3′ untranslated region (3′UTR). So positioned, these proteins interact with other translation initiation factors bound to the mRNA or with the 40S ribosomal subunit to prevent translation initiation (Fig. 28–32; compare this with Fig. 27–22).
3. Binding proteins, present in eukaryotes from yeast to mammals, disrupt the interaction between eIF4E and eIF4G (see Fig. 27–22). The mammalian versions are known as 4E-BPs (eIF4E binding proteins). When cell growth is slow, these proteins limit translation by binding to the site on eIF4E that normally interacts with eIF4G. When cell growth resumes or increases in response to growth factors or other stimuli, the binding proteins are inactivated by protein kinase–dependent phosphorylation.
The variety of translational regulation mechanisms provides flexibility, allowing focused repression of a few mRNAs or global regulation of all cellular translation.
Translational regulation has been particularly well studied in reticulocytes. One such mechanism in these cells involves eIF2, the initiation factor that binds to the initiator tRNA and conveys it to the ribosome; when Met-tRNA has bound to the P site, the factor eIF2B binds to eIF2, recycling it with the aid of GTP binding and hydrolysis. The maturation of reticulocytes includes destruction of the cell nucleus, leaving behind a plasma membrane packed with hemoglobin.
Messenger RNAs deposited in the cytoplasm before the loss of the nucleus allow for the replacement of hemoglobin. When reticulocytes become deficient in iron or heme, the translation of globin mRNAs is repressed. A protein kinase called HCR (hemin-controlled repressor) is activated, catalyzing the phosphorylation of eIF2. In its phosphorylated form, eIF2 forms a stable complex with eIF2B that sequesters the eIF2, making it unavailable for participation in translation. In this way, the reticulocyte coordinates the synthesis of globin with the availability of heme.
Many additional examples of translational regulation have been found in studies of the development of multicellular organisms, as discussed in more detail below.

FIGURE 28–32 Translational regulation of eukaryotic mRNA. One of the most important mechanisms for translational regulation in eukaryotes involves the binding of translational repressors (RNA-binding proteins) to specific sites in the 3′ untranslated region (3′UTR) of the mRNA. These proteins interact with eukaryotic initiation factors or with the ribosome (see Fig. 27–22) to prevent or slow translation.

Posttranscriptional Gene Silencing Is Mediated by RNA Interference
In higher eukaryotes, including nematodes, fruit flies, plants, and mammals, a class of small RNAs has been discovered that mediates the silencing of particular genes. The RNAs function by interacting with mRNAs, often in the 3′UTR, resulting in either mRNA degradation or translation inhibition. In either case, the mRNA, and thus the gene that produces it, is silenced. This form of gene regulation controls developmental timing in at least some organisms. It is also used as a mechanism to protect against invading RNA viruses (particularly important in plants, which lack an adaptive immune system) and to control the activity of transposons. In addition, small RNA molecules may play a critical (but still undefined) role in the formation of heterochromatin.
The small RNAs are sometimes called micro-RNAs (miRNAs). Many are present only transiently during development, and these are sometimes referred to as small temporal RNAs (stRNAs). Hundreds of different miRNAs have been identified in higher eukaryotes. They are transcribed as precursor RNAs about 70 nucleotides long, with internally complementary sequences that form hairpinlike structures (Fig. 28–33). The precursors are cleaved by endonucleases to form short duplexes about 20 to 25 nucleotides long. The best-characterized nuclease goes by the delightfully suggestive name Dicer; endonucleases in the Dicer family are widely distributed in higher eukaryotes. One strand of the processed miRNA is transferred to the target mRNA (or to a viral or transposon RNA), leading to inhibition of translation or degradation of the RNA (Fig. 28–33a).
This gene regulation mechanism has an interesting and very useful practical side. If an investigator introduces into an organism a duplex RNA molecule corresponding in sequence to virtually any mRNA, the Dicer endonuclease cleaves the duplex into short segments, called small interfering RNAs (siRNAs). These bind to the mRNA and silence it (Fig. 28–33b). The process is known as RNA interference (RNAi). In plants, virtually any gene can be effectively shut down in this way. In nematodes, simply introducing the duplex RNA into the worm’s diet produces very effective suppression of the target gene.
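The sequence logic of RNAi (dice a duplex, then let one strand find its complement in the mRNA) can be sketched in a few lines of Python. This is a toy model under stated assumptions: Dicer is treated as cutting the duplex into regular, non-overlapping 21-nt pieces (the text gives a range of about 20 to 25 nucleotides), only perfect Watson-Crick pairing counts, and the strand that reaches the target is taken to be the one complementary to the mRNA. All function names and sequences are hypothetical.

```python
# Watson-Crick pairing in RNA (U pairs with A).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence of the strand that pairs antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def dice(sense: str, size: int = 21) -> list:
    """Chop the sense strand of a dsRNA into consecutive `size`-nt pieces.

    Toy model (assumption): real Dicer cleavage is not this regular, and any
    leftover shorter than `size` is simply dropped here.
    """
    return [sense[i:i + size] for i in range(0, len(sense) - size + 1, size)]


def guide_strands(sense: str, size: int = 21) -> list:
    """Antisense (guide) strand of each diced duplex; these can pair with an
    mRNA that carries the sense sequence."""
    return [reverse_complement(piece) for piece in dice(sense, size)]


def target_sites(mrna: str, guide: str) -> list:
    """Start positions in `mrna` where `guide` pairs perfectly."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


# Usage: a 42-bp dsRNA matching part of a made-up mRNA.
region = "AUGGCUAGCUUACGGAUCCAG" + "UCAGGAUUCGACUGACCAUGA"
mrna = "GGG" + region + "CCC"
for guide in guide_strands(region):
    print(guide, target_sites(mrna, guide))
```

Each guide pairs at exactly one place, the position its sense-strand counterpart occupies in the mRNA, which illustrates why a duplex "corresponding in sequence to virtually any mRNA" silences that particular mRNA.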
The technique has rapidly become an important tool in the ongoing efforts to study gene function, because it can disrupt gene function without creating a mutant organism. The procedure can be applied to humans as well. Laboratory-produced siRNAs have already been used to block HIV and poliovirus infections in cultured human cells for a week or so at a time. Although this work is in its infancy, the rapid progress makes RNA interference a field to watch for future medical advances.

Development Is Controlled by Cascades of Regulatory Proteins
For sheer complexity and intricacy of coordination, the patterns of gene regulation that bring about development of a zygote into a multicellular animal or plant have no peer. Development requires transitions in morphology and protein composition that depend on tightly coordinated changes in expression of the genome. More genes are expressed during early development than in any other part of the life cycle. For example, in the sea urchin, an oocyte has about 18,500 different mRNAs, compared with about 6,000 different mRNAs in the cells of a typical differentiated tissue. The mRNAs in the oocyte give rise to a cascade of events that regulate the expression of many genes across both space and time.
Several organisms have emerged as important model systems for the study of development, because they are easy to maintain in a laboratory and have relatively short generation times. These include nematodes, fruit flies, zebra fish, mice, and the plant Arabidopsis. This discussion focuses on the development of fruit flies. Our understanding of the molecular events during development of Drosophila melanogaster is particularly well advanced and can be used to illustrate patterns and principles of general significance. The life cycle of the fruit fly includes complete metamorphosis during its progression from an embryo to an adult (Fig. 28–34).
Among the most important characteristics of the embryo are its polarity (the anterior and posterior parts of the animal are readily distinguished, as are its dorsal and ventral parts) and its metamerism (the embryo body is made up of serially repeating segments, each with characteristic features). During development, these segments become organized into a head, thorax, and abdomen. Each segment of the adult thorax has a different set of appendages. Development of this complex pattern is under genetic control, and a variety of pattern-regulating genes have been discovered that dramatically affect the organization of the body.

FIGURE 28–33 Gene silencing by RNA interference. (a) Small temporal RNAs (stRNAs) are generated by Dicer-mediated cleavage of longer precursors that fold to create duplex regions. The stRNAs then bind to mRNAs, leading to degradation of mRNA or inhibition of translation. (b) Double-stranded RNAs can be constructed and introduced into a cell. Dicer processes the duplex RNAs into small interfering RNAs (siRNAs), which interact with the target mRNA. Again, the mRNA is either degraded or its translation inhibited.

The Drosophila egg, along with 15 nurse cells, is surrounded by a layer of follicle cells (Fig. 28–35). As the egg cell forms (before fertilization), mRNAs and proteins originating in the nurse and follicle cells are deposited in the egg cell, where some play a critical role in development. Once a fertilized egg is laid, its nucleus divides and the nuclear descendants continue to divide in synchrony every 6 to 10 min. Plasma membranes are not formed around the nuclei, which are distributed within the egg cytoplasm (or syncytium).
Between the eighth and eleventh rounds of nuclear division, the nuclei migrate to the outer layer of the egg, forming a monolayer of nuclei surrounding the common yolk-rich cytoplasm; this is the syncytial blastoderm. After a few additional divisions, membrane invaginations surround the nuclei to create a layer of cells that form the cellular blastoderm. At this stage, the mitotic cycles in the various cells lose their synchrony. The developmental fate of the cells is determined by the mRNAs and proteins originally deposited in the egg by the nurse and follicle cells.
Proteins that, through changes in local concentration or activity, cause the surrounding tissue to take up a particular shape or structure are sometimes referred to as morphogens; they are the products of pattern-regulating genes. As defined by Christiane Nüsslein-Volhard, Edward B. Lewis, and Eric F. Wieschaus, three major classes of pattern-regulating genes (maternal, segmentation, and homeotic genes) function in successive stages of development to specify the basic features of the Drosophila embryo’s body.
Maternal genes are expressed in the unfertilized egg, and the resulting maternal mRNAs remain dormant until fertilization. These provide most of the proteins needed in very early development, until the cellular blastoderm is formed. Some of the proteins encoded by maternal mRNAs direct the spatial organization of the developing embryo at early stages, establishing its polarity.
Segmentation genes, transcribed after fertilization, direct the formation of the proper number of body segments. At least three subclasses of segmentation genes act at successive stages: gap genes divide the developing embryo into several broad regions, and pair-rule genes together with segment polarity genes define 14 stripes that become the 14 segments of a normal embryo.
Homeotic genes are expressed still later; they specify which organs and appendages will develop in particular body segments.
The many regulatory genes in these three classes direct the development of an adult fly, with a head, thorax, and abdomen, with the proper number of segments, and with the correct appendages on each segment. Although embryogenesis takes about a day to complete, all these genes are activated during the first four hours. Some mRNAs and proteins are present for only a few minutes at specific points during this period. Some of the genes code for transcription factors that affect the expression of other genes in a kind of developmental cascade. Regulation at the level of translation also occurs, and many of the regulatory genes encode translational repressors, most of which bind to the 3′UTR of the mRNA (Fig. 28–32). Because many mRNAs are deposited in the egg long before their translation is required, translational repression provides an especially important avenue for regulation in developmental pathways.

FIGURE 28–34 Life cycle of the fruit fly Drosophila melanogaster. [Panel labels: oocyte; fertilized egg (day 0); early embryo, no segments; late embryo, segmented into head, thorax (T1–T3), and abdomen (A1–A7); hatching (day 1); three larval stages separated by molts; pupation (day 5); metamorphosis; adult (day 9).] Drosophila undergoes a complete metamorphosis, which means that the adult insect is radically different in form from its immature stages, a transformation that requires extensive alterations during development. By the late embryonic stage, segments have formed, each containing specialized structures from which the various appendages and other features of the adult fly will develop.

Maternal Genes
Some maternal genes are expressed within the nurse and follicle cells, and some in the egg itself. Within the unfertilized Drosophila egg, the maternal gene products establish two axes, anterior-posterior and dorsal-ventral, and thus define which regions of the radially symmetric egg will develop into the head and abdomen and the top and bottom of the adult fly. A key event in very early development is establishment of mRNA and protein gradients along the body axes. Some maternal mRNAs have protein products that diffuse through the cytoplasm to create an asymmetric distribution in the egg. Different cells in the cellular blastoderm therefore inherit different amounts of these proteins, setting the cells on different developmental paths. The products of the maternal mRNAs include transcriptional activators or repressors as well as translational repressors, all regulating the expression of other pattern-regulating genes. The resulting specific patterns and sequences of gene expression therefore differ between cell lineages, ultimately orchestrating the development of each adult structure.
The anterior-posterior axis in Drosophila is defined at least in part by the products of the bicoid and nanos genes. The bicoid gene product is a major anterior morphogen, and the nanos gene product is a major posterior morphogen. The mRNA from the bicoid gene is synthesized by nurse cells and deposited in the unfertilized egg near its anterior pole.
Nüsslein-Volhard found that this mRNA is translated soon after fertilization, and the Bicoid protein diffuses through the cell to create, by the seventh nuclear division, a concentration gradient radiating out from the anterior pole (Fig. 28–36a). The Bicoid protein is a transcription factor that activates the expression of a number of segmentation genes; the protein contains a homeodomain (p. 1090). Bicoid is also a translational repressor that inactivates certain mRNAs. The amounts of Bicoid protein in various parts of the embryo affect the subsequent expression of a number of other genes in a threshold-dependent manner. Genes are transcriptionally activated or translationally repressed only where the Bicoid protein concentration exceeds the threshold. Changes in the shape of the Bicoid concentration gradient have dramatic effects on the body pattern. Lack of Bicoid protein results in development of an embryo with two abdomens but neither head nor thorax (Fig. 28–36b); however, embryos without Bicoid will develop normally if an adequate amount of bicoid mRNA is injected into the egg at the appropriate end.

FIGURE 28–35 Early development in Drosophila. During development of the egg, maternal mRNAs (including the bicoid and nanos gene transcripts, discussed in the text) and proteins are deposited in the developing oocyte (unfertilized egg cell) by nurse cells and follicle cells. After fertilization, the two nuclei of the fertilized egg divide in synchrony within the common cytoplasm (syncytium), then migrate to the periphery. Membrane invaginations surround the nuclei to create a monolayer of cells at the periphery; this is the cellular blastoderm stage. During the early nuclear divisions, several nuclei at the far posterior become pole cells, which later become the germ-line cells.
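The threshold rule described above for Bicoid can be made concrete with a small model. The Python sketch below assumes, purely for illustration, that the gradient is exponential, C(x) = c0·exp(−x/L), with x in percent of egg length from the anterior pole; the text says only that the gradient radiates from the anterior pole, so the exponential shape and the decay length L = 20 are hypothetical choices, not measured values. A target gene responds only where C(x) exceeds its threshold T, so its expression domain ends at x* = L·ln(c0/T).

```python
import math


def bicoid_level(x: float, c0: float = 100.0, decay_length: float = 20.0) -> float:
    """Relative Bicoid level at x (% of egg length from the anterior pole).

    Assumption: an exponential profile c0 * exp(-x / decay_length); the shape
    and the default decay length are illustrative, not measured values.
    """
    return c0 * math.exp(-x / decay_length)


def responds(x: float, threshold: float, c0: float = 100.0,
             decay_length: float = 20.0) -> bool:
    """Threshold rule: a target responds only where Bicoid exceeds its threshold."""
    return bicoid_level(x, c0, decay_length) > threshold


def boundary(threshold: float, c0: float = 100.0,
             decay_length: float = 20.0) -> float:
    """Posterior edge of the responding zone, from c0 * exp(-x / L) = threshold,
    i.e. x = L * ln(c0 / threshold). Returns 0.0 when Bicoid never exceeds the
    threshold (e.g. c0 = 0, modeling an egg that received no bicoid mRNA)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if c0 <= threshold:
        return 0.0
    return decay_length * math.log(c0 / threshold)


# Usage: two hypothetical target genes with different thresholds.
for t in (50.0, 10.0):
    print(f"threshold {t:>4}: responds over the anterior {boundary(t):.1f}% of the egg")
print("egg lacking Bicoid:", boundary(10.0, c0=0.0))
```

In this model a gene with a lower threshold is expressed farther toward the posterior, and halving any threshold shifts its boundary posteriorly by L·ln 2 (about 14% of egg length with the assumed L). Setting c0 to zero removes every Bicoid-dependent anterior response, in line with the headless, double-posterior phenotype described in the text.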
The nanos gene has an analogous role, but its mRNA is deposited at the posterior end of the egg and the anterior-posterior protein gradient peaks at the posterior pole. The Nanos protein is a translational repressor.
A broader look at the effects of maternal genes reveals the outline of a developmental circuit. In addition to the bicoid and nanos mRNAs, which are deposited in the egg asymmetrically, a number of other maternal mRNAs are deposited uniformly throughout the egg cytoplasm. Three of these mRNAs encode the Pumilio, Hunchback, and Caudal proteins, all affected by nanos and bicoid (Fig. 28–37). Caudal and Pumilio are involved in development of the posterior end of the fly. Caudal is a transcriptional activator with a homeodomain; Pumilio is a translational repressor. Hunchback protein plays an important role in the development of the anterior end and is also a transcriptional regulator of a variety of genes, in some cases a positive regulator, in other cases negative. Bicoid suppresses translation of caudal in the anterior and also acts as a transcriptional activator of hunchback in the cellular blastoderm. Because hunchback is expressed both from maternal mRNAs and from genes in the developing egg, it is considered both a maternal and a segmentation gene. The result of the activities of Bicoid is an increased concentration of Hunchback at the anterior end of the egg. The Nanos and Pumilio proteins act as translational repressors of hunchback, suppressing synthesis of its protein near the posterior end of the egg. Pumilio does not function in the absence of the Nanos protein, and the gradient of Nanos expression confines the activity of both proteins to the posterior region. Translational repression of hunchback leads to degradation of hunchback mRNA near the posterior end. Meanwhile, the lack of Bicoid protein in the posterior allows caudal to be expressed there. In this way, the Hunchback and Caudal proteins become asymmetrically distributed in the egg.

FIGURE 28–36 Distribution of a maternal gene product in a Drosophila egg. [Graphs: relative concentration of Bicoid (Bcd) protein, 0 to 100, versus distance from the anterior end, 0 to 100% of egg length, for a normal egg (giving a normal larva) and a bcd−/bcd− egg (giving a double-posterior larva).] (a) Micrograph of an immunologically stained egg, showing distribution of the bicoid (bcd) gene product. The graph measures stain intensity. This distribution is essential for normal development of the anterior structures of the animal. (b) If the bcd gene is not expressed by the mother (bcd−/bcd− mutant) and thus no bicoid mRNA is deposited in the egg, the resulting embryo has two posteriors (and soon dies).

Segmentation Genes
Gap genes, pair-rule genes, and segment polarity genes, three subclasses of segmentation genes in Drosophila, are activated at successive stages of embryonic development. Expression of the gap genes is generally regulated by the products of one or more maternal genes. At least some of the gap genes encode transcription factors that affect the expression of other segmentation or (later) homeotic genes.
One well-characterized segmentation gene is fushi tarazu (ftz), of the pair-rule subclass. When ftz is deleted, the embryo develops 7 segments instead of the normal 14, each segment twice the normal width. The Fushi-tarazu protein (Ftz) is a transcriptional activator with a homeodomain. The mRNAs and proteins derived from the normal ftz gene accumulate in a striking pattern of seven stripes that encircle the posterior two-thirds of the embryo (Fig. 28–38).
The stripes demarcate the positions of segments that develop later; these segments are eliminated if ftz function is lost. The Ftz protein and a few similar regulatory proteins directly or indirectly regulate the expression of vast numbers of genes in the continuing developmental cascade.

FIGURE 28–37 Regulatory circuits of the anterior-posterior axis in a Drosophila egg. [Diagram labels: translation of the localized bicoid and nanos mRNAs and diffusion of their products creates concentration gradients; suppression or activation of the translation of the uniformly distributed caudal, hunchback, and pumilio mRNAs reflects the gradient of the regulator.] The bicoid and nanos mRNAs are localized near the anterior and posterior poles, respectively. The caudal, hunchback, and pumilio mRNAs are distributed throughout the egg cytoplasm. The gradients of Bicoid (Bcd) and Nanos proteins lead to accumulation of Hunchback protein in the anterior and Caudal protein in the posterior of the egg. Because Pumilio protein requires Nanos protein for its activity as a translational repressor of hunchback, it functions only at the posterior end.

FIGURE 28–38 Distribution of the fushi tarazu (ftz) gene product in early Drosophila embryos. (a) In the normal embryo, the gene product can be detected in seven bands around the circumference of the embryo (shown schematically). These bands (b) appear as dark spots (generated by a radioactive label) in a cross-sectional autoradiograph and (c) demarcate the anterior margins of the segments in the late embryo (marked in red). Scale bar, 100 μm.

Homeotic Genes
Loss of homeotic genes by mutation or deletion causes the appearance of a normal appendage or body structure at an inappropriate body position.
An important example is the ultrabithorax (ubx) gene. When Ubx function is lost, the first abdominal segment develops incorrectly, having the structure of the third thoracic segment. Other known homeotic mutations cause the formation of an extra set of wings, or two legs at the position in the head where the antennae are normally found (Fig. 28–39).
The homeotic genes often span long regions of DNA. The ubx gene, for example, is 77,000 bp long. More than 73,000 bp of this gene are in introns, one of which is more than 50,000 bp long. Transcription of the ubx gene takes nearly an hour. The delay this imposes on ubx gene expression is believed to be a timing mechanism involved in the temporal regulation of subsequent steps in development. The Ubx protein is yet another transcriptional activator with a homeodomain (Fig. 28–13).
Many of the principles of development outlined above apply to eukaryotes from nematodes to humans. Some of the regulatory proteins themselves are conserved. For example, the homeodomains of the products of the homeobox-containing genes HOX 1.1 in mouse and antennapedia in fruit fly differ in only one amino acid residue. Of course, although the molecular regulatory mechanisms may be similar, many of the ultimate developmental events are not conserved (humans do not have wings or antennae). The discovery of structural determinants with identifiable molecular functions is the first step in understanding the molecular events underlying development. As more genes and their protein products are discovered, the biochemical side of this vast puzzle will be elucidated in increasingly rich detail.

SUMMARY 28.3 Regulation of Gene Expression in Eukaryotes
■ In eukaryotes, positive regulation is more common than negative regulation, and transcription is accompanied by large changes in chromatin structure. Promoters for Pol II typically have a TATA box and Inr sequence, as well as multiple binding sites for DNA-binding transactivators.
The latter sites, sometimes located hundreds or thousands of base pairs away from the TATA box, are called upstream activator sequences in yeast and enhancers in higher eukaryotes. ■ Large complexes of proteins are generally required to regulate transcriptional activity. The effects of DNA-binding transactivators on Pol II are mediated by coactivator protein complexes such as TFIID or mediator. The modular structures of the transactivators have distinct activation and DNA-binding domains. Other protein complexes, including histone acetyltransferases such as GCN5-ADA2-ADA3 and ATP-dependent complexes such as SWI/SNF and NURF, reversibly remodel chromatin structure. ■ Hormones affect the regulation of gene expression in one of two ways. Steroid hormones interact directly with intracellular receptors that are DNA-binding regulatory proteins; binding of the hormone has either positive or negative effects on the transcription of genes targeted by the hormone. Nonsteroid Chapter 28 Regulation of Gene Expression1116 (b) (a) (c) (d) FIGURE 28–39 Effects of mutations in homeotic genes in Drosophila. (a) Normal head. (b) Homeotic mutant (antennapedia) in which antennae are replaced by legs. (c) Normal body structure. (d) Homeotic mutant (bithorax) in which a segment has developed incor- rectly to produce an extra set of wings. 8885d_c28_1081-1119 2/12/04 2:28 PM Page 1116 mac76 mac76:385_reb: hormones bind to cell-surface receptors, triggering a signaling pathway that can lead to phosphorylation of a regulatory protein, affecting its activity. ■ Development of a multicellular organism presents the most complex regulatory challenge. The fate of cells in the early embryo is determined by establishment of anterior-posterior and dorsal-ventral gradients of proteins that act as transcriptional transactivators or translational repressors, regulating the genes required for the development of structures appropriate to a particular part of the organism. 
Sets of regulatory genes operate in temporal and spatial succession, transforming given areas of an egg cell into predictable structures in the adult organism. Chapter 28 Further Reading 1117 Key Terms housekeeping genes 1082 induction 1082 repression 1082 specificity factor 1083 repressor 1083 activator 1083 operator 1083 negative regulation 1084 positive regulation 1084 operon 1085 helix-turn-helix 1088 zinc finger 1088 homeodomain 1090 homeobox 1090 leucine zipper 1090 basic helix-loop-helix 1090 catabolite repression 1093 cAMP receptor protein (CRP) 1093 regulon 1094 transcription attenuation 1094 translational repressor 1098 stringent response 1098 phase variation 1100 hypersensitive sites 1102 chromatin remodeling 1103 enhancers 1104 upstream activator se- quences (UASs) 1104 basal transcription factors 1104 DNA-binding transactivators 1104 coactivators 1104 TATA-binding protein (TBP) 1104 mediator 1105 hormone response ele- ments (HREs) 1108 RNA interference (RNAi) 1111 polarity 1111 metamerism 1111 morphogens 1112 maternal genes 1112 maternal mRNAs 1112 segmentation genes 1112 gap genes 1112 pair-rule genes 1112 segment polarity genes 1112 homeotic genes 1112 Terms in bold are defined in the glossary. Further Reading General Hershey, J.W.B., Mathews, M.B., & Sonenberg, N. (1996) Translational Control, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Many detailed reviews cover all aspects of this topic. Müller-Hill, B. (1996) The lac Operon: A Short History of a Genetic Paradigm, Walter de Gruyter, New York. An excellent detailed account of the investigation of this important system. Neidhardt, F.C. (ed.) (1996) Escherichia coli and Salmonella typhimurium, 2nd edn, Vol. 1: Cellular and Molecular Biology (Curtiss, R., Ingraham, J.L., Lin, E.C.C., Magasanik, B., Low, K.B., Reznikoff, W.S., Riley, M., Schaechter, M., & Umbarger, H.E., vol. eds), American Society for Microbiology, Washington, DC. 
An excellent source for reviews of many bacterial operons. The Web-based version, EcoSal, is updated regularly.
Pabo, C.O. & Sauer, R.T. (1992) Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem. 61, 1053–1095.
Schleif, R. (1993) Genetics and Molecular Biology, 2nd edn, The Johns Hopkins University Press, Baltimore.
Provides an excellent account of the experimental basis of important concepts of prokaryotic gene regulation.
Regulation of Gene Expression in Prokaryotes
Condon, C., Squires, C., & Squires, C.L. (1995) Control of rRNA transcription in Escherichia coli. Microbiol. Rev. 59, 623–645.
Gourse, R.L., Gaal, T., Bartlett, M.S., Appleman, J.A., & Ross, W. (1996) rRNA transcription and growth rate–dependent regulation of ribosome synthesis in Escherichia coli. Annu. Rev. Microbiol. 50, 645–677.
Jacob, F. & Monod, J. (1961) Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356.
The operon model and the concept of messenger RNA, first proposed in the Proceedings of the French Academy of Sciences in 1960, are presented in this historic paper.
Johnson, R.C. (1991) Mechanism of site-specific DNA inversion in bacteria. Curr. Opin. Genet. Dev. 1, 404–411.
Kolb, A., Busby, S., Buc, H., Garges, S., & Adhya, S. (1993) Transcriptional regulation by cAMP and its receptor protein. Annu. Rev. Biochem. 62, 749–795.
Romby, P. & Springer, M. (2003) Bacterial translational control at atomic resolution. Trends Genet. 19, 155–161.
Yanofsky, C., Konan, K.V., & Sarsero, J.P. (1996) Some novel transcription attenuation mechanisms used by bacteria. Biochimie 78, 1017–1024.
Regulation of Gene Expression in Eukaryotes
Agami, R. (2002) RNAi and related mechanisms and their potential use for therapy. Curr. Opin. Chem. Biol. 6, 829–834.
Bashirullah, A., Cooperstock, R.L., & Lipshitz, H.D. (1998) RNA localization in development. Annu. Rev. Biochem. 67, 335–394.
Becker, P.B. & Hörz, W. (2002) ATP-dependent nucleosome remodeling. Annu. Rev. Biochem. 71, 247–273.
Boube, M., Joulia, L., Cribbs, D.L., & Bourbon, H.M. (2002) Evidence for a mediator of RNA polymerase II transcriptional regulation conserved from yeast to man. Cell 110, 143–151.
Cerutti, H. (2003) RNA interference: traveling in the cell and gaining functions? Trends Genet. 19, 39–46.
Conaway, R.C., Brower, C.S., & Conaway, J.W. (2002) Gene expression—emerging roles of ubiquitin in transcription regulation. Science 296, 1254–1258.
Cosma, M.P. (2002) Ordered recruitment: gene-specific mechanism of transcription activation. Mol. Cell 10, 227–236.
Dean, K.A., Aggarwal, A.K., & Wharton, R.P. (2002) Translational repressors in Drosophila. Trends Genet. 18, 572–577.
DeRobertis, E.M., Oliver, G., & Wright, C.V.E. (1990) Homeobox genes and the vertebrate body plan. Sci. Am. 263 (July), 46–52.
Edmondson, D.G. & Roth, S.Y. (1996) Chromatin and transcription. FASEB J. 10, 1173–1182.
Gingras, A.-C., Raught, B., & Sonenberg, N. (1999) eIF4 initiation factors: effectors of mRNA recruitment to ribosomes and regulators of translation. Annu. Rev. Biochem. 68, 913–963.
Gray, N.K. & Wickens, M. (1998) Control of translation initiation in animals. Annu. Rev. Cell Dev. Biol. 14, 399–458.
Hannon, G.J. (2002) RNA interference. Nature 418, 244–251.
Luger, K. (2003) Structure and dynamic behavior of nucleosomes. Curr. Opin. Genet. Dev. 13, 127–135.
Mannervik, M., Nibu, Y., Zhang, H., & Levine, M. (1999) Transcriptional coregulators in development. Science 284, 606–609.
Martens, J.A. & Winston, F. (2003) Recent advances in understanding chromatin remodeling by Swi/Snf complexes. Curr. Opin. Genet. Dev. 13, 136–142.
McKnight, S.L. (1991) Molecular zippers in gene regulation. Sci. Am. 264 (April), 54–64.
A good description of leucine zippers.
Melton, D.A. (1991) Pattern formation during animal development. Science 252, 234–241.
Muller, W.A. (1997) Developmental Biology, Springer, New York.
A good elementary text.
Myers, L.C. & Kornberg, R.D. (2000) Mediator of transcriptional regulation. Annu. Rev. Biochem. 69, 729–749.
Reese, J.C. (2003) Basal transcription factors. Curr. Opin. Genet. Dev. 13, 114–118.
Rivera-Pomar, R. & Jäckle, H. (1996) From gradients to stripes in Drosophila embryogenesis: filling in the gaps. Trends Genet. 12, 478–483.
Struhl, K. (1999) Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell 98, 1–4.
Waterhouse, P.M. & Helliwell, C.A. (2003) Exploring plant genomes by RNA-induced gene silencing. Nat. Rev. Genet. 4, 29–38.
Problems
1. Effect of mRNA and Protein Stability on Regulation
E. coli cells are growing in a medium with glucose as the sole carbon source. Tryptophan is suddenly added. The cells continue to grow, and divide every 30 min. Describe (qualitatively) how the amount of tryptophan synthase activity in the cells changes with time under the following conditions:
(a) The trp mRNA is stable (degraded slowly over many hours).
(b) The trp mRNA is degraded rapidly, but tryptophan synthase is stable.
(c) The trp mRNA and tryptophan synthase are both degraded rapidly.
2. Negative Regulation
Describe the probable effects on gene expression in the lac operon of a mutation in (a) the lac operator that deletes most of O₁; (b) the lacI gene that inactivates the repressor; and (c) the promoter that alters the region around position −10.
3. Specific DNA Binding by Regulatory Proteins
A typical prokaryotic repressor protein discriminates between its specific DNA binding site (operator) and nonspecific DNA by a factor of 10⁴ to 10⁶. About 10 molecules of repressor per cell are sufficient to ensure a high level of repression. Assume that a very similar repressor existed in a human cell, with a similar specificity for its binding site. How many copies of the repressor would be required to elicit a level of repression similar to that in the prokaryotic cell? (Hint: The E. coli genome contains about 4.6 million bp; the human haploid genome has about 3.2 billion bp.)
4. Repressor Concentration in E. coli
The dissociation constant for a particular repressor-operator complex is very low, about 10⁻¹³ M. An E. coli cell (volume 2 × 10⁻¹² mL) contains 10 copies of the repressor. Calculate the cellular concentration of the repressor protein. How does this value compare with the dissociation constant of the repressor-operator complex? What is the significance of this result?
5. Catabolite Repression
E. coli cells are growing in a medium containing lactose but no glucose. Indicate whether each of the following changes or conditions would increase, decrease, or not change the expression of the lac operon. It may be helpful to draw a model depicting what is happening in each situation.
(a) Addition of a high concentration of glucose
(b) A mutation that prevents dissociation of the Lac repressor from the operator
(c) A mutation that completely inactivates β-galactosidase
(d) A mutation that completely inactivates galactoside permease
(e) A mutation that prevents binding of CRP to its binding site near the lac promoter
6. Transcription Attenuation
How would transcription of the E. coli trp operon be affected by the following manipulations of the leader region of the trp mRNA?
(a) Increasing the distance (number of bases) between the leader peptide gene and sequence 2
(b) Increasing the distance between sequences 2 and 3
(c) Removing sequence 4
(d) Changing the two Trp codons in the leader peptide gene to His codons
(e) Eliminating the ribosome-binding site for the gene that encodes the leader peptide
(f) Changing several nucleotides in sequence 3 so that it can base-pair with sequence 4 but not with sequence 2
7. Repressors and Repression
How would the SOS response in E. coli be affected by a mutation in the lexA gene that prevented autocatalytic cleavage of the LexA protein?
8. Regulation by Recombination
In the phase variation system of Salmonella, what would happen to the cell if the Hin recombinase became more active and promoted recombination (DNA inversion) several times in each cell generation?
9. Initiation of Transcription in Eukaryotes
A new RNA polymerase activity is discovered in crude extracts of cells derived from an exotic fungus. The RNA polymerase initiates transcription only from a single, highly specialized promoter. As the polymerase is purified its activity declines, and the purified enzyme is completely inactive unless crude extract is added to the reaction mixture. Suggest an explanation for these observations.
10. Functional Domains in Regulatory Proteins
A biochemist replaces the DNA-binding domain of the yeast Gal4 protein with the DNA-binding domain from the Lac repressor, and finds that the engineered protein no longer regulates transcription of the GAL genes in yeast. Draw a diagram of the different functional domains you would expect to find in the Gal4 protein and in the engineered protein. Why does the engineered protein no longer regulate transcription of the GAL genes? What might be done to the DNA-binding site recognized by this chimeric protein to make it functional in activating transcription of GAL genes?
11. Inheritance Mechanisms in Development
A Drosophila egg that is bcd⁻/bcd⁻ may develop normally but as an adult will not be able to produce viable offspring. Explain.
Biochemistry on the Internet
12. TATA Binding Protein and the TATA Box
To examine the interactions between transcription factors and DNA, go to the Protein Data Bank (www.rcsb.org/pdb) and download the PDB file 1TGH. This file models the interactions between a human TATA-binding protein and a segment of double-stranded DNA. Use the Noncovalent Bond Finder at the Chime Resources website (www.umass.edu/microbio/chime) to examine the roles of hydrogen bonds and hydrophobic interactions involved in the binding of this transcription factor to the TATA box. Within the Noncovalent Bond Finder program, load the PDB file and display the protein in Spacefill mode and the DNA in Wireframe mode.
(a) Which of the base pairs in the DNA form hydrogen bonds with the protein? Which of these contribute to the specific recognition of the TATA box by this protein? (Hydrogen-bond length between hydrogen donor and hydrogen acceptor ranges from 2.5 to 3.3 Å.)
(b) Which amino acid residues in the protein interact with these base pairs? On what basis did you make this determination? Do these observations agree with the information presented in the text?
(c) What is the sequence of the DNA in this model and which portions of the sequence are recognized by the TATA-binding protein?
(d) Can you identify any hydrophobic interactions in this complex? (Hydrophobic interactions usually occur with interatomic distances of 3.3 to 4.0 Å.)
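For Problem 1, the effect of mRNA and enzyme stability can be explored with a simple first-order model. The Python sketch below is an illustrative model under stated assumptions, not the book's treatment: it assumes trp transcription stops the instant tryptophan is added, that per-cell mRNA (M) and enzyme (E) are both diluted by growth at μ = ln 2/30 min⁻¹, and that "degraded rapidly" means a hypothetical 2-min half-life. The problem does not state the enzyme's stability in case (a), so the sketch assumes a stable enzyme there; all names are hypothetical.

```python
import math

DOUBLING_TIME_MIN = 30.0                # cells divide every 30 min (from the problem)
MU = math.log(2) / DOUBLING_TIME_MIN    # per-minute dilution of per-cell content by growth


def rate_from_half_life(t_half_min: float) -> float:
    """First-order rate constant (per min) for a given half-life."""
    return math.log(2) / t_half_min


FAST = rate_from_half_life(2.0)  # assumption: "rapid" degradation = 2-min half-life
STABLE = 0.0                     # assumption: negligible degradation on this time scale


def enzyme_per_cell(t_end_min: float, k_mrna: float, k_enzyme: float, dt: float = 0.01) -> float:
    """Relative tryptophan synthase per cell, t_end_min after Trp is added (1.0 at t = 0).

    Simple Euler integration of
        dM/dt = -(k_mrna + MU) * M              (no new transcription after Trp is added)
        dE/dt = k_syn * M - (k_enzyme + MU) * E
    with M = E = 1 at t = 0 and k_syn chosen so that E was at steady state beforehand.
    """
    m, e = 1.0, 1.0
    k_syn = k_enzyme + MU
    for _ in range(int(round(t_end_min / dt))):
        dm = -(k_mrna + MU) * m
        de = k_syn * m - (k_enzyme + MU) * e
        m += dm * dt
        e += de * dt
    return e


cases = {
    "(a) stable mRNA (enzyme assumed stable)": (STABLE, STABLE),
    "(b) unstable mRNA, stable enzyme": (FAST, STABLE),
    "(c) unstable mRNA and enzyme": (FAST, FAST),
}
for label, (k_m, k_e) in cases.items():
    print(label, [round(enzyme_per_cell(t, k_m, k_e), 3) for t in (0, 30, 60, 120)])
```

Under these assumptions, activity per cell declines most slowly in case (a), because the existing message keeps being translated; in case (b) it roughly halves with each 30-min division as the stable enzyme is diluted; and in case (c) it disappears within minutes.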
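The scaling argument behind Problem 3 can be sketched in a few lines of Python. The sketch assumes, as the hint implies, that the competing nonspecific DNA is proportional to genome size and that the repressor copy number must rise in the same proportion to keep operator occupancy unchanged. It uses the haploid genome size from the hint; for a diploid human cell the estimate would roughly double. The function name is hypothetical.

```python
ECOLI_GENOME_BP = 4.6e6          # from the hint
HUMAN_HAPLOID_GENOME_BP = 3.2e9  # from the hint
ECOLI_REPRESSOR_COPIES = 10      # from the problem statement


def scaled_copy_number(ref_copies: float, ref_genome_bp: float, target_genome_bp: float) -> float:
    """Copies needed if the requirement scales linearly with the nonspecific DNA present."""
    return ref_copies * target_genome_bp / ref_genome_bp


human_copies = scaled_copy_number(ECOLI_REPRESSOR_COPIES, ECOLI_GENOME_BP, HUMAN_HAPLOID_GENOME_BP)
print(f"about {human_copies:.0f} repressor copies per haploid genome equivalent")
```

With these inputs the estimate comes to roughly 7,000 copies, about 700 times the E. coli figure.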
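The arithmetic in Problem 4 can be checked the same way. This Python sketch converts 10 molecules in 2 × 10⁻¹² mL to a molar concentration and compares it with the dissociation constant. The occupancy line is an added simplification: it treats total repressor as free repressor binding a single operator and ignores the nonspecific DNA binding discussed in Problem 3. Function and variable names are hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molar_concentration(molecules: float, volume_ml: float) -> float:
    """Concentration (mol/L) of a number of molecules in a volume given in mL."""
    volume_l = volume_ml / 1000.0
    return molecules / AVOGADRO / volume_l


repressor_m = molar_concentration(10, 2e-12)  # 10 repressor copies in one E. coli cell
kd_m = 1e-13                                  # repressor-operator dissociation constant (M)

ratio = repressor_m / kd_m
# Simplified occupancy: fraction bound = [R] / ([R] + Kd), with [R] taken as total repressor
fraction_bound = repressor_m / (repressor_m + kd_m)

print(f"[repressor] = {repressor_m:.1e} M, about {ratio:.0e} times Kd")
print(f"operator occupancy (simplified) = {fraction_bound:.6f}")
```

The concentration comes to about 8 × 10⁻⁹ M, roughly 8 × 10⁴ times the dissociation constant, so under this simplified model the operator is occupied essentially all of the time.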
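The distance cutoffs quoted in Problem 12 can also be applied programmatically once atomic coordinates are in hand. The Python sketch below is a hypothetical helper, not part of the Noncovalent Bond Finder: it reads the fixed-column coordinates and element symbol of a PDB ATOM/HETATM record (assuming the element field in columns 77–78 is filled in), then flags N/O pairs 2.5 to 3.3 Å apart as possible hydrogen bonds and C/C pairs 3.3 to 4.0 Å apart as possible hydrophobic contacts. Distance alone does not prove either interaction, and the example atoms are invented rather than taken from 1TGH. It requires Python 3.8+ for `math.dist`.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    res_seq: int
    element: str
    xyz: tuple


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column positions."""
    return Atom(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        res_seq=int(line[22:26]),
        element=line[76:78].strip(),
        xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
    )


HBOND_RANGE_A = (2.5, 3.3)        # Å, range given in Problem 12(a)
HYDROPHOBIC_RANGE_A = (3.3, 4.0)  # Å, range given in Problem 12(d)
POLAR = {"N", "O"}


def classify_pair(a: Atom, b: Atom) -> str:
    """Label an atom pair using only element types and the distance cutoffs above."""
    d = math.dist(a.xyz, b.xyz)
    if a.element in POLAR and b.element in POLAR and HBOND_RANGE_A[0] <= d <= HBOND_RANGE_A[1]:
        return "possible hydrogen bond"
    if a.element == "C" and b.element == "C" and HYDROPHOBIC_RANGE_A[0] <= d <= HYDROPHOBIC_RANGE_A[1]:
        return "possible hydrophobic contact"
    return "no contact by these criteria"


# Hypothetical atoms (not from 1TGH): a protein N and a DNA base O placed 2.9 Å apart
asn_n = Atom("ND2", "ASN", 1, "N", (0.0, 0.0, 0.0))
base_o = Atom("O2", "DT", 2, "O", (2.9, 0.0, 0.0))
print(classify_pair(asn_n, base_o))  # possible hydrogen bond
```

To scan a real file, one would parse every line beginning with ATOM or HETATM and compare each protein atom with each DNA atom; the quadratic loop is acceptable for a complex of this size.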