Lecture 19
EUKARYOTIC GENES AND GENOMES I
For the last several lectures we have been looking at how one can
manipulate prokaryotic genomes and how prokaryotic genes are regulated. In
the next several lectures we will be considering eukaryotic genes and genomes,
and considering how model eukaryotic organisms are used to study eukaryotic
gene function. During the course of the next six lectures we will think about
genes and genomes of some commonly used model organisms, the yeast
Saccharomyces cerevisiae and the mouse Mus musculus. But first let’s look at how
the genes and genomes of these organisms compare to E. coli at one extreme,
and humans at the other.
Species          Chromosomes  cM    DNA content/haploid (Mb)  Year sequence completed     Genes/haploid  Genes have introns?
E. coli          1            N/A   5                         1997                        4,200          no
S. cerevisiae    16           4000  12                        1997                        5,800          rarely
C. elegans       6            300   100                       1998                        19,000         nearly all
D. melanogaster  4            280   180                       2000                        14,000         nearly all
M. musculus      20           1700  3000                      2002 draft, 2005 finished?  22,500?        nearly all
H. sapiens       23           3300  3000                      2001 draft, 2003 finished   22,500?        nearly all

Note: cM = centiMorgan = 1% recombination
Kb = kilobase = 1 thousand base-pairs of DNA
Mb = megabase = 1 million base-pairs of DNA
genome = DNA content of a complete haploid set of chromosomes
       = DNA content of a gamete (sperm or egg)
Let’s think about the number of genes in an organism and the size of the
organism’s genome. The average protein is about 300 amino acids long,
requiring 300 triplet codons, or roughly 1 Kb of DNA. Thus it makes sense that to
encode 4,200 genes E. coli requires a genome of about 5 million base pairs.
However, the human genome encodes about 22,500 proteins, and this should
require a genome of, let’s say, 25 million base pairs. Instead, humans have a
genome that is ~3,000 million base pairs (~3,000 Mb), i.e., ~3 billion base pairs.
In other words, there is about 100-fold more DNA in the human genome than is
required for encoding 22,500 proteins. What is it all doing? Some of it constitutes
promoters upstream of each gene, some is structural DNA around centromeres
and telomeres (the ends of chromosomes), some is simply intergenic regions
(non-coding regions between genes), but much of it is present as introns.
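
The back-of-the-envelope argument above can be checked with a few lines of Python. This is only a rough sketch: it assumes, as the text does, that an average gene needs about 1 Kb of coding DNA, and it uses the gene counts and genome sizes quoted in this lecture. The function name coding_estimate is made up for the illustration.

```python
# Rough estimate of how much DNA a genome needs just to encode its proteins.
# Assumption (from the text): an average protein-coding gene needs ~1 Kb of DNA.
KB_PER_GENE = 1.0

# Gene counts and haploid genome sizes (Mb) as quoted in this lecture.
genomes = {
    "E. coli": {"genes": 4200, "genome_mb": 5},
    "H. sapiens": {"genes": 22500, "genome_mb": 3000},
}

def coding_estimate(genes, genome_mb, kb_per_gene=KB_PER_GENE):
    """Return (Mb needed for coding, fold excess of the genome over that need)."""
    coding_mb = genes * kb_per_gene / 1000.0  # 1000 Kb per Mb
    return coding_mb, genome_mb / coding_mb

for species, g in genomes.items():
    coding_mb, fold = coding_estimate(g["genes"], g["genome_mb"])
    print(f"{species}: ~{coding_mb:.1f} Mb coding, genome is ~{fold:.0f}x that size")
```

For E. coli the genome is barely larger than the coding requirement, while for humans the ratio comes out at roughly 130-fold, the same order of magnitude as the "about 100-fold" figure obtained above by rounding the coding requirement up to 25 Mb.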
What does it mean that “Genes Have Introns”? This represents one of the
fundamental organizational differences between prokaryotic and eukaryotic
genes. Eukaryotic genes turn out to be interrupted with long DNA sequences
that do not encode protein; these “intervening sequences” are called introns.
The DNA segments that are ultimately expressed as protein, i.e., the DNA
sequence that contains triplet codon information, are called exons. The
intronic sequences are removed from the primary transcript by splicing.

[Figure: a gene on the chromosome (ds DNA) with exons 1, 2, 3 separated by
introns. Transcription gives the primary transcript (ss RNA); addition of the
5’ cap (MeG), 3’ polyadenylation (AAAAA) and splicing out of introns give the
mRNA (ssRNA), which carries the AUG and stop codons; translation gives the
protein (amino acids).]
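
To make the idea of splicing concrete, here is a minimal sketch in Python. The transcript sequence and exon coordinates are invented for illustration (exons in upper case, introns in lower case), and the function name splice is hypothetical; a real cell recognizes splice sites by sequence signals rather than by being handed coordinates.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    exons: list of (start, end) pairs, 0-based, end-exclusive, in 5'->3' order.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical toy transcript: three exons (upper case) and two introns (lower case).
primary = "AUGGCU" + "guaagucag" + "GAACCG" + "guaucuag" + "UUUUAA"
exons = [(0, 6), (15, 21), (29, 35)]

mrna = splice(primary, exons)
print(mrna)  # AUGGCUGAACCGUUUUAA
```

The spliced message reads as an uninterrupted run of triplet codons from AUG to the stop codon, which is what the translation machinery needs.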
A major consequence of this arrangement is the potential for alternative
splicing to produce different protein species from the same gene and primary
transcript. This gives the potential for tremendous amplification of the
complexity of mammals (and other eukaryotes) through many more thousands
of possible proteins.
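
A toy calculation gives a feel for how quickly alternative splicing could multiply the number of possible proteins. The sketch below assumes a deliberately simple exon-skipping model (the first and last exons are always kept, and each internal exon is either kept or skipped); this model and the function name are assumptions made for illustration, and real genes use only a small, regulated subset of such combinations.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """List isoforms that keep the first and last exon plus any subset of
    internal exons, always in their original order (needs at least 2 exons)."""
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms

isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
for iso in isoforms:
    print("-".join(iso))
print(len(isoforms))  # 4
```

Under this model a gene with n exons has 2^(n-2) conceivable isoforms, so the upper bound grows very fast with exon number even though only a few of these are made in practice.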
Note that lower eukaryotes such as the yeast S. cerevisiae have only ~5% of
their genes interrupted by introns, whereas in multicellular organisms like
humans >90% of all genes are interrupted by anywhere between 2 and 60 introns;
most genes have between 5 and 12 introns.
[Figure: gene maps of comparable genomic segments (scale 0 to 50) from three
organisms.
Drosophila melanogaster: syt, CG2964, CG3123, CG16987, CG15400, CG3131.
Human: GATA1, HDAC6, LOC139168, PCSK1N.
Saccharomyces cerevisiae: RGD2, SEC53, ACT1, FET5, TUB2, RPO41, YFL034W, HAC1,
STE2, YFL046W, YFL044C, YPT1, MOB2, RPL22B, CAK1, BST1, EPL1, RIM15, CAF16,
YFL042C, GYP8, YFL040W, YFL030W.
Figure by MIT OCW.]
Gene Regulation in Yeast
In the next few lectures we will consider how eukaryotic genes and genomes can
be manipulated and studied, and we will begin with an example of examining
how genes are regulated in S. cerevisiae. First, let’s figure out how to use some
neat genetics to identify some regulated genes, and in the next lecture we will
figure out how one can use genetics to dissect the mechanism of that regulation.
Characterizing function and regulation of S. cerevisiae genes: We are
going to combine a few neat genetic tools that you learned about in Prof. Kaiser’s
lectures for this, namely a library of yeast genomic fragments cloned into a
bacterial plasmid, a modified transposon (mini-Tn7), and the lacZ gene
embedded within the transposon. In this experiment the lacZ gene is going to
be used as a reporter for transcriptional activity of yeast genes.
[Figure: the mini-Tn7 transposon, Tn7TR - lacZ - URA3 - tet - Tn7TR.
Tn7TR (both ends): required for transposition. lacZ: reporter of
transcription. URA3: selection in yeast. tet: selection in E. coli.
Scheme: in E. coli, a Tn7 donor is combined with the yeast genomic plasmid
library so that the mini-Tn7 inserts into the yeast genomic DNA; in yeast,
the resulting fragments give a random yeast insertion library.]
The mini-Tn7 is introduced into a population of E. coli that harbor a plasmid
library of the S. cerevisiae genome; i.e., each E. coli cell is home to a plasmid
that contains a different segment of the S. cerevisiae genome, such that the
whole genome is represented many times over in this population of E. coli. The
mini-Tn7 is allowed to transpose by integrating into either the plasmid DNA or
the bacterial DNA; the original DNA that carries the mini-Tn7 cannot replicate,
but cells that have integrated the mini-Tn7 into the plasmid or E. coli
chromosome are selected as tetracycline-resistant colonies.
Plasmid DNA is purified from these transformants and retransformed into
tetracycline-sensitive E. coli; the resulting tetracycline-resistant bacteria harbor
only plasmids that have an integrated mini-Tn7 transposon. Plasmid is isolated
from these cells and the yeast genomic fragments are isolated by digestion with
an appropriate restriction enzyme.
So now we have a library of yeast genomic fragments, each of which has the
transposon inserted; these genomic fragments can be transformed into S.
cerevisiae cells that are ura3-. Each Ura+ transformant colony will have
recombined a Tn7 transposon-containing genomic DNA fragment into its genome.
This essentially gives us a library of yeast with transposons randomly
integrated into its genome.
Note that the lacZ gene in the transposon does not carry its own transcription
or translation start site, but if the transposon inserts in the correct
orientation downstream of a yeast gene promoter, and in the correct triplet
codon reading frame, the lacZ gene comes under the control of that promoter,
and when transcription is activated from that promoter a LacZ-fusion protein is
expressed; most LacZ-fusion proteins display robust β-galactosidase activity.

[Figure: the mini-Tn7 (Tn7TR lacZ URA3 tet Tn7TR) inserted downstream of the
promoter of gene X. Labels: transcription start, translation start (AUG),
mRNA, transcription stop.]

• Only one in three correct-orientation insertions can produce a LacZ-fusion
  protein.
• One in two insertions will be in the incorrect orientation and will not
  produce a LacZ-fusion protein.
• At most, only one in six insertions produces a functional LacZ-fusion protein.

Yeast cells expressing β-galactosidase activity can easily be detected by growth
in the presence of 5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside, better
known as X-gal. LacZ cleaves X-gal to release a chemical moiety that has a
brilliant blue color…and so the colonies turn bright blue!

There are at least two useful things to come out of such a collection of yeast
strains:
(1) Any transposon that integrates into a gene will essentially disrupt that
gene and is likely to cause a null mutation.
(2) For transposons that integrate into a yeast gene such that the lacZ gene
is in frame with the gene’s coding region, the level of β-galactosidase
activity in these cells therefore becomes a reporter for the transcription
of that gene.

[Figure: fusion protein, drawn N- to -C: gene X encoded amino acids, mini-Tn7
encoded amino acids, LacZ encoded amino acids. The fusion protein has
β-galactosidase activity.]
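
The insertion arithmetic (one in two orientations, one in three reading frames, so at most one in six insertions giving a functional fusion) can be checked by simply enumerating the possibilities. This sketch assumes every orientation and frame combination is equally likely for an insertion within a coding region; the names used are made up for the illustration.

```python
from fractions import Fraction
from itertools import product

ORIENTATIONS = ("same as gene X", "opposite to gene X")
FRAME_OFFSETS = (0, 1, 2)  # lacZ reading frame relative to the gene X codons

def functional_fusion_fraction():
    """Fraction of insertions with lacZ in the right orientation AND frame,
    assuming all orientation/frame combinations are equally likely."""
    outcomes = list(product(ORIENTATIONS, FRAME_OFFSETS))
    productive = [o for o in outcomes if o == ("same as gene X", 0)]
    return Fraction(len(productive), len(outcomes))

print(functional_fusion_fraction())  # 1/6
```

This is an upper bound, since not every in-frame fusion will be stable or enzymatically active.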
Here are just two examples of how such a library can be used: (1) to
identify genes that protect cells against a DNA damaging agent that causes
cancer; let’s take the example of one of the many, many compounds found in
tobacco smoke; and (2) to identify genes whose transcription is up-regulated in
response to being exposed to this tobacco smoke chemical.

The chemical we’ll use as an example is 4-(Methylnitrosoamino)-1-(3-pyridyl)-1-
butanone (NNK). The yeast random insertion library is first plated out so that
individual cells give rise to a colony; these colonies are then replicated onto
test plates. To screen the library for genes that protect against the cell
killing that can be induced by NNK, the colonies are replica plated onto agar
medium that either does or does not contain a high dose of NNK. To screen the
library for genes that are transcriptionally regulated in the presence of this
nasty carcinogenic compound, the colonies are replica plated onto agar medium
containing either X-gal alone or X-gal plus a low dose of NNK.

[Figure: Random library of Tn7lacZ insertion mutants: phenotypic screen for
NNK sensitivity. Replica plates: minus NNK; plus NNK (high dose). An NNK
sensitive strain is indicated.]

[Figure: Random library of Tn7lacZ insertion mutants: screen for
NNK-regulated genes. Replica plates: + X-Gal; X-Gal + NNK (low dose).]

Interesting colonies can be retrieved from the master plate for further study
and for identification (and subsequent cloning) of the gene responsible for the
interesting phenotype.

Once we have identified a gene that is transcriptionally up or down regulated in
response to an environmental change, how can we use genetics to figure out
how regulation is achieved? This is the topic of the next lecture.
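
The logic of the two replica-plating screens can be sketched in Python as a comparison between paired plates. All colony names and observations below are invented for illustration, and scoring colonies simply as grew/did not grow or blue/white is a simplifying assumption; in practice one would judge colony size and intensity of color.

```python
# Hypothetical replica-plate observations; colony IDs and results are invented.
growth_minus_nnk = {"c1": True, "c2": True, "c3": True, "c4": True}
growth_plus_high_nnk = {"c1": True, "c2": False, "c3": True, "c4": True}
blue_on_xgal = {"c1": False, "c2": False, "c3": False, "c4": True}
blue_on_xgal_low_nnk = {"c1": False, "c2": False, "c3": True, "c4": True}

def nnk_sensitive(minus, plus):
    """Colonies that grow without NNK but not on the high-dose NNK plate."""
    return sorted(c for c, grew in minus.items() if grew and not plus.get(c, False))

def nnk_regulated(xgal, xgal_nnk):
    """Colonies whose blue/white call differs between the two X-gal plates."""
    calls = {}
    for colony, blue_before in xgal.items():
        blue_after = xgal_nnk.get(colony, False)
        if blue_after and not blue_before:
            calls[colony] = "up-regulated"
        elif blue_before and not blue_after:
            calls[colony] = "down-regulated"
    return calls

print(nnk_sensitive(growth_minus_nnk, growth_plus_high_nnk))
print(nnk_regulated(blue_on_xgal, blue_on_xgal_low_nnk))
```

With these made-up data, colony c2 is flagged as NNK sensitive and c3 as carrying an insertion in an NNK-induced gene, while c4 (blue on both plates) reports a gene that is expressed regardless of NNK. The flagged colonies are the ones to go back and pick from the master plate.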