NMR studies of protein–DNA interactions
N. Jamin a,*, F. Toma b
a CEA/INSTN, 91191 Gif sur Yvette Cedex, France
b Département de Biologie, Université d'Evry, bld F. Mitterand, 91025 Evry Cedex, France
Received 1 June 2000
Contents
1. Introduction
2. Overview of techniques
2.1. Labeling of DNA
2.2. Chemical shift changes
2.3. Hydrogen exchange rates
2.4. Isotope editing and isotope filtering
2.5. Deuteration
2.6. Transverse relaxation-optimized spectroscopy (TROSY)
2.7. Long-range distance constraints
2.8. Dynamics
2.9. Hydration
3. Selected applications
3.1. The helix-turn-helix motif
3.1.1. Homeodomain
3.1.2. Lac repressor headpiece
3.1.3. Trp repressor
3.1.4. Ets
3.1.5. Myb
3.2. Zinc fingers
3.2.1. TFIIIA
3.2.2. ADR1
3.2.3. GATA-1
3.2.4. GAGA
3.3. Minor groove-binding architectural proteins
3.3.1. SRY
3.3.2. LEF-1
3.3.3. HMG-I(Y)
Progress in Nuclear Magnetic Resonance Spectroscopy 38 (2001) 83–114
0079-6565/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0079-6565(00)00024-8
www.elsevier.nl/locate/pnmrs
* Corresponding author. Tel.: +33-1-69-08-96-38; fax: +33-1-69-08-57-53.
E-mail address: nadege.jamin@cea.fr (N. Jamin).
3.4. Recognition using β-sheet
3.4.1. Tn916 integrase
3.4.2. GCC-box binding domain
4. Perspectives
References
1. Introduction
Understanding, at a molecular level, the mechanisms that control genetic information and its replication, packaging and repair requires elucidating the detailed interactions between proteins and DNA. The last ten years have produced a large amount of structural information about protein–DNA complexes from both X-ray crystallography and NMR. These data reveal the complexity of the DNA recognition process. The absence of a ‘recognition code’ is particularly evident among the three zinc fingers of the transcription factor TFIIIA, as homologous residues in different complexes do not always contact corresponding base pairs. Direct interactions between protein side-chains and DNA bases involve not only secondary structure elements such as α-helices or β-sheets but also flexible loops and arms. Moreover, residues not involved in specific interactions, such as the linker residues of the three-zinc-finger domain of TFIIIA, can be as important for the protein–DNA interaction as residues that contact DNA bases.
NMR makes a unique contribution to the understanding of protein–DNA interactions by highlighting their dynamic aspects: the dynamics of disorder-to-order transitions upon DNA binding, the dynamics at the protein–DNA interface, the dynamics of base-pair opening and closing, and the lifetimes of water molecules at the protein–DNA interface.
During the last 10 years, more than 20 structures of specific protein–DNA complexes and numerous data on protein–DNA interactions have been obtained by NMR, thanks to developments in protein and nucleic acid synthesis, in isotopic labeling techniques and in heteronuclear magnetic resonance spectroscopy. The first 3D NMR structures of protein–DNA complexes were obtained in 1993: the Drosophila Antennapedia mutant homeodomain (Antp(C39S)) bound to a 14-mer duplex DNA containing the BS2 site [1], and the lac repressor headpiece (residues 1–56, HP56) complexed with an 11-mer operator [2].
This review describes the use of NMR to obtain information on complexes of proteins with their specific DNA targets. Most of the NMR techniques used to study protein–DNA interactions are also employed for other types of protein complexes. Therefore, for a detailed description of the NMR techniques, the reader is referred to recent reviews [3–5] or to the specific papers referenced in the text.
This review is divided into three parts. The first part is an overview of the NMR techniques commonly used to obtain information on protein–DNA interactions. It includes a brief description of DNA labeling techniques, the use of changes in chemical shifts or hydrogen exchange rates to locate the binding site, the use of hydrogen exchange or relaxation data to obtain dynamic information on the binding process, the use of the main isotope filtering and editing techniques as well as transverse relaxation-optimized spectroscopy to assign the NMR signals, and newly developed techniques to deal with large complexes or to obtain long-range distance restraints. The second part comprises applications of these techniques to different protein–DNA complexes, classified according to the protein recognition motif: helix-turn-helix (HTH), zinc finger, minor groove-binding motif and β-sheet. Finally, the third part presents the perspectives that can be inferred from emerging NMR techniques.
2. Overview of techniques
Protein–nucleic acid complexes are large entities, and the availability of ¹³C- and ¹⁵N-labeled proteins has made the determination of their solution structures attainable. Double and triple resonance spectroscopy facilitates the resonance assignments and the measurement of coupling constants and relaxation parameters that are not accessible by proton resonance spectroscopy. Efficient methods for labeling DNA have been published only recently [6–9], opening the way to applications of heteronuclear spectroscopy to DNA. We briefly present the new labeling methods proposed for DNA. We also give an overview of the NMR techniques used to extract structural information about protein–DNA complexes, including chemical shift changes, hydrogen exchange rates, isotope editing and filtering techniques, and methods for measuring protein dynamics to study the changes in protein flexibility upon binding.
2.1. Labeling of DNA
Large quantities of labeled DNA fragments for NMR studies can be synthesized by chemical or enzymatic methods. The chemical synthesis of DNA oligomers uses the solid-phase phosphoramidite method with isotopically labeled monomer units [6]. Labeled ribonucleotides are prepared by isolating bacterial RNA from cells grown in labeled medium, hydrolyzing the RNA and separating the ribonucleotides [7]. They are then chemically converted to deoxynucleotides and derivatized into nucleoside 3′-phosphoramidites, which are used to prepare oligonucleotides on a DNA synthesizer.
Using this method, a 14-base pair DNA duplex, either fully ¹³C,¹⁵N doubly labeled or partially labeled at those nucleotides that form the protein–DNA interface, has been prepared to study its interaction with the Antennapedia homeodomain [8].
The general procedure for the production of uniformly ¹³C,¹⁵N-labeled DNA by enzymatic synthesis is described in Fig. 1. Zimmer and Crothers have shown that milligram quantities of material can be synthesized using this procedure [9]. Their method comprises the production of uniformly ¹³C,¹⁵N-labeled deoxynucleotides by enzymatic hydrolysis of the DNA of bacteria grown on 99% ¹³CH₃OH and >98% ¹⁵NH₄Cl as sole carbon and nitrogen sources. The labeled deoxynucleotides are then converted enzymatically to the triphosphates and used in a DNA polymerization reaction that utilizes an oligonucleotide hairpin primer-template containing a ribonucleotide at the 3′ terminus. Alkaline hydrolysis of the ribonucleotide linkage between the labeled DNA and the unlabeled primer-template, followed by purification, yields the labeled DNA. More recently, variations of this method have been proposed by two other groups [10,11].
Masse and coworkers [10] proposed three modifications. First, the mixed dNTPs are separated from one another so that the four dNTPs can be used in the reaction in the ratio corresponding to the sequence of the deoxyoligonucleotide. Second, Taq polymerase is used instead of the Klenow fragment of DNA polymerase I in the polymerization step. Third, an additional step is used to remove non-templated additions at the 3′ end. Louis and coworkers [11] used the same molecule as primer and template in a bidirectional polymerase chain reaction, thus obtaining an exponential growth in the length of the double strand that contains two repeats of the desired DNA sequence. An additional method has been presented by Louis and coworkers [11]. It comprises the growth in E. coli, with ¹⁵N and ¹³C nutrients, of a suitable plasmid containing multiple copies of the desired DNA sequence. These methods have been applied to the synthesis of fully or partially ¹³C,¹⁵N- or ¹⁵N-labeled double-stranded oligonucleotides of 10–21 base pairs. A uniformly ¹³C,¹⁵N doubly labeled 32-base DNA oligonucleotide that folds to form an intramolecular quadruplex, as well as a 12-base oligonucleotide that dimerizes and folds to form a quadruplex, have also been produced for NMR studies.
Fig. 1. General procedure for the enzymatic synthesis of ¹³C,¹⁵N-labeled DNA. Flowchart: culture of cells with a ¹³C carbon source and a ¹⁵N nitrogen source; cell lysis; phenol extraction (DNA and RNA separated from proteins); nucleic acid hydrolysis to 5′-monophosphate nucleotides; nucleotide separation into dNMPs and rNMPs; dNTPs; DNA oligonucleotide.
Both methods require a high level of expertise. Site-specific labeling is more easily attained with the chemical method, which is therefore the method of choice for the synthesis of site-specifically labeled DNA.
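As a small illustration of the first modification of Masse and coworkers, the relative amounts of the four dNTPs can be derived from the base composition of the strand to be synthesized. The sketch below is only that: the helper name `dntp_ratio` and the 14-nucleotide example sequence are hypothetical and are not taken from the cited work.

```python
from collections import Counter


def dntp_ratio(strand: str) -> dict:
    """Return the fraction of each dNTP needed to synthesize `strand`.

    `strand` is the sequence of the labeled strand to be polymerized
    (hypothetical input; only A, C, G and T are accepted).
    """
    seq = strand.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("strand must be a non-empty string of A, C, G, T")
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


# Hypothetical sequence, for illustration only.
example = "GAAAGCCATTAGAG"
ratios = dntp_ratio(example)
for base in "ACGT":
    print(f"d{base}TP: {ratios[base]:.3f}")
```

In practice the ratio would presumably be adjusted for incorporation efficiency, which this sketch ignores.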
2.2. Chemical shift changes
Interactions of proteins with DNA fragments containing specific binding sites are tight-binding interactions, i.e. the dissociation constants Kd are less than 10⁻⁸ M, and detailed information can be obtained on the complexes because free and bound states are in slow exchange (lifetimes greater than 1 s) on the chemical shift time-scale. The rate of exchange is much smaller than the difference in chemical shift between the two states and, at a mole ratio below the stoichiometric ratio, two sets of resonances are observed, corresponding to the free and bound states. Therefore, the resonances of the complex have to be assigned using NMR techniques employed for large molecules and/or edited/filtered techniques.
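The comparison between the exchange rate and the chemical shift difference can be made concrete with a short sketch. It assumes that the shift difference is expressed in Hz and converted to an angular frequency, and it uses an arbitrary factor of 10 to stand for "much smaller" or "much larger"; the function name and that margin are illustrative choices, not part of the cited work.

```python
import math


def exchange_regime(k_ex: float, delta_nu_hz: float, factor: float = 10.0) -> str:
    """Classify exchange on the chemical shift time-scale.

    k_ex        : exchange rate between free and bound states (s^-1)
    delta_nu_hz : chemical shift difference between the two states (Hz)
    factor      : arbitrary margin (assumption) separating "much smaller" or
                  "much larger" from "comparable"
    """
    delta_omega = 2.0 * math.pi * abs(delta_nu_hz)  # rad s^-1
    if k_ex * factor <= delta_omega:
        return "slow"          # two sets of resonances (free and bound)
    if k_ex >= factor * delta_omega:
        return "fast"          # one population-averaged resonance
    return "intermediate"      # broadened or missing resonances


# A lifetime longer than 1 s means k_ex < 1 s^-1; 100 Hz is a hypothetical shift difference.
print(exchange_regime(1.0, 100.0))
print(exchange_regime(500.0, 100.0))
print(exchange_regime(1.0e5, 100.0))
```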
Fig. 2 shows the imino region of the ¹H spectra obtained upon addition of different amounts of a solution of the R2R3 DNA-binding domain of c-Myb to a solution of the mim12 oligonucleotide [12]. On addition of the protein, new resonance lines corresponding to the bound mim12 dodecamer appear. Some of these lines are split into two signals, which indicates the simultaneous presence of two forms. The lifetimes of these two forms are longer than the inverse of the frequency difference between the free and bound state resonances.
Chemical shifts are very sensitive probes of the local environment of a nucleus, but unfortunately it is not possible to predict their values from the conformation of the complex or, conversely, to deduce the conformation from their values. Nevertheless, they are useful parameters for gaining insight into the parts of the molecules influenced by the interaction. Schmiedeskamp and coworkers [13] have shown by analysis of ¹H and ¹³Cα chemical shifts that little change in the structure of the zinc-finger domain from the yeast transcription factor ADR1 occurs upon binding to a 14-mer DNA containing the UAS half site. A correlation between the protein–DNA interface mapped by chemical shift changes and that mapped by mutagenesis experiments was found. However, the identification of the DNA binding site using DNA-induced chemical shift changes should be done with care. This approach is not feasible for the numerous protein–DNA complexes in which the protein undergoes conformational transitions and changes in dynamics upon binding that affect the chemical shifts. This has recently been demonstrated by Foster and coworkers [14]. These authors compared the chemical shift changes upon binding of the three amino-terminal zinc fingers of X. laevis TFIIIA (zf1-3) to a 15-mer DNA with the intermolecular contacts known from the high-resolution structure of the complex. They found that the chemical shift changes of protein ¹H, ¹⁵N and ¹³C resonances upon DNA binding are not well correlated with the DNA contacts observed in the solution structure of the complex. In fact, the protein resonances are affected not only by DNA binding but also by changes in the dynamics and conformation of the protein upon binding. The DNA base protons were found to be good markers of the DNA binding site because the conformation of the DNA is not significantly distorted upon binding.
Fig. 2. Imino region of the 600 MHz ¹H NMR spectra obtained upon addition of different amounts of a solution of the R2R3 DNA-binding domain of c-Myb to a solution of the mim12 oligonucleotide at 20 °C.
In the case of fast exchange between free and bound states, the structure of the complex cannot be obtained easily. Titration experiments monitor the variation of chemical shifts upon addition of DNA, and estimates of the binding constants (in the millimolar range) can be extracted from the analysis of the titration curves [15]. The chemical shifts of the bound protein resonances are obtained directly from these titration experiments. As in the case of slow exchange, the variation of chemical shifts can be used to map the binding surface.
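The analysis of such a titration curve can be sketched as follows, assuming a simple 1:1 binding model in which the observed shift is the population-weighted average of the free and bound shifts. The grid search, the function names and the synthetic data (0.5 mM protein, Kd of 1 mM) are illustrative assumptions and do not reproduce the procedure of Ref. [15].

```python
import math


def fraction_bound(p_tot, d_tot, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in the same unit)."""
    s = p_tot + d_tot + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_tot * d_tot)) / 2.0
    return complex_conc / p_tot


def fit_titration(p_tot, d_tots, shifts, delta_free, kd_grid):
    """Grid-search Kd; for each trial Kd the maximal shift change is obtained
    by linear least squares. Returns (Kd, shift of the fully bound state)."""
    best = None
    y = [s - delta_free for s in shifts]
    for kd in kd_grid:
        f = [fraction_bound(p_tot, d, kd) for d in d_tots]
        denom = sum(fi * fi for fi in f)
        if denom == 0.0:
            continue
        dmax = sum(fi * yi for fi, yi in zip(f, y)) / denom
        sse = sum((yi - dmax * fi) ** 2 for fi, yi in zip(f, y))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_free + dmax)
    return best[1], best[2]


# Synthetic, noise-free data with hypothetical values (concentrations in mM, shifts in ppm).
p_tot, true_kd, d_free, d_bound = 0.5, 1.0, 8.00, 8.30
d_tots = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
shifts = [d_free + (d_bound - d_free) * fraction_bound(p_tot, d, true_kd) for d in d_tots]
kd_grid = [0.05 * i for i in range(1, 101)]  # trial Kd values from 0.05 to 5.0 mM
kd_fit, bound_fit = fit_titration(p_tot, d_tots, shifts, d_free, kd_grid)
print(f"Kd = {kd_fit:.2f} mM, bound shift = {bound_fit:.3f} ppm")
```

With real data a nonlinear least-squares routine and an error estimate would normally replace the grid search.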
For intermediate exchange between the free and bound states or between different bound conformations, broadening or disappearance of peaks occurs, preventing a detailed structural analysis.
Fig. 3. Amide proton exchange rate (s⁻¹) versus residue number for the wild-type and AV77 apo- and holorepressors, pH 7.6 at 45 °C (Fig. 1 from Ref. [19]). Reprinted with the permission of O. Jardetzky and of Cambridge University Press (© 1996).
2.3. Hydrogen exchange rates
As with chemical shifts, DNA-induced changes in
hydrogen exchange rates can be used with care to map
the DNA binding site by comparing amide proton
exchange rates of the free protein with those of the
protein–DNA complex [16,17].
Quantitative analysis of amide proton exchange rates provides insights into the stability and dynamics of the protein. Mau and coworkers [18] compared the amide proton exchange rates of three forms of the GAL4 transcriptional activator: the native Zn-containing protein, the Cd-substituted protein and a Zn-GAL4/DNA complex. They showed that the Cd-substituted GAL4 is destabilized relative to the native protein, as inferred from the slower exchange rates of the amide protons of the native protein compared with the Cd analogue. They observed a global retardation of amide proton exchange upon binding to DNA, indicating that internal fluctuations of the DNA-recognition module are significantly reduced by the presence of DNA.
Gryk and coworkers [19] ascribed the enhanced
repressor activity at the trp operator in vivo of the
Val77 mutant of the Trp repressor to an increase in
the stability of the flexible DNA binding domain of
the Val77 mutant as deduced from the study of the
amide proton exchange rates as shown in Fig. 3.
The measurement of the imino proton exchange of the DNA provides insights into the dynamics of base-pair opening and closing. Dhavan and coworkers have analyzed the imino proton exchange in the Integration Host Factor (IHF)–DNA complex [16]. This E. coli DNA-binding protein is a minor groove binder and bends the DNA by more than 140° at each site. They observed a large overall reduction in exchange rates for the DNA in the complex. In the complex, groups of adjacent base-pairs exchange at the same rate and appear to close more slowly than the rate of imino proton exchange with bulk water, since their exchange rate is independent of catalyst concentration. Thus fragments of the DNA as large as 6 base-pairs open in a cooperative manner and remain open much longer than found for free DNA. Binding to IHF enhanced the probability of opening the DNA helix. This may play a role in processes that involve IHF and require opening of the double helix.
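The reasoning that links catalyst independence to slow closing can be illustrated with the two-state opening model commonly used to interpret imino proton exchange, in which a base-pair opens with rate k_op, closes with rate k_cl, and the imino proton is transferred from the open state with a rate proportional to the catalyst concentration. This model is an assumption brought in for illustration; the rate constants below are hypothetical and are not values from Ref. [16].

```python
def imino_exchange_rate(k_op, k_cl, k_transfer):
    """Observed exchange rate (s^-1) in a two-state opening model.

    k_op       : base-pair opening rate (s^-1)
    k_cl       : base-pair closing rate (s^-1)
    k_transfer : proton transfer rate from the open state (s^-1),
                 taken as proportional to the catalyst concentration
    """
    return k_op * k_transfer / (k_cl + k_transfer)


# Hypothetical numbers: the transfer rate is raised tenfold by adding catalyst.
k_op = 0.5
for label, k_cl in (("slow closing", 1.0), ("fast closing", 1.0e6)):
    low = imino_exchange_rate(k_op, k_cl, 1.0e2)
    high = imino_exchange_rate(k_op, k_cl, 1.0e3)
    print(f"{label}: k_ex goes from {low:.3g} to {high:.3g} s^-1")
```

When closing is slow compared with transfer, the observed rate approaches k_op and no longer responds to the catalyst, which is the behavior described for the IHF complex; when closing is fast, the rate grows almost linearly with catalyst concentration.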
2.4. Isotope editing and isotope filtering
The general approach used to study molecular complexes involves uniform labeling of one component with ¹⁵N and/or ¹³C while the other component is unlabeled. Isotope-edited or isotope-filtered experiments are then selected to obtain information on one component of the system. Isotope-edited experiments detect signals of protons attached to ¹³C/¹⁵N nuclei, while isotope-filtered experiments detect signals of protons attached to ¹²C/¹⁴N nuclei and remove the signals of ¹³C/¹⁵N-attached protons [20–26].
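The selection logic of edited and filtered experiments can be summarized in a few lines. The sketch below is purely conceptual: it models each proton by the isotope of its directly bonded heavy atom and ignores all pulse-sequence details; the class, function and proton names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proton:
    name: str
    molecule: str      # "protein" or "DNA"
    attached_to: str   # isotope of the directly bonded heavy atom


LABELS = {"13C", "15N"}


def isotope_edited(protons):
    """Keep protons attached to 13C or 15N (the labeled component)."""
    return [p for p in protons if p.attached_to in LABELS]


def isotope_filtered(protons):
    """Keep protons attached to 12C or 14N (the unlabeled component)."""
    return [p for p in protons if p.attached_to not in LABELS]


# Hypothetical complex: 13C,15N-labeled protein bound to unlabeled DNA.
sample = [
    Proton("Ala HB", "protein", "13C"),
    Proton("Gly HN", "protein", "15N"),
    Proton("Thy H6", "DNA", "12C"),
    Proton("Gua H1", "DNA", "14N"),
]
print([p.name for p in isotope_edited(sample)])
print([p.name for p in isotope_filtered(sample)])
```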
In the case of protein–DNA complexes, the protein is generally uniformly doubly ¹³C,¹⁵N-labeled and the DNA is unlabeled. Protein signals are assigned using 3D double and triple resonance experiments. For the DNA, ¹²C-filtered NOESY and HOHAHA experiments are implemented [22,25,26]. The intermolecular NOEs are measured by 3D ¹³C F1-filtered, F3-edited NOESY-HSQC experiments [23,24].
The assignment of DNA signals is often difficult because of signal overlap, especially for the deoxyribose protons. Labeled DNA therefore helps to assign all the DNA resonances, to obtain more detailed conformational features of the DNA and, in some cases, to define more precisely the interface between the protein and the DNA. The first example that makes use of ¹³C,¹⁵N-labeled DNA was published by Masse and coworkers (Fig. 4) [27]. These authors studied the non-specific interaction between the High Mobility Group (HMG) DNA-binding domain of NHP6A and a 15 base pair DNA. Three samples of ¹³C,¹⁵N-labeled DNA were prepared: one strand labeled, the other strand labeled, and both strands labeled. The majority of the base and deoxyribose DNA resonances in the complex were assigned by homonuclear techniques, but the assignments of H4′, H5′ and H5″, which are particularly difficult, were successfully made by using 3D ¹H–¹³C NOESY-HMQC and HCCH-TOCSY experiments on the three labeled protein–DNA samples. Unambiguous assignments of intermolecular NOEs involving the phosphodiester backbone were accomplished with 3D double half-filtered ¹H–¹³C HMQC experiments.
2.5. Deuteration
In the case of large protein–DNA complexes, the conventional backbone triple resonance experiments fail to provide complete assignment of the protein resonances. Therefore, selective protonation and/or uniform complete or fractional deuteration, with or without ¹³C,¹⁵N-labeling of the protein, are used to simplify proton spectra (Fig. 4) and to overcome the problem of rapid transverse nuclear spin relaxation [28].
The structure of a 37 kDa trp repressor–operator DNA complex (homodimeric 107-residue E. coli trp repressor bound to a 20 base pair palindromic DNA operator) was determined by recording homonuclear 2D and 3D spectra for complexes with different deuterium-labeled trp repressor analogs as well as heteronuclear spectra for complexes with uniformly ¹⁵N,¹³C-labeled trp repressor [29].
The use of perdeuterated protein in H₂O (i.e. >90% ²H incorporation at nonlabile positions and about 90% of labile positions protonated) led to the assignment of almost all backbone and Cβ resonances of the 37 kDa trp repressor–operator DNA complex [30] and of a 64 kDa repressor–operator complex (two tandem dimers bound to a 22 base pair symmetric DNA operator and the corepressor analog 5-methyltryptophan) [31,32].
Samples of perdeuterated protein containing selectively protonated or ¹⁵N,¹³C,¹H-labeled residues are used to characterize specific contacts between the protein and the DNA. For example, in the study of the DNA-binding domain of the transcription factor NFATC1 bound to a 12 base pair DNA, Zhou and coworkers [33] performed 2D ¹H–¹H homonuclear NOESY experiments on complexes containing perdeuterated protein with fully protonated Tyr and Phe residues to characterize the contacts between Tyr442 and DNA. These authors also mentioned the use of site-specific deuteration at C2 of Ade6 to confirm the close proximity of Arg555 and Ade6.
2.6. Transverse relaxation-optimized spectroscopy (TROSY)
Recently, Wüthrich and coworkers have proposed a new approach that significantly reduces transverse relaxation rates in multidimensional NMR experiments and thus eliminates one of the obstacles to the study of large molecules and complexes by NMR [34–36].
The relaxation of peptide backbone ¹⁵N nuclei is dominated by the dipolar interaction between the ¹⁵N nucleus and its directly attached proton and by the chemical shift anisotropy (CSA) interaction. As the ¹⁵N CSA tensor is nearly axially symmetric and its axis makes a small angle with the N–H bond vector, the ¹⁵N nucleus has a relaxation rate that depends on the spin state of the attached proton. TROSY uses this differential relaxation to select only the component that relaxes more slowly. Using this approach, Wüthrich and coworkers observed a significant reduction in the ¹⁵N and ¹H linewidths in a 2D ¹H,¹⁵N correlation experiment performed with a uniformly ¹⁵N-labeled protein in complex with a DNA fragment at 750 MHz and 4 °C (τc = 20 ± 2 ns).
Fig. 4. Portion of ¹H–¹³C HSQC spectra at 298 K in D₂O, showing the correlations between aromatic protons and carbons of a 15 base pair DNA containing the binding site of NHP6. Upper spectrum: sample of ¹³C,¹⁵N 15-mer DNA with the upper strand labeled only. Lower spectrum: sample of ¹³C,¹⁵N 15-mer DNA with the lower strand labeled only (adapted from Fig. 8 of Ref. [27]). Reprinted with the permission of J. Feigon and of Oxford University Press (© 1999).
This TROSY principle has been implemented in the conventional triple resonance experiments HNCA, HNCO, HN(CO)CA, HN(CA)CO, HNCACB and HN(CO)CACB. A 2–3-fold enhancement in the signal-to-noise ratio has been observed when applied to ²H/¹³C/¹⁵N-labeled proteins, and significant gains in sensitivity were measured or predicted for protonated proteins. The highest sensitivity gains are obtained for the regular secondary structure elements in the protein core. Studies of protein–DNA complexes should benefit from the implementation of the TROSY principle.
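The differential relaxation exploited by TROSY can be estimated numerically under strong simplifying assumptions: the dipolar and CSA tensors are taken as collinear and axially symmetric, all other relaxation sources are ignored, and the transverse relaxation rates of the two ¹⁵N doublet components are taken to scale as (|p| − |δ|)² and (|p| + |δ|)², where p and δ are the dipolar and CSA interaction constants. The N–H distance (0.102 nm) and the ¹⁵N CSA (−160 ppm) are typical literature values used here as assumptions, not values given in the text.

```python
import math

MU0_OVER_4PI = 1e-7        # T^2 m^3 J^-1
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752e8         # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_N = -2.7126e7        # 15N gyromagnetic ratio, rad s^-1 T^-1


def dd_and_csa_terms(h_freq_mhz, r_nh=1.02e-10, delta_sigma=-160e-6):
    """Return (|p|, |delta|) in rad s^-1 for a given 1H frequency.

    r_nh and delta_sigma are assumed typical values (see lead-in).
    """
    b0 = 2.0 * math.pi * h_freq_mhz * 1e6 / GAMMA_H          # field in tesla
    p = MU0_OVER_4PI * GAMMA_H * GAMMA_N * HBAR / (2.0 * math.sqrt(2.0) * r_nh ** 3)
    delta = GAMMA_N * b0 * delta_sigma / (3.0 * math.sqrt(2.0))
    return abs(p), abs(delta)


def linewidth_ratio(h_freq_mhz):
    """Narrow/broad 15N component ratio, assuming rates scale as (|p| -/+ |delta|)^2."""
    p, d = dd_and_csa_terms(h_freq_mhz)
    return ((p - d) / (p + d)) ** 2


for freq in (500.0, 750.0):
    p, d = dd_and_csa_terms(freq)
    print(f"{freq:.0f} MHz: |delta|/|p| = {d / p:.2f}, narrow/broad = {linewidth_ratio(freq):.3f}")
```

Because δ grows linearly with the field while p does not, the cancellation improves as the field increases toward the point where the two terms are equal in magnitude.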
2.7. Long-range distance constraints
Bax and coworkers have proposed the use of the magnetic field dependence of the dipolar ¹H–¹⁵N and ¹H–¹³C couplings [37] and of the ¹⁵N shift [38] to measure the orientation of NH, CH or CC bond vectors relative to the magnetic susceptibility tensor. These measurements thus provide long-range constraints between distinct regions of the complex. Molecules with an anisotropic magnetic susceptibility align along the static magnetic field to a degree that is proportional to the product of the anisotropy of the molecular magnetic susceptibility and the square of the magnetic field strength. As a result, the dipolar couplings and the chemical shifts vary with the strength of the magnetic field and depend on the orientation of the bond vector or chemical shift tensor relative to the magnetic susceptibility tensor. These small effects were observed for DNA and protein–DNA complexes owing to the contributions of the stacked aromatic groups of the DNA bases to the magnetic susceptibility tensor. The dipolar coupling restraints have been incorporated in the simulated annealing protocol for structure determination of the complex of the DNA-binding domain of GATA-1 with a 20 base pair DNA [37]. Compared with the structure calculated without ¹H–¹⁵N and ¹³Cα–¹Hα dipolar couplings, the overall precision of the coordinates increased only slightly, but the percentage of residues in the most favorable region of the Ramachandran map and the number of bad contacts improved significantly. A large displacement in the short loop connecting strands β3 and β4 was found. The magnetic field-dependent ¹⁵N shifts correlated well with the structure of the GATA-1–DNA complex refined with ¹H–¹⁵N and ¹³Cα–¹Hα dipolar coupling constraints [38].
Fig. 5. T₁ρ (upper graph) and [¹H–¹⁵N] NOE (lower graph) relaxation parameters at 600 MHz for the free (black bars) and the DNA-bound (hatched bars) lac repressor headpiece. Backbone and side-chain parameters are indicated with “b” and “s”, respectively. For Asn, ‘side-chain’ refers to Nδ; for Gln and Arg, it refers to Nε (Fig. 4 from Ref. [40]). Reprinted with the permission of R. Kaptein and of the American Chemical Society (© 1997).
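The field dependence described in Section 2.7 can be sketched in arbitrary units. The only relation taken from the text is that the degree of alignment is proportional to the susceptibility anisotropy times the square of the field. The angular factor (3cos²θ − 1), with θ the angle between the bond vector and the unique axis of an axially symmetric susceptibility tensor, is an assumption added here for illustration, as are the function names and the overall scale.

```python
import math


def relative_alignment(b0_tesla, delta_chi=1.0):
    """Degree of alignment in arbitrary units: proportional to delta_chi * B0^2."""
    return delta_chi * b0_tesla ** 2


def dipolar_contribution(b0_tesla, theta_deg, scale=1.0, delta_chi=1.0):
    """Field-induced dipolar contribution in arbitrary units, assuming an axially
    symmetric susceptibility tensor: scale * delta_chi * B0^2 * (3 cos^2(theta) - 1)."""
    c = math.cos(math.radians(theta_deg))
    return scale * relative_alignment(b0_tesla, delta_chi) * (3.0 * c * c - 1.0)


def field_difference(theta_deg, b0_low, b0_high):
    """Change in the dipolar contribution between two fields: the measurable quantity."""
    return dipolar_contribution(b0_high, theta_deg) - dipolar_contribution(b0_low, theta_deg)


# Doubling the field (hypothetical 8.8 T and 17.6 T magnets) quadruples the alignment.
print(relative_alignment(17.6) / relative_alignment(8.8))
for theta in (0.0, 54.7356, 90.0):
    print(theta, round(field_difference(theta, 8.8, 17.6), 3))
```

Bond vectors near the magic angle give almost no field dependence in this picture, so the restraints are most informative for vectors far from that angle.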
2.8. Dynamics
Measurements of ¹⁵N spin–lattice and spin–spin relaxation rates as well as steady-state ¹H–¹⁵N heteronuclear NOEs provide information about internal motions on the pico- to nanosecond time-scale and about conformational dynamics on the micro- to millisecond time-scale [39]. The three examples given below illustrate the role of dynamics in protein–DNA recognition. The dynamics studies on the lac repressor headpiece (1–56) [40] and on the three amino-terminal zinc fingers of X. laevis TFIIIA [41] show that the process of recognition is dynamic and not static.
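To indicate how these relaxation parameters report on mobility, the sketch below evaluates the standard expressions for the ¹⁵N R₁, R₂ and heteronuclear NOE of an N–H pair relaxed by dipolar and CSA interactions, using a model-free spectral density with an order parameter S². The model-free form is an assumption brought in for illustration and is not discussed in the text; the N–H distance, the CSA and the correlation times are hypothetical, typical values.

```python
import math

MU0_OVER_4PI = 1e-7        # T^2 m^3 J^-1
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752e8         # rad s^-1 T^-1
GAMMA_N = -2.7126e7        # rad s^-1 T^-1


def spectral_density(omega, s2, tau_m, tau_e):
    """Model-free spectral density: overall tumbling tau_m, internal motion tau_e."""
    tau = 1.0 / (1.0 / tau_m + 1.0 / tau_e)
    return 0.4 * (s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))


def n15_relaxation(h_freq_mhz, s2, tau_m, tau_e, r_nh=1.02e-10, delta_sigma=-160e-6):
    """Return (R1, R2, NOE) for a backbone 15N; r_nh and delta_sigma are assumed values."""
    w_h = 2.0 * math.pi * h_freq_mhz * 1e6
    w_n = w_h * abs(GAMMA_N) / GAMMA_H
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / r_nh ** 3) ** 2
    c2 = (w_n * delta_sigma) ** 2 / 3.0

    def j(w):
        return spectral_density(w, s2, tau_m, tau_e)

    r1 = d2 / 4.0 * (j(w_h - w_n) + 3.0 * j(w_n) + 6.0 * j(w_h + w_n)) + c2 * j(w_n)
    r2 = (d2 / 8.0 * (4.0 * j(0.0) + j(w_h - w_n) + 3.0 * j(w_n)
                      + 6.0 * j(w_h) + 6.0 * j(w_h + w_n))
          + c2 / 6.0 * (4.0 * j(0.0) + 3.0 * j(w_n)))
    noe = 1.0 + d2 / (4.0 * r1) * (GAMMA_H / GAMMA_N) * (6.0 * j(w_h + w_n) - j(w_h - w_n))
    return r1, r2, noe


# Hypothetical residues at 600 MHz with a 5 ns overall correlation time.
rigid = n15_relaxation(600.0, s2=0.85, tau_m=5e-9, tau_e=20e-12)
flexible = n15_relaxation(600.0, s2=0.40, tau_m=5e-9, tau_e=1e-9)
print("rigid    R1=%.2f R2=%.2f NOE=%.2f" % rigid)
print("flexible R1=%.2f R2=%.2f NOE=%.2f" % flexible)
```

In this sketch a lower order parameter with sub-nanosecond internal motion lowers the NOE, which is the kind of signature used to identify mobile loops and side-chains.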
¹⁵N T₁, T₁ρ and [¹H–¹⁵N] NOE experiments were performed on uniformly ¹⁵N-labeled free and DNA-bound lac repressor headpiece (1–56) [40]. For the free lac repressor headpiece (1–56), the backbone of the three α-helices and of the turn of the HTH motif is rather rigid, whereas the backbone of the loop between helices II and III is more mobile. Upon binding to the DNA, several changes in mobility occur. The most remarkable changes take place in the loop between helices II and III: His29 within this loop contacts the DNA, and a large decrease in backbone mobility within this loop is detected. The relaxation parameters of most ¹⁵N-containing side-chains (Gln18, Arg22, Asn25, Gln26, Asn50 and Arg51) have also been measured (Fig. 5). Some of the side-chains of DNA-contacting residues show a significant decrease in mobility upon DNA binding, while others are about equally mobile in the free and the bound states. This indicates that interactions with DNA do not necessarily restrict the mobility of the side-chain upon binding and that some flexibility remains at the interface between the protein and the DNA. ¹⁵N T₁ρ measurements indicate that the side-chains of residues Gln18, Arg22 and Asn25 undergo intermediate exchange (μs to ms time-scale), which may indicate that these atoms are changing partners in hydrogen bonds.
The dynamics of the three aminoterminal zinc
fingers of X. laevis TFIIIA (zf1-3) bound to a 15-
mer DNA has been studied by
15
N NMR [41]. The
flexibility of the backbone of the linker residues
(except Lys41) is significantly reduced upon DNA
binding. This reduction is associated with the forma-
tion of a defined conformation and close packing
interactions between the side-chains within the linker
and with the side-chains of the neighboring finger.
Some flexibility has been found for the protein–
DNA interface as indicated by the broadening of reso-
nances or weak connectivities observed for some
lysine resonances (Lys26, Lys29, Lys87). In fact,
analysis of the surface electrostatic potential at the
DNA binding site where these side-chains interact
suggests that these fluctuations arise from the fact
that these side-chains adopt different isoenergetic
conformations with different patterns of hydrogen
bonds to DNA bases.
The essential DNA-binding domain of the yeast ADR1 undergoes a disorder-to-order transition when it binds to a 14 base-pair DNA duplex containing the UAS1 binding site [13], as evidenced by ¹⁵N relaxation measurements. The free DNA-binding domain of ADR1 is composed of three distinct motional regions and behaves like two beads linked by a flexible string. Upon binding, most of this domain tumbles as a single domain with reduced picosecond time-scale motions compared to the free form.
2.9. Hydration
Water molecules are important contributors to the process of protein–DNA recognition, as they may have structural and/or functional roles.
NMR can provide information about the location
and lifetime of the contacts between water and the
protein/DNA [3,42,43]. The residence times of hydra-
tion water can be estimated from the measurements of
NOEs and ROEs between water protons and protein
or DNA protons. These measurements distinguish
residence times of less than 1 ns from longer ones.
Typically residence times shorter than 1 ns are
observed on the surface of protein and in the major
groove of DNA while residence times longer than 1 ns
have been observed for water molecules in the interior
of proteins, in the minor grooves of DNA and in
protein–DNA interfaces.
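Why the comparison of NOEs and ROEs separates short from long residence times can be sketched with the simplest possible model: a rigid proton pair whose dipolar interaction is modulated with a single effective correlation time, used here as a stand-in for the residence time. This model and the function names are assumptions. The numerical threshold it gives should not be compared directly with the approximately 1 ns boundary quoted above, which depends on the motional model used.

```python
import math


def noe_roe_ratio(tau_c, h_freq_mhz=600.0):
    """Ratio of laboratory-frame (NOE) to rotating-frame (ROE) cross-relaxation rates
    for a proton pair with effective correlation time tau_c (s).

    Sign convention of this sketch: positive for short tau_c, negative for long tau_c.
    """
    w = 2.0 * math.pi * h_freq_mhz * 1e6

    def j(x):
        return tau_c / (1.0 + (x * tau_c) ** 2)

    sigma_noe = 6.0 * j(2.0 * w) - j(0.0)
    sigma_roe = 2.0 * j(0.0) + 3.0 * j(w)
    return sigma_noe / sigma_roe


def zero_crossing_tau(h_freq_mhz=600.0):
    """Correlation time at which the NOE cross-relaxation rate changes sign."""
    w = 2.0 * math.pi * h_freq_mhz * 1e6
    return math.sqrt(5.0) / (2.0 * w)


for tau in (1e-12, zero_crossing_tau(), 1e-6):
    print(f"tau = {tau:.2e} s, NOE/ROE = {noe_roe_ratio(tau):+.3f}")
```

The ROE rate keeps the same sign at all correlation times while the NOE rate changes sign, so the sign of their ratio indicates on which side of the crossing a given water contact lies.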
The NMR study of the Antennapedia homeodo-
main–DNA complex reveals that water molecules
are present at the protein–DNA interface: contacts
between protein and water have been observed for
amino acid residues 43, 44, 47, 48, 50, 51, 52 and
54 (Fig. 6 [44]). These water molecules exchange
slowly with the bulk solvent (residence times between
1 ns and 20 ms) [45] similar to water molecules in the
interior of proteins and have multiple preferred loca-
tions. In addition, two residues at the protein–DNA
interface, Asn51 (strictly conserved) and Gln50 (func-
tionally important), contact several DNA bases with
transient water mediated hydrogen bonds. The model
proposed for the interactions between the protein and
the DNA consists of a fluctuating network of hydro-
gen bonds between the polar groups of the protein and
the DNA and water molecules.
In contrast to other protein–DNA complexes, the
complex between the DNA binding domain of
chicken GATA-1 and a 16 base pair duplex is char-
acterized by only two hydrogen bonds between the
protein and the DNA [46]. The specific interactions
involve hydrophobic contacts between the methyl
groups of the protein and the DNA bases. Clore and
coworkers have found water molecules around all
surface exposed methyl groups as well as around
methyl groups in the neighborhood of the sugar-phos-
phate backbone but the water molecules are excluded
from the interface between the protein and the DNA
bases in the major groove [47]. They also observed
water molecules around the backbone amide protons of
Ala30, Tyr34 and Tyr35, which are close to phosphate
groups. This suggests that these water molecules
participate in bridging hydrogen bonds between the
sugar-phosphate backbone and the relevant amide
groups.
3. Selected applications
Table 1 summarizes the protein sequence
motifs and DNA sequence of the protein–DNA
complexes discussed below. It also includes a
summary of the direct interactions between the
amino acid side-chains and the nucleic acid
bases.
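Footnote b of Table 1 states that only the a strand of each duplex is listed (5′ to 3′) and that the first base is numbered 1. The short sketch below shows how the partner strand can be derived from such an entry. The helper names are hypothetical, and the reading that strand-b bases carry the number of the a-strand base they pair with is an inference from the Antp(C39S) entries (Thy8, Ade9 and Ade10 of strand b), not something the footnote states.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def paired_base(strand_a, position):
    """Base on strand b paired with the 1-based position on strand a."""
    return COMPLEMENT[strand_a[position - 1]]

def strand_b_5_to_3(strand_a):
    """Strand b written 5'->3', i.e. the reverse complement of strand a."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_a))

# a strand of the 14-mer listed for Antp(C39S) in Table 1
antp_a = "GAAAGCCATTAGAG"

print(strand_b_5_to_3(antp_a))
print([paired_base(antp_a, i) for i in (8, 9, 10)])
```

For this 14-mer the TAATGG consensus discussed in Section 3.1.1 appears on strand b when it is read 5′ to 3′, and the bases paired with positions 8 to 10 are T, A and A, which matches the Ile47 contacts listed in the table.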
3.1. The helix-turn-helix motif
The HTH motif consists of two nearly perpendicular
α-helices separated by a link of variable length.
The second helix of this motif, called the “recognition
helix”, inserts into the major groove of the DNA to
make specific contacts. Variations between members
of the HTH family include the orientation of the helix
in the major groove, the position of the residues
contacting the DNA and the length of the recognition
helix. This motif, first identified in prokaryotic
gene-regulatory proteins, can be found in a wide variety of
DNA-binding proteins including eukaryotic homeodomains
and transcription factors.
3.1.1. Homeodomain
A homeodomain protein is the product of a homeobox gene.
The homeodomain is a highly conserved DNA-binding
domain of about 60 amino acid residues that is
found in transcriptional regulators involved in the
genetic control of development. These regulators
specify to the embryonic cells their positional information
(where they are relative to their neighbors) and
their segmental identity (what structure they should
generate). They act at various levels of development
and in all organisms, from yeast to human.
Mutations in the homeodomain can result in genetic
diseases and developmental abnormalities. Therefore,
in order to understand the role of individual amino
acid residues in tertiary structure formation and
sequence-specific DNA binding, numerous NMR and
X-ray studies have been carried out.
The solution structures of two homeodomain–DNA
complexes have been solved by NMR: the Drosophila
Antennapedia mutant homeodomain (Antp(C39S))
bound to a 14-mer duplex DNA containing the BS2
site [1] and the Drosophila ventral nervous system
(vnd)-NK2 homeodomain bound to a 16-mer duplex
DNA containing the vnd/NK2 binding site [48].
Antennapedia is probably the best-known
homeodomain protein; its overexpression in the Drosophila
embryo leads to a fly with an extra pair of legs
in place of the antennae. The determination of the solution
structure of the Antp(C39S)–DNA complex (molecu-
lar weight about 18,000) was made possible by the
development of isotope edited and filtered techniques.
Due to the poor quality of the COSY and TOCSY
spectra of the Antp(C39S)–DNA complex, Otting
and coworkers [49] used a strategy based on NOE
data obtained from 2D [¹H,¹H] NOESY with a ¹⁵N(ω2)-half-filter
and 3D ¹⁵N-correlated [¹H,¹H] NOESY on a sample containing
uniformly ¹⁵N-labeled Antp(C39S) and unlabeled DNA to assign
the resonances. As shown in Fig. 7, these experiments
discriminate along the ω2 frequency axis between resonances
of protons bound to ¹⁵N and all other resonances. The sum
spectrum (Fig. 7A) contains the diagonal peaks and
cross peaks with all DNA resonances and with those
protons of the protein not bound to ¹⁵N, while the difference
spectrum (Fig. 7B) contains the diagonal peaks
and cross peaks with the amide protons of the protein.
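The separation into sum and difference spectra can be pictured with a toy calculation. The sketch below assumes that the half-filter is recorded as two sub-spectra that differ only in the sign of the signals from ¹⁵N-bound protons. The peak labels and intensities are hypothetical and serve only to show the arithmetic.

```python
def sum_and_difference(without_inversion, with_inversion):
    """Combine two half-filter sub-spectra (dicts: peak label -> intensity)."""
    total = {k: without_inversion[k] + with_inversion[k] for k in without_inversion}
    diff = {k: without_inversion[k] - with_inversion[k] for k in without_inversion}
    return total, diff

# Hypothetical intensities; only the 15N-bound amide proton changes sign.
scan_a = {"protein NH": 1.0, "protein CH": 0.8, "DNA H": 0.6}
scan_b = {"protein NH": -1.0, "protein CH": 0.8, "DNA H": 0.6}

sum_spectrum, difference_spectrum = sum_and_difference(scan_a, scan_b)
print("sum:       ", sum_spectrum)
print("difference:", difference_spectrum)
```

In this toy case the amide signal cancels in the sum and survives alone in the difference, which mirrors the content of Fig. 7A and Fig. 7B as described above.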
The assumption is made and verified that the
conformations of both the protein and the DNA are
similar in the free state and in the complex. Using this
strategy, all proton resonances of the polypeptide
backbone (except Met0 and Arg1), β-protons for 60
residues, γ-protons for 40 residues, all nonexchangeable
side-chain protons for 34 residues as
well as all nonexchangeable base protons, all 1′
sugar protons and, with two exceptions, all 2′H, 2″H
and 3′H resonances of the DNA were assigned. Fourteen
intermolecular protein–DNA NOEs involving
amino acid residues Arg5, Tyr8, Tyr25, Ile47,
Gln50 and Met54 were identified and these allowed
Fig. 6. Snapshot of the protein–DNA interface after 1148 ps of MD simulation of the Antp HD–DNA complex (Fig. 4 from Ref. [44]). All
atoms of the protein are represented in cyan except for the side-chains of Ile47 (yellow), Gln50 (pink), Asn51 (gray) and Met54 (green). The
a-strand of the DNA is colored orange and the b-strand red. The water molecules at the interface are represented by dark blue spheres. Reprinted
with the permission of K. Wüthrich. Copyright (1996) held by Cell Press.
Table 1
Protein–DNA complexes studied by NMR
For each complex: protein name and reference; Prosite pattern with pattern/protein sequence (a); DNA fragment used in the NMR study (b); summary of direct protein–DNA interactions (amino acid side-chain: nucleic acid bases).

Antp(C39S) [1]
  Prosite pattern: Homeobox_1
  Pattern/protein sequence: ³⁴IAHALSLTERQIKIWFQNRRMKWK
  DNA fragment: GAAAGCCATTAGAG
  Direct interactions:
    Ile47: Thy8, Ade9 and Ade10 strand b
    Gln50: Cyt6, Cyt7 and Ade8 strand a; Thy8 strand b
    Asn51: Ade10 strand b
    Met54: Thy9 strand a

Vnd/NK-2 [48]
  Prosite pattern: Homeobox_1
  Pattern/protein sequence: ³⁴LASLIRLTPTQVKIWFQNHRYKTK
  DNA fragment: TGTGTCAAGTGGCTT
  Direct interactions:
    Lys3: Ade7, Ade8 and Gua9 strand a
    Arg5: Cyt6 and Ade7 strand a; Ade5 strand b
    Ile47: Ade8 and Ade7 strand a
    Gln50: Ade10 and Cyt11 strand b; Gua9 and Thy10 strand a
    Asn51: Ade7 and Ade8 strand a
    Tyr54: Thy8, Cyt9 and Ade10 strand b

Lac repressor
  Prosite pattern: HTH_LACI_FAMILY
  Pattern/protein sequence: ⁶LYDVAEYAGVSYQTVSRVV
  Prosite pattern: CRYSTALLIN_BETAGAMMA
  Pattern/protein sequence: ⁴VTLYDVAEYAGVSYQT
  HP56 [2]
    DNA fragment: AATTGTGAGCG (c)
    Direct interactions:
      Tyr17: Thy6 strand a; Cyt9 and Thy8 strand b
      Gln18: Cyt7 and Thy8 strand b
      Ser21: Thy8 strand b
      Arg22: Gua5 strand a
      Asn25: Thy8 strand b
      Gln26: Cyt7 strand b
      His29: Thy3 strand a
  HP62 [55]
    DNA fragment: GAATTGTGAGCGCTCACAATTC
    Direct interactions:
      Leu6: Cyt10 and Thy9 strand b
      Tyr7: Cyt10 strand b
      Tyr17: Thy7, Gua8 and Thy9 strand a
      Gln18: Ade7 and Cyt8 strand b
      His29: Thy4 strand a
      Leu56: Gua13 strand b

Trp repressor [29]
  Prosite pattern: –
  Pattern/protein sequence: ¹QSPYSAAMAEQRHQEWLRFVDLLKNAYQNDLHLPLLNLMLTPDEREALGTRVRIVEELLRGEMSQRELKNELGAGIATITRGSNSLKAAPVELRQWLEEVLLKSD
  DNA fragment: CGTACTAGTTAACTAGTACG
  Direct interactions:
    Arg69: Gua2 and Thy3 strand a
    Lys72: Gua16 strand a
    Ile79: Gua16 or Ade15 strand a
    Ala80: Ade15 strand a
    Thr83: Ade4 strand a
    Arg84: Cyt13 strand a

hETS1 [58]
  Prosite pattern: ETS_domain_2
  Pattern/protein sequence: ⁷¹KPKMNYEKLSRGLRYY
  Prosite pattern: ETS_domain_1
  Pattern/protein sequence: ²⁷LWQFLLELL
  DNA fragment: TCGAGCCGGAAGTTCGA
  Direct interactions:
    Arg81: Gua8, Gua9 strand a; Cyt8, Cyt9 and Thy10 strand b
    Gly82: Thy10 and Thy11 strand b
    Arg84: Gua8 and Gua9 strand a
    Tyr85: Ade11 and Gua12 strand a; Thy11 and Cyt12 strand b

Mouse c-Myb [61]
  Prosite pattern: Myb_1
  Pattern/protein sequence: ⁹⁵WTKEEDQRV; ¹⁴⁷WTEEEDRII
  Prosite pattern: Myb_2
  Pattern/protein sequence: ¹¹⁵WSVIAKHLKGRIGKQCRERWHNHL; ¹⁶⁶WAEIAKLLPGRTDNAIKNHWNSTM
  DNA fragment: CCTAACTGACA
  Direct interactions:
    Lys41: Gua7 strand a
    Glu45: Cyt6 strand a; Cyt8 strand b
    Lys95: Gua6 strand b
    Asn96: Ade4 strand a
    Asn99: Thy4 strand b

TFIIIA [64]
  Prosite pattern: ZINC_FINGER_C2H2
  Pattern/protein sequence: ¹⁵CSFADCGAAYNKNWKLQAHLSKH; ⁴⁵CKEEGCEKGFTSLHHLTRHSLTH; ⁷⁵CDSDGCDLRFTTKANMKKHFNRFH
  DNA fragment: TTGGATGGGAGACCG
  Direct interactions:
    Finger 1, Lys26: Gua14, Gua13 and Thy12 strand b
    Finger 1, Trp28: Gua9, Ade10 and Gua11 strand a; Thy12 strand b
    Finger 1, Lys29: Ade10 and Gua11 strand a
    Finger 2, His58: Thy10 strand b
    Finger 2, His59: Gua8 and Gua9 strand a
    Finger 2, Arg62: Gua7 strand a
    Finger 3, Thr86: Thy6 strand a
    Finger 3, Ala88: Cyt7 strand b
    Finger 3, Asn89: Ade5 strand a
    Finger 3, Arg92: Gua3 and Gua4 strand a; Thy5 strand b
    Finger 3, Arg96: Gua3 strand a

ADR1 [65]
  Prosite pattern: ZINC_FINGER_C2H2
  Pattern/protein sequence: ¹⁰⁶CEVCTRAFARQEHLKRHYRSH
  DNA fragment: CCATCTCCAACTTATAAGTTGGAGATCC
  Direct interactions: (d)

GATA1 [46]
  Prosite pattern: GATA_ZN_FINGER
  Pattern/protein sequence: ⁷CSNCQTSTTTLWRRSPMGDPVCNAC
  DNA fragment: GTTGCAGATAAACATT
  Direct interactions:
    Thr16: Ade9 and Thy8 strand b
    Leu17: Ade6 and Gua7 strand a; Thy8 strand b
    Asn29: Ade8 strand a; Ade9 strand b
    Leu33: Ade9 and Thy10 strand b
    Leu37: Thy10 and Thy11 strand b
    Lys57: Thy9 strand a

GAGA [66]
  Prosite pattern: ZINC_FINGER_C2H2
  Pattern/protein sequence: ³⁶CPICYAVIRQSRNLRRHLELRH
  DNA fragment: GCCGAGAGTAG
  Direct interactions:
    Arg14: Cyt6 strand b; Ade7 strand a
    Lys16: Cyt4 and Thy5 strand b; Ade5 strand a
    Ser26: Thy10 strand b
    Arg27: Gua8 strand a
    Ser30: Cyt8 and Ade9 strand b
    Arg47: Gua6 strand a; Thy7 strand b
    Asn48: Ade5 strand a
    Arg51: Gua4 strand a

SRY human [70]
  Prosite pattern: –
  Pattern/protein sequence: ¹VQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAEKWPFFQEAQKLQAMHREKYPNYKYRP
  DNA fragment: GCACAAAC
  Direct interactions:
    Asn10: Cyt4 and Ade5 strand a; Gua5 strand b
    Phe12: Thy5 and Thy6 strand b
    Ile13: Ade5 and Ade6 strand a
    Ser33: Gua8 strand b
    Ile35: Ade7 and Cyt8 strand a; Thy7 and Gua8 strand b
    Ser36: Thy7 strand b
    Tyr74: Ade3 strand a; Thy3 strand b

LEF1 mouse [68]
  Prosite pattern: –
  Pattern/protein sequence: ¹MHIKKPLNAFMLYMKEMRANVVAESTLKESAAINQILGRRWHALSREEQAKYYELARKERQLHMQLYPGWSARDNYGKKKKRKREK
  DNA fragment: CACCCTTTGAAGCTC
  Direct interactions:
    Asn7: Thy8 and Gua9 strand a
    Met10: Ade7 and Ade8 strand b
    Glu28: Gua4 and Gua5 strand b
    Ser29: Gua5 and Ade6 strand b
    Ala30: Cyt5 and Thy6 strand a
    Asn33: Thy6 strand a
    Tyr75: Ade11 and Gua12 strand a; Thy11 strand b

HMGI(Y) human [69]
  Prosite pattern: HMGI(Y)
  Pattern/protein sequence: ⁵TPKRPRGRPKG
  DNA fragment: GGGAAATTCCTC
  Direct interactions:
    Arg10: Cyt9 strand a; Ade7 and Ade8 strand b
    Gly11: Ade6 strand a
    Arg12: Ade5 and Ade6 strand a; Thy4 strand b

Tn916 [72]
  Prosite pattern: –
  Pattern/protein sequence: ¹EKRRDNRGRILKTGESQRKDGRYLYKYIDSFGEPQFVYSWKLVATDRVPAGKRDAISLREKIAELQKDI
  DNA fragment: GAGTAGTAAATTC
  Direct interactions:
    Leu26: Thy4 strand a
    Lys28: Gua3 strand a
    Pro36: Thy5 strand b
    Phe38: Thy5 strand b
    Tyr40: Cyt6 strand b

GCC [73]
  Prosite pattern: –
  Pattern/protein sequence: ¹KHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDRAAFRMRGSRALLNFPLRV
  DNA fragment: GCTAGCCGCCAGC
  Direct interactions:
    Arg150: Gua7 and Cyt8 strand b
    Arg152: Gua5 strand a; Gua6 strand b
    Trp154: Thy3 and Ade4 strand a
    Glu160: Cyt8 strand b
    Arg162: Gua10 and Thy11 strand b
    Arg170: Gua8 and Cyt7 strand a
    Trp172: Gua5 and Cyt6 strand a

(a) The entire protein sequence is shown if no Prosite pattern has been defined.
(b) All DNA fragments used are complementary double-stranded DNA; therefore only the sequence of the a strand (5′–3′ direction) is displayed. The first base is numbered 1. Target (or consensus) sequences are shown in bold; required sequences such as A,T-tracts are underlined.
(c) Half-operator sequence.
(d) Only a model has been proposed.
a unique docking of the protein on the DNA to be
determined. The determination of the solution structure
of the complex required more NOE data; therefore
the protein was uniformly labeled with ¹³C, and a 2D
[¹H,¹H] NOESY with ¹³C(ω1, ω2) double-half-filter
as well as a 3D ¹³C-correlated [¹H,¹H] NOESY were
recorded [1,50]. These experiments led to a complete
proton resonance assignment and an almost complete
assignment of the ¹³C and ¹⁵N resonances of the amino
acid side-chains for the DNA-bound Antp(C39S)
homeodomain [50]. Following these assignments,
1123 different intramolecular protein–protein NOEs
were assigned and yielded 855 NOE upper distance
constraints for the protein. Using these distance
constraints and, in addition, 155 dihedral angle
constraints on φ, ψ and χ1 derived from NOE and
³J(HNα) data, the solution structure of the DNA-bound
protein was determined without consideration of the
presence of the DNA. This structure was used as a
starting point for the refinement of the structure of
the complex. An rmsd value of 0.80 Å between the
mean backbone atom (N, Cα, C′) coordinates of the
DNA-bound and free Antp(C39S) homeodomain indicates
that the global fold of the protein is the same in
both states. Additional assignments were also made
for the proton resonances of the DNA, and additional
intermolecular NOEs were identified involving different
side-chain protons of the amino acid residues already
identified or new amino acid residues (Arg3, Arg43 and
Lys46). The homeodomain forms three helices, with
helix III and the N-terminal arm inserted into the
major and minor grooves of the DNA, respectively
(Fig. 8). On the protein side, the intermolecular contacts
involve the recognition helix (residues 43–55) and,
outside the recognition helix, residues Arg3,
Arg5, Gln6 and Tyr8 of the N-terminal arm of the
homeodomain, Tyr25 in the loop preceding helix II,
and Arg28 and Arg31 at the start of helix II; on the
DNA side, they involve base pairs 4–13. Salt bridges between
arginine and lysine side-chains of the protein and
phosphate groups of the DNA were observed for
Arg3 of the N-terminal segment, Arg28 and both
ends of the recognition helix (Arg31, Arg43, Arg52,
Arg53 and Lys55) and, in the DNA, for base pairs 4–12.
Hydrophobic interactions were found between
deoxyriboses of the DNA and the N-terminal arm
(Arg5, Gln6), the loop between helix I and helix II
(Tyr25) and the recognition helix (Arg43, Lys46,
Ile47, Gln50, Met54), and also between bases of the
DNA and the recognition helix (Ile47, Gln50, Met54).
Other short intermolecular contacts (shorter than
3.5 Å) also involved residues 5, 6, 8, 25, 43, 44, 47,
50, 54 and base pairs 7–13 of the DNA. The specific
DNA contacts are formed with the side-chain of Arg5
of the N-terminal arm, with the well-determined
side-chain of Ile47 and with the side-chains of Gln50 and
Met54. The conformations of Gln50, a key residue for
specific DNA recognition, and of the invariant Asn51
are not well determined. In
fact, Billeter et al. [1] propose that these two residues
are part of a fluctuating network of interactions involving
the DNA bases Ade9 and Ade10 and water molecules
in the protein–DNA interface, underlining the
important role of water molecules in the homeodomain–DNA
interaction. The Antp homeodomain–DNA
complex has been crystallized and its structure
has been determined at 2.4 Å resolution [51]. There
are two complexes in the asymmetric unit. The crystal
structure is in agreement with the NMR data but
Fig. 7. Spectral region (ω1 = −1.0–10.5 ppm; ω2 = 5.0–10.5 ppm)
of ¹⁵N-edited [¹H,¹H]-NOESY spectra of a 3.5 mM solution of the
1:1 complex formed from uniformly ¹⁵N-labeled Antp(C39S)
homeodomain and DNA 14-mer at 36 °C. (A) Sum spectrum. (B)
Difference spectrum. Adapted from Fig. 2 of Ref. [49]. Reprinted
with the permission of K. Wüthrich and of the EMBO Journal
(© 1990).
indicates that Gln50 has two major conformations and
that Asn51 has a well-defined conformation and
makes a specific contact with Ade9 (5′-TAATGG-3′).
This is not incompatible with the NMR data, as X-ray
crystallography provides information on the most populated
and stable conformation. In addition, the X-ray structure
indicates the location and structural role of several water
molecules, in particular a water molecule which hydrogen bonds to
Gln50, to Asn51 and to the O4 of Thy8 for complex
B (and to the N4 group of C7 for complex A), and two
water molecules that mediate contacts between the
phosphate backbone and residues Trp48 and Asn51.
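The backbone rmsd of 0.80 Å quoted above for the DNA-bound versus free Antp(C39S) homeodomain is a root-mean-square deviation over matched atoms. The sketch below shows only that final step and assumes the two coordinate sets have already been superimposed; a real comparison would first apply a least-squares superposition, which is omitted here. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) in angstroms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical backbone atoms (N, C-alpha, C') of one residue in two structures
bound = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
free = [(x + 0.8, y, z) for x, y, z in bound]  # every atom displaced by 0.8 angstrom along x

print(f"rmsd = {rmsd(bound, free):.2f} angstrom")
```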
The vnd/NK2 homeodomain of Drosophila melanogaster
is the earliest predominantly neural gene
regulator found so far that is expressed in part of the
ventrolateral neurogenic anlage, which gives rise to
part of the central nervous system of the embryo. In
contrast to other homeodomains, which recognize the
consensus sequence 5′-TAATGG-3′, the vnd/NK2
homeodomain recognizes the consensus sequence
5′-CAAGTG-3′. Gruschus and coworkers determined
the solution structure of the complex between the vnd/NK2
homeodomain and a 16-mer duplex DNA
containing the vnd/NK2 binding site [48]. The protein
was either uniformly ¹⁵N labeled or uniformly doubly
¹⁵N/¹³C labeled. 3D ¹⁵N-edited NOESY-HMQC and 3D
¹³C-edited NOESY-HMQC spectra yielded intra-protein and
protein–DNA distance restraints, while 2D ¹²C-filtered
NOESY and 2D NOESY with a 1-1 semi-selective
excitation pulse provided intra-DNA distance
restraints. A 3D ¹²C-filtered/¹³C-edited NOESY spectrum
was measured and yielded only NOE cross peaks
between the protein and the DNA. The implementation
of a modified water flip-back technique in a 3D
¹⁵N-edited NOESY-HMQC experiment, to enhance the
signal intensity of weak side-chain resonances, allowed
the observation of a contact between the invariant
Asn51 and A3 (5′-CAAGTG-3′) [52] that had never
been detected by NMR but had been found in crystal
structures of other homeodomain–DNA complexes.
Fig. 8. Stereo view of one of the 16 final conformers of the Antp(C39S) homeodomain–DNA complex. Only the heavy atoms of residues 3–56
are displayed. The a-strand of the DNA is colored brown and the b-strand magenta. The polypeptide backbone is represented in cyan and the
following colors are used for the side-chains: Arg and Lys, blue; Glu, red; Ala, Ile, Leu, Met, Phe and Trp, yellow; Asn, Gln, His, Ser, Thr and
Tyr, white (Fig. 2 from Ref. [1]). Reprinted with the permission of K. Wüthrich and of the publisher, Academic Press (© 1993).
In the N-terminal arm, Lys3 and Arg5 contact the
DNA bases while Val6, Leu7 and Phe8 contact the
ribose/phosphate backbone of the a-strand. The residues
of the recognition helix involved in major groove
contacts are Ile47, Gln50, Asn51 and Tyr54. The
main differences between the structure of the vnd/NK2
homeodomain and other homeodomains
when bound to DNA are the bend of the recognition
helix, which is smaller than 10° for vnd/NK2
while it is about 15° for the other homeodomains,
and the turn between helix II and helix III, which
shows large variations in all the homeodomain–DNA
complexes. The most significant variation
in the complex is the orientation of the homeodomain,
especially the recognition helix, relative to
the DNA (Fig. 9). Gruschus and coworkers have
proposed that this difference in orientation could
be related to the different specific consensus
sequences recognized by the vnd/NK2 homeodomain
(5′-CAAGTG-3′) and by the Antp, engrailed, paired
and oct-1 homeodomains (5′-TAATGG-3′). The
other explanation that they propose is the unique
manner in which Tyr54 of the vnd/NK2 homeodomain
contacts Cyt4 in the b-strand of the DNA.
3.1.2. Lac repressor headpiece
The lac repressor regulates lactose metabolism
in E. coli by binding to a specific sequence of the lac
operator. The lac repressor is a tetramer of four iden-
tical subunits, each of which contains the principal
DNA binding site in its N-terminal domain (head-
piece). Each dimer binds one operator DNA sequence
with its two headpieces. The structure of the complex
of the lac repressor headpiece (residues 1–56, HP56)
and an 11-mer operator (half site) has been deter-
mined by 2D proton NMR and restrained molecular
dynamics [2]. Due to the molecular weight of the
complex (about 13,000), the proton resonance assign-
ments relied on the sequential assignment of both the
protein and the DNA based on TOCSY and NOESY
Fig. 9. Superposition of the optimized vnd/NK-2 DNA structure (with protein backbone shown in pink and DNA backbone shown in gray) with
the average homeodomain–DNA complex generated from the Antennapedia, engrailed, paired and oct-1 complexes (with protein backbone
shown in blue and DNA backbone shown in green) (Fig. 6 from Ref. [48]). Reprinted with the permission of J.A. Ferretti and of the publisher,
Academic Press (© 1999).
experiments in combination with the comparison of
the spectra of the free species and of the spectra with
different protein/DNA ratios or with different salt
concentrations. The structure of HP56 contains three
helices; the first two helices (residues 6–13 and
residues 17–24) are part of a HTH motif, with the
second helix of this motif placed in the major groove
of the DNA but in the reverse orientation to that found
for the corresponding helix in the lambdoid repressor
proteins and CAP (Fig. 10). Only two direct hydrogen
bonds were observed, between Nη2 of Arg22 and O6 of
Gua5 and between Oε1 of Gln18 and N4 of Cyt7, while
numerous hydrophobic contacts were found. The
methyl groups of Thy3 and Thy8 are involved in
hydrophobic interactions with, respectively, the ring
of His29 and side-chain atoms of Tyr17 and Ser21.
Hydrophobic interactions are also observed between
Tyr17 and Thy6, Tyr17 and Cyt9, and Gln26 and Cyt7. In
fact, hydrogen bonds, hydrophobic interactions and
water interactions are interconnected to form
networks of interactions. For example, Cγ of Thr19
is hydrogen bonded with the phosphate group of
Thy4, which is involved in water-mediated interactions
with Ser16-N, His29-O and Thr34-Cγ. Thr34-Oγ
has water-bridged contacts with Ser31-N and
Thr34-N and, in 35% of the configurations calculated
from molecular dynamics, His29-O and Thr34-Cγ are
bridged by a water molecule. Large variations in the
Fig. 10. Schematic view of HP56–DNA interactions. Methyl groups are indicated by black balls, phosphate groups by striped circles and
protons in H-bonds by small circles (Fig. 8 from Ref. [2]). Reprinted with the permission of R. Kaptein and of the publisher, Academic Press
(© 1993).
conformation and dynamics of the loop between helix
II and helix III of the headpiece occur upon binding to
the operator [53]. This flexible loop fits to the DNA
upon binding thus allowing Asn25 and His29 to
contact the DNA and become more rigid. Moreover,
most of the side-chains which contact the DNA become
more rigid upon binding. The flexibility of the free headpiece
is essential for a good fit to the DNA. Thermodynamic
studies [54] have found a large negative heat
capacity change, larger than would be expected
from a rigid-body protein–DNA association, which
is associated with a local folding transition
upon binding. The conformational changes of the loop
between helix II and helix III as well as the formation of
a fourth helix (residues 50–58) could explain this
change in heat capacity upon complexation.
Recent X-ray and NMR experiments have shown
the importance of the hinge region, which connects
the DNA binding domain to the inducer-binding core
domain, in DNA binding. The NMR structure of
the complex composed of two headpieces (residues
1–62, HP62) containing the hinge helix and a 22
base pair DNA fragment containing the full operator
sequence has been determined (Fig. 11) [55]. The two
headpieces bind symmetrically to the operator with
their HTH motifs inserted into the major groove.
The hinge helices are antiparallel and form the inter-
face of the two headpieces in the center of the opera-
tor. The hinge helices are essential for the high affinity
binding of the repressor to the operator. When an
inducer molecule binds to the core region, the two
hinge helices separate and unfold causing a decrease
Fig. 11. Two perpendicular views of the overlay of the 11 final structures of the HP62–DNA complex. Adapted from Fig. 1 of Ref. [55].
Reprinted with the permission of R. Kaptein and of Elsevier Science (© 1999).
in the affinity of the repressor for the operator. Residues of
the hinge helices interact with the minor groove, opening
it and inducing a global bend of the DNA of about
45°. These distortions of the DNA structure lead to
some changes in the interactions in the major groove.
The largest difference is found in the loop following
the recognition helix: only residues 29–31 interact
with the DNA; the interactions of Gln26 with the
DNA observed in the HP56–DNA complex are
absent.
3.1.3. Trp repressor
In the presence of L-tryptophan, the E. coli trp
repressor binds to at least five operators in the E.
coli genome. It thereby represses initiation of transcription
of genes involved in tryptophan uptake and
biosynthesis in response to the intracellular level of
L-tryptophan. The E. coli trp repressor is a homodimer
of subunits of 107 amino acid residues. Its free form
(aporepressor) binds weakly and non-specifically to DNA, while
its active form (holorepressor) contains two molecules
of the corepressor L-tryptophan and binds to an 18
base-pair consensus operator sequence with a 1:1
stoichiometry.
The crystal and solution structures of the repressor
have been determined and have shown that each
monomer comprises six α-helices. Helices A, B, C and
F intertwine with the corresponding helices of the
second monomer and thereby form the hydrophobic
central core, while helices D and E form a HTH DNA
binding motif.
The solution structure of a ternary complex
between L-trp, trp repressor and a consensus 20 base
pair operator DNA has been determined [29]. The
assignments of L-trp, repressor and operator resonances
benefited from the use of a large number of
labeled samples. The assignment of L-trp resonances
was derived from the analysis of 2D NOESY spectra
of the complex containing fully deuterated trp repressor
and of isotope-filtered NOESY spectra of the
uniformly ¹³C-labeled repressor complex. The use of
a sample containing ¹³C/¹⁵N-labeled corepressor
removed the ambiguities in the assignment of the
corepressor α- and β-resonances. The assignments of the
proton resonances of the DNA result from the analysis
of two samples, one with fully deuterated repressor
and the other with fully deuterated repressor except
for Lys, Ile and Thr residues. These assignments were
confirmed by the use of a ¹³C-filtered NOESY of a
uniformly ¹³C-labeled trp repressor complex. 96% of
the proton resonances of the operator were assigned,
except the H4′, H5′ and H5″ resonances. The assignment
of the resonances of the repressor is based on the
analysis of NOESY spectra of deuterated analogs
and ¹⁵N HMQC NOESY. The side-chain assignments
were confirmed by 3D HCCH-TOCSY and 3D
NOESY-HMQC of a uniformly ¹³C-labeled repressor
complex. The HNCO experiment was useful for the
determination of the secondary structure of the repressor.
93% of the backbone resonances and 85–90% of
the side-chain resonances (except residues 2–17)
were assigned. The nonassigned resonances are
mostly from Asx, Glx and Ser residues.
Most of the protein–DNA contacts are made by the
turn between helices D and E and the N-terminus of
helix E while the backbone of helix D is outside the
DNA major groove (Fig. 12). The overall topology of
the solution structure as well as the specific amino
acid side-chain DNA interactions are in agreement
with the crystal structure. But in contrast to the crystal
structure the NMR data identify eight potential direct
hydrogen bonds involving residues Arg69, Lys72,
Ile79, Ala80, Thr83 and Arg84. In the crystal struc-
ture, most of the protein–DNA contacts are water
mediated. No long-lived (lifetimes greater than
50 ms) water molecules have been detected in the
NMR experiments. If water molecules are present at
the interface between the protein and the DNA, their
lifetime must be less than about 20 ms. The other
difference observed with the crystal structure
concerns the bend of the DNA. In solution, the
DNA bend is 5–10° larger than in the crystal structure.
Discrepancies between NMR and crystallographic
results can arise from the differences in the hydration
of the interface between the protein and the DNA in
solution and in the crystal as suggested by recent data
on the effects of osmolytes on the interaction between
the repressor and a DNA fragment [56].
There are three dynamic processes in the trp repressor–DNA
complex: protein–DNA association/dissociation
(half-life of the complex about 180 s at
37 °C), fluctuation or ‘breathing’ of the helices D and
E (lifetime of about 1 s at 35 °C) and exchange of the
corepressor in and out of the complex (lifetime of
about 300 ms at 45 °C) [57]. The relatively fast
exchange of the corepressor in and out of its binding
pocket is probably the mechanism by which the
repressor responds to the variation of L-trp concentration
in the cell. Since the corepressor is buried
between the protein and the DNA and contacts both
macromolecules, there must be a significant motion of
the protein and/or DNA in the ligand-binding region
to allow this exchange. A proposed mechanism for
this exchange is the breathing of helix E or of the
HTH motif as helix E has been found to be flexible
in the complex (fast NH exchange rates) and as this
helix forms one face of the corepressor binding
pocket.
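The three time scales quoted for the trp repressor–DNA complex can be put on a common footing by converting them to first-order rate constants. The sketch below assumes simple first-order kinetics and reads "lifetime" as the mean lifetime 1/k; both are assumptions made here, and because the values were measured at different temperatures (37, 35 and 45 °C) the resulting numbers are indicative only.

```python
import math

def rate_from_half_life(t_half_s):
    """First-order rate constant (s^-1) from a half-life in seconds."""
    return math.log(2.0) / t_half_s

def rate_from_lifetime(tau_s):
    """First-order rate constant (s^-1) from a mean lifetime in seconds."""
    return 1.0 / tau_s

processes = {
    "complex dissociation (half-life 180 s)": rate_from_half_life(180.0),
    "breathing of helices D and E (lifetime 1 s)": rate_from_lifetime(1.0),
    "corepressor exchange (lifetime 300 ms)": rate_from_lifetime(0.300),
}
for name, k in processes.items():
    print(f"{name}: k = {k:.4g} s^-1")
```

On this reading, corepressor exchange is roughly three orders of magnitude faster than dissociation of the complex, which is consistent with the role proposed for it above.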
3.1.4. Ets
The Ets family of transcription factors regulates
gene expression during growth and development,
including mammalian haematopoiesis and Drosophila
eye development. Its members share a conserved domain of
about 85 amino acids which binds to the DNA
sequence 5′-C/AGGAA/T-3′. The 3D structures of
the DNA binding domains of several Ets transcription
factors determined by NMR or X-ray crystallography
reveal a common overall fold similar to that of the E.
coli CAP, i.e. an N-terminal α-helix, a four-stranded
β-sheet and a HTH motif. Werner and coworkers [58]
reported the 3D structure in solution of a complex
formed between the DNA binding domain of human
Ets-1 and a 17-mer DNA containing the GGAA motif.
In contrast to the X-ray structure of the DNA complex
of the Ets DNA binding domain of mouse Pu-1,
Werner and coworkers found the intercalation of a
Trp residue (replaced by Tyr in Pu-1) at a CpC step
5′ to the GGAA motif and an opposite orientation of
the protein on the DNA. The misassignment of a resonance
at low field (12.33 ppm) to a DNA imino proton
instead of Tyr86-OH led to these contradictory results
[59]. After correction, the NMR structure of the Ets-1
DNA binding domain–DNA complex was found to be similar
to that of the Pu-1 DNA binding domain–DNA complex.
The protein contacts the DNA by a loop-helix-loop
Fig. 12. Mean solution structure of the trp repressor–DNA complex. The cofactor L-tryptophan molecules are marked in red. Adapted from
Fig. 7 of Ref. [29]. Reprinted with the permission of O. Jardetzky and of the publisher, Academic Press (© 1994).
motif: the second helix (H3) of the HTH motif
contacts the major groove at the GGAA sequence
while two loops contact the adjacent minor grooves
(Fig. 13). Residues Arg81, Gly82, Arg84 and Tyr85 of
the second helix of the HTH motif contact the bases of
the GGAA sequence. The loop between strands 3 and
4 of the β-sheet interacts with the sugar-phosphate
backbone of the adjacent 5′ minor groove. Gln26,
Leu27 and Trp28 of the N-terminal helix, Trp65 of the
first helix of the HTH motif, several Lys residues in
the turn of the HTH motif and Tyr86 of the second
helix of the HTH motif interact with the sugar-phosphate
backbone of the adjacent 3′ minor groove.
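The degenerate Ets consensus 5′-C/AGGAA/T-3′ quoted above can be located in the 17-mer listed for hETS1 in Table 1 with a simple pattern search. The sketch below is illustrative only: the regular expression is one possible transcription of the consensus, the helper name is hypothetical, and only the a strand is searched.

```python
import re

def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlapping ones included."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]

ETS_CONSENSUS = "[CA]GGA[AT]"       # 5'-C/A G G A A/T-3' written as a regular expression
ets1_a = "TCGAGCCGGAAGTTCGA"        # a strand of the hETS1 17-mer from Table 1

print(find_motif(ets1_a, ETS_CONSENSUS))
```

The single match starts at base 7, so the two guanines of the motif fall at positions 8 and 9, in line with the Gua8 and Gua9 contacts listed for Arg81 and Arg84 in Table 1.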
3.1.5. Myb
Transcription factors of the myb family regulate the
proliferation and differentiation of hematopoietic cells
at different levels and in different lineages. These
proteins share a common DNA binding domain and
recognize the same DNA sequence: PyAAC
T
/
G
G. The
minimal specific DNA binding domain is composed
of two repeats of 52 residues (named R2 and R3).
NMR structural studies of the R2R3 domain of three
different Myb proteins and of their interaction with
DNA have been published including mouse c-Myb,
human/chicken c-Myb and chicken B-Myb [60–63].
These studies show that while the structure of the R3
repeat is conserved between the three proteins and is
composed of three helices comprising an HTH motif,
the structure of the R2 repeat varies depending on the
protein. In particular, the C-terminal region of R2, which
forms the second helix of an HTH motif in mouse c-Myb,
exists in multiple conformations in human/chicken c-Myb
and in B-Myb, and the relative orientation of the first two
helices varies between the three proteins (from 20 to 40°)
[62,63]. Moreover, two forms of the protein–DNA complex
are found for human/chicken c-Myb and B-Myb [12,63], in
contrast to mouse c-Myb. It has been suggested that the
conformational instability of the R2 repeat may be
necessary to bind to a number of different specific
DNA sequences.
Fig. 13. Solution structure of the ETS1-DBD/DNA complex. The protein is shown as a tube in green; the DNA is shown in yellow with the
central GGAA motif in magenta for guanine residues and in blue for adenine residues. Adapted from Fig. 4 of Ref. [59]. Reprinted with the
permission of A. Gronenborn and of Kluwer Academic Publishers (© 1997).
3.2. Zinc fingers
The zinc finger motif was first identified in the transcription
factor TFIIIA. It consists of an α-helix and a
β-sheet stabilized in a compact structure by a tetrahedrally
coordinated zinc ion and by a hydrophobic
pocket in the interior of the structure involving amino
acids that are well conserved. A single zinc finger does not
bind DNA with high affinity; to do so, zinc
fingers are generally organized in tandem arrays (for
example the first three zinc fingers of TFIIIA [64]) or
associated with another DNA binding motif (for
example the proximal accessory region of ADR1
[65], the basic region of GAGA [66] and the C-terminal
tail of GATA-1 [67]). The different classes of
DNA binding proteins containing zinc fingers differ
in the number of zinc fingers, the distribution of Cys
and His residues involved in chelation of zinc, the
number of amino acid residues in the loop between
the chelation sites and the monomeric/dimeric structure
of the complex. The simple model of DNA recognition
by zinc finger proteins, which proposes that each
zinc finger domain behaves like an independent
module with a limited number of amino acid side-chains
making hydrogen bonds with DNA bases and
thus contacting three to four base pairs, was challenged
by the recent data on zinc finger protein–DNA
complexes.
3.2.1. TFIIIA
The transcription factor TFIIIA regulates the tran-
scription of the 5 S ribosomal RNA gene by RNA
polymerase III by binding specifically to a 50 bp
region within the coding sequence for 5 S RNA (the
internal control region). It also binds to the 5 S RNA
transcript and is involved in RNA storage and transport.
It contains a tandem array of nine Cys2/His2 zinc
finger motifs. The first three zinc fingers constitute the
minimal DNA binding domain and are not involved in
RNA binding while fingers 4–7 are essential for RNA
binding but are not required for DNA binding. Fingers
8 and 9 are not essential for binding of either DNA or
RNA but are required for transcriptional activation.
The solution structure of a complex composed of
the first three zinc fingers with a 15 base pair DNA
corresponding to nucleotides 79–93 of the X. laevis
5 S RNA gene has been determined [64].
The three zinc fingers bind in the major groove,
contacting 13 base pairs and making no contact with
the minor groove (Fig. 14). Finger 1 binds at the 3′
end of the DNA, contacting five base pairs; in particular,
Trp 28 at helix position 2 makes extensive hydrophobic
contacts with four bases on both strands of the
DNA and thus plays a key role in the orientation of
finger 1 in the major groove. Finger 2 is located over
the central GGG triplet. Finger 3, in contrast to other
zinc fingers, makes contact with the bases over its
entire helix, from Thr 86 at position −1 to Arg 96 at
position +10. This study identified two new residues
belonging to the helix of the zinc finger motif and
making specific contacts with the bases: a Trp at
position +2 and an Arg at position +10. The linkers
between the zinc finger domains are shown to play
an important role in the stabilization of the
protein–DNA complex. Extensive mutagenesis of amino acid
residues of the linkers shows that these residues are
as important for the interaction with the DNA as
residues contacting DNA bases. The linkers lose their
flexibility upon DNA binding [41] and thus adopt
well-defined conformations and pack against the
adjacent zinc fingers in the complex. As a consequence,
substantial protein–protein interfaces are formed
between adjacent zinc fingers and contribute to the
orientation of the zinc fingers in the major groove of
the DNA. Differences in the protein–protein interfaces
are reflected by the different orientation of
finger 1 in the major groove relative to fingers 2 and
3 (Fig. 15). This study demonstrates that high affinity
DNA binding is not only determined by specific
side-chain–base contacts but also depends indirectly on the
linker structure and on the interaction between the
zinc fingers. As has been observed in the case of
other protein–DNA complexes, the side-chains of
several residues involved in contact with the
bases (Lys26 and Lys29 in finger 1, His58 and
His59 in finger 2 and Lys92 in finger 3) appear to
fluctuate between multiple conformations. Wuttke
and coworkers [64] proposed that this flexibility
could be entropically advantageous, as conformational
restriction of a Lys side-chain upon binding
has an estimated entropic cost of about
3 kcal mol⁻¹.
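To give a feel for the size of the entropic cost quoted above (about 3 kcal mol⁻¹ per restricted Lys side-chain), the short sketch below converts a free energy into the corresponding fold change in an equilibrium constant using the standard relation K ratio = exp(ΔG/RT). It is only an illustration and not a calculation taken from Ref. [64]: the temperature of 298.15 K, the treatment of the entropic cost as a simple additive free energy term, and the function name `affinity_factor` are all assumptions made here.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def affinity_factor(delta_g_kcal, temperature_k=298.15):
    """Fold change in an equilibrium constant for a free energy change.

    delta_g_kcal  : free energy difference in kcal/mol (assumed additive)
    temperature_k : absolute temperature in K (298.15 K is an assumption)
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    cost = 3.0  # kcal/mol, the entropic cost quoted in the text
    factor = affinity_factor(cost)
    print(f"A {cost:.1f} kcal/mol penalty corresponds to a "
          f"{factor:.0f}-fold change in the binding constant at 298 K")
```

Under these assumptions a penalty of about 3 kcal mol⁻¹ corresponds to roughly two orders of magnitude in the binding constant, which is why avoiding full immobilization of a Lys side-chain could matter energetically.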
3.2.2. ADR1
The yeast transcription factor ADR1 from Saccharomyces
cerevisiae regulates the expression of genes
governing carbon source metabolism. The minimal
DNA binding domain of ADR1 contains two
Cys2/His2 zinc fingers and an additional sequence of
20 amino acid residues (named PAR), N-terminal to the
first zinc finger. It binds to a 28 base pair DNA
(UAS1) containing two symmetric and opposed
nucleotide binding sites, first identified in the
glucose-repressible dehydrogenase gene ADH2. The
global fold of the ADR1 DNA binding domain bound
to the 28 base pair UAS1 has been determined using
the methodology based on the observation of NH–NH
NOEs for perdeuterated proteins [65]. The NOEs
observed for the zinc finger motifs indicate that no
large structural change occurs in the zinc fingers
upon DNA binding. In contrast, the PAR region, which
is unstructured in the free protein, becomes ordered and
consists of three antiparallel strands. The use of a
perdeuterated protein also allows the identification of
intermolecular contacts between the protein and the
DNA through the observation of NOEs between
¹⁵N-attached protons from the protein and aliphatic DNA
protons in ¹⁵N-edited NOESY experiments. Numerous
NOEs are found between protons of PAR and DNA
protons from base pairs preceding the GAGG
sequence contacted by the zinc fingers. A model of
the ADR1 DNA binding domain bound to UAS1 is
proposed based on the global fold, on observed intra-
and intermolecular NOE contacts, on residue–base
contacts inferred from mutagenesis experiments
and on the homology with the Zif268–DNA complex
(Fig. 16). In this model, helix positions −1, +3 and +6
of finger 1 and helix positions −1 and +2 of
finger 2 recognize the core sequence G(A/G)GG in
each ADR1 binding site, and residues Arg95, Gly94,
Lys100 and Leu101 of PAR contact the DNA. It is
proposed that residues of PAR make essentially
non-specific contacts with the DNA phosphate backbone and
thus that the role of PAR is to increase DNA binding
affinity by adding non-specific DNA contacts to the
limited number of contacts made by fingers 1 and 2.
Fig. 14. Stereo view of the mean structure of the zf1-3/DNA complex with the side-chains of residues contacting DNA displayed. Adapted from
Fig. 6 of Ref. [64]. Reprinted with the permission of P.E. Wright and of the publisher, Academic Press (© 1997).
Fig. 15. Relative orientation of Fingers 1, 2 and 3 of zf1-3 in the DNA major groove (Fig. 14 from Ref. [64]). Reprinted with the permission of
P.E. Wright and of the publisher, Academic Press (© 1997).
3.2.3. GATA-1
GATA-1 was the first discovered member of the
GATA family of transcription factors. It regulates
the transcription of genes involved in red-cell devel-
opment and has been demonstrated to be essential for
normal erythroid development. GATA-1 displays at
least four functions: activation of the erythrocytic and
megakaryocytic specific genes, regulation of the epsi-
lon–gamma globin switch and control of the cell
cycle. It contains two zinc fingers with the following
topology: Cys-X2-Cys-X17-Cys-X2-Cys. They share
50% sequence identity and bind to the consensus
sequence (T/A)GATA(A/G). The C-terminal zinc
finger is necessary and sufficient for high-affinity
sequence-specific DNA binding. The N-terminal
zinc finger increases the stability and specificity of
DNA binding and can bind DNA at double
(T/A)GATA(A/G) motifs with the C-terminal zinc
finger. This N-terminal zinc finger is implicated in
specific protein–protein interactions with other zinc
finger proteins.
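The consensus notation (T/A)GATA(A/G) used above can be read as a simple pattern: T or A, then GATA, then A or G. As an illustration only, the sketch below scans both strands of a DNA sequence for this pattern with a regular expression. It is not a method taken from the studies reviewed here; the function names `reverse_complement` and `find_gata_sites` and the example sequence are hypothetical, and a plain string match ignores everything about affinity or DNA conformation.

```python
import re

# (T/A)GATA(A/G) written as a regular expression; the lookahead lets
# overlapping sites be reported as well.
GATA_SITE = re.compile(r"(?=([TA]GATA[AG]))")
SITE_LENGTH = 6

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_gata_sites(seq):
    """List (start, strand, site) for every consensus match on both strands.

    'start' is the 0-based index, on the given strand, of the leftmost
    base pair covered by the site; 'site' is read 5' to 3' on 'strand'.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in GATA_SITE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in GATA_SITE.finditer(rc):
        # Convert the position on the reverse strand to top-strand coordinates
        start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    example = "CCAGATAAGG"  # made-up sequence containing AGATAA
    for start, strand, site in find_gata_sites(example):
        print(start, strand, site)
```

Scanning the reverse complement as well matters because the site is not palindromic, so a match on the bottom strand appears as (C/T)TATC(A/T) on the top strand.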
The solution structure of a complex formed
between a 66-residue fragment of GATA-1, containing
the C-terminal zinc finger and a 36-residue segment
C-terminal to the last Cys, and a 16 base pair
DNA containing the consensus sequence AGATAA
has been determined using double and triple resonance NMR
experiments (Fig. 17) [67]. The protein makes contact
with eight DNA bases, essentially in the major groove
(only one is a minor groove base). The helix and the
loop connecting β-strands 2 and 3 are involved in
the major groove contacts while the C-terminal tail
contacts the minor groove. Most of the interactions
are hydrophobic (predominance of Thy in the
DNA); there are only three hydrogen bonds: Asn29
contacts Ade24 and Ade8 in the major groove and
Lys57 contacts Thy9 in the minor groove.
3.2.4. GAGA
The minimal DNA binding domain of GAGA
comprises one Cys2/His2 zinc finger motif preceded
by two highly basic regions BR1 (7 residues) and BR2
(5 residues) and binds to the sequence GAGAGAG.
The solution structure of the complex between the
minimal DNA binding domain and an 11 base pair
DNA containing the nucleotide sequence
G4AGAGAG8 [66] reveals that the additional contacts
required for high-affinity DNA binding are made by
the N-terminal fragment containing the two basic
regions BR1 and BR2 (Fig. 18). Arg27 of BR2
contacts G8 in the major groove and Arg14 and
Lys16 of BR1 contact A7 in the minor groove.
Arg51, Asn48 and Arg47 at positions +6, +3 and
+2 of the zinc finger helix contact Gua4, Ade5 and
Gua6, respectively, in the major groove. All the DNA
bases of the G4AGAG8 sequence are contacted by the
protein.
3.3. Minor groove-binding architectural proteins
These proteins play crucial roles in the assembly of
large protein–DNA complexes. They bend DNA by
interacting exclusively with the minor groove and
thereby, in multiprotein–DNA complexes, bring
distantly bound proteins into close proximity
and thus facilitate the interaction between them. The
solution structures of three minor groove-binding
architectural proteins bound to their DNA recognition
sites have been determined and include: the male sex
determining factor SRY, the lymphoid enhancer bind-
ing factor 1 (LEF-1) and the high mobility group I(Y)
[68–71].
3.3.1. SRY
In mammals, the male sex determination switch is
controlled by a single gene on the Y chromosome,
SRY (for sex-determining region Y). SRY encodes a
protein with an HMG-like DNA-binding domain,
which probably acts as a local organizer of chromatin
structure. It is believed to regulate downstream genes
in the sex determination cascade such as the Müllerian
inhibiting substance (MIS) gene. Clinical mutations in
the HMG box of SRY are associated with failure of
testicular morphogenesis leading to male to female
sex reversal. The solution structure of the complex
between the DNA binding domain of SRY and a
DNA octamer containing its binding site in the MIS
promoter (d(GCACAAAC)2) shows that the DNA is
bent by about 70–80° in the direction of the major
groove (Fig. 19), the DNA helix is unwound and the
minor groove is widened (by about 3.2 Å compared
with B-DNA) [70]. The protein has a twisted letter L
or boomerang shape with irregular N- and C-terminal
strands and three helices. The long arm of the L is
formed by helix 3 and the N-terminal strand and the
short arm by helices 1 and 2, with helices 2 and 3
approximately orthogonal to each other. The convex
surface of the DNA is perfectly adjusted to the
concave binding surface of the protein made by
helices 1 and 3, bounded at the bottom by a ridge
containing helix 2 and at the top by a ridge containing
the N- and C-terminal strands. Phe12 and the partially
intercalated Ile13 interact with base pairs 5 and 6 to
induce bending in the center of the octamer while
Met9 and Trp43 interact with the riboses of base
pairs 5 and 6 to pry open the minor groove. The
bend induced at base pairs 2 and 3 results from the
packing of Tyr74 with the bases of Ade3 and Thy14.
Seven residues (Asn10, Phe12, Ile13, Ser33, Ile35,
Ser36 and Tyr74) are involved in specific interactions
with the DNA bases. In addition, numerous
electrostatic and hydrophobic interactions involving the
phosphate backbone and the sugars as well as 11
amino acid residues contribute to the stabilization of
the bent conformation of the DNA.
Fig. 16. A model of ADR1-DBD bound to UAS1. The position of the zinc fingers on the DNA binding site is modeled from the Zif268–DNA
complex and from change-of-specificity experiments. The structure of the N-terminal region is the average structure taken from the global fold
of ADR1-DBD and positioned with relation to the binding site on the basis of NOE contacts observed in the 3D ¹⁵N-edited NOESY spectra
(Fig. 7 of Ref. [65]). Reprinted with the permission of R. Klevit and of Nature Publishing Group, New York (© 1999).
Fig. 17. Two views of the cGATA-1-DBD/DNA complex. The color coding for the DNA bases is red for A, lilac for T, dark blue for G and light
blue for C. Adapted from Fig. 5 of Ref. [67]. Reprinted with the permission of A. Gronenborn and of Science (© 1993).
Fig. 18. Three views of GAGA-DBD/DNA complex. GC base pairs are colored in red and AT base pairs in blue. The side-chains contacting the
DNA are in yellow and the histidine and cysteine side-chains coordinating the zinc (blue sphere) are in magenta. Adapted from Fig. 6 of Ref.
[66]. Reprinted with the permission of A. Gronenborn and of Nature Publishing Group, New York (© 1997).
3.3.2. LEF-1
Lymphoid enhancer-binding factor 1 (LEF-1) is a
pre-B and T lymphocyte-specific nuclear protein that
participates in the regulation of the T-cell antigen
receptor (TCR) alpha enhancer by binding to the
nucleotide sequence 5′-CCTTTGAA-3′. The NMR
solution structure of the HMG domain of LEF-1
bound to a 15 base pair DNA containing the optimal
binding site for the TCR alpha gene enhancer shares
similar features with the structure of the SRY–DNA
complex (Fig. 20) [71]. The protein exhibits a similar
L-shape and the DNA binds to the concave surface of
the protein and is bent towards the major groove and
away from the protein. The DNA bend is larger by
about 40–50° than for the SRY–DNA complex. The
bend in the center of the DNA is essentially induced
by the partial insertion of Met10 in the stack between
Ade23 and Ade24, and by the interaction of Met13
with Ade24 as well as by the packing of Phe9 against
the ribose of Thy8. Tyr75 seems to play a key role in
the DNA bending at base pairs 10 and 11 as it is
inserted into the narrowed region of the minor groove
and interacts with the ribose rings of Gua12 and
Thy21 as well as with the bases Gua12, Thy20 and
Thy21. In addition to these minor groove interactions
and in contrast to the SRY–DNA complex, residues
from the highly basic C-terminal region bind in the
major groove by making non-specific interactions
with the DNA phosphate backbone.
Fig. 19. Structure of the human SRY–DNA complex. The DNA
bases are colored in red for A, lilac for T, dark blue for G and light blue
for C. Adapted from Fig. 3 of Ref. [70]. Reprinted with the permission
of A. Gronenborn. Copyright (1995) held by Cell Press.
Fig. 20. Stereo view of the LEF-1 HMG domain complexed with DNA. Adapted from Fig. 2 of Ref. [71]. Reprinted with the permission of
P.E. Wright and of Nature (© 1995).
3.3.3. HMG-I(Y)
HMG-I(Y) is a member of a distinct family of “high
mobility group” (HMG) proteins that are non-histone
chromatin-associated proteins initially characterized
by high electrophoretic mobility in polyacrylamide
gels (hence the acronym HMG). HMG-I(Y) plays an
essential role in the assembly and function of the IFN
beta gene enhanceosome, in particular by recruiting
NF-kappaB, ATF-2/c-Jun and IRFs. HMG-I(Y) preferentially
binds to stretches of AT-rich sequence. In
contrast to the other known structures of minor groove
architectural proteins, HMG-I(Y) preserves the B-
form of the DNA and has been shown to participate
in the reversal of intrinsic DNA bends in the IFN beta
gene enhancer. It comprises three DNA binding
domains containing the AT-hook motif. Only two of
these domains are required for binding. Therefore, the
interaction between a fragment of HMG-I(Y)
comprising the second and third DNA binding
domains (HMG-I(2/3)) and a 12 base pair oligonucleotide
containing the PRDII site of the interferon-β
promoter has been studied [69]. As two molecules of
the PRDII dodecamer bind one molecule of HMG-I(2/3),
the solution structure of the 2:1 (DNA:protein) complex
has been determined. The conformation of the dodecamer is
essentially B-type with a small widening (about 1–1.5 Å)
of the minor groove compared to B-DNA. The
extended core sequence Arg-Gly-Arg of each DNA
binding domain makes specific contacts with DNA
bases in the minor groove of the DNA while the
lysine and arginine residues flanking this core make
electrostatic and hydrophobic interactions with the
DNA phosphate backbone. In the case of the second
DNA binding domain, the contact surface with
the DNA is larger due to additional interactions with
both edges of the minor groove made by six amino
acids C-terminal to the core sequence. This difference
can explain the higher affinity for the dodecamer (up
to two-fold greater) of the second DNA binding
domain.
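As a rough guide to what an affinity difference of "up to two-fold" means energetically, the ratio can be converted into a free energy difference with the standard relation between binding free energy and association constant. The temperature of 298 K and the symbols K_A(2) and K_A(3), used here for the association constants of the second and third DNA binding domains, are assumptions introduced for this illustration and are not taken from Ref. [69].

```latex
\Delta\Delta G
  = -RT \ln\frac{K_{\mathrm{A}}(2)}{K_{\mathrm{A}}(3)}
  \approx -\left(1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)
          \left(298\ \mathrm{K}\right)\ln 2
  \approx -0.41\ \mathrm{kcal\,mol^{-1}}
```

Under these assumptions the additional contacts made by the second domain would be worth at most about 0.4 kcal mol⁻¹, a small contribution compared with the several kcal mol⁻¹ typically associated with a single well-formed hydrogen bond or salt bridge.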
3.4. Recognition using β-sheet
Among the structures of protein–DNA complexes
determined so far, few show proteins which
bind DNA via a β-sheet. As a β-sheet is not flat but
has a curvature, two modes of DNA binding can be
found for a two-stranded β-sheet: either the convex or
the concave side of the β-sheet faces the DNA. The
convex mode is used by MetJ and Arc repressors.
These proteins are dimers and a β-strand from each
subunit is intertwined to form a two-stranded
antiparallel β-sheet, which fits the concave face of the DNA
major groove. Recently, the solution structures of two
complexes in which the protein (Tn916 integrase and
GCC-box binding domain) recognizes the major
groove of DNA using the convex side of a β-sheet
have been published [72,73].
3.4.1. Tn916 integrase
The integrase protein from the conjugative transpo-
son Tn916 is a member of the integrase family of site-
specific recombinases. Its role is essential during
transposition as it performs the DNA strand cleavage
and joining reactions. The solution structure of the
minimal N-terminal DNA binding domain complexed
with a 13-mer DNA containing the DNA binding site
within the transposon arm was solved (Fig. 21) [72].
The structure of the DNA binding domain consists of
a three-stranded antiparallel β-sheet connected by a
large loop (18 amino acid residues) to a C-terminal
α-helix. A large N-terminal loop precedes the β-sheet.
The second and third strands of the β-sheet as well as
the turn between the first two strands form a concave
surface that fits into the major groove. Five amino acid
residues (Leu26, Lys28, Pro36, Phe38 and Tyr40)
belonging to the second and third strands interact
specifically with four consecutive DNA base pairs.
Non-specific interactions between the DNA phosphate
backbone and residues of the first strand, of the
N-terminal loop and of the loop between the third strand
and the α-helix anchor the two strands of
the β-sheet to the DNA. The DNA is bent towards the
protein by 35°, thereby facilitating non-specific
contacts between base pairs at the 3′ end of the duplex
and residues of the loop between the third strand and
the α-helix.
3.4.2. GCC-box binding domain
Ethylene-responsive element-binding proteins
(EREBPs) have novel DNA-binding domains (ERF
domains), which are widely conserved in plants, and
interact specifically with sequences containing
AGCCGCC motifs (GCC box). The solution structure
of the GCC-box binding domain of a protein from
Arabidopsis thaliana free and in complex with a 13-mer
duplex containing the GCC-box has been determined
[73]. The structure of the protein consists of a
three-stranded antiparallel β-sheet packed along an
α-helix, the axis of the helix being approximately
parallel to the second strand of the β-sheet. The
protein binds to the major groove of the DNA via its
β-sheet. A close fit between the protein and the DNA
is obtained by the curvature of the β-sheet, which
follows the DNA axis, and by the bending of the DNA
towards the major groove (about 20°). Nine consecutive DNA
base pairs are contacted by seven amino acid residues.
Six amino acid residues of the three strands of the
β-sheet and Arg154 from the turn between β-strands 1
and 2 are involved in specific interactions: four
arginine residues from the three strands make hydrogen
bonds with five guanine bases and two tryptophans
are involved in hydrophobic interactions with DNA
bases. All these Arg and Trp residues (except
Arg152) as well as additional residues from the
β-sheet contact the phosphate backbone and the sugars.
Only one residue from the α-helix makes a non-specific
contact with the phosphate backbone.
4. Perspectives
As observed in the past, future structural studies of
protein–DNA interactions will continue to benefit
from developments in labeling methods and
NMR technology [74 and references therein].
The availability of ¹⁵N,¹³C-labeled DNA has
already proven its usefulness in obtaining detailed
conformational information on DNA structure in a
complex of the BS2 operator with the Antp homeodomain
[75]. Future applications of labeled DNA
include the direct observation of NH···N hydrogen
bonds between nucleic acids and amino acids
and the study of DNA duplex dynamics upon
complexation.
Recent work has been performed on the stable
isotope labeling of peptide segments in a protein
sample by means of either protein splicing using a
protein splicing element (intein) or in vitro chemical
ligation of expressed protein domains. The development
of these labeling techniques, combined with the
recent TROSY and CRINEPT techniques, will assist
in studies of larger complexes, including investigation
of the influence of the other domains within the
protein (like the phosphorylation site and the activation
domain) or other domains of a complex partner
upon binding to the DNA. Thereby the role of
protein–protein interactions could be tackled.
Fig. 21. Stereo view of the solution structure of the complex between the N-terminal domain of integrase and DNA showing all ordered amino
acid side-chains at the protein–DNA interface (Thr15, Ser18, Lys21, Leu26, Lys28, Ile30, Pro36, Phe38, Tyr40, Trp42, Lys54). The protein
backbone is colored in red and the side-chains in yellow. Adapted from Fig. 2 of Ref. [72]. Reprinted with the permission of R.T. Clubb and of
Nature Publishing Group, New York (© 1999).
References
[1] M. Billeter, Y.Q. Qian, G. Otting, M. Müller, W. Gehring, K.
Wüthrich, J. Mol. Biol. 234 (1993) 1084.
[2] V.P. Chuprina, J.A.C. Rullmann, R.M.J.N. Lamerichs, J.H.
van Boom, R. Boelens, R. Kaptein, J. Mol. Biol. 234 (1993)
446.
[3] G. Wider, Prog. NMR Spectrosc. 32 (1998) 193.
[4] C.H. Arrowsmith, Y.S. Wu, Prog. NMR Spectrosc. 32 (1998)
277.
[5] M. Sattler, J. Schleucher, C. Griesinger, Prog. NMR Spec-
trosc. 34 (1999) 93.
[6] A. Ono, S. Tate, Y. Ishido, M. Kainosho, J. Biomol. NMR 4
(1994) 581.
[7] M.J. Michnicka, J.W. Harper, G.C. King, Biochemistry 32
(1993) 395.
[8] C. Fernandez, T. Szyperski, A. Ono, H. Iwai, S. Tate, M.
Kainosho, K. Wüthrich, J. Biomol. NMR 12 (1998) 25.
[9] D.P. Zimmer, D.M. Crothers, Proc. Natl. Acad. Sci. USA 92
(1995) 3091.
[10] J.E. Masse, P. Bortmann, T. Dieckmann, J. Feigon, Nucleic
Acids Res. 26 (1998) 2618.
[11] J.M. Louis, R.G. Martin, G.M. Clore, A.M. Gronenborn, J.
Biol. Chem. 273 (1998) 2374.
[12] N. Jamin, V. Le Tilly, L. Zargarian, A. Bostad, I. Besançon-Yospe,
P.-N. Lirsac, O.S. Gabrielsen, F. Toma, Int. J. Quantum
Chem. 59 (1996) 333.
[13] M. Schmiedeskamp, P. Rajagopal, R.E. Klevit, Prot. Sci. 6
(1997) 1835.
[14] M.P. Foster, D.S. Wuttke, K.R. Clemens, W. Jahnke, I.
Radhakrishnan, L. Tennant, M. Reymond, J. Chung, P.E.
Wright, J. Biomol. NMR 12 (1998) 51.
[15] H. Aihara, Y. Ito, H. Kurumizaka, S. Yokoyama, T. Shibata, J.
Mol. Biol. 290 (1999) 495.
[16] G.M. Dhavan, J. Lapham, S. Yang, D.M. Crothers, J. Mol.
Biol. 288 (1999) 659.
[17] X. Luo, D.G. Sanford, P.E. Bullock, W.W. Bachovchin, Nat.
Struct. Biol. 3 (12) (1996) 1034.
[18] T. Mau, J.D. Baleja, G. Wagner, Prot. Sci. 1 (1992) 1493.
[19] M.R. Gryk, O. Jardetzky, J. Mol. Biol. 255 (1996) 204.
[20] G. Otting, K. Wüthrich, Q. Rev. Biophys. 23 (1) (1990) 39.
[21] G.M. Clore, A.M. Gronenborn, Prot. Sci. 3 (1990) 372.
[22] G.W. Vuister, S-J. Kim, C. Wu, A. Bax, J. Am. Chem. Soc.
116 (1994) 9206.
[23] C. Zwahlen, P. Legault, S.J.F. Vincent, J. Greenblatt, R.
Konrat, L.E. Kay, J. Am. Chem. Soc. 119 (1997) 6711.
[24] W. Lee, M.J. Revington, C. Arrowsmith, L.E. Kay, FEBS
Lett. 350 (1994) 87.
[25] A. Bax, S. Grzesiek, A.M. Gronenborn, G.M. Clore, J. Magn.
Res., Ser A 106 (1994) 269.
[26] M. Ikura, G.M. Clore, A.M. Gronenborn, G. Zhu, C.B. Klee,
A. Bax, Science 256 (1992) 632.
[27] J.E. Masse, F.H.T. Allain, Y.M. Yen, R.C. Johnson, J. Feigon,
J. Am. Chem. Soc. 121 (1999) 3547.
[28] C. Arrowsmith, Y.S. Wu, Prog. NMR Spectrosc. 32 (1998)
277.
[29] H. Zhang, D. Zhao, M. Revington, W. Lee, X. Jia, C. Arrow-
smith, O. Jardetzky, J. Mol. Biol. 238 (1994) 592.
[30] T. Yamazaki, W. Lee, C.H. Arrowsmith, D.R. Muhandiram,
L.E. Kay, J. Am. Chem. Soc. 116 (1994) 11 655.
[31] X. Shan, K.H. Gardner, D.R. Muhandiram, N.S. Rao, C.H.
Arrowsmith, L.E. Kay, J. Am. Chem. Soc. 118 (1996) 6570.
[32] X. Shan, K.H. Gardner, D.R. Muhandiram, L.E. Kay, C.
Arrowsmith, J. Biomol. NMR 11 (1998) 307.
[33] P. Zhou, L.J. Sun, V. Dötsch, G. Wagner, G.L. Verdine, Cell
92 (1998) 687.
[34] K. Pervushin, R. Riek, G. Wider, K. Wüthrich, Proc. Natl.
Acad. Sci. USA 94 (1997) 12 366.
[35] M. Salzmann, K. Pervushin, G. Wider, H. Senn, K. Wüthrich,
Proc. Natl. Acad. Sci. USA 95 (1998) 13 585.
[36] M. Salzmann, G. Wider, K. Pervushin, H. Senn, K. Wüthrich,
J. Am. Chem. Soc. 121 (1999) 844.
[37] N. Tjandra, J.G. Omichinski, A.M. Gronenborn, G.M. Clore,
A. Bax, Nat. Struct. Biol. 4 (1997) 732.
[38] M. Ottiger, N. Tjandra, A. Bax, J. Am. Chem. Soc. 119 (1997)
9825.
[39] M.W.F. Fischer, A. Majumdar, E.R.P. Zuiderweg, Prog.
NMR. Spectrosc. 33 (1998) 207.
[40] M. Slijper, R. Boelens, A.L. Davis, R.N.H. Konings, G.A. van
der Marel, J.H. van Boom, R. Kaptein, Biochemistry 36
(1997) 249.
[41] M.P. Foster, D.S. Wuttke, I. Radhakrishnan, D.A. Case,
J.M. Gottesfeld, P.E. Wright, Nat. Struct. Biol. 4 (1997)
605.
[42] M. Billeter, Prog. NMR Spectrosc. 27 (1995) 635.
[43] G. Otting, Prog. NMR Spectrosc. 31 (1997) 259.
[44] M. Billeter, P. Güntert, P. Luginbühl, K. Wüthrich, Cell 85
(1996) 1057.
[45] Y.Q. Qian, G. Otting, K. Wüthrich, J. Am. Chem. Soc. 115
(1993) 1189.
[46] J.G. Omichinski, G.M. Clore, O. Schaad, G. Felsenfeld, C.
Trainor, E. Appelle, S.J. Stahl, A.M. Gronenborn, Science 261
(1993) 438.
[47] G.M. Clore, A. Bax, J.G. Omichinski, A.M. Gronenborn,
Structure 2 (1994) 89.
[48] J.M. Gruschus, D.H.H. Tsao, L.-H. Wang, M. Nirenberg, J.A.
Ferretti, J. Mol. Biol. 289 (1999) 529.
[49] G. Otting, Y.Q. Qian, M. Billeter, M. Müller, M. Affolter,
W.J. Gehring, K. Wüthrich, EMBO J. 9 (1990) 3085.
[50] Y.Q. Qian, G. Otting, M. Billeter, M. Müller, W. Gehring, K.
Wüthrich, J. Mol. Biol. 234 (1993) 1070.
[51] E. Fraenkel, C.O. Pabo, Nat. Struct. Biol. 5 (1998) 692.
[52] J.M. Gruschus, J.A. Ferretti, J. Magn. Reson. 135 (1998) 87.
[53] M. Slijper, A.M.J.J. Bonvin, R. Boelens, R. Kaptein, J. Mol.
Biol. 259 (1996) 761.
[54] R.S. Spolar, M.T. Record, Science 263 (1994) 777.
[55] C.A.E.M. Spronk, A.M.J.J. Bonvin, P.K. Radha, G. Melacini,
R. Boelens, R. Kaptein, Structure 7 (1999) 1483.
[56] M.P. Brown, A.O. Grillo, M. Boyer, C. Royer, Prot. Sci. 8
(1999) 1276.
[57] W. Lee, M. Revington, N.A. Farrow, A. Nakamura, N.
Utsunomiya-Tate, Y. Miyake, M. Kainosho, C. Arrowsmith, J.
Biomol. NMR 5 (1995) 367.
[58] M.H. Werner, G.M. Clore, C.L. Fisher, R.J. Fisher, L. Trinh, J.
Shiloach, A.M. Gronenborn, Cell 83 (1995) 761.
[59] M.H. Werner, G.M. Clore, C.L. Fisher, R.J. Fisher, L. Trinh, J.
Shiloach, A.M. Gronenborn, J. Biomol. NMR 10 (1997) 317.
[60] N. Jamin, O.S. Gabrielsen, N. Gilles, P.-N. Lirsac, F. Toma,
Eur. J. Biochem. 216 (1993) 147.
[61] K. Ogata, S. Morikawa, H. Nakamura, A. Sekikawa, T.
Inoue, H. Kanai, A. Sarai, S. Ishii, Y. Nishimura, Cell 79
(1994) 639.
[62] P.B. McIntosh, T.A. Frenkiel, U. Wollborn, J.E. McCornick,
K.-H. Klempnauer, J. Feeney, M.D. Carr, Biochemistry 37
(1998) 9619.
[63] I. Segalas, S. Desjardins, H. Oulyadi, Y. Prigent, S. Tribouil-
lard, E. Bernardi, A.R. Schoofs, D. Davoust, F. Toma, J.
Chim. Phys. 96 (1999) 1580.
[64] D.S. Wuttke, M.P. Foster, D.A. Case, J.M. Gottesfeld, P.E.
Wright, J. Mol. Biol. 273 (1997) 183.
[65] P.M. Bowers, L.E. Schaufler, R.E. Klevit, Nat. Struct. Biol. 6
(1999) 478.
[66] J.G. Omichinski, P.V. Pedone, G. Felsenfeld, A.M. Gronenborn,
G.M. Clore, Nat. Struct. Biol. 4 (1997) 122.
[67] J.G. Omichinski, G.M. Clore, O. Schaad, G. Felsenfeld, C.
Trainor, E. Appelle, S.J. Stahl, A.M. Gronenborn, Science 261
(1993) 438.
[68] C.A. Bewley, A.M. Gronenborn, G.M. Clore, Annu. Rev.
Biophys. Biomol. Struct. 27 (1998) 105.
[69] J.R. Huth, C.A. Bewley, M.S. Nissen, J.N.S. Evans, R.
Reeves, A.M. Gronenborn, G.M. Clore, Nat. Struct. Biol. 4
(1997) 657.
[70] M.H. Werner, J.R. Huth, A.M. Gronenborn, G.M. Clore, Cell
81 (1995) 705.
[71] J.J. Love, X. Li, D.A. Case, K. Giese, R. Grosschedl, P.E.
Wright, Nature 376 (1995) 791.
[72] J.M. Wojciak, K.M. Connolly, R.T. Clubb, Nat. Struct. Biol. 6
(1999) 366.
[73] M.D. Allen, K. Yamasaki, M. Ohme-Takagi, M. Tateno, M.
Suzuki, EMBO J. 17 (1998) 5484.
[74] G. Wider, K. Wüthrich, Curr. Opin. Struct. Biol. 9 (1999) 594.
[75] C. Fernandez, T. Szyperski, M. Billeter, A. Ono, H. Iwai, M.
Kainosho, K. Wüthrich, J. Mol. Biol. 292 (1999) 609.