How do we use computational methods to analyze, predict, or design protein sequences and structures? Theme: Methods based on physics vs. methods based on our accumulated empirical knowledge of protein properties 7.91 Amy Keating For a molecular simulation or model you need: 1. A representation of the protein 2. An energy function 3. A search algorithm or optimizer Covalent Potential Energy Terms U Covalent = U bond + U angle bond + U dihedral improper + U torsion 2 U bond = ∑ 1 k b (b ? b 0 ) bonds 2 2 U angle bond = ∑ 1 k θ (θ ? θ ) angles bond 2 0 U dihedral improper = ∑ 1 dihedrals improper 2 k Φ ( Φ ? Φ 0 ) U torsion = ∑ 1 k φ [1 + cos(nφ ? δ )] torsions 2 Brooks et al., J. Comput. Chem. 4: 187-217 (1983) 2 Non-Covalent Potential Energy Terms U Non -covalent = U vdW + U elec Lennard-Jones potential 12 ij ij r B 6 ij ij r C ? ? B ij U vdW = ∑ ? 12 ? C ij ? ? inull j ? r ij r ij 6 ? “accurate” approximate q i q j U elec = ∑ Coulomb’s law i null j εr ij The potential energy surface is a 3N-6 dimensional space. For a protein, we assume a single native-structure minimum. There are many local minima, and some may be close in energy to the global minimum. E n e r g y X Sampling the Potential Energy Surface ? Energy minimization – “downhill” search, generally to nearest local minimum – can be used to relax structures – might be useful to define local changes due to mutation ? Normal mode analysis – defines “characteristic motions”, which are distortions about a local minimum structure – orders motions “easy” (low frequency) to “hard” (high) ? Molecular dynamics – movie of motion at given temperature (300 K) – equivalent to statistical mechanical ensemble ? Monte Carlo/Simulated Annealing – Describe properties of the landscape and thermodyanmic parameters without simulating how the molecular actually moves Energy Minimization Conformational Space, R Potential Energy, U(R) X-ray structure X ? Iterative procedures; terminate when reach tolerance, such as small ( F i = ?? r U ) i gradient ? Poor initial structure leads r i+ 1 = r i +δ F i to poor local minimum ? Multiple minimum problem ONLY FINDS LOCAL MINIMA! Uses of simple minimization 1. The “minimum perturbation approach” to modeling a mutation – Assume structure of single-site mutant is close to known wild-type structure ? Find stable conformations for mutant side chain in context of wild-type protein ? Use energy minimization to relax candidate structures (all degrees of freedom) Shih, Brady, and Karplus, Proc. Natl. Acad. Sci. USA 82: 1697–1700 (1985); hemagglutinin Gly to Asp mutation modeled accurately 2. Relieving strain before analyzing the energy of an experimental or predicted structure 3. Structure building and refinement when solving structures using X-ray crystallography or NMR Normal Mode Analysis ? Characteristic Motions and their Relative Ease ? Thermodynamic Properties Mathematical Approximation: Series of Independent Harmonic Oscillators 0 ? 2 U ( ( ( ? ?R U ) = R U 0 ) ? + R U 0 )( R R 0 ) + 1 2 ∑∑ ? ? ( r r i 0 , )(r j ? r j 0 , ) + null i j > i i r r ji Corresponds to Local Expansion of Potential Surface as Parabolic U(R ) actual harmonic approximation R Normal Modes Locate “Easy” Deformations U(R) R ? Low-frequency, energetically easy motions will dominate the dynamical behavior of macromolecules Normal modes of proteins sometimes correspond to biologically relevant motions ? “shearing” or “hinging” conformational changes are common in enzymes ? Can compare structure changes computed from NMA with alternate conformations observed experimentally. Frequently a low-frequency mode describes the change (but it may not be the lowest energy mode) ? NMA is a way to get an idea about motion from a static structure Molecular Dynamics Simulations ? Simulate motions of molecules as a function of time – Collect short “movie” of protein “swimming” in solution ? Can use the simulation to compute: – Average structure (at given temperature, T) – Atomic fluctuations (at T) – Thermodynamic quantities (temperature dependent) m ? 2 x i F i F i = m i a i a i = =? ? i U i ?t 2 m i null null r (t +δt) = r(t) +δ null (tv t +δt /2) null v(t +δt /2) = null v (t ?δt /2) +δt null a(t) Verlet “leap frog” algorithm Running a Molecular Dynamics Simulation ? Minimize initial coordinates ? Assign random starting velocities from Maxwell-Boltzmann distribution 1/2 2 ? p(v i )= ? ? m i ? ? exp ? ? ? ?m i v i ? ? ? 2πkT ? ? 2kT ? ?Use small time step to integrate Newton’s equations of motion (1 fs =1 x 10 -15 sec is typical) ? Equilibration (running dynamics until system equilibrates) ? Production dynamics (useful dynamics at a desired temp) Sampling conformational space using Monte Carlo Monte Carlo randomly samples configurations, rather than simulating how the molecule would actually move in time. 1. Begin with a random (energy-minimized) conformation 2. Evaluate the energy = E 0 3. Make a random conformational change (e.g. rotation around a bond) 4. Evaluate the new energy = E’ 5. DECISION: if E’ < E 0 ACCEPT the move, set E 0 = E’, go to 3 if E’ > E 0 AND if exp[-(E’-E 0 )/kT] > random # then ACCEPT the move, set E 0 = E’, go to 3 else reject the move, go to 3 6. Proceed for an arbitrary amount of time The acceptance criterion is such that the conformations generated sample the Boltzman distribution. What are Monte Carlo and Molecular Dynamics Useful For? 1. Exploring the energy landscape What is the global minimum? What are other minima? What are the barriers between minima? 2. Computing thermodynamic quantities 3. “Observing” pathways connecting different states What are the differences between MD and MC? Molecular dynamics and Monte Carlo can be used for many of the same purposes. If you want to know how a system evolves in time, you must use MD. MC can be better at sampling broad conformational spaces because it makes random moves that can get you out of local minima. A Dynamic Model for the Allosteric Mechanism of GroEL MD simulation used to probe the mechanism of conformational change in the chaperonin GroEL that occurs upon binding ATP and GroES. Ma, J, PB Sigler, Z Xu, and M Karplus. "A Dynamic Model for The Allosteric Mechanism of GroEL." Please see figure 1a of J Mol Biol. 302, no. 2 (15 September 2000): 303-13. Shown are conformational changes that occur upon binding ATP. The endpoints of the calculation were determined from experimental X-ray structures, but the path in between isn’t accessible experimentally. The conformation change is too slow to simulate normally, but “targeted MD” can identify such trajectories. The yellow intermediate domain moves downwards first, triggering the upwards movement of the green apical domain. Simulation shows details of interactions responsible for both positive and negative cooperativity. Please see figures 2b and 5b of Ma, J, PB Sigler, Z Xu, and M Karplus. "A Dynamic Model for The Allosteric Mechanism of GroEL." J Mol Biol. 302, no. 2 (15 September 2000): 303-13. MD simulations are now run on ENORMOUS systems! Simulation of the aquaporin channel: permeable to water but not protons. The waters and the lipid bilayer are both present in the calculation - 101,000 atoms total. Please see figure 1 of de Groot, Bert L., and Helmut Grubmüller. "Water Permeation Across Biological Membranes: Mechanism and Dynamics of Aquaporin-1 and GlpF." Science 294 (14 December 2001): 2353-2357. MD simulation of water permeation through aquaporin Please see figure 2 of de Groot, Bert L., and Helmut Grubmüller. "Water Permeation Across Biological Membranes: Mechanism and Dynamics of Aquaporin-1 and GlpF." Science 294 (14 December 2001): 2353-2357. A 10 ns simulation was sufficient to see water move in “real time” because the permeation rate is 3x109 s-1. Can look at the mechanism of selectivity and what determines the rate. Can you fold a protein using Molecular Dynamics? We don’t really know yet! Simulate the folding of villin headpiece: 36 residues, with explicit solvent. 1 us simulation with 2 fs timestep, took, 512 Cray processor months! Results: The NMR structure is stable in the simulation. After 60 ns the structure showed 50% helicity, in good agreement with experimental data for other proteins There was an initial “burst” phase giving helix formation and native contacts A “metastable intermediate” was found with secondary and tertiary contacts similar to the native state. Duan & Kollman Science (1998) 282, 740-744 Villin headpiece A. unfolded B. 980 ns partly folded C. native E. structure from stable cluster Time evolution of: A. Native helical content B. Native contacts C. Radius of gyration D. Solvation free energy Please see figures 1 and 2 of Duan, Yong, and Peter A. Kollman. "Pathways to a Protein Folding Intermediate Observed in a 1-Microsecond Simulation in Aqueous Solution." Science 282 (23 October 1998): 740-744. Molecular Modeling - a few final words ? Potential functions used for modeling are approximate (though quite good for some purposes); calculations are not replacement for experiment ? Ability to “dissect” computational results leads to insights unavailable by other approaches ? The need to specify the problem precisely in order to build a model is itself valuable Solving structures using X-ray crystallography & NMR spectroscopy How are X-ray crystal structures determined? 1. Grow crystals - structure determination by X-ray crystallography relies on the repeating structure of a crystalline lattice. 2. Collect a diffraction pattern - periodically spaced atoms in the crystal give specific “spots” where X-rays interfere constructively. 3. Carry out a Fourier transform to get from “reciprocal space” to a real space description of the electron density. 4. THIS STEP REQUIRES KNOWLEDGE OF THE PHASES OF THE INTERFERING WAVES, WHICH CAN’T BE DIRECTLY MEASURED “THE PHASE PROBLEM” 4. Build a preliminary model of the protein into the envelope of electron density that results from the experiment. 5. Refine the structure through an iterative process of changing the model and comparing how it fits the data. Growing Protein Crystals This is often the rate-limiting step in solving a structure! hanging-drop vapor diffusion 1. choose a sequence that you think folds into a well-defined structure 2. prepare highly-purified protein 3. concentrate to > 10 mg/ml buffer, salts, precipitant Collecting Diffraction Data What do the spots in the diffraction pattern mean? Every spot represents constructive interference between diffraction from a set of atoms with spacing satisfying Bragg’s Law: nλ = 2dsinθ So the INTENSITY of a spot at a given q tells you something about how much electron density lies in a set of periodic planes with spacing d. NOTE: λ = 0.5 to 1.5 ? each spot results from a differently-oriented set of planes with the appropriate spacing The electron density in the protein is a superposition of all of these periodic functions… The diffraction pattern represents reciprocal space large θ -> small d; HIGH resolution small θ -> large d LOW resolution Image of diffraction data removed for reasons of copyright. Fourier Transform f(x) = ∫ F(s) exp(-i 2πxs) ds F(s) = ∫ f(x) exp(i 2πxs) dx This is the “diffraction pattern” This is the “electron density” Courtesy of Kevin Cowtan: http://www.ysbl.york.ac.uk/~cowtan/sfapplet/sfintro.html The Phase Problem: we don’t know what phases to use to add up all of the contributing waves. BIG PROBLEM. | F hkl | exp(iα hkl ) = observable amplitude atomic scattering factor - related the phase of F is determined by the to electron density around atom j x, y and z coordinates of the atoms What we observe is I hkl α |F khl | 2 we can’t measure the phases directly F hkl is called a structure factor Courtesy of Kevin Cowtan: http://www.ysbl.york.ac.uk/~cowtan/sfapplet/sfintro.html Courtesy of Kevin Cowtan: http://www.ysbl.york.ac.uk/~cowtan/sfapplet/sfintro.html FT FT Contribution of intensities vs. phases to the Fourier Transform duck intensities + cat phases Courtesy of Kevin Cowtan: http://www.ysbl.york.ac.uk/~cowtan/sfapplet/sfintro.html So, how do you get the phases? I. Molecular Replacement “Borrow” phases from a structur e that you think is similar. e.g. duck inte nsi t ie s plus goose phases would give a reasonable estimate of t h e real struct ure. Beware “model bias” - n eed to run controls to make sure you haven’t forced the data to assume the structure of the model. How it works: Take the model structure and translate/rotate it in the unit cell of the crystal. Calculate the expected structure factors. Monitor the R value. If you find a significantly non-random R value, this might be a good model. Try to refine it and see if the R free value improves. So, how do you get the phases? II. Heavy atom methods If you can make two or more crystals of the same geometry that have heavy atoms incorporated, then F protein?heavy = F protein + F heavy and taking the difference of a diffraction pattern with and without the heavy atoms will give you the diffraction pattern for just the heavy atoms. This structure can be solved using a “Patterson map”: 1 2 V P(u ,v, w) = ∑∑∑ | F hkl | exp(?2πi(hu + kv + lw)) h k l The Patterson map gives you all of the vectors between atoms. This doesn’t help for a whole protein, but for a sufficiently small structure, you can get the entire structure from this data (except the chirality). Once you have the structures of the heavy atom derivatives, you can use this to derive the phases for the original structure. (Trust me on this - we don’t have time to do the details.) Map tracing usually requires knowledge of the sequence 2.55? map X-Ray Crystal Structure Refinement The model: Computed The data: Actual intensities of spots intensities of spots F obs (h,k,l) ? F calc (h,k,l) ] 2 U X -ray expt = ∑[ Summation h ,k ,l Actual intensity of spot Intensity of spot calculated runs over spots observed in expt from trial structure U hybrid = U Model Molec + sU expt ray -X ? Simulated annealing on hybrid potential rapidly improves correspondence between structure and X-ray observations while maintaining reasonable chemistry (large radius of convergence) ? Previous method effectively used local minimization which became trapped in local minima (small radius of convergence) Structure refinement and the R factor current model observed X-ray amplitudes R = Σ all measurements ||F obs | - |F calc ||/Σ|F obs | model-derived amplitudes make changes guided by observed data and by stereochemical principles The Free R factor current model 90% of X-ray amplitudes R = Σ||F obs calc ||/Σ|F obs | model-derived amplitudes change model 10% of X-ray amplitudes R free = Σ||F obs calc ||/Σ|F obs | assess model | - |F | - |F