Simulated Annealing A Basic Introduction Prof. Olivier de Weck Massachusetts Institute of Technology Dr. Cyrus Jilla Outline ? Heuristics ? Background in Statistical Mechanics – Atom Configuration Problem – Metropolis Algorithm ? Simulated Annealing Algorithm ? Sample Problems and Applications ? Summary Learning Objectives ? Review background in Statistical Mechanics: configuration, ensemble, entropy, heat capacity ? Understand the basic assumptions and steps in Simulated Annealing ? Be able to transform design problems into a combinatorial optimization problem suitable to SA ? Understand strengths and weaknesses of SA Heuristics What is a Heuristic? ? A Heuristic is simply a rule of thumb that hopefully will find a good answer. ? Why use a Heuristic? – Heuristics are typically used to solve complex (large, nonlinear, nonconvex (ie. contain many local minima)) multivariate combinatorial optimization problems that are difficult to solve to optimality. ? Unlike gradient-based methods in a convex design space, heuristics are NOT guaranteed to find the true global optimal solution in a single objective problem, but should find many good solutions (the mathematician's answer vs. the engineer’s answer) ? Heuristics are good at dealing with local optima without getting stuck in them while searching for the global optimum. Types of Heuristics ? Heuristics Often Incorporate Randomization ? 2 Special Cases of Heuristics – Construction Methods ? Must first find a feasible solution and then improve it. – Improvement Methods ? Start with a feasible solution and just try to improve it. ? 3 Most Common Heuristic Techniques – Genetic Algorithms – Simulated Annealing – Tabu Search – New Methods: Particle Swarm Optimization, etc… Origin of Simulated Annealing (SA) ? Definition: A heuristic technique that mathematically mirrors the cooling of a set of atoms to a state of minimum energy. ? Origin: Applying the field of Statistical Mechanics to the field of Combinatorial Optimization (1983) ? Draws an analogy between the cooling of a material (search for minimum energy state) and the solving of an optimization problem. ? Original Paper Introducing the Concept – Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671 680. Statistical Mechanics The Analogy ? Statistical Mechanics: The behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature. ? Combinatorial Optimization: Finding the minimum of a given function depending on many variables. ? Analogy: If a liquid material cools and anneals too quickly, then the material will solidify into a sub-optimal configuration. If the liquid material cools slowly, the crystals within the material will solidify optimally into a state of minimum energy (i.e. ground state). – This ground state corresponds to the minimum of the cost function in an optimization problem. Sample Atom Configuration Original Configuration Perturbed Configuration Atom Configuration - Sample Problem Atom Configuration - Sample Problem 7 6 5 4 3 2 1 0 -1 1 2 3 4 -1 0 1 2 3 4 5 6 7 1 23 4 -1 0 1 2 3 4 5 6 7 -1 0 1 2 3 4 5 6 7 E=133.67 E=109.04 Energy of original (configuration) Perturbing = move a random atom to a new random (unoccupied) slot Configurations ? Mathematically describe a configuration – Specify coordinates of each “atom” 1 3 4 5 ?? ? ?? ? ?? { } with i=1,..,4 r 1 = ?? , r = ? ? , r = ? ? , r = ?? r i 1 2 ?? ? ? 4 4 32 3 ? ? ?? – Specify a slot for each atom { r i } with i=1,..,4 R = [ 1 12 19 23 ] ? ? Energy of a state ? Each state (configuration) has an energy level associated with it ( ,, ? ii ( ,,H qqt ) = ∑ qp - L qqt ) i Hamiltonian H =+=T VE tot Energy of configuration = Objective function value of design kinetic potential Energy Energy sample problem ? Define energy function for “atom” sample problem 1 ? i + ∑ ( ( x i - x j ) + ( y - y j ) ) E i = mgy N 2 2 2 i ≠ potential ????? ji ???????? energy kinetic energy N ER ) = ∑ Er ) Absolute and relative position of ( i ( i i=1 each atom contributes to Energy ? ? ? ? ? ? ? ? Compute Energy of Config. A ? Energy of initial configuration E 1 = 1 10 1 + 5 + 18 + 20 = 20.95 = 1 10 2 + 5 + 5 + 5 = 26.71E 2 = 1 10 4 + 18 + 5 + 2 = 47.89E 3 E 4 = 1 10 3 + 20 + 5 + 2 = 38.12 Total Energy Configuration A: E({r A })= 133.67 Boltzmann Probability P! Number of configurations = - P=# of slots=25 N R ( PN ) ! N R =6,375,600 N=# of atoms =4 What is the likelihood that a particular configuration will exist in a large ensemble of configurations? ( { } ) = exp ? ? -Er Boltzmann probability Pr ({ }) ? depends on energy ? kT ? ? and temperature B Occurences 6000 8000 Occurences Boltzmann Collapse at low T T=100 T=10 T=1 T=100 - E =100.1184 T=10 - E =58.9245 T=1 - E =38.1017 avg avg avg 14000 14000 12000 12000 12000 10000 10000 10000 Occurences 8000 8000 6000 6000 4000 4000 4000 2000 2000 2000 0 0 20 40 60 80 100 120 140 160 180 200 0 0 20 40 60 80 100 120 140 160 180 200 0 Energy 0 20 40 60 80 100 120 140 160 180 200 Energy Energy T high T low Boltzmann Distribution collapses to the lowest energy state(s) in the limit of low temperature Basis of search by Simulated Annealing ? Partition Function Z ? Ensemble of configurations can be described statistically ? Partition function, Z, generates the ensemble average N R Z = Tr exp ? ? -E i ? ? = ∑ exp ? ? -E i ? B B? kT ? i=1 ? kT ? Initially (at T>>0) equal to the number of possible configurations Free Energy N R () = ∑ ET ) Average Energy of all E = ET i ( Configurations in an avg i=1 ensemble ( B ln () -TSF T ) =-kTZ = ET N R ? ET ) ? ? ET ) () = ∑ exp ? - i ( i ( i=1 ? kT ? -dZ E = ET B = ln avg Z d ( 1 kT ) B Relates average Energy at T with Entropy S Specific Heat and Entropy ? Specific Heat N R 2 i ( () ? ? ? N 1 R i=1 ()- ET ) 2 ? 2 ()- ET ) 2 CT ) = dET = ∑ E T ( ? ? = E T ( dT k 2 kT 2 B B ? Entropy () () = ST CT () = CT ST ()- T ∫ 1 () dT dST 1 T dT T T Entropy is ~ equal to ln(# of unique configurations) Low Temperature Statistics Specific Heat C and Entropy S 5 10 10 T 0 2 4 6 8 # o f c onfiguration s, N Entropy S decreases with lower T # configs Ht. Capacity 0 10 0 2 4 0 2 4 10 10 10 10 10 10 decreases Temperature Temperature 150 150 -100 -150 -100 -50 0 50 100 Free Energ y, F Av erage Energ y, E a v g Approaches Approaches E* from 100 50 0 -50 -150 E* from 0 2 4 0 2 4 10 10 10 10 10 10 below Temperature Temperature above Minimum Energy Configurations ? Sample Atom Placement Problem Uniqueness? Atom Configuration - Sample Problem Atom Configuration - Sample Problem 7 7 6 5 4 3 2 1 0 -1 -1 0 1 2 3 4 5 6 7 -1 0 1 2 3 4 5 6 7 -1 0 1 2 3 4 5 6 1 23 4 E*=14.26 12 34 E*=60 Strong gravity g=10 Low gravity g=0.1 Simulated Annealing Dilemma ? Cannot compute energy of all configurations ! – Design space often too large – Computation time for a single function evaluation can be large ? Use Metropolis Algorithm, at successively lower temperatures to find low energy states – Metropolis: Simulate behavior of a set of atoms in thermal equilibrium (1953) – Probability of a configuration existing at T Boltzmann Probability P(r,T)=exp(-E(r)/T) The SA Algorithm ? Terminology: – X (or R or G) = Design Vector (i.e. Design, Architecture, Configuration) – E = System Energy (i.e. Objective Function Value) – T = System Temperature – D = Difference in System Energy Between Two Design Vectors ? The Simulated Annealing Algorithm 1) Choose a random X , select the initial system temperature, and specify the i cooling (i.e. annealing) schedule 2) Evaluate E(X ) using a simulation model i 3) Perturb X to obtain a neighboring Design Vector (X i+1 ) i 4) Evaluate E(X i+1 ) using a simulation model 5) If E(X i+1 )< E(X ), X i+1 is the new current solution i 6) If E(X i+1 )> E(X ), then accept X i+1 as the new current solution with a i probability e (- D/T) where D = E(X i+1 ) - E(X ). i 7) Reduce the system temperature according to the cooling schedule. 8) Terminate the algorithm. o ?E = Define Initial Configuration R o o ) Perturb Configuration R i - > R i+1 j ? min ? Accept R i+1 as New Configuration Create Random Number ? in [0,1] exp > ? ? -?E ? i+1 ) y y y y n n Step n n E(R i+1 ) - E(R i ) ? ? ? ? ? ? ? ? ? ? ? ? ? ? T j+1 = T j - ?T min Keep R i as Current Configuration ?E < 0 ? Start T Evaluate Energy E(R Reached Equilibrium at T T < T Evaluate Energy E(R Metropolis Compute Energy Difference SA BLOCK DIAGRAM Reduce Temperature End T Matlab Function: SA.m Best Configuration Initial Configuration Simulated Annealing Algorithm evaluation function perturbation function file_eval.m file_perturb.m SA.m x best E best x o Option Flags History x hist options Exponential Cooling ? Typically (T 1 /T o )~0.7-0.9 ? T 1 ? k T k +1 = ? ? T kSimulated Annealing Evolution 10 S-entropy C-specific heat T/10-temperature ? T o ? 9 8 7 Can also do linear 6 or step-wise cooling 5 4 … 3 2 But exponential cooling 1 often works best. 0 0 5 10 15 20 25 30 35 40 45 Temperature Step Key Ingredients for SA ? A concise description of a configuration (architecture, design, topology) of the system (Design Vector). ? A random generator of rearrangements of the elements in a configuration (Neighborhoods). This generator encapsulates rules so as to generate only valid configurations. ? A quantitative objective function containing the trade-offs that have to be made (Simulation Model and Output Metric(s)). Surrogate for system energy. ? An annealing schedule of the temperatures and/or the length of times for which the system is to be evolved. Sample Problems Matlab “peaks” function ? Difficult due to plateau at z=0, local maxima ? SAdemo1 ? x o =[-2,-2] ? Optimum at ? x*=[ 0.012, 1.524] ? z*=8.0484 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 x Peaks y Start Point Maximum “peaks” convergence ? Initially ~ nearly random search ? Later ~ gradient search SA convergence history -10 -8 -6 -4 -2 0 2 Syst em Energy current configuration new best configuration 0 50 100 150 200 250 Iteration Number 300 Traveling Salesman Problem ? N cities arranged randomly on [-1,1] ? Choose N=15 ? SAdemo1 ? Minimize “cost” of route (length, time,…) ? Visit each city once, return to start city N 2 ( j ( j ( 2 2 j ( j ( 2 l R ) = ∑ ( x i+1 ) - x i ) ) + ∑ ( x R ) - xR N ) ) 1 i=1 j=1 j=1 ???????????? ??????????? visit N cities return home to first city TSP Problem (II) Initial (Random) Route Final (Optimized) Route Length: 17.43 Length: 8.24 Length of TSP Route: 17.4309 Length of TSP Route: 8.2384 1 Start1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 -0.2 -0.2 -0.4 -0.4 -0.6 -0.6 -0.8 -0.8 -1 -1 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1 1-0.8 -0.2 0 0.4 Start 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Shortest Route -0.6 -0.4 0.2 0.6 0.8 Result with SA Structural Optimization ? Define: – Design Domain find r i i = 1,..., N T – Boundary Conditions min C fu () = r i – Loads -1 s.t. uKf= – Mass constraint N ? Subdivide domain s.t. ∑ V r ≤ m ii max i=1 – N x M design “cells” – Cell density r=1 or r=0 ? Where to put material to minimize compliance? Structural Topology Optimization (II) “Energy” = strain energy = compliance Computed via Finite Element Analysis 50 1 2 1211 2 3 13 3 4 14 4 5 15 5 6 16 6 7 17 7 8 18 8 9 19 9 10 2011 2221 23 24 25 26 27 28 29 3021 3231 33 34 35 36 37 38 39 4031 4241 43 44 45 46 47 48 49 50 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 1 2 3 13 3 4 14 4 5 15 5 6 16 6 7 17 7 8 18 8 9 19 18 9 10 20 19 23 24 25 26 27 28 29 20 30 29 33 34 35 36 37 38 39 38 30 40 39 4241 43 44 45 46 47 48 49 48 40 49 Applied Load Cantilever F=-2000 Boundary condition Deformation not drawn to scale Structural Toplogy Optimization Initial Structural Toplogy “Final” Toplogy Initial Structural Configuration: x o “island” 9 8 7 6 5 4 3 2 1 0 -1 Compliance C=593.0 0 2 4 6 8 10 Compliance C=236.68 Structural Configuration Not “optimal” – often need post-processing Structural Optimization – Convergence Analysis Simulated Annealing Evolution 14 45 12 10 8 6 4 2 0 S-entropy 0 5 10 15 20 25 30 35 40 T=3e+002 Occurences T=2.7e+002 T=2.4e+002 T=2.2e+002 T=1.9e+002 T=1.8e+002 T=1.6e+002 T=1.4e+002 T=1.3e+002 T=1.1e+002 T=1e+002 T=93 T=84 T=75 T=68 T=61 T=55 T=49 T=45 T=40 T=36 T=32 T=29 T=26 T=24 T=21 T=19 T=17 T=16 T=14 T=13 T=11T=10 T=9.2T=8.2 T=7.4 T=6.7 T=6 when C is still large ! C-specific heat ln(T)-temperature Don’t terminate 0 5 10 15 20 25 30 35 40 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 Temperature Step Energy Boltzmann Distributions Evolution of - Entropy, Temperature, Specific Heat Premature termination 0 500 1000 1500 2000 2500 3000 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 Iteration Number Syst em Energy SA convergence history current configuration new best configuration Indicator: Best Configuration found only shortly before Simulated Annealing terminated. Final Example: Telescope Array Optimization ? Place N=27 stations in xy within a 200 km radius ? Minimize UV density metric ? Ideally also minimize cable length Cable Length = 1064.924 UV Density = 0.67236 200 100 0 Initial Configuration -100 -200 100 50 0 -50 -100 -200 -100 0 100 200 -100 -50 0 50 100 Optimized Solution -200 -100 0 100 200 -200 -100 0 100 200 -100 -50 0 50 100 -100 -50 0 50 100 Cable Length = 1349.609 UV Density = 0.33333 Simulated Annealing Improved UV density from 0.67 to 0.33 ? Simulated Annealing transforms the array: – Hub-and-Spoke Circle-with-arms Reference available Summary Summary: Steps of SA ? The Simulated Annealing Algorithm 1) Choose a random X , select the initial system temperature, and outline the i cooling (ie. annealing) schedule 2) Evaluate E(X ) using a simulation model i 3) Perturb X to obtain a neighboring Design Vector (X i+1 ) i 4) Evaluate E(X i+1 ) using a simulation model 5) If E(X i+1 )< E(X ), X i+1 is the new current solution i 6) If E(X i+1 )> E(X ), then accept X i+1 as the new current solution with a i probability e (- D/T) where D = E(X i+1 ) - E(X ). i 7) Reduce the system temperature according to the cooling schedule. 8) Terminate the algorithm. Research in SA ? Alternative Cooling Schedules and Termination criteria ? Adaptive Simulated Annealing (ASA) – determines its own cooling schedule ? Hybridization with other Heuristic Search Methods (GA, Tabu Search …) ? Multiobjective Optimization with SA References ? Cerny, V., "Thermodynamical Approach to the Traveling Salesman Problem: An Efficient Simulation Algorithm", J. Opt. Theory Appl., 45, 1, 41-51, 1985 . ? de Weck O. “System Optimization with Simulated Annealing (SA), Memorandum ? Cohanim B., Hewitt J, and de Weck O. L. “The Design of Radio Telescope Array Configurations using Multiobjective Optimization: Imaging Performance versus Cable Length”, The Astrophysical Journal, (in press), 2004 ? Jilla, C.D., and Miller, D.W., “Assessing the Performance of a Heuristic Simulated Annealing Algorithm for the Design of Distributed Satellite Systems,” Acta Astronautica, Vol. 48, No. 5-12, 2001, pp. 529-543. ? Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671-680. ? Metropolis,N., A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, "Equation of State Calculations by Fast Computing Machines", J. Chem. Phys.,21, 6, 1087-1092, 1953.