Simulated Annealing
A Basic Introduction
Prof. Olivier de Weck
Massachusetts Institute of Technology
Dr. Cyrus Jilla
Outline
? Heuristics
? Background in Statistical Mechanics
– Atom Configuration Problem
– Metropolis Algorithm
? Simulated Annealing Algorithm
? Sample Problems and Applications
? Summary
Learning Objectives
? Review background in Statistical Mechanics:
configuration, ensemble, entropy, heat capacity
? Understand the basic assumptions and steps in
Simulated Annealing
? Be able to transform design problems into a
combinatorial optimization problem suitable to
SA
? Understand strengths and weaknesses of SA
Heuristics
What is a Heuristic?
? A Heuristic is simply a rule of thumb that hopefully will find a good
answer.
? Why use a Heuristic?
– Heuristics are typically used to solve complex (large, nonlinear,
nonconvex (ie. contain many local minima)) multivariate combinatorial
optimization problems that are difficult to solve to optimality.
? Unlike gradient-based methods in a convex design space, heuristics
are NOT guaranteed to find the true global optimal solution in a
single objective problem, but should find many good solutions (the
mathematician's answer vs. the engineer’s answer)
? Heuristics are good at dealing with local optima without getting stuck
in them while searching for the global optimum.
Types of Heuristics
? Heuristics Often Incorporate Randomization
? 2 Special Cases of Heuristics
– Construction Methods
? Must first find a feasible solution and then improve it.
– Improvement Methods
? Start with a feasible solution and just try to improve it.
? 3 Most Common Heuristic Techniques
– Genetic Algorithms
– Simulated Annealing
– Tabu Search
– New Methods: Particle Swarm Optimization, etc…
Origin of Simulated Annealing (SA)
? Definition: A heuristic technique that mathematically mirrors the
cooling of a set of atoms to a state of minimum energy.
? Origin: Applying the field of Statistical Mechanics to the field of
Combinatorial Optimization (1983)
? Draws an analogy between the cooling of a material (search for
minimum energy state) and the solving of an optimization problem.
? Original Paper Introducing the Concept
– Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated
Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671
680.
Statistical Mechanics
The Analogy
? Statistical Mechanics: The behavior of systems with many
degrees of freedom in thermal equilibrium at a finite temperature.
? Combinatorial Optimization: Finding the minimum of a given
function depending on many variables.
? Analogy: If a liquid material cools and anneals too quickly, then the
material will solidify into a sub-optimal configuration. If the liquid
material cools slowly, the crystals within the material will solidify
optimally into a state of minimum energy (i.e. ground state).
– This ground state corresponds to the minimum of the cost function in an
optimization problem.
Sample Atom Configuration
Original Configuration Perturbed Configuration
Atom Configuration - Sample Problem Atom Configuration - Sample Problem
7
6
5
4
3
2
1
0
-1
1
2
3
4
-1
0
1
2
3
4
5
6
7
1
23
4
-1 0 1 2 3 4 5 6 7 -1 0 1 2 3 4 5 6 7
E=133.67 E=109.04
Energy of original (configuration)
Perturbing = move a random atom
to a new random (unoccupied) slot
Configurations
? Mathematically describe a configuration
– Specify coordinates of each “atom”
1 3 4 5
?? ? ?? ? ??
{ }
with i=1,..,4 r
1
=
??
, r =
? ?
, r =
? ?
, r =
??
r
i
1
2
?? ? ?
4
4
32
3
? ? ??
– Specify a slot for each atom
{
r
i
}
with i=1,..,4 R =
[
1 12 19 23
]
? ?
Energy of a state
? Each state (configuration) has an energy
level associated with it
(
,,
?
ii
(
,,H qqt
)
=
∑
qp - L qqt
)
i
Hamiltonian
H =+=T VE
tot
Energy of configuration
=
Objective function value of design
kinetic potential
Energy
Energy sample problem
? Define energy function for “atom” sample
problem
1
?
i
+
∑
(
(
x
i
- x
j
)
+
(
y - y
j
)
)
E
i
= mgy
N
2 2
2
i
≠
potential
?????
ji
????????
energy
kinetic
energy
N
ER
)
=
∑
Er
)
Absolute and relative position of
(
i
(
i
i=1 each atom contributes to Energy
? ?
? ?
? ?
? ?
Compute Energy of Config. A
? Energy of initial configuration
E
1
= 1 10 1 + 5 + 18 + 20 = 20.95
= 1 10 2 + 5 + 5 + 5 = 26.71E
2
= 1 10 4 + 18 + 5 + 2 = 47.89E
3
E
4
= 1 10 3 + 20 + 5 +
2 = 38.12
Total Energy Configuration A: E({r
A
})= 133.67
Boltzmann Probability
P!
Number of configurations
=
-
P=# of slots=25
N
R
(
PN
)
!
N
R
=6,375,600
N=# of atoms =4
What is the likelihood that a particular configuration will
exist in a large ensemble of configurations?
(
{ }
)
= exp
?
?
-Er
Boltzmann probability
Pr
({ })
?
depends on energy
?
kT
?
?
and temperature
B
Occurences
6000
8000
Occurences
Boltzmann Collapse at low T
T=100 T=10 T=1
T=100 - E =100.1184 T=10 - E =58.9245 T=1 - E =38.1017
avg avg avg
14000 14000
12000
12000
12000
10000
10000
10000
Occurences
8000
8000
6000
6000
4000
4000
4000
2000
2000
2000
0
0 20 40 60 80 100 120 140 160 180 200
0
0 20 40 60 80 100 120 140 160 180 200 0
Energy 0 20 40 60 80 100 120 140 160 180 200
Energy
Energy
T high
T low
Boltzmann Distribution collapses to the lowest energy
state(s) in the limit of low temperature
Basis of search by Simulated Annealing
?
Partition Function Z
? Ensemble of configurations can be
described statistically
? Partition function, Z, generates the
ensemble average
N
R
Z = Tr exp
?
?
-E
i
?
?
=
∑
exp
?
?
-E
i
?
B B?
kT
?
i=1
?
kT
?
Initially (at T>>0) equal to the number of possible configurations
Free Energy
N
R
() =
∑
ET
)
Average Energy of all
E = ET
i
(
Configurations in an
avg
i=1
ensemble
(
B
ln () -TSF T
)
=-kTZ = ET
N
R
? ET
)
?
?
ET
)
() =
∑
exp
?
-
i
(
i
(
i=1
?
kT
?
-dZ
E = ET
B
=
ln
avg
Z d
(
1 kT
)
B
Relates average Energy at T with Entropy S
Specific Heat and Entropy
? Specific Heat
N
R
2
i
(
()
?
?
?
N
1
R
i=1
()- ET
)
2
?
2
()- ET
)
2
CT
)
=
dET
=
∑
E T
(
?
?
=
E T
(
dT k
2
kT
2
B B
? Entropy
()
() = ST
CT
()
=
CT
ST ()-
T
∫
1
()
dT
dST
1
T
dT T
T
Entropy is ~ equal to ln(# of unique configurations)
Low Temperature Statistics
Specific Heat C and Entropy S
5
10 10
T
0
2
4
6
8
# o
f c
onfiguration
s, N
Entropy S
decreases
with lower T
# configs Ht. Capacity
0
10
0 2 4 0 2 4
10 10 10 10 10 10
decreases
Temperature Temperature
150 150
-100
-150
-100
-50
0
50
100
Free Energ
y, F
Av
erage
Energ
y, E
a
v
g
Approaches
Approaches
E* from
100
50
0
-50
-150
E* from
0 2 4 0 2 4
10 10 10 10 10 10
below
Temperature Temperature
above
Minimum Energy Configurations
? Sample Atom Placement Problem
Uniqueness?
Atom Configuration - Sample Problem Atom Configuration - Sample Problem
7
7
6
5
4
3
2
1
0
-1
-1 0 1 2 3 4 5 6 7 -1 0 1 2 3 4 5 6 7
-1
0
1
2
3
4
5
6
1
23
4
E*=14.26
12 34
E*=60
Strong gravity g=10
Low gravity g=0.1
Simulated Annealing
Dilemma
? Cannot compute energy of all configurations !
– Design space often too large
– Computation time for a single function evaluation can
be large
? Use Metropolis Algorithm, at successively lower
temperatures to find low energy states
– Metropolis: Simulate behavior of a set of atoms in
thermal equilibrium (1953)
– Probability of a configuration existing at T
Boltzmann Probability P(r,T)=exp(-E(r)/T)
The SA Algorithm
? Terminology:
– X (or R or G) = Design Vector (i.e. Design, Architecture, Configuration)
– E = System Energy (i.e. Objective Function Value)
– T = System Temperature
– D = Difference in System Energy Between Two Design Vectors
? The Simulated Annealing Algorithm
1) Choose a random X , select the initial system temperature, and specify the
i
cooling (i.e. annealing) schedule
2) Evaluate E(X ) using a simulation model
i
3) Perturb X to obtain a neighboring Design Vector (X
i+1
)
i
4) Evaluate E(X
i+1
) using a simulation model
5) If E(X
i+1
)< E(X ), X
i+1
is the new current solution
i
6) If E(X
i+1
)> E(X ), then accept X
i+1
as the new current solution with a
i
probability e
(- D/T)
where D = E(X
i+1
) - E(X ).
i
7) Reduce the system temperature according to the cooling schedule.
8) Terminate the algorithm.
o
?E =
Define Initial
Configuration R
o
o
)
Perturb Configuration
R
i
-
> R
i+1
j
?
min
?
Accept R
i+1
as
New Configuration
Create Random Number
? in [0,1]
exp
> ? ?
-?E
?
i+1
)
y
y y
y
n
n
Step
n
n
E(R
i+1
) - E(R
i
)
?
?
?
?
?
?
?
?
?
?
?
?
?
?
T
j+1
= T
j
- ?T
min
Keep R
i
as Current
Configuration
?E < 0 ?
Start T
Evaluate Energy E(R
Reached Equilibrium at T
T < T
Evaluate Energy E(R
Metropolis
Compute Energy Difference
SA BLOCK DIAGRAM
Reduce Temperature
End T
Matlab Function: SA.m
Best Configuration
Initial Configuration
Simulated Annealing
Algorithm
evaluation
function
perturbation
function
file_eval.m file_perturb.m
SA.m
x
best
E
best
x
o
Option Flags History
x
hist
options
Exponential Cooling
? Typically (T
1
/T
o
)~0.7-0.9
?
T
1
?
k
T
k +1
=
? ?
T
kSimulated Annealing Evolution
10
S-entropy
C-specific heat
T/10-temperature
?
T
o ?
9
8
7
Can also do linear
6
or step-wise cooling
5
4
…
3
2
But exponential cooling
1
often works best.
0
0 5 10 15 20 25 30 35 40 45
Temperature Step
Key Ingredients for SA
? A concise description of a configuration (architecture, design,
topology) of the system (Design Vector).
? A random generator of rearrangements of the elements in a
configuration (Neighborhoods). This generator encapsulates rules
so as to generate only valid configurations.
? A quantitative objective function containing the trade-offs that
have to be made (Simulation Model and Output Metric(s)).
Surrogate for system energy.
? An annealing schedule of the temperatures and/or the length of
times for which the system is to be evolved.
Sample Problems
Matlab “peaks” function
? Difficult due to plateau at z=0, local maxima
? SAdemo1
? x
o
=[-2,-2]
? Optimum at
? x*=[ 0.012, 1.524]
? z*=8.0484
-3 -2 -1 0 1 2 3
-3
-2
-1
0
1
2
3
x
Peaks
y
Start Point
Maximum
“peaks” convergence
? Initially ~ nearly random search
? Later ~ gradient search
SA convergence history
-10
-8
-6
-4
-2
0
2
Syst
em Energy
current configuration
new best configuration
0 50 100 150 200 250
Iteration Number
300
Traveling Salesman Problem
? N cities arranged randomly on [-1,1]
? Choose N=15
? SAdemo1
? Minimize “cost” of route (length, time,…)
? Visit each city once, return to start city
N 2
(
j
(
j
(
2
2
j
(
j
(
2
l R
)
=
∑
(
x
i+1
)
- x
i
)
)
+
∑
(
x R
)
- xR
N
)
)
1
i=1 j=1 j=1
???????????? ???????????
visit N cities return home to first city
TSP Problem (II)
Initial (Random) Route Final (Optimized) Route
Length: 17.43 Length: 8.24
Length of TSP Route: 17.4309 Length of TSP Route: 8.2384
1
Start1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
0.8 0.8
0.6 0.6
0.4 0.4
0.2 0.2
0 0
-0.2 -0.2
-0.4 -0.4
-0.6 -0.6
-0.8 -0.8
-1 -1
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1 1-0.8 -0.2 0 0.4
Start
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Shortest Route
-0.6 -0.4 0.2 0.6 0.8
Result with SA
Structural Optimization
? Define:
– Design Domain
find r
i
i = 1,..., N
T
– Boundary Conditions
min C fu ()
= r
i
– Loads
-1
s.t. uKf=
– Mass constraint
N
? Subdivide domain s.t.
∑
V r ≤ m
ii max
i=1
– N x M design “cells”
– Cell density r=1 or r=0
? Where to put material to minimize compliance?
Structural Topology Optimization (II)
“Energy” = strain energy = compliance
Computed via Finite Element Analysis
50
1 2
1211
2 3
13
3 4
14
4 5
15
5 6
16
6 7
17
7 8
18
8 9
19
9 10
2011
2221 23 24 25 26 27 28 29 3021
3231 33 34 35 36 37 38 39 4031
4241 43 44 45 46 47 48 49 50
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
1 2
3
13
3
4
14
4
5
15
5
6
16
6
7
17
7
8
18
8
9
19
18
9
10
20
19
23
24
25
26
27
28
29
20
30
29
33
34
35
36
37
38
39
38
30
40
39
4241
43
44
45
46
47
48
49
48
40
49
Applied Load
Cantilever
F=-2000
Boundary
condition
Deformation not drawn to scale
Structural Toplogy Optimization
Initial Structural Toplogy
“Final” Toplogy
Initial Structural Configuration: x
o “island”
9
8
7
6
5
4
3
2
1
0
-1
Compliance C=593.0
0 2 4 6 8 10
Compliance C=236.68
Structural Configuration
Not “optimal” – often
need post-processing
Structural Optimization –
Convergence Analysis
Simulated Annealing Evolution
14 45
12
10
8
6
4
2
0
S-entropy
0
5
10
15
20
25
30
35
40
T=3e+002
Occurences
T=2.7e+002
T=2.4e+002
T=2.2e+002
T=1.9e+002
T=1.8e+002
T=1.6e+002
T=1.4e+002
T=1.3e+002
T=1.1e+002
T=1e+002
T=93
T=84
T=75
T=68
T=61
T=55
T=49
T=45
T=40
T=36
T=32
T=29
T=26
T=24
T=21
T=19
T=17
T=16
T=14
T=13
T=11T=10
T=9.2T=8.2
T=7.4
T=6.7
T=6
when C is still
large !
C-specific heat
ln(T)-temperature
Don’t terminate
0 5 10 15 20 25 30 35 40 200 400 600 800 1000 1200 1400 1600 1800 2000 2200
Temperature Step Energy
Boltzmann Distributions
Evolution of
- Entropy, Temperature, Specific Heat
Premature termination
0 500 1000 1500 2000 2500 3000
200
400
600
800
1000
1200
1400
1600
1800
2000
2200
Iteration Number
Syst
em Energy
SA convergence history
current configuration
new best configuration
Indicator: Best Configuration found only shortly
before Simulated Annealing terminated.
Final Example: Telescope Array
Optimization
? Place N=27 stations in xy within a 200 km radius
? Minimize UV density metric
? Ideally also minimize cable length
Cable Length = 1064.924 UV Density = 0.67236
200
100
0
Initial
Configuration
-100
-200
100
50
0
-50
-100
-200 -100 0 100 200 -100 -50 0 50 100
Optimized Solution
-200 -100 0 100 200
-200
-100
0
100
200
-100 -50 0 50 100
-100
-50
0
50
100
Cable Length = 1349.609 UV Density = 0.33333
Simulated Annealing
Improved UV density from 0.67 to 0.33
? Simulated Annealing transforms the array:
– Hub-and-Spoke Circle-with-arms
Reference available
Summary
Summary: Steps of SA
? The Simulated Annealing Algorithm
1) Choose a random X , select the initial system temperature, and outline the
i
cooling (ie. annealing) schedule
2) Evaluate E(X ) using a simulation model
i
3) Perturb X to obtain a neighboring Design Vector (X
i+1
)
i
4) Evaluate E(X
i+1
) using a simulation model
5) If E(X
i+1
)< E(X ), X
i+1
is the new current solution
i
6) If E(X
i+1
)> E(X ), then accept X
i+1
as the new current solution with a
i
probability e
(- D/T)
where D = E(X
i+1
) - E(X ).
i
7) Reduce the system temperature according to the cooling schedule.
8) Terminate the algorithm.
Research in SA
? Alternative Cooling Schedules and Termination
criteria
? Adaptive Simulated Annealing (ASA) –
determines its own cooling schedule
? Hybridization with other Heuristic Search
Methods (GA, Tabu Search …)
? Multiobjective Optimization with SA
References
? Cerny, V., "Thermodynamical Approach to the Traveling Salesman Problem: An Efficient
Simulation Algorithm", J. Opt. Theory Appl., 45, 1, 41-51, 1985 .
? de Weck O. “System Optimization with Simulated Annealing (SA), Memorandum
? Cohanim B., Hewitt J, and de Weck O. L. “The Design of Radio Telescope Array
Configurations using Multiobjective Optimization: Imaging Performance versus Cable
Length”, The Astrophysical Journal, (in press), 2004
? Jilla, C.D., and Miller, D.W., “Assessing the Performance of a Heuristic Simulated
Annealing Algorithm for the Design of Distributed Satellite Systems,” Acta Astronautica,
Vol. 48, No. 5-12, 2001, pp. 529-543.
? Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., “Optimization by Simulated
Annealing,” Science, Volume 220, Number 4598, 13 May 1983, pp. 671-680.
? Metropolis,N., A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, "Equation of State
Calculations by Fast Computing Machines", J. Chem. Phys.,21, 6, 1087-1092, 1953.