1 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Post-Optimality Analysis
Lecture 14
17 March 2004
Olivier de Weck
Karen Willcox
Multidisciplinary System
Design Optimization (MSDO)
2 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Today’s Topics
? Optimality Conditions & Termination
– Gradient-based techniques
– Heuristic techniques
? Lagrange Multipliers
? Objective Function Behavior
? Scaling
3 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Standard Problem Definition
1
2
min ( )
s.t. ( ) 0 1,..,
( ) 0 1,..,
1,..,
j
k
u
iii
J
gjm
hkm
xxxi n
≤=
==
≤≤ =
x
x
x
A
For now, we consider a single objective function, J(x).
There are n design variables, and a total of m
constraints (m=m
1
+m
2
).
The bounds are known as side constraints.
4 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Kuhn-Tucker Conditions
If x* is optimum, these conditions are satisfied:
1. x* is feasible
2. λ
j
g
j
(x*) = 0, j=1,..,m
1
and λ
j
≥ 0
3.
The Kuhn-Tucker conditions are necessary and
sufficient if the design space is convex.
sign in edunrestrict
0
0)()()(
1
2
1
1
1
*
1
**
km
j
m
k
kkm
m
j
jj
xhxgxJ
+
=
+
=
≥
=?+?+?
∑∑
λ
λ
λλ
5 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Kuhn-Tucker Conditions:
Interpretation
Condition 1: the optimal design satisfies the constraints
Condition 2: if a constraint is not precisely satisfied, then
the corresponding Lagrange multiplier is zero
– the j
th
Lagrange multiplier represents the sensitivity of the
objective function to the j
th
constraint
– can be thought of as representing the “tightness” of the
constraint
– if λ
j
is large, then constraint j is important for this solution
Condition 3: the gradient of the Lagrangian vanishes at
the optimum
6 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Optimization for Engineering
Problems
? Most engineering problems have a complicated design
space, usually with several local optima
? Gradient-based methods can have trouble converging to the
correct solution
? Heuristic techniques offer absolutely no guarantee of
optimality, neither global nor local
? Your post-optimality analysis should address the question:
– How confident are you that you have found the global
optimum?
– Do you actually care?
7 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Optimization for Engineering
Problems
? Usually cannot guarantee that absolute optimum is found
– local optima
– numerical ill-conditioning
? gradient-based techniques should be started from
several initial solutions
? best solution from a heuristic technique should be
checked with KT conditions or used as an initial
condition for a gradient-based algorithm
? Can determine mathematically if have relative minimum but
Kuhn-Tucker conditions are only sufficient if the problem is
convex
? It is very important to interrogate the “optimum” solution
8 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Gradient-Based
Gradient-based algorithm is terminated when ...
an acceptable solution is found
OR
algorithm terminates unsuccessfully
Need to decide:
? when an acceptable solution is found
? when to stop the algorithm with no acceptable
solution
– when progress is unreasonably slow
– when a specified amount of resources have
been used (time, number of iterations, etc.)
– when an acceptable solution does not exist
– when the iterative process is cycling
9 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Gradient-Based
?Is x
k
an acceptable solution?
?does x
k
almost satisfy the conditions for optimality?
?has the sequence { x
k
} converged?
? Often the first question is difficult to test
? The tests are often approximate
? Often rely on the answer to the second question
? But convergence to a non-optimal solution or extended
lack of progress can look the same as convergence to
the correct solution!
? No one set of termination criteria is suitable for all
optimization problems and all methods
10 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Gradient-Based
Has the sequence { x
k
} converged?
*
k
JJ ε?≤ *
k
ε?≤xxIdeally: or
1kk
JJ ε
+
?≤
1kk
ε
+
?≤xxIn practice: or
Also should check constraint satisfaction:
k
ε≤g
11 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Heuristics
Typical
Results
generation
Converged too
fast (mutation rate
too small?)
- GEN = max(GEN): maximum # of generations reached
- Stagnation in Fitness, no progress made on objective
- Dominant Schema have emerged
GA Termination
global
optimum
(unknown)
12 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Heuristics
13 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination Criteria: Heuristics
? Simulated Annealing - cooling schedule: T(k)=f(k, To)
To
0
Search stops when
T(k)<ε , where ε>0,
but small
? Tabu search termination
? Usually after a predefined number of iterations
? Best solution found is reported
? No guarantee of optimality
? Experimentation gives confidence
search broadly
search locally
14 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Termination in iSIGHT
? Be careful with iSIGHT: the last solution is not
necessarily the best one
? Need to look back over all the solutions in the monitor to
find the “optimum”
? Feasibility parameter:
3 = infeasible
7 = feasible
8 = feasible and equal to the best design found so far
9 = feasible and the best design found so far
? The “optimum” will be the last solution with feasibility=9
15 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Post-Optimality Analysis
? Already talked about sensitivity analysis (Lecture 9)
– How does optimal solution change as a
parameter is varied?
– How does optimal solution change as a design
variable value is varied?
– How does optimal solution change as constraints
are varied?
? Also would like to understand key drivers in optimal
design
16 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Lagrange Multipliers
? The values of the Lagrange multipliers at the optimal
solution, λ
j
*
, give information on the constraints.
?If λ
j
*
is zero, then constraint j is inactive.
?If λ
j
*
is positive, then constraint j is active.
? The value of the j
th
Lagrange multiplier tells you by
how much the objective function will change if the
constraint is varied by a small amount:
T
(*) (*)Jgλ?=??xx
– λ is a vector containing the m LMs
– is a vector containing all m constraints (inequality+equality)g
17 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Behavior
Consider the quadratic function:
1
()
2
Φ= +
TT
xcx xHx
The behavior of Φ(x) in the neighborhood of a local
minimum is determined by the eigenvalues of H.
1
????()()()()
2
α αααΦ+ = + + + +
TT
xpcxp xpHxp
2
11
??? ? ?
2222
α α
αα=+ + + + +
TT T T T T
cx xHx cp pHp xHp pHx
2
1
???()()()
2
T
αα αΦ+ =Φ + ++
T
x p x p Hx c p Hp
18 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Behavior
2
1
???()()()
2
T
αα αΦ+ =Φ + ++
T
x p x p Hx c p Hp
Consider the neighborhood of the optimal solution:
? *=xx
(*) * 0?Φ = + =xHxc
* =?Hx cor
2
1
(* ) (*)
2
ααΦ+=Φ+
T
xp x pHp
The behavior of Φ in the neighborhood
of x* is determined by H.
19 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Behavior
Let v
j
, λ
j
be the j
th
eigenvector and eigenvalue of H:
and since H is symmetric.
Consider the case when p=v
j
:
j jj
H λ=vv
T
ij ij
v δ=v
2
2
1
(* ) (*)
2
1
*)
2
j jj
j
αα
αλ
Φ+ =Φ+
=Φ( +
T
xv x vHv
x
As we move away from x* along the direction v
j
, the
change in the objective depends on the sign of λ
j
.
20 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Behavior
2
1
(* ) *)
2
j j
α αλΦ+ =Φ(+xv x
?If λ
j
>0, Φ increases
?If λ
j
<0, Φ decreases
?If λ
j
=0, Φ remains constant
? When all λ
j
>0, x* is a minimum of Φ
? The contours of Φ are ellipsoids
– principal axes in directions of eigenvectors
– lengths of principal axes inversely proportional to
square roots of eigenvalues
21 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Behavior
v
1
v
2
contours of constant Φ
direction of first
eigenvector
? If λ
2
=λ
1
, the contours are circular
?As λ
2
/λ
1
gets very small, the ellipsoids get more and more stretched
? If any eigenvalue is very close to zero, Φ will change very little when
moving along that eigenvector
direction of second
eigenvector
Φ(x)
x
1
x
2
22 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Hessian Condition Number
? The condition number of the Hessian is given by
? When κ(H)=1, the objective function contours are circular
?As κ(H) increases, the contours become elongated
?If κ(H)>>1, the change in the objective function due to a
small change in x will vary radically depending on the
direction of perturbation
? κ(H) can be computed via a Cholesky factorization
(H=LDL
T
)
n
λ
κ
λ
1
()=Η
23 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling
? In theory we should be able to choose any scaling of
the design variables, constraints and objective
functions without affecting the solution
? In practice, the scaling can have a large effect on the
solution
? numerical accuracy, numerical conditioning
? From Papalambros, p. 352: “scaling is the single
most important, but simplest, reason that can make
the difference between success and failure of a
design optimization algorithm”
24 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling
1
2
123
3
12
2
3
min 3
s.t. 5 2 1
632
0
i
xxx
xx
xx
x
++
+≥
?≥
≥
1
2
123
3
12
2
3
min 10 30 10 28
s.t. 5 2 1
632
0
i
xxx
xx
xx
x
+++
+≥
?≥
≥
1
2
123
3
12
2
3
min 3
s.t. 50 20 10
632
0
i
xxx
xx
xx
x
++
+≥
?≥
≥
1
2
12 3
3
12
2
3
min 3 10
s.t. 5 2 1
6302
0
i
xx x
xx
xx
x
++
+≥
?≥
≥
≡
≡
scale
objective
s
c
a
l
e
d
e
s
i
g
n
v
a
r
i
a
b
l
e
scale
constraint
25 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Design Variable Scaling
? In aircraft design, we are combining variables of very
different magnitudes
? e.g. aircraft range ~ 10
6
m
wing span ~ 10
1
m
skin thickness ~ 10
-3
m
? Need to non-dimensionalize and scale variables to
be of similar magnitude in the region of interest
? Want each variable to be of similar weight during the
optimization
26 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Design Variable Scaling
? Consider the transformation x = Ly, where L is an arbitrary
nonsingular transformation matrix
? If at any iteration in the algorithm, x
k
= Ly
k
(using exact
arithmetic), then the algorithm is said to be scale invariant
? In practice, this property will not hold
? The conditioning of the Hessian matrix at x* gives us
information about the scaling of the design variables
? When H(x) is ill-conditioned, J(x) varies much more rapidly
along some directions than along others
? The ill-conditioning of the Hessian is a form of bad scaling,
since similar changes in ||x|| do not cause similar changes
in J
27 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Design Variable Scaling
? We saw that if H(x*) is ill-conditioned (κ(H)>>1), then the
change in the objective function due to a small change in x
will vary radically depending on the direction of perturbation
?J(x) may vary so slowly along an eigenvector associated with
a near-zero eigenvalue that changes that should be
significant are lost in rounding error
? We would like to scale our design variables so that κ(H)~1
? In practice this may be unachievable (often we don’t know H)
? Often, a diagonal scaling is used where we consider only the
diagonal elements of H(x
0
) and try to make them close to
unity
28 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Objective Function Scaling
? In theory, we can multiply J(x) by any constant or add
a constant term, and not affect the solution
? In practice, it is generally desirable to have J~O(1) in
the region of interest
? Algorithms can have difficulties if J(x) is very small
everywhere, since convergence is usually tested using
some small quantity
? Inclusion of a constant term can also cause difficulties,
since the error associated with the sum may reflect the
size of the constant rather than the size of J(x)
e.g. min x
1
2
+x
2
2
vs. min x
1
2
+x
2
2
+1000
29 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Constraint Scaling
A well scaled set of constraints has two properties:
– each constraint is well conditioned with respect
to perturbations in the design variables
– the constraints are balanced with respect to
each other, i.e. all constraints have an equal
weighting in the optimization
The scaling of constraints can have a major effect on the
path chosen by the optimizer. For example, many
algorithms maintain a set of active constraints and from
one iteration to the next they interchange one active and
one inactive constraint. Constraint scaling impacts the
selection of which constraint to add or delete.
30 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling
Two reasons to scale:
1. At the beginning of optimization, using x
0
, to
improve algorithm performance (e.g. decrease
number of iterations).
? dairy farm demo
2. At the end of optimization, using x*, to make
sure that the “optimal” solution is indeed the
best we can achieve.
? BWB example
31 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling Example
? Consider optimization of the BWB
? Rather than optimizing just a single aircraft, we want to
design a family of aircraft
? This family has commonality – the planes share
common parts, planforms and systems
? Commonality can help to reduce costs
e.g. manufacturing costs
design costs
spare parts
crew training
? But will require a trade with performance
? It is easier to achieve commonality with the BWB than
with conventional tube & wing aircraft
centerbody
inner wing
outer wing
winglet
32 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling Example
? Consider a two-aircraft family with common wings:
? We set up an MDO framework for each aircraft
? We link the variables that are common between the two
aircraft
BWB 3-250
272 passengers, 8550 nm
BWB 3-450
475 passengers, 8550 nm
common
different
33 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling Example
0.99
1.00
1.01
1.02
1.03
0 2040608010
Iteration number
Plane 1
Plane 2
? In order to test the framework, we first try to optimize a
family with no commonality
? We should get the point-design solutions for each plane
? However, the algorithm (SQP) does not converge to the
correct solution for the smaller plane!
MTOW
Point-Design
MTOW
MTOW results
for last 100
iterations
34 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Scaling Example
0.99
1.00
1.01
1.02
1.03
0 2040608010
Plane 1
Plane 2 (scaled)
Plane 2
(unscaled)
? When we look at the Hessian matrix at the solution,
we see that the diagonal entries corresponding to the
design variables of Plane 2 are badly scaled
? We rescale these variables, and now the algorithm
converges
Iteration number
MTOW
Point-Design
MTOW
MTOW results
for last 100
iterations
~4000 kg
35 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Lecture Summary
? Optimality Conditions
? Objective Function Behavior
? Scaling
? In practice, optimization can be very difficult to implement:
algorithms can behave badly, and it can be difficult
(impossible) to verify that a solution is truly optimal
? Numerical accuracy is a real issue and can drastically
affect results, especially if the problem is not well scaled
? It is very important to interrogate the “optimum” solution
carefully
? The mathematical tools you learn are very useful in
practice, but they must be applied carefully!
36 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
References
? Gill, P.E, Murray, W. and Wright, M. H., Practical
Optimization, Academic Press, London, 1981
? Vanderplaats, G.N., Numerical Optimization Techniques for
Engineering Design, Vanderplaats R&D, 1999.
? Willcox, K. and Wakayama, S., “ Simultaneous Optimization of
a Multiple-Aircraft Family,” AIAA Paper 2002-1423, 2002.