1 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Post-Optimality Analysis Lecture 14 17 March 2004 Olivier de Weck Karen Willcox Multidisciplinary System Design Optimization (MSDO) 2 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Today’s Topics ? Optimality Conditions & Termination – Gradient-based techniques – Heuristic techniques ? Lagrange Multipliers ? Objective Function Behavior ? Scaling 3 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Standard Problem Definition 1 2 min ( ) s.t. ( ) 0 1,.., ( ) 0 1,.., 1,.., j k u iii J gjm hkm xxxi n ≤= == ≤≤ = x x x A For now, we consider a single objective function, J(x). There are n design variables, and a total of m constraints (m=m 1 +m 2 ). The bounds are known as side constraints. 4 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Kuhn-Tucker Conditions If x* is optimum, these conditions are satisfied: 1. x* is feasible 2. λ j g j (x*) = 0, j=1,..,m 1 and λ j ≥ 0 3. The Kuhn-Tucker conditions are necessary and sufficient if the design space is convex. sign in edunrestrict 0 0)()()( 1 2 1 1 1 * 1 ** km j m k kkm m j jj xhxgxJ + = + = ≥ =?+?+? ∑∑ λ λ λλ 5 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Kuhn-Tucker Conditions: Interpretation Condition 1: the optimal design satisfies the constraints Condition 2: if a constraint is not precisely satisfied, then the corresponding Lagrange multiplier is zero – the j th Lagrange multiplier represents the sensitivity of the objective function to the j th constraint – can be thought of as representing the “tightness” of the constraint – if λ j is large, then constraint j is important for this solution Condition 3: the gradient of the Lagrangian vanishes at the optimum 6 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Optimization for Engineering Problems ? Most engineering problems have a complicated design space, usually with several local optima ? Gradient-based methods can have trouble converging to the correct solution ? Heuristic techniques offer absolutely no guarantee of optimality, neither global nor local ? Your post-optimality analysis should address the question: – How confident are you that you have found the global optimum? – Do you actually care? 7 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Optimization for Engineering Problems ? Usually cannot guarantee that absolute optimum is found – local optima – numerical ill-conditioning ? gradient-based techniques should be started from several initial solutions ? best solution from a heuristic technique should be checked with KT conditions or used as an initial condition for a gradient-based algorithm ? Can determine mathematically if have relative minimum but Kuhn-Tucker conditions are only sufficient if the problem is convex ? It is very important to interrogate the “optimum” solution 8 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Gradient-Based Gradient-based algorithm is terminated when ... an acceptable solution is found OR algorithm terminates unsuccessfully Need to decide: ? when an acceptable solution is found ? when to stop the algorithm with no acceptable solution – when progress is unreasonably slow – when a specified amount of resources have been used (time, number of iterations, etc.) – when an acceptable solution does not exist – when the iterative process is cycling 9 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Gradient-Based ?Is x k an acceptable solution? ?does x k almost satisfy the conditions for optimality? ?has the sequence { x k } converged? ? Often the first question is difficult to test ? The tests are often approximate ? Often rely on the answer to the second question ? But convergence to a non-optimal solution or extended lack of progress can look the same as convergence to the correct solution! ? No one set of termination criteria is suitable for all optimization problems and all methods 10 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Gradient-Based Has the sequence { x k } converged? * k JJ ε?≤ * k ε?≤xxIdeally: or 1kk JJ ε + ?≤ 1kk ε + ?≤xxIn practice: or Also should check constraint satisfaction: k ε≤g 11 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Heuristics Typical Results generation Converged too fast (mutation rate too small?) - GEN = max(GEN): maximum # of generations reached - Stagnation in Fitness, no progress made on objective - Dominant Schema have emerged GA Termination global optimum (unknown) 12 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Heuristics 13 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination Criteria: Heuristics ? Simulated Annealing - cooling schedule: T(k)=f(k, To) To 0 Search stops when T(k)<ε , where ε>0, but small ? Tabu search termination ? Usually after a predefined number of iterations ? Best solution found is reported ? No guarantee of optimality ? Experimentation gives confidence search broadly search locally 14 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Termination in iSIGHT ? Be careful with iSIGHT: the last solution is not necessarily the best one ? Need to look back over all the solutions in the monitor to find the “optimum” ? Feasibility parameter: 3 = infeasible 7 = feasible 8 = feasible and equal to the best design found so far 9 = feasible and the best design found so far ? The “optimum” will be the last solution with feasibility=9 15 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Post-Optimality Analysis ? Already talked about sensitivity analysis (Lecture 9) – How does optimal solution change as a parameter is varied? – How does optimal solution change as a design variable value is varied? – How does optimal solution change as constraints are varied? ? Also would like to understand key drivers in optimal design 16 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Lagrange Multipliers ? The values of the Lagrange multipliers at the optimal solution, λ j * , give information on the constraints. ?If λ j * is zero, then constraint j is inactive. ?If λ j * is positive, then constraint j is active. ? The value of the j th Lagrange multiplier tells you by how much the objective function will change if the constraint is varied by a small amount: T (*) (*)Jgλ?=??xx – λ is a vector containing the m LMs – is a vector containing all m constraints (inequality+equality)g 17 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Behavior Consider the quadratic function: 1 () 2 Φ= + TT xcx xHx The behavior of Φ(x) in the neighborhood of a local minimum is determined by the eigenvalues of H. 1 ????()()()() 2 α αααΦ+ = + + + + TT xpcxp xpHxp 2 11 ??? ? ? 2222 α α αα=+ + + + + TT T T T T cx xHx cp pHp xHp pHx 2 1 ???()()() 2 T αα αΦ+ =Φ + ++ T x p x p Hx c p Hp 18 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Behavior 2 1 ???()()() 2 T αα αΦ+ =Φ + ++ T x p x p Hx c p Hp Consider the neighborhood of the optimal solution: ? *=xx (*) * 0?Φ = + =xHxc * =?Hx cor 2 1 (* ) (*) 2 ααΦ+=Φ+ T xp x pHp The behavior of Φ in the neighborhood of x* is determined by H. 19 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Behavior Let v j , λ j be the j th eigenvector and eigenvalue of H: and since H is symmetric. Consider the case when p=v j : j jj H λ=vv T ij ij v δ=v 2 2 1 (* ) (*) 2 1 *) 2 j jj j αα αλ Φ+ =Φ+ =Φ( + T xv x vHv x As we move away from x* along the direction v j , the change in the objective depends on the sign of λ j . 20 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Behavior 2 1 (* ) *) 2 j j α αλΦ+ =Φ(+xv x ?If λ j >0, Φ increases ?If λ j <0, Φ decreases ?If λ j =0, Φ remains constant ? When all λ j >0, x* is a minimum of Φ ? The contours of Φ are ellipsoids – principal axes in directions of eigenvectors – lengths of principal axes inversely proportional to square roots of eigenvalues 21 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Behavior v 1 v 2 contours of constant Φ direction of first eigenvector ? If λ 2 =λ 1 , the contours are circular ?As λ 2 /λ 1 gets very small, the ellipsoids get more and more stretched ? If any eigenvalue is very close to zero, Φ will change very little when moving along that eigenvector direction of second eigenvector Φ(x) x 1 x 2 22 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Hessian Condition Number ? The condition number of the Hessian is given by ? When κ(H)=1, the objective function contours are circular ?As κ(H) increases, the contours become elongated ?If κ(H)>>1, the change in the objective function due to a small change in x will vary radically depending on the direction of perturbation ? κ(H) can be computed via a Cholesky factorization (H=LDL T ) n λ κ λ 1 ()=Η 23 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling ? In theory we should be able to choose any scaling of the design variables, constraints and objective functions without affecting the solution ? In practice, the scaling can have a large effect on the solution ? numerical accuracy, numerical conditioning ? From Papalambros, p. 352: “scaling is the single most important, but simplest, reason that can make the difference between success and failure of a design optimization algorithm” 24 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling 1 2 123 3 12 2 3 min 3 s.t. 5 2 1 632 0 i xxx xx xx x ++ +≥ ?≥ ≥ 1 2 123 3 12 2 3 min 10 30 10 28 s.t. 5 2 1 632 0 i xxx xx xx x +++ +≥ ?≥ ≥ 1 2 123 3 12 2 3 min 3 s.t. 50 20 10 632 0 i xxx xx xx x ++ +≥ ?≥ ≥ 1 2 12 3 3 12 2 3 min 3 10 s.t. 5 2 1 6302 0 i xx x xx xx x ++ +≥ ?≥ ≥   ≡ ≡ scale objective s c a l e d e s i g n v a r i a b l e scale constraint 25 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Design Variable Scaling ? In aircraft design, we are combining variables of very different magnitudes ? e.g. aircraft range ~ 10 6 m wing span ~ 10 1 m skin thickness ~ 10 -3 m ? Need to non-dimensionalize and scale variables to be of similar magnitude in the region of interest ? Want each variable to be of similar weight during the optimization 26 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Design Variable Scaling ? Consider the transformation x = Ly, where L is an arbitrary nonsingular transformation matrix ? If at any iteration in the algorithm, x k = Ly k (using exact arithmetic), then the algorithm is said to be scale invariant ? In practice, this property will not hold ? The conditioning of the Hessian matrix at x* gives us information about the scaling of the design variables ? When H(x) is ill-conditioned, J(x) varies much more rapidly along some directions than along others ? The ill-conditioning of the Hessian is a form of bad scaling, since similar changes in ||x|| do not cause similar changes in J 27 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Design Variable Scaling ? We saw that if H(x*) is ill-conditioned (κ(H)>>1), then the change in the objective function due to a small change in x will vary radically depending on the direction of perturbation ?J(x) may vary so slowly along an eigenvector associated with a near-zero eigenvalue that changes that should be significant are lost in rounding error ? We would like to scale our design variables so that κ(H)~1 ? In practice this may be unachievable (often we don’t know H) ? Often, a diagonal scaling is used where we consider only the diagonal elements of H(x 0 ) and try to make them close to unity 28 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Objective Function Scaling ? In theory, we can multiply J(x) by any constant or add a constant term, and not affect the solution ? In practice, it is generally desirable to have J~O(1) in the region of interest ? Algorithms can have difficulties if J(x) is very small everywhere, since convergence is usually tested using some small quantity ? Inclusion of a constant term can also cause difficulties, since the error associated with the sum may reflect the size of the constant rather than the size of J(x) e.g. min x 1 2 +x 2 2 vs. min x 1 2 +x 2 2 +1000 29 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Constraint Scaling A well scaled set of constraints has two properties: – each constraint is well conditioned with respect to perturbations in the design variables – the constraints are balanced with respect to each other, i.e. all constraints have an equal weighting in the optimization The scaling of constraints can have a major effect on the path chosen by the optimizer. For example, many algorithms maintain a set of active constraints and from one iteration to the next they interchange one active and one inactive constraint. Constraint scaling impacts the selection of which constraint to add or delete. 30 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling Two reasons to scale: 1. At the beginning of optimization, using x 0 , to improve algorithm performance (e.g. decrease number of iterations). ? dairy farm demo 2. At the end of optimization, using x*, to make sure that the “optimal” solution is indeed the best we can achieve. ? BWB example 31 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling Example ? Consider optimization of the BWB ? Rather than optimizing just a single aircraft, we want to design a family of aircraft ? This family has commonality – the planes share common parts, planforms and systems ? Commonality can help to reduce costs e.g. manufacturing costs design costs spare parts crew training ? But will require a trade with performance ? It is easier to achieve commonality with the BWB than with conventional tube & wing aircraft centerbody inner wing outer wing winglet 32 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling Example ? Consider a two-aircraft family with common wings: ? We set up an MDO framework for each aircraft ? We link the variables that are common between the two aircraft BWB 3-250 272 passengers, 8550 nm BWB 3-450 475 passengers, 8550 nm common different 33 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling Example 0.99 1.00 1.01 1.02 1.03 0 2040608010 Iteration number Plane 1 Plane 2 ? In order to test the framework, we first try to optimize a family with no commonality ? We should get the point-design solutions for each plane ? However, the algorithm (SQP) does not converge to the correct solution for the smaller plane! MTOW Point-Design MTOW MTOW results for last 100 iterations 34 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Scaling Example 0.99 1.00 1.01 1.02 1.03 0 2040608010 Plane 1 Plane 2 (scaled) Plane 2 (unscaled) ? When we look at the Hessian matrix at the solution, we see that the diagonal entries corresponding to the design variables of Plane 2 are badly scaled ? We rescale these variables, and now the algorithm converges Iteration number MTOW Point-Design MTOW MTOW results for last 100 iterations ~4000 kg 35 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics Lecture Summary ? Optimality Conditions ? Objective Function Behavior ? Scaling ? In practice, optimization can be very difficult to implement: algorithms can behave badly, and it can be difficult (impossible) to verify that a solution is truly optimal ? Numerical accuracy is a real issue and can drastically affect results, especially if the problem is not well scaled ? It is very important to interrogate the “optimum” solution carefully ? The mathematical tools you learn are very useful in practice, but they must be applied carefully! 36 ? Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Engineering Systems Division and Dept. of Aeronautics and Astronautics References ? Gill, P.E, Murray, W. and Wright, M. H., Practical Optimization, Academic Press, London, 1981 ? Vanderplaats, G.N., Numerical Optimization Techniques for Engineering Design, Vanderplaats R&D, 1999. ? Willcox, K. and Wakayama, S., “ Simultaneous Optimization of a Multiple-Aircraft Family,” AIAA Paper 2002-1423, 2002.