1. Multidisciplinary System Design Optimization (MSDO)
Numerical Optimization I
Lecture 6 - 23 February 2004
Karen Willcox
© Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox, Engineering Systems Division and Dept. of Aeronautics and Astronautics

2. Today's Topics
- Existence & Uniqueness of an Optimum Solution
- Kuhn-Tucker Conditions
- Convex Spaces
- Unconstrained Problems
- Linear Programming

3. Disclaimer!
This is not a classic optimization class. The aim is not to teach you the details of optimization algorithms, but rather to expose you to different methods. We will utilize optimization techniques - the goal is to understand enough to be able to utilize them wisely. If you plan to use optimization extensively in your research, you should take an optimization class, e.g. 15.093.

4. Learning Objectives
After the next two lectures, you should:
- be familiar with the gradient-based optimization techniques that are available
- understand the basics of how each technique works
- be able to choose which optimization technique is appropriate for your problem
- understand what to look for when the algorithm terminates
- understand why the algorithm might fail

5. How to Choose an Algorithm?
- Number of design variables
- Type of design variables (real/integer, continuous/discrete)
- Linear/nonlinear
- Equality/inequality constraints
- Discontinuous feasible spaces
- Initial solution feasible/infeasible
- Simulation code runtime

6. Technique Overview
Unconstrained:
- Steepest Descent
- Conjugate Gradient
- Quasi-Newton
- Newton
Constrained:
- Simplex - linear
- SLP - linear
- SQP - nonlinear, expensive, common in engineering applications
- Exterior Penalty - nonlinear, discontinuous design spaces
- Interior Penalty - nonlinear
- Generalized Reduced Gradient - nonlinear
- Method of Feasible Directions - nonlinear
- Mixed Integer Programming

7. Standard Problem Definition
$$\min_{\mathbf{x}} \; J(\mathbf{x})$$
$$\text{s.t.} \quad g_j(\mathbf{x}) \le 0, \quad j = 1, \ldots, m_1$$
$$h_k(\mathbf{x}) = 0, \quad k = 1, \ldots, m_2$$
$$x_i^l \le x_i \le x_i^u, \quad i = 1, \ldots, n$$
For now, we consider a single objective function, J(x). There are n design variables and a total of m constraints (m = m_1 + m_2). The bounds are known as side constraints. A sketch of this standard form in a numerical solver is given below.

8. Linear vs. Nonlinear
The objective function is a linear function of the design variables if each design variable appears only to the first power, multiplied by a constant coefficient.
- $J(\mathbf{x}) = 2x_1 + 3.4x_2 - x_3$ is linear in $\mathbf{x} = [x_1 \; x_2 \; x_3]^T$
- $J(\mathbf{x}) = 2x_1 + 3.4x_2^2 - x_3$ is nonlinear in $\mathbf{x}$
- $J(\mathbf{x}) = 2\cos(x_1) + 3.4x_2 - x_3$ is nonlinear in $\mathbf{x}$
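The standard form on slide 7 maps directly onto common nonlinear programming solver interfaces. Below is a minimal sketch using scipy.optimize.minimize; the objective, constraints, and bounds are made-up placeholders for illustration, not examples from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up placeholder problem in the standard form:
#   min J(x)   s.t.   g(x) <= 0,   h(x) = 0,   x_l <= x_i <= x_u
def J(x):
    return 2*x[0] + 3.4*x[1]**2 - x[2]

g = lambda x: 6*x[0] + 2*x[1] - x[2] - 10      # inequality, g(x) <= 0
h = lambda x: x[0] + x[1] + x[2] - 1           # equality,   h(x) = 0

constraints = [
    {"type": "ineq", "fun": lambda x: -g(x)},  # scipy expects c(x) >= 0, so negate g
    {"type": "eq",   "fun": h},
]
bounds = [(-5.0, 5.0)] * 3                     # side constraints

result = minimize(J, np.zeros(3), bounds=bounds, constraints=constraints)
print(result.x, result.fun)
```

With bounds and constraints present, scipy defaults to an SQP-type method (SLSQP), which is exactly the class of constrained technique previewed in the overview on slide 6.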
9. Linear vs. Nonlinear
A constraint is a linear function of the design variables if each design variable appears only to the first power, multiplied by a constant coefficient.
- $6x_1 + 2x_2 - x_3 - 10 \le 0$ is linear in $\mathbf{x}$
- $6x_1^2 + 2x_2 - x_3 - 10 \le 0$ is nonlinear in $\mathbf{x}$
- $6\sin(x_1) + 2x_2 - x_3 - 10 \le 0$ is nonlinear in $\mathbf{x}$

10. Iterative Optimization Procedures
Most optimization algorithms are iterative:
$$\mathbf{x}^q = \mathbf{x}^{q-1} + \alpha^q \mathbf{S}^q$$
where q is the iteration number, S is the vector search direction, α is a scalar distance, and the initial solution x^0 is given. The algorithm determines the search direction S according to some criteria. Gradient-based algorithms use gradient information to decide where to move.

12. Gradient Vector
Consider a function J(x), x = [x_1, x_2, ..., x_n]^T. The gradient of J(x) at a point x^0 is a vector of length n:
$$\nabla J(\mathbf{x}^0) = \begin{bmatrix} \dfrac{\partial J}{\partial x_1}(\mathbf{x}^0) \\ \dfrac{\partial J}{\partial x_2}(\mathbf{x}^0) \\ \vdots \\ \dfrac{\partial J}{\partial x_n}(\mathbf{x}^0) \end{bmatrix}$$
Each element in the vector is evaluated at the point x^0.

13. Hessian Matrix
Consider a function J(x), x = [x_1, x_2, ..., x_n]^T. The second derivative of J(x) at a point x^0 is a matrix of size n×n:
$$\mathbf{H}(\mathbf{x}^0) \equiv \nabla^2 J(\mathbf{x}^0) = \begin{bmatrix} \dfrac{\partial^2 J}{\partial x_1^2} & \cdots & \dfrac{\partial^2 J}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 J}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 J}{\partial x_n^2} \end{bmatrix}$$
Each element in the matrix is evaluated at the point x^0.

14. Gradients & Hessian Example
$$J(\mathbf{x}) = 3x_1^2 + x_1 x_2 + 6x_3^3$$
Differentiating term by term gives
$$\nabla J(\mathbf{x}) = \begin{bmatrix} 6x_1 + x_2 \\ x_1 \\ 18x_3^2 \end{bmatrix}, \qquad \mathbf{H}(\mathbf{x}) = \begin{bmatrix} 6 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 36x_3 \end{bmatrix}$$

15. Taylor Series
Consider the scalar case:
$$f(z) = f(z^0) + \left.\frac{df}{dz}\right|_{z^0}(z - z^0) + \frac{1}{2}\left.\frac{d^2 f}{dz^2}\right|_{z^0}(z - z^0)^2 + \cdots$$
When the function depends on a vector:
$$J(\mathbf{x}) = J(\mathbf{x}^0) + \nabla J(\mathbf{x}^0)^T(\mathbf{x} - \mathbf{x}^0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}^0)^T \mathbf{H}(\mathbf{x}^0)(\mathbf{x} - \mathbf{x}^0) + \cdots$$
where the gradient term is (1×n)(n×1) and the Hessian term is (1×n)(n×n)(n×1).

16. Finite Difference Approximation
The gradient vector and Hessian matrix can be approximated using finite differences if they are not available analytically.
First-order gradient approximation:
$$\frac{\partial J}{\partial x_j} \approx \frac{J(x_1, \ldots, x_j + \delta x_j, \ldots, x_n) - J(x_1, \ldots, x_j, \ldots, x_n)}{\delta x_j}$$
Second-order approximation of the Hessian diagonal:
$$\frac{\partial^2 J}{\partial x_i^2} \approx \frac{J(x_1, \ldots, x_i + \delta x_i, \ldots, x_n) - 2J(x_1, \ldots, x_i, \ldots, x_n) + J(x_1, \ldots, x_i - \delta x_i, \ldots, x_n)}{\delta x_i^2}$$
It is expensive to calculate the entire Hessian if n is large! In Lecture 8 we will see various options for computing gradients. A code sketch of these formulas is given below.

17. Existence & Uniqueness of an Optimum Solution
- We usually cannot guarantee that the global optimum is found:
  - multiple solutions may exist
  - numerical ill-conditioning
  → start from several initial solutions
- We can determine mathematically whether we have a relative minimum.
- Under certain conditions we can guarantee a global optimum - but not usually!
- It is very important to interrogate the "optimum" solution.
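The finite-difference formulas on slide 16 take only a few lines to implement. The sketch below applies them to the example function from slide 14 and approximates the gradient and the Hessian diagonal; the step sizes are arbitrary choices, not values from the lecture.

```python
import numpy as np

def J(x):
    # Example function from slide 14
    return 3*x[0]**2 + x[0]*x[1] + 6*x[2]**3

def grad_fd(J, x, dx=1e-6):
    # First-order forward-difference gradient (slide 16, first formula)
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = dx
        g[j] = (J(x + e) - J(x)) / dx
    return g

def hess_diag_fd(J, x, dx=1e-4):
    # Second-order central difference for the Hessian diagonal (slide 16, second formula)
    d = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = dx
        d[i] = (J(x + e) - 2*J(x) + J(x - e)) / dx**2
    return d

x0 = np.array([1.0, 2.0, 0.5])
print(grad_fd(J, x0))       # analytic gradient:         [8.0, 1.0, 4.5]
print(hess_diag_fd(J, x0))  # analytic Hessian diagonal: [6.0, 0.0, 18.0]
```

Note the cost: the gradient needs n extra function evaluations and the full Hessian needs on the order of n² evaluations, which is why the slide warns that finite-differencing the entire Hessian is expensive for large n.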
18. Existence & Uniqueness: Unconstrained Problems
For unconstrained problems, at a minimum the gradient must vanish:
$$\|\nabla J(\mathbf{x}^*)\| = 0$$
- x* is a stationary point of J
- this is necessary but not sufficient
[Figure: a curve J(x) with points A, B, C, D, all of which have ∇J = 0]
From calculus: at a minimum of a scalar function, the second derivative must be positive. In the vector case: at a minimum, H(x*) > 0 (positive definite).

19. SPD Matrices
Positive definiteness: y^T H y > 0 for all y ≠ 0.
Consider the i-th eigenmode of H: H v_i = λ_i v_i.
If H is a symmetric matrix, then the eigenvectors of H form an orthogonal set: v_i^T v_j = δ_ij.
Any vector y can be written as a linear combination of the eigenvectors: y = Σ_i a_i v_i. Then:
$$\mathbf{y}^T \mathbf{H} \mathbf{y} = \sum_i \sum_j a_i \mathbf{v}_i^T \mathbf{H} \, a_j \mathbf{v}_j = \sum_i \sum_j a_i a_j \lambda_j \mathbf{v}_i^T \mathbf{v}_j = \sum_i a_i^2 \lambda_i$$
Therefore, if the eigenvalues of H are all positive, H is SPD.

20. Existence & Uniqueness: Unconstrained Problems
Necessary and sufficient conditions for a minimum (unconstrained problem):
1. the gradient must vanish: ∇J(x*) = 0
2. the Hessian must be positive definite: H(x*) > 0
[Figure: a curve J(x) with a relative minimum at x*]
The minimum is only a global optimum if H(x) > 0 for all values of x (e.g. a simple parabola).

21. Existence & Uniqueness: Constrained Problems
At the optimum:
- at least one constraint on the design is active
- ∇J does not have to be zero
In order to improve the design, we must:
- move in a direction that decreases the objective
- move in a direction that does not violate the constraints
Usable direction: any direction that reduces the objective, S^T ∇J(x) ≤ 0.
Feasible direction: a direction in which a small move will not violate the constraints, S^T ∇g_i(x) ≤ 0 for all active constraints i.
Note that these conditions may be relaxed for certain algorithms.
[Figure: a curve J(x) with the optimum on a constraint boundary]

22. Lagrangian Functions
A constrained minimization can be written as an unconstrained minimization by defining the Lagrangian function:
$$L(\mathbf{x}, \boldsymbol{\lambda}) = J(\mathbf{x}) + \sum_{j=1}^{m_1} \lambda_j g_j(\mathbf{x}) + \sum_{k=1}^{m_2} \lambda_{m_1+k} h_k(\mathbf{x})$$
L(x, λ) is called the Lagrangian function, and λ_j is the j-th Lagrange multiplier. It can be shown that a stationary point x* of L(x, λ) is a stationary point of the original constrained minimization problem.

23. Lagrangian Example
$$\min_{\mathbf{x}} \; J(\mathbf{x}) = 3x_1^2 + x_2^2 \quad \text{s.t.} \quad x_1 + x_2 = 1$$
The Lagrangian is L(x, λ) = 3x_1² + x_2² + λ(x_1 + x_2 - 1). Setting ∂L/∂x_1 = 6x_1 + λ = 0 and ∂L/∂x_2 = 2x_2 + λ = 0 and enforcing the constraint gives x* = (1/4, 3/4), λ = -3/2, and J(x*) = 3/4 (checked numerically in the sketch below).

24. Existence & Uniqueness: Constrained Problems
Necessary condition for optimality: the gradient of the Lagrangian must vanish:
$$\nabla J(\mathbf{x}^*) + \sum_{j=1}^{m_1} \lambda_j \nabla g_j(\mathbf{x}^*) + \sum_{k=1}^{m_2} \lambda_{m_1+k} \nabla h_k(\mathbf{x}^*) = 0$$
with λ_j ≥ 0 and λ_{m_1+k} unrestricted in sign. The λ are Lagrange multipliers. This condition is not sufficient!
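For the example on slide 23, the two stationarity conditions together with the equality constraint form a linear system, so the worked solution can be verified numerically. A minimal sketch:

```python
import numpy as np

# Stationarity of L(x, lam) = 3*x1^2 + x2^2 + lam*(x1 + x2 - 1):
#   dL/dx1:   6*x1       + lam = 0
#   dL/dx2:         2*x2 + lam = 0
#   dL/dlam:    x1 + x2        = 1
A = np.array([[6.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # -> 0.25, 0.75, -1.5, matching the worked example
```

The Lagrangian of this problem is quadratic, which is why stationarity reduces to a single linear solve; for a general nonlinear problem the system ∇L = 0 must be solved iteratively.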
25. Kuhn-Tucker Conditions
If x* is optimal, these conditions are satisfied:
1. x* is feasible
2. λ_j g_j(x*) = 0 for j = 1, ..., m_1, with λ_j ≥ 0
3. the gradient of the Lagrangian vanishes:
$$\nabla J(\mathbf{x}^*) + \sum_{j=1}^{m_1} \lambda_j \nabla g_j(\mathbf{x}^*) + \sum_{k=1}^{m_2} \lambda_{m_1+k} \nabla h_k(\mathbf{x}^*) = 0$$
with λ_j ≥ 0 and λ_{m_1+k} unrestricted in sign.
The Kuhn-Tucker conditions are necessary, and they are sufficient if the design space is convex.

26. Kuhn-Tucker Conditions: Interpretation
Condition 1: the optimal design satisfies the constraints.
Condition 2: if a constraint is not precisely satisfied (g_j(x*) < 0), then the corresponding Lagrange multiplier is zero.
- the j-th Lagrange multiplier represents the sensitivity of the objective function to the j-th constraint
- it can be thought of as representing the "tightness" of the constraint
- if λ_j is large, then constraint j is important for this solution
Condition 3: the gradient of the Lagrangian vanishes at the optimum.

27. Convex Sets
Consider a set, and imagine drawing a line connecting any two points in the set. If every point along that line is inside the set, the set is convex. If any point along that line is outside the set, the set is non-convex. The line connecting points x^1 and x^2 is given by
$$\mathbf{w} = \theta \mathbf{x}^1 + (1 - \theta)\mathbf{x}^2, \quad 0 \le \theta \le 1$$

28. Convex Functions
Informal definition: a convex function is one that will hold water, while a concave function will not hold water. A function f(x) bounding a convex set is convex if:
$$f[\theta \mathbf{x}^1 + (1 - \theta)\mathbf{x}^2] \le \theta f(\mathbf{x}^1) + (1 - \theta) f(\mathbf{x}^2)$$
[Figure: curves that are convex, concave, and neither; for a convex f(x), the chord from (x^1, f(x^1)) to (x^2, f(x^2)) lies above the curve]

29. Convex Spaces
Pick any two points in the feasible region. If all points on the line connecting these points lie in the feasible region, then the constraint surfaces are convex. If the objective function is convex, then it has only one optimum (the global one) and the Hessian matrix is positive definite for all possible designs. If the objective function and all constraint surfaces are convex, then the design space is convex, and the Kuhn-Tucker conditions are sufficient to guarantee that x* is a global optimum. In general, for engineering problems, the design space is not convex.

30. Convergence Criteria
The algorithm has converged when no change in the objective function is obtained, or when the maximum number of iterations is reached. Once the "optimal" solution has been obtained, the Kuhn-Tucker conditions should be checked.

31. Optimization Process
Given x^0, set q = 0. Then repeat until converged:
1. calculate ∇J(x^q)
2. calculate the search direction S^q
3. perform a 1-D search: x^q = x^{q-1} + α S^q
4. check convergence; if not converged, set q = q + 1 and repeat
A code sketch of this loop is given below.
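The loop on slide 31 can be written down almost verbatim. The sketch below uses the steepest-descent direction (covered on slide 35) and hands the 1-D search to scipy's scalar minimizer (Brent's method, a convenient stand-in for the polynomial and golden-section searches discussed on the next slides); the quadratic test problem is a made-up placeholder.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def descent_loop(J, gradJ, x0, tol=1e-6, max_iter=200):
    # The "Optimization Process" loop: gradient -> search direction ->
    # 1-D search for alpha -> update x -> convergence check.
    x = np.asarray(x0, dtype=float)
    for q in range(1, max_iter + 1):
        S = -gradJ(x)                    # steepest-descent direction
        if np.linalg.norm(S) < tol:      # converged: gradient vanishes
            break
        alpha = minimize_scalar(lambda a: J(x + a * S)).x   # 1-D search
        x = x + alpha * S
    return x, q

# Made-up quadratic test problem:
J = lambda x: (x[0] - 1)**2 + 10*(x[1] + 2)**2
gradJ = lambda x: np.array([2*(x[0] - 1), 20*(x[1] + 2)])
print(descent_loop(J, gradJ, [0.0, 0.0]))   # -> approx [1, -2]
```

The convergence test here is a vanishing gradient, i.e. the first of the unconstrained optimality conditions from slide 20; a practical code would also check the change in the objective, as slide 30 suggests.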
32. Unconstrained Problems: Solution Methods
First-order methods use gradient information to calculate S:
- steepest descent method
- conjugate gradient method
- quasi-Newton methods
Second-order methods use gradients and the Hessian to calculate S:
- Newton method
Methods to calculate gradients are covered in Lecture 9. Often, a constrained problem can be cast as an unconstrained problem and these techniques used.

33. One-Dimensional Search (Choosing α)
Polynomial interpolation:
- pick several values for α
- fit polynomials to J(α)
- efficient, but one needs to be careful with the implementation
Golden section search:
- easy to implement, but inefficient
The one-dimensional search is one of the more challenging aspects of implementing a gradient-based optimization algorithm.

34. Polynomial Interpolation Example
Fit a cubic to sampled values of the objective along the search direction:
$$\tilde{J}(\alpha) = c_1 + c_2 \alpha + c_3 \alpha^2 + c_4 \alpha^3, \qquad \frac{dJ}{d\alpha} = \frac{\partial J}{\partial x_1}\frac{dx_1}{d\alpha} + \frac{\partial J}{\partial x_2}\frac{dx_2}{d\alpha} = \nabla J^T \mathbf{S}$$

  α    J     dJ/dα
  0    10    -5
  1    6     -
  2    8     -

These four pieces of data determine the four coefficients; the minimizing α then follows from dJ̃/dα = 0.

35. Steepest Descent
Algorithm: choose x^0, set x = x^0, and repeat until converged:
  S = -∇J(x)
  choose α to minimize J(x + αS)
  x = x + αS
i.e. S^q = -∇J(x^{q-1}).
- -∇J(x) is the direction of maximum decrease of J at x
- α is chosen with a 1-D search (interpolation or golden section)
- the method doesn't use any information from previous iterations
- it converges slowly

36. Conjugate Gradient
$$\mathbf{S}^1 = -\nabla J(\mathbf{x}^0), \qquad \mathbf{S}^q = -\nabla J(\mathbf{x}^{q-1}) + \beta^q \mathbf{S}^{q-1}, \qquad \beta^q = \frac{\|\nabla J(\mathbf{x}^{q-1})\|^2}{\|\nabla J(\mathbf{x}^{q-2})\|^2}$$
- the search directions are now conjugate directions: S^j and S^k are conjugate if (S^j)^T H S^k = 0 (also called H-orthogonal)
- the method makes use of information from previous iterations without having to store a matrix

37. Geometric Interpretation
[Figure: contour plots of the same quadratic objective in (x_1, x_2); steepest descent zigzags toward the minimum, while conjugate gradient reaches it in far fewer steps. Adapted from: "Optimal Design in Multidisciplinary System." AIAA Professional Development Short Course Notes, September 2002.]

38. Newton's Method
Taylor series with δx = x - x^0:
$$J(\mathbf{x}) \approx J(\mathbf{x}^0) + \nabla J(\mathbf{x}^0)^T \delta\mathbf{x} + \frac{1}{2}\delta\mathbf{x}^T \mathbf{H}(\mathbf{x}^0)\,\delta\mathbf{x}$$
Differentiate:
$$\nabla J(\mathbf{x}) \approx \nabla J(\mathbf{x}^0) + \mathbf{H}(\mathbf{x}^0)\,\delta\mathbf{x}$$
At the optimum ∇J(x*) = 0, so
$$\nabla J(\mathbf{x}^0) + \mathbf{H}(\mathbf{x}^0)\,\delta\mathbf{x} = 0 \quad \Rightarrow \quad \delta\mathbf{x} = -\mathbf{H}(\mathbf{x}^0)^{-1} \nabla J(\mathbf{x}^0)$$

39. Newton's Method
$$\mathbf{S} = -\mathbf{H}(\mathbf{x}^0)^{-1} \nabla J(\mathbf{x}^0)$$
- if J(x) is quadratic, the method gives the exact solution in one iteration
- if J(x) is not quadratic, perform the Taylor series about the new point and repeat until converged
- a very efficient technique if started near the solution
- H is not usually available analytically, and finite differencing it is too expensive (an n×n matrix)
- H can be singular if J is linear in a design variable
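A minimal sketch of the Newton iteration above, solving H δx = -∇J as a linear system rather than forming H⁻¹ explicitly; the quadratic test function is a made-up placeholder, chosen to illustrate the one-iteration convergence noted on slide 39.

```python
import numpy as np

def newton(gradJ, hessJ, x0, tol=1e-8, max_iter=20):
    x = np.asarray(x0, dtype=float)
    for q in range(max_iter):
        g = gradJ(x)
        if np.linalg.norm(g) < tol:      # converged: gradient vanishes
            break
        # Newton step S = -H^{-1} grad J, computed via a linear solve
        x = x + np.linalg.solve(hessJ(x), -g)
    return x, q

# Made-up quadratic test function (constant, positive definite Hessian):
gradJ = lambda x: np.array([2*(x[0] - 1), 20*(x[1] + 2)])
hessJ = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton(gradJ, hessJ, [5.0, 5.0]))  # -> ([1., -2.], 1): exact in one step
```

Because the test function is quadratic, the first Newton step lands exactly on the minimum; on a non-quadratic J the same loop simply re-expands the Taylor series about each new point, as the slide describes.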
40. Quasi-Newton
$$\mathbf{S}^q = -\mathbf{A}^q \nabla J(\mathbf{x}^{q-1})$$
- also known as variable metric methods
- objective and gradient information is used to create an approximation to the inverse of the Hessian
- A approaches H^{-1} during the optimization of quadratic functions
- convergence is similar to second-order methods (though the methods are strictly first order)
Initially A = I, so S^1 is the steepest descent direction. Then
$$\mathbf{A}^{q+1} = \mathbf{A}^q + \mathbf{D}^q$$
where D is a symmetric update matrix:
$$\mathbf{D}^q = \text{fn}\!\left(\mathbf{x}^q - \mathbf{x}^{q-1},\; \nabla J(\mathbf{x}^q) - \nabla J(\mathbf{x}^{q-1}),\; \mathbf{A}^q\right)$$
There are various methods to determine D, e.g. Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). A usage sketch follows after the summary.

41. Lecture Summary
- Gradient vector and Hessian matrix
- Existence and uniqueness
- Optimality conditions
- Convex spaces
- Unconstrained methods
The next lecture will focus on gradient-based techniques for nonlinear constrained optimization. We will consider SQP and penalty methods. These are the methods most commonly used for engineering applications.
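As a closing illustration of the quasi-Newton slide: in practice these methods are usually called through a library rather than hand-coded. A minimal sketch using scipy.optimize.minimize with the BFGS update (the Rosenbrock test function is scipy's built-in, not an example from the lecture):

```python
from scipy.optimize import minimize, rosen, rosen_der

# BFGS builds up an approximation to the inverse Hessian from successive
# gradient differences, so only J and its gradient are supplied - no
# Hessian, and no expensive n x n finite differencing.
result = minimize(rosen, x0=[-1.2, 1.0], jac=rosen_der, method="BFGS")
print(result.x, result.nit)   # converges to [1, 1]
```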