Ch. 1 Matrix Algebra

1 Some Terminology

A matrix is a rectangular array of numbers, denoted
$$A_{i\times k} = [a_{ik}] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1k} \\ a_{21} & a_{22} & \dots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \dots & a_{ik} \end{bmatrix},$$
where a subscripted element of a matrix is always read as $a_{\text{row},\text{column}}$. Here we confine the elements to be real numbers.

A vector is a matrix with one row or one column. A row vector is $A_{1\times k}$ and a column vector is $A_{i\times 1}$; they are commonly denoted $\mathbf{a}'_k$ and $\mathbf{a}_i$, respectively. In the remainder of this course we follow the usual convention that a vector means a column vector unless otherwise stated.

The dimension of a matrix is the number of rows and columns it contains. If $i$ equals $k$, then $A$ is a square matrix. Several particular types of square matrices occur in econometrics.
(1). A symmetric matrix $A$ is one in which $a_{ik} = a_{ki}$ for all $i$ and $k$.
(2). A diagonal matrix is a square matrix whose nonzero elements appear only on the main diagonal, moving from upper left to lower right.
(3). A scalar matrix is a diagonal matrix with the same value in all diagonal elements.
(4). An identity matrix is a scalar matrix with ones on the diagonal. This matrix is always denoted $I$. A subscript is sometimes included to indicate its size; for example,
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
(5). A triangular matrix is one that has only zeros either above or below the main diagonal.

2 Algebraic Manipulation of Matrices

2.1 Equality of Matrices
Matrices $A$ and $B$ are equal if and only if they have the same dimensions and each element of $A$ equals the corresponding element of $B$: $A = B$ if and only if $a_{ik} = b_{ik}$ for all $i$ and $k$.

2.2 Transposition
The transpose of a matrix $A$, denoted $A'$, is obtained by creating the matrix whose $k$th row is the $k$th column of the original matrix. If $A$ is $i\times k$, then $A'$ is $k\times i$. For example,
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 5 & 1 & 5 \\ 6 & 4 & 5 \\ 3 & 1 & 4 \end{bmatrix}, \qquad A' = \begin{bmatrix} 1 & 5 & 6 & 3 \\ 2 & 1 & 4 & 1 \\ 3 & 5 & 5 & 4 \end{bmatrix}.$$
If $A$ is symmetric, $A = A'$. It is also apparent that for any matrix $A$, $(A')' = A$. Finally, the transpose of a column vector $\mathbf{a}$ is a row vector: $\mathbf{a}' = [a_1\ a_2\ \dots\ a_n]$.

2.3 Matrix Addition
Matrices cannot be added unless they have the same dimensions. The operation of addition is extended to matrices by defining $C = A + B = [a_{ik} + b_{ik}]$. We also extend the operation of subtraction to matrices precisely as if they were scalars, performing the operation element by element. Thus, $C = A - B = [a_{ik} - b_{ik}]$. It follows that matrix addition is commutative, $A + B = B + A$, and associative, $(A + B) + C = A + (B + C)$, and that $(A + B)' = A' + B'$.

2.4 Matrix Multiplication
Matrices are multiplied using the inner product. The inner product of two $n\times 1$ vectors $\mathbf{a}$ and $\mathbf{b}$ is a scalar and is written
$$\mathbf{a}'\mathbf{b} = a_1 b_1 + a_2 b_2 + \dots + a_n b_n = \mathbf{b}'\mathbf{a}.$$
For an $n\times k$ matrix $A$ and a $k\times T$ matrix $B$, the product matrix $C = AB$ is an $n\times T$ matrix whose $ik$th element is the inner product of row $i$ of $A$ and column $k$ of $B$. In general, $AB \neq BA$.

The product of a matrix and a vector is a vector and is written as
$$\mathbf{c} = A\mathbf{b} = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \dots + b_k\mathbf{a}_k,$$
where $b_i$ is the $i$th element of the vector $\mathbf{b}$ and $\mathbf{a}_i$ is the $i$th column of the matrix $A$. The right-hand side is a linear combination of the columns of the matrix, where the coefficients are the elements of the vector (see the sketch after the rules below). In the calculation of a matrix product $C = A_{n\times k}B_{k\times T}$, the product can be written as
$$C = AB = [A\mathbf{b}_1\ A\mathbf{b}_2\ \dots\ A\mathbf{b}_T],$$
where $\mathbf{b}_i$ is the $i$th column of the matrix $B$.

Some general rules for matrix multiplication are as follows:
Associative law: $(AB)C = A(BC)$.
Distributive law: $A(B + C) = AB + AC$.
Transpose of a product: $(AB)' = B'A'$.
Scalar multiplication: $\alpha A = [\alpha a_{ik}]$ for a scalar $\alpha$.
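The following minimal NumPy sketch is not part of the original notes; it simply illustrates the column-combination view of $A\mathbf{b}$ and the transpose-of-a-product rule, using the transposition example above and an arbitrary matrix $B$.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [5., 1., 5.],
              [6., 4., 5.],
              [3., 1., 4.]])      # the 4 x 3 matrix from the transposition example
b = np.array([2., -1., 3.])

# A @ b equals the linear combination b1*a1 + b2*a2 + b3*a3 of the columns of A
combo = sum(b[j] * A[:, j] for j in range(A.shape[1]))
print(np.allclose(A @ b, combo))           # True

# (AB)' = B'A'
B = np.array([[1., 0.],
              [2., 1.],
              [0., 3.]])                   # arbitrary 3 x 2 matrix, so AB is 4 x 2
print(np.allclose((A @ B).T, B.T @ A.T))   # True
```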
2.5 Matrix Inversion
Definition: A square matrix $A$ is said to be nonsingular or invertible if there exists a unique (square) matrix $B$ such that $AB = BA = I$. The matrix $B$ is said to be a multiplicative inverse of $A$. We will refer to the multiplicative inverse of a nonsingular matrix $A$ simply as the inverse of $A$ and denote it by $A^{-1}$.

Some computational results involving inverses are
$$|A^{-1}| = \frac{1}{|A|}, \qquad (A^{-1})^{-1} = A, \qquad (A^{-1})' = (A')^{-1}, \qquad (AB)^{-1} = B^{-1}A^{-1},$$
when both inverse matrices exist. Finally, if $A$ is symmetric, then $A^{-1}$ is also symmetric.

Lemma: Suppose that $A$, $B$ and $A+B$ are all $m\times m$ nonsingular matrices. Then
$$(A + B)^{-1} = A^{-1} - A^{-1}(B^{-1} + A^{-1})^{-1}A^{-1}.$$

2.6 A Useful Idempotent Matrix
Definition: An idempotent matrix is one that is equal to its square, that is, $M^2 = MM = M$.

A useful idempotent matrix we will often encounter is the matrix
$$M^0 = I - \frac{1}{n}\mathbf{i}\mathbf{i}',$$
such that
$$M^0\mathbf{x} = \begin{bmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{bmatrix},$$
where $\mathbf{i}$ is a column vector of ones, $\mathbf{x} = [x_1, x_2, \dots, x_n]'$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Proof. By definition, $M^0\mathbf{x} = \left(I - \frac{1}{n}\mathbf{i}\mathbf{i}'\right)\mathbf{x} = \mathbf{x} - \frac{1}{n}\mathbf{i}\mathbf{i}'\mathbf{x} = \mathbf{x} - \mathbf{i}\bar{x}$.

Exercise: Using the idempotent matrix $M^0$, calculate $\sum_{j=1}^{200}(j - \bar{j})^2$, where $\bar{j} = \frac{1}{200}\sum_{j=1}^{200} j$, using GAUSS.
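The exercise asks for a GAUSS program; as a rough NumPy equivalent (an addition to these notes, not the original GAUSS solution), the following sketch builds $M^0$ and evaluates the quadratic form.

```python
import numpy as np

n = 200
j = np.arange(1, n + 1, dtype=float)         # the vector (1, 2, ..., 200)'
i = np.ones(n)

M0 = np.eye(n) - np.outer(i, i) / n          # M0 = I - (1/n) i i'
demeaned = M0 @ j                            # each element is j - jbar

print(np.allclose(demeaned, j - j.mean()))   # True: M0 demeans the vector
print(j @ M0 @ j)                            # sum_{j=1}^{200} (j - jbar)^2
print(((j - j.mean()) ** 2).sum())           # same value, computed directly
```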
2.7 Trace of a Matrix
The trace of a square $k\times k$ matrix is the sum of its diagonal elements: $\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}$.
Some useful results are:
1. $\mathrm{tr}(cA) = c\,\mathrm{tr}(A)$,
2. $\mathrm{tr}(A) = \mathrm{tr}(A')$,
3. $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$,
4. $\mathrm{tr}(ABCD) = \mathrm{tr}(BCDA) = \mathrm{tr}(CDAB) = \mathrm{tr}(DABC)$.

3 Geometry of Matrices
Matrix algebra is extremely useful in the formulation and solution of sets of linear equations. At the same time, the algebraic results have a geometric basis that is very helpful in understanding the calculations. It is therefore worthwhile to digress to a geometric treatment of matrices and vectors.

3.1 Vector Spaces

3.1.1 Euclidean Vector Spaces
Perhaps the most elementary vector spaces are the Euclidean vector spaces $\mathbb{R}^n$, $n = 1, 2, \dots$. For simplicity, consider first $\mathbb{R}^2$. A nonzero vector in $\mathbb{R}^2$ can be represented geometrically by a directed line segment. Given a nonzero vector
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$
we can associate it with the line segment in the plane from $(0,0)$ to $(x_1, x_2)$. If we identify line segments that have the same length and direction, $\mathbf{x}$ can be represented by any line segment from $(a, b)$ to $(a + x_1, b + x_2)$. For example, the vector $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ in $\mathbb{R}^2$ could be represented by the directed line segment from $(2,2)$ to $(4,3)$, or from $(-1,-1)$ to $(1,0)$.

We can think of the Euclidean length of a vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ as the length of any directed line segment representing $\mathbf{x}$. The length of the segment from $(0,0)$ to $(x_1, x_2)$ is $\sqrt{x_1^2 + x_2^2}$.

Two basic operations are defined for vectors: scalar multiplication and addition. The geometric representation helps us visualize how these operations work in $\mathbb{R}^2$.
(1). Scalar multiplication: For each vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and each scalar $\alpha$, the product $\alpha\mathbf{x}$ is defined by
$$\alpha\mathbf{x} = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \end{bmatrix}.$$
The set of all possible scalar multiples of $\mathbf{x}$ is the line through $\mathbf{0}$ and $\mathbf{x}$; any scalar multiple of $\mathbf{x}$ is a segment of this line.
Example:
$$\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad \mathbf{x}^* = 2\mathbf{x} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \qquad \mathbf{x}^{**} = -\tfrac{1}{2}\mathbf{x} = \begin{bmatrix} -\tfrac{1}{2} \\ -1 \end{bmatrix}.$$
The vector $\mathbf{x}^* (= 2\mathbf{x})$ points in the same direction as $\mathbf{x}$, but its length is twice that of $\mathbf{x}$. The vector $\mathbf{x}^{**} (= -\tfrac{1}{2}\mathbf{x})$ has half the length of $\mathbf{x}$ but points in the opposite direction.
(2). Addition: The sum of two vectors $\mathbf{a}$ and $\mathbf{b}$ is a third vector whose coordinates are the sums of the corresponding coordinates of $\mathbf{a}$ and $\mathbf{b}$. For example,
$$\mathbf{c} = \mathbf{a} + \mathbf{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}.$$
Geometrically, $\mathbf{c}$ is obtained by moving in the distance and direction defined by $\mathbf{b}$ from the tip of $\mathbf{a}$ or, because addition is commutative, from the tip of $\mathbf{b}$ in the distance and direction of $\mathbf{a}$.

In a similar manner, vectors in $\mathbb{R}^3$ can be represented by directed line segments in 3-space. A vector in $\mathbb{R}^n$ can be viewed as the coordinates of a point in an $n$-dimensional space, or as the definition of the line segment connecting the origin and that point. In general, scalar multiplication and addition in $\mathbb{R}^n$ are defined by
$$\alpha\mathbf{x} = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix} \qquad \text{and} \qquad \mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and any scalar $\alpha$.

3.1.2 Vector Space Axioms
Definition: Let $V$ be a set on which the operations of addition and scalar multiplication are defined. By this we mean that
(1). if $\mathbf{x}, \mathbf{y} \in V$, then $\mathbf{x} + \mathbf{y} \in V$,
(2). if $\mathbf{x} \in V$ and $\alpha$ is a scalar, then $\alpha\mathbf{x} \in V$.
The set $V$ together with the operations of addition and scalar multiplication is said to form a vector space if the following axioms are satisfied:
(a). $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$ for any $\mathbf{x}$ and $\mathbf{y}$ in $V$.
(b). $(\mathbf{x} + \mathbf{y}) + \mathbf{z} = \mathbf{x} + (\mathbf{y} + \mathbf{z})$.
(c). There exists an element $\mathbf{0}$ in $V$ such that $\mathbf{x} + \mathbf{0} = \mathbf{x}$ for each $\mathbf{x} \in V$.
(d). For each $\mathbf{x} \in V$, there exists an element $-\mathbf{x} \in V$ such that $\mathbf{x} + (-\mathbf{x}) = \mathbf{0}$.
(e). $\alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}$ for each real number $\alpha$ and any $\mathbf{x}$ and $\mathbf{y}$ in $V$.
(f). $(\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}$ for any real numbers $\alpha$ and $\beta$ and any $\mathbf{x} \in V$.
(g). $(\alpha\beta)\mathbf{x} = \alpha(\beta\mathbf{x})$ for any real numbers $\alpha$ and $\beta$ and any $\mathbf{x} \in V$.
(h). $1\cdot\mathbf{x} = \mathbf{x}$ for all $\mathbf{x} \in V$.

Example: The two-dimensional plane is the set of all vectors with two real-valued coordinates. We label this set $\mathbb{R}^2$. It has two important properties.
(1). $\mathbb{R}^2$ is closed under scalar multiplication: every scalar multiple of a vector in the plane is also in the plane.
(2). $\mathbb{R}^2$ is closed under addition: the sum of any two vectors in the plane is always a vector in the plane.

3.2 Linear Combinations of Vectors and Basis Vectors
Definition: A set of vectors in a vector space is a basis for that vector space if any vector in the vector space can be written as a linear combination of them.
Example: Any pair of two-dimensional vectors that point in different directions forms a basis for $\mathbb{R}^2$.
Proof: Consider an arbitrary set of vectors in $\mathbb{R}^2$: $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. If $\mathbf{a}$ and $\mathbf{b}$ are a basis, we can find numbers $\alpha_1$ and $\alpha_2$ such that $\mathbf{c} = \alpha_1\mathbf{a} + \alpha_2\mathbf{b}$. Let
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad \mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
Then
$$c_1 = \alpha_1 a_1 + \alpha_2 b_1, \qquad c_2 = \alpha_1 a_2 + \alpha_2 b_2.$$
The solutions to this pair of equations are
$$\alpha_1 = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - b_1 a_2}, \qquad (1)$$
$$\alpha_2 = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - b_1 a_2}. \qquad (2)$$
This gives a unique solution unless $a_1 b_2 - b_1 a_2 = 0$. If $a_1 b_2 - b_1 a_2 = 0$, then $a_1/a_2 = b_1/b_2$, which means that $\mathbf{b}$ is just a multiple of $\mathbf{a}$. This returns us to our original condition, that $\mathbf{a}$ and $\mathbf{b}$ point in different directions. The implication is that if $\mathbf{a}$ and $\mathbf{b}$ are any pair of vectors for which the denominator in (1) and (2) is not zero, then any other vector $\mathbf{c}$ can be formed as a unique linear combination of $\mathbf{a}$ and $\mathbf{b}$.
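As a small numeric illustration of (1) and (2) (not part of the original notes; the vectors are arbitrary examples), the sketch below recovers the coordinates of a vector $\mathbf{c}$ in a basis $\{\mathbf{a}, \mathbf{b}\}$ both from the closed-form solution and from a linear solver.

```python
import numpy as np

# Arbitrary example vectors pointing in different directions
a = np.array([1., 1.])
b = np.array([1., 2.])
c = np.array([4., 7.])

den = a[0] * b[1] - b[0] * a[1]              # a1*b2 - b1*a2, nonzero here
alpha1 = (b[1] * c[0] - b[0] * c[1]) / den   # equation (1)
alpha2 = (a[0] * c[1] - a[1] * c[0]) / den   # equation (2)

print(alpha1, alpha2)                                    # coordinates of c in the basis {a, b}
print(np.allclose(alpha1 * a + alpha2 * b, c))           # True
print(np.linalg.solve(np.column_stack([a, b]), c))       # same coordinates via a solver
```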
The basis of a vector space is not unique, since any set of vectors that satisfies the definition will do. But for any particular basis, there is only one linear combination of the basis vectors that will produce another particular vector in the vector space.

3.3 Linear Dependence
As the preceding discussion suggests, $k$ vectors are required to form a basis for $\mathbb{R}^k$. However, not every set of $k$ vectors will suffice. As we will see, to form a basis we require the $k$ vectors to be linearly independent.
Definition: A set of vectors is linearly dependent if any one of the vectors in the set can be written as a linear combination of the others.
Definition: The vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ in a vector space $V$ are said to be linearly independent if and only if the only solution to
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n = \mathbf{0}$$
is $c_1 = c_2 = \dots = c_n = 0$.
Example: The vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ are linearly independent, since if
$$c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
then $c_1 + c_2 = 0$ and $c_1 + 2c_2 = 0$, and the only solution to this system is $c_1 = c_2 = 0$.

3.4 Subspaces
Definition: The set of all linear combinations of a set of vectors is the vector space that is spanned by those vectors.
Example: $\mathbb{R}^k = \mathrm{Span}(\mathbf{v}_1, \dots, \mathbf{v}_k)$ for a basis $(\mathbf{v}_1, \dots, \mathbf{v}_k)$.
We now consider what happens to the vector space that is spanned by linearly dependent vectors.
Definition: If $S$ is a nonempty subset of a vector space $V$, and $S$ satisfies the following conditions:
(1). $\alpha\mathbf{x} \in S$ whenever $\mathbf{x} \in S$, for any scalar $\alpha$,
(2). $\mathbf{x} + \mathbf{y} \in S$ whenever $\mathbf{x} \in S$ and $\mathbf{y} \in S$,
then $S$ is said to be a subspace of $V$.
Example: $S = \mathrm{Span}\left(\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right)$ is a one-dimensional subspace of $\mathbb{R}^2$, since
$$\alpha\begin{bmatrix} 1 \\ 2 \end{bmatrix} \in S \qquad \text{and} \qquad \alpha\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \beta\begin{bmatrix} 1 \\ 2 \end{bmatrix} = (\alpha + \beta)\begin{bmatrix} 1 \\ 2 \end{bmatrix} \in S.$$
Therefore, the space spanned by a set of vectors in $\mathbb{R}^k$ has at most $k$ dimensions. If this space has fewer than $k$ dimensions, it is a subspace, or hyperplane. The important point in the preceding discussion is that every set of vectors spans some space; it may be the entire space in which the vectors reside, or some subspace of it.
Exercise: Let $S = \{(x_1, x_2, x_3)' \mid x_1 = x_2\}$. Show that $S$ is a subspace of $\mathbb{R}^3$.

3.5 Rank of a Matrix
If $A$ is an $m\times n$ matrix, each row of $A$ is an $n$-tuple of real numbers and hence can be considered as a vector in $\mathbb{R}^{1\times n}$. The $m$ vectors corresponding to the rows of $A$ are referred to as the row vectors of $A$. Similarly, each column of $A$ can be considered as a vector in $\mathbb{R}^m$, and one can associate $n$ column vectors with the matrix $A$.
Definition: If $A$ is an $m\times n$ matrix, the subspace of $\mathbb{R}^{1\times n}$ spanned by the row vectors of $A$ is called the row space of $A$. The subspace of $\mathbb{R}^m$ spanned by the column vectors of $A$ is called the column space.
Example: Let
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
The row space of $A$ is the set of all 3-tuples of the form
$$\alpha(1\ 0\ 0) + \beta(0\ 1\ 0) = (\alpha\ \beta\ 0).$$
The column space of $A$ is the set of all vectors of the form
$$\alpha\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta\begin{bmatrix} 0 \\ 1 \end{bmatrix} + \gamma\begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Thus the row space of $A$ is a two-dimensional subspace of $\mathbb{R}^{1\times 3}$, and the column space of $A$ is all of $\mathbb{R}^2$.
Theorem: The column space and the row space of a matrix have the same dimension.
Definition: The column (row) rank of a matrix is the dimension of the vector space that is spanned by its columns (rows). In short, the column rank is the number of linearly independent columns of the matrix.
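The following sketch (added here for illustration, not part of the original notes) computes the rank of the example matrix above and of the pair of vectors from the linear independence example.

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 0.]])          # the row/column space example above

print(np.linalg.matrix_rank(A))       # 2: two linearly independent rows/columns
print(np.linalg.matrix_rank(A.T))     # 2: row rank equals column rank

# The vectors [1, 1]' and [1, 2]' from the linear independence example
V = np.column_stack([[1., 1.], [1., 2.]])
print(np.linalg.matrix_rank(V))       # 2, so the two vectors are linearly independent
```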
Theorem: The column rank and row rank of a matrix are equal; that is,
$$\mathrm{rank}(A) = \mathrm{rank}(A') \le \min(\text{number of rows},\ \text{number of columns}).$$
Definition: A full (short) rank matrix is a matrix whose rank is equal to (fewer than) the number of columns it contains.
Some useful results:
1. $\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
Proof: Let $A$ be $m\times n$ and $B$ be $n\times k$. Write $AB = [A\mathbf{b}_1\ A\mathbf{b}_2\ \dots\ A\mathbf{b}_k]$, where $\mathbf{b}_i$ is the $i$th column of $B$. That is, each column of $AB$ can be expressed as a linear combination of the columns of $A$, so the number of linearly independent columns in $AB$ cannot exceed the number of linearly independent columns in $A$. Thus, $\mathrm{rank}(AB) \le \mathrm{rank}(A)$. Similarly, each row of $AB$ can be expressed as a linear combination of the rows of $B$, from which we get $\mathrm{rank}(AB) \le \mathrm{rank}(B)$.
Corollary:
1. For any matrix $A$, $\mathrm{rank}(A) = \mathrm{rank}(A'A) = \mathrm{rank}(AA')$.
2. Let $A$ be an $m\times n$ matrix, $B$ an $m\times m$ matrix, and $C$ an $n\times n$ matrix. If $B$ and $C$ are nonsingular, then
$$\mathrm{rank}(BAC) = \mathrm{rank}(BA) = \mathrm{rank}(AC) = \mathrm{rank}(A).$$
Example: Setting $B = I$, if $A$ is $m\times n$ and $C$ is a square matrix of rank $n$, then $\mathrm{rank}(AC) = \mathrm{rank}(A)$.

3.6 Determinant
For a $2\times 2$ matrix $A$, the area of the parallelogram formed by its columns is the determinant of $A$, denoted $|A|$.
Proposition: The determinant of a matrix is nonzero if and only if it has full rank.
Useful results:
1. $|cD| = c^k|D|$ for a constant $c$ and a $k\times k$ matrix $D$.
2. $|CD| = |C|\cdot|D|$ for two square matrices $C$ and $D$.
3. $|C| = |C'|$.

4 Partitioned Matrices
It is sometimes useful to group some of the elements of a matrix in submatrices. For example, we might write
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
Note that
$$A' = \begin{bmatrix} A_{11}' & A_{21}' \\ A_{12}' & A_{22}' \end{bmatrix}.$$
A common special case is the block diagonal matrix
$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix},$$
where $A_{11}$ and $A_{22}$ are square matrices.

4.1 Addition and Multiplication of Partitioned Matrices
For conformably partitioned matrices $A$ and $B$,
$$A + B = \begin{bmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{bmatrix}.$$
That is, for addition, the dimensions of $A_{ij}$ and $B_{ij}$ must be the same. Similarly,
$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.$$
That is, for multiplication, the number of columns in $A_{ij}$ must equal the number of rows in $B_{jk}$ for all pairs $i$ and $j$.
Example: Recall the calculation of the product of a matrix and a vector,
$$\mathbf{c} = A_{n\times k}\mathbf{b}_{k\times 1} = [\mathbf{a}_1\ \mathbf{a}_2\ \dots\ \mathbf{a}_k]\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix} = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \dots + b_k\mathbf{a}_k,$$
and that of a matrix product $C = A_{n\times k}B_{k\times T}$,
$$C = AB = A[\mathbf{b}_1\ \mathbf{b}_2\ \dots\ \mathbf{b}_T] = [A\mathbf{b}_1\ A\mathbf{b}_2\ \dots\ A\mathbf{b}_T] = \begin{bmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_n' \end{bmatrix}B = \begin{bmatrix} \mathbf{a}_1'B \\ \mathbf{a}_2'B \\ \vdots \\ \mathbf{a}_n'B \end{bmatrix},$$
where here $\mathbf{a}_i'$ denotes the $i$th row of $A$. Two cases frequently encountered are of the form
$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}'\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = [A_1'\ A_2']\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = A_1'A_1 + A_2'A_2,$$
and
$$\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}'\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix} = \begin{bmatrix} A_{11}'A_{11} & 0 \\ 0 & A_{22}'A_{22} \end{bmatrix}.$$

4.2 Determinants of Partitioned Matrices
(a). $\begin{vmatrix} A_{11} & 0 \\ 0 & A_{22} \end{vmatrix} = |A_{11}|\cdot|A_{22}|.$
(b). $\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = |A_{22}|\cdot|A_{11} - A_{12}A_{22}^{-1}A_{21}| = |A_{11}|\cdot|A_{22} - A_{21}A_{11}^{-1}A_{12}|.$
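A quick numeric check of result (b), added here as an illustration (the blocks are random draws, which are nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
A11 = rng.standard_normal((2, 2))
A12 = rng.standard_normal((2, 3))
A21 = rng.standard_normal((3, 2))
A22 = rng.standard_normal((3, 3))

A = np.block([[A11, A12],
              [A21, A22]])

lhs = np.linalg.det(A)
rhs1 = np.linalg.det(A22) * np.linalg.det(A11 - A12 @ np.linalg.inv(A22) @ A21)
rhs2 = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ np.linalg.inv(A11) @ A12)
print(np.allclose(lhs, rhs1), np.allclose(lhs, rhs2))   # True True
```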
4.3 Inverses of Partitioned Matrices
1. The inverse of a block diagonal matrix is
$$\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}.$$
2. For a general $2\times 2$ partitioned matrix,
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1}(I + A_{12}F_2A_{21}A_{11}^{-1}) & -A_{11}^{-1}A_{12}F_2 \\ -F_2A_{21}A_{11}^{-1} & F_2 \end{bmatrix},$$
where $F_2 = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$.
Exercise: Show that
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}$$
is also equal to
$$\begin{bmatrix} F_1 & -F_1A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}F_1 & A_{22}^{-1}(I + A_{21}F_1A_{12}A_{22}^{-1}) \end{bmatrix},$$
where $F_1 = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$.

4.4 Kronecker Products
For general matrices $A$ and $B$, the Kronecker product of $A$ and $B$ is
$$A\otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \dots & a_{1k}B \\ a_{21}B & a_{22}B & \dots & a_{2k}B \\ \vdots & \vdots & & \vdots \\ a_{i1}B & a_{i2}B & \dots & a_{ik}B \end{bmatrix}.$$
Notice that if $A$ is $i\times k$ and $B$ is $m\times n$, then $A\otimes B$ is $(im)\times(kn)$.
Useful results:
1. Let $A$, $B$, $C$, and $D$ be any matrices (conformable where required). Then
1a. $(A\otimes B)(C\otimes D) = (AC)\otimes(BD)$,
1b. $(A + B)\otimes C = (A\otimes C) + (B\otimes C)$,
1c. $(A\otimes B)' = A'\otimes B'$.
2. Let $A$ be an $m\times m$ matrix and $B$ a $k\times k$ matrix. Then
2a. $(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}$.
Proof: $(A^{-1}\otimes B^{-1})(A\otimes B) = (A^{-1}A)\otimes(B^{-1}B) = I_m\otimes I_k = I_{mk}$.
2b. $\mathrm{tr}(A\otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$.
Proof: $\mathrm{tr}(A\otimes B) = \sum_{i=1}^{m} a_{ii}\,\mathrm{tr}(B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$.

4.5 The Vec Operator
The operator that transforms a matrix into a vector is known as the vec operator. If the $m\times n$ matrix $A$ has $\mathbf{a}_i$ as its $i$th column, then $\mathrm{vec}(A)$ is the $mn\times 1$ vector given by
$$\mathrm{vec}(A) = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}.$$
Useful results:
1. Let $\mathbf{a}$ be an $m\times 1$ vector and $\mathbf{b}$ an $n\times 1$ vector. Then
1a. $\mathrm{vec}(\mathbf{a}) = \mathrm{vec}(\mathbf{a}') = \mathbf{a}$,
1b. $\mathrm{vec}(\mathbf{a}\mathbf{b}') = \mathbf{b}\otimes\mathbf{a}$.
Proof:
$$\mathrm{vec}(\mathbf{a}\mathbf{b}') = \mathrm{vec}([b_1\mathbf{a}\ b_2\mathbf{a}\ \dots\ b_n\mathbf{a}]) = \begin{bmatrix} b_1\mathbf{a} \\ b_2\mathbf{a} \\ \vdots \\ b_n\mathbf{a} \end{bmatrix} = \mathbf{b}\otimes\mathbf{a}.$$
2. Let $A$ and $B$ both be $m\times n$ matrices. Then $\mathrm{tr}(A'B) = \{\mathrm{vec}(A)\}'\mathrm{vec}(B)$.
Proof: Let $\mathbf{a}_1, \dots, \mathbf{a}_n$ denote the columns of $A$ and $\mathbf{b}_1, \dots, \mathbf{b}_n$ the columns of $B$. Then
$$\mathrm{tr}(A'B) = \sum_{i=1}^{n}(A'B)_{ii} = \sum_{i=1}^{n}\mathbf{a}_i'\mathbf{b}_i = [\mathbf{a}_1'\ \dots\ \mathbf{a}_n']\begin{bmatrix} \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_n \end{bmatrix} = \{\mathrm{vec}(A)\}'\mathrm{vec}(B).$$
3. Let $A$, $B$, and $C$ be matrices of dimension $m\times n$, $n\times p$ and $p\times q$, respectively. Then
$$\mathrm{vec}(ABC) = (C'\otimes A)\,\mathrm{vec}(B).$$
4. Let $A$, $B$, $C$, and $D$ be matrices of dimension $m\times n$, $n\times p$, $p\times q$, and $q\times m$, respectively. Then
$$\mathrm{tr}(ABCD) = \{\mathrm{vec}(A')\}'(D'\otimes B)\,\mathrm{vec}(C).$$
Proof: Using result 2, it follows that
$$\mathrm{tr}(ABCD) = \mathrm{tr}\{A(BCD)\} = \{\mathrm{vec}(A')\}'\mathrm{vec}(BCD),$$
and using result 3 we have $\mathrm{vec}(BCD) = (D'\otimes B)\,\mathrm{vec}(C)$, so the proof is complete.
Exercise: Let
$$\Pi = \begin{bmatrix} \Pi_1 & 0 & \dots & 0 \\ 0 & \Pi_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \Pi_N \end{bmatrix},$$
where $\Pi_i$ is a $k\times p_i$ matrix. Find an expression for the matrix $A$ such that
$$\mathrm{vec}(\Pi') = A\cdot\begin{bmatrix} \mathrm{vec}(\Pi_1') \\ \vdots \\ \mathrm{vec}(\Pi_N') \end{bmatrix}.$$
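The sketch below (an illustrative addition, with random example matrices) checks results 2, 3, and 2b numerically. Note that the vec operator stacks columns, which in NumPy corresponds to Fortran-order reshaping.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

def vec(M):
    # vec stacks the columns of M, i.e. column-major (Fortran) order
    return M.reshape(-1, order="F")

# vec(ABC) = (C' kron A) vec(B)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))            # True

# tr(A'B) = vec(A)' vec(B) for equally sized A and B
B2 = rng.standard_normal((2, 3))
print(np.allclose(np.trace(A.T @ B2), vec(A) @ vec(B2)))                # True

# tr(A kron B) = tr(A) tr(B) for square A and B
S = rng.standard_normal((3, 3))
T = rng.standard_normal((2, 2))
print(np.allclose(np.trace(np.kron(S, T)), np.trace(S) * np.trace(T)))  # True
```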
5 Characteristic Roots and Vectors

5.1 Eigenvalues, Eigenvectors, and Eigenspaces
If $A$ is an $n\times n$ matrix, then any scalar $\lambda$ satisfying the equation
$$A\mathbf{x} = \lambda\mathbf{x}, \qquad (3)$$
for some $n\times 1$ vector $\mathbf{x}\neq\mathbf{0}$, is called an eigenvalue of $A$. The vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. Equation (3) can be equivalently expressed as
$$(A - \lambda I)\mathbf{x} = \mathbf{0}.$$
Notice that if $|A - \lambda I| \neq 0$, then $(A - \lambda I)^{-1}$ would exist, and premultiplication of this equation by the inverse would give $\mathbf{x} = \mathbf{0}$, a contradiction of the stated assumption that $\mathbf{x}\neq\mathbf{0}$. Thus, any eigenvalue $\lambda$ must satisfy the determinantal equation
$$|A - \lambda I| = 0, \qquad (4)$$
which is known as the eigenvalue-eigenvector (or characteristic) equation of $A$.

Example: Let $A = \begin{bmatrix} 3 & 2 \\ 3 & -2 \end{bmatrix}$. Then the eigenvalues of $A$ are the solutions to
$$\begin{vmatrix} 3 - \lambda & 2 \\ 3 & -2 - \lambda \end{vmatrix} = 0.$$
Thus, the eigenvalues of $A$ are $\lambda_1 = 4$ and $\lambda_2 = -3$. To find the eigenvectors belonging to $\lambda_1 = 4$, we solve $(A - 4I)\mathbf{x} = \mathbf{0}$ to get $\mathbf{x}_1 = \begin{bmatrix} 2x_2 \\ x_2 \end{bmatrix}$. Thus any nonzero multiple of $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is an eigenvector belonging to $\lambda_1$, and $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is a basis for the eigenspace corresponding to $\lambda_1$. Similarly, any nonzero multiple of $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$ is an eigenvector belonging to $\lambda_2$.

From the example above, we see that eigenvectors are not uniquely defined. To remove the indeterminacy, we usually (but not necessarily) impose the scale constraint $\mathbf{x}_i'\mathbf{x}_i = 1$. It is worth noting that a real matrix may have complex eigenvalues and eigenvectors.
Exercise: Find the eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 1 \\ -2 & -1 \end{bmatrix}$.

5.2 Symmetric Matrices
We have seen that a matrix may have complex eigenvalues even when the matrix itself is real. This is not the case for symmetric matrices.
Theorem: Let $A$ be an $n\times n$ real symmetric matrix. Then the eigenvalues of $A$ are real, and corresponding to any eigenvalue there exist eigenvectors that are real.
Proof: Let $\lambda = \alpha + i\beta$ be an eigenvalue of $A$ and $\mathbf{x} = \mathbf{y} + i\mathbf{z}$ a corresponding eigenvector, where $i = \sqrt{-1}$. We will first show that $\beta = 0$. Substitution of these expressions for $\lambda$ and $\mathbf{x}$ into the eigenvalue-eigenvector equation (3) yields
$$A(\mathbf{y} + i\mathbf{z}) = (\alpha + i\beta)(\mathbf{y} + i\mathbf{z}). \qquad (5)$$
Premultiplying (5) by $(\mathbf{y} - i\mathbf{z})'$, we get
$$(\mathbf{y} - i\mathbf{z})'A(\mathbf{y} + i\mathbf{z}) = (\alpha + i\beta)(\mathbf{y} - i\mathbf{z})'(\mathbf{y} + i\mathbf{z}),$$
which simplifies to
$$\mathbf{y}'A\mathbf{y} + \mathbf{z}'A\mathbf{z} = (\alpha + i\beta)(\mathbf{y}'\mathbf{y} + \mathbf{z}'\mathbf{z}), \qquad (6)$$
since $\mathbf{y}'A\mathbf{z} = \mathbf{z}'A\mathbf{y}$ follows from the symmetry of $A$. Now $\mathbf{x}\neq\mathbf{0}$ implies that $(\mathbf{y}'\mathbf{y} + \mathbf{z}'\mathbf{z}) > 0$ and, consequently, we must have $\beta = 0$, since the left-hand side of (6) is real. Substituting $\beta = 0$ in (5), we find that $A\mathbf{y} + iA\mathbf{z} = \alpha\mathbf{y} + i\alpha\mathbf{z}$. Thus, $\mathbf{x} = \mathbf{y} + i\mathbf{z}$ will be an eigenvector of $A$ corresponding to $\lambda = \alpha$ as long as $\mathbf{y}$ and $\mathbf{z}$ satisfy $A\mathbf{y} = \alpha\mathbf{y}$ and $A\mathbf{z} = \alpha\mathbf{z}$, with at least one not equal to $\mathbf{0}$ so that $\mathbf{x}\neq\mathbf{0}$. A real eigenvector is then constructed by selecting $\mathbf{y}\neq\mathbf{0}$ such that $A\mathbf{y} = \alpha\mathbf{y}$ and $\mathbf{z} = \mathbf{0}$.

Theorem: If $A$ is an $n\times n$ real symmetric matrix, then it is possible to construct a set of $n$ eigenvectors of $A$ such that the set is orthonormal, i.e. $\mathbf{x}_i'\mathbf{x}_j = 0$ for $i\neq j$ and $\mathbf{x}_i'\mathbf{x}_i = 1$.

It is convenient to collect the $n$ eigenvectors in an $n\times n$ matrix whose $i$th column is the $\mathbf{x}_i$ corresponding to $\lambda_i$,
$$X = [\mathbf{x}_1\ \mathbf{x}_2\ \dots\ \mathbf{x}_n],$$
and the $n$ eigenvalues, in the same order, in a diagonal matrix,
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix} = [\boldsymbol{\Lambda}_1\ \boldsymbol{\Lambda}_2\ \dots\ \boldsymbol{\Lambda}_n].$$
It is easy to see that
$$AX = [A\mathbf{x}_1\ A\mathbf{x}_2\ \dots\ A\mathbf{x}_n] = [\lambda_1\mathbf{x}_1\ \lambda_2\mathbf{x}_2\ \dots\ \lambda_n\mathbf{x}_n]$$
and
$$X\Lambda = [X\boldsymbol{\Lambda}_1\ X\boldsymbol{\Lambda}_2\ \dots\ X\boldsymbol{\Lambda}_n] = [\lambda_1\mathbf{x}_1\ \lambda_2\mathbf{x}_2\ \dots\ \lambda_n\mathbf{x}_n].$$
Therefore, we have the useful results that $AX = X\Lambda$, or, since $X'X = I$, the spectral decomposition
$$A = X\Lambda X'$$
and the matrix diagonalization
$$X'AX = X'X\Lambda = I\Lambda = \Lambda.$$

5.3 Rank, Trace and Determinant of a Matrix
Using the results on the rank of matrix products, it is easy to see that
1. $\mathrm{rank}(A) = \mathrm{rank}(X'AX) = \mathrm{rank}(\Lambda) =$ the number of nonzero eigenvalues of $A$.
2. $\mathrm{tr}(X'AX) = \mathrm{tr}(AXX') = \mathrm{tr}(AI) = \mathrm{tr}(A) = \mathrm{tr}(\Lambda) = \sum_{i=1}^{n}\lambda_i$, i.e. the trace of $A$ is the sum of its eigenvalues.
3. $|X'AX| = |X'|\,|A|\,|X| = |X'|\,|X|\,|A| = |X'X|\,|A| = |I|\,|A| = |A| = |\Lambda| = \prod_{i=1}^{n}\lambda_i$, i.e. the determinant of $A$ is the product of its eigenvalues.
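The following short sketch (added for illustration, using a small symmetric example matrix rather than the exercise matrix below) verifies the spectral decomposition and results 2 and 3 numerically.

```python
import numpy as np

# A small symmetric example matrix
A = np.array([[2., 1.],
              [1., 3.]])

lam, X = np.linalg.eigh(A)        # eigenvalues (ascending) and orthonormal eigenvectors
Lam = np.diag(lam)

print(np.allclose(A, X @ Lam @ X.T))              # spectral decomposition A = X Lambda X'
print(np.allclose(X.T @ X, np.eye(2)))            # X'X = I
print(np.allclose(np.trace(A), lam.sum()))        # tr(A) = sum of eigenvalues
print(np.allclose(np.linalg.det(A), lam.prod()))  # |A| = product of eigenvalues
```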
Exercise: Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 6 \\ 3 & 6 & 7 \end{bmatrix}.$$
Find the eigenvalues and eigenvectors of $A$. In addition, verify that $\mathrm{tr}(A) = \sum_{i=1}^{3}\lambda_i$, that $|A| = \prod_{i=1}^{3}\lambda_i$, and that the eigenvectors are orthonormal. Finally, is $A$ a full rank matrix?

5.4 Powers of a Matrix
1. "Expanding" (integer powers): Consider first, for example,
$$AA = A^2 = (X\Lambda X')(X\Lambda X') = X\Lambda X'X\Lambda X' = X\Lambda I\Lambda X' = X\Lambda\Lambda X' = X\Lambda^2 X',$$
and, if $A^{-1}$ exists,
$$A^{-1} = (X\Lambda X')^{-1} = (X')^{-1}\Lambda^{-1}X^{-1} = X\Lambda^{-1}X'.$$
Then we have the following theorem.
Theorem: For any nonsingular symmetric matrix $A = X\Lambda X'$,
$$A^k = X\Lambda^k X', \qquad k = \dots, -2, -1, 0, 1, 2, \dots$$
2. "Contraction" (fractional powers less than one): Sometimes we require a matrix $B$ such that $BB = A$. This $B$ is denoted $A^{1/2}$, and it is computed as
$$A^{1/2} = X\Lambda^{1/2}X' = X\begin{bmatrix} \sqrt{\lambda_1} & 0 & \dots & 0 \\ 0 & \sqrt{\lambda_2} & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \sqrt{\lambda_n} \end{bmatrix}X',$$
as long as all $\lambda_i$ are nonnegative.
Combining 1 and 2, we have the following result.
Theorem: For a positive definite matrix $A$, $A^r = X\Lambda^r X'$ for any real number $r$.

5.5 Idempotent Matrices
A symmetric idempotent matrix is one such that
$$A^k = X\Lambda^k X' = X\Lambda X' = A$$
for all nonnegative integers $k$. Therefore $(\lambda_i)^k = \lambda_i$ for all $i = 1, \dots, n$; that is, all the eigenvalues of an idempotent matrix are 0 or 1. An immediate consequence is that the rank of an idempotent matrix equals its trace.

5.6 Quadratic Forms and Definite Matrices
Many optimization problems involve double sums of the form
$$q = \sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j a_{ij}.$$
This quadratic form can be written as $q = \mathbf{c}'A\mathbf{c}$, where $A$ is a symmetric matrix.
Example: In the optimization of $z = f(x_1, x_2)$, the first-order condition involves $dz = f_1\,dx_1 + f_2\,dx_2$, and the second-order condition involves
$$d^2z = f_{11}\,dx_1^2 + 2f_{12}\,dx_1dx_2 + f_{22}\,dx_2^2,$$
which can be written as
$$[dx_1\ dx_2]\begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix}\begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix}$$
and is a quadratic form.
Definition: For a given symmetric matrix $A$,
1. if $\mathbf{c}'A\mathbf{c} > (<)\ 0$ for all nonzero $\mathbf{c}$, then $A$ is positive (negative) definite;
2. if $\mathbf{c}'A\mathbf{c} \ge (\le)\ 0$ for all nonzero $\mathbf{c}$, then $A$ is nonnegative definite or positive semidefinite (nonpositive definite or negative semidefinite).
To see the possibilities, recall that $A = X\Lambda X'$; therefore the quadratic form can be written as
$$\mathbf{c}'A\mathbf{c} = \mathbf{c}'X\Lambda X'\mathbf{c} = \mathbf{y}'\Lambda\mathbf{y} = \sum_{i=1}^{n}\lambda_i y_i^2,$$
where $\mathbf{y} = X'\mathbf{c}$ is an $n\times 1$ vector.
Theorem: Let $A$ be a symmetric matrix. If all the eigenvalues of $A$ are positive (negative), then $A$ is positive definite (negative definite). If some of the eigenvalues are zero, then $A$ is nonnegative definite if the remainder are positive. If $A$ has both negative and positive roots, then $A$ is indefinite.

Example: Let us consider a two-product firm under a perfectly competitive market. Accordingly, the firm's revenue function is
$$R = 12Q_1 + 18Q_2,$$
where $Q_i$ represents the output level of the $i$th product per unit of time. The firm's cost function is assumed to be
$$C = 2Q_1^2 + Q_1Q_2 + 2Q_2^2.$$
The profit function of this hypothetical firm can now be written as
$$\pi = R - C = 12Q_1 + 18Q_2 - 2Q_1^2 - Q_1Q_2 - 2Q_2^2.$$
Our task is to find the levels of $Q_1$ and $Q_2$ which, in combination, maximize $\pi$. For this purpose, we first find the first-order partial derivatives of the profit function:
$$\pi_1 \left(= \frac{\partial\pi}{\partial Q_1}\right) = 12 - 4Q_1 - Q_2, \qquad (7)$$
$$\pi_2 \left(= \frac{\partial\pi}{\partial Q_2}\right) = 18 - Q_1 - 4Q_2. \qquad (8)$$
Setting both equal to zero, to satisfy the necessary condition for a maximum, we obtain $Q_1^* = 2$ and $Q_2^* = 4$, implying an optimal profit $\pi^* = 48$ per unit of time. To be sure that this does represent a maximum profit, let us check the second-order condition. The second partial derivatives, obtained by partial differentiation of (7) and (8), give us the Hessian
$$H = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{bmatrix} = \begin{bmatrix} -4 & -1 \\ -1 & -4 \end{bmatrix}.$$
Since the eigenvalues of $H$ are $-3$ and $-5$, both smaller than zero, the Hessian matrix (and hence $d^2z$) is negative definite, and the solution does maximize the profit.
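A quick numeric cross-check of this example (added here as an illustration, not part of the original notes): the first-order conditions (7)-(8) form a linear system, and the definiteness of the Hessian can be read off its eigenvalues.

```python
import numpy as np

def profit(q):
    q1, q2 = q
    return 12*q1 + 18*q2 - 2*q1**2 - q1*q2 - 2*q2**2

# FOCs (7)-(8): 4*Q1 + Q2 = 12 and Q1 + 4*Q2 = 18
q_star = np.linalg.solve(np.array([[4., 1.], [1., 4.]]), np.array([12., 18.]))
print(q_star, profit(q_star))              # [2. 4.] 48.0

# SOC: the Hessian is constant here
H = np.array([[-4., -1.],
              [-1., -4.]])
print(np.linalg.eigvalsh(H))               # [-5. -3.]: all negative, so H is negative definite
```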
5.6.1 Nonnegative Definite Matrices
1. If $A$ is nonnegative definite, then $|A| \ge 0$.
2. If $A$ is positive definite, so is $A^{-1}$.
3. If $A$ is $n\times k$ with full rank and $n > k$, then $A'A$ is positive definite and $AA'$ is nonnegative definite.
Proof: Since $A$ has full column rank, $A\mathbf{c} = \mathbf{a}_1c_1 + \mathbf{a}_2c_2 + \dots + \mathbf{a}_kc_k \neq \mathbf{0}$ for any nonzero $\mathbf{c}$, where $\mathbf{a}_i$, $i = 1, \dots, k$, is the $i$th column of $A$. Therefore
$$\mathbf{c}'A'A\mathbf{c} = (A\mathbf{c})'(A\mathbf{c}) = \mathbf{y}'\mathbf{y} = \sum_i y_i^2 > 0.$$
Meanwhile, a nonzero solution may exist to the equation $A'\mathbf{c} = \mathbf{0}$, since $A'\mathbf{c} = \mathbf{a}_1'c_1 + \mathbf{a}_2'c_2 + \dots + \mathbf{a}_n'c_n$, where $\mathbf{a}_j'$, $j = 1, \dots, n$, is the $j$th column of $A'$; with $n > k$ these $n$ columns of $A'$ lie in $\mathbb{R}^k$ and cannot all be linearly independent. Hence $\mathbf{c}'AA'\mathbf{c} \ge 0$ but is not necessarily strictly positive.

5.6.2 Idempotent Quadratic Forms
A quadratic form $\mathbf{c}'A\mathbf{c}$ is called an idempotent quadratic form when $A$ is a symmetric idempotent matrix.
1. Every symmetric idempotent matrix is nonnegative definite.
2. If $A$ is symmetric and idempotent, $n\times n$ with rank $j$, then every quadratic form in $A$ can be written as $\mathbf{c}'A\mathbf{c} = \sum_{i=1}^{j} y_i^2$, where $\mathbf{y} = X'\mathbf{c}$.
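A brief illustration of these properties (added here, not part of the original notes), using the idempotent matrix $M^0$ from Section 2.6: its eigenvalues are 0 and 1, its rank equals its trace, and its quadratic form is a sum of squared deviations.

```python
import numpy as np

n = 5
i = np.ones(n)
M0 = np.eye(n) - np.outer(i, i) / n               # the symmetric idempotent matrix from Section 2.6

print(np.allclose(M0 @ M0, M0))                   # idempotent: M0 M0 = M0
print(np.round(np.linalg.eigvalsh(M0), 10))       # eigenvalues are 0 and 1
print(np.linalg.matrix_rank(M0), np.trace(M0))    # rank equals trace (= n - 1 = 4)

c = np.array([3., 1., 4., 1., 5.])
print(np.allclose(c @ M0 @ c, ((c - c.mean())**2).sum()))  # c'M0c is a sum of j squared terms
```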
5.6.3 The Triangular Factorization of a Positive Definite Symmetric Matrix
Theorem: Any positive definite symmetric $n\times n$ matrix $\Omega$ has a unique representation of the form
$$\Omega = ADA',$$
where $A$ is a lower triangular matrix with 1s along the principal diagonal,
$$A = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ a_{21} & 1 & 0 & \dots & 0 \\ a_{31} & a_{32} & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & 1 \end{bmatrix},$$
and $D$ is a diagonal matrix,
$$D = \begin{bmatrix} d_{11} & 0 & 0 & \dots & 0 \\ 0 & d_{22} & 0 & \dots & 0 \\ 0 & 0 & d_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & d_{nn} \end{bmatrix},$$
where $d_{ii} > 0$ for all $i$. This is known as the triangular factorization of $\Omega$.
Proof: Consider a positive definite symmetric matrix $\Omega = [\Omega_{ij}]$. Our goal is to transform $\Omega$ into a diagonal matrix. This can be accomplished in a first step by transforming $\Omega$ into a matrix with zeros everywhere in the first row and first column except for the $(1,1)$ element. This set of operations is $\Omega$ premultiplied by $E_1$ and postmultiplied by $E_1'$; the result is
$$E_1\Omega E_1' = H, \qquad (9)$$
where
$$E_1 = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ -\Omega_{21}\Omega_{11}^{-1} & 1 & 0 & \dots & 0 \\ -\Omega_{31}\Omega_{11}^{-1} & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\Omega_{n1}\Omega_{11}^{-1} & 0 & 0 & \dots & 1 \end{bmatrix} \qquad (10)$$
and
$$H = \begin{bmatrix} h_{11} & 0 & \dots & 0 \\ 0 & h_{22} & \dots & h_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & h_{n2} & \dots & h_{nn} \end{bmatrix}, \qquad h_{11} = \Omega_{11}, \qquad h_{ij} = \Omega_{ij} - \Omega_{i1}\Omega_{11}^{-1}\Omega_{1j} \ \text{for } i, j \ge 2.$$
The matrix $E_1$ always exists, provided that $\Omega_{11} \neq 0$. This is ensured in the present case, because $\Omega_{11}$ is equal to $\mathbf{e}_1'\Omega\mathbf{e}_1$, where $\mathbf{e}_1' = [1\ 0\ 0\ \dots\ 0]$; since $\Omega$ is positive definite, $\mathbf{e}_1'\Omega\mathbf{e}_1$ must be greater than zero.
(Hint: To see this, consider $n = 3$ for example. $E_1$ is constructed so that the first column of $E_1\Omega$ is zero below the $(1,1)$ element, i.e.
$$E_1\Omega = \begin{bmatrix} 1 & 0 & 0 \\ -\Omega_{21}\Omega_{11}^{-1} & 1 & 0 \\ -\Omega_{31}\Omega_{11}^{-1} & 0 & 1 \end{bmatrix}\begin{bmatrix} \Omega_{11} & \Omega_{12} & \Omega_{13} \\ \Omega_{21} & \Omega_{22} & \Omega_{23} \\ \Omega_{31} & \Omega_{32} & \Omega_{33} \end{bmatrix} = \begin{bmatrix} \Omega_{11} & \Omega_{12} & \Omega_{13} \\ 0 & \Omega_{22} - \Omega_{21}\Omega_{11}^{-1}\Omega_{12} & \Omega_{23} - \Omega_{21}\Omega_{11}^{-1}\Omega_{13} \\ 0 & \Omega_{32} - \Omega_{31}\Omega_{11}^{-1}\Omega_{12} & \Omega_{33} - \Omega_{31}\Omega_{11}^{-1}\Omega_{13} \end{bmatrix}.$$
It is then easy to see, using the symmetry of $\Omega$, that
$$E_1\Omega E_1' = \begin{bmatrix} \Omega_{11} & 0 & 0 \\ 0 & \Omega_{22} - \Omega_{21}\Omega_{11}^{-1}\Omega_{12} & \Omega_{23} - \Omega_{21}\Omega_{11}^{-1}\Omega_{13} \\ 0 & \Omega_{32} - \Omega_{31}\Omega_{11}^{-1}\Omega_{12} & \Omega_{33} - \Omega_{31}\Omega_{11}^{-1}\Omega_{13} \end{bmatrix}.)$$
We next proceed in exactly the same way with the second row and second column of $H$. This set of operations is $H$ premultiplied by $E_2$ and postmultiplied by $E_2'$; the result is
$$E_2HE_2' = K, \qquad (11)$$
where
$$E_2 = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & -h_{32}h_{22}^{-1} & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & -h_{n2}h_{22}^{-1} & 0 & \dots & 1 \end{bmatrix} \qquad (12)$$
and
$$K = \begin{bmatrix} h_{11} & 0 & 0 & \dots & 0 \\ 0 & h_{22} & 0 & \dots & 0 \\ 0 & 0 & k_{33} & \dots & k_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & k_{n3} & \dots & k_{nn} \end{bmatrix}, \qquad k_{ij} = h_{ij} - h_{i2}h_{22}^{-1}h_{2j} \ \text{for } i, j \ge 3.$$
The matrix $E_2$ always exists, provided that $h_{22} \neq 0$. But $h_{22}$ can be calculated as $h_{22} = \mathbf{e}_2'H\mathbf{e}_2$, where $\mathbf{e}_2' = [0\ 1\ 0\ \dots\ 0]$. Moreover, $H = E_1\Omega E_1'$, where $\Omega$ is positive definite and $E_1$ is given by (10). Since $E_1$ is lower triangular, its determinant is the product of the terms along the principal diagonal, which are all unity. Thus $E_1$ is nonsingular, meaning that $H = E_1\Omega E_1'$ is positive definite, and so $h_{22} = \mathbf{e}_2'H\mathbf{e}_2$ must be strictly positive. Thus the matrix in (11) can always be calculated.
Proceeding through each of the columns and rows with the same approach, we see that for any positive definite symmetric matrix $\Omega$ there exist matrices $E_1, E_2, \dots, E_{n-1}$ such that
$$E_{n-1}\cdots E_2E_1\,\Omega\,E_1'E_2'\cdots E_{n-1}' = D, \qquad (13)$$
where
$$D = \mathrm{diag}\left(\Omega_{11},\ \Omega_{22} - \Omega_{21}\Omega_{11}^{-1}\Omega_{12},\ h_{33} - h_{32}h_{22}^{-1}h_{23},\ \dots,\ c_{nn} - c_{n,n-1}c_{n-1,n-1}^{-1}c_{n-1,n}\right),$$
with $c_{ij}$ denoting the entries of the matrix obtained after the first $n-2$ steps, and with all the diagonal entries of $D$ strictly positive. In general, $E_j$ is a matrix with nonzero values in the $j$th column below the principal diagonal, 1s along the principal diagonal, and zeros everywhere else. Thus each $E_j$ is lower triangular with unit determinant. Hence $E_j^{-1}$ exists, and the following matrix exists:
$$A = (E_{n-1}\cdots E_2E_1)^{-1} = E_1^{-1}E_2^{-1}\cdots E_{n-1}^{-1}.$$
If (13) is premultiplied by $A$ and postmultiplied by $A'$, the result is
$$\Omega = ADA',$$
where $A$ is a lower triangular matrix with 1s along the principal diagonal, from the facts that the product of lower triangular matrices is also lower triangular and the inverse of a lower triangular matrix is also lower triangular.

5.6.4 The Cholesky Factorization
A closely related factorization of a symmetric positive definite matrix $\Omega$ is obtained as follows. Define $D^{1/2}$ to be the $n\times n$ diagonal matrix whose diagonal entries are the square roots of the corresponding elements of the matrix $D$ in the triangular factorization:
$$D^{1/2} = \begin{bmatrix} \sqrt{d_{11}} & 0 & 0 & \dots & 0 \\ 0 & \sqrt{d_{22}} & 0 & \dots & 0 \\ 0 & 0 & \sqrt{d_{33}} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \sqrt{d_{nn}} \end{bmatrix}.$$
Since the matrix $D$ is unique and has strictly positive diagonal entries, the matrix $D^{1/2}$ exists and is unique. Then the triangular factorization can be written
$$\Omega = AD^{1/2}D^{1/2}A' = AD^{1/2}(AD^{1/2})'$$
or
$$\Omega = PP', \qquad (14)$$
where
$$P \equiv AD^{1/2} = \begin{bmatrix} 1 & 0 & \dots & 0 \\ a_{21} & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & 1 \end{bmatrix}\begin{bmatrix} \sqrt{d_{11}} & 0 & \dots & 0 \\ 0 & \sqrt{d_{22}} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sqrt{d_{nn}} \end{bmatrix} = \begin{bmatrix} \sqrt{d_{11}} & 0 & \dots & 0 \\ a_{21}\sqrt{d_{11}} & \sqrt{d_{22}} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}\sqrt{d_{11}} & a_{n2}\sqrt{d_{22}} & \dots & \sqrt{d_{nn}} \end{bmatrix}.$$
Expression (14) is known as the Cholesky factorization of $\Omega$.
Exercise: Let
$$\Omega = \begin{bmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{bmatrix}.$$
Find lower triangular matrices $P$ and $A$, and a diagonal matrix $D$, such that $\Omega = PP'$ (Cholesky factorization) and $\Omega = ADA'$ (triangular factorization).
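The sketch below (an addition to the notes, deliberately using a different positive definite example matrix so as not to give away the exercise) computes the Cholesky factor with NumPy and then recovers the triangular factorization $\Omega = ADA'$ from it via $P = AD^{1/2}$.

```python
import numpy as np

# A symmetric positive definite example matrix (not the exercise matrix)
Omega = np.array([[9., 3., 0.],
                  [3., 5., 2.],
                  [0., 2., 6.]])

P = np.linalg.cholesky(Omega)          # lower triangular with Omega = P P'
print(np.allclose(Omega, P @ P.T))     # True

# Recover the triangular factorization Omega = A D A' from P = A D^{1/2}
d_sqrt = np.diag(P)                    # the square roots of the d_ii
A = P / d_sqrt                         # divide each column by its diagonal entry -> unit diagonal
D = np.diag(d_sqrt ** 2)
print(np.allclose(Omega, A @ D @ A.T)) # True
print(np.allclose(np.diag(A), 1.0))    # A has 1s on the principal diagonal
```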
6 Calculus and Matrix Algebra
1. Differentiating a scalar with respect to a column vector yields a column vector. That is, for $y = f(x_1, x_2, \dots, x_n) = f(\mathbf{x})$,
$$\frac{\partial f(\mathbf{x})}{\partial\mathbf{x}} = \begin{bmatrix} \partial f(\mathbf{x})/\partial x_1 \\ \partial f(\mathbf{x})/\partial x_2 \\ \vdots \\ \partial f(\mathbf{x})/\partial x_n \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix} \equiv \text{the gradient vector}.$$
Example 1: A linear function is $y = a_1x_1 + a_2x_2 + \dots + a_nx_n = \mathbf{a}'\mathbf{x}$. Then
$$\frac{\partial\mathbf{a}'\mathbf{x}}{\partial\mathbf{x}} = \begin{bmatrix} \partial\mathbf{a}'\mathbf{x}/\partial x_1 \\ \partial\mathbf{a}'\mathbf{x}/\partial x_2 \\ \vdots \\ \partial\mathbf{a}'\mathbf{x}/\partial x_n \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \mathbf{a}.$$
Example 2: A quadratic form is $y = \mathbf{x}'A\mathbf{x}$. Then
$$\frac{\partial\mathbf{x}'A\mathbf{x}}{\partial\mathbf{x}} = \begin{cases} 2A\mathbf{x} & \text{when } A \text{ is symmetric;} \\ (A + A')\mathbf{x} & \text{when } A \text{ is not symmetric.} \end{cases}$$
2. Differentiating a column vector with respect to a row vector yields a matrix. In particular, differentiating the gradient of $y = f(\mathbf{x})$ with respect to $\mathbf{x}'$ gives
$$\frac{\partial\left[\partial f(\mathbf{x})/\partial\mathbf{x}\right]}{\partial\mathbf{x}'} = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \dots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \dots & \partial f_2/\partial x_n \\ \vdots & \vdots & & \vdots \\ \partial f_n/\partial x_1 & \partial f_n/\partial x_2 & \dots & \partial f_n/\partial x_n \end{bmatrix} = \begin{bmatrix} f_{11} & f_{12} & \dots & f_{1n} \\ f_{21} & f_{22} & \dots & f_{2n} \\ \vdots & \vdots & & \vdots \\ f_{n1} & f_{n2} & \dots & f_{nn} \end{bmatrix} \equiv \text{the Hessian matrix}.$$
Example: A set of linear functions is $\mathbf{y} = A\mathbf{x}$. It follows that $y_i = \mathbf{a}_i\mathbf{x}$, where $\mathbf{a}_i$ is the $i$th row of $A$. Therefore
$$\frac{\partial y_i}{\partial\mathbf{x}} = \mathbf{a}_i', \qquad \text{or} \qquad \frac{\partial y_i}{\partial\mathbf{x}'} = \mathbf{a}_i.$$
Collecting all the elements, we have
$$\frac{\partial A\mathbf{x}}{\partial\mathbf{x}'} = \begin{bmatrix} \partial y_1/\partial\mathbf{x}' \\ \partial y_2/\partial\mathbf{x}' \\ \vdots \\ \partial y_n/\partial\mathbf{x}' \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{bmatrix} = A.$$
3. Differentiating a scalar with respect to an $n\times n$ matrix yields an $n\times n$ matrix:
a. $\dfrac{\partial\mathbf{x}'A\mathbf{x}}{\partial A} = \mathbf{x}\mathbf{x}'$,
b. $\dfrac{\partial\ln|A|}{\partial A} = (A^{-1})'$.
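As a closing illustration (added here, not part of the original notes), the sketch below checks the derivative rules for $\mathbf{a}'\mathbf{x}$, $\mathbf{x}'A\mathbf{x}$, and $\ln|A|$ against central finite-difference approximations at arbitrary random points.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
a = rng.standard_normal(n)
A = rng.standard_normal((n, n))
A = A + A.T                                 # make A symmetric
x = rng.standard_normal(n)

def num_grad(f, x, h=1e-6):
    # central finite-difference approximation to the gradient of a scalar function
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# d(a'x)/dx = a
print(np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-5))

# d(x'Ax)/dx = 2Ax for symmetric A
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-4))

# d ln|A| / dA = (A^{-1})', checked element by element on a positive definite matrix
S = A @ A.T + n * np.eye(n)                 # positive definite, so ln|S| is well defined
G = np.zeros_like(S)
h = 1e-6
for i in range(n):
    for j in range(n):
        E = np.zeros_like(S)
        E[i, j] = h
        G[i, j] = (np.log(np.linalg.det(S + E)) - np.log(np.linalg.det(S - E))) / (2 * h)
print(np.allclose(G, np.linalg.inv(S).T, atol=1e-4))
```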