Ch. 1 Matrix Algebra
1 Some Terminology
A matrix is a rectangular array of numbers, denoted
A_{i×k} = [a_{ik}] =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ik}
\end{bmatrix},
where a subscripted element of a matrix is always read as a_{row,column}. Here we
confine the elements to be real numbers.
A vector is a matrix with one row or one column. A row vector is A_{1×k} and a
column vector is A_{i×1}, commonly denoted a′_k and a_i, respectively. In the
remainder of this course we follow the usual convention that a vector is a column
vector unless otherwise mentioned.
The dimension of a matrix is the number of rows and columns it contains.
If i equals k, then A is a square matrix. Several particular types of square
matrices occur in econometrics.
(1). A symmetric matrix A is one in which a_{ik} = a_{ki} for all i and k.
(2). A diagonal matrix is a square matrix whose nonzero elements appear only on
the main diagonal, moving from upper left to lower right.
(3). A scalar matrix is a diagonal matrix with the same value in all diagonal
elements.
(4). An identity matrix is a scalar matrix with ones on the diagonal. This matrix
is always denoted I. A subscript is sometimes included to indicate its size. For
example,
I_3 =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}.
(5). A triangular matrix is one that has only zeros either above or below the
main diagonal.
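As an aside, a minimal numpy sketch (an illustrative choice of language, not required by the notes) showing how these special square matrices can be built and checked:

import numpy as np

A = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.5],
              [0.5, 1.5, 4.0]])      # a symmetric matrix
print(np.allclose(A, A.T))           # True: a_ik = a_ki for all i and k

D  = np.diag([1.0, 2.0, 3.0])        # a diagonal matrix
S  = 5.0 * np.eye(3)                 # a scalar matrix (5 in every diagonal element)
I3 = np.eye(3)                       # the identity matrix I_3
L  = np.tril(A)                      # a (lower) triangular matrix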
2 Algebraic Manipulation of Matrices
2.1 Equality of Matrices
Matrices A and B are equal if and only if they have the same dimensions and
each element of A equals the corresponding element of B:
A = B if and only if a_{ik} = b_{ik} for all i and k.
2.2 Transposition
The transpose of a matrix A, denoted A′, is obtained by creating the matrix
whose kth row is the kth column of the original matrix. If A is i×k, A′ is k×i.
For example,
A =
\begin{bmatrix}
1 & 2 & 3 \\
5 & 1 & 5 \\
6 & 4 & 5 \\
3 & 1 & 4
\end{bmatrix},
\qquad
A′ =
\begin{bmatrix}
1 & 5 & 6 & 3 \\
2 & 1 & 4 & 1 \\
3 & 5 & 5 & 4
\end{bmatrix}.
If A is symmetric, A = A′. It is also apparent that for any matrix A, (A′)′ = A.
Finally, the transpose of a column vector a_i is a row vector:
a′_i = [a_1  a_2  . . .  a_i].
2.3 Matrix Addition
Matrices cannot be added unless they have the same dimension. The operation
of addition is extended to matrices by defining
C = A + B = [a_{ik} + b_{ik}].
We also extend the operation of subtraction to matrices precisely as if they
were scalars by performing the operation element by element. Thus,
C = A − B = [a_{ik} − b_{ik}].
It follows that matrix addition is commutative,
A+B = B+A,
and associative,
(A+B) +C = A+ (B+C),
and that
(A+B)′ = A′ +B′.
2.4 Matrix Multiplication
Matrices are multiplied by using the inner product. The inner product of two
n×1 vectors, a and b, is a scalar and is written
a′b = a_1 b_1 + a_2 b_2 + ... + a_n b_n = b′a.
For an n×k matrix A and a k×T matrix B, the product matrix,
C = AB,
is an n×T matrix whose ikth element is the inner product of row i of A and
column k of B. Generally, AB ≠ BA.
The product of a matrix and a vector is a vector and is written as
c = Ab = b_1 a_1 + b_2 a_2 + ... + b_k a_k,
where b_i is the ith element of the vector b and a_i is the ith column of the matrix A.
Here we see that the right-hand side is a linear combination of the columns of the
matrix, where the coefficients are the elements of the vector.
In the calculation of a matrix product C = A_{n×k}B_{k×T}, it can be written as
C = AB = [Ab_1  Ab_2  . . .  Ab_T],
where b_i is the ith column of the matrix B.
Some general rules for matrix multiplication are as follows:
Associative law: (AB)C = A(BC).
Distributive law: A(B+C) = AB+AC.
Transpose of a product: (AB)′ = B′A′.
Scalar multiplication: αA = [αa_{ik}] for a scalar α.
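A short numpy illustration of these rules, and of the product Ab as a linear combination of the columns of A (a sketch with randomly generated matrices):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
b = rng.normal(size=4)

# Ab equals the linear combination b_1 a_1 + ... + b_k a_k of the columns of A
lin_comb = sum(b[i] * A[:, i] for i in range(4))
print(np.allclose(A @ b, lin_comb))        # True

# transpose of a product: (AB)' = B'A'
print(np.allclose((A @ B).T, B.T @ A.T))   # True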
2.5 Matrix Inversion
Definition:
A square matrix A is said to be nonsingular or invertible if there exists a unique
square matrix B such that AB = BA = I. The matrix B is said to be a
multiplicative inverse of A.
We will refer to the multiplicative inverse of a nonsingular matrix A simply as
the inverse of A and denote it by A^{-1}.
Some computational results involving inverses are
|A^{-1}| = 1/|A|,
(A^{-1})^{-1} = A,
(A^{-1})′ = (A′)^{-1},
(AB)^{-1} = B^{-1}A^{-1},
when both inverse matrices exist. Finally, if A is symmetric, then A^{-1} is also
symmetric.
Lemma:
Suppose that A, B and A+B are all m×m nonsingular matrices. Then
(A+B)^{-1} = A^{-1} − A^{-1}(B^{-1} + A^{-1})^{-1}A^{-1}.
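A numerical spot check of the lemma (an illustrative sketch; the diagonal shift is only there to keep the randomly drawn matrices safely nonsingular):

import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.normal(size=(m, m)) + m * np.eye(m)   # assumed nonsingular
B = rng.normal(size=(m, m)) + m * np.eye(m)   # assumed nonsingular

inv = np.linalg.inv
lhs = inv(A + B)
rhs = inv(A) - inv(A) @ inv(inv(B) + inv(A)) @ inv(A)
print(np.allclose(lhs, rhs))                  # True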
2.6 A useful idempotent matrix
Definition:
An idempotent matrix is one that is equal to its square, that is, M² = MM = M.
A useful idempotent matrix that we will often encounter is the matrix M^0 = I − (1/n) i i′,
such that
M^0 x =
\begin{bmatrix}
x_1 − \bar{x} \\
x_2 − \bar{x} \\
\vdots \\
x_n − \bar{x}
\end{bmatrix},
where i is a column vector of ones, x = [x_1, x_2, ..., x_n]′ and \bar{x} = (1/n) \sum_{i=1}^{n} x_i.
Proof.
By definition,
M^0 x = (I − (1/n) i i′)x = x − (1/n) i i′x = x − i\bar{x}.
Exercise:
Use the idempotent matrix M^0 to calculate \sum_{j=1}^{200}(j − \bar{j})², where \bar{j} = (1/200)\sum_{j=1}^{200} j, in GAUSS.
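The exercise asks for GAUSS code; as a cross-check, here is a hedged numpy sketch of the same computation:

import numpy as np

n = 200
x = np.arange(1, n + 1, dtype=float)      # x = (1, 2, ..., 200)'
i = np.ones(n)
M0 = np.eye(n) - np.outer(i, i) / n       # M0 = I - (1/n) i i'

dev = M0 @ x                              # the vector of deviations from the mean
print(dev @ dev)                          # 666650.0, the sum of squared deviations
print(np.sum((x - x.mean()) ** 2))        # the same value, computed directly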
2.7 Trace of Matrix
The trace of a square k×k matrix is the sum of its diagonal elements:
tr(A) = \sum_{i=1}^{k} a_{ii}.
Some useful results are:
1. tr(cA) = c·tr(A),
2. tr(A) = tr(A′),
3. tr(A+B) = tr(B) + tr(A),
4. tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC).
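Result 4 (invariance of the trace under cyclic permutations) is easy to verify numerically; a small numpy sketch with conformable random matrices:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))
D = rng.normal(size=(5, 2))

t = np.trace(A @ B @ C @ D)
print(np.allclose(t, np.trace(B @ C @ D @ A)))   # True
print(np.allclose(t, np.trace(C @ D @ A @ B)))   # True
print(np.allclose(t, np.trace(D @ A @ B @ C)))   # True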
3 Geometry of Matrices
Matrix algebra is extremely useful in the formulation and solution of sets of linear
equations. At the same time, the algebraic results have a geometrical basis that
is very helpful in understanding the calculation. It is helpful to digress to a
geometric treatment of matrices and vectors.
3.1 Vector Spaces
3.1.1 Euclidean Vector Space
Perhaps the most elementary vector spaces are the Euclidean vector spaces R^n, n = 1, 2, ....
For simplicity, let us first consider R². A nonzero vector in R² can be
represented geometrically by a directed line segment. Given a nonzero vector
x = [x_1, x_2]′, we can associate it with the line segment in the plane from (0,0) to
(x_1, x_2). If we equate line segments that have the same length and direction, x can
be represented by any line segment from (a,b) to (a + x_1, b + x_2). For example,
the vector x = [2, 1]′ in R² could be represented by the directed line segment
from (2,2) to (4,3), or from (−1,−1) to (1,0).
We can think of the Euclidean length of a vector x = [x_1, x_2]′ as the length of
any directed line segment representing x. The length of the segment from (0,0)
to (x_1, x_2) is \sqrt{x_1^2 + x_2^2}.
Two basic operations are defined for vectors: scalar multiplication and addition.
The geometric representation will help us visualize how these operations work in R².
(1). Scalar multiplication: For each vector x = [x_1, x_2]′ and each scalar α, the
product αx is defined by
αx = [αx_1, αx_2]′.
Geometrically, the set of all possible scalar multiples of x is the line through 0 and
x, and any scalar multiple of x is a segment of this line.
Example:
x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad
x^* = 2x = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \qquad
x^{**} = −\tfrac{1}{2}x = \begin{bmatrix} −1/2 \\ −1 \end{bmatrix}.
The vector x^* (= 2x) is in the same direction as x, but its length is two times
that of x. The vector x^{**} (= −½x) has half the length of x but points in the
opposite direction.
(2). Addition: The sum of two vectors a and b is a third vector whose coordinates
are the sums of the corresponding coordinates of a and b. For example,
c = a + b =
\begin{bmatrix} 1 \\ 2 \end{bmatrix} +
\begin{bmatrix} 2 \\ 1 \end{bmatrix} =
\begin{bmatrix} 3 \\ 3 \end{bmatrix}.
Geometrically, c is obtained by moving in the distance and direction defined by
b from the tip of a or, because addition is commutative, from the tip of b in the
distance and direction of a.
In a similar manner, vectors in R³ can be represented by directed line segments
in 3-space. A vector in R^n can be viewed as the coordinates of a point in an
n-dimensional space, or as the definition of the line segment connecting the origin
and this point.
In general, scalar multiplication and addition in Rn are defined by
αx =
\begin{bmatrix}
αx_1 \\
αx_2 \\
\vdots \\
αx_n
\end{bmatrix}
\qquad and \qquad
x + y =
\begin{bmatrix}
x_1 + y_1 \\
x_2 + y_2 \\
\vdots \\
x_n + y_n
\end{bmatrix}
for any x and y ∈ Rn and any scalar α.
3.1.2 Vector Space Axiom
Definition:
Let V be a set on which the operations of addition and scalar multiplication are
defined. By this we mean that
(1). If x,y ∈ V, then x+y ∈ V,
(2). If x ∈ V and α is a scalar, then αx ∈ V.
The set V together with the operations of addition and scalar multiplication is
said to form a vector space if the following axioms are satisfied:
(a). x+y = y +x for any x and y in V.
(b). (x+y) +z = x+ (y +z).
(c). There exists an element 0 in V such that x + 0 = x for each x ∈ V.
(d). For each x ∈ V, there exists an element −x ∈ V such that x + (−x) = 0.
(e). α(x+y) = αx+ αy for each real number α and any x and y in V.
(f). (α + β)x = αx+ βx for any real number α and β and any x ∈ V.
(g). (αβ)x = α(βx) for any real number α and β and any x ∈ V.
(h). 1·x = x for all x ∈ V.
Example:
The two-dimensional plane is the set of all vectors with two real-valued coordinates.
We label this set R². It has two important properties.
(1). R2 is closed under scalar multiplication; every scalar multiple of a vector in
the plane is also in the plane.
(2). R2 is closed under addition; the sum of any two vectors is always a vector in
the plane.
3.2 Linear Combination of Vectors and Basis Vectors
Definition:
A set of vectors in a vector space is a basis for that vector space if any vector in the
vector space can be written as a linear combination of them.
Example:
Any pair of two-dimensional vectors that point in different directions will form
a basis for R².
Proof:
Consider an arbitrary set of vectors in R², a, b, and c. If a and b are a basis, we
can find numbers α_1 and α_2 such that c = α_1 a + α_2 b. Let
a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \quad and \quad
c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.
Then
c1 = α1a1 + α2b1,
c2 = α1a2 + α2b2.
The solutions to this pair of equations are
α_1 = (b_2 c_1 − b_1 c_2)/(a_1 b_2 − b_1 a_2),   (1)
α_2 = (a_1 c_2 − a_2 c_1)/(a_1 b_2 − b_1 a_2).   (2)
This gives a unique solution unless (a_1 b_2 − b_1 a_2) = 0. If (a_1 b_2 − b_1 a_2) = 0, then
a_1/a_2 = b_1/b_2, which means that b is just a multiple of a. This returns us to our
original condition, that a and b point in different directions. The implication is
that if a and b are any pair of vectors for which the denominator in (1) and (2) is not
zero, then any other vector c can be formed as a unique linear combination of a
and b. The basis of a vector space is not unique, since any set of vectors that
satisfies the definition will do. But for any particular basis, there is only one linear
combination of them that will produce another particular vector in the vector
space.
3.3 Linear Dependence
As the preceding discussion should suggest, k vectors are required to form a basis for R^k.
However, not every set of k vectors will suffice. As we will see, to form a basis
we require these k vectors to be linearly independent.
Definition:
A set of vectors is linearly dependent if any one of the vectors in the set can be
written as a linear combination of the others.
Definition:
The vectors v_1, v_2, ..., v_n in a vector space V are said to be linearly independent
if and only if the solution to
c1v1 + c2v2 + ... + cnvn = 0
is
c1 = c2 = ... = cn = 0.
Example:
The vectors \begin{bmatrix} 1 \\ 1 \end{bmatrix} and \begin{bmatrix} 1 \\ 2 \end{bmatrix} are linearly independent, since if
c_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
then
c1 + c2 = 0
c1 + 2c2 = 0
and the only solution to this system is c1 = c2 = 0.
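A quick numpy check of this example (illustrative only): the matrix whose columns are the two vectors has full column rank, so the homogeneous system has only the trivial solution.

import numpy as np

V = np.array([[1.0, 1.0],
              [1.0, 2.0]])                # columns are the two vectors
print(np.linalg.matrix_rank(V))           # 2: the columns are linearly independent

c = np.linalg.solve(V, np.zeros(2))       # solve V c = 0
print(c)                                  # [0. 0.]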
3.4 Subspace
Definition:
The set of all linear combinations of a set of vectors is the vector space that is
spanned by those vectors.
Example:
Rk = Span(v1,...,vk) for a basis (v1,...,vk).
We now consider what happens to the vector space that is spanned by linearly
dependent vectors.
Definition:
If S is a nonempty subset of a vector space V, and S satisfies the following
conditions:
(1). αx ∈ S whenever x ∈ S, for any scalar α,
(2). x + y ∈ S whenever x ∈ S and y ∈ S,
then S is said to be a subspace of V.
Example:
S = Span\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) is a one-dimensional subspace in R², since α \begin{bmatrix} 1 \\ 2 \end{bmatrix} ∈ S and
α \begin{bmatrix} 1 \\ 2 \end{bmatrix} + β \begin{bmatrix} 1 \\ 2 \end{bmatrix} = (α + β) \begin{bmatrix} 1 \\ 2 \end{bmatrix} ∈ S.
Therefore, the space spanned by a set of vectors in R^k has at most k dimensions.
If this space has fewer than k dimensions, it is a subspace, or hyperplane.
But the important point in the preceding discussion is that every set of vectors
spans some space; it may be the entire space in which the vectors reside, or some
subspace of it.
Exercise:
Let S = {(x1,x2,x3)′|x1 = x2}. Show that S is a subspace of R3.
3.5 Rank of a Matrix
If A is an m×n matrix, each row of A is an n-tuple of real numbers and hence
can be considered as a vector in R1×n. The m vectors corresponding to the rows
of A will be referred to as the row vectors of A. Similarly, each column of A can
be considered as a vector in Rm and one can associate n column vectors with the
matrix A.
Definition:
If A is an m×n matrix, the subspace of R1×n spanned by the row vectors of A
is called the row space of A. The subspace of Rm spanned by the column vectors
of A is called the column space.
Example:
Let
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
The row space of A is the set of all 3-tuples of the form
α(1 0 0) + β(0 1 0) = (α β 0).
The column space of A is the set of all vectors of the form
α \begin{bmatrix} 1 \\ 0 \end{bmatrix} + β \begin{bmatrix} 0 \\ 1 \end{bmatrix} + γ \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
Thus the row space of A is a two-dimensional subspace of R1×3 and the column
space of A is R2.
Theorem:
The column space and the row space of a matrix have the same dimension.
Definition:
The column(row) rank of a matrix is the dimension of the vector space that is
spanned by its columns (rows).
In short, from this definition we know that the column rank is the number of
linearly independent columns of a matrix.
Theorem:
The column rank and row rank of a matrix are equal, that is
rank(A) = rank(A′) ≤ min(number of rows, number of columns).
Definition:
A full (short) rank matrix is a matrix whose rank is equal to (less than) the number
of columns it contains.
Some useful results:
1. rank(AB) ≤ min(rank(A), rank(B)).
Proof:
Let A be m×n and B be n×k. Write AB = [Ab_1  Ab_2  . . .  Ab_k], where b_i is
the ith column of B. That is, each column of AB can be expressed as a linear
combination of the columns of A, so the number of linearly independent columns
in AB cannot be more than the number of linearly independent columns in
A. Thus, rank(AB) ≤ rank(A). Similarly, each row of AB can be expressed
as a linear combination of the rows of B, from which we get rank(AB) ≤ rank(B).
Corollary:
1. For any matrix A,
rank(A) = rank(A′A) = rank(AA′).
2. Let A be an m×n matrix, B an m×m matrix, and C an n×n matrix. Then, if
B and C are nonsingular matrices, it follows that
rank(BAC) = rank(BA) = rank(AC) = rank(A).
Example:
Setting B = I, if A is m×n and C is a square matrix of rank n, then
rank(AC) = rank(A).
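A numpy illustration (not from the notes) of part 1 of the corollary, rank(A) = rank(A′A) = rank(AA′):

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))   # a 5x4 matrix of rank (almost surely) 3

r = np.linalg.matrix_rank
print(r(A), r(A.T @ A), r(A @ A.T))                     # all three are equal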
3.6 Determinant
For a 2×2 matrix A, the area of the parallelogram formed by its columns is given
(up to sign) by the determinant of A, denoted |A|.
Proposition:
The determinant of a matrix is nonzero if and only if it has full rank.
Useful results:
1. |cD| = c^k |D|, for a scalar c and a k×k matrix D.
2. |CD| = |C|·|D| for two square matrices C and D of the same dimension.
3. |C| = |C′|.
4 Partitioned Matrices
It is sometimes useful to group some of the elements in submatrices. For example,
we might write
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.
Note that
A′ = \begin{bmatrix} A′_{11} & A′_{21} \\ A′_{12} & A′_{22} \end{bmatrix}.
A common special case is the block diagonal matrix:
A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix},
where A11 and A22 are square matrices.
4.1 Addition and Multiplication of Partitioned Matrices
For conformably partitioned matrices A and B,
A + B = \begin{bmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{bmatrix}.
That is, for addition, the dimension of Aij and Bij must be the same. However,
AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} \\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} \end{bmatrix}.
That is, for multiplication, the number of columns in Aij must equal the number
of rows in Bjk for all pairs i and j.
Example:
Recall the calculation of the product of a matrix and a vector
c = A_{n×k} b_{k×1}
  = [a_1  a_2  . . .  a_k] \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix}
  = b_1 a_1 + b_2 a_2 + ... + b_k a_k,
and that of a matrix product C = A_{n×k}B_{k×T}:
C = AB
  = A[b_1  b_2  . . .  b_T]
  = [Ab_1  Ab_2  . . .  Ab_T]
  = \begin{bmatrix} a′_1 \\ a′_2 \\ \vdots \\ a′_n \end{bmatrix} B
  = \begin{bmatrix} a′_1 B \\ a′_2 B \\ \vdots \\ a′_n B \end{bmatrix}.
Two cases frequently encountered are of the form
\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}′ \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
= [A′_1  A′_2] \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
= A′_1 A_1 + A′_2 A_2,
and
\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}′ \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}
= \begin{bmatrix} A′_{11}A_{11} & 0 \\ 0 & A′_{22}A_{22} \end{bmatrix}.
4.2 Determinants of Partitioned Matrices
(a). \begin{vmatrix} A_{11} & 0 \\ 0 & A_{22} \end{vmatrix} = |A_{11}|·|A_{22}|.
(b). \begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix}
= |A_{22}|·|A_{11} − A_{12}A_{22}^{-1}A_{21}|
= |A_{11}|·|A_{22} − A_{21}A_{11}^{-1}A_{12}|.
4.3 Inverses of Partitioned Matrices
1. The inverse of a block diagonal matrix is
\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}.
For a general 2×2 partitioned matrix,
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} A_{11}^{-1}(I + A_{12}F_2A_{21}A_{11}^{-1}) & −A_{11}^{-1}A_{12}F_2 \\ −F_2A_{21}A_{11}^{-1} & F_2 \end{bmatrix},
where F_2 = (A_{22} − A_{21}A_{11}^{-1}A_{12})^{-1}.
Exercise:
Show that
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
is also
\begin{bmatrix} F_1 & −F_1A_{12}A_{22}^{-1} \\ −A_{22}^{-1}A_{21}F_1 & A_{22}^{-1}(I + A_{21}F_1A_{12}A_{22}^{-1}) \end{bmatrix},
where F_1 = (A_{11} − A_{12}A_{22}^{-1}A_{21})^{-1}.
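A numerical sanity check of the partitioned-inverse formula with F_2 (an illustrative numpy sketch; the diagonal shift only keeps the random blocks nonsingular):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5)) + 5 * np.eye(5)
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]

inv = np.linalg.inv
F2 = inv(A22 - A21 @ inv(A11) @ A12)
top = np.hstack([inv(A11) @ (np.eye(3) + A12 @ F2 @ A21 @ inv(A11)), -inv(A11) @ A12 @ F2])
bot = np.hstack([-F2 @ A21 @ inv(A11), F2])
print(np.allclose(np.vstack([top, bot]), inv(A)))   # True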
4.4 Kronecker Products
For general matrices A and B, the Kronecker product of them is
A ⊗ B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1k}B \\
a_{21}B & a_{22}B & \cdots & a_{2k}B \\
\vdots & \vdots & & \vdots \\
a_{i1}B & a_{i2}B & \cdots & a_{ik}B
\end{bmatrix}.
Notice that if A is i×k and B is m×n, then A ⊗ B is (im)×(kn).
Useful results:
1. Let A, B, C, and D be any matrices (conformable where products appear). Then
1a. (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),
1b. (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C),
1c. (A ⊗ B)′ = A′ ⊗ B′.
2. Let A be m×m and B be k×k matrices. Then
2a. (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.
Proof:
(A^{-1} ⊗ B^{-1})(A ⊗ B) = (A^{-1}A) ⊗ (B^{-1}B) = I_m ⊗ I_k = I_{mk}.
2b. tr(A ⊗ B) = tr(A)tr(B).
Proof:
tr(A ⊗ B) = \sum_{i=1}^{m} a_{ii} tr(B) = tr(A)tr(B).
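A numpy spot check of results 2a and 2b (illustrative; np.kron computes the Kronecker product):

import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # assumed nonsingular
B = rng.normal(size=(2, 2)) + 2 * np.eye(2)   # assumed nonsingular

K = np.kron(A, B)
print(np.allclose(np.linalg.inv(K), np.kron(np.linalg.inv(A), np.linalg.inv(B))))  # 2a
print(np.isclose(np.trace(K), np.trace(A) * np.trace(B)))                          # 2b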
4.5 The Vec Operator
The operator that transforms a matrix into a vector is known as the vec operator.
If the m×n matrix A has a_i as its ith column, then vec(A) is the mn×1 vector
given by
vec(A) = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.
Useful results:
1. Let a be an m×1 and b an n×1 vector. Then
1a. vec(a) = vec(a′) = a,
1b. vec(ab′) = b ⊗ a.
Proof:
vec(ab′) = vec([b_1 a  b_2 a  . . .  b_n a]) = \begin{bmatrix} b_1 a \\ b_2 a \\ \vdots \\ b_n a \end{bmatrix} = b ⊗ a.
2. Let A and B both be m×n matrices. Then
tr(A′B) = {vec(A)}′vec(B).
Proof:
Let a_1, ..., a_n denote the columns of A and b_1, ..., b_n denote the columns of B. Then
tr(A′B) = \sum_{i=1}^{n} (A′B)_{ii} = \sum_{i=1}^{n} a′_i b_i
= [a′_1  . . .  a′_n] \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
= {vec(A)}′vec(B).
3. Let A, B, and C be matrices of dimension m×n, n×p and p×q, respectively. Then
vec(ABC) = (C′ ⊗ A)vec(B).
4. Let A, B, C, and D be matrices of dimension m×n, n×p, p×q, and q×m, respectively. Then
tr(ABCD) = {vec(A′)}′(D′ ⊗ B)vec(C).
Proof:
Using 2., it follows that
tr(ABCD) = tr{A(BCD)} = {vec(A′)}′vec(BCD),
and using 3. we have that
vec(BCD) = (D′ ⊗ B)vec(C),
so the proof is complete.
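Results 3 and 4 can also be checked numerically; a numpy sketch (here vec stacks columns, i.e. column-major flattening):

import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))
D = rng.normal(size=(5, 2))

def vec(M):
    return M.flatten(order="F")        # stack the columns of M

print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))                       # result 3
print(np.isclose(np.trace(A @ B @ C @ D), vec(A.T) @ np.kron(D.T, B) @ vec(C)))    # result 4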
Exercise:
Let
Π = \begin{bmatrix}
Π_1 & 0 & \cdots & 0 \\
0 & Π_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & Π_N
\end{bmatrix},
where Π_i is a k×p_i matrix. Find an expression for the matrix A such that
vec(Π′) = A · \begin{bmatrix} vec(Π′_1) \\ \vdots \\ vec(Π′_N) \end{bmatrix}.
5 Characteristic Roots and Vectors
5.1 Eigenvalues, Eigenvectors, and Eigenspaces
If A is an n×n matrix, then any scalar λ satisfying the equation
Ax = λx,   (3)
for some n×1 vector x ≠ 0, is called an eigenvalue of A. The vector x is
called an eigenvector of A corresponding to the eigenvalue λ. Equation (3) can be
equivalently expressed as
(A − λI)x = 0.
Notice that if |A − λI| ≠ 0, then (A − λI)^{-1} would exist, and so premultiplication
of this equation by this inverse would lead to a contradiction of the already stated
assumption that x ≠ 0. Thus, any eigenvalue λ must satisfy the determinantal
equation
|A − λI| = 0,   (4)
which is known as the eigenvalue-eigenvector (or characteristic) equation of A.
Example:
Let A = \begin{bmatrix} 3 & 2 \\ 3 & −2 \end{bmatrix}. Then the eigenvalues of A are the solutions to
\begin{vmatrix} 3−λ & 2 \\ 3 & −2−λ \end{vmatrix} = 0.
Thus, the eigenvalues of A are λ_1 = 4 and λ_2 = −3.
To find the eigenvectors belonging to λ_1 = 4, we solve (A − 4I)x = 0 to get
x_1 = \begin{bmatrix} 2x_2 \\ x_2 \end{bmatrix}. Thus any nonzero multiple of \begin{bmatrix} 2 \\ 1 \end{bmatrix} is an eigenvector belonging
to λ_1, and \begin{bmatrix} 2 \\ 1 \end{bmatrix} is a basis for the eigenspace corresponding to λ_1. Similarly, any
nonzero multiple of \begin{bmatrix} −1 \\ 3 \end{bmatrix} is an eigenvector belonging to λ_2.
From the example above, we see that eigenvectors are not uniquely defined.
To remove the indeterminacy, we always (but not necessarily) impose the scale
constraint that
x′_i x_i = 1.
It is worth noting that a real matrix may have complex eigenvalues and eigenvectors.
Exercise:
Find the eigenvalues and eigenvectors of the matrix A = \begin{bmatrix} 1 & 1 \\ −2 & −1 \end{bmatrix}.
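For the worked example above (not the exercise), np.linalg.eig reproduces the eigenvalues and the relation AX = XΛ; an illustrative sketch:

import numpy as np

A = np.array([[3.0, 2.0],
              [3.0, -2.0]])
lam, X = np.linalg.eig(A)
print(lam)                                      # 4 and -3 (order not guaranteed)
print(np.allclose(A @ X, X @ np.diag(lam)))     # True: AX = XΛ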
5.2 Symmetric Matrices
We have seen that a matrix may have complex eigenvalues even when the matrix
itself is real. This is not the case for symmetric matrices.
Theorem:
Let A be an n × n real symmetric matrix. Then the eigenvalues of A are real,
and corresponding to any eigenvalue there exist eigenvectors that are real.
Proof:
Let λ = α + iβ be an eigenvalue of A and x = y + iz a corresponding eigenvector,
where i = √−1. We will first show that β = 0. Substitution of these expressions
for λ and x into the eigenvalue-eigenvector equation (3) yields
A(y + iz) = (α + iβ)(y + iz).   (5)
Premultiplying (5) by (y − iz)′ we get
(y − iz)′A(y + iz) = (α + iβ)(y − iz)′(y + iz),
which simplifies to
y′Ay + z′Az = (α + iβ)(y′y + z′z),   (6)
since y′Az = z′Ay follows from the symmetry of A. Now x ≠ 0 implies that
(y′y + z′z) > 0 and, consequently, we must have β = 0, since the left-hand side
of (6) is real. Substituting β = 0 in (5), we find that
Ay + iAz = αy + iαz.
Thus, x = y + iz will be an eigenvector of A corresponding to λ = α as long as y
and z satisfy Ay = αy, Az = αz, and at least one is not 0, so that x ≠ 0. A real
eigenvector is then constructed by selecting y ≠ 0 such that Ay = αy and z = 0.
Theorem:
If A is an n×n real symmetric matrix, then it is possible to construct a set of n
eigenvectors of A such that the set is orthonormal, i.e. x′_i x_j = 0 for i ≠ j and x′_i x_i = 1.
It is convenient to collect the n eigenvectors in an n×n matrix whose ith
column is the x_i corresponding to λ_i,
X = [x_1  x_2  . . .  x_n],
and the n eigenvalues, in the same order, in a diagonal matrix,
Λ =
\begin{bmatrix}
λ_1 & 0 & \cdots & 0 \\
0 & λ_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & λ_n
\end{bmatrix}
= [Λ_1, Λ_2, . . . , Λ_n],
where Λ_i denotes the ith column of Λ.
It is easy to see that
AX = [Ax_1  Ax_2  . . .  Ax_n] = [λ_1 x_1, λ_2 x_2, . . . , λ_n x_n],
and
XΛ = [XΛ_1, XΛ_2, . . . , XΛ_n] = [λ_1 x_1, λ_2 x_2, . . . , λ_n x_n].
Therefore, we have the useful result that
AX = XΛ,
or, what is known as the spectral decomposition,
A = XΛX′,
and the matrix diagonalization
X′AX = X′XΛ = IΛ = Λ,
from the fact that X′X = I.
5.3 Rank, Trace and Determinant of a matrix
Using the results on the rank of matrix products, it is easy to see that
1. rank(A) = rank(X′AX) = rank(Λ) = the number of nonzero eigenvalues of A.
2. tr(X′AX) = tr(AXX′) = tr(AI) = tr(A); hence tr(A) = tr(Λ) = \sum_{i=1}^{n} λ_i = the sum of the eigenvalues of A.
3. |X′AX| = |X′||A||X| = |X′||X||A| = |X′X||A| = |I||A| = |A| = |Λ| = \prod_{i=1}^{n} λ_i = the product of the eigenvalues of A.
Exercise:
Let A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 6 \\ 3 & 6 & 7 \end{bmatrix}. Find the eigenvalues and eigenvectors of A. In addition,
verify that tr(A) = \sum_{i=1}^{3} λ_i, |A| = \prod_{i=1}^{3} λ_i, and that the eigenvectors are orthonormal.
Finally, is A a full rank matrix?
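One way to carry out the checks in this exercise numerically (an illustrative numpy sketch; np.linalg.eigh is used because A is symmetric):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 9.0, 6.0],
              [3.0, 6.0, 7.0]])
lam, X = np.linalg.eigh(A)

print(np.isclose(np.trace(A), lam.sum()))        # tr(A) equals the sum of the eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # |A| equals the product of the eigenvalues
print(np.allclose(X.T @ X, np.eye(3)))           # the eigenvectors are orthonormal
print(np.linalg.matrix_rank(A))                  # 3: full rank, since no eigenvalue is zero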
5.4 Powers of a Matrix
1. ”Expanding” (integer power):
Consider first for example,
AA = A2
= (XΛX′)(XΛX′)
= XΛX′XΛX′
= XΛIΛX′
= XΛΛX′
= XΛ2X′,
and, if A^{-1} exists, then
A^{-1} = (XΛX′)^{-1} = (X′)^{-1}Λ^{-1}X^{-1} = XΛ^{-1}X′.
Then we have the following theorem.
Theorem:
For any nonsingular symmetric matrix A = XΛX′, A^k = XΛ^kX′ for k = ..., −2, −1, 0, 1, 2, ...
2. "Contraction" (fractional power less than one):
Sometimes we may require a matrix B such that BB = A. This B is denoted A^{1/2},
and it is computed as
A^{1/2} = XΛ^{1/2}X′ = X
\begin{bmatrix}
\sqrt{λ_1} & 0 & \cdots & 0 \\
0 & \sqrt{λ_2} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{λ_n}
\end{bmatrix} X′,
as long as all λ_i are nonnegative.
Combining 1 and 2, we have the following result.
Theorem:
For a positive definite matrix A, A^r = XΛ^rX′, for any real number r.
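A numpy sketch of both results for a symmetric positive definite matrix built for the purpose (illustrative only):

import numpy as np

rng = np.random.default_rng(7)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)                  # symmetric positive definite

lam, X = np.linalg.eigh(A)
A_half = X @ np.diag(np.sqrt(lam)) @ X.T     # A^{1/2} = X Λ^{1/2} X'
print(np.allclose(A_half @ A_half, A))       # True: BB = A

A_inv = X @ np.diag(1.0 / lam) @ X.T         # A^{-1} = X Λ^{-1} X'
print(np.allclose(A_inv, np.linalg.inv(A)))  # True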
5.5 Idempotent Matrices
A symmetric idempotent matrix is one such that
A^k = XΛ^kX′ = XΛX′ = A, for every nonnegative integer k.
Therefore, (λ_i)^k = λ_i for all i = 1, ..., n. That is, all the eigenvalues of an idempotent
matrix are 0 or 1. An immediate result from this is that the rank of an idempotent
matrix is equal to its trace.
5.6 Quadratic Forms and Definite Matrices
Many optimization problems involve double sums of the form
q = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j a_{ij}.
This quadratic form can be written as
q = c′Ac,
where A is a symmetric matrix.
Example:
In the optimization of z = f(x_1, x_2), the first-order condition involves
dz = f_1 dx_1 + f_2 dx_2,
and the second-order condition involves
d²z = f_{11} dx_1² + 2f_{12} dx_1 dx_2 + f_{22} dx_2²,
which can be written as
[dx_1  dx_2] \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \end{bmatrix},
and is a quadratic form.
Definition:
For a given symmetric matrix A,
1. If c′Ac > (<) 0 for all nonzero c, then A is positive (negative) definite.
2. If c′Ac ≥ (≤) 0 for all nonzero c, then A is nonnegative definite or positive
semidefinite (nonpositive definite or negative semidefinite).
To see the possibilities, recall that
A = XΛX′,
so the quadratic form can be written as
c′Ac = c′XΛX′c = y′Λy = \sum_{i=1}^{n} λ_i y_i²,
where y = X′c is an n×1 vector.
Theorem:
Let A be a symmetric matrix. If all the eigenvalues of A are positive (negative),
then A is positive definite (negative definite). If some of the eigenvalues are zero,
then A is nonnegative definite if the remainder are positive. If A has both neg-
ative and positive roots, then A is indefinite.
Example:
Let us consider a two-product firm under a perfectly competitive market. Accordingly,
the firm's revenue function is
R = 12Q_1 + 18Q_2,
where Q_i represents the output level of the ith product per unit of time. The
firm's cost function is assumed to be
C = 2Q_1² + Q_1Q_2 + 2Q_2².
The profit function of this hypothetical firm can now be written as
π = R − C = 12Q_1 + 18Q_2 − 2Q_1² − Q_1Q_2 − 2Q_2².
It is our task to find the levels of Q_1 and Q_2 which, in combination, will maximize
π. For this purpose, we first find the first-order partial derivatives of the profit
function:
φ_1 (= ∂π/∂Q_1) = 12 − 4Q_1 − Q_2,   (7)
φ_2 (= ∂π/∂Q_2) = 18 − Q_1 − 4Q_2.   (8)
Setting these both equal to zero, to satisfy the necessary condition for a maximum,
we have Q_1* = 2 and Q_2* = 4, implying an optimal profit π* = 48 per unit of time.
To be sure that this does represent a maximum profit, let us check the second-order
condition. The second partial derivatives, obtainable by partial differentiation
of (7) and (8), give us the following Hessian:
H = \begin{bmatrix} π_{11} & π_{12} \\ π_{21} & π_{22} \end{bmatrix}
= \begin{bmatrix} −4 & −1 \\ −1 & −4 \end{bmatrix}.
Since the eigenvalues of H are −3 and −5, and both are smaller than zero, the Hessian
matrix (or d²z) is negative definite, and the solution does maximize the profit.
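A short numpy check of this example (illustrative): solve the first-order conditions and confirm that the Hessian is negative definite.

import numpy as np

H = np.array([[-4.0, -1.0],
              [-1.0, -4.0]])                       # the Hessian of the profit function
Q = np.linalg.solve(-H, np.array([12.0, 18.0]))    # FOC: 4 Q1 + Q2 = 12, Q1 + 4 Q2 = 18
print(Q)                                           # [2. 4.]

Q1, Q2 = Q
print(12*Q1 + 18*Q2 - 2*Q1**2 - Q1*Q2 - 2*Q2**2)   # 48.0, the optimal profit

print(np.linalg.eigvalsh(H))                       # [-5. -3.]: both negative, so H is negative definite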
5.6.1 Nonnegative Definite Matrices
1. If A is nonnegative definite, then |A| ≥ 0.
2. If A is positive definite, so is A^{-1}.
3. If A is n×k with full column rank and n > k, then A′A is positive definite and AA′ is
nonnegative definite.
Proof:
Since A has full column rank, Ac = a_1c_1 + a_2c_2 + ... + a_kc_k ≠ 0 for any nonzero c.
Therefore, c′A′Ac = (Ac)′(Ac) = y′y = \sum_i y_i² > 0, so A′A is positive definite.
Meanwhile, a nonzero c with A′c = 0 exists, since A′c = a′_1c_1 + a′_2c_2 + ... + a′_nc_n is a
linear combination of n > k vectors in R^k, which must be linearly dependent; for such a c,
c′AA′c = (A′c)′(A′c) = 0, so AA′ is only nonnegative definite. Here, a_i, i = 1, 2, ..., k, is the
ith column of A and a′_j, j = 1, 2, ..., n, is the jth column of A′.
5.6.2 Idempotent Quadratic Forms
A quadratic form c′Ac is called an "idempotent quadratic form" when A is a
symmetric idempotent matrix.
1. Every symmetric idempotent matrix is nonnegative definite.
2. If A is symmetric and idempotent n×n with rank j, then every quadratic
form in A can be written as c′Ac = \sum_{i=1}^{j} y_i².
5.6.3 The Triangular Factorization of a Positive Definite Symmetric Matrix
Theorem:
Any positive definite symmetric (n×n) matrix Ω has a unique representation of
the form
Ω = ADA′,
where A is a lower triangular matrix with 1s along the principal diagonal,
A = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_{21} & 1 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 1
\end{bmatrix},
and D is a diagonal matrix,
D = \begin{bmatrix}
d_{11} & 0 & 0 & \cdots & 0 \\
0 & d_{22} & 0 & \cdots & 0 \\
0 & 0 & d_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_{nn}
\end{bmatrix},
where d_{ii} > 0 for all i. This is known as the triangular factorization of Ω.
Proof:
Consider
Ω = \begin{bmatrix}
Ω_{11} & Ω_{12} & Ω_{13} & \cdots & Ω_{1n} \\
Ω_{21} & Ω_{22} & Ω_{23} & \cdots & Ω_{2n} \\
Ω_{31} & Ω_{32} & Ω_{33} & \cdots & Ω_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
Ω_{n1} & Ω_{n2} & Ω_{n3} & \cdots & Ω_{nn}
\end{bmatrix}.
Our goal here is to transform Ω into a diagonal matrix. This can be accomplished
in a first step by transforming Ω into a matrix with zeros in all of the first
row and first column except for the (1,1) element. This set of operations amounts to Ω
premultiplied by E_1 and postmultiplied by E′_1; the result is
E_1 Ω E′_1 = H,   (9)
where
E_1 = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
−Ω_{21}Ω_{11}^{-1} & 1 & 0 & \cdots & 0 \\
−Ω_{31}Ω_{11}^{-1} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
−Ω_{n1}Ω_{11}^{-1} & 0 & 0 & \cdots & 1
\end{bmatrix}   (10)
and
H = \begin{bmatrix}
h_{11} & 0 & 0 & \cdots & 0 \\
0 & h_{22} & h_{23} & \cdots & h_{2n} \\
0 & h_{32} & h_{33} & \cdots & h_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & h_{n2} & h_{n3} & \cdots & h_{nn}
\end{bmatrix}
= \begin{bmatrix}
Ω_{11} & 0 & 0 & \cdots & 0 \\
0 & Ω_{22}−Ω_{21}Ω_{11}^{-1}Ω_{12} & Ω_{23}−Ω_{21}Ω_{11}^{-1}Ω_{13} & \cdots & Ω_{2n}−Ω_{21}Ω_{11}^{-1}Ω_{1n} \\
0 & Ω_{32}−Ω_{31}Ω_{11}^{-1}Ω_{12} & Ω_{33}−Ω_{31}Ω_{11}^{-1}Ω_{13} & \cdots & Ω_{3n}−Ω_{31}Ω_{11}^{-1}Ω_{1n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & Ω_{n2}−Ω_{n1}Ω_{11}^{-1}Ω_{12} & Ω_{n3}−Ω_{n1}Ω_{11}^{-1}Ω_{13} & \cdots & Ω_{nn}−Ω_{n1}Ω_{11}^{-1}Ω_{1n}
\end{bmatrix}.
The matrix E_1 always exists, provided that Ω_{11} ≠ 0. This is ensured in the
present case, because Ω_{11} is equal to e′_1 Ω e_1, where e′_1 = [1 0 0 ... 0]. Since Ω is
positive definite, e′_1 Ω e_1 must be greater than zero.
(Hint: To see this, consider n = 3 for example. E_1 is constructed so that
the first column of E_1Ω is zero below the (1,1) element, i.e.
E_1Ω = \begin{bmatrix}
1 & 0 & 0 \\
−Ω_{21}Ω_{11}^{-1} & 1 & 0 \\
−Ω_{31}Ω_{11}^{-1} & 0 & 1
\end{bmatrix}
\begin{bmatrix}
Ω_{11} & Ω_{12} & Ω_{13} \\
Ω_{21} & Ω_{22} & Ω_{23} \\
Ω_{31} & Ω_{32} & Ω_{33}
\end{bmatrix}
= \begin{bmatrix}
Ω_{11} & Ω_{12} & Ω_{13} \\
0 & Ω_{22}−Ω_{21}Ω_{11}^{-1}Ω_{12} & Ω_{23}−Ω_{21}Ω_{11}^{-1}Ω_{13} \\
0 & Ω_{32}−Ω_{31}Ω_{11}^{-1}Ω_{12} & Ω_{33}−Ω_{31}Ω_{11}^{-1}Ω_{13}
\end{bmatrix}.
It is easy to see that
E_1ΩE′_1 = \begin{bmatrix}
Ω_{11} & Ω_{12} & Ω_{13} \\
0 & Ω_{22}−Ω_{21}Ω_{11}^{-1}Ω_{12} & Ω_{23}−Ω_{21}Ω_{11}^{-1}Ω_{13} \\
0 & Ω_{32}−Ω_{31}Ω_{11}^{-1}Ω_{12} & Ω_{33}−Ω_{31}Ω_{11}^{-1}Ω_{13}
\end{bmatrix}
\begin{bmatrix}
1 & −Ω_{21}Ω_{11}^{-1} & −Ω_{31}Ω_{11}^{-1} \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
Ω_{11} & 0 & 0 \\
0 & Ω_{22}−Ω_{21}Ω_{11}^{-1}Ω_{12} & Ω_{23}−Ω_{21}Ω_{11}^{-1}Ω_{13} \\
0 & Ω_{32}−Ω_{31}Ω_{11}^{-1}Ω_{12} & Ω_{33}−Ω_{31}Ω_{11}^{-1}Ω_{13}
\end{bmatrix},
using the fact that Ω is symmetric.)
We next proceed in exactly the same way with the second row and second
column of H. This set of operations amounts to H premultiplied by E_2 and postmultiplied
by E′_2; the result is
E_2 H E′_2 = K,   (11)
where
E_2 = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & −h_{32}h_{22}^{-1} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & −h_{n2}h_{22}^{-1} & 0 & \cdots & 1
\end{bmatrix}   (12)
and
K = \begin{bmatrix}
h_{11} & 0 & 0 & \cdots & 0 \\
0 & h_{22} & 0 & \cdots & 0 \\
0 & 0 & h_{33}−h_{32}h_{22}^{-1}h_{23} & \cdots & h_{3n}−h_{32}h_{22}^{-1}h_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & h_{n3}−h_{n2}h_{22}^{-1}h_{23} & \cdots & h_{nn}−h_{n2}h_{22}^{-1}h_{2n}
\end{bmatrix}.
The matrix E_2 always exists, provided that h_{22} ≠ 0. But h_{22} can be calculated
as h_{22} = e′_2 H e_2, where e′_2 = [0 1 0 ... 0]. Moreover, H = E_1ΩE′_1, where
Ω is positive definite and E_1 is given by (10). Since E_1 is lower triangular, its
determinant is the product of the terms along the principal diagonal, which are all
unity. Thus E_1 is nonsingular, meaning that H = E_1ΩE′_1 is positive definite, and
so h_{22} = e′_2 H e_2 must be strictly positive. Thus the matrix in (11) can always be
calculated.
Proceeding through each of the columns and rows with the same approach, we
see that for any positive definite symmetric matrix Ω there exist matrices E_1, E_2, ..., E_{n−1}
such that
E_{n−1} ··· E_2 E_1 Ω E′_1 E′_2 ··· E′_{n−1} = D,   (13)
where
D = \begin{bmatrix}
Ω_{11} & 0 & 0 & \cdots & 0 \\
0 & Ω_{22}−Ω_{21}Ω_{11}^{-1}Ω_{12} & 0 & \cdots & 0 \\
0 & 0 & h_{33}−h_{32}h_{22}^{-1}h_{23} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & c_{nn}−c_{n,n−1}c_{n−1,n−1}^{-1}c_{n−1,n}
\end{bmatrix},
with all the diagonal entries of D strictly positive. In general, E_j is a matrix
with nonzero values in the jth column below the principal diagonal, 1s along the
principal diagonal, and zeros everywhere else.
Thus each E_j is lower triangular with unit determinant. Hence E_j^{-1} exists,
and the following matrix exists:
A = (E_{n−1} ··· E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} ··· E_{n−1}^{-1}.
If (13) is premultiplied by A and postmultiplied by A′, the result is
Ω = ADA′,
where A is a lower triangular matrix with 1s along the principal diagonal, from
the facts that the product of lower triangular matrices is also lower triangular and
that the inverse of a lower triangular matrix is also lower triangular.
5.6.4 The Cholesky Factorization
A closely related factorization of a symmetric positive definite matrix Ω is obtained
as follows. Define D^{1/2} to be the (n×n) diagonal matrix whose diagonal entries are
the square roots of the corresponding elements of the matrix D in the triangular
factorization:
D^{1/2} = \begin{bmatrix}
\sqrt{d_{11}} & 0 & 0 & \cdots & 0 \\
0 & \sqrt{d_{22}} & 0 & \cdots & 0 \\
0 & 0 & \sqrt{d_{33}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sqrt{d_{nn}}
\end{bmatrix}.
Since the matrix D is unique and has strictly positive diagonal entries, the matrix
D^{1/2} exists and is unique. Then the triangular factorization can be written
Ω = AD^{1/2}D^{1/2}A′ = AD^{1/2}(AD^{1/2})′,
or
Ω = PP′,   (14)
where
P ≡ AD^{1/2}
= \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_{21} & 1 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
\sqrt{d_{11}} & 0 & 0 & \cdots & 0 \\
0 & \sqrt{d_{22}} & 0 & \cdots & 0 \\
0 & 0 & \sqrt{d_{33}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sqrt{d_{nn}}
\end{bmatrix}
= \begin{bmatrix}
\sqrt{d_{11}} & 0 & 0 & \cdots & 0 \\
a_{21}\sqrt{d_{11}} & \sqrt{d_{22}} & 0 & \cdots & 0 \\
a_{31}\sqrt{d_{11}} & a_{32}\sqrt{d_{22}} & \sqrt{d_{33}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1}\sqrt{d_{11}} & a_{n2}\sqrt{d_{22}} & a_{n3}\sqrt{d_{33}} & \cdots & \sqrt{d_{nn}}
\end{bmatrix}.
Expression (14) is known as the Cholesky factorization of Ω.
Exercise:
Let Ω = \begin{bmatrix} 4 & 2 & −2 \\ 2 & 10 & 2 \\ −2 & 2 & 5 \end{bmatrix}. Find lower triangular matrices P and A, and a
diagonal matrix D such that Ω = PP′ (Cholesky decomposition) and Ω = ADA′
(triangular factorization).
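A numerical cross-check for this exercise (illustrative numpy sketch): np.linalg.cholesky returns the lower triangular factor P, and A and D are then recovered by rescaling its columns.

import numpy as np

Omega = np.array([[ 4.0,  2.0, -2.0],
                  [ 2.0, 10.0,  2.0],
                  [-2.0,  2.0,  5.0]])

P = np.linalg.cholesky(Omega)            # lower triangular, Omega = P P'
d = np.diag(P) ** 2                      # the diagonal entries of D
A = P / np.sqrt(d)                       # unit lower triangular, since P = A D^{1/2}
D = np.diag(d)

print(np.allclose(Omega, P @ P.T))       # True: Cholesky factorization
print(np.allclose(Omega, A @ D @ A.T))   # True: triangular factorization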
6 Calculus and Matrix Algebra
1. Differentiating a scalar with respect to a column vector yields a column vector. That is, for
y = f(x_1, x_2, ..., x_n) = f(x),
∂f(x)/∂x = \begin{bmatrix} ∂f(x)/∂x_1 \\ ∂f(x)/∂x_2 \\ \vdots \\ ∂f(x)/∂x_n \end{bmatrix}
= \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}
≡ the gradient vector.
Example 1:
A linear function is y = a_1x_1 + a_2x_2 + ... + a_nx_n = a′x. Then
∂a′x/∂x = \begin{bmatrix} ∂a′x/∂x_1 \\ ∂a′x/∂x_2 \\ \vdots \\ ∂a′x/∂x_n \end{bmatrix}
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
= a.
Example 2:
A quadratic form is y = x′Ax. Then
∂x′Ax/∂x = 2Ax when A is symmetric, and (A + A′)x when A is not symmetric.
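A small finite-difference check (illustrative) that the gradient of x′Ax is (A + A′)x for a non-symmetric A:

import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(4, 4))                       # not symmetric in general
x = rng.normal(size=4)

def f(v):
    return v @ A @ v                              # the quadratic form x'Ax

eps = 1e-6
grad_fd = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(4)])
print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))   # True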
2. Differentiating a column vector with respect to a row vector yields a matrix. That is, for
y = f(x_1, x_2, ..., x_n) = f(x),
∂[∂f(x)/∂x]/∂x′ = \begin{bmatrix}
∂f_1/∂x_1 & ∂f_1/∂x_2 & \cdots & ∂f_1/∂x_n \\
∂f_2/∂x_1 & ∂f_2/∂x_2 & \cdots & ∂f_2/∂x_n \\
\vdots & \vdots & & \vdots \\
∂f_n/∂x_1 & ∂f_n/∂x_2 & \cdots & ∂f_n/∂x_n
\end{bmatrix}
= \begin{bmatrix}
f_{11} & f_{12} & \cdots & f_{1n} \\
f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & & \vdots \\
f_{n1} & f_{n2} & \cdots & f_{nn}
\end{bmatrix}
≡ the Hessian matrix.
Example:
A set of linear functions is y = Ax. It follows that y_i = a_i x, where a_i is the ith
row of A. Therefore
∂y_i/∂x = a′_i,
or
∂y_i/∂x′ = a_i.
Collecting all the elements, we have
\begin{bmatrix} ∂y_1/∂x′ \\ ∂y_2/∂x′ \\ \vdots \\ ∂y_n/∂x′ \end{bmatrix}
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
= ∂Ax/∂x′ = A.
3. Differentiating a scalar with respect to an n×n matrix yields an n×n matrix.
a. ∂x′Ax/∂A = xx′,
b. ∂ln|A|/∂A = (A^{-1})′.