18.338J/16.394J: The Mathematics of Infinite Random Matrices
Essentials of Finite Random Matrix Theory 
Alan Edelman 
Handout #6, Tuesday, September 28, 2004 
This handout provides the essential elements needed to understand finite random matrix theory. A
cursory observation should reveal that the tools for infinite random matrix theory are quite different from
the tools for finite random matrix theory. Nonetheless, there are significantly more published applications
that use finite random matrix theory as opposed to infinite random matrix theory. Our belief is that many
of the results that have been historically derived using finite random matrix theory can be reformulated and
answered using infinite random matrix theory. In this sense, it is worth recognizing that in many applications
it is an integral of a function of the eigenvalues that is more important than the mere distribution of the
eigenvalues. For finite random matrix theory, the tools that often come into play when setting up such
integrals are the Matrix Jacobians, the Joint Eigenvalue Densities and the Cauchy-Binet theorem. We
describe these in subsequent sections.
1 Matrix and Vector Differentiation
In this section, we concern ourselves with the differentiation of matrices. Differentiating matrix and vector
functions is not significantly harder than differentiating scalar functions except that we need notation to
keep track of the variables. We titled this section "matrix and vector" differentiation, but of course it is
the function that we differentiate. The matrix or vector is just a notational package for the scalar functions
involved. In the end, a derivative is nothing more than the "linearization" of all the involved functions.
We find it useful to think of this linearization both symbolically (for manipulative purposes) as well as
numerically (in the sense of small numerical perturbations). The differential notation idea captures these
viewpoints very well.
We begin with the familiar product rule for scalars,
d(uv) = u(dv) + v(du),
from which we can derive that d(x^3) = 3x^2 dx. We refer to dx as a differential.
We all unconsciously interpret the "dx" symbolically as well as numerically. Sometimes it is nice to
confirm on a computer that
((x + ε)^3 − x^3)/ε ≈ 3x^2.   (1)
I do this by taking x to be 1 or 2 or randn(1) and ε to be .001 or .0001.
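For instance, the following throwaway MATLAB lines (ours, not one of the numbered codes in this handout) carry out this check:
x = randn(1);                 % any scalar, e.g. 1 or 2
ep = 1e-4;                    % the small perturbation epsilon
lhs = ((x + ep)^3 - x^3)/ep;  % finite difference quotient
rhs = 3*x^2;                  % the derivative predicted by d(x^3) = 3x^2 dx
disp([lhs rhs])               % the two numbers agree to about four digits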
The product rule holds for matrices as well: 
d(UV ) = U(dV )+ (dU)V . 
In the examples we will see some symbolic and numerical interpretations. 
Example 1: Y = X^3
We use the product rule to differentiate Y(X) = X^3 to obtain
d(X^3) = X^2(dX) + X(dX)X + (dX)X^2.
When I introduce undergraduate students to matrix multiplication, I tell them that matrices are like scalars, 
except that they do not commute. 
The numerical (or first order perturbation theory) interpretation applies, but it may seem less familiar
at first. Numerically take X = randn(n) and E = randn(n) for ε = .001 say, and then compute
((X + εE)^3 − X^3)/ε ≈ X^2 E + XEX + EX^2.   (2)
This is the matrix version of (1). Holding X fixed and allowing E to vary, the right-hand side is a linear
function of E. There is no simpler form possible.
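A hedged numerical sketch of (2) in MATLAB (variable names and step size are ours):
n = 4; X = randn(n); E = randn(n); ep = 1e-3;
lhs = ((X + ep*E)^3 - X^3)/ep;          % finite difference of the matrix cube
rhs = X^2*E + X*E*X + E*X^2;            % the linearization from (2)
norm(lhs - rhs)/norm(rhs)               % relative error, on the order of ep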
Symbolically (or numerically) one can take dX = E_kl, which is the matrix that has a one in element (k,l)
and 0 elsewhere. Then we can write down the matrix of partial derivatives:
∂X^3/∂x_kl = X^2 E_kl + X E_kl X + E_kl X^2.
As we let k,l vary over all possible indices, we obtain all the information we need to compute the derivative
in any general direction E.
In general, the directional derivative of Y_ij(X) in the direction dX is given by (dY)_ij. For a particular
matrix X, dY(X) is a matrix of directional derivatives corresponding to a first order perturbation in the
direction E = dX. It is a matrix of linear functions corresponding to the linearization of Y(X) about X.
Structured Perturbations
We sometimes restrict our E to be a structured perturbation. For example if X is triangular, symmetric,
antisymmetric, or even sparse, then often we wish to restrict E so that the pattern is maintained in the
perturbed matrix as well. An important case occurs when X is orthogonal. We will see in an example below
that we will want to restrict E so that X^T E is antisymmetric when X is orthogonal.
Example 2: y = x^T x
Here y is a scalar and dot products commute so that dy = 2x^T dx. When y = 1, x is on the unit sphere.
To stay on the sphere, we need dy = 0 so that x^T dx = 0, i.e., the tangent to the sphere is perpendicular to
the sphere. Note the two uses of dy. In the first case it is the change to the squared length of x. In the
second case, by setting dy = 0, we find perturbations to x which to first order do not change the length at
all. Indeed if one computes (x + dx)^T(x + dx) for a small finite dx, one sees that if x^T dx = 0 then the length
changes only to second order. Geometrically, one can draw a tangent to a circle. The distance to the circle
is second order in the distance along the tangent.
Example 3: y = x^T A x
Again y is scalar. We have dy = dx^T A x + x^T A dx. If A is symmetric then dy = 2x^T A dx.
Example 4: Y = X^{-1}
We have that XY = I so that X(dY) + (dX)Y = 0, so that dY = −X^{-1} dX X^{-1}.
We recommend that the reader compute ε^{-1}((X + εE)^{-1} − X^{-1}) numerically and verify that it is equal
to −X^{-1} E X^{-1}.
In other words,
(X + εE)^{-1} = X^{-1} − ε X^{-1} E X^{-1} + O(ε^2).
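One way to carry out the recommended check in MATLAB (a sketch; the step size and names are ours):
n = 4; X = randn(n); E = randn(n); ep = 1e-4;
lhs = (inv(X + ep*E) - inv(X))/ep;      % finite difference of the inverse
rhs = -inv(X)*E*inv(X);                 % the predicted first order term
norm(lhs - rhs)/norm(rhs)               % small, on the order of ep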
Example 5: I = Q^T Q
If Q is orthogonal we have that Q^T dQ + dQ^T Q = 0, so that Q^T dQ is antisymmetric.
In general, d(Q^T Q) = Q^T dQ + dQ^T Q, but with no orthogonality condition on Q, there is no anti-symmetry
condition on Q^T dQ.
If y is a scalar function of x_1, x_2, ..., x_n then we have the "chain rule"
dy = (∂y/∂x_1) dx_1 + (∂y/∂x_2) dx_2 + ... + (∂y/∂x_n) dx_n.
Often we wish to apply the chain rule to every element of a vector or matrix. 
Let X = [ p q ; r s ] and Y = X^2 = [ p^2 + qr   q(p+s) ; r(p+s)   qr + s^2 ]; then
dY = X dX + dX X.   (3)
2 Matrix Jacobians (getting started) 
2.1 Definition
Let x ∈ R^n and y = y(x) ∈ R^n be a differentiable function of x. It is well known that the Jacobian matrix

      [ ∂y_1/∂x_1   ···   ∂y_1/∂x_n ]
  J = [     ⋮        ⋱        ⋮     ]  =  ( ∂y_i/∂x_j )_{i,j=1,2,...,n}
      [ ∂y_n/∂x_1   ···   ∂y_n/∂x_n ]
evaluated at a point x approximates y(x) by a linear function. Intuitively y(x + δx) ≈ y(x) + Jδx, i.e., J
is the matrix that allows us to invoke perturbation theory. The function y may be viewed as performing a
change of variables.
Furthermore (intuitively) if a little box of n dimensional volume ε surrounds x, then it is transformed by
y into a parallelopiped of volume |det J| ε around y(x). Therefore the Jacobian |det J| is the magnification
factor for volumes.
If we are integrating some function of y ∈ R^n as in ∫ p(y) dy (where dy = dy_1 ... dy_n), then when we
change variables from y to x where y = y(x), the integral becomes ∫ p(y(x)) |det(∂y_i/∂x_j)| dx. For many
people this becomes a matter of notation, but one should understand intuitively that the Jacobian tells you
how volume elements scale.
The determinant is 0 exactly where the change of variables breaks down. It is always instructive to see
when this happens. Either there is no "x" locally for each "y" or there are many, as in the example of polar
coordinates at the origin.
Notation: throughout this book, J denotes the Jacobian matrix. (Sometimes called the derivative or
simply the Jacobian in the literature.) We will consistently write det J for the Jacobian determinant
(unfortunately also called the Jacobian in the literature.) When we say Jacobian, we will be talking about
both.
2.2 Simple Examples (n=2) 
We get our feet wet with some simple 2×2 examples. Every reader is familiar with changing scalar variables
as in
∫ f(x) dx = ∫ f(y^2)(2y) dy.
We want the reader to be just as comfortable when f is a scalar function of a matrix and we change X = Y^2:
∫ f(X)(dX) = ∫ f(Y^2)(Jacobian)(dY).
One can compute all of the 2 by 2 Jacobians that follow by hand, but in some cases it can be tedious
and hard to get right on the first try. Code 8.1 in MATLAB takes away the drudgery and gives the right
answer. Later we will learn fancy ways to get the answer without too much drudgery and also without the
aid of a computer. We now consider the following examples:
2 × 2 Example 1: Matrix Square (Y = X^2)
2 × 2 Example 2: Matrix Cube (Y = X^3)
2 × 2 Example 3: Matrix Inverse (Y = X^{-1})
2 × 2 Example 4: Linear Transformation (Y = AX + B)
2 × 2 Example 5: The LU Decomposition (X = LU)
2 × 2 Example 6: A Symmetric Decomposition (S = DMD)
2 × 2 Example 7: Traceless Symmetric = Polar Decomposition (S = QΛQ^T, tr S = 0)
2 × 2 Example 8: The Symmetric Eigenvalue Problem (S = QΛQ^T)
2 × 2 Example 9: Symmetric Congruence (Y = A^T SA)
Discussion: 
Example 1: Matrix Square (Y = X^2)
With X = [ p q ; r s ] and Y = X^2, the Jacobian matrix of interest is

        ∂p     ∂r     ∂q     ∂s
  J = [ 2p     q      r      0  ]   ∂Y_11
      [ r      p+s    0      r  ]   ∂Y_21
      [ q      0      p+s    q  ]   ∂Y_12
      [ 0      q      r      2s ]   ∂Y_22

On this first example we label the columns and rows so that the elements correspond to the definition
J = ∂Y_ij/∂X_kl. Later we will omit the labels. We invite readers to compare with Equation (3). We see that
the Jacobian matrix and the differential contain the same information. We can compute then
det J = 4(p + s)^2 (sp − qr) = 4(tr X)^2 det(X).
Notice that breakdown occurs if X is singular or has trace zero.
Example 2: Matrix Cube (Y = X^3)
With X = [ p q ; r s ] and Y = X^3,

  J = [ 3p^2 + 2qr    pq + q(p+s)         2rp + sr             qr         ]
      [ 2rp + sr      p^2 + 2qr + (p+s)s  r^2                  rp + 2sr   ]
      [ 2pq + qs      q^2                 p(p+s) + 2qr + s^2   pq + 2qs   ]
      [ qr            pq + 2qs            r(p+s) + sr          2qr + 3s^2 ]

so that
det J = 9(sp − qr)^2 (qr + p^2 + s^2 + sp)^2 = (9/4)(det X)^2 (tr X^2 + (tr X)^2)^2.
Breakdown occurs if X is singular or if the eigenvalue ratio is a complex cube root of unity.
Example 3: Matrix Inverse (Y = X^{-1})
With X = [ p q ; r s ] and Y = X^{-1},

  J = (1/(det X)^2) × [ −s^2   qs     sr     −qr  ]
                      [ sr     −ps    −r^2   rp   ]
                      [ qs     −q^2   −ps    pq   ]
                      [ −qr    pq     rp     −p^2 ]

so that
det J = (det X)^{-4}.
Breakdown occurs if X is singular.
Example 4: Linear Transformation (Y = AX + B)

  J = [ a  b  0  0 ]
      [ c  d  0  0 ]
      [ 0  0  a  b ]
      [ 0  0  c  d ]

The Jacobian matrix has two copies of the constant matrix A so that det J = (det A)^2. Breakdown
occurs if A is singular.
Example 5: The LU Decomposition (X = LU)
The LU Decomposition computes a lower triangular L with ones on the diagonal and an upper triangular
U such that X = LU.
For a general 2 × 2 matrix it takes the form
[ p q ; r s ] = [ 1  0 ; r/p  1 ] [ p  q ; 0  (ps − qr)/p ],
which exists when p ≠ 0.
Notice the function of four variables might be written:
y_1 = p
y_2 = r/p
y_3 = q
y_4 = (ps − qr)/p = det(X)/p
The Jacobian matrix is itself lower triangular

  J = [ 1        0      0      0 ]
      [ −r/p^2   1/p    0      0 ]
      [ 0        0      1      0 ]
      [ qr/p^2   −q/p   −r/p   1 ]

so that det J = 1/p. Breakdown occurs when p = 0.
Example 6: A Symmetric Decomposition (S = DMD)
Any symmetric matrix X = [ p r ; r s ] may be written as
X = DMD,  where  D = [ √p  0 ; 0  √s ]  and  M = [ 1  r/√(ps) ; r/√(ps)  1 ].
Since X is symmetric, there are three independent variables, and thus the Jacobian matrix is 3 × 3. The
three independent elements in D and M may be thought of as functions of p, r, and s: namely
y_1 = √p,   y_2 = √s,   and   y_3 = r/√(ps).
The Jacobian matrix is

  J = (1/2) [ 1/√p           0              0        ]
            [ 0              1/√s           0        ]
            [ −r/(p√(ps))    −r/(s√(ps))    2/√(ps)  ]

so that
det J = 1/(4ps).
Breakdown occurs if p or s is 0.
Example 7: Traceless Symmetric = Polar Decomposition (S = QΛQ^T, tr S = 0)
The reader will recall the usual definition of polar coordinates. If (p, s) are Cartesian coordinates, then
the angle is θ = arctan(s/p) and the radius is r = √(p^2 + s^2). If we take a symmetric traceless 2 × 2 matrix
S = [ p  s ; s  −p ],
and compute its eigenvalue and eigenvector decomposition, we find that the eigendecomposition is mathematically
equivalent to the familiar transformation between Cartesian and polar coordinates. Indeed one of
the eigenvectors of S is (cos φ, sin φ), where φ = θ/2. The Jacobian matrix is

  J = [ p/√(p^2 + s^2)    s/√(p^2 + s^2) ]
      [ −s/(p^2 + s^2)    p/(p^2 + s^2)  ]

The Jacobian determinant is the inverse of the radius. This corresponds to the familiar formula, in the more usual
notation, dx dy / r = dr dθ, so that det J = 1/r. Breakdown occurs when r = 0.
Example 8: The Symmetric Eigenvalue Problem (S = QΛQ^T)
We compute the Jacobian for the general symmetric eigenvalue and eigenvector decomposition. Let
S = [ p  s ; s  r ] = [ cos θ  −sin θ ; sin θ  cos θ ] [ λ_1  0 ; 0  λ_2 ] [ cos θ  −sin θ ; sin θ  cos θ ]^T.
We can compute the eigenvectors and eigenvalues of S directly in MATLAB and compute the Jacobian of
the two eigenvalues and the eigenvector angle, but when we tried with the Maple toolbox we found that the
symbolic toolbox did not handle "arctan" very well. Instead we found it easy to compute the Jacobian in
the other direction.
We write S = QΛQ^T, where Q is 2 × 2 orthogonal and Λ is diagonal. The Jacobian is

  J = [ −2 sin θ cos θ (λ_1 − λ_2)      cos^2 θ       sin^2 θ      ]
      [  2 sin θ cos θ (λ_1 − λ_2)      sin^2 θ       cos^2 θ      ]
      [ (cos^2 θ − sin^2 θ)(λ_1 − λ_2)  sin θ cos θ   −sin θ cos θ ]

so that
det J = λ_1 − λ_2.
Breakdown occurs if S is a multiple of the identity.
Example 9: Symmetric Congruence (Y = A^T SA)
Let Y = A^T SA, where Y and S are symmetric, but A = [ a b ; c d ] is arbitrary. The Jacobian matrix is

  J = [ a^2   c^2   2ca     ]
      [ b^2   d^2   2db     ]
      [ ab    cd    cb + ad ]

and det J = (det A)^3.
The cube on the determinant tends to surprise many people. Can you guess what it is for an n × n
symmetric matrix (Y = A^T SA)? The answer (det J = (det A)^{n+1}) is derived in Section 3 (the Jacobian of the
symmetric Kronecker product).
%jacobian2by2.m
%Code 8.1 of Random Eigenvalues by Alan Edelman
%Experiment: Compute the Jacobian of a 2x2 matrix function
%Comment: Symbolic tools are not perfect. The author
% exercised care in choosing the variables.
syms p q r s a b c d t e1 e2
X=[p q ; r s]; A=[a b;c d];
%% Compute Jacobians
Y=X^2; J=jacobian(Y(:),X(:)), JAC_square =factor(det(J))
Y=X^3; J=jacobian(Y(:),X(:)), JAC_cube =factor(det(J))
Y=inv(X); J=jacobian(Y(:),X(:)), JAC_inv =factor(det(J))
Y=A*X; J=jacobian(Y(:),X(:)), JAC_linear =factor(det(J))
Y=[p q;r/p det(X)/p]; J=jacobian(Y(:),X(:)), JAC_lu =factor(det(J))
x=[p s r];y=[sqrt(p) sqrt(s) r/(sqrt(p)*sqrt(s))];
J=jacobian(y,x), JAC_DMD =factor(det(J))
x=[p s]; y=[ sqrt(p^2+s^2) atan(s/p)];
J=jacobian(y,x), JAC_notrace =factor(det(J))
Q=[cos(t) -sin(t); sin(t) cos(t)];
D=[e1 0;0 e2];Y=Q*D*Q.';
y=[Y(1,1) Y(2,2) Y(1,2)]; x=[t e1 e2];
J=jacobian(y,x), JAC_symeig =simplify(det(J))
X=[p s;s r]; Y=A.'*X*A;
y=[Y(1,1) Y(2,2) Y(1,2)]; x=[p r s];
J=jacobian(y,x), JAC_symcong =factor(det(J))
Code 1 
3 Y = BXA^T and the Kronecker Product
3.1 Jacobian of Y = BXA^T (Kronecker Product Approach)
There is a "nuts and bolts" approach to calculate some Jacobian determinants. A good example function is
the matrix inverse Y = X^{-1}. We recall from Example 4 of Section 1 that
dY = −X^{-1} dX X^{-1}.
In words, the perturbation dX is multiplied on the left and on the right by a fixed matrix. When this
happens we are in a "Kronecker Product" situation, and can instantly write down the Jacobian.
We provide two definitions of the Kronecker Product for square matrices A ∈ R^{n,n} and B ∈ R^{m,m}.
See [1] for a nice discussion of Kronecker products.
Operator Definition
A ⊗ B is the operator from X ∈ R^{m,n} to Y ∈ R^{m,n} where Y = BXA^T. We write
(A ⊗ B)X = BXA^T.
Matrix Definition (Tensor Product)
A ⊗ B is the matrix

          [ a_11 B   ···   a_1n B ]
  A ⊗ B = [   ⋮       ⋱      ⋮    ]   (4)
          [ a_n1 B   ···   a_nn B ]
The following theorem is important for applications.
Theorem 1. det(A ⊗ B) = (det A)^m (det B)^n.
Application: If Y = X^{-1} then dY = −(X^{-T} ⊗ X^{-1}) dX, so that
|det J| = |det X|^{-2n}.
Notational Note: The correspondence between the operator definition and matrix definition is worth
spelling out. It corresponds to the following identity in MATLAB
Y = B * X * A'
Y(:) = kron(A,B) * X(:)
% The second line does not change Y
Here kron(A,B) is exactly the matrix in Equation (4), and X(:) is the column vector consisting of the
columns of X stacked on top of each other. (In computer science this is known as storing an array in
"column major" order.) Many authors write vec(X), where we use X(:). Concretely, we have that
vec(BXA^T) = (A ⊗ B) vec(X),
where A ⊗ B is as in (4).
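The identity is easy to spot check numerically; the throwaway sketch below (ours) compares the two sides for random matrices:
m = 3; n = 4;
A = randn(n); B = randn(m); X = randn(m,n);
Y = B*X*A';                             % the operator definition
norm(Y(:) - kron(A,B)*X(:))             % matches the matrix definition; essentially zero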
Proofs of Theorem 1:
Assume A and B are diagonalizable, with Au_i = λ_i u_i (i = 1,...,n) and Bv_i = μ_i v_i (i = 1,...,m).
Let M_ij = v_i u_j^T. The mn matrices M_ij form a basis for R^{m,n} and they are eigenmatrices of our map since
B M_ij A^T = μ_i λ_j M_ij. The determinant is
∏_{1≤i≤m, 1≤j≤n} μ_i λ_j = (det A)^m (det B)^n.   (5)
The assumption of diagonalizability is not important.
Also one can directly manipulate the matrix, since obviously
A ⊗ B = (A ⊗ I)(I ⊗ B)
as operators, and det(I ⊗ B) = (det B)^n, while det(A ⊗ I) can be reshuffled into det(I ⊗ A) = (det A)^m.
Other proofs may be obtained by working with the "LU" decomposition of A and B, the SVD of A and
B, the QR decomposition, and many others.
Mathematical Note: That the operator Y = BXA^T can be expressed as a matrix is a consequence of
linearity:
B(c_1 X_1 + c_2 X_2)A^T = c_1 B X_1 A^T + c_2 B X_2 A^T,
i.e.
(A ⊗ B)(c_1 X_1 + c_2 X_2) = c_1 (A ⊗ B)X_1 + c_2 (A ⊗ B)X_2.
It is important to realize that a linear transformation from R^{m,n} to R^{m,n} is defined by an element of
R^{mn,mn}, i.e., by the m^2 n^2 entries of an mn×mn matrix. The transformations defined by Kronecker products
form an (m^2 + n^2)-dimensional subset of this m^2 n^2 dimensional space.
Some Kronecker product properties:
1. (A ⊗ B)^T = A^T ⊗ B^T
2. (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, for A ∈ R^{n,n}, B ∈ R^{m,m}
3. det(A ⊗ B) = det(A)^m det(B)^n
4. tr(A ⊗ B) = tr(A) tr(B)
5. A ⊗ B is orthogonal if A and B are orthogonal
6. (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
7. If Au = λu and Bv = μv, then if X = vu^T, then BXA^T = λμX, and also AX^T B^T = λμX^T. Therefore
A ⊗ B and B ⊗ A have the same eigenvalues, and transposed eigenvectors.
It is easy to see that property 5 holds, since if A and B are orthogonal, (A ⊗ B)^T = A^T ⊗ B^T = A^{-1} ⊗ B^{-1} =
(A ⊗ B)^{-1}.
Linear Subspace Kronecker Products
Some researchers consider the symmetric Kronecker product ⊗_sym. In fact it has become clear that one
might consider an anti-symmetric, upper-triangular or even a Toeplitz Kronecker product. We formulate a
general notion:
Definition: Let S denote a linear subspace of R^{m,n} and π_S a projection onto S. If A ∈ R^{n,n} and B ∈ R^{m,m}
then we define (A ⊗ B)_S X = π_S(BXA^T) for X ∈ S.
Comments
1. If S is the set of symmetric matrices then m = n and
(A ⊗ B)_sym X = (BXA^T + AXB^T)/2.
2. If S is the set of anti-symmetric matrices, then m = n and
(A ⊗ B)_anti X = (BXA^T + AXB^T)/2 as well,
but this matrix is anti-symmetric.
3. If S is the set of upper triangular matrices, then m = n and (A ⊗ B)_upper X = upper(BXA^T).
4. We recall that π_S is a projection if π_S is linear and for all X ∈ R^{m,n}, π_S X ∈ S.
5. Jacobian of ⊗_sym:
(A ⊗ B)_sym X = (BXA^T + AXB^T)/2.
If B = A then
det J = ∏_{i≤j} λ_i λ_j = (det A)^{n+1},
by considering eigenmatrices E = u_i u_j^T for i ≤ j. (A numerical check of this formula appears after this list.)
6. Jacobian of ⊗_upper:
Special case: A lower triangular, B upper triangular, so that
(A ⊗ B)_upper X = BXA^T
since BXA^T is upper triangular.
The eigenvalues of A and B are λ_i = A_ii and μ_j = B_jj respectively, where Au_i = λ_i u_i and Bv_i = μ_i v_i
(i = 1,...,n). The matrix M_ij = v_i u_j^T for i ≤ j is upper triangular since v_i and u_j are zero below the
ith and above the jth component respectively. (The eigenvectors of a triangular matrix are triangular;
M_ij is nonzero only in rows 1 through i and columns j through n.)
Since the M_ij are a basis for upper triangular matrices and B M_ij A^T = μ_i λ_j M_ij, we then have
det J = ∏_{i≤j} μ_i λ_j = (λ_1 λ_2^2 λ_3^3 ··· λ_n^n)(μ_1^n μ_2^{n−1} μ_3^{n−2} ··· μ_n)
      = (A_11 A_22^2 A_33^3 ··· A_nn^n)(B_11^n B_22^{n−1} B_33^{n−2} ··· B_nn).
Note that J is singular if and only if A or B is.
7. Jacobian of ⊗_Toeplitz:
Let X be a Toeplitz matrix. We can define
(A ⊗_Toeplitz B)X = Toeplitz(BXA^T),
where Toeplitz averages every diagonal.
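As promised in comment 5, here is a rough numerical check (ours) that the map X → AXA^T on symmetric X has Jacobian determinant (det A)^{n+1}, using the lower triangular entries as coordinates:
n = 3; A = randn(n);
idx = find(tril(ones(n)));            % the n(n+1)/2 independent (lower triangular) entries
N = length(idx); J = zeros(N);
for k = 1:N
    E = zeros(n); E(idx(k)) = 1;
    E = E + E' - diag(diag(E));       % symmetric basis matrix for that entry
    Y = A*E*A';                       % the map is linear, so no finite differences are needed
    J(:,k) = Y(idx);
end
[abs(det(J)) abs(det(A))^(n+1)]       % the two numbers agree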
4 Jacobians of Linear Functions, Powers and Inverses 
The Jacobian of a linear map is just the determinant. This determinant is not always easily computed. The
dimension of the underlying space of matrices plays a role. For example the Jacobian of Y = 2X is 2^{n^2} for
X ∈ R^{n×n}, 2^{n(n+1)/2} for upper triangular or symmetric X, 2^{n(n−1)/2} for antisymmetric X, and 2^n for diagonal
X.
We will concentrate on general real matrices X and explore the symmetric case and triangular case as
well when appropriate.
4.0.1 General Matrices 
Let Y = X^2; then
dY = X dX + dX X,   i.e.,   J = I ⊗ X + X^T ⊗ I.
Using the E_ij as before Equation (5), we see that the eigenvalues of I ⊗ X + X^T ⊗ I are λ_i + λ_j, so that
det J = ∏_{1≤i,j≤n} (λ_i + λ_j) = 2^n det X ∏_{i<j} (λ_i + λ_j)^2.
When n = 2, we obtain 4 det X (tr X)^2 as seen in Example 1 of Section 2.2.
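A numerical confirmation of this eigenvalue formula for Y = X^2 (a sketch; we build the Jacobian with kron as in Section 3):
n = 3; X = randn(n); lam = eig(X);
J = kron(eye(n), X) + kron(X.', eye(n));   % vec(dY) = J*vec(dX) for Y = X^2
pairsums = kron(lam, ones(n,1)) + kron(ones(n,1), lam);
[det(J) real(prod(pairsums))]              % both equal prod over i,j of (lambda_i + lambda_j)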
For general powers, Y = X^p (p = 1, 2, 3, ...),
dY = Σ_{k=0}^{p−1} X^k dX X^{p−1−k},
and
det J = ∏_{1≤i,j≤n} ( Σ_{k=0}^{p−1} λ_i^k λ_j^{p−1−k} ) = p^n (det X)^{p−1} ∏_{i<j} [ (λ_i^p − λ_j^p)/(λ_i − λ_j) ]^2.
Of course by computing the Jacobian determinant explicitly in terms of the matrix elements, we can obtain
an equivalent expression in terms of the elements of X rather than the λ_i.
Formally, if we plug in p = −1, we find that we can compute det J for Y = X^{-1}, yielding the answer
given after Theorem 1. Indeed one can take any scalar function y = f(x) such as x^p, e^x, log x, sin x, etc.
The correct formula is
det J = det(f′(X)) ∏_{i<j} [ (f(λ_i) − f(λ_j))/(λ_i − λ_j) ]^2 = ∏_{i=1}^n f′(λ_i) · ∏_{i<j} [ (f(λ_i) − f(λ_j))/(λ_i − λ_j) ]^2.
We can prove these using the tools provided by wedge products. 
Often one thinks that to define Y = f(X) for matrices, one needs a power series, but this is not so. If
X = ZΛZ^{-1}, then we can define f(X) = Z f(Λ) Z^{-1}, where f(Λ) = diag(f(λ_1),...,f(λ_n)). The formula
above is correct at any matrix X such that f is differentiable at the eigenvalues.
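A minimal MATLAB sketch of this eigendecomposition definition, using f = exp as the scalar function (the comparison with expm is ours):
n = 4; X = randn(n);             % almost surely diagonalizable
[Z, L] = eig(X);
fX = Z*diag(exp(diag(L)))/Z;     % f(X) = Z f(Lambda) Z^{-1}
norm(fX - expm(X))               % agrees with the built-in matrix exponential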
When X is of low rank, we can consider the Moore-Penrose Inverse of X. The Jacobian was calculated
in [2] and [3]. There is a nonsymmetric and a symmetric version.
Exercise. Compute the Jacobian determinant for
Y = X^{1/2} and Y = X^{-1/2}.
Advanced exercise: compute the Jacobian matrix for Y = X^{1/2} as an inverse of a sum of two Kronecker
products. Numerically, a "Sylvester equation" solver is needed. To obtain the Jacobian matrix for Y = X^{-1/2}
one can use the chain rule, given the answer to the previous problem. It is interesting that this is as hard as
it is.
4.0.2 Symmetric Matrices 
If X is symmetric, we have that det J = (det X)^{-(n+1)} for the map Y = X^{-1}.
5 Jacobians of Matrix Factorizations (without wedge products) 
Let A be an n × n matrix. In elementary linear algebra, we learn about Gaussian elimination, Gram-Schmidt
orthogonalization, and the eigendecomposition. All of these ideas may be written compactly as
matrix factorizations, which in turn may be thought of as a change of variables:

Gaussian Elimination:      A = L · U,        L unit lower triangular, U upper triangular;   parameter count n(n−1)/2 + n(n+1)/2
Gram-Schmidt:              A = Q · R,        Q orthogonal, R upper triangular;              parameter count n(n−1)/2 + n(n+1)/2
Eigenvalue Decomposition:  A = X · Λ · X^{-1},  X eigenvectors, Λ eigenvalues;              parameter count (n^2 − n) + n

Each of these factorizations is a change of variables. Somehow the n^2 parameters in A are transformed into
n^2 other parameters, though it may not be immediately obvious what these parameters are (the n(n−1)/2
parameters in Q for example).
Our goal is to derive the Jacobians for those matrix factorizations.
5.1 Jacobian of Gaussian Elimination (A = LU) 
In numerical linear algebra texts it is often observed that the memory used to store A on a computer may
be overwritten with the n^2 variables in L and U: graphically, the strictly lower triangular part of the n × n
array holds L (whose unit diagonal need not be stored) and the upper triangular part holds U.
Indeed the same is true for other factorizations. This is not just a happy coincidence, but deeply related to
the fact that n^2 parameters ought not need more storage.
Theorem 2. If A = LU, the Jacobian of the change of variables is
det J = u_11^{n−1} u_22^{n−2} ··· u_{n−1,n−1} = ∏_{i=1}^n u_ii^{n−i}.
Proof 1: Let A = LU. Then using
dA = L dU + dL U
   = L(dU U^{-1} + L^{-1} dL)U
   = (U^T ⊗ L) [ (U^T ⊗_upper I)^{-1} dU + (I ⊗_lower L)^{-1} dL ].
The mapping dU → dU U^{-1} only affects the upper triangular part. Similarly dL → L^{-1} dL for the lower
triangular. We might think of the map in block format:
dA = (U^T ⊗ L) [ (U^T ⊗_upper I)^{-1}   (I ⊗_lower L)^{-1} ] [ dU ; dL ].
Since the Jacobian determinant of U^T ⊗ L is ∏ u_ii^n, that of (U^T ⊗_upper I)^{-1} is ∏ u_ii^{-i}, and that of
I ⊗_lower L is 1, we have that det J = ∏ u_ii^{n−i}.  □
That is the fancy slick proof. Here is the direct proof. It is of value to see both.
Proof 2: Let M_ij = l_ij for i > j and M_ij = u_ij for i ≤ j, as in the storage diagram above. Since A = LU,
we have that
A_ij = Σ_{k=1}^{i−1} M_ik M_kj + u_ij   for i ≤ j
and
A_ij = Σ_{k=1}^{j−1} M_ik M_kj + l_ij u_jj   for i > j.
Therefore
∂A_ij/∂M_ij = 1 if i ≤ j,   u_jj if i > j.   (6)
Notice that A_ij never depends on M_pq when p > i or q > j. Therefore if we order the variables first by row
and next by column, we see that the Jacobian matrix is lower triangular with diagonal entries given in (6).
Remembering that the determinant of a lower triangular matrix is the product of the diagonal entries, the
theorem follows.  □
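Theorem 2 can also be confirmed numerically. The sketch below (ours) builds the matrix of the linear map (dL, dU) → L dU + dL U column by column and compares its determinant with ∏ u_ii^{n−i}:
n = 4;
L = eye(n) + tril(randn(n),-1);       % unit lower triangular
U = triu(randn(n));                   % upper triangular
upmask  = find(triu(ones(n)));        % positions of the dU variables
lowmask = find(tril(ones(n),-1));     % positions of the dL variables
J = zeros(n^2);
for k = 1:length(upmask)
    E = zeros(n); E(upmask(k)) = 1; dA = L*E; J(:,k) = dA(:);
end
for k = 1:length(lowmask)
    E = zeros(n); E(lowmask(k)) = 1; dA = E*U; J(:,length(upmask)+k) = dA(:);
end
u = abs(diag(U));
[abs(det(J))  prod(u.^((n-1:-1:0).'))]   % both equal prod |u_ii|^(n-i)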
Perhaps it should not have been surprising that the condition that A admits a unique LU decomposition
may be thought of in terms of the Jacobian. Given a matrix A, let p_k denote the determinant of the upper
left k by k principal minor. It may be readily verified that u_11 ··· u_kk = p_k and hence the Jacobian is
p_1 p_2 ··· p_{n−1}. The condition that A admits a unique LU decomposition is well known to be that all the
upper-left principal minors of dimension smaller than n are non-singular. Otherwise the matrix may have
no LU decomposition. For example,
[ 0 1 ; 1 1 ]
has no LU factorization.
It may also have many LU decompositions, as does the zero matrix when n > 1. This degeneracy explains the
need for pivoting strategies (i.e., strategies that reorder the matrix rows and/or columns) for solving linear
systems of equations even if the computation is done in exact arithmetic. Modern Gaussian elimination
software for solving linear systems of equations in finite precision includes pivoting strategies designed to
avoid being near a matrix with such a degeneracy.
Exercise. If A is symmetric positive definite, it may be factored A = LL^T. This is the famous Cholesky
decomposition of A. How many independent variables are in A? In L? Prove that the Jacobian of the change
of variables is
det J = 2^n ∏_{i=1}^n l_ii^{n+1−i}.
Research Question. It seems that the existence of a finite algorithm for a matrix factorization is linked
to a triangular Jacobian matrix. After all, the latter implies only one new variable need be substituted at a
time. This is the essential idea of Doolittle's or Crout's matrix factorization schemes for LU.
5.2 Jacobian of Gram-Schmidt (A = QR) 
It is well known that any nonsingular square matrix A may be factored uniquely into orthogonal times upper
triangular, or A = QR with R_ii > 0.
Theorem 3. Given any dQ with Q^T dQ anti-symmetric, and any dR, we can compute dA = Q dR + dQ R =
Q(dR R^{-1} + Q^T dQ)R = (R^T ⊗ Q) [ (R^T ⊗_upper I)^{-1} dR + (Q^T dQ) ]. The linear map from (lower part of
Q^T dQ, dR) to dA then has determinant ( ∏_{i=1}^n r_ii^n )( ∏_{i=1}^n r_ii^{-i} ) = ∏_{i=1}^n r_ii^{n−i}.
Note: It is straightforward to imagine computing the Jacobian numerically. We can make n(n+1)/2 perturbations
to R, say by systematically adding .001 to each entry, and then compute dA via dA = Q(R + dR) − QR.
Similarly we can make n(n−1)/2 perturbations to Q by choosing two distinct columns of Q and multiplying
the n×2 matrix formed from these columns by a matrix such as
[ cos(.001)  sin(.001) ; −sin(.001)  cos(.001) ].
The dQ is the new Q minus the old one.
The n^2 resulting dA's may be squashed into an n^2 by n^2 matrix whose determinant could be computed
numerically, confirming Theorem 3.
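A rough MATLAB rendering of this numerical experiment (ours; we use a step of .000001 rather than .001 to keep the first-order error small):
n = 3; ep = 1e-6;
[Q, R] = qr(randn(n));
J = zeros(n^2); col = 0;
for i = 1:n                              % n(n+1)/2 perturbations of R
  for j = i:n
    dR = zeros(n); dR(i,j) = ep;
    dA = Q*dR;                           % dA = Q*(R+dR) - Q*R
    col = col + 1; J(:,col) = dA(:)/ep;
  end
end
c = cos(ep); s = sin(ep);
for i = 1:n                              % n(n-1)/2 rotations of pairs of columns of Q
  for j = i+1:n
    Q2 = Q; Q2(:,[i j]) = Q(:,[i j])*[c s; -s c];
    dA = (Q2 - Q)*R;
    col = col + 1; J(:,col) = dA(:)/ep;
  end
end
r = abs(diag(R));
[abs(det(J))  prod(r.^((n-1:-1:0).'))]   % both approximately prod |r_ii|^(n-i)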
Note: It is easy to compute the inverse map, i.e., given dA compute dQ and dR. Numerically we can ask
for [Q,R] = qr(A) and [Q + dQ, R + dR] = qr(A + dA).
Mathematically, dA = Q(dR R^{-1} + Q^T dQ)R, so if lower(M) denotes the strictly lower part of M and
upper(M) denotes the upper part of M, then
dQ = Q [ lower(Q^T dA R^{-1}) − lower(Q^T dA R^{-1})^T ]   and   dR = [ upper(Q^T dA R^{-1}) + lower(Q^T dA R^{-1})^T ] R.
6 Jacobians for Spherical Coordinates
The Jacobian matrix for the transformation between spherical coordinates and Cartesian coordinates has a
sufficiently interesting structure that we include this case as our final example. The structure is known as a
"lower Hessenberg" structure.
We recall that in R^n, we define spherical coordinates r, θ_1, θ_2, ..., θ_{n−1} by
x_1 = r cos θ_1
x_2 = r sin θ_1 cos θ_2
x_3 = r sin θ_1 sin θ_2 cos θ_3
  ⋮
x_{n−1} = r sin θ_1 sin θ_2 ··· sin θ_{n−2} cos θ_{n−1}
x_n = r sin θ_1 sin θ_2 ··· sin θ_{n−2} sin θ_{n−1},
or, for j = 1,...,n, x_j may be written as
x_j = r [ ∏_{i=1}^{j−1} sin θ_i ] cos θ_j   (with θ_n = 0).
We schematically place an × in the matrix below if x_j depends on the variable heading the column. Since
x_j depends on r and on θ_1, ..., θ_j, the dependency structure is

            r    θ_1  θ_2  ···  θ_{n−2}  θ_{n−1}
  x_1       ×    ×
  x_2       ×    ×    ×
   ⋮        ×    ×    ×    ⋱
  x_{n−1}   ×    ×    ×    ···    ×       ×
  x_n       ×    ×    ×    ···    ×       ×
Therefore the nonzero structure of the Jacobian matrix is represented by the pattern above. 
Matrices that are nonzero only in the lower triangular part and on the superdiagonal are referred to as 
lower Hessenberg matrices. 
It is fairly messy to write down the exact Jacobian matrix. Fortunately, it is unnecessary to do so. We
obtain the LU factorization of the matrix by defining auxiliary variables y_i as follows:
y_1 = x_1^2 + x_2^2 + ··· + x_n^2 = r^2
y_2 = x_2^2 + ··· + x_n^2 = r^2 sin^2 θ_1
y_3 = x_3^2 + ··· + x_n^2 = r^2 sin^2 θ_1 sin^2 θ_2
  ⋮
y_n = x_n^2 = r^2 sin^2 θ_1 sin^2 θ_2 ··· sin^2 θ_{n−1}
Differentiating and expressing the relationship between the dy_i and the dx_j for i,j = 1,...,n in matrix form,
we obtain that the Jacobian matrix J_{x→y} from x to y is triangular:

  2 [ x_1  x_2  ···  x_n ]   [ dx_1 ]   [ dy_1 ]
    [      x_2  ···  x_n ]   [ dx_2 ]   [ dy_2 ]
    [           ⋱    ⋮   ] · [  ⋮   ] = [  ⋮   ]
    [                x_n ]   [ dx_n ]   [ dy_n ]

We recognize that we have an upper triangular matrix in front of the vector of Cartesian differentials and a
lower triangular matrix in front of the vector of spherical coordinate differentials. Equivalently,

  2 [ 1  1  ···  1 ]   [ x_1 dx_1 ]   [ dy_1 ]
    [    1  ···  1 ]   [ x_2 dx_2 ]   [ dy_2 ]
    [        ⋱   ⋮ ] · [    ⋮     ] = [  ⋮   ]
    [            1 ]   [ x_n dx_n ]   [ dy_n ]
Similarly the Jacobian matrix J_{spherical→y} from spherical coordinates to y is triangular:

  [ 2r                                                        ]   [ dr       ]   [ dy_1 ]
  [  ⋮   2r sin θ_1 x_1                                       ]   [ dθ_1     ]   [ dy_2 ]
  [  ⋮        ⋮       2r sin θ_1 sin θ_2 x_2                  ] · [ dθ_2     ] = [  ⋮   ]
  [  ⋮        ⋮            ⋮         ⋱                        ]   [   ⋮      ]   [  ⋮   ]
  [  ⋮        ⋮            ⋮    ···  2r sin θ_1 ··· sin θ_{n−1} x_{n−1} ]   [ dθ_{n−1} ]   [ dy_n ]

Therefore J_{x→r,θ} = J_{r,θ→y}^{-1} J_{x→y}.
This L^{-1}U form allows us to easily obtain the determinant as the ratio of the triangular determinants.
The Jacobian determinant is then
∂(x_1,...,x_n)/∂(r,θ_1,...,θ_{n−1}) = [ 2^n r^n (sin θ_1)^{n−1} (sin θ_2)^{n−2} ··· (sin θ_{n−1}) x_1 ··· x_{n−1} ] / [ 2^n x_1 ··· x_n ]
= r^{n−1} (sin θ_1)^{n−2} ··· (sin θ_{n−2}).
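For n = 3 the formula reduces to the familiar r^2 sin θ_1, which one can spot check numerically (a throwaway sketch; the point and step size are ours):
r = 1.3; t1 = 0.7; t2 = 2.1; ep = 1e-6;
f = @(v) [v(1)*cos(v(2));
          v(1)*sin(v(2))*cos(v(3));
          v(1)*sin(v(2))*sin(v(3))];
v0 = [r; t1; t2]; J = zeros(3);
for k = 1:3
    e = zeros(3,1); e(k) = ep;
    J(:,k) = (f(v0 + e) - f(v0))/ep;     % finite difference Jacobian
end
[det(J)  r^2*sin(t1)]                    % the two numbers agree to several digits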
In the next section we will introduce wedge products and show that they are a convenient tool for handling 
curvilinear matrix coordinates. 
7 Wedge Products 
There is a wedge product notation that can facilitate the computation of matrix Jacobians. In point of
fact, it is never needed at all. The reader who can follow the derivation of the Jacobian in Section 5 is
well equipped to never use wedge products. The notation also expresses the concept of volume on curved
surfaces.
For advanced readers who truly wish to understand exterior products from a full mathematical viewpoint,
this chapter contains the important details. We took some pains to write a readable account of wedge
products at the cost of straying way beyond the needs of random matrix theory.
The steps to understanding how to use wedge products for practical calculations are very straightforward.
1. Learn how to wedge two quantities together. This is usually understood instantly.
2. Recognize that the wedge product is a formalism for computing determinants. This is understood
instantly as well, and at this point many readers wonder what else is needed.
3. Practice wedging the order n^2 or mn or some such entries of a matrix together. This requires mastering
three good examples.
4. Learn the mathematical interpretation of the wedge product of such quantities as the lower triangular
part of Q^T dQ and how this relates to the Jacobian for QR or the symmetric eigendecomposition. We
provide a thorough explanation.
8 The Mechanics of Wedging 
The algebra of wedge products is very easy to master. We will wedge together differentials as illustrated in
this example:
(2dx + x^2 dy + 5dw + 2dz) ∧ (y dx − x dy) =
(−2x − x^2 y) dx ∧ dy + 5y (dw ∧ dx) − 5x (dw ∧ dy) − 2y (dx ∧ dz) + 2x (dy ∧ dz).
Formally the wedge product acts like multiplication except that it follows the anticommutative law
(du ∧ dv) = −(dv ∧ du).
Generally
(p du + q dv) ∧ (r du + s dv) = (ps − qr)(du ∧ dv).
It therefore follows that
du ∧ du = 0.
In general
Σ_i f_i(x) dx_i ∧ Σ_j g_j(x) dx_j = Σ_{i<j} (f_i(x) g_j(x) − f_j(x) g_i(x)) dx_i ∧ dx_j
= Σ_{i<j} det[ f_i(x)  g_i(x) ; f_j(x)  g_j(x) ] dx_i ∧ dx_j.
We can wedge together more than two differentials. For example
2dx ∧ (3dx + 5dy) ∧ 7(dx + dy + dz) = 70 dx ∧ dy ∧ dz.
Let us write down concretely the wedge product. 
Rewriting the two examples above in matrix notation, we have

      [ 2     y  ]                [ 2  3  7 ]
  F = [ x^2  −x  ]    and    F =  [ 0  5  7 ]
      [ 5     0  ]                [ 0  0  7 ]
      [ 2     0  ]

In the first case, we have computed all 2×2 subdeterminants and in the second case we compute the one
3×3 determinant. In general, if F ∈ R^{n,p}, we compute all (n choose p) subdeterminants of size p.
If F(x) ∈ R^{n,p} and dx is the vector (dx_1, dx_2, ..., dx_n)^T, then we can wedge together the elements of
F(x)^T dx. The result is
∧_{i=1}^p (F(x)^T dx)_i = Σ_{i_1<i_2<...<i_p} F( i_1 i_2 ... i_p ; 1 2 ... p ) dx_{i_1} ∧ ··· ∧ dx_{i_p},
where F( i_1 i_2 ... i_p ; 1 2 ... p ) denotes the subdeterminant of F obtained by taking rows i_1, i_2, ..., i_p and
all p columns. In almost MATLAB notation, F( i_1 ... i_p ; 1 ... p ) = det(F([i_1 i_2 ... i_p],:)). In simple
English, if we wedge together p differentials which we can store in the columns of F, then the wedge product
computes all p×p subdeterminants.
We use the notation
(F(x)^T dx)^∧ ≡ ∧_{i=1}^p (F(x)^T dx)_i,
i.e., "( )^∧" denotes wedge together all the elements of the vector inside the parentheses.
A notational note: We are inventing the "( )^∧" notation. Books such as [4] use only parentheses
to indicate the wedge product of the components inside. Unfortunately, we have found that parenthesis
notation is ambiguous especially when we want to wedge together the elements of a complicated expression.
Once we decided on placing the wedge symbol "∧" somewhere, we experimented with upper/lower left and
upper/lower right, finally settling on the upper right notation as above.
We will extend the "( )^∧" notation from vectors to matrices of differentials. We will only wedge
together the independent elements. For example
M ∈ R^{m,n}:   (dM)^∧ = ∧_{1≤i≤m, 1≤j≤n} dM_ij.
We use subscripts to indicate which elements to wedge over. For example (dS)^∧_{i≥j} = ∧_{i≥j} dS_ij. When
unnecessary as in the cases below, we omit the subscripts.
S ∈ R^{n,n} symmetric:                  (dS)^∧ = ∧_{1≤i≤j≤n} dS_ij
A ∈ R^{n,n} antisymmetric:              (dA)^∧ = ∧_{1≤i<j≤n} dA_ij
Λ ∈ R^{n,n} diagonal:                   (dΛ)^∧ = ∧_{1≤i≤n} dΛ_ii
U ∈ R^{n,n} upper triangular:           (dU)^∧ = ∧_{1≤i≤j≤n} dU_ij
U ∈ R^{n,n} strictly upper triangular:  (dU)^∧ = ∧_{1≤i<j≤n} dU_ij
etc.
Definitional note: We have not specified the order. Therefore the wedge product of elements of a matrix is
defined up to "+" or "−" sign.
Notational note: We write (dM_1)^∧ (dM_2)^∧ for (dM_1)^∧ ∧ (dM_2)^∧.
9 Jacobians with wedge products 
9.1 Wedge Notation not useful (Y = X^2)
Consider the 2×2 case of Y = X^2 as in Example 1 of Section 2.2.
With X = [ p q ; r s ],
dY = X dX + dX X = [ p q ; r s ][ dp dq ; dr ds ] + [ dp dq ; dr ds ][ p q ; r s ].
Thus
dY_11 = 2p dp + q dr + r dq
dY_21 = r dp + (p + s) dr + r ds
dY_12 = q dp + (p + s) dq + q ds
dY_22 = q dr + r dq + 2s ds.   (7)
If we wedge together all of the elements of dY, we obtain the determinant of the Jacobian:
(dY)^∧ = 4(p + s)^2 (sp − qr) (dX)^∧.
The reader should compare the notation with that from the previous chapter:

        ∂p     ∂r     ∂q     ∂s
  J = [ 2p     q      r      0  ]   ∂Y_11
      [ r      p+s    0      r  ]   ∂Y_21
      [ q      0      p+s    q  ]   ∂Y_12
      [ 0      q      r      2s ]   ∂Y_22

We conclude that for these examples the notation is of little value.
9.2 Square Matrix = Antisymmetric + Upper Triangular 
Given any matrix M ∈ R^{n,n}, let
A = lower(M) − lower(M)^T,
where lower(M) is the strictly lower triangular part of M. Further let
R = upper(M) + lower(M)^T,
where upper(M) includes the diagonal. We have the decomposition
M = A + R = (antisymmetric) + (upper triangular).
Therefore

       [ dr_11   dr_12 − da_21   dr_13 − da_31   ···   dr_1n − da_n1 ]
       [ da_21   dr_22           dr_23 − da_32   ···   dr_2n − da_n2 ]
  dM = [ da_31   da_32           dr_33           ···   dr_3n − da_n3 ]   (8)
       [   ⋮       ⋮               ⋮              ⋱        ⋮         ]
       [ da_n1   da_n2           da_n3           ···   dr_nn         ]

We can easily see that the −da_ij play no role in the wedge product:
(dM)^∧ = (lower(dA) + dR)^∧ = (dA)^∧ (dR)^∧.
We feel this notation is already somewhat more mechanical than what was needed in the last chapter.
Without wedges, we would have concentrated on the n(n + 1)/2 parameters in R and the n(n − 1)/2 in
lower(A) and would have written a block two by two matrix as in Section 5.2.
The reader should think carefully about the following sentence: The realization that the upper da_ji's
play no role in (dM)^∧ is equivalent to the realization that the matrix X_12 plays no role in the determinant
of the 2×2 block matrix
X = [ X_11  X_12 ; 0  X_22 ].
9.3 Triangular ←→ Symmetric 
Let S = U + U^T, the sum of an upper triangular matrix and its transpose. If we consider the map from U
to upper(S), it is easy to see that the Jacobian determinant is 2^n:
(dS)^∧ = (diag(dS))^∧ ∧ (strictly-upper(dS))^∧ = 2^n (diag(dU))^∧ ∧ (strictly-upper(dU))^∧ = 2^n (dU)^∧.
9.4 General matrix functions 
Let Y = Y(X) be a 2×2 matrix, where X = [ x_11  x_12 ; x_21  x_22 ]. We have that

  dY = [ (∂y_11/∂x_11)dx_11 + (∂y_11/∂x_12)dx_12 + (∂y_11/∂x_21)dx_21 + (∂y_11/∂x_22)dx_22    (∂y_12/∂x_11)dx_11 + (∂y_12/∂x_12)dx_12 + (∂y_12/∂x_21)dx_21 + (∂y_12/∂x_22)dx_22 ]
       [ (∂y_21/∂x_11)dx_11 + (∂y_21/∂x_12)dx_12 + (∂y_21/∂x_21)dx_21 + (∂y_21/∂x_22)dx_22    (∂y_22/∂x_11)dx_11 + (∂y_22/∂x_12)dx_12 + (∂y_22/∂x_21)dx_21 + (∂y_22/∂x_22)dx_22 ].

The wedge product of all the elements is
(dY)^∧ = dy_11 ∧ dy_21 ∧ dy_12 ∧ dy_22 = det( ∂y_ij/∂x_kl ) dx_11 ∧ dx_21 ∧ dx_12 ∧ dx_22 = (det J)(dX)^∧.
In general, if Y = Y(X) is an n×n matrix, then
(dY)^∧ = (det J)(dX)^∧,
where det J denotes the Jacobian determinant det( ∂y_ij/∂x_kl ).
9.5 Eigenproblem Jacobians 
Case I: S Symmetric 
If S ∈ R^{n,n} is symmetric it may be written
S = QΛQ^T,   Q^T Q = I,   Λ = diag(λ_1, ..., λ_n).
The columns of Q are the eigenvectors, while the λ_i are the eigenvalues.
Differentiating,
dS = dQ Λ Q^T + Q dΛ Q^T + Q Λ dQ^T,
Q^T dS Q = (Q^T dQ)Λ − Λ(Q^T dQ) + dΛ.
Notice that Q^T dS Q is a symmetric matrix of differentials with dλ_i on the diagonal and q_i^T dq_j (λ_i − λ_j) as
the i,jth entry in the upper triangular part.
Therefore (Q^T dS Q)^∧ = ∏_{i<j} |λ_i − λ_j| (dΛ)^∧ (Q^T dQ)^∧. We have from Example 3 of Section 4 that
(Q^T dS Q)^∧ = (dS)^∧. In summary
(dS)^∧ = ∏_{i<j} |λ_i − λ_j| (dΛ)^∧ (Q^T dQ)^∧.
Many readers will wonder about the meaning of (Q^T dQ)^∧. We did. We will not discuss this at length here
but encourage you to ask us if you are curious.
10 Handbook of Jacobians 
Some Jacobians that come up frequently are listed in this and the following sections. 
Function: Y = BXA^T = (A ⊗ B)X   ("Kronecker product")
  Sizes: X m × n, A n × n, B m × m
  Jacobian: (dY) = (det A)^n (det B)^m (dX)

Function: Y = X^{-1}   ("inverse")
  Sizes: X n × n
  Jacobian: (dY) = (det X)^{-2n} (dX)

Function: Y = X^2
  Sizes: X n × n
  Jacobian: (dY) = ∏_{i,j} |λ_i + λ_j| (dX)

Function: Y = X^k
  Sizes: X n × n
  Jacobian: (dY) = ∏_{i≥j} | Σ_{l=0}^{k−1} λ_i^l λ_j^{k−1−l} | (dX)

Function: Y = (1/2)(AXB + B^T XA)   ("symmetric Kronecker")
  Sizes: X, A, B n × n; A, B symmetric
  Jacobian: (dY) = ∏_{i≤j} |λ_i μ_j + λ_j μ_i| (dX);  when B = A^T, (dY) = (det A)^{n+1} (dX)
11 Real Case; General Matrices 
In factorizations that involve a square m × m orthogonal matrix Q, and start from a thin, tall m × n matrix
A (m ≥ n always), H^T dQ stands for the wedge product
(H^T dQ) = ∧_{i=1}^{n} ∧_{j=i+1}^{m} q_i^T dq_j.
Here H is a stand-in for the first n columns of Q. If the differential matrix and the matrix whose transpose
we multiply by are one and the same, the wedging is done over all of the columns. For example,
(V^T dV) = ∧_{i=1}^{n} ∧_{j=i+1}^{n} v_i^T dv_j,
for V an n × n orthogonal matrix.
The Jacobian of the last factorization (non-symmetric eigenvalue) is computed only for the positive-measure
set of matrices with real eigenvalues and with a full set of eigenvectors.
Factorization: A = LU   (lu)
  Matrix sizes & properties: A, L, U n × n; L, U^T lower triangular; l_ii = 1, ∀ i = 1,...,n
  Parameter count: n^2 = n(n−1)/2 + n(n+1)/2
  Jacobian: (dA) = [ ∏_{i=1}^n u_ii^{n−i} ] (dL)(dU)

Factorization: A = QR   (qr)
  Matrix sizes & properties: A, R m × n; Q m × m; Q^T Q = I_m; R upper triangular
  Parameter count: mn = n(n+1)/2 + [mn − n(n+1)/2]
  Jacobian: (dA) = [ ∏_{i=1}^n r_ii^{m−i} ] (dR)(H^T dQ)

Factorization: A = UΣV^T   (svd)
  Matrix sizes & properties: A, Σ m × n; U m × m, V n × n; U^T U = I_m, V^T V = I_n
  Parameter count: mn = [mn − n(n+1)/2] + n + n(n−1)/2
  Jacobian: (dA) = [ ∏_i σ_i^{m−n} ] [ ∏_{i<j} (σ_i^2 − σ_j^2) ] (dΣ)(H^T dU)(V^T dV)

Factorization: A = QS   (polar)
  Matrix sizes & properties: A, Q m × n; S n × n; Q^T Q = I_n; S positive definite
  Parameter count: mn = [mn − n(n+1)/2] + n(n+1)/2
  Jacobian: (dA) = [ ∏_{i<j} (σ_i + σ_j) ] (dS)(Q^T dQ)

Factorization: A = XΛX^{-1}   (nonsymm eig)
  Matrix sizes & properties: A, X, Λ n × n; Λ diagonal
  Parameter count: n^2 = (n^2 − n) + n
  Jacobian: (dA) = [ ∏_{i<j} (λ_i − λ_j)^2 ] (dΛ)(X^{-1} dX)
12 Real Matrices; Symmetric “+” Cases 
Here we gather factorizations that are symmetric or, in the case when this is required, positive definite. All
matrices are square n × n.

Factorization: A = LL'   (Cholesky)
  "+": positive definite
  Parameter count: n(n+1)/2 = n(n+1)/2
  Matrix properties: L lower triangular
  Jacobian: (dA) = 2^n [ ∏_{i=1}^n l_ii^{n+1−i} ] (dL)

Factorization: A = LDL'   (ldl)
  "+": —
  Parameter count: n(n+1)/2 = n(n−1)/2 + n
  Matrix properties: L lower triangular with l_ii = 1, ∀ i = 1,...,n; D diagonal
  Jacobian: (dA) = [ ∏_{i=1}^n d_i^{n−i} ] (dL)(dD)

Factorization: A = QΛQ^T   (eig)
  "+": —
  Parameter count: n(n+1)/2 = n(n−1)/2 + n
  Matrix properties: Q^T Q = I_n, Λ diagonal
  Jacobian: (dA) = [ ∏_{i<j} |λ_i − λ_j| ] (dΛ)(Q^T dQ)
13 Real Matrices; Orthogonal Case 
This is the case of the CS decomposition. Note that the matrices U_1 and V_1 share p(p−1)/2 parameters; this
is due to the fact that the product of the first p columns of U_1 and the first p rows of V_1^T is invariant and
determined. This is equivalent to saying that we introduce an equivalence relation on the set of pairs (U_1, V_1)
of orthogonal matrices of size k, by having
(U_1, V_1) ~ ( U_1 [ Q 0 ; 0 I_j ],  V_1 [ Q 0 ; 0 I_j ] ),
for any p × p orthogonal matrix Q.
Since it is rather hard to integrate over a manifold of pairs of orthogonal matrices with the equivalence
relationship mentioned above, we will use a different way of thinking about this splitting of parameters. We
can assume that U_1 contains k(k−1)/2 parameters (a full set) while V_1 only contains k(k−1)/2 − p(p−1)/2 (having
the first p columns already determined from the relationship with U_1). Alternatively, we could assign a full
set of parameters to V_1 and think of U_1 as having the first p columns predetermined. In the following we
choose to make the former assumption, and think of the undetermined part H of V_1^T as a point in the Stiefel
manifold V_{j,k}.
Factorization: Q = [ U_1 0 ; 0 U_2 ] [ I_p 0 0 ; 0 C S ; 0 S −C ] [ V_1 0 ; 0 V_2 ]^T, with n = k + j, p = k − j   (CS decomposition)
  Matrix sizes & properties: Q n × n; U_1, V_1 k × k; U_2, V_2 j × j; Q^T Q = I_n; U_1, V_1, U_2, V_2 orthogonal;
  C = diag(cos θ_i), S = diag(sin θ_i), i = 1,...,j
  Parameter count: U_1 V_1^T has k(k−1) − p(p−1)/2; U_2, V_2 have j(j−1)/2 each; C and S share j
  Jacobian: (dA) = [ ∏_{i<j} sin(θ_j − θ_i) sin(θ_i + θ_j) ] (dθ)(U_1^T dU_1)(U_2^T dU_2)(H^T dH)(V_2^T dV_2)
14 Real Matrices; Symmetric Tridiagonal 
The important factorization of this section is the eigenvalue decomposition of the symmetric tridiagonal. The
tridiagonal is defined by its 2n − 1 parameters, n on the diagonal and n − 1 on the upper diagonal:

      [ a_n      b_{n−1}                           ]
      [ b_{n−1}  a_{n−1}  b_{n−2}                  ]
  T = [            ⋱        ⋱       ⋱              ]
      [                   b_2      a_2      b_1    ]
      [                            b_1      a_1    ]

The matrix of eigenvectors Q can be completely determined from knowledge of its first row q = (q_1,...,q_n)
(n − 1 parameters) and the eigenvalues on the diagonal of Λ (another n parameters). Here dq represents
integration over the (1,n) Stiefel manifold.

Factorization: T = QΛQ^T   (eig)
  Matrix sizes: T, Q, Λ n × n
  Parameter count: 2n − 1 = (n − 1) + n
  Matrix properties: Q^T Q = I_n, Λ diagonal
  Jacobian: (dT) = [ ∏_{i=1}^{n−1} b_i / ∏_{i=1}^{n} q_i ] (dΛ)(dq)
Proof by MATLAB 
At a basic level, as we discussed earlier, the Jacobian is the magnification in the volume element that occurs
when a matrix is perturbed in "every conceivable direction". Hence the theoretical results can be verified
using actual MATLAB experiments. Consider the symmetric eigenvalue problem on the third row of the table
in Section 12. The function jsymeig.m allows you to enter a real symmetric matrix A and computes the
theoretical Jacobian and the numerical answer. You should easily be able to verify that both these answers
match. We encourage you to verify the theoretical answers for other factorizations.
Once we have the Jacobians, computing the eigenvalue densities is fairly straightforward. The densities
for the Hermite, Laguerre and Jacobi ensembles are tabulated here as well.
function j = jsymeig(a)
%JSYMEIG Jacobian for the symmetric eigenvalue problem
% J = JSYMEIG(A) returns the Numerical and Theoretical Jacobian for
% the symmetric eigenvalue problem. The numerical answer is
% computed by perturbing the symmetric matrix A in the N*(N+1)/2
% independent directions and computing the change in the
% eigenvalues and eigenvectors.
%
% A is a REAL N x N symmetric matrix
% Example: Valid Inputs for A are
% 1) G = randn(5); A = (G + G')
% 2) G = randn(5,10); A = G * G'
%
% J is a 1 x 2 row vector
% J(1) is the numerical Jacobian; J(2) is the theoretical Jacobian
% computed using the formula in the third row of the table in Section 12
% of Handout # 6
%
%
% References:
% [1] Alan Edelman, Handout 6: Essentials of Finite Random Matrix
% Theory, Fall 2004, Course Notes 18.338.
% [2] Alan Edelman, Random Matrix Eigenvalues.
%
% Alan Edelman and Raj Rao, Sept. 2004.
% $Revision: 1.1 $ $Date: 2004/09/28 17:11:18 $
format long
[q,e] = eig(a); % Compute eigenvalues and eigenvectors
e = diag(e);
epsilon = 1e-7; % Size of perturbation
n = length(a); % Size of Matrix
jacmatrix = zeros(n*(n+1)/2);
k = 0; mask = triu( ones(n),1); mask = logical(mask(:));
for i = 1:n,
for j = i:n
k = k+1;
E = zeros(n);
E(i,j) = 1; E(j,i) = 1;
aa = a + epsilon * E;
[qq,ee] = eig(aa);
de= (diag(ee)-e)/epsilon;
qdq = q'*(qq-q)/epsilon; qdq = qdq(:);
jacmatrix(1:n,k) = de;
jacmatrix((n+1):end,k) = qdq(mask);
end
end
% Numerical answer
j = 1/det(jacmatrix);
% Theoretical answer
[e1,e2] = meshgrid(e,e);
z = abs(e1-e2);
j = abs([j prod( z(mask) ) ]);
Code 2 
Random Matrix Ensembles (summary of the table)
Joint eigenvalue distributions have the form c |Δ|^β ∏_{i=1}^n V(λ_i).
Hermite (Gaussian) ensembles: GOE (β = 1, real symmetric, invariant under A → Q^T A Q), GUE (β = 2, complex
Hermitian, invariant under A → U^H A U), GSE (β = 4, self-dual quaternion); V(λ) = e^{−λ^2/2}; connected matrix
problem: symmetric eigenvalue problem (EIG). Sample MATLAB constructions:
A = randn(n); A = (A+A')/2;                      (real)
A = randn(n) + i*randn(n); A = (A+A')/2;         (complex)
Laguerre (Wishart) ensembles: Wishart real/complex/quaternion (β = 1, 2, 4); positive semi-definite;
V(λ) = λ^a e^{−λ/2} with a = (β/2)(n − m + 1) − 1; connected matrix problem: singular value decomposition (SVD).
Sample MATLAB constructions:
A = randn(m,n); A = A*A';                        (real)
A = randn(m,n) + i*randn(m,n); A = A*A';         (complex)
Jacobi (MANOVA) ensembles: MANOVA real/complex/quaternion (β = 1, 2, 4); V(λ) = λ^a (1−λ)^b; connected matrix
problem: generalized symmetric eigenvalue problem (QZ). Sample MATLAB construction:
X = randn(m,n1); Y = randn(m,n2); A = (X*X') / (X*X' + Y*Y');    (real)
15 Volume and Integration 
15.1 Integration Using Differential Forms
One nice property of our differential form notation is that if y = y(x) is some function from (a subset of)
R^n to R^n, then the formula for changing the volume element is built into the identity
∫_{y(S)} f(y) dy_1 ∧ ... ∧ dy_n = ∫_S f(y(x)) dy_1(x) ∧ ... ∧ dy_n(x),
because, as we saw in 9.4, the Jacobian emerges when we write the exterior product of the dy's in terms of
the dx's.
We will only concern ourselves with integration of n-forms on manifolds of dimension n. In fact, most
of our manifolds will be flat (subsets of R^n), or surfaces only slightly more complicated than spheres. For
example, the Stiefel manifold of n by p matrices Q with orthonormal columns (Q^T Q = I_p), which we shall
introduce shortly. Exterior products will give us the correct volume element for integration.
If the x_i are Cartesian coordinates in n-dimensional Euclidean space, then (dx) ≡ dx_1 ∧ dx_2 ∧ ... ∧ dx_n is
the correct volume element. For simplicity, this may be written as dx_1 dx_2 ... dx_n so as to correspond to the
Lebesgue measure. Let q_i be the ith component of a unit vector q ∈ R^n. Evidently, n parameters is one too
many for specifying points on the sphere. Unless q_n = 0, we may use q_1 through q_{n−1} as local coordinates
on the sphere, and then dq_n may be thought of as a linear combination of the dq_i for i < n (Σ_i q_i dq_i = 0
because q^T q = 1). However, the Cartesian volume element dq_1 dq_2 ... dq_{n−1} is not correct for integrating
functions on the sphere. It is as if we took a map of the Earth and used Latitude and Longitude as Cartesian
coordinates, and then tried to make some presumption about the size of Greenland¹.
Integration: ∫_{x∈S} f(x)(dx) or ∫_S f (dx) and other related expressions will denote the "ordinary" integral over a region
S ⊂ R^n.
Example. ∫_{R^n} exp(−‖x‖^2/2)(dx) = (2π)^{n/2} and similarly ∫_{R^{n,n}} exp(−‖A‖_F^2/2)(dA) = (2π)^{n^2/2}, where
‖A‖_F^2 = tr(A^T A) = Σ_{i,j} a_ij^2 is the "Frobenius norm" of A squared.
If an object has n parameters, the correct differential form for the volume element is an n-form. What
about x ∈ S^{n−1}, i.e., {x ∈ R^n : ‖x‖ = 1}? Here (dx)^∧ = ∧ dx_i = 0. We have Σ_{i=1}^n x_i^2 = 1 ⟹ Σ x_i dx_i = 0 ⟹
dx_n = −(1/x_n)(x_1 dx_1 + ··· + x_{n−1} dx_{n−1}), so the n differentials are linearly dependent and their wedge
product vanishes. Whatever the correct volume element for a sphere is, it is not (dx).
As an example, we revisit spherical coordinates in the next section.
15.2 Plucker Coordinates and Volume Measurement 
Let F ∈ R^{n,p}. We might think of the columns of F as the edges of a parallelopiped. By defining Pl(F)
("Plucker(F)"), we can obtain simple formulas for volumes.
Definition 1. Pl(F) is the vector of p×p subdeterminants of F written in natural order.
¹ I do not think that I have ever seen a map of the Earth that uses Latitude and Longitude as Cartesian coordinates. The
most familiar map, the Mercator map, takes a stereographic projection of the Earth onto the (complex) plane, and then takes
the image of the entire plane into an infinite strip by taking the complex logarithm.
p = 2 :  [ f_11 f_12 ; ⋮ ; f_n1 f_n2 ]  →(Pl)  [ f_11 f_22 − f_21 f_12 ; ⋮ ; f_{n−1,1} f_{n,2} − f_{n,1} f_{n−1,2} ],
the vector of all 2×2 subdeterminants, taken over pairs of rows in natural order.

p = 3 :  [ f_11 f_12 f_13 ; ⋮ ; f_n1 f_n2 f_n3 ]  →(Pl)  the vector of all 3×3 subdeterminants, beginning with
det[ f_11 f_12 f_13 ; f_21 f_22 f_23 ; f_31 f_32 f_33 ] and ending with the determinant of rows n−2, n−1, n.

p general :  F = (f_ij)_{1≤i≤n, 1≤j≤p}  →(Pl)  ( det (f_ij)_{i=i_1,...,i_p; j=1,...,p} )_{i_1<...<i_p}.
Definition 2. Let vol(F) denote the volume of the parallelopiped {Fx : 0 ≤ x_i ≤ 1}, i.e., the volume of the parallelopiped with edges equal to the columns of F.

Theorem 4. vol(F) = ∏_{i=1}^{p} σ_i = det(F^T F)^{1/2} = ∏_{i=1}^{p} r_{ii} = ‖Pl(F)‖, where the σ_i are the singular values of F, and the r_{ii} are the diagonal elements of R in F = YR, where Y ∈ R^{n,p} has orthonormal columns and R is upper triangular.
Proof. Let F = UΣV^T, where U ∈ R^{n,p} and V ∈ R^{p,p} have orthonormal columns. The matrix Σ denotes a box with edge lengths σ_i and so has volume ∏ σ_i. The volume is invariant under rotations. The other formulas follow trivially, except perhaps the last, which follows from the Cauchy-Binet theorem taking A = F^T and B = F. □
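Theorem 4 is easy to check numerically. The sketch below is pseudo-Matlab in the spirit of these notes; the loop enumerating the p×p subdeterminants is our own bookkeeping, not notation from the text.

    % vol(F) four ways: singular values, det(F'F)^(1/2), diag of R, and norm of Pl(F).
    n = 5; p = 3;
    F = randn(n, p);
    rows = nchoosek(1:n, p);                   % all row sets i1 < ... < ip
    Pl = zeros(size(rows,1), 1);
    for k = 1:size(rows,1)
        Pl(k) = det(F(rows(k,:), :));          % p x p subdeterminants (Plucker coordinates)
    end
    [~, R] = qr(F, 0);                         % thin QR: F = YR, Y with orthonormal columns
    disp([prod(svd(F)), sqrt(det(F'*F)), abs(prod(diag(R))), norm(Pl)])   % all four agree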
Theorem 5 (Cauchy-Binet). Let C = AB be a matrix product of any kind. Let
\[
M\binom{i_1 \ldots i_p}{j_1 \ldots j_p}
\]
denote the p×p minor det(M_{i_k j_l})_{1≤k≤p, 1≤l≤p}. In other words, it is the determinant of the submatrix of M formed from rows i_1,...,i_p and columns j_1,...,j_p. The Cauchy-Binet theorem states that
\[
C\binom{i_1 \ldots i_p}{k_1 \ldots k_p} = \sum_{j_1<j_2<\cdots<j_p} A\binom{i_1 \ldots i_p}{j_1 \ldots j_p}\, B\binom{j_1 \ldots j_p}{k_1 \ldots k_p}.
\]
Notice that when p = 1 this is the familiar formula for matrix multiplication. When all matrices are p × p, then the formula states that det C = det A det B.
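One can also spot-check Cauchy-Binet numerically for one particular minor; the index sets below are arbitrary choices for illustration.

    % Check one p x p minor of C = A*B against the Cauchy-Binet sum.
    p = 2; A = randn(4, 6); B = randn(6, 5); C = A*B;
    I = [1 3];  K = [2 5];                     % row set of A (and C), column set of B (and C)
    lhs = det(C(I, K));
    rhs = 0;
    J = nchoosek(1:size(A,2), p);              % all inner index sets j1 < j2
    for t = 1:size(J,1)
        rhs = rhs + det(A(I, J(t,:))) * det(B(J(t,:), K));
    end
    disp(lhs - rhs)                            % agrees to roundoff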
Corollary 6. Let F ∈ R^{n,p} have orthonormal columns, i.e., F^T F = I_p. Let X ∈ R^{n,p}. If span(F) = span(X), then vol(X) = det(F^T X) = Pl(F)^T · Pl(X).

Theorem 7. Let F, V ∈ R^{n,p} be arbitrary. Then
\[
|\,\mathrm{Pl}(F)^T \mathrm{Pl}(V)\,| = |\det(F^T V)| = \mathrm{vol}(F)\,\mathrm{vol}(V)\prod_{i=1}^{p}\cos\theta_i,
\]
where θ_1,...,θ_p are the principal angles between span(F) and span(V).

Remark 1. If S_p is some p-dimensional surface, it is convenient for F^{(i)} to be a set of p orthonormal tangent vectors on the surface at some point x^{(i)} and V^{(i)} to be any "little" parallelopiped on the surface.
If we decompose the surface into parallelopipeds we have
\[
\mathrm{vol}(S_p) \approx \sum_i \mathrm{Pl}(F^{(i)})^T \mathrm{Pl}(V^{(i)})
\]
and
\[
\int f(x)\, d(\mathrm{surface}) \approx \sum_i f(x^{(i)})\, \mathrm{Pl}(F^{(i)})^T \mathrm{Pl}(V^{(i)}) = \sum_i f(x^{(i)})\, \det\bigl((F^{(i)})^T V^{(i)}\bigr).
\]
Mathematicians write the continuous limit of the above equation as
\[
\int f(x)\, d(\mathrm{surface}) = \int f(x)\,(F^T dx)^{\wedge}.
\]
Notice that (F(x)^T dx)^∧ formally computes Pl(F(x)). Indeed
\[
(F(x)^T dx)^{\wedge} = \mathrm{Pl}(F(x))^T \begin{pmatrix} \vdots \\ dx_{i_1}\wedge\cdots\wedge dx_{i_p} \\ \vdots \end{pmatrix}_{i_1<\cdots<i_p}.
\]
III. Generalized dot products algebraically: Linear functions l(x) for x ∈ R^n may be written l(x) = Σ_i f_i x_i, i.e., f^T x. The information that specifies the linear function is exactly the components of f.

Similarly, consider functions l(V) for V ∈ R^{n,p} that have the form l(V) = det(F^T V), where F ∈ R^{n,p}. The reader should check how much information is needed to "know" l(V) uniquely. (It is "less than" all the elements of F.) In fact, if we define

E_{i_1 i_2 ... i_p} = the n×p matrix consisting of columns i_1,...,i_p of the identity,

i.e., (E = eye(n); E_{i_1,...,i_p} = E(:,[i_1 ... i_p])) in pseudo-Matlab notation, then knowing l(E_{i_1,...,i_p}) for all i_1 < ... < i_p is equivalent to knowing l everywhere. This information is precisely all p×p subdeterminants of F, the Plücker coordinates.
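In the same pseudo-Matlab spirit, here is a sketch confirming that evaluating l on the test matrices E_{i_1,...,i_p} recovers exactly the p×p subdeterminants of F (the printing format is ours).

    % l(V) = det(F'*V) evaluated on columns of the identity gives the Plucker coordinates.
    n = 5; p = 2;
    F = randn(n, p);
    E = eye(n);
    idx = nchoosek(1:n, p);
    for k = 1:size(idx,1)
        lE     = det(F' * E(:, idx(k,:)));     % l evaluated on E_{i1,...,ip}
        minorF = det(F(idx(k,:), :));          % the corresponding subdeterminant of F
        fprintf('%10.6f %10.6f\n', lE, minorF) % the two columns match
    end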
IV. Generalized dot product geometrically. If F and V ∈ R^{n,p}, we can generalize the familiar f^T v = ‖f‖‖v‖ cos θ. It becomes
\[
|\,\mathrm{Pl}(F)^T \mathrm{Pl}(V)\,| = |\det(F^T V)| = \mathrm{vol}(F)\,\mathrm{vol}(V)\prod_{i=1}^{p}\cos\theta_i.
\]
Here vol(M) denotes the volume of the parallelopiped that is the image of the unit cube under M (the box with edges equal to the columns of M). Everyone knows that there is an angle between any two vectors in R^n. Less well known is that there are p principal angles between two p-dimensional hyperplanes in R^n. The θ_i are the principal angles between the two hyperplanes spanned by F and by V.
Easy special case: Suppose F^T F = I_p and span(F) = span(V). In other words, the columns of F are an orthonormal basis for the column space of V. In this case
\[
\det(F^T V) = \mathrm{vol}(V).
\]
Other formulas for vol(V), which we mention for completeness, are
\[
\mathrm{vol}(V) = \prod_{i=1}^{p} \sigma_i(V),
\]
the product of the singular values of V, and
\[
\mathrm{vol}(V) = \sqrt{\ \sum_{i_1<\cdots<i_p} V\binom{i_1 \ldots i_p}{1\ \ldots\ p}^{2}\ },
\]
the square root of the sum of the p×p subdeterminants squared. In other words, vol(V) = ‖Plücker(V)‖.
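All of these relations are easy to verify numerically. The cosines of the principal angles are the singular values of Q_F^T Q_V, where Q_F and Q_V are orthonormal bases of the two column spaces (computed below with orth and svd, one standard recipe); the sketch checks |det(F^T V)| = vol(F) vol(V) ∏ cos θ_i for random data.

    % |det(F'*V)| = vol(F) * vol(V) * prod(cos(theta_i))
    n = 6; p = 3;
    F = randn(n, p); V = randn(n, p);
    Qf = orth(F); Qv = orth(V);                % orthonormal bases of span(F) and span(V)
    costheta = svd(Qf' * Qv);                  % cosines of the p principal angles
    lhs = abs(det(F' * V));
    rhs = prod(svd(F)) * prod(svd(V)) * prod(costheta);
    disp(lhs - rhs)                            % agrees to roundoff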
1. Volume measuring function: (Volume element of Integration)
We now more explicitly consider V ∈ R^{n,p} as a parallelopiped generated by its columns. For general F ∈ R^{n,p}, det(F^T V) may be thought of as a "skewed" volume.
We now carefully interpret ∧_{i=1}^{p} (F^T dx)_i, where F ∈ R^{n,p} and dx = (dx_1 dx_2 ... dx_n)^T. We remind the reader that (F^T dx)_i = Σ_{j=1}^{n} F_{ji} dx_j.
By ∧_{i=1}^{p} (F^T dx)_i we are thinking of the linear map that takes V ∈ R^{n,p} to the scalar det(F^T V). In our mind we imagine the following assumptions:

• We have some p-dimensional surface S_p in R^n.

• At some point P on S_p, the columns of X ∈ R^{n,p} are tangent to S_p. We think of X as generating a little parallelopiped on the surface of S_p.

• We imagine the columns of F are an orthonormal basis for the tangents to S_p at P.

Then ∧_{i=1}^{p} (F^T dx)_i = (F^T dx)^∧ is a function that replaces parallelopipeds on S_p with their Euclidean volume.
Now we can do "the calculus thing." Suppose we want to compute
\[
I = \int_{S_p} f(x)\, d(\mathrm{surface}),
\]
the integral of some scalar function f on the surface S_p.
We discretize by circumscribing the surface with little parallelopipeds that we specify by using the matrices V^{(i)} ∈ R^{n,p}. The word circumscribing indicates that the p columns of V^{(i)} are tangent (or reasonably close to tangent) at x^{(i)}.
Let F^{(i)} be an orthonormal basis for the tangents at x^{(i)}. Then I is discretized by
\[
I \approx \sum_i f(x^{(i)})\, \det\bigl((F^{(i)})^T V^{(i)}\bigr).
\]
We then use the notation
\[
I = \int_{S_p} f(x) \bigwedge_{i=1}^{p} (F^T dx)_i = \int_{S_p} f(x)\,(F^T dx)^{\wedge}
\]
to indicate this continuous limit. Here f is a function of x.
The notation does not require that F be orthonormal or tangent vectors. These conditions guarantee 
that you get the correct Euclidean volume. Let them go and you obtain some warped or weighted volume. 
Linear combinations are allowed too. 
The integral notation does require that you feed in tangent vectors if you discretize. Careful mathematics 
shows that for the cases we care about, no matter how one discretizes, the limit of small parallelopipeds 
gives a unique answer. (This is the analog of the fact that no matter how you take small rectangles to compute a Riemann integral, in the limit there is only one area under the curve.)
15.3 Overview of special surfaces 
We are very interested in the following three mathematical objects: 
1. The sphere = {x : ‖x‖ = 1} in R^n.

2. The orthogonal group O(n) of orthogonal matrices Q (Q^T Q = I) in R^{n,n}.

3. The Stiefel manifold of tall skinny matrices Y ∈ R^{n,p} with orthonormal columns (Y^T Y = I_p).
Of course the sphere and the orthogonal group are special cases of the Stiefel manifold with p = 1 and p = n 
respectively. 
We will derive and explain the meaning of the following volume forms on these objects:

Sphere: (H^T dq)^∧_{i=2,...,n}, where H is a so-called Householder transform.

Orthogonal group: (Q^T dQ)^∧ = (Q^T dQ)^∧_{i>j}.

Stiefel manifold: (Q^T dY)^∧_{i>j}, where Q^T Y = I_p.
The concept of volume or surface area is so familiar that the reader might not see the need for a formalism 
to encode this concept. 
Here are some examples. 
Example 1: Integration about a circle. Let x = (x_1, x_2)^T. Then
\[
\int_{x\in R^2,\ \|x\|=1} f(x)\,(-x_2\,dx_1 + x_1\,dx_2) = \int_0^{2\pi} f(\cos\theta,\sin\theta)\, d\theta.
\]
Algebraic proof: (x_1, x_2) = (cos θ, sin θ) implies (dx_1, dx_2) = (−sin θ, cos θ) dθ, so
\[
-x_2\,dx_1 + x_1\,dx_2 = \begin{pmatrix} -\sin\theta\\ \cos\theta\end{pmatrix}^{T} \begin{pmatrix} -\sin\theta\\ \cos\theta\end{pmatrix} d\theta = d\theta.
\]
Geometric Interpretation: Approximate the circle by a polygon with k sides. Then
\[
\sum_i f(x^{(i)}) \begin{pmatrix} -x_2^{(i)} \\ x_1^{(i)} \end{pmatrix}^{T} \bigl(x^{(i)} - x^{(i-1)}\bigr) \;\approx\; \int_0^{2\pi} f(\cos\theta,\sin\theta)\, d\theta.
\]
Note that the dot product computes (approximately) the edge length ‖v^{(i)}‖ = ‖x^{(i)} − x^{(i−1)}‖.
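This polygonal approximation is easy to try. The sketch below takes f(x) = x_1², whose exact integral around the circle is ∫_0^{2π} cos²θ dθ = π; the choice of f and the number of sides k are ours.

    % Polygon approximation of the line integral around the unit circle.
    f = @(x) x(1)^2;                           % exact answer: pi
    k = 1000;
    theta = linspace(0, 2*pi, k+1);
    X = [cos(theta); sin(theta)];              % vertices x^(0), ..., x^(k), with x^(k) = x^(0)
    I = 0;
    for i = 2:k+1
        x = X(:, i);                           % x^(i)
        v = X(:, i) - X(:, i-1);               % edge x^(i) - x^(i-1)
        t = [-x(2); x(1)];                     % tangent (-x2, x1) at x^(i)
        I = I + f(x) * (t' * v);               % f(x^(i)) times approximate edge length
    end
    disp(I - pi)                               % small for large k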
Geometric interpretation of the wrong answer: Why not ∫_{x∈R², ‖x‖=1} f(x) dx_1? This is approximated by
\[
\sum_i f(x^{(i)}) \begin{pmatrix} 1 \\ 0 \end{pmatrix}^{T} v^{(i)},
\]
which is the integral of f over the "shadow" of the circle on the x-axis.
Example 2: Integration over a sphere. Given q with ‖q‖ = 1, let H(q) be any n×n orthogonal matrix with first column q. (There are many ways to do this. One way is to construct the "Householder" reflector H(q) = I − 2vv^T/(v^T v), where v = e_1 − q, and e_1 is the first column of I.)
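A direct construction is a few lines (a sketch of the Householder choice just described; it assumes q ≠ e_1 so that v ≠ 0):

    % H(q) = I - 2*v*v'/(v'*v) with v = e1 - q has first column q.
    n = 5;
    q = randn(n, 1); q = q / norm(q);          % a random point on the sphere
    e1 = zeros(n, 1); e1(1) = 1;
    v = e1 - q;
    H = eye(n) - 2 * (v * v') / (v' * v);
    disp(norm(H(:,1) - q))                     % first column of H is q
    disp(norm(H*q - e1))                       % H maps q to e1
    disp(norm(H'*H - eye(n)))                  % H is orthogonal (and symmetric)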
The sphere is an n−1 dimensional surface in n dimensional space. Integration over the sphere is then
\[
\int_{x\in R^n,\ \|x\|=1} f(x) \bigwedge_{i=2}^{n} (H^T dx)_i = \int_{x\in R^n,\ \|x\|=1} f(x)\, dS,
\]
where dS is the surface "volume" on the sphere.
Consider the familiar sphere, n = 3. Approximate the sphere by a polyhedron of parallelograms with k faces. A parallelogram may be algebraically encoded by the two vectors forming its edges, collected into a matrix V^{(i)}. Then Σ_i f(x^{(i)}) · det(F^{(i)T} V^{(i)}), where F^{(i)} consists of the last two columns of H(x^{(i)}), approximates this integral.
Example 3: The orthogonal group. Let O(n) denote the set of orthogonal n×n matrices. This is an n(n−1)/2 dimensional set living in R^{n²}.
Tangents consist of matrices T such that Q^T T is anti-symmetric. We can take an orthogonal, but not orthonormal, set of tangents to consist of the matrices Q(E_{ij} − E_{ji}) for i > j (where (E_{ij})_{kl} = 1 if i = k and j = l, and 0 otherwise). The matrix "F^T V" roughly amounts to taking twice the triangular part of Q^T dQ.
Let O(m) denote the "orthogonal group" of m×m matrices Q such that Q^T Q = I. We have seen that (Q^T dQ)^∧ = ∧_{i>j} q_i^T dq_j is the natural volume element on O(m). Also, notice that O(m) has two connected components. When m = 2, we may take
\[
Q = \begin{pmatrix} c & -s\\ s & c \end{pmatrix} \qquad (c^2 + s^2 = 1)
\]
for half of O(2). This gives
\[
Q^T dQ = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} -\sin\theta\, d\theta & -\cos\theta\, d\theta\\ \cos\theta\, d\theta & -\sin\theta\, d\theta \end{pmatrix}
= \begin{pmatrix} 0 & -d\theta\\ d\theta & 0 \end{pmatrix}
\]
in terms of dθ. Therefore (Q^T dQ)^∧ = dθ.
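In the numerical spirit of the differentials used earlier in these notes, one can confirm this with a small finite difference (the step size dtheta is our own choice):

    % Finite-difference check that Q'*dQ = [0 -dtheta; dtheta 0] on O(2).
    theta = 0.7; dtheta = 1e-6;
    Q  = [cos(theta) -sin(theta); sin(theta) cos(theta)];
    Q2 = [cos(theta+dtheta) -sin(theta+dtheta); sin(theta+dtheta) cos(theta+dtheta)];
    dQ = Q2 - Q;                               % numerical differential of Q
    disp(Q' * dQ / dtheta)                     % approximately [0 -1; 1 0]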
15.4 The Sphere 
Students who have seen any integral on the sphere before probably have worked with traditional spherical 
coordinates or integrated with respect to something labeled “the surface element of the sphere.” We mention 
certain problems with these notations. Before we do, we mention that the sphere is so symmetric and so 
easy, that these problems never manifest themselves very seriously on the sphere, but they become more 
serious on more complicated surfaces. 
The first problem concerns spherical coordinates: the angles are not symmetric. They do not interchange nicely. Often one wants to preserve the spherical symmetry by writing x = qr, where r = ‖x‖ and q = x/‖x‖. Of course, q then has n components expressing n−1 parameters. The n quantities dq_1,...,dq_n are linearly dependent. Indeed, differentiating q^T q = 1 we obtain that q^T dq = Σ_{i=1}^{n} q_i dq_i = 0.
Writing the Jacobian from x to (q, r) is slightly awkward. One choice is to write the radial and angular parts separately. Since dx = q dr + r dq,
\[
q^T dx = dr \quad\text{and}\quad (I - qq^T)\, dx = r\, dq.
\]
We then have that
\[
(dx) = dr \wedge (r\, dq) = r^{n-1}\, dr\, (dq),
\]
where (dq) is the surface element of the sphere.
We introduce an explicit formula for the surface element of the sphere. Many readers will wonder why this is necessary. Experience has shown that one need only set up a notation such as dS for the surface element of the sphere, and most integrals work out just fine. We have two reasons for introducing this formula, both pedagogical. The first is to understand wedge products on a curved space in general. The sphere is one of the nicest curved spaces to begin working with. Our second reason is that when we work on spaces of orthogonal matrices, both square and rectangular, it becomes more important to keep track of the correct volume element. The sphere is an important stepping stone for this understanding.
We can derive an expression for the surface element on a sphere. We introduce an orthogonal and symmetric matrix H such that Hq = e_1 and He_1 = q, where e_1 is the first column of the identity. Then
\[
H\,dx = e_1\, dr + r\, H\, dq = \begin{pmatrix} dr\\ r\,(H dq)_2\\ r\,(H dq)_3\\ \vdots\\ r\,(H dq)_n \end{pmatrix}.
\]
Thus
\[
(dx)^{\wedge} = (H\,dx)^{\wedge} = r^{n-1}\, dr \bigwedge_{i=2}^{n} (H dq)_i .
\]
We can conclude that the surface element on the sphere is (dq) = ∧_{i=2}^{n} (Hdq)_i.
[Figure: a curved surface discretized by little parallelopipeds. At sample points x^{(5)} and x^{(17)}, the columns v_1^{(i)}, v_2^{(i)} of V^{(i)} span a small parallelopiped on the surface, while f_1^{(i)}, f_2^{(i)} are orthonormal tangent vectors, the columns of F^{(i)}.]
Householder Reflectors

We can explicitly construct an H as described above so as to be a Householder reflector. Notice that H serves as a rotating coordinate system. It is critical that H(:,2:n) is an orthogonal basis of tangents. Choose v = e_1 − q, the external angle bisector, and
\[
H = I - 2\,\frac{vv^T}{v^T v}.
\]
See Figure 1, which illustrates that H is a reflection through the internal angle bisector q + e_1. Notice that (Hdq)_1 = 0, and every other component (Hdq)_i = Σ_{j=1}^{n} H_{ij} dq_j (i ≠ 1) may be thought of as a tangent on the sphere. H = H^T, Hq = e_1, He_1 = q, H² = I, and H is orthogonal.
Figure 1: The Householder reflection. The figure shows q, e_1, the vector v = q − e_1, and the internal angle bisector q + e_1; the plane of reflection contains q + e_1, and H reflects q into e_1 through this plane.
I think many people do not really understand the meaning of (Q^T dQ)^∧. I like to build a nice cube on the surface of the orthogonal group first. Then we will connect the notations. First of all, O(n) is an n(n−1)/2 dimensional surface in R^{n²}. At any point Q, tangents have the form Q·A, where A is anti-symmetric.
The dot product of two "vectors" (matrices) X and Y is X·Y ≡ tr(X^T Y) = Σ_{i,j} X_{ij} Y_{ij}. If Q is orthogonal, QX·QY = X·Y.
If Q = I then the matrices A_{ij} = (E_{ij} − E_{ji})/√2 for i < j clearly form an n(n−1)/2 dimensional cube tangent to O(n) at I, i.e., they form an orthonormal basis for the tangent space. Similarly, the n(n−1)/2 matrices Q(E_{ij} − E_{ji})/√2 form such a basis at Q.
One can form an F whose columns are vec(Q(E_{ij} − E_{ji})) for i < j. The matrix would be n² by n(n−1)/2. Then the wedge of the n(n−1)/2 entries of F^T vec(dQ) would be the Euclidean volume element. Mathematicians like to throw away the √2, so that the volume element is off by the factor 2^{n(n−1)/4}. This does not seem to bother anyone. In non-vec format, this is (Q^T dQ)^∧.
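A sketch that builds this vec'd tangent basis at a random orthogonal Q and confirms that (with the √2) it really is orthonormal:

    % Orthonormal tangent basis of O(n) at Q: columns vec(Q*(Eij - Eji)/sqrt(2)), i > j.
    n = 4;
    [Q, ~] = qr(randn(n));                     % a random orthogonal matrix
    F = zeros(n^2, n*(n-1)/2);
    col = 0;
    for j = 1:n
        for i = j+1:n
            A = zeros(n); A(i,j) = 1; A(j,i) = -1;   % Eij - Eji
            col = col + 1;
            F(:, col) = reshape(Q * A / sqrt(2), [], 1);
        end
    end
    disp(norm(F' * F - eye(n*(n-1)/2)))        % zero to roundoff: the basis is orthonormal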
Application

Surface Area of Sphere Computation

We directly use the formula (dx)^∧ = r^{n−1} dr (Hdq)^∧:
\[
(2\pi)^{n/2} = \int_{x\in R^n} e^{-\frac{1}{2}\|x\|^2}\, dx
= \int_{r=0}^{\infty} r^{\,n-1} e^{-\frac{1}{2}r^2}\, dr \int (H dq)^{\wedge}
= 2^{\frac{n}{2}-1}\,\Gamma\!\left(\tfrac{n}{2}\right) \int (H dq)^{\wedge},
\]
so
\[
A_n = \int (H dq)^{\wedge} = \frac{(2\pi)^{n/2}}{2^{\frac{n}{2}-1}\,\Gamma(\tfrac{n}{2})} = \frac{2\,\pi^{n/2}}{\Gamma(\tfrac{n}{2})},
\]
and A_n is the surface area of the sphere of radius 1. For example,

A_2 = 2π (circumference of a circle)
A_3 = 4π (surface area of a sphere in 3d)
A_4 = 2π²
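The general formula is easily tabulated (a short sketch using MATLAB's gamma function):

    % Surface area A_n = 2*pi^(n/2)/gamma(n/2) of the unit sphere in R^n.
    for n = 2:5
        fprintf('A_%d = %.6f\n', n, 2*pi^(n/2)/gamma(n/2));   % 2*pi, 4*pi, 2*pi^2, ...
    end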
15.5 The Stiefel Manifold

QR Decomposition

Let A ∈ R^{n,m} (m ≤ n). Taking x = a_1, the first column of A, we see that there exists an H_1 such that the first column of H_1 A is (×, 0, ..., 0)^T. We can then construct
\[
H_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat H_2 \end{pmatrix}
\quad\text{so that}\quad
H_2 H_1 A = \begin{pmatrix} \times & \times & \cdots \\ 0 & \times & \cdots \\ 0 & 0 & \cdots \\ \vdots & \vdots & \\ 0 & 0 & \cdots \end{pmatrix}.
\]
Continuing,
\[
H_m \cdots H_1 A = \begin{pmatrix} R \\ 0 \end{pmatrix},
\quad\text{or}\quad
A = (H_1 \cdots H_m)\begin{pmatrix} R \\ 0 \end{pmatrix},
\]
where R is m×m upper triangular (with positive diagonals). Let Q = the first m columns of H_1 ··· H_m. Then A = QR as desired.
The Stiefel Manifold

The set of Q ∈ R^{n,p} such that Q^T Q = I_p is denoted V_{p,n} and is known as the Stiefel manifold. Considering the Householder construction, there exist H_1, ..., H_p such that
\[
H_p H_{p-1} \cdots H_1 Q = \begin{pmatrix} I_p \\ 0 \end{pmatrix},
\]
so that Q = the first p columns of H_1 H_2 ··· H_{p−1} H_p.

Corollary 8. The Stiefel manifold may be specified by (n−1) + (n−2) + ··· + (n−p) = pn − p(p+1)/2 parameters. You may think of this as the pn arbitrary parameters in an n×p matrix reduced by the p(p+1)/2 conditions that q_i^T q_j = δ_{ij} for i ≥ j. You might also think of this as
\[
\dim\{Q\} = \dim\{A\} - \dim\{R\} = pn - p(p+1)/2.
\]
It is no coincidence that it is more economical in numerical computations to store the Householder parameters than to compute out the Q.
This is the prescription that we would like to follow for the QR decomposition of the n by p matrix A. If Q ∈ R^{n,p} has orthonormal columns, let H be an orthogonal n by n matrix such that H^T Q consists of the first p columns of the identity. Actually, H may be constructed by applying the Householder process to Q. Notice that Q is simply the first p columns of H.

As we proceed with the general case, notice how this generalizes the situation when p = 1. If A = QR, then dA = Q dR + dQ R and H^T dA = H^T Q dR + H^T dQ R. Let H = [h_1, ..., h_n]. The matrix H^T Q dR is an n by p upper triangular matrix, while H^T dQ is (rectangularly) antisymmetric (h_i^T h_j = δ_{ij} implies h_i^T dh_j = −h_j^T dh_i).
Haar Measure and Volume of the Stiefel Manifold

It is evident that the volume element in mn dimensional space decouples into a term due to the upper triangular component and a term due to the orthogonal matrix component. The differential form
\[
(H^T dQ)^{\wedge} = \bigwedge_{j=1}^{m} \bigwedge_{i=j+1}^{n} h_i^T\, dh_j
\]
is a natural volume element on the Stiefel manifold. We may define
\[
\mu(S) = \int_S (H^T dQ)^{\wedge}.
\]
This represents the surface area (volume) of the region S on the Stiefel manifold. This "measure" μ is known as Haar measure when m = n. It is invariant under orthogonal rotations.
Exercise. Let Γ_m(a) ≡ π^{m(m−1)/4} ∏_{i=1}^{m} Γ[a − ½(i−1)]. Show that the volume of V_{m,n} is
\[
\mathrm{Vol}(V_{m,n}) = \frac{2^m\, \pi^{mn/2}}{\Gamma_m(\tfrac{1}{2}n)}.
\]
Exercise. What is this expression when n = 2? Why is this number twice what you might have thought 
it would be? 
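A sketch of the bookkeeping behind these two exercises, with Γ_m coded exactly as defined above (the anonymous functions are ours):

    % Vol(V_{m,n}) = 2^m * pi^(m*n/2) / Gamma_m(n/2).
    Gamma_m = @(m, a) pi^(m*(m-1)/4) * prod(gamma(a - (0:m-1)/2));
    vol = @(m, n) 2^m * pi^(m*n/2) / Gamma_m(m, n/2);
    disp([vol(1, 3), 4*pi])                    % m = 1: the surface area of the sphere in R^3
    disp([vol(2, 2), 4*pi])                    % m = n = 2: twice the circumference 2*pi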
Exercise. Let A have independent elements from a standard normal distribution. Prove that Q and R are independent, and that Q is distributed uniformly with respect to Haar measure. How are the elements of the strictly upper triangular part of R distributed? How are the diagonal elements of R distributed? Interpret the QR algorithm in a statistical sense. (This may be explained in further detail in class.)
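One standard way to experiment with this exercise (a sketch, not a proof): QR-factor a Gaussian matrix and fix the signs so that R has a positive diagonal, which makes the factorization unique.

    % Sample Q from Haar measure on O(n) via QR of a Gaussian matrix.
    n = 4;
    A = randn(n);
    [Q, R] = qr(A);
    S = diag(sign(diag(R)));
    Q = Q * S;  R = S * R;                     % now A = Q*R with diag(R) > 0
    disp(norm(A - Q*R))                        % factorization holds
    disp(norm(Q'*Q - eye(n)))                  % Q is orthogonal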
Readers who may never have taken a course in advanced measure theory might enjoy a loose general description of Haar measure. If G is any group, then we can define the map on ordered pairs that sends (g, h) to g^{−1}h. If G is also a manifold (or some kind of Hausdorff topological space), and if this map is continuous, we have a topological group. An additional assumption one might like is that every g ∈ G has
an open neighborhood whose closure is compact. This is a locally compact topological group. The set of square nonsingular matrices or the set of orthogonal matrices are good examples. A measure μ(E) is some sort of volume defined on sets E, which may be thought of as nice ("measurable") subsets of G. The measure is a Haar measure if μ(gE) = μ(E) for every g ∈ G. In the example of orthogonal n by n matrices, the condition that our measure be Haar is that
\[
\int_{Q\in S} f(Q)\,(Q^T dQ)^{\wedge} = \int_{Q\in Q_0^{-1} S} f(Q_0 Q)\,(Q^T dQ)^{\wedge}.
\]
In other words, Haar measure is symmetrical: no matter how we rotate our sets, we get the same answer. The general theorem is that on every locally compact topological group, there exists a Haar measure μ.
16 Exterior Products (The Algebra) 
Let V be an n-dimensional vector space over R. For p = 0,1,...,n we define the pth exterior product. For p = 0 it is R and for p = 1 it is V. For p = 2, it consists of formal sums of the form
\[
\sum_i a_i\, (u_i \wedge w_i),
\]
where a_i ∈ R and u_i, w_i ∈ V. (We say "u_i wedge w_i.") Additionally, we require that (au + v) ∧ w = a(u∧w) + (v∧w), u ∧ (bv + w) = b(u∧v) + (u∧w), and u∧u = 0. A consequence of the last relation is that u∧w = −w∧u, which we have referred to previously as the anti-commutative law. We further require that if e_1,...,e_n constitute a basis for V, then the e_i ∧ e_j for i < j constitute a basis for the second exterior product.
Proceeding analogously, if the e_i form a basis for V we can produce formal sums
\[
\sum_{\gamma} c_{\gamma}\, (e_{\gamma_1} \wedge e_{\gamma_2} \wedge \cdots \wedge e_{\gamma_p}),
\]
where γ is the multi-index (γ_1,...,γ_p) with γ_1 < ··· < γ_p. The expression is multilinear, and the signs change if we transpose any two elements.
The table below lists the exterior products of a vector space V = {Σ_i c_i e_i}.

p      pth Exterior Product                                                             Dimension
0      V_0 = R                                                                          1
1      V_1 = V = { Σ_i c_i e_i }                                                        n
2      V_2 = { Σ_{i<j} c_{ij} e_i ∧ e_j }                                               n(n−1)/2
3      V_3 = { Σ_{i<j<k} c_{ijk} e_i ∧ e_j ∧ e_k }                                      n(n−1)(n−2)/6
...
p      V_p = { Σ_{i_1<...<i_p} c_{i_1...i_p} e_{i_1} ∧ e_{i_2} ∧ ... ∧ e_{i_p} }        (n choose p)
...
n      V_n = { c e_1 ∧ e_2 ∧ ... ∧ e_n }                                                1
n+1    V_{n+1} = {0}                                                                    0
In this book V = {Σ_i c_i dx_i}, i.e., the 1-forms. Then V_p consists of the p-forms, i.e., the rank p exterior differential forms.

