Introduction to Scientific
Computing
-- A Matrix Vector Approach Using Matlab
Written by Charles F.Van Loan
陈 文 斌
Wbchen@fudan.edu.cn
复旦大学
Matrix Computations
? Setting Up Matrix Problems
? Matrix Operations
? Once Again,Setting Up Matrix Problems
? Recursive Matrix Operations
? Distributed Memory Matrix Multiplication
? Matrix-vector multiplication
? Matrix- Matrix multiplication
? Often the amount of work that is required to
initialize an n-by-n matrix is as much as the
work required to solve for x
? Fast Fourier transform and fast Strassen
matrix multiply algorithm
Ax=b
Setting up Matrix problems
Hilbert matrix
1
1
??
?
ji
a ij
A=zeros(n,n);
for i=1:n
for j=1:n
A(i,j)=1/(i+j-1);
end
end
A=zeros(n,n);
for i=1:n
for j=i:n
A(i,j)=1/(i+j-1);
A(j,i)=A(i,j);
end
end
Simple ij Recipes
function H = hilb(n)
%HILB Hilbert matrix.
% This is also a good example of efficient MATLAB programming
% style where conventional FOR or DO loops are replaced by
% vectorized statements,This approach is faster,but uses
% more storage.
% C,Moler,6-22-91.
% Copyright 1984-2001 The MathWorks,Inc,
% $Revision,5.9 $ $Date,2001/04/15 12:02:29 $
J = 1:n; J = J(ones(n,1),:); I = J';
E = ones(n,n); H = E./(I+J-1);
The setting up of a matrix can often be made more efficient
by exploiting relationships that exist between the entries.
??
?
?
?
??
????
?
?
??
?
?
?
?
?
o t h e r w is e0
0 if
)!(!
!
1
1 mk
kmk
m
k
m
p mk
O(n3) flops
?
?
?
?
?
?
?
?
?
?
?
?
?
1331
0121
0011
0001
P
jijiij ppp,11,1 ??? ??
O(n2) flops
P=zeros(n,n);
P(:,1)=ones(n,1);
for i=2:n
for j=2:i
P(i,j)=P(i-1,j-1)+P(i-1,j);
end
end
O(n2)
flops
?
?
?
?
?
?
?
?
?
?
?
?
?
3
4
2
44
3
3
2
33
3
2
2
22
3
1
2
11
1
1
1
1
xxx
xxx
xxx
xxx
V
Matrix defined by a vector of parameters
n=length(x,y);
V(:,1)=ones(n,1);
for j=2:n
V(:,j)=x.*V(:,j-1);
end
?
?
?
?
?
?
?
?
?
?
?
?
?
1432
2143
3214
4321
aaaa
aaaa
aaaa
aaaa
C
Circulant
matrices
function C = Circulant1(a)
n = length(a); C = zeros(n,n);
for i=1:n
for j=1:n
C(i,j) = a(rem(n-i+j,n)+1);
end
end
function C = Circulant2(a)
n = length(a);C = zeros(n,n);
C(1,:) = a;
for i=2:n
C(i,:) = [ C(i-1,n) C(i-1,1:n-1) ];
end
1)m o d)(( ???? njinij ac
n time for Circulant1/Circulant2,
100 24.694
200 35.157
400 48.468
nRepeat = 20;
for n=[100 200 400]
a = 1:n;
tic; for k=1:nRepeat,C = Circulant1(a); end,t1 = toc;
tic; for k=1:nRepeat,C = Circulant2(a); end,t2 = toc;
disp(sprintf(' %4.0f %5.3f',n,t1/t2))
end
?
?
?
?
?
?
?
?
?
?
?
?
?
1432
2143
3214
4321
aaaa
baaa
bbaa
bbba
T
toeplitz
matrices
If a and b are n-vectors,then T=toeplitz(a,b) sets up the matrix
?
?
?
?
?
?
?
?
ijr
jic
t
ij
ji
ij
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?????
????
???
??
?
?
0
00
000
0000
L
Lower triangular
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
???
????
?????
?
0000
000
00
0
U
Upper triangular
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
???
???
???
??
?
000
00
00
00
000
T
Tridiagonal
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
????
?????
??????
??????
??????
?????
????
???
?
0000
000
00
00
00
000
0000
00000
T
Lower
bandwith 3
Upper bandwidth 2
3,0 ??? jia ij
2,0 ??? ija ij
Diag
DIAG(V,K) when V is a vector with N components is a square
matrix of order N+ABS(K) with the elements of V on the K-th
diagonal,K = 0 is the main diagonal,K > 0 is above the main
diagonal and K < 0 is below the main diagonal.
m = 5;
diag(-m:m) + diag(ones(2*m,1),1) + diag(ones(2*m,1),-1)
DIAG(DIAG(X)) is a diagonal matrix
tril
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
43215
32154
21543
15432
54321
X
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
43215
32154
21543
05432
00321
)2,( Xt r i l
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
00215
00054
00003
00000
00000
)2,( Xt r i l
?
?
?
??
??
?
kij
kijx
b ijij
0
tril(X,k)
triu
?
?
?
??
??
?
kji
kjix
b ijij
0
triu(X,k)
T=-triu(tril(ones(6,6),1),-1)+3*eyes(6,6)
T=-diag(ones(5,1)-1)+diag(2*ones(6,1),0)-diag(ones(5,1),1)
T=toeplitz([2;-1;zeros(4,1)],[2;-1;zeros(4,1)])
Block structure
A11=[10 11 12; 13 14 15;16 17 18];
A12=[20;21;22]; A13=[30 31;32 33;34 35];
A21=[40 41 42; 43 44 45];A22=[50; 51];
A23=[60 61;62 63];
A=[A11 A12 A13; A21 A22 A23];
C=A(2:4,5:6);
Matrix-Vector multiplication
Axy ?
?
?
?
n
j
jiji xay
1[m,n]=size(A);
y=zeros(m,1);
for i=1:m
for j=1:n
y(i)=y(i)+A(i,j)*x(j);
end
end
2mn flops
function y = MatVecR0(A,x)
[m,n] = size(A);
y = zeros(m,1);
for i=1:m
y(i) = A(i,:)*x;
end
Row-
oriented
function y = MatVecC0(A,x)
[m,n] = size(A);
y = zeros(m,1);
for j=1:n
y = y + A(:,j)*x(j);
end
Column-
oriented
saxpy
vector←scalar*vector+vector
If the matrix entries aij are stored column by
column in memory,the the saxpy version accesses
A-entries that are contiguous in memory,In contract
the row-oriented algorithm access noncontiguous aij.
Exploiting structure
A is upper triangular
[n,n]=size(A);
y=zeros(n,1);
for i=1:n
y(i)=A(i,1:i)*x(1:i)
end
[n,n]=size(A);
y=zeros(n,1);
for j=1:n
y(j:n)=A(j:n,j)*x(j)+y(j:n)
end
Matrix-Matrix Multiplication
ABCRBRA mrrm ??? ??,,
?
?
?
r
k
kjikij bac
1
C=zeros(m,n);
for j=1:n
for i=1:m
for k=1:r
C(i,j)=C(i,j)+A(i,k)*B(k,j);
end; end;end
C=A*B
Four Versions
function C = MatMatDot(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
for i=1:m
C(i,j) = A(i,:)*B(:,j);
end
end
First Version:Dot Product Version
function C = MatMatSax(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
for k=1:p
C(:,j) = C(:,j) + A(:,k)*B(k,j);
end
end
Second version,saxpys
function C = MatMatVec(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
C(:,j) = C(:,j) + A*B(:,j);
end
Third Version,via matrix-vector products
? ?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
nmmm
n
n
n
m
T
vuvuvu
vuvuvu
vuvuvu
vvv
u
u
u
uv
?
????
?
?
?
?
21
22212
12111
21
2
1
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
p
k
kBkA
pB
B
B
pAAAABC
1
:),()(,,
:),(
:),2(
:),1(
)](,,||)2(,,|)1(,,[
?
?
outer products
function C = MatMatOuter(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for k=1:p
C = C + A(:,k)*B(k,:);
end
Fourth Version,via outer products
n Dot Saxpy MatVec Outer Direct
------------------------------------------------
100 0.3030 0.3790 0.0270 0.0440 0.0060
200 2.4500 3.0810 0.1370 0.6590 0.0280
400 11.0400 13.4630 1.5430 6.1570 0.2250
MatBench,repeat=100,n=100,200,400
Error and Norms
|||| 11 nxxx ??? ?
22
2
2
12 nxxxx ???? ?
? ?||,|,|m a x 1 nxxx ???
?? ?? xnxx 1 ?? ?? xnxx 2
norm(x,1)
norm(x,2)
norm(x,inf)
?
???
?
m
i
ijnj aA
11
1 m a x
?
???
? ?
n
j
ijmi aA
11
m a x
212
2
m a x AxA
x ?
?
? ?
? ?
?
m
i
n
j
ijF aA
1 1
2
norm(A,1)
norm(A,2)
norm(A,inf)
norm(A,’fro’)
11
a n d w h e r e?t h e n
,of v e r s io n s t o r e d t h eis ? If 5 T h e o r e m
Ae p sE
REEAA
RAA
nm
nm
?
???
?
?
?
BAe p sABABfl ??)(
n 1-norm factor
------------------------
2 0.231
3 0.408
4 0.339
5 0.265
6 0.266
7 0.242
8 0.241
9 0.208
10 0.203
norm(E,1)/(eps3*norm(A,1)*norm(B,1))
ProdBound.m
Examines the error in 3-digit matrix multiplication.
Once Again,Setting Up Matrix Problems
On numerous occasions we have been
required to evaluate a continuous function f(x)
on a vector of values.
Two-dimensional tables of function values
)3( 22e x p),( yxyxf ??? )3( 22 ii yx
ij ef
???
v=linspace(0,1,n);
F=zeros(n,n);
for i=1:n
for j=1:n
F(i,j)=exp(-(v(i)^2+3*v(j)^2);
end
end
v=linspace(0,1,n);
F=zeros(n,n);
for j=1:n
F(:,j)=exp(-(v.^2+3*v(j)^2);
end
end
v=linspace(0,1,n);
A=zeros(n,n);
for i=1:n
for j=1:n
A(i,j)=-(v(i)^2+3*v(j)^2);
end
end
F=exp(A);
function F = SampleF(x,y)
n = length(x);
m = length(y);
A = -((2*y.^2)*ones(1,n) + ones(m,1)*(x.^2)')/4;
F = exp(A);
Notice that the matrix A is the sum of two outer products and
that 4/)2( 22 ijij yxa ???
Contour Plots
Contour,f(x,y)=c
x = linspace(-2,2,50)';
y = linspace(-1.5,1.5,50)';
F = SampleF(x,y);
Contour(x,y,F)
contour(x,y,F,5);
contour(x,y,F,[1,8,6,4,2])
c= contour(x,y,F,4);
clabel(c,'manual');
Spotting Matrix-Vector Products
? ?? ba dc d y d xyxfI ),(
? ?
?
???b
a
N
i
xii
x
Qxgwabdxxg
1
)()()(
? ?
?
???d
c
N
j
yjj
y
Qxgcddyyg
1
)()()( ?
? ? ??
?
??????????????? ba
N
i
d
c ii
d
c
x
dyyxfwabdxdyyxfI
1
),()(),(
? ?
? ?
???
?
???
?
???
x yN
i
N
j
jiji yxfcdwab
1 1
),()()( ?
? ?
? ?
???
?
?
???
?
???
x yN
i
N
j
jiji Qyxfwcdab
1 1
),())(( ?
?FwcdabQ T))(( ???
function numI2D = CompQNC2D(fname,a,b,c,d,mx,nx,my,ny)
[omega,x] = wCompNC(a,b,mx,nx);
[mu,y] = wCompNC(c,d,my,ny);
F = feval(fname,x,y);
numI2D = (b-a)*(d-c)*(omega'*F*mu);
Subintervals Integral Time
------------------------------------------------
32 46.220349653054718 0.0600
64 46.220349653055116 0.1600
128 46.220349653055095 0.3300
256 46.220349653055102 1.1600
512 46.220349653055123 3.8400
For larger values of n,we would find that the amount of
computation increases by a factor of 4 with a doubling of n
reflecting the O(n2) nature of the caculation.
? ? ???????? ?????????? 20 20 2222 )1.)3.)((1.)7.(( 1)1.)4.)((1.)3.(( 1 d y d xyxyxI
Recursive Matrix Operations
Some of the most interesting algorithmic developments
in matrix computations are recursive,Two examples are
given in this section,The first is the fast Fourier
transform,a super-quick way of computing a special,a
very important matrix-vector product,The second is a
recursive matrix multiplication algorithm that involves
markedly fewer flops than the conventional algorithm.
The Fast Fourier Transform
The discrete Fourier transform(DFT) matrix is a complex
Vandermonde matrix.
?
?
?
?
?
?
?
?
?
?
?
?
?
9
4
6
4
3
4
6
4
4
4
2
4
3
4
2
44
4
1
1
1
1111
www
www
www
F
iw ?4
i=sqrt(-1);
F=[1 1 1 1; 1 –i –1 i; 1 –1 1 –1; 1 i –1 –i];
)/2s i n ()/2c o s ()/2e x p ( ninniw n ??? ????
For general n,the DFT matrix is defined in terms of
In particular,the n-by-n DFT matrix is defined by
)1)(1(),( ???? qp
npqpqn wffF
F=ones(n,n);
F(:,2)=exp(-2*pi*sqrt(-1)/n).^(0,n-1)’;
for k=3:n
F(:,k)=F(:,2).*F(:,k-1);
end
It is possible to computer y= Fn x without explicitly forming the
DFT matrix Fn
n=length(x);
y=x(1)*ones(n,1);
for k=2:n
y=y+exp(-2*pi*sqrt(-1)*(k-1)*(0:n-1)’)*x(k);
end
k
kn
n
k
n
k
n
x
w
w
w
y
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
?
?
)1)(1(
)1(2
1
1
?
1?nnw
},,,,1{ 12 ?nnnn www ?
nm
n
m
n ww
m o d ?
function y = DFT(x)
n = length(x);
y = x(1)*ones(n,1);
if n > 1
v = exp(-2*pi*sqrt(-1)/n).^(0:n-1)';
for k=2:n
z = rem((k-1)*(0:n-1)',n ) +1;
y = y + v(z)*x(k);
end
end
O(n2)
O(nlog2n)
The starting point is to look at an even order DFT matrix when
we permute its columns so that the odd-indexed columns come
first,
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
wwwwwww
wwwwww
wwwwwww
wwww
wwwwwww
wwwwww
wwwwwww
F
234567
246246
364725
4444
527463
642642
765432
8
1
11
1
1111
1
11
1
11111111
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
????
????
????
????
?
573246
626244
753642
573246
626244
753642
8
1
11
1
11111111
1
11
1
11111111
)(,,
wwwwwww
wwwwww
wwwwwww
wwwwwww
wwwwww
wwwwwww
c o l sF
Cols=[1 3 5 7 2 4 6 8]
Noting that
4
2
8
2 www ??
?
?
?
?
?
?
?
?
44
44
8 )(,,DFF
DFF
c o l sF
?
?
?
?
?
?
?
?
?
?
?
?
?
3
2
1
w
w
w
D
?
?
?
?
?
?
?
?
?
?
?
?
?
??
)8:2:2(
)8:2:1(
)()(,,
44
44
8 x
x
DFF
DFF
c o lsxc o lsFxF
?
?
?
?
?
?
?
?
?
?
?
?
?
?
)8:2:2(
)8:2:1(
4
4
xF
xF
DI
DI
y=F8x yT=F4x(1:2:8)y
B=F4x(2:2:8)
y(1:m)=yT+d,yB
y(m+1:n)=yT - d,yB
?
?
?
?
?
?
?
?
?
?
?
?
?
3
2
1
w
w
w
d
n=2m
y=Fnx
yT=Fmx(1:2:n)
yB=Fmx(2:2:n)
y(1:4)=yT+d,yB
y(5:8)=yT - d,yB
?
?
?
?
?
?
?
?
?
?
?
?
?
?1
1
m
n
w
w
d
?
function y = FFTRecur(x)
n = length(x);
if n ==1
y = x;
else
m = n/2;
yT = FFTRecur(x(1:2:n));
yB = FFTRecur(x(2:2:n));
d = exp(-2*pi*sqrt(-1)/n).^(0:m-1)';
z = d.*yB;
y = [ yT+z ; yT-z ];
end
Radix-2 FFT
FFTRecur is slower than FFT,Since FFTRecur involves more
flops than FFT concerns the computation of the,weight vector”
d,As it stands,there is a considerable amount of redundant
computation with respect to the exponential values that are
required,This can be avoided by precomputing the weights,
storing them in a vector,and then merely,looking up” the values
as they are needed during the recursion,With care,the amount of
work required by a radix-2 FFT is 5nlog2n flops
n DFT FFTRecur FFT
Time Time Time
------------------------------------------
4 0 1.100000e-001 0
8 6.000000e-002 1.100000e-001 0
16 1.600000e-001 2.200000e-001 0
32 3.300000e-001 5.500000e-001 0
64 8.800000e-001 1.100000e+000 0
128 3.180000e+000 2.310000e+000 0
256 9.500000e+000 4.620000e+000 5.000000e-002
512 3.263000e+001 9.390000e+000 0
1024 1.198500e+002 1.922000e+001 0
Strassen Multiplication
??
?
??
?
??
???
??
?
??
?
??
?
??
??
??
?
??
?
2222122121221121
2212121121121111
2221
1211
2221
1211
2221
1211
BABABABA
BABABABA
BB
BB
AA
AA
CC
CC
6231224221
5312754111
222122127
1211112162212115
11212242212113
1122212221122111
,
,
))((
))((,)(
)(),(
)(),)((
PPPPCPPC
PPCPPPPC
BBAAP
BBAAPBAAP
BBAPBBAP
BAAPBBAAP
??????
??????
???
?????
????
?????
Strassen multiplication scheme
n=2m,Strassen, 7 multiplications m*m matrix
18 matrix additions
2(7m3)+18m2=7/8(2n3)+9/2n2
Conventional algorithms
2n3+n2
n Strass Times Ordinary times
-------------------------------------
32 0.000 0.930
64 0.060 0.000
128 0.110 0.000
256 0.770 0.050
512 5.550 0.990
1024 39.650 4.180
Computing
-- A Matrix Vector Approach Using Matlab
Written by Charles F.Van Loan
陈 文 斌
Wbchen@fudan.edu.cn
复旦大学
Matrix Computations
? Setting Up Matrix Problems
? Matrix Operations
? Once Again,Setting Up Matrix Problems
? Recursive Matrix Operations
? Distributed Memory Matrix Multiplication
? Matrix-vector multiplication
? Matrix- Matrix multiplication
? Often the amount of work that is required to
initialize an n-by-n matrix is as much as the
work required to solve for x
? Fast Fourier transform and fast Strassen
matrix multiply algorithm
Ax=b
Setting up Matrix problems
Hilbert matrix
1
1
??
?
ji
a ij
A=zeros(n,n);
for i=1:n
for j=1:n
A(i,j)=1/(i+j-1);
end
end
A=zeros(n,n);
for i=1:n
for j=i:n
A(i,j)=1/(i+j-1);
A(j,i)=A(i,j);
end
end
Simple ij Recipes
function H = hilb(n)
%HILB Hilbert matrix.
% This is also a good example of efficient MATLAB programming
% style where conventional FOR or DO loops are replaced by
% vectorized statements,This approach is faster,but uses
% more storage.
% C,Moler,6-22-91.
% Copyright 1984-2001 The MathWorks,Inc,
% $Revision,5.9 $ $Date,2001/04/15 12:02:29 $
J = 1:n; J = J(ones(n,1),:); I = J';
E = ones(n,n); H = E./(I+J-1);
The setting up of a matrix can often be made more efficient
by exploiting relationships that exist between the entries.
??
?
?
?
??
????
?
?
??
?
?
?
?
?
o t h e r w is e0
0 if
)!(!
!
1
1 mk
kmk
m
k
m
p mk
O(n3) flops
?
?
?
?
?
?
?
?
?
?
?
?
?
1331
0121
0011
0001
P
jijiij ppp,11,1 ??? ??
O(n2) flops
P=zeros(n,n);
P(:,1)=ones(n,1);
for i=2:n
for j=2:i
P(i,j)=P(i-1,j-1)+P(i-1,j);
end
end
O(n2)
flops
?
?
?
?
?
?
?
?
?
?
?
?
?
3
4
2
44
3
3
2
33
3
2
2
22
3
1
2
11
1
1
1
1
xxx
xxx
xxx
xxx
V
Matrix defined by a vector of parameters
n=length(x,y);
V(:,1)=ones(n,1);
for j=2:n
V(:,j)=x.*V(:,j-1);
end
?
?
?
?
?
?
?
?
?
?
?
?
?
1432
2143
3214
4321
aaaa
aaaa
aaaa
aaaa
C
Circulant
matrices
function C = Circulant1(a)
n = length(a); C = zeros(n,n);
for i=1:n
for j=1:n
C(i,j) = a(rem(n-i+j,n)+1);
end
end
function C = Circulant2(a)
n = length(a);C = zeros(n,n);
C(1,:) = a;
for i=2:n
C(i,:) = [ C(i-1,n) C(i-1,1:n-1) ];
end
1)m o d)(( ???? njinij ac
n time for Circulant1/Circulant2,
100 24.694
200 35.157
400 48.468
nRepeat = 20;
for n=[100 200 400]
a = 1:n;
tic; for k=1:nRepeat,C = Circulant1(a); end,t1 = toc;
tic; for k=1:nRepeat,C = Circulant2(a); end,t2 = toc;
disp(sprintf(' %4.0f %5.3f',n,t1/t2))
end
?
?
?
?
?
?
?
?
?
?
?
?
?
1432
2143
3214
4321
aaaa
baaa
bbaa
bbba
T
toeplitz
matrices
If a and b are n-vectors,then T=toeplitz(a,b) sets up the matrix
?
?
?
?
?
?
?
?
ijr
jic
t
ij
ji
ij
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?????
????
???
??
?
?
0
00
000
0000
L
Lower triangular
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
???
????
?????
?
0000
000
00
0
U
Upper triangular
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
???
???
???
??
?
000
00
00
00
000
T
Tridiagonal
matrices
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
????
?????
??????
??????
??????
?????
????
???
?
0000
000
00
00
00
000
0000
00000
T
Lower
bandwith 3
Upper bandwidth 2
3,0 ??? jia ij
2,0 ??? ija ij
Diag
DIAG(V,K) when V is a vector with N components is a square
matrix of order N+ABS(K) with the elements of V on the K-th
diagonal,K = 0 is the main diagonal,K > 0 is above the main
diagonal and K < 0 is below the main diagonal.
m = 5;
diag(-m:m) + diag(ones(2*m,1),1) + diag(ones(2*m,1),-1)
DIAG(DIAG(X)) is a diagonal matrix
tril
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
43215
32154
21543
15432
54321
X
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
43215
32154
21543
05432
00321
)2,( Xt r i l
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
00215
00054
00003
00000
00000
)2,( Xt r i l
?
?
?
??
??
?
kij
kijx
b ijij
0
tril(X,k)
triu
?
?
?
??
??
?
kji
kjix
b ijij
0
triu(X,k)
T=-triu(tril(ones(6,6),1),-1)+3*eyes(6,6)
T=-diag(ones(5,1)-1)+diag(2*ones(6,1),0)-diag(ones(5,1),1)
T=toeplitz([2;-1;zeros(4,1)],[2;-1;zeros(4,1)])
Block structure
A11=[10 11 12; 13 14 15;16 17 18];
A12=[20;21;22]; A13=[30 31;32 33;34 35];
A21=[40 41 42; 43 44 45];A22=[50; 51];
A23=[60 61;62 63];
A=[A11 A12 A13; A21 A22 A23];
C=A(2:4,5:6);
Matrix-Vector multiplication
Axy ?
?
?
?
n
j
jiji xay
1[m,n]=size(A);
y=zeros(m,1);
for i=1:m
for j=1:n
y(i)=y(i)+A(i,j)*x(j);
end
end
2mn flops
function y = MatVecR0(A,x)
[m,n] = size(A);
y = zeros(m,1);
for i=1:m
y(i) = A(i,:)*x;
end
Row-
oriented
function y = MatVecC0(A,x)
[m,n] = size(A);
y = zeros(m,1);
for j=1:n
y = y + A(:,j)*x(j);
end
Column-
oriented
saxpy
vector←scalar*vector+vector
If the matrix entries aij are stored column by
column in memory,the the saxpy version accesses
A-entries that are contiguous in memory,In contract
the row-oriented algorithm access noncontiguous aij.
Exploiting structure
A is upper triangular
[n,n]=size(A);
y=zeros(n,1);
for i=1:n
y(i)=A(i,1:i)*x(1:i)
end
[n,n]=size(A);
y=zeros(n,1);
for j=1:n
y(j:n)=A(j:n,j)*x(j)+y(j:n)
end
Matrix-Matrix Multiplication
ABCRBRA mrrm ??? ??,,
?
?
?
r
k
kjikij bac
1
C=zeros(m,n);
for j=1:n
for i=1:m
for k=1:r
C(i,j)=C(i,j)+A(i,k)*B(k,j);
end; end;end
C=A*B
Four Versions
function C = MatMatDot(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
for i=1:m
C(i,j) = A(i,:)*B(:,j);
end
end
First Version:Dot Product Version
function C = MatMatSax(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
for k=1:p
C(:,j) = C(:,j) + A(:,k)*B(k,j);
end
end
Second version,saxpys
function C = MatMatVec(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for j=1:n
C(:,j) = C(:,j) + A*B(:,j);
end
Third Version,via matrix-vector products
? ?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
nmmm
n
n
n
m
T
vuvuvu
vuvuvu
vuvuvu
vvv
u
u
u
uv
?
????
?
?
?
?
21
22212
12111
21
2
1
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
p
k
kBkA
pB
B
B
pAAAABC
1
:),()(,,
:),(
:),2(
:),1(
)](,,||)2(,,|)1(,,[
?
?
outer products
function C = MatMatOuter(A,B)
[m,p] = size(A);
[p,n] = size(B);
C = zeros(m,n);
for k=1:p
C = C + A(:,k)*B(k,:);
end
Fourth Version,via outer products
n Dot Saxpy MatVec Outer Direct
------------------------------------------------
100 0.3030 0.3790 0.0270 0.0440 0.0060
200 2.4500 3.0810 0.1370 0.6590 0.0280
400 11.0400 13.4630 1.5430 6.1570 0.2250
MatBench,repeat=100,n=100,200,400
Error and Norms
|||| 11 nxxx ??? ?
22
2
2
12 nxxxx ???? ?
? ?||,|,|m a x 1 nxxx ???
?? ?? xnxx 1 ?? ?? xnxx 2
norm(x,1)
norm(x,2)
norm(x,inf)
?
???
?
m
i
ijnj aA
11
1 m a x
?
???
? ?
n
j
ijmi aA
11
m a x
212
2
m a x AxA
x ?
?
? ?
? ?
?
m
i
n
j
ijF aA
1 1
2
norm(A,1)
norm(A,2)
norm(A,inf)
norm(A,’fro’)
11
a n d w h e r e?t h e n
,of v e r s io n s t o r e d t h eis ? If 5 T h e o r e m
Ae p sE
REEAA
RAA
nm
nm
?
???
?
?
?
BAe p sABABfl ??)(
n 1-norm factor
------------------------
2 0.231
3 0.408
4 0.339
5 0.265
6 0.266
7 0.242
8 0.241
9 0.208
10 0.203
norm(E,1)/(eps3*norm(A,1)*norm(B,1))
ProdBound.m
Examines the error in 3-digit matrix multiplication.
Once Again,Setting Up Matrix Problems
On numerous occasions we have been
required to evaluate a continuous function f(x)
on a vector of values.
Two-dimensional tables of function values
)3( 22e x p),( yxyxf ??? )3( 22 ii yx
ij ef
???
v=linspace(0,1,n);
F=zeros(n,n);
for i=1:n
for j=1:n
F(i,j)=exp(-(v(i)^2+3*v(j)^2);
end
end
v=linspace(0,1,n);
F=zeros(n,n);
for j=1:n
F(:,j)=exp(-(v.^2+3*v(j)^2);
end
end
v=linspace(0,1,n);
A=zeros(n,n);
for i=1:n
for j=1:n
A(i,j)=-(v(i)^2+3*v(j)^2);
end
end
F=exp(A);
function F = SampleF(x,y)
n = length(x);
m = length(y);
A = -((2*y.^2)*ones(1,n) + ones(m,1)*(x.^2)')/4;
F = exp(A);
Notice that the matrix A is the sum of two outer products and
that 4/)2( 22 ijij yxa ???
Contour Plots
Contour,f(x,y)=c
x = linspace(-2,2,50)';
y = linspace(-1.5,1.5,50)';
F = SampleF(x,y);
Contour(x,y,F)
contour(x,y,F,5);
contour(x,y,F,[1,8,6,4,2])
c= contour(x,y,F,4);
clabel(c,'manual');
Spotting Matrix-Vector Products
? ?? ba dc d y d xyxfI ),(
? ?
?
???b
a
N
i
xii
x
Qxgwabdxxg
1
)()()(
? ?
?
???d
c
N
j
yjj
y
Qxgcddyyg
1
)()()( ?
? ? ??
?
??????????????? ba
N
i
d
c ii
d
c
x
dyyxfwabdxdyyxfI
1
),()(),(
? ?
? ?
???
?
???
?
???
x yN
i
N
j
jiji yxfcdwab
1 1
),()()( ?
? ?
? ?
???
?
?
???
?
???
x yN
i
N
j
jiji Qyxfwcdab
1 1
),())(( ?
?FwcdabQ T))(( ???
function numI2D = CompQNC2D(fname,a,b,c,d,mx,nx,my,ny)
[omega,x] = wCompNC(a,b,mx,nx);
[mu,y] = wCompNC(c,d,my,ny);
F = feval(fname,x,y);
numI2D = (b-a)*(d-c)*(omega'*F*mu);
Subintervals Integral Time
------------------------------------------------
32 46.220349653054718 0.0600
64 46.220349653055116 0.1600
128 46.220349653055095 0.3300
256 46.220349653055102 1.1600
512 46.220349653055123 3.8400
For larger values of n,we would find that the amount of
computation increases by a factor of 4 with a doubling of n
reflecting the O(n2) nature of the caculation.
? ? ???????? ?????????? 20 20 2222 )1.)3.)((1.)7.(( 1)1.)4.)((1.)3.(( 1 d y d xyxyxI
Recursive Matrix Operations
Some of the most interesting algorithmic developments
in matrix computations are recursive,Two examples are
given in this section,The first is the fast Fourier
transform,a super-quick way of computing a special,a
very important matrix-vector product,The second is a
recursive matrix multiplication algorithm that involves
markedly fewer flops than the conventional algorithm.
The Fast Fourier Transform
The discrete Fourier transform(DFT) matrix is a complex
Vandermonde matrix.
?
?
?
?
?
?
?
?
?
?
?
?
?
9
4
6
4
3
4
6
4
4
4
2
4
3
4
2
44
4
1
1
1
1111
www
www
www
F
iw ?4
i=sqrt(-1);
F=[1 1 1 1; 1 –i –1 i; 1 –1 1 –1; 1 i –1 –i];
)/2s i n ()/2c o s ()/2e x p ( ninniw n ??? ????
For general n,the DFT matrix is defined in terms of
In particular,the n-by-n DFT matrix is defined by
)1)(1(),( ???? qp
npqpqn wffF
F=ones(n,n);
F(:,2)=exp(-2*pi*sqrt(-1)/n).^(0,n-1)’;
for k=3:n
F(:,k)=F(:,2).*F(:,k-1);
end
It is possible to computer y= Fn x without explicitly forming the
DFT matrix Fn
n=length(x);
y=x(1)*ones(n,1);
for k=2:n
y=y+exp(-2*pi*sqrt(-1)*(k-1)*(0:n-1)’)*x(k);
end
k
kn
n
k
n
k
n
x
w
w
w
y
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
??
?
?
)1)(1(
)1(2
1
1
?
1?nnw
},,,,1{ 12 ?nnnn www ?
nm
n
m
n ww
m o d ?
function y = DFT(x)
n = length(x);
y = x(1)*ones(n,1);
if n > 1
v = exp(-2*pi*sqrt(-1)/n).^(0:n-1)';
for k=2:n
z = rem((k-1)*(0:n-1)',n ) +1;
y = y + v(z)*x(k);
end
end
O(n2)
O(nlog2n)
The starting point is to look at an even order DFT matrix when
we permute its columns so that the odd-indexed columns come
first,
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
wwwwwww
wwwwww
wwwwwww
wwww
wwwwwww
wwwwww
wwwwwww
F
234567
246246
364725
4444
527463
642642
765432
8
1
11
1
1111
1
11
1
11111111
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
?
????
????
????
????
?
573246
626244
753642
573246
626244
753642
8
1
11
1
11111111
1
11
1
11111111
)(,,
wwwwwww
wwwwww
wwwwwww
wwwwwww
wwwwww
wwwwwww
c o l sF
Cols=[1 3 5 7 2 4 6 8]
Noting that
4
2
8
2 www ??
?
?
?
?
?
?
?
?
44
44
8 )(,,DFF
DFF
c o l sF
?
?
?
?
?
?
?
?
?
?
?
?
?
3
2
1
w
w
w
D
?
?
?
?
?
?
?
?
?
?
?
?
?
??
)8:2:2(
)8:2:1(
)()(,,
44
44
8 x
x
DFF
DFF
c o lsxc o lsFxF
?
?
?
?
?
?
?
?
?
?
?
?
?
?
)8:2:2(
)8:2:1(
4
4
xF
xF
DI
DI
y=F8x yT=F4x(1:2:8)y
B=F4x(2:2:8)
y(1:m)=yT+d,yB
y(m+1:n)=yT - d,yB
?
?
?
?
?
?
?
?
?
?
?
?
?
3
2
1
w
w
w
d
n=2m
y=Fnx
yT=Fmx(1:2:n)
yB=Fmx(2:2:n)
y(1:4)=yT+d,yB
y(5:8)=yT - d,yB
?
?
?
?
?
?
?
?
?
?
?
?
?
?1
1
m
n
w
w
d
?
function y = FFTRecur(x)
n = length(x);
if n ==1
y = x;
else
m = n/2;
yT = FFTRecur(x(1:2:n));
yB = FFTRecur(x(2:2:n));
d = exp(-2*pi*sqrt(-1)/n).^(0:m-1)';
z = d.*yB;
y = [ yT+z ; yT-z ];
end
Radix-2 FFT
FFTRecur is slower than FFT,Since FFTRecur involves more
flops than FFT concerns the computation of the,weight vector”
d,As it stands,there is a considerable amount of redundant
computation with respect to the exponential values that are
required,This can be avoided by precomputing the weights,
storing them in a vector,and then merely,looking up” the values
as they are needed during the recursion,With care,the amount of
work required by a radix-2 FFT is 5nlog2n flops
n DFT FFTRecur FFT
Time Time Time
------------------------------------------
4 0 1.100000e-001 0
8 6.000000e-002 1.100000e-001 0
16 1.600000e-001 2.200000e-001 0
32 3.300000e-001 5.500000e-001 0
64 8.800000e-001 1.100000e+000 0
128 3.180000e+000 2.310000e+000 0
256 9.500000e+000 4.620000e+000 5.000000e-002
512 3.263000e+001 9.390000e+000 0
1024 1.198500e+002 1.922000e+001 0
Strassen Multiplication
??
?
??
?
??
???
??
?
??
?
??
?
??
??
??
?
??
?
2222122121221121
2212121121121111
2221
1211
2221
1211
2221
1211
BABABABA
BABABABA
BB
BB
AA
AA
CC
CC
6231224221
5312754111
222122127
1211112162212115
11212242212113
1122212221122111
,
,
))((
))((,)(
)(),(
)(),)((
PPPPCPPC
PPCPPPPC
BBAAP
BBAAPBAAP
BBAPBBAP
BAAPBBAAP
??????
??????
???
?????
????
?????
Strassen multiplication scheme
n=2m,Strassen, 7 multiplications m*m matrix
18 matrix additions
2(7m3)+18m2=7/8(2n3)+9/2n2
Conventional algorithms
2n3+n2
n Strass Times Ordinary times
-------------------------------------
32 0.000 0.930
64 0.060 0.000
128 0.110 0.000
256 0.770 0.050
512 5.550 0.990
1024 39.650 4.180