数据结构
Data Structures
主讲人,李晓红
教材 (Text Book)
Data Structures and
Algorithm Analysis in C
(2nd Edition)
Mark Allen Weiss
参考书目 (Reference)
数据结构?(C语言版) 严蔚敏等编著,清华大学出版社
数据结构?( 用面向对象方法与 C++描述 )
殷人昆等编著,清华大学出版社课程评分方法 (Grading Policies)
Lecture Grade (80) = Final Exam (80)
Laboratory Grade (15) =
15.051


5
1i
) i ( L a b
Other Grade(5)=Attendance + Homework
作业 &出勤
布置作业的下周五交,迟交不交扣分。做则有分,不计对错,一周后公布参考答案。
必须按照软件学院规范做作业,否则不计分,1/3作业未交者不能参加期末考试。
点名三次不到者不能参加期末考试。
实验 (Laboratory Projects)
共 5 次;迟交罚扣为 10% /day;抄袭者 0分;
3人一组,各组自行分工(例如:按功能模块、按开发过程等),各自工作量应相当,试验报告中必须写明各自工作量 ;
通过 e-mail提交,请在 邮件标题写明 班级学号 及 试验
ID,并在 文档末尾写明分工 。
助教( Teaching Assistant):
杜洪伟( dhw198426@126.com)
潘东 ( pandong912@gmail.com)
CHAPTER 2
ALGORITHM ANALYSIS
【 Definition】 An algorithm is a finite set of instructions
that,if followed,accomplishes a particular task,In
addition,all algorithms must satisfy the following criteria:
(1) Input There are zero or more quantities that are externally
supplied.
(2) Output At least one quantity is produced.
(3) Definiteness Each instruction is clear and unambiguous.
(4) Finiteness If we trace out the instructions of an algorithm,then
for all cases,the algorithm terminates after finite number of steps.
(5) Effectiveness Every instruction must be basic enough to be
carried out,in principle,by a person using only pencil and paper,It is
not enough that each operation be definite as in(3); it also must be
feasible.
Note,A program is written in some programming language,
and does not have to be finite (e.g,an operation system).
An algorithm can be described by human languages,
flow charts,some programming languages,or pseudo-
code.
〖 Example〗 Selection Sort,Sort a set of n? 1 integers in
increasing order.
From those integers that are currently unsorted,find the
smallest and place it next in the sorted list.
for ( i = 0; i < n; i++) {
Examine list[i] to list[n?1] and suppose that the smallest
integer is at list[min];
Interchange list[i] and list[min];
}
Sort = Find the smallest integer + Interchange it with list[i].
§ 1 What to Analyze
Machine & compiler-dependent run times,
Time & space complexities,machine & compiler-
independent.
Assumptions:
instructions are executed sequentially
each instruction is simple,and takes exactly one time unit
integer size is fixed and we have infinite memory
Typically the following two functions are analyzed:
Tavg(N) & Tworst(N) -- the average and worst case time
complexities,respectively,as functions of input size N.
〖 Example〗 Matrix addition
void add ( int a[ ][ MAX_SIZE ],
int b[ ][ MAX_SIZE ],
int c[ ][ MAX_SIZE ],
int rows,int cols )
{
int i,j ;
for ( i = 0; i < rows; i++ )
for ( j = 0; j < cols; j++ )
c[ i ][ j ] = a[ i ][ j ] + b[ i ][ j ];
}
/* rows + 1 */
/* rows(cols+1) */
/* rows? cols */
T(rows,cols ) = 2 rows? cols + 2rows + 1
〖 Example〗 Iterative
function for summing
a list of numbers
float sum ( float list[ ],int n )
{ /* add a list of numbers */
float tempsum = 0;
int i ;
for ( i = 0; i < n; i++ )
tempsum += list [ i ] ;
return tempsum;
}
/* count = 1 */
/* count ++ */
/* count ++ for last execution of for */
/* count ++ */
/* count ++ */
Tsum ( n ) = 2n + 3
〖 Example〗 Recursive
function for summing a
list of numbers
float rsum ( float list[ ],int n )
{ /* add a list of numbers */
if ( n )
return rsum(list,n?1) + list[n? 1];
return 0;
}
/* count ++ */
/* count ++ */
/* count ++ */
Trsum ( n ) = 2n + 2
But it takes more time to
compute each step.
§ 2 Asymptotic Notation (?,?,?,o )
The point of counting the steps is to predict the
growth in run time as the N change,and thereby
compare the time complexities of two programs,
So what we really want to know is the asymptotic
behavior of Tp.
Suppose Tp1 ( N ) = c1N2 + c2N and Tp2 ( N ) = c3N,
Which one is faster?
No matter what c1,c2,and c3 are,there will be an n0
such that Tp1 ( N ) > Tp2 ( N ) for all N > n0.
【 Definition】 T (N) = O( f (N) ) if there are positive constants c
and n0 such that T (N)? c? f (N) for all N? n0.
【 Definition】 T (N) =?( g(N) ) if there are positive constants c
and n0 such that T (N)? c? g(N) for all N? n0.
【 Definition】 T (N) =?( h(N) ) if and only if T (N) = O( h(N) ) and
T (N) =?( h(N) ),
Note,
2N + 3 = O( N ) = O( Nk?1 ) = O( 2N ) = We shall always take the
smallestf (N).
2N + N2 =?( 2N ) =?( N2 ) =?( N ) =?( 1 ) = We shall always take
the largest g(N).
【 Definition】 T (N) = o( p(N) ) if T (N) = O( p(N) ) and T (N)?
( p(N) ),
Rules of Asymptotic Notation
If T1 (N) = O( f (N) ) and T2 (N) = O( g(N) ),then
(a) T1 (N) + T2 (N) = max( O( f (N)),O( g(N)) ),
(b) T1 (N) * T2 (N) = O( f (N) * g(N) ).
If T (N) is a polynomial of degree k,then T (N) =?( N k ).
logk N = O(N) for any constant k,This tells us that
logarithms grow very slowly.
Note,When compare the complexities of two programs
asymptotically,make sure that N is sufficiently large.
For example,suppose that Tp1 ( N ) = 106N and Tp2 ( N ) = N2,
Although it seems that?( N2 ) grows faster than?( N ),but if N <
106,P2 is still faster than P1.
In put siz e n
Ti me Na me 1 2 4 8 1 6 3 2
1
l o g n
n
n l o g n
n
2
n
3
con sta nt
l o g a ri th mi c
l i nea r
l o g li nea r
qua dra tic
cubic
1 1 1 1 1 1
0 1 2 3 4 5
1 2 4 8 1 6 3 2
0 2 8 2 4 6 4 16 0
1 4 1 6 6 4 2 5 6 1 0 2 4
1 8 6 4 5 1 2 4 0 9 6 3 2 7 6 8
2
n
n !
exp o nent i a l
fa cto ri a l
2 4 1 6 2 5 6 6 5 5 3 6 4 2 9 4 9 6 7 2 96
1 2 2 4 4 0 3 2 6 2 0 9 2 2 7 8 9 8 8 0 0 0 2 6 3 1 3? 10
33
0 1 2 3 4 5 6 7 8 9 10
0
10
20
30
40
50
60
2n n2
n log n
n
Log n
f
n
T ime fo r f ( n ) instru ctions o n a 10
9
instr/se c compute r
n f ( n )= n log
2
n n
2
n
3
n
4
n
10
2
n
10
20
30
40
50
100
1,000
10,000
100,000
1,000,000
,01? s
,02? s
,03? s
,04? s
,05? s
,10? s
1.00? s
10? s
100? s
1.0ms
,03? s
,09? s
,15? s
,21? s
,28? s
,66? s
9.96? s
130.03? s
1.66ms
19.92ms
.1? s
.4? s
.9? s
1.6? s
2.5? s
10? s
1ms
100ms
10s ec
16.67min
1? s
8? s
27? s
64? s
125? s
1ms
1sec
16.67min
1 1.57d
31.71yr
10? s
160? s
810? s
2.56ms
6.25ms
100ms
16.67min
1 15.7d
3171y r
3.17? 10
7
yr
10s ec
2.84hr
6.83d
121.36d
3.1yr
3171y r
3.17? 10
13
yr
3.17? 10
23
yr
3.17? 10
33
yr
3.17? 10
43
yr
1? s
1ms
1sec
18.3mi n
13d
4? 10
13
yr
32? 10
283
yr
s = microsecond = 10-6 seconds
ms = millisecond = 10-3 seconds
sec = seconds
min = minutes yr = years
hr = hours
d = days
n
〖 Example〗 Matrix addition
void add ( int a[ ][ MAX_SIZE ],
int b[ ][ MAX_SIZE ],
int c[ ][ MAX_SIZE ],
int rows,int cols )
{
int i,j ;
for ( i = 0; i < rows; i++ )
for ( j = 0; j < cols; j++ )
c[ i ][ j ] = a[ i ][ j ] + b[ i ][ j ];
}
/*? (rows) */
/*? (rows? cols ) */
/*? (rows? cols ) */
T(rows,cols ) =? (rows? cols )
General Rules
FOR LOOPS,The running time of a for loop is at most the
running time of the statements inside the for loop (including
tests) times the number of iterations.
NESTED FOR LOOPS,The total running time of a statement
inside a group of nested loops is the running time of the
statements multipliedby the product of the sizes of all the for
loops.
CONSECUTIVE STATEMENTS,These just add (which
means that the maximum is the one that counts).
IF / ELSE,For the fragment
if ( Condition ) S1;
else S2;
the running time is never more than the running time of the
test plus the larger of the running time of S1 and S2.
RECURSIONS:
〖 Example〗 Fibonacci number,
Fib(0) = Fib(1) = 1,Fib(n) = Fib(n?1) + Fib(n?2)
long int Fib ( int N )
{
if ( N <= 1 )
return 1;
else
return Fib( N? 1 ) + Fib( N? 2 );
}
/* O( 1 ) */
/* O( 1 ) */
/*O(1)*/
/* T ( N ) */
/*T(N?1)*/ /*T(N?2)*/
T(N) = T(N?1) + T(N?2) + 2? Fib(N)
NN
N 35)(F i b23
T(N) grows exponentially
§ 3 Compare the Algorithms
〖 Example〗 Given (possibly negative) integers A1,A2,…,
AN,find the maximum value of,?
j ik kA
Algorithm 1
int MaxSubsequenceSum ( const int A[ ],int N )
{
int ThisSum,MaxSum,i,j,k;
/* 1*/ MaxSum = 0; /* initialize the maximum sum */
/* 2*/ for( i = 0; i < N; i++ ) /* start from A[ i ] */
/* 3*/ for( j = i; j < N; j++ ) { /* end at A[ j ] */
/* 4*/ ThisSum = 0;
/* 5*/ for( k = i; k <= j; k++ )
/* 6*/ ThisSum += A[ k ]; /* sum from A[ i ] to A[ j ] */
/* 7*/ if ( ThisSum > MaxSum )
/* 8*/ MaxSum = ThisSum; /* update max sum */
} /* end for-j and for-i */
/* 9*/ return MaxSum;
} T( N ) = O( N3 ) (Detailed analysis is given on p.18-19.)
Algorithm 2
int MaxSubsequenceSum ( const int A[ ],int N )
{
int ThisSum,MaxSum,i,j;
/* 1*/ MaxSum = 0; /* initialize the maximum sum */
/* 2*/ for( i = 0; i < N; i++ ) { /* start from A[ i ] */
/* 3*/ ThisSum = 0;
/* 4*/ for( j = i; j < N; j++ ) { /* end at A[ j ] */
/* 5*/ ThisSum += A[ j ]; /* sum from A[ i ] to A[ j ] */
/* 6*/ if ( ThisSum > MaxSum )
/* 7*/ MaxSum = ThisSum; /* update max sum */
} /* end for-j */
} /* end for-i */
/* 8*/ return MaxSum;
}
T( N ) = O( N2 )
Algorithm 3 Divide and Conquer
4?3 5?2?1 2 6?2
conquer divide
4 5
6
2 6
8
11
T ( N/2 ) T ( N/2 )O( N )
T ( N ) = 2 T( N/2 ) + c N,T(1) = O(1)
= 2 [2 T( N/22 ) + c N/2] + c N
= 2k O(1) + c k N where N/2k = 1
= O( N log N )
Also true for
N? 2k
The program
can be found
on p.21.
Algorithm 4 On-line Algorithm
int MaxSubsequenceSum( const int A[ ],int N )
{
int ThisSum,MaxSum,j;
/* 1*/ ThisSum = MaxSum = 0;
/* 2*/ for ( j = 0; j < N; j++ ) {
/* 3*/ ThisSum += A[ j ];
/* 4*/ if ( ThisSum > MaxSum )
/* 5*/ MaxSum = ThisSum;
/* 6*/ else if ( ThisSum < 0 )
/* 7*/ ThisSum = 0;
} /* end for-j */
/* 8*/ return MaxSum;
}
T( N ) = O( N )
A[ ] is scanned once only.
1 3?2 4?6 1 6?13 2 4
At any point in time,the algorithm
can correctly give an answer to the
subsequence problem for the data
it has already read.
0.00034
0.00063
0.00333
0.03042
0.29832
0.00066
0.00486
0.05843
0.68631
8.0113
0.00045
0.01112
1.1233
111.13
NA
0.00103
0.47015
448.77
NA
NA
O( N )O(Nlog N)O( N2 )O( N3 )Time
4321Algorithm
N =10
N =100
N =1,000
N =10,000
N =100,000
Input
Size
Running times of several algorithms for maximum
subsequence sum (in seconds)
Note,The time required to read the input is not included.
§ 4 Logarithms in the Running Time
〖 Example〗 Binary Search:
Given,A [0]? A [1]? ……? A [N? 1] ; X
Task,Find X
Output,i if X = = A [ i ]
1 if X is not found
low highmid
X ~ A [mid]
low high = mid? 1 highlow= mid + 1
< >==
mid
int BinarySearch ( const ElementType A[ ],
ElementType X,int N )
{
int Low,Mid,High;
/* 1*/ Low = 0; High = N - 1;
/* 2*/ while ( Low <= High ) {
/* 3*/ Mid = ( Low + High ) / 2;
/* 4*/ if ( A[ Mid ] < X )
/* 5*/ Low = Mid + 1;
else
/* 6*/ if ( A[ Mid ] > X )
/* 7*/ High = Mid - 1;
else
/* 8*/ return Mid; /* Found */
} /* end while */
/* 9*/ return NotFound; /* NotFound is defined as -1 */
}
Tworst( N ) = O( log N )
§ 5 Checking Your Analysis
Method 1 When T(N) = O(N),check if T(2N)/T(N)?2
When T(N) = O(N2),check if T(2N)/T(N)?4
When T(N) = O(N3),check if T(2N)/T(N)?8
… …
Method 2 When T(N) = O( f (N) ),check if
C o n s t a n t)( )(l i m Nf NTN
Read the example given on p.28 (Figures 2.12 & 2.13).