CHAPTER 8
THE DISJOINT SET ADT
§ 1 Equivalence Relations
【 Definition】 A relation R is defined on a set S if for every
pair of elements (a,b),a,b?S,a R b is either true or false,If
a R b is true,then we say that a is related to b.
【 Definition】 A relation,~,over a set,S,is said to be an
equivalence relation over S iff it is symmetric,reflexive,and
transitive over S.
【 Definition】 Two members x and y of a set S are said to be
in the same equivalence class iff x ~ y.
§ 2 The Dynamic Equivalence Problem
Given an equivalence relation ~,decide for any a and b if a ~ b.
〖 Example〗 Given S = { 1,2,3,4,5,6,7,8,9,10,11,12 } and 9
relations,12?4,3?1,6?10,8?9,7?4,6?8,3?5,2?11,11?12,
The equivalence classes are { 2,4,7,11,12 },{ 1,3,5 },{ 6,8,9,10 }
Algorithm:
{ /* step 1,read the relations in */
Initialize N disjoint sets;
while ( read in a ~ b ) {
if ( ! (Find(a) == Find(b)) )
Union the two sets;
} /* end-while */
/* step 2,decide if a ~ b */
while ( read in a and b )
if ( Find(a) == Find(b) ) output( true );
else output( false );
}
(Union / Find)
Elements of the sets,1,2,3,...,N
Sets,S1,S2,...,.,and Si? Sj =? ( if i? j ) —— disjoint
〖 Example〗 S1 = { 6,7,8,10 },S2 = { 1,4,9 },S3 = { 2,3,5 }
10
6 87
4
1 9
2
3 5
A possible forest representation of these sets
Note:
Pointers are
from children
to parents
Operations,
(1) Union( i,j ),:= Replace Si and Sj by S = Si? Sj
(2) Find( i ),:= Find the set Sk which contains the element i.
§ 3 Basic Data Structure
Union ( i,j )
Idea,Make Si a subtree of Sj,or vice versa,That is,we can
set the parent pointer of one of the roots to the other root.
10
6 87 4
1 9
4
1 910
6 87S1? S2 S2? S1
Implementation 1:
S1
S2
S3
name[ ]
10
6 87
4
1 9
2
3 5
S2? S1
S2
Here we use the fact that
the elements are numbered from 1 to N,
Hence they can be used as
indices of an array.
Implementation 2,S [ element ] = the element’s parent.
Note,S [ root ] = 0 and set name = root index.
10
6 87
4
1 9
2
3 5
〖 Example〗 The array representation of the three sets is
( S1? S2? S1 )? S [ 4 ] = 10
Find ( i )
Implementation 1:
name[k] S? j
i,..
find ( i ) = ‘S’
Implementation 2:
SetType Find ( ElementType X,
DisjSet S )
{ for ( ; S[X] > 0; X = S[X] ) ;
return X ;
}
void SetUnion ( DisjSet S,
SetType Rt1,
SetType Rt2 )
{ S [ Rt2 ] = Rt1 ; }
[1]
4
[2]
0
[10]
0
[9]
4
[8]
10
[7]
10
[6]
10
[5]
2
[4]
0
[3]
2S 10
Analysis
Practically speaking,union and find are always paired,
Thus we consider the performance of a sequence of union-
find operations.
〖 Example〗 Given S = { 1,2,3,4,5,6,7,8,9,10,11,12 }
and 9 relations,12?4,3?1,6?10,8?9,7?4,6?8,3?5,2?11,
11?12,We have 3 equivalence classes { 2,4,7,11,12 },{ 1,
3,5 },and { 6,8,9,10 }
Algorithm using union-find operations
{ Initialize Si = { i } for i = 1,...,12 ;
for ( k = 1; k <= 9; k++ ) { /* for each pair i? j */
if ( Find( i ) != Find( j ) )
SetUnion( Find( i ),Find( j ) );
}
}
Can you think of
a worst case example?
Sure,Try this one:
union(2,1),find(1);
union(3,2),find(1);
...,..,;
union(N,N?1),find(1).
N
N?1
1
T =?( N2 ) !
That’s not
good.
§ 4 Smart Union Algorithms
Union-by-Size -- Always change the smaller tree
S [ Root ] = – size; /* initialized to be –1 */
Now T = O( N )
for the worst case
example I gave.
1
2 N?
【 Lemma】 Let T be a tree created by union-by-size with
N nodes,then
1l o g)( 2 NTh e i g h t
Proof,By induction,(Each element can have its set
name changed at most log2 N times.)
Time complexity of N Union and M Find operations is
now O( N + M log2 N ).
Union-by-Height -- Always change the shallow tree
/*Assume Root1 and Root2 are Roots
Union is a C ketword,so this routine is named SetUnion
*/
void SetUnion(DisjSet s,SetType Root1,SetType Root2)
{
if(s[Root2]<s[Root1]) //Root2 is deeper set
s[Root1]=Root2; //make Root2 new root
else
{
if(s[Root1]==s[Root2]) //same height
s[Root1]--; //so update
s[Root2]=Root1;
}
}
Code of Union-by-height (rank)
§ 5 Path Compression
SetType Find ( ElementType X,DisjSet S )
{
if ( S[ X ] <= 0 ) return X;
else return S[ X ] = Find( S[ X ],S );
}
SetType Find ( ElementType X,DisjSet S )
{ ElementType root,trail,lead;
for ( root = X; S[ root ] > 0; root = S[ root ] ); /* find the root */
for ( trail = X; trail != root; trail = lead ) {
lead = S[ trail ] ;
S[ trail ] = root ;
} /* collapsing */
return root ;
}
Note,Not compatible with union-by-
height since it changes the
heights,Just take,height” as
an estimated rank.
Slower for
a single find,but
faster for a sequence of
find operations.
§ 6 Worst Case for
Union-by-Rank and Path Compression
【 Lemma (Tarjan)】 Let T( M,N ) be the maximum time required to
process an intermixed sequence of M? N finds and N? 1 unions,Then,
k1M? ( M,N )? T( M,N )? k2M? ( M,N )
for some positive constants k1 and k2,




2 a n d 2))1,(,1(
1 a n d 2)2,1(
1 a n d 12
),(
jijiAiA
jiiA
ji
jiA
j
}l o g),(|1m i n {),( NNMiAiNM
Ackermann’s Function and? ( M,N )
O( log* N )? 4
http://mathworld.wolfram.com/AckermannFunction.html
log* N (inverse Ackermann function)
= # of times the logarithm is applied to N until the result? 1.
)4,2(?A
655362 22 222?
log* 265536 = 5
since
logloglogloglog
( 265536 )= 1
We have a network of computers and a list of bi-directional connections,
Each of these connections allows a file transfer from one computer to
another,Is it possible to send a file from any computer on the network
to any other?
Input:
Input consists of several test cases.
For each test case,the first line contains N (<=10,000),the total number
of computers in a network,Each computer in the network is then
represented by a positive integer between 1 and N.
Then in the following lines,the input is given in the format:
I c1 c2 where I stands for inputting a connection between c1 and
c2; or
C c1 c2 where C stands for checking if it is possible to transfer files
between c1 and c2; or
S where S stands for stopping this case,
Output:
For each C case,print in one line the word yes or no if it is possible or
impossible to transfer files between c1 and c2,respectively,
At the end of each case,print in one line The network is
connected,if there is a path between any pair of computers; or
There are k components,where k is the number of connected
components in this network.
Print a blank line between test cases,
Sample Input:
3
C 1 2
I 1 2
C 1 2
S
3
I 3 1
I 2 3
C 1 2
S
Sample Output:
no
yes
There are 2 components,
yes
The network is connected,
Must use union-by-size and path compression to obtain
a worst-case running time of O( M? ( M,N ) ).