Rough Sets in KDD
Tutorial Notes
Andrzej Skowron
Warsaw University
skowron@mimuw.edu.pl
Ning Zhong
Maebashi Institute of Technology
zhong@maebashi-it.ac.jp
Copyright 2000 by A. Skowron & N. Zhong
About the Speakers
Andrzej Skowron received his Ph.D. from Warsaw University.
He is a professor in the Faculty of Mathematics, Computer Science and
Mechanics, Warsaw University, Poland. His research interests
include soft computing methods and applications, in particular
reasoning with incomplete information, approximate reasoning,
rough sets, rough mereology, granular computing, synthesis and
analysis of complex objects, intelligent agents, knowledge discovery
and data mining, etc., with over 200 journal and conference
publications. He is an editor of several international journals and
book series, including Fundamenta Informaticae (editor-in-chief) and
Data Mining and Knowledge Discovery. He is president of the
International Rough Set Society. He has been an invited speaker at many
international conferences, and has served or is currently serving on
the program committees of over 40 international conferences and
workshops, including ISMIS’97-99 (program chair), RSCTC’98-00
(program chair), and RSFDGrC’99 (program chair).
About the Speakers (2)
Ning Zhong received his Ph.D. from the University of Tokyo.
He is head of the Knowledge Information Systems Laboratory and an
associate professor in the Department of Information Engineering,
Maebashi Institute of Technology, Japan. His research interests
include knowledge discovery and data mining, rough sets and
granular-soft computing, intelligent agents and databases, Web
intelligence (WI), and knowledge information systems, with over 80
journal and conference publications. He is an editor of Knowledge
and Information Systems: an international journal (Springer). He is
a member of the advisory board of the International Rough Set Society,
the Steering Committee of the IEEE International Conferences on Data
Mining, the PAKDD conferences, and the advisory board of
BISC/SIGGrC. He has served or is currently serving on the program
committees of over 30 international conferences and workshops,
including PAKDD’99 (program chair), IAT’99 and 2001 (program
chair), RSFDGrC’99 (program chair), and WI’2001 (program chair).
Contents
Introduction
Basic Concepts of Rough Sets
A Rough Set Based KDD process
Rough Sets in ILP and GrC
Concluding Remarks
(Summary, Advanced Topics, References
and Further Readings).
Introduction
Rough set theory was developed by
Zdzislaw Pawlak in the early 1980’s.
Representative Publications:
– Z. Pawlak, “Rough Sets”, International Journal
of Computer and Information Sciences, Vol. 11,
341-356 (1982).
– Z. Pawlak, Rough Sets - Theoretical Aspects of
Reasoning about Data, Kluwer Academic
Publishers (1991).
Introduction (2)
The main goal of the rough set analysis is
induction of approximations of concepts.
Rough set theory constitutes a sound basis for
KDD. It offers mathematical tools to
discover patterns hidden in data.
It can be used for feature selection, feature
extraction, data reduction, decision rule
generation, and pattern extraction
(templates, association rules), etc.
It also identifies partial or total dependencies in data.
Introduction (3)
Recent extensions of rough set theory (rough
mereology) have developed new methods for
decomposition of large data sets,data mining
in distributed and multi-agent systems,and
granular computing.
This presentation shows how several aspects of
the above problems are solved by the (classic)
rough set approach, discusses some advanced
topics, and gives further research directions.
Basic Concepts of Rough Sets
Information/Decision Systems (Tables)
Indiscernibility
Set Approximation
Reducts and Core
Rough Membership
Dependency of Attributes
Information Systems/Tables
An information system (IS) is a pair (U, A), where
U is a non-empty
finite set of objects, and
A is a non-empty finite
set of attributes such
that a: U → V_a
for every a ∈ A.
V_a is called the value
set of a.
Age LEMS
x1 16-30 50
x2 16-30 0
x3 31-45 1-25
x4 31-45 1-25
x5 46-60 26-49
x6 16-30 26-49
x7 46-60 26-49
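As a small illustration (not part of the original tutorial), the table above can be encoded in Python as a plain mapping from attributes to value functions; the value set V_a then falls out directly. The object and attribute names come from the slide, everything else is an assumption of this sketch.

U = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]
A = {
    "Age":  {"x1": "16-30", "x2": "16-30", "x3": "31-45", "x4": "31-45",
             "x5": "46-60", "x6": "16-30", "x7": "46-60"},
    "LEMS": {"x1": "50", "x2": "0", "x3": "1-25", "x4": "1-25",
             "x5": "26-49", "x6": "26-49", "x7": "26-49"},
}
# The value set Va of an attribute a is simply the set of values it takes on U.
V = {a: set(col.values()) for a, col in A.items()}
print(V["Age"])   # {'16-30', '31-45', '46-60'}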
Decision Systems/Tables
A decision system (DS) is a pair
T = (U, A ∪ {d}), where
d ∉ A is the decision
attribute (instead of one
we can consider more
decision attributes).
The elements of A are
called the condition
attributes.
Age LEMS Walk
x1 16-30 50 yes
x2 16-30 0 no
x3 31-45 1-25 no
x4 31-45 1-25 yes
x5 46-60 26-49 no
x6 16-30 26-49 yes
x7 46-60 26-49 no
Issues in the Decision Table
The same or indiscernible objects may be
represented several times.
Some of the attributes may be superfluous.
Indiscernibility
The equivalence relation:
a binary relation R ⊆ X × X which is
reflexive (xRx for any object x),
symmetric (if xRy then yRx), and
transitive (if xRy and yRz then xRz).
The equivalence class [x]_R of an element x ∈ X
consists of all objects y ∈ X such that
xRy.
Indiscernibility (2)
Let IS = (U, A) be an information system; then
with any B ⊆ A there is an associated equivalence
relation:
IND_IS(B) = {(x, x') ∈ U² | for every a ∈ B, a(x) = a(x')},
where IND_IS(B) is called the B-indiscernibility
relation.
If (x, x') ∈ IND_IS(B), then objects x and x' are
indiscernible from each other by attributes from B.
The equivalence classes of the B-indiscernibility
relation are denoted by [x]_B.
An Example of Indiscernibility
The non-empty subsets of
the condition attributes
are {Age},{LEMS},and
{Age,LEMS}.
IND({Age}) = {{x1,x2,x6},
{x3,x4},{x5,x7}}
IND({LEMS}) = {{x1},
{x2},{x3,x4},{x5,x6,x7}}
IND({Age,LEMS}) =
{{x1},{x2},{x3,x4},
{x5,x7},{x6}},
Age LEMS Walk
x1 16-30 50 yes
x2 16-30 0 no
x3 31-45 1-25 no
x4 31-45 1-25 yes
x5 46-60 26-49 no
x6 16-30 26-49 yes
x7 46-60 26-49 no
Observations
An equivalence relation induces a partitioning
of the universe.
The partitions can be used to build new subsets
of the universe.
Subsets that are most often of interest have the
same value of the decision attribute.
It may happen, however, that a concept such as
"Walk" cannot be defined in a crisp manner.
Set Approximation
Let T = (U, A), B ⊆ A and X ⊆ U.
We can approximate X using only the
information contained in B by constructing
the B-lower and B-upper approximations of
X, denoted B_*(X) and B^*(X) respectively, where
B_*(X) = {x | [x]_B ⊆ X},
B^*(X) = {x | [x]_B ∩ X ≠ ∅}.
Set Approximation (2)
The B-boundary region of X,
BN_B(X) = B^*(X) - B_*(X),
consists of those objects that we cannot
decisively classify into X on the basis of B.
The B-outside region of X,
U - B^*(X),
consists of those objects that can with
certainty be classified as not belonging to X.
A set is said to be rough if its boundary
region is non-empty; otherwise the set is
crisp.
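A minimal Python sketch of these definitions, using the Walk decision table shown earlier; the helper names (ind_classes, approximations) are ours, not from the tutorial.

from collections import defaultdict

table = {  # object -> (Age, LEMS, Walk)
    "x1": ("16-30", "50", "yes"), "x2": ("16-30", "0", "no"),
    "x3": ("31-45", "1-25", "no"), "x4": ("31-45", "1-25", "yes"),
    "x5": ("46-60", "26-49", "no"), "x6": ("16-30", "26-49", "yes"),
    "x7": ("46-60", "26-49", "no"),
}
ATTRS = {"Age": 0, "LEMS": 1, "Walk": 2}

def ind_classes(B):
    # equivalence classes of IND(B): objects equal on all attributes in B
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[ATTRS[a]] for a in B)].add(x)
    return list(classes.values())

def approximations(B, X):
    # B-lower: union of classes contained in X; B-upper: union of classes hitting X
    lower, upper = set(), set()
    for c in ind_classes(B):
        if c <= X:
            lower |= c
        if c & X:
            upper |= c
    return lower, upper

W = {x for x, row in table.items() if row[ATTRS["Walk"]] == "yes"}
low, up = approximations(["Age", "LEMS"], W)
print(low)       # {'x1', 'x6'}             = B-lower
print(up)        # {'x1', 'x3', 'x4', 'x6'} = B-upper
print(up - low)  # {'x3', 'x4'}             = boundary region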
An Example of Set Approximation
Let W = {x | Walk(x) = yes}.
The decision class, Walk, is
rough since the boundary
region is not empty.
Age LEMS Walk
x1 16-30 50 yes
x2 16-30 0 no
x3 31-45 1-25 no
x4 31-45 1-25 yes
x5 46-60 26-49 no
x6 16-30 26-49 yes
x7 46-60 26-49 no
A_*(W) = {x1, x6},
A^*(W) = {x1, x3, x4, x6},
BN_A(W) = {x3, x4},
U - A^*(W) = {x2, x5, x7}.
An Example of
Set Approximation (2)
yes (A_*(W)):         {{x1}, {x6}}
yes/no (boundary):    {{x3, x4}}
no (outside region):  {{x2}, {x5, x7}}
[Diagram: a set X in the partition U/R induced by a subset R of attributes, together with its lower approximation R_*(X) and upper approximation R^*(X).]
Lower & Upper Approximations
Lower & Upper Approximations
(2)
Lower approximation:  R_*(X) = ∪{ Y ∈ U/R : Y ⊆ X }
Upper approximation:  R^*(X) = ∪{ Y ∈ U/R : Y ∩ X ≠ ∅ }
Lower & Upper Approximations
(3)
X1 = {u | Flu(u) = yes} = {u2, u3, u6, u7}
R_*(X1) = {u2, u3}
R^*(X1) = {u2, u3, u6, u7, u8, u5}
X2 = {u | Flu(u) = no} = {u1, u4, u5, u8}
R_*(X2) = {u1, u4}
R^*(X2) = {u1, u4, u5, u8, u7, u6}
U    Headache   Temp.       Flu
u1   Yes        Normal      No
u2   Yes        High        Yes
u3   Yes        Very-high   Yes
u4   No         Normal      No
u5   No         High        No
u6   No         Very-high   Yes
u7   No         High        Yes
u8   No         Very-high   No
The indiscernibility classes defined by
R = {Headache,Temp.} are
{u1},{u2},{u3},{u4},{u5,u7},{u6,u8}.
Lower & Upper Approximations
(4)
R = {Headache,Temp.}
U/R = { {u1},{u2},{u3},{u4},{u5,u7},{u6,u8}}
X1 = {u | Flu(u) = yes} = {u2,u3,u6,u7}
X2 = {u | Flu(u) = no} = {u1,u4,u5,u8}
R_*(X1) = {u2, u3},    R^*(X1) = {u2, u3, u6, u7, u8, u5}
R_*(X2) = {u1, u4},    R^*(X2) = {u1, u4, u5, u8, u7, u6}
[Diagram: the equivalence classes {u1}, {u2}, {u3}, {u4}, {u5, u7}, {u6, u8} laid over X1 and X2.]
Properties of Approximations
B_*(X) ⊆ X ⊆ B^*(X)
B_*(∅) = B^*(∅) = ∅,   B_*(U) = B^*(U) = U
B^*(X ∪ Y) = B^*(X) ∪ B^*(Y)
B_*(X ∩ Y) = B_*(X) ∩ B_*(Y)
X ⊆ Y implies B_*(X) ⊆ B_*(Y) and B^*(X) ⊆ B^*(Y)
Properties of Approximations (2)
B_*(X ∪ Y) ⊇ B_*(X) ∪ B_*(Y)
B^*(X ∩ Y) ⊆ B^*(X) ∩ B^*(Y)
B_*(-X) = -B^*(X)
B^*(-X) = -B_*(X)
B_*(B_*(X)) = B^*(B_*(X)) = B_*(X)
B^*(B^*(X)) = B_*(B^*(X)) = B^*(X)
where -X denotes U - X.
Four Basic Classes of Rough Sets
X is roughly B-definable, iff B_*(X) ≠ ∅
and B^*(X) ≠ U.
X is internally B-undefinable, iff B_*(X) = ∅
and B^*(X) ≠ U.
X is externally B-undefinable, iff B_*(X) ≠ ∅
and B^*(X) = U.
X is totally B-undefinable, iff B_*(X) = ∅
and B^*(X) = U.
Accuracy of Approximation
α_B(X) = |B_*(X)| / |B^*(X)|,
where |X| denotes the cardinality of X ≠ ∅.
Obviously 0 ≤ α_B(X) ≤ 1.
If α_B(X) = 1, X is crisp with respect to B.
If α_B(X) < 1, X is rough with respect to B.
Issues in the Decision Table
The same or indiscernible objects may be
represented several times.
Some of the attributes may be superfluous
(redundant).
That is, their removal cannot worsen the
classification.
Reducts
Keep only those attributes that preserve the
indiscernibility relation and, consequently,
the set approximation.
There are usually several such subsets of
attributes; those which are minimal are
called reducts.
Dispensable & Indispensable
Attributes
Let c ∈ C.
Attribute c is dispensable in T
if POS_C(D) = POS_(C-{c})(D); otherwise
attribute c is indispensable in T.
The C-positive region of D:
POS_C(D) = ∪_{X ∈ U/D} C_*(X).
Independent
T = (U, C, D) is independent
if all c ∈ C are indispensable in T.
Reduct & Core
The set of attributes R ⊆ C is called a reduct
of C, if T' = (U, R, D) is independent and
POS_R(D) = POS_C(D).
The set of all the condition attributes
indispensable in T is denoted by CORE(C):
CORE(C) = ∩ RED(C),
where RED(C) is the set of all reducts of C.
An Example of Reducts & Core
U    Headache   Muscle-pain   Temp.       Flu
u1   Yes        Yes           Normal      No
u2   Yes        Yes           High        Yes
u3   Yes        Yes           Very-high   Yes
u4   No         Yes           Normal      No
u5   No         No            High        No
u6   No         Yes           Very-high   Yes

U        Muscle-pain   Temp.       Flu
u1,u4   Yes           Normal      No
u2      Yes           High        Yes
u3,u6   Yes           Very-high   Yes
u5      No            High        No

U    Headache   Temp.       Flu
u1   Yes        Normal      No
u2   Yes        High        Yes
u3   Yes        Very-high   Yes
u4   No         Normal      No
u5   No         High        No
u6   No         Very-high   Yes

Reduct1 = {Muscle-pain, Temp.}
Reduct2 = {Headache, Temp.}
CORE = {Headache, Temp.} ∩ {Muscle-pain, Temp.}
     = {Temp.}
Discernibility Matrix
(relative to positive region)
Let T = (U, C, D) be a decision table, with
U = {u1, u2, ..., un}.
By a discernibility matrix of T, denoted M(T),
we will mean the n × n matrix defined as
m_ij = {c ∈ C : c(u_i) ≠ c(u_j)}   if d(u_i) ≠ d(u_j) for some d ∈ D,
m_ij = ∅                           if d(u_i) = d(u_j) for all d ∈ D,
for i, j = 1, 2, ..., n such that u_i or u_j belongs to
the C-positive region of D.
m_ij is the set of all the condition attributes that
classify objects u_i and u_j into different classes.
Discernibility Matrix
(relative to positive region) (2)
The discernibility function is defined similarly, but the conjunction is taken
over all non-empty entries of M(T) corresponding
to the indices i, j such that u_i
or u_j belongs to the C-positive region of D.
m_ij = ∅ denotes that this case does not need to be
considered, hence it is interpreted as logical truth.
All disjuncts of the minimal disjunctive form of this
function define the reducts of T (relative to the
positive region).
Discernibility Function
(relative to objects)
For any u_i ∈ U,
f_T(u_i) = ∧{ ∨m_ij : j ∈ {1, 2, ..., n}, j ≠ i },
where (1) ∨m_ij is the disjunction of all variables a
such that a ∈ m_ij, if m_ij ≠ ∅;
(2) ∨m_ij = false (⊥), if m_ij = ∅ and the two
objects belong to different decision classes;
(3) ∨m_ij = true (t), if m_ij = ∅ and the two
objects belong to the same decision class.
Each logical product in the minimal disjunctive normal
form (DNF) defines a reduct of instance u_i.
Examples of Discernibility Matrix
No a b c d
u1 a0 b1 c1 y
u2 a1 b1 c0 n
u3 a0 b2 c1 n
u4 a1 b1 c1 y
C = {a,b,c}
D = {d}
In order to discern the equivalence
classes of the decision attribute d,
we have to preserve the conditions
described by the discernibility matrix for
this table.
      u1     u2    u3
u2    a,c
u3    b
u4           c     a,b

f = (a ∨ c) ∧ b ∧ c ∧ (a ∨ b) = b ∧ c
Reduct = {b,c}
Examples of Discernibility Matrix
(2)
U    a  b  c  d  e
u1   1  0  2  1  1
u2   1  0  2  0  1
u3   1  2  0  0  2
u4   1  2  2  1  0
u5   2  1  0  0  2
u6   2  1  1  0  2
u7   2  1  2  1  1

      u1         u2       u3         u4         u5    u6
u2
u3    b,c,d      b,c
u4    b          b,d      c,d
u5    a,b,c,d    a,b,c               a,b,c,d
u6    a,b,c,d    a,b,c               a,b,c,d
u7                        a,b,c,d    a,b        c,d   c,d

Core = {b}
Reduct1 = {b,c}
Reduct2 = {b,d}
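The following Python sketch (ours, not from the tutorial) computes the discernibility matrix relative to the decision attribute for this second example and reads the core off the singleton entries.

rows = {
    "u1": {"a": 1, "b": 0, "c": 2, "d": 1, "e": 1},
    "u2": {"a": 1, "b": 0, "c": 2, "d": 0, "e": 1},
    "u3": {"a": 1, "b": 2, "c": 0, "d": 0, "e": 2},
    "u4": {"a": 1, "b": 2, "c": 2, "d": 1, "e": 0},
    "u5": {"a": 2, "b": 1, "c": 0, "d": 0, "e": 2},
    "u6": {"a": 2, "b": 1, "c": 1, "d": 0, "e": 2},
    "u7": {"a": 2, "b": 1, "c": 2, "d": 1, "e": 1},
}
COND, DEC = ["a", "b", "c", "d"], "e"

def discernibility_matrix():
    objs = list(rows)
    m = {}
    for i, ui in enumerate(objs):
        for uj in objs[:i]:
            if rows[ui][DEC] != rows[uj][DEC]:   # only pairs from different decision classes
                m[(ui, uj)] = {c for c in COND if rows[ui][c] != rows[uj][c]}
    return m

M = discernibility_matrix()
# The core is the union of all singleton entries of the matrix.
core = set().union(*[entry for entry in M.values() if len(entry) == 1])
print(M[("u4", "u1")])   # {'b'}
print(core)              # {'b'}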
Rough Membership
The rough membership function quantifies
the degree of relative overlap between the
set X and the equivalence class [x]_B to
which x belongs:
μ_X^B : U → [0, 1],
μ_X^B(x) = |[x]_B ∩ X| / |[x]_B|.
The rough membership function can be
interpreted as a frequency-based estimate of
P(x ∈ X | u), where u is the equivalence class
of IND(B).
Rough Membership (2)
The formulae for the lower and upper
approximations can be generalized to an
arbitrary level of precision π ∈ (0.5, 1]
by means of the rough membership function:
B_*^π(X) = {x | μ_X^B(x) ≥ π},
B^*_π(X) = {x | μ_X^B(x) > 1 - π}.
Note: the lower and upper approximations as originally
formulated are obtained as a special case with π = 1.
Dependency of Attributes
Discovering dependencies between attributes
is an important issue in KDD.
A set of attributes D depends totally on a set
of attributes C, denoted C ⇒ D, if all values
of attributes from D are uniquely determined
by values of attributes from C.
Dependency of Attributes (2)
Let D and C be subsets of A. We will say that
D depends on C in a degree k (0 ≤ k ≤ 1),
denoted by C ⇒_k D, if
k = γ(C, D) = |POS_C(D)| / |U|,
where POS_C(D) = ∪_{X ∈ U/D} C_*(X) is called the
C-positive region of D.
Dependency of Attributes (3)
Obviously
k = γ(C, D) = Σ_{X ∈ U/D} |C_*(X)| / |U|.
If k = 1 we say that D depends totally on C.
If k < 1 we say that D depends partially
(in a degree k) on C.
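A short Python sketch (ours) of POS_C(D) and the dependency degree k for the Walk table; the helper names are assumptions of this sketch and the computation follows the formulas above.

from collections import defaultdict

table = {
    "x1": {"Age": "16-30", "LEMS": "50",    "Walk": "yes"},
    "x2": {"Age": "16-30", "LEMS": "0",     "Walk": "no"},
    "x3": {"Age": "31-45", "LEMS": "1-25",  "Walk": "no"},
    "x4": {"Age": "31-45", "LEMS": "1-25",  "Walk": "yes"},
    "x5": {"Age": "46-60", "LEMS": "26-49", "Walk": "no"},
    "x6": {"Age": "16-30", "LEMS": "26-49", "Walk": "yes"},
    "x7": {"Age": "46-60", "LEMS": "26-49", "Walk": "no"},
}

def partition(attrs):
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(x)
    return list(blocks.values())

def positive_region(C, D):
    pos = set()
    for c_block in partition(C):
        # a C-block lies in the positive region iff it fits inside one D-class
        if any(c_block <= d_block for d_block in partition(D)):
            pos |= c_block
    return pos

pos = positive_region(["Age", "LEMS"], ["Walk"])
print(pos)                    # {'x1', 'x2', 'x5', 'x6', 'x7'}
print(len(pos) / len(table))  # k = 5/7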
A Rough Set Based KDD Process
Discretization based on RS and
Boolean Reasoning (RSBR).
Attribute selection based on RS
with Heuristics (RSH).
Rule discovery by GDT-RS.
What Are Issues of Real World?
Very large data sets
Mixed types of data (continuous valued,
symbolic data)
Uncertainty (noisy data)
Incompleteness (missing,incomplete data)
Data change
Use of background knowledge
[Table: real-world issues (very large data sets, mixed types of data, noisy data, incomplete instances, data change, use of background knowledge) versus methods (ID3/C4.5, Prism, Version Space, BP, Dblearn), indicating whether each method can handle each issue.]
Soft Techniques for KDD
[Diagram: soft techniques for KDD grouped by their roots.
Probability: Stoch. Proc., Belief Nets, Conn. Nets, GDT.
Logic: Deduction, Induction, Abduction.
Set: Rough Sets, Fuzzy Sets.]
Soft Techniques for KDD (2)
[Diagram: the hybrid model links GDT, RS-TM, GrC, and RS&ILP with deduction, induction, and abduction.]
A Hybrid Model
GDT: Generalization Distribution Table
RS: Rough Sets
TM: Transition Matrix
ILP: Inductive Logic Programming
GrC: Granular Computing
A Rough Set Based KDD Process
Discretization based on RS and
Boolean Reasoning (RSBR).
Attribute selection based on RS
with Heuristics (RSH).
Rule discovery by GDT-RS.
Observations
A real-world data set always contains mixed
types of data, such as continuous-valued and
symbolic data, etc.
When attributes with real values are analyzed,
they must undergo a process called
discretization, which divides the attribute's
values into intervals.
There is no unified approach to
discretization problems so far, and the choice
of method depends heavily on the data considered.
Discretization based on RSBR
In the discretization of a decision table
T = (U, A ∪ {d}), where V_a = [v_a, w_a) is
an interval of real values, we search for
a partition P_a of V_a for any a ∈ A.
Any partition of V_a is defined by a sequence
of the so-called cuts v_1 < v_2 < ... < v_k from V_a.
Any family of partitions {P_a}_{a ∈ A} can be
identified with a set of cuts.
Discretization Based on RSBR
(2)
In the discretization process,we search for a set
of cuts satisfying some natural conditions.
T:                       T^P:
U    a    b    d         U    a  b  d
x1   0.8  2    1         x1   0  2  1
x2   1    0.5  0         x2   1  0  0
x3   1.3  3    0         x3   1  2  0
x4   1.4  1    1         x4   1  1  1
x5   1.4  2    0         x5   1  2  0
x6   1.6  3    1         x6   2  2  1
x7   1.3  1    1         x7   1  1  1

P = {(a, 0.9),
     (a, 1.5),
     (b, 0.75),
     (b, 1.5)}
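A small Python sketch (ours) that applies the set of cuts P to the real-valued table T, reproducing the discretized table T^P above; interval indices are assigned with bisect.

import bisect

T = {"x1": (0.8, 2), "x2": (1, 0.5), "x3": (1.3, 3), "x4": (1.4, 1),
     "x5": (1.4, 2), "x6": (1.6, 3), "x7": (1.3, 1)}
P = {"a": [0.9, 1.5], "b": [0.75, 1.5]}    # the cuts listed above

def discretize(value, cuts):
    # interval index: 0 below the first cut, 1 between the first and second cut, ...
    return bisect.bisect_right(cuts, value)

TP = {x: (discretize(a, P["a"]), discretize(b, P["b"])) for x, (a, b) in T.items()}
print(TP["x1"], TP["x6"])   # (0, 2) and (2, 2), as in T^P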
A Geometrical Representation of
Data
[Figure: the seven objects x1-x7 plotted in the (a, b) plane, with a-values 0.8, 1, 1.3, 1.4, 1.6 and b-values 0.5, 1, 2, 3.]
A Geometrical Representation of
Data and Cuts
[Figure: the same plot with the cuts from P drawn as vertical lines on a and horizontal lines on b.]
Discretization Based on RSBR
(3)
The sets of possible values of a and b are
defined by
V_a = [0, 2),  V_b = [0, 4).
The sets of values of a and b on objects
from U are given by
a(U) = {0.8, 1, 1.3, 1.4, 1.6};
b(U) = {0.5, 1, 2, 3}.
Discretization Based on RSBR
(4)
The discretization process returns a partition
of the value sets of condition attributes into
intervals.
A Discretization Process
Step 1: define a set of Boolean variables
BV(U) = {pa1, pa2, pa3, pa4, pb1, pb2, pb3},
where
pa1 corresponds to the interval [0.8, 1) of a,
pa2 corresponds to the interval [1, 1.3) of a,
pa3 corresponds to the interval [1.3, 1.4) of a,
pa4 corresponds to the interval [1.4, 1.6) of a,
pb1 corresponds to the interval [0.5, 1) of b,
pb2 corresponds to the interval [1, 2) of b,
pb3 corresponds to the interval [2, 3) of b.
The Set of Cuts on Attribute a
[Figure: the a-axis with the values 0.8, 1.0, 1.3, 1.4, 1.6 and the candidate cuts c1, c2, c3, c4 corresponding to pa1, pa2, pa3, pa4.]
A Discretization Process (2)
Step 2: create a new decision table T^P by using
the set of Boolean variables defined in Step 1.
Let T = (U, A ∪ {d}) be a decision table and
pak the propositional variable corresponding to
the interval [v_k^a, v_{k+1}^a) of attribute a, for any
k ∈ {1, ..., n_a - 1} and a ∈ A.
A Sample Defined in Step 2
T^P:
U*          pa1  pa2  pa3  pa4  pb1  pb2  pb3
(x1, x2)    1    0    0    0    1    1    0
(x1, x3)    1    1    0    0    0    0    1
(x1, x5)    1    1    1    0    0    0    0
(x4, x2)    0    1    1    0    1    0    0
(x4, x3)    0    0    1    0    0    1    1
(x4, x5)    0    0    0    0    0    1    0
(x6, x2)    0    1    1    1    1    1    1
(x6, x3)    0    0    1    1    0    0    0
(x6, x5)    0    0    0    1    0    0    1
(x7, x2)    0    1    0    0    1    0    0
(x7, x3)    0    0    0    0    0    1    1
(x7, x5)    0    0    1    0    0    1    0
The Discernibility Formula
The discernibility formula
δ(x1, x2) = pa1 ∨ pb1 ∨ pb2
means that in order to discern objects x1 and
x2, at least one of the following cuts must
be set:
a cut between a(0.8) and a(1),
a cut between b(0.5) and b(1),
a cut between b(1) and b(2).
The Discernibility Formulae for
All Different Pairs
δ(x1, x2) = pa1 ∨ pb1 ∨ pb2
δ(x1, x3) = pa1 ∨ pa2 ∨ pb3
δ(x1, x5) = pa1 ∨ pa2 ∨ pa3
δ(x4, x2) = pa2 ∨ pa3 ∨ pb1
δ(x4, x3) = pa3 ∨ pb2 ∨ pb3
δ(x4, x5) = pb2
The Discernibility Formulae for
All Different Pairs (2)
δ(x6, x2) = pa2 ∨ pa3 ∨ pa4 ∨ pb1 ∨ pb2 ∨ pb3
δ(x6, x3) = pa3 ∨ pa4
δ(x6, x5) = pa4 ∨ pb3
δ(x7, x2) = pa2 ∨ pb1
δ(x7, x3) = pb2 ∨ pb3
δ(x7, x5) = pa3 ∨ pb2
A Discretization Process (3)
Step 3: find the minimal subset of the variables p that
discerns all objects in different decision
classes.
The discernibility Boolean propositional
formula is defined as follows:
Φ_U = ∧ { δ(xi, xj) : d(xi) ≠ d(xj) }.
The Discernibility Formula
in CNF Form
Φ_U = (pa1 ∨ pb1 ∨ pb2) ∧ (pa1 ∨ pa2 ∨ pb3)
     ∧ (pa1 ∨ pa2 ∨ pa3) ∧ (pa2 ∨ pa3 ∨ pb1)
     ∧ (pa3 ∨ pb2 ∨ pb3) ∧ pb2
     ∧ (pa2 ∨ pa3 ∨ pa4 ∨ pb1 ∨ pb2 ∨ pb3)
     ∧ (pa3 ∨ pa4) ∧ (pa4 ∨ pb3) ∧ (pa2 ∨ pb1)
     ∧ (pb2 ∨ pb3) ∧ (pa3 ∨ pb2).
The Discernibility Formula
in DNF Form
We obtain four prime implicants,
is the optimal result,because
it is the minimal subset of P.
},,{ 242 baa ppp
)()( 3232242 bbaabaaU ppppppp
).()( 21413213 bbaabbba pppppppp
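Assuming SymPy is available, the CNF-to-DNF step can be reproduced mechanically; the sketch below (ours, not the authors' tool) builds Φ_U from the twelve discernibility formulae and prints a minimal DNF, whose conjunctions are minimal sets of cuts.

from sympy import symbols
from sympy.logic.boolalg import to_dnf

pa1, pa2, pa3, pa4, pb1, pb2, pb3 = symbols("pa1 pa2 pa3 pa4 pb1 pb2 pb3")

# Conjunction of the discernibility formulae of all pairs with different decisions.
phi = ((pa1 | pb1 | pb2) & (pa1 | pa2 | pb3) & (pa1 | pa2 | pa3) &
       (pa2 | pa3 | pb1) & (pa3 | pb2 | pb3) & pb2 &
       (pa2 | pa3 | pa4 | pb1 | pb2 | pb3) & (pa3 | pa4) & (pa4 | pb3) &
       (pa2 | pb1) & (pb2 | pb3) & (pa3 | pb2))

# Each conjunction in the simplified DNF is a minimal set of cuts,
# e.g. pa2 & pa4 & pb2.
print(to_dnf(phi, simplify=True))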
The Minimal Set Cuts
for the Sample DB
[Figure: the (a, b) plane with only the minimal set of cuts kept: two cuts on a and one cut on b.]
A Result
T:                       T^P:
U    a    b    d         U    a  b  d
x1   0.8  2    1         x1   0  1  1
x2   1    0.5  0         x2   0  0  0
x3   1.3  3    0         x3   1  1  0
x4   1.4  1    1         x4   1  0  1
x5   1.4  2    0         x5   1  1  0
x6   1.6  3    1         x6   2  1  1
x7   1.3  1    1         x7   1  0  1

P = {(a, 1.2),
     (a, 1.5),
     (b, 1.5)}
A Rough Set Based KDD Process
Discretization based on RS and
Boolean Reasoning (RSBR).
Attribute selection based on RS
with Heuristics (RSH).
Rule discovery by GDT-RS.
Observations
A database always contains a lot of attributes
that are redundant and not necessary for rule
discovery.
If these redundant attributes are not removed,
not only does the time complexity of rule discovery
increase, but the quality of the discovered
rules may also be significantly degraded.
The Goal of Attribute Selection
Finding an optimal subset of attributes in a
database according to some criterion, so that
a classifier with the highest possible
accuracy can be induced by a learning
algorithm using only the information
available from that subset of attributes.
Attribute Selection
U    Headache   Muscle-pain   Temp.       Flu
u1   Yes        Yes           Normal      No
u2   Yes        Yes           High        Yes
u3   Yes        Yes           Very-high   Yes
u4   No         Yes           Normal      No
u5   No         No            High        No
u6   No         Yes           Very-high   Yes

U    Muscle-pain   Temp.       Flu
u1   Yes           Normal      No
u2   Yes           High        Yes
u3   Yes           Very-high   Yes
u4   Yes           Normal      No
u5   No            High        No
u6   Yes           Very-high   Yes

U    Headache   Temp.       Flu
u1   Yes        Normal      No
u2   Yes        High        Yes
u3   Yes        Very-high   Yes
u4   No         Normal      No
u5   No         High        No
u6   No         Very-high   Yes
The Filter Approach
Preprocessing
The main strategies of attribute selection:
– The minimal subset of attributes
– Selection of the attributes with a higher rank
Advantage
– Fast
Disadvantage
– Ignoring the performance effects of the induction
algorithm
The Wrapper Approach
Using the induction algorithm as a part of the search
evaluation function
Possible attribute subsets: 2^N - 1 (N is the number of attributes)
The main search methods:
– Exhaustive/Complete search
– Heuristic search
– Non-deterministic search
Advantage
– Taking into account the performance of the induction algorithm
Disadvantage
– The time complexity is high
Basic Ideas,
Attribute Selection using RSH
Take the attributes in CORE as the initial
subset.
Select one attribute each time using the rule
evaluation criterion in our rule discovery
system,GDT-RS.
Stop when the subset of selected attributes
is a reduct,
Why Heuristics?
The number of possible reducts can be up to
2^N - 1, where N is the number of attributes.
Selecting the optimal reduct from all
possible reducts is time-consuming, and
heuristics must be used.
The Rule Selection Criteria
in GDT-RS
Selecting the rules that cover as many
instances as possible.
Selecting the rules that contain as few
attributes as possible, if they cover the same
number of instances.
Selecting the rules with larger strengths, if
they have the same number of condition
attributes and cover the same number of
instances.
Attribute Evaluation Criteria
Selecting the attributes that cause the number
of consistent instances to increase faster
– To obtain the subset of attributes as small as
possible
Selecting an attribute that has smaller number
of different values
– To guarantee that the number of instances covered
by rules is as large as possible.
Main Features of RSH
It can select a better subset of attributes
quickly and effectively from a large DB.
The selected attributes do not damage the
performance of induction so much,
An Example of
Attribute Selection
U a b c d e
u 1 1 0 2 1 1
u 2 1 0 2 0 1
u 3 1 2 0 0 2
u 4 1 2 2 1 0
u 5 2 1 0 0 2
u 6 2 1 1 0 2
u 7 2 1 2 1 1
Condition Attributes:
a,Va = {1,2}
b,Vb = {0,1,2}
c,Vc = {0,1,2}
d,Vd = {0,1}
Decision Attribute,
e,Ve = {0,1,2}
U b c d e
u 1 0 2 1 1
u 2 0 2 0 1
u 3 2 0 0 2
u 4 2 2 1 0
u 5 1 0 0 2
u 6 1 1 0 2
u 7 1 2 1 1
Searching for CORE
Removing attribute a
Removing attribute a does
not cause inconsistency.
Hence,a is not used as
CORE.
Searching for CORE (2)
Removing attribute b
U a c d e
u 1 1 2 1 1
u 2 1 2 0 1
u 3 1 0 0 2
u 4 1 2 1 0
u 5 2 0 0 2
u 6 2 1 0 2
u 7 2 2 1 1
u1: a1, c2, d1 → e1
u4: a1, c2, d1 → e0
Removing attribute b
causes inconsistency.
Hence, b is used as CORE.
Searching for CORE (3)
Removing attribute c
U a b d e
u 1 1 0 1 1
u 2 1 0 0 1
u 3 1 2 0 2
u 4 1 2 1 0
u 5 2 1 0 2
u 6 2 1 0 2
u 7 2 1 1 1
Removing attribute c
does not cause inconsistency.
Hence,c is not used
as CORE.
Searching for CORE (4)
Removing attribute d
U a b c e
u 1 1 0 2 1
u 2 1 0 2 1
u 3 1 2 0 2
u 4 1 2 2 0
u 5 2 1 0 2
u 6 2 1 1 2
u 7 2 1 2 1Removing attribute d
does not cause inconsistency.
Hence,d is not used
as CORE.
Searching for CORE (5)
CORE(C)={b}
Initial subset R = {b}
Attribute b is the unique indispensable
attribute.
R={b}
U a b c d e
u 1 1 0 2 1 1
u 2 1 0 2 0 1
u 3 1 2 0 0 2
u 4 1 2 2 1 0
u 5 2 1 0 0 2
u 6 2 1 1 0 2
u 7 2 1 2 1 1
b0 → e1
U ’ b e
u 1 0 1
u 2 0 1
u 3 2 2
u 4 2 0
u 5 1 2
u 6 1 2
u 7 1 1
The instances containing b0 will not be considered.
T T’
Attribute Evaluation Criteria
Selecting the attributes that cause the number
of consistent instances to increase faster
– To obtain the subset of attributes as small as
possible
Selecting the attribute that has smaller number
of different values
– To guarantee that the number of instances covered
by a rule is as large as possible.
Selecting Attribute from {a,c,d}
U ’ a b e
u 3 1 2 2
u 4 1 2 0
u 5 2 1 2
u 6 2 1 2
u 7 2 1 1
1. Selecting {a}: R = {a, b}
a1 b2 → e2 (u3)      a1 b2 → e0 (u4)      (conflict)
a2 b1 → e2 (u5, u6)  a2 b1 → e1 (u7)      (conflict)
POS_{a,b}(X) = ∅ for X ∈ U/{e}
U/{e} = {{u3, u5, u6}, {u4}, {u7}}
U/{a,b} = {{u3, u4}, {u5, u6, u7}}
Selecting Attribute from {a,c,d} (2)
2. Selecting {c}: R = {b, c}
b2 c0 → e2 (u3)
b2 c2 → e0 (u4)
b1 c0 → e2 (u5)
b1 c1 → e2 (u6)
b1 c2 → e1 (u7)

U'   b  c  e
u3   2  0  2
u4   2  2  0
u5   1  0  2
u6   1  1  2
u7   1  2  1

U/{e} = {{u3, u5, u6}, {u4}, {u7}}
POS_{b,c}(X) = {u3, u4, u5, u6, u7} for X ∈ U/{e}
Selecting Attribute from {a,c,d} (3)
3. Selecting {d}: R = {b, d}
b2 d0 → e2 (u3)
b2 d1 → e0 (u4)
b1 d0 → e2 (u5, u6)
b1 d1 → e1 (u7)

U'   b  d  e
u3   2  0  2
u4   2  1  0
u5   1  0  2
u6   1  0  2
u7   1  1  1

U/{e} = {{u3, u5, u6}, {u4}, {u7}}
POS_{b,d}(X) = {u3, u4, u5, u6, u7} for X ∈ U/{e}
Selecting Attribute from {a,c,d} (4)
3. Selecting {d}: R = {b, d}
POS_{b,d}({u3, u5, u6}) / {b, d} = {{u3}, {u5, u6}}
max_size(POS_{b,d}({u3, u5, u6}) / {b, d}) = 2
U/{e} = {{u3, u5, u6}, {u4}, {u7}}
U/{b,d} = {{u3}, {u4}, {u5, u6}, {u7}}
Result: the selected subset of attributes is {b, d}.
A Heuristic Algorithm
for Attribute Selection
Let R be the set of selected attributes, P be the
set of unselected condition attributes, U be the
set of all instances, X be the set of contradictory
instances, and EXPECT be the threshold of
accuracy.
In the initial state, R = CORE(C),
P = C - CORE(C), X = U - POS_R(D), k = 0.
A Heuristic Algorithm
for Attribute Selection (2)
Step 1: If k >= EXPECT, finish; otherwise
calculate the dependency degree k:
k = |POS_R(D)| / |U|.
Step 2: For each p in P, calculate
v_p = |POS_{R∪{p}}(D)|,
m_p = max_size(POS_{R∪{p}}(D) / (R ∪ {p} ∪ D)),
where max_size denotes the cardinality of the maximal subset.
A Heuristic Algorithm
for Attribute Selection (3)
Step 3: Choose the best attribute p, i.e. the one with the
largest v_p × m_p, and let
R = R ∪ {p}, P = P - {p}.
Step 4: Remove all consistent instances u in
POS_R(D) from X.
Step 5: Go back to Step 1.
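A rough Python sketch (ours, not the authors' implementation) of this greedy loop on the example table with condition attributes a, b, c, d and decision e. It starts from CORE(C) = {b}, scores each candidate by v_p × m_p as in Steps 2-3, and removes newly consistent instances as in Step 4; EXPECT is fixed at 1.

from collections import defaultdict

rows = {
    "u1": {"a": 1, "b": 0, "c": 2, "d": 1, "e": 1},
    "u2": {"a": 1, "b": 0, "c": 2, "d": 0, "e": 1},
    "u3": {"a": 1, "b": 2, "c": 0, "d": 0, "e": 2},
    "u4": {"a": 1, "b": 2, "c": 2, "d": 1, "e": 0},
    "u5": {"a": 2, "b": 1, "c": 0, "d": 0, "e": 2},
    "u6": {"a": 2, "b": 1, "c": 1, "d": 0, "e": 2},
    "u7": {"a": 2, "b": 1, "c": 2, "d": 1, "e": 1},
}
COND, DEC = ["a", "b", "c", "d"], "e"

def blocks(attrs, universe):
    part = defaultdict(set)
    for x in universe:
        part[tuple(rows[x][a] for a in attrs)].add(x)
    return list(part.values())

def pos_region(attrs, universe):
    dec = blocks([DEC], universe)
    return set().union(*[b for b in blocks(attrs, universe) if any(b <= d for d in dec)])

R = ["b"]                                  # CORE(C) = {b}
P = [c for c in COND if c not in R]
X = set(rows) - pos_region(R, set(rows))   # instances not yet consistent
while X and P:
    def score(p):
        pos = pos_region(R + [p], X)
        m = max((len(b) for b in blocks(R + [p], pos)), default=0)
        return len(pos) * m                # v_p * m_p
    best = max(P, key=score)
    R.append(best)
    P.remove(best)
    X -= pos_region(R, X)                  # Step 4: drop newly consistent instances

print(R)   # ['b', 'd'] for this table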
Experimental Results
Data set                 Attribute   Instance   Attr. N.   Selected
                         number      number     in core    attr. N.
Monk1                    6           124        3          3
Monk3                    6           122        4          4
Mushroom                 22          8124       0          4
Breast cancer            10          699        1          4
Earthquake               16          155        0          3
Meningitis               30          140        1          4
Bacterial examination    57          20920      2          9
Slope-collapse           23          3436       6          8
Gastric cancer           38          7520       2          19
A Rough Set Based KDD Process
Discretization based on RS and
Boolean Reasoning (RSBR).
Attribute selection based on RS
with Heuristics (RSH).
Rule discovery by GDT-RS.
Main Features of GDT-RS
Unseen instances are considered in the
discovery process,and the uncertainty of a
rule,including its ability to predict possible
instances,can be explicitly represented in the
strength of the rule.
Biases can be flexibly selected for search
control,and background knowledge can be
used as a bias to control the creation of a GDT
and the discovery process,
A Sample DB
U    a   b   c   d
u1   a0  b0  c1  y
u2   a0  b1  c1  y
u3   a0  b0  c1  y
u4   a1  b1  c0  n
u5   a0  b0  c1  n
u6   a0  b2  c1  n
u7   a1  b1  c1  y

Condition attributes: a, b, c
Va = {a0, a1},  Vb = {b0, b1, b2},  Vc = {c0, c1}
Decision attribute: d,  Vd = {y, n}
A Sample GDT
[Table: the GDT. The columns F(x) are the possible instances
(a0b0c0, a0b0c1, ..., a1b2c1); the rows G(x) are the possible
generalizations (*b0c0, *b0c1, ..., a0*c0, ..., a1b1*, a1b2*, **c0, ...,
a0**, a1**). Each entry holds the probability that the generalization
produces the instance, e.g. 1/2 for *b0c0 → a0b0c0, 1/3 for a0*c0 → a0b0c0,
and 1/6 for a0** → a0b0c1; blank entries are 0.]
Explanation for GDT
F(x): the possible instances (PI)
G(x): the possible generalizations (PG)
G(x) → F(x): the probability relationships
between PI & PG.
Probabilistic Relationship
Between PIs and PGs
p(PI_j | PG_i) = 1 / N_PG_i   if PG_i is a generalization of PI_j,
p(PI_j | PG_i) = 0            otherwise,
where N_PG_i is the number of PI
satisfying the i-th PG:
N_PG_i = ∏_{k ∈ {l | PG_i[l] = *}} n_k,
with n_k the number of values of the k-th attribute.
Example: a0*c0 generalizes a0b0c0, a0b1c0, a0b2c0;
N_{a0*c0} = n_b = 3, so p = 1/3 for each of them.
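A tiny Python sketch (ours) of N_PG and p(PI | PG) for the sample value sets Va = {a0, a1}, Vb = {b0, b1, b2}, Vc = {c0, c1}; '*' marks a wildcard position in a generalization.

n_values = {"a": 2, "b": 3, "c": 2}

def n_pg(pg):
    """pg is a dict like {'a': 'a0', 'b': '*', 'c': 'c0'}; '*' is a wildcard."""
    n = 1
    for attr, val in pg.items():
        if val == "*":
            n *= n_values[attr]
    return n

def p(pi, pg):
    """Probability that generalization pg produces instance pi."""
    ok = all(v == "*" or v == pi[a] for a, v in pg.items())
    return 1 / n_pg(pg) if ok else 0

pg = {"a": "a0", "b": "*", "c": "c0"}            # a0*c0
print(n_pg(pg))                                   # 3
print(p({"a": "a0", "b": "b1", "c": "c0"}, pg))   # 1/3
print(p({"a": "a1", "b": "b1", "c": "c0"}, pg))   # 0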
Unseen Instances
U    Headache   Muscle-pain   Temp.       Flu
u1   Yes        Yes           Normal      No
u2   Yes        Yes           High        Yes
u3   Yes        Yes           Very-high   Yes
u4   No         Yes           Normal      No
u5   No         No            High        No
u6   No         Yes           Very-high   Yes

Possible (unseen) instances:
(Yes, No, Normal)
(Yes, No, High)
(Yes, No, Very-high)
(No, Yes, High)
(No, No, Normal)
(No, No, Very-high)

Closed world → Open world
Rule Representation
X → Y with S
X denotes the conjunction of the conditions
that a concept must satisfy.
Y denotes a concept that the rule describes.
S is a "measure of strength" with which the
rule holds.
Rule Strength (1)
The strength of the generalization X = PG_k
(when BK is not used):
s(X) = s(PG_k) = Σ_l p(PI_l | PG_k) = N_ins-rel(PG_k) / N_PG_k,
where N_ins-rel(PG_k) is the number of the observed
instances satisfying the k-th generalization.
The strength of a rule X → Y:
S(X → Y) = s(X) × (1 - r(X → Y)).
Rule Strength (2)
The strength of the generalization X = PG_k
(when BK is used):
s(X) = s(PG_k) = Σ_l p_bk(PI_l | PG_k) = Σ_l BKF(PI_l | PG_k) / N_PG_k,
where BKF(PI_l | PG_k) is the factor by which the background
knowledge adjusts the probability of PI_l under PG_k.
Rule Strength (3)
The rate of noises:
r(X → Y) = (N_ins-rel(X) - N_ins-class(X, Y)) / N_ins-rel(X),
where N_ins-class(X, Y) is the number of instances
belonging to the class Y within the instances
satisfying the generalization X.
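A short Python sketch (ours) combining the three formulas for the sample DB without BK, reproducing S({b1c1} → y) = 1 and S({c0} → n) = 1/6 from the following slides; only the four non-contradictory (compound) instances are used, and all helper names are assumptions of this sketch.

values = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"], "c": ["c0", "c1"]}
obs = {  # observed instances: (a, b, c) -> decision
    ("a0", "b1", "c1"): "y", ("a1", "b1", "c0"): "n",
    ("a0", "b2", "c1"): "n", ("a1", "b1", "c1"): "y",
}

def matches(inst, pg):          # pg fixes some attribute values, e.g. {"b": "b1", "c": "c1"}
    names = ["a", "b", "c"]
    return all(inst[names.index(k)] == v for k, v in pg.items())

def strength(pg):               # s(X) = (observed instances covered) / N_PG
    n_pg = 1
    for attr in values:
        if attr not in pg:
            n_pg *= len(values[attr])
    covered = [i for i in obs if matches(i, pg)]
    return len(covered) / n_pg, covered

def rule_strength(pg, y):       # S(X -> Y) = s(X) * (1 - r(X -> Y))
    s, covered = strength(pg)
    noise = sum(1 for i in covered if obs[i] != y) / len(covered) if covered else 1.0
    return s * (1 - noise)

print(rule_strength({"b": "b1", "c": "c1"}, "y"))   # 1.0
print(rule_strength({"c": "c0"}, "n"))              # 0.1666...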
Rule Discovery by GDT-RS
Condition Attrs.,a,b,c
a,Va = {a0,a1}
b,Vb = {b0,b1,b2}
c,Vc = {c0,c1}
Class,d:
d,Vd = {y,n}
U a b c d
u1 a0 b0 c1 y
u2 a0 b1 c1 y
u3 a0 b0 c1 y
u4 a1 b1 c0 n
u5 a0 b0 c1 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
Regarding the Instances
(Noise Rate = 0)
U     a   b   c   d
u1'   a0  b0  c1  y, y, n   (u1, u3, u5)
u2    a0  b1  c1  y
u4    a1  b1  c0  n
u6    a0  b2  c1  n
u7    a1  b1  c1  y

r_y(u1') = 1 - 2/3 = 0.33
r_n(u1') = 1 - 1/3 = 0.67
Let T_noise = 0. Since r_y(u1') > T_noise and r_n(u1') > T_noise,
d(u1') cannot be decided and u1' is treated as a contradictory
instance (it is excluded from both classes).

U     a   b   c   d
u1'   a0  b0  c1
u2    a0  b1  c1  y
u4    a1  b1  c0  n
u6    a0  b2  c1  n
u7    a1  b1  c1  y
Generating Discernibility Vector
for u2
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
m_{2,1'} = {b}
m_{2,2} = ∅
m_{2,4} = {a, c}
m_{2,6} = {b}
m_{2,7} = ∅

      u1'   u2   u4    u6   u7
u2    b     ∅    a,c   b    ∅
Obtaining Reducts for u2
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
      u1'   u2   u4    u6   u7
u2    b     ∅    a,c   b    ∅

f_T(u2) = b ∧ (a ∨ c) ∧ b
        = (a ∧ b) ∨ (b ∧ c)
Generating Rules from u2
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
f_T(u2) = (a ∧ b) ∨ (b ∧ c)  →  {a0, b1} and {b1, c1}

{a0b1}: a0b1* covers a0b1c0 and a0b1c1 (u2, y)
        s({a0b1}) = 0.5,  r({a0b1} → y) = 0

{b1c1}: *b1c1 covers a0b1c1 (u2, y) and a1b1c1 (u7, y)
        s({b1c1}) = 1,  r({b1c1} → y) = 0
Generating Rules from u2 (2)
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
{a0b1} → y with S = (1 × 1/2) × (1 - 0) = 0.5
{b1c1} → y with S = (2 × 1/2) × (1 - 0) = 1
Generating Discernibility Vector
for u4
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
m_{4,1'} = {a, b, c}
m_{4,2} = {a, c}
m_{4,4} = ∅
m_{4,6} = ∅
m_{4,7} = {c}

      u1'      u2    u4   u6   u7
u4    a,b,c    a,c   ∅    ∅    c
Obtaining Reducts for u4
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
f_T(u4) = (a ∨ b ∨ c) ∧ (a ∨ c) ∧ c
        = c

      u1'      u2    u4   u6   u7
u4    a,b,c    a,c   ∅    ∅    c
Generating Rules from u4
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
f_T(u4) = c  →  {c0}

{c0}: **c0 covers a0b0c0, a0b1c0, a0b2c0, a1b0c0, a1b1c0 (u4, n), a1b2c0
      s({c0}) = 1/6,  r({c0} → n) = 0
Generating Rules from u4 (2)
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
{c0} → n with S = (1 × 1/6) × (1 - 0) = 0.167
Generating Rules from All
Instances
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
u2: {a0b1} → y, S = 0.5
    {b1c1} → y, S = 1
u4: {c0} → n, S = 0.167
u6: {b2} → n, S = 0.25
u7: {a1c1} → y, S = 0.5
    {b1c1} → y, S = 1
The Rule Selection Criteria
in GDT-RS
Selecting the rules that cover as many
instances as possible.
Selecting the rules that contain as few
attributes as possible, if they cover the same
number of instances.
Selecting the rules with larger strengths, if
they have the same number of condition
attributes and cover the same number of
instances.
Generalization Belonging to
Class y
          a0b1c1 (y)   a1b1c1 (y)
*b1c1     1/2          1/2
a1*c1                  1/3
a0b1*     1/2

{b1c1} → y with S = 1      covers u2, u7
{a1c1} → y with S = 1/2    covers u7
{a0b1} → y with S = 1/2    covers u2
Generalization Belonging to
Class n
          a0b2c1 (n)   a1b1c0 (n)
**c0                   1/6
*b2*      1/4

{c0} → n with S = 1/6    covers u4
{b2} → n with S = 1/4    covers u6
Results from the Sample DB
(Noise Rate = 0)
Certain rules:                        Instances covered:
{c0} → n with S = 1/6                 u4
{b2} → n with S = 1/4                 u6
{b1c1} → y with S = 1                 u2, u7
Possible rules:
{b0} → y with S = (1/4) × (1/2)
{a0b0} → y with S = (1/2) × (2/3)
{a0c1} → y with S = (1/3) × (2/3)
{b0c1} → y with S = (1/2) × (2/3)
Instances covered: u1, u3, u5
Results from the Sample DB (2)
(Noise Rate > 0)
Regarding Instances
(Noise Rate > 0)
U     a   b   c   d
u1'   a0  b0  c1  y, y, n   (u1, u3, u5)
u2    a0  b1  c1  y
u4    a1  b1  c0  n
u6    a0  b2  c1  n
u7    a1  b1  c1  y

r_y(u1') = 1 - 2/3 = 0.33
r_n(u1') = 1 - 1/3 = 0.67
Let T_noise = 0.5. Since r_y(u1') < T_noise, d(u1') = y.

U     a   b   c   d
u1'   a0  b0  c1  y
u2    a0  b1  c1  y
u4    a1  b1  c0  n
u6    a0  b2  c1  n
u7    a1  b1  c1  y
Rules Obtained from All
Instances
U     a   b   c   d
u1'   a0  b0  c1  y
u2    a0  b1  c1  y
u4    a1  b1  c0  n
u6    a0  b2  c1  n
u7    a1  b1  c1  y

u2:  {a0b1} → y, S = 0.5
     {b1c1} → y, S = 1
u4:  {c0} → n, S = 0.167
u6:  {b2} → n, S = 0.25
u7:  {a1c1} → y, S = 0.5
     {b1c1} → y, S = 1
u1': {b0} → y, S = 1/4 × 2/3 = 0.167
Example of Using BK
Before BK:
          a0b0c0  a0b0c1  a0b1c0  a0b1c1  a0b2c0  a0b2c1  …  a1b2c1
a0b0*     1/2     1/2
a0b1*                     1/2     1/2
a0*c1             1/3             1/3             1/3
a0**      1/6     1/6     1/6     1/6     1/6     1/6

BK: a0 => c1, 100%

After BK:
          a0b0c0  a0b0c1  a0b1c0  a0b1c1  a0b2c0  a0b2c1  …  a1b2c1
a0b0*     0       1
a0b1*                     0       1
a0*c1             1/3             1/3             1/3
a0**      0       1/3     0       1/3     0       1/3
Changing Strength of
Generalization by BK
U a b c d
u1 ’ a0 b0 c1
u2 a0 b1 c1 y
u4 a1 b1 c0 n
u6 a0 b2 c1 n
u7 a1 b1 c1 y
f_T(u2) = (a ∧ b) ∨ (b ∧ c)  →  {a0, b1}, {b1, c1}

Without BK:
{a0b1}: a0b1* covers a0b1c0 (1/2) and a0b1c1 (1/2; u2)
        s({a0b1}) = 0.5,  r({a0b1} → y) = 0

With BK  a0 => c1, 100%:
{a0b1}: a0b1* covers a0b1c0 (0%) and a0b1c1 (100%; u2)
        s({a0b1}) = 1,  r({a0b1} → y) = 0
Algorithm 1
Optimal Set of Rules
Step 1,Consider the instances with the same
condition attribute values as one instance,
called a compound instance.
Step 2,Calculate the rate of noises r for each
compound instance.
Step 3,Select one instance u from U and
create a discernibility vector for u.
Step 4,Calculate all reducts for the instance u
by using the discernibility function.
Algorithm 1
Optimal Set of Rules (2)
Step 5,Acquire the rules from the reducts for
the instance u,and revise the strength of
generalization of each rule.
Step 6,Select better rules from the rules (for u)
acquired in Step 5,by using the heuristics for
rule selection.
Step 7: Let U = U - {u}. If U ≠ ∅, then go back to
Step 3; otherwise go to Step 8.
Algorithm 1
Optimal Set of Rules (3)
Step 8: Finish if the number of rules
selected in Step 6 for each instance is 1.
Otherwise find a minimal set of rules
which covers all of the instances in the
decision table.
The Issue of Algorithm 1
It is not suitable for the database with a
large number of attributes.
Methods to Solve the Issue:
Finding a reduct (subset) of condition
attributes in a pre-processing.
Finding a sub-optimal solution using some
efficient heuristics.
Algorithm 2
Sub-Optimal Solution
Step 1: Set R = {}, COVERED = {}, and
SS = {all instance IDs}.
For each class D_c, divide the decision table
T into two parts: the current class T+ and the other
classes T-.
Step 2: From the attribute values v_ij of the
instances I_k (I_k ∈ T+, k ∈ SS), where v_ij means the j-th value of
attribute i,
Algorithm 2
Sub-Optimal Solution (2)
choose a value v with the maximal number of
occurrence within the instances contained in
T+,and the minimal number of occurrence
within the instances contained in T-.
Step3,Insert v into R,
Step4,Delete the instance ID from SS if the
instance does not contain v.
Algorithm 2
Sub-Optimal Solution (3)
Step 5: Go back to Step 2 until the noise rate
is less than the threshold value.
Step 6: Find a minimal subset R' of R
according to their strengths. Insert the rule
(R' → D_c) into RS. Set R = {}, copy the instance IDs
in SS to COVERED, and
set SS = {all instance IDs} - COVERED.
Algorithm 2
Sub-Optimal Solution (4)
Step8,Go back to Step2 until all instances
of T+ are in COVERED.
Step9,Go back to Step1 until all classes are
handled.
Time Complexity of Alg.1&2
Time complexity of Algorithm 1:
O(m n³ + m n² N_GT(T)).
Time complexity of Algorithm 2:
O(m² n²).
Here n is the number of instances in a DB,
m the number of attributes, and
N_GT(T) the number of generalizations,
which is less than O(2^(m-1)).
Experiments
DBs that have been tested:
meningitis, bacterial examination, cancer, mushroom,
slope-collapse, earthquake, contents-sell, ...
Experimental methods:
– Comparing GDT-RS with C4.5
– Using background knowledge or not
– Selecting different allowed noise rates as the
threshold values
– Auto-discretization or BK-based discretization.
Experiment 1
(meningitis data)
C4.5:
(from a meningitis DB with 140 records and
38 attributes)
CTFind(normal) ∧ CellPoly(≤ 220) → Virus
CellPoly(≤ 220) ∧ CellMono(> 12) → Virus
CTFind(abnormal) ∧ CellMono(≤ 12) → Bacteria
CellPoly(> 220) → Bacteria
Experiment 1
(meningitis data) (2)
GDT-RS (auto-discretization):
Culture(strepto) → Bacteria
Seizure(0) ∧ Onset(acute) ∧ RiskGroup(p) → Bacteria
Onset(acute) ∧ CCourse(negative) ∧ RiskGroup(p) → Bacteria
Sex(M) ∧ CRP(4) → Bacteria
CTFind(abnormal) ∧ CellMono(15) → Bacteria
CellPoly(221) → Bacteria
Experiment 1
(meningitis data) (3)
GDT-RS (auto-discretization):
CellPoly(221) ∧ CellMono(15) ∧ Risk(n) → Virus
CRP(4) ∧ CTFind(normal) ∧ CellPoly(221) → Virus
CRP(4) ∧ CellPoly(221) ∧ CellMono(15) → Virus
Using Background Knowledge
(meningitis data)
Never occurring together:
EEGwave(normal) EEGfocus(+)
CSFcell(low) Cell_Poly(high)
CSFcell(low) Cell_Mono(high)
Occurring with lower possibility:
WBC(low) CRP(high)
WBC(low) ESR(high)
WBC(low) CSFcell(high)
Using Background Knowledge
(meningitis data) (2)
Occurring with higher possibility:
WBC(high) CRP(high)
WBC(high) ESR(high)
WBC(high) CSF_CELL(high)
EEGfocus(+) FOCAL(+)
EEGwave(+) EEGfocus(+)
CRP(high) CSF_GLU(low)
CRP(high) CSF_PRO(low)
Explanation of BK
If the brain wave (EEGwave) is normal,the
focus of brain wave (EEGfocus) is never
abnormal.
If the number of white blood cells (WBC) is
high,the inflammation protein (CRP) is also
high.
Using Background Knowledge
(meningitis data) (3)
rule1 is generated by BK.
rule1:
ONSET(acute) ∧ ESR(5) ∧ CSFcell(10)
→ CULTURE(-) ∧ VIRUS;  S = 30 × (384/E)
Using Background Knowledge
(meningitis data) (4)
rule2 is replaced by rule2'.
rule2:
DIAG(VIRUS(E)) ∧ LOC[4,7) → EEGabnormal;  S = 30/E.
rule2':
EEGfocus ∧ LOC[4,7) → EEGabnormal;  S = 4 × (10/E).
Experiment 2
(bacterial examination data)
Number of instances,20,000
Number of condition attributes,60
Goals:
– analyzing the relationship between the bacterium-
detected attribute and other attributes
– analyzing what attribute-values are related to the
sensitivity of antibiotics when the value of
bacterium-detected is (+).
Attribute Selection
(bacterial examination data)
Class 1: bacterium-detected (+, -)
condition attributes: 11
Class 2: antibiotic-sensitivity
(resistant (R), sensitive (S))
condition attributes: 21
Some Results
(bacterial examination data)
Some of rules discovered by GDT-RS are
the same as C4.5,e.g.,
Some of rules can only be discovered by
GDT-RS,e.g.,
lactamase(3) → bacterium-detected(+)
urine-quantity(10³) → bacterium-detected(-)
disease1(pneumonia) → bacterium-detected(-).
Experiment 3
(gastric cancer data)
Instances number:7520
Condition Attributes,38
Classes:
– cause of death (specially,the direct death)
– post-operative complication
Goals:
– analyzing the relationship between the direct
death and other attributes
– analyzing the relationship between the post-
operative complication and other attributes.
Result of Attribute Selection
(gastric cancer data)
Class,the direct death
sex,location_lon1,location_lon2,location_cir1,
location_cir2,serosal_inva,peritoneal_meta,
lymphnode_diss,reconstruction,pre_oper_comp1,
post_oper_comp1,histological,structural_atyp,
growth_pattern,depth,lymphatic_inva,
vascular_inva,ln_metastasis,chemotherapypos
(19 attributes are selected)
Result of Attribute Selection (2)
(gastric cancer data)
Class,post-operative complication
multi-lesions,sex,location_lon1,location_cir1,
location_cir2,lymphnode_diss,maximal_diam,
reconstruction,pre_oper_comp1,histological,
stromal_type,cellular_atyp,structural_atyp,
growth_pattern,depth,lymphatic_inva,
chemotherapypos
(17 attributes are selected)
Experiment 4
(slope-collapse data)
Instances number:3436
– (430 places were collapsed,and 3006 were not)
Condition attributes,32
Continuous attributes in condition attributes,6
– extension of collapsed steep slope,gradient,altitude,
thickness of surface of soil, No. of active fault,
distance between slope and active fault.
Goal,find out what is the reason that causes the
slope to be collapsed.
Result of Attribute Selection
(slope-collapse data)
9 attributes are selected from 32 condition
attributes:
altitude,slope azimuthal,slope shape,direction of
high rank topography,shape of transverse section,
position of transition line,thickness of surface of
soil,kind of plant,distance between slope and
active fault.
(3 continuous attributes in red color)
The Discovered Rules
(slope-collapse data)
s_azimuthal(2) ∧ s_shape(5) ∧ direction_high(8) ∧ plant_kind(3)
S = (4860/E)
altitude[21,25)∧ s_azimuthal(3) ∧ soil_thick(>=45) S = (486/E)
s_azimuthal(4) ∧ direction_high(4) ∧ t_shape(1) ∧ tl_position(2) ∧
s_f_distance(>=9) S = (6750/E)
altitude[16,17)∧ s_azimuthal(3) ∧ soil_thick(>=45)∧
s_f_distance(>=9) S = (1458/E)
altitude[20,21)∧ t_shape(3) ∧ tl_position(2) ∧ plant_kind(6) ∧
s_f_distance(>=9) S = (12150/E)
altitude[11,12)∧ s_azimuthal(2) ∧ tl_position(1) S = (1215/E)
altitude[12,13)∧ direction_high(9) ∧ tl_position(4) ∧ s_f_distance[8,9)
S = (4050/E)
altitude[12,13)∧ s_azimuthal(5) ∧ t_shape(5) ∧ s_f_distance[8,9)
S = (3645/E)
…...
Other Methods for Attribute
Selection
(download from http://www.iscs/nus.edu.sg/liuh/)
LVW,A stochastic wrapper feature selection algorithm
LVI,An incremental multivariate feature selection
algorithm
WSBG/C4.5,Wrapper of sequential backward
generation
WSFG/C4.5,Wrapper of sequential forward
generation
Rule induction system,C4.5
Executing times,10
Class,direct death
Number of selected attributes for each time:
20,19,21,26,22,31,21,19,31,28
Result-1 (19 attributes are selected):
multilesions,sex,location_lon3,location_cir4,
liver_meta,lymphnode_diss,proximal_surg,resection_meth,
combined_rese2,reconstruction,pre_oper_comp1,
post_oper_com2,post_oper_com3,spec_histologi,cellular_atyp,
depth,eval_of_treat,ln_metastasis,othertherapypre
Results of LVW
Result-2 (19 attributes are selected):
age,typeofcancer,location_cir3,location_cir4,
liver_meta,lymphnode_diss,maximal_diam,
distal_surg,combined_rese1,combined_rese2,
pre_oper_comp2,post_oper_com1,histological,
spec_histologi,structural_atyp,depth,lymphatic_inva,
vascular_inva,ln_metastasis
(only the attributes in red color are selected by our method)
Result of LVW (2)
Result of WSFG
Rule induction system:
C4.5
Results
the best relevant attribute first
Result of WSFG (2)
(class,direct death)
eval_of_treat,liver_meta,peritoneal_meta,typeofcancer,
chemotherapypos,combined_rese1,ln_metastasis,location_lon2,
depth,pre_oper_comp1,histological,growth_pattern,vascular_inva,
location_cir1,location_lon3,cellular_atyp,maximal_diam,
pre_oper_comp2,location_lon1,location_cir3,sex,post_oper_com3,
age,serosal_inva,spec_histologi,proximal_surg,location_lon4,
chemotherapypre,lymphatic_inva,lymphnode_diss,structural_atyp,
distal_surg,resection_meth,combined_rese3,chemotherapyin,
location_cir4,post_oper_comp1,stromal_type,combined_rese2,
othertherapypre,othertherapyin,othertherapypos,reconstruction,
multilesions,location_cir2,pre_oper_comp3
( the best relevant attribute first)
Result of WSBG
Rule induction system,
C4.5
Result
the least relevant attribute first
Result of WSBG (2)
(class,direct death)
peritoneal_meta,liver_meta,eval_of_treat,lymphnode_diss,
reconstruction,chemotherapypos,structural_atyp,typeofcancer,
pre_oper_comp1,maximal_diam,location_lon2,combined_rese3,
othertherapypos,post_oper_com3,stromal_type,cellular_atyp,
resection_meth,location_cir3,multilesions,location_cir4,
proximal_surg,location_cir1,sex,lymphatic_inva,location_lon4,
location_lon1,location_cir2,distal_surg,post_oper_com2,
location_lon3,vascular_inva,combined_rese2,age,pre_oper_comp2,
ln_metastasis,serosal_inva,depth,growth_pattern,combined_rese1,
chemotherapyin,spec_histologi,post_oper_com1,chemotherapypre,
pre_oper_comp3,histological,othertherapypre
Result of LVI
(gastric cancer data)
Number of allowed
inconsistent instances
Executing
times
Number of
inconsistent instances
Number of
selected
attributes
80
20
1
2
3
4
5
1
2
3
4
5
79
68
49
61
66
7
19
19
20
18
19
16
20
18
20
49
26
28
23
26
Some Rules
Related to Direct Death
peritoneal_meta(2) ∧ pre_oper_comp1(.) ∧ post_oper_com1(L) ∧
chemotherapypos(.) S= 3*(7200/E)
location_lon1(M) ∧ post_oper_com1(L) ∧ ln_metastasis(3) ∧
chemotherapypos(.) S= 3*(2880/E)
sex(F) ∧ location_cir2(.) ∧ post_oper_com1(L) ∧ growth_pattern(2) ∧
chemotherapypos(.) S= 3*(7200/E)
location_cir1(L) ∧ location_cir2(.) ∧ post_oper_com1(L) ∧
ln_metastasis(2) ∧ chemotherapypos(.) S= 3*(25920/E)
pre_oper_comp1(.) ∧ post_oper_com1(L) ∧ histological(MUC) ∧
growth_pattern(3) ∧ chemotherapypos(.) S= 3*(64800/E)
sex(M) ∧ location_lon1(M) ∧ reconstruction(B2) ∧ pre_oper_comp1(.)
∧ structural_atyp(3) ∧ lymphatic_inva(3) ∧ vascular_inva(0) ∧
ln_metastasis(2) S=3*(345600/E)
sex(F) ∧ location_lon2(M) ∧ location_cir2(.) ∧ pre_oper_comp1(A) ∧
depth(S2) ∧ chemotherapypos(.) S= 3*(46080/E)
GDT-RS vs,Discriminant
Analysis
GDT-RS:
– if-then rules
– multi-class, high-dimension, large-scale
data can be processed
– BK can be used easily
– the stability and uncertainty of a rule
can be expressed explicitly
– continuous data must be discretized
Discriminant analysis:
– algebraic expressions
– difficult to deal with
data with multi-class
– difficult to use BK
– the stability and uncertainty of a rule
cannot be explained clearly
– symbolic data must be quantized
GDT-RS vs,ID3 (C4.5)
GDT-RS:
– BK can be used easily
– the stability and uncertainty of a rule
can be expressed explicitly
– unseen instances are considered
– the minimal set of rules containing all
instances can be discovered
ID3 (C4.5):
– difficult to use BK
– the stability and uncertainty of a rule
cannot be explained clearly
– unseen instances are not considered
– does not consider whether the discovered
rules are the minimal set covering all
instances
Rough Sets in ILP and GrC
-- An Advanced Topic --
Background and goal
The normal problem setting for ILP
Issues,observations,and solutions
Rough problem settings
Future work on RS (GrC) in ILP
ILP: Inductive Logic Programming
GrC: Granular Computing
Advantages of ILP
(Compared with Attribute-Value Learning)
It can learn knowledge which is more
expressive because it is in predicate logic
It can utilize background knowledge more
naturally and effectively because in ILP the
examples,the background knowledge,as
well as the learned knowledge are all
expressed within the same logic framework,
Weak Points of ILP
(Compared with Attribute-Value Learning)
It is more difficult to handle numbers
(especially continuous values) prevailing in
real-world databases.
The theory and techniques of ILP are much less
mature for dealing with imperfect data
(uncertainty, incompleteness, vagueness,
impreciseness, etc. in examples, background
knowledge, as well as in the learned rules).
Goal
Applying Granular Computing (GrC), and a
special form of GrC, Rough Sets, to ILP to
deal with some kinds of imperfect data
which occur in large real-world applications.
Normal Problem Setting for ILP
Given:
– The target predicate p
– The positive examples E+ and the negative
examples E- (two sets of ground atoms of p)
– Background knowledge B (a finite set of
definite clauses)
Normal Problem Setting for ILP (2)
To find:
– Hypothesis H (the defining clauses of p) which is
correct with respect to E+ and E-, i.e.
1. H ∪ B is complete with respect to E+
(i.e. for all e ∈ E+, H ∪ B ⊨ e).
We also say that H ∪ B covers all positive examples.
2. H ∪ B is consistent with respect to E-
(i.e. for no e ∈ E-, H ∪ B ⊨ e).
We also say that H ∪ B rejects any negative examples.
Normal Problem Setting for ILP (3)
Prior conditions:
1'. B is not complete with respect to E+
(otherwise there will be no learning task at all).
2'. B ∪ E+ is consistent with respect to E-
(otherwise there will be no solution).
Everything is assumed correct and perfect.
Issues
In large,real-world empirical learning,
uncertainty,incompleteness,vagueness,
impreciseness,etc,are frequently observed
in training examples,in background
knowledge,as well as in the induced
hypothesis.
Too strong bias may miss some useful
solutions or have no solution at all.
Imperfect Data in ILP
Imperfect output
– Even if the input (examples and BK) is "perfect",
there are usually several Hs that can be induced.
– If the input is imperfect, we have imperfect
hypotheses.
Noisy data
– Erroneous argument values in examples.
– Erroneous classification of examples as
belonging to E+ or E-.
Imperfect Data in ILP (2)
Too sparse data
– The training examples are too sparse to induce
reliable H.
Missing data
– Missing values,some arguments of some
examples have unknown values.
– Missing predicates,BK lacks essential predicates
(or essential clauses of some predicates) so that
no non-trivial H can be induced.
Imperfect Data in ILP (3)
Indiscernible data
– Some examples belong to both E+ and E-.
This presentation will focus on
(1) Missing predicates
(2) Indiscernible data
Observations
The requirement that H should be "correct with respect to
E+ and E-" needs to be relaxed; otherwise
there will be no (meaningful) solutions to
the ILP problem.
While it is impossible to differentiate
distinct objects, we may consider granules:
sets of objects drawn together by similarity,
indistinguishability, or functionality.
Observations (2)
Even when precise solutions in terms of
individual objects can be obtained, we may still
prefer granules in order to have an efficient
and practical solution.
When we use granules instead of individual
objects, we are actually relaxing the strict
requirements in the standard normal problem
setting for ILP, so that rough but useful
hypotheses can be induced from imperfect data.
Solution
Granular Computing (GrC) can play an
important role in dealing with imperfect data
and/or too strong a bias in ILP.
GrC is a superset of various theories (such as
rough sets, fuzzy sets, interval computation)
used to handle incompleteness, uncertainty,
vagueness, etc. in information systems
(Zadeh, 1997).
Why GrC?
A Practical Point of View
With incomplete,uncertain,or vague
information,it may be difficult to
differentiate some elements and one is forced
to consider granules.
It may be sufficient to use granules in order
to have an efficient and practical solution.
The acquisition of precise information is too
costly,and coarse-grained information
reduces cost,
Solution (2)
Granular Computing (GrC) may be regarded as
a label of theories,methodologies,techniques,
and tools that make use of granules,i.e.,groups,
classes,or clusters of a universe,in the process
of problem solving,
We use a special form of GrC,rough sets to
provide a,rough” solution,
Rough Sets
Approximation space A = (U,R)
U is a set (called the universe)
R is an equivalence relation on U (called an
indiscernibility relation).
In fact,U is partitioned by R into
equivalence classes,elements within an
equivalence class are indistinguishable in A,
Rough Sets (2)
Lower and upper approximations: for an
equivalence relation R, the lower and upper
approximations of X ⊆ U are defined by
A_*(X) = { x ∈ U | [x]_R ⊆ X },
A^*(X) = { x ∈ U | [x]_R ∩ X ≠ ∅ },
where [x]_R denotes the equivalence class containing x.
Rough Sets (3)
Boundary:
Bnd_A(X) = A^*(X) - A_*(X)
is called the boundary of X in A.
Rough membership:
– element x surely belongs to X in A if x ∈ A_*(X);
– element x possibly belongs to X in A if x ∈ A^*(X);
– element x surely does not belong to X in A if x ∉ A^*(X).
An Illustrating Example
The target predicate:
customer(Name,Age,Sex,Income)
The negative examples
customer(c,50,female,2).
customer(g,20,male,2).
The positive examples
customer(a,30,female,1).
customer(b,53,female,100).
customer(d,50,female,2).
customer(e,32,male,10).
customer(f,55,male,10).
Background knowledge B defining married_to(H,W)
by married_to(e,a),married_to(f,d).
Given:
An Illustrating Example (2)
To find:
Hypothesis H (customer/4) which is correct with
respect to E+ and E-.
The normal problem setting is perfectly suitable for this
problem, and an ILP system can induce the following
hypothesis H defining customer/4:
customer(N,A,S,I) :- I >= 10.
customer(N,A,S,I) :- married_to(N',N),
customer(N',A',S',I').
Rough Problem Setting for
Insufficient BK
Problem,If married_to/2 is missing in BK,
no hypothesis will be induced.
Solution,Rough Problem Setting 1.
Given:
– The target predicate p
(the set of all ground atoms of p is U).
– An equivalence relation R on U
(we have the approximation space A = (U, R)).
– E+ ⊆ U and E- ⊆ U satisfying the prior condition:
B ∪ E+ is consistent with respect to E-.
– BK: B (may lack essential predicates/clauses).
Rough Problem Setting for
Insufficient BK (2)
Considering the following rough sets:
– E+_upper = E+ ∪ {e' ∈ E- | e' R e for some e ∈ E+},
containing all positive examples and those
negative examples indiscernible from a positive example;
– E-_pure = E- - {e' ∈ E- | e' R e for some e ∈ E+},
containing the "pure" (remaining) negative examples;
– E+_pure = E+ - {e ∈ E+ | e R e' for some e' ∈ E-},
containing the "pure" positive examples.
Rough Problem Setting for
Insufficient BK (3)
– E-_upper = E- ∪ {e ∈ E+ | e R e' for some e' ∈ E-},
containing all negative examples and the
"non-pure" positive examples.
To find:
Hypothesis H_upper (the defining clauses of p)
which is correct with respect to E+_upper and E-_pure,
i.e.
1. H_upper ∪ B covers all examples of E+_upper,
2. H_upper ∪ B rejects any examples of E-_pure.
Rough Problem Setting for
Insufficient BK (4)
Hypothesis H_lower (the defining clauses of p)
which is correct with respect to E+_pure and E-_upper,
i.e.
1. H_lower ∪ B covers all examples of E+_pure,
2. H_lower ∪ B rejects any examples of E-_upper.
Example Revisited
Married_to/2 is missing in B. Let R be defined as
"customer(N,A,S,I) R customer(N',A,S,I)". With
Rough Problem Setting 1 we may induce H_upper as:
customer(N,A,S,I) :- I >= 10.
customer(N,A,S,I) :- S = female.
which covers all positive examples and the negative
example customer(c,50,female,2), rejecting the other
negative examples.
Example Revisited (2)
We may also induce H_lower as:
customer(N,A,S,I) :- I >= 10.
customer(N,A,S,I) :- S = female, A < 50.
which covers all positive examples except
"customer(d,50,female,2)",
rejecting all negative examples.
Example Revisited (3)
These hypotheses are rough (because the
problem itself is rough), but still useful.
On the other hand, if we insist on the normal
problem setting for ILP, these hypotheses are
not considered as "solutions".
Rough Problem Setting for
Indiscernible Examples
Problem: consider customer(Age,Sex,Income);
we have customer(50,female,2) belonging to
E+ as well as to E-.
Solution: Rough Problem Setting 2.
Given:
– The target predicate p (the set of all ground atoms of
p is U).
– E+ ⊆ U and E- ⊆ U, where E+ ∩ E- ≠ ∅.
– Background knowledge B.
Rough Problem Setting for
Indiscernible Examples (2)
Rough sets to consider and the hypotheses
to find:
Taking the identity relation I as a special
equivalence relation R,the remaining
description of Rough Problem Setting 2 is
the same as in Rough Problem Setting 1,
Rough Sets (GrC) for Other
Imperfect Data in ILP
Noisy data
Too sparse data
Missing values
Discretization of continuous values
Future Work on RS (GrC) in ILP
Trying to find more concrete formalisms and
methods to deal with noisy data,too sparse
data,missing values,etc,in ILP within the
framework of RS (GrC).
Giving quantitative measures associated with
hypotheses induced of ILP,either in its normal
problem setting or in the new rough setting.
Developing ILP algorithms and systems based
on rough problem settings,
Summary
Rough sets offers mathematical tools and
constitutes a sound basis for KDD.
We introduced the basic concepts of
(classical) rough set theory,
We described a rough set based KDD process.
We discussed the problem of imperfect data
handling in ILP using some ideas,concepts
and methods of GrC (or a particular form of
GrC,Rough Sets).
Advanced Topics
(to deal with real world problems)
Recent extensions of rough set theory
(rough mereology,approximate synthesis
of objects) have developed new methods for
decomposition of large datasets,data
mining in distributed and multi-agent
systems,and fusion of information granules
to induce complex information granule
approximation.
Advanced Topics (2)
(to deal with real world problems)
Combining rough set theory with logic
(including non-classical logic),ANN,GA,
probabilistic and statistical reasoning,fuzzy
set theory to construct a hybrid approach,
References and Further Readings
Z. Pawlak, “Rough Sets”, International Journal of Computer
and Information Sciences, Vol. 11, 341-356 (1982).
Z. Pawlak, Rough Sets - Theoretical Aspects of Reasoning about
Data, Kluwer Academic Publishers (1991).
L. Polkowski and A. Skowron (eds.) Rough Sets in Knowledge
Discovery, Vol. 1 and Vol. 2, Studies in Fuzziness and Soft
Computing series, Physica-Verlag (1998).
L. Polkowski and A. Skowron (eds.) Rough Sets and Current
Trends in Computing, LNAI 1424, Springer (1998).
T.Y. Lin and N. Cercone (eds.), Rough Sets and Data Mining,
Kluwer Academic Publishers (1997).
K. Cios, W. Pedrycz, and R. Swiniarski, Data Mining Methods
for Knowledge Discovery, Kluwer Academic Publishers (1998).
References and Further Readings
R. Slowinski, Intelligent Decision Support: Handbook of
Applications and Advances of the Rough Sets Theory, Kluwer
Academic Publishers (1992).
S.K. Pal and A. Skowron (eds.) Rough Fuzzy Hybridization: A
New Trend in Decision-Making, Springer (1999).
E. Orlowska (ed.) Incomplete Information: Rough Set Analysis,
Physica-Verlag (1997).
S. Tsumoto, et al. (eds.) Proceedings of the 4th International
Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery,
The University of Tokyo (1996).
J. Komorowski and S. Tsumoto (eds.) Rough Set Data Analysis
in Bio-medicine and Public Health, Physica-Verlag (to appear).
References and Further Readings
W. Ziarko, “Discovery through Rough Set Theory”, Knowledge
Discovery: Viewing Wisdom from All Perspectives,
Communications of the ACM, Vol. 42, No. 11 (1999).
W. Ziarko (ed.), Rough Sets, Fuzzy Sets, and Knowledge
Discovery, Springer (1993).
J. Grzymala-Busse, Z. Pawlak, R. Slowinski, and W. Ziarko,
“Rough Sets”, Communications of the ACM, Vol. 38, No. 11
(1995).
Y.Y. Yao, “A Comparative Study of Fuzzy Sets and Rough
Sets”, Information Sciences, Vol. 109, 21-47 (1998).
Y.Y. Yao, “Granular Computing: Basic Issues and Possible
Solutions”, Proceedings of JCIS 2000, Invited Session on
Granular Computing and Data Mining, Vol. 1, 186-189 (2000).
References and Further Readings
N. Zhong, A. Skowron, and S. Ohsuga (eds.), New Directions in
Rough Sets, Data Mining, and Granular-Soft Computing, LNAI
1711, Springer (1999).
A. Skowron and C. Rauszer, “The Discernibility Matrices and
Functions in Information Systems”, in R. Slowinski (ed.),
Intelligent Decision Support - Handbook of Applications and
Advances of the Rough Sets Theory, 331-362, Kluwer (1992).
A. Skowron and L. Polkowski, “Rough Mereological Foundations
for Design, Analysis, Synthesis, and Control in Distributed
Systems”, Information Sciences, Vol. 104, No. 1-2, 129-156,
North-Holland (1998).
C. Liu and N. Zhong, “Rough Problem Settings for Inductive
Logic Programming”, in N. Zhong, A. Skowron, and S. Ohsuga
(eds.), New Directions in Rough Sets, Data Mining, and
Granular-Soft Computing, LNAI 1711, 168-177, Springer (1999).
References and Further Readings
J.Z. Dong, N. Zhong, and S. Ohsuga, “Rule Discovery by
Probabilistic Rough Induction”, Journal of Japanese Society for
Artificial Intelligence, Vol. 15, No. 2, 276-286 (2000).
N. Zhong, J.Z. Dong, and S. Ohsuga, “GDT-RS: A Probabilistic
Rough Induction System”, Bulletin of International Rough Set
Society, Vol. 3, No. 4, 133-146 (1999).
N. Zhong, J.Z. Dong, and S. Ohsuga, “Using Rough Sets with
Heuristics for Feature Selection”, Journal of Intelligent
Information Systems (to appear).
N. Zhong, J.Z. Dong, and S. Ohsuga, “Soft Techniques for Rule
Discovery in Data”, NEUROCOMPUTING, An International
Journal, Special Issue on Rough-Neuro Computing (to appear).
References and Further Readings
H.S. Nguyen and S.H. Nguyen, “Discretization Methods in Data
Mining”, in L. Polkowski and A. Skowron (eds.), Rough Sets in
Knowledge Discovery, Vol. 1, 451-482, Physica-Verlag (1998).
T.Y. Lin (ed.), Journal of Intelligent Automation and Soft
Computing, Vol. 2, No. 2, Special Issue on Rough Sets (1996).
T.Y. Lin (ed.), International Journal of Approximate Reasoning,
Vol. 15, No. 4, Special Issue on Rough Sets (1996).
W. Ziarko (ed.), Computational Intelligence, An International
Journal, Vol. 11, No. 2, Special Issue on Rough Sets (1995).
W. Ziarko (ed.), Fundamenta Informaticae, An International
Journal, Vol. 27, No. 2-3, Special Issue on Rough Sets (1996).
References and Further Readings
A. Skowron et al. (eds.), NEUROCOMPUTING, An
International Journal, Special Issue on Rough-Neuro
Computing (to appear).
A. Skowron, N. Zhong, and N. Cercone (eds.), Computational
Intelligence, An International Journal, Special Issue on
Rough Sets, Data Mining, and Granular Computing
(to appear).
J. Grzymala-Busse, R. Swiniarski, N. Zhong, and W. Ziarko
(eds.), International Journal of Applied Mathematics and
Computer Science, Special Issue on Rough Sets and Its
Applications (to appear).
Related Conferences and Web Pages
RSCTC’2000 will be held on October 16-19 in
Banff, Canada
http://www.cs.uregina.ca/~yyao/RSCTC200/
International Rough Set Society
http://www.cs.uregina.ca/~yyao/irss/bulletin.html
BISC/SIG-GrC
http://www.cs.uregina.ca/~yyao/GrC/
Thank You!
Example Revisited
married_to/2 is missing in B. Let R be defined as
"customer(N,A,S,I) R customer(N',A,S,I)". With
Rough Problem Setting 1, we may induce H as:
customer(N,A,S,I) :- I >= 10.
customer(N,A,S,I) :- S = female.
which covers all positive examples and the negative
example "customer(c,50,female,2)", rejecting the other
negative examples.
Example Revisited (2)
We may also induce H as:
customer(N,A,S,I) :- I >= 10.
customer(N,A,S,I) :- S = female, A < 50.
which covers all positive examples except
"customer(d,50,female,2)",
rejecting all negative examples.
Example Revisited (3)
These hypotheses are rough (because the
problem itself is rough), but still useful.
On the other hand, if we insist on the normal
problem setting for ILP, these hypotheses are
not considered "solutions".
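The coverage claims above can be checked mechanically. A minimal Python sketch (not the ILP system itself; the two functions simply encode the induced clauses as this note's own illustration):

pos = [("a", 30, "female", 1), ("b", 53, "female", 100), ("d", 50, "female", 2),
       ("e", 32, "male", 10), ("f", 55, "male", 10)]
neg = [("c", 50, "female", 2), ("g", 20, "male", 2)]

def covers_h1(e):                      # first induced hypothesis H
    _, age, sex, income = e
    return income >= 10 or sex == "female"

def covers_h2(e):                      # second induced hypothesis H
    _, age, sex, income = e
    return income >= 10 or (sex == "female" and age < 50)

print([e for e in pos if not covers_h1(e)])  # []            -- covers every positive
print([e for e in neg if covers_h1(e)])      # [('c', ...)]  -- but also one negative
print([e for e in pos if not covers_h2(e)])  # [('d', ...)]  -- misses one positive
print([e for e in neg if covers_h2(e)])      # []            -- rejects all negatives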
Rough Problem Setting for
Indiscernible Examples
Problem: Consider customer(Age,Sex,Income);
we then have customer(50,female,2) belonging to
$E^+$ as well as to $E^-$.
Solution: Rough Problem Setting 2.
Given:
– The target predicate p (the set of all ground atoms of
p is U).
– $E^+ \subseteq U$ and $E^- \subseteq U$, where $E^+ \cap E^- \neq \emptyset$.
– Background knowledge B.
Rough Problem Setting for
Indiscernible Examples (2)
Rough sets to consider and the hypotheses
to find:
Taking the identity relation I as a special
equivalence relation R, the remaining
description of Rough Problem Setting 2 is
the same as in Rough Problem Setting 1.
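A minimal Python sketch (data copied from the earlier example; the projection itself is this note's own illustration) showing why the identity relation suffices once the Name argument is dropped, since one ground atom then occurs in both example sets:

pos = {(30, "female", 1), (53, "female", 100), (50, "female", 2),
       (32, "male", 10), (55, "male", 10)}
neg = {(50, "female", 2), (20, "male", 2)}
print(pos & neg)   # {(50, 'female', 2)} -- the indiscernible example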
Rough Sets (GrC) for Other
Imperfect Data in ILP
Noisy data
Too sparse data
Missing values
Discretization of continuous values
Future Work on RS (GrC) in ILP
Trying to find more concrete formalisms and
methods to deal with noisy data, too sparse
data, missing values, etc., in ILP within the
framework of RS (GrC).
Giving quantitative measures associated with
hypotheses induced by ILP, either in its normal
problem setting or in the new rough settings.
Developing ILP algorithms and systems based
on rough problem settings.
Summary
Rough set theory offers mathematical tools and
constitutes a sound basis for KDD.
We introduced the basic concepts of
(classical) rough set theory.
We described a rough set based KDD process.
We discussed the problem of imperfect data
handling in ILP using some ideas, concepts
and methods of GrC (or a particular form of
GrC, rough sets).
Advanced Topics
(to deal with real world problems)
Recent extensions of rough set theory
(rough mereology, approximate synthesis
of objects) have developed new methods for
decomposition of large datasets, data
mining in distributed and multi-agent
systems, and fusion of information granules
to induce approximations of complex
information granules.
Advanced Topics (2)
(to deal with real world problems)
Combining rough set theory with logic
(including non-classical logic), ANN, GA,
probabilistic and statistical reasoning, and fuzzy
set theory to construct hybrid approaches.
References and Further Readings
Z. Pawlak, "Rough Sets", International Journal of Computer
and Information Sciences, Vol. 11, 341-356 (1982).
Z. Pawlak, Rough Sets - Theoretical Aspects of Reasoning about
Data, Kluwer Academic Publishers (1991).
L. Polkowski and A. Skowron (eds.), Rough Sets in Knowledge
Discovery, Vol. 1 and Vol. 2, Studies in Fuzziness and Soft
Computing series, Physica-Verlag (1998).
L. Polkowski and A. Skowron (eds.), Rough Sets and Current
Trends in Computing, LNAI 1424, Springer (1998).
T.Y. Lin and N. Cercone (eds.), Rough Sets and Data Mining,
Kluwer Academic Publishers (1997).
K. Cios, W. Pedrycz, and R. Swiniarski, Data Mining Methods
for Knowledge Discovery, Kluwer Academic Publishers (1998).
References and Further Readings
R. Slowinski (ed.), Intelligent Decision Support - Handbook of
Applications and Advances of the Rough Sets Theory, Kluwer
Academic Publishers (1992).
S.K. Pal and A. Skowron (eds.), Rough Fuzzy Hybridization: A
New Trend in Decision-Making, Springer (1999).
E. Orlowska (ed.), Incomplete Information: Rough Set Analysis,
Physica-Verlag (1997).
S. Tsumoto et al. (eds.), Proceedings of the 4th International
Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery,
The University of Tokyo (1996).
J. Komorowski and S. Tsumoto (eds.), Rough Set Data Analysis
in Bio-medicine and Public Health, Physica-Verlag (to appear).
References and Further Readings
W. Ziarko, "Discovery through Rough Set Theory", in Knowledge
Discovery: Viewing Wisdom from All Perspectives,
Communications of the ACM, Vol. 42, No. 11 (1999).
W. Ziarko (ed.), Rough Sets, Fuzzy Sets and Knowledge
Discovery, Springer (1993).
J. Grzymala-Busse, Z. Pawlak, R. Slowinski, and W. Ziarko,
"Rough Sets", Communications of the ACM, Vol. 38, No. 11
(1995).
Y.Y. Yao, "A Comparative Study of Fuzzy Sets and Rough
Sets", Information Sciences, Vol. 109, 21-47 (1998).
Y.Y. Yao, "Granular Computing: Basic Issues and Possible
Solutions", Proceedings of JCIS 2000, Invited Session on
Granular Computing and Data Mining, Vol. 1, 186-189 (2000).
References and Further Readings
N. Zhong, A. Skowron, and S. Ohsuga (eds.), New Directions in
Rough Sets, Data Mining, and Granular-Soft Computing, LNAI
1711, Springer (1999).
A. Skowron and C. Rauszer, "The Discernibility Matrices and
Functions in Information Systems", in R. Slowinski (ed.),
Intelligent Decision Support - Handbook of Applications and
Advances of the Rough Sets Theory, 331-362, Kluwer (1992).
A. Skowron and L. Polkowski, "Rough Mereological Foundations
for Design, Analysis, Synthesis, and Control in Distributive
Systems", Information Sciences, Vol. 104, No. 1-2, 129-156,
North-Holland (1998).
C. Liu and N. Zhong, "Rough Problem Settings for Inductive
Logic Programming", in N. Zhong, A. Skowron, and S. Ohsuga
(eds.), New Directions in Rough Sets, Data Mining, and
Granular-Soft Computing, LNAI 1711, 168-177, Springer (1999).
References and Further Readings
J.Z. Dong, N. Zhong, and S. Ohsuga, "Rule Discovery by
Probabilistic Rough Induction", Journal of Japanese Society for
Artificial Intelligence, Vol. 15, No. 2, 276-286 (2000).
N. Zhong, J.Z. Dong, and S. Ohsuga, "GDT-RS: A Probabilistic
Rough Induction System", Bulletin of International Rough Set
Society, Vol. 3, No. 4, 133-146 (1999).
N. Zhong, J.Z. Dong, and S. Ohsuga, "Using Rough Sets with
Heuristics for Feature Selection", Journal of Intelligent
Information Systems (to appear).
N. Zhong, J.Z. Dong, and S. Ohsuga, "Soft Techniques for Rule
Discovery in Data", Neurocomputing, An International
Journal, Special Issue on Rough-Neuro Computing (to appear).
References and Further Readings
H.S. Nguyen and S.H. Nguyen, "Discretization Methods in Data
Mining", in L. Polkowski and A. Skowron (eds.), Rough Sets in
Knowledge Discovery, Vol. 1, 451-482, Physica-Verlag (1998).
T.Y. Lin (ed.), Journal of Intelligent Automation and Soft
Computing, Vol. 2, No. 2, Special Issue on Rough Sets (1996).
T.Y. Lin (ed.), International Journal of Approximate Reasoning,
Vol. 15, No. 4, Special Issue on Rough Sets (1996).
W. Ziarko (ed.), Computational Intelligence, An International
Journal, Vol. 11, No. 2, Special Issue on Rough Sets (1995).
W. Ziarko (ed.), Fundamenta Informaticae, An International
Journal, Vol. 27, No. 2-3, Special Issue on Rough Sets (1996).
References and Further Readings
A. Skowron et al. (eds.), Neurocomputing, An
International Journal, Special Issue on Rough-Neuro
Computing (to appear).
A. Skowron, N. Zhong, and N. Cercone (eds.), Computational
Intelligence, An International Journal, Special Issue on
Rough Sets, Data Mining, and Granular Computing
(to appear).
J. Grzymala-Busse, R. Swiniarski, N. Zhong, and W. Ziarko
(eds.), International Journal of Applied Mathematics and
Computer Science, Special Issue on Rough Sets and Its
Applications (to appear).
Related Conference and Web
Pages
RSCTC'2000 will be held on October 16-19, 2000 in
Banff, Canada
http://www.cs.uregina.ca/~yyao/RSCTC200/
International Rough Set Society
http://www.cs.uregina.ca/~yyao/irss/bulletin.html
BISC/SIG-GrC
http://www.cs.uregina.ca/~yyao/GrC/
Thank You!