mohammad el-hajj

36
M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta ACM SIGKDD Aug. 2003 – Washington, DC Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining Mohammad El-Hajj Osmar R. Zaïane KDD 2003 Department of Computing Science University of Alberta, Canada

Upload: meryle

Post on 19-Jan-2016

55 views

Category:

Documents


0 download

DESCRIPTION

Inverted Matrix : Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. KDD 2003. Mohammad El-Hajj. Osmar R. Zaïane. Department of Computing Science University of Alberta, Canada. Introduction Pre-processing Mining Phase Experiments Conclusion. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Inverted Matrix: Efficient Discovery of Frequent Items

in Large Datasets in the Context of Interactive Mining

Mohammad El-Hajj Osmar R. Zaïane

KDD 2003

Department of Computing ScienceUniversity of Alberta, Canada

Page 2: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Outline

• Introduction

• Pre-Processing Phase

• Mining Phase

Transactional Layouts

Building COFI-trees

Mining COFI-trees• Experimental Studies

• Conclusion and Future work

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 3: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Association rule mining is crucial in many applications and plays an essential role in many important mining tasks.

Association Rule Mining

Frequent Itemset Mining Association Rules Generation

1 2FIM

IntroductionPre-processingMining PhaseExperimentsConclusion

Antecedent Consequent Body Head

Page 4: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Expensive candidacy generation step

OR

Huge Memory based Data structures

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 5: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items T#

T1 A G D C B

T2 B C G E D

T3 B D E A G

T4 C E F A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G E F G B

T9 A F G G G

T10 C F G G G

T11 A D B G G

T12 D E B G G

T13 G D C G G

T14 C F G G G

T15 B D E F G

T16 G E B A D

T17 A G E F C

T18 C D G B A

ItemsT#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Support > 4

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C, D, E, F}Non frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}

Page 6: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Support > 9

T#

T1 A G G C B

T2 B C G G G

T3 B G G A G

T4 C G G A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G G G G B

T9 A G G G G

T10 C G G G G

T11 A G B G G

T12 G G B G G

T13 G G C G G

T14 C G G G G

T15 B G G G G

T16 G G B A G

T17 A G G G C

T18 C G G B A

ItemsT#

T1 A C B

T2 B C

T3 B A

T4 C A

T5 A B

T6 A C

T7 A C

T8 B

T9 A

T10 C

T11 A B

T12 B

T13 C

T14 C

T15 B

T16 B A

T17 A C

T18 C B A

Items

Challenges for FIM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C}Non frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}

Page 7: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Changing the support level means expensive steps (whole process is redone)

Knowledge

Data warehouse

Databases

Selection and Transformation

Data Mining

Evaluation and

Presentation

Patterns

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 8: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

• New association Rule mining algorithm that has the following features

1. Low Memory Dependency 2. Remove Superfluous Processing3. Interactive Mining Ready

MotivationIntroductionPre-processingMining PhaseExperimentsConclusion

Without compromising scalability

Page 9: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Horizontal Layout

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

Candidacy generation can be removed (FP-Growth)

Superfluous Processing

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 10: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items Items

A 1 3 4 5 6 7 9 11 16 17 18B 1 2 3 5 8 11 12 15 16 18C 1 2 4 6 7 10 13 14 17 18D 1 2 3 11 12 13 15 16 18E 2 3 4 8 12 15 16 17F 4 8 9 10 14 15 17G 1 6 7 13H 2 7 11I 7 11 15J 10 14 16K 8 12 17L 8 12 18M 3 9 13N 4 5 9O 5 9 13P 5 9 13Q 6 14R 6 10

Transaction ID

Candidacy generation is required

Minimize Superfluous Processing

Transactional Layouts

• Vertical Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 11: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Suggested Layout

• Inverted Matrix Layout: Combines the horizontal and vertical layouts

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

2 I/O passes

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 12: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 413 F 714 E 815 D 916 C 1017 B 1018 A 11

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix LayoutPass 1, generates sorted item list (based on frequency)

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 13: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 4 (15,1)13 F 714 E 815 D 9 (16,1)16 C 10 (17,1)17 B 10 (18,1)18 A 11 (¤, ¤)

Loc IndexTransactional Array

Pass 2, Generate the transactional array of the IM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 14: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 3 (14,1)12 G 4 (15,1)13 F 714 E 8 (15,2)15 D 9 (16,1) (16,2)16 C 10 (17,1) (17,2)17 B 10 (18,1) (¤, ¤)18 A 11 (¤, ¤)

Loc IndexTransactional Array

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 15: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Inverted Matrix Layout

There is no minimum support involved in building the Inverted Matrix.

IntroductionPre-processingMining PhaseExperimentsConclusion

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)

2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (18,10) (17,10)17 B 10 (18,1) (18,2) (18,4) (18,8) (18,9) (18,11)18 A 11

LocTransactional Array

Index

(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)

(¤, ¤) (¤, ¤) (¤, ¤)

Page 16: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

Support > 4

Border Support

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 17: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 18: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

T#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 19: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Sub transactions generated from IM

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item FE D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item ED C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

C B A

C B

C A

C A

C A

C A

C B A

Item C

B A

B A

B A

B A

B A

B A

Item B

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent sub-transaction with item F

Frequent sub-transaction with item D

Frequent sub-transaction with item E

Frequent sub-transaction with item B

Frequent sub-transaction with item C

Page 20: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Co-Occurrences Frequent Item tree

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 1 0 )

E 1 E ( 1 0 )D 0

C 1

B 0 C ( 1 0 )A 1

A ( 1 0 )

Building F-COFI-tree

Frequency Count

Participation Count

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 21: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 2 0 )

E 2 E ( 2 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 1

A ( 1 0 )

Co-Occurrences

Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 22: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 3 0 )

E 2 E ( 2 0 ) A ( 1 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 23: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 4 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 1 0 )D 0

C 2

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 24: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 5 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 2 0 )D 0

C 3

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 25: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 7 0 )

E 3 E ( 3 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 3

B 2 C ( 1 0 ) B ( 1 0 ) D ( 1 0 )A 2

A ( 1 0 ) B ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 26: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F ( 7 0 )

E 4 E ( 4 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 4

B 2 C ( 2 0 ) B ( 1 0 ) D ( 1 0 )A 3

A ( 2 0 ) B ( 1 0 )

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 27: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item E

D C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C B A

C B

C A

C A

C A

C A

C B A

Item CC ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

B A

B A

B A

B A

B A

B A

Item B

Co-Occurrences Frequent Item tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 28: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 29: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

 

E ( 8 0 ) E ( 8 1 )

E D B 1

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 ) D ( 5 1 ) E D 1

C 3 E B 1

B 6 E D B 1

A 3 C ( 1 0 ) B ( 4 0 ) A ( 2 0 ) C ( 1 1 )

B ( 1 0 ) A ( 2 0 ) B ( 1 1 )

E-COFI-treeSupport = Frequency count – Participation count

Mining COFI-treesIntroductionPre-processingMining PhaseExperimentsConclusion

Page 30: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 1 ) E ( 8 5 ) E D B 4

E D 5

D 5 D ( 5 1 ) C ( 2 0 ) B ( 1 0 ) D ( 5 5 ) E B 5

C 3 E D B 5

B 6

A 3 C ( 1 1 ) B ( 4 0 ) A ( 2 0 ) B ( 4 4 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 31: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E B 1

E ( 8 5 ) E ( 8 6 )

E D 5

E B 6

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 0 ) B ( 1 1 ) E D B 5

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 32: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 6 ) E ( 8 6 ) No change

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 1 ) D ( 5 5 )

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 ) E D 5

E B 6

E D B 5

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 33: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

Mining COFI-trees

D-COFI-tree

DBA:5

DA:5

DB:8

C-COFI-tree

CA:6

B-COFI-tree

BA:6

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 34: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

0

2000

4000

6000

8000

10000

12000

0.0025 0.005 0.0075 0.01

Support %

Tim

e in

Sec

on

ds

Apriori

FP-Growth

Inverted Matrix

Experimental Studies

Time needed to mine 1M transactions with different support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

Page 35: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Experimental Studies

Accumulated time needed to mine 1M transactions using 4 different

support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

Time needed in seconds to mine different transaction sizes

Page 36: Mohammad El-Hajj

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Conclusion and Future work

Updateable Inverted Matrix for native storage of transactions

Compressing the size of Inverted Matrix

IntroductionPre-processingMining PhaseExperimentsConclusion

Future work

New AR algorithm1. Low memory dependency2. No Superfluous processing3. Interactive mining ready4. scalable

Parallelizing the mining process as well as the construction of the Inverted Matrix