mohammad el-hajj

Post on 19-Jan-2016

55 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Inverted Matrix : Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. KDD 2003. Mohammad El-Hajj. Osmar R. Zaïane. Department of Computing Science University of Alberta, Canada. Introduction Pre-processing Mining Phase Experiments Conclusion. - PowerPoint PPT Presentation

TRANSCRIPT

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Inverted Matrix: Efficient Discovery of Frequent Items

in Large Datasets in the Context of Interactive Mining

Mohammad El-Hajj Osmar R. Zaïane

KDD 2003

Department of Computing ScienceUniversity of Alberta, Canada

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Outline

• Introduction

• Pre-Processing Phase

• Mining Phase

Transactional Layouts

Building COFI-trees

Mining COFI-trees• Experimental Studies

• Conclusion and Future work

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Association rule mining is crucial in many applications and plays an essential role in many important mining tasks.

Association Rule Mining

Frequent Itemset Mining Association Rules Generation

1 2FIM

IntroductionPre-processingMining PhaseExperimentsConclusion

Antecedent Consequent Body Head

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Expensive candidacy generation step

OR

Huge Memory based Data structures

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items T#

T1 A G D C B

T2 B C G E D

T3 B D E A G

T4 C E F A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G E F G B

T9 A F G G G

T10 C F G G G

T11 A D B G G

T12 D E B G G

T13 G D C G G

T14 C F G G G

T15 B D E F G

T16 G E B A D

T17 A G E F C

T18 C D G B A

ItemsT#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Support > 4

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C, D, E, F}Non frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Support > 9

T#

T1 A G G C B

T2 B C G G G

T3 B G G A G

T4 C G G A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G G G G B

T9 A G G G G

T10 C G G G G

T11 A G B G G

T12 G G B G G

T13 G G C G G

T14 C G G G G

T15 B G G G G

T16 G G B A G

T17 A G G G C

T18 C G G B A

ItemsT#

T1 A C B

T2 B C

T3 B A

T4 C A

T5 A B

T6 A C

T7 A C

T8 B

T9 A

T10 C

T11 A B

T12 B

T13 C

T14 C

T15 B

T16 B A

T17 A C

T18 C B A

Items

Challenges for FIM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C}Non frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Changing the support level means expensive steps (whole process is redone)

Knowledge

Data warehouse

Databases

Selection and Transformation

Data Mining

Evaluation and

Presentation

Patterns

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

• New association Rule mining algorithm that has the following features

1. Low Memory Dependency 2. Remove Superfluous Processing3. Interactive Mining Ready

MotivationIntroductionPre-processingMining PhaseExperimentsConclusion

Without compromising scalability

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Horizontal Layout

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

Candidacy generation can be removed (FP-Growth)

Superfluous Processing

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items Items

A 1 3 4 5 6 7 9 11 16 17 18B 1 2 3 5 8 11 12 15 16 18C 1 2 4 6 7 10 13 14 17 18D 1 2 3 11 12 13 15 16 18E 2 3 4 8 12 15 16 17F 4 8 9 10 14 15 17G 1 6 7 13H 2 7 11I 7 11 15J 10 14 16K 8 12 17L 8 12 18M 3 9 13N 4 5 9O 5 9 13P 5 9 13Q 6 14R 6 10

Transaction ID

Candidacy generation is required

Minimize Superfluous Processing

Transactional Layouts

• Vertical Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Suggested Layout

• Inverted Matrix Layout: Combines the horizontal and vertical layouts

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

2 I/O passes

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 413 F 714 E 815 D 916 C 1017 B 1018 A 11

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix LayoutPass 1, generates sorted item list (based on frequency)

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 4 (15,1)13 F 714 E 815 D 9 (16,1)16 C 10 (17,1)17 B 10 (18,1)18 A 11 (¤, ¤)

Loc IndexTransactional Array

Pass 2, Generate the transactional array of the IM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 3 (14,1)12 G 4 (15,1)13 F 714 E 8 (15,2)15 D 9 (16,1) (16,2)16 C 10 (17,1) (17,2)17 B 10 (18,1) (¤, ¤)18 A 11 (¤, ¤)

Loc IndexTransactional Array

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Inverted Matrix Layout

There is no minimum support involved in building the Inverted Matrix.

IntroductionPre-processingMining PhaseExperimentsConclusion

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)

2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (18,10) (17,10)17 B 10 (18,1) (18,2) (18,4) (18,8) (18,9) (18,11)18 A 11

LocTransactional Array

Index

(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)

(¤, ¤) (¤, ¤) (¤, ¤)

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

Support > 4

Border Support

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

T#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Sub transactions generated from IM

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item FE D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item ED C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

C B A

C B

C A

C A

C A

C A

C B A

Item C

B A

B A

B A

B A

B A

B A

Item B

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent sub-transaction with item F

Frequent sub-transaction with item D

Frequent sub-transaction with item E

Frequent sub-transaction with item B

Frequent sub-transaction with item C

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Co-Occurrences Frequent Item tree

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 1 0 )

E 1 E ( 1 0 )D 0

C 1

B 0 C ( 1 0 )A 1

A ( 1 0 )

Building F-COFI-tree

Frequency Count

Participation Count

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 2 0 )

E 2 E ( 2 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 1

A ( 1 0 )

Co-Occurrences

Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 3 0 )

E 2 E ( 2 0 ) A ( 1 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 4 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 1 0 )D 0

C 2

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 5 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 2 0 )D 0

C 3

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 7 0 )

E 3 E ( 3 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 3

B 2 C ( 1 0 ) B ( 1 0 ) D ( 1 0 )A 2

A ( 1 0 ) B ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F ( 7 0 )

E 4 E ( 4 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 4

B 2 C ( 2 0 ) B ( 1 0 ) D ( 1 0 )A 3

A ( 2 0 ) B ( 1 0 )

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item E

D C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C B A

C B

C A

C A

C A

C A

C B A

Item CC ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

B A

B A

B A

B A

B A

B A

Item B

Co-Occurrences Frequent Item tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

 

E ( 8 0 ) E ( 8 1 )

E D B 1

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 ) D ( 5 1 ) E D 1

C 3 E B 1

B 6 E D B 1

A 3 C ( 1 0 ) B ( 4 0 ) A ( 2 0 ) C ( 1 1 )

B ( 1 0 ) A ( 2 0 ) B ( 1 1 )

E-COFI-treeSupport = Frequency count – Participation count

Mining COFI-treesIntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 1 ) E ( 8 5 ) E D B 4

E D 5

D 5 D ( 5 1 ) C ( 2 0 ) B ( 1 0 ) D ( 5 5 ) E B 5

C 3 E D B 5

B 6

A 3 C ( 1 1 ) B ( 4 0 ) A ( 2 0 ) B ( 4 4 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E B 1

E ( 8 5 ) E ( 8 6 )

E D 5

E B 6

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 0 ) B ( 1 1 ) E D B 5

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 6 ) E ( 8 6 ) No change

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 1 ) D ( 5 5 )

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 ) E D 5

E B 6

E D B 5

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

Mining COFI-trees

D-COFI-tree

DBA:5

DA:5

DB:8

C-COFI-tree

CA:6

B-COFI-tree

BA:6

IntroductionPre-processingMining PhaseExperimentsConclusion

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

0

2000

4000

6000

8000

10000

12000

0.0025 0.005 0.0075 0.01

Support %

Tim

e in

Sec

on

ds

Apriori

FP-Growth

Inverted Matrix

Experimental Studies

Time needed to mine 1M transactions with different support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Experimental Studies

Accumulated time needed to mine 1M transactions using 4 different

support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

Time needed in seconds to mine different transaction sizes

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Conclusion and Future work

Updateable Inverted Matrix for native storage of transactions

Compressing the size of Inverted Matrix

IntroductionPre-processingMining PhaseExperimentsConclusion

Future work

New AR algorithm1. Low memory dependency2. No Superfluous processing3. Interactive mining ready4. scalable

Parallelizing the mining process as well as the construction of the Inverted Matrix

top related