Mining Sequence Data
JERZY STEFANOWSKI
Institute of Computing Science, Poznań University of Technology
Version for TPD 2009
"Advanced Data Mining"
Outline of the presentation
1. Relationships to mining frequent items
2. Motivations for sequence databases and their analysis
3. Applications
4. Approximate queries and basic techniques
5. Classification in data streams
6. Clustering
7. Conclusions
This lecture is partly based on the following resources:
J. Han (data mining book), slides by Pinto, Pei, etc.,
and my other notes.
What Is Frequent Pattern Analysis?
Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
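To make "occurs frequently" concrete, here is a minimal sketch that counts the support of itemsets by brute-force enumeration. The transactions, the function name `frequent_itemsets`, and the parameters `min_support` and `max_size` are illustrative assumptions; this is not the algorithm from [AIS93], only a small instance of the definition.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not taken from the lecture).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "bread", "diapers"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: count} for itemsets of up to max_size items
    that occur in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            # Enumerate every k-item subset of this transaction.
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

result = frequent_itemsets(transactions, min_support=3)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

Brute-force enumeration grows exponentially with the transaction size, which is why practical miners prune the candidate space; the sketch is only meant to show what is being counted.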
Motivation: Finding inherent regularities in data
What products were often purchased together? — Beer and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
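The question about subsequent purchases depends on order, which is what separates sequence data from plain itemsets. A minimal sketch, assuming each customer's purchases are stored as a time-ordered list; the data and the helper `items_after` are hypothetical:

```python
from collections import Counter

# Hypothetical per-customer purchase sequences, ordered by time.
sequences = [
    ["pc", "printer", "ink"],
    ["pc", "monitor", "printer"],
    ["phone", "pc", "printer"],
    ["printer", "pc"],
]

def items_after(sequences, trigger):
    """Count, per item, the number of sequences in which the item
    appears after the first occurrence of `trigger`."""
    counts = Counter()
    for seq in sequences:
        if trigger in seq:
            first = seq.index(trigger)
            # Count each later item at most once per sequence.
            counts.update(set(seq[first + 1:]) - {trigger})
    return counts

after_pc = items_after(sequences, "pc")
print(after_pc.most_common())
```

In the last sequence the printer was bought before the PC, so it does not count there; an unordered itemset view would not make that distinction.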
Applications
Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why is Frequent Pattern or Association Mining an Essential Task in Data Mining?
Foundation for many essential data mining tasks
Association, correlation, causality
Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association
Associative classification, cluster analysis, fascicles (semantic data compression)
DB approach to efficient mining of massive data
Broad applications
Basket data analysis, cross-marketing, catalog design, sale campaign analysis
Web log (click stream) analysis, DNA sequence analysis, etc.