Information Management
Data Mining
(from the book Data Mining: Concepts and Techniques)
Universita degli Studi di Mining
What is data mining?
It is a set of techniques and tools aimed at extracting interesting patterns from data
Data mining is part of KDD (knowledge discovery in databases)
KDD is the process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data
Data warehouses are crucial for data mining purposes
Types of data mining
Concept description
Characterization: concise and succinct representation of a collection of data
Comparison: description comparing two (or more) collections of data
Descriptive data mining
Describes concepts or task-relevant data sets in a concise and informative way
Predictive data mining
Builds a model from the data and their analysis; the model is then used to predict trends and properties of unknown data
Data mining branches (1)
Aggregation queries are a very simple kind of mining
Classification
Build a model to categorize data into classes
Regression
Build a model to predict the result of a real-valued function
Clustering
Organize data into groups of similar items
Outlier detection
Identify unusual data items
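As one illustration of the branches listed above, outlier detection can be sketched with a simple z-score rule. This is only a minimal sketch under stated assumptions: the function name `find_outliers`, the threshold of 2.0, and the toy data are hypothetical and do not come from the slides, and real outlier detection usually relies on more robust methods.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the items whose z-score magnitude exceeds `threshold`.

    The threshold of 2.0 is an illustrative assumption, not a rule from the text.
    """
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation
    if sigma == 0:                  # all items identical: nothing is unusual
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical toy data: daily purchase amounts with one unusual value
amounts = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(amounts))
```

On this toy data only the value 95 lies more than two standard deviations from the mean, so it is the only item flagged as unusual.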
Data mining branches (2)
Trend analysis and forecasting
Identify changes in patterns of data over time
Detect dependencies among data
Identify whether attributes are correlated with each other
Identify which attributes likely occur together
Temporal pattern detection (or time series mining)
Identify common patterns in time series
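Checking whether two attributes are correlated can be sketched with the Pearson correlation coefficient. The slides do not name a specific measure, so the choice of Pearson here is an assumption, as are the helper name `pearson` and the toy data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
r = pearson(hours, score)
print(f"r = {r:.2f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship between the two attributes, while a value near 0 indicates little linear relationship. The coefficient says nothing about which attribute, if either, causes the other.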
Data mining: be careful! (1)
Overfitting
Identify spurious patterns: be careful not to take coincidence for causality!
May be due to the analysis of too many attributes or of a limited number of data items
Example: ask 10,000 subjects to predict the color of 10 face-down cards; 10 subjects correctly predict all 10 cards
Conclusion: 1 in 1,000 subjects has extrasensory perception? NO
Report obvious results that do not derive from data analysis
Example: women are more likely to have breast cancer
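The card-guessing example above can be checked with a short calculation. This sketch assumes each card has two equally likely colors (for instance red or black) and that subjects guess independently at random; the slides state neither assumption explicitly.

```python
subjects = 10_000      # number of people tested (from the example)
cards = 10             # cards each person guesses (from the example)
p_single = 0.5         # assumption: two equally likely colors per card

# Chance that one subject guesses all cards correctly by pure luck
p_all_correct = p_single ** cards

# Expected number of perfect scores if nobody has any special ability
expected_lucky = subjects * p_all_correct

print(f"P(all {cards} correct by chance) = {p_all_correct:.6f}")
print(f"Expected perfect scores among {subjects} subjects = {expected_lucky:.2f}")
```

Under these assumptions roughly 10 perfect scores are expected from chance alone, so observing 10 is no evidence of extrasensory perception.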
Data mining: be careful! (2)
Confuse correlation and causation
Data mining identifies correlated attributes, but correlation does not always imply
a causal relationship!
Example: overweight people are more likely to drink diet soda
Conclusion: diet soda causes obesity? NO
It is necessary to correctly interpret mining results
Data mining algorithms are not magic
Results must be carefully analyzed to avoid drawing wrong conclusions
Examples of application domains
Market analysis
Targeted marketing, customer profiling
Determining patterns of purchases over time for suggestions
Cross-market analysis
Corporate analysis and risk management
Finance planning and asset evaluation
Resource planning
Competition