Description
Table of Contents
- Cover
- Half Title
- Title Page
- Copyright Page
- Contents
- List of Figures
- List of Tables
- Preface
- Acknowledgments
- Author
- SECTION I Data Mining Fundamentals
- CHAPTER 1 ■ Data Mining: A First View
- CHAPTER OBJECTIVES
- 1.1 DATA SCIENCE, ANALYTICS, MINING, AND KNOWLEDGE DISCOVERY IN DATABASES
- 1.1.1 Data Science and Analytics
- 1.1.2 Data Mining
- 1.1.3 Data Science versus Knowledge Discovery in Databases
- 1.2 WHAT CAN COMPUTERS LEARN?
- 1.2.1 Three Concept Views
- 1.2.1.1 The Classical View
- 1.2.1.2 The Probabilistic View
- 1.2.1.3 The Exemplar View
- 1.2.2 Supervised Learning
- 1.2.3 Supervised Learning: A Decision Tree Example
- 1.2.4 Unsupervised Clustering
- 1.3 IS DATA MINING APPROPRIATE FOR MY PROBLEM?
- 1.3.1 Data Mining or Data Query?
- 1.3.2 Data Mining versus Data Query: An Example
- 1.4 DATA MINING OR KNOWLEDGE ENGINEERING?
- 1.5 A NEAREST NEIGHBOR APPROACH
- 1.6 A PROCESS MODEL FOR DATA MINING
- 1.6.1 Acquiring Data
- 1.6.1.1 The Data Warehouse
- 1.6.1.2 Relational Databases and Flat Files
- 1.6.1.3 Distributed Data Access
- 1.6.2 Data Preprocessing
- 1.6.3 Mining the Data
- 1.6.4 Interpreting the Results
- 1.6.5 Result Application
- 1.7 DATA MINING, BIG DATA, AND CLOUD COMPUTING
- 1.7.1 Hadoop
- 1.7.2 Cloud Computing
- 1.8 DATA MINING ETHICS
- 1.9 INTRINSIC VALUE AND CUSTOMER CHURN
- 1.10 CHAPTER SUMMARY
- 1.11 KEY TERMS
- CHAPTER 2 ■ Data Mining: A Closer Look
- CHAPTER OBJECTIVES
- 2.1 DATA MINING STRATEGIES
- 2.1.1 Classification
- 2.1.2 Estimation
- 2.1.3 Prediction
- 2.1.4 Unsupervised Clustering
- 2.1.5 Market Basket Analysis
- 2.2 SUPERVISED DATA MINING TECHNIQUES
- 2.2.1 The Credit Card Promotion Database
- 2.2.2 Rule-Based Techniques
- 2.2.3 Neural Networks
- 2.2.4 Statistical Regression
- 2.3 ASSOCIATION RULES
- 2.4 CLUSTERING TECHNIQUES
- 2.5 EVALUATING PERFORMANCE
- 2.5.1 Evaluating Supervised Learner Models
- 2.5.2 Two-Class Error Analysis
- 2.5.3 Evaluating Numeric Output
- 2.5.4 Comparing Models by Measuring Lift
- 2.5.5 Unsupervised Model Evaluation
- 2.6 CHAPTER SUMMARY
- 2.7 KEY TERMS
- CHAPTER 3 ■ Basic Data Mining Techniques
- CHAPTER OBJECTIVES
- 3.1 DECISION TREES
- 3.1.1 An Algorithm for Building Decision Trees
- 3.1.2 Decision Trees for the Credit Card Promotion Database
- 3.1.3 Decision Tree Rules
- 3.1.4 Other Methods for Building Decision Trees
- 3.1.5 General Considerations
- 3.2 A BASIC COVERING RULE ALGORITHM
- 3.3 GENERATING ASSOCIATION RULES
- 3.3.1 Confidence and Support
- 3.3.2 Mining Association Rules: An Example
- 3.3.3 General Considerations
- 3.4 THE K-MEANS ALGORITHM
- 3.4.1 An Example Using K-means
- 3.4.2 General Considerations
- 3.5 GENETIC LEARNING
- 3.5.1 Genetic Algorithms and Supervised Learning
- 3.5.2 General Considerations
- 3.6 CHOOSING A DATA MINING TECHNIQUE
- 3.7 CHAPTER SUMMARY
- 3.8 KEY TERMS
- SECTION II Tools for Knowledge Discovery
- CHAPTER 4 ■ Weka—An Environment for Knowledge Discovery
- CHAPTER OBJECTIVES
- 4.1 GETTING STARTED WITH WEKA
- 4.2 BUILDING DECISION TREES
- 4.3 GENERATING PRODUCTION RULES WITH PART
- 4.4 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
- 4.5 ASSOCIATION RULES
- 4.6 COST/BENEFIT ANALYSIS (OPTIONAL)
- 4.7 UNSUPERVISED CLUSTERING WITH THE K-MEANS ALGORITHM
- 4.8 CHAPTER SUMMARY
- CHAPTER 5 ■ Knowledge Discovery with RapidMiner
- CHAPTER OBJECTIVES
- 5.1 GETTING STARTED WITH RAPIDMINER
- 5.1.1 Installing RapidMiner
- 5.1.2 Navigating the Interface
- 5.1.3 A First Process Model
- 5.1.4 A Decision Tree for the Credit Card Promotion Database
- 5.1.5 Breakpoints
- 5.2 BUILDING DECISION TREES
- 5.2.1 Scenario 1: Using a Training and Test Set
- 5.2.2 Scenario 2: Adding a Subprocess
- 5.2.3 Scenario 3: Creating, Saving, and Applying the Final Model
- 5.2.3.1 Saving a Model to an Output File
- 5.2.3.2 Reading and Applying a Model
- 5.2.4 Scenario 4: Using Cross-Validation
- 5.3 GENERATING RULES
- 5.3.1 Scenario 1: Tree to Rules
- 5.3.2 Scenario 2: Rule Induction
- 5.3.3 Scenario 3: Subgroup Discovery
- 5.4 ASSOCIATION RULE LEARNING
- 5.4.1 Association Rules for the Credit Card Promotion Database
- 5.4.2 The Market Basket Analysis Template
- 5.5 UNSUPERVISED CLUSTERING WITH K-MEANS
- 5.6 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
- 5.7 CHAPTER SUMMARY
- CHAPTER 6 ■ The Knowledge Discovery Process
- CHAPTER OBJECTIVES
- 6.1 A PROCESS MODEL FOR KNOWLEDGE DISCOVERY
- 6.2 GOAL IDENTIFICATION
- 6.3 CREATING A TARGET DATA SET
- 6.4 DATA PREPROCESSING
- 6.4.1 Noisy Data
- 6.4.1.1 Locating Duplicate Records
- 6.4.1.2 Locating Incorrect Attribute Values
- 6.4.1.3 Data Smoothing
- 6.4.1.4 Detecting Outliers
- 6.4.2 Missing Data
- 6.5 DATA TRANSFORMATION
- 6.5.1 Data Normalization
- 6.5.2 Data Type Conversion
- 6.5.3 Attribute and Instance Selection
- 6.5.3.1 Wrapper and Filtering Techniques
- 6.5.3.2 More Attribute Selection Techniques
- 6.5.3.3 Genetic Learning for Attribute Selection
- 6.5.3.4 Creating Attributes
- 6.5.3.5 Instance Selection
- 6.6 DATA MINING
- 6.7 INTERPRETATION AND EVALUATION
- 6.8 TAKING ACTION
- 6.9 THE CRISP-DM PROCESS MODEL
- 6.10 CHAPTER SUMMARY
- 6.11 KEY TERMS
- CHAPTER 7 ■ Formal Evaluation Techniques
- CHAPTER OBJECTIVES
- 7.1 WHAT SHOULD BE EVALUATED?
- 7.2 TOOLS FOR EVALUATION
- 7.2.1 Single-Valued Summary Statistics
- 7.2.2 The Normal Distribution
- 7.2.3 Normal Distributions and Sample Means
- 7.2.4 A Classical Model for Hypothesis Testing
- 7.3 COMPUTING TEST SET CONFIDENCE INTERVALS
- 7.4 COMPARING SUPERVISED LEARNER MODELS
- 7.4.1 Comparing the Performance of Two Models
- 7.4.2 Comparing the Performance of Two or More Models
- 7.5 UNSUPERVISED EVALUATION TECHNIQUES
- 7.5.1 Unsupervised Clustering for Supervised Evaluation
- 7.5.2 Supervised Evaluation for Unsupervised Clustering
- 7.5.3 Additional Methods for Evaluating an Unsupervised Clustering
- 7.6 EVALUATING SUPERVISED MODELS WITH NUMERIC OUTPUT
- 7.7 COMPARING MODELS WITH RAPIDMINER
- 7.8 ATTRIBUTE EVALUATION FOR MIXED DATA TYPES
- 7.9 PARETO LIFT CHARTS
- 7.10 CHAPTER SUMMARY
- 7.11 KEY TERMS
- SECTION III Building Neural Networks
- CHAPTER 8 ■ Neural Networks
- CHAPTER OBJECTIVES
- 8.1 FEED-FORWARD NEURAL NETWORKS
- 8.1.1 Neural Network Input Format
- 8.1.2 Neural Network Output Format
- 8.1.3 The Sigmoid Evaluation Function
- 8.2 NEURAL NETWORK TRAINING: A CONCEPTUAL VIEW
- 8.2.1 Supervised Learning with Feed-Forward Networks
- 8.2.1.1 Training a Neural Network: Backpropagation Learning
- 8.2.1.2 Training a Neural Network: Genetic Learning
- 8.2.2 Unsupervised Clustering with Self-Organizing Maps
- 8.3 NEURAL NETWORK EXPLANATION
- 8.4 GENERAL CONSIDERATIONS
- 8.5 NEURAL NETWORK TRAINING: A DETAILED VIEW
- 8.5.1 The Backpropagation Algorithm: An Example
- 8.5.2 Kohonen Self-Organizing Maps: An Example
- 8.6 CHAPTER SUMMARY
- 8.7 KEY TERMS
- CHAPTER 9 ■ Building Neural Networks with Weka
- CHAPTER OBJECTIVES
- 9.1 DATA SETS FOR BACKPROPAGATION LEARNING
- 9.1.1 The Exclusive-OR Function
- 9.1.2 The Satellite Image Data Set
- 9.2 MODELING THE EXCLUSIVE-OR FUNCTION: NUMERIC OUTPUT
- 9.3 MODELING THE EXCLUSIVE-OR FUNCTION: CATEGORICAL OUTPUT
- 9.4 MINING SATELLITE IMAGE DATA
- 9.5 UNSUPERVISED NEURAL NET CLUSTERING
- 9.6 CHAPTER SUMMARY
- 9.7 KEY TERMS
- CHAPTER 10 ■ Building Neural Networks with RapidMiner
- CHAPTER OBJECTIVES
- 10.1 MODELING THE EXCLUSIVE-OR FUNCTION
- 10.2 MINING SATELLITE IMAGE DATA
- 10.3 PREDICTING CUSTOMER CHURN
- 10.4 RAPIDMINER’S SELF-ORGANIZING MAP OPERATOR
- 10.5 CHAPTER SUMMARY
- SECTION IV Advanced Data Mining Techniques
- CHAPTER 11 ■ Supervised Statistical Techniques
- CHAPTER OBJECTIVES
- 11.1 NAÏVE BAYES CLASSIFIER
- 11.1.1 Naïve Bayes Classifier: An Example
- 11.1.2 Zero-Valued Attribute Counts
- 11.1.3 Missing Data
- 11.1.4 Numeric Data
- 11.1.5 Implementations of the Naïve Bayes Classifier
- 11.1.6 General Considerations
- 11.2 SUPPORT VECTOR MACHINES
- 11.2.1 Linearly Separable Classes
- 11.2.2 The Nonlinear Case
- 11.2.3 General Considerations
- 11.2.4 Implementations of Support Vector Machines
- 11.3 LINEAR REGRESSION ANALYSIS
- 11.3.1 Simple Linear Regression
- 11.3.2 Multiple Linear Regression
- 11.3.2.1 Linear Regression—Weka
- 11.3.2.2 Linear Regression—RapidMiner
- 11.4 REGRESSION TREES
- 11.5 LOGISTIC REGRESSION
- 11.5.1 Transforming the Linear Regression Model
- 11.5.2 The Logistic Regression Model
- 11.6 CHAPTER SUMMARY
- 11.7 KEY TERMS
- CHAPTER 12 ■ Unsupervised Clustering Techniques
- CHAPTER OBJECTIVES
- 12.1 AGGLOMERATIVE CLUSTERING
- 12.1.1 Agglomerative Clustering: An Example
- 12.1.2 General Considerations
- 12.2 CONCEPTUAL CLUSTERING
- 12.2.1 Measuring Category Utility
- 12.2.2 Conceptual Clustering: An Example
- 12.2.3 General Considerations
- 12.3 EXPECTATION MAXIMIZATION
- 12.3.1 Implementations of the EM Algorithm
- 12.3.2 General Considerations
- 12.4 GENETIC ALGORITHMS AND UNSUPERVISED CLUSTERING
- 12.5 CHAPTER SUMMARY
- 12.6 KEY TERMS
- CHAPTER 13 ■ Specialized Techniques
- CHAPTER OBJECTIVES
- 13.1 TIME-SERIES ANALYSIS
- 13.1.1 Stock Market Analytics
- 13.1.2 Time-Series Analysis—An Example
- 13.1.2.1 Creating the Target Data Set—Numeric Output
- 13.1.2.2 Data Preprocessing and Transformation
- 13.1.2.3 Creating the Target Data Set—Categorical Output
- 13.1.2.4 Mining the Data—RapidMiner
- 13.1.2.5 Mining the Data—Weka
- 13.1.2.6 Interpretation, Evaluation, and Action
- 13.1.3 General Considerations
- 13.2 MINING THE WEB
- 13.2.1 Web-Based Mining: General Issues
- 13.2.1.1 Identifying the Goal
- 13.2.2 Preparing the Data
- 13.2.2.1 Mining the Data
- 13.2.2.2 Interpreting and Evaluating Results
- 13.2.2.3 Taking Action
- 13.2.3 Data Mining for Website Evaluation
- 13.2.4 Data Mining for Personalization
- 13.2.5 Data Mining for Website Adaptation
- 13.2.6 PageRank and Link Analysis
- 13.2.7 Operators for Web-Based Mining
- 13.3 MINING TEXTUAL DATA
- 13.3.1 Analyzing Customer Reviews
- 13.4 TECHNIQUES FOR LARGE-SIZED, IMBALANCED, AND STREAMING DATA
- 13.4.1 Large-Sized Data
- 13.4.2 Dealing with Imbalanced Data
- 13.4.2.1 Methods for Addressing Rarity
- 13.4.2.2 Receiver Operating Characteristics Curves
- 13.4.3 Methods for Streaming Data
- 13.5 ENSEMBLE TECHNIQUES FOR IMPROVING PERFORMANCE
- 13.5.1 Bagging
- 13.5.2 Boosting
- 13.5.3 AdaBoost—An Example
- 13.6 CHAPTER SUMMARY
- 13.7 KEY TERMS
- CHAPTER 14 ■ The Data Warehouse
- CHAPTER OBJECTIVES
- 14.1 OPERATIONAL DATABASES
- 14.1.1 Data Modeling and Normalization
- 14.1.2 The Relational Model
- 14.2 DATA WAREHOUSE DESIGN
- 14.2.1 Entering Data into the Warehouse
- 14.2.2 Structuring the Data Warehouse: The Star Schema
- 14.2.2.1 The Multidimensionality of the Star Schema
- 14.2.2.2 Additional Relational Schemas
- 14.2.3 Decision Support: Analyzing the Warehouse Data
- 14.3 ONLINE ANALYTICAL PROCESSING
- 14.3.1 OLAP: An Example
- 14.3.2 General Considerations
- 14.4 EXCEL PIVOT TABLES FOR DATA ANALYTICS
- 14.5 CHAPTER SUMMARY
- 14.6 KEY TERMS
- APPENDIX A—SOFTWARE AND DATA SETS FOR DATA MINING
- APPENDIX B—STATISTICS FOR PERFORMANCE EVALUATION
- BIBLIOGRAPHY
- INDEX