Data Mining

Author Richard J. Roiger

Publisher Taylor & Francis

Format ePub

Print ISBN 9781498763974

Edition 2

Publication year 2017

12.390 kr.

Table of Contents

  • Cover
  • Half Title
  • Title Page
  • Copyright Page
  • Contents
  • List of Figures
  • List of Tables
  • Preface
  • Acknowledgments
  • Author
  • SECTION I Data Mining Fundamentals
  • CHAPTER 1 ■ Data Mining: A First View
  • CHAPTER OBJECTIVES
  • 1.1 DATA SCIENCE, ANALYTICS, MINING, AND KNOWLEDGE DISCOVERY IN DATABASES
  • 1.1.1 Data Science and Analytics
  • 1.1.2 Data Mining
  • 1.1.3 Data Science versus Knowledge Discovery in Databases
  • 1.2 WHAT CAN COMPUTERS LEARN?
  • 1.2.1 Three Concept Views
  • 1.2.1.1 The Classical View
  • 1.2.1.2 The Probabilistic View
  • 1.2.1.3 The Exemplar View
  • 1.2.2 Supervised Learning
  • 1.2.3 Supervised Learning: A Decision Tree Example
  • 1.2.4 Unsupervised Clustering
  • 1.3 IS DATA MINING APPROPRIATE FOR MY PROBLEM?
  • 1.3.1 Data Mining or Data Query?
  • 1.3.2 Data Mining versus Data Query: An Example
  • 1.4 DATA MINING OR KNOWLEDGE ENGINEERING?
  • 1.5 A NEAREST NEIGHBOR APPROACH
  • 1.6 A PROCESS MODEL FOR DATA MINING
  • 1.6.1 Acquiring Data
  • 1.6.1.1 The Data Warehouse
  • 1.6.1.2 Relational Databases and Flat Files
  • 1.6.1.3 Distributed Data Access
  • 1.6.2 Data Preprocessing
  • 1.6.3 Mining the Data
  • 1.6.4 Interpreting the Results
  • 1.6.5 Result Application
  • 1.7 DATA MINING, BIG DATA, AND CLOUD COMPUTING
  • 1.7.1 Hadoop
  • 1.7.2 Cloud Computing
  • 1.8 DATA MINING ETHICS
  • 1.9 INTRINSIC VALUE AND CUSTOMER CHURN
  • 1.10 CHAPTER SUMMARY
  • 1.11 KEY TERMS
  • CHAPTER 2 ■ Data Mining: A Closer Look
  • CHAPTER OBJECTIVES
  • 2.1 DATA MINING STRATEGIES
  • 2.1.1 Classification
  • 2.1.2 Estimation
  • 2.1.3 Prediction
  • 2.1.4 Unsupervised Clustering
  • 2.1.5 Market Basket Analysis
  • 2.2 SUPERVISED DATA MINING TECHNIQUES
  • 2.2.1 The Credit Card Promotion Database
  • 2.2.2 Rule-Based Techniques
  • 2.2.3 Neural Networks
  • 2.2.4 Statistical Regression
  • 2.3 ASSOCIATION RULES
  • 2.4 CLUSTERING TECHNIQUES
  • 2.5 EVALUATING PERFORMANCE
  • 2.5.1 Evaluating Supervised Learner Models
  • 2.5.2 Two-Class Error Analysis
  • 2.5.3 Evaluating Numeric Output
  • 2.5.4 Comparing Models by Measuring Lift
  • 2.5.5 Unsupervised Model Evaluation
  • 2.6 CHAPTER SUMMARY
  • 2.7 KEY TERMS
  • CHAPTER 3 ■ Basic Data Mining Techniques
  • CHAPTER OBJECTIVES
  • 3.1 DECISION TREES
  • 3.1.1 An Algorithm for Building Decision Trees
  • 3.1.2 Decision Trees for the Credit Card Promotion Database
  • 3.1.3 Decision Tree Rules
  • 3.1.4 Other Methods for Building Decision Trees
  • 3.1.5 General Considerations
  • 3.2 A BASIC COVERING RULE ALGORITHM
  • 3.3 GENERATING ASSOCIATION RULES
  • 3.3.1 Confidence and Support
  • 3.3.2 Mining Association Rules: An Example
  • 3.3.3 General Considerations
  • 3.4 THE K-MEANS ALGORITHM
  • 3.4.1 An Example Using K-means
  • 3.4.2 General Considerations
  • 3.5 GENETIC LEARNING
  • 3.5.1 Genetic Algorithms and Supervised Learning
  • 3.5.2 General Considerations
  • 3.6 CHOOSING A DATA MINING TECHNIQUE
  • 3.7 CHAPTER SUMMARY
  • 3.8 KEY TERMS
  • SECTION II Tools for Knowledge Discovery
  • CHAPTER 4 ■ Weka—An Environment for Knowledge Discovery
  • CHAPTER OBJECTIVES
  • 4.1 GETTING STARTED WITH WEKA
  • 4.2 BUILDING DECISION TREES
  • 4.3 GENERATING PRODUCTION RULES WITH PART
  • 4.4 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
  • 4.5 ASSOCIATION RULES
  • 4.6 COST/BENEFIT ANALYSIS (OPTIONAL)
  • 4.7 UNSUPERVISED CLUSTERING WITH THE K-MEANS ALGORITHM
  • 4.8 CHAPTER SUMMARY
  • CHAPTER 5 ■ Knowledge Discovery with RapidMiner
  • CHAPTER OBJECTIVES
  • 5.1 GETTING STARTED WITH RAPIDMINER
  • 5.1.1 Installing RapidMiner
  • 5.1.2 Navigating the Interface
  • 5.1.3 A First Process Model
  • 5.1.4 A Decision Tree for the Credit Card Promotion Database
  • 5.1.5 Breakpoints
  • 5.2 BUILDING DECISION TREES
  • 5.2.1 Scenario 1: Using a Training and Test Set
  • 5.2.2 Scenario 2: Adding a Subprocess
  • 5.2.3 Scenario 3: Creating, Saving, and Applying the Final Model
  • 5.2.3.1 Saving a Model to an Output File
  • 5.2.3.2 Reading and Applying a Model
  • 5.2.4 Scenario 4: Using Cross-Validation
  • 5.3 GENERATING RULES
  • 5.3.1 Scenario 1: Tree to Rules
  • 5.3.2 Scenario 2: Rule Induction
  • 5.3.3 Scenario 3: Subgroup Discovery
  • 5.4 ASSOCIATION RULE LEARNING
  • 5.4.1 Association Rules for the Credit Card Promotion Database
  • 5.4.2 The Market Basket Analysis Template
  • 5.5 UNSUPERVISED CLUSTERING WITH K-MEANS
  • 5.6 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
  • 5.7 CHAPTER SUMMARY
  • CHAPTER 6 ■ The Knowledge Discovery Process
  • CHAPTER OBJECTIVES
  • 6.1 A PROCESS MODEL FOR KNOWLEDGE DISCOVERY
  • 6.2 GOAL IDENTIFICATION
  • 6.3 CREATING A TARGET DATA SET
  • 6.4 DATA PREPROCESSING
  • 6.4.1 Noisy Data
  • 6.4.1.1 Locating Duplicate Records
  • 6.4.1.2 Locating Incorrect Attribute Values
  • 6.4.1.3 Data Smoothing
  • 6.4.1.4 Detecting Outliers
  • 6.4.2 Missing Data
  • 6.5 DATA TRANSFORMATION
  • 6.5.1 Data Normalization
  • 6.5.2 Data Type Conversion
  • 6.5.3 Attribute and Instance Selection
  • 6.5.3.1 Wrapper and Filtering Techniques
  • 6.5.3.2 More Attribute Selection Techniques
  • 6.5.3.3 Genetic Learning for Attribute Selection
  • 6.5.3.4 Creating Attributes
  • 6.5.3.5 Instance Selection
  • 6.6 DATA MINING
  • 6.7 INTERPRETATION AND EVALUATION
  • 6.8 TAKING ACTION
  • 6.9 THE CRISP-DM PROCESS MODEL
  • 6.10 CHAPTER SUMMARY
  • 6.11 KEY TERMS
  • CHAPTER 7 ■ Formal Evaluation Techniques
  • CHAPTER OBJECTIVES
  • 7.1 WHAT SHOULD BE EVALUATED?
  • 7.2 TOOLS FOR EVALUATION
  • 7.2.1 Single-Valued Summary Statistics
  • 7.2.2 The Normal Distribution
  • 7.2.3 Normal Distributions and Sample Means
  • 7.2.4 A Classical Model for Hypothesis Testing
  • 7.3 COMPUTING TEST SET CONFIDENCE INTERVALS
  • 7.4 COMPARING SUPERVISED LEARNER MODELS
  • 7.4.1 Comparing the Performance of Two Models
  • 7.4.2 Comparing the Performance of Two or More Models
  • 7.5 UNSUPERVISED EVALUATION TECHNIQUES
  • 7.5.1 Unsupervised Clustering for Supervised Evaluation
  • 7.5.2 Supervised Evaluation for Unsupervised Clustering
  • 7.5.3 Additional Methods for Evaluating an Unsupervised Clustering
  • 7.6 EVALUATING SUPERVISED MODELS WITH NUMERIC OUTPUT
  • 7.7 COMPARING MODELS WITH RAPIDMINER
  • 7.8 ATTRIBUTE EVALUATION FOR MIXED DATA TYPES
  • 7.9 PARETO LIFT CHARTS
  • 7.10 CHAPTER SUMMARY
  • 7.11 KEY TERMS
  • SECTION III Building Neural Networks
  • CHAPTER 8 ■ Neural Networks
  • CHAPTER OBJECTIVES
  • 8.1 FEED-FORWARD NEURAL NETWORKS
  • 8.1.1 Neural Network Input Format
  • 8.1.2 Neural Network Output Format
  • 8.1.3 The Sigmoid Evaluation Function
  • 8.2 NEURAL NETWORK TRAINING: A CONCEPTUAL VIEW
  • 8.2.1 Supervised Learning with Feed-Forward Networks
  • 8.2.1.1 Training a Neural Network: Backpropagation Learning
  • 8.2.1.2 Training a Neural Network: Genetic Learning
  • 8.2.2 Unsupervised Clustering with Self-Organizing Maps
  • 8.3 NEURAL NETWORK EXPLANATION
  • 8.4 GENERAL CONSIDERATIONS
  • 8.5 NEURAL NETWORK TRAINING: A DETAILED VIEW
  • 8.5.1 The Backpropagation Algorithm: An Example
  • 8.5.2 Kohonen Self-Organizing Maps: An Example
  • 8.6 CHAPTER SUMMARY
  • 8.7 KEY TERMS
  • CHAPTER 9 ■ Building Neural Networks with Weka
  • CHAPTER OBJECTIVES
  • 9.1 DATA SETS FOR BACKPROPAGATION LEARNING
  • 9.1.1 The Exclusive-OR Function
  • 9.1.2 The Satellite Image Data Set
  • 9.2 MODELING THE EXCLUSIVE-OR FUNCTION: NUMERIC OUTPUT
  • 9.3 MODELING THE EXCLUSIVE-OR FUNCTION: CATEGORICAL OUTPUT
  • 9.4 MINING SATELLITE IMAGE DATA
  • 9.5 UNSUPERVISED NEURAL NET CLUSTERING
  • 9.6 CHAPTER SUMMARY
  • 9.7 KEY TERMS
  • CHAPTER 10 ■ Building Neural Networks with RapidMiner
  • CHAPTER OBJECTIVES
  • 10.1 MODELING THE EXCLUSIVE-OR FUNCTION
  • 10.2 MINING SATELLITE IMAGE DATA
  • 10.3 PREDICTING CUSTOMER CHURN
  • 10.4 RAPIDMINER’S SELF-ORGANIZING MAP OPERATOR
  • 10.5 CHAPTER SUMMARY
  • SECTION IV Advanced Data Mining Techniques
  • CHAPTER 11 ■ Supervised Statistical Techniques
  • CHAPTER OBJECTIVES
  • 11.1 NAÏVE BAYES CLASSIFIER
  • 11.1.1 Naïve Bayes Classifier: An Example
  • 11.1.2 Zero-Valued Attribute Counts
  • 11.1.3 Missing Data
  • 11.1.4 Numeric Data
  • 11.1.5 Implementations of the Naïve Bayes Classifier
  • 11.1.6 General Considerations
  • 11.2 SUPPORT VECTOR MACHINES
  • 11.2.1 Linearly Separable Classes
  • 11.2.2 The Nonlinear Case
  • 11.2.3 General Considerations
  • 11.2.4 Implementations of Support Vector Machines
  • 11.3 LINEAR REGRESSION ANALYSIS
  • 11.3.1 Simple Linear Regression
  • 11.3.2 Multiple Linear Regression
  • 11.3.2.1 Linear Regression—Weka
  • 11.3.2.2 Linear Regression—RapidMiner
  • 11.4 REGRESSION TREES
  • 11.5 LOGISTIC REGRESSION
  • 11.5.1 Transforming the Linear Regression Model
  • 11.5.2 The Logistic Regression Model
  • 11.6 CHAPTER SUMMARY
  • 11.7 KEY TERMS
  • CHAPTER 12 ■ Unsupervised Clustering Techniques
  • CHAPTER OBJECTIVES
  • 12.1 AGGLOMERATIVE CLUSTERING
  • 12.1.1 Agglomerative Clustering: An Example
  • 12.1.2 General Considerations
  • 12.2 CONCEPTUAL CLUSTERING
  • 12.2.1 Measuring Category Utility
  • 12.2.2 Conceptual Clustering: An Example
  • 12.2.3 General Considerations
  • 12.3 EXPECTATION MAXIMIZATION
  • 12.3.1 Implementations of the EM Algorithm
  • 12.3.2 General Considerations
  • 12.4 GENETIC ALGORITHMS AND UNSUPERVISED CLUSTERING
  • 12.5 CHAPTER SUMMARY
  • 12.6 KEY TERMS
  • CHAPTER 13 ■ Specialized Techniques
  • CHAPTER OBJECTIVES
  • 13.1 TIME-SERIES ANALYSIS
  • 13.1.1 Stock Market Analytics
  • 13.1.2 Time-Series Analysis—An Example
  • 13.1.2.1 Creating the Target Data Set—Numeric Output
  • 13.1.2.2 Data Preprocessing and Transformation
  • 13.1.2.3 Creating the Target Data Set—Categorical Output
  • 13.1.2.4 Mining the Data—RapidMiner
  • 13.1.2.5 Mining the Data—Weka
  • 13.1.2.6 Interpretation, Evaluation, and Action
  • 13.1.3 General Considerations
  • 13.2 MINING THE WEB
  • 13.2.1 Web-Based Mining: General Issues
  • 13.2.1.1 Identifying the Goal
  • 13.2.2 Preparing the Data
  • 13.2.2.1 Mining the Data
  • 13.2.2.2 Interpreting and Evaluating Results
  • 13.2.2.3 Taking Action
  • 13.2.3 Data Mining for Website Evaluation
  • 13.2.4 Data Mining for Personalization
  • 13.2.5 Data Mining for Website Adaptation
  • 13.2.6 PageRank and Link Analysis
  • 13.2.7 Operators for Web-Based Mining
  • 13.3 MINING TEXTUAL DATA
  • 13.3.1 Analyzing Customer Reviews
  • 13.4 TECHNIQUES FOR LARGE-SIZED, IMBALANCED, AND STREAMING DATA
  • 13.4.1 Large-Sized Data
  • 13.4.2 Dealing with Imbalanced Data
  • 13.4.2.1 Methods for Addressing Rarity
  • 13.4.2.2 Receiver Operating Characteristics Curves
  • 13.4.3 Methods for Streaming Data
  • 13.5 ENSEMBLE TECHNIQUES FOR IMPROVING PERFORMANCE
  • 13.5.1 Bagging
  • 13.5.2 Boosting
  • 13.5.3 AdaBoost—An Example
  • 13.6 CHAPTER SUMMARY
  • 13.7 KEY TERMS
  • CHAPTER 14 ■ The Data Warehouse
  • CHAPTER OBJECTIVES
  • 14.1 OPERATIONAL DATABASES
  • 14.1.1 Data Modeling and Normalization
  • 14.1.2 The Relational Model
  • 14.2 DATA WAREHOUSE DESIGN
  • 14.2.1 Entering Data into the Warehouse
  • 14.2.2 Structuring the Data Warehouse: The Star Schema
  • 14.2.2.1 The Multidimensionality of the Star Schema
  • 14.2.2.2 Additional Relational Schemas
  • 14.2.3 Decision Support: Analyzing the Warehouse Data
  • 14.3 ONLINE ANALYTICAL PROCESSING
  • 14.3.1 OLAP: An Example
  • 14.3.2 General Considerations
  • 14.4 EXCEL PIVOT TABLES FOR DATA ANALYTICS
  • 14.5 CHAPTER SUMMARY
  • 14.6 KEY TERMS
  • APPENDIX A—SOFTWARE AND DATA SETS FOR DATA MINING
  • APPENDIX B—STATISTICS FOR PERFORMANCE EVALUATION
  • BIBLIOGRAPHY
  • INDEX
