Data Mining

Author: Richard J. Roiger

Publisher: Taylor & Francis

Format: ePub

Print ISBN: 9781498763974

Edition: 2

Year of publication: 2017

Price: 12.090 kr.

Description

Table of Contents

  • Cover
  • Half Title
  • Title Page
  • Copyright Page
  • Contents
  • List of Figures
  • List of Tables
  • Preface
  • Acknowledgments
  • Author
  • SECTION I Data Mining Fundamentals
  • CHAPTER 1 ■ Data Mining: A First View
  • CHAPTER OBJECTIVES
  • 1.1 DATA SCIENCE, ANALYTICS, MINING, AND KNOWLEDGE DISCOVERY IN DATABASES
  • 1.1.1 Data Science and Analytics
  • 1.1.2 Data Mining
  • 1.1.3 Data Science versus Knowledge Discovery in Databases
  • 1.2 WHAT CAN COMPUTERS LEARN?
  • 1.2.1 Three Concept Views
  • 1.2.1.1 The Classical View
  • 1.2.1.2 The Probabilistic View
  • 1.2.1.3 The Exemplar View
  • 1.2.2 Supervised Learning
  • 1.2.3 Supervised Learning: A Decision Tree Example
  • 1.2.4 Unsupervised Clustering
  • 1.3 IS DATA MINING APPROPRIATE FOR MY PROBLEM?
  • 1.3.1 Data Mining or Data Query?
  • 1.3.2 Data Mining versus Data Query: An Example
  • 1.4 DATA MINING OR KNOWLEDGE ENGINEERING?
  • 1.5 A NEAREST NEIGHBOR APPROACH
  • 1.6 A PROCESS MODEL FOR DATA MINING
  • 1.6.1 Acquiring Data
  • 1.6.1.1 The Data Warehouse
  • 1.6.1.2 Relational Databases and Flat Files
  • 1.6.1.3 Distributed Data Access
  • 1.6.2 Data Preprocessing
  • 1.6.3 Mining the Data
  • 1.6.4 Interpreting the Results
  • 1.6.5 Result Application
  • 1.7 DATA MINING, BIG DATA, AND CLOUD COMPUTING
  • 1.7.1 Hadoop
  • 1.7.2 Cloud Computing
  • 1.8 DATA MINING ETHICS
  • 1.9 INTRINSIC VALUE AND CUSTOMER CHURN
  • 1.10 CHAPTER SUMMARY
  • 1.11 KEY TERMS
  • CHAPTER 2 ■ Data Mining: A Closer Look
  • CHAPTER OBJECTIVES
  • 2.1 DATA MINING STRATEGIES
  • 2.1.1 Classification
  • 2.1.2 Estimation
  • 2.1.3 Prediction
  • 2.1.4 Unsupervised Clustering
  • 2.1.5 Market Basket Analysis
  • 2.2 SUPERVISED DATA MINING TECHNIQUES
  • 2.2.1 The Credit Card Promotion Database
  • 2.2.2 Rule-Based Techniques
  • 2.2.3 Neural Networks
  • 2.2.4 Statistical Regression
  • 2.3 ASSOCIATION RULES
  • 2.4 CLUSTERING TECHNIQUES
  • 2.5 EVALUATING PERFORMANCE
  • 2.5.1 Evaluating Supervised Learner Models
  • 2.5.2 Two-Class Error Analysis
  • 2.5.3 Evaluating Numeric Output
  • 2.5.4 Comparing Models by Measuring Lift
  • 2.5.5 Unsupervised Model Evaluation
  • 2.6 CHAPTER SUMMARY
  • 2.7 KEY TERMS
  • CHAPTER 3 ■ Basic Data Mining Techniques
  • CHAPTER OBJECTIVES
  • 3.1 DECISION TREES
  • 3.1.1 An Algorithm for Building Decision Trees
  • 3.1.2 Decision Trees for the Credit Card Promotion Database
  • 3.1.3 Decision Tree Rules
  • 3.1.4 Other Methods for Building Decision Trees
  • 3.1.5 General Considerations
  • 3.2 A BASIC COVERING RULE ALGORITHM
  • 3.3 GENERATING ASSOCIATION RULES
  • 3.3.1 Confidence and Support
  • 3.3.2 Mining Association Rules: An Example
  • 3.3.3 General Considerations
  • 3.4 THE K-MEANS ALGORITHM
  • 3.4.1 An Example Using K-means
  • 3.4.2 General Considerations
  • 3.5 GENETIC LEARNING
  • 3.5.1 Genetic Algorithms and Supervised Learning
  • 3.5.2 General Considerations
  • 3.6 CHOOSING A DATA MINING TECHNIQUE
  • 3.7 CHAPTER SUMMARY
  • 3.8 KEY TERMS
  • SECTION II Tools for Knowledge Discovery
  • CHAPTER 4 ■ Weka—An Environment for Knowledge Discovery
  • CHAPTER OBJECTIVES
  • 4.1 GETTING STARTED WITH WEKA
  • 4.2 BUILDING DECISION TREES
  • 4.3 GENERATING PRODUCTION RULES WITH PART
  • 4.4 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
  • 4.5 ASSOCIATION RULES
  • 4.6 COST/BENEFIT ANALYSIS (OPTIONAL)
  • 4.7 UNSUPERVISED CLUSTERING WITH THE K-MEANS ALGORITHM
  • 4.8 CHAPTER SUMMARY
  • CHAPTER 5 ■ Knowledge Discovery with RapidMiner
  • CHAPTER OBJECTIVES
  • 5.1 GETTING STARTED WITH RAPIDMINER
  • 5.1.1 Installing RapidMiner
  • 5.1.2 Navigating the Interface
  • 5.1.3 A First Process Model
  • 5.1.4 A Decision Tree for the Credit Card Promotion Database
  • 5.1.5 Breakpoints
  • 5.2 BUILDING DECISION TREES
  • 5.2.1 Scenario 1: Using a Training and Test Set
  • 5.2.2 Scenario 2: Adding a Subprocess
  • 5.2.3 Scenario 3: Creating, Saving, and Applying the Final Model
  • 5.2.3.1 Saving a Model to an Output File
  • 5.2.3.2 Reading and Applying a Model
  • 5.2.4 Scenario 4: Using Cross-Validation
  • 5.3 GENERATING RULES
  • 5.3.1 Scenario 1: Tree to Rules
  • 5.3.2 Scenario 2: Rule Induction
  • 5.3.3 Scenario 3: Subgroup Discovery
  • 5.4 ASSOCIATION RULE LEARNING
  • 5.4.1 Association Rules for the Credit Card Promotion Database
  • 5.4.2 The Market Basket Analysis Template
  • 5.5 UNSUPERVISED CLUSTERING WITH K-MEANS
  • 5.6 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION
  • 5.7 CHAPTER SUMMARY
  • CHAPTER 6 ■ The Knowledge Discovery Process
  • CHAPTER OBJECTIVES
  • 6.1 A PROCESS MODEL FOR KNOWLEDGE DISCOVERY
  • 6.2 GOAL IDENTIFICATION
  • 6.3 CREATING A TARGET DATA SET
  • 6.4 DATA PREPROCESSING
  • 6.4.1 Noisy Data
  • 6.4.1.1 Locating Duplicate Records
  • 6.4.1.2 Locating Incorrect Attribute Values
  • 6.4.1.3 Data Smoothing
  • 6.4.1.4 Detecting Outliers
  • 6.4.2 Missing Data
  • 6.5 DATA TRANSFORMATION
  • 6.5.1 Data Normalization
  • 6.5.2 Data Type Conversion
  • 6.5.3 Attribute and Instance Selection
  • 6.5.3.1 Wrapper and Filtering Techniques
  • 6.5.3.2 More Attribute Selection Techniques
  • 6.5.3.3 Genetic Learning for Attribute Selection
  • 6.5.3.4 Creating Attributes
  • 6.5.3.5 Instance Selection
  • 6.6 DATA MINING
  • 6.7 INTERPRETATION AND EVALUATION
  • 6.8 TAKING ACTION
  • 6.9 THE CRISP-DM PROCESS MODEL
  • 6.10 CHAPTER SUMMARY
  • 6.11 KEY TERMS
  • CHAPTER 7 ■ Formal Evaluation Techniques
  • CHAPTER OBJECTIVES
  • 7.1 WHAT SHOULD BE EVALUATED?
  • 7.2 TOOLS FOR EVALUATION
  • 7.2.1 Single-Valued Summary Statistics
  • 7.2.2 The Normal Distribution
  • 7.2.3 Normal Distributions and Sample Means
  • 7.2.4 A Classical Model for Hypothesis Testing
  • 7.3 COMPUTING TEST SET CONFIDENCE INTERVALS
  • 7.4 COMPARING SUPERVISED LEARNER MODELS
  • 7.4.1 Comparing the Performance of Two Models
  • 7.4.2 Comparing the Performance of Two or More Models
  • 7.5 UNSUPERVISED EVALUATION TECHNIQUES
  • 7.5.1 Unsupervised Clustering for Supervised Evaluation
  • 7.5.2 Supervised Evaluation for Unsupervised Clustering
  • 7.5.3 Additional Methods for Evaluating an Unsupervised Clustering
  • 7.6 EVALUATING SUPERVISED MODELS WITH NUMERIC OUTPUT
  • 7.7 COMPARING MODELS WITH RAPIDMINER
  • 7.8 ATTRIBUTE EVALUATION FOR MIXED DATA TYPES
  • 7.9 PARETO LIFT CHARTS
  • 7.10 CHAPTER SUMMARY
  • 7.11 KEY TERMS
  • SECTION III Building Neural Networks
  • CHAPTER 8 ■ Neural Networks
  • CHAPTER OBJECTIVES
  • 8.1 FEED-FORWARD NEURAL NETWORKS
  • 8.1.1 Neural Network Input Format
  • 8.1.2 Neural Network Output Format
  • 8.1.3 The Sigmoid Evaluation Function
  • 8.2 NEURAL NETWORK TRAINING: A CONCEPTUAL VIEW
  • 8.2.1 Supervised Learning with Feed-Forward Networks
  • 8.2.1.1 Training a Neural Network: Backpropagation Learning
  • 8.2.1.2 Training a Neural Network: Genetic Learning
  • 8.2.2 Unsupervised Clustering with Self-Organizing Maps
  • 8.3 NEURAL NETWORK EXPLANATION
  • 8.4 GENERAL CONSIDERATIONS
  • 8.5 NEURAL NETWORK TRAINING: A DETAILED VIEW
  • 8.5.1 The Backpropagation Algorithm: An Example
  • 8.5.2 Kohonen Self-Organizing Maps: An Example
  • 8.6 CHAPTER SUMMARY
  • 8.7 KEY TERMS
  • CHAPTER 9 ■ Building Neural Networks with Weka
  • CHAPTER OBJECTIVES
  • 9.1 DATA SETS FOR BACKPROPAGATION LEARNING
  • 9.1.1 The Exclusive-OR Function
  • 9.1.2 The Satellite Image Data Set
  • 9.2 MODELING THE EXCLUSIVE-OR FUNCTION: NUMERIC OUTPUT
  • 9.3 MODELING THE EXCLUSIVE-OR FUNCTION: CATEGORICAL OUTPUT
  • 9.4 MINING SATELLITE IMAGE DATA
  • 9.5 UNSUPERVISED NEURAL NET CLUSTERING
  • 9.6 CHAPTER SUMMARY
  • 9.7 KEY TERMS
  • CHAPTER 10 ■ Building Neural Networks with RapidMiner
  • CHAPTER OBJECTIVES
  • 10.1 MODELING THE EXCLUSIVE-OR FUNCTION
  • 10.2 MINING SATELLITE IMAGE DATA
  • 10.3 PREDICTING CUSTOMER CHURN
  • 10.4 RAPIDMINER’S SELF-ORGANIZING MAP OPERATOR
  • 10.5 CHAPTER SUMMARY
  • SECTION IV Advanced Data Mining Techniques
  • CHAPTER 11 ■ Supervised Statistical Techniques
  • CHAPTER OBJECTIVES
  • 11.1 NAÏVE BAYES CLASSIFIER
  • 11.1.1 Naïve Bayes Classifier: An Example
  • 11.1.2 Zero-Valued Attribute Counts
  • 11.1.3 Missing Data
  • 11.1.4 Numeric Data
  • 11.1.5 Implementations of the Naïve Bayes Classifier
  • 11.1.6 General Considerations
  • 11.2 SUPPORT VECTOR MACHINES
  • 11.2.1 Linearly Separable Classes
  • 11.2.2 The Nonlinear Case
  • 11.2.3 General Considerations
  • 11.2.4 Implementations of Support Vector Machines
  • 11.3 LINEAR REGRESSION ANALYSIS
  • 11.3.1 Simple Linear Regression
  • 11.3.2 Multiple Linear Regression
  • 11.3.2.1 Linear Regression—Weka
  • 11.3.2.2 Linear Regression—RapidMiner
  • 11.4 REGRESSION TREES
  • 11.5 LOGISTIC REGRESSION
  • 11.5.1 Transforming the Linear Regression Model
  • 11.5.2 The Logistic Regression Model
  • 11.6 CHAPTER SUMMARY
  • 11.7 KEY TERMS
  • CHAPTER 12 ■ Unsupervised Clustering Techniques
  • CHAPTER OBJECTIVES
  • 12.1 AGGLOMERATIVE CLUSTERING
  • 12.1.1 Agglomerative Clustering: An Example
  • 12.1.2 General Considerations
  • 12.2 CONCEPTUAL CLUSTERING
  • 12.2.1 Measuring Category Utility
  • 12.2.2 Conceptual Clustering: An Example
  • 12.2.3 General Considerations
  • 12.3 EXPECTATION MAXIMIZATION
  • 12.3.1 Implementations of the EM Algorithm
  • 12.3.2 General Considerations
  • 12.4 GENETIC ALGORITHMS AND UNSUPERVISED CLUSTERING
  • 12.5 CHAPTER SUMMARY
  • 12.6 KEY TERMS
  • CHAPTER 13 ■ Specialized Techniques
  • CHAPTER OBJECTIVES
  • 13.1 TIME-SERIES ANALYSIS
  • 13.1.1 Stock Market Analytics
  • 13.1.2 Time-Series Analysis—An Example
  • 13.1.2.1 Creating the Target Data Set—Numeric Output
  • 13.1.2.2 Data Preprocessing and Transformation
  • 13.1.2.3 Creating the Target Data Set—Categorical Output
  • 13.1.2.4 Mining the Data—RapidMiner
  • 13.1.2.5 Mining the Data—Weka
  • 13.1.2.6 Interpretation, Evaluation, and Action
  • 13.1.3 General Considerations
  • 13.2 MINING THE WEB
  • 13.2.1 Web-Based Mining: General Issues
  • 13.2.1.1 Identifying the Goal
  • 13.2.2 Preparing the Data
  • 13.2.2.1 Mining the Data
  • 13.2.2.2 Interpreting and Evaluating Results
  • 13.2.2.3 Taking Action
  • 13.2.3 Data Mining for Website Evaluation
  • 13.2.4 Data Mining for Personalization
  • 13.2.5 Data Mining for Website Adaptation
  • 13.2.6 PageRank and Link Analysis
  • 13.2.7 Operators for Web-Based Mining
  • 13.3 MINING TEXTUAL DATA
  • 13.3.1 Analyzing Customer Reviews
  • 13.4 TECHNIQUES FOR LARGE-SIZED, IMBALANCED, AND STREAMING DATA
  • 13.4.1 Large-Sized Data
  • 13.4.2 Dealing with Imbalanced Data
  • 13.4.2.1 Methods for Addressing Rarity
  • 13.4.2.2 Receiver Operating Characteristics Curves
  • 13.4.3 Methods for Streaming Data
  • 13.5 ENSEMBLE TECHNIQUES FOR IMPROVING PERFORMANCE
  • 13.5.1 Bagging
  • 13.5.2 Boosting
  • 13.5.3 AdaBoost—An Example
  • 13.6 CHAPTER SUMMARY
  • 13.7 KEY TERMS
  • CHAPTER 14 ■ The Data Warehouse
  • CHAPTER OBJECTIVES
  • 14.1 OPERATIONAL DATABASES
  • 14.1.1 Data Modeling and Normalization
  • 14.1.2 The Relational Model
  • 14.2 DATA WAREHOUSE DESIGN
  • 14.2.1 Entering Data into the Warehouse
  • 14.2.2 Structuring the Data Warehouse: The Star Schema
  • 14.2.2.1 The Multidimensionality of the Star Schema
  • 14.2.2.2 Additional Relational Schemas
  • 14.2.3 Decision Support: Analyzing the Warehouse Data
  • 14.3 ONLINE ANALYTICAL PROCESSING
  • 14.3.1 OLAP: An Example
  • 14.3.2 General Considerations
  • 14.4 EXCEL PIVOT TABLES FOR DATA ANALYTICS
  • 14.5 CHAPTER SUMMARY
  • 14.6 KEY TERMS
  • APPENDIX A—SOFTWARE AND DATA SETS FOR DATA MINING
  • APPENDIX B—STATISTICS FOR PERFORMANCE EVALUATION
  • BIBLIOGRAPHY
  • INDEX

Additional information

Product option: E-book (to own)
