Table of Contents
- FRONT MATTER
- Contents
- Preface
- Overview of the Book
- Distinctive Features of the Book
- Teaching Support
- Acknowledgments
- Author
- Part I An Overview of Data Mining
- 1 Introduction to Data, Data Patterns, and Data Mining
- 1.1 Examples of Small Data Sets
- TABLE 1.1 Balloon Data Set
- TABLE 1.2 Space Shuttle O-Ring Data Set
- 1.2 Types of Data Variables
- 1.2.1 Attribute Variable versus Target Variable
- TABLE 1.3 Lenses Data Set
- TABLE 1.4 Data Set for a Manufacturing System to Detect and Diagnose Faults
- FIGURE 1.1 A manufacturing system with nine machines and production flows of parts.
- 1.2.2 Categorical Variable versus Numeric Variable
- 1.3 Data Patterns Learned through Data Mining
- 1.3.1 Classification and Prediction Patterns
- FIGURE 1.2 The fitted linear model relating Launch Temperature to Number of O-Rings with Stress in the space shuttle O-ring data set.
- TABLE 1.5 Predicted Value of O-Rings with Stress
- 1.3.2 Cluster and Association Patterns
- FIGURE 1.3 Clustering of 10 data records in the data set of a manufacturing system.
- 1.3.3 Data Reduction Patterns
- FIGURE 1.4 Reduction of a two-dimensional data set to a one-dimensional data set.
- 1.3.4 Outlier and Anomaly Patterns
- FIGURE 1.5 Frequency histogram of Launch Temperature in the space shuttle data set.
- 1.3.5 Sequential and Temporal Patterns
- FIGURE 1.6 Temperature in each quarter of a 3-year period.
- TABLE 1.6 Test Data Set for a Manufacturing System to Detect and Diagnose Faults
- 1.4 Training Data and Test Data
- Exercises
- Part II Algorithms for Mining Classification and Prediction Patterns
- 2 Linear and Nonlinear Regression Models
- 2.1 Linear Regression Models
- FIGURE 2.1 Illustration of a simple regression model.
- 2.2 Least-Squares Method and Maximum Likelihood Method of Parameter Estimation
- Example 2.1
- TABLE 2.1 Data Set of O-Rings with Stress along with the Predicted Target Value from the Linear Regression
- TABLE 2.2 Calculation for Estimating the Parameters of the Linear Model in Example 2.1
- 2.3 Nonlinear Regression Models and Parameter Estimation
- 2.4 Software and Applications
- Exercises
- 3 Naïve Bayes Classifier
- 3.1 Bayes Theorem
- 3.2 Classification Based on the Bayes Theorem and Naïve Bayes Classifier
- Example 3.1
- TABLE 3.1 Training Data Set for System Fault Detection
- TABLE 3.2 Classification of Data Records in the Testing Data Set for System Fault Detection
- 3.3 Software and Applications
- Exercises
- 4 Decision and Regression Trees
- 4.1 Learning a Binary Decision Tree and Classifying Data Using a Decision Tree
- 4.1.1 Elements of a Decision Tree
- TABLE 4.1 Data Set for System Fault Detection
- FIGURE 4.1 Decision tree for system fault detection.
- 4.1.2 Decision Tree with the Minimum Description Length
- 4.1.3 Split Selection Methods
- TABLE 4.2 Binary Split of the Root Node and Calculation of Information Entropy for the Data Set of System Fault Detection
- FIGURE 4.2 Information entropy.
- 4.1.4 Algorithm for the Top-Down Construction of a Decision Tree
- TABLE 4.3 Binary Split of the Root Node and Calculation of the Gini-Index for the Data Set of System Fault Detection
- Example 4.1
- TABLE 4.4 Binary Split of an Internal Node with D = {2, 4, 5, 9, 10} and Calculation of Information Entropy for the Data Set of System Fault Detection
- TABLE 4.5 Binary Split of an Internal Node with D = {2, 4, 5, 9, 10} and Calculation of the Gini-Index Values for the Data Set of System Fault Detection
- 4.1.5 Classifying Data Using a Decision Tree
- FIGURE 4.3 Classifying a data record for no system fault using the decision tree for system fault detection.
- TABLE 4.6 Classification of Data Records in the Testing Data Set for System Fault Detection
- FIGURE 4.4 Classifying a data record for multiple machine faults using the decision tree for system fault detection.
- 4.2 Learning a Nonbinary Decision Tree
- Example 4.2
- FIGURE 4.5 Decision tree for the lenses data set.
- TABLE 4.7 Nonbinary Split of the Root Node and Calculation of Information Entropy for the Lenses Data Set
- TABLE 4.8 Nonbinary Split of an Internal Node, {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24}, and Calculation of Information Entropy for the Lenses Data Set
- TABLE 4.9 Nonbinary Split of an Internal Node, {2, 6, 10, 14, 18, 22}, and Calculation of Information Entropy for the Lenses Data Set
- TABLE 4.10 Nonbinary Split of an Internal Node, {4, 8, 12, 16, 20, 24}, and Calculation of Information Entropy for the Lenses Data Set
- 4.3 Handling Numeric and Missing Values of Attribute Variables
- 4.4 Handling a Numeric Target Variable and Constructing a Regression Tree
- 4.5 Advantages and Shortcomings of the Decision Tree Algorithm
- FIGURE 4.6 Decision tree for the balloon data set.
- 4.6 Software and Applications
- Exercises
- 5 Artificial Neural Networks for Classification and Prediction
- 5.1 Processing Units of ANNs
- FIGURE 5.1 Processing unit of ANN.
- FIGURE 5.2 Examples of transfer functions.
- TABLE 5.1 AND Function
- FIGURE 5.3 Implementation of the AND function using one processing unit.
- TABLE 5.2 OR Function
- FIGURE 5.4 Implementation of the OR function using one processing unit.
- 5.2 Architectures of ANNs
- FIGURE 5.5 Architecture of a one-layer feedforward ANN.
- FIGURE 5.6 Architecture of a two-layer feedforward ANN.
- FIGURE 5.7 A two-layer feedforward ANN to implement the XOR function.
- TABLE 5.3 XOR Function
- FIGURE 5.8 Architecture of a recurrent ANN.
- 5.3 Methods of Determining Connection Weights for a Perceptron
- 5.3.1 Perceptron
- 5.3.2 Properties of a Processing Unit
- FIGURE 5.9 Example of the decision boundary and the separation of the input space into two regions by a processing unit.
- 5.3.3 Graphical Method of Determining Connection Weights and Biases
- Example 5.1
- FIGURE 5.10 Illustration of the graphical method to determine connection weights.
- 5.3.4 Learning Method of Determining Connection Weights and Biases
- FIGURE 5.11 Illustration of the learning method to change connection weights.
- 5.3.5 Limitation of a Perceptron
- FIGURE 5.12 Four data points of the XOR function.
- TABLE 5.4 Function of Each Processing Unit in a Two-Layer ANN to Implement the XOR Function
- TABLE 5.5 NOT Function
- 5.4 Back-Propagation Learning Method for a Multilayer Feedforward ANN
- Example 5.2
- FIGURE 5.13 A set of weights with randomly assigned values in a two-layer feedforward ANN for the XOR function.
- FIGURE 5.14 Effect of the learning rate.
- 5.5 Empirical Selection of an ANN Architecture for a Good Fit to Data
- FIGURE 5.15 An illustration of a nonlinear model overfitting to data from a linear model.
- 5.6 Software and Applications
- Exercises
- 6 Support Vector Machines
- 6.1 Theoretical Foundation for Formulating and Solving an Optimization Problem to Learn a Classification Function
- 6.2 SVM Formulation for a Linear Classifier and a Linearly Separable Problem
- FIGURE 6.1 SVM for a linear classifier and a linearly separable problem. (a) A decision boundary with a large margin. (b) A decision boundary with a small margin.
- 6.3 Geometric Interpretation of the SVM Formulation for the Linear Classifier
- 6.4 Solution of the Quadratic Programming Problem for a Linear Classifier
- Example 6.1
- TABLE 6.1 AND Function
- FIGURE 6.2 Decision function and support vectors for the SVM linear classifier in Example 6.1.
- 6.5 SVM Formulation for a Linear Classifier and a Nonlinearly Separable Problem
- 6.6 SVM Formulation for a Nonlinear Classifier and a Nonlinearly Separable Problem
- FIGURE 6.3 A polynomial decision function in a two-dimensional space.
- FIGURE 6.4 A Gaussian radial basis function in a two-dimensional space.
- 6.7 Methods of Using SVM for Multi-Class Classification Problems
- 6.8 Comparison of ANN and SVM
- 6.9 Software and Applications
- Exercises
- 7 k-Nearest Neighbor Classifier and Supervised Clustering
- 7.1 k-Nearest Neighbor Classifier
- Example 7.1
- TABLE 7.1 Training Data Set for System Fault Detection
- TABLE 7.2 Testing Data Set for System Fault Detection and the Classification Results in Examples 7.1 and 7.2
- 7.2 Supervised Clustering
- TABLE 7.3 Supervised Clustering Algorithm
- Example 7.2
- 7.3 Software and Applications
- Exercises
- Part III Algorithms for Mining Cluster and Association Patterns
- 8 Hierarchical Clustering
- 8.1 Procedure of Agglomerative Hierarchical Clustering
- 8.2 Methods of Determining the Distance between Two Clusters
- Example 8.1
- 8.3 Illustration of the Hierarchical Clustering Procedure
- Example 8.2
- TABLE 8.1 Data Set for System Fault Detection with Nine Cases of Single-Machine Faults
- FIGURE 8.1 Result of hierarchical clustering for the data set of system fault detection.
- TABLE 8.2 Distance for Each Pair of Clusters: C1, C2, C3, C4, C5, C6, C7, C8, and C9
- TABLE 8.3 Distance for Each Pair of Clusters: C1,5, C2,4, C3, C6,7, C8, and C9
- TABLE 8.4 Distance for Each Pair of Clusters: C1,5, C2,4,8, C3, C6,7, and C9
- TABLE 8.5 Distance for Each Pair of Clusters: C1,5,6,7,9, C2,4,8, and C3
- 8.4 Nonmonotonic Tree of Hierarchical Clustering
- FIGURE 8.2 An example of three data points for which the centroid linkage method produces a nonmonotonic tree of hierarchical clustering.
- FIGURE 8.3 Nonmonotonic tree of hierarchical clustering for the data points in Figure 8.2.
- 8.5 Software and Applications
- Exercises
- 9 K-Means Clustering and Density-Based Clustering
- 9.1 K-Means Clustering
- TABLE 9.1 K-Means Clustering Algorithm
- Example 9.1
- TABLE 9.2 Data Set for System Fault Detection with Nine Cases of Single-Machine Faults
- 9.2 Density-Based Clustering
- 9.3 Software and Applications
- Exercises
- 10 Self-Organizing Map
- 10.1 Algorithm of Self-Organizing Map
- FIGURE 10.1 Architectures of SOM with a (a) one-, (b) two-, and (c) three-dimensional output map.
- TABLE 10.1 Learning Algorithm of SOM
- Example 10.1
- FIGURE 10.2 Architecture of SOM for Example 10.1.
- TABLE 10.2 Data Set for System Fault Detection with Nine Cases of Single-Machine Faults
- FIGURE 10.3 The winner nodes for the nine data points in Example 10.1 using initial weight values.
- 10.2 Software and Applications
- Exercises
- 11 Probability Distributions of Univariate Data
- 11.1 Probability Distribution of Univariate Data and Probability Distribution Characteristics of Various Data Patterns
- TABLE 11.1 Values of Launch Temperature in the Space Shuttle O-Ring Data Set
- FIGURE 11.1 Frequency histogram of the Launch Temperature data.
- FIGURE 11.2 Time series data patterns and their probability distributions. (a) The data plot and histogram of a spike pattern, (b) the data plot and histogram of a random fluctuation pattern, (c) the data plot and histogram of a step change pattern, and (d) the data plot and histogram of a steady change pattern.
- 11.2 Method of Distinguishing Four Probability Distributions
- TABLE 11.2 Combinations of Skewness and Mode Test Results for Distinguishing Four Probability Distributions
- 11.3 Software and Applications
- Exercises
- 12 Association Rules
- 12.1 Definition of Association Rules and Measures of Association
- TABLE 12.1 Data Set for System Fault Detection with Nine Cases of Single-Machine Faults and Item Sets Obtained from This Data Set
- FIGURE 12.1 A manufacturing system with nine machines and production flows of parts.
- 12.2 Association Rule Discovery
- TABLE 12.2 Apriori Algorithm
- Example 12.1
- Example 12.2
- 12.3 Software and Applications
- Exercises
- 13 Bayesian Network
- 13.1 Structure of a Bayesian Network and Probability Distributions of Variables
- TABLE 13.1 Training Data Set for System Fault Detection
- FIGURE 13.1 Manufacturing system with nine machines and production flows of parts.
- FIGURE 13.2 Structure of a Bayesian network for the data set of system fault detection.
- TABLE 13.2 P(x5|x1)
- TABLE 13.3 P(x6|x3)
- TABLE 13.4 P(x4|x3, x2)
- TABLE 13.5 P(x9|x5)
- TABLE 13.6 P(x7|x5, x6)
- TABLE 13.7 P(x8|x4)
- TABLE 13.8 P(y|x9)
- TABLE 13.9 P(y|x7)
- TABLE 13.10 P(y|x8)
- TABLE 13.11 P(x1)
- TABLE 13.12 P(x2)
- TABLE 13.13 P(x3)
- Example 13.1
- 13.2 Probabilistic Inference
- Example 13.2
- Example 13.3
- 13.3 Learning of a Bayesian Network
- 13.4 Software and Applications
- Exercises
- Part IV Algorithms for Mining Data Reduction Patterns
- 14 Principal Component Analysis
- 14.1 Review of Multivariate Statistics
- Example 14.1
- TABLE 14.1 Data Set for System Fault Detection with Two Quality Variables
- TABLE 14.2 Joint and Marginal Probabilities of Two Quality Variables
- 14.2 Review of Matrix Algebra
- FIGURE 14.1 Computation of the length of a vector.
- FIGURE 14.2 Computation of the angle between two vectors.
- Example 14.2
- Example 14.3
- Example 14.4
- 14.3 Principal Component Analysis
- Example 14.5
- 14.4 Software and Applications
- Exercises
- 15 Multidimensional Scaling
- 15.1 Algorithm of MDS
- TABLE 15.1 MDS Algorithm
- TABLE 15.2 Monotone Regression Algorithm
- Example 15.1
- TABLE 15.3 Data Set for System Fault Detection with Three Cases of Single-Machine Faults
- TABLE 15.4 Euclidean Distance for Each Pair of Data Points
- 15.2 Number of Dimensions
- FIGURE 15.1 An example of plotting the stress of an MDS result versus the number of dimensions.
- 15.3 INDSCAL for Weighted MDS
- 15.4 Software and Applications
- Exercises
- Part V Algorithms for Mining Outlier and Anomaly Patterns
- 16 Univariate Control Charts
- 16.1 Shewhart Control Charts
- TABLE 16.1 Samples of Data Observations
- 16.2 CUSUM Control Charts
- Example 16.1
- TABLE 16.2 Data Observations of the Launch Temperature from the Data Set of O-Rings with Stress along with Statistics for the Two-Side CUSUM Control Chart
- FIGURE 16.1 Two-side CUSUM control chart for the launch temperature in the data set of O-rings with stress.
- 16.3 EWMA Control Charts
- FIGURE 16.2 Exponentially decreasing weights on data observations.
- Example 16.2
- TABLE 16.3 Data Observations of the Launch Temperature from the Data Set of O-Rings with Stress along with the EWMA Statistic for the EWMA Control Chart
- FIGURE 16.3 EWMA control chart to monitor the launch temperature from the data set of O-rings with stress.
- 16.4 Cuscore Control Charts
- 16.5 Receiver Operating Characteristic (ROC) for Evaluation and Comparison of Control Charts
- TABLE 16.4 Pairs of the False Alarm Rate and the Hit Rate for Various Values of the Decision Threshold H for the Two-Side CUSUM Control Chart in Example 16.1
- FIGURE 16.4 ROC for the two-side CUSUM control chart in Example 16.1.
- 16.6 Software and Applications
- Exercises
- 17 Multivariate Control Charts
- 17.1 Hotelling’s T2 Control Charts
- FIGURE 17.1 An illustration of statistical distance measured by Hotelling's T2 and control limits of Hotelling's T2 control charts and univariate control charts.
- Example 17.1
- TABLE 17.1 Data Set for System Fault Detection with Two Quality Variables
- 17.2 Multivariate EWMA Control Charts
- 17.3 Chi-Square Control Charts
- 17.4 Applications
- Exercises
- Part VI Algorithms for Mining Sequential and Temporal Patterns
- 18 Autocorrelation and Time Series Analysis
- 18.1 Autocorrelation
- 18.2 Stationarity and Nonstationarity
- 18.3 ARMA Models of Stationary Series Data
- TABLE 18.1 Time Series of an AR(1) Model with ϕ1 = 0.9, x0 = 3, and a White Noise Process for et
- FIGURE 18.1 Time series data generated using an AR(1) model with ϕ1 = 0.9 and a white noise process for et.
- TABLE 18.2 Time Series of an MA(1) Model with θ1 = 0.9 and a White Noise Process for et
- FIGURE 18.2 Time series data generated using an MA(1) model with θ1 = 0.9 and a white noise process for et.
- 18.4 ACF and PACF Characteristics of ARMA Models
- 18.5 Transformations of Nonstationary Series Data and ARIMA Models
- 18.6 Software and Applications
- Exercises
- 19 Markov Chain Models and Hidden Markov Models
- 19.1 Markov Chain Models
- Example 19.1
- FIGURE 19.1 States and state transitions in Example 19.1.
- 19.2 Hidden Markov Models
- FIGURE 19.2 Any path method and the best path method for a hidden Markov model.
- 19.3 Learning Hidden Markov Models
- Example 19.2
- 19.4 Software and Applications
- Exercises
- 20 Wavelet Analysis
- 20.1 Definition of Wavelet
- FIGURE 20.1 The scaling function and the wavelet function of the Haar wavelet and the dilation and shift effects.
- 20.2 Wavelet Transform of Time Series Data
- FIGURE 20.2 A sample of time series data from (a) a function, (b) a sample of data points taken from a function, and (c) an approximation of the function using the scaling function of Haar wavelet.
- Example 20.1
- FIGURE 20.3 Graphic illustration of the Paul wavelet, the DoG wavelet, the Daubechies wavelet, and the Morlet wavelet. (Ye, N., Secure Computer and Network Systems: Modeling, Analysis and Design, 2008, Figure 11.2, p. 200. Copyright Wiley-VCH Verlag GmbH & Co. KGaA. Reproduced with permission).
- 20.3 Reconstruction of Time Series Data from Wavelet Coefficients
- Example 20.2
- 20.4 Software and Applications
- Exercises
- References
- Index