Description
Table of Contents
- Preface
- Contents
- 1 Data Mining and Information Systems: Quo Vadis?
- Robert Stahlbock, Stefan Lessmann, and Sven F. Crone
- 1.1 Introduction
- 1.2 Special Issues in Data Mining
- 1.2.1 Confirmatory Data Analysis
- 1.2.2 Knowledge Discovery from Supervised Learning
- 1.2.3 Classification Analysis
- 1.2.4 Hybrid Data Mining Procedures
- 1.2.5 Web Mining
- 1.2.6 Privacy-Preserving Data Mining
- 1.3 Conclusion and Outlook
- References
- Part I Confirmatory Data Analysis
- 2 Response-Based Segmentation Using Finite Mixture Partial Least Squares
- Christian M. Ringle, Marko Sarstedt, and Erik A. Mooi
- 2.1 Introduction
- 2.1.1 On the Use of PLS Path Modeling
- 2.1.2 Problem Statement
- 2.1.3 Objectives and Organization
- 2.2 Partial Least Squares Path Modeling
- 2.3 Finite Mixture Partial Least Squares Segmentation
- 2.3.1 Foundations
- 2.3.2 Methodology
- 2.3.3 Systematic Application of FIMIX-PLS
- 2.4 Application of FIMIX-PLS
- 2.4.1 On Measuring Customer Satisfaction
- 2.4.2 Data and Measures
- 2.4.3 Data Analysis and Results
- 2.5 Summary and Conclusion
- References
- Part II Knowledge Discovery from Supervised Learning
- 3 Building Acceptable Classification Models
- David Martens and Bart Baesens
- 3.1 Introduction
- 3.2 Comprehensibility of Classification Models
- 3.2.1 Measuring Comprehensibility
- 3.2.2 Obtaining Comprehensible Classification Models
- 3.2.2.1 Building Rule-Based Models
- 3.2.2.2 Combining Output Types
- 3.2.2.3 Visualization
- 3.3 Justifiability of Classification Models
- 3.3.1 Taxonomy of Constraints
- 3.3.2 Monotonicity Constraint
- 3.3.3 Measuring Justifiability
- 3.3.4 Obtaining Justifiable Classification Models
- 3.4 Conclusion
- References
- 4 Mining Interesting Rules Without Support Requirement: A General Universal Existential Upward Closu
- Yannick Le Bras, Philippe Lenca, and Stéphane Lallich
- 4.1 Introduction
- 4.2 State of the Art
- 4.3 An Algorithmic Property of Confidence
- 4.3.1 On UEUC Framework
- 4.3.2 The UEUC Property
- 4.3.3 An Efficient Pruning Algorithm
- 4.3.4 Generalizing the UEUC Property
- 4.4 A Framework for the Study of Measures
- 4.4.1 Adapted Functions of Measure
- 4.4.1.1 Association Rules
- 4.4.1.2 Contingency Tables
- 4.4.2 Expression of a Set of Measures of Ddconf
- 4.5 Conditions for GUEUC
- 4.5.1 A Sufficient Condition
- 4.5.2 A Necessary Condition
- 4.5.3 Classification of the Measures
- 4.6 Conclusion
- References
- 5 Classification Techniques and Error Control in Logic Mining
- Giovanni Felici, Bruno Simeone, and Vincenzo Spinelli
- 5.1 Introduction
- 5.2 Brief Introduction to Box Clustering
- 5.3 BC-Based Classifier
- 5.4 Best Choice of a Box System
- 5.5 Bi-criterion Procedure for BC-Based Classifier
- 5.6 Examples
- 5.6.1 The Data Sets
- 5.6.2 Experimental Results with BC
- 5.6.3 Comparison with Decision Trees
- 5.7 Conclusions
- References
- Part III Classification Analysis
- 6 An Extended Study of the Discriminant Random Forest
- Tracy D. Lemmond, Barry Y. Chen, Andrew O. Hatch, and William G. Hanley
- 6.1 Introduction
- 6.2 Random Forests
- 6.3 Discriminant Random Forests
- 6.3.1 Linear Discriminant Analysis
- 6.3.2 The Discriminant Random Forest Methodology
- 6.4 DRF and RF: An Empirical Study
- 6.4.1 Hidden Signal Detection
- 6.4.1.1 Training on T1, Testing on J2
- 6.4.1.2 Prediction Performance for J2 with Cross-validation
- 6.4.2 Radiation Detection
- 6.4.3 Significance of Empirical Results
- 6.4.4 Small Samples and Early Stopping
- 6.4.5 Expected Cost
- 6.5 Conclusions
- References
- 7 Prediction with the SVM Using Test Point Margins
- Süreyya Özögür-Akyüz, Zakria Hussain, and John Shawe-Taylor
- 7.1 Introduction
- 7.2 Methods
- 7.3 Data Set Description
- 7.4 Results
- 7.5 Discussion and Future Work
- References
- 8 Effects of Oversampling Versus Cost-Sensitive Learning for Bayesian and SVM Classifiers
- Alexander Liu, Cheryl Martin, Brian La Cour, and Joydeep Ghosh
- 8.1 Introduction
- 8.2 Resampling
- 8.2.1 Random Oversampling
- 8.2.2 Generative Oversampling
- 8.3 Cost-Sensitive Learning
- 8.4 Related Work
- 8.5 A Theoretical Analysis of Oversampling Versus Cost-Sensitive Learning
- 8.5.1 Bayesian Classification
- 8.5.2 Resampling Versus Cost-Sensitive Learning in Bayesian Classifiers
- 8.5.3 Effect of Oversampling on Gaussian Naive Bayes
- 8.5.3.1 Random Oversampling
- 8.5.3.2 Generative Oversampling
- 8.5.3.3 Comparison to Cost-Sensitive Learning
- 8.5.4 Effects of Oversampling for Multinomial Naive Bayes
- 8.6 Empirical Comparison of Resampling and Cost-Sensitive Learning
- 8.6.1 Explaining Empirical Differences Between Resampling and Cost-Sensitive Learning
- 8.6.2 Naive Bayes Comparisons on Low-Dimensional Gaussian Data
- 8.6.2.1 Gaussian Naive Bayes on Artificial, Low-Dimensional Data
- 8.6.2.2 A Note on ROC and AUC
- 8.6.3 Multinomial Naive Bayes
- 8.6.4 SVMs
- 8.6.5 Discussion
- 8.7 Conclusion
- Appendix
- References
- 9 The Impact of Small Disjuncts on Classifier Learning
- Gary M. Weiss
- 9.1 Introduction
- 9.2 An Example: The Vote Data Set
- 9.3 Description of Experiments
- 9.4 The Problem with Small Disjuncts
- 9.5 The Effect of Pruning on Small Disjuncts
- 9.6 The Effect of Training Set Size on Small Disjuncts
- 9.7 The Effect of Noise on Small Disjuncts
- 9.8 The Effect of Class Imbalance on Small Disjuncts
- 9.9 Related Work
- 9.10 Conclusion
- References
- Part IV Hybrid Data Mining Procedures
- 10 Predicting Customer Loyalty Labels in a Large Retail Database: A Case Study in Chile
- Cristián J. Figueroa
- 10.1 Introduction
- 10.2 Related Work
- 10.3 Objectives of the Study
- 10.3.1 Supervised and Unsupervised Learning
- 10.3.2 Unsupervised Algorithms
- 10.3.2.1 Self-Organizing Map
- 10.3.2.2 Sammon Mapping
- 10.3.2.3 Curvilinear Component Analysis
- 10.3.3 Variables for Segmentation
- 10.3.4 Exploratory Data Analysis
- 10.3.5 Results of the Segmentation
- 10.4 Results of the Classifier
- 10.5 Business Validation
- 10.5.1 In-Store Minutes Charges for Prepaid Cell Phones
- 10.5.2 Distribution of Products in the Store
- 10.6 Conclusions and Discussion
- Appendix
- References
- 11 PCA-Based Time Series Similarity Search
- Leonidas Karamitopoulos, Georgios Evangelidis, and Dimitris Dervos
- 11.1 Introduction
- 11.2 Background
- 11.2.1 Review of PCA
- 11.2.2 Implications of PCA in Similarity Search
- 11.2.3 Related Work
- 11.3 Proposed Approach
- 11.4 Experimental Methodology
- 11.4.1 Data Sets
- 11.4.2 Evaluation Methods
- 11.4.3 Rival Measures
- 11.5 Results
- 11.5.1 1-NN Classification
- 11.5.2 k-NN Similarity Search
- 11.5.3 Speeding Up the Calculation of APEdist
- 11.6 Conclusion
- References
- 12 Evolutionary Optimization of Least-Squares Support Vector Machines
- Arjan Gijsberts, Giorgio Metta, and Léon Rothkrantz
- 12.1 Introduction
- 12.2 Kernel Machines
- 12.2.1 Least-Squares Support Vector Machines
- 12.2.2 Kernel Functions
- 12.2.2.1 Conditions for Kernels
- 12.3 Evolutionary Computation
- 12.3.1 Genetic Algorithms
- 12.3.2 Evolution Strategies
- 12.3.3 Genetic Programming
- 12.4 Related Work
- 12.4.1 Hyperparameter Optimization
- 12.4.2 Combined Kernel Functions
- 12.5 Evolutionary Optimization of Kernel Machines
- 12.5.1 Hyperparameter Optimization
- 12.5.2 Kernel Construction
- 12.5.3 Objective Function
- 12.6 Results
- 12.6.1 Data Sets
- 12.6.2 Results for Hyperparameter Optimization
- 12.6.3 Results for EvoKMGP
- 12.7 Conclusions and Future Work
- References
- 13 Genetically Evolved kNN Ensembles
- Ulf Johansson, Rikard König, and Lars Niklasson
- 13.1 Introduction
- 13.2 Background and Related Work
- 13.3 Method
- 13.3.1 Data sets
- 13.4 Results
- 13.5 Conclusions
- References
- Part V Web-Mining
- 14 Behaviorally Founded Recommendation Algorithm for Browsing Assistance Systems
- Peter Géczy, Noriaki Izumi, Shotaro Akaho, and Kôiti Hasida
- 14.1 Introduction
- 14.1.1 Related Works
- 14.1.2 Our Contribution and Approach
- 14.2 Concept Formalization
- 14.3 System Design
- 14.3.1 A Priori Knowledge of Human–System Interactions
- 14.3.2 Strategic Design Factors
- 14.3.3 Recommendation Algorithm Derivation
- 14.4 Practical Evaluation
- 14.4.1 Intranet Portal
- 14.4.2 System Evaluation
- 14.4.3 Practical Implications and Limitations
- 14.5 Conclusions and Future Work
- References
- 15 Using Web Text Mining to Predict Future Events: A Test of the Wisdom of Crowds Hypothesis
- Scott Ryan and Lutz Hamel
- 15.1 Introduction
- 15.2 Method
- 15.2.1 Hypotheses and Goals
- 15.2.2 General Methodology
- 15.2.3 The 2006 Congressional and Gubernatorial Elections
- 15.2.4 Sporting Events and Reality Television Programs
- 15.2.5 Movie Box Office Receipts and Music Sales
- 15.2.6 Replication
- 15.3 Results and Discussion
- 15.3.1 The 2006 Congressional and Gubernatorial Elections
- 15.3.2 Sporting Events and Reality Television Programs
- 15.3.3 Movie and Music Album Results
- 15.4 Conclusion
- References
- Part VI Privacy-Preserving Data Mining
- 16 Avoiding Attribute Disclosure with the (Extended) p-Sensitive k-Anonymity Model
- Traian Marius Truta and Alina Campan
- 16.1 Introduction
- 16.2 Privacy Models and Algorithms
- 16.2.1 The p-Sensitive k-Anonymity Model and Its Extension
- 16.2.2 Algorithms for the p-Sensitive k-Anonymity Model
- 16.3 Experimental Results
- 16.3.1 Experiments for p-Sensitive k-Anonymity
- 16.3.2 Experiments for Extended p-Sensitive k-Anonymity
- 16.4 New Enhanced Models Based on p-Sensitive k-Anonymity
- 16.4.1 Constrained p-Sensitive k-Anonymity
- 16.4.2 p-Sensitive k-Anonymity in Social Networks
- 16.5 Conclusions and Future Work
- References
- 17 Privacy-Preserving Random Kernel Classification of Checkerboard Partitioned Data
- Olvi L. Mangasarian and Edward W. Wild
- 17.1 Introduction
- 17.2 Privacy-Preserving Linear Classifier for Checkerboard Partitioned Data
- 17.3 Privacy-Preserving Nonlinear Classifier for Checkerboard Partitioned Data
- 17.4 Computational Results
- 17.5 Conclusion and Outlook
- References