Table of Contents
- Half-title
- Reviews
- Title page
- Copyright information
- Brief Contents
- Table of contents
- About the Authors
- Preface
- Who This Book is For
- Topics Covered in this Book
- How to Read this Book
- Cross-Chapter Case Study: Sober
- Additional Material
- Acknowledgments
- Sober: 1000‰ Driven by Technology
- Part I Databases and Database Design
- 1 Fundamental Concepts of Database Management
- 1.1 Applications of Database Technology
- 1.2 Key Definitions
- 1.3 File versus Database Approach to Data Management
- 1.3.1 The File-Based Approach
- 1.3.2 The Database Approach
- 1.4 Elements of a Database System
- 1.4.1 Database Model versus Instances
- 1.4.2 Data Model
- 1.4.3 The Three-Layer Architecture
- 1.4.4 Catalog
- 1.4.5 Database Users
- 1.4.6 Database Languages
- 1.5 Advantages of Database Systems and Database Management
- 1.5.1 Data Independence
- 1.5.2 Database Modeling
- 1.5.3 Managing Structured, Semi-Structured, and Unstructured Data
- 1.5.4 Managing Data Redundancy
- 1.5.5 Specifying Integrity Rules
- 1.5.6 Concurrency Control
- 1.5.7 Backup and Recovery Facilities
- 1.5.8 Data Security
- 1.5.9 Performance Utilities
- Summary
- Problems and Exercises
- 2 Architecture and Categorization of DBMSs
- 2.1 Architecture of a DBMS
- 2.1.1 Connection and Security Manager
- 2.1.2 DDL Compiler
- 2.1.3 Query Processor
- 2.1.3.1 DML Compiler
- 2.1.3.2 Query Parser and Query Rewriter
- 2.1.3.3 Query Optimizer
- 2.1.3.4 Query Executor
- 2.1.4 Storage Manager
- 2.1.4.1 Transaction Manager
- 2.1.4.2 Buffer Manager
- 2.1.4.3 Lock Manager
- 2.1.4.4 Recovery Manager
- 2.1.5 DBMS Utilities
- 2.1.6 DBMS Interfaces
- 2.2 Categorization of DBMSs
- 2.2.1 Categorization Based on Data Model
- 2.2.1.1 Hierarchical DBMSs
- 2.2.1.2 Network DBMSs
- 2.2.1.3 Relational DBMSs
- 2.2.1.4 Object-Oriented DBMSs
- 2.2.1.5 Object-Relational/Extended Relational DBMSs
- 2.2.1.6 XML DBMSs
- 2.2.1.7 NoSQL DBMSs
- 2.2.2 Categorization Based on Degree of Simultaneous Access
- 2.2.3 Categorization Based on Architecture
- 2.2.4 Categorization Based on Usage
- Summary
- Problems and Exercises
- 3 Conceptual Data Modeling Using the (E)ER Model and UML Class Diagram
- 3.1 Phases of Database Design
- 3.2 The Entity Relationship Model
- 3.2.1 Entity Types
- 3.2.2 Attribute Types
- 3.2.3.1 Domains
- 3.2.3.2 Key Attribute Types
- 3.2.3.3 Simple versus Composite Attribute Types
- 3.2.3.4 Single-Valued versus Multi-Valued Attribute Types
- 3.2.3.5 Derived Attribute Type
- 3.2.4 Relationship Types
- 3.2.4.1 Degree and Roles
- 3.2.4.2 Cardinalities
- 3.2.4.3 Relationship Attribute Types
- 3.2.5 Weak Entity Types
- 3.2.6 Ternary Relationship Types
- 3.2.7 Examples of the ER Model
- 3.2.8 Limitations of the ER Model
- 3.3 The Enhanced Entity Relationship (EER) Model
- 3.3.1 Specialization/Generalization
- 3.3.2 Categorization
- 3.3.3 Aggregation
- 3.3.4 Examples of the EER Model
- 3.3.5 Designing an EER Model
- 3.4 The UML Class Diagram
- 3.4.1 Recap of Object Orientation
- 3.4.2 Classes
- 3.4.3 Variables
- 3.4.4 Access Modifiers
- 3.4.5 Associations
- 3.4.5.1 Association Class
- 3.4.5.2 Unidirectional versus Bidirectional Association
- 3.4.5.3 Qualified Association
- 3.4.6 Specialization/Generalization
- 3.4.7 Aggregation
- 3.4.8 UML Example
- 3.4.9 Advanced UML Modeling Concepts
- 3.4.9.1 Changeability Property
- 3.4.9.2 Object Constraint Language (OCL)
- 3.4.9.3 Dependency Relationship
- 3.4.10 UML versus EER
- Summary
- Problems and Exercises
- 4 Organizational Aspects of Data Management
- 4.1 Data Management
- 4.1.1 Catalogs and the Role of Metadata
- 4.1.2 Metadata Modeling
- 4.1.3 Data Quality
- 4.1.3.1 Data Quality Dimensions
- Accuracy
- Completeness
- Consistency
- Accessibility
- 4.1.3.2 Data Quality Problems
- 4.1.4 Data Governance
- 4.2 Roles in Data Management
- 4.2.1 Information Architect
- 4.2.2 Database Designer
- 4.2.3 Data Owner
- 4.2.4 Data Steward
- 4.2.5 Database Administrator
- 4.2.6 Data Scientist
- Summary
- Problems and Exercises
- Part II Types of Database Systems
- 5 Legacy Databases
- 5.1 The Hierarchical Model
- 5.2 The CODASYL Model
- Summary
- Problems and Exercises
- 6 Relational Databases: The Relational Model
- 6.1 The Relational Model
- 6.1.1 Basic Concepts
- 6.1.2 Formal Definitions
- 6.1.3 Types of Keys
- 6.1.3.1 Superkeys and Keys
- 6.1.3.2 Candidate Keys, Primary Keys, and Alternative Keys
- 6.1.3.3 Foreign Keys
- 6.1.4 Relational Constraints
- 6.1.5 Example Relational Data Model
- 6.2 Normalization
- 6.2.1 Insertion, Deletion, and Update Anomalies in an Unnormalized Relational Model
- 6.2.2 Informal Normalization Guidelines
- 6.2.3 Functional Dependencies and Prime Attribute Types
- 6.2.4 Normalization Forms
- 6.2.4.1 First Normal Form (1NF)
- 6.2.4.2 Second Normal Form (2NF)
- 6.2.4.3 Third Normal Form (3NF)
- 6.2.4.4 Boyce–Codd Normal Form (BCNF)
- 6.2.4.5 Fourth Normal Form (4NF)
- 6.3 Mapping a Conceptual ER Model to a Relational Model
- 6.3.1 Mapping Entity Types
- 6.3.2 Mapping Relationship Types
- 6.3.2.1 Mapping a Binary 1:1 Relationship Type
- 6.3.2.2 Mapping a Binary 1:N Relationship Type
- 6.3.2.3 Mapping a Binary M:N Relationship Type
- 6.3.2.4 Mapping Unary Relationship Types
- 6.3.2.5 Mapping n-ary Relationship Types
- 6.3.3 Mapping Multi-Valued Attribute Types
- 6.3.4 Mapping Weak Entity Types
- 6.3.5 Putting it All Together
- 6.4 Mapping a Conceptual EER Model to a Relational Model
- 6.4.1 Mapping an EER Specialization
- 6.4.2 Mapping an EER Categorization
- 6.4.3 Mapping an EER Aggregation
- Summary
- Problems and Exercises
- 7 Relational Databases: Structured Query Language (SQL)
- 7.1 Relational Database Management Systems and SQL
- 7.1.1 Key Characteristics of SQL
- 7.1.2 Three-Layer Database Architecture
- 7.2 SQL Data Definition Language
- 7.2.1 Key DDL Concepts
- 7.2.2 DDL Example
- 7.2.3 Referential Integrity Constraints
- 7.2.4 DROP and ALTER Command
- 7.3 SQL Data Manipulation Language
- 7.3.1 SQL SELECT Statement
- 7.3.1.1 Simple Queries
- 7.3.1.2 Queries with Aggregate Functions
- 7.3.1.3 Queries with GROUP BY/HAVING
- 7.3.1.4 Queries with ORDER BY
- 7.3.1.5 Join Queries
- Inner Joins
- Outer Joins
- 7.3.1.6 Nested Queries
- 7.3.1.7 Correlated Queries
- 7.3.1.8 Queries with ALL/ANY
- 7.3.1.9 Queries with EXISTS
- 7.3.1.10 Queries with Subqueries in SELECT/FROM
- 7.3.1.11 Queries with Set Operations
- 7.3.2 SQL INSERT Statement
- 7.3.3 SQL DELETE Statement
- 7.3.4 SQL UPDATE Statement
- 7.4 SQL Views
- 7.5 SQL Indexes
- 7.6 SQL Privileges
- 7.7 SQL for Metadata Management
- Summary
- Problems and Exercises
- 8 Object-Oriented Databases and Object Persistence
- 8.1 Recap: Basic Concepts of OO
- 8.2 Advanced Concepts of OO
- 8.2.1 Method Overloading
- 8.2.2 Inheritance
- 8.2.3 Method Overriding
- 8.2.4 Polymorphism and Dynamic Binding
- 8.3 Basic Principles of Object Persistence
- 8.3.1 Serialization
- 8.4 OODBMS
- 8.4.1 Object Identifiers
- 8.4.2 ODMG Standard
- 8.4.3 Object Model
- 8.4.4 Object Definition Language (ODL)
- 8.4.5 Object Query Language (OQL)
- 8.4.5.1 Simple OQL Queries
- 8.4.5.2 SELECT FROM WHERE OQL Queries
- 8.4.5.3 Join OQL Queries
- 8.4.5.4 Other OQL Queries
- 8.4.6 Language Bindings
- 8.5 Evaluating OODBMSs
- Summary
- Problems and Exercises
- 9 Extended Relational Databases
- 9.1 Limitations of the Relational Model
- 9.2 Active RDBMS Extensions
- 9.2.1 Triggers
- 9.2.2 Stored Procedures
- 9.3 Object-Relational RDBMS Extensions
- 9.3.1 User-Defined Types
- 9.3.1.1 Distinct Data Types
- 9.3.1.2 Opaque Data Types
- 9.3.1.3 Unnamed Row Types
- 9.3.1.4 Named Row Types
- 9.3.1.5 Table Data Types
- 9.3.2 User-Defined Functions
- 9.3.3 Inheritance
- 9.3.3.1 Inheritance at Data Type Level
- 9.3.3.2 Inheritance at Table Type Level
- 9.3.4 Behavior
- 9.3.5 Polymorphism
- 9.3.6 Collection Types
- 9.3.7 Large Objects
- 9.4 Recursive SQL Queries
- Summary
- Problems and Exercises
- 10 XML Databases
- 10.1 Extensible Markup Language
- 10.1.1 Basic Concepts
- 10.1.2 Document Type Definition and XML Schema Definition
- 10.1.3 Extensible Stylesheet Language
- 10.1.4 Namespaces
- 10.1.5 XPath
- 10.2 Processing XML Documents
- 10.3 Storage of XML Documents
- 10.3.1 The Document-Oriented Approach for Storing XML Documents
- 10.3.2 The Data-Oriented Approach for Storing XML Documents
- 10.3.3 The Combined Approach for Storing XML Documents
- 10.4 Differences Between XML Data and Relational Data
- 10.5 Mappings Between XML Documents and (Object-) Relational Data
- 10.5.1 Table-Based Mapping
- 10.5.2 Schema-Oblivious Mapping
- 10.5.3 Schema-Aware Mapping
- 10.5.4 SQL/XML
- 10.6 Searching XML Data
- 10.6.1 Full-Text Search
- 10.6.2 Keyword-Based Search
- 10.6.3 Structured Search With XQuery
- 10.6.4 Semantic Search With RDF and SPARQL
- 10.7 XML for Information Exchange
- 10.7.1 Message-Oriented Middleware
- 10.7.2 SOAP-Based Web Services
- 10.7.3 REST-Based Web Services
- 10.7.4 Web Services and Databases
- 10.8 Other Data Representation Formats
- Summary
- Problems and Exercises
- 11 NoSQL Databases
- 11.1 The NoSQL Movement
- 11.1.1 The End of the “One Size Fits All” Era?
- 11.1.2 The Emergence of the NoSQL Movement
- 11.2 Key–Value Stores
- 11.2.1 From Keys to Hashes
- 11.2.2 Horizontal Scaling
- 11.2.3 An Example: Memcached
- 11.2.4 Request Coordination
- 11.2.5 Consistent Hashing
- 11.2.6 Replication and Redundancy
- 11.2.7 Eventual Consistency
- 11.2.8 Stabilization
- 11.2.9 Integrity Constraints and Querying
- 11.3 Tuple and Document Stores
- 11.3.1 Items with Keys
- 11.3.2 Filters and Queries
- 11.3.3 Complex Queries and Aggregation with MapReduce
- 11.3.4 SQL After All...
- 11.4 Column-Oriented Databases
- 11.5 Graph-Based Databases
- 11.5.1 Cypher Overview
- 11.5.2 Exploring a Social Graph
- 11.6 Other NoSQL Categories
- Summary
- Problems and Exercises
- Part III Physical Data Storage, Transaction Management, and Database Access
- 12 Physical File Organization and Indexing
- 12.1 Storage Hardware and Physical Database Design
- 12.1.1 The Storage Hierarchy
- 12.1.2 Internals of Hard Disk Drives
- 12.1.3 From Logical Concepts to Physical Constructs
- 12.2 Record Organization
- 12.3 File Organization
- 12.3.1 Introductory Concepts: Search Keys, Primary, and Secondary File Organization
- 12.3.2 Heap File Organization
- 12.3.3 Sequential File Organization
- 12.3.4 Random File Organization (Hashing)
- 12.3.4.1 Key-to-Address Transformation
- 12.3.4.2 Factors that Determine the Efficiency of Random File Organization
- 12.3.5 Indexed Sequential File Organization
- 12.3.5.1 Basic Terminology of Indexes
- 12.3.5.2 Primary Indexes
- 12.3.5.3 Clustered Indexes
- 12.3.5.4 Multilevel Indexes
- 12.3.6 List Data Organization (Linear and Nonlinear Lists)
- 12.3.6.1 Linear Lists
- 12.3.6.2 Tree Data Structures
- 12.3.7 Secondary Indexes and Inverted Files
- 12.3.7.1 Characteristics of Secondary Indexes
- 12.3.7.2 Inverted Files
- 12.3.7.3 Multicolumn Indexes
- 12.3.7.4 Other Index Types
- 12.3.8 B-Trees and B+-Trees
- 12.3.8.1 Multilevel Indexes Revisited
- 12.3.8.2 Binary Search Trees
- 12.3.8.3 B-Trees
- 12.3.8.4 B+-Trees
- Summary
- Problems and Exercises
- 13 Physical Database Organization
- 13.1 Physical Database Organization and Database Access Methods
- 13.1.1 From Database to Tablespace
- 13.1.2 Index Design
- 13.1.3 Database Access Methods
- 13.1.3.1 Functioning of the Query Optimizer
- 13.1.3.2 Index Search (with Atomic Search Key)
- 13.1.3.3 Multiple Index and Multicolumn Index Search
- 13.1.3.4 Index-Only Access
- 13.1.3.5 Full Table Scan
- 13.1.4 Join Implementations
- 13.1.4.1 Nested-Loop Join
- 13.1.4.2 Sort-Merge Join
- 13.1.4.3 Hash Join
- 13.2 Enterprise Storage Subsystems and Business Continuity
- 13.2.1 Disk Arrays and RAID
- 13.2.2 Enterprise Storage Subsystems
- 13.2.2.1 Overview and Classification
- 13.2.2.2 DAS (Directly Attached Storage)
- 13.2.2.3 SAN (Storage Area Network)
- 13.2.2.4 NAS (Network Attached Storage)
- 13.2.2.5 NAS Gateway
- 13.2.2.6 iSCSI/Storage Over IP
- 13.2.3 Business Continuity
- 13.2.3.1 Contingency Planning, Recovery Point, and Recovery Time
- 13.2.3.2 Availability and Accessibility of Storage Devices
- 13.2.3.3 Availability of Database Functionality
- 13.2.3.4 Data Availability
- Summary
- Problems and Exercises
- 14 Basics of Transaction Management
- 14.1 Transactions, Recovery, and Concurrency Control
- 14.2 Transactions and Transaction Management
- 14.2.1 Delineating Transactions and the Transaction Lifecycle
- 14.2.2 DBMS Components Involved in Transaction Management
- 14.2.3 The Logfile
- 14.3 Recovery
- 14.3.1 Types of Failures
- 14.3.2 System Recovery
- 14.3.3 Media Recovery
- 14.4 Concurrency Control
- 14.4.1 Typical Concurrency Problems
- 14.4.1.1 Lost Update Problem
- 14.4.1.2 Uncommitted Dependency Problem (aka Dirty Read Problem)
- 14.4.1.3 Inconsistent Analysis Problem
- 14.4.1.4 Other Concurrency-Related Problems
- 14.4.2 Schedules and Serial Schedules
- 14.4.3 Serializable Schedules
- 14.4.4 Optimistic and Pessimistic Schedulers
- 14.4.5 Locking and Locking Protocols
- 14.4.5.1 Purposes of Locking
- 14.4.5.2 The Two-Phase Locking Protocol (2PL)
- 14.4.5.3 Cascading Rollbacks
- 14.4.5.4 Dealing with Deadlocks
- 14.4.5.5 Isolation Levels
- 14.4.5.6 Lock Granularity
- 14.5 The ACID Properties of Transactions
- Summary
- Problems and Exercises
- 15 Accessing Databases and Database APIs
- 15.1 Database System Architectures
- 15.1.1 Centralized System Architectures
- 15.1.2 Tiered System Architectures
- 15.2 Classification of Database APIs
- 15.2.1 Proprietary versus Universal APIs
- 15.2.2 Embedded versus Call-Level APIs
- 15.2.3 Early Binding versus Late Binding
- 15.3 Universal Database APIs
- 15.3.1 ODBC
- 15.3.2 OLE DB and ADO
- 15.3.3 ADO.NET
- 15.3.4 Java DataBase Connectivity (JDBC)
- 15.3.5 Intermezzo: SQL Injection and Access Security
- 15.3.6 SQLJ
- 15.3.7 Intermezzo: Embedded APIs versus Embedded DBMSs
- 15.3.8 Language-Integrated Querying
- 15.4 Object Persistence and Object-Relational Mapping APIs
- 15.4.1 Object Persistence with Enterprise JavaBeans
- 15.4.2 Object Persistence with the Java Persistence API
- 15.4.3 Object Persistence with Java Data Objects
- 15.4.4 Object Persistence in Other Host Languages
- 15.5 Database API Summary
- 15.6 Database Access in the World Wide Web
- 15.6.1 Introduction: the Original Web Server
- 15.6.2 The Common Gateway Interface: Toward Dynamic Web Pages
- 15.6.3 Client-Side Scripting: The Desire for a Richer Web
- 15.6.4 JavaScript as a Platform
- 15.6.5 DBMSs Adapt: REST, Other Web Services, and a Look Ahead
- Summary
- Problems and Exercises
- 16 Data Distribution and Distributed Transaction Management
- 16.1 Distributed Systems and Distributed Databases
- 16.2 Architectural Implications of Distributed Databases
- 16.3 Fragmentation, Allocation, and Replication
- 16.3.1 Vertical Fragmentation
- 16.3.2 Horizontal Fragmentation (Sharding)
- 16.3.3 Mixed Fragmentation
- 16.3.4 Replication
- 16.3.5 Distribution and Replication of Metadata
- 16.4 Transparency
- 16.5 Distributed Query Processing
- 16.6 Distributed Transaction Management and Concurrency Control
- 16.6.1 Primary Site and Primary Copy 2PL
- 16.6.2 Distributed 2PL
- 16.6.3 The Two-Phase Commit Protocol (2PC)
- 16.6.4 Optimistic Concurrency and Loosely Coupled Systems
- 16.6.5 Compensation-Based Transaction Models
- 16.7 Eventual Consistency and BASE Transactions
- 16.7.1 Horizontal Fragmentation and Consistent Hashing
- 16.7.2 The CAP Theorem
- 16.7.3 BASE Transactions
- 16.7.4 Multi-Version Concurrency Control and Vector Clocks
- 16.7.5 Quorum-Based Consistency
- Summary
- Problems and Exercises
- Part IV Data Warehousing, Data Governance, and (Big) Data Analytics
- 17 Data Warehousing and Business Intelligence
- 17.1 Operational versus Tactical/Strategic Decision-Making
- 17.2 Data Warehouse Definition
- 17.3 Data Warehouse Schemas
- 17.3.1 Star Schema
- 17.3.2 Snowflake Schema
- 17.3.3 Fact Constellation
- 17.3.4 Specific Schema Issues
- 17.3.4.1 Surrogate Keys
- 17.3.4.2 Granularity of the Fact Table
- 17.3.4.3 Factless Fact Tables
- 17.3.4.4 Optimizing the Dimension Tables
- 17.3.4.5 Defining Junk Dimensions
- 17.3.4.6 Defining Outrigger Tables
- 17.3.4.7 Slowly Changing Dimensions
- 17.3.4.8 Rapidly Changing Dimensions
- 17.4 The Extraction, Transformation, and Loading (ETL) Process
- 17.5 Data Marts
- 17.6 Virtual Data Warehouses and Virtual Data Marts
- 17.7 Operational Data Store
- 17.8 Data Warehouses versus Data Lakes
- 17.9 Business Intelligence
- 17.9.1 Query and Reporting
- 17.9.2 Pivot Tables
- 17.9.3 On-Line Analytical Processing (OLAP)
- 17.9.3.1 MOLAP
- 17.9.3.2 ROLAP
- 17.9.3.3 HOLAP
- 17.9.3.4 OLAP Operators
- 17.9.3.5 OLAP Queries in SQL
- Summary
- Problems and Exercises
- 18 Data Integration, Data Quality, and Data Governance
- 18.1 Data and Process Integration
- 18.1.1 Convergence of Analytical and Operational Data Needs
- 18.1.2 Data Integration and Data Integration Patterns
- 18.1.2.1 Data Consolidation: Extract, Transform, Load (ETL)
- 18.1.2.2 Data Federation: Enterprise Information Integration (EII)
- 18.1.2.3 Data Propagation: Enterprise Application Integration (EAI)
- 18.1.2.4 Data Propagation: Enterprise Data Replication (EDR)
- 18.1.2.5 Changed Data Capture (CDC), Near-Real-Time ETL, and Event Processing
- 18.1.2.6 Data Virtualization
- 18.1.2.7 Data as a Service and Data in the Cloud
- 18.1.3 Data Services and Data Flows in the Context of Data and Process Integration
- 18.1.3.1 Business Process Integration
- 18.1.3.2 Patterns for Managing Sequence Dependencies and Data Dependencies in Processes
- 18.1.3.3 A Unified View on Data and Process Integration
- 18.2 Searching Unstructured Data and Enterprise Search
- 18.2.1 Principles of Full-Text Search
- 18.2.2 Indexing Full-Text Documents
- 18.2.3 Web Search Engines
- 18.2.4 Enterprise Search
- 18.3 Data Quality and Master Data Management
- 18.4 Data Governance
- 18.4.1 Total Data Quality Management (TDQM)
- 18.4.2 Capability Maturity Model Integration (CMMI)
- 18.4.3 Data Management Body of Knowledge (DMBOK)
- 18.4.4 Control Objectives for Information and Related Technology (COBIT)
- 18.4.5 Information Technology Infrastructure Library
- 18.5 Outlook
- 18.6 Conclusion
- Problems and Exercises
- 19 Big Data
- 19.1 The 5 Vs of Big Data
- 19.2 Hadoop
- 19.2.1 History of Hadoop
- 19.2.2 The Hadoop Stack
- 19.2.2.1 The Hadoop Distributed File System
- 19.2.2.2 MapReduce
- 19.2.2.3 Yet Another Resource Negotiator
- 19.3 SQL on Hadoop
- 19.3.1 HBase: The First Database on Hadoop
- 19.3.2 Pig
- 19.3.3 Hive
- 19.4 Apache Spark
- 19.4.1 Spark Core
- 19.4.2 Spark SQL
- 19.4.3 MLlib, Spark Streaming, and GraphX
- 19.5 Conclusion
- Problems and Exercises
- 20 Analytics
- 20.1 The Analytics Process Model
- 20.2 Example Analytics Applications
- 20.3 Data Scientist Job Profile
- 20.4 Data Pre-Processing
- 20.4.1 Denormalizing Data for Analysis
- 20.4.2 Sampling
- 20.4.3 Exploratory Analysis
- 20.4.4 Missing Values
- 20.4.5 Outlier Detection and Handling
- 20.5 Types of Analytics
- 20.5.1 Predictive Analytics
- 20.5.1.1 Linear Regression
- 20.5.1.2 Logistic Regression
- Logistic Regression Properties
- 20.5.1.3 Decision Trees
- Splitting Decision
- Stopping Decision
- Decision Tree Properties
- Regression Trees
- 20.5.1.4 Other Predictive Analytics Techniques
- 20.5.2 Evaluating Predictive Models
- 20.5.2.1 Splitting Up the Dataset
- 20.5.2.2 Performance Measures for Classification Models
- 20.5.2.3 Performance Measures for Regression Models
- 20.5.2.4 Other Performance Measures for Predictive Analytical Models
- 20.5.3 Descriptive Analytics
- 20.5.3.1 Association Rules
- Basic Setting
- Support, Confidence, and Lift
- Post-Processing Association Rules
- 20.5.3.2 Sequence Rules
- 20.5.3.3 Clustering
- Hierarchical Clustering
- K-means Clustering
- 20.5.4 Social Network Analytics
- 20.5.4.1 Social Network Definitions
- 20.5.4.2 Social Network Metrics
- 20.5.4.3 Social Network Learning
- 20.6 Post-Processing of Analytical Models
- 20.7 Critical Success Factors for Analytical Models
- 20.8 Economic Perspective on Analytics
- 20.8.1 Total Cost of Ownership (TCO)
- 20.8.2 Return on Investment
- 20.8.3 In- versus Outsourcing
- 20.8.4 On-Premises versus Cloud Solutions
- 20.8.5 Open-Source versus Commercial Software
- 20.9 Improving the ROI of Analytics
- 20.9.1 New Sources of Data
- 20.9.2 Data Quality
- 20.9.3 Management Support
- 20.9.4 Organizational Aspects
- 20.9.5 Cross-Fertilization
- 20.10 Privacy and Security
- 20.10.1 Overall Considerations Regarding Privacy and Security
- 20.10.2 The RACI Matrix
- 20.10.3 Accessing Internal Data
- 20.10.3.1 Anonymization
- 20.10.3.2 SQL Views
- 20.10.3.3 Label-Based Access Control
- 20.10.4 Privacy Regulation
- 20.11 Conclusion
- Problems and Exercises
- Appendix Using the Online Environment
- How to Access the Online Environment
- Environment: Relational Databases and SQL
- Environment: MongoDB
- Environment: Neo4j and Cypher
- Environment: Tree Structure Visualizations
- Environment: HBase
- Glossary
- Index
- Endorsements




