Principles of Database Management

Author Wilfried Lemahieu; Seppe vanden Broucke; Bart Baesens

Publisher Cambridge University Press

Format Page Fidelity

Print ISBN 9781107186125

Edition 0

Copyright

8.390 kr.

Description

Table of contents

  • Half-title
  • Reviews
  • Title page
  • Copyright information
  • Brief Contents
  • Table of contents
  • About the Authors
  • Preface
  • Who This Book is For
  • Topics Covered in this Book
  • How to Read this Book
  • Cross-Chapter Case Study: Sober
  • Additional Material
  • Acknowledgments
  • Sober: 1000‰ Driven by Technology
  • Part I Databases and Database Design
  • 1 Fundamental Concepts of Database Management
  • 1.1 Applications of Database Technology
  • 1.2 Key Definitions
  • 1.3 File versus Database Approach to Data Management
  • 1.3.1 The File-Based Approach
  • 1.3.2 The Database Approach
  • 1.4 Elements of a Database System
  • 1.4.1 Database Model versus Instances
  • 1.4.2 Data Model
  • 1.4.3 The Three-Layer Architecture
  • 1.4.4 Catalog
  • 1.4.5 Database Users
  • 1.4.6 Database Languages
  • 1.5 Advantages of Database Systems and Database Management
  • 1.5.1 Data Independence
  • 1.5.2 Database Modeling
  • 1.5.3 Managing Structured, Semi-Structured, and Unstructured Data
  • 1.5.4 Managing Data Redundancy
  • 1.5.5 Specifying Integrity Rules
  • 1.5.6 Concurrency Control
  • 1.5.7 Backup and Recovery Facilities
  • 1.5.8 Data Security
  • 1.5.9 Performance Utilities
  • Summary
  • Problems and Exercises
  • 2 Architecture and Categorization of DBMSs
  • 2.1 Architecture of a DBMS
  • 2.1.1 Connection and Security Manager
  • 2.1.2 DDL Compiler
  • 2.1.3 Query Processor
  • 2.1.3.1 DML Compiler
  • 2.1.3.2 Query Parser and Query Rewriter
  • 2.1.3.3 Query Optimizer
  • 2.1.3.4 Query Executor
  • 2.1.4 Storage Manager
  • 2.1.4.1 Transaction Manager
  • 2.1.4.2 Buffer Manager
  • 2.1.4.3 Lock Manager
  • 2.1.4.4 Recovery Manager
  • 2.1.5 DBMS Utilities
  • 2.1.6 DBMS Interfaces
  • 2.2 Categorization of DBMSs
  • 2.2.1 Categorization Based on Data Model
  • 2.2.1.1 Hierarchical DBMSs
  • 2.2.1.2 Network DBMSs
  • 2.2.1.3 Relational DBMSs
  • 2.2.1.4 Object-Oriented DBMSs
  • 2.2.1.5 Object-Relational/Extended Relational DBMSs
  • 2.2.1.6 XML DBMSs
  • 2.2.1.7 NoSQL DBMSs
  • 2.2.2 Categorization Based on Degree of Simultaneous Access
  • 2.2.3 Categorization Based on Architecture
  • 2.2.4 Categorization Based on Usage
  • Summary
  • Problems and Exercises
  • 3 Conceptual Data Modeling Using the (E)ER Model and UML Class Diagram
  • 3.1 Phases of Database Design
  • 3.2 The Entity Relationship Model
  • 3.2.1 Entity Types
  • 3.2.2 Attribute Types
  • 3.2.3.1 Domains
  • 3.2.3.2 Key Attribute Types
  • 3.2.3.3 Simple versus Composite Attribute Types
  • 3.2.3.4 Single-Valued versus Multi-Valued Attribute Types
  • 3.2.3.5 Derived Attribute Type
  • 3.2.4 Relationship Types
  • 3.2.4.1 Degree and Roles
  • 3.2.4.2 Cardinalities
  • 3.2.4.3 Relationship Attribute Types
  • 3.2.5 Weak Entity Types
  • 3.2.6 Ternary Relationship Types
  • 3.2.7 Examples of the ER Model
  • 3.2.8 Limitations of the ER Model
  • 3.3 The Enhanced Entity Relationship (EER) Model
  • 3.3.1 Specialization/Generalization
  • 3.3.2 Categorization
  • 3.3.3 Aggregation
  • 3.3.4 Examples of the EER Model
  • 3.3.5 Designing an EER Model
  • 3.4 The UML Class Diagram
  • 3.4.1 Recap of Object Orientation
  • 3.4.2 Classes
  • 3.4.3 Variables
  • 3.4.4 Access Modifiers
  • 3.4.5 Associations
  • 3.4.5.1 Association Class
  • 3.4.5.2 Unidirectional versus Bidirectional Association
  • 3.4.5.3 Qualified Association
  • 3.4.6 Specialization/Generalization
  • 3.4.7 Aggregation
  • 3.4.8 UML Example
  • 3.4.9 Advanced UML Modeling Concepts
  • 3.4.9.1 Changeability Property
  • 3.4.9.2 Object Constraint Language (OCL)
  • 3.4.9.3 Dependency Relationship
  • 3.4.10 UML versus EER
  • Summary
  • Problems and Exercises
  • 4 Organizational Aspects of Data Management
  • 4.1 Data Management
  • 4.1.1 Catalogs and the Role of Metadata
  • 4.1.2 Metadata Modeling
  • 4.1.3 Data Quality
  • 4.1.3.1 Data Quality Dimensions
  • Accuracy
  • Completeness
  • Consistency
  • Accessibility
  • 4.1.3.2 Data Quality Problems
  • 4.1.4 Data Governance
  • 4.2 Roles in Data Management
  • 4.2.1 Information Architect
  • 4.2.2 Database Designer
  • 4.2.3 Data Owner
  • 4.2.4 Data Steward
  • 4.2.5 Database Administrator
  • 4.2.6 Data Scientist
  • Summary
  • Problems and Exercises
  • Part II Types of Database Systems
  • 5 Legacy Databases
  • 5.1 The Hierarchical Model
  • 5.2 The CODASYL Model
  • Summary
  • Problems and Exercises
  • 6 Relational Databases: The Relational Model
  • 6.1 The Relational Model
  • 6.1.1 Basic Concepts
  • 6.1.2 Formal Definitions
  • 6.1.3 Types of Keys
  • 6.1.3.1 Superkeys and Keys
  • 6.1.3.2 Candidate Keys, Primary Keys, and Alternative Keys
  • 6.1.3.3 Foreign Keys
  • 6.1.4 Relational Constraints
  • 6.1.5 Example Relational Data Model
  • 6.2 Normalization
  • 6.2.1 Insertion, Deletion, and Update Anomalies in an Unnormalized Relational Model
  • 6.2.2 Informal Normalization Guidelines
  • 6.2.3 Functional Dependencies and Prime Attribute Types
  • 6.2.4 Normalization Forms
  • 6.2.4.1 First Normal Form (1NF)
  • 6.2.4.2 Second Normal Form (2NF)
  • 6.2.4.3 Third Normal Form (3NF)
  • 6.2.4.4 Boyce–Codd Normal Form (BCNF)
  • 6.2.4.5 Fourth Normal Form (4NF)
  • 6.3 Mapping a Conceptual ER Model to a Relational Model
  • 6.3.1 Mapping Entity Types
  • 6.3.2 Mapping Relationship Types
  • 6.3.2.1 Mapping a Binary 1:1 Relationship Type
  • 6.3.2.2 Mapping a Binary 1:N Relationship Type
  • 6.3.2.3 Mapping a Binary M:N Relationship Type
  • 6.3.2.4 Mapping Unary Relationship Types
  • 6.3.2.5 Mapping n-ary Relationship Types
  • 6.3.3 Mapping Multi-Valued Attribute Types
  • 6.3.4 Mapping Weak Entity Types
  • 6.3.5 Putting it All Together
  • 6.4 Mapping a Conceptual EER Model to a Relational Model
  • 6.4.1 Mapping an EER Specialization
  • 6.4.2 Mapping an EER Categorization
  • 6.4.3 Mapping an EER Aggregation
  • Summary
  • Problems and Exercises
  • 7 Relational Databases: Structured Query Language (SQL)
  • 7.1 Relational Database Management Systems and SQL
  • 7.1.1 Key Characteristics of SQL
  • 7.1.2 Three-Layer Database Architecture
  • 7.2 SQL Data Definition Language
  • 7.2.1 Key DDL Concepts
  • 7.2.2 DDL Example
  • 7.2.3 Referential Integrity Constraints
  • 7.2.4 DROP and ALTER Command
  • 7.3 SQL Data Manipulation Language
  • 7.3.1 SQL SELECT Statement
  • 7.3.1.1 Simple Queries
  • 7.3.1.2 Queries with Aggregate Functions
  • 7.3.1.3 Queries with GROUP BY/HAVING
  • 7.3.1.4 Queries with ORDER BY
  • 7.3.1.5 Join Queries
  • Inner Joins
  • Outer Joins
  • 7.3.1.6 Nested Queries
  • 7.3.1.7 Correlated Queries
  • 7.3.1.8 Queries with ALL/ANY
  • 7.3.1.9 Queries with EXISTS
  • 7.3.1.10 Queries with Subqueries in SELECT/FROM
  • 7.3.1.11 Queries with Set Operations
  • 7.3.2 SQL INSERT Statement
  • 7.3.3 SQL DELETE Statement
  • 7.3.4 SQL UPDATE Statement
  • 7.4 SQL Views
  • 7.5 SQL Indexes
  • 7.6 SQL Privileges
  • 7.7 SQL for Metadata Management
  • Summary
  • Problems and Exercises
  • 8 Object-Oriented Databases and Object Persistence
  • 8.1 Recap: Basic Concepts of OO
  • 8.2 Advanced Concepts of OO
  • 8.2.1 Method Overloading
  • 8.2.2 Inheritance
  • 8.2.3 Method Overriding
  • 8.2.4 Polymorphism and Dynamic Binding
  • 8.3 Basic Principles of Object Persistence
  • 8.3.1 Serialization
  • 8.4 OODBMS
  • 8.4.1 Object Identifiers
  • 8.4.2 ODMG Standard
  • 8.4.3 Object Model
  • 8.4.4 Object Definition Language (ODL)
  • 8.4.5 Object Query Language (OQL)
  • 8.4.5.1 Simple OQL Queries
  • 8.4.5.2 SELECT FROM WHERE OQL Queries
  • 8.4.5.3 Join OQL Queries
  • 8.4.5.4 Other OQL Queries
  • 8.4.6 Language Bindings
  • 8.5 Evaluating OODBMSs
  • Summary
  • Problems and Exercises
  • 9 Extended Relational Databases
  • 9.1 Limitations of the Relational Model
  • 9.2 Active RDBMS Extensions
  • 9.2.1 Triggers
  • 9.2.2 Stored Procedures
  • 9.3 Object-Relational RDBMS Extensions
  • 9.3.1 User-Defined Types
  • 9.3.1.1 Distinct Data Types
  • 9.3.1.2 Opaque Data Types
  • 9.3.1.3 Unnamed Row Types
  • 9.3.1.4 Named Row Types
  • 9.3.1.5 Table Data Types
  • 9.3.2 User-Defined Functions
  • 9.3.3 Inheritance
  • 9.3.3.1 Inheritance at Data Type Level
  • 9.3.3.2 Inheritance at Table Type Level
  • 9.3.4 Behavior
  • 9.3.5 Polymorphism
  • 9.3.6 Collection Types
  • 9.3.7 Large Objects
  • 9.4 Recursive SQL Queries
  • Summary
  • Problems and Exercises
  • 10 XML Databases
  • 10.1 Extensible Markup Language
  • 10.1.1 Basic Concepts
  • 10.1.2 Document Type Definition and XML Schema Definition
  • 10.1.3 Extensible Stylesheet Language
  • 10.1.4 Namespaces
  • 10.1.5 XPath
  • 10.2 Processing XML Documents
  • 10.3 Storage of XML Documents
  • 10.3.1 The Document-Oriented Approach for Storing XML Documents
  • 10.3.2 The Data-Oriented Approach for Storing XML Documents
  • 10.3.3 The Combined Approach for Storing XML Documents
  • 10.4 Differences Between XML Data and Relational Data
  • 10.5 Mappings Between XML Documents and (Object-) Relational Data
  • 10.5.1 Table-Based Mapping
  • 10.5.2 Schema-Oblivious Mapping
  • 10.5.3 Schema-Aware Mapping
  • 10.5.4 SQL/XML
  • 10.6 Searching XML Data
  • 10.6.1 Full-Text Search
  • 10.6.2 Keyword-Based Search
  • 10.6.3 Structured Search With XQuery
  • 10.6.4 Semantic Search With RDF and SPARQL
  • 10.7 XML for Information Exchange
  • 10.7.1 Message-Oriented Middleware
  • 10.7.2 SOAP-Based Web Services
  • 10.7.3 REST-Based Web Services
  • 10.7.4 Web Services and Databases
  • 10.8 Other Data Representation Formats
  • Summary
  • Problems and Exercises
  • 11 NoSQL Databases
  • 11.1 The NoSQL Movement
  • 11.1.1 The End of the “One Size Fits All” Era?
  • 11.1.2 The Emergence of the NoSQL Movement
  • 11.2 Key–Value Stores
  • 11.2.1 From Keys to Hashes
  • 11.2.2 Horizontal Scaling
  • 11.2.3 An Example: Memcached
  • 11.2.4 Request Coordination
  • 11.2.5 Consistent Hashing
  • 11.2.6 Replication and Redundancy
  • 11.2.7 Eventual Consistency
  • 11.2.8 Stabilization
  • 11.2.9 Integrity Constraints and Querying
  • 11.3 Tuple and Document Stores
  • 11.3.1 Items with Keys
  • 11.3.2 Filters and Queries
  • 11.3.3 Complex Queries and Aggregation with MapReduce
  • 11.3.4 SQL After All...
  • 11.4 Column-Oriented Databases
  • 11.5 Graph-Based Databases
  • 11.5.1 Cypher Overview
  • 11.5.2 Exploring a Social Graph
  • 11.6 Other NoSQL Categories
  • Summary
  • Problems and Exercises
  • Part III Physical Data Storage, Transaction Management, and Database Access
  • 12 Physical File Organization and Indexing
  • 12.1 Storage Hardware and Physical Database Design
  • 12.1.1 The Storage Hierarchy
  • 12.1.2 Internals of Hard Disk Drives
  • 12.1.3 From Logical Concepts to Physical Constructs
  • 12.2 Record Organization
  • 12.3 File Organization
  • 12.3.1 Introductory Concepts: Search Keys, Primary, and Secondary File Organization
  • 12.3.2 Heap File Organization
  • 12.3.3 Sequential File Organization
  • 12.3.4 Random File Organization (Hashing)
  • 12.3.4.1 Key-to-Address Transformation
  • 12.3.4.2 Factors that Determine the Efficiency of Random File Organization
  • 12.3.5 Indexed Sequential File Organization
  • 12.3.5.1 Basic Terminology of Indexes
  • 12.3.5.2 Primary Indexes
  • 12.3.5.3 Clustered Indexes
  • 12.3.5.4 Multilevel Indexes
  • 12.3.6 List Data Organization (Linear and Nonlinear Lists)
  • 12.3.6.1 Linear Lists
  • 12.3.6.2 Tree Data Structures
  • 12.3.7 Secondary Indexes and Inverted Files
  • 12.3.7.1 Characteristics of Secondary Indexes
  • 12.3.7.2 Inverted Files
  • 12.3.7.3 Multicolumn Indexes
  • 12.3.7.4 Other Index Types
  • 12.3.8 B-Trees and B+-Trees
  • 12.3.8.1 Multilevel Indexes Revisited
  • 12.3.8.2 Binary Search Trees
  • 12.3.8.3 B-Trees
  • 12.3.8.4 B+-Trees
  • Summary
  • Problems and Exercises
  • 13 Physical Database Organization
  • 13.1 Physical Database Organization and Database Access Methods
  • 13.1.1 From Database to Tablespace
  • 13.1.2 Index Design
  • 13.1.3 Database Access Methods
  • 13.1.3.1 Functioning of the Query Optimizer
  • 13.1.3.2 Index Search (with Atomic Search Key)
  • 13.1.3.3 Multiple Index and Multicolumn Index Search
  • 13.1.3.4 Index-Only Access
  • 13.1.3.5 Full Table Scan
  • 13.1.4 Join Implementations
  • 13.1.4.1 Nested-Loop Join
  • 13.1.4.2 Sort-Merge Join
  • 13.1.4.3 Hash Join
  • 13.2 Enterprise Storage Subsystems and Business Continuity
  • 13.2.1 Disk Arrays and RAID
  • 13.2.2 Enterprise Storage Subsystems
  • 13.2.2.1 Overview and Classification
  • 13.2.2.2 DAS (Directly Attached Storage)
  • 13.2.2.3 SAN (Storage Area Network)
  • 13.2.2.4 NAS (Network Attached Storage)
  • 13.2.2.5 NAS Gateway
  • 13.2.2.6 iSCSI/Storage Over IP
  • 13.2.3 Business Continuity
  • 13.2.3.1 Contingency Planning, Recovery Point, and Recovery Time
  • 13.2.3.2 Availability and Accessibility of Storage Devices
  • 13.2.3.3 Availability of Database Functionality
  • 13.2.3.4 Data Availability
  • Summary
  • Problems and Exercises
  • 14 Basics of Transaction Management
  • 14.1 Transactions, Recovery, and Concurrency Control
  • 14.2 Transactions and Transaction Management
  • 14.2.1 Delineating Transactions and the Transaction Lifecycle
  • 14.2.2 DBMS Components Involved in Transaction Management
  • 14.2.3 The Logfile
  • 14.3 Recovery
  • 14.3.1 Types of Failures
  • 14.3.2 System Recovery
  • 14.3.3 Media Recovery
  • 14.4 Concurrency Control
  • 14.4.1 Typical Concurrency Problems
  • 14.4.1.1 Lost Update Problem
  • 14.4.1.2 Uncommitted Dependency Problem (aka Dirty Read Problem)
  • 14.4.1.3 Inconsistent Analysis Problem
  • 14.4.1.4 Other Concurrency-Related Problems
  • 14.4.2 Schedules and Serial Schedules
  • 14.4.3 Serializable Schedules
  • 14.4.4 Optimistic and Pessimistic Schedulers
  • 14.4.5 Locking and Locking Protocols
  • 14.4.5.1 Purposes of Locking
  • 14.4.5.2 The Two-Phase Locking Protocol (2PL)
  • 14.4.5.3 Cascading Rollbacks
  • 14.4.5.4 Dealing with Deadlocks
  • 14.4.5.5 Isolation Levels
  • 14.4.5.6 Lock Granularity
  • 14.5 The ACID Properties of Transactions
  • Summary
  • Problems and Exercises
  • 15 Accessing Databases and Database APIs
  • 15.1 Database System Architectures
  • 15.1.1 Centralized System Architectures
  • 15.1.2 Tiered System Architectures
  • 15.2 Classification of Database APIs
  • 15.2.1 Proprietary versus Universal APIs
  • 15.2.2 Embedded versus Call-Level APIs
  • 15.2.3 Early Binding versus Late Binding
  • 15.3 Universal Database APIs
  • 15.3.1 ODBC
  • 15.3.2 OLE DB and ADO
  • 15.3.3 ADO.NET
  • 15.3.4 Java DataBase Connectivity (JDBC)
  • 15.3.5 Intermezzo: SQL Injection and Access Security
  • 15.3.6 SQLJ
  • 15.3.7 Intermezzo: Embedded APIs versus Embedded DBMSs
  • 15.3.8 Language-Integrated Querying
  • 15.4 Object Persistence and Object-Relational Mapping APIs
  • 15.4.1 Object Persistence with Enterprise JavaBeans
  • 15.4.2 Object Persistence with the Java Persistence API
  • 15.4.3 Object Persistence with Java Data Objects
  • 15.4.4 Object Persistence in Other Host Languages
  • 15.5 Database API Summary
  • 15.6 Database Access in the World Wide Web
  • 15.6.1 Introduction: the Original Web Server
  • 15.6.2 The Common Gateway Interface: Toward Dynamic Web Pages
  • 15.6.3 Client-Side Scripting: The Desire for a Richer Web
  • 15.6.4 JavaScript as a Platform
  • 15.6.5 DBMSs Adapt: REST, Other Web Services, and a Look Ahead
  • Summary
  • Problems and Exercises
  • 16 Data Distribution and Distributed Transaction Management
  • 16.1 Distributed Systems and Distributed Databases
  • 16.2 Architectural Implications of Distributed Databases
  • 16.3 Fragmentation, Allocation, and Replication
  • 16.3.1 Vertical Fragmentation
  • 16.3.2 Horizontal Fragmentation (Sharding)
  • 16.3.3 Mixed Fragmentation
  • 16.3.4 Replication
  • 16.3.5 Distribution and Replication of Metadata
  • 16.4 Transparency
  • 16.5 Distributed Query Processing
  • 16.6 Distributed Transaction Management and Concurrency Control
  • 16.6.1 Primary Site and Primary Copy 2PL
  • 16.6.2 Distributed 2PL
  • 16.6.3 The Two-Phase Commit Protocol (2PC)
  • 16.6.4 Optimistic Concurrency and Loosely Coupled Systems
  • 16.6.5 Compensation-Based Transaction Models
  • 16.7 Eventual Consistency and BASE Transactions
  • 16.7.1 Horizontal Fragmentation and Consistent Hashing
  • 16.7.2 The CAP Theorem
  • 16.7.3 BASE Transactions
  • 16.7.4 Multi-Version Concurrency Control and Vector Clocks
  • 16.7.5 Quorum-Based Consistency
  • Summary
  • Problems and Exercises
  • Part IV Data Warehousing, Data Governance, and (Big) Data Analytics
  • 17 Data Warehousing and Business Intelligence
  • 17.1 Operational versus Tactical/Strategic Decision-Making
  • 17.2 Data Warehouse Definition
  • 17.3 Data Warehouse Schemas
  • 17.3.1 Star Schema
  • 17.3.2 Snowflake Schema
  • 17.3.3 Fact Constellation
  • 17.3.4 Specific Schema Issues
  • 17.3.4.1 Surrogate Keys
  • 17.3.4.2 Granularity of the Fact Table
  • 17.3.4.3 Factless Fact Tables
  • 17.3.4.4 Optimizing the Dimension Tables
  • 17.3.4.5 Defining Junk Dimensions
  • 17.3.4.6 Defining Outrigger Tables
  • 17.3.4.7 Slowly Changing Dimensions
  • 17.3.4.8 Rapidly Changing Dimensions
  • 17.4 The Extraction, Transformation, and Loading (ETL) Process
  • 17.5 Data Marts
  • 17.6 Virtual Data Warehouses and Virtual Data Marts
  • 17.7 Operational Data Store
  • 17.8 Data Warehouses versus Data Lakes
  • 17.9 Business Intelligence
  • 17.9.1 Query and Reporting
  • 17.9.2 Pivot Tables
  • 17.9.3 On-Line Analytical Processing (OLAP)
  • 17.9.3.1 MOLAP
  • 17.9.3.2 ROLAP
  • 17.9.3.3 HOLAP
  • 17.9.3.4 OLAP Operators
  • 17.9.3.5 OLAP Queries in SQL
  • Summary
  • Problems and Exercises
  • 18 Data Integration, Data Quality, and Data Governance
  • 18.1 Data and Process Integration
  • 18.1.1 Convergence of Analytical and Operational Data Needs
  • 18.1.2 Data Integration and Data Integration Patterns
  • 18.1.2.1 Data Consolidation: Extract, Transform, Load (ETL)
  • 18.1.2.2 Data Federation: Enterprise Information Integration (EII)
  • 18.1.2.3 Data Propagation: Enterprise Application Integration (EAI)
  • 18.1.2.4 Data Propagation: Enterprise Data Replication (EDR)
  • 18.1.2.5 Changed Data Capture (CDC), Near-Real-Time ETL, and Event Processing
  • 18.1.2.6 Data Virtualization
  • 18.1.2.7 Data as a Service and Data in the Cloud
  • 18.1.3 Data Services and Data Flows in the Context of Data and Process Integration
  • 18.1.3.1 Business Process Integration
  • 18.1.3.2 Patterns for Managing Sequence Dependencies and Data Dependencies in Processes
  • 18.1.3.3 A Unified View on Data and Process Integration
  • 18.2 Searching Unstructured Data and Enterprise Search
  • 18.2.1 Principles of Full-Text Search
  • 18.2.2 Indexing Full-Text Documents
  • 18.2.3 Web Search Engines
  • 18.2.4 Enterprise Search
  • 18.3 Data Quality and Master Data Management
  • 18.4 Data Governance
  • 18.4.1 Total Data Quality Management (TDQM)
  • 18.4.2 Capability Maturity Model Integration (CMMI)
  • 18.4.3 Data Management Body of Knowledge (DMBOK)
  • 18.4.4 Control Objectives for Information and Related Technology (COBIT)
  • 18.4.5 Information Technology Infrastructure Library
  • 18.5 Outlook
  • 18.6 Conclusion
  • Problems and Exercises
  • 19 Big Data
  • 19.1 The 5 Vs of Big Data
  • 19.2 Hadoop
  • 19.2.1 History of Hadoop
  • 19.2.2 The Hadoop Stack
  • 19.2.2.1 The Hadoop Distributed File System
  • 19.2.2.2 MapReduce
  • 19.2.2.3 Yet Another Resource Negotiator
  • 19.3 SQL on Hadoop
  • 19.3.1 HBase: The First Database on Hadoop
  • 19.3.2 Pig
  • 19.3.3 Hive
  • 19.4 Apache Spark
  • 19.4.1 Spark Core
  • 19.4.2 Spark SQL
  • 19.4.3 MLlib, Spark Streaming, and GraphX
  • 19.5 Conclusion
  • Problems and Exercises
  • 20 Analytics
  • 20.1 The Analytics Process Model
  • 20.2 Example Analytics Applications
  • 20.3 Data Scientist Job Profile
  • 20.4 Data Pre-Processing
  • 20.4.1 Denormalizing Data for Analysis
  • 20.4.2 Sampling
  • 20.4.3 Exploratory Analysis
  • 20.4.4 Missing Values
  • 20.4.5 Outlier Detection and Handling
  • 20.5 Types of Analytics
  • 20.5.1 Predictive Analytics
  • 20.5.1.1 Linear Regression
  • 20.5.1.2 Logistic Regression
  • Logistic Regression Properties
  • 20.5.1.3 Decision Trees
  • Splitting Decision
  • Stopping Decision
  • Decision Tree Properties
  • Regression Trees
  • 20.5.1.4 Other Predictive Analytics Techniques
  • 20.5.2 Evaluating Predictive Models
  • 20.5.2.1 Splitting Up the Dataset
  • 20.5.2.2 Performance Measures for Classification Models
  • 20.5.2.3 Performance Measures for Regression Models
  • 20.5.2.4 Other Performance Measures for Predictive Analytical Models
  • 20.5.3 Descriptive Analytics
  • 20.5.3.1 Association Rules
  • Basic Setting
  • Support, Confidence, and Lift
  • Post-Processing Association Rules
  • 20.5.3.2 Sequence Rules
  • 20.5.3.3 Clustering
  • Hierarchical Clustering
  • K-means Clustering
  • 20.5.4 Social Network Analytics
  • 20.5.4.1 Social Network Definitions
  • 20.5.4.2 Social Network Metrics
  • 20.5.4.3 Social Network Learning
  • 20.6 Post-Processing of Analytical Models
  • 20.7 Critical Success Factors for Analytical Models
  • 20.8 Economic Perspective on Analytics
  • 20.8.1 Total Cost of Ownership (TCO)
  • 20.8.2 Return on Investment
  • 20.8.3 In- versus Outsourcing
  • 20.8.4 On-Premises versus Cloud Solutions
  • 20.8.5 Open-Source versus Commercial Software
  • 20.9 Improving the ROI of Analytics
  • 20.9.1 New Sources of Data
  • 20.9.2 Data Quality
  • 20.9.3 Management Support
  • 20.9.4 Organizational Aspects
  • 20.9.5 Cross-Fertilization
  • 20.10 Privacy and Security
  • 20.10.1 Overall Considerations Regarding Privacy and Security
  • 20.10.2 The RACI Matrix
  • 20.10.3 Accessing Internal Data
  • 20.10.3.1 Anonymization
  • 20.10.3.2 SQL Views
  • 20.10.3.3 Label-Based Access Control
  • 20.10.4 Privacy Regulation
  • 20.11 Conclusion
  • Problems and Exercises
  • Appendix Using the Online Environment
  • How to Access the Online Environment
  • Environment: Relational Databases and SQL
  • Environment: MongoDB
  • Environment: Neo4j and Cypher
  • Environment: Tree Structure Visualizations
  • Environment: HBase
  • Glossary
  • Index
  • Endorsements

Additional information

Choose product

E-book to own, E-book rental for 180 days
