Description
Efnisyfirlit
- Title Page
- Copyright
- Contents
- 1 Data Warehousing, Business Intelligence, and Dimensional Modeling Primer
- Different Worlds of Data Capture and Data Analysis
- Goals of Data Warehousing and Business Intelligence
- Publishing Metaphor for DW/BI Managers
- Dimensional Modeling Introduction
- Star Schemas Versus OLAP Cubes
- Fact Tables for Measurements
- Dimension Tables for Descriptive Context
- Facts and Dimensions Joined in a Star Schema
- Kimball’s DW/BI Architecture
- Operational Source Systems
- Extract, Transformation, and Load System
- Presentation Area to Support Business Intelligence
- Business Intelligence Applications
- Restaurant Metaphor for the Kimball Architecture
- Alternative DW/BI Architectures
- Independent Data Mart Architecture
- Hub-and-Spoke Corporate Information Factory Inmon Architecture
- Hybrid Hub-and-Spoke and Kimball Architecture
- Dimensional Modeling Myths
- Myth 1: Dimensional Models are Only for Summary Data
- Myth 2: Dimensional Models are Departmental, Not Enterprise
- Myth 3: Dimensional Models are Not Scalable
- Myth 4: Dimensional Models are Only for Predictable Usage
- Myth 5: Dimensional Models Can’t Be Integrated
- More Reasons to Think Dimensionally
- Agile Considerations
- Summary
- 2 Kimball Dimensional Modeling Techniques Overview
- Fundamental Concepts
- Gather Business Requirements and Data Realities
- Collaborative Dimensional Modeling Workshops
- Four-Step Dimensional Design Process
- Business Processes
- Grain
- Dimensions for Descriptive Context
- Facts for Measurements
- Star Schemas and OLAP Cubes
- Graceful Extensions to Dimensional Models
- Basic Fact Table Techniques
- Fact Table Structure
- Additive, Semi-Additive, Non-Additive Facts
- Nulls in Fact Tables
- Conformed Facts
- Transaction Fact Tables
- Periodic Snapshot Fact Tables
- Accumulating Snapshot Fact Tables
- Factless Fact Tables
- Aggregate Fact Tables or OLAP Cubes
- Consolidated Fact Tables
- Basic Dimension Table Techniques
- Dimension Table Structure
- Dimension Surrogate Keys
- Natural, Durable, and Supernatural Keys
- Drilling Down
- Degenerate Dimensions
- Denormalized Flattened Dimensions
- Multiple Hierarchies in Dimensions
- Flags and Indicators as Textual Attributes
- Null Attributes in Dimensions
- Calendar Date Dimensions
- Role-Playing Dimensions
- Junk Dimensions
- Snowflaked Dimensions
- Outrigger Dimensions
- Integration via Conformed Dimensions
- Conformed Dimensions
- Shrunken Dimensions
- Drilling Across
- Value Chain
- Enterprise Data Warehouse Bus Architecture
- Enterprise Data Warehouse Bus Matrix
- Detailed Implementation Bus Matrix
- Opportunity/Stakeholder Matrix
- Dealing with Slowly Changing Dimension Attributes
- Type 0: Retain Original
- Type 1: Overwrite
- Type 2: Add New Row
- Type 3: Add New Attribute
- Type 4: Add Mini-Dimension
- Type 5: Add Mini-Dimension and Type 1 Outrigger
- Type 6: Add Type 1 Attributes to Type 2 Dimension
- Type 7: Dual Type 1 and Type 2 Dimensions
- Dealing with Dimension Hierarchies
- Fixed Depth Positional Hierarchies
- Slightly Ragged/Variable Depth Hierarchies
- Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables
- Ragged/Variable Depth Hierarchies with Pathstring Attributes
- Advanced Fact Table Techniques
- Fact Table Surrogate Keys
- Centipede Fact Tables
- Numeric Values as Attributes or Facts
- Lag/Duration Facts
- Header/Line Fact Tables
- Allocated Facts
- Profit and Loss Fact Tables Using Allocations
- Multiple Currency Facts
- Multiple Units of Measure Facts
- Year-to-Date Facts
- Multipass SQL to Avoid Fact-to-Fact Table Joins
- Timespan Tracking in Fact Tables
- Late Arriving Facts
- Advanced Dimension Techniques
- Dimension-to-Dimension Table Joins
- Multivalued Dimensions and Bridge Tables
- Time Varying Multivalued Bridge Tables
- Behavior Tag Time Series
- Behavior Study Groups
- Aggregated Facts as Dimension Attributes
- Dynamic Value Bands
- Text Comments Dimension
- Multiple Time Zones
- Measure Type Dimensions
- Step Dimensions
- Hot Swappable Dimensions
- Abstract Generic Dimensions
- Audit Dimensions
- Late Arriving Dimensions
- Special Purpose Schemas
- Supertype and Subtype Schemas for Heterogeneous Products
- Real-Time Fact Tables
- Error Event Schemas
- 3 Retail Sales
- Four-Step Dimensional Design Process
- Step 1: Select the Business Process
- Step 2: Declare the Grain
- Step 3: Identify the Dimensions
- Step 4: Identify the Facts
- Retail Case Study
- Step 1: Select the Business Process
- Step 2: Declare the Grain
- Step 3: Identify the Dimensions
- Step 4: Identify the Facts
- Dimension Table Details
- Date Dimension
- Product Dimension
- Store Dimension
- Promotion Dimension
- Other Retail Sales Dimensions
- Degenerate Dimensions for Transaction Numbers
- Retail Schema in Action
- Retail Schema Extensibility
- Factless Fact Tables
- Dimension and Fact Table Keys
- Dimension Table Surrogate Keys
- Dimension Natural and Durable Supernatural Keys
- Degenerate Dimension Surrogate Keys
- Date Dimension Smart Keys
- Fact Table Surrogate Keys
- Resisting Normalization Urges
- Snowflake Schemas with Normalized Dimensions
- Outriggers
- Centipede Fact Tables with Too Many Dimensions
- Summary
- 4 Inventory
- Value Chain Introduction
- Inventory Models
- Inventory Periodic Snapshot
- Inventory Transactions
- Inventory Accumulating Snapshot
- Fact Table Types
- Transaction Fact Tables
- Periodic Snapshot Fact Tables
- Accumulating Snapshot Fact Tables
- Complementary Fact Table Types
- Value Chain Integration
- Enterprise Data Warehouse Bus Architecture
- Understanding the Bus Architecture
- Enterprise Data Warehouse Bus Matrix
- Conformed Dimensions
- Drilling Across Fact Tables
- Identical Conformed Dimensions
- Shrunken Rollup Conformed Dimension with Attribute Subset
- Shrunken Conformed Dimension with Row Subset
- Shrunken Conformed Dimensions on the Bus Matrix
- Limited Conformity
- Importance of Data Governance and Stewardship
- Conformed Dimensions and the Agile Movement
- Conformed Facts
- Summary
- 5 Procurement
- Procurement Case Study
- Procurement Transactions and Bus Matrix
- Single Versus Multiple Transaction Fact Tables
- Complementary Procurement Snapshot
- Slowly Changing Dimension Basics
- Type 0: Retain Original
- Type 1: Overwrite
- Type 2: Add New Row
- Type 3: Add New Attribute
- Type 4: Add Mini-Dimension
- Hybrid Slowly Changing Dimension Techniques
- Type 5: Mini-Dimension and Type 1 Outrigger
- Type 6: Add Type 1 Attributes to Type 2 Dimension
- Type 7: Dual Type 1 and Type 2 Dimensions
- Slowly Changing Dimension Recap
- Summary
- 6 Order Management
- Order Management Bus Matrix
- Order Transactions
- Fact Normalization
- Dimension Role Playing
- Product Dimension Revisited
- Customer Dimension
- Deal Dimension
- Degenerate Dimension for Order Number
- Junk Dimensions
- Header/Line Pattern to Avoid
- Multiple Currencies
- Transaction Facts at Different Granularity
- Another Header/Line Pattern to Avoid
- Invoice Transactions
- Service Level Performance as Facts, Dimensions, or Both
- Profit and Loss Facts
- Audit Dimension
- Accumulating Snapshot for Order Fulfillment Pipeline
- Lag Calculations
- Multiple Units of Measure
- Beyond the Rearview Mirror
- Summary
- 7 Accounting
- Accounting Case Study and Bus Matrix
- General Ledger Data
- General Ledger Periodic Snapshot
- Chart of Accounts
- Period Close
- Year-to-Date Facts
- Multiple Currencies Revisited
- General Ledger Journal Transactions
- Multiple Fiscal Accounting Calendars
- Drilling Down Through a Multilevel Hierarchy
- Financial Statements
- Budgeting Process
- Dimension Attribute Hierarchies
- Fixed Depth Positional Hierarchies
- Slightly Ragged Variable Depth Hierarchies
- Ragged Variable Depth Hierarchies
- Shared Ownership in a Ragged Hierarchy
- Time Varying Ragged Hierarchies
- Modifying Ragged Hierarchies
- Alternative Ragged Hierarchy Modeling Approaches
- Advantages of the Bridge Table Approach for Ragged Hierarchies
- Consolidated Fact Tables
- Role of OLAP and Packaged Analytic Solutions
- Summary
- 8 Customer Relationship Management
- CRM Overview
- Operational and Analytic CRM
- Customer Dimension Attributes
- Name and Address Parsing
- International Name and Address Considerations
- Customer-Centric Dates
- Aggregated Facts as Dimension Attributes
- Segmentation Attributes and Scores
- Counts with Type 2 Dimension Changes
- Outrigger for Low Cardinality Attribute Set
- Customer Hierarchy Considerations
- Bridge Tables for Multivalued Dimensions
- Bridge Table for Sparse Attributes
- Bridge Table for Multiple Customer Contacts
- Complex Customer Behavior
- Behavior Study Groups for Cohorts
- Step Dimension for Sequential Behavior
- Timespan Fact Tables
- Tagging Fact Tables with Satisfaction Indicators
- Tagging Fact Tables with Abnormal Scenario Indicators
- Customer Data Integration Approaches
- Master Data Management Creating a Single Customer Dimension
- Partial Conformity of Multiple Customer Dimensions
- Avoiding Fact-to-Fact Table Joins
- Low Latency Reality Check
- Summary
- 9 Human Resources Management
- Employee Profile Tracking
- Precise Effective and Expiration Timespans
- Dimension Change Reason Tracking
- Profile Changes as Type 2 Attributes or Fact Events
- Headcount Periodic Snapshot
- Bus Matrix for HR Processes
- Packaged Analytic Solutions and Data Models
- Recursive Employee Hierarchies
- Change Tracking on Embedded Manager Key
- Drilling Up and Down Management Hierarchies
- Multivalued Skill Keyword Attributes
- Skill Keyword Bridge
- Skill Keyword Text String
- Survey Questionnaire Data
- Text Comments
- Summary
- 10 Financial Services
- Banking Case Study and Bus Matrix
- Dimension Triage to Avoid Too Few Dimensions
- Household Dimension
- Multivalued Dimensions and Weighting Factors
- Mini-Dimensions Revisited
- Adding a Mini-Dimension to a Bridge Table
- Dynamic Value Banding of Facts
- Supertype and Subtype Schemas for Heterogeneous Products
- Supertype and Subtype Products with Common Facts
- Hot Swappable Dimensions
- Summary
- 11 Telecommunications
- Telecommunications Case Study and Bus Matrix
- General Design Review Considerations
- Balance Business Requirements and Source Realities
- Focus on Business Processes
- Granularity
- Single Granularity for Facts
- Dimension Granularity and Hierarchies
- Date Dimension
- Degenerate Dimensions
- Surrogate Keys
- Dimension Decodes and Descriptions
- Conformity Commitment
- Design Review Guidelines
- Draft Design Exercise Discussion
- Remodeling Existing Data Structures
- Geographic Location Dimension
- Summary
- 12 Transportation
- Airline Case Study and Bus Matrix
- Multiple Fact Table Granularities
- Linking Segments into Trips
- Related Fact Tables
- Extensions to Other Industries
- Cargo Shipper
- Travel Services
- Combining Correlated Dimensions
- Class of Service
- Origin and Destination
- More Date and Time Considerations
- Country-Specific Calendars as Outriggers
- Date and Time in Multiple Time Zones
- Localization Recap
- Summary
- 13 Education
- University Case Study and Bus Matrix
- Accumulating Snapshot Fact Tables
- Applicant Pipeline
- Research Grant Proposal Pipeline
- Factless Fact Tables
- Admissions Events
- Course Registrations
- Facility Utilization
- Student Attendance
- More Educational Analytic Opportunities
- Summary
- 14 Healthcare
- Healthcare Case Study and Bus Matrix
- Claims Billing and Payments
- Date Dimension Role Playing
- Multivalued Diagnoses
- Supertypes and Subtypes for Charges
- Electronic Medical Records
- Measure Type Dimension for Sparse Facts
- Freeform Text Comments
- Images
- Facility/Equipment Inventory Utilization
- Dealing with Retroactive Changes
- Summary
- 15 Electronic Commerce
- Clickstream Source Data
- Clickstream Data Challenges
- Clickstream Dimensional Models
- Page Dimension
- Event Dimension
- Session Dimension
- Referral Dimension
- Clickstream Session Fact Table
- Clickstream Page Event Fact Table
- Step Dimension
- Aggregate Clickstream Fact Tables
- Google Analytics
- Integrating Clickstream into Web Retailer’s Bus Matrix
- Profitability Across Channels Including Web
- Summary
- 16 Insurance
- Insurance Case Study
- Insurance Value Chain
- Draft Bus Matrix
- Policy Transactions
- Dimension Role Playing
- Slowly Changing Dimensions
- Mini-Dimensions for Large or Rapidly Changing Dimensions
- Multivalued Dimension Attributes
- Numeric Attributes as Facts or Dimensions
- Degenerate Dimension
- Low Cardinality Dimension Tables
- Audit Dimension
- Policy Transaction Fact Table
- Heterogeneous Supertype and Subtype Products
- Complementary Policy Accumulating Snapshot
- Premium Periodic Snapshot
- Conformed Dimensions
- Conformed Facts
- Pay-in-Advance Facts
- Heterogeneous Supertypes and Subtypes Revisited
- Multivalued Dimensions Revisited
- More Insurance Case Study Background
- Updated Insurance Bus Matrix
- Detailed Implementation Bus Matrix
- Claim Transactions
- Transaction Versus Profile Junk Dimensions
- Claim Accumulating Snapshot
- Accumulating Snapshot for Complex Workflows
- Timespan Accumulating Snapshot
- Periodic Instead of Accumulating Snapshot
- Policy/Claim Consolidated Periodic Snapshot
- Factless Accident Events
- Common Dimensional Modeling Mistakes to Avoid
- Mistake 10: Place Text Attributes in a Fact Table
- Mistake 9: Limit Verbose Descriptors to Save Space
- Mistake 8: Split Hierarchies into Multiple Dimensions
- Mistake 7: Ignore the Need to Track Dimension Changes
- Mistake 6: Solve All Performance Problems with More Hardware
- Mistake 5: Use Operational Keys to Join Dimensions and Facts
- Mistake 4: Neglect to Declare and Comply with the Fact Grain
- Mistake 3: Use a Report to Design the Dimensional Model
- Mistake 2: Expect Users to Query Normalized Atomic Data
- Mistake 1: Fail to Conform Facts and Dimensions
- Summary
- 17 Kimball DW/BI Lifecycle Overview
- Lifecycle Roadmap
- Roadmap Mile Markers
- Lifecycle Technology Track
- Technical Architecture Design
- Product Selection and Installation
- Lifecycle Data Track
- Dimensional Modeling
- Physical Design
- ETL Design and Development
- Lifecycle BI Applications Track
- BI Application Specification
- BI Application Development
- Lifecycle Wrap-up Activities
- Deployment
- Maintenance and Growth
- Common Pitfalls to Avoid
- Summary
- 18 Dimensional Modeling Process and Tasks
- Modeling Process Overview
- Get Organized
- Identify Participants, Especially Business Representatives
- Review the Business Requirements
- Leverage a Modeling Tool
- Leverage a Data Profiling Tool
- Leverage or Establish Naming Conventions
- Coordinate Calendars and Facilities
- Design the Dimensional Model
- Reach Consensus on High-Level Bubble Chart
- Develop the Detailed Dimensional Model
- Review and Validate the Model
- Finalize the Design Documentation
- Summary
- 19 ETL Subsystems and Techniques
- Round Up the Requirements
- Business Needs
- Compliance
- Data Quality
- Security
- Data Integration
- Data Latency
- Archiving and Lineage
- BI Delivery Interfaces
- Available Skills
- Legacy Licenses
- The 34 Subsystems of ETL
- Extracting: Getting Data into the Data Warehouse
- Subsystem 1: Data Profiling
- Subsystem 2: Change Data Capture System
- Subsystem 3: Extract System
- Cleaning and Conforming Data
- Improving Data Quality Culture and Processes
- Subsystem 4: Data Cleansing System
- Subsystem 5: Error Event Schema
- Subsystem 6: Audit Dimension Assembler
- Subsystem 7: Deduplication System
- Subsystem 8: Conforming System
- Delivering: Prepare for Presentation
- Subsystem 9: Slowly Changing Dimension Manager
- Subsystem 10: Surrogate Key Generator
- Subsystem 11: Hierarchy Manager
- Subsystem 12: Special Dimensions Manager
- Subsystem 13: Fact Table Builders
- Subsystem 14: Surrogate Key Pipeline
- Subsystem 15: Multivalued Dimension Bridge Table Builder
- Subsystem 16: Late Arriving Data Handler
- Subsystem 17: Dimension Manager System
- Subsystem 18: Fact Provider System
- Subsystem 19: Aggregate Builder
- Subsystem 20: OLAP Cube Builder
- Subsystem 21: Data Propagation Manager
- Managing the ETL Environment
- Subsystem 22: Job Scheduler
- Subsystem 23: Backup System
- Subsystem 24: Recovery and Restart System
- Subsystem 25: Version Control System
- Subsystem 26: Version Migration System
- Subsystem 27: Workflow Monitor
- Subsystem 28: Sorting System
- Subsystem 29: Lineage and Dependency Analyzer
- Subsystem 30: Problem Escalation System
- Subsystem 31: Parallelizing/Pipelining System
- Subsystem 32: Security System
- Subsystem 33: Compliance Manager
- Subsystem 34: Metadata Repository Manager
- Summary
- 20 ETL System Design and Development Process and Tasks
- ETL Process Overview
- Develop the ETL Plan
- Step 1: Draw the High-Level Plan
- Step 2: Choose an ETL Tool
- Step 3: Develop Default Strategies
- Step 4: Drill Down by Target Table
- Develop the ETL Specification Document
- Develop One-Time Historic Load Processing
- Step 5: Populate Dimension Tables with Historic Data
- Step 6: Perform the Fact Table Historic Load
- Develop Incremental ETL Processing
- Step 7: Dimension Table Incremental Processing
- Step 8: Fact Table Incremental Processing
- Step 9: Aggregate Table and OLAP Loads
- Step 10: ETL System Operation and Automation
- Real-Time Implications
- Real-Time Triage
- Real-Time Architecture Trade-Offs
- Real-Time Partitions in the Presentation Server
- Summary
- 21 Big Data Analytics
- Big Data Overview
- Extended RDBMS Architecture
- MapReduce/Hadoop Architecture
- Comparison of Big Data Architectures
- Recommended Best Practices for Big Data
- Management Best Practices for Big Data
- Architecture Best Practices for Big Data
- Data Modeling Best Practices for Big Data
- Data Governance Best Practices for Big Data
- Summary
- Index
- Advertisement
Reviews
There are no reviews yet.