Description
Table of Contents
- Cover image
- Title page
- Table of Contents
- Inside Front Cover
- In Praise of Computer Architecture: A Quantitative Approach, Sixth Edition
- Copyright
- Dedication
- Foreword
- Preface
- Why We Wrote This Book
- This Edition
- Topic Selection and Organization
- An Overview of the Content
- Navigating the Text
- Chapter Structure
- Case Studies With Exercises
- Supplemental Materials
- Helping Improve This Book
- Concluding Remarks
- Acknowledgments
- 1. Fundamentals of Quantitative Design and Analysis
- Abstract
- 1.1 Introduction
- 1.2 Classes of Computers
- 1.3 Defining Computer Architecture
- 1.4 Trends in Technology
- 1.5 Trends in Power and Energy in Integrated Circuits
- 1.6 Trends in Cost
- 1.7 Dependability
- 1.8 Measuring, Reporting, and Summarizing Performance
- 1.9 Quantitative Principles of Computer Design
- 1.10 Putting It All Together: Performance, Price, and Power
- 1.11 Fallacies and Pitfalls
- 1.12 Concluding Remarks
- 1.13 Historical Perspectives and References
- Case Studies and Exercises by Diana Franklin
- References
- 2. Memory Hierarchy Design
- Abstract
- 2.1 Introduction
- 2.2 Memory Technology and Optimizations
- 2.3 Ten Advanced Optimizations of Cache Performance
- 2.4 Virtual Memory and Virtual Machines
- 2.5 Cross-Cutting Issues: The Design of Memory Hierarchies
- 2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700
- 2.7 Fallacies and Pitfalls
- 2.8 Concluding Remarks: Looking Ahead
- 2.9 Historical Perspectives and References
- Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li
- References
- 3. Instruction-Level Parallelism and Its Exploitation
- Abstract
- 3.1 Instruction-Level Parallelism: Concepts and Challenges
- 3.2 Basic Compiler Techniques for Exposing ILP
- 3.3 Reducing Branch Costs With Advanced Branch Prediction
- 3.4 Overcoming Data Hazards With Dynamic Scheduling
- 3.5 Dynamic Scheduling: Examples and the Algorithm
- 3.6 Hardware-Based Speculation
- 3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
- 3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
- 3.9 Advanced Techniques for Instruction Delivery and Speculation
- 3.10 Cross-Cutting Issues
- 3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
- 3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
- 3.13 Fallacies and Pitfalls
- 3.14 Concluding Remarks: What’s Ahead?
- 3.15 Historical Perspective and References
- Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
- References
- 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
- Abstract
- 4.1 Introduction
- 4.2 Vector Architecture
- 4.3 SIMD Instruction Set Extensions for Multimedia
- 4.4 Graphics Processing Units
- 4.5 Detecting and Enhancing Loop-Level Parallelism
- 4.6 Cross-Cutting Issues
- 4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7
- 4.8 Fallacies and Pitfalls
- 4.9 Concluding Remarks
- 4.10 Historical Perspective and References
- Case Study and Exercises by Jason D. Bakos
- References
- 5. Thread-Level Parallelism
- Abstract
- 5.1 Introduction
- 5.2 Centralized Shared-Memory Architectures
- 5.3 Performance of Symmetric Shared-Memory Multiprocessors
- 5.4 Distributed Shared-Memory and Directory-Based Coherence
- 5.5 Synchronization: The Basics
- 5.6 Models of Memory Consistency: An Introduction
- 5.7 Cross-Cutting Issues
- 5.8 Putting It All Together: Multicore Processors and Their Performance
- 5.9 Fallacies and Pitfalls
- 5.10 The Future of Multicore Scaling
- 5.11 Concluding Remarks
- 5.12 Historical Perspectives and References
- Case Studies and Exercises by Amr Zaky and David A. Wood
- References
- 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
- Abstract
- 6.1 Introduction
- 6.2 Programming Models and Workloads for Warehouse-Scale Computers
- 6.3 Computer Architecture of Warehouse-Scale Computers
- 6.4 The Efficiency and Cost of Warehouse-Scale Computers
- 6.5 Cloud Computing: The Return of Utility Computing
- 6.6 Cross-Cutting Issues
- 6.7 Putting It All Together: A Google Warehouse-Scale Computer
- 6.8 Fallacies and Pitfalls
- 6.9 Concluding Remarks
- 6.10 Historical Perspectives and References
- Case Studies and Exercises by Parthasarathy Ranganathan
- References
- 7. Domain-Specific Architectures
- Abstract
- 7.1 Introduction
- 7.2 Guidelines for DSAs
- 7.3 Example Domain: Deep Neural Networks
- 7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator
- 7.5 Microsoft Catapult, a Flexible Data Center Accelerator
- 7.6 Intel Crest, a Data Center Accelerator for Training
- 7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit
- 7.8 Cross-Cutting Issues
- 7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators
- 7.10 Fallacies and Pitfalls
- 7.11 Concluding Remarks
- 7.12 Historical Perspectives and References
- Case Studies and Exercises by Cliff Young
- References
- Appendix A. Instruction Set Principles
- Abstract
- A.1 Introduction
- A.2 Classifying Instruction Set Architectures
- A.3 Memory Addressing
- A.4 Type and Size of Operands
- A.5 Operations in the Instruction Set
- A.6 Instructions for Control Flow
- A.7 Encoding an Instruction Set
- A.8 Cross-Cutting Issues: The Role of Compilers
- A.9 Putting It All Together: The RISC-V Architecture
- A.10 Fallacies and Pitfalls
- References
- Appendix B. Review of Memory Hierarchy
- Abstract
- B.1 Introduction
- B.2 Cache Performance
- B.3 Six Basic Cache Optimizations
- B.4 Virtual Memory
- B.5 Protection and Examples of Virtual Memory
- B.6 Fallacies and Pitfalls
- B.7 Concluding Remarks
- B.8 Historical Perspective and References
- Exercises by Amr Zaky
- References
- Appendix C. Pipelining: Basic and Intermediate Concepts
- Abstract
- C.1 Introduction
- C.2 The Major Hurdle of Pipelining—Pipeline Hazards
- C.3 How Is Pipelining Implemented?
- C.4 What Makes Pipelining Hard to Implement?
- C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations
- C.6 Putting It All Together: The MIPS R4000 Pipeline
- C.7 Cross-Cutting Issues
- C.8 Fallacies and Pitfalls
- C.9 Concluding Remarks
- C.10 Historical Perspective and References
- Updated Exercises by Diana Franklin
- References
- Appendix D. Storage Systems
- D.1 Introduction
- D.2 Advanced Topics in Disk Storage
- D.3 Definition and Examples of Real Faults and Failures
- D.4 I/O Performance, Reliability Measures, and Benchmarks
- D.5 A Little Queuing Theory
- D.6 Crosscutting Issues
- D.7 Designing and Evaluating an I/O System—The Internet Archive Cluster
- D.8 Putting It All Together: NetApp FAS6000 Filer
- D.9 Fallacies and Pitfalls
- D.10 Concluding Remarks
- D.11 Historical Perspective and References
- Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
- References
- Appendix E. Embedded Systems
- E.1 Introduction
- E.2 Signal Processing and Embedded Applications: The Digital Signal Processor
- E.3 Embedded Benchmarks
- E.4 Embedded Multiprocessors
- E.5 Case Study: The Emotion Engine of the Sony PlayStation 2
- E.6 Case Study: Sanyo VPC-SX500 Digital Camera
- E.7 Case Study: Inside a Cell Phone
- E.8 Concluding Remarks
- References
- Appendix F. Interconnection Networks
- F.1 Introduction
- F.2 Interconnecting Two Devices
- F.3 Connecting More than Two Devices
- F.4 Network Topology
- F.5 Network Routing, Arbitration, and Switching
- F.6 Switch Microarchitecture
- F.7 Practical Issues for Commercial Interconnection Networks
- F.8 Examples of Interconnection Networks
- F.9 Internetworking
- F.10 Crosscutting Issues for Interconnection Networks
- F.11 Fallacies and Pitfalls
- F.12 Concluding Remarks
- F.13 Historical Perspective and References
- Exercises
- References
- Appendix G. Vector Processors in More Depth
- G.1 Introduction
- G.2 Vector Performance in More Depth
- G.3 Vector Memory Systems in More Depth
- G.4 Enhancing Vector Performance
- G.5 Effectiveness of Compiler Vectorization
- G.6 Putting It All Together: Performance of Vector Processors
- G.7 A Modern Vector Supercomputer: The Cray X1
- G.8 Concluding Remarks
- G.9 Historical Perspective and References
- Exercises
- References
- Appendix H. Hardware and Software for VLIW and EPIC
- H.1 Introduction: Exploiting Instruction-Level Parallelism Statically
- H.2 Detecting and Enhancing Loop-Level Parallelism
- H.3 Scheduling and Structuring Code for Parallelism
- H.4 Hardware Support for Exposing Parallelism: Predicated Instructions
- H.5 Hardware Support for Compiler Speculation
- H.6 The Intel IA-64 Architecture and Itanium Processor
- H.7 Concluding Remarks
- Reference
- Appendix I. Large-Scale Multiprocessors and Scientific Applications
- I.1 Introduction
- I.2 Interprocessor Communication: The Critical Performance Issue
- I.3 Characteristics of Scientific Applications
- I.4 Synchronization: Scaling Up
- I.5 Performance of Scientific Applications on Shared-Memory Multiprocessors
- I.6 Performance Measurement of Parallel Processors with Scientific Applications
- I.7 Implementing Cache Coherence
- I.8 The Custom Cluster Approach: Blue Gene/L
- I.9 Concluding Remarks
- References
- Appendix J. Computer Arithmetic
- J.1 Introduction
- J.2 Basic Techniques of Integer Arithmetic
- J.3 Floating Point
- J.4 Floating-Point Multiplication
- J.5 Floating-Point Addition
- J.6 Division and Remainder
- J.7 More on Floating-Point Arithmetic
- J.8 Speeding Up Integer Addition
- J.9 Speeding Up Integer Multiplication and Division
- J.10 Putting It All Together
- J.11 Fallacies and Pitfalls
- J.12 Historical Perspective and References
- Exercises
- References
- Appendix K. Survey of Instruction Set Architectures
- K.1 Introduction
- K.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
- K.3 The Intel 80x86
- K.4 The VAX Architecture
- K.5 The IBM 360/370 Architecture for Mainframe Computers
- K.6 Historical Perspective and References
- Acknowledgments
- Appendix L. Advanced Concepts on Address Translation
- Appendix M. Historical Perspectives and References
- M.1 Introduction
- M.2 The Early Development of Computers (Chapter 1)
- References
- M.3 The Development of Memory Hierarchy and Protection (Chapter 2 and Appendix B)
- References
- M.4 The Evolution of Instruction Sets (Appendices A, J, and K)
- References
- M.5 The Development of Pipelining and Instruction-Level Parallelism (Chapter 3 and Appendices C and H)
- References
- M.6 The Development of SIMD Supercomputers, Vector Computers, Multimedia SIMD Instruction Extensions, and Graphical Processor Units (Chapter 4)
- References
- M.7 The History of Multiprocessors and Parallel Processing (Chapter 5 and Appendices F, G, and I)
- References
- M.8 The Development of Clusters (Chapter 6)
- References
- M.9 Historical Perspectives and References
- References
- M.10 The History of Magnetic Storage, RAID, and I/O Buses (Appendix D)
- References
- Index
- Back End Sheet
- Inside Back Cover