1 Parallel Architectures and Compilers
1.1 Introduction
1.2 Book Overview
1.3 Vector and Parallel Machines
1.4 Parallelism in Programs
1.4.1 Sources and Types of Parallelism
1.4.2 Expressing Parallelism in Programs
1.4.3 Utilization of Parallelism
1.4.4 The Role of the Compiler
1.5 Basic Concepts and Definitions
2 Program Restructuring for Parallel Execution
2.1 Data Dependences
2.2 Common Optimizations
2.2.1 Induction Variable Substitution
2.2.2 Index Recurrences
2.2.3 Loop Unrolling
2.2.4 Constant Propagation and Expression Evaluation
2.3 Transformations for Vector/Parallel Loops
2.3.1 Loop Vectorization and Loop Distribution
2.3.2 Loop Interchange
2.3.3 Node Splitting and Statement Reordering
2.3.4 Code Reordering for Minimizing Communication
2.3.5 Loop Blocking
2.4 Cycle Shrinking
2.4.1 Simple Loops
2.4.2 Complex Loops
2.4.3 Computing Dependence Distances
2.4.4 Cycle Shrinking vs. Partition
2.4.5 The Cost of Barriers in Cycle Shrinking
2.5 Loop Spreading
2.5.1 The Transformation for Chains of Loops
2.5.2 Generalizations
2.6 Loop Coalescing
2.6.1 Processor Assignment and Subscript Calculation
2.6.2 Hybrid Loops
2.6.3 Non-Perfectly Nested Loops, One-Way Nesting
2.6.4 Multiway Nested Loops
2.7 Run-Time Dependence Testing
2.7.1 The RDC Transformation for Nonsubscripted Subscripts
2.8 Subscript Blocking
2.8.1 The Problem of Subscripted Subscripts and its Application
2.8.2 The Transformation
2.8.3 Recurrences with Subscripted Subscripts
2.8.4 Multiply Nested Loops
2.8.5 Expected Speedup
2.9 Future Directions
3 A Comprehensive Environment for Automatic Packaging and Scheduling of Parallelism
3.1 Introduction
3.1.1 Previous Approaches & Current Parallel Machines
3.1.2 Modes of Operation and Importance of Scheduling
3.1.3 Sophisticated vs. Simple & Static vs. Dynamic
3.1.4 Scheduling Goals: Overhead & Load Balancing
3.1.5 Granularity and Partitioning
3.1.6 Synchronization and Precedence Relations
3.2 A Comprehensive Approach to Scheduling
3.2.1 Partitioning
3.2.2 Pre-Scheduling
3.2.3 Run-Time Task Scheduling
3.2.4 Loop Scheduling
3.2.5 Beyond Loops: Scalar, Vector, and VLIW Processors
3.3 Auto-Scheduling Compilers
3.3.1 Possible Advantages & Disadvantages
3.3.2 A General Framework
3.3.3 Enforcing Task Execution Order
3.3.4 Optimizing the Drive Code
4 Static and Dynamic Loop Scheduling
4.1 Introduction
4.1.1 Parallel Loops
4.1.2 Self-Scheduling Through Implicit Coalescing
4.2 The Guided Self-Scheduling (GSS(k)) Algorithm
4.2.1 Implicit Loop Coalescing and Interchange
4.2.2 The Scheduling Algorithm
4.2.3 Further Reduction of Synchronization Operations
4.3 Simulation Results
4.3.1 The Simulator
4.3.2 Experiments
4.4 Static Loop Scheduling
4.4.1 Definitions and Basic Concepts
4.4.2 Optimal Processor Assignment to Parallel Loops
4.4.3 Experiments
5 Run-Time Overhead
5.1 Introduction
5.2 Bounds for Dynamic Loop Scheduling
5.3 Overhead of Parallel Tasks
5.4 Two Run-Time Overhead Models
5.4.1 Run-Time Overhead is O(p)
5.4.2 Run-Time Overhead is O(log p)
5.4.3 Measurements
5.5 Deciding the Minimum Unit of Allocation
6 Static Program Partitioning
6.1 Introduction
6.1.1 Communication and Parallelism Trade-offs
6.1.2 More on Communication and Partitioning
6.2 Methods for Program Partitioning
6.2.1 A Model for Quantifying Communication
6.3 Optimal Task Composition for Chains
6.4 Details of Interprocessor Communication
7 Static Task Scheduling
7.1 Introduction
7.2 Optimal Allocations for High Level Spreading
7.3 Scheduling Independent Serial Tasks
7.4 High Level Spreading for Complete Task Graphs
7.4.1 Processor Allocation for p-Wide Task Graphs
7.4.2 List Scheduling Heuristics
7.4.3 The Weighted Priority Heuristic Algorithm
7.5 Bounds for Static Scheduling
8 Speedup Bounds for Parallel Programs
8.1 Introduction
8.2 General Bounds on Speedup
8.3 Speedup Measures for Task Graphs
8.4 Speedup Measures for Doacross Loops
8.5 Multiprocessors vs. Vector/Array Machines
References