Invited Program.- Embedded Systems as Datacenters.- Larrabee: A Many-Core Intel Architecture for Visual Computing.- Architectural Support for Concurrency.- Remote Store Programming.- Low-Overhead, High-Speed Multi-core Barrier Synchronization.- Improving Performance by Reducing Aborts in Hardware Transactional Memory.- Energy and Throughput Efficient Transactional Memory for Embedded Multicore Systems.- Compilation and Runtime Systems.- Split Register Allocation: Linear Complexity Without the Performance Penalty.- Trace-Based Data Layout Optimizations for Multi-core Processors.- Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors.- Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures.- Reconfigurable and Customized Architectures.- Virtual Ways: Efficient Coherence for Architecturally Visible Storage in Automatic Instruction Set Extensions.- Accelerating XML Query Matching through Custom Stack Generation on FPGAs.- An Application-Aware Load Balancing Strategy for Network Processors.- Memory-Aware Application Mapping on Coarse-Grained Reconfigurable Arrays.- Multicore Efficiency, Reliability, and Power.- Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors.- Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors.- RELOCATE: Register File Local Access Pattern Redistribution Mechanism for Power and Thermal Management in Out-of-Order Embedded Processor.- Performance and Power Aware CMP Thread Allocation Modeling.- Memory Organization and Optimization.- Multi-level Hardware Prefetching Using Low Complexity Delta Correlating Prediction Tables with Partial Matching.- Scalable Shared-Cache Management by Containing Thrashing Workloads.- SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs.- DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems.- Programming and Analysis of Accelerators.- Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor.- Analysis of Task Offloading for Accelerators.- Offload – Automating Code Migration to Heterogeneous Multicore Systems.- Computer Generation of Efficient Software Viterbi Decoders.