Programming Massively Parallel Processors (Reprint Edition) / Renowned Foreign Textbooks for University Computer Education Series, Kirk (USA), Tsinghua University Press

Preface

Acknowledgments

Dedication

CHAPTER 1 INTRODUCTION

1.1 GPUs as Parallel Computers

1.2 Architecture of a Modern GPU

1.3 Why More Speed or Parallelism?

1.4 Parallel Programming Languages and Models

1.5 Overarching Goals

1.6 Organization of the Book

CHAPTER 2 HISTORY OF GPU COMPUTING

2.1 Evolution of Graphics Pipelines

2.1.1 The Era of Fixed-Function Graphics Pipelines

2.1.2 Evolution of Programmable Real-Time Graphics

2.1.3 Unified Graphics and Computing Processors

2.1.4 GPGPU: An Intermediate Step

2.2 GPU Computing

2.2.1 Scalable GPUs

2.2.2 Recent Developments

2.3 Future Trends

CHAPTER 3 INTRODUCTION TO CUDA

3.1 Data Parallelism

3.2 CUDA Program Structure

3.3 A Matrix-Matrix Multiplication Example

3.4 Device Memories and Data Transfer

3.5 Kernel Functions and Threading

3.6 Summary

3.6.1 Function declarations

3.6.2 Kernel launch

3.6.3 Predefined variables

3.6.4 Runtime API

CHAPTER 4 CUDA THREADS

4.1 CUDA Thread Organization

4.2 blockIdx and threadIdx

4.3 Synchronization and Transparent Scalability

4.4 Thread Assignment

4.5 Thread Scheduling and Latency Tolerance

4.6 Summary

4.7 Exercises

CHAPTER 5 CUDA™ MEMORIES

5.1 Importance of Memory Access Efficiency

5.2 CUDA Device Memory Types

5.3 A Strategy for Reducing Global Memory Traffic

5.4 Memory as a Limiting Factor to Parallelism

5.5 Summary

5.6 Exercises

CHAPTER 6 PERFORMANCE CONSIDERATIONS

6.1 More on Thread Execution

6.2 Global Memory Bandwidth

6.3 Dynamic Partitioning of SM Resources

6.4 Data Prefetching

6.5 Instruction Mix

6.6 Thread Granularity

6.7 Measured Performance and Summary

6.8 Exercises

CHAPTER 7 FLOATING POINT CONSIDERATIONS

7.1 Floating-Point Format

7.1.1 Normalized Representation of M

7.1.2 Excess Encoding of E

7.2 Representable Numbers

7.3 Special Bit Patterns and Precision

7.4 Arithmetic Accuracy and Rounding

7.5 Algorithm Considerations

7.6 Summary

7.7 Exercises

CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION

8.1 Application Background

8.2 Iterative Reconstruction

8.3 Computing F^Hd

Step 1. Determine the Kernel Parallelism Structure

Step 2. Getting Around the Memory Bandwidth Limitation

Step 3. Using Hardware Trigonometry Functions

Step 4. Experimental Performance Tuning

8.4 Final Evaluation

8.5 Exercises

CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS

9.1 Application Background

9.2 A Simple Kernel Implementation

9.3 Instruction Execution Efficiency

9.4 Memory Coalescing

9.5 Additional Performance Comparisons

9.6 Using Multiple GPUs

9.7 Exercises

CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING

10.1 Goals of Parallel Programming

10.2 Problem Decomposition

10.3 Algorithm Selection

10.4 Computational Thinking

10.5 Exercises

CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™

11.1 Background

11.2 Data Parallelism Model

11.3 Device Architecture

11.4 Kernel Functions

11.5 Device Management and Kernel Launch

11.6 Electrostatic Potential Map in OpenCL

11.7 Summary

11.8 Exercises

CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK

12.1 Goals Revisited

12.2 Memory Architecture Evolution

12.2.1 Large Virtual and Physical Address Spaces

12.2.2 Unified Device Memory Space

12.2.3 Configurable Caching and Scratch Pad

12.2.4 Enhanced Atomic Operations

12.2.5 Enhanced Global Memory Access

12.3 Kernel Execution Control Evolution

12.3.1 Function Calls within Kernel Functions

12.3.2 Exception Handling in Kernel Functions

12.3.3 Simultaneous Execution of Multiple Kernels

12.3.4 Interruptible Kernels

12.4 Core Performance

12.4.1 Double-Precision Speed

12.4.2 Better Control Flow Efficiency

12.5 Programming Environment

12.6 A Bright Outlook

APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE

A.1 matrixmul.cu

A.2 matrixmul_gold.cpp

A.3 matrixmul.h

A.4 assist.h

A.5 Expected Output

APPENDIX B GPU COMPUTE CAPABILITIES

B.1 GPU Compute Capability Tables

B.2 Memory Coalescing Variations

Index

Book	Programming Massively Parallel Processors (Reprint Edition) / Renowned Foreign Textbooks for University Computer Education Series
Content	Editor's recommendation: This book introduces the ideas of parallel computing so that readers can carry this way of thinking about problems into high-performance parallel computing. It introduces CUDA, a software development tool created by NVIDIA specifically for massively parallel environments, and shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability. Synopsis: This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the techniques used to build parallel programs, and uses case studies to demonstrate the whole development process of parallel programming, from the ideas of parallel computing through to the final implementation of practical and efficient parallel programs.
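Title	Programming Massively Parallel Processors (Reprint Edition) / Renowned Foreign Textbooks for University Computer Education Series
Author	Kirk (USA)
Publisher	Tsinghua University Press
ISBN	9787302229735
Format	16mo
Pages	258
Edition	1
Binding	Paperback
Publication date	2010-07-01
First edition date	2010-07-01
Printing date	2010-07-01
Text language	English
Audience	Young people (ages 14-20), researchers, general adult readers
Distribution scope	Public
Distribution mode	Physical book
Weight	0.374
Chinese Library Classification	TP311.11
Printed sheets	17.5
Impression	1
Place of publication	Beijing
Length	230
Width	185
Height	13
Medium	Book
Paper	Ordinary paper
Phonetic annotation	No
Reprint version	Original edition
Publisher country	CN
Set or single volume	Single volume
Print run	3000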