The programming guide to using the CUDA Toolkit to obtain the best performance from NVIDIA GPUs.

This Best Practices Guide is a manual to help developers obtain the best performance from NVIDIA® CUDA® GPUs. It presents established parallelization and optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for CUDA-capable GPU architectures.

While the contents can be used as a reference manual, you should be aware that some topics are revisited in different contexts as various programming and configuration topics are explored. As a result, it is recommended that first-time readers proceed through the guide sequentially. This approach will greatly improve your understanding of effective programming practices and enable you to better use the guide for reference later.

The discussions in this guide all use the C++ programming language, so you should be comfortable reading C++ code. This guide refers to and relies on several other documents that you should have at your disposal for reference, all of which are available at no cost from the CUDA website. In particular, the optimization section of this guide assumes that you have already successfully downloaded and installed the CUDA Toolkit (if not, please refer to the relevant CUDA Installation Guide for your platform) and that you have a basic familiarity with the CUDA C++ programming language and environment (if not, please refer to the CUDA C++ Programming Guide).

This guide introduces the Assess, Parallelize, Optimize, Deploy (APOD) design cycle for applications with the goal of helping application developers to rapidly identify the portions of their code that would most readily benefit from GPU acceleration, rapidly realize that benefit, and begin leveraging the resulting speedups in production as early as possible. APOD is a cyclical process: initial speedups can be achieved, tested, and deployed with only minimal initial investment of time, at which point the cycle can begin again by identifying further optimization opportunities, seeing additional speedups, and then deploying the even faster versions of the application into production.

Assess

For an existing project, the first step is to assess the application to locate the parts of the code that are responsible for the bulk of the execution time. Armed with this knowledge, the developer can evaluate these bottlenecks for parallelization and start to investigate GPU acceleration.

By understanding the end-user's requirements and constraints and by applying Amdahl's and Gustafson's laws, the developer can determine the upper bound of performance improvement from acceleration of the identified portions of the application. (Amdahl's law, for example, bounds the overall speedup at S = 1 / ((1 - P) + P/N), where P is the fraction of total execution time spent in the code that can be parallelized and N is the speedup achieved on that portion.)

Parallelize

Having identified the hotspots and having done the basic exercises to set goals and expectations, the developer needs to parallelize the code. Depending on the original code, this can be as simple as calling into an existing GPU-optimized library such as cuBLAS, cuFFT, or Thrust, or it could be as simple as adding a few preprocessor directives as hints to a parallelizing compiler.

On the other hand, some applications' designs will require some amount of refactoring to expose their inherent parallelism. As even CPU architectures will require exposing parallelism in order to improve or simply maintain the performance of sequential applications, the CUDA family of parallel programming languages (CUDA C++, CUDA Fortran, etc.) aims to make the expression of this parallelism as simple as possible, while simultaneously enabling operation on CUDA-capable GPUs designed for maximum parallel throughput.
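As a concrete illustration of the library route (this sketch is not part of the guide itself; the functor name `saxpy_op` and the problem size are placeholder choices), the following program performs a SAXPY update through a single Thrust call, so the data-parallel loop is expressed without writing a kernel by hand:

```cpp
// Minimal sketch: SAXPY (y = a*x + y) on the GPU via a single Thrust library call.
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdio>

// Binary functor applied element-wise on the device.
struct saxpy_op {
    float a;
    __host__ __device__ float operator()(float x, float y) const { return a * x + y; }
};

int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f);   // data lives in GPU memory
    thrust::device_vector<float> y(n, 2.0f);

    // One library call expresses the parallel loop; Thrust generates and launches the kernel.
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy_op{2.0f});

    std::printf("y[0] = %f\n", static_cast<float>(y[0]));   // expect 4.0
    return 0;
}
```

Compiled with nvcc, the single `thrust::transform` call is the sense in which parallelization can be "as simple as calling into an existing GPU-optimized library."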
Optimize

After each round of application parallelization is complete, the developer can move to optimizing the implementation to improve performance. Since there are many possible optimizations that can be considered, having a good understanding of the needs of the application can help to make the process as smooth as possible. However, as with APOD as a whole, program optimization is an iterative process (identify an opportunity for optimization, apply and test the optimization, verify the speedup achieved, and repeat), meaning that it is not necessary for a programmer to spend large amounts of time memorizing the bulk of all possible optimization strategies prior to seeing good speedups. Instead, strategies can be applied incrementally as they are learned.

Optimizations can be applied at various levels, from overlapping data transfers with computation all the way down to fine-tuning floating-point operation sequences.
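To make the high end of that range concrete, here is a minimal, illustrative sketch (not taken from the guide) of overlapping host-device transfers with kernel execution by splitting the work across two CUDA streams; the `scale` kernel, the problem size, and the launch configuration are placeholder choices:

```cpp
// Minimal sketch: overlap copies and computation using two CUDA streams
// and pinned host memory, so one chunk's transfer can overlap another's kernel.
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 22;
    const int nStreams = 2;
    const int chunk = n / nStreams;

    float* h_data;   // pinned host memory is required for truly asynchronous copies
    cudaMallocHost((void**)&h_data, n * sizeof(float));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float* d_data;
    cudaMalloc((void**)&d_data, n * sizeof(float));

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, processes it, and copies it back.
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(d_data + offset, h_data + offset, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_data + offset, chunk, 2.0f);
        cudaMemcpyAsync(h_data + offset, d_data + offset, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```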
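At the other end of the spectrum, a small illustrative sketch (again, not from the guide; the kernel and its parameters are hypothetical) of floating-point fine-tuning in device code, expressing a multiply-add explicitly as a single fused multiply-add:

```cpp
// Minimal sketch: spelling out a fused multiply-add (one rounding step) with fmaf.
__global__ void axpy_fma(const float* x, const float* y, float* out, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The compiler will often contract a*x + y into an FMA automatically;
        // writing fmaf() makes the intent (and the rounding behavior) explicit.
        out[i] = fmaf(a, x[i], y[i]);
    }
}
```

Whether such a change actually helps depends on what the compiler already emits, which is why each optimization should be applied, tested, and measured, as the APOD cycle prescribes.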