The Model of Computation of CUDA and its Formal Semantics
Axel Habermaier
We formalize the model of computation of modern graphics cards based on the specification
of Nvidia’s Compute Unified Device Architecture (CUDA). CUDA programs are executed by
thousands of threads concurrently and have access to several different types of memory with
unique access patterns and latencies. The underlying hardware uses a single instruction,
multiple threads execution model that groups threads into warps. All threads of the same
warp execute the program in lockstep. If threads of the same warp execute a data-dependent
control flow instruction, control flow might diverge and the different execution paths are
executed sequentially. Once all paths complete execution, all threads are executed in parallel
again. An operational semantics of a significant subset of CUDA’s memory operations and
programming instructions is presented, including shared and non-shared memory operations, atomic operations on shared memory, cached memory access, recursive function calls,
control flow instructions, and thread synchronization.
Based on this formalization we prove that CUDA’s single instruction, multiple threads
execution model is safe: For all threads it is provably true that a thread only executes the
instructions it is allowed to execute, that it does not continue execution after processing the
last instruction of the program, and that it does not skip any instructions it should have
executed. On the other hand, we demonstrate that CUDA’s inability to handle control flow
instructions individually for each thread can cause unexpected program behavior in the
sense that a liveness property is violated.
erschienen 17.10.2011
in: Augsburg
Technical Report 2011-14, Institute of Computer Science, University of Augsburg
