# Compile CUDA with SCALE
This guide covers the steps required to compile an existing CUDA project for an AMD GPU using SCALE.
SCALE creates directories that aim to impersonate the NVIDIA CUDA Toolkit (from the point of view of your build system). Compilation with SCALE is therefore a matter of telling your build system that the CUDA installation path is one provided by SCALE, rather than the one provided by NVIDIA.
## Install SCALE
Install SCALE, if you haven't already.
## Identifying GPU Target
If you don't already know which AMD GPU target you need to compile for, you can use the `scaleinfo` command provided by SCALE to find out:
scaleinfo
Example output:
Found 1 CUDA devices
Device 0 (00:23:00.0): AMD Radeon Pro W6800 - gfx1030 (AMD) <amdgcn-amd-amdhsa--gfx1030>
Total memory: 29.984375 GB [32195477504 B]
Free memory: 29.570312 GB [31750881280 B]
Warp size: 32
Maximum threads per block: 1024
Maximum threads per multiprocessor: 2048
Multiprocessor count: 30
Maximum block dimensions: 1024x1024x1024
Maximum grid dimensions: 2147483647x4194303x4194303
Global memory size: 29.984375 GB [32195477504 B]
Shared memory size: 64.000000 kB [65536 B]
Constant memory size: 2.000000 GB [2147483647 B]
Clock rate: 2555000 kHz
Memory clock rate: 1000000 kHz
In this example, the GPU target ID is `gfx1030`.
If your GPU is not listed in the output of this command, it is not currently supported by SCALE.
If the `scaleinfo` command is not found, ensure that `<SCALE install path>/bin` is in `PATH`.
## Point your build system at SCALE
To allow compilation without build system changes, SCALE provides a series of directories that are recognised by build systems as being CUDA Toolkit installations. One such directory is provided for each supported AMD GPU target. These directories can be found at `<SCALE install path>/targets/gfxXXXX`, where `gfxXXXX` is the name of an AMD GPU target, such as `gfx1030`.
You must tell your build system to use the "CUDA Toolkit" corresponding to the desired AMD GPU target. For example: to build for `gfx1030`, you would tell your build system that CUDA is installed at `<SCALE install path>/targets/gfx1030`.
The remainder of this document assumes that `SCALE_PATH` is an environment variable you have set to such a path (for example: `/opt/scale/targets/gfx1030`).
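For example, in a shell (adjust the path to match your SCALE install location and the GPU target you identified above):

export SCALE_PATH=/opt/scale/targets/gfx1030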
### CMake
Add SCALE's `nvcc` first in `PATH`:
export PATH="${SCALE_PATH}/bin:$PATH"
Then add these arguments to your `cmake` invocation:
# Replace with the path to your SCALE install, followed by the name of the
# AMD GPU target you want to compile for.
-DCMAKE_CUDA_COMPILER="${SCALE_PATH}/bin/nvcc"
# See "Why sm_86?" below
-DCMAKE_CUDA_ARCHITECTURES=86
# Either this, or set LD_LIBRARY_PATH to point to ${SCALE_PATH}/lib at runtime.
# Read more below.
-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON
This will work for any modern CMake project that is using CMake's native CUDA support.
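Putting the pieces together, a configure-and-build run might look like this (a sketch: `build` is an arbitrary build directory name, and the source directory is assumed to be the current one):

# Configure the project against SCALE's nvcc and the sm_86 mapping.
cmake -S . -B build \
    -DCMAKE_CUDA_COMPILER="${SCALE_PATH}/bin/nvcc" \
    -DCMAKE_CUDA_ARCHITECTURES=86 \
    -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON
# Build as usual.
cmake --build build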
You can check CMake's output to verify it has properly detected SCALE instead of picking up NVIDIA CUDA (if it is installed):
-- The CUDA compiler identification is NVIDIA 12.5.999
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/scale/gfx1030/bin/nvcc
- The compiler ID should be "NVIDIA", followed by a version number ending in 999.
- The "Check for working CUDA compiler" line should point to the SCALE `nvcc`, not the NVIDIA one.
- Other paths (such as that of `cublas`, for example) should point to the SCALE versions, not the NVIDIA ones.
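If you want a further sanity check of which paths CMake resolved, you can inspect the CMake cache after configuring (a generic CMake technique, not something specific to SCALE; adjust the build directory name to match yours):

grep -iE 'cuda|cublas' build/CMakeCache.txt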
### Others
Most build systems will use environment variables and information from invoking the first `nvcc` found in `PATH` to determine where CUDA is. As a result, the following works for many other build systems:
# Update accordingly.
SCALE_INSTALL_DIR=/opt/scale/targets/gfx1030
export PATH="${SCALE_INSTALL_DIR}/bin:$PATH"
export CUDA_HOME="${SCALE_INSTALL_DIR}"
export CUDA_PATH="${SCALE_INSTALL_DIR}"
# If your system has a very old C++ compiler that chokes on SCALE compilers,
# you could add the following to build all your C++ code with the modern
# version of `clang++` bundled with SCALE (and send us a bug report!)
#export CC="${SCALE_INSTALL_DIR}/bin/clang"
#export CXX="${SCALE_INSTALL_DIR}/bin/clang++"
#export CUDAHOSTCXX="${SCALE_INSTALL_DIR}/bin/clang++"
<Your usual build here>
A build-system-specific way of specifying that you wish to compile for `sm_86` may also be required.
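For example, if your build invokes `nvcc` directly, the architecture is typically selected with the `-arch` flag (an illustrative command; `kernel.cu` is a placeholder source file):

nvcc -arch=sm_86 -c kernel.cu -o kernel.o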
You can verify that SCALE has been correctly added to `PATH` by executing `nvcc --version`. You should see output like:
nvcc: NVIDIA (R) Cuda compiler driver
Actually, no. This is the SCALE compiler, and the first/last line of this output are lies to make CMake work.
clang version 17.0.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/scale/targets/gfx1030/bin
Cuda compilation tools, release 12.5, V12.5.999
## Finding the libraries at runtime
For maximum compatibility with projects that depend on NVIDIA's "compute capability" numbering scheme, SCALE provides one "cuda mimic directory" per supported GPU target, each of which maps that target to `sm_86` in NVIDIA's numbering scheme.
This means that each of the target subdirectories contains identically-named libraries, so SCALE cannot meaningfully add them to the system's library search path when it is installed. The built executable/library therefore needs to be told how to find the libraries via another mechanism, such as:
- rpath. With CMake, the simplest thing that "usually just works" is to add `-DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON` to your cmake incantation.
- Set `LD_LIBRARY_PATH` to include `${SCALE_PATH}/lib` at runtime (see the example below).
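For example, when using the `LD_LIBRARY_PATH` approach (assuming `SCALE_PATH` is set as described above; `./your-application` is a placeholder for your own binary):

export LD_LIBRARY_PATH="${SCALE_PATH}/lib:${LD_LIBRARY_PATH}"
./your-application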
Support for multiple GPU architectures in a single binary ("Fat binaries") is in development.
## Next steps
- Understand why all compilation magically uses `sm_86` by reading about Compute Capability Mapping
- Learn about CUDA dialects and SCALE language extensions
- Report a bug