
Release 1.6.0 (2026-03-13)

Supported architectures

  • Added support for most NVIDIA GPUs.

Compiler

  • clangd now works, providing meaningful diagnostics for both the host and device sides of CUDA translation units, in either clang or nvcc dialect mode.
  • Fixed miscompiles of constexpr-evaluated expressions involving errors.
  • Fixed an edge case in which SFINAE behaved differently from NVIDIA's nvcc.
  • Fixed various compiler crashes related to diagnostic deferral.
  • Fixed a compiler crash caused by attempting to convert an i1 to bf16.
  • Various performance improvements.
  • __launch_bounds__ expressions containing commas are no longer rejected.
  • Added an optimisation that detects cumulative sums (cumsums) and lowers them using DPP/PERMLANE instructions.
  • CUDA printf now type-checks its arguments against the format string specifiers.
  • PTX diagnostics now underline the offending element.
  • Slightly improved compile times for all CUDA translation units.
  • The PTX elect instruction no longer crashes the compiler.
  • The PTX parser no longer rejects labels immediately followed by }.
  • Newly-supported NVCC flags:
    • --gpu-architecture
    • --gpu-code
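For illustration, the program below exercises two of the compiler changes above: a __launch_bounds__ argument whose expression contains a comma (inside a template argument list) and a device-side printf whose specifiers match its argument types. It is a minimal sketch: the MaxOf helper and the fill kernel are hypothetical names, and we have not confirmed that this particular comma-containing form is the exact case the fix addresses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical compile-time helper, used only to put a comma inside the
// __launch_bounds__ expression (between the template arguments).
template <int A, int B>
struct MaxOf {
    static constexpr int value = A > B ? A : B;
};

// Maximum threads per block is MaxOf<128, 256>::value, i.e. 256.
__global__ void __launch_bounds__(MaxOf<128, 256>::value)
fill(int *out, long long scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
    if (i == 0) {
        // %d matches int and %lld matches long long. Writing "%d" for
        // `scale` would be the kind of mismatch the new format checking flags.
        printf("thread %d, scale %lld\n", i, scale);
    }
}

int main()
{
    constexpr int threads = MaxOf<128, 256>::value;  // stays within the launch bounds
    int *d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(int));

    fill<<<1, threads>>>(d_out, 42LL);
    cudaDeviceSynchronize();

    int h_out[threads];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Exit code 0 if the last thread wrote its own index.
    return h_out[threads - 1] == threads - 1 ? 0 : 1;
}
```

Wrapping the expression in an extra pair of parentheses is the usual workaround for comma-containing macro arguments; the change above suggests it should no longer be required.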

Library (AMD)

  • Managed memory is now approximately 50x faster.
  • cudaDeviceSynchronize() is now much faster.
  • XNACK may now be used to accelerate certain workloads on supported hardware.
  • C++03 is now supported.
  • scaleenv no longer fails when users have set non-default bash options.
  • Various header compatibility fixes and tweaks.
  • Added atomicAdd_block and atomicAdd_system overloads for float2 and float4.
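A minimal usage sketch for the new vector atomics follows. It assumes that SCALE's float4 overload of atomicAdd_block takes a pointer to the target and the value to add, mirroring the scalar overloads and NVIDIA's float4 atomicAdd; the blockSums kernel and buffer names are hypothetical. Each block accumulates into its own slot in global memory, so block-scoped atomicity is sufficient. The sketch stays in global memory because NVIDIA documents its float2/float4 atomicAdd for global memory only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block sums its slice of `in` into perBlock[blockIdx.x].
// Assumed signature: float4 atomicAdd_block(float4 *address, float4 val).
__global__ void blockSums(const float4 *in, float4 *perBlock, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Only threads of this block touch this slot, so _block scope suffices.
        atomicAdd_block(&perBlock[blockIdx.x], in[i]);
    }
}

int main()
{
    const int threads = 128, blocks = 4, n = threads * blocks;

    float4 h_in[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = make_float4(1.0f, 2.0f, 0.5f, 0.0f);

    float4 *d_in = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(float4));
    cudaMalloc(&d_sums, blocks * sizeof(float4));
    cudaMemcpy(d_in, h_in, n * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemset(d_sums, 0, blocks * sizeof(float4));  // all-zero bits == 0.0f

    blockSums<<<blocks, threads>>>(d_in, d_sums, n);

    float4 h_sums[blocks];
    cudaMemcpy(h_sums, d_sums, sizeof(h_sums), cudaMemcpyDeviceToHost);
    for (int b = 0; b < blocks; ++b)
        printf("block %d: %.1f %.1f %.1f %.1f\n",
               b, h_sums[b].x, h_sums[b].y, h_sums[b].z, h_sums[b].w);

    cudaFree(d_in);
    cudaFree(d_sums);
    return 0;
}
```

Every input component is exactly representable and each block adds 128 of them, so each block should print 128.0 256.0 64.0 0.0 regardless of the order in which the atomics complete.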