Release 1.6.0 (2026-03-13)
Supported architectures
- Added support for most NVIDIA GPUs.
Compiler
- `clangd` now works, providing meaningful diagnostics for both the host and device sides of CUDA translation units, in either clang or nvcc dialect mode.
- Fixed miscompiles of constexpr-evaluated expressions involving errors.
- Fixed an edge case in which SFINAE behaved differently from NVIDIA nvcc.
- Fix various compiler crashes relating to diagnostic deferral.
- Fix a compiler crash caused by attempting to convert an i1 to bf16.
- Various performance improvements.
- `__launch_bounds__` expressions containing commas are no longer rejected.
- Added an optimisation to detect cumsums and lower them using DPP/PERMLANE.
- CUDA `printf` arguments are now type-checked against the conversion specifiers in the format string.
- PTX diagnostics now underline the offending element.
- Slightly improved compile times for all CUDA translation units.
- The `elect` instruction no longer crashes the compiler.
- The PTX parser no longer rejects labels immediately followed by `}`.
- Newly-supported NVCC flags: `--gpu-architecture`, `--gpu-code`.
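As an illustration of a comma-containing `__launch_bounds__` argument, the kernel below bounds its block size with an expression whose comma sits inside a template argument list. The notes do not say exactly which comma-containing expressions were previously rejected; we assume this form is a representative case. `MaxOf`, `fill_index`, `run_fill_index`, and `CHECK` are hypothetical names used only for this sketch.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Minimal error check for a sketch: asserts that a CUDA call succeeded.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Hypothetical helper: compile-time maximum of two ints.
template <int A, int B>
struct MaxOf {
    static constexpr int value = A > B ? A : B;
};

// The comma in MaxOf<128, 256> belongs to the template argument list; it is
// not a separator between __launch_bounds__ parameters. The bound here is 256.
__global__ void __launch_bounds__(MaxOf<128, 256>::value)
fill_index(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Host helper: fills n ints on the device and returns the last one.
int run_fill_index(int n) {
    int* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, n * sizeof(int)));
    fill_index<<<(n + 255) / 256, 256>>>(d_out, n);  // 256 threads: within the bound
    int last = -1;
    CHECK(cudaMemcpy(&last, d_out + n - 1, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_out));
    return last;
}
```

With a non-variadic two-parameter macro, the preprocessor would split `MaxOf<128, 256>::value` at the comma, which is why such expressions are easy to mishandle.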
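The notes do not describe which source patterns the cumsum optimisation recognises. As an assumption, the sketch below shows the kind of code such a detector would plausibly target: an inclusive prefix sum across a warp, written with `__shfl_up_sync` in the usual Hillis-Steele form. It hard-codes a 32-lane warp and a full `0xffffffff` mask, as on NVIDIA hardware; targets with 64-lane wavefronts would need different constants. `warp_inclusive_scan`, `scan_kernel`, `run_warp_scan`, and `CHECK` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Minimal error check for a sketch: asserts that a CUDA call succeeded.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Inclusive prefix sum over the 32 lanes of a warp (assumes warp size 32).
// At each step, lane i adds the value held by lane i - offset.
__device__ int warp_inclusive_scan(int x) {
    const int lane = threadIdx.x % 32;
    for (int offset = 1; offset < 32; offset *= 2) {
        int y = __shfl_up_sync(0xffffffffu, x, offset);
        if (lane >= offset) x += y;  // lanes below offset have no source lane
    }
    return x;
}

__global__ void scan_kernel(const int* in, int* out) {
    out[threadIdx.x] = warp_inclusive_scan(in[threadIdx.x]);
}

// Host helper: scans exactly 32 ints using a single warp.
void run_warp_scan(const int* in, int* out) {
    int *d_in = nullptr, *d_out = nullptr;
    CHECK(cudaMalloc(&d_in, 32 * sizeof(int)));
    CHECK(cudaMalloc(&d_out, 32 * sizeof(int)));
    CHECK(cudaMemcpy(d_in, in, 32 * sizeof(int), cudaMemcpyHostToDevice));
    scan_kernel<<<1, 32>>>(d_in, d_out);
    CHECK(cudaMemcpy(out, d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
}
```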
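To show what `printf` format checking concerns, the kernel below uses specifiers that match their arguments, with a mismatched call left commented out. Whether and how SCALE reports the commented-out call (warning or error, exact wording) is not stated in the notes, so this only illustrates the class of bug. `report` and `run_report` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void report(int count, float ratio) {
    // Specifiers match the argument types: %d for int, %f for float
    // (a float passed to a variadic function is promoted to double).
    printf("count=%d ratio=%f\n", count, ratio);

    // A mismatch such as %f for an int is the kind of bug format checking
    // targets; it is commented out so this sketch compiles cleanly.
    // printf("count=%f\n", count);
}

// Host helper: launches one thread, then waits for it, which also flushes
// the device-side printf buffer. Returns true if both steps succeeded.
bool run_report() {
    report<<<1, 1>>>(3, 0.5f);
    bool launched = cudaGetLastError() == cudaSuccess;
    bool finished = cudaDeviceSynchronize() == cudaSuccess;
    return launched && finished;
}
```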
Library (AMD)
- Managed memory is now approximately 50x faster.
- `cudaDeviceSynchronize()` is now much faster.
- `XNACK` may now be used to accelerate certain workloads on supported hardware.
- C++03 is now supported.
- `scaleenv` no longer explodes when users set non-default bash flags.
- Various header compatibility fixes/tweaks.
- Added `atomicAdd_block`/`atomicAdd_system` overloads for `float2`/`float4`.
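A minimal use of one of the new vector overloads: every thread of a single block adds `(1, 2)` to one `float2` in global memory. Block scope is sufficient only because no thread outside that block touches the accumulator. When compiled with NVIDIA nvcc, the `float2`/`float4` `atomicAdd` overloads require compute capability 9.x or newer. `accumulate`, `run_accumulate`, and `CHECK` are hypothetical names for this sketch.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Minimal error check for a sketch: asserts that a CUDA call succeeded.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Each thread atomically adds (1, 2) to *acc with block scope.
__global__ void accumulate(float2* acc) {
    atomicAdd_block(acc, make_float2(1.0f, 2.0f));
}

// Host helper: launches `threads` threads in ONE block (so block scope is
// valid) and returns the accumulated float2.
float2 run_accumulate(int threads) {
    float2* d_acc = nullptr;
    CHECK(cudaMalloc(&d_acc, sizeof(float2)));
    CHECK(cudaMemset(d_acc, 0, sizeof(float2)));  // all-zero bytes == 0.0f
    accumulate<<<1, threads>>>(d_acc);
    float2 result{};
    CHECK(cudaMemcpy(&result, d_acc, sizeof(float2), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_acc));
    return result;
}
```

If threads from several blocks updated the same address, `atomicAdd` (device scope) or `atomicAdd_system` would be needed instead.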
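The notes do not say which access patterns benefit from the managed-memory and `cudaDeviceSynchronize()` speedups. The sketch below simply shows both APIs in their usual combination: the host initialises managed memory, a kernel updates it, and the host must call `cudaDeviceSynchronize()` before reading the results. `add_one`, `run_managed`, and `CHECK` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Minimal error check for a sketch: asserts that a CUDA call succeeded.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Host helper: stores 0..n-1 in managed memory, increments every element on
// the device, and returns the sum as read back on the host.
long long run_managed(int n) {
    int* data = nullptr;
    CHECK(cudaMallocManaged(&data, n * sizeof(int)));
    for (int i = 0; i < n; ++i) data[i] = i;      // host writes
    add_one<<<(n + 255) / 256, 256>>>(data, n);   // device reads and writes
    CHECK(cudaDeviceSynchronize());               // required before host reads
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];   // host reads
    CHECK(cudaFree(data));
    return sum;
}
```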