Release 1.0.1 (2024-07-24)
This release primarily fixes issues that prevent people from successfully compiling their projects with SCALE. Many thanks to those users who submitted bug reports.
CUDA APIs
- The `extra` argument to `cuLaunchKernel` is now supported.
- Added support for some more undocumented NVIDIA headers.
- Fix various overload resolution issues with atomic APIs.
- Fix overload resolution issues with min/max.
- Added various undocumented macros to support projects that explicitly check CUDA include-guard macros.
- `lrint()` and `llrint()` no longer crash the compiler. :D
- Newly supported CUDA APIs:
    - `nvrtcGetNumSupportedArchs`
    - `nvrtcGetSupportedArchs`
    - `cudaLaunchKernelEx`, `cuLaunchKernelEx`, `cudaLaunchKernelExC`: some of the performance-hint launch options are no-ops.
    - `__vavgs2`, `__vavgs4`
    - All the `atomic*_block()` and `atomic*_system()` variants.
Compiler
- Improved parsing of nvcc arguments:
    - Allow undocumented option variants (`-foo bar`, `--foo bar`, `--foo=bar`, and `-foo=bar` are always allowed, it seems).
    - Implement "interesting" quoting/escaping rules in nvcc arguments, such as embedded quotes and `\,`. We now correctly handle cursed arguments like: `'-Xcompiler=-Wl\,-O1' '-Xcompiler=-Wl\,-rpath\,/usr/lib,-Wl\,-rpath-link\,/usr/lib'`
- Support for more nvcc arguments:
    - NVCC-style diagnostic flags: `-Werror`, `-disable-warnings`, etc.
    - `--run`, `--run-args`
    - `-Xlinker`, `-linker-options`
    - `-no-exceptions`, `-noeh`
    - `-minimal`: no-op. Exact semantics are undocumented, and build times are reasonably fast anyway.
    - `-gen-opt-lto`, `-dlink-time-opt`, `-dlto`. No-ops: device LTO not yet supported.
    - `-t`, `--threads`, `-split-compile`: No-ops: they're flags for making compilation faster and are specific to how nvcc is implemented.
    - `-device-int128`: no-op: we always enable int128.
    - `-extra-device-vectorization`: no-op: vectorisation optimisations are controlled by the usual `-O*` flags.
    - `-entries`, `-source-in-ptx`, `-src-in-ptx`: no-ops: there is no PTX.
    - `-use-local-env`, `-idp`, `-ddp`, `-dp`, etc.: ignored since they are meaningless except on Windows.
- Allow variadic device functions in non-evaluated functions.
- Don't warn about implicit conversion from `cudaLaneMask_t` to `bool`.
- `__builtin_provable` no longer causes compiler crashes in `-O0`/`-O1` builds.
- Fixed a bug causing PTX `asm` blocks inside non-template, non-dependent member functions of template classes to sometimes not be compiled, causing PTX to end up in the AMD binary unmodified.
- CUDA launch tokens with spaces (e.g. `myKernel<< <1, 1>> >()`) are now supported.
- Building non-CUDA C translation units with SCALE-nvcc now works.
Other
- The `meson` build system no longer regards SCALE-nvcc as a "broken" compiler.
- `hsakmtsysinfo` no longer explodes if it doesn't like your GPU.
- New documentation pages.
- Published more details about third-party testing, including the build scripts.