Commit graph

183 commits

Author SHA1 Message Date
Seunghoon Lee 3ad1da30db
Update rust.yml 2024-02-19 00:06:25 +09:00
Seunghoon Lee d325f6a43c
Enable build for Linux. 2024-02-18 23:38:55 +09:00
Seunghoon Lee 8f59c750ac
Update rust.yml 2024-02-18 23:28:21 +09:00
Seunghoon Lee deb3904fe3
Update rust.yml 2024-02-18 23:13:04 +09:00
Seunghoon Lee 2bdbdb9e86
Update rust.yml 2024-02-18 22:59:21 +09:00
Seunghoon Lee 2218ad5d16
Update rust.yml 2024-02-18 22:58:26 +09:00
Seunghoon Lee ea2cb726fa
Create rust.yml 2024-02-18 22:49:40 +09:00
Seunghoon Lee ad970a7665
Implement cublasSgetrsBatched. 2024-02-18 21:34:25 +09:00
Seunghoon Lee 8f3c1292b0
Merge remote-tracking branch 'upstream/master' 2024-02-17 04:13:06 +09:00
Seunghoon Lee 3d8cce6d19
Enable cuDNN for Windows. (under development) 2024-02-17 03:43:17 +09:00
Andrzej Janik 4a81dbffb5
Update llama.cpp support (#102)
Add sign extension support to prmt, allow set.<op>.f16x2.f16x2, add more BLAS mappings
2024-02-16 00:01:21 +01:00
Ikko Eltociear Ashimine 9f7be97ef6
Update README.md (#100)
uderlying -> underlying
2024-02-15 18:15:31 +01:00
Andrzej Janik 8d10f756a9
Add troubleshooting/debugging instructions (#91) 2024-02-15 13:25:52 +01:00
ManInDark c884348427
Fixed typo in readme (#89) 2024-02-15 01:38:42 +01:00
Seunghoon Lee 63edf48d84
Merge branch 'master' of https://github.com/lshqqytiger/ZLUDA 2024-02-15 06:54:43 +09:00
Seunghoon Lee 1ef7ef3938
Add support of cuBLAS, cuSPARSE for Windows. 2024-02-15 06:54:21 +09:00
Arna13 0c3bf2d9d0
Fixing typo in README.md (#63) 2024-02-13 21:57:51 +01:00
Sean McLemon f2a44e0e05
Tidy up some English in ARCHITECTURE.md (#61) 2024-02-13 21:55:21 +01:00
Andrzej Janik 1b9ba2b233 Nobody expects the Red Team
Too many changes to list, but broadly:
* Remove Intel GPU support from the compiler
* Add AMD GPU support to the compiler
* Remove Intel GPU host code
* Add AMD GPU host code
* More device instructions. From 40 to 68
* More host functions. From 48 to 184
* Add proof of concept implementation of OptiX framework
* Add minimal support of cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
* Improve ZLUDA launcher for Windows
2024-02-11 20:45:51 +01:00
Andrzej Janik 60d2124a16
Search for a new developer (#44) 2021-02-28 12:18:44 +01:00
Andrzej Janik 4d3e37befc
Update README.md (#42) 2021-02-22 01:32:04 +01:00
Andrzej Janik a906c350f2
Make misc fixes (#41)
* Update ze_loader.lib to the newest version
* Export _ptsz/_ptds for which we have legacy stream implementations
* Stop producing build logs if we are not looking at them anyway
2021-02-22 01:29:03 +01:00
Andrzej Janik ab690c6491
Add zluda_redirect.dll to CI builds (#40) 2021-02-21 17:44:42 +01:00
Andrzej Janik 4ed9ef8edb
Improve CI (#39)
* Use official GPU driver packages for building on Linux
* Start building on Windows
* Start uploading artifacts
2021-02-21 14:44:58 +01:00
Andrzej Janik 36514bd6eb
Improve ZLUDA injection (#37)
Improve the injector and redirector so it's no longer required to manually mess with files if the application links nvcuda.dll. Additionally, inject into child processes
2021-02-20 21:40:19 +01:00
Andrzej Janik 972f612562
Fix signed integer conversion (#36)
This fixes the last remaining bug preventing an end-to-end GeekBench run, so also update the GeekBench results in README
2021-01-26 21:05:09 +01:00
Andrzej Janik 3e2e73ac33 Add script for replaying dumped kernel (#34)
zluda_dump can already create traces of GPU execution, this script can replay those traces.
Additionally, added just enough code in core ZLUDA to support simple PyCUDA execution
2021-01-23 16:57:07 +01:00
Andrzej Janik ff8135e8a3
Add a library for dumping kernels arguments before and after launch (#18) 2021-01-16 22:28:48 +01:00
Andrzej Janik 09f679693b
Prevent linker from stripping exports on Linux (#33) 2021-01-15 01:17:44 +01:00
Andrzej Janik 5cd9a5fbc4
Add empty implementation of cuDeviceGetLuid (#30)
This function is required by recent versions of CUDA runtime on Windows
2021-01-08 19:43:46 +01:00
Andrzej Janik 237a6c113a
Regenerate SPIR-V tests (#29)
In one of the previous commits we made a change to mark ld/st as aligned. This change was not propagated to test files
2021-01-08 19:06:11 +01:00
Andrzej Janik 078ae20c2c
Improve build procedure and instructions (#28)
Fixes issues pointed out in #27:
* spirv_tools-sys was built in non-test profiles
* By default ZLUDA dll has a wrong name
* We relied on third-party OpenCL installation on Windows
* We encouraged building debug configuration
* We didn't provide build information for developers (cmake, python, submodules)
2021-01-08 17:17:46 +01:00
Andrzej Janik 2c0e9b912f
Fix Windows ZLUDA injector (#26)
Fix various bugs in injector and redirector, make them more robust and enable building them by default
2021-01-03 18:45:48 +01:00
Andrzej Janik 659b2c6ec4 Merge commit '4b96dbc8f49c5ae00c96935e0b576df88a5d8af9' 2021-01-03 17:54:01 +01:00
Andrzej Janik 4b96dbc8f4 Squashed 'ext/detours/' changes from 39aa864..36b69b9
36b69b9 Make Detours MinGW Clang-compatible

git-subtree-dir: ext/detours
git-subtree-split: 36b69b971888b2ca0c5913563bae011efaa4a42e
2021-01-03 17:54:01 +01:00
Andrzej Janik 77523940b3 Merge commit 'dabc40cb19bf4e297c32284d26c74adbd6775e49' as 'ext/detours' 2021-01-03 17:52:14 +01:00
Andrzej Janik dabc40cb19 Squashed 'ext/detours/' content from commit 39aa864
git-subtree-dir: ext/detours
git-subtree-split: 39aa864d2985099c8d847e29a5fb86618039b9c4
2021-01-03 17:52:14 +01:00
Takeshi Watanabe ae950163cd
Add building only CI (#25)
Testing isn't working yet because some tests require a live Intel GPU and a live NVIDIA GPU
2020-12-29 22:54:48 +01:00
Andrzej Janik 63af70a01f
Fix builtins generation, mark ld/st as aligned (#22)
Two changes:
* Fixes to builtins generation that I forgot to include in #21
* Marking of ld/st as aligned - this gives a big performance boost in GeekBench SFFT
2020-12-12 20:40:24 +01:00
Andrzej Janik a3cfa24593
Fix SPIR-V code generation for PTX special registers (#21)
We currently directly map PTX special registers: %ntid, %tid, etc. to SPIR-V builtins with type OpTypeVector %uint 4.
This is wrong and leads to silent corruption, which fails e.g. Depth of Field in GeekBench
2020-12-11 21:31:08 +01:00
vosen 770a379452
Refactor how vectors are handled (#20)
Current code has a problem with handling vector members: "b.x" in "mov.u32 a, b.x". This functionality has been kinda tacked-on and has annoying issues:
* vector member support is limited to being the source of movs (so "add.u32 a.x, b.x, c.y" will not work)
* the width of "b" in "b.x" is not known, which led to some "interesting" workarounds
* passes can either convert all member accesses to other member accesses or to temporaries. No way to convert some member accesses to temporaries (which we need for an important fix)
This commit solves all this
2020-12-09 00:20:06 +01:00
vosen a6a9eb347b
Merge pull request #15 from nilsmartel/patch-2
Fix small typo
2020-11-29 00:36:05 +01:00
vosen 295a70e1cb
Merge pull request #14 from ritschwumm/patch-1
fix typo in readme
2020-11-29 00:35:44 +01:00
Nils Martel f452550c4f
Fix small typo 2020-11-27 14:26:27 +01:00
ritschwumm b11ba3d1f3
fix typo in readme 2020-11-27 07:24:51 +01:00
Andrzej Janik 103881f70a Update wording, add license 2020-11-24 23:23:53 +01:00
Andrzej Janik 892e47a653 Update README with links to GeekBench results 2020-11-23 22:38:12 +01:00
Andrzej Janik 690f4f3ad2 Append short project name to the device if there's not enough space for long name 2020-11-23 22:24:35 +01:00
Andrzej Janik 8fa044004f Change wording slightly 2020-11-23 22:18:30 +01:00
Andrzej Janik 25fc385b8d Add graph with Geekbench results 2020-11-23 22:15:59 +01:00