
129 commits

Author SHA1 Message Date
Seunghoon Lee d7714d84c0
Add support for ROCm 6 (#27)
* Add support for ROCm 6.1.2 on Windows.

* Fix CI.

* Use llvm.sqrt.f64.
2024-07-13 13:47:35 +09:00
Andrzej Janik 2d8c47f147
Support Meshroom (#153) 2024-05-17 00:35:38 +02:00
NyanCatTW1 fcd7a57888
Fix + improve vprintf implementation (#211) 2024-05-16 00:38:52 +02:00
Andrzej Janik f0c905db15
Fix trap instruction codegen; don't fail the build with older Rust versions (#229) 2024-05-08 15:19:59 +02:00
Andrzej Janik 27c0e13677
Minor codegen improvements (#225) 2024-05-06 00:28:49 +02:00
Andrzej Janik 5d5f7cca75
Rewrite surface implementation to more accurately support unofficial CUDA semantics (#203)
This fixes a black screen in some CompuBench tests (TV-L1 Optical Flow) and in other apps that use CUDA surfaces incorrectly.
2024-04-14 02:39:34 +02:00
Andrzej Janik 774f4bcb37
Implement sad instruction (#198) 2024-04-06 01:23:53 +02:00
Andrzej Janik 0d9ace2475
Fix buggy carry flags when mixing subc/sub.cc with addc/add.cc (#197) 2024-04-05 23:26:08 +02:00
NyanCatTW1 76bae5f91b
Implement mad.hi.cc (#196) 2024-04-05 19:12:59 +02:00
Andrzej Janik 4a81dbffb5
Update llama.cpp support (#102)
Add sign extension support to prmt, allow set.<op>.f16x2.f16x2, add more BLAS mappings
2024-02-16 00:01:21 +01:00
Andrzej Janik 1b9ba2b233
Nobody expects the Red Team
Too many changes to list, but broadly:
* Remove Intel GPU support from the compiler
* Add AMD GPU support to the compiler
* Remove Intel GPU host code
* Add AMD GPU host code
* More device instructions: from 40 to 68
* More host functions: from 48 to 184
* Add proof of concept implementation of OptiX framework
* Add minimal support for cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
* Improve ZLUDA launcher for Windows
2024-02-11 20:45:51 +01:00
Andrzej Janik a906c350f2
Make misc fixes (#41)
* Update ze_loader.lib to the newest version
* Export _ptsz/_ptds functions for which we have a legacy stream implementation
* Stop producing build logs if we are not looking at them anyway
2021-02-22 01:29:03 +01:00
Andrzej Janik 972f612562
Fix signed integer conversion (#36)
This fixes the last remaining bug preventing an end-to-end GeekBench run, so also update the GeekBench results in the README.
2021-01-26 21:05:09 +01:00
Andrzej Janik ff8135e8a3
Add a library for dumping kernel arguments before and after launch (#18) 2021-01-16 22:28:48 +01:00
Andrzej Janik 237a6c113a
Regenerate SPIR-V tests (#29)
In one of the previous commits we made a change to mark ld/st as aligned. This change was not propagated to the test files.
2021-01-08 19:06:11 +01:00
Andrzej Janik 63af70a01f
Fix builtins generation, mark ld/st as aligned (#22)
Two changes:
* Fixes to builtins generation that I forgot to include in #21
* Mark ld/st as aligned: this gives a big performance boost in GeekBench SFFT
2020-12-12 20:40:24 +01:00
Andrzej Janik a3cfa24593
Fix SPIR-V code generation for PTX special registers (#21)
We currently map PTX special registers (%ntid, %tid, etc.) directly to SPIR-V builtins of type OpTypeVector %uint 4.
This is wrong and leads to silent corruption, which breaks, e.g., Depth of Field in GeekBench.
2020-12-11 21:31:08 +01:00
vosen 770a379452
Refactor how vectors are handled (#20)
The current code has problems handling vector members, e.g. "b.x" in "mov.u32 a, b.x". This functionality was somewhat tacked on and has annoying issues:
* vector member support is limited to the source operand of mov (so "add.u32 a.x, b.x, c.y" will not work)
* the width of "b" in "b.x" is not known, which led to some "interesting" workarounds
* passes can convert either all member accesses to other member accesses or all of them to temporaries. There is no way to convert only some member accesses to temporaries (which we need for an important fix)
This commit solves all of these issues.
2020-12-09 00:20:06 +01:00
Andrzej Janik bcd1740ba9 Add README and rebuild .spv library 2020-11-23 21:50:21 +01:00
Andrzej Janik eb7c9aeeee Rename everything 2020-11-23 20:01:10 +01:00
Andrzej Janik 0415f873ae Throw away useless stuff 2020-11-23 20:00:57 +01:00
Andrzej Janik cd141590be Fix typo in selp 2020-11-22 21:50:54 +01:00
Andrzej Janik 6e39c4a90c Fix linking with shl/shr, add memset on host and support __assertfail 2020-11-21 01:53:07 +01:00
Andrzej Janik 84ac086146 Fix problems with linking 2020-11-21 00:27:37 +01:00
Andrzej Janik 70dc298381 Fix buggy handling of u8 shared memory 2020-11-20 00:07:50 +01:00
Andrzej Janik f77b653d36 Implement stateless-to-stateful optimization 2020-11-19 22:12:12 +01:00
Andrzej Janik a6765baa3a Add back erroneously removed functionality 2020-11-12 22:47:14 +01:00
Andrzej Janik a2e77fe961 Refactor host code to use one big lock 2020-11-12 20:12:14 +01:00
Andrzej Janik 62d14cdffe Fix ftz behavior slightly 2020-11-07 16:14:37 +01:00
Andrzej Janik ac6265f257 Implement instructions bfe, rem, xor 2020-11-06 00:56:45 +01:00
Andrzej Janik d7bf1acf84 Implement instructions clz, brev, popc 2020-11-05 22:10:06 +01:00
Andrzej Janik 8e409254b3 Fix same width float-to-float conversions 2020-11-05 21:39:34 +01:00
Andrzej Janik 96702d86c9 Fix issues with .param/.local and implement sin, cos, ex2, lg2 2020-11-05 00:27:46 +01:00
Andrzej Janik e5a53ed5d3 Implement neg instruction 2020-11-01 14:58:44 +01:00
Andrzej Janik b7d61baf37 Implement div, sqrt, rsqrt and more of setp 2020-11-01 14:34:03 +01:00
Andrzej Janik a82eb20817 Implement atomic instructions 2020-10-31 21:28:15 +01:00
Andrzej Janik 861116f223 Add support for fma instruction 2020-10-26 23:46:28 +01:00
Andrzej Janik c8dadca7d2 Implement selp instruction 2020-10-26 19:18:23 +01:00
Andrzej Janik fc7cc00f47 Add support for the "and" instruction 2020-10-26 18:45:28 +01:00
Andrzej Janik 40bdb83e6b Support float constants 2020-10-26 01:49:25 +01:00
Andrzej Janik 17b788f2a7 Implement ftz handling through Intel extension 2020-10-25 21:09:16 +01:00
Andrzej Janik 45f5183370 Implement ftz handling through Khronos extensions 2020-10-25 19:29:28 +01:00
Andrzej Janik 6480cccc4f Implement rcp instruction 2020-10-25 11:21:51 +01:00
Andrzej Janik eb9053a42f Add test for indirect shared mem use 2020-10-25 10:34:09 +01:00
Andrzej Janik 85ee8210df Add dynamic shared mem support 2020-10-25 00:24:40 +02:00
Andrzej Janik 28a0968294 Fix small regression 2020-10-18 15:06:37 +02:00
Andrzej Janik 2b3ecc99e3 Implement pass to handle .extern .shared and add parsing code for it 2020-10-18 14:46:05 +02:00
Andrzej Janik 27d25865af Add support for top-level global variables, improve array support 2020-10-04 19:53:07 +02:00
Andrzej Janik 9a65dd32f5 Add sub, min, max 2020-10-02 00:11:28 +02:00
Andrzej Janik bd3d440dba Implement the "or" instruction 2020-10-01 20:28:57 +02:00