Rollup merge of #83019 - eddyb:spirv-no-block-swap, r=nagisa

core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.

SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK it's the only option for graphical shaders). This implies that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" as a sequence of bytes. Therefore, the "block"-wise swapping optimization employed by `ptr::swap_nonoverlapping_one` (where a "block" is currently 32 bytes) is fundamentally incompatible with SPIR-V "memory".
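For intuition, here is a simplified sketch of what such byte-block swapping looks like. This is not the actual `core` implementation; the function name, block size constant, and loop structure are illustrative only:

```rust
use core::mem::MaybeUninit;
use core::ptr;

// Illustrative block size, matching the 32 bytes mentioned above.
const BLOCK: usize = 32;

/// Swaps `len` bytes between `x` and `y`, one block at a time.
///
/// # Safety
/// `x` and `y` must be valid for reads and writes of `len` bytes,
/// and must not overlap.
unsafe fn swap_bytes_blockwise(x: *mut u8, y: *mut u8, len: usize) {
    let mut i = 0;
    while i + BLOCK <= len {
        // A stack-resident scratch block: this is exactly the "values as raw
        // byte arrays" view that the Logical addressing model forbids.
        let mut t = MaybeUninit::<[u8; BLOCK]>::uninit();
        ptr::copy_nonoverlapping(x.add(i), t.as_mut_ptr().cast::<u8>(), BLOCK);
        ptr::copy_nonoverlapping(y.add(i), x.add(i), BLOCK);
        ptr::copy_nonoverlapping(t.as_ptr().cast::<u8>(), y.add(i), BLOCK);
        i += BLOCK;
    }
    // Handling of the tail (the final partial block) is omitted for brevity.
}
```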

As such, [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/)'s `rustc_codegen_spirv` backend cannot currently allow the use of `ptr::swap_nonoverlapping_one` - but that comes at a great price, since it's the building block of `mem::{swap,replace}`, and those in turn are used by e.g. `Option::take` and `Range`'s `Iterator` implementation (the latter blocking the use of `for i in 0..n` loops).
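To make that dependency chain concrete, here is a simplified sketch (the real `core` implementations differ in detail, and `swap_nonoverlapping_one` itself is crate-private, so the public `ptr::swap_nonoverlapping` stands in for it here):

```rust
use core::ptr;

// Simplified stand-in for `mem::swap`.
fn swap<T>(x: &mut T, y: &mut T) {
    // SAFETY: two distinct `&mut T` are valid for reads and writes,
    // properly aligned, and guaranteed not to overlap.
    unsafe { ptr::swap_nonoverlapping(x, y, 1) }
}

// Simplified stand-in for `mem::replace`, built directly on `swap`.
fn replace<T>(dest: &mut T, mut src: T) -> T {
    swap(dest, &mut src);
    src
}

// `Option::take` is effectively `replace(self, None)`; `Range`'s
// `Iterator::next` similarly replaces `self.start` with its successor.
fn take<T>(opt: &mut Option<T>) -> Option<T> {
    replace(opt, None)
}
```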

There are 4 options I can see for supporting `ptr::swap_nonoverlapping_one` in `rustc_codegen_spirv`:
* legalize the block-wise swap loop back into swapping whole values, for SPIR-V
  * this is made borderline impossible by the fact that the "on the stack" state is a block, which has to be expanded back to the appropriate size of the value being swapped; in practice this would mean pattern-matching on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching `core::ptr` on the fly"
* (**this PR**) disable the block-wise swap optimization altogether under `#[cfg(target_arch = "spirv")]`
  * I've tested it and it does in fact allow compiling `for i in 0..n` loops, which was my primary motivation
  * main downside IMO is the fact that `core` now acknowledges an out-of-tree backend
    * as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another
* only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement
  * would avoid any surprises in terms of potentially-broken/inefficient codegen, in general
  * however, it may be universally applicable (thanks to caches), even if the optimal block size differs between targets
* move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants
  * this also has an impact on MIR optimizations (cc ``@rust-lang/wg-mir-opt``), which currently cannot hope to make sense of e.g. `Option::take` despite it being effectively `_0 = *_1; *_1 = None; return;`
  * long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/) supporting e.g. range `for` loops (see the sketch after this list)
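For that last option, a purely hypothetical sketch: the function name below is invented, and in a real implementation it would be declared as an intrinsic whose lowering each backend chooses. The placeholder body shows the whole-value semantics a SPIR-V backend would emit, while a CPU backend could substitute block-wise moves:

```rust
use core::ptr;

// Hypothetical intrinsic (the name is invented for this sketch). The body is
// only a placeholder for a "default" lowering: one whole-value swap, with no
// byte-level reinterpretation for MIR optimizations or SPIR-V to trip over.
unsafe fn swap_nonoverlapping_single<T>(x: *mut T, y: *mut T) {
    let z = ptr::read(x);
    ptr::copy_nonoverlapping(y, x, 1);
    ptr::write(y, z);
}
```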

r? ``@nagisa`` cc ``@rust-lang/libs``
```diff
@@ -473,19 +473,32 @@ pub const unsafe fn swap_nonoverlapping<T>(x: *mut T, y: *mut T, count: usize) {
 #[inline]
 #[rustc_const_unstable(feature = "const_swap", issue = "83163")]
 pub(crate) const unsafe fn swap_nonoverlapping_one<T>(x: *mut T, y: *mut T) {
-    // For types smaller than the block optimization below,
-    // just swap directly to avoid pessimizing codegen.
-    if mem::size_of::<T>() < 32 {
-        // SAFETY: the caller must guarantee that `x` and `y` are valid
-        // for writes, properly aligned, and non-overlapping.
-        unsafe {
-            let z = read(x);
-            copy_nonoverlapping(y, x, 1);
-            write(y, z);
+    // NOTE(eddyb) SPIR-V's Logical addressing model doesn't allow for arbitrary
+    // reinterpretation of values as (chunkable) byte arrays, and the loop in the
+    // block optimization in `swap_nonoverlapping_bytes` is hard to rewrite back
+    // into the (unoptimized) direct swapping implementation, so we disable it.
+    // FIXME(eddyb) the block optimization also prevents MIR optimizations from
+    // understanding `mem::replace`, `Option::take`, etc. - a better overall
+    // solution might be to make `swap_nonoverlapping` into an intrinsic, which
+    // a backend can choose to implement using the block optimization, or not.
+    #[cfg(not(target_arch = "spirv"))]
+    {
+        // Only apply the block optimization in `swap_nonoverlapping_bytes` for types
+        // at least as large as the block size, to avoid pessimizing codegen.
+        if mem::size_of::<T>() >= 32 {
+            // SAFETY: the caller must uphold the safety contract for `swap_nonoverlapping`.
+            unsafe { swap_nonoverlapping(x, y, 1) };
+            return;
         }
-    } else {
-        // SAFETY: the caller must uphold the safety contract for `swap_nonoverlapping`.
-        unsafe { swap_nonoverlapping(x, y, 1) };
+    }
+
+    // Direct swapping, for the cases not going through the block optimization.
+    // SAFETY: the caller must guarantee that `x` and `y` are valid
+    // for writes, properly aligned, and non-overlapping.
+    unsafe {
+        let z = read(x);
+        copy_nonoverlapping(y, x, 1);
+        write(y, z);
     }
 }
```