std: Enable usage of `thread_local!` through imports
The `thread_local!` macro delegated to an internal macro but it didn't
do so in a macros-and-the-module-system compatible fashion, meaning if a
`#![no_std]` crate imported `std` and tried to use `thread_local!` it
would fail due to missing a lookup of an internal macro.
This commit switches the macro to instead use `$crate` to invoke other
macros, ensuring that it'll work when `thread_local!` is imported alone.
The `thread_local!` macro delegated to an internal macro but it didn't
do so in a macros-and-the-module-system compatible fashion, meaning if a
`#![no_std]` crate imported `std` and tried to use `thread_local!` it
would fail due to missing a lookup of an internal macro.
This commit switches the macro to instead use `$crate` to invoke other
macros, ensuring that it'll work when `thread_local!` is imported alone.
Some code in the TLS implementation in libstd stores `Some(val)` into an
`&mut Option<T>` (effectively) and then pulls out `&T`, but it currently
uses `.unwrap()` which can codegen into a panic even though it can never
panic. With sufficient optimizations enabled (like LTO) the compiler can
see through this but this commit helps it along in normal mode
(`--release` with Cargo by default) to avoid codegen'ing the panic path.
This ends up improving the optimized codegen on wasm by ensuring that a
call to panic pulling in more file size doesn't stick around.
This means when the other thread wakes it can continue right away
instead of having to wait for the mutex.
Also add some comments explaining why the mutex needs to be locked in
the first place.
Unchecked thread spawning
# Summary
Add an unsafe interface for spawning lifetime-unrestricted threads for
library authors to build less-contrived, less-hacky safe abstractions
on.
# Motivation
So a few years back scoped threads were entirely removed from the Rust
stdlib, the reason being that it was possible to leak the scoped thread's
join guards without resorting to unsafe code, which meant the concept
was not completely safe, either.
Only a maximally-restrictive safe API for thread spawning was kept in the
stdlib, that requires `'static` lifetime bounds on both the thread closure
and its return type.
A number of 3rd party libraries sprung up to offer their implementations
for safe scoped threads implementations.
These work by essentially hiding the join guards from the user, thus
forcing them to join at the end of an (internal) function scope.
However, since these libraries have to use the maximally restrictive
thread spawning API, they have to resort to some very contrived manipulations
and subversions of Rust's type system to basically achieve what this commit does
with some minimal restructuring of the current code and exposing a new unsafe
function signature for spawning threads without lifetime restrictions.
Obviously this is unsafe, but its main use would be to allow library authors
to write safe abstractions with and around it.
To further illustrate my point, here's a quick summary of the hoops that,
for instance `crossbeam`, has to jump through to spawn a lifetime unrestricted
thread, all of which would not be necessary if an unsafe API existed as part
of the stdlib:
1. Allocate an `Arc<Option<T>>` on the heap where the result with type
`T: 'a` will go (in practice requires `Mutex` or `UnsafeCell` as well).
2. Wrap the desired thread closure with lifetime bound `'a` into another
closure (also `..: 'a`) that returns `()`, executes the inner closure and
writes its result into the pre-allocated `Option<T>`.
3. Box the wrapping closure, cast it to a trait object (`FnBox`) and
(unsafely) transmute its lifetime bound from `'a` to `'static`.
So while this new `spawn_unchecked` function is certainly not very relevant
for general use, since scoped threads are so common I think it makes sense
to expose an interface for libraries implementing these to build on.
The changes implemented are also very minimal: The current `spawn` function
(which internally contains unsafe code) is moved into an unsafe `spawn_unchecked`
function, which the safe function then wraps around.
# Issues
- ~~so far, no documentation for the new function (yet)~~
- the name of the function might be controversial, as `*_unchecked` more commonly
indicates that some sort of runtime check is omitted (`unrestricted` may be
more fitting)
- if accepted, it might make sense to add a freestanding `thread::spawn_unchecked`
function similar to the current `thread::spawn` for convenience.
This adds an implementation of thread local storage for the
`wasm32-unknown-unknown` target when the `atomics` feature is
implemented. This, however, comes with a notable caveat of that it
requires a new feature of the standard library, `wasm-bindgen-threads`,
to be enabled.
Thread local storage for wasm (when `atomics` are enabled and there's
actually more than one thread) is powered by the assumption that an
external entity can fill in some information for us. It's not currently
clear who will fill in this information nor whose responsibility it
should be long-term. In the meantime there's a strategy being gamed out
in the `wasm-bindgen` project specifically, and the hope is that we can
continue to test and iterate on the standard library without committing
to a particular strategy yet.
As to the details of `wasm-bindgen`'s strategy, LLVM doesn't currently
have the ability to emit custom `global` values (thread locals in a
`WebAssembly.Module`) so we leverage the `wasm-bindgen` CLI tool to do
it for us. To that end we have a few intrinsics, assuming two global values:
* `__wbindgen_current_id` - gets the current thread id as a 32-bit
integer. It's `wasm-bindgen`'s responsibility to initialize this
per-thread and then inform libstd of the id. Currently `wasm-bindgen`
performs this initialization as part of the `start` function.
* `__wbindgen_tcb_{get,set}` - in addition to a thread id it's assumed
that there's a global available for simply storing a pointer's worth
of information (a thread control block, which currently only contains
thread local storage). This would ideally be a native `global`
injected by LLVM, but we don't have a great way to support that right
now.
To reiterate, this is all intended to be unstable and purely intended
for testing out Rust on the web with threads. The story is very likely
to change in the future and we want to make sure that we're able to do
that!
Previously the code below would not be guaranteed to exit when the first
spawned thread took the `return, // already unparked` path because there
was no write to synchronize with a read in `park`.
```
use std::sync::atomic::{AtomicBool, Ordering};
use std:🧵:{current, spawn, park};
static FLAG: AtomicBool = AtomicBool::new(false);
fn main() {
let thread_0 = current();
spawn(move || {
FLAG.store(true, Ordering::Relaxed);
thread_0.unpark();
});
let thread_0 = current();
spawn(move || {
thread_0.unpark();
});
while !FLAG.load(Ordering::Relaxed) {
park();
}
}
```
Impl Send & Sync for JoinHandle
This is just a cosmetic change - it slightly relaxes and clarifies the public API without effectively promising any new guarantees.
Currently we have [these auto trait implementations](https://doc.rust-lang.org/nightly/std/thread/struct.JoinHandle.html#synthetic-implementations):
```rust
impl<T: Send> Send for JoinHandle<T> {}
impl<T: Sync> Sync for JoinHandle<T> {}
```
Bound `T: Send` doesn't make much sense because `JoinHandle<T>` can be created only when `T: Send`. Note that [`JoinHandle::<T>::join`](https://doc.rust-lang.org/nightly/std/thread/struct.JoinHandle.html#method.join) doesn't require `T: Send` so why should the `Send` impl?
And the `Sync` impl doesn't need `T: Sync` because `JoinHandle<T>` cannot even share `T` - it can only send it to the thread that calls `join`.
park/park_timeout: prohibit spurious wakeups in next park
<pre><code>
// The implementation currently uses the trivial strategy of a Mutex+Condvar
// with wakeup flag, which does not actually allow spurious wakeups.
</pre></code>
Because does not actually allow spurious wakeups.
so we have let thread.inner.cvar.wait(m) in the loop to prohibit spurious wakeups.
but if notified after we locked, this notification doesn't be consumed, it return, the next park will consume this notification...this is also 'spurious wakeup' case, 'one unpark() wakeups two park()'.
We should improve this situation:
`thread.inner.state.store(EMPTY, SeqCst);`
This commit applies a few code size optimizations for the wasm target to
the standard library, namely around panics. We notably know that in most
configurations it's impossible for us to print anything in
wasm32-unknown-unknown so we can skip larger portions of panicking that
are otherwise simply informative. This allows us to get quite a nice
size reduction.
Finally we can also tweak where the allocation happens for the
`Box<Any>` that we panic with. By only allocating once unwinding starts
we can reduce the size of a panicking wasm module from 44k to 350 bytes.