Rollup merge of #77844 - RalfJung:zst-box, r=nikomatsakis

clarify rules for ZST Boxes

LLVM's rules around `getelementptr inbounds` with offset 0 are a bit annoying, and as a consequence we have no choice but say that a `Box<()>` pointing to previously allocated memory that has since been freed is UB. Clarify the docs to reflect this.

This is based on conversations on the LLVM mailing list.
* Here's my initial mail: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130452.html
* The first email of the March part of that thread: https://lists.llvm.org/pipermail/llvm-dev/2019-March/130831.html
* First email of the April part: https://lists.llvm.org/pipermail/llvm-dev/2019-April/131693.html

The conclusion for me at least was that `getelementptr inbounds` with offset 0 is *not* the identity function, but can sometimes return `poison` even when the input is a regular pointer -- specifically, it returns `poison` when this pointer points into something that LLVM "knows has been deallocated", i.e., a former LLVM-managed allocation. It is however the identity function on pointers obtained by casting integers.

Note that there [are formal proposals](https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf) for LLVM semantics where `getelementptr inbounds` with offset 0 isn't quite the identity function but never returns `poison` (it affects the provenance of the pointer but in a way that doesn't matter if this pointer is never used for memory accesses), and indeed this is likely necessary to consistently describe LLVM semantics. But with the informal LLVM LangRef that we have right now, and with LLVM devs insisting otherwise, it seems unwise to rely on this.
This commit is contained in:
Dylan DPC 2020-11-21 19:44:07 +01:00 committed by GitHub
commit 6cd02a85f1
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 17 additions and 2 deletions

View file

@ -62,6 +62,13 @@
//! T` obtained from [`Box::<T>::into_raw`] may be deallocated using the
//! [`Global`] allocator with [`Layout::for_value(&*value)`].
//!
//! For zero-sized values, the `Box` pointer still has to be [valid] for reads
//! and writes and sufficiently aligned. In particular, casting any aligned
//! non-zero integer literal to a raw pointer produces a valid pointer, but a
//! pointer pointing into previously allocated memory that since got freed is
//! not valid. The recommended way to build a Box to a ZST if `Box::new` cannot
//! be used is to use [`ptr::NonNull::dangling`].
//!
//! So long as `T: Sized`, a `Box<T>` is guaranteed to be represented
//! as a single pointer and is also ABI-compatible with C pointers
//! (i.e. the C type `T*`). This means that if you have extern "C"
@ -125,6 +132,7 @@
//! [`Global`]: crate::alloc::Global
//! [`Layout`]: crate::alloc::Layout
//! [`Layout::for_value(&*value)`]: crate::alloc::Layout::for_value
//! [valid]: ptr#safety
#![stable(feature = "rust1", since = "1.0.0")]
@ -530,7 +538,10 @@ impl<T: ?Sized> Box<T> {
/// memory problems. For example, a double-free may occur if the
/// function is called twice on the same raw pointer.
///
/// The safety conditions are described in the [memory layout] section.
///
/// # Examples
///
/// Recreate a `Box` which was previously converted to a raw pointer
/// using [`Box::into_raw`]:
/// ```

View file

@ -16,12 +16,16 @@
//! provided at this point are very minimal:
//!
//! * A [null] pointer is *never* valid, not even for accesses of [size zero][zst].
//! * All pointers (except for the null pointer) are valid for all operations of
//! [size zero][zst].
//! * For a pointer to be valid, it is necessary, but not always sufficient, that the pointer
//! be *dereferenceable*: the memory range of the given size starting at the pointer must all be
//! within the bounds of a single allocated object. Note that in Rust,
//! every (stack-allocated) variable is considered a separate allocated object.
//! * Even for operations of [size zero][zst], the pointer must not be pointing to deallocated
//! memory, i.e., deallocation makes pointers invalid even for zero-sized operations. However,
//! casting any non-zero integer *literal* to a pointer is valid for zero-sized accesses, even if
//! some memory happens to exist at that address and gets deallocated. This corresponds to writing
//! your own allocator: allocating zero-sized objects is not very hard. The canonical way to
//! obtain a pointer that is valid for zero-sized accesses is [`NonNull::dangling`].
//! * All accesses performed by functions in this module are *non-atomic* in the sense
//! of [atomic operations] used to synchronize between threads. This means it is
//! undefined behavior to perform two concurrent accesses to the same location from different