Rollup merge of #75545 - eddyb:instant-sub-branchless, r=sfackler

std/sys/unix/time: make it easier for LLVM to optimize `Instant` subtraction.

This PR is the minimal change necessary to get LLVM to optimize `if self.t.tv_nsec >= other.t.tv_nsec` to branchless instructions (at least on x86_64), inspired by @m-ou-se's own attempts at optimizing `Instant` subtraction.
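
As a rough, self-contained sketch of that restructuring, the two shapes of the subtraction can be put side by side. The `Ts` struct and the `sub_old`/`sub_new` names are hypothetical stand-ins for illustration, not the actual std code, and both functions assume the caller has already checked `a >= b`.

```rust
use std::time::Duration;

const NSEC_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical stand-in for std's private `Timespec` (not the real type).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Ts {
    tv_sec: i64,
    tv_nsec: i64, // assumed to be in 0..NSEC_PER_SEC
}

/// Old shape: `Duration::new` is duplicated in both branches and the `- 1`
/// is interleaved with the seconds subtraction.
fn sub_old(a: Ts, b: Ts) -> Duration {
    if a.tv_nsec >= b.tv_nsec {
        Duration::new((a.tv_sec - b.tv_sec) as u64, (a.tv_nsec - b.tv_nsec) as u32)
    } else {
        Duration::new(
            (a.tv_sec - 1 - b.tv_sec) as u64,
            a.tv_nsec as u32 + NSEC_PER_SEC as u32 - b.tv_nsec as u32,
        )
    }
}

/// New shape: `a.tv_sec - b.tv_sec` appears in both branches, the `- 1`
/// comes after it, and `Duration::new` is called once outside the `if`.
fn sub_new(a: Ts, b: Ts) -> Duration {
    let (secs, nsec) = if a.tv_nsec >= b.tv_nsec {
        ((a.tv_sec - b.tv_sec) as u64, (a.tv_nsec - b.tv_nsec) as u32)
    } else {
        (
            (a.tv_sec - b.tv_sec - 1) as u64,
            a.tv_nsec as u32 + NSEC_PER_SEC as u32 - b.tv_nsec as u32,
        )
    };
    Duration::new(secs, nsec)
}
```

Both functions return the same `Duration` for any valid pair with `a >= b`. Whether the second one actually compiles to branchless code depends on the LLVM version and target, so it is worth checking the generated assembly rather than assuming it.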

I stumbled over this by looking at the total number of instructions executed by `rustc -Z self-profile`, and found that after disabling ASLR, the largest source of non-determinism remaining was from this `if` taking one branch or the other, depending on the values involved.

The reason this code is called often enough to make a difference is that `measureme` (the `-Z self-profile` implementation) currently uses `Instant::elapsed` for its event timestamps (of which there can be millions).
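
To illustrate why the subtraction ends up on a hot path, here is a minimal sketch of an event recorder that timestamps every event with `Instant::elapsed`. The `Recorder` type and its methods are hypothetical and are not `measureme`'s real API. Only `Instant::now` and `Instant::elapsed` come from std.

```rust
use std::time::{Duration, Instant};

/// Hypothetical event recorder; not measureme's real API.
struct Recorder {
    start: Instant,
    events: Vec<(&'static str, Duration)>,
}

impl Recorder {
    fn new() -> Self {
        Recorder { start: Instant::now(), events: Vec::new() }
    }

    /// Each recorded event calls `Instant::elapsed`, which computes
    /// `Instant::now() - self.start`, i.e. one `Instant` subtraction.
    fn record(&mut self, label: &'static str) {
        self.events.push((label, self.start.elapsed()));
    }
}
```

With millions of events, every call to `record` in this sketch goes through the subtraction once, so a data-dependent branch there shows up as run-to-run variation in instruction counts.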

I doubt it's critical to land this, although perhaps it could slightly improve some forms of benchmarking.
Tyler Mandry 2020-08-14 20:07:16 -07:00 committed by GitHub
commit 29a946203a

@@ -20,17 +20,29 @@ impl Timespec {
     fn sub_timespec(&self, other: &Timespec) -> Result<Duration, Duration> {
         if self >= other {
-            Ok(if self.t.tv_nsec >= other.t.tv_nsec {
-                Duration::new(
-                    (self.t.tv_sec - other.t.tv_sec) as u64,
-                    (self.t.tv_nsec - other.t.tv_nsec) as u32,
-                )
+            // NOTE(eddyb) two aspects of this `if`-`else` are required for LLVM
+            // to optimize it into a branchless form (see also #75545):
+            //
+            // 1. `self.t.tv_sec - other.t.tv_sec` shows up as a common expression
+            //    in both branches, i.e. the `else` must have its `- 1`
+            //    subtraction after the common one, not interleaved with it
+            //    (it used to be `self.t.tv_sec - 1 - other.t.tv_sec`)
+            //
+            // 2. the `Duration::new` call (or any other additional complexity)
+            //    is outside of the `if`-`else`, not duplicated in both branches
+            //
+            // Ideally this code could be rearranged such that it more
+            // directly expresses the lower-cost behavior we want from it.
+            let (secs, nsec) = if self.t.tv_nsec >= other.t.tv_nsec {
+                ((self.t.tv_sec - other.t.tv_sec) as u64, (self.t.tv_nsec - other.t.tv_nsec) as u32)
             } else {
-                Duration::new(
-                    (self.t.tv_sec - 1 - other.t.tv_sec) as u64,
+                (
+                    (self.t.tv_sec - other.t.tv_sec - 1) as u64,
                     self.t.tv_nsec as u32 + (NSEC_PER_SEC as u32) - other.t.tv_nsec as u32,
                 )
-            })
+            };
+            Ok(Duration::new(secs, nsec))
         } else {
             match other.sub_timespec(self) {
                 Ok(d) => Err(d),