Rollup merge of #75545 - eddyb:instant-sub-branchless, r=sfackler
std/sys/unix/time: make it easier for LLVM to optimize `Instant` subtraction.

This PR is the minimal change necessary to get LLVM to optimize `if self.t.tv_nsec >= other.t.tv_nsec` to branchless instructions (at least on x86_64), inspired by @m-ou-se's own attempts at optimizing `Instant` subtraction.

I stumbled over this by looking at the total number of instructions executed by `rustc -Z self-profile`, and found that after disabling ASLR, the largest remaining source of non-determinism was this `if` taking one branch or the other, depending on the values involved.

This code is called often enough to make a difference because `measureme` (the `-Z self-profile` implementation) currently uses `Instant::elapsed` for its event timestamps (of which there can be millions).

I doubt it's critical to land this, although perhaps it could slightly improve some forms of benchmarking.
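To illustrate why the subtraction runs so often, here is a minimal sketch of a profiler-style recorder that stamps every event with `Instant::elapsed`. The `record_timestamps` function is hypothetical and is not `measureme`'s actual API; it only shows that each recorded event costs one `Instant` subtraction.

```rust
use std::time::{Duration, Instant};

/// Hypothetical recorder (not measureme's real API): stamps `n` events
/// with the time elapsed since a shared starting `Instant`.
fn record_timestamps(n: usize) -> Vec<Duration> {
    let start = Instant::now();
    // Each `elapsed()` call computes `Instant::now() - start`,
    // i.e. one `Instant` subtraction per event.
    (0..n).map(|_| start.elapsed()).collect()
}

fn main() {
    let stamps = record_timestamps(1_000);
    assert_eq!(stamps.len(), 1_000);
    // `Instant` is monotonic, so the timestamps never go backwards.
    assert!(stamps.windows(2).all(|w| w[0] <= w[1]));
    println!("recorded {} timestamps", stamps.len());
}
```

With millions of events, any data-dependent branch inside that subtraction is taken or not taken millions of times, which is how it can show up in instruction counts.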
commit 29a946203a
1 changed files with 20 additions and 8 deletions
@@ -20,17 +20,29 @@ impl Timespec {

     fn sub_timespec(&self, other: &Timespec) -> Result<Duration, Duration> {
         if self >= other {
-            Ok(if self.t.tv_nsec >= other.t.tv_nsec {
-                Duration::new(
-                    (self.t.tv_sec - other.t.tv_sec) as u64,
-                    (self.t.tv_nsec - other.t.tv_nsec) as u32,
-                )
+            // NOTE(eddyb) two aspects of this `if`-`else` are required for LLVM
+            // to optimize it into a branchless form (see also #75545):
+            //
+            // 1. `self.t.tv_sec - other.t.tv_sec` shows up as a common expression
+            //    in both branches, i.e. the `else` must have its `- 1`
+            //    subtraction after the common one, not interleaved with it
+            //    (it used to be `self.t.tv_sec - 1 - other.t.tv_sec`)
+            //
+            // 2. the `Duration::new` call (or any other additional complexity)
+            //    is outside of the `if`-`else`, not duplicated in both branches
+            //
+            // Ideally this code could be rearranged such that it more
+            // directly expresses the lower-cost behavior we want from it.
+            let (secs, nsec) = if self.t.tv_nsec >= other.t.tv_nsec {
+                ((self.t.tv_sec - other.t.tv_sec) as u64, (self.t.tv_nsec - other.t.tv_nsec) as u32)
             } else {
-                Duration::new(
-                    (self.t.tv_sec - 1 - other.t.tv_sec) as u64,
+                (
+                    (self.t.tv_sec - other.t.tv_sec - 1) as u64,
                     self.t.tv_nsec as u32 + (NSEC_PER_SEC as u32) - other.t.tv_nsec as u32,
                 )
-            })
+            };
+
+            Ok(Duration::new(secs, nsec))
         } else {
             match other.sub_timespec(self) {
                 Ok(d) => Err(d),
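
The two shapes touched by this diff can be compared outside of `std` with a small standalone sketch. The `Ts` struct and the `sub_old`/`sub_new` names below are hypothetical stand-ins for `Timespec` and its `sub_timespec` method, and the sketch only checks that both shapes compute the same `Duration`. Whether the second one actually compiles to branchless code depends on the LLVM version and target, so that part has to be confirmed by inspecting the generated assembly.

```rust
use std::time::Duration;

const NSEC_PER_SEC: i64 = 1_000_000_000;

/// Hypothetical stand-in for the `timespec` wrapped by std's `Timespec`.
/// The derived ordering compares `sec` first, then `nsec`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Ts {
    sec: i64,
    nsec: i64, // assumed to lie in 0..NSEC_PER_SEC
}

/// Old shape: `Duration::new` is duplicated in both branches and the
/// `- 1` is interleaved with the seconds subtraction.
fn sub_old(a: Ts, b: Ts) -> Duration {
    assert!(a >= b);
    if a.nsec >= b.nsec {
        Duration::new((a.sec - b.sec) as u64, (a.nsec - b.nsec) as u32)
    } else {
        Duration::new(
            (a.sec - 1 - b.sec) as u64,
            a.nsec as u32 + NSEC_PER_SEC as u32 - b.nsec as u32,
        )
    }
}

/// New shape: both branches share `a.sec - b.sec`, and `Duration::new`
/// is called once, after the `if`-`else`.
fn sub_new(a: Ts, b: Ts) -> Duration {
    assert!(a >= b);
    let (secs, nsec) = if a.nsec >= b.nsec {
        ((a.sec - b.sec) as u64, (a.nsec - b.nsec) as u32)
    } else {
        (
            (a.sec - b.sec - 1) as u64,
            a.nsec as u32 + NSEC_PER_SEC as u32 - b.nsec as u32,
        )
    };
    Duration::new(secs, nsec)
}

fn main() {
    // One case without a nanosecond borrow, one with.
    let cases = [
        (Ts { sec: 5, nsec: 200 }, Ts { sec: 3, nsec: 100 }),
        (Ts { sec: 5, nsec: 100 }, Ts { sec: 3, nsec: 200 }),
    ];
    for (a, b) in cases {
        assert_eq!(sub_old(a, b), sub_new(a, b));
    }
    println!("both shapes agree");
}
```

In the borrow case the result is one second less than the plain difference of the seconds, with `NSEC_PER_SEC` added to the nanosecond difference, which is the arithmetic both branches of the diff preserve.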