Optimize common path of Once::doit

Optimize `Once::doit`: perform optimistic check that initializtion is
already completed.  `load` is much cheaper than `fetch_add` at least
on x86_64.

Verified with this test:

```
static mut o: one::Once = one::ONCE_INIT;
unsafe {
    loop {
        let start = time::precise_time_ns();
        let iters = 50000000u64;
        for _ in range(0, iters) {
            o.doit(|| { println!("once!"); });
        }
        let end = time::precise_time_ns();
        let ps_per_iter = 1000 * (end - start) / iters;
        println!("{} ps per iter", ps_per_iter);

        // confuse the optimizer
        o.doit(|| { println!("once!"); });
    }
}
```

Test executed on Mac, Intel Core i7 2GHz. Result is:
* 20ns per iteration without patch
*  4ns per iteration with this patch applied

Once.doit could be even faster (800ps per iteration), if `doit` function
was split into a pair of `doit`/`doit_slow`, and `doit` marked as
`#[inline]` like this:

```
#[inline(always)]
pub fn doit(&self, f: ||) {
    if self.cnt.load(atomics::SeqCst) < 0 {
        return
    }

    self.doit_slow(f);
}

fn doit_slow(&self, f: ||) { ... }
```
This commit is contained in:
Stepan Koltsov 2014-05-14 10:23:42 +00:00
parent db5ca23118
commit f853cf79b5

View file

@ -64,6 +64,11 @@ impl Once {
/// When this function returns, it is guaranteed that some initialization
/// has run and completed (it may not be the closure specified).
pub fn doit(&self, f: ||) {
// Optimize common path: load is much cheaper than fetch_add.
if self.cnt.load(atomics::SeqCst) < 0 {
return
}
// Implementation-wise, this would seem like a fairly trivial primitive.
// The stickler part is where our mutexes currently require an
// allocation, and usage of a `Once` should't leak this allocation.