rust/compiler
bors 872503d918 Auto merge of #78781 - eddyb:measureme-rdpmc, r=oli-obk
Integrate measureme's hardware performance counter support.

*Note: this is a companion to https://github.com/rust-lang/measureme/pull/143, and duplicates some of its information for convenience*

**(much later) EDIT**: take any numbers with a grain of salt; they may have changed since the PR was first opened.

## Credits

I'd like to start by thanking `@alyssais`, `@cuviper`, `@edef1c`, `@glandium`, `@jix`, `@Mark-Simulacrum`, `@m-ou-se`, `@mystor`, `@nagisa`, `@puckipedia`, and `@yorickvP` for all of their help with testing, and for their valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone, please let me know; I'm going off memory here, plus some discussion logs.)

## Summary

This PR adds support to `-Z self-profile` for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the `rdpmc` instruction on `x86_64` Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than `x86_64` should be much easier (provided the hardware supports such performance counters). It was not done, mostly because of a lack of readily available test hardware.
That said, 32-bit `x86` (aka `i686`) would be almost trivial to add and test once we land the initial `x86_64` version (as all the CPU detection code can be reused).

A new flag, `-Z self-profile-counter`, was added to control which of the named `measureme` counters is used. It defaults to `wall-time`, so that `-Z self-profile`'s current functionality stays unchanged (at least for now).

The named counters so far are:
* `wall-time`: the existing time measurement
    * name chosen for consistency with `perf.rust-lang.org`
    * continues to use `std::time::Instant` for a nanosecond-precision "monotonic clock"
* `instructions:u`: the hardware performance counter usually referred to as "Instructions retired"
    * here "retired" (roughly) means "fully executed"
    * the `:u` suffix is from the Linux `perf` tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
        * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this isn't entirely true and why `instructions-minus-irqs:u` should be preferred instead*
* `instructions-minus-irqs:u`: same as `instructions:u`, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
    * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this should be preferred over `instructions:u`*
* `instructions-minus-r0420:u`: experimental counter, same as `instructions-minus-irqs:u` but subtracting an undocumented counter (`r0420:u`) instead of IRQs
    * the `rXXXX` notation is again from Linux `perf`, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't *really* know what it is
    * this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    * *see [Challenges/Zen's undocumented 420 counter](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter) for details on how this counter was found and what it does*
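
The counter list above can be made concrete with a minimal Rust sketch. It shows how the four names could map onto a type, with `wall-time` as the default and the `-minus-` variants subtracting a second reading from the instruction count. `CounterKind`, `RawReading`, and `hardware_delta` are hypothetical names for illustration only; they are not `measureme`'s actual API. The sketch also assumes no counter wraparound and leaves out the real `rdpmc` reads entirely.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the names accepted by `-Z self-profile-counter`
/// (not `measureme`'s real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CounterKind {
    WallTime,
    Instructions,
    InstructionsMinusIrqs,
    InstructionsMinusR0420,
}

impl Default for CounterKind {
    /// `wall-time` is the default, keeping `-Z self-profile` unchanged.
    fn default() -> Self {
        CounterKind::WallTime
    }
}

impl FromStr for CounterKind {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "wall-time" => Ok(CounterKind::WallTime),
            "instructions:u" => Ok(CounterKind::Instructions),
            "instructions-minus-irqs:u" => Ok(CounterKind::InstructionsMinusIrqs),
            "instructions-minus-r0420:u" => Ok(CounterKind::InstructionsMinusR0420),
            other => Err(format!("unknown counter name `{}`", other)),
        }
    }
}

/// Hypothetical snapshot of two hardware counters taken at the same point.
#[derive(Debug, Clone, Copy)]
struct RawReading {
    /// Value of `instructions:u`.
    instructions: u64,
    /// Value of the subtracted counter (IRQs, or `r0420:u` for the experimental variant).
    subtracted: u64,
}

/// Counter delta between two readings, or `None` for `wall-time`, which is
/// measured with `std::time::Instant` rather than hardware counters.
/// Assumes neither counter wrapped around between `start` and `end`.
fn hardware_delta(kind: CounterKind, start: RawReading, end: RawReading) -> Option<u64> {
    let instructions = end.instructions - start.instructions;
    match kind {
        CounterKind::WallTime => None,
        CounterKind::Instructions => Some(instructions),
        CounterKind::InstructionsMinusIrqs | CounterKind::InstructionsMinusR0420 => {
            // Subtract however many IRQs (or `r0420:u` events) occurred in the interval.
            Some(instructions - (end.subtracted - start.subtracted))
        }
    }
}
```

For example, `"instructions-minus-irqs:u".parse::<CounterKind>()` yields `Ok(CounterKind::InstructionsMinusIrqs)`, and an unknown name produces an error instead of silently falling back to the default.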

---

There are also some additional commits:
* ~~see [Challenges/Rebasing *shouldn't* affect the results, right?](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) for details on the changes to `rustc_parse` and `rustc_trait_selection` (the latter far more dubious, and probably shouldn't be merged, or not as-is)~~
  * **EDIT**: the effects of these are no longer quantifiable; the PR includes reverts for them
* ~~see [Challenges/`jemalloc`: purging will commence in ten seconds](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) for details on the `jemalloc` change~~
  * this is also found separately in #77162, and we probably want to avoid doing it by default; ideally we'd use the runtime control API that `jemalloc` offers (assuming it can stop the timer that's already running, which I'm not sure about)
  * **EDIT**: until we can do this based on `-Z` flags, this commit has also been reverted
* the `proc_macro` change was to avoid randomized hashing and therefore ASLR-like effects
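
Randomized hashing matters here because `std`'s default `RandomState` gives each run different hash keys. As a result, `HashMap` iteration order (and with it the exact work done) can vary between otherwise identical compilations. The sketch below shows one generic way to get run-to-run deterministic hashing in Rust, by swapping in a fixed-key hasher. It illustrates the idea under that assumption and is not necessarily what the `proc_macro` commit actually changed. `DeterministicMap` and `hash_with` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// A `HashMap` whose hasher always starts from the same fixed keys, so for the
/// same insertions the hashes and iteration order are identical across runs of
/// the same binary (they may still change between Rust releases).
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hashes `value` with a fresh hasher obtained from `builder`.
fn hash_with<B: BuildHasher, T: Hash>(builder: &B, value: &T) -> u64 {
    let mut hasher = builder.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With `RandomState`, two separately created builders generally produce different hashes for the same value. With `BuildHasherDefault<DefaultHasher>`, they always agree, which removes one source of run-to-run variation from the counts.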

---


#### Write-up / report

Because of how extensive the full report ended up being, I've kept most of it [on `hackmd.io`](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view), but for convenient access, here are all the sections (with individual links):
<sup>(someone suggested I make a backup, so [here it is on the wayback machine](http://web.archive.org/web/20201127164748/https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view) - I'll need to remember to update that if I have to edit the write-up)</sup>

* [**Motivation**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Motivation)

* [**Results**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Results)
    * [**Overhead**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Overhead)
    *Preview (see the report itself for more details):*

    |Counter|Total<br>`instructions-minus-irqs:u`|Overhead from "Baseline"<br>(for all 1903881<br>counter reads)|Overhead from "Baseline"<br>(per counter read)|
    |-|-|-|-|
    |Baseline|63637621286 ±6|||
    |`instructions:u`|63658815885 ±2|&nbsp;&nbsp;+21194599 ±8|&nbsp;&nbsp;+11|
    |`instructions-minus-irqs:u`|63680307361 ±13|&nbsp;&nbsp;+42686075 ±19|&nbsp;&nbsp;+22|
    |`wall-time`|63951958376 ±10275|+314337090 ±10281|+165|

    * [**"Macro" noise (self time)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Macro”-noise-(self-time))
    *Preview (see the report itself for more details):*

    || `wall-time` (ns) | `instructions:u` | `instructions-minus-irqs:u`
    -: | -: | -: | -:
    `typeck` | 5478261360 ±283933373 (±~5.2%) | 17350144522 ±6392 (±~0.00004%) | 17351035832.5 ±4.5 (±~0.00000003%)
    `expand_crate` | 2342096719 ±110465856 (±~4.7%) | 8263777916 ±2937 (±~0.00004%) | 8263708389 ±0 (±~0%)
    `mir_borrowck` | 2216149671 ±119458444 (±~5.4%) | 8340920100 ±2794 (±~0.00003%) | 8341613983.5 ±2.5 (±~0.00000003%)
    `mir_built` | 1269059734 ±91514604 (±~7.2%) | 4454959122 ±1618 (±~0.00004%) | 4455303811 ±1 (±~0.00000002%)
    `resolve_crate` | 942154987.5 ±53068423.5 (±~5.6%) | 3951197709 ±39 (±~0.000001%) | 3951196865 ±0 (±~0%)

    * [**"Micro" noise (individual sampling intervals)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Micro”-noise-(individual-sampling-intervals))

* [**Caveats**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Caveats)
    * [**Disabling ASLR**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Disabling-ASLR)
    * [**Non-deterministic proc macros**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Non-deterministic-proc-macros)
    * [**Subtracting IRQs**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs)
    * [**Lack of support for multiple threads**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Lack-of-support-for-multiple-threads)

* [**Challenges**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Challenges)
    * [**How do we even read hardware performance counters?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#How-do-we-even-read-hardware-performance-counters)
    * [**ASLR: it's free entropy**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#ASLR-it’s-free-entropy)
    * [**The serializing instruction**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#The-serializing-instruction)
    * [**Getting constantly interrupted**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Getting-constantly-interrupted)
    * [**AMD patented time-travel and dubbed it `SpecLockMap`<br><sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;or: "how we accidentally unlocked `rr` on AMD Zen"</sup>**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#AMD-patented-time-travel-and-dubbed-it-SpecLockMapnbspnbspnbspnbspnbspnbspnbspnbspor-“how-we-accidentally-unlocked-rr-on-AMD-Zen”)
    * [**`jemalloc`: purging will commence in ten seconds**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds)
    * [**Rebasing *shouldn't* affect the results, right?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right)
    * [**Epilogue: Zen's undocumented 420 counter**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter)
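
The derived columns in the "Overhead" preview can be re-derived from its totals, and the short Rust sketch below does exactly that. The function names are illustrative, and the input numbers are copied from the preview table. Overhead is a counter's total minus the "Baseline" total. The per-read figure is that overhead divided by the 1903881 counter reads, rounded. The overhead's ± matches the sum of the two totals' ± values. The last helper shows how the percentages in the "Macro" noise preview relate a spread to its value.

```rust
/// Number of counter reads in the overhead measurement (from the table header).
const READS: u64 = 1_903_881;
/// "Baseline" total `instructions-minus-irqs:u` (from the table).
const BASELINE: u64 = 63_637_621_286;

/// Extra instructions relative to the baseline run.
fn overhead(total: u64) -> u64 {
    total - BASELINE
}

/// Average overhead per counter read, rounded to the nearest integer.
fn per_read(overhead: u64) -> u64 {
    (overhead + READS / 2) / READS
}

/// Spread of a difference, taken as the sum of both inputs' spreads
/// (this matches the table, e.g. 6 + 2 = 8 for `instructions:u`).
fn combined_spread(a: u64, b: u64) -> u64 {
    a + b
}

/// Spread expressed as a percentage of the measured value.
fn relative_spread_percent(value: f64, spread: f64) -> f64 {
    spread / value * 100.0
}
```

For example, `per_read(overhead(63_951_958_376))` evaluates to `165`, matching the `wall-time` row. Likewise, `relative_spread_percent(5_478_261_360.0, 283_933_373.0)` comes out at about 5.18, which is the "±~5.2%" shown for `typeck` under `wall-time`.
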
2022-06-14 13:37:39 +00:00
rustc Rollup merge of #97385 - oli-obk:smir-tool-lib, r=pnkfelix 2022-06-14 07:47:24 +09:00
rustc_apfloat
rustc_arena
rustc_ast Revert b983e42936. 2022-06-10 08:35:03 +10:00
rustc_ast_lowering remove unnecessary to_string and String::new 2022-06-13 15:48:40 +09:00
rustc_ast_passes remove unnecessary to_string and String::new 2022-06-13 15:48:40 +09:00
rustc_ast_pretty
rustc_attr remove unnecessary to_string and String::new 2022-06-13 15:48:40 +09:00
rustc_borrowck Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_builtin_macros remove unnecessary to_string and String::new for tool_only_span_suggestion 2022-06-13 16:01:16 +09:00
rustc_codegen_cranelift Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_codegen_gcc Remove unused macro rule 2022-06-07 08:50:13 -04:00
rustc_codegen_llvm Rollup merge of #95243 - vladimir-ea:compiler_watch_os, r=nagisa 2022-06-14 07:47:23 +09:00
rustc_codegen_ssa Rollup merge of #97935 - nnethercote:rename-ConstS-val-as-kind, r=lcnr 2022-06-14 10:35:29 +02:00
rustc_const_eval Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_data_structures Integrate measureme's hardware performance counter support. 2022-06-13 07:56:47 +00:00
rustc_driver
rustc_error_codes Add comment for internal error codes 2022-06-12 19:52:49 -03:00
rustc_error_messages Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_errors Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_expand remove unnecessary to_string and String::new 2022-06-13 15:48:40 +09:00
rustc_feature Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_fs_util
rustc_graphviz
rustc_hir Address comments 2022-06-11 16:38:48 -07:00
rustc_hir_pretty
rustc_incremental Revert dc08bc51f2. 2022-06-10 11:58:29 +10:00
rustc_index Auto merge of #97862 - SparrowLii:superset, r=lcnr 2022-06-09 07:13:46 +00:00
rustc_infer Rollup merge of #97935 - nnethercote:rename-ConstS-val-as-kind, r=lcnr 2022-06-14 10:35:29 +02:00
rustc_interface remove unnecessary to_string and String::new 2022-06-13 15:48:40 +09:00
rustc_lexer
rustc_lint Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_lint_defs
rustc_llvm RustWrapper: adapt to APInt API changes in LLVM 15 2022-06-07 14:47:57 -04:00
rustc_log
rustc_macros Auto merge of #94732 - nnethercote:infallible-encoder, r=bjorn3 2022-06-08 10:24:12 +00:00
rustc_metadata Auto merge of #95880 - cjgillot:def-ident-span, r=petrochenkov 2022-06-11 20:08:48 +00:00
rustc_middle Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_mir_build Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_mir_dataflow Merge arms in borrowed locals transfer function 2022-06-12 07:27:57 +02:00
rustc_mir_transform Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_monomorphize Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_parse Rollup merge of #95211 - terrarier2111:improve-parser, r=compiler-errors 2022-06-14 07:47:22 +09:00
rustc_parse_format
rustc_passes Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_plugin_impl
rustc_privacy Folding revamp. 2022-06-08 09:24:03 +10:00
rustc_query_impl Integrate measureme's hardware performance counter support. 2022-06-13 07:56:47 +00:00
rustc_query_system Revert dc08bc51f2. 2022-06-10 11:58:29 +10:00
rustc_resolve remove unnecessary to_string and String::new for tool_only_span_suggestion 2022-06-13 16:01:16 +09:00
rustc_save_analysis
rustc_serialize Revert dc08bc51f2. 2022-06-10 11:58:29 +10:00
rustc_session Auto merge of #78781 - eddyb:measureme-rdpmc, r=oli-obk 2022-06-14 13:37:39 +00:00
rustc_smir
rustc_span Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk 2022-06-14 10:35:31 +02:00
rustc_symbol_mangling Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_target Add Apple WatchOS compile targets 2022-06-13 16:08:53 +01:00
rustc_trait_selection Rollup merge of #97935 - nnethercote:rename-ConstS-val-as-kind, r=lcnr 2022-06-14 10:35:29 +02:00
rustc_traits Rename the ConstS::val field as kind. 2022-06-14 13:06:44 +10:00
rustc_ty_utils Auto merge of #95880 - cjgillot:def-ident-span, r=petrochenkov 2022-06-11 20:08:48 +00:00
rustc_type_ir Revert b983e42936. 2022-06-10 08:35:03 +10:00
rustc_typeck Rollup merge of #97935 - nnethercote:rename-ConstS-val-as-kind, r=lcnr 2022-06-14 10:35:29 +02:00