The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
The current inconsistency confuse contributors which coding guidlines to follow.
It would be better to have it consistent using clang-format tool.
Reviewed By: mhjacobson
Differential Revision: https://reviews.llvm.org/D109270
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:
"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and
store instructions go through addresses generation phase in program order to avoid on-the-fly memory
ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."
Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.
These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).
Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.
Previously only await inside the async function (coroutine after lowering to async runtime) would check the error state
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D109229
This option is used to select between the format headers output column
width option. This option should be independent of the locale setting.
It's encouraged to default to Unicode unless the platform doesn't offer
that option.
[format.string.std]/10
```
For the purposes of width computation, a string is assumed to be in a
locale-independent, implementation-defined encoding. Implementations
should use a Unicode encoding on platforms capable of displaying Unicode
```
Reviewed By: #libc, ldionne, vitaut
Differential Revision: https://reviews.llvm.org/D103379
This implements the initial version of the `std::formatter` class and its specializations. It also implements the following formatting functions:
- `format`
- `vformat`
- `format_to`
- `vformat_to`
- `format_to_n`
- `formatted_size`
All functions have a `char` and `wchar_t` version. Parsing the format-spec and
using the parsed format-spec hasn't been implemented. The code isn't optimized,
neither for speed, nor for size.
The goal is to have the rudimentary basics working, which can be used as a
basis to improve upon. The formatters used in this commit are simple stubs that
will be replaced by real formatters in later commits.
The formatters that are slated to be replaced in this patch series don't have
an availability macro to avoid merge conflicts.
Note the formatter for `bool` uses `0` and `1` instead of "false" and
"true". This will be fixed when the stub is replaced with a real
formatter.
Implements parts of:
- P0645 Text Formatting
Completes:
- LWG3539 format_to must not copy models of output_iterator<const charT&>
Reviewed By: ldionne, #libc, vitaut
Differential Revision: https://reviews.llvm.org/D96664
The change here is basically the same as in D108880: Rather than
looking at bitcasts, look at calls and their function type. We
still need to look through bitcasts to find those calls.
The change in llvm/test/CodeGen/WebAssembly/add-prototypes-conflict.ll
is due to different visitation order. add-prototypes-opaque-ptrs.ll
is a copy of add-prototypes.ll with -force-opaque-pointers.
Differential Revision: https://reviews.llvm.org/D109256
`SVB.getStateManager().getOwningEngine().getAnalysisManager().getAnalyzerOptions()`
is quite a mouthful and might involve a few pointer indirections to get
such a simple thing like an analyzer option.
This patch introduces an `AnalyzerOptions` reference to the `SValBuilder`
abstract class, while refactors a few cases to use this /simpler/ accessor.
Reviewed By: martong, Szelethus
Differential Revision: https://reviews.llvm.org/D108824
Quoting https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html:
> In the absence of the zero-length array extension, in ISO C90 the contents
> array in the example above would typically be declared to have a single
> element.
We should not assume that the size of the //flexible array member// field has
a single element, because in some cases they use it as a fallback for not
having the //zero-length array// language extension.
In this case, the analyzer should return `Unknown` as the extent of the field
instead.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D108230
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
FuncOp always lowers to an LLVM external linkage presently. This makes it impossible to define functions in mlir which are local to the current module. Until MLIR FuncOps have a more formal linkage specification, this commit allows funcop's to have an optionally specified llvm.linkage attribute, whose value will be used as the linkage of the llvm funcop when lowered.
Differential Revision: https://reviews.llvm.org/D108524
Support LLVM linkage
This patch should fix the build failure that surfaced when build llvm
with GCC: https://lab.llvm.org/staging/#/builders/16/builds/10450
GCC complained that I explicitely specialized
`ScriptedPythonInterface::ExtractValueFromPythonObject` in a
in non-namespace scope, which is tolerated by Clang.
To solve this issue, the specialization were declared out of the class
and implemented in the source file.
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D109209
The preprocessor definitions __BYTE_ORDER__, __ORDER_BIG_ENDIAN__, and
__ORDER_LITTLE_ENDIAN__ are gcc extensions (also supported by clang),
but msvc (and others) do not define them. As a result __BYTE_ORDER__
and __ORDER_BIG_ENDIAN__ both evaluate to 0 by the prepreprocessor,
and __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__, the first `#if` condition
to 1, hence assuming the wrong byte order for x86(_64).
This patch instead uses CMake's TestBigEndian module to determine
target architecture's endianness at configure-time.
Note this also uses the same mechanism for the runtime. If compiling
flang as a cross-compiler, the runtime for the compile-target must be
built separately (Flang does not support the LLVM_ENABLE_RUNTIMES
mechanism yet).
Fixes llvm.org/PR51597
Reviewed By: ijan1, Leporacanthicus
Differential Revision: https://reviews.llvm.org/D109108
Fix edge case where "0x" would be considered a complete hexadecimal
number for purposes of str_end. Now the hexadecimal prefix needs a valid
digit after it, else just the 0 will be counted as the number.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D109084
This simplifies setting up sparse tensors through C-style data structures.
Useful for runtimes that want to interact with MLIR-generated code
without knowning about all bufferization details (viz. memrefs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109251