Using the LLVM rules for install ensures that DESTDIR and other expected
variables for an LLVM install work correctly.
Tested:
Manually with DESTDIR=/tmp/testinstall/ ninja install-llvmlibc
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D129041
The vector types aren't legal with soft float.
Also disable under NoImplicitFloat for good measure.
Fixes PR56351.
Differential Revision: https://reviews.llvm.org/D129060
This patch adds translation for omp.task from OpenMPDialect to LLVM IR
Dialect and adds tests for the same.
Depends on D71989
Reviewed By: ftynse, kiranchandramohan, peixin, Meinersbur
Differential Revision: https://reviews.llvm.org/D123919
The constant expression used in the test will become invalid in
the future. Convert the input into bitcode, so we test that auto-
upgrade happens gracefully once this is the case.
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstructions in PEI to compute the stack offset for
RVV load/store instructions. These instructions are added too late to
be optimized by passes such as CSE and LICM.
This patch prevents FrameIndex SDNodes from being matched in RVV load/store
instruction selection patterns, so that the FrameIndex SDNodes are
selected as `ADDI GPR, targetframeindex`.
There are 2 advantages to this change:
1. Stack object address computation can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.
Differential Revision: https://reviews.llvm.org/D128187
This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.
The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.
To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
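As a rough illustration of the break-even computation, the sketch below solves
the inequality above for the trip count under a simplified linear cost model.
The function name, its parameters and the cost units are assumptions made for
this example; it is not the formula LV itself uses, and it ignores the scalar
epilogue.

```python
def min_profitable_trip_count(rt_check_cost, scalar_iter_cost, vector_iter_cost, vf):
    """Smallest trip count TC for which
    rt_check_cost + (TC / vf) * vector_iter_cost < TC * scalar_iter_cost.

    Hypothetical model: all costs are non-negative integers in abstract units,
    and one vector iteration (vector_iter_cost) covers vf scalar iterations.
    Returns None if the vector loop never pays off.
    """
    # Cost saved by running vf scalar iterations as one vector iteration.
    saving_per_vf_iters = scalar_iter_cost * vf - vector_iter_cost
    if saving_per_vf_iters <= 0:
        return None
    # Multiplying the inequality by vf gives TC * saving > rt_check_cost * vf,
    # so take the smallest integer strictly above the quotient.
    return (rt_check_cost * vf) // saving_per_vf_iters + 1
```

With a runtime check cost of 20, a scalar iteration cost of 4, a vector
iteration cost of 6 and VF = 4, this model gives a minimum trip count of 9.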
Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.
The performance impact on standard test-suites like SPEC2006/SPEC2017/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.
On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.
```
CFP2006/447.dealII/447.dealII -1.9%
CINT2017rate/525.x264_r/525.x264_r -2.2%
ASC_Sequoia/IRSmk/IRSmk -9.2%
Rodinia/srad/srad -36.1%
```
`size` regressions on AArch64 with -O3 are
```
MultiSource/Applications/hbd/hbd 90256.00 106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000 240676.00 257268.00 6.9%
MultiSourc...enchmarks/mafft/pairlocalalign 472603.00 489131.00 3.5%
External/S...2017rate/525.x264_r/525.x264_r 613831.00 630343.00 2.7%
External/S...NT2006/464.h264ref/464.h264ref 818920.00 835448.00 2.0%
External/S...te/538.imagick_r/538.imagick_r 1994730.00 2027754.00 1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 1236471.00 1253015.00 1.3%
MultiSource/Applications/oggenc/oggenc 2108147.00 2124675.00 0.8%
External/S.../CFP2006/447.dealII/447.dealII 4742999.00 4759559.00 0.3%
External/S...rate/510.parest_r/510.parest_r 14206377.00 14239433.00 0.2%
```
Reviewed By: lebedev.ri, ebrevnov, dmgreen
Differential Revision: https://reviews.llvm.org/D109368
The refactor in https://reviews.llvm.org/D128230 introduced a new target and the name is not scoped properly, leading to name collisions on larger projects. It is done properly on the target just below, so applying the same pattern here fixes the issue.
Use ConstantFoldBinaryOpOperands() instead, to prepare for the case
where not all binary operators have a constant expression form.
I believe this code actually intended to set OnlyIfReduced=true,
however ConstantExpr::get() actually accepts a Flags argument at
that position (and OnlyIfReducedTy as the next argument), so this
ended up creating a constant expression with some random flag
(probably exact or nuw depending on which).
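The pitfall can be shown with a small Python analogue. The function below is
hypothetical and only mirrors the parameter order described above (a flags
parameter followed by an only-if-reduced type); it is not the LLVM signature.

```python
FLAG_NUW = 1  # hypothetical flag bit, for illustration only


def get_expr(opcode, lhs, rhs, flags=0, only_if_reduced_ty=None):
    """Hypothetical constructor: flags comes before only_if_reduced_ty."""
    return {
        "opcode": opcode,
        "operands": (lhs, rhs),
        "flags": flags,
        "only_if_reduced_ty": only_if_reduced_ty,
    }


# The caller meant "only if reduced", but the boolean lands in `flags`.
buggy = get_expr("add", 1, 2, True)

# Naming the argument avoids silently setting a flag.
fixed = get_expr("add", 1, 2, only_if_reduced_ty="i32")
```

Here `buggy["flags"]` compares equal to `FLAG_NUW` because `True == 1`, while
`only_if_reduced_ty` stays unset.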
The previous code made the assumption that the defining
operation is a fir::ConvertOp without checking. This results in
segmentation fault in code like the added test.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D129077
This commit adds:
- Additional test coverage of the DELETE and END commands.
- File names to be read in the line endings test.
- A use of ADDLIB in the nonascii test.
Differential Revision: https://reviews.llvm.org/D128838
This patch slightly extends the limit on the RecursionMaxDepth inside
the SLP vectorizer. It does it only when it hits a load (or zext/sext of
a load), which allows it to peek through in the places where it will be
the most valuable, without ballooning out the O(..) by any 2^n factors.
Differential Revision: https://reviews.llvm.org/D122148
Use ConstantFoldBinaryOpOperands() instead, to handle the case
where not all binary ops have a constant expression variant.
This is a bit awkward because we only want to pop the element from
Ops once we're sure that it has folded.
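A minimal sketch of the "peek first, pop only after a successful fold" shape,
using a hypothetical fallible folder over a plain Python list. The helper
names are made up for this example and do not correspond to the actual code.

```python
def try_fold_add(lhs, rhs):
    """Hypothetical fallible fold: succeeds only for two integers."""
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs
    return None


def fold_trailing_constants(ops):
    """Fold the last operand into the one before it while folding succeeds.

    Mutates and returns ops. The last element is only popped once the
    fold is known to have succeeded, so a failed fold leaves ops intact.
    """
    while len(ops) >= 2:
        folded = try_fold_add(ops[-2], ops[-1])  # peek, do not pop yet
        if folded is None:
            break
        ops.pop()
        ops[-1] = folded
    return ops
```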
This is an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
%x = shuffle %i1, %i2
%y = shuffle %i1, %i2
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask
This patch extends the handling of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as in:
%x = shuffle %i1, %i2
%y = shuffle %x
The input shuffles can also be omitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help in getting lower-cost input
shuffles, pushing complex shuffles further down the tree.
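To illustrate why dependent shuffles such as %x and %y above can be looked
through, the sketch below composes the two masks into a single mask over the
original inputs. It models vectors as Python lists and uses -1 for an
undefined lane; the helper names are assumptions, and this is not the
implementation in the patch.

```python
def shuffle(a, b, mask):
    """Select lanes from the concatenation of a and b; -1 yields None."""
    src = list(a) + list(b)
    return [None if m < 0 else src[m] for m in mask]


def compose_masks(inner_mask, outer_mask):
    """Mask equivalent to a single-input shuffle (outer_mask) applied to the
    result of a two-input shuffle (inner_mask)."""
    return [-1 if m < 0 else inner_mask[m] for m in outer_mask]


i1, i2 = [10, 11, 12, 13], [20, 21, 22, 23]
inner, outer = [0, 4, 1, 5], [3, 2, -1, 0]

x = shuffle(i1, i2, inner)   # %x = shuffle %i1, %i2
y = shuffle(x, [], outer)    # %y = shuffle %x
direct = shuffle(i1, i2, compose_masks(inner, outer))
```

Both `y` and `direct` contain the same lanes, so the pair of shuffles can be
costed as one shuffle of %i1 and %i2.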
Differential Revision: https://reviews.llvm.org/D128732
This operation is fallible, but ConstantFoldConstantImpl() is not.
If we fail to fold, we should simply return the original expression.
I don't think this can cause any issues right now, but it becomes
a problem once we make ConstantFoldInstOperandsImpl() not create a
constant expression for everything it possibly could.
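The intended behavior can be sketched as an infallible wrapper around a
fallible folder. The tuple-based expression form and both function names are
hypothetical and exist only for this example.

```python
def try_fold(expr):
    """Hypothetical fallible folder for ('add', lhs, rhs) tuples.

    Returns the folded integer, or None if the expression cannot be folded.
    """
    op, lhs, rhs = expr
    if op == "add" and isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs
    return None


def fold_or_keep(expr):
    """Infallible wrapper: return the original expression if folding fails."""
    folded = try_fold(expr)
    # Compare against None explicitly: a folded result of 0 is still a success.
    return expr if folded is None else folded
```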
Finalization is an F2003 feature, and although the runtime already supports it,
lowering does not ensure that all derived types are finalized properly
when they should be. This will require surveying the places where lowering
needs to call it. Add a hard TODO for now.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D129069
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch puts the code to safely bitcast a predicate, and possibly zero
any undefined lanes when doing a widening cast, into one place and merges
the functionality with lowerConvertToSVBool.
This is some cleanup inspired by D128665.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D128926
Building MLIR currently produces a lot of warnings.
This change fixes the warnings listed below.
.../SparseTensorUtils.cpp: In instantiation of ‘...::openSparseTensorCOO(...) [with ...]’:
.../SparseTensorUtils.cpp:1672:3: required from here
.../SparseTensorUtils.cpp:87:21: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘PrimaryType’ [-Wformat=]
.../OptUtils.cpp:36:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
.../AffineOps.cpp:1741:32: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
Reviewed By: aartbik, wrengr, aeubanks
Differential Revision: https://reviews.llvm.org/D128993
When profiling in ASTContext::getAutoType, we did not take into account
the canonical declaration of the type constraint. This is bad since it
can cause a spurious ODR mismatch even though we already know the type
constraint declaration is a redeclaration of the previous one. Using the
canonical declaration here should also be harmless.
This revision updates the op semantics to also allow rank-reducing behavior and
updates the implementation to reuse code between the sequential and parallel
versions of the op.
Depends on D128920
Differential Revision: https://reviews.llvm.org/D128985
Specifically:
- Diffs are not passed around on mailing lists any more.
- Diffs should be `-U999999`.
- Clarify part about automated emails.
Differential review: https://reviews.llvm.org/D128645
These conditions are later checked in the HoistTerminator code
path. Checking them here is somewhat confusing, because this code
only checks the first instruction in the block, which is not
necessarily the terminator.
This is mostly NFC and will allow tensor.parallel_insert_slice to gain
rank-reducing semantics by reusing the vast majority of the tensor.insert_slice impl.
Depends on D128857
Differential Revision: https://reviews.llvm.org/D128920
This removes the insertvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
This is very similar to the extractvalue removal from D125795.
insertvalue is also not supported in bitcode, so no auto-upgrade
is necessary.
ConstantExpr::getInsertValue() can be replaced with
IRBuilder::CreateInsertValue() or ConstantFoldInsertValueInstruction(),
depending on whether a constant result is required (with the latter
being fallible).
The ConstantExpr::hasIndices() and ConstantExpr::getIndices()
methods also go away here, because there are no longer any constant
expressions with indices.
Differential Revision: https://reviews.llvm.org/D128719
This handles the code we get for the following:
int foo(unsigned x, int *y) {
return y[x >> 3];
}
The srl and shl implied by the array index will be combined to
form (srl (and X, C2), C1). We need to reverse this to get back
the shl to fold into SHXADD.
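The equivalence behind this can be checked with 32-bit arithmetic. The sketch
below assumes a 4-byte int, so the index is scaled by a shift of 2; under that
assumption C2 = 0xFFFFFFF8 and C1 = 1 for the example above. The function
names are made up for illustration, and sh2add only models the arithmetic of
the instruction.

```python
MASK32 = 0xFFFFFFFF


def offset_with_shifts(x):
    """Byte offset as written in the source: (x >> 3) scaled by 4."""
    return ((x >> 3) << 2) & MASK32


def offset_combined(x):
    """Combined form (srl (and X, C2), C1) with C2 = 0xFFFFFFF8, C1 = 1."""
    return (x & 0xFFFFFFF8) >> 1


def sh2add(rs1, rs2):
    """Arithmetic model of SH2ADD: (rs1 << 2) + rs2."""
    return (rs1 << 2) + rs2


def element_address(base, x):
    """Address of y[x >> 3] once the shl is recovered and folded into sh2add."""
    return sh2add(x >> 3, base)
```

For any 32-bit x both offset functions agree, and `element_address(base, x)`
equals `base + offset_combined(x)`.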
Some more complex cases require checking the relationship of
operands on different nodes of the match. They also require
additional instructions to be created. Using a ComplexPattern
gives us that flexibility.
I'll be adding another pattern in a future patch.
If clang modules are not enabled it becomes unnecessary to read the session timestamp file in order
to pass `-fbuild-session-timestamp` to the `cc1` invocation.
Differential Revision: https://reviews.llvm.org/D129030
This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.
This patch:
1) Emits the label of the alias just before emitting
the value of the sub-element that the alias refers to.
2) Aligns a set of aliases that refer to the same offset.
3) Fixes a failure: we didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias. This patch fixes the bug by not
emitting linkage for an alias without a label.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D124654
When running tests, the check_clang_tidy script encodes the output
string, making it hard to read when debugging checks. This removes the
.encode() call.
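The effect is easy to reproduce in isolation. The sketch below assumes the
output is converted to a string before printing; the helper name is
hypothetical and this is not the script's actual code.

```python
def render_output(text, encode):
    """Return what gets printed for tool output, with or without .encode()."""
    payload = text.encode() if encode else text
    # str() of a bytes object yields its repr, e.g. b'...' with escaped newlines.
    return str(payload)


sample = "1 warning generated.\nvoid f();\n"
before = render_output(sample, encode=True)
after = render_output(sample, encode=False)
```

`before` is a single line wrapped in b'...' with literal `\n` sequences, while
`after` keeps the real line breaks.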
Test Plan:
Making a new default check for testing (as of right now, it includes a
failing test):
[~/llvm-project/clang-tools-extra] python3 clang-tidy/add_new_check.py
bugprone example
<...>
Pre-changes:
[~/llvm-project/build] ninja check-clang-tools
<...>
------------------------ clang-tidy output -----------------------
b"1 warning
generated.\n/data/users/nvankempen/llvm-project/build/Debug/tools/clang/tools/extra/test/clang-tidy/checkers/Output/bugprone-example.cpp.tmp.cpp:4:6:
warning: function 'f' is insufficiently awesome [bugprone-example]\nvoid
f();\n
^\n/data/users/nvankempen/llvm-project/build/Debug/tools/clang/tools/extra/test/clang-tidy/checkers/Output/bugprone-example.cpp.tmp.cpp:4:6:
note: insert 'awesome'\nvoid f();\n ^\n awesome_\n"
------------------------------------------------------------------
<...>
Post-changes:
[~/llvm-project/build] ninja check-clang-tools
<...>
------------------------ clang-tidy output -----------------------
1 warning generated.
/data/users/nvankempen/llvm-project/build/Debug/tools/clang/tools/extra/test/clang-tidy/checkers/Output/bugprone-example.cpp.tmp.cpp:4:6:
warning: function 'f' is insufficiently awesome [bugprone-example]
void f();
^
/data/users/nvankempen/llvm-project/build/Debug/tools/clang/tools/extra/test/clang-tidy/checkers/Output/bugprone-example.cpp.tmp.cpp:4:6:
note: insert 'awesome'
void f();
^
awesome_
------------------------------------------------------------------
<...>
Differential Revision: https://reviews.llvm.org/D127807