llvm/mlir/docs/PDLL.md
River Riddle 9595f3568a [mlir:PDL] Remove the ConstantParams support from native Constraints/Rewrites
This support has never really worked well, and is incredibly clunky to
use (it effectively creates two argument APIs), and clunky to generate (it isn't
clear how we should actually expose this from PDL frontends). Treating these
as just attribute arguments is much much cleaner in every aspect of the stack.
If we need to optimize lots of constant parameters, it would be better to
investigate internal representation optimizations (e.g. batch attribute creation),
that do not affect the user (we want a clean external API).

Differential Revision: https://reviews.llvm.org/D121569
2022-03-19 13:28:24 -07:00

1400 lines
49 KiB
Markdown

# PDLL - PDL Language
This document details the PDL Language (PDLL), a custom frontend language for
writing pattern rewrites targeting MLIR.
Note: This document assumes a familiarity with MLIR concepts; more specifically
the concepts detailed within the
[MLIR Pattern Rewriting](https://mlir.llvm.org/docs/PatternRewriter/) and
[Operation Definition Specification (ODS)](https://mlir.llvm.org/docs/OpDefinitions/)
documentation.
[TOC]
## Introduction
Pattern matching is an extremely important component within MLIR, as it
encompasses many different facets of the compiler. From canonicalization, to
optimization, to conversion; every MLIR based compiler will heavily rely on the
pattern matching infrastructure in some capacity.
The PDL Language (PDLL) provides a declarative pattern language designed from
the ground up for representing MLIR pattern rewrites. PDLL is designed to
natively support writing matchers on all of MLIRs constructs via an intuitive
interface that may be used for both ahead-of-time (AOT) and just-in-time (JIT)
pattern compilation.
## Rationale
This section provides details on various design decisions, their rationale, and
alternatives considered when designing PDLL. Given the nature of software
development, this section may include references to areas of the MLIR compiler
that no longer exist.
### Why build a new language instead of improving TableGen DRR?
Note: The section assumes familiarity with
[TDRR](https://mlir.llvm.org/docs/DeclarativeRewrites/), please refer the
relevant documentation before continuing.
Tablegen DRR (TDRR), i.e.
[Table-driven Declarative Rewrite Rules](https://mlir.llvm.org/docs/DeclarativeRewrites/),
is a declarative DSL for defining MLIR pattern rewrites within the
[TableGen](https://llvm.org/docs/TableGen/index.html) language. This
infrastructure is currently the main way in which patterns may be defined
declaratively within MLIR. TDRR utilizes TableGen's `dag` support to enable
defining MLIR patterns that fit nicely within a DAG structure; in a similar way
in which tablegen has been used to defined patterns for LLVM's backend
infrastructure (SelectionDAG/Global Isel/etc.). Unfortunately however, the
TableGen language is not as amenable to the structure of MLIR patterns as it has
been for LLVM.
The issues with TDRR largely stem from the use of TableGen as the host language
for the DSL. These issues have risen from a mismatch in the structure of
TableGen compared to the structure of MLIR, and from TableGen having different
motivational goals than MLIR. A majority (or all depending on how stubborn you
are) of the issues that we've come across with TDRR have been addressable in
some form; the sticking point here is that the solutions to these problems have
often been more "creative" than we'd like. This is a problem, and why we decided
not to invest a larger effort into improving TDRR; users generally don't want
"creative" APIs, they want something that is intuitive to read/write.
To highlight some of these issues, below we will take a tour through some of the
problems that have arisen, and how we "fixed" them.
#### Multi-result operations
MLIR natively supports a variable number of operation results. For the DAG based
structure of TDRR, any form of multiple results (operations in this instance)
creates a problem. This is because the DAG wants a single root node, and does
not have nice facilities for indexing or naming the multiple results. Let's take
a look at a quick example to see how this manifests:
```tablegen
// Suppose we have a three result operation, defined as seen below.
def ThreeResultOp : Op<"three_result_op"> {
let arguments = (ins ...);
let results = (outs
AnyTensor:$output1,
AnyTensor:$output2,
AnyTensor:$output3
);
}
// To bind the results of `ThreeResultOp` in a TDRR pattern, we bind all results
// to a single name and use a special naming convention: `__N`, where `N` is the
// N-th result.
def : Pattern<(ThreeResultOp:$results ...),
[(... $results__0), ..., (... $results__2), ...]>;
```
In TDRR, we "solved" the problem of accessing multiple results, but this isn't a
very intuitive interface for users. Magical naming conventions obfuscate the
code and can easily introduce bugs and other errors. There are various things
that we could try to improve this situation, but there is a fundamental limit to
what we can do given the limits of the TableGen dag structure. In PDLL, however,
we have the freedom and flexibility to provide a proper interface into
operations, regardless of their structure:
```pdll
// Import our definition of `ThreeResultOp`.
#include "ops.td"
Pattern {
...
// In PDLL, we can directly reference the results of an operation variable.
// This provides a closer mental model to what the user expects.
let threeResultOp = op<my_dialect.three_result_op>;
let userOp = op<my_dialect.user_op>(threeResultOp.output1, ..., threeResultOp.output3);
...
}
```
#### Constraints
In TDRR, the match dag defines the general structure of the input IR to match.
Any non-structural/non-type constraints on the input are generally relegated to
a list of constraints specified after the rewrite dag. For very simple patterns
this may suffice, but with larger patterns it becomes quite problematic as it
separates the constraint from the entity it constrains and negatively impacts
the readability of the pattern. As an example, let's look at a simple pattern
that adds additional constraints to its inputs:
```tablegen
// Suppose we have a two result operation, defined as seen below.
def TwoResultOp : Op<"two_result_op"> {
let arguments = (ins ...);
let results = (outs
AnyTensor:$output1,
AnyTensor:$output2
);
}
// A simple constraint to check if a value is use_empty.
def HasNoUseOf: Constraint<CPred<"$_self.use_empty()">, "has no use">;
// Check if two values have a ShapedType with the same element type.
def HasSameElementType : Constraint<
CPred<"$0.getType().cast<ShapedType>().getElementType() == "
"$1.getType().cast<ShapedType>().getElementType()">,
"values have same element type">;
def : Pattern<(TwoResultOp:$results $input),
[(...), (...)],
[(HasNoUseOf:$results__1),
(HasSameElementType $results__0, $input)]>;
```
Above, when observing the constraints we need to search through the input dag
for the inputs (also keeping in mind the magic naming convention for multiple
results). For this simple pattern it may be just a few lines above, but complex
patterns often grow to 10s of lines long. In PDLL, these constraints can be
applied directly on or next to the entities they apply to:
```pdll
// The same constraints that we defined above:
Constraint HasNoUseOf(value: Value) [{
return success(value.use_empty());
}];
Constraint HasSameElementType(value1: Value, value2: Value) [{
return success(value1.getType().cast<ShapedType>().getElementType() ==
value2.getType().cast<ShapedType>().getElementType());
}];
Pattern {
// In PDLL, we can apply the constraint as early (or as late) as we want. This
// enables better structuring of the matcher code, and improves the
// readability/maintainability of the pattern.
let op = op<my_dialect.two_result_op>(input: Value);
HasNoUseOf(op.output2);
HasSameElementType(input, op.output2);
// ...
}
```
#### Replacing Multiple Operations
Often times a pattern will transform N number of input operations into N number
of result operations. In PDLL, replacing multiple operations is as simple as
adding two [`replace` statements](#replace-statement). In TDRR, the situation is
a bit more nuanced. Given the single root structure of the TableGen dag,
replacing a non-root operation is not nicely supported. It currently isn't
natively possible, and instead requires using multiple patterns. We could
potentially add another special rewrite directive, or extend `replaceWithValue`,
but this simply highlights how even a basic IR transformation is muddled by the
complexity of the host language.
### Why not build a DSL in "X"?
Yes! Well yes and no. To understand why, we have to consider what types of users
we are trying to serve and what constraints we enforce upon them. The goal of
PDLL is to provide a default and effective pattern language for MLIR that all
users of MLIR can interact with immediately, regardless of their host
environment. This language is available with no extra dependencies and comes
"free" along with MLIR. If we were to use an existing host language to build our
new DSL, we would need to make compromises along with it depending on the
language. For some, there are questions of how to enforce matching environments
(python2 or python3?, which version?), performance considerations, integration,
etc. As an LLVM project, this could also mean enforcing a new language
dependency on the users of MLIR (many of which may not want/need such a
dependency otherwise). Another issue that comes along with any DSL that is
embeded in another language: mitigating the user impedance mismatch between what
the user expects from the host language and what our "backend" supports. For
example, the PDL IR abstraction only contains limited support for control flow.
If we were to build a DSL in python, we would need to ensure that complex
control flow is either handled completely or effectively errors out. Even with
ideal error handling, not having the expected features available creates user
frustration. In addition to the environment constraints, there is also the issue
of language tooling. With PDLL we intend to build a very robust and modern
toolset that is designed to cater the needs of pattern developers, including
code completion, signature help, and many more features that are specific to the
problem we are solving. Integrating custom language tooling into existing
languages can be difficult, and in some cases impossible (as our DSL would
merely be a small subset of the existing language).
These various points have led us to the initial conclusion that the most
effective tool we can provide for our users is a custom tool designed for the
problem at hand. With all of that being said, we understand that not all users
have the same constraints that we have placed upon ourselves. We absolutely
encourage and support the existence of various PDL frontends defined in
different languages. This is one of the original motivating factors around
building the PDL IR abstraction in the first place; to enable innovation and
flexibility for our users (and in turn their users). For some, such as those in
research and the Machine Learning space, they may already have a certain
language (such as Python) heavily integrated into their workflow. For these
users, a PDL DSL in their language may be ideal and we will remain committed to
supporting and endorsing that from an infrastructure point-of-view.
## Language Specification
Note: PDLL is still under active development, and the designs discussed below
are not necessarily final and may be subject to change.
The design of PDLL is heavily influenced and centered around the
[PDL IR abstraction](https://mlir.llvm.org/docs/Dialects/PDLOps/), which in turn
is designed as an abstract model of the core MLIR structures. This leads to a
design and structure that feels very similar to if you were directly writing the
IR you want to match.
### Includes
PDLL supports an `include` directive to import content defined within other
source files. There are two types of files that may be included: `.pdll` and
`.td` files.
#### `.pdll` includes
When including a `.pdll` file, the contents of that file are copied directly into
the current file being processed. This means that any patterns, constraints,
rewrites, etc., defined within that file are processed along with those within
the current file.
#### `.td` includes
When including a `.td` file, PDLL will automatically import any pertinent
[ODS](https://mlir.llvm.org/docs/OpDefinitions/) information within that file.
This includes any defined operations, constraints, interfaces, and more, making
them implicitly accessible within PDLL. This is important, as ODS information
allows for certain PDLL constructs, such as the
[`operation` expression](#operation), to become much more powerful.
### Patterns
In any pattern descriptor language, pattern definition is at the core. In PDLL,
patterns start with `Pattern` optionally followed by a name and a set of pattern
metadata, and finally terminated by a pattern body. A few simple examples are
shown below:
```pdll
// Here we have defined an anonymous pattern:
Pattern {
// Pattern bodies are separated into two components:
// * Match Section
// - Describes the input IR.
let root = op<toy.reshape>(op<toy.reshape>(arg: Value));
// * Rewrite Section
// - Describes how to transform the IR.
// - Last statement starts the rewrite.
replace root with op<toy.reshape>(arg);
}
// Here we have defined a pattern named `ReshapeReshapeOptPattern` with a
// benefit of 10:
Pattern ReshapeReshapeOptPattern with benefit(10) {
replace op<toy.reshape>(op<toy.reshape>(arg: Value))
with op<toy.reshape>(arg);
}
```
After the definition of the pattern metadata, we specify the pattern body. The
structure of a pattern body is comprised of two main sections, the `match`
section and the `rewrite` section. The `match` section of a pattern describes
the expected input IR, whereas the `rewrite` section describes how to transform
that IR. This distinction is an important one to make, as PDLL handles certain
variables and expressions differently within the different sections. When
relevant in each of the sections below, we shall explicitly call out any
behavioral differences.
The general layout of the `match` and `rewrite` section is as follows: the
*last* statement of the pattern body is required to be a
[`operation rewrite statement`](#operation-rewrite-statements), and denotes the
`rewrite` section; every statement before denotes the `match` section.
#### Pattern metadata
Rewrite patterns in MLIR have a set of metadata that allow for controlling
certain behaviors, and providing information to the rewrite driver applying the
pattern. In PDLL, a pattern can provide a non-default value for this metadata
after the pattern name. Below, examples are shown for the different types of
metadata supported:
##### Benefit
The benefit of a Pattern is an integer value that represents the "benefit" of
matching that pattern. It is used by pattern drivers to determine the relative
priorities of patterns during application; a pattern with a higher benefit is
generally applied before one with a lower benefit.
In PDLL, a pattern has a default benefit set to the number of input operations,
i.e. the number of distinct `Op` expressions/variables, in the match section. This
rule is driven by an observation that larger matches are more beneficial than smaller
ones, and if a smaller one is applied first the larger one may not apply anymore.
Patterns can override this behavior by specifying the benefit in the metadata section
of the pattern:
```pdll
// Here we specify that this pattern has a benefit of `10`, overriding the
// default behavior.
Pattern with benefit(10) {
...
}
```
##### Bounded Rewrite Recursion
During pattern application, there are situations in which a pattern may be
applicable to the result of a previous application of that same pattern. If the
pattern does not properly handle this recusive application, the pattern driver
could become stuck in an infinite loop of application. To prevent this, patterns
by-default are assumed to not have proper recursive bounding and will not be
recursively applied. A pattern can signal that it does have proper handling for
recursion by specifying the `recusion` flag in the pattern metadata section:
```pdll
// Here we signal that this pattern properly bounds recursive application.
Pattern with recusion {
...
}
```
#### Single Line "Lambda" Body
Patterns generally define their body using a compound block of statements, as
shown below:
```pdll
Pattern {
replace op<my_dialect.foo>(operands: ValueRange) with operands;
}
```
Patterns also support a lambda-like syntax for specifying simple single line
bodies. The lambda body of a Pattern expects a single
[operation rewrite statement](#operation-rewrite-statements):
```pdll
Pattern => replace op<my_dialect.foo>(operands: ValueRange) with operands;
```
### Variables
Variables in PDLL represent specific instances of IR entities, such as `Value`s,
`Operation`s, `Type`s, etc. Consider the simple pattern below:
```pdll
Pattern {
let value: Value;
let root = op<mydialect.foo>(value);
replace root with value;
}
```
In this pattern we define two variables, `value` and `root`, using the `let`
statement. The `let` statement allows for defining variables and constraining
them. Every variable in PDLL is of a certain type, which defines the type of IR
entity the variable represents. The type of a variable may be determined via
either a constraint, or an initializer expression.
#### Variable "Binding"
In addition to having a type, variables must also be "bound", either via an initializer
expression or to a non-native constraint or rewrite use within the `match` section of the
pattern. "Binding" a variable contextually identifies that variable within either the
input (i.e. `match` section) or output (i.e. `rewrite` section) IR. In the `match` section,
this allows for building the match tree from the pattern's root operation, which must be
"bound" to the [operation rewrite statement](#operation-rewrite-statements) that denotes the
`rewrite` section of the pattern. All non-root variables within the `match`
section must be bound in some way to the "root" operation. To help illustrate
the concept, let's take a look at a quick example. Consider the `.mlir` snippet
below:
```mlir
func @baz(%arg: i32) {
%result = my_dialect.foo %arg, %arg -> i32
}
```
Say that we want to write a pattern that matches `my_dialect.foo` and replaces
it with its unique input argument. A naive way to write this pattern in PDLL is
shown below:
```pdll
Pattern {
// ** match section ** //
let arg: Value;
let root = op<my_dialect.foo>(arg, arg);
// ** rewrite section ** //
replace root with arg;
}
```
In the above pattern, the `arg` variable is "bound" to the first and second operands
of the `root` operation. Every use of `arg` is constrained to be the same `Value`, i.e.
the first and second operands of `root` will be constrained to refer to the same input
Value. The same is true for the `root` operation, it is bound to the "root" operation of the
pattern as it is used in input of the top-level [`replace` statement](#replace-statement)
of the `rewrite` section of the pattern. Writing this pattern using the C++ API, the concept
of "binding" becomes more clear:
```c++
struct Pattern : public OpRewritePattern<my_dialect::FooOp> {
LogicalResult matchAndRewrite(my_dialect::FooOp root, PatternRewriter &rewriter) {
Value arg = root->getOperand(0);
if (arg != root->getOperand(1))
return failure();
rewriter.replaceOp(root, arg);
return success();
}
};
```
If a variable is not "bound" properly, PDLL won't be able to identify what value
it would correspond to in the IR. As a final example, let's consider a variable
that hasn't been bound:
```pdll
Pattern {
// ** match section ** //
let arg: Value;
let root = op<my_dialect.foo>
// ** rewrite section ** //
replace root with arg;
}
```
If we were to write this exact pattern in C++, we would end up with:
```c++
struct Pattern : public OpRewritePattern<my_dialect::FooOp> {
LogicalResult matchAndRewrite(my_dialect::FooOp root, PatternRewriter &rewriter) {
// `arg` was never bound, so we don't know what input Value it was meant to
// correspond to.
Value arg;
rewriter.replaceOp(root, arg);
return success();
}
};
```
#### Variable Constraints
```pdll
// This statement defines a variable `value` that is constrained to be a `Value`.
let value: Value;
// This statement defines a variable `value` that is constrained to be a `Value`
// *and* constrained to have a single use.
let value: [Value, HasOneUse];
```
Any number of single entity constraints may be attached directly to a variable
upon declaration. Within the `matcher` section, these constraints may add
additional checks on the input IR. Within the `rewriter` section, constraints
are *only* used to define the type of the variable. There are a number of
builtin constraints that correlate to the core MLIR constructs: `Attr`, `Op`,
`Type`, `TypeRange`, `Value`, `ValueRange`. Along with these, users may define
custom constraints that are implemented within PDLL, or natively (i.e. outside
of PDLL). See the general [Constraints](#constraints) section for more detailed
information.
#### Inline Variable Definition
Along with the `let` statement, variables may also be defined inline by
specifying the constraint list along with the desired variable name in the first
place that the variable would be used. After definition, the variable is visible
from all points forward. See below for an example:
```pdll
// `value` is used as an operand to the operation `root`:
let value: Value;
let root = op<my_dialect.foo>(value);
replace root with value;
// `value` could also be defined "inline":
let root = op<my_dialect.foo>(value: Value);
replace root with value;
```
Note that the point of definition of an inline variable is the point of reference,
meaning that an inline variable can be used immediately in the same parent
expression within which it was defined:
```pdll
let root = op<my_dialect.foo>(value: Value, _: Value, value);
replace root with value;
```
##### Wildcard Variable Definition
Often times when defining a variable inline, the variable isn't intended to be
used anywhere else in the pattern. For example, this may happen if you want to
attach constraints to a variable but have no other use for it. In these
situations, the "wildcard" variable can be used to remove the need to provide a
name, as "wildcard" variables are not visible outside of the point of
definition. An example is shown below:
```pdll
Pattern {
let root = op<my_dialect.foo>(arg: Value, _: Value, _: [Value, I64Value], arg);
replace root with arg;
}
```
In the above example, the second operand isn't needed for the pattern but we
need to provide it to signal that a second operand does exist (we just don't
care what it is in this pattern).
### Operation Expression
An operation expression in PDLL represents an MLIR operation. In the `match`
section of the pattern, this expression models one of the input operations to
the pattern. In the `rewrite` section of the pattern, this expression models one
of the operations to create. The general structure of the operation expression
is very similar to that of the "generic form" of textual MLIR assembly:
```pdll
let root = op<my_dialect.foo>(operands: ValueRange) {attr = attr: Attr} -> (resultTypes: TypeRange);
```
Let's walk through each of the different components of the expression:
#### Operation name
The operation name signifies which type of MLIR Op this operation corresponds
to. In the `match` section of the pattern, the name may be elided. This would
cause this pattern to match *any* operation type that satifies the rest of the
constraints of the operation. In the `rewrite` section, the name is required.
```pdll
// `root` corresponds to an instance of a `my_dialect.foo` operation.
let root = op<my_dialect.foo>;
// `root` could be an instance of any operation type.
let root = op<>;
```
#### Operands
The operands section corresponds to the operands of the operation. This section
of an operation expression may be elided, in which case the operands are not
constrained in any way. When present, the operands of an operation expression
are interpreted in the following ways:
1) A single instance of type `ValueRange`:
In this case, the single range is treated as all of the operands of the
operation:
```pdll
// Define an instance with single range of operands.
let root = op<my_dialect.foo>(allOperands: ValueRange);
```
2) A variadic number of either `Value` or `ValueRange`:
In this case, the inputs are expected to correspond with the operand groups as
defined on the operation in ODS.
Given the following operation definition in ODS:
```tablegen
def MyIndirectCallOp {
let arguments = (ins FunctionType:$call, Variadic<AnyType>:$args);
}
```
We can match the operands as so:
```pdll
let root = op<my_dialect.indirect_call>(call: Value, args: ValueRange);
```
#### Results
The results section corresponds to the result types of the operation. This
section of an operation expression may be elided, in which case the result types
are not constrained in any way. When present, the result types of an operation
expression are interpreted in the following ways:
1) A single instance of type `TypeRange`:
In this case, the single range is treated as all of the result types of the
operation:
```pdll
// Define an instance with single range of types.
let root = op<my_dialect.foo> -> (allResultTypes: TypeRange);
```
2) A variadic number of either `Type` or `TypeRange`:
In this case, the inputs are expected to correspond with the result groups as
defined on the operation in ODS.
Given the following operation definition in ODS:
```tablegen
def MyOp {
let results = (outs SomeType:$result, Variadic<SomeType>:$otherResults);
}
```
We can match the result types as so:
```pdll
let root = op<my_dialect.op> -> (result: Type, otherResults: TypeRange);
```
#### Attributes
The attributes section of the operation expression corresponds to the attribute
dictionary of the operation. This section of an operation expression may be
elided, in which case the attributes are not constrained in any way. The
composition of this component maps exactly to how attribute dictionaries are
structured in the MLIR textual assembly format:
```pdll
let root = op<my_dialect.foo> {attr1 = attrValue: Attr, attr2 = attrValue2: Attr};
```
Within the `{}` attribute entries are specified by an identifier or string name,
corresponding to the attribute name, followed by an assignment to the attribute
value. If the attribute value is elided, the value of the attribute is
implicitly defined as a
[`UnitAttr`](https://mlir.llvm.org/docs/Dialects/Builtin/#unitattr).
```pdll
let unitConstant = op<my_dialect.constant> {value};
```
##### Accessing Operation Results
In multi-operation patterns, the result of one operation often feeds as an input
into another. The result groups of an operation may be accessed by name or by
index via the `.` operator:
Note: Remember to import the definition of your operation via
[include](#`.td`_includes) to ensure it is visible to PDLL.
Given the following operation definition in ODS:
```tablegen
def MyResultOp {
let results = (outs SomeType:$result);
}
def MyInputOp {
let arguments = (ins SomeType:$input, SomeType:$input);
}
```
We can write a pattern where `MyResultOp` feeds into `MyInputOp` as so:
```pdll
// In this example, we use both `result`(the name) and `0`(the index) to refer to
// the first result group of `resultOp`.
// Note: If we elide the result types section within the match section, it means
// they aren't constrained, not that the operation has no results.
let resultOp = op<my_dialect.result_op>;
let inputOp = op<my_dialect.input_op>(resultOp.result, resultOp.0);
```
Along with result name access, variables of `Op` type may implicitly convert to
`Value` or `ValueRange`. These variables are converted to `Value` when they are
known (via ODS) to only have one result, in all other cases they convert to
`ValueRange`:
```pdll
// `resultOp` may also convert implicitly to a Value for use in `inputOp`:
let resultOp = op<my_dialect.result_op>;
let inputOp = op<my_dialect.input_op>(resultOp);
// We could also inline `resultOp` directly:
let inputOp = op<my_dialect.input_op>(op<my_dialect.result_op>);
```
### Attribute Expression
An attribute expression represents a literal MLIR attribute. It allows for
statically specifying an MLIR attribute to use, by specifying the textual form
of that attribute.
```pdll
let trueConstant = op<arith.constant> {value = attr<"true">};
let applyResult = op<affine.apply>(args: ValueRange) {map = attr<"affine_map<(d0, d1) -> (d1 - 3)>">}
```
### Type Expression
A type expression represents a literal MLIR type. It allows for statically
specifying an MLIR type to use, by specifying the textual form of that type.
```pdll
let i32Constant = op<arith.constant> -> (type<"i32">);
```
### Tuples
PDLL provides native support for tuples, which are used to group multiple
elements into a single compound value. The values in a tuple can be of any type,
and do not need to be of the same type. There is also no limit to the number of
elements held by a tuple. The elements of a tuple can be accessed by index:
```pdll
let tupleValue = (op<my_dialect.foo>, attr<"10 : i32">, type<"i32">);
let opValue = tupleValue.0;
let attrValue = tupleValue.1;
let typeValue = tupleValue.2;
```
You can also name the elements of a tuple and use those names to refer to the
values of the individual elements. An element name consists of an identifier
followed immediately by an equal (=).
```pdll
let tupleValue = (
opValue = op<my_dialect.foo>,
attr<"10 : i32">,
typeValue = type<"i32">
);
let opValue = tupleValue.opValue;
let attrValue = tupleValue.1;
let typeValue = tupleValue.typeValue;
```
Tuples are used to represent multiple results from a
[constraint](#constraints-with-multiple-results) or
[rewrite](#rewrites-with-multiple-results).
### Constraints
Constraints provide the ability to inject additional checks on the input IR
within the `match` section of a pattern. Constraints can be applied anywhere
within the `match` section, and depending on the type can either be applied via
the constraint list of a [variable](#variables) or via the call operator (e.g.
`MyConstraint(...)`). There are three main categories of constraints:
#### Core Constraints
PDLL defines a number of core constraints that constrain the type of the IR
entity. These constraints can only be applied via the
[constraint list](#variable-constraints) of a variable.
* `Attr` (`<` type `>`)?
A single entity constraint that corresponds to an `mlir::Attribute`. This
constraint optionally takes a type component that constrains the result type of
the attribute.
```pdll
// Define a simple variable using the `Attr` constraint.
let attr: Attr;
let constant = op<arith.constant> {value = attr};
// Define a simple variable using the `Attr` constraint, that has its type
// constrained as well.
let attrType: Type;
let attr: Attr<attrType>;
let constant = op<arith.constant> {value = attr};
```
* `Op` (`<` op-name `>`)?
A single entity constraint that corresponds to an `mlir::Operation *`.
```pdll
// Match only when the input is from another operation.
let inputOp: Op;
let root = op<my_dialect.foo>(inputOp);
// Match only when the input is from another `my_dialect.foo` operation.
let inputOp: Op<my_dialect.foo>;
let root = op<my_dialect.foo>(inputOp);
```
* `Type`
A single entity constraint that corresponds to an `mlir::Type`.
```pdll
// Define a simple variable using the `Type` constraint.
let resultType: Type;
let root = op<my_dialect.foo> -> (resultType);
```
* `TypeRange`
A single entity constraint that corresponds to a `mlir::TypeRange`.
```pdll
// Define a simple variable using the `TypeRange` constraint.
let resultTypes: TypeRange;
let root = op<my_dialect.foo> -> (resultTypes);
```
* `Value` (`<` type-expr `>`)?
A single entity constraint that corresponds to an `mlir::Value`. This constraint
optionally takes a type component that constrains the result type of the value.
```pdll
// Define a simple variable using the `Value` constraint.
let value: Value;
let root = op<my_dialect.foo>(value);
// Define a variable using the `Value` constraint, that has its type constrained
// to be same as the result type of the `root` op.
let valueType: Type;
let input: Value<valueType>;
let root = op<my_dialect.foo>(input) -> (valueType);
```
* `ValueRange` (`<` type-expr `>`)?
A single entity constraint that corresponds to a `mlir::ValueRange`. This
constraint optionally takes a type component that constrains the result types of
the value range.
```pdll
// Define a simple variable using the `ValueRange` constraint.
let inputs: ValueRange;
let root = op<my_dialect.foo>(inputs);
// Define a variable using the `ValueRange` constraint, that has its types
// constrained to be same as the result types of the `root` op.
let valueTypes: TypeRange;
let inputs: ValueRange<valueTypes>;
let root = op<my_dialect.foo>(inputs) -> (valueTypes);
```
#### Defining Constraints in PDLL
Aside from the core constraints, additional constraints can also be defined
within PDLL. This allows for building matcher fragments that can be composed
across many different patterns. A constraint in PDLL is defined similarly to a
function in traditional programming languages; it contains a name, a set of
input arguments, a set of result types, and a body. Results of a constraint are
returned via a `return` statement. A few examples are shown below:
```pdll
/// A constraint that takes an input and constrains the use to an operation of
/// a given type.
Constraint UsedByFooOp(value: Value) {
op<my_dialect.foo>(value);
}
/// A constraint that returns a result of an existing operation.
Constraint ExtractResult(op: Op<my_dialect.foo>) -> Value {
return op.result;
}
Pattern {
let value = ExtractResult(op<my_dialect.foo>);
UsedByFooOp(value);
}
```
##### Constraints with multiple results
Constraints can return multiple results by returning a tuple of values. When
returning multiple results, each result can also be assigned a name to use when
indexing that tuple element. Tuple elements can be referenced by their index
number, or by name if they were assigned one.
```pdll
// A constraint that returns multiple results, with some of the results assigned
// a more readable name.
Constraint ExtractMultipleResults(op: Op<my_dialect.foo>) -> (Value, result1: Value) {
return (op.result1, op.result2);
}
Pattern {
// Return a tuple of values.
let result = ExtractMultipleResults(op: op<my_dialect.foo>);
// Index the tuple elements by index, or by name.
replace op<my_dialect.foo> with (result.0, result.1, result.result1);
}
```
##### Constraint result type inference
In addition to explicitly specifying the results of the constraint via the
constraint signature, PDLL defined constraints also support inferring the result
type from the return statement. Result type inference is active whenever the
constraint is defined with no result constraints:
```pdll
// This constraint returns a derived operation.
Constraint ReturnSelf(op: Op<my_dialect.foo>) {
return op;
}
// This constraint returns a tuple of two Values.
Constraint ExtractMultipleResults(op: Op<my_dialect.foo>) {
return (result1 = op.result1, result2 = op.result2);
}
Pattern {
let values = ExtractMultipleResults(op<my_dialect.foo>);
replace op<my_dialect.foo> with (values.result1, values.result2);
}
```
##### Single Line "Lambda" Body
Constraints generally define their body using a compound block of statements, as
shown below:
```pdll
Constraint ReturnSelf(op: Op<my_dialect.foo>) {
return op;
}
Constraint ExtractMultipleResults(op: Op<my_dialect.foo>) {
return (result1 = op.result1, result2 = op.result2);
}
```
Constraints also support a lambda-like syntax for specifying simple single line
bodies. The lambda body of a Constraint expects a single expression, which is
implicitly returned:
```pdll
Constraint ReturnSelf(op: Op<my_dialect.foo>) => op;
Constraint ExtractMultipleResults(op: Op<my_dialect.foo>)
=> (result1 = op.result1, result2 = op.result2);
```
#### Native Constraints
Constraints may also be defined outside of PDLL, and registered natively within
the C++ API.
##### Importing existing Native Constraints
Constraints defined externally can be imported into PDLL by specifying a
constraint "declaration". This is similar to the PDLL form of defining a
constraint but omits the body. Importing the declaration in this form allows for
PDLL to statically know the expected input and output types.
```pdll
// Import a single entity value native constraint that checks if the value has a
// single use. This constraint must be registered by the consumer of the
// compiled PDL.
Constraint HasOneUse(value: Value);
// Import a multi-entity type constraint that checks if two values have the same
// element type.
Constraint HasSameElementType(value1: Value, value2: Value);
Pattern {
// A single entity constraint can be applied via the variable argument list.
let value: HasOneUse;
// Otherwise, constraints can be applied via the call operator:
let value: Value = ...;
let value2: Value = ...;
HasOneUse(value);
HasSameElementType(value, value2);
}
```
External constraints are those registered explicitly with the `RewritePatternSet` via
the C++ PDL API. For example, the constraints above may be registered as:
```c++
// TODO: Cleanup when we allow more accessible wrappers around PDL functions.
static LogicalResult hasOneUseImpl(PDLValue pdlValue, PatternRewriter &rewriter) {
Value value = pdlValue.cast<Value>();
return success(value.hasOneUse());
}
static LogicalResult hasSameElementTypeImpl(ArrayRef<PDLValue> pdlValues,
PatternRewriter &rewriter) {
Value value1 = pdlValues[0].cast<Value>();
Value value2 = pdlValues[1].cast<Value>();
return success(value1.getType().cast<ShapedType>().getElementType() ==
value2.getType().cast<ShapedType>().getElementType());
}
void registerNativeConstraints(RewritePatternSet &patterns) {
patternList.getPDLPatterns().registerConstraintFunction(
"HasOneUse", hasOneUseImpl);
patternList.getPDLPatterns().registerConstraintFunction(
"HasSameElementType", hasSameElementTypeImpl);
}
```
##### Defining Native Constraints in PDLL
In addition to importing native constraints, PDLL also supports defining native
constraints directly when compiling ahead-of-time (AOT) for C++. These
constraints can be defined by specifying a string code block after the
constraint declaration:
```pdll
Constraint HasOneUse(value: Value) [{
return success(value.hasOneUse());
}];
Constraint HasSameElementType(value1: Value, value2: Value) [{
return success(value1.getType().cast<ShapedType>().getElementType() ==
value2.getType().cast<ShapedType>().getElementType());
}];
Pattern {
// A single entity constraint can be applied via the variable argument list.
let value: HasOneUse;
// Otherwise, constraints can be applied via the call operator:
let value: Value = ...;
let value2: Value = ...;
HasOneUse(value);
HasSameElementType(value, value2);
}
```
The arguments of the constraint are accessible within the code block via the
same name. The type of these native variables are mapped directly to the
corresponding MLIR type of the [core constraint](#core-constraints) used. For
example, an `Op` corresponds to a variable of type `Operation *`.
The results of the constraint can be populated using the provided `results`
variable. This variable is a `PDLResultList`, and expects results to be
populated in the order that they are defined within the result list of the
constraint declaration.
In addition to the above, the code block may also access the current
`PatternRewriter` using `rewriter`.
#### Defining Constraints Inline
In addition to global scope, PDLL Constraints and Native Constraints defined in
PDLL may be specified *inline* at any level of nesting. This means that they may
be defined in Patterns, other Constraints, Rewrites, etc:
```pdll
Constraint GlobalConstraint() {
Constraint LocalConstraint(value: Value) {
...
};
Constraint LocalNativeConstraint(value: Value) [{
...
}];
let someValue: [LocalConstraint, LocalNativeConstraint] = ...;
}
```
Constraints that are defined inline may also elide the name when used directly:
```pdll
Constraint GlobalConstraint(inputValue: Value) {
Constraint(value: Value) { ... }(inputValue);
Constraint(value: Value) [{ ... }](inputValue);
}
```
When defined inline, PDLL constraints may reference any previously defined
variable:
```pdll
Constraint GlobalConstraint(op: Op<my_dialect.foo>) {
Constraint LocalConstraint() {
let results = op.results;
};
}
```
### Rewriters
Rewriters define the set of transformations to be performed within the `rewrite`
section of a pattern, and, more specifically, how to transform the input IR
after a successful pattern match. All PDLL rewrites must be defined within the
`rewrite` section of the pattern. The `rewrite` section is denoted by the last
statement within the body of the `Pattern`, which is required to be an
[operation rewrite statement](#operation-rewrite-statements). There are two main
categories of rewrites in PDLL: operation rewrite statements, and user defined
rewrites.
#### Operation Rewrite statements
Operation rewrite statements are builtin PDLL statements that perform an IR
transformation given a root operation. These statements are the only ones able
to start the `rewrite` section of a pattern, as they allow for properly
["binding"](#variable-binding) the root operation of the pattern.
##### `erase` statement
```pdll
// A pattern that erases all `my_dialect.foo` operations.
Pattern => erase op<my_dialect.foo>;
```
The `erase` statement erases a given operation.
##### `replace` statement
```pdll
// A pattern that replaces the root operation with its input value.
Pattern {
let root = op<my_dialect.foo>(input: Value);
replace root with input;
}
// A pattern that replaces the root operation with multiple input values.
Pattern {
let root = op<my_dialect.foo>(input: Value, _: Value, input2: Value);
replace root with (input, input2);
}
// A pattern that replaces the root operation with another operation.
// Note that when an operation is used as the replacement, we can infer its
// result types from the input operation. In these cases, the result
// types of replacement operation may be elided.
Pattern {
// Note: In this pattern we also inlined the `root` expression.
replace op<my_dialect.foo> with op<my_dialect.bar>;
}
```
The `replace` statement allows for replacing a given root operation with either
another operation, or a set of input `Value` and `ValueRange` values. When an operation
is used as the replacement, we allow infering the result types from the input operation.
In these cases, the result types of replacement operation may be elided. Note that no
other components aside from the result types will be inferred from the input operation
during the replacement.
##### `rewrite` statement
```pdll
// A simple pattern that replaces the root operation with its input value.
Pattern {
let root = op<my_dialect.foo>(input: Value);
rewrite root with {
...
replace root with input;
};
}
```
The `rewrite` statement allows for rewriting a given root operation with a block
of nested rewriters. The root operation is not implicitly erased or replaced,
and any transformations to it must be expressed within the nested rewrite block.
The inner body may contain any number of other rewrite statements, variables, or
expressions.
#### Defining Rewriters in PDLL
Additional rewrites can also be defined within PDLL, which allows for building
rewrite fragments that can be composed across many different patterns. A
rewriter in PDLL is defined similarly to a function in traditional programming
languages; it contains a name, a set of input arguments, a set of result types,
and a body. Results of a rewrite are returned via a `return` statement. A few
examples are shown below:
```pdll
// A rewrite that constructs and returns a new operation, given an input value.
Rewrite BuildFooOp(value: Value) -> Op {
return op<my_dialect.foo>(value);
}
Pattern {
// We invoke the rewrite in the same way as functions in traditional
// languages.
replace op<my_dialect.old_op>(input: Value) with BuildFooOp(input);
}
```
##### Rewrites with multiple results
Rewrites can return multiple results by returning a tuple of values. When
returning multiple results, each result can also be assigned a name to use when
indexing that tuple element. Tuple elements can be referenced by their index
number, or by name if they were assigned one.
```pdll
// A rewrite that returns multiple results, with some of the results assigned
// a more readable name.
Rewrite CreateRewriteOps() -> (Op, result1: ValueRange) {
return (op<my_dialect.bar>, op<my_dialect.foo>);
}
Pattern {
rewrite root: Op<my_dialect.foo> with {
// Invoke the rewrite, which returns a tuple of values.
let result = CreateRewriteOps();
// Index the tuple elements by index, or by name.
replace root with (result.0, result.1, result.result1);
}
}
```
##### Rewrite result type inference
In addition to explicitly specifying the results of the rewrite via the rewrite
signature, PDLL defined rewrites also support inferring the result type from the
return statement. Result type inference is active whenever the rewrite is
defined with no result constraints:
```pdll
// This rewrite returns a derived operation.
Rewrite ReturnSelf(op: Op<my_dialect.foo>) => op;
// This rewrite returns a tuple of two Values.
Rewrite ExtractMultipleResults(op: Op<my_dialect.foo>) {
return (result1 = op.result1, result2 = op.result2);
}
Pattern {
rewrite root: Op<my_dialect.foo> with {
let values = ExtractMultipleResults(op<my_dialect.foo>);
replace root with (values.result1, values.result2);
}
}
```
##### Single Line "Lambda" Body
Rewrites generally define their body using a compound block of statements, as
shown below:
```pdll
Rewrite ReturnSelf(op: Op<my_dialect.foo>) {
return op;
}
Rewrite EraseOp(op: Op) {
erase op;
}
```
Rewrites also support a lambda-like syntax for specifying simple single line
bodies. The lambda body of a Rewrite expects a single expression, which is
implicitly returned, or a single
[operation rewrite statement](#operation-rewrite-statements):
```pdll
Rewrite ReturnSelf(op: Op<my_dialect.foo>) => op;
Rewrite EraseOp(op: Op) => erase op;
```
#### Native Rewriters
Rewriters may also be defined outside of PDLL, and registered natively within
the C++ API.
##### Importing existing Native Rewrites
Rewrites defined externally can be imported into PDLL by specifying a
rewrite "declaration". This is similar to the PDLL form of defining a
rewrite but omits the body. Importing the declaration in this form allows for
PDLL to statically know the expected input and output types.
```pdll
// Import a single input native rewrite that returns a new operation. This
// rewrite must be registered by the consumer of the compiled PDL.
Rewrite BuildOp(value: Value) -> Op;
Pattern {
replace op<my_dialect.old_op>(input: Value) with BuildOp(input);
}
```
External rewrites are those registered explicitly with the `RewritePatternSet` via
the C++ PDL API. For example, the rewrite above may be registered as:
```c++
// TODO: Cleanup when we allow more accessible wrappers around PDL functions.
static void buildOpImpl(ArrayRef<PDLValue> args, PatternRewriter &rewriter,
PDLResultList &results) {
Value value = args[0].cast<Value>();
// insert special rewrite logic here.
Operation *resultOp = ...;
results.push_back(resultOp);
}
void registerNativeRewrite(RewritePatternSet &patterns) {
patterns.getPDLPatterns().registerRewriteFunction("BuildOp", buildOpImpl);
}
```
##### Defining Native Rewrites in PDLL
In addition to importing native rewrites, PDLL also supports defining native
rewrites directly when compiling ahead-of-time (AOT) for C++. These rewrites can
be defined by specifying a string code block after the rewrite declaration:
```pdll
Rewrite BuildOp(value: Value) -> (foo: Op<my_dialect.foo>, bar: Op<my_dialect.bar>) [{
// We push back the results into the `results` variable in the order defined
// by the result list of the rewrite declaration.
results.push_back(rewriter.create<my_dialect::FooOp>(value));
results.push_back(rewriter.create<my_dialect::BarOp>());
}];
Pattern {
let root = op<my_dialect.foo>(input: Value);
rewrite root with {
// Invoke the native rewrite and use the results when replacing the root.
let results = BuildOp(input);
replace root with (results.foo, results.bar);
}
}
```
The arguments of the rewrite are accessible within the code block via the
same name. The type of these native variables are mapped directly to the
corresponding MLIR type of the [core constraint](#core-constraints) used. For
example, an `Op` corresponds to a variable of type `Operation *`.
The results of the rewrite can be populated using the provided `results`
variable. This variable is a `PDLResultList`, and expects results to be
populated in the order that they are defined within the result list of the
rewrite declaration.
In addition to the above, the code block may also access the current
`PatternRewriter` using `rewriter`.
#### Defining Rewrites Inline
In addition to global scope, PDLL Rewrites and Native Rewrites defined in PDLL
may be specified *inline* at any level of nesting. This means that they may be
defined in Patterns, other Rewrites, etc:
```pdll
Rewrite GlobalRewrite(inputValue: Value) {
Rewrite localRewrite(value: Value) {
...
};
Rewrite localNativeRewrite(value: Value) [{
...
}];
localRewrite(inputValue);
localNativeRewrite(inputValue);
}
```
Rewrites that are defined inline may also elide the name when used directly:
```pdll
Rewrite GlobalRewrite(inputValue: Value) {
Rewrite(value: Value) { ... }(inputValue);
Rewrite(value: Value) [{ ... }](inputValue);
}
```
When defined inline, PDLL rewrites may reference any previously defined
variable:
```pdll
Rewrite GlobalRewrite(op: Op<my_dialect.foo>) {
Rewrite localRewrite() {
let results = op.results;
};
}
```