Minor spelling tweaks

Closes tensorflow/mlir#145

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/145 from kiszk:spelling_tweaks_g3doc ae9140aab5b797441e880d43e557903585815e40
PiperOrigin-RevId: 271173907
This commit is contained in:
Kazuaki Ishizaki 2019-09-25 11:57:13 -07:00 committed by A. Unique TensorFlower
parent 3848baec69
commit a2bce652af
20 changed files with 68 additions and 66 deletions


@@ -38,7 +38,7 @@ Some important things to think about w.r.t. canonicalization patterns:
 ## Globally Applied Rules
-These transformation are applied to all levels of IR:
+These transformations are applied to all levels of IR:
 * Elimination of operations that have no side effects and have no uses.


@@ -25,14 +25,14 @@ benefits, including but not limited to:
 * **Being declarative**: The pattern creator just needs to state the rewrite
 pattern declaratively, without worrying about the concrete C++ methods to
 call.
-* **Removing boilerplate and showing the very essense the the rewrite**:
+* **Removing boilerplate and showing the very essence of the rewrite**:
 `mlir::RewritePattern` is already good at hiding boilerplate for defining a
 rewrite rule. But we still need to write the class and function structures
 required by the C++ programming language, inspect ops for matching, and call
 op `build()` methods for constructing. These statements are typically quite
 simple and similar, so they can be further condensed with auto-generation.
 Because we reduce the boilerplate to the bare minimum, the declarative
-rewrite rule will just contain the very essense of the rewrite. This makes
+rewrite rule will just contain the very essence of the rewrite. This makes
 it very easy to understand the pattern.
 ## Strengths and Limitations
## Strengths and Limitations
@@ -239,7 +239,7 @@ to replace the matched `AOp`.
 #### Binding op results
-In the result pattern, we can bind to the result(s) of an newly built op by
+In the result pattern, we can bind to the result(s) of a newly built op by
 attaching symbols to the op. (But we **cannot** bind to op arguments given that
 they are referencing previously bound symbols.) This is useful for reusing
 newly created results where suitable. For example,
@@ -270,7 +270,7 @@ directly fed in as arguments to build the new op. For such cases, we can apply
 transformations on the arguments by calling into C++ helper functions. This is
 achieved by `NativeCodeCall`.
-For example, if we want to catpure some op's attributes and group them as an
+For example, if we want to capture some op's attributes and group them as an
 array attribute to construct a new op:
 ```tblgen
@@ -361,7 +361,7 @@ $in2)`, then this will be translated into C++ call `someFn($in1, $in2, $in0)`.
 ##### Customizing entire op building
 `NativeCodeCall` is not only limited to transforming arguments for building an
-op; it can also used to specify how to build an op entirely. An example:
+op; it can be also used to specify how to build an op entirely. An example:
 If we have a C++ function for building an op:
@@ -379,10 +379,10 @@ def : Pat<(... $input, $attr), (createMyOp $input, $attr)>;
 ### Supporting auxiliary ops
-A declarative rewrite rule supports multiple result patterns. One of the purpose
-is to allow generating _auxiliary ops_. Auxiliary ops are operations used for
-building the replacement ops; but they are not directly used for replacement
-themselves.
+A declarative rewrite rule supports multiple result patterns. One of the
+purposes is to allow generating _auxiliary ops_. Auxiliary ops are operations
+used for building the replacement ops; but they are not directly used for
+replacement themselves.
 For the case of uni-result ops, if there are multiple result patterns, only the
 value generated from the last result pattern will be used to replace the matched
@@ -556,7 +556,7 @@ correspond to multiple actual values.
 Constraints can be placed on op arguments when matching. But sometimes we need
 to also place constraints on the matched op's results or sometimes need to limit
-the matching with some constraints that cover both the arugments and the
+the matching with some constraints that cover both the arguments and the
 results. The third parameter to `Pattern` (and `Pat`) is for this purpose.
 For example, we can write
@@ -587,7 +587,7 @@ You can
 ### Adjusting benefits
-The benefit of a `Pattern` is an integer value indicating the benfit of matching
+The benefit of a `Pattern` is an integer value indicating the benefit of matching
 the pattern. It determines the priorities of patterns inside the pattern rewrite
 driver. A pattern with a higher benefit is applied before one with a lower
 benefit.
@@ -599,7 +599,7 @@ pattern. This is based on the heuristics and assumptions that:
 * If a smaller one is applied first the larger one may not apply anymore.
-The forth parameter to `Pattern` (and `Pat`) allows to manually tweak a
+The fourth parameter to `Pattern` (and `Pat`) allows to manually tweak a
 pattern's benefit. Just supply `(addBenefit N)` to add `N` to the benefit value.
 ## Special directives
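The third and fourth `Pat` parameters touched in the hunks above can be sketched together as follows. This is a hypothetical pattern, not taken from the patched documents: `AOp`, `BOp`, and `IsF32` are illustrative names.

```tblgen
// Hypothetical sketch: rewrite AOp to BOp only when the bound operand
// satisfies an extra constraint, and raise the pattern's benefit by 10.
def : Pat<(AOp $input),          // source pattern to match
          (BOp $input),          // result pattern to build
          [(IsF32 $input)],      // third parameter: extra constraints
          (addBenefit 10)>;      // fourth parameter: benefit tweak
```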


@@ -55,7 +55,7 @@ enum Kinds {
 ### Defining the type class
 As described above, `Type` objects in MLIR are value-typed and rely on having an
-implicity internal storage object that holds the actual data for the type. When
+implicitly internal storage object that holds the actual data for the type. When
 defining a new `Type` it isn't always necessary to define a new storage class.
 So before defining the derived `Type`, it's important to know which of the two
 classes of `Type` we are defining. Some types are `primitives` meaning they do
@@ -256,7 +256,7 @@ Once the dialect types have been defined, they must then be registered with a
 ```c++
 struct MyDialect : public Dialect {
 MyDialect(MLIRContext *context) : Dialect(/*name=*/"mydialect", context) {
-/// Add these types to the dialcet.
+/// Add these types to the dialect.
 addTypes<SimpleType, ComplexType>();
 }
 };


@@ -12,7 +12,7 @@ LLVM style guide:
 * Adopts [camelBack](https://llvm.org/docs/Proposals/VariableNames.html);
 * Except for IR units (Region, Block, and Operation), non-nullable output
-argument are passed by non-const reference in general.
+arguments are passed by non-const reference in general.
 * IR constructs are not designed for [const correctness](UsageOfConst.md).
 * Do *not* use recursive algorithms if the recursion can't be bounded
 statically: that is avoid recursion if there is a possible IR input that can


@@ -105,7 +105,7 @@ parenthesization, (2) negation, (3) modulo, multiplication, floordiv, and
 ceildiv, and (4) addition and subtraction. All of these operators associate from
 left to right.
-A _multi-dimensional affine expression_ is a comma separated list of
+A _multidimensional affine expression_ is a comma separated list of
 one-dimensional affine expressions, with the entire list enclosed in
 parentheses.
@@ -119,7 +119,7 @@ affine function. MLIR further extends the definition of an affine function to
 allow 'floordiv', 'ceildiv', and 'mod' with respect to positive integer
 constants. Such extensions to affine functions have often been referred to as
 quasi-affine functions by the polyhedral compiler community. MLIR uses the term
-'affine map' to refer to these multi-dimensional quasi-affine functions. As
+'affine map' to refer to these multidimensional quasi-affine functions. As
 examples, $$(i+j+1, j)$$, $$(i \mod 2, j+i)$$, $$(j, i/4, i \mod 4)$$, $$(2i+1,
 j)$$ are two-dimensional affine functions of $$(i, j)$$, but $$(i \cdot j,
 i^2)$$, $$(i \mod j, i/j)$$ are not affine functions of $$(i, j)$$.


@@ -109,7 +109,7 @@ Examples:
 In these operations, `<size>` must be a value of wrapped LLVM IR integer type,
 `<address>` must be a value of wrapped LLVM IR pointer type, and `<value>` must
-be a value of wrapped LLVM IR type that corresponds to the pointee type of
+be a value of wrapped LLVM IR type that corresponds to the pointer type of
 `<address>`.
 The `index` operands are integer values whose semantics is identical to the


@@ -20,7 +20,7 @@ format and to facilitate transformations. Therefore, it should
 * Stay as the same semantic level and try to be a mechanical 1:1 mapping;
 * But deviate representationally if possible with MLIR mechanisms.
-* Be straightforward to serialize into and deserialize drom the SPIR-V binary
+* Be straightforward to serialize into and deserialize from the SPIR-V binary
 format.
## Conventions
@@ -55,10 +55,10 @@ instructions are represented in the SPIR-V dialect. Notably,
 * Requirements for capabilities, extensions, extended instruction sets,
 addressing model, and memory model is conveyed using `spv.module`
 attributes. This is considered better because these information are for the
-exexcution environment. It's eaiser to probe them if on the module op
+execution environment. It's easier to probe them if on the module op
 itself.
-* Annotations/decoration instrutions are "folded" into the instructions they
-decorate and represented as attributes on those ops. This elimiates
+* Annotations/decoration instructions are "folded" into the instructions they
+decorate and represented as attributes on those ops. This eliminates
 potential forward references of SSA values, improves IR readability, and
 makes querying the annotations more direct.
 * Types are represented using MLIR standard types and SPIR-V dialect specific
@@ -252,7 +252,7 @@ block, one loop continue block, one merge block.
 ...
 \ | /
 v
-+-------------+ (may have mulitple incoming branches)
++-------------+ (may have multiple incoming branches)
 | merge block |
 +-------------+
 ```


@@ -92,7 +92,7 @@ for %i = 0 to 3 {
 On a GPU one could then map `i`, `j`, `k` to blocks and threads. Notice that the
 temporary storage footprint is `3 * 5` values but `3 * 4 * 5` values are
-actually transferred betwen `%A` and `%tmp`.
+actually transferred between `%A` and `%tmp`.
 Alternatively, if a notional vector broadcast operation were available, the
 lowered code would resemble:


@@ -349,7 +349,7 @@ that match predicates eliminate the need for dynamically computed costs in
 almost all cases: you can simply instantiate the same pattern one time for each
 possible cost and use the predicate to guard the match.
-The two phase nature of this API (match separate from rewrite) is important for
+The two-phase nature of this API (match separate from rewrite) is important for
 two reasons: 1) some clients may want to explore different ways to tile the
 graph, and only rewrite after committing to one tiling. 2) We want to support
 runtime extensibility of the pattern sets, but want to be able to statically


@@ -312,8 +312,8 @@ it.
 An MLIR Function is an operation with a name containing one [region](#regions).
 The region of a function is not allowed to implicitly capture values defined
-outside of the function, and all external references must use Function arguments
-or attributes that establish a symbolic connection(e.g. symbols referenced by
+outside of the function, and all external references must use function arguments
+or attributes that establish a symbolic connection (e.g. symbols referenced by
 name via a string attribute like [SymbolRefAttr](#symbol-reference-attribute)):
 ``` {.ebnf}
@@ -455,12 +455,14 @@ func @accelerator_compute(i64, i1) -> i64 {
 ^bb2:
 "accelerator.launch"() {
 ^bb0:
-// Region of code nested under "accelerator_launch", it can reference %a but
+// Region of code nested under "accelerator.launch", it can reference %a but
 // not %value.
 %new_value = "accelerator.do_something"(%a) : (i64) -> ()
 }
 // %new_value cannot be referenced outside of the region
 ...
 ^bb3:
 ...
 }
 ```
@@ -796,7 +798,7 @@ memref<16x32xf32, #identity, memspace0>
 // f32 elements.
 %T = alloc(%M, %N) [%B1, %B2] : memref<?x?xf32, #tiled_dynamic>
-// A memref that has a two element padding at either end. The allocation size
+// A memref that has a two-element padding at either end. The allocation size
 // will fit 16 * 68 float elements of data.
 %P = alloc() : memref<16x64xf32, #padded>
@@ -1296,7 +1298,7 @@ Syntax:
 integer-set-attribute ::= affine-map
 ```
-An integer-set attribute is an attribute that represents a integer-set object.
+An integer-set attribute is an attribute that represents an integer-set object.
 #### String Attribute


@@ -116,7 +116,7 @@ of the benefits that MLIR provides, in no particular order:
 The MLIR in-memory data structure has a human readable and writable format, as
 well as [a specification](LangRef.md) for that format - built just like any
-other programming language. Important properties of this format is that it is
+other programming language. Important properties of this format are that it is
 compact, easy to read, and lossless. You can dump an MLIR program out to disk
 and munge around with it, then send it through a few more passes.
@@ -139,7 +139,7 @@ the product more reliable, and making it easier to track down bugs when they
 appear - because the verifier can be run at any time, either as a compiler pass
 or with a single function call.
-While MLIR provides a well considered infrastructure for IR verification, and
+While MLIR provides a well-considered infrastructure for IR verification, and
 has simple checks for existing TensorFlow operations, there is a lot that should
 be added here and lots of opportunity to get involved!
@@ -166,7 +166,7 @@ turned into zero:
 The "CHECK" comments are interpreted by the
 [LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
-is sort of like a really advanced grep. This test is fully self contained: it
+is sort of like a really advanced grep. This test is fully self-contained: it
 feeds the input into the [canonicalize pass](Canonicalization.md), and checks
 that the output matches the CHECK lines. See the `test/Transforms` directory for
 more examples. In contrast, standard unit testing exposes the API of the
@@ -258,7 +258,7 @@ This is still a work in progress, but we have sightlines towards a
 tiles into other DAG tiles, using a declarative pattern format. DAG to DAG
 rewriting is a generalized solution for many common compiler optimizations,
 lowerings, and other rewrites and having an IR enables us to invest in building
-a single high quality implementation.
+a single high-quality implementation.
 Declarative pattern rules are preferable to imperative C++ code for a number of
 reasons: they are more compact, easier to reason about, can have checkers
@@ -313,7 +313,7 @@ transformations) today, and are committed to pushing hard to make it better.
 MLIR has been designed to be memory and compile-time efficient in its algorithms
 and data structures, using immutable and uniqued structures, low level
-bit-packing, and other well known techniques to avoid unnecessary heap
+bit-packing, and other well-known techniques to avoid unnecessary heap
 allocations, and allow simple and safe multithreaded optimization of MLIR
 programs. There are other reasons to believe that the MLIR implementations of
 common transformations will be more efficient than the Python and C++


@@ -242,7 +242,7 @@ like `"0.5f"`, and an integer array default value should be specified as like
 `Confined` is provided as a general mechanism to help modelling further
 constraints on attributes beyond the ones brought by value types. You can use
 `Confined` to compose complex constraints out of more primitive ones. For
-example, an 32-bit integer attribute whose minimal value must be 10 can be
+example, a 32-bit integer attribute whose minimal value must be 10 can be
 expressed as `Confined<I32Attr, [IntMinValue<10>]>`.
 Right now, the following primitive constraints are supported:
@@ -373,7 +373,7 @@ def MyInterface : OpInterface<"MyInterface"> {
 ### Custom builder methods
-For each operation, there are two builder automatically generated based on the
+For each operation, there are two builders automatically generated based on the
 arguments and returns types:
 ```c++
@@ -388,7 +388,7 @@ static void build(Builder *, OperationState &tblgen_state,
 ArrayRef<NamedAttribute> attributes);
 ```
-The above cases makes sure basic uniformity so that we can create ops using the
+The above cases make sure basic uniformity so that we can create ops using the
 same form regardless of the exact op. This is particularly useful for
 implementing declarative pattern rewrites.
@@ -572,7 +572,7 @@ a float tensor, and so on.
 Similarly, a set of `AttrConstraint`s are created for helping modelling
 constraints of common attribute kinds. They are the `Attr` subclass hierarchy.
-It includes `F32Attr` for the constraints of being an float attribute,
+It includes `F32Attr` for the constraints of being a float attribute,
 `F32ArrayAttr` for the constraints of being a float array attribute, and so on.
 ### Multi-entity constraint
@@ -648,7 +648,7 @@ replaced by the current attribute `attr` at expansion time.
 For more complicated predicates, you can wrap it in a single `CPred`, or you
 can use predicate combiners to combine them. For example, to write the
-constraint that an attribute `attr` is an 32-bit or 64-bit integer, you can
+constraint that an attribute `attr` is a 32-bit or 64-bit integer, you can
 write it as
 ```tablegen
@@ -695,9 +695,9 @@ def MyOp : Op<...> {
 As to whether we should define the predicate using a single `CPred` wrapping
 the whole expression, multiple `CPred`s with predicate combiners, or a single
 `CPred` "invoking" a function, there are no clear-cut criteria. Defining using
-`CPred` and predicate combiners is preferrable since it exposes more information
+`CPred` and predicate combiners is preferable since it exposes more information
 (instead hiding all the logic behind a C++ function) into the op definition spec
-so that it can pontentially drive more auto-generation cases. But it will
+so that it can potentially drive more auto-generation cases. But it will
 require a nice library of common predicates as the building blocks to avoid the
 duplication, which is being worked on right now.
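The predicate-combiner style discussed in the hunk above might be sketched as follows. This is a hypothetical constraint written for illustration only; the helper name `Is32Or64BitIntAttr` is not from the patched documents:

```tblgen
// Hypothetical sketch: constrain an attribute to be a 32-bit or 64-bit
// integer attribute via an Or of two CPreds, rather than one opaque CPred.
def Is32Or64BitIntAttr : AttrConstraint<
    Or<[CPred<"$_self.cast<IntegerAttr>().getType().isInteger(32)">,
        CPred<"$_self.cast<IntegerAttr>().getType().isInteger(64)">]>,
    "32-bit or 64-bit integer attribute">;
```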
@@ -928,7 +928,7 @@ the output type (shape) for given input type (shape).
 But shape functions are determined by attributes and could be arbitrarily
 complicated with a wide-range of specification possibilities. Equality
-relationship are common (e.g., the elemental type of the output matches the
+relationships are common (e.g., the elemental type of the output matches the
 primitive type of the inputs, both inputs have exactly the same type [primitive
 type and shape]) and so these should be easy to specify. Algebraic relationships
 would also be common (e.g., a concat of `[n,m]` and `[n,m]` matrix along axis 0


@@ -79,7 +79,7 @@ In order to exactly represent the Real zero with an integral-valued affine
 value, the zero point must be an integer between the minimum and maximum affine
 value (inclusive). For example, given an affine value represented by an 8 bit
 unsigned integer, we have: $$ 0 \leq zero\_point \leq 255$$. This is important,
-because in deep neural networks's convolution-like operations, we frequently
+because in deep neural networks' convolution-like operations, we frequently
 need to zero-pad inputs and outputs, so zero must be exactly representable, or
 the result will be biased.
@@ -123,7 +123,7 @@ $$
 In the above, we assume that $$real\_value$$ is a Single, $$scale$$ is a Single,
 $$roundToNearestInteger$$ returns a signed 32 bit integer, and $$zero\_point$$
 is an unsigned 8 or 16 bit integer. Note that bit depth and number of fixed
-point values is indicative of common types on typical hardware but is not
+point values are indicative of common types on typical hardware but is not
 constrained to particular bit depths or a requirement that the entire range of
 an N-bit integer is used.
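As a worked instance of the relation above, assuming the common affine form $$q = roundToNearestInteger(real\_value / scale) + zero\_point$$ (the assumed values below are illustrative, not figures from the commit):

$$
scale = 0.1, \qquad zero\_point = 128
$$

$$
q = roundToNearestInteger(1.0 / 0.1) + 128 = 10 + 128 = 138
$$

Here $$138$$ lies within the unsigned 8-bit range $$[0, 255]$$, so the real value $$1.0$$ is representable without clamping.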


@@ -10,7 +10,7 @@ See [MLIR specification](LangRef.md) for more information about MLIR, the
 structure of the IR, operations, etc. See
 [Table-driven Operation Definition](OpDefinitions.md) and
 [Declarative Rewrite Rule](DeclarativeRewrites.md) for the detailed explanation
-of all available mechansims for defining operations and rewrites in a
+of all available mechanisms for defining operations and rewrites in a
 table-driven manner.
 ## Adding operation
@@ -90,7 +90,7 @@ OpFoldResult SpecificOp::fold(ArrayRef<Attribute> constOperands) {
 There are multiple forms of graph rewrite that can be performed in MLIR. One of
 the most common is DAG tile to DAG tile rewrite. Patterns provide a concise way
 to express this transformation as a pair of source pattern to match and
-resultant pattern. There is both the C++ classes to represent this
+resultant pattern. There are both the C++ classes to represent this
 transformation, as well as the patterns in TableGen from which these can be
 generated.


@@ -39,7 +39,7 @@ neural network accelerators.
 MLIR uses ideas drawn from IRs of LLVM and Swift for lower level constructs
 while combining them with ideas from the polyhedral abstraction to represent
-loop nests, multi-dimensional data (tensors), and transformations on these
+loop nests, multidimensional data (tensors), and transformations on these
 entities as first class concepts in the IR.
 MLIR is a multi-level IR, i.e., it represents code at a domain-specific
@@ -58,7 +58,7 @@ polyhedral abstraction.
 Maps, sets, and relations with affine constraints are the core structures
 underlying a polyhedral representation of high-dimensional loop nests and
-multi-dimensional arrays. These structures are represented as textual
+multidimensional arrays. These structures are represented as textual
 expressions in a form close to their mathematical form. These structures are
 used to capture loop nests, tensor data structures, and how they are reordered
 and mapped for a target architecture. All structured or "conforming" loops are
@@ -513,7 +513,7 @@ parsing/printing, will be available.
 Dialect extended types are represented as string literals wrapped inside of the
 dialect namespace. This means that the parser delegates to the dialect for
 parsing specific type instances. This differs from the representation of dialect
-defined operations, of which have a identifier name that the parser uses to
+defined operations, of which have an identifier name that the parser uses to
 identify and parse them.
 This representation was chosen for several reasons:
@@ -773,7 +773,7 @@ our current design in practice.
 The current MLIR uses a representation of polyhedral schedules using a tree of
 if/for loops. We extensively debated the tradeoffs involved in the typical
 unordered polyhedral instruction representation (where each instruction has
-multi-dimensional schedule information), discussed the benefits of schedule tree
+multidimensional schedule information), discussed the benefits of schedule tree
 forms, and eventually decided to go with a syntactic tree of affine if/else
 conditionals and affine for loops. Discussion of the tradeoff was captured in
 this document:
@@ -806,7 +806,7 @@ At a high level, we have two alternatives here:
 This representation is based on a simplified form of the domain/schedule
 representation used by the polyhedral compiler community. Domains represent what
 has to be executed while schedules represent the order in which domain elements
-are interleaved. We model domains as non piece-wise convex integer sets, and
+are interleaved. We model domains as non-piece-wise convex integer sets, and
 schedules as affine functions; however, the former can be disjunctive, and the
 latter can be piece-wise affine relations. In the schedule tree representation,
 domain and schedules for instructions are represented in a tree-like structure
@@ -1110,7 +1110,7 @@ The problem is that LLVM has several objects in its IR that are globally uniqued
 and also mutable: notably constants like `i32 0`. In LLVM, these constants are
 `Value*`'s, which allow them to be used as operands to instructions, and that
 they also have SSA use lists. Because these things are uniqued, every `i32 0` in
-any function share a use list. This means that optimizing multiple functions in
+any function shares a use list. This means that optimizing multiple functions in
 parallel won't work (at least without some sort of synchronization on the use
 lists, which would be unbearably inefficient).
@@ -1122,7 +1122,7 @@ expressions, types, etc are all immutable, uniqued, and immortal). 2) constants
 are defined in per-function pools, instead of being globally uniqued. 3)
 functions themselves are not SSA values either, so they don't have the same
 problem as constants. 4) FunctionPasses are copied (through their copy ctor)
-into one instances per thread, avoiding sharing of local state across threads.
+into one instance per thread, avoiding sharing of local state across threads.
 This allows MLIR function passes to support efficient multithreaded compilation
 and code generation.


@@ -10,7 +10,7 @@ This document is a very early design proposal (which has since been accepted)
 that explored the tradeoffs of using this simplified form vs the traditional
 polyhedral schedule list form. At some point, this document could be dusted off
 and written as a proper academic paper, but until now, it is better to included
-it in this crufty form than not to. Beware that this document uses archaic
+it in this crafty form than not to. Beware that this document uses archaic
 syntax and should not be considered a canonical reference to modern MLIR.
## Introduction
@@ -282,7 +282,7 @@ transformations want to be explicit about what they are doing.
 ### Simplicity of code generation
-A key final stage of an mlfunc is its conversion to a cfg function, which is
+A key final stage of an mlfunc is its conversion to a CFG function, which is
 required as part of lowering to the target machine. The simplified form has a
 clear advantage here: the IR has a direct correspondence to the structure of the
 generated code.


@@ -49,8 +49,8 @@ elimination, only one constant remains in the IR.
 FileCheck is an extremely useful utility, it allows for easily matching various
 parts of the output. This ease of use means that it becomes easy to write
 brittle tests that are essentially `diff` tests. FileCheck tests should be as
-self contained as possible and focus on testing the minimal set of functionality
-needed. Let's see an example:
+self-contained as possible and focus on testing the minimal set of
+functionalities needed. Let's see an example:
 ```mlir {.mlir}
 // RUN: mlir-opt %s -cse | FileCheck %s

@@ -65,7 +65,7 @@ public:
 ```
 Unlike more complex types, RangeType does not require a hashing key for
-unique'ing in the `MLIRContext`. Note that all MLIR types derive from
+uniquing in the `MLIRContext`. Note that all MLIR types derive from
 `mlir::Type::TypeBase` and expose `using Base::Base` to enable generic hooks to
 work properly (in this instance for llvm-style casts. RangeType does not even
 require an implementation file as the above represents the whole code for the
@@ -187,7 +187,7 @@ view it slices and pretty-prints as:
 %2 = linalg.slice %1[*, *, %0, *] : !linalg.view<?x?x?xf32>
 ```
-In this particular case, %2 slices dimension `2` of the four dimensional view
+In this particular case, %2 slices dimension `2` of the four-dimensional view
 %1. The returned `!linalg.view<?x?x?xf32>` indicates that the indexing is
 rank-reducing and that %0 is an `index`.


@@ -227,7 +227,7 @@ public:
 PatternMatchResult match(Operation *op) const override;
 // A "rewriting" function that takes an original operation `op`, a list of
-// already rewritten opreands, and a function builder `rewriter`. It can use
+// already rewritten operands, and a function builder `rewriter`. It can use
 // the builder to construct new operations and ultimately create new values
 // that will replace those currently produced by the original operation. It
 // needs to define as many value as the original operation, but their types
@@ -259,7 +259,7 @@ PatternMatchResult ViewOpConversion::match(Operation *op) const override {
 }
 ```
-The actual conversion function may become quite involved. First, Let us go over
+The actual conversion function may become quite involved. First, let us go over
 the components of a view descriptor and see how they can be constructed to
 represent a _complete_ view of a `memref`, e.g. a view that covers all its
 elements.
@@ -412,7 +412,7 @@ struct ViewDescriptor {
 return builder.getArrayAttr(attrs);
 }
-// Emit instructions obtaining individual values from the decsriptor.
+// Emit instructions obtaining individual values from the descriptor.
 Value *ptr() { return intrinsics::extractvalue(elementPtrType(), d, pos(0)); }
 Value *offset() { return intrinsics::extractvalue(indexType(), d, pos(1)); }
 Value *size(unsigned dim) {


@@ -82,7 +82,7 @@ def main() {
 # reuse the previously specialized and inferred version and return `<2, 2>`
 var d = multiply_transpose(b, a);
-# A new call with `<2, 2>` for both dimension will trigger another
+# A new call with `<2, 2>` for both dimensions will trigger another
 # specialization of `multiply_transpose`.
 var e = multiply_transpose(c, d);