Update docs for region inference to reflect current state better

This commit is contained in:
Niko Matsakis 2015-10-28 16:42:34 -04:00
parent c81ce8249c
commit 9ef241656f

View file

@ -2,13 +2,12 @@ Region inference
# Terminology # Terminology
Note that we use the terms region and lifetime interchangeably, Note that we use the terms region and lifetime interchangeably.
though the term `lifetime` is preferred.
# Introduction # Introduction
Region inference uses a somewhat more involved algorithm than type Region inference uses a somewhat more involved algorithm than type
inference. It is not the most efficient thing ever written though it inference. It is not the most efficient thing ever written though it
seems to work well enough in practice (famous last words). The reason seems to work well enough in practice (famous last words). The reason
that we use a different algorithm is because, unlike with types, it is that we use a different algorithm is because, unlike with types, it is
impractical to hand-annotate with regions (in some cases, there aren't impractical to hand-annotate with regions (in some cases, there aren't
@ -25,22 +24,42 @@ once.
The constraints are always of one of three possible forms: The constraints are always of one of three possible forms:
- ConstrainVarSubVar(R_i, R_j) states that region variable R_i - `ConstrainVarSubVar(Ri, Rj)` states that region variable Ri must be
must be a subregion of R_j a subregion of Rj
- ConstrainRegSubVar(R, R_i) states that the concrete region R - `ConstrainRegSubVar(R, Ri)` states that the concrete region R (which
(which must not be a variable) must be a subregion of the variable R_i must not be a variable) must be a subregion of the variable Ri
- ConstrainVarSubReg(R_i, R) is the inverse - `ConstrainVarSubReg(Ri, R)` states the variable Ri shoudl be less
than the concrete region R. This is kind of deprecated and ought to
be replaced with a verify (they essentially play the same role).
In addition to constraints, we also gather up a set of "verifys"
(what, you don't think Verify is a noun? Get used to it my
friend!). These represent relations that must hold but which don't
influence inference proper. These take the form of:
- `VerifyRegSubReg(Ri, Rj)` indicates that Ri <= Rj must hold,
where Rj is not an inference variable (and Ri may or may not contain
one). This doesn't influence inference because we will already have
inferred Ri to be as small as possible, so then we just test whether
that result was less than Rj or not.
- `VerifyGenericBound(R, Vb)` is a more complex expression which tests
that the region R must satisfy the bound `Vb`. The bounds themselves
may have structure like "must outlive one of the following regions"
or "must outlive ALL of the following regions. These bounds arise
from constraints like `T: 'a` -- if we know that `T: 'b` and `T: 'c`
(say, from where clauses), then we can conclude that `T: 'a` if `'b:
'a` *or* `'c: 'a`.
# Building up the constraints # Building up the constraints
Variables and constraints are created using the following methods: Variables and constraints are created using the following methods:
- `new_region_var()` creates a new, unconstrained region variable; - `new_region_var()` creates a new, unconstrained region variable;
- `make_subregion(R_i, R_j)` states that R_i is a subregion of R_j - `make_subregion(Ri, Rj)` states that Ri is a subregion of Rj
- `lub_regions(R_i, R_j) -> R_k` returns a region R_k which is - `lub_regions(Ri, Rj) -> Rk` returns a region Rk which is
the smallest region that is greater than both R_i and R_j the smallest region that is greater than both Ri and Rj
- `glb_regions(R_i, R_j) -> R_k` returns a region R_k which is - `glb_regions(Ri, Rj) -> Rk` returns a region Rk which is
the greatest region that is smaller than both R_i and R_j the greatest region that is smaller than both Ri and Rj
The actual region resolution algorithm is not entirely The actual region resolution algorithm is not entirely
obvious, though it is also not overly complex. obvious, though it is also not overly complex.
@ -54,14 +73,6 @@ Alternatively, you can call `commit()` which ends all snapshots.
Snapshots can be recursive---so you can start a snapshot when another Snapshots can be recursive---so you can start a snapshot when another
is in progress, but only the root snapshot can "commit". is in progress, but only the root snapshot can "commit".
# Resolving constraints
The constraint resolution algorithm is not super complex but also not
entirely obvious. Here I describe the problem somewhat abstractly,
then describe how the current code works. There may be other, smarter
ways of doing this with which I am unfamiliar and can't be bothered to
research at the moment. - NDM
## The problem ## The problem
Basically our input is a directed graph where nodes can be divided Basically our input is a directed graph where nodes can be divided
@ -83,31 +94,20 @@ Before resolution begins, we build up the constraints in a hashmap
that maps `Constraint` keys to spans. During resolution, we construct that maps `Constraint` keys to spans. During resolution, we construct
the actual `Graph` structure that we describe here. the actual `Graph` structure that we describe here.
## Our current algorithm ## Computing the values for region variables
We divide region variables into two groups: Expanding and Contracting. The algorithm is a simple dataflow algorithm. Each region variable
Expanding region variables are those that have a concrete region begins as empty. We iterate over the constraints, and for each constraint
predecessor (direct or indirect). Contracting region variables are we grow the relevant region variable to be as big as it must be to meet all the
all others. constraints. This means the region variables can grow to be `'static` if
necessary.
We first resolve the values of Expanding region variables and then ## Verification
process Contracting ones. We currently use an iterative, fixed-point
procedure (but read on, I believe this could be replaced with a linear
walk). Basically we iterate over the edges in the graph, ensuring
that, if the source of the edge has a value, then this value is a
subregion of the target value. If the target does not yet have a
value, it takes the value from the source. If the target already had
a value, then the resulting value is Least Upper Bound of the old and
new values. When we are done, each Expanding node will have the
smallest region that it could possibly have and still satisfy the
constraints.
We next process the Contracting nodes. Here we again iterate over the After all constraints are fully propoagated, we do a "verification"
edges, only this time we move values from target to source (if the step where we walk over the verify bounds and check that they are
source is a Contracting node). For each contracting node, we compute satisfied. These bounds represent the "maximal" values that a region
its value as the GLB of all its successors. Basically contracting variable can take on, basically.
nodes ensure that there is overlap between their successors; we will
ultimately infer the largest overlap possible.
# The Region Hierarchy # The Region Hierarchy