Reworking Lengauer-Tarjan
In compiler circles, Lengauer & Tarjan's 1979 paper "A Fast Algorithm for Finding Dominators in a Flowgraph" is a classic: it describes and proves a viable algorithm for computing the idom of every node in a rooted directed graph. I have just two criticisms of the paper:
- All the sample code is written in Algol. This was a fine choice in 1979, but tastes have changed over the intervening 46 years, and most modern eyes will see the Algol code as somewhat antiquated.
- It starts by presenting an optimised variant of the algorithm, rather than starting with the most basic exposition and then introducing optimisations.
I believe that both problems can be addressed via a little bit of reworking.
The algorithm takes as input a graph G with some root node r. We immediately perform a depth-first search on G; let T be the spanning tree discovered by that search, and replace all node labels with the pre-order index from that search. The root node is now 0, one of its successors is 1, and so forth. For arbitrary nodes v and w, this allows us to write things like v < w and v ≤ w and min(v, w), all of which operate on these indices. We also introduce some notation for talking about these graphs:
Notation | Meaning
---|---
v ->G w | There is an edge from v to w in G, and w != r
v -*->T w | There is a path from v to w in T, or v == w
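For concreteness, here's one possible sketch of that initial search in Python, assuming G arrives as a mapping succ_raw from original node labels to successor lists; that input format, and all the names here, are my own choices rather than anything from the paper:
def dfs_renumber(succ_raw, r):
    index = {r: 0}  # original label -> pre-order index
    order = [r]     # pre-order index -> original label
    parent_of = [0] # parent_of[w]: parent of w in T (entry 0 unused)
    stack = [(r, iter(succ_raw[r]))]
    while stack:
        v, it = stack[-1]
        w = next(it, None)
        if w is None:
            stack.pop()
        elif w not in index:
            index[w] = len(order)
            parent_of.append(index[v])
            order.append(w)
            stack.append((w, iter(succ_raw[w])))
    # Re-express G's edges as pre-order indices, dropping unreachable
    # nodes and (per the notation above) edges into the root.
    N = len(order)
    preds = [[] for _ in range(N)]
    for v in order:
        for w in succ_raw[v]:
            if w in index and index[w] != 0:
                preds[index[w]].append(index[v])
    return N, parent_of, preds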
We jump in at what the paper calls Theorem 4, which enables this definition:
sdom(w) = min({v | v ->G w and v ≤ w} union
              {sdom(u) | u -*->T v and v ->G w and v > w and u > w})
This definition is recursive, but only in the case of u > w, so we can compute it for all nodes by first computing it for w = N-1, then for w = N-2, and so forth until we eventually reach w = 1 (we cannot compute it for w = 0, as both sets are empty in that case).
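As a small worked example (the graph is mine, not the paper's), consider the diamond-shaped graph with edges 0 -> 1, 1 -> 2, 1 -> 4, 2 -> 3, 4 -> 3, where node labels are already pre-order indices, so in T we have parent(1) = 0, parent(2) = 1, parent(3) = 2, and parent(4) = 1. Working downwards: sdom(4) = 1 (its sole predecessor 1 is ≤ 4); sdom(3) = min(2, sdom(4)) = 1 (predecessor 2 is ≤ 3, while predecessor 4 is > 3, and u = 4 is the only node with u -*->T 4 and u > 3); sdom(2) = 1; and sdom(1) = 0.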
Reworking Theorem 3 slightly enables this definition:
idomfrom(w) = argmin_u{sdom(u) | u -*->T w and u > sdom(w)}
Where argmin_u{expr | condition} gives the u which minimises expr (over all the possible u which meet condition), or any such u if there are multiple u achieving that minimum. Note that the set is always non-empty, as u = w meets the condition. As u -*->T w implies u ≤ w, we also have idomfrom(w) ≤ w.
Then reworking Theorem 2 slightly enables this definition:
idom(w) = sdom(w) if idomfrom(w) ≥ w else idom(idomfrom(w))
This definition is recursive, but only in the case of idomfrom(w) < w, so we can compute it for all nodes by first computing it for w = 1, then for w = 2, and so forth until eventually we reach w = N-1.
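Continuing the diamond example: sdom(3) = 1, and the nodes u with u -*->T 3 and u > 1 are 2 and 3, both with sdom(u) = 1, so idomfrom(3) can be either. If idomfrom(3) = 3, then idom(3) = sdom(3) = 1 directly; if idomfrom(3) = 2, then idom(3) = idom(2), and idom(2) = sdom(2) = 1 because idomfrom(2) = 2. Either way, idom(3) = 1, as expected for the join point of the diamond.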
This gives us everything we need to compute idom, via a four-step process:
1. Perform a depth-first search on G, to give T (and pre-order integer nodes).
2. Compute sdom(w) for all w > 0, in decreasing order of w.
3. Compute idomfrom(w) for all w > 0, in any order.
4. Compute idom(w) for all w > 0, in increasing order of w.
In both steps 2 and 3, we're interested in either min or argmin_u over the set {sdom(u) | u -*->T v and u > limit}, for various values of v and limit. There are (at least) three different strategies for computing each of these min / argmin_u:
1. From scratch every time: whenever such a computation is required, start with u = v, and then iterate u = parent(u) (where parent gives the parent in T) until u ≤ limit, taking the min / argmin_u of sdom(u) over all the observed u for which u > limit.
2. Like strategy 1, but introduce a layer of caching to avoid repeated work. The paper calls this "path compression", which is one way of viewing it, but you reach the exact same place if you instead apply aggressive caching to strategy 1. For this to work, all queries need to be made in decreasing order of limit, as otherwise previously cached results wouldn't be valid. This happens naturally for step 2 (because it has limit = w, and is performed in decreasing order of w), but slightly more work is required to extend it to step 3: the usual approach is to chop up step 3 into lots of small pieces, and perform them interleaved with step 2 at appropriate times.
3. Like strategy 2, but doing the union-find equivalent of "union by rank/size" rather than the union-find equivalent of "path compression". Note that the min and argmin_u operations in question aren't quite union-find, but they're sufficiently similar that the ideas can carry over.
These three strategies are in increasing order of implementation complexity, but also decreasing order of worst-case algorithmic complexity. Strategy 3 is best in theory, but the general consensus is that strategy 2 usually beats it in practice. Meanwhile, the Go compiler uses strategy 1 and considers it sufficient.
If using strategy 1, the pseudocode for steps 2 through 4 can be quite concise:
# Step 2
for w in range(N-1, 0, -1):
    sdom[w] = min(strategy1(v, w)[0] if v > w else v for v in pred(w))
# Step 3
for w in range(1, N):
    idomfrom[w] = strategy1(w, sdom[w])[1]
# Step 4
for w in range(1, N):
    idom[w] = sdom[w] # Relevant when idomfrom[w] = w
    idom[w] = idom[idomfrom[w]] # No change when idomfrom[w] = w
def strategy1(v, limit):
    us = [v]
    while parent(us[-1]) > limit: us.append(parent(us[-1]))
    return min((sdom[u], u) for u in us)
Note that pred in step 2 gives the predecessors in G, whereas parent in strategy1 gives the parent in T.
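To check the above end to end, here's the same pseudocode fleshed out into runnable Python and applied to the diamond graph from earlier; the array names and the parent / pred helpers are my own scaffolding, not the paper's:
# The diamond graph from earlier: edges 0->1, 1->2, 1->4, 2->3, 4->3,
# with nodes already renumbered to their pre-order indices.
N = 5
parent_of = [0, 0, 1, 2, 1]          # parent_of[w]: parent of w in T
preds = [[], [0], [1], [2, 4], [1]]  # preds[w]: predecessors of w in G

def parent(u): return parent_of[u]
def pred(w): return preds[w]

sdom = [0] * N
idomfrom = [0] * N
idom = [0] * N

def strategy1(v, limit):
    us = [v]
    while parent(us[-1]) > limit: us.append(parent(us[-1]))
    return min((sdom[u], u) for u in us)

for w in range(N-1, 0, -1):  # Step 2
    sdom[w] = min(strategy1(v, w)[0] if v > w else v for v in pred(w))
for w in range(1, N):        # Step 3
    idomfrom[w] = strategy1(w, sdom[w])[1]
for w in range(1, N):        # Step 4
    idom[w] = sdom[w]
    idom[w] = idom[idomfrom[w]]

assert sdom[1:] == [0, 1, 1, 1]
assert idom[1:] == [0, 1, 1, 1]  # 0 dominates 1; 1 dominates 2, 3, 4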
There are two little tricks to avoid some of the function calls in step 3:
- If sdom(w) == 0, then u = w is an acceptable result from argmin_u, as no sdom value can be smaller than 0.
- If sdom(w) == parent(w), then u = w will be the only possible argmin_u, as the only u with u -*->T w and u > parent(w) is w itself.
We can revise step 3 to detect these two cases, and handle them without a call:
# Step 3
for w in range(1, N):
    if sdom[w] in {0, parent(w)}:
        idomfrom[w] = w
    else:
        idomfrom[w] = strategy1(w, sdom[w])[1]
If we then want to use something better than strategy1, step 3 needs to be chopped up and interleaved with step 2. One way of doing this is to introduce an array called deferred, which conceptually stores various sets: calls to strategy1 in step 3 are changed to instead add a node to a set, and then the actual calls happen later when the set is drained. The pseudocode for steps 2 and 3 thus becomes:
# Step 2 and Step 3
deferred = [0] * N # Initialise deferred (all sets empty)
for w in range(N-1, 0, -1):
    v = deferred[w]
    while v != 0: # Drain set
        idomfrom[v] = strategy1(v, w)[1]
        v = deferred[v]
    sdom[w] = min(strategy1(v, w)[0] if v > w else v for v in pred(w))
    if sdom[w] in {0, parent(w)}:
        idomfrom[w] = w
    else:
        # Add w to set, will drain when step 2 reaches sdom(w)
        deferred[w] = deferred[sdom[w]]
        deferred[sdom[w]] = w
Then we can upgrade steps 2 and 3 from strategy1 to strategy2:
# Step 2 and Step 3
deferred = [0] * N # Initialise deferred (all sets empty)
cache_ancestor = [0] * N # Always written before being read
cache_result = [(0, 0)] * N # Always written before being read
for w in range(N-1, 0, -1):
    v = deferred[w]
    while v != 0: # Drain set
        idomfrom[v] = strategy2(v, w)[1]
        v = deferred[v]
    sdom[w] = min(strategy2(v, w)[0] if v > w else v for v in pred(w))
    if sdom[w] in {0, parent(w)}:
        idomfrom[w] = w
    else:
        # Add w to set, will drain when step 2 reaches sdom(w)
        deferred[w] = deferred[sdom[w]]
        deferred[sdom[w]] = w
    cache_ancestor[w] = parent(w)
    cache_result[w] = (sdom[w], w)
def strategy2(v, limit):
    vs = []
    ancestor = cache_ancestor[v]
    while ancestor > limit:
        vs.append(v)
        v = ancestor
        ancestor = cache_ancestor[v]
    result = cache_result[v]
    while vs:
        v = vs.pop()
        result = min(result, cache_result[v])
        cache_result[v] = result
        cache_ancestor[v] = ancestor
    return result
I consider strategy2 to be sufficient, so I won't present a strategy3. Instead, a few memory optimisations are available to anyone squeezing out performance:
- The idom and sdom arrays can share the same underlying storage, as for any given w, once idom[w] has been assigned, sdom[w] is not accessed. This also removes the need for the idom[w] = sdom[w] assignment in step 4 (see the sketch after this list).
- The cache_ancestor and parent arrays can share the same underlying storage, as for any given w, once cache_ancestor[w] has been assigned for the first time, parent(w) is not used. This also removes the need for the cache_ancestor[w] = parent(w) assignment.
- The deferred and idomfrom arrays can share the same underlying storage, albeit the draining needs a minor tweak to read deferred[v] before writing idomfrom[v].
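For instance, the first of these might turn step 4 into something like the following, with dom (a name of my choosing) holding sdom values on entry and idom values on exit:
# Step 4, with one array `dom` doubling as sdom (before) and idom (after)
for w in range(1, N):
    if idomfrom[w] != w:  # When idomfrom[w] == w, idom[w] == sdom[w] already
        dom[w] = dom[idomfrom[w]]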
Once idom(w) has been computed, it can be used to compute DF(w) in the style of Cytron et al's 1991 paper "Efficiently Computing Static Single Assignment Form and the Control Dependence Graph":
for w in bottom_up_traversal_of_dominator_tree:
    DF[w] = set()
    for v in succ(w): # successors in G
        if idom[v] != w:
            DF[w].add(v)
    for v in children(w): # children in the dominator tree
        for u in DF[v]:
            if idom[u] != w:
                DF[w].add(u)
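Neither bottom_up_traversal_of_dominator_tree nor children is spelled out above, but both fall out of the idom array: idom(w) is a proper ancestor of w in T, so idom(w) < w, and visiting nodes in decreasing pre-order index therefore visits children before parents in the dominator tree. A possible realisation (the array names are mine, and succ is as above):
children = [[] for _ in range(N)] # children[w]: children of w in the dominator tree
for w in range(1, N):
    children[idom[w]].append(w)
DF = [set() for _ in range(N)]
for w in range(N-1, -1, -1): # bottom-up over the dominator tree
    for v in succ(w): # successors in G (never the root, per the notation)
        if idom[v] != w:
            DF[w].add(v)
    for v in children[w]:
        for u in DF[v]:
            if idom[u] != w:
                DF[w].add(u)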
Alternatively, idom(w) can be used to compute DF(w) in the style of Cooper et al's 2001 paper "A Simple, Fast Dominance Algorithm":
for w in G:
    DF[w] = set()
for w in G:
    if len(pred(w)) >= 2: # predecessors in G
        for v in pred(w): # predecessors in G
            u = v
            while u != idom[w]:
                DF[u].add(w)
                u = idom[u]
Note that if len(pred(w)) >= 2: is an optimisation, rather than a requirement for correctness: when len(pred(w)) == 1, the sole predecessor of w will be idom(w), so the innermost loop won't execute.
This formulation is easily amenable to representing DF values as arrays rather than sets, as deduplication needs only to look at the most recently inserted value: for a fixed w, the only value ever appended is w itself, so any duplicates in a given DF(u) are necessarily adjacent.
for w in G:
    DF[w] = []
for w in G:
    if len(pred(w)) >= 2: # predecessors in G
        for v in pred(w): # predecessors in G
            u = v
            while u != idom[w]:
                if len(DF[u]) == 0 or DF[u][-1] != w: DF[u].append(w)
                u = idom[u]
Finally, once DF(w) is available for all w, SSA construction becomes simple: if G denotes a control flow graph of basic blocks, then for every variable assigned-to in w, DF(w) is exactly the set of basic blocks which require a phi function (or basic block argument) inserted at their start to reconverge the various assignments of that variable. A mere two caveats apply:
- The phi function is another assignment to the variable in question, which possibly enlarges the set of variables assigned-to by the basic block into which the phi was inserted. Multiple iterations can be required to reach a fixpoint (see the worklist sketch after this list).
- Some liveness analysis can be applied to avoid inserting phi functions whose result would never be used.
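A minimal sketch of the first caveat's fixpoint, as a worklist over the iterated dominance frontier; the function and its argument names are my own framing. Given the blocks that originally assign to one variable, it returns the blocks needing a phi for that variable (it does not attempt the liveness pruning of the second caveat):
def place_phis(assigning_blocks, DF):
    # Each inserted phi is itself an assignment, so its block's
    # dominance frontier must be processed too; the worklist
    # converges because each block is enqueued at most once.
    needs_phi = set()
    worklist = list(assigning_blocks)
    while worklist:
        w = worklist.pop()
        for v in DF[w]:
            if v not in needs_phi:
                needs_phi.add(v)
                worklist.append(v)
    return needs_phi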