Class JitVarScopeModel
This implements the Variable Scope Analysis phase of the JitCompiler
. The result provides
the set of in-scope (alive) varnodes for each basic block. The design of this analysis, and the
shortcuts we take, are informed by the design of downstream phases. In particular, we do not
intend to allocate each SSA variable. There are often many, many such variables, and attempting
to allocate them to as few target resources, e.g., JVM locals, as possible is probably a
complicated and expensive algorithm. I don't think we'd gain much from it either. Instead, we'll
just allocate by varnode. To do that, though, we still have to consider that some varnodes
overlap and otherwise alias others. If we are able to handle all that aliasing in place, then we
need not generate code for the synthetic ops. One might ask, well then why do any of the Data
Flow Analysis in the first place? 1) We still need data flow to inform the selection of JVM local
types. We have not measured the run-time cost of the bitwise casts, but we do know the bytecode
for each cast occupies space, counted against the 65,535-byte max. 2) We also need data flow to
inform operation elimination, which removes many wasted flag computations.
To handle the aliasing, we coalesce overlapping varnodes. For example, EAX
will get
coalesced with RAX
, but BH
will not get coalesced with BL
,
assuming no other part of RBX
is accessed. The JitDataFlowModel
records all
varnodes accessed in the course of its intra-block analysis. Only those actually accessed are
considered. We then compute scope in terms of these coalesced varnodes. For example, if both
RAX
and EAX
are used by a passage, then an access of EAX
causes
RAX
to remain in scope.
The decision to compute scope on a block-by-block basis instead of op-by-op is for simplicity. We
intend to birth and retire variables along block transitions by considering what variables are
coming into or leaving scope on the flow edge. Birthing is just reading a variable's
value from the run-time state
into its allocated JVM local.
Conversely, retiring is writing the value back out to the state. There's little to be
gained by retiring a variable midway through a block as opposed to the end of the block. Perhaps
if one giant block handles a series of variables in sequence, we could have used a single JVM
local to allocate each, but we're already committed to allocating a JVM local per (coalesced)
varnode. So, while that may ensure only one variable is alive at a time, the number of JVM locals
required remains the same. Furthermore, the amount of bytecode emitted remains the same, but at
different locations in the block. The case where this might be worth considering is a userop
invocation, because all live variables must be forcefully retired.
We then consider what common cases we want to ensure are optimized, when we're limited to a block-by-block analysis. One that comes to mind is a function with an early bail. Consider the following C source:
int func(my_struct* ptr) { if (ptr == NULL) { return ERR; } // Do some serious work return ptr->v; }
Often, the C compiler will group all the returns into one final basic block, so we might get the following p-code:
1 RSP = INT_SUB RSP, 0x20:8 2 $U00:1 = INT_EQUAL RDI, 0:8 # RDI is ptr 3 CBRANCH <err>, $U0:1 4 # Do some serious work 5 $U10:8 = INT_ADD RDI, 0xc:8 # Offset to field v 6 EAX = LOAD [ram] $U10:8 7 BRANCH <exit> <err> 8 EAX = COPY 0xffffffff:4 <exit> 9 RSP = INT_ADD RSP, 0x20:8 10 RIP = LOAD [ram] RSP 11 RSP = INT_ADD RSP, 8:8 12 RETURN RIP
Note that I've elided the actual x86 machine code and all of the noise generated by C compilation and p-code lifting, and I've presumed the decoded passage contains exactly the example function. The result is your typical if-else diamond. We'll place the error case on the left:
+---------+ | 1--3 | | CBRANCH | +-T-----F-+ / \ / \ +--------+ +--------+ | 8 | | 4--7 | | (fall) | | BRANCH | +--------+ +--------+ \ / \ / +---------+ | 9--12 | | RETURN | +---------+
Suppose the "serious work" on line 4 accesses several varnodes: RBX, RCX, RDX, and RSI. If execution follows the error path, we'd rather not birth any of those variables. Thus, we might like the result of the scope analysis to be:
Block | Live Vars |
---|---|
1–3 | RDI, RSP, $U00:1 |
4–7 | EAX, RBX, RCX, RDI, RDX, RSI, RSP, $U10:8 |
8 | EAX, RSP |
9–12 | RIP, RSP |
This can be achieved rather simply: Define two sets for each block, the upward view and the
downward view. The first corresponds to all varnodes that could be accessed before entering this
block or while in it. The second corresponds to all varnodes that could be access while in this
block or after leaving it. The upward view is computed by initializing each set to the varnodes
accessed by its block. Then we "push" each set upward by adding its elements into the set for
each block with flows into this one, until the sets converge. The downward sets are similarly
computed, independently of the upward sets. The result is the intersection of these sets, per
block. The algorithm is somewhat intuitive in that we accrue live variables as we move toward the
"body" of the control flow graph, and they begin to drop off as we approach an exit. The accrual
is captured by the downward set, and the drop off is captured by intersection with the upward
set. This will also prevent retirement and rebirth of variables. Essentially, if we are between
two accesses of a varnode, then that varnode is alive. Consider RSP
from the example
above. The algorithm considers it alive in blocks 4–7 and 8, despite the fact neither
actually accesses it. Nevertheless, we'd rather generate one birth upon entering block 1–3,
keep it alive in the body, and then generate one retirement upon leaving block 9–12.
One notable effect of this algorithm is that all blocks in a loop will have the same variables in scope.... I think this is okay. We'll birth the relevant variables upon entering the loop, keep them all alive during loop execution, and then retire them (unless they're accessed downstream) upon leaving.
-
Constructor Summary
ConstructorsConstructorDescriptionConstruct the model -
Method Summary
Modifier and TypeMethodDescriptionGet the collection of all coalesced varnodesvoid
For diagnostics: Dump the analysis result to stderrgetCoalesced
(Varnode part) Get the varnode into which the given varnode was coalescedGet the set of live varnodes for the given block
-
Constructor Details
-
JitVarScopeModel
Construct the model- Parameters:
cfm
- the control flow modeldfm
- the data flow model
-
-
Method Details
-
getCoalesced
Get the varnode into which the given varnode was coalescedIn many cases, the result is the same varnode.
- Parameters:
part
- the varnode- Returns:
- the coalesced varnode
-
coalescedVarnodes
Get the collection of all coalesced varnodes- Returns:
- the varnodes
-
getLiveVars
Get the set of live varnodes for the given block- Parameters:
block
- the block- Returns:
- the live varnodes
-
dumpResult
public void dumpResult()For diagnostics: Dump the analysis result to stderr- See Also:
-