Class JitDataFlowModel
This implements the Data Flow Analysis phase of the JitCompiler
. The result is a use-def
graph. The graph follows Static Single Assignment (SSA) form, in that each definition of a
variable, even if it's at the same address as a previous definition, is given a unique
identifier. The graph is bipartite with ops
on one side and values
on the other. Please node the distinction between a varnode and a variable in
this context. A varnode refers to the address and size in the machine's state. For
better or for worse, this is often referred to as a "variable" in other contexts. A
variable in the SSA sense is a unique "instance" of a varnode with precisely one
definition. Consider the following x86 assembly:
MOV RAX, qword ptr [...] ADD RAX, RDX MOV qword ptr [...], RAX
Ignoring RAM, there are two varnodes at play, named for the registers they represent: RAX
and RDX
. However, there are three variables. The first is an instance of RAX
,
defined by the first MOV
instruction. The second is an instance of RDX
, which is
implicitly defined as an input to the passage. The third is another instance of of RAX
,
defined by the ADD
instruction. These could be given unique names
RAX
1, RDX
in, and RAX
2, respectively.
Thus, the ADD
instruction uses RAX
1 and RDX
in, to
define RAX
2. The last MOV
instruction uses RAX
2. If
we plot each instruction and variable in a graph, drawing edges for each use and definition, we
get a use-def graph.
Our analysis produces a use-def graph for the passage's p-code (not instructions) in two steps: First, we analyze each basic block independently. There are a lot of nuts and bolts in the implementation, but the analysis is achieved by straightforward interpretation of each block's p-code ops. Second, we connect the blocks' use-def graphs together using phi nodes where appropriate, according to the control flow.
Intra-block analysis
For each block, we create a p-code interpreter consisting of a JitDataFlowState
and
JitDataFlowExecutor
. Both are given this model's JitDataFlowArithmetic
, which
populates the use-def graph. We then feed the block's p-code into the executor. The block gets a
fresh JitDataFlowState
, so that its result has no dependency on the interpretation of any
other block, except in the numbering of variable identifiers; those must be unique across the
model.
During interpretation, varnode accesses generate value nodes. When a constant varnode is
accessed, it simply creates a JitConstVal
. When an op produces an output, it generates a
JitOutVar
and places it into the interpreter's state
for its
varnode. When a varnode is read, the interpreter examines its state for the last definition. If
one is found, the variable is returned, its use noted, and nothing new is generated. Otherwise, a
JitMissingVar
is generated. Note that the interpreter does not track memory variables in
its state, because the JIT translator does not seek to optimize these. At run time, such accesses
will affect the emulator's state immediately. Registers and Sleigh uniques, on the other hand,
are allocated as JVM locals, so we must know how they are used and defined. Direct memory
accesses generate JitDirectMemoryVar
and JitMemoryOutVar
. Indirect memory
accesses are denoted by the load
and store
op nodes, not as
variables. There is a dummy JitIndirectMemoryVar
singleton, so that the state can return
something when the memory address is not fixed.
Inter-block analysis
Up to this point, each block's use-def sub-graph is disconnected from the others'. We define each
missing
variable generated during block interpretation as a phi
op. A phi op is said to belong to the block that generated the missing variable. We seek
options for the phi op by examining the block's inward flows. For each source block, we check the
most recent definition of the sought varnode. If one is present, the option is added to the phi
op. Otherwise, we create an option by generating another phi op and taking its output. The new
phi op belongs to the source block, and we recurse to seek its options. If a cycle is
encountered, or we encounter a block with no inward flows, we do not recurse. An
input
variable is generated whenever we encounter a passage entry, indicating
the variable could be defined outside the passage.
Note that the resulting phi ops may not adhere precisely to the formal definition of phi node. A phi op may have only one option. The recursive part of the option seeking algorithm generates chains of phi ops such that an option must come from an immediately upstream block, even if that block does not offer a direct definition. This may produce long chains when a varnode use is several block flows removed from a possible definition. We had considered simplifying/removing single-option phi ops afterward, but we found it too onerous, and the output bytecode is not improved. We do not generate bytecode for phi ops; they are synthetic and only used for analysis.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprotected class
A diagnostic tool for visualizing the use-def graph.protected class
An upward graph traversal for collecting all values in the use-def graph. -
Constructor Summary
ConstructorsConstructorDescriptionJitDataFlowModel
(JitAnalysisContext context, JitControlFlowModel cfm) Construct the data flow model. -
Method Summary
Modifier and TypeMethodDescriptionGet all values (and variables) in the use-def graphSame asallValues()
, but sorted by ID with constants at the topprotected void
analyze()
Construct the use-def graphvoid
For diagnostics: Dump the analysis result to stderrvoid
For diagnostics: Dump the synthetic ops to stderrvoid
exportGraphviz
(File file) Generate a graphviz .dot file to visualize the use-def graph.Generate a variable representing a direct memory accessgenerateIndirectMemoryVar
(AddressSpace space, JitVal offset, int size, boolean quantize) Generate a variable representing an indirect memory accessgenerateOutVar
(Varnode out) Generate a new op output variable for eventual placement in the use-def graphGet the per-block data flow analyzer for the given basic blockGet the model's arithmetic that places p-code ops into the use-def graphGet the use-def op node for the given p-code opGet a wrapper library that places userop calls into the use-def graphprotected JitDataFlowBlockAnalyzer
<T extends JitOp>
TnotifyOp
(T op) Add the givenJitOp
to the use-def graphphiNodes()
Get all the phi nodes in the use-def graph.Get all the synthetic op nodes in the use-def graph.
-
Constructor Details
-
JitDataFlowModel
Construct the data flow model.Analysis is performed as part of constructing the model.
- Parameters:
context
- the analysis contextcfm
- the control flow model
-
-
Method Details
-
getArithmetic
Get the model's arithmetic that places p-code ops into the use-def graph- Returns:
- the arithmetic
-
getLibrary
Get a wrapper library that places userop calls into the use-def graph- Returns:
- the library
-
phiNodes
Get all the phi nodes in the use-def graph.- Returns:
- the list of phi nodes
-
synthNodes
Get all the synthetic op nodes in the use-def graph.- Returns:
- the list of synthetic op nodes
-
generateOutVar
Generate a new op output variable for eventual placement in the use-def graph -
generateDirectMemoryVar
Generate a variable representing a direct memory access- Parameters:
vn
- the varnode, which ought to be neither register nor unique- Returns:
- the variable
-
generateIndirectMemoryVar
public JitIndirectMemoryVar generateIndirectMemoryVar(AddressSpace space, JitVal offset, int size, boolean quantize) Generate a variable representing an indirect memory access- Parameters:
space
- the address space containing the variable, which out to be neither register nor uniqueoffset
- another variable describing the (dynamic) offset of the variable in the given spacesize
- the number of bytes in the variablequantize
- true if the offset should be quantized (as ingetVar
).- Returns:
- the variable
- See Also:
-
notifyOp
Add the givenJitOp
to the use-def graph- Type Parameters:
T
- the type of the node- Parameters:
op
- the op- Returns:
- the same op
- See Also:
-
getJitOp
Get the use-def op node for the given p-code opNOTE: When used in testing, if the passage is manufactured from a
PcodeProgram
, the decoder will re-write the p-code ops asJitPassage.DecodedPcodeOp
s. Be sure to pass an op to this method that comes from the resultingJitPassage
, not the original program, or else this method will certainly returnnull
.- Parameters:
op
- the p-code op from the source passage- Returns:
- the node from the use-def graph, if present, or
null
-
allValues
Get all values (and variables) in the use-def graph- Returns:
- the set of values
-
allValuesSorted
Same asallValues()
, but sorted by ID with constants at the top- Returns:
- the list of values
-
getOrCreateAnalyzer
-
getAnalyzer
Get the per-block data flow analyzer for the given basic block- Parameters:
block
- the block- Returns:
- the analyzer
-
analyze
protected void analyze()Construct the use-def graph -
dumpResult
public void dumpResult()For diagnostics: Dump the analysis result to stderr- See Also:
-
dumpSynth
public void dumpSynth()For diagnostics: Dump the synthetic ops to stderr- See Also:
-
exportGraphviz
Generate a graphviz .dot file to visualize the use-def graph.WARNING: This is an internal diagnostic that is only as complete as it needed to be.
- Parameters:
file
- the output file
-