ghidra.pcode.emu.jit.analysis.JitDataFlowModel

public class JitDataFlowModel extends Object

The data flow analysis for JIT-accelerated emulation.

This implements the Data Flow Analysis phase of the JitCompiler. The result is a use-def graph. The graph follows Static Single Assignment (SSA) form, in that each definition of a variable, even if it's at the same address as a previous definition, is given a unique identifier. The graph is bipartite with ops on one side and values on the other. Please node the distinction between a varnode and a variable in this context. A varnode refers to the address and size in the machine's state. For better or for worse, this is often referred to as a "variable" in other contexts. A variable in the SSA sense is a unique "instance" of a varnode with precisely one definition. Consider the following x86 assembly:

 MOV RAX, qword ptr [...]
 ADD RAX, RDX
 MOV qword ptr [...], RAX

Ignoring RAM, there are two varnodes at play, named for the registers they represent: RAX and RDX. However, there are three variables. The first is an instance of RAX, defined by the first MOV instruction. The second is an instance of RDX, which is implicitly defined as an input to the passage. The third is another instance of of RAX, defined by the ADD instruction. These could be given unique names RAX₁, RDX_in, and RAX₂, respectively. Thus, the ADD instruction uses RAX₁ and RDX_in, to define RAX₂. The last MOV instruction uses RAX₂. If we plot each instruction and variable in a graph, drawing edges for each use and definition, we get a use-def graph.

Our analysis produces a use-def graph for the passage's p-code (not instructions) in two steps: First, we analyze each basic block independently. There are a lot of nuts and bolts in the implementation, but the analysis is achieved by straightforward interpretation of each block's p-code ops. Second, we connect the blocks' use-def graphs together using phi nodes where appropriate, according to the control flow.

Intra-block analysis

For each block, we create a p-code interpreter consisting of a JitDataFlowState and JitDataFlowExecutor. Both are given this model's JitDataFlowArithmetic, which populates the use-def graph. We then feed the block's p-code into the executor. The block gets a fresh JitDataFlowState, so that its result has no dependency on the interpretation of any other block, except in the numbering of variable identifiers; those must be unique across the model.

During interpretation, varnode accesses generate value nodes. When a constant varnode is accessed, it simply creates a JitConstVal. When an op produces an output, it generates a JitOutVar and places it into the interpreter's state for its varnode. When a varnode is read, the interpreter examines its state for the last definition. If one is found, the variable is returned, its use noted, and nothing new is generated. Otherwise, a JitMissingVar is generated. Note that the interpreter does not track memory variables in its state, because the JIT translator does not seek to optimize these. At run time, such accesses will affect the emulator's state immediately. Registers and Sleigh uniques, on the other hand, are allocated as JVM locals, so we must know how they are used and defined. Direct memory accesses generate JitDirectMemoryVar and JitMemoryOutVar. Indirect memory accesses are denoted by the load and store op nodes, not as variables. There is a dummy JitIndirectMemoryVar singleton, so that the state can return something when the memory address is not fixed.

Inter-block analysis

Up to this point, each block's use-def sub-graph is disconnected from the others'. We define each missing variable generated during block interpretation as a phi op. A phi op is said to belong to the block that generated the missing variable. We seek options for the phi op by examining the block's inward flows. For each source block, we check the most recent definition of the sought varnode. If one is present, the option is added to the phi op. Otherwise, we create an option by generating another phi op and taking its output. The new phi op belongs to the source block, and we recurse to seek its options. If a cycle is encountered, or we encounter a block with no inward flows, we do not recurse. An input variable is generated whenever we encounter a passage entry, indicating the variable could be defined outside the passage.

Note that the resulting phi ops may not adhere precisely to the formal definition of phi node. A phi op may have only one option. The recursive part of the option seeking algorithm generates chains of phi ops such that an option must come from an immediately upstream block, even if that block does not offer a direct definition. This may produce long chains when a varnode use is several block flows removed from a possible definition. We had considered simplifying/removing single-option phi ops afterward, but we found it too onerous, and the output bytecode is not improved. We do not generate bytecode for phi ops; they are synthetic and only used for analysis.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected class

JitDataFlowModel.GraphvizExporter

A diagnostic tool for visualizing the use-def graph.

protected class

JitDataFlowModel.ValCollector

An upward graph traversal for collecting all values in the use-def graph.
Constructor Summary

Constructors

Constructor

Description

JitDataFlowModel(JitAnalysisContext context, JitControlFlowModel cfm)

Construct the data flow model.
Method Summary

Modifier and Type

Method

Description

Set<JitVal>

allValues()

Get all values (and variables) in the use-def graph

List<JitVal>

allValuesSorted()

Same as allValues(), but sorted by ID with constants at the top

protected void

analyze()

Construct the use-def graph

void

dumpResult()

For diagnostics: Dump the analysis result to stderr

void

dumpSynth()

For diagnostics: Dump the synthetic ops to stderr

void

exportGraphviz(File file)

Generate a graphviz .dot file to visualize the use-def graph.

JitDirectMemoryVar

generateDirectMemoryVar(Varnode vn)

Generate a variable representing a direct memory access

JitIndirectMemoryVar

generateIndirectMemoryVar(AddressSpace space, JitVal offset, int size, boolean quantize)

Generate a variable representing an indirect memory access

JitOutVar

generateOutVar(Varnode out)

Generate a new op output variable for eventual placement in the use-def graph

JitDataFlowBlockAnalyzer

getAnalyzer(JitControlFlowModel.JitBlock block)

Get the per-block data flow analyzer for the given basic block

JitDataFlowArithmetic

getArithmetic()

Get the model's arithmetic that places p-code ops into the use-def graph

JitOp

getJitOp(PcodeOp op)

Get the use-def op node for the given p-code op

JitDataFlowUseropLibrary

getLibrary()

Get a wrapper library that places userop calls into the use-def graph

protected JitDataFlowBlockAnalyzer

getOrCreateAnalyzer(JitControlFlowModel.JitBlock block)

<T extends JitOp> T

notifyOp(T op)

Add the given JitOp to the use-def graph

List<JitPhiOp>

phiNodes()

Get all the phi nodes in the use-def graph.

List<JitSyntheticOp>

synthNodes()

Get all the synthetic op nodes in the use-def graph.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- JitDataFlowModel
  
  public JitDataFlowModel(JitAnalysisContext context, JitControlFlowModel cfm)
  
  Construct the data flow model.
  Analysis is performed as part of constructing the model.
  
  Parameters:
  
  context - the analysis context
  
  cfm - the control flow model
Method Details
- getArithmetic
  
  public JitDataFlowArithmetic getArithmetic()
  
  Get the model's arithmetic that places p-code ops into the use-def graph
  
  Returns:
  
  the arithmetic
- getLibrary
  
  public JitDataFlowUseropLibrary getLibrary()
  
  Get a wrapper library that places userop calls into the use-def graph
  
  Returns:
  
  the library
- phiNodes
  
  public List<JitPhiOp> phiNodes()
  
  Get all the phi nodes in the use-def graph.
  
  Returns:
  
  the list of phi nodes
- synthNodes
  
  public List<JitSyntheticOp> synthNodes()
  
  Get all the synthetic op nodes in the use-def graph.
  
  Returns:
  
  the list of synthetic op nodes
- generateOutVar
  
  public JitOutVar generateOutVar(Varnode out)
  
  Generate a new op output variable for eventual placement in the use-def graph
  Parameters:
  
  out - the varnode describing the corresponding PcodeOp's output.
  
  Returns:
  
  the generated variable
  
  See Also:
  
  JitDataFlowModel
- generateDirectMemoryVar
  
  public JitDirectMemoryVar generateDirectMemoryVar(Varnode vn)
  
  Generate a variable representing a direct memory access
  
  Parameters:
  
  vn - the varnode, which ought to be neither register nor unique
  
  Returns:
  
  the variable
- generateIndirectMemoryVar
  
  public JitIndirectMemoryVar generateIndirectMemoryVar(AddressSpace space, JitVal offset, int size, boolean quantize)
  
  Generate a variable representing an indirect memory access
  Parameters:
  
  space - the address space containing the variable, which out to be neither register nor unique
  
  offset - another variable describing the (dynamic) offset of the variable in the given space
  
  size - the number of bytes in the variable
  
  quantize - true if the offset should be quantized (as in getVar).
  
  Returns:
  
  the variable
  
  See Also:
  
  JitIndirectMemoryVar
  
  JitLoadOp
  
  JitStoreOp
- notifyOp
  
  public <T extends JitOp> T notifyOp(T op)
  
  Add the given JitOp to the use-def graph
  Type Parameters:
  
  T - the type of the node
  
  Parameters:
  
  op - the op
  
  Returns:
  
  the same op
  
  See Also:
  
  JitDataFlowModel
- getJitOp
  
  public JitOp getJitOp(PcodeOp op)
  
  Get the use-def op node for the given p-code op
  NOTE: When used in testing, if the passage is manufactured from a PcodeProgram, the decoder will re-write the p-code ops as JitPassage.DecodedPcodeOps. Be sure to pass an op to this method that comes from the resulting JitPassage, not the original program, or else this method will certainly return null.
  
  Parameters:
  
  op - the p-code op from the source passage
  
  Returns:
  
  the node from the use-def graph, if present, or null
- allValues
  
  public Set<JitVal> allValues()
  
  Get all values (and variables) in the use-def graph
  
  Returns:
  
  the set of values
- allValuesSorted
  
  public List<JitVal> allValuesSorted()
  
  Same as allValues(), but sorted by ID with constants at the top
  
  Returns:
  
  the list of values
- getOrCreateAnalyzer
  
  protected JitDataFlowBlockAnalyzer getOrCreateAnalyzer(JitControlFlowModel.JitBlock block)
- getAnalyzer
  
  public JitDataFlowBlockAnalyzer getAnalyzer(JitControlFlowModel.JitBlock block)
  
  Get the per-block data flow analyzer for the given basic block
  
  Parameters:
  
  block - the block
  
  Returns:
  
  the analyzer
- analyze
  
  protected void analyze()
  
  Construct the use-def graph
- dumpResult
  
  public void dumpResult()
  
  For diagnostics: Dump the analysis result to stderr
  See Also:
  
  JitCompiler.Diag.PRINT_DFM
- dumpSynth
  
  public void dumpSynth()
  
  For diagnostics: Dump the synthetic ops to stderr
  See Also:
  
  JitCompiler.Diag.PRINT_SYNTH
- exportGraphviz
  
  public void exportGraphviz(File file)
  
  Generate a graphviz .dot file to visualize the use-def graph.
  WARNING: This is an internal diagnostic that is only as complete as it needed to be.
  
  Parameters:
  
  file - the output file

Class JitDataFlowModel

Intra-block analysis

Inter-block analysis

Nested Class Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Details

JitDataFlowModel

Method Details

getArithmetic

getLibrary

phiNodes

synthNodes

generateOutVar

generateDirectMemoryVar

generateIndirectMemoryVar

notifyOp

getJitOp

allValues

allValuesSorted

getOrCreateAnalyzer

getAnalyzer

analyze

dumpResult

dumpSynth

exportGraphviz