Class JitDataFlowModel

java.lang.Object
ghidra.pcode.emu.jit.analysis.JitDataFlowModel

public class JitDataFlowModel extends Object
The data flow analysis for JIT-accelerated emulation.

This implements the Data Flow Analysis phase of the JitCompiler. The result is a use-def graph. The graph follows Static Single Assignment (SSA) form, in that each definition of a variable, even if it's at the same address as a previous definition, is given a unique identifier. The graph is bipartite with ops on one side and values on the other. Please node the distinction between a varnode and a variable in this context. A varnode refers to the address and size in the machine's state. For better or for worse, this is often referred to as a "variable" in other contexts. A variable in the SSA sense is a unique "instance" of a varnode with precisely one definition. Consider the following x86 assembly:

 MOV RAX, qword ptr [...]
 ADD RAX, RDX
 MOV qword ptr [...], RAX
 

Ignoring RAM, there are two varnodes at play, named for the registers they represent: RAX and RDX. However, there are three variables. The first is an instance of RAX, defined by the first MOV instruction. The second is an instance of RDX, which is implicitly defined as an input to the passage. The third is another instance of of RAX, defined by the ADD instruction. These could be given unique names RAX1, RDXin, and RAX2, respectively. Thus, the ADD instruction uses RAX1 and RDXin, to define RAX2. The last MOV instruction uses RAX2. If we plot each instruction and variable in a graph, drawing edges for each use and definition, we get a use-def graph.

Our analysis produces a use-def graph for the passage's p-code (not instructions) in two steps: First, we analyze each basic block independently. There are a lot of nuts and bolts in the implementation, but the analysis is achieved by straightforward interpretation of each block's p-code ops. Second, we connect the blocks' use-def graphs together using phi nodes where appropriate, according to the control flow.

Intra-block analysis

For each block, we create a p-code interpreter consisting of a JitDataFlowState and JitDataFlowExecutor. Both are given this model's JitDataFlowArithmetic, which populates the use-def graph. We then feed the block's p-code into the executor. The block gets a fresh JitDataFlowState, so that its result has no dependency on the interpretation of any other block, except in the numbering of variable identifiers; those must be unique across the model.

During interpretation, varnode accesses generate value nodes. When a constant varnode is accessed, it simply creates a JitConstVal. When an op produces an output, it generates a JitOutVar and places it into the interpreter's state for its varnode. When a varnode is read, the interpreter examines its state for the last definition. If one is found, the variable is returned, its use noted, and nothing new is generated. Otherwise, a JitMissingVar is generated. Note that the interpreter does not track memory variables in its state, because the JIT translator does not seek to optimize these. At run time, such accesses will affect the emulator's state immediately. Registers and Sleigh uniques, on the other hand, are allocated as JVM locals, so we must know how they are used and defined. Direct memory accesses generate JitDirectMemoryVar and JitMemoryOutVar. Indirect memory accesses are denoted by the load and store op nodes, not as variables. There is a dummy JitIndirectMemoryVar singleton, so that the state can return something when the memory address is not fixed.

Inter-block analysis

Up to this point, each block's use-def sub-graph is disconnected from the others'. We define each missing variable generated during block interpretation as a phi op. A phi op is said to belong to the block that generated the missing variable. We seek options for the phi op by examining the block's inward flows. For each source block, we check the most recent definition of the sought varnode. If one is present, the option is added to the phi op. Otherwise, we create an option by generating another phi op and taking its output. The new phi op belongs to the source block, and we recurse to seek its options. If a cycle is encountered, or we encounter a block with no inward flows, we do not recurse. An input variable is generated whenever we encounter a passage entry, indicating the variable could be defined outside the passage.

Note that the resulting phi ops may not adhere precisely to the formal definition of phi node. A phi op may have only one option. The recursive part of the option seeking algorithm generates chains of phi ops such that an option must come from an immediately upstream block, even if that block does not offer a direct definition. This may produce long chains when a varnode use is several block flows removed from a possible definition. We had considered simplifying/removing single-option phi ops afterward, but we found it too onerous, and the output bytecode is not improved. We do not generate bytecode for phi ops; they are synthetic and only used for analysis.

  • Constructor Details

    • JitDataFlowModel

      public JitDataFlowModel(JitAnalysisContext context, JitControlFlowModel cfm)
      Construct the data flow model.

      Analysis is performed as part of constructing the model.

      Parameters:
      context - the analysis context
      cfm - the control flow model
  • Method Details

    • getArithmetic

      public JitDataFlowArithmetic getArithmetic()
      Get the model's arithmetic that places p-code ops into the use-def graph
      Returns:
      the arithmetic
    • getLibrary

      public JitDataFlowUseropLibrary getLibrary()
      Get a wrapper library that places userop calls into the use-def graph
      Returns:
      the library
    • phiNodes

      public List<JitPhiOp> phiNodes()
      Get all the phi nodes in the use-def graph.
      Returns:
      the list of phi nodes
    • synthNodes

      public List<JitSyntheticOp> synthNodes()
      Get all the synthetic op nodes in the use-def graph.
      Returns:
      the list of synthetic op nodes
    • generateOutVar

      public JitOutVar generateOutVar(Varnode out)
      Generate a new op output variable for eventual placement in the use-def graph
      Parameters:
      out - the varnode describing the corresponding PcodeOp's output.
      Returns:
      the generated variable
      See Also:
    • generateDirectMemoryVar

      public JitDirectMemoryVar generateDirectMemoryVar(Varnode vn)
      Generate a variable representing a direct memory access
      Parameters:
      vn - the varnode, which ought to be neither register nor unique
      Returns:
      the variable
    • generateIndirectMemoryVar

      public JitIndirectMemoryVar generateIndirectMemoryVar(AddressSpace space, JitVal offset, int size, boolean quantize)
      Generate a variable representing an indirect memory access
      Parameters:
      space - the address space containing the variable, which out to be neither register nor unique
      offset - another variable describing the (dynamic) offset of the variable in the given space
      size - the number of bytes in the variable
      quantize - true if the offset should be quantized (as in getVar).
      Returns:
      the variable
      See Also:
    • notifyOp

      public <T extends JitOp> T notifyOp(T op)
      Add the given JitOp to the use-def graph
      Type Parameters:
      T - the type of the node
      Parameters:
      op - the op
      Returns:
      the same op
      See Also:
    • getJitOp

      public JitOp getJitOp(PcodeOp op)
      Get the use-def op node for the given p-code op

      NOTE: When used in testing, if the passage is manufactured from a PcodeProgram, the decoder will re-write the p-code ops as JitPassage.DecodedPcodeOps. Be sure to pass an op to this method that comes from the resulting JitPassage, not the original program, or else this method will certainly return null.

      Parameters:
      op - the p-code op from the source passage
      Returns:
      the node from the use-def graph, if present, or null
    • allValues

      public Set<JitVal> allValues()
      Get all values (and variables) in the use-def graph
      Returns:
      the set of values
    • allValuesSorted

      public List<JitVal> allValuesSorted()
      Same as allValues(), but sorted by ID with constants at the top
      Returns:
      the list of values
    • getOrCreateAnalyzer

      protected JitDataFlowBlockAnalyzer getOrCreateAnalyzer(JitControlFlowModel.JitBlock block)
    • getAnalyzer

      Get the per-block data flow analyzer for the given basic block
      Parameters:
      block - the block
      Returns:
      the analyzer
    • analyze

      protected void analyze()
      Construct the use-def graph
    • dumpResult

      public void dumpResult()
      For diagnostics: Dump the analysis result to stderr
      See Also:
    • dumpSynth

      public void dumpSynth()
      For diagnostics: Dump the synthetic ops to stderr
      See Also:
    • exportGraphviz

      public void exportGraphviz(File file)
      Generate a graphviz .dot file to visualize the use-def graph.

      WARNING: This is an internal diagnostic that is only as complete as it needed to be.

      Parameters:
      file - the output file