Class HashStore

java.lang.Object
ghidra.program.model.correlate.HashStore

public class HashStore extends Object
HashStore is a sorted, basic-block aware, store for Instruction "n-grams" to help quickly match similar sequences of Instructions between two functions. The Instructions comprising a single n-gram are hashed for quick lookup by the main matching algorithm (HashedFunctionAddressCorrelation). Hash diversity is important to minimize collisions, even though the number of hashes calculated for a single function pair match is small. Hashes are built and sorted respectively using the calcHashes() and insertHashes() methods. The main sort is on the number of collisions for a hash (indicating that there are duplicate or near duplicate instruction sequences), the hashes with fewer (or no) duplicates come first. The secondary sort is on "n", the number of Instructions in the n-gram, which effectively describes the significance of the match, or how unlikely the match is to occur at random. The main matching algorithm effectively creates a HashSort for both functions, and then in a loop calls hash = getFirstEntry() on one side to get the most significant possible match getEntry(has) to see if there is a matching n-gram on the other side If there is a match it is declared to the sort with the matchHash() call, allowing overlapping n-grams to be removed and deconflicting information to be updated. If there is no match, hashes can be removed with the removeHash() method to allow new hashes to move to the top of the sort. The store uses a couple of methods to help deconflict very similar sequences of instructions within the same function. Primarily, the sort is basic-block aware. All n-grams are contained within a single basic block, and when an initial match is found, hashes for other n-grams within that block (and its matching block on the other side) are modified so that n-grams within that block pair can only match each other.
  • Constructor Details

  • Method Details

    • getTotalInstructions

      public int getTotalInstructions()
      Returns:
      total number of Instructions in the whole function
    • numMatchedInstructions

      public int numMatchedInstructions()
      Returns:
      number of instructions that have been matched so far
    • removeHash

      public void removeHash(HashEntry hashEntry)
      Remove a particular HashEntry. This may affect multiple instructions.
      Parameters:
      hashEntry - is the entry
    • calcHashes

      public void calcHashes(int minLength, int maxLength, boolean wholeBlock, boolean matchOnly, HashCalculator hashCalc) throws MemoryAccessException
      Calculate hashes for all blocks
      Parameters:
      minLength - is the minimum length of an n-gram for these passes
      maxLength - is the maximum length of an n-gram for these passes
      wholeBlock - if true, allows blocks that are smaller than the minimum length to be considered as 1 n-gram.
      matchOnly - if true, only generates n-grams for sequences in previously matched blocks
      hashCalc - is the hash function
      Throws:
      MemoryAccessException
    • insertHashes

      public void insertHashes()
      Insert all hashes associated with unknown (i.e not matched) blocks and instructions
    • matchHash

      public void matchHash(HashStore.NgramMatch match, List<Instruction> instResult, List<CodeBlock> blockResult)
      Mark a particular n-gram hash and instruction as having a match. Set of instructions covered by n-gram are removed, and data structures are updated
      Parameters:
      match - is the n-gram being declared as a match
      instResult - collects the explicit set of Instructions matched
      blockResult - collects the explicit set of CodeBlocks matched
    • extendMatch

      public static void extendMatch(int nGramSize, InstructHash srcInstruct, HashStore.NgramMatch srcMatch, InstructHash destInstruct, HashStore.NgramMatch destMatch, HashCalculator hashCalc) throws MemoryAccessException
      Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram. The match is extended if the Instruction adjacent to the n-gram, and its corresponding pair on the other side, hash to the same value using the hash function. The NgramMatch objects are updated to reflect the original n-gram match plus any additional extension.
      Parameters:
      nGramSize - is the original size of the matching n-gram.
      srcInstruct - is the first Instruction in the "source" n-gram
      srcMatch - is the "source" NgramMatch object to be populate
      destInstruct - is the first Instruction in the "destination" n-gram
      destMatch - is the "destination" NgramMatch object to populate
      hashCalc - is the hash function object
      Throws:
      MemoryAccessException
    • getUnmatchedInstructions

      public List<Instruction> getUnmatchedInstructions()
      Returns:
      list of unmatched instructions across the whole function
    • clearSort

      public void clearSort()
      Clear the main sort structures, but preserve blocks and instructions
    • isEmpty

      public boolean isEmpty()
      Returns:
      true if there are no n-grams left in the sort
    • getFirstEntry

      public HashEntry getFirstEntry()
      Returns:
      the first HashEntry in the sort. The least number of matching n-grams and the biggest n-gram.
    • getEntry

      public HashEntry getEntry(Hash hash)
      Get the HashEntry corresponding to a given hash
      Parameters:
      hash - is the Hash to match
      Returns:
      the set of n-grams (HashEntry) matching this hash
    • getBlock

      public Block getBlock(Address addr)
      Get the basic-block with the corresponding start Address
      Parameters:
      addr - is the starting address
      Returns:
      the Block object
    • getMonitor

      public TaskMonitor getMonitor()
      Returns:
      the TaskMonitor for this store