Package ghidra.program.model.correlate
Class HashStore
java.lang.Object
ghidra.program.model.correlate.HashStore
HashStore is a sorted, basic-block aware, store for Instruction "n-grams" to help quickly match similar
sequences of Instructions between two functions. The Instructions comprising a single n-gram are hashed
for quick lookup by the main matching algorithm (HashedFunctionAddressCorrelation). Hash diversity is
important to minimize collisions, even though the number of hashes calculated for a single function pair
match is small.
Hashes are built and sorted respectively using the calcHashes() and insertHashes() methods. The main sort
is on the number of collisions for a hash (indicating that there are duplicate or near duplicate instruction
sequences), the hashes with fewer (or no) duplicates come first. The secondary sort is on
"n", the number of Instructions in the n-gram, which effectively describes the significance of the match, or how
unlikely the match is to occur at random. The main matching algorithm effectively creates a HashSort for both
functions, and then in a loop calls
hash = getFirstEntry() on one side to get the most significant possible match
getEntry(has) to see if there is a matching n-gram on the other side
If there is a match it is declared to the sort with the matchHash() call, allowing overlapping n-grams to be
removed and deconflicting information to be updated. If there is no match, hashes can be removed with the
removeHash() method to allow new hashes to move to the top of the sort.
The store uses a couple of methods to help deconflict very similar sequences of instructions within the same function.
Primarily, the sort is basic-block aware. All n-grams are contained within a single basic block, and when an initial
match is found, hashes for other n-grams within that block (and its matching block on the other side) are modified
so that n-grams within that block pair can only match each other.
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic class
Class explicitly labeling (one-side of) a matching n-gram pair. -
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
calcHashes
(int minLength, int maxLength, boolean wholeBlock, boolean matchOnly, HashCalculator hashCalc) Calculate hashes for all blocksvoid
Clear the main sort structures, but preserve blocks and instructionsstatic void
extendMatch
(int nGramSize, InstructHash srcInstruct, HashStore.NgramMatch srcMatch, InstructHash destInstruct, HashStore.NgramMatch destMatch, HashCalculator hashCalc) Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram.Get the basic-block with the corresponding start AddressGet the HashEntry corresponding to a given hashint
void
Insert all hashes associated with unknown (i.e not matched) blocks and instructionsboolean
isEmpty()
void
matchHash
(HashStore.NgramMatch match, List<Instruction> instResult, List<CodeBlock> blockResult) Mark a particular n-gram hash and instruction as having a match.int
void
removeHash
(HashEntry hashEntry) Remove a particular HashEntry.
-
Constructor Details
-
HashStore
- Throws:
CancelledException
-
-
Method Details
-
getTotalInstructions
public int getTotalInstructions()- Returns:
- total number of Instructions in the whole function
-
numMatchedInstructions
public int numMatchedInstructions()- Returns:
- number of instructions that have been matched so far
-
removeHash
Remove a particular HashEntry. This may affect multiple instructions.- Parameters:
hashEntry
- is the entry
-
calcHashes
public void calcHashes(int minLength, int maxLength, boolean wholeBlock, boolean matchOnly, HashCalculator hashCalc) throws MemoryAccessException Calculate hashes for all blocks- Parameters:
minLength
- is the minimum length of an n-gram for these passesmaxLength
- is the maximum length of an n-gram for these passeswholeBlock
- if true, allows blocks that are smaller than the minimum length to be considered as 1 n-gram.matchOnly
- if true, only generates n-grams for sequences in previously matched blockshashCalc
- is the hash function- Throws:
MemoryAccessException
-
insertHashes
public void insertHashes()Insert all hashes associated with unknown (i.e not matched) blocks and instructions -
matchHash
public void matchHash(HashStore.NgramMatch match, List<Instruction> instResult, List<CodeBlock> blockResult) Mark a particular n-gram hash and instruction as having a match. Set of instructions covered by n-gram are removed, and data structures are updated- Parameters:
match
- is the n-gram being declared as a matchinstResult
- collects the explicit set of Instructions matchedblockResult
- collects the explicit set of CodeBlocks matched
-
extendMatch
public static void extendMatch(int nGramSize, InstructHash srcInstruct, HashStore.NgramMatch srcMatch, InstructHash destInstruct, HashStore.NgramMatch destMatch, HashCalculator hashCalc) throws MemoryAccessException Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram. The match is extended if the Instruction adjacent to the n-gram, and its corresponding pair on the other side, hash to the same value using the hash function. The NgramMatch objects are updated to reflect the original n-gram match plus any additional extension.- Parameters:
nGramSize
- is the original size of the matching n-gram.srcInstruct
- is the first Instruction in the "source" n-gramsrcMatch
- is the "source" NgramMatch object to be populatedestInstruct
- is the first Instruction in the "destination" n-gramdestMatch
- is the "destination" NgramMatch object to populatehashCalc
- is the hash function object- Throws:
MemoryAccessException
-
getUnmatchedInstructions
- Returns:
- list of unmatched instructions across the whole function
-
clearSort
public void clearSort()Clear the main sort structures, but preserve blocks and instructions -
isEmpty
public boolean isEmpty()- Returns:
- true if there are no n-grams left in the sort
-
getFirstEntry
- Returns:
- the first HashEntry in the sort. The least number of matching n-grams and the biggest n-gram.
-
getEntry
Get the HashEntry corresponding to a given hash- Parameters:
hash
- is the Hash to match- Returns:
- the set of n-grams (HashEntry) matching this hash
-
getBlock
Get the basic-block with the corresponding start Address- Parameters:
addr
- is the starting address- Returns:
- the Block object
-
getMonitor
- Returns:
- the TaskMonitor for this store
-