Class ExecutableScorer
java.lang.Object
ghidra.features.bsim.query.client.ExecutableScorer
- Direct Known Subclasses:
ExecutableScorerSingle
Class for accumulating a matrix of scores between pairs of executables.
ExecutableRecords are registered with addExecutable, and scores are accumulated
by repeatedly passing clusters of functions to scoreCluster.
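The register-then-accumulate workflow can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Ghidra's implementation: the class ToyScoreMatrix is invented, executables are plain MD5 strings, each cluster is reduced to the executable index of every matching function plus one significance value shared by all of its pairs, and pairs inside the same executable are skipped (an assumption). The real class works with ExecutableRecord, DescriptionManager and LSH vectors.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of a score matrix between executables (not Ghidra's API).
 * Executables are plain MD5 strings; a "cluster" is reduced to the executable index
 * of each matching function plus a single significance value shared by all its pairs.
 */
class ToyScoreMatrix {
    private final List<String> executables = new ArrayList<>();
    private float[][] scores;

    /** Register an executable and return its 1-based index (index 0 is left unused). */
    int addExecutable(String md5) {
        executables.add(md5);
        return executables.size();
    }

    /** Allocate the matrix after registration; Java zero-fills new float arrays. */
    void initializeScores() {
        int n = executables.size();
        scores = new float[n + 1][n + 1];
    }

    /** Let every pair of functions in the cluster add its significance to the matrix. */
    void scoreCluster(int[] exeIndexOfFunction, float significance) {
        for (int i = 0; i < exeIndexOfFunction.length; i++) {
            for (int j = i + 1; j < exeIndexOfFunction.length; j++) {
                int a = exeIndexOfFunction[i];
                int b = exeIndexOfFunction[j];
                if (a == b) {
                    continue; // assumption: same-executable pairs are not scored here
                }
                scores[a][b] += significance;
                scores[b][a] += significance; // keep the matrix symmetric
            }
        }
    }

    float getScore(int a, int b) {
        return scores[a][b];
    }

    int numExecutables() {
        return executables.size();
    }
}
```

For example, scoring the cluster {1, 2, 2} with significance 2.0 adds 4.0 to cell (1, 2): two of its three function pairs span executables 1 and 2, and the third stays inside executable 2.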
-
Nested Class Summary
- static class ExecutableScorer.FunctionPair: Container for a pair of FunctionDescriptions, possibly from different DescriptionManagers, along with similarity/significance information.
-
Field Summary
- protected DescriptionManager executableSet
- protected Map<Integer, ExecutableRecord> index2ExeMap
- protected double sigThreshold
- protected double simThreshold
- protected ExecutableRecord singleExe
- protected int singleExeXref
-
Constructor Summary
- ExecutableScorer()
-
Method Summary
- protected void addExecutable(ExecutableRecord exeRecord): Register an executable for the scoring matrix.
- protected boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold): Check whether we are going to have too many pairs.
- void commitSelfScore(): Commit the singled-out executable's self-significance score to permanent storage.
- protected void commitSelfScore(String md5, float selfScore): Commit a self-significance score for a specific executable to permanent storage.
- int countSelfScores()
- ExecutableRecord getExecutable(int index): Get the index-th executable.
- ExecutableRecord getExecutable(String md5): Retrieve a specific ExecutableRecord by md5.
- float getNormalizedScore(int a, boolean useLibrary)
- float getNormalizedScore(int a, int b, boolean useLibrary): Compute a score comparing two executables, normalized between 0.0 and 1.0, indicating the percentage of functional similarity between the two.
- float getScore(int a): Get the score of an executable as compared to the singled-out executable.
- float getScore(int a, int b): Return the similarity score between two executables.
- float getSelfScore(int a): Retrieve the similarity score of an executable with itself.
- double getSigThreshold()
- double getSimThreshold()
- ExecutableRecord getSingularExecutable()
- float getSingularSelfScore()
- protected void initializeScores(): Initialize the scoring matrix with zero.
- protected void labelAndFilter(DescriptionManager manage): For every executable in the container manage, if the executable matches a registered executable, set its xref index to the registered executable's xref index; otherwise, set it to zero to indicate that the executable should be filtered.
- int numExecutables()
- protected List<ExecutableScorer.FunctionPair> pairFunctions(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2func, List<VectorResult> vectors, int hitcount, int pairThreshold): Generate all pairs of functions for any function associated with a list of vectors. For each pair of functions, generate the FunctionPair object with the corresponding similarity and significance.
- protected void populateExecutableIndex(): Assuming all executables have been registered, establish index values for all executables to facilitate accessing the scoring matrix.
- void resetStorage(double simThresh, double sigThresh): Clear any persistent storage for self-significance scores, and establish new thresholds.
- protected boolean scoreCluster(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2Functions, List<VectorResult> vectors, int hitcount, int pairThreshold): Given a cluster of vectors, the set of functions associated with each vector, a similarity threshold, and the total number of functions in the cluster, let each pair of functions contribute to the score matrix.
- protected void scorePair(ExecutableScorer.FunctionPair pair): Given a pair of score-contributing functions that have been fully filtered, add the score into the matrix.
- void setSingleExecutable(String md5): Set a single executable as the focus, enabling the single-parameter getScore(int).
- protected void transferSettings(info): Save off information about database settings to inform later queries.
-
Field Details
-
executableSet
-
index2ExeMap
-
simThreshold
protected double simThreshold -
sigThreshold
protected double sigThreshold -
singleExe
-
singleExeXref
protected int singleExeXref
-
-
Constructor Details
-
ExecutableScorer
public ExecutableScorer()
-
-
Method Details
-
getSimThreshold
public double getSimThreshold()- Returns:
- the similarity threshold associated with these scores OR -1.0 if no threshold has been set
-
getSigThreshold
public double getSigThreshold()- Returns:
- the significance threshold associated with these scores
-
setSingleExecutable
Set a single executable as the focus, enabling the single-parameter getScore(int).- Parameters:
md5- is the 32-character md5 hash of the executable to single out- Throws:
LSHException- if we can't find the executable
-
countSelfScores
public int countSelfScores()- Returns:
- the number of executable self-significance scores that are (or will be) available
-
resetStorage
Clear any persistent storage for self-significance scores, and establish new thresholds- Parameters:
simThresh- is the new similarity threshold
sigThresh- is the new significance threshold- Throws:
LSHException- if there's a problem clearing storage
-
numExecutables
public int numExecutables()- Returns:
- the number of executables being compared
-
getSingularExecutable
- Returns:
- ExecutableRecord being singled out for comparison
-
getSingularSelfScore
public float getSingularSelfScore() -
getExecutable
Retrieve a specific ExecutableRecord by md5- Parameters:
md5- is the MD5 string- Returns:
- the matching ExecutableRecord
- Throws:
LSHException- if the ExecutableRecord isn't present
-
getExecutable
Get the index-th executable. NOTE: The first index is 1.- Parameters:
index- is the index of the executable to retrieve- Returns:
- the ExecutableRecord describing the executable
-
transferSettings
Save off information about database settings to inform later queries- Parameters:
info- is the information object returned by the database
-
addExecutable
Register an executable for the scoring matrix- Parameters:
exeRecord- is the ExecutableRecord to register- Throws:
LSHException- if the executable was already registered with different metadata
-
populateExecutableIndex
protected void populateExecutableIndex() Assuming all executables have been registered, establish index values for all executables to facilitate accessing the scoring matrix. -
labelAndFilter
For every executable in the container manage, if the executable matches a registered executable, set its xref index to the registered executable's xref index; otherwise, set it to zero to indicate that the executable should be filtered.- Parameters:
manage- is the container of ExecutableRecords to label
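The indexing and filtering contract (indices start at 1, and an xref index of 0 marks an executable to filter) can be sketched as follows. ToyRecord and ToyIndexer are hypothetical stand-ins for ExecutableRecord and the scorer, and assigning indices in registration order is an assumption; the real populateExecutableIndex may order executables differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for ExecutableRecord: an md5 plus a mutable xref index. */
class ToyRecord {
    final String md5;
    int xrefIndex;

    ToyRecord(String md5) {
        this.md5 = md5;
    }
}

/** Hypothetical sketch of populateExecutableIndex() followed by labelAndFilter(). */
class ToyIndexer {
    private final Map<String, Integer> md5ToIndex = new HashMap<>();

    /** Assign 1-based indices in registration order, leaving 0 free to mean "filtered". */
    void populateExecutableIndex(List<String> registeredMd5s) {
        md5ToIndex.clear();
        for (int i = 0; i < registeredMd5s.size(); i++) {
            md5ToIndex.put(registeredMd5s.get(i), i + 1);
        }
    }

    /** Copy the registered index onto each record, or 0 if its executable is not registered. */
    void labelAndFilter(List<ToyRecord> container) {
        for (ToyRecord rec : container) {
            rec.xrefIndex = md5ToIndex.getOrDefault(rec.md5, 0);
        }
    }
}
```

Reserving 0 lets later stages drop a pair with a single integer comparison instead of a second map lookup.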
-
initializeScores
protected void initializeScores() Initialize the scoring matrix with zero. The matrix size is the number of executables registered with addExecutable(). -
scorePair
Given a pair of score-contributing functions that have been fully filtered, add the score into the matrix.- Parameters:
pair- is the pair of functions
-
getScore
public float getScore(int a, int b) Return the similarity score between two executables- Parameters:
a- is the index matching getXrefIndex() of the first executable
b- is the index matching getXrefIndex() of the second executable- Returns:
- the similarity score
-
getSelfScore
Retrieve the similarity score of an executable with itself- Parameters:
a- is the index of the executable- Returns:
- its self-similarity score
- Throws:
LSHException- if the score is not accessible
-
commitSelfScore
Commit the singled-out executable's self-significance score to permanent storage- Throws:
LSHException- if there's a problem writing, or the operation isn't supported
-
commitSelfScore
Commit a self-significance score for a specific executable to permanent storage- Parameters:
md5- is the 32-character md5 hash of the executable
selfScore- is the self-significance score- Throws:
LSHException- if there's a problem writing, or the operation isn't supported
-
getScore
public float getScore(int a) Get the score of an executable as compared to the singled-out executable.- Parameters:
a- is the index of the executable- Returns:
- the score
-
getNormalizedScore
Computes a score comparing two executables, normalized between 0.0 and 1.0, indicating the percentage of functional similarity between the two: 1.0 means "identical" and 0.0 means completely "dissimilar".- Parameters:
a- is the index of the first executable
b- is the index of the second executable
useLibrary- is true if the score measures percent "containment" of the smaller executable in the larger.- Returns:
- the normalized score
- Throws:
LSHException- if the self-scores for either executable are not available
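This page does not state how the self-scores enter the normalization. One plausible scheme consistent with the description is sketched below, under stated assumptions: the raw pair score is divided by the smaller self-score when useLibrary is true (how much of the smaller executable is contained in the larger) and by the larger self-score otherwise, then clamped to [0, 1]. ToyNormalization is a hypothetical name, and a non-positive self-score stands in for the LSHException raised when self-scores are unavailable.

```java
/** Hypothetical sketch of normalizing a pairwise score by self-scores (assumed formula). */
final class ToyNormalization {
    private ToyNormalization() {
    }

    /**
     * Assumption: divide by the smaller self-score for library "containment" and by the
     * larger self-score otherwise, then clamp into [0, 1].
     */
    static float normalizedScore(float pairScore, float selfA, float selfB, boolean useLibrary) {
        if (selfA <= 0.0f || selfB <= 0.0f) {
            // Stand-in for the LSHException thrown when self-scores are not available
            throw new IllegalArgumentException("self-scores must be positive");
        }
        float denom = useLibrary ? Math.min(selfA, selfB) : Math.max(selfA, selfB);
        return Math.max(0.0f, Math.min(1.0f, pairScore / denom));
    }
}
```

Under this assumption, with self-scores 40 and 20 and a pair score of 20, containment mode reports 1.0 (the smaller executable is fully contained) while the default mode reports 0.5.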
-
getNormalizedScore
- Throws:
LSHException
-
pairFunctions
protected List<ExecutableScorer.FunctionPair> pairFunctions(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2func, List<VectorResult> vectors, int hitcount, int pairThreshold) Generate all pairs of functions for any function associated with a list of vectors. For each pair of functions, generate the FunctionPair object with the corresponding similarity and significance. This is inherently quadratic, but we try to be efficient: duplicate vector pairs are compared only once and their similarity is cached, and each function pair is generated only once, i.e., (funcA,funcB) but not (funcB,funcA).- Parameters:
vectorFactory- provides weights for significance scores
vec2func- is the list of FunctionDescription sets associated with each vector
vectors- is the list of vectors
hitcount- is the cumulative total of functions
pairThreshold- is the maximum number of pairs that can be produced- Returns:
- the list of FunctionPairs, or null if pairThreshold is exceeded
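The pairing strategy described above (each vector pair compared once, each unordered function pair emitted once, null on overflow) can be sketched with plain Java types. Everything here is hypothetical: ToyPair and ToyPairing are invented names, functions are strings, vectors are double[], cosine similarity stands in for the LSHVectorFactory comparison, and significance is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for FunctionPair: two function names and their vector similarity. */
final class ToyPair {
    final String funcA;
    final String funcB;
    final double similarity;

    ToyPair(String funcA, String funcB, double similarity) {
        this.funcA = funcA;
        this.funcB = funcB;
        this.similarity = similarity;
    }
}

/** Hypothetical sketch of the pairing strategy described for pairFunctions. */
final class ToyPairing {
    private ToyPairing() {
    }

    /** Cosine similarity, standing in for the LSHVectorFactory-based comparison (assumption). */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0;
        double nx = 0.0;
        double ny = 0.0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            nx += x[k] * x[k];
            ny += y[k] * y[k];
        }
        return dot / Math.sqrt(nx * ny);
    }

    /**
     * vectors.get(i) is shared by every function in vec2func.get(i). Each vector pair
     * (i <= j) is compared once and its similarity reused for all of its function pairs.
     * Returns null as soon as more than pairThreshold pairs would be produced.
     */
    static List<ToyPair> pairFunctions(List<double[]> vectors, List<List<String>> vec2func,
            int pairThreshold) {
        List<ToyPair> pairs = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = i; j < vectors.size(); j++) {
                // One comparison per vector pair; a vector is treated as identical to itself
                double sim = (i == j) ? 1.0 : cosine(vectors.get(i), vectors.get(j));
                List<String> fi = vec2func.get(i);
                List<String> fj = vec2func.get(j);
                for (int a = 0; a < fi.size(); a++) {
                    // Within one vector's own set, start at a + 1: emit (A,B) but never (B,A)
                    int bStart = (i == j) ? a + 1 : 0;
                    for (int b = bStart; b < fj.size(); b++) {
                        if (pairs.size() >= pairThreshold) {
                            return null; // pair limit exceeded
                        }
                        pairs.add(new ToyPair(fi.get(a), fj.get(b), sim));
                    }
                }
            }
        }
        return pairs;
    }
}
```

Reusing one similarity per vector pair matters because many functions can share an identical vector, so the quadratic cost falls mostly on cheap object creation rather than on vector comparisons.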
-
checkPreliminaryPairThreshold
protected boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold) Check whether we are going to have too many pairs. The check is preliminary because we haven't yet fetched the functions.- Parameters:
hitcount- is the total number of pairs to fetch
pairThreshold- is the maximum number of pairs allowed- Returns:
- true if the pair threshold is not exceeded
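A natural preliminary bound is the number of unordered pairs among n functions, n(n-1)/2. Note that this entry describes hitcount as a number of pairs, while scoreCluster describes it as a number of functions; the hypothetical sketch below assumes the latter and is not confirmed against Ghidra's source. It uses long arithmetic because the product n(n-1) overflows int for clusters with tens of thousands of functions.

```java
/** Hypothetical sketch of a preliminary pair-count check (assumed bound, not Ghidra's code). */
final class ToyPairThreshold {
    private ToyPairThreshold() {
    }

    /** Upper bound on unordered pairs among n functions, computed in long to avoid overflow. */
    static long maxPairs(int n) {
        return n < 2 ? 0L : (long) n * (n - 1) / 2;
    }

    /** True if the worst-case number of pairs stays within pairThreshold. */
    static boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold) {
        return maxPairs(hitcount) <= pairThreshold;
    }
}
```

With 4 functions there are at most 6 pairs, so a threshold of 5 fails the check before any functions are fetched.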
-
scoreCluster
protected boolean scoreCluster(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2Functions, List<VectorResult> vectors, int hitcount, int pairThreshold) Given a cluster of vectors, the set of functions associated with each vector, a similarity threshold, and the total number of functions in the cluster, let each pair of functions contribute to the score matrix.- Parameters:
vectorFactory- is a factory for computing significance scores
vec2Functions- is the list of sets of functions
vectors- is the list of vectors
hitcount- is the number of functions in the cluster
pairThreshold- is the maximum number of pairs allowed- Returns:
- true if the number of pairs was not exceeded
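The control flow described for scoreCluster (a cheap preliminary check, pair generation, filtering, accumulation, and the boolean result) might look like the following. This is a hypothetical sketch: ToyClusterScorer is an invented name, its nested Pair class stands in for FunctionPair with executable xref indices already labelled (0 meaning filtered), the Supplier stands in for pairFunctions and returns null on overflow, and skipping same-executable pairs is an assumption.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of the control flow described for scoreCluster; not Ghidra's code. */
final class ToyClusterScorer {
    /** Stand-in for a FunctionPair: executable xref indices (0 = filtered) and significance. */
    static final class Pair {
        final int exeA;
        final int exeB;
        final float significance;

        Pair(int exeA, int exeB, float significance) {
            this.exeA = exeA;
            this.exeB = exeB;
            this.significance = significance;
        }
    }

    final float[][] matrix;

    ToyClusterScorer(int numExecutables) {
        matrix = new float[numExecutables + 1][numExecutables + 1]; // 1-based, zero-filled
    }

    /**
     * Returns false, scoring nothing, if the cluster would produce too many pairs.
     * pairGenerator stands in for pairFunctions and returns null when pairThreshold is exceeded.
     */
    boolean scoreCluster(Supplier<List<Pair>> pairGenerator, int hitcount, int pairThreshold) {
        // 1. Cheap preliminary check before any functions are fetched or paired
        long worstCase = hitcount < 2 ? 0L : (long) hitcount * (hitcount - 1) / 2;
        if (worstCase > pairThreshold) {
            return false;
        }
        // 2. Generate the pairs; null signals that the limit was hit
        List<Pair> pairs = pairGenerator.get();
        if (pairs == null) {
            return false;
        }
        // 3. Skip pairs touching a filtered executable (xref 0), then accumulate the rest
        for (Pair p : pairs) {
            if (p.exeA == 0 || p.exeB == 0 || p.exeA == p.exeB) {
                continue; // same-executable pairs are also skipped (assumption)
            }
            matrix[p.exeA][p.exeB] += p.significance;
            matrix[p.exeB][p.exeA] += p.significance;
        }
        return true;
    }
}
```

Passing the generator as a Supplier keeps the expensive pairing step lazy, so a cluster rejected by the preliminary check never pays for it.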
-