Class ExecutableScorer
java.lang.Object
ghidra.features.bsim.query.client.ExecutableScorer
- Direct Known Subclasses:
ExecutableScorerSingle
Class for accumulating a matrix of scores between pairs of executables.
ExecutableRecords are registered with addExecutable. Scoring is accumulated
by repeatedly providing clusters of functions to scoreCluster.
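The register-then-accumulate flow described above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: ScoreMatrixSketch, its String-keyed methods and the symmetric update are assumptions made for illustration, not the Ghidra implementation. It only shows the idea of registering executables first, zeroing a matrix, and then letting each matched function pair add its contribution to the cell for its two executables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the scoring matrix; not the Ghidra implementation. */
class ScoreMatrixSketch {
    // Maps an executable's md5 to a 1-based matrix index (0 is left free to mean "filtered").
    private final Map<String, Integer> md5ToIndex = new LinkedHashMap<>();
    private float[][] scores;

    /** Register an executable; registering the same md5 twice keeps the first index. */
    void addExecutable(String md5) {
        md5ToIndex.putIfAbsent(md5, md5ToIndex.size() + 1);
    }

    /** Allocate a zero-filled matrix once every executable has been registered. */
    void initializeScores() {
        int n = md5ToIndex.size();
        scores = new float[n + 1][n + 1]; // row and column 0 are unused
    }

    /** Add one function pair's contribution; both md5s must already be registered. */
    void scorePair(String md5A, String md5B, float significance) {
        int a = md5ToIndex.get(md5A);
        int b = md5ToIndex.get(md5B);
        scores[a][b] += significance;
        if (a != b) {
            scores[b][a] += significance; // assumed symmetric for this sketch
        }
    }

    float getScore(int a, int b) {
        return scores[a][b];
    }

    public static void main(String[] args) {
        ScoreMatrixSketch scorer = new ScoreMatrixSketch();
        scorer.addExecutable("md5-of-first");
        scorer.addExecutable("md5-of-second");
        scorer.initializeScores();
        scorer.scorePair("md5-of-first", "md5-of-second", 2.5f);
        scorer.scorePair("md5-of-first", "md5-of-second", 2.5f);
        System.out.println(scorer.getScore(1, 2));
    }
}
```

In the real class the contributions come from scoreCluster rather than from direct calls like these.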
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class ExecutableScorer.FunctionPair
    Container for a pair of FunctionDescriptions, possibly from different DescriptionManagers, along with similarity/significance information
-
Field Summary
Fields
Modifier and Type, Field, Description
protected DescriptionManager executableSet
protected Map<Integer, ExecutableRecord> index2ExeMap
protected double sigThreshold
protected double simThreshold
protected ExecutableRecord singleExe
protected int singleExeXref
-
Constructor Summary
Constructors
ExecutableScorer()
-
Method Summary
Modifier and Type, Method, Description
protected void addExecutable(ExecutableRecord exeRecord)
    Register an executable for the scoring matrix
protected boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold)
    Check whether there will be too many pairs.
void commitSelfScore()
    Commit the singled-out executable's self-significance score to permanent storage
protected void commitSelfScore(String md5, float selfScore)
    Commit a self-significance score for a specific executable to permanent storage
int countSelfScores()
ExecutableRecord getExecutable(int index)
    Get the index-th executable.
ExecutableRecord getExecutable(String md5)
    Retrieve a specific ExecutableRecord by md5
float getNormalizedScore(int a, boolean useLibrary)
float getNormalizedScore(int a, int b, boolean useLibrary)
    Computes a score comparing two executables, normalized between 0.0 and 1.0, indicating the percentage of functional similarity between the two.
float getScore(int a)
    Get the score of an executable (as compared to our singled-out executable)
float getScore(int a, int b)
    Return the similarity score between two executables
float getSelfScore(int a)
    Retrieve the similarity score of an executable with itself
double getSigThreshold()
double getSimThreshold()
ExecutableRecord getSingularExecutable()
float getSingularSelfScore()
protected void initializeScores()
    Initialize the scoring matrix to zero.
protected void labelAndFilter(DescriptionManager manage)
    For every executable in the container -manage-, if the executable matches up with a registered executable, set its xref index to match the registered executable's xref index; otherwise set it to zero to indicate that the executable should be filtered
int numExecutables()
protected List<ExecutableScorer.FunctionPair> pairFunctions(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2func, List<VectorResult> vectors, int hitcount, int pairThreshold)
    Generate all pairs of functions for any function associated with a list of vectors. For each pair of functions, generate the FunctionPair object with the corresponding similarity and significance.
protected void populateExecutableIndex()
    Assuming all executables have been registered, establish index values for all executables to facilitate accessing the scoring matrix
void resetStorage(double simThresh, double sigThresh)
    Clear any persistent storage for self-significance scores, and establish new thresholds
protected boolean scoreCluster(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2Functions, List<VectorResult> vectors, int hitcount, int pairThreshold)
    Given a cluster of vectors, the set of functions associated with each vector, a similarity threshold, and total number of functions in the cluster, let each pair of functions contribute to the score matrix
protected void scorePair(ExecutableScorer.FunctionPair pair)
    Given a pair of score-contributing functions that have been fully filtered, add the score into the matrix
void setSingleExecutable(String md5)
    Set a single executable as the focus, enabling the single-parameter getScore(int)
protected void transferSettings(info)
    Save information about database settings to inform later queries
-
Field Details
-
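executableSet
protected DescriptionManager executableSet
-
index2ExeMap
protected Map<Integer, ExecutableRecord> index2ExeMap
-
simThreshold
protected double simThreshold
-
sigThreshold
protected double sigThreshold
-
singleExe
protected ExecutableRecord singleExe
-
singleExeXref
protected int singleExeXref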
-
-
Constructor Details
-
ExecutableScorer
public ExecutableScorer()
-
-
Method Details
-
getSimThreshold
public double getSimThreshold()
- Returns:
- the similarity threshold associated with these scores, or -1.0 if no threshold has been set
-
getSigThreshold
public double getSigThreshold()
- Returns:
- the significance threshold associated with these scores
-
setSingleExecutable
Set a single executable as the focus, enabling the single-parameter getScore(int)
- Parameters:
md5
- is the 32-character md5 hash of the executable to single out
- Throws:
LSHException
- if the executable cannot be found
-
countSelfScores
public int countSelfScores()
- Returns:
- the number of executable self-significance scores that are (or will be) available
-
resetStorage
public void resetStorage(double simThresh, double sigThresh) throws LSHException
Clear any persistent storage for self-significance scores, and establish new thresholds
- Parameters:
simThresh
- is the new similarity threshold
sigThresh
- is the new significance threshold
- Throws:
LSHException
- if there's a problem clearing storage
-
numExecutables
public int numExecutables()
- Returns:
- the number of executables being compared
-
getSingularExecutable
public ExecutableRecord getSingularExecutable()
- Returns:
- ExecutableRecord being singled out for comparison
-
getSingularSelfScore
public float getSingularSelfScore()
-
getExecutable
public ExecutableRecord getExecutable(String md5) throws LSHException
Retrieve a specific ExecutableRecord by md5
- Parameters:
md5
- is the MD5 string
- Returns:
- the matching ExecutableRecord
- Throws:
LSHException
- if the ExecutableRecord isn't present
-
getExecutable
public ExecutableRecord getExecutable(int index)
Get the index-th executable. NOTE: The first index is 1
- Parameters:
index
- of the executable to retrieve
- Returns:
- the ExecutableRecord describing the executable
-
transferSettings
Save information about database settings to inform later queries
- Parameters:
info
- is the information object returned by the database
-
addExecutable
protected void addExecutable(ExecutableRecord exeRecord) throws LSHException
Register an executable for the scoring matrix
- Parameters:
exeRecord
- is the ExecutableRecord to register
- Throws:
LSHException
- if the executable was already registered with different metadata
-
populateExecutableIndex
protected void populateExecutableIndex()
Assuming all executables have been registered, establish index values for all executables to facilitate accessing the scoring matrix
-
labelAndFilter
protected void labelAndFilter(DescriptionManager manage)
For every executable in the container -manage-, if the executable matches up with a registered executable, set its xref index to match the registered executable's xref index; otherwise set it to zero to indicate that the executable should be filtered
- Parameters:
manage
- is the container of ExecutableRecords to label
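The labeling rule can be sketched in a few lines. The types below are hypothetical stand-ins: Exe replaces ExecutableRecord, and a plain md5-to-index map replaces the set of registered executables, so this is an illustration under those assumptions rather than the actual method body. It relies on registered indices starting at 1, which leaves 0 free to mark an executable that should be filtered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the label-or-filter rule; not the Ghidra implementation. */
class LabelAndFilterSketch {
    /** Minimal stand-in for an executable record: an md5 plus a mutable xref index. */
    static final class Exe {
        final String md5;
        int xrefIndex;

        Exe(String md5) {
            this.md5 = md5;
        }
    }

    /**
     * Copy the registered index onto every incoming record whose md5 is known;
     * unknown records get 0, meaning "filter this executable out".
     */
    static void labelAndFilter(Map<String, Integer> registered, List<Exe> incoming) {
        for (Exe exe : incoming) {
            exe.xrefIndex = registered.getOrDefault(exe.md5, 0);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> registered = new HashMap<>();
        registered.put("md5-of-first", 1);
        registered.put("md5-of-second", 2);

        List<Exe> incoming = new ArrayList<>();
        incoming.add(new Exe("md5-of-second"));
        incoming.add(new Exe("md5-never-registered"));

        labelAndFilter(registered, incoming);
        for (Exe exe : incoming) {
            System.out.println(exe.md5 + " -> " + exe.xrefIndex);
        }
    }
}
```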
-
initializeScores
protected void initializeScores()
Initialize the scoring matrix to zero. The matrix size is the number of executables registered with addExecutable()
-
scorePair
Given a pair of score-contributing functions that have been fully filtered, add the score into the matrix
- Parameters:
pair
- is the pair of functions
-
getScore
public float getScore(int a, int b)
Return the similarity score between two executables
- Parameters:
a
- is the index matching getXrefIndex() of the first executable
b
- is the index matching getXrefIndex() of the second executable
- Returns:
- the similarity score
-
getSelfScore
public float getSelfScore(int a) throws LSHException
Retrieve the similarity score of an executable with itself
- Parameters:
a
- is the index of the executable
- Returns:
- its self-similarity score
- Throws:
LSHException
- if the score is not accessible
-
commitSelfScore
public void commitSelfScore() throws LSHException
Commit the singled-out executable's self-significance score to permanent storage
- Throws:
LSHException
- if there's a problem writing, or the operation isn't supported
-
commitSelfScore
protected void commitSelfScore(String md5, float selfScore) throws LSHException
Commit a self-significance score for a specific executable to permanent storage
- Parameters:
md5
- is the 32-character md5 hash of the executable
selfScore
- is the self-significance score
- Throws:
LSHException
- if there's a problem writing, or the operation isn't supported
-
getScore
public float getScore(int a)
Get the score of an executable (as compared to our singled-out executable)
- Parameters:
a
- is the index of the executable
- Returns:
- the score
-
getNormalizedScore
public float getNormalizedScore(int a, int b, boolean useLibrary) throws LSHException
Computes a score comparing two executables, normalized between 0.0 and 1.0, indicating the percentage of functional similarity between the two. 1.0 means "identical"; 0.0 means completely "dissimilar".
- Parameters:
a
- is the index of the first executable
b
- is the index of the second executable
useLibrary
- is true if the score measures percent "containment" of the smaller executable in the larger.
- Returns:
- the normalized score
- Throws:
LSHException
- if the self-scores for either executable are not available
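The page does not give the normalization formula, so the following is only one plausible reading, stated as an assumption: divide the raw pairwise score by the larger of the two self-scores, or by the smaller one when useLibrary is true (so the result measures containment of the smaller executable), and clamp the result to the range 0.0 to 1.0. The class NormalizedScoreSketch and its normalize method are hypothetical names.

```java
/** Hypothetical normalization sketch; the formula is an assumption, not taken from the source. */
class NormalizedScoreSketch {
    /**
     * @param rawScore   raw similarity score between executables a and b
     * @param selfA      self-score of executable a
     * @param selfB      self-score of executable b
     * @param useLibrary true to measure containment of the smaller executable in the larger
     * @return a value clamped to [0.0, 1.0]
     */
    static float normalize(float rawScore, float selfA, float selfB, boolean useLibrary) {
        // Containment compares against the smaller executable; plain similarity against the larger.
        float denominator = useLibrary ? Math.min(selfA, selfB) : Math.max(selfA, selfB);
        if (denominator <= 0.0f) {
            return 0.0f; // nothing meaningful to normalize against
        }
        float normalized = rawScore / denominator;
        return Math.max(0.0f, Math.min(1.0f, normalized));
    }

    public static void main(String[] args) {
        System.out.println(normalize(50.0f, 100.0f, 200.0f, false));
        System.out.println(normalize(50.0f, 100.0f, 200.0f, true));
    }
}
```

Under this reading, a small executable that is fully contained in a much larger one scores low without useLibrary and high with it, which matches the "containment" wording of the parameter description.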
-
getNormalizedScore
public float getNormalizedScore(int a, boolean useLibrary) throws LSHException
- Throws:
LSHException
-
pairFunctions
protected List<ExecutableScorer.FunctionPair> pairFunctions(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2func, List<VectorResult> vectors, int hitcount, int pairThreshold)
Generate all pairs of functions for any function associated with a list of vectors. For each pair of functions, generate the FunctionPair object with the corresponding similarity and significance. This is inherently quadratic, but we try to be efficient. Duplicate vector pairs are only compared once and the similarity cached, and each function pair is generated only once, i.e. (funcA,funcB) but not (funcB,funcA)
- Parameters:
vectorFactory
- provides weights for significance scores
vec2func
- is the list of FunctionDescription sets associated with each vector
vectors
- is the list of vectors
hitcount
- is the cumulative total of functions
pairThreshold
- is the maximum number of pairs that can be produced
- Returns:
- the list of FunctionPairs, or null if pairThreshold is exceeded
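The pairing strategy described above can be sketched with simplified, hypothetical types: plain double arrays stand in for the vectors, function names stand in for FunctionDescriptions, and cosine similarity stands in for the real vector comparison. The sketch compares each pair of vectors once, reuses that similarity for every function pair drawn from the two vectors, emits each unordered function pair once, and returns null as soon as the pair limit would be exceeded. Whether the real method treats a vector paired with itself this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pairing sketch with simplified types; not the Ghidra implementation. */
class PairFunctionsSketch {
    /** Stand-in for FunctionPair: two function names and the similarity of their vectors. */
    static final class Pair {
        final String funcA;
        final String funcB;
        final double similarity;

        Pair(String funcA, String funcB, double similarity) {
            this.funcA = funcA;
            this.funcB = funcB;
            this.similarity = similarity;
        }
    }

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / Math.sqrt(normX * normY);
    }

    /** Returns every unordered function pair once, or null if more than pairThreshold pairs arise. */
    static List<Pair> pairFunctions(List<double[]> vectors, List<List<String>> vec2func,
            int pairThreshold) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = i; j < vectors.size(); j++) {
                // One comparison per vector pair, shared by all function pairs below.
                double sim = (i == j) ? 1.0 : cosine(vectors.get(i), vectors.get(j));
                List<String> left = vec2func.get(i);
                List<String> right = vec2func.get(j);
                for (int k = 0; k < left.size(); k++) {
                    // Within a single vector, start past k so (A,B) is emitted but (B,A) is not.
                    int start = (i == j) ? k + 1 : 0;
                    for (int l = start; l < right.size(); l++) {
                        if (pairs.size() >= pairThreshold) {
                            return null;
                        }
                        pairs.add(new Pair(left.get(k), right.get(l), sim));
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<double[]> vectors = Arrays.asList(new double[] { 1.0, 0.0 }, new double[] { 0.0, 1.0 });
        List<List<String>> vec2func = Arrays.asList(Arrays.asList("f1", "f2"), Arrays.asList("g1"));
        List<Pair> pairs = pairFunctions(vectors, vec2func, 10);
        for (Pair p : pairs) {
            System.out.println(p.funcA + " / " + p.funcB + " : " + p.similarity);
        }
    }
}
```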
-
checkPreliminaryPairThreshold
protected boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold)
Check whether there will be too many pairs. This is preliminary because we haven't yet fetched the functions
- Parameters:
hitcount
- is the total number of pairs to fetch
pairThreshold
- is the maximum number of pairs allowed
- Returns:
- true if the pair threshold is not exceeded
-
scoreCluster
protected boolean scoreCluster(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2Functions, List<VectorResult> vectors, int hitcount, int pairThreshold)
Given a cluster of vectors, the set of functions associated with each vector, a similarity threshold, and total number of functions in the cluster, let each pair of functions contribute to the score matrix
- Parameters:
vectorFactory
- is a factory for computing significance scores
vec2Functions
- is the list of sets of functions
vectors
- is the list of vectors
hitcount
- is the number of functions in the cluster
pairThreshold
- is the maximum number of pairs allowed
- Returns:
- true if the number of pairs was not exceeded
-