Class ExecutableScorer

java.lang.Object
ghidra.features.bsim.query.client.ExecutableScorer
Direct Known Subclasses:
ExecutableScorerSingle

public class ExecutableScorer extends Object
Class for accumulating a matrix of scores between pairs of executables. ExecutableRecords are registered with addExecutable. Scoring is accumulated by repeatedly providing clusters of functions to scoreCluster.
  • Field Details

    • executableSet

      protected DescriptionManager executableSet
    • index2ExeMap

      protected Map<Integer,ExecutableRecord> index2ExeMap
    • simThreshold

      protected double simThreshold
    • sigThreshold

      protected double sigThreshold
    • singleExe

      protected ExecutableRecord singleExe
    • singleExeXref

      protected int singleExeXref
  • Constructor Details

    • ExecutableScorer

      public ExecutableScorer()
  • Method Details

    • getSimThreshold

      public double getSimThreshold()
      Returns:
      the similarity threshold associated with these scores OR -1.0 if no threshold has been set
    • getSigThreshold

      public double getSigThreshold()
      Returns:
      the significance threshold associated with these scores
    • setSingleExecutable

      public void setSingleExecutable(String md5) throws LSHException
      Set a single executable as the focus, enabling the single-parameter getScore(int)
      Parameters:
      md5 - is the 32-character md5 hash of the executable to single out
      Throws:
      LSHException - if we can't find the executable
    • countSelfScores

      public int countSelfScores()
      Returns:
      the number of executable self-significance scores that are (or will be) available
    • resetStorage

      public void resetStorage(double simThresh, double sigThresh) throws LSHException
      Clear any persistent storage for self-significance scores, and establish new thresholds
      Parameters:
      simThresh - is the new similarity threshold
      sigThresh - is the new significance threshold
      Throws:
      LSHException - if there's a problem clearing storage
    • numExecutables

      public int numExecutables()
      Returns:
      the number of executables being compared
    • getSingularExecutable

      public ExecutableRecord getSingularExecutable()
      Returns:
      the ExecutableRecord being singled out for comparison
    • getSingularSelfScore

      public float getSingularSelfScore()
    • getExecutable

      public ExecutableRecord getExecutable(String md5) throws LSHException
      Retrieve a specific ExecutableRecord by md5
      Parameters:
      md5 - is the MD5 string
      Returns:
      the matching ExecutableRecord
      Throws:
      LSHException - if the ExecutableRecord isn't present
    • getExecutable

      public ExecutableRecord getExecutable(int index)
      Get the index-th executable. NOTE: The first index is 1
      Parameters:
      index - is the (1-based) index of the executable to retrieve
      Returns:
      the ExecutableRecord describing the executable
    • transferSettings

      protected void transferSettings(DatabaseInformation info)
      Save off information about database settings to inform later queries
      Parameters:
      info - is the information object returned by the database
    • addExecutable

      protected void addExecutable(ExecutableRecord exeRecord) throws LSHException
      Register an executable for the scoring matrix
      Parameters:
      exeRecord - is the ExecutableRecord to register
      Throws:
      LSHException - if the executable was already registered with different metadata
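The md5-keyed registration and lookup described by addExecutable and getExecutable(String) can be pictured with a minimal in-memory sketch. It assumes an executable is identified by its md5 and that "metadata" means its remaining fields (here just a name). ExeRec, RegistryException and ExeRegistry are hypothetical stand-ins for ExecutableRecord, LSHException and the scorer; the real class presumably keeps its records in the executableSet field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ExecutableRecord: identified by md5, with name as its "metadata".
record ExeRec(String md5, String name) {}

// Hypothetical stand-in for LSHException.
class RegistryException extends Exception {
    RegistryException(String message) {
        super(message);
    }
}

// Hypothetical md5-keyed registry mirroring addExecutable / getExecutable(String).
final class ExeRegistry {
    private final Map<String, ExeRec> byMd5 = new HashMap<>();

    // Re-registering an identical record is harmless; conflicting metadata is an error.
    void addExecutable(ExeRec rec) throws RegistryException {
        ExeRec existing = byMd5.putIfAbsent(rec.md5(), rec);
        if (existing != null && !existing.equals(rec)) {
            throw new RegistryException(
                    "Executable " + rec.md5() + " already registered with different metadata");
        }
    }

    // Lookup fails loudly when the md5 was never registered.
    ExeRec getExecutable(String md5) throws RegistryException {
        ExeRec rec = byMd5.get(md5);
        if (rec == null) {
            throw new RegistryException("No executable with md5 " + md5);
        }
        return rec;
    }

    int numExecutables() {
        return byMd5.size();
    }
}
```

Registering the same record twice leaves one entry, while registering the same md5 with a different name throws, matching the documented failure mode.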
    • populateExecutableIndex

      protected void populateExecutableIndex()
      Assuming all executables have been registered, establish index values for all executables to facilitate accessing the scoring matrix
    • labelAndFilter

      protected void labelAndFilter(DescriptionManager manage)
      For every executable in the container manage: if it matches a registered executable, set its xref index to that registered executable's xref index; otherwise set it to zero to indicate that the executable should be filtered out
      Parameters:
      manage - is the container of ExecutableRecords to label
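The indexing scheme behind populateExecutableIndex, getExecutable(int) and labelAndFilter can be sketched as follows. The sketch assumes indices are handed out in md5 sort order (the real ordering is not documented here) and returns labels in a map instead of setting xref indices on ExecutableRecords in place. Index 0 is never assigned, which is what lets it double as the "filtered" label. ExeIndexSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch working on md5 strings instead of ExecutableRecords.
final class ExeIndexSketch {
    private final TreeSet<String> registered = new TreeSet<>(); // sorted, no duplicates
    private final Map<Integer, String> index2Exe = new HashMap<>();
    private final Map<String, Integer> exe2Index = new HashMap<>();

    void addExecutable(String md5) {
        registered.add(md5);
    }

    // Assign indices 1..n in md5 order (an assumed ordering); 0 is never assigned.
    void populateExecutableIndex() {
        index2Exe.clear();
        exe2Index.clear();
        int index = 1;
        for (String md5 : registered) {
            index2Exe.put(index, md5);
            exe2Index.put(md5, index);
            index++;
        }
    }

    // Returns null for an index that was never assigned, including 0.
    String getExecutable(int index) {
        return index2Exe.get(index);
    }

    // Label each md5 with its registered index, or 0 to mark it as filtered out.
    Map<String, Integer> labelAndFilter(List<String> md5s) {
        Map<String, Integer> labels = new HashMap<>();
        for (String md5 : md5s) {
            labels.put(md5, exe2Index.getOrDefault(md5, 0));
        }
        return labels;
    }
}
```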
    • initializeScores

      protected void initializeScores()
      Initialize the scoring matrix to zero. The matrix size is determined by the number of executables registered with addExecutable()
    • scorePair

      protected void scorePair(ExecutableScorer.FunctionPair pair)
      Given a pair of score-contributing functions that have already been fully filtered, add their score into the matrix
      Parameters:
      pair - is the pair of functions
    • getScore

      public float getScore(int a, int b)
      Return the similarity score between two executables
      Parameters:
      a - is the index matching getXrefIndex() of the first executable
      b - is the index matching getXrefIndex() of the second executable
      Returns:
      the similarity score
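One plausible shape for the matrix behind initializeScores, scorePair and getScore(int, int) is a square float array with an unused row and column 0, so the 1-based xref indices address it directly, with each pair's contribution added symmetrically. The sketch assumes the amount a pair contributes is its significance, which this page does not state. FunctionPairSketch and ScoreMatrixSketch are hypothetical.

```java
// Hypothetical stand-in for ExecutableScorer.FunctionPair, reduced to what scoring needs.
record FunctionPairSketch(int xrefA, int xrefB, float significance) {}

final class ScoreMatrixSketch {
    private float[][] matrix = new float[0][0];

    // Java zero-fills new arrays, so this is the "initialize to zero" step.
    void initializeScores(int numExecutables) {
        // Row and column 0 stay unused so 1-based xref indices address the matrix directly.
        matrix = new float[numExecutables + 1][numExecutables + 1];
    }

    // Assumes the pair was already filtered, so both xref indices are valid (non-zero).
    void scorePair(FunctionPairSketch pair) {
        int a = pair.xrefA();
        int b = pair.xrefB();
        matrix[a][b] += pair.significance();
        if (a != b) {
            matrix[b][a] += pair.significance(); // keep the matrix symmetric
        }
    }

    float getScore(int a, int b) {
        return matrix[a][b];
    }
}
```

Keeping both halves of the matrix up to date costs a second write per pair but lets getScore ignore argument order.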
    • getSelfScore

      public float getSelfScore(int a) throws LSHException
      Retrieve the similarity score of an executable with itself
      Parameters:
      a - is the index of the executable
      Returns:
      its self-similarity score
      Throws:
      LSHException - if the score is not accessible
    • commitSelfScore

      public void commitSelfScore() throws LSHException
      Commit the singled-out executable's self-significance score to permanent storage
      Throws:
      LSHException - if there's a problem writing, or the operation isn't supported
    • commitSelfScore

      protected void commitSelfScore(String md5, float selfScore) throws LSHException
      Commit a self-significance score for a specific executable to permanent storage
      Parameters:
      md5 - is the 32-character md5 hash of the executable
      selfScore - is the self-significance score
      Throws:
      LSHException - if there's a problem writing, or the operation isn't supported
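The self-score persistence implied by resetStorage, commitSelfScore, getSelfScore and countSelfScores can be sketched with an in-memory map standing in for permanent storage. Keying by md5 (rather than by index, as getSelfScore(int) does) and keeping the thresholds next to the scores are assumptions. Clearing everything when the thresholds change plausibly reflects that self-scores computed under different thresholds are not comparable. SelfScoreStore and SelfScoreException are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for LSHException.
class SelfScoreException extends Exception {
    SelfScoreException(String message) {
        super(message);
    }
}

// Hypothetical in-memory store; the real class writes to permanent storage.
final class SelfScoreStore {
    private final Map<String, Float> selfScores = new HashMap<>();
    // -1.0 means "no threshold set" (documented for similarity, assumed for significance).
    private double simThreshold = -1.0;
    private double sigThreshold = -1.0;

    // Drop all stored scores and adopt the new thresholds.
    void resetStorage(double simThresh, double sigThresh) {
        selfScores.clear();
        simThreshold = simThresh;
        sigThreshold = sigThresh;
    }

    void commitSelfScore(String md5, float selfScore) {
        selfScores.put(md5, selfScore);
    }

    // Missing scores are an error, mirroring the documented LSHException.
    float getSelfScore(String md5) throws SelfScoreException {
        Float score = selfScores.get(md5);
        if (score == null) {
            throw new SelfScoreException("No self-score stored for " + md5);
        }
        return score;
    }

    int countSelfScores() {
        return selfScores.size();
    }

    double getSimThreshold() {
        return simThreshold;
    }

    double getSigThreshold() {
        return sigThreshold;
    }
}
```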
    • getScore

      public float getScore(int a)
      Get the score of an executable as compared to the singled-out executable
      Parameters:
      a - is the index of the executable
      Returns:
      the score
    • getNormalizedScore

      public float getNormalizedScore(int a, int b, boolean useLibrary) throws LSHException
      Computes a score comparing two executables, normalized between 0.0 and 1.0, indicating the percentage of functional similarity between the two: 1.0 means "identical" and 0.0 means completely "dissimilar".
      Parameters:
      a - is the index of the first executable
      b - is the index of the second executable
      useLibrary - is true if the score measures percent "containment" of the smaller executable in the larger.
      Returns:
      the normalized score
      Throws:
      LSHException - if the self-scores for either executable are not available
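A minimal sketch of the normalization, assuming the raw pair score is divided by the larger self-score for plain similarity and by the smaller self-score when useLibrary asks for containment. The exact formula is not given on this page, and the clamp to [0, 1] is also an assumption. NormalizedScoreSketch is hypothetical and takes the raw and self scores directly rather than executable indices.

```java
final class NormalizedScoreSketch {
    // Assumed formula: pair score over max(selfA, selfB) for similarity,
    // over min(selfA, selfB) for containment of the smaller executable.
    static float normalize(float pairScore, float selfA, float selfB, boolean useLibrary) {
        float denom = useLibrary ? Math.min(selfA, selfB) : Math.max(selfA, selfB);
        if (denom <= 0.0f) {
            return 0.0f; // no meaningful self-score to normalize against
        }
        float ratio = pairScore / denom;
        return Math.max(0.0f, Math.min(1.0f, ratio)); // clamp to [0, 1]
    }
}
```

With self-scores of 100 and 200 and a raw pair score of 50, plain similarity gives 0.25 while containment gives 0.5: the pair covers half of the smaller executable but only a quarter of the larger one.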
    • getNormalizedScore

      public float getNormalizedScore(int a, boolean useLibrary) throws LSHException
      Throws:
      LSHException
    • pairFunctions

      protected List<ExecutableScorer.FunctionPair> pairFunctions(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2func, List<VectorResult> vectors, int hitcount, int pairThreshold)
      Generate all pairs of functions for any function associated with a list of vectors. For each pair of functions, generate the FunctionPair object with the corresponding similarity and significance. This is inherently quadratic, but we try to be efficient: duplicate vector pairs are compared only once and the similarity is cached, and each function pair is generated only once, i.e. (funcA,funcB) but not (funcB,funcA).
      Parameters:
      vectorFactory - provides weights for significance scores
      vec2func - is the list of FunctionDescription sets associated with each vector
      vectors - is the list of vectors
      hitcount - is the cumulative total of functions
      pairThreshold - is the maximum number of pairs that can be produced
      Returns:
      the list of FunctionPairs, or null if pairThreshold is exceeded
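The pairing strategy described above can be sketched as follows. Functions sharing a vector form one group. Each unordered vector pair, including a vector paired with itself, is compared once, and that single similarity is reused for every function pair the two groups produce, rather than going through an explicit cache. Cosine similarity over plain double[] vectors stands in for the real LSH vector comparison, significance (which the real code weights via vectorFactory) is omitted, and FuncPair and PairSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pair record; the real FunctionPair also carries a significance score.
record FuncPair(String funcA, String funcB, double similarity) {}

final class PairSketch {
    // Cosine similarity: a stand-in for the LSH vector comparison, not Ghidra's API.
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, nx = 0.0, ny = 0.0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            nx += x[k] * x[k];
            ny += y[k] * y[k];
        }
        return (nx == 0.0 || ny == 0.0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }

    // vec2func.get(i) lists the functions whose feature vector is vectors.get(i).
    // Returns null as soon as more than pairThreshold pairs would be produced.
    static List<FuncPair> pairFunctions(List<List<String>> vec2func, List<double[]> vectors,
            int pairThreshold) {
        List<FuncPair> pairs = new ArrayList<>();
        int n = vectors.size();
        for (int i = 0; i < n; i++) {
            List<String> groupI = vec2func.get(i);
            for (int j = i; j < n; j++) { // j starts at i: each unordered vector pair once
                List<String> groupJ = vec2func.get(j);
                double sim = cosine(vectors.get(i), vectors.get(j)); // computed once, reused below
                for (int a = 0; a < groupI.size(); a++) {
                    // Within one group, b starts past a: emits (f,g) but never (g,f) or (f,f)
                    int bStart = (i == j) ? a + 1 : 0;
                    for (int b = bStart; b < groupJ.size(); b++) {
                        if (pairs.size() >= pairThreshold) {
                            return null;
                        }
                        pairs.add(new FuncPair(groupI.get(a), groupJ.get(b), sim));
                    }
                }
            }
        }
        return pairs;
    }
}
```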
    • checkPreliminaryPairThreshold

      protected boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold)
      Check whether we are going to have too many pairs. This check is preliminary because the functions have not yet been fetched
      Parameters:
      hitcount - is the total number of pairs to fetch
      pairThreshold - is the maximum number of pairs allowed
      Returns:
      true if the pair threshold is not exceeded
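Assuming hitcount counts functions, as in the pairFunctions and scoreCluster descriptions, the worst case is one pair per unordered pair of functions, n(n-1)/2, so a preliminary check could look like this sketch. PairThresholdSketch is hypothetical; the long arithmetic avoids int overflow for large clusters.

```java
final class PairThresholdSketch {
    // Worst-case pair count for n functions is n(n-1)/2 (assumed bound).
    static boolean checkPreliminaryPairThreshold(int hitcount, int pairThreshold) {
        long n = hitcount;
        long maxPairs = n * (n - 1) / 2; // long avoids int overflow for large clusters
        return maxPairs <= pairThreshold;
    }
}
```

For example, a 6-function cluster can yield up to 15 pairs, so it would fail a threshold of 10, while a 5-function cluster (10 pairs) would pass.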
    • scoreCluster

      protected boolean scoreCluster(LSHVectorFactory vectorFactory, List<DescriptionManager> vec2Functions, List<VectorResult> vectors, int hitcount, int pairThreshold)
      Given a cluster of vectors, the set of functions associated with each vector, a similarity threshold, and the total number of functions in the cluster, let each pair of functions contribute to the score matrix
      Parameters:
      vectorFactory - is a factory for computing significance scores
      vec2Functions - is the list of sets of functions
      vectors - is the list of vectors
      hitcount - is the number of functions in the cluster
      pairThreshold - is the maximum number of pairs allowed
      Returns:
      true if the number of pairs was not exceeded