Class LSHVectorFactory

java.lang.Object
generic.lsh.vector.LSHVectorFactory
Direct Known Subclasses:
WeightedLSHCosineVectorFactory

public abstract class LSHVectorFactory extends Object
  • Field Details

    • weightFactory

      protected WeightFactory weightFactory
    • idfLookup

      protected IDFLookup idfLookup
    • settings

      protected int settings
  • Constructor Details

    • LSHVectorFactory

      public LSHVectorFactory()
  • Method Details

    • buildZeroVector

      public abstract LSHVector buildZeroVector()
      Generate a vector with all coefficients set to zero.
      Returns:
      the zero vector
    • buildVector

      public abstract LSHVector buildVector(int[] feature)
      Generate an LSHVector from a feature set. Individual features are integer hashes, and the array MUST already be sorted. The same integer can occur more than once in the array (term frequency (TF) > 1). The factory decides internally how to create weights based on term frequency and any knowledge of Inverse Document Frequency (IDF).
      Parameters:
      feature - is the sorted array of integer features
      Returns:
      the newly minted LSHVector
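      The input contract above (sorted integer hashes, with repeats encoding term frequency) can be illustrated with a small standalone sketch. It does not call any class from this package: FeatureArrayDemo, sortedFeatures, and termFrequencies are hypothetical names used only for illustration, and the tally shown is not necessarily how a concrete factory derives its weights.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of generic.lsh.vector.
public class FeatureArrayDemo {

    // Return a sorted copy of the raw hashes; buildVector expects sorted input.
    public static int[] sortedFeatures(int[] rawHashes) {
        int[] copy = Arrays.copyOf(rawHashes, rawHashes.length);
        Arrays.sort(copy);
        return copy;
    }

    // Tally how often each hash occurs. In a sorted array the duplicates
    // are adjacent, so the map keeps the hashes in ascending order.
    public static Map<Integer, Integer> termFrequencies(int[] sorted) {
        Map<Integer, Integer> tf = new LinkedHashMap<>();
        for (int hash : sorted) {
            tf.merge(hash, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // Made-up feature hashes; 0x7f3a occurs three times (TF = 3).
        int[] raw = { 0x7f3a, 0x0012, 0x7f3a, 0x0400, 0x0012, 0x7f3a };
        int[] feature = sortedFeatures(raw);
        System.out.println(Arrays.toString(feature));
        System.out.println(termFrequencies(feature));
    }
}
```

      An array prepared this way satisfies the sorted-input requirement and could then be handed to buildVector on a loaded concrete factory.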
    • restoreVectorFromXml

      public abstract LSHVector restoreVectorFromXml(XmlPullParser parser)
      Generate an LSHVector based on the XML tag seen by the pull parser. The factory generates weights based on the term frequency info in the XML tag and its internal IDF knowledge.
      Parameters:
      parser - is the XML parser
      Returns:
      the newly minted LSHVector
    • restoreVectorFromSql

      public abstract LSHVector restoreVectorFromSql(String sql) throws IOException
      Generate an LSHVector based on the string returned from an SQL query. The factory generates weights based on the term frequency info in the string and its internal IDF knowledge.
      Parameters:
      sql - is the column data string returned by an SQL query
      Returns:
      the newly minted LSHVector
      Throws:
      IOException
    • set

      public void set(WeightFactory wFactory, IDFLookup iLookup, int settings)
      Load the factory with weights and the feature map
      Parameters:
      wFactory - is the weight table of IDF and TF weights
      iLookup - is the map from features into the weight table
      settings - is an integer id for this particular weighting scheme
    • isLoaded

      public boolean isLoaded()
      Returns:
      true if this factory has weights and lookup loaded
    • getSignificanceScale

      public double getSignificanceScale()
      Returns:
      the weight table's significance scale for this factory
    • getSignificanceAddend

      public double getSignificanceAddend()
      Returns:
      the weight table's significance addend for this factory
    • getSettings

      public int getSettings()
      Returns:
      settings ID used to generate factory's current weights
    • getSelfSignificance

      public double getSelfSignificance(LSHVector vector)
      Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings
      Parameters:
      vector - is the LSHVector
      Returns:
      the vector's significance score
    • calculateSignificance

      public double calculateSignificance(VectorCompare data)
      Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings
      Parameters:
      data - is the comparison object produced when comparing two LSHVectors
      Returns:
      the significance score
    • readWeights

      public void readWeights(XmlPullParser parser) throws SAXException
      Read both the weights and the lookup hashes from an XML stream
      Parameters:
      parser - is the XML parser
      Throws:
      SAXException