Package generic.lsh.vector
Class LSHVectorFactory
java.lang.Object
generic.lsh.vector.LSHVectorFactory
- Direct Known Subclasses:
WeightedLSHCosineVectorFactory
-
Field Summary
- protected WeightFactory weightFactory
- protected IDFLookup idfLookup
- protected int settings
-
Constructor Summary
- LSHVectorFactory()
-
Method Summary
- abstract LSHVector buildVector(int[] feature): Generate an LSHVector from a feature set in which individual features are integer hashes.
- abstract LSHVector buildZeroVector(): Generate a vector with all coefficients zero.
- double calculateSignificance(VectorCompare data): Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings.
- double getSelfSignificance(LSHVector vector): Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings.
- int getSettings()
- double getSignificanceAddend()
- double getSignificanceScale()
- boolean isLoaded()
- void readWeights(XmlPullParser parser): Read both the weights and the lookup hashes from an XML stream.
- abstract LSHVector restoreVectorFromSql(String sql): Generate an LSHVector based on a string returned from an SQL query. The factory generates weights based on the term frequency information in the string and its internal IDF knowledge.
- abstract LSHVector restoreVectorFromXml(XmlPullParser parser): Generate an LSHVector based on the XML tag seen by the pull parser.
- void set(WeightFactory wFactory, IDFLookup iLookup, int settings): Load the factory with weights and the feature map.
-
Field Details
-
weightFactory
protected WeightFactory weightFactory
-
idfLookup
protected IDFLookup idfLookup
settings
protected int settings
-
-
Constructor Details
-
LSHVectorFactory
public LSHVectorFactory()
-
-
Method Details
-
buildZeroVector
public abstract LSHVector buildZeroVector()
Generate a vector with all coefficients zero.
- Returns:
- the zero vector
-
buildVector
public abstract LSHVector buildVector(int[] feature)
Generate an LSHVector from a feature set in which individual features are integer hashes. The integers MUST already be sorted. The same integer can occur more than once in the array (term frequency (TF) > 1). The factory decides internally how to create weights based on term frequency and any knowledge of Inverse Document Frequency (IDF).
- Parameters:
feature - the sorted array of integer features
- Returns:
- the newly minted LSHVector
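As a rough illustration of the weighting idea described above, the sketch below collapses a sorted feature array into (hash, weight) pairs. Because the input is sorted, duplicate hashes are adjacent, so each run length is that feature's term frequency. The linear TF times IDF weight, the Map-based vector, the default IDF of 1.0, and the class name TfIdfSketch are all assumptions for illustration; this is not how LSHVectorFactory or WeightedLSHCosineVectorFactory compute weights, which come from the tables supplied through set().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: sorted feature hashes -> sparse TF*IDF weights. Not Ghidra's algorithm. */
public class TfIdfSketch {

    /**
     * Collapses runs of equal hashes into (hash, weight) entries.
     * The run length is the term frequency (TF); it is multiplied by an
     * assumed IDF weight, defaulting to 1.0 for hashes absent from idf.
     */
    static Map<Integer, Double> buildWeights(int[] sortedFeatures, Map<Integer, Double> idf) {
        Map<Integer, Double> vector = new TreeMap<>();
        int i = 0;
        while (i < sortedFeatures.length) {
            int hash = sortedFeatures[i];
            // The input MUST be sorted; a smaller value after a larger one breaks run counting.
            if (i > 0 && hash < sortedFeatures[i - 1]) {
                throw new IllegalArgumentException("features must be sorted ascending");
            }
            int tf = 0;
            while (i < sortedFeatures.length && sortedFeatures[i] == hash) {
                tf++;
                i++;
            }
            vector.put(hash, tf * idf.getOrDefault(hash, 1.0));
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<Integer, Double> idf = new HashMap<>();
        idf.put(7, 0.5);
        idf.put(42, 2.0);
        // 7 appears twice (TF = 2), so its weight is 2 * 0.5
        System.out.println(buildWeights(new int[] {3, 7, 7, 42}, idf)); // {3=1.0, 7=1.0, 42=2.0}
    }
}
```

Rejecting unsorted input mirrors the documented requirement that the integers MUST already be sorted; without it, a repeated hash split across two runs would silently overwrite its own term frequency.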
-
restoreVectorFromXml
public abstract LSHVector restoreVectorFromXml(XmlPullParser parser)
Generate an LSHVector based on the XML tag seen by the pull parser. The factory generates weights based on the term frequency information in the XML tag and its internal IDF knowledge.
- Parameters:
parser - the XML parser
- Returns:
- the newly minted LSHVector
-
restoreVectorFromSql
public abstract LSHVector restoreVectorFromSql(String sql) throws IOException
Generate an LSHVector based on a string returned from an SQL query. The factory generates weights based on the term frequency information in the string and its internal IDF knowledge.
- Parameters:
sql - the column data string returned by an SQL query
- Returns:
- the newly minted LSHVector
- Throws:
IOException
-
set
public void set(WeightFactory wFactory, IDFLookup iLookup, int settings)
Load the factory with weights and the feature map.
- Parameters:
wFactory - the weight table of IDF and TF weights
iLookup - the map from features into the weight table
settings - an integer ID for this particular weighting scheme
-
isLoaded
public boolean isLoaded()
- Returns:
- true if this factory has its weights and lookup loaded
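The set()/isLoaded() contract can be pictured with a small stand-in class. In this hypothetical sketch, plain maps take the place of WeightFactory and IDFLookup, and the class name LoadableFactorySketch and its weightOf() method are invented for illustration. Rejecting use before set() with an IllegalStateException is also an assumption; the documentation above does not say what an unloaded factory does.

```java
import java.util.Map;

/** Hypothetical stand-in for the load-then-use lifecycle; not the Ghidra class. */
public class LoadableFactorySketch {
    private Map<Integer, Double> weights;   // stands in for WeightFactory (assumption)
    private Map<Integer, Integer> lookup;   // stands in for IDFLookup (assumption)
    private int settings;

    /** Mirrors set(wFactory, iLookup, settings): store both tables and the scheme ID. */
    public void set(Map<Integer, Double> weights, Map<Integer, Integer> lookup, int settings) {
        this.weights = weights;
        this.lookup = lookup;
        this.settings = settings;
    }

    /** True once both the weight table and the lookup have been supplied. */
    public boolean isLoaded() {
        return weights != null && lookup != null;
    }

    public int getSettings() {
        return settings;
    }

    /** Hypothetical: map a feature hash to its row, then read that row's weight. */
    public double weightOf(int featureHash) {
        if (!isLoaded()) {
            throw new IllegalStateException("factory has no weights loaded; call set() first");
        }
        Integer row = lookup.get(featureHash);
        // Unknown features get weight 0.0 here; that default is an assumption.
        return row == null ? 0.0 : weights.getOrDefault(row, 0.0);
    }

    public static void main(String[] args) {
        LoadableFactorySketch f = new LoadableFactorySketch();
        System.out.println(f.isLoaded()); // false
        f.set(Map.of(0, 2.5), Map.of(1234, 0), 7);
        System.out.println(f.isLoaded() + " " + f.getSettings() + " " + f.weightOf(1234)); // true 7 2.5
    }
}
```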
-
getSignificanceScale
public double getSignificanceScale()
- Returns:
- the weight table's significance scale for this factory
-
getSignificanceAddend
public double getSignificanceAddend()
- Returns:
- the weight table's significance addend for this factory
-
getSettings
public int getSettings()
- Returns:
- the settings ID used to generate this factory's current weights
-
getSelfSignificance
public double getSelfSignificance(LSHVector vector)
Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings.
- Parameters:
vector - the LSHVector
- Returns:
- the vector's significance score
-
calculateSignificance
public double calculateSignificance(VectorCompare data)
Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings.
- Parameters:
data - the comparison object produced when comparing two LSHVectors
- Returns:
- the significance score
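The subclass name WeightedLSHCosineVectorFactory suggests cosine similarity over weighted features. As a minimal sketch, assuming vectors are plain maps from feature hash to weight (not Ghidra's LSHVector or VectorCompare types), the following computes cosine similarity between two sparse vectors, including the self-comparison and zero-vector cases. The class name CosineSketch is hypothetical, and this is not the factory's significance formula or its scale/addend normalization, neither of which is documented here.

```java
import java.util.Map;

/** Hypothetical sketch: cosine similarity of sparse weighted vectors. Not Ghidra's significance score. */
public class CosineSketch {

    /** Dot product over the features the two sparse vectors share. */
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;  // iterate the sparser vector
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** cos(a, b) = a.b / (|a| |b|); defined as 0 when either vector is all zeros. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double denom = Math.sqrt(dot(a, a) * dot(b, b));
        return denom == 0.0 ? 0.0 : dot(a, b) / denom;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = Map.of(1, 1.0, 2, 2.0);
        Map<Integer, Double> b = Map.of(2, 2.0, 3, 1.0);
        System.out.println(cosine(a, b));        // 0.8
        System.out.println(cosine(a, a));        // 1.0 (self-comparison)
        System.out.println(cosine(a, Map.of())); // 0.0 (zero vector)
    }
}
```

Taking the square root of the product of the two squared norms, rather than multiplying two separate square roots, keeps the small example exact; either form is a valid cosine.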
-
readWeights
public void readWeights(XmlPullParser parser) throws SAXException
Read both the weights and the lookup hashes from an XML stream.
- Parameters:
parser - the XML parser
- Throws:
SAXException
-