public class FreeTextSuggester
extends org.apache.lucene.search.suggest.Lookup
implements org.apache.lucene.util.Accountable
Builds an ngram model from the text sent to build(org.apache.lucene.search.suggest.InputIterator) and predicts based
on the last grams-1 tokens in the request sent to lookup(java.lang.CharSequence, boolean, int). This tries
to handle the "long tail" of suggestions for when the incoming query is a
never before seen query string.
Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").
This uses the stupid backoff language model to smooth scores across ngram models; see "Large language models in machine translation", http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1126 for details.
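The backoff scoring mentioned above can be illustrated with a small, self-contained sketch. It is not the suggester's actual implementation (which stores its model in an FST); the class `StupidBackoffSketch`, its in-memory count map, and the space-joined ngram keys are all hypothetical names chosen for illustration. It assumes the usual stupid backoff rule: use the relative frequency of the longest seen ngram, multiplying by the backoff constant once for each shorter context tried.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene.
final class StupidBackoffSketch {
    // Backoff constant; 0.4 is the value described for DEFAULT_ALPHA.
    static final double ALPHA = 0.4;

    private final Map<String, Integer> counts = new HashMap<>();
    private final int grams;
    private long totalTokens = 0;

    StupidBackoffSketch(int grams) {
        this.grams = grams;
    }

    /** Counts every ngram of length 1..grams in one token sequence. */
    void add(List<String> tokens) {
        totalTokens += tokens.size();
        for (int start = 0; start < tokens.size(); start++) {
            for (int n = 1; n <= grams && start + n <= tokens.size(); n++) {
                String ngram = String.join(" ", tokens.subList(start, start + n));
                counts.merge(ngram, 1, Integer::sum);
            }
        }
    }

    /** Scores word after context, using only the last grams-1 context tokens. */
    double score(List<String> context, String word) {
        int from = Math.max(0, context.size() - (grams - 1));
        List<String> ctx = context.subList(from, context.size());
        double factor = 1.0;
        while (!ctx.isEmpty()) {
            String ctxKey = String.join(" ", ctx);
            int seen = counts.getOrDefault(ctxKey + " " + word, 0);
            if (seen > 0) {
                // The context is a prefix of a counted ngram, so it has a count too.
                return factor * seen / (double) counts.get(ctxKey);
            }
            factor *= ALPHA;                  // penalize each backoff step
            ctx = ctx.subList(1, ctx.size()); // drop the oldest context token
        }
        // Unigram base case: relative frequency of the word.
        return totalTokens == 0 ? 0.0 : factor * counts.getOrDefault(word, 0) / (double) totalTokens;
    }

    public static void main(String[] args) {
        StupidBackoffSketch model = new StupidBackoffSketch(2);
        model.add(Arrays.asList("new", "york", "city"));
        model.add(Arrays.asList("new", "york", "times"));
        System.out.println(model.score(Arrays.asList("new"), "york"));
        System.out.println(model.score(Arrays.asList("new"), "city"));
    }
}
```

In this sketch an unseen bigram falls back to the unigram frequency scaled by 0.4, which mirrors the behavior described for the backoff constant.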
From lookup(java.lang.CharSequence, boolean, int), the key of each result is the ngram token; the value is
Long.MAX_VALUE * score (fixed point, cast to long). Divide by Long.MAX_VALUE
to get the score back, which ranges from 0.0 to 1.0.
onlyMorePopular is unused.
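The fixed-point encoding of the result value described above can be sketched as follows. The helper class `SuggestionScore` and its method names are hypothetical and not part of the Lucene API; the sketch only shows the arithmetic of multiplying by and dividing by Long.MAX_VALUE.

```java
// Hypothetical helper illustrating the fixed-point score encoding; not part of Lucene.
final class SuggestionScore {
    private SuggestionScore() {}

    /** Encodes a score in [0.0, 1.0] as Long.MAX_VALUE * score, cast to long. */
    static long encode(double score) {
        return (long) (Long.MAX_VALUE * score);
    }

    /** Recovers the score from a result value by dividing by Long.MAX_VALUE. */
    static double decode(long value) {
        return (double) value / Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long value = encode(0.25);
        System.out.println(decode(value));
    }
}
```

A caller would apply `decode` to the value of each lookup result to obtain a score between 0.0 and 1.0.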
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | CODEC_NAME: Codec name used in the header for the saved model. |
| static double | DEFAULT_ALPHA: The constant used for backoff smoothing; during lookup, this means that if a given trigram did not occur, and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned. |
| static int | DEFAULT_GRAMS: By default we use a bigram model. |
| static byte | DEFAULT_SEPARATOR: The default character used to join multiple tokens into a single ngram token. |
| static int | VERSION_CURRENT: Current version of the saved model file format. |
| static int | VERSION_START: Initial version of the saved model file format. |
| Constructor and Description |
|---|
| FreeTextSuggester(org.apache.lucene.analysis.Analyzer analyzer): Instantiate, using the provided analyzer for both indexing and lookup, using a bigram model by default. |
| FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer): Instantiate, using the provided indexing and lookup analyzers, using a bigram model by default. |
| FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer, int grams): Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.). |
| FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer, int grams, byte separator): Instantiate, using the provided indexing and lookup analyzers, and the specified model (2 = bigram, 3 = trigram, etc.). |
| FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer, org.apache.lucene.analysis.Analyzer queryAnalyzer, int grams, byte separator, double alpha) |
| Modifier and Type | Method and Description |
|---|---|
| void | build(org.apache.lucene.search.suggest.InputIterator iterator) |
| void | build(org.apache.lucene.search.suggest.InputIterator iterator, double ramBufferSizeMB): Build the suggest index, using up to the specified amount of temporary RAM while building. |
| java.lang.Object | get(java.lang.CharSequence key): Returns the weight associated with an input string, or null if it does not exist. |
| java.util.Collection<org.apache.lucene.util.Accountable> | getChildResources() |
| long | getCount() |
| boolean | load(org.apache.lucene.store.DataInput input) |
| java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> | lookup(java.lang.CharSequence key, boolean onlyMorePopular, int num) |
| java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> | lookup(java.lang.CharSequence key, int num): Lookup, without any context. |
| java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> | lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, boolean onlyMorePopular, int num) |
| java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> | lookup(java.lang.CharSequence key, java.util.Set<org.apache.lucene.util.BytesRef> contexts, int num): Retrieve suggestions. |
| long | ramBytesUsed(): Returns byte size of the underlying FST. |
| boolean | store(org.apache.lucene.store.DataOutput output) |
public static final java.lang.String CODEC_NAME
public static final int VERSION_START
public static final int VERSION_CURRENT
public static final int DEFAULT_GRAMS
public static final double DEFAULT_ALPHA
public static final byte DEFAULT_SEPARATOR
public FreeTextSuggester(org.apache.lucene.analysis.Analyzer analyzer)
public FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
org.apache.lucene.analysis.Analyzer queryAnalyzer)
public FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
org.apache.lucene.analysis.Analyzer queryAnalyzer,
int grams)
public FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
org.apache.lucene.analysis.Analyzer queryAnalyzer,
int grams,
byte separator)
The separator is passed to ShingleFilter.setTokenSeparator(java.lang.String) to join multiple tokens into a single
ngram token; it must be an ASCII (7-bit-clean) byte. No input tokens should
contain this byte, otherwise IllegalArgumentException is thrown.
public FreeTextSuggester(org.apache.lucene.analysis.Analyzer indexAnalyzer,
org.apache.lucene.analysis.Analyzer queryAnalyzer,
int grams,
byte separator,
double alpha)
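The separator rule stated for the constructors above (a 7-bit ASCII byte that must not occur in any input token) can be sketched as below. This is only an illustration of the constraint, not the suggester's code: the class `NgramJoinSketch`, the `join` method, and the example separator value 0x1e are assumptions made for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the separator constraint; not part of Lucene.
final class NgramJoinSketch {
    private NgramJoinSketch() {}

    /** Joins UTF-8 encoded tokens into one ngram token using the separator byte. */
    static byte[] join(List<String> tokens, byte separator) {
        // Java bytes are signed, so any value >= 0x80 is negative and not 7-bit clean.
        if (separator < 0) {
            throw new IllegalArgumentException("separator must be a 7-bit ASCII byte");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < tokens.size(); i++) {
            byte[] utf8 = tokens.get(i).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                if (b == separator) {
                    throw new IllegalArgumentException("token contains the separator byte");
                }
            }
            if (i > 0) {
                out.write(separator);
            }
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 0x1e is an assumed separator value chosen for this example.
        byte[] ngram = join(Arrays.asList("new", "york"), (byte) 0x1e);
        System.out.println(ngram.length);
    }
}
```

A 7-bit separator can never collide with a byte of a multi-byte UTF-8 sequence, since those bytes all have the high bit set; that is one plausible reason for the restriction.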
public long ramBytesUsed()
ramBytesUsed in interface org.apache.lucene.util.Accountable
public java.util.Collection<org.apache.lucene.util.Accountable> getChildResources()
getChildResources in interface org.apache.lucene.util.Accountable
public void build(org.apache.lucene.search.suggest.InputIterator iterator)
throws java.io.IOException
build in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public void build(org.apache.lucene.search.suggest.InputIterator iterator,
double ramBufferSizeMB)
throws java.io.IOException
iterator - the input iterator
ramBufferSizeMB - the buffer size in MB
Throws: java.io.IOException - on any IO-related error
public boolean store(org.apache.lucene.store.DataOutput output)
throws java.io.IOException
store in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public boolean load(org.apache.lucene.store.DataInput input)
throws java.io.IOException
load in class org.apache.lucene.search.suggest.Lookup
Throws: java.io.IOException
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key,
boolean onlyMorePopular,
int num)
lookup in class org.apache.lucene.search.suggest.Lookup
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key,
int num)
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key,
java.util.Set<org.apache.lucene.util.BytesRef> contexts,
boolean onlyMorePopular,
int num)
lookup in class org.apache.lucene.search.suggest.Lookup
public long getCount()
getCount in class org.apache.lucene.search.suggest.Lookup
public java.util.List<org.apache.lucene.search.suggest.Lookup.LookupResult> lookup(java.lang.CharSequence key,
java.util.Set<org.apache.lucene.util.BytesRef> contexts,
int num)
throws java.io.IOException
Throws: java.io.IOException
public java.lang.Object get(java.lang.CharSequence key)
Copyright © 2019–2020 Redlink GmbH. All rights reserved.