I have a Solr instance with a suggester component. It works fine, using the AnalyzingInfixLookupFactory implementation.
However, I want to expand the suggestions to a content field, which can contain a lot of text. The suggester finds suggestions all right, but it returns the entire field value instead of just a sentence or part of a sentence.
So, if I want a suggestion for "foo", and the content field contains a text like:
"I really like pizza. And donuts. Let's get some from that other place. The foo bar place."
The suggestion will be that entire text, instead of just "The foo bar place". And, obviously, when content is hundreds of words long, this is just not usable.
Is there a way to limit the number of returned words for a suggestion?
Here's my search component:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">autocomplete</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
    <bool name="highlight">false</bool>
    <str name="payloadField">label</str>
  </lst>
</searchComponent>
And here's the request handler:
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">autocomplete</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Finally, here is the field from which the suggestions are derived:
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="suggest" type="text_suggest" indexed="true" multiValued="true" stored="true"/>
I then use a bunch of <copyField>s to copy the content over.
EDIT 2015-08-28
The content field definition is as follows:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="txt/mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="txt/stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="txt/mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>
EDIT 2016-09-28
This issue is probably related: Is Solr SuggestComponent able to return shingles instead of whole field values?
I think what you might be looking for is solr.ShingleFilterFactory, which lets you limit token size by word count, rather than by character length as with the solr.NGramFilterFactory you've been trying to use.
See the Solr wiki page for more details:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
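As a rough sketch of what that could look like, here is a field type that emits word shingles. The name text_shingle and the shingle sizes (2 to 4 words) are assumptions picked for illustration, not values from your schema:

```xml
<!-- Hypothetical field type; "text_shingle" and the size limits are illustrative only -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit word n-grams of 2 to 4 tokens, plus the single words themselves -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

Keep in mind that DocumentDictionaryFactory builds suggestions from stored field values, so changing the analysis chain alone will probably not shorten what is returned. The shingles would most likely have to be read from the indexed terms through a term-based dictionary, and I have not verified that against your setup.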