I need Google-like suggestions in a search box. Solr is already a given. The results should look like this:
searchterm Alex
results Alexander Behling, Alexander Someone ...
searchterm cab
results cable, high voltage cable, cable cutter
The aim is to get phrases as suggestions, not entire fields or excerpts. The query should be case-insensitive (Alex should return the same results as alex), but the suggestions must keep their original case.
The suggestions must be filterable by category: we keep the results of several domains in one index, and the suggestions should be filtered by a specific field containing the domain. According to the documentation, contextField is limited: "AnalyzingInfixLookupFactory and BlendedInfixLookupFactory currently support this feature, when backed by DocumentDictionaryFactory."
config (no special schema changes):
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">FreeTextLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">content</str>
<str name="ngrams">3</str>
<str name="separator"> </str>
<str name="suggestFreeTextAnalyzerFieldType">text_general</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">default</str>
<str name="echoParams">explicit</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
This works reasonably well, but delivers only single words.
searchterm Alex
results Alexander, Alexandra ...
The advantage is a very high indexing speed. I tried to combine this with a ShingleFilter, but that didn't work, probably because shingling is already part of the FreeTextLookupFactory. With the FreeTextLookupFactory, categories are not supported.
schema:
<field name="suggest_field" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
<field name="site" type="string" stored="true" indexed="true"/>
<copyField source="content" dest="suggest_field"/>
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<!--filter class="solr.LowerCaseFilterFactory"/-->
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.ShingleFilterFactory"
minShingleSize="2"
maxShingleSize="4"
outputUnigrams="true"
outputUnigramsIfNoShingles="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
config:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="blenderType">position_linear</str>
<str name="dictionaryimpl">DocumentDictionaryFactory</str>
<str name="field">suggest_field</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">text_suggest</str>
<str name="queryAnalyzerFieldType">phrase_suggest</str>
<str name="indexPath">suggest</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
<bool name="exactMatchFirst">true</bool>
<str name="contextField">site</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">default</str>
<str name="echoParams">explicit</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The second approach leads to behaviour that I find strange:
searchterm Alex or alex
results nothing ...
searchterm cab
results cable, cables, voltage cables, cable accessories, power cables ...
Using the same fields, some queries return no results at all. Indexing already takes more than 12 hours for fewer than 10,000 entries. With BlendedInfixLookupFactory and DocumentDictionaryFactory, categories should be supported.
But when I use a category in the query, as in http://localhost:8983/solr/magnolia/suggest?wt=json&suggest=true&suggest.q=nym&suggest.cfq=com
the results are empty, even though the field "site" contains the value "com" many times in the index.
schema:
<field name="suggest_field" type="text_shingle" indexed="true" stored="true" multiValued="true"/>
...
<copyField source="_text_" dest="suggest_field"/>
...
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_suggestions.txt" format="snowball" />
<!--filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15"/-->
<filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true" fillerToken=""/>
</analyzer>
</fieldType>
<!-- marc johnen : used for autocomplete-->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>
config:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">BlendedInfixLookupFactory</str>
<str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
<str name="field">suggest_field</str>
<str name="suggestAnalyzerFieldType">text_suggest</str>
<str name="minPrefixChars">2</str>
<str name="exactMatchFirst">true</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">true</str>
<str name="highlight">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">default</str>
<str name="echoParams">explicit</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
The results of this approach are quite good, basically as specified, except for some duplicate phrases: some keywords appear twice because one copy has whitespace at the beginning or end, like "power cable" and "power cable ".
searchterm Alex
results Alexander Behling, Alexander Someone ...
searchterm cab
results cable, high voltage cable, cable cutter
Indexing easily takes a day for fewer than 10,000 documents. The main problem, though, is that categories are not supported with the HighFrequencyDictionaryFactory.
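The duplicate phrases with stray whitespace mentioned above might be avoidable at index time. The following is an unverified sketch, under the assumption that the padded variants come out of the ShingleFilterFactory (for example, the empty fillerToken standing in for a removed stop word). It keeps the filters of text_shingle but moves the TrimFilterFactory behind the shingle filter, so that finished shingles are trimmed rather than single tokens.

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_suggestions.txt" format="snowball"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true" fillerToken=""/>
    <!-- Assumption: trimming after shingling turns "power cable " into "power cable" -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Removes tokens with identical text at the same position -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

A change to the index analyzer only takes effect after reindexing the documents and rebuilding the suggester.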
The query I use looks like this:
http://localhost:8983/solr/magnolia/suggest?wt=json&suggest=true&suggest.q=cab
When categories apply, I add <str name="contextField">site</str>
to the suggester config and &suggest.cfq=com
to the query.
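For reference, the category-aware variant could be wired up roughly as follows. This is a sketch, not a verified fix: it reuses the names from the configs above (suggest_field, site, text_suggest) and relies on the quoted documentation, which says the context filter only works with an infix lookup backed by DocumentDictionaryFactory. One detail that may be worth checking is the spelling of dictionaryImpl. The second config above writes it as dictionaryimpl, and if Solr does not recognize that key it would presumably fall back to its default dictionary, which might explain the empty results with suggest.cfq.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">default</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <!-- Note the capital "I" in dictionaryImpl -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest_field</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <!-- Field whose values are matched by suggest.cfq -->
    <str name="contextField">site</str>
    <str name="indexPath">suggest</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

With buildOnStartup and buildOnCommit both set to false, the suggester has to be built explicitly, for example by sending one request with &suggest.build=true, before a query such as http://localhost:8983/solr/magnolia/suggest?wt=json&suggest=true&suggest.q=cab&suggest.cfq=com can return filtered suggestions.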
The Suggester is a search component, a building block of Solr’s search pipeline. To make this component work, two things need to be configured in solrconfig.xml: the data source for suggestions (the dictionaryImpl parameter), and how these suggestions are stored and looked up at query time (the lookupImpl parameter).
If you can reduce the suggestion search to a single-term search, at the cost of a correspondingly larger suggestion index, that would be the simplest and safest option. It also helps maintainability: monitoring a regular Solr index is much more reliable than monitoring an internal in-memory data structure or internal index.
I ended up using the FreeTextLookupFactory and created a separate field and suggester for each language.
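A setup along those lines might look like the sketch below. The field names content_en and content_de, the suggester names suggest_en and suggest_de, and the field types text_en and text_de are hypothetical. Only the FreeTextLookupFactory parameters are taken from the first config above.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- One suggester per language; names, fields and field types are illustrative -->
  <lst name="suggester">
    <str name="name">suggest_en</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">content_en</str>
    <str name="ngrams">3</str>
    <str name="separator"> </str>
    <str name="suggestFreeTextAnalyzerFieldType">text_en</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggest_de</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">content_de</str>
    <str name="ngrams">3</str>
    <str name="separator"> </str>
    <str name="suggestFreeTextAnalyzerFieldType">text_de</str>
  </lst>
</searchComponent>
```

The request would then select the suggester with suggest.dictionary, for example &suggest.dictionary=suggest_de.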