For example I have synonyms laptop,netbook,notebook in index_synonyms.txt
When user search for netbook I want to boost original text more then expanded by synonyms? Is there way to specify this in SynonymFilterFactory? For example use original term twice so his TF will be bigger
As far as I know, there is no way to do this with the existing SynonymFilterFactory. But following is a trick you can use to get this behavior.
Let's say your field is called title
. Create another field which is a copy of this, say title_synonyms
. Now ensure that SynonymFilterFactory is used as an analyzer only for title_synonyms
(you can do this by using different field types for the two fields — say text
and text_synonyms
). Search in both these fields but give higher boost to title
than title_synonyms
.
Here are sample field type definitions:
<fieldType name="text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_synonyms" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
And here are sample field definitions:
<field name="title" type="text" stored="false"
required="true" multiValued="true"/>
<field name="title_synonyms" type="text_synonyms" stored="false"
required="true" multiValued="true"/>
Copy title
field to title_synonyms
:
<copyField source="title" dest="title_synonyms"/>
If you are using dismax
, you can give different boosts to these fields like so:
<str name="qf">title^10 title_synonyms^1</str>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With