I'm trying to sort a Solr query by a field while ignoring stop words, but I can't find a way to do that. For example, I want the results to be sorted like:
Is this possible? Right now the field type is defined like:
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>
And the field is added like:
<field name="title" type="alphaOnlySort" indexed="true" stored="false"/>
It seems like someone else would've had to do this too? Or is sorting without stopwords a no-no?
KeywordTokenizerFactory does not break the content into individual tokens, so StopFilterFactory tries to match the single token (the entire content) against the stop word list and finds no matches. To get the stop words out of the index you need a tokeniser such as WhitespaceTokenizerFactory, but you cannot sort on a tokenised field. So the only way I can think to do this is to:
Generally the only stop words you want for sorting (not searching) are "A", "AN", "THE". I'm not very good at regular expressions, but I'm sure this is trivial for many.
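One way to handle those three articles while keeping the field sortable is to strip them with a regular expression instead of StopFilterFactory. The following is only a sketch, not necessarily what the answer above had in mind. It assumes that removing a single leading "a", "an" or "the" is enough, and the pattern ^(a|an|the)\s+ is my own illustration rather than something taken from the question. It reuses the field type name and mapping file from the question.

```xml
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Char filters always run before the tokenizer, so it is listed first -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Keep the whole value as one token so the field stays sortable -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Assumed pattern: drop one leading article plus the whitespace after it -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(a|an|the)\s+" replacement="" replace="first"/>
    <!-- From the question: remove everything that is not a lowercase letter -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

With this chain a value such as "The Matrix" should be indexed as "matrix", while a title that consists only of "A" is left alone because the pattern requires trailing whitespace. The article filter has to come before the letters-only filter, since that one removes the whitespace the pattern relies on. Existing documents would need to be reindexed for the change to take effect.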