I'm having some difficulty with either how to construct the Solr query or how to set up the schema so that searches in our web store work better.
First, some configuration (Solr 4.2.1):
<field name="mfgpartno" type="text_en_splitting_tight" indexed="true" stored="true" />
<field name="mfgpartno_sort" type="string" indexed="true" stored="false" />
<field name="mfgpartno_search" type="sku_partial" indexed="true" stored="true" />
<copyField source="mfgpartno" dest="mfgpartno_sort" />
<copyField source="mfgpartno" dest="mfgpartno_search" />
<fieldType name="sku_partial" class="solr.TextField" omitTermFreqAndPositions="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
<filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="100" side="front" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
</analyzer>
</fieldType>
Let me break this down into stages. I'm only going into enough detail to replicate the problem; the initial stages aren't using edismax, which is what we've chosen to use on our website:
1. q=DV\-5PBRP
<- With this query I get 18 results, but not the one I'm looking for (most likely because the default df searches the productname field, which is fine).
2. q=mfgpartno_search:DV\-5PBRP
<- This gives me the 1 result I'm looking for, but because of the query building I need to do on the website, it's better if I can use the q parameter as in stage 1.
3. q=DV\-5PBRP&defType=edismax&qf=mfgpartno_search
<- This also gives me the 1 result I'm looking for, but for the website search qf needs to span more fields (actual qf = productname_search shortdesc_search fulldesc_search mfgpartno_search productname shortdesc fulldesc keywords). To get more accurate searching across those fields I implemented stage 4.
4. q=DV\-5PBRP&defType=edismax&qf=mfgpartno_search&q.op=AND
<- With this test I get 0 results, though q.op=AND works great for most searches on our site.

My big problem with search has been special characters like the dash, which sometimes must be literal and sometimes act as separators, as in product names or descriptions. Sometimes people will even replace the dash with a space in a part number search, and it should still show relevant data.
I'm kind of stuck on how to get this special character search working - especially as it pertains to this mfgpartno_search field. How might I configure either the schema or query (or both) to get this working?
Maybe you could try the Regular Expression Pattern Tokenizer and write a suitable regular expression for your article numbers. Lucene (which Solr is built upon) is very focused on tokenization for prose.
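As a rough sketch of that idea, a field type built on solr.PatternTokenizerFactory could look like the following. The field type name partno_pattern and the regular expression are assumptions made for illustration; the pattern simply treats runs of dashes or whitespace as separators and would need adjusting to the real part-number formats.

```xml
<!-- Hypothetical field type name; the pattern is an assumption about what separates part-number segments -->
<fieldType name="partno_pattern" class="solr.TextField">
  <analyzer>
    <!-- group="-1" makes the pattern act as a delimiter, so the text is split on dashes and whitespace -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\-]+" group="-1"/>
    <!-- lowercase so that dv-5pbrp and DV-5PBRP analyze to the same tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the same analyzer used at index and query time, DV-5PBRP and DV 5PBRP should both analyze to the tokens dv and 5pbrp; the Analysis screen in the Solr admin UI is a good place to confirm this before reindexing.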
What you probably want here is an N-gram split that also includes 1-grams, perhaps with dashes replaced by spaces, something like:
DV-5PBRP -> {DV 5PBRP, DV, 5P, BR, PB, RP, D, V, 5, P, B, R}
As you can see, the index will be quite large even for very small fields. Make sure the ranking of the results is heavily weighted toward the larger n-grams.
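One way to approximate that weighting, sketched here under assumptions, is to keep the n-gram field alongside a field that holds whole tokens and give the latter a larger edismax boost, so that matches on the complete part number outrank matches on small fragments. The handler defaults below would go in solrconfig.xml; the boost values are illustrative only and the field list is shortened to fields named in the question.

```xml
<!-- Sketch only: the boost values are assumptions and need tuning against real queries -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- whole-token field boosted well above the n-gram field -->
    <str name="qf">mfgpartno^10 mfgpartno_search^2 productname</str>
  </lst>
</requestHandler>
```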
I do think you should remove the stop word list for the article numbers field.
The N-gram size should probably start at 1 or 2.
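Put together, the suggestions above (dashes treated as spaces, no stop word filter, small minimum gram size) might be sketched as the field type below. The name partno_ngram and the gram sizes 2 and 15 are assumptions chosen for illustration, not tested values.

```xml
<!-- Hypothetical field type; name and gram sizes are illustrative assumptions -->
<fieldType name="partno_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- turn dashes into spaces before tokenizing, so DV-5PBRP is seen as DV 5PBRP -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no StopFilterFactory here: part numbers should not lose tokens to a stop word list -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- same dash handling at query time, but no n-gramming of the user's input -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because grams are generated only at index time, each query token has to equal some indexed gram to match, so query segments shorter than minGramSize or longer than maxGramSize will not hit this field.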
Simply make sure the various analyzers don't: