
Apache Solr: searching for part of a word

I'm using the Apache Solr search engine to index my website's database, through Django and Haystack (http://haystacksearch.org/).

Say I have a document that contains the word "Chicken".

When I search for "chicken", Solr finds the document.

But when I search for "chick", it finds nothing.

Is there a way to fix this?

Pydev UA asked Dec 29 '09 12:12


4 Answers

Note: the following solution requires Solr 1.4 or later.

For more flexibility, I would recommend indexing your data with the NGramTokenizerFactory, which lets a query match a substring anywhere in a word (the equivalent of both a leading and a trailing wildcard). If you only need to match substrings at the beginning or end of the string, consider the EdgeNGramTokenizerFactory instead.

Here's a drop-in replacement for the text field type that would accommodate your need:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
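
To see why a field type like this lets "chick" match "Chicken", it can help to enumerate the grams by hand. The following is a minimal Python sketch, not Solr's implementation: `ngrams` is a hypothetical helper, and it assumes a single-word input that is lowercased the way the LowerCaseFilterFactory would do it.

```python
def ngrams(text, min_size=3, max_size=15):
    """Return every substring of `text` with a length from min_size to max_size.

    Simplified model of n-gram tokenization (hypothetical helper, not Solr code).
    """
    text = text.lower()  # stands in for LowerCaseFilterFactory
    grams = []
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


indexed = set(ngrams("Chicken"))

# The query side only lowercases the term, so in this model a query
# matches when the lowercased term is one of the indexed grams.
for query in ("chick", "icke", "ch"):
    print(query, query.lower() in indexed)
```

In this model "chick" and "icke" both match, while "ch" does not, because it is shorter than the minimum gram size of 3.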
Brian answered Sep 17 '22 18:09


If you want to find all words that start with chick, search for chick*.
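
If you send the query to Solr over HTTP yourself, the wildcard has to be URL-encoded. Here is a minimal Python sketch under stated assumptions: the endpoint `http://localhost:8983/solr/select` and the helper name `build_prefix_query_url` are hypothetical, and the prefix is lowercased on the assumption that indexed terms are lowercase and that the wildcard term may bypass the query analyzer.

```python
from urllib.parse import urlencode


def build_prefix_query_url(base_url, prefix):
    """Build a select URL whose query is `prefix` followed by a trailing wildcard."""
    # Assumption: indexed terms are lowercase and the wildcard term may not be
    # run through the query analyzer, so lowercase it on the client side.
    params = {"q": prefix.lower() + "*"}
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr endpoint; adjust it to your installation.
url = build_prefix_query_url("http://localhost:8983/solr/select", "Chick")
print(url)
```

A trailing wildcard needs no schema change, but it only covers prefix matches.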

Chase Seibert answered Sep 18 '22 18:09


When I used

<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />

from Brian's answer for wildcard search, Solr indexing time increased dramatically, by more than 20 times. I found another solution to the wildcard search problem here:

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

You just need to add the filter

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />

to the index analyzer of the field type, keeping the default tokenizer (solr.WhitespaceTokenizerFactory). For me, the results were the same at a lower system cost.
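
One plausible reason for the lower cost is that edge n-grams produce far fewer terms per word. The sketch below is a simplified Python model, not Solr's code: `edge_ngrams` and `count_all_ngrams` are hypothetical helpers, and the gram sizes are taken from the two configurations quoted in this thread.

```python
def edge_ngrams(token, min_size=1, max_size=25):
    """Return the prefixes of `token` with lengths from min_size to max_size.

    Simplified model of front edge n-grams (hypothetical helper, not Solr code).
    """
    longest = min(max_size, len(token))
    return [token[:size] for size in range(min_size, longest + 1)]


def count_all_ngrams(token, min_size=3, max_size=15):
    """Count all substrings of `token` with lengths from min_size to max_size."""
    longest = min(max_size, len(token))
    return sum(len(token) - size + 1 for size in range(min_size, longest + 1))


token = "chicken"
prefixes = edge_ngrams(token)
print(prefixes)
print("edge grams:", len(prefixes), "all grams:", count_all_ngrams(token))
```

The trade-off is that only prefixes are indexed: in this model "chick" matches "chicken", but an inner fragment such as "icke" does not.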

Vasiliy Toporov answered Sep 17 '22 18:09


A different approach, if you are having trouble with only a small set of words, would be to use solr.SynonymFilterFactory:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

You just have to maintain a simple text file that lists synonyms, with the equivalent terms on each line separated by commas:

chick, peep, chicken
dawg, hound, dog
moggie, puss, kitten, cat

Plurals should be handled by the other filters in the analyzer chain, so they don't need entries here.
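
As a rough illustration of what such a file does, here is a minimal Python sketch of synonym expansion. It is not Solr's implementation: `load_synonym_groups` and `expand` are hypothetical helpers, and the sketch assumes each line lists equivalent terms separated by commas.

```python
SYNONYMS_TXT = """\
chick, peep, chicken
dawg, hound, dog
moggie, puss, kitten, cat
"""


def load_synonym_groups(text):
    """Parse comma-separated synonym lines into a term -> group lookup."""
    lookup = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = [term.strip().lower() for term in line.split(",") if term.strip()]
        for term in group:
            lookup[term] = group
    return lookup


def expand(term, lookup):
    """Return the term with its listed synonyms, or just the term if none exist."""
    return lookup.get(term.lower(), [term.lower()])


lookup = load_synonym_groups(SYNONYMS_TXT)
print(expand("chick", lookup))
print(expand("sparrow", lookup))
```

Every term you want to match has to be listed by hand, which is why this approach only suits a small set of words.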

JP. answered Sep 16 '22 18:09