
How to use NGramTokenizerFactory or NGramFilterFactory?

I have recently been studying how to store and index data with Solr, and I want to run a facet.prefix search. With the whitespace tokenizer, "Where are you" is split into three words, each indexed separately. If I search with facet.prefix="where are", no results are returned.

I searched Google and found that NGramFilterFactory might help. But when I apply this filter factory, the result is "w, h, e, ..., wh, ...": the sentence is split by character, not by word token.

I set the minGramSize and maxGramSize parameters to 1 and 3. Is NGramFilterFactory working correctly? Should I add other parameters? Are there other filter factories that could help?

Thanks!

asked Nov 14 '22 04:11 by user572485


1 Answer

Facets should only be applied to non-tokenized fields such as strings. If you want results to be displayed for "where are", use no tokenizer at all for that field (or populate an untokenized field with a copyField directive). I guess you want to use facet.prefix for autocompletion. You can do this; look here.
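To illustrate why the tokenized field returns nothing, here is a minimal Python sketch that mimics prefix faceting over indexed terms. It is not Solr code: the `facet_prefix` helper and the sample documents are hypothetical, and the text is lowercased up front to keep case out of the picture.

```python
from collections import Counter

def facet_prefix(indexed_terms_per_doc, prefix):
    """Count documents per indexed term that starts with `prefix`."""
    counts = Counter()
    for terms in indexed_terms_per_doc:
        for term in set(terms):  # count each term once per document
            if term.startswith(prefix):
                counts[term] += 1
    return dict(counts)

# Hypothetical sample documents, already lowercased.
docs = ["where are you", "where are they", "who are you"]

# Whitespace-tokenized field: every word becomes its own indexed term.
tokenized = [doc.split() for doc in docs]

# Untokenized (string-like) field: the whole value is a single indexed term.
untokenized = [[doc] for doc in docs]

tokenized_hits = facet_prefix(tokenized, "where are")
untokenized_hits = facet_prefix(untokenized, "where are")

print(tokenized_hits)    # {}
print(untokenized_hits)  # {'where are you': 1, 'where are they': 1}
```

No single word starts with "where are", so the tokenized field yields no facet values, while the untokenized field matches on the full phrase.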

For the NGramTokenizer, check this out.
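As a rough illustration of the output described in the question, the sketch below generates character n-grams per token, which is what an n-gram filter applied after a whitespace tokenizer does conceptually. The helper names are hypothetical, and the ordering of the grams is not meant to match what Solr emits.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return all character n-grams of `text` with min_gram <= size <= max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams

def ngram_filter(tokens, min_gram, max_gram):
    """Apply char_ngrams to each token independently, like a filter after a tokenizer."""
    out = []
    for token in tokens:
        out.extend(char_ngrams(token, min_gram, max_gram))
    return out

tokens = "where are you".split()     # whitespace tokenization
grams = ngram_filter(tokens, 1, 3)   # minGramSize=1, maxGramSize=3

print(char_ngrams("where", 1, 3))
# ['w', 'h', 'e', 'r', 'e', 'wh', 'he', 'er', 're', 'whe', 'her', 'ere']
print("where are" in grams)          # False
```

Because the grams are built inside each token and never exceed three characters, no indexed term can start with "where are", so n-grams alone do not make the multi-word prefix match.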

answered Dec 21 '22 17:12 by Karussell