 

Searchkick - trailing special characters

I'm using Searchkick in a Rails 5 app.

In my search_data for the model Part I have string fields that contain dots (.) and hyphens (-). I would like to search those fields literally, with dots and hyphens in the query string. I am using a word_start match.

When my query string looks like this: 66.6 it works OK (it finds all records with queried field starting with 66.6).

However, if a dot (or other special character) is trailing (i.e. 66. or 66- or even 66.---.-.---), it behaves as if the query string were just 66. It seems like anything after the "normal" characters (letters and digits) is being trimmed.

My search looks like this:

Part.search "66.", fields: [:catalogue_number], misspellings: false, match: :word_start

What is the possible solution to this?

EDIT:

Ok, I broke it down and it seems that dots and hyphens are two separate problems.

  1. Dots in the query string seem to behave as described above - if the dot is followed by any "normal" character, search works as expected. However, trailing dots seem to be ignored.
  2. Hyphens in the middle of the query string behave like whitespace - they split the query string into separate terms (which are then combined with the and operator). Trailing hyphens seem to be ignored (like dots).

What I need is for both dots and hyphens to behave literally wherever they are in the query string.

glizda101 asked Jan 17 '17

1 Answer

Searchkick's word_start analyzer uses this Elasticsearch configuration (source here):

searchkick_word_start_index: {
    type: "custom",
    tokenizer: "standard",
    filter: ["lowercase", "asciifolding", "searchkick_edge_ngram"]
}

It uses the standard tokenizer, which splits strings on hyphens and dots (the standard tokenizer has other rules, but they are not relevant to your case) (doc here).
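The splitting behavior described in the question can be approximated in plain Ruby. This is a rough simulation (an assumption for illustration, not the actual Lucene tokenizer code) of how the question's query strings come out of the analysis step: hyphens split tokens, and trailing punctuation is dropped.

```ruby
# Rough approximation of the tokenization behavior observed in the question
# (hypothetical helper, not part of Searchkick or Elasticsearch):
# keep alphanumeric runs, allow inner dots between them, discard the rest.
def approx_standard_tokenize(text)
  text.downcase.scan(/[[:alnum:]]+(?:\.[[:alnum:]]+)*/)
end

approx_standard_tokenize("66.6")         # => ["66.6"]  inner dot survives
approx_standard_tokenize("66.")          # => ["66"]    trailing dot dropped
approx_standard_tokenize("66-6")         # => ["66", "6"]  hyphen splits
approx_standard_tokenize("66.---.-.---") # => ["66"]    trailing run dropped
```

This matches the symptoms in the question: each produced token is indexed (and queried) separately, so a trailing "." or "-" simply vanishes before matching happens.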

You should try Searchkick's text_start match, which uses this configuration:

searchkick_text_start_index: {
    type: "custom",
    tokenizer: "keyword",
    filter: ["lowercase", "asciifolding", "searchkick_edge_ngram"]
}

The Elasticsearch keyword tokenizer preserves the "." and "-" characters and should work for your use case.
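Putting that together, a minimal sketch (assuming a single catalogue_number field; your real search_data likely has more) would index the field for text_start matching and switch the match type in the query:

```ruby
# Sketch only - assumes the Part model from the question.
class Part < ApplicationRecord
  # Index catalogue_number with the text_start analyzer
  # (keyword tokenizer + edge n-grams), so "." and "-" survive.
  searchkick text_start: [:catalogue_number]

  def search_data
    { catalogue_number: catalogue_number }
  end
end

# Query with the matching match type; "66." should now match literally.
Part.search "66.",
  fields: [:catalogue_number],
  misspellings: false,
  match: :text_start
```

Note that after changing the indexing options you need to reindex (Part.reindex) for the new analyzer to take effect.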

NB: I think the fact that 66.6 matches is a fluke, since the standard analyzer also strips the ".".

Pierre Mallet answered Oct 17 '22