 

Case-insensitive fields in Elasticsearch

I am using NEST with Elasticsearch and I am trying to let users type search phrases into a search box. Everything works fine apart from the fact that, when users enter a search phrase, they need to make sure the field name has the same case as the field name in Elasticsearch.

For example, one of my fields is called bookTitle. If they search as below, it works:

bookTitle:"A Tale of Two Cities"

If they search as in either of the examples below, it does not work:

booktitle:"A Tale of Two Cities"

BookTitle:"A Tale of Two Cities"

The code I am using to search is below. Does anyone have any ideas on how I can fix this? I was hoping there is an Elasticsearch/NEST setting that allows me to do this, as opposed to doing something ugly with the search text like finding "BookTitle" and replacing it with "bookTitle".

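    public List<ElasticSearchRecord> Search(string searchterm)
    {
        // query_string parses the user's text, including any "field:value" prefixes;
        // terms without a field prefix are searched in the "content" field.
        var results = _client.Search<ElasticSearchRecord>(s => s
            .Query(q => q
                .QueryString(qs => qs
                    .DefaultField("content")
                    .Query(searchterm)
                )
            ));

        return results.Documents.ToList();
    }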

Any help greatly appreciated.

Jim Culverwell asked Oct 03 '15 at 04:10



3 Answers

What you want is not possible with Elasticsearch. You are in control of the mapping: you define the field names, and you control the queries.

Given that, you need to watch what your users type in the search field; out of the box, Elasticsearch will not lowercase field names or anything like that for you.

So whatever solution you choose will be a workaround.

My suggestion is to define a set of rules and communicate them to your users, something along these lines:

  • your field names are all lowercase, or all camel case
  • you define the mapping as strict ("dynamic": "strict") so that you are in full control of it
  • you tell your users (in the web interface or UI) that field names in their searches must follow those rules (lowercase only, camel case only, etc.)
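A minimal sketch of how the lowercase-only rule could be checked in C# before the query is sent, so the UI can point out exactly which field names break it. `SearchRules` and `FieldsBreakingLowercaseRule` are hypothetical names, and the regular expression is an assumption: it only recognizes simple `field:` prefixes outside quoted phrases, not the full query_string syntax.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of NEST or Elasticsearch.
public static class SearchRules
{
    // Group "quoted" consumes quoted phrases so colons inside them are ignored;
    // group "field" captures a whole word that is followed by ':' (a field prefix).
    private static readonly Regex FieldPrefix = new Regex(
        @"(?<quoted>""(?:\\.|[^""\\])*"")|(?<![\w.])(?<field>[A-Za-z_][\w.]*)(?=\s*:)");

    // Returns the field names in the search text that are not entirely lowercase.
    public static List<string> FieldsBreakingLowercaseRule(string searchTerm)
    {
        return FieldPrefix.Matches(searchTerm)
            .Cast<Match>()
            .Where(m => m.Groups["field"].Success)
            .Select(m => m.Groups["field"].Value)
            .Where(name => name != name.ToLowerInvariant())
            .ToList();
    }
}
```

For example, `SearchRules.FieldsBreakingLowercaseRule("BookTitle:\"A Tale of Two Cities\" author:dickens")` returns a list containing only `BookTitle`, which the UI could show back to the user instead of silently returning no hits.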

Another approach is to define what goes into the _all field. Then, in your QueryString query, don't use specific field names and leave query_string at its default settings, so ES searches _all, a field whose contents you control.

Just for the sake of mentioning it (I don't recommend it by any means): I think you could use a Groovy script to do whatever you want with the field name. But that means you would not be using the real power of Elasticsearch.

Educate your users and define a set of rules to stick to, as I mentioned above.

Andrei Stefan answered Oct 04 '22 at 20:10



You could cache the mapping in memory in C# and check that every field in the search text exists in it. If there is no exact match, try to find the best-matching field(s); if there are multiple candidates, throw an error and ask the user to be more specific.

In fact, the UI could do this on the fly as the user types and help them choose the right field.
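A minimal C# sketch of this idea, assuming the field names have already been read from the mapping into a list (that loading step is not shown). `FieldNameResolver` is a hypothetical class, and like any regex-based approach it only understands simple `field:value` prefixes, not the full query_string grammar. "Best match" here is simplified to a case-insensitive comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: resolves user-typed field names against cached mapping field names.
public class FieldNameResolver
{
    // Group "quoted" consumes quoted phrases so colons inside them are ignored;
    // group "field" captures a whole word that is followed by ':' (a field prefix).
    private static readonly Regex FieldPrefix = new Regex(
        @"(?<quoted>""(?:\\.|[^""\\])*"")|(?<![\w.])(?<field>[A-Za-z_][\w.]*)(?=\s*:)");

    private readonly HashSet<string> _knownFields;

    public FieldNameResolver(IEnumerable<string> knownFields)
    {
        _knownFields = new HashSet<string>(knownFields, StringComparer.Ordinal);
    }

    // Returns the mapped spelling of a field name, or throws if it is unknown or ambiguous.
    public string Resolve(string typed)
    {
        if (_knownFields.Contains(typed))
            return typed; // exact match wins

        var candidates = _knownFields
            .Where(f => string.Equals(f, typed, StringComparison.OrdinalIgnoreCase))
            .ToList();

        if (candidates.Count == 1)
            return candidates[0];
        if (candidates.Count == 0)
            throw new ArgumentException($"Unknown field '{typed}'.");
        throw new ArgumentException(
            $"Field '{typed}' is ambiguous; did you mean one of: {string.Join(", ", candidates)}?");
    }

    // Rewrites every field prefix in the search text to its mapped spelling.
    public string Rewrite(string searchTerm)
    {
        return FieldPrefix.Replace(searchTerm, m =>
            m.Groups["field"].Success ? Resolve(m.Groups["field"].Value) : m.Value);
    }
}
```

With fields `bookTitle` and `content`, `Rewrite("BookTitle:\"A Tale of Two Cities\"")` returns `bookTitle:"A Tale of Two Cities"`, which can then be passed to `.Query(...)` unchanged. If the mapping contained both `bookTitle` and `BookTitle`, typing `booktitle` would throw rather than guess, which is where the UI would ask the user to pick one.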

NikoNyrh answered Oct 04 '22 at 22:10



I haven't tested this for production use yet, but in theory you can index all of your objects with a lowercase naming strategy and call .ToLower() on the queried field names so they always match.

Begin by creating the appropriate naming strategy:

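using Newtonsoft.Json.Serialization;

// Converts every property name to lowercase when documents are serialized.
public class LowercaseNamingStrategy : NamingStrategy
{
    protected override string ResolvePropertyName(string name)
    {
        // ToLowerInvariant avoids culture-specific casing rules (e.g. the Turkish 'I').
        return name.ToLowerInvariant();
    }
}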

And the appropriate serializer:

public class ElasticSerializer : JsonNetSerializer
{
    public ElasticSerializer(IConnectionSettingsValues settings)
        : base(settings)
    {
        // Apply the lowercase naming strategy to all serialized property names.
        this.ContractResolver.NamingStrategy = new LowercaseNamingStrategy();
    }
}

Then use the serializer in the connection settings of your NEST client:

var pool = new StaticConnectionPool([your nodes]);
var settings = new ConnectionSettings(pool, s => new ElasticSerializer(s));
var client = new ElasticClient(settings);

This will store your field names in lowercase. Then, when you query, force all user-provided field names to lowercase and they should line up.
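A minimal sketch of the query-side half in C#: lowercase only the field names in the user's text and leave the search values untouched. `QueryFieldLowercaser` is a hypothetical name, the regex is an assumption that covers simple `field:` prefixes outside quoted phrases, and it uses `ToLowerInvariant()`; whichever casing call you use here should match the one in the naming strategy, or non-ASCII field names may not line up.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of NEST.
public static class QueryFieldLowercaser
{
    // Group "quoted" consumes quoted phrases so colons inside them are ignored;
    // group "field" captures a whole word that is followed by ':' (a field prefix).
    private static readonly Regex FieldPrefix = new Regex(
        @"(?<quoted>""(?:\\.|[^""\\])*"")|(?<![\w.])(?<field>[A-Za-z_][\w.]*)(?=\s*:)");

    // Lowercases field names only; quoted phrases and other terms keep their case.
    public static string LowercaseFieldNames(string searchTerm)
    {
        return FieldPrefix.Replace(searchTerm, m =>
            m.Groups["field"].Success ? m.Value.ToLowerInvariant() : m.Value);
    }
}
```

For example, `LowercaseFieldNames("BookTitle:\"A Tale of Two Cities\" AND Author:Dickens")` returns `booktitle:"A Tale of Two Cities" AND author:Dickens`; the result is what you would pass to `.Query(...)`.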

If you're not starting from scratch, you'll have to repopulate your data to keep the naming strategy unified.

Chaos answered Oct 04 '22 at 20:10
