 

How to perform full text search on number fields in ElasticSearch?

Question:

Without converting a number field to a string, how can I perform a full text search on it?

I'm trying to mimic the behavior of _all, which dynamically converts a number field to a string when performing a query.

Example.

Setup:

curl -XPUT http://localhost:9200/test/items/1 -d '{"accountId": 12341234, "name": "Bob"}'
curl -XPUT http://localhost:9200/test/items/2 -d '{"accountId": 980987, "name": "Marry"}'
curl -XPUT http://localhost:9200/test/items/3 -d '{"accountId": 234234, "name": "Daniel"}'

Objective:

Find every accountId that contains the digit 4.

What I've done.

I tried these two queries but received 0 hits.

Queries:

curl -XPOST "http://localhost:9200/test/items/_search" -d '{
  "query": {
    "term": {
      "accountId": "4"
    }
  }
}'

curl -XPOST "http://localhost:9200/test/items/_search" -d '{
  "query": {
    "query_string": {
      "query": "4"
    }
  }
}'

Output:

{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}
Larry Battle asked Sep 15 '14 18:09


1 Answer

I suggest using the ngram tokenizer for this.

Here is a sample of what you might need. First, create the index with analysis settings that define the tokenizer you want to use.

curl -XPUT localhost:9200/test?pretty=true -d '{
  "settings":{
    "analysis":{
      "analyzer":{
        "my_ngram_analyzer":{
          "tokenizer":"my_ngram_tokenizer"
        }
      },
      "tokenizer":{
        "my_ngram_tokenizer":{
          "type":"nGram",
          "token_chars":[
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}'

See the Elasticsearch documentation on the NGram tokenizer for more details.
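To see why this helps, here is a rough sketch of the tokens an ngram tokenizer would emit for one accountId. It is a plain shell emulation, not Elasticsearch itself: the `ngrams` helper is hypothetical, and the gram lengths of 1 and 2 are assumed to be the tokenizer defaults, since the settings above do not set min_gram or max_gram.

```shell
# Hypothetical helper (not part of Elasticsearch): print every substring of
# length min..max, which approximates what an ngram tokenizer produces.
# Assumed defaults: min_gram=1, max_gram=2.
ngrams() {
  printf '%s\n' "$1" | awk -v min="${2:-1}" -v max="${3:-2}" '{
    for (i = 1; i <= length($0); i++)
      for (n = min; n <= max && i + n - 1 <= length($0); n++)
        print substr($0, i, n)
  }'
}

# Tokens for Bob's accountId; "4" appears among them as its own token.
ngrams 12341234
```

Because "4" is emitted as a token for 12341234 but never for 980987, a search for 4 can match the first value and not the second.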

Then you should define the following mapping:

curl -XPUT localhost:9200/test/items/_mapping?pretty=true -d '{
  "items":{
    "properties":{
      "accountId":{
        "analyzer":"my_ngram_analyzer",
        "type":"string"
      },
      "name":{
        "type":"string"
      }
    }
  }
}'

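'accountId' is mapped as a 'string' because the ngram tokenizer only works on analyzed string fields, not on numeric fields. Apply this mapping before indexing any documents, since the type of an existing field cannot be changed afterwards.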

Now you can query your index:

curl -XGET localhost:9200/test/_search?pretty=true -d'
{
  "query": {
    "query_string": {
      "default_field": "accountId",
      "query": "4"
    }
  }
}'
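As a sanity check on what this query should return, the sketch below lists which of the sample accountIds contain the searched digit. It runs locally without Elasticsearch, the `matches` helper is hypothetical, and it only models a single-character query; longer queries are themselves analyzed at search time, so real results may be looser than a strict substring match.

```shell
# Hypothetical local check, no Elasticsearch involved: print each accountId
# (arguments 2..n) that contains the digit given as the first argument.
matches() {
  digit="$1"
  shift
  for id in "$@"; do
    case "$id" in
      *"$digit"*) printf '%s\n' "$id" ;;
    esac
  done
}

# Sample accountIds from the question: Bob, Marry, Daniel.
matches 4 12341234 980987 234234
```

With the sample data this prints 12341234 and 234234, so the query above would be expected to return items 1 and 3 but not item 2.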

The bash script I used to test this is available here.

NB: This is just a demo of how you can use the ngram tokenizer. I hope it helps.

eliasah answered Sep 27 '22 17:09