
Elasticsearch array must and must_not

I have documents that look like this in my Elasticsearch DB:

{
   "tags" => [
      "tag-1",
      "tag-2",
      "tag-3",
      "tag-A"
   ],
   "created_at" => "2013-07-02 12:42:19 UTC",
   "label" => "Mon super label"
}

I would like to filter my documents with these criteria: the tags array must contain tag-1, tag-2 and tag-3, but must not contain tag-A.

I tried to use a bool filter, but I can't get it to work!

user2854544 asked Jan 16 '14 17:01



1 Answer

Here is a method that seems to accomplish what you want: http://sense.qbox.io/gist/4dd806936f12a9668d61ce63f39cb2c284512443

First I created an index with an explicit mapping, so that I could set "index": "not_analyzed" on the "tags" property. This means each tag is indexed exactly as given, with no tokenizing or lowercasing, which simplifies the querying process for this example.

curl -XPUT "http://localhost:9200/test_index" -d'
{
    "mappings": {
        "docs" : {
            "properties": {
                "tags" : {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "label" : {
                    "type": "string"
                }
            }
        }
    }
}'
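
To see what "not_analyzed" avoids, you can ask ES how an analyzed field would break up a tag. This is only a sketch: analyze_tag is a helper name made up for this illustration, and it assumes the query-parameter form of the _analyze API found in ES 1.x (later versions expect a JSON body instead). With the standard analyzer, a value like "tag-A" would typically be split at the hyphen and lowercased, so an exact term filter on "tag-A" would not match it.

```shell
# Hypothetical helper (not part of the original answer): print the tokens that
# the "standard" analyzer produces for the tag value passed as the first argument.
# Assumes an ES 1.x node listening on localhost:9200.
analyze_tag() {
  curl -s -XGET "http://localhost:9200/_analyze?analyzer=standard&pretty" -d "$1"
}

# Example call (needs a running node):
#   analyze_tag 'tag-A'
```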

Then I added some docs:

curl -XPUT "http://localhost:9200/test_index/docs/1" -d'
{
    "tags" : [
        "tag-1",
        "tag-2",
        "tag-3",
        "tag-A"
    ],
    "label" : "item 1"
}'
curl -XPUT "http://localhost:9200/test_index/docs/2" -d'
{
    "tags" : [
        "tag-1",
        "tag-2",
        "tag-3"
    ],
    "label" : "item 2"
}'
curl -XPUT "http://localhost:9200/test_index/docs/3" -d'
{
    "tags" : [
        "tag-1",
        "tag-2"
    ],
    "label" : "item 3"
}'

Then we can query using must and must_not clauses in a bool filter as follows:

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "terms": {
                        "tags": [
                           "tag-1",
                           "tag-2",
                           "tag-3"
                        ],
                        "execution" : "and"
                     }
                  }
               ],
               "must_not": [
                  {
                      "term": {
                         "tags": "tag-A"
                      }
                  }
               ]
            }
         }
      }
   }
}'

which yields the correct result:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "docs",
            "_id": "2",
            "_score": 1,
            "_source": {
               "tags": [
                  "tag-1",
                  "tag-2",
                  "tag-3"
               ],
               "label": "item 2"
            }
         }
      ]
   }
}

Notice the "execution" : "and" parameter in the terms filter in the must clause. This means only docs that have all the "tags" specified will be returned (rather than those that match one or more). That may have been what you were missing. You can read more about the options in the ES docs.
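
As a sketch of what "execution" : "and" amounts to, the same condition can also be written with one term filter per required tag inside the must clause, since every must clause has to match. QUERY and search_tags are names made up for this illustration, and the request assumes the same test_index and the ES 1.x filtered query used above.

```shell
# Same filter as above, but spelled out as one "term" filter per required tag.
# Every entry in "must" has to match, so this is an explicit AND over the tags.
QUERY='{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "tag-1" } },
            { "term": { "tags": "tag-2" } },
            { "term": { "tags": "tag-3" } }
          ],
          "must_not": [
            { "term": { "tags": "tag-A" } }
          ]
        }
      }
    }
  }
}'

# Hypothetical helper: POST the body to the test index created earlier.
# Assumes an ES 1.x node listening on localhost:9200.
search_tags() {
  curl -s -XPOST "http://localhost:9200/test_index/_search" -d "$QUERY"
}
```

Calling search_tags against the three docs indexed above should again match only doc 2, because doc 1 carries tag-A and doc 3 lacks tag-3.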

I made a runnable example here that you can play with, if you have ES installed and running at localhost:9200, or you can provide your own endpoint.

Sloan Ahrens answered Sep 19 '22 11:09
