Search document with empty array field, on ElasticSearch

I have a set of documents (type 'article') and I want to search for the documents that have elements/objects in an array field:

{
    "_type": "article",
    "_source": {
        "title": "Article 1",
        "locations": [
            {
                "address": "ES headquarter",
                "city": "Berlin"
            }
        ]
    }
}

I want two queries (really one query, with a small variation):

  • get all the articles that have locations
  • get all the articles that have NO locations

I tried different things, but I'm probably not good enough with ElasticSearch:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": [
        {
          "type": {
            "value": "article"
          }
        },
        {
          "bool": {
            "must_not": {
              "missing": {
                "field": "location",
                "existence": true,
                "null_value": true
              }
            }
          }
        }
      ]
    }
  }
}

This doesn't work.

  • How would you fix my query?

but mainly:

  • How would you perform this search for documents with a field that is an empty array?
Kamafeather asked Nov 04 '14 14:11


3 Answers

If address is a mandatory field in the locations array, you can modify your query:

"must_not": {
  "missing": {
    "field": "locations.address"
  }
}

AFAIK, in ES you cannot query non-leaf elements (like your locations field) (see issue), and for object types ES flattens the nested fields (see nested type, object type). That's why I suggested querying one of the leaf elements instead. But this requires that one of them is mandatory (which is unfortunately not the case for you).
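
To make the flattening concrete, here is a rough Python sketch of the idea. It is not Elasticsearch code: the flatten helper and the second sample document are made up for illustration, and real indexing also involves analysis and mappings. It only shows why an object array ends up as leaf paths such as locations.address, and why an empty array leaves nothing behind to query.

```python
def flatten(source, prefix=""):
    """Collect leaf values under dotted paths (hypothetical helper, illustration only)."""
    fields = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        # Treat a single value like a one-element array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Objects contribute only their leaf fields, under "parent.child" paths.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            elif item is not None:
                fields.setdefault(path, []).append(item)
    return fields


# Sample documents: the first mirrors the question, the second is an assumed empty case.
with_locations = {
    "title": "Article 1",
    "locations": [{"address": "ES headquarter", "city": "Berlin"}],
}
without_locations = {"title": "Article 2", "locations": []}

print(flatten(with_locations))
print(flatten(without_locations))
```

In this sketch the first document produces the paths title, locations.address and locations.city, but no plain locations entry, while the second produces only title.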

Anyway, I found a solution that uses _source inside a script filter:

"must_not": {
  "script": {
    "script": "_source.locations.size() > 0"
  }
}

Note that with "lang": "groovy" you should write: "script": "_source.locations.size > 0"

Zoltan Balogh answered Oct 21 '22 20:10


If you don't want to enable scripting, you can combine the Exists Query with a must_not bool query, e.g.:

{
  "query":{
    "bool":{
      "must_not":[
        {
          "exists":{
            "field":"tags"
          }
        }
      ]
    }
  }
}
eggplantkiller answered Oct 21 '22 20:10


As per the Elasticsearch documentation:

An empty array is treated as a missing field — a field with no values.

Let's suppose you have two documents in the article-index index:

# First document
{
    "_type": "article",
    "_source": {
        "title": "Article 1",
        "locations": [{"address": "ES headquarter", "city": "Berlin"}]
    }
}
# Second document
{
    "_type": "article",
    "_source": {
        "title": "Article 2",
        "locations": []
    }
}

Expected queries would be:

  1. Get all the articles that have locations
GET article-index/_search
{
  "query": {
    "exists": {
       "field": "locations"
    }
  }
}
  2. Get all the articles that have NO locations
GET article-index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "locations"
        }
      }
    }
  }
}
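
Since the question asks for one query with a small variation, the two request bodies can be generated from a single helper. This is a minimal Python sketch that only builds the JSON bodies; the locations_query name and its parameters are hypothetical, and sending the body to a cluster is left out.

```python
import json


def locations_query(has_locations, field="locations"):
    """Build a search body for documents with or without values in `field`.

    Hypothetical helper for illustration: it only assembles the JSON body.
    """
    exists = {"exists": {"field": field}}
    if has_locations:
        # Documents where the field has at least one value.
        return {"query": exists}
    # Documents where the field is missing, null, or an empty array.
    return {"query": {"bool": {"must_not": [exists]}}}


# Print the body for "articles that have NO locations".
print(json.dumps(locations_query(False), indent=2))
```

The only difference between the two cases is whether the exists clause is wrapped in must_not.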
Azeem Chauhan answered Oct 21 '22 21:10