
ElasticSearch: How to write query where string field is either null or empty?

I want to check for documents that have media_url == '' || media_url == null. I have a query:

{
    "engagements": [
        "blah"
    ],
    "query": {
        "from": 0,
        "size": 2,
        "sort": [
            {
                "bookmarked": {
                    "order": "desc"
                }
            },
            {
                "created_at": {
                    "order": "desc"
                }
            }
        ],
        "facets": {},
        "query": {
            "filtered": {
                "query": {
                    "match_all": {}
                },
                "filter": {
                    "bool": {
                        "must": [
                            {
                                "term": {
                                    "car_id": "78778"
                                }
                            },
                            {
                                "range": {
                                    "created_at": {
                                        "gte": "2015-04-12T04:00:00.000Z",
                                        "lte": "2015-05-13T03:59:59.999Z"
                                    }
                                }
                            },
                            {
                                "term": {
                                    "media_url": ""
                                }
                            }
                        ],
                        "should": [
                            {
                                "term": {
                                    "bookmarked": false
                                }
                            }
                        ]
                    }
                }
            }
        },
        "aggregations": {
            "word_frequencies": {
                "terms": {
                    "field": "text",
                    "size": 150
                }
            }
        },
        "highlight": {
            "fields": {
                "text": {
                    "fragment_size": 1500
                }
            }
        }
    },
    "api": "_search"
}

However, with the query above, records whose media_url is null are not returned. What should I change to return records whose media_url is either '' or null?

bigpotato asked May 13 '15 20:05



2 Answers

Perhaps you can try the "or" filter: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-or-filter.html

A "term" filter cannot match null, because null values are not indexed, so the null case below uses a "missing" filter instead.

{
  "or": [
    {
      "term": {
        "media_url": ""
      }
    },
    {
      "missing": {
        "field": "media_url"
      }
    }
  ]
}

Edit: Here's the full query (untested since I don't have an example document/index template)

{
    "engagements": [
        "blah"
    ],
    "query": {
      "from": 0,
      "size": 2,
      "sort": [
         {
            "bookmarked": {
               "order": "desc"
            }
         },
         {
            "created_at": {
               "order": "desc"
            }
         }
      ],
      "facets": {},
      "query": {
         "filtered": {
            "query": {
               "match_all": {}
            },
            "filter": {
               "bool": {
                  "must": [
                     {
                        "term": {
                           "car_id": "78778"
                        }
                     },
                     {
                        "range": {
                           "created_at": {
                              "gte": "2015-04-12T04:00:00.000Z",
                              "lte": "2015-05-13T03:59:59.999Z"
                           }
                        }
                     },
                     {
                        "or": [
                           {
                              "term": {
                                 "media_url": ""
                              }
                           },
                           {
                              "missing": {
                                 "field": "media_url"
                              }
                           }
                        ]
                     }
                  ],
                  "should": [
                     {
                        "term": {
                           "bookmarked": false
                        }
                     }
                  ]
               }
            }
         }
      },
      "aggregations": {
         "word_frequencies": {
            "terms": {
               "field": "text",
               "size": 150
            }
         }
      },
      "highlight": {
         "fields": {
            "text": {
               "fragment_size": 1500
            }
         }
      }
   },
   "api": "_search"
}
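
The full query differs from the original only in its third "must" clause. A minimal Python sketch of that substitution on a plain dict follows; replace_term_clause is a hypothetical helper (not part of any Elasticsearch client), and the replacement clause shown is just one possible choice.

```python
import copy


def replace_term_clause(must_clauses, replacement, field="media_url"):
    """Return a new must list in which the top-level term clause on
    `field` is swapped for `replacement`; other clauses are copied as-is."""
    result = []
    for clause in must_clauses:
        if field in clause.get("term", {}):
            result.append(copy.deepcopy(replacement))
        else:
            result.append(copy.deepcopy(clause))
    return result


# The must list from the original question.
must = [
    {"term": {"car_id": "78778"}},
    {"range": {"created_at": {"gte": "2015-04-12T04:00:00.000Z",
                              "lte": "2015-05-13T03:59:59.999Z"}}},
    {"term": {"media_url": ""}},
]

# Assumed replacement: empty string OR null/absent field.
either = {"or": [{"term": {"media_url": ""}},
                 {"missing": {"field": "media_url"}}]}

new_must = replace_term_clause(must, either)
```

The input list is left unmodified, so the original and rewritten queries can be compared side by side.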
phuc77 answered Sep 28 '22 03:09


You can use the missing filter to match documents where the field is null or absent altogether. Combine it with a term filter on the empty string to get what you want.

{
  "or": [
    {
      "term": {
        "media_url": ""
      }
    },
    {
      "missing": {
        "field": "media_url"
      }
    }
  ]
}

Use the above in place of the single term filter on "media_url" in the must clause of your bool filter.
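
Note that the "or", "missing", and "filtered" constructs were removed in Elasticsearch 5.0; there the same condition is usually written as a bool query with should clauses and must_not + exists. Here is a minimal Python sketch that builds both forms as plain dicts. The function names are hypothetical, and it assumes media_url is indexed as an exact-value field (not_analyzed or keyword), since an analyzed empty string typically produces no term for the term clause to match.

```python
import json


def empty_or_missing_legacy(field):
    """Filter for Elasticsearch 1.x: field is "" OR field is null/absent."""
    return {
        "or": [
            {"term": {field: ""}},
            {"missing": {"field": field}},
        ]
    }


def empty_or_missing_modern(field):
    """Equivalent bool query for Elasticsearch 5.0+ (no or/missing filters)."""
    return {
        "bool": {
            "should": [
                {"term": {field: ""}},
                {"bool": {"must_not": {"exists": {"field": field}}}},
            ],
            # At least one of the two should clauses has to match.
            "minimum_should_match": 1,
        }
    }


if __name__ == "__main__":
    # Print the modern form as JSON, ready to paste into a request body.
    print(json.dumps(empty_or_missing_modern("media_url"), indent=2))
```

On 5.0 and later, the modern form would go in the filter list of a top-level bool query, which takes the place of the filtered wrapper.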

Prabin Meitei answered Sep 28 '22 02:09