
How do I create an "OR" filter using elasticsearch-dsl-py?

The query below is what I would like to construct using elasticsearch-dsl-py, but I do not know how to do it.

GET /my_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "status": "published"
              }
            },
            {
              "or": {
                "filters": [
                  {
                    "range": {
                      "start_publication": {
                        "lte": "2015-02-17T03:45:00.245012+00:00"
                      }
                    }
                  },
                  {
                    "missing": {
                      "field": "start_publication"
                    }
                  }
                ]
              }
            },
            {
              "or":{
                "filters": [
                  {
                    "range": {
                      "end_publication": {
                        "gte": "2015-02-17T03:45:00.245012+00:00"
                      }
                    }
                  },
                  {
                    "missing": {
                      "field": "end_publication"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Using elasticsearch-dsl-py, this is as close as I can get, but it is not the same: the '|' operator turns into 'should' clauses instead of an 'or' filter.

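    from django.utils import timezone  # assumed source of timezone.now()
    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, F  # F exists only in elasticsearch-dsl before 2.0

    PUBLISHED = "published"  # assumed value, matching the target query above

    client = Elasticsearch()
    now = timezone.now()

    s = Search(using=client,
               index="my_index"
        ).filter(
            "term", status=PUBLISHED
        ).filter(
            # '|' combines the two filters, but serializes as bool/should
            F("range", start_publication={"lte": now}) |
            F("missing", field="start_publication")
        ).filter(
            F("range", end_publication={"gte": now}) |
            F("missing", field="end_publication")
        )
    response = s.execute()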
Joost VanDorp asked Feb 17 '15 14:02




2 Answers

With Elasticsearch 2.x (and elasticsearch-dsl > 2.x) you can't apply filters as in @theslow1's comment anymore. Instead you have to construct your filter by combining Qs:

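from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

esclient = Elasticsearch()  # assumed: client with default connection settings
search = Search(using=esclient, index="myIndex")
firstFilter = Q("match", color='blue') & Q("match", status='published')
secondFilter = Q("match", color='yellow') & Q("match", author='John Doe')
combinedFilter = firstFilter | secondFilter  # OR of the two AND groups
search = search.query('bool', filter=[combinedFilter])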

The search.query('bool', filter=[combinedFilter]) call applies the combined Q criteria as a filter, as described in the elasticsearch-dsl documentation.
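Applied to the query from the question, a filter built this way would be expected to serialize to a bool query roughly like the one below. This is only a sketch: it builds the request body as a plain dict instead of going through elasticsearch-dsl, the helper names `range_or_missing` and `publication_window_query` are made up for illustration, and using `must_not` + `exists` in place of the old `missing` filter is an assumption about the target Elasticsearch version.

```python
import json


def range_or_missing(field, op, value):
    """OR of a range clause and 'field has no value', written as bool/should."""
    return {
        "bool": {
            "should": [
                {"range": {field: {op: value}}},
                # Assumption: must_not + exists stands in for the "missing" filter.
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
            ],
            # At least one of the two alternatives has to match.
            "minimum_should_match": 1,
        }
    }


def publication_window_query(now_iso, status="published"):
    """Build the request body with every clause in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": status}},
                    range_or_missing("start_publication", "lte", now_iso),
                    range_or_missing("end_publication", "gte", now_iso),
                ]
            }
        }
    }


body = publication_window_query("2015-02-17T03:45:00.245012+00:00")
print(json.dumps(body, indent=2))
```

Each 'should' pair sits inside its own bool, so the three top-level filter clauses are still ANDed together, which matches the intent of the 'or' filters in the question.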

Michael answered Oct 10 '22 05:10


Solution:

s = Search(using=client,
           index="my_index"
    ).filter(
        "term", status=PUBLISHED
    ).filter(
        "or", [F("range", start_publication={"lte": now}, ),
               F("missing", field="start_publication")]
    ).filter(
        "or", [F("range", end_publication={"gte": now}, ),
               F("missing", field="end_publication")]
    )

Which turns into:

{  
   "query":{  
      "filtered":{  
         "filter":{  
            "bool":{  
               "must":[  
                  {  
                     "term":{  
                        "status":"published"
                     }
                  },
                  {  
                     "or":{  
                        "filters":[  
                           {  
                              "range":{  
                                 "start_publication":{  
                                    "lte":"2015-02-17T03:45:00.245012+00:00"
                                 }
                              }
                           },
                           {  
                              "missing":{  
                                 "field":"start_publication"
                              }
                           }
                        ]
                     }
                  },
                  {  
                     "or":{  
                        "filters":[  
                           {  
                              "range":{  
                                 "end_publication":{  
                                    "gte":"2015-02-17T03:45:00.245012+00:00"
                                 }
                              }
                           },
                           {  
                              "missing":{  
                                 "field":"end_publication"
                              }
                           }
                        ]
                     }
                  }
               ]
            }
         },
         "query":{  
            "match_all":{  

            }
         }
      }
   }
}

Hopefully this can be included in the elasticsearch-dsl-py documentation in the future.
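To double-check what the filter is meant to select, the same logic can be mirrored in plain Python. This is only an illustrative sketch, not part of elasticsearch-dsl-py: `is_visible` is a hypothetical helper, documents are modelled as dicts, and a field that is absent or None stands in for the "missing" filter.

```python
import datetime as dt


def is_visible(doc, now):
    """Return True if doc would pass the term filter and both 'or' filters."""
    start = doc.get("start_publication")  # None stands in for "missing"
    end = doc.get("end_publication")
    return (
        doc.get("status") == "published"
        and (start is None or start <= now)
        and (end is None or end >= now)
    )


now = dt.datetime(2015, 2, 17, 3, 45)
hour = dt.timedelta(hours=1)
docs = {
    "open-ended": {"status": "published"},
    "in-window": {"status": "published",
                  "start_publication": now - hour,
                  "end_publication": now + hour},
    "not-started": {"status": "published", "start_publication": now + hour},
    "expired": {"status": "published", "end_publication": now - hour},
    "draft": {"status": "draft"},
}
visible = [name for name, doc in docs.items() if is_visible(doc, now)]
print(visible)  # ['open-ended', 'in-window']
```

A document with no publication dates at all counts as visible, which is exactly why each range needs its "missing" alternative.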

Joost VanDorp answered Oct 10 '22 06:10