 

Is it possible to sort by a range in Elasticsearch?

When I execute the following query:

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "my_value": "hi"
          }
        },
        {
          "range": {
            "my_range": {
              "gt": 0,
              "lte": 200
            }
          }
        }
      ]
    }
  },
  "sort": {
    "my_range": {
      "order": "asc",
      "mode": "min"
    }
  }
}

I get the error:

"caused_by": {
  "type": "illegal_argument_exception",
  "reason": "Fielddata is not supported on field [my_range] of type [long_range]"
}

How can I enable a range datatype to be sortable? Is this possible?

Elasticsearch version: 5.4, but I am wondering if this is possible with ANY version.

More context

Not all documents in the alias/index have the range field. However, the query filters to only include documents with that field.

Asked Jul 17 '18 at 19:07 by MirroredFate




2 Answers

It is not straightforward to sort on a field of the range data type. Still, you can use script-based sorting to get the expected result to some extent.

For example, to keep the script simple, I'm assuming that in all your docs the data indexed in the my_range field has only gt and lte bounds, and that you want to sort on the smaller of the two. In that case you can use the following sort:

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "my_value": "hi"
          }
        },
        {
          "range": {
            "my_range": {
              "gt": 0,
              "lte": 200
            }
          }
        }
      ]
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "Math.min(params['_source']['my_range']['gt'], params['_source']['my_range']['lte'])"            
      },
      "order": "asc"
    }
  }
}
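To make it concrete what this sort clause computes, here is a plain-Python simulation (not Elasticsearch itself) of the script's sort key applied to a few hypothetical documents. The `docs` list and the `min_bound_key` name are made up for illustration; like the script, it assumes every document has both gt and lte.

```python
# Plain-Python simulation of the Painless sort key
# Math.min(_source.my_range.gt, _source.my_range.lte).
# The documents below are hypothetical sample data.

def min_bound_key(source):
    """Return the smaller of the gt and lte bounds, as the script does."""
    rng = source["my_range"]
    return min(rng["gt"], rng["lte"])

docs = [
    {"id": "a", "my_value": "hi", "my_range": {"gt": 50, "lte": 150}},
    {"id": "b", "my_value": "hi", "my_range": {"gt": 10, "lte": 200}},
    {"id": "c", "my_value": "hi", "my_range": {"gt": 30, "lte": 40}},
]

# "order": "asc" corresponds to an ascending sort on the key.
ordered = sorted(docs, key=min_bound_key)
print([d["id"] for d in ordered])  # ['b', 'c', 'a']
```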

You can adapt the script to your needs for more complex data that combines lt, gt, lte and gte.

Update (scripts for other use cases):

1. Sort by difference
"Math.abs(params['_source']['my_range']['gt'] - params['_source']['my_range']['lte'])"
2. Sort by gt
"params['_source']['my_range']['gt']"
3. Sort by lte
"params['_source']['my_range']['lte']"
4. Sorting when the query returns some docs that don't have the range field
"if(params['_source']['my_range'] != null) { <sorting logic> } else { return 0; }"

Replace <sorting logic> with the required sorting logic (one of the three above, or the one in the query).

return 0 can be replaced by return -1 or any other number, depending on your sorting needs.
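As a rough illustration of how the fallback value decides where docs without my_range end up, here is a plain-Python sketch of the alternative keys listed above, wrapped in the null check. The helper names (`by_difference`, `by_gt`, `by_lte`, `with_fallback`) and the sample documents are hypothetical; Elasticsearch evaluates the Painless version, not this code.

```python
# Plain-Python versions of the alternative sort keys, wrapped in the
# "my_range is missing" fallback. Sample docs are hypothetical.

def by_difference(rng):
    return abs(rng["gt"] - rng["lte"])

def by_gt(rng):
    return rng["gt"]

def by_lte(rng):
    return rng["lte"]

def with_fallback(logic, missing_value=0):
    """Mimic: if (_source.my_range != null) { logic } else { return missing_value; }"""
    def key(source):
        rng = source.get("my_range")
        return logic(rng) if rng is not None else missing_value
    return key

docs = [
    {"id": "a", "my_range": {"gt": 100, "lte": 300}},
    {"id": "b"},  # no my_range field
    {"id": "c", "my_range": {"gt": 200, "lte": 250}},
]

# With the default fallback of 0, the doc without a range sorts first.
print([d["id"] for d in sorted(docs, key=with_fallback(by_difference))])
# With a large fallback, the same doc sorts last instead.
print([d["id"] for d in sorted(docs, key=with_fallback(by_gt, missing_value=10**9))])
```

The only design decision is the fallback: a value below every real key pushes range-less docs to the top of an ascending sort, a value above every real key pushes them to the bottom.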

Answered Oct 21 '22 at 23:10 by Nishant


I think what you are looking for is to sort on the difference between the range bounds, because I'm not sure that simply sorting on one of the range values would make much sense.

For example, if the range for one document is 100 to 300 and for another is 200 to 600, you would sort on the difference, so that the narrower range (300 - 100 = 200, versus 600 - 200 = 400) appears at the top.
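The arithmetic in that example can be checked with a few lines of plain Python; the document names `doc1` and `doc2` are hypothetical labels for the two ranges from the text.

```python
# Worked example of sorting by range width (upper - lower),
# using the two ranges from the text: 100 to 300 and 200 to 600.

ranges = {"doc1": (100, 300), "doc2": (200, 600)}

widths = {name: upper - lower for name, (lower, upper) in ranges.items()}
print(widths)  # {'doc1': 200, 'doc2': 400}

# Ascending order puts the narrower range first.
print(sorted(ranges, key=lambda name: widths[name]))  # ['doc1', 'doc2']
```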

If so, I've used the Painless script below to implement script-based sorting.

Sorting based on difference in Range

POST <your_index_name>/_search
{  
   "query":{  
      "match_all":{  

      }
   },
   "sort":{  
      "_script":{  
         "type":"number",
         "script":{  
            "lang":"painless",
            "inline":"params._source.my_range.lte-params._source.my_range.gte"
         },
         "order":"asc"
      }
   }
} 

Note that in this case the sort is based only on the difference, not on any individual value of my_range. If you also want to sort on a bound such as lte, lt, gte or gt, you can chain multiple script sorts as below:

Sorting based on difference in Range + Range Field (my_range.lte)

POST <your_index_name>/_search
{  
   "query":{  
      "match_all":{  

      }
   },
   "sort":[  
      {  
         "_script":{  
            "type":"number",
            "script":{  
               "lang":"painless",
               "inline":"params._source.my_range.lte - params._source.my_range.gte"
            },
            "order":"asc"
         }
      },
      {  
         "_script":{  
            "type":"number",
            "script":{  
               "lang":"painless",
               "inline":"params._source.my_range.lte"
            },
            "order":"asc"
         }
      }
   ]
}

So in this case, even if two documents have the same range width, the one with the smaller my_range.lte shows up first.
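One way to picture two chained sort clauses is as a sort on a tuple, which Python compares element by element. The sketch below is a plain-Python analogy under that assumption, with hypothetical sample documents; it is not how Elasticsearch implements multi-field sorting internally.

```python
# Simulation of two chained script sorts: first by width (lte - gte),
# then by lte, both ascending. Sample docs are hypothetical.

docs = [
    {"id": "a", "my_range": {"gte": 300, "lte": 500}},  # width 200
    {"id": "b", "my_range": {"gte": 100, "lte": 300}},  # width 200
    {"id": "c", "my_range": {"gte": 0, "lte": 50}},     # width 50
]

def sort_keys(source):
    rng = source["my_range"]
    # A tuple compares element by element, like successive sort clauses:
    # lte only matters when the widths are equal.
    return (rng["lte"] - rng["gte"], rng["lte"])

print([d["id"] for d in sorted(docs, key=sort_keys)])  # ['c', 'b', 'a']
```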

Sort based on range field

However, if you simply want to sort on one of the range values, you can use the query below.

POST <your_index_name>/_search
{  
   "query":{  
      "match_all":{  

      }
   },
   "sort":{  
      "_script":{  
         "type":"number",
         "script":{  
            "lang":"painless",
            "inline":"params._source.my_range.lte"
         },
         "order":"asc"
      }
   }
}

Updated answer to handle documents without a range

This covers the scenario: sort on the difference in the range, then on my_range.lte or my_range.lt, whichever is present.

Here is what the code below does:

  • Checks whether the document has the my_range field.
  • If it doesn't, it returns Long.MAX_VALUE, so with an ascending sort that document is returned last.
  • Otherwise, it checks whether the document has lte or lt and uses that value as high. The default value of high is Long.MAX_VALUE.
  • Similarly, it checks whether the document has gte or gt and uses that value as low. The default value of low is 0.
  • Finally, it computes high - low, which is the value the sort is applied on.
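The steps above can be mirrored in plain Python to check the resulting order before running the query. This is a sketch with hypothetical helper names (`width_key`, `upper_key`) and sample documents; note that Python integers never overflow, so it ignores the case where the Painless `high - low` on Java longs would wrap around (for example Long.MAX_VALUE minus a negative low).

```python
# Plain-Python mirror of the two Painless sort scripts: docs without
# my_range sort last, and missing bounds fall back to defaults.
# Sample docs are hypothetical.

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE

def width_key(source):
    rng = source.get("my_range")
    if rng is None:
        return LONG_MAX
    high = LONG_MAX
    low = 0
    if rng.get("lte") is not None:
        high = rng["lte"]
    elif rng.get("lt") is not None:
        high = rng["lt"]
    if rng.get("gte") is not None:
        low = rng["gte"]
    elif rng.get("gt") is not None:
        low = rng["gt"]
    return high - low

def upper_key(source):
    rng = source.get("my_range")
    if rng is None:
        return LONG_MAX
    if rng.get("lte") is not None:
        return rng["lte"]
    if rng.get("lt") is not None:
        return rng["lt"]
    return LONG_MAX

docs = [
    {"id": "a", "my_range": {"gt": 0, "lte": 200}},    # width 200, upper 200
    {"id": "b"},                                       # no range: sorts last
    {"id": "c", "my_range": {"gte": 100, "lt": 300}},  # width 200, upper 300
    {"id": "d", "my_range": {"gte": 10, "lte": 60}},   # width 50
]

# Both clauses are ascending, so a tuple key reproduces the chained sort.
ordered = sorted(docs, key=lambda d: (width_key(d), upper_key(d)))
print([d["id"] for d in ordered])  # ['d', 'a', 'c', 'b']
```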

Updated Query

POST <your_index_name>/_search
{  
   "size":100,
   "query":{  
      "match_all":{  

      }
   },
   "sort":[  
      {  
         "_script":{  
            "type":"number",
            "script":{  
               "lang":"painless",
               "inline":""" 
                if (params._source.my_range == null) {
                  return Long.MAX_VALUE;
                }

                long high = Long.MAX_VALUE;
                long low = 0L;

                if (params._source.my_range.lte != null) {
                  high = params._source.my_range.lte;
                } else if (params._source.my_range.lt != null) {
                  high = params._source.my_range.lt;
                }

                if (params._source.my_range.gte != null) {
                  low = params._source.my_range.gte;
                } else if (params._source.my_range.gt != null) {
                  low = params._source.my_range.gt;
                }

                return high - low;
                """
            },
            "order":"asc"
         }
      },
      {  
         "_script":{  
            "type":"number",
            "script":{  
               "lang":"painless",
               "inline":""" 
                if(params._source.my_range==null){ 
                  return Long.MAX_VALUE; 
                } 

                long high = Long.MAX_VALUE; 
                if(params._source.my_range.lte!=null){ 
                  high = params._source.my_range.lte; 
                } else if(params._source.my_range.lt!=null){ 
                  high = params._source.my_range.lt; 
                } 
                  return high;"""
            },
            "order":"asc"
         }
      }
   ]
}
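The triple-quoted strings in the query above are Kibana Console syntax, not valid JSON. If you send the request with another client, the scripts must be ordinary JSON strings. Below is a minimal sketch that builds the same body with Python's standard `json` module so the newlines get escaped; the constant names and the `script_sort` helper are hypothetical, and the body still uses the ES 5.x `inline` key (later versions renamed it to `source`).

```python
import json

# Hypothetical constants holding the two Painless scripts from the query above.
WIDTH_SCRIPT = """
if (params._source.my_range == null) { return Long.MAX_VALUE; }
long high = Long.MAX_VALUE;
long low = 0L;
if (params._source.my_range.lte != null) { high = params._source.my_range.lte; }
else if (params._source.my_range.lt != null) { high = params._source.my_range.lt; }
if (params._source.my_range.gte != null) { low = params._source.my_range.gte; }
else if (params._source.my_range.gt != null) { low = params._source.my_range.gt; }
return high - low;
"""

UPPER_SCRIPT = """
if (params._source.my_range == null) { return Long.MAX_VALUE; }
long high = Long.MAX_VALUE;
if (params._source.my_range.lte != null) { high = params._source.my_range.lte; }
else if (params._source.my_range.lt != null) { high = params._source.my_range.lt; }
return high;
"""

def script_sort(script_source, order="asc"):
    """Build one numeric _script sort clause (ES 5.x "inline" key)."""
    return {
        "_script": {
            "type": "number",
            "script": {"lang": "painless", "inline": script_source},
            "order": order,
        }
    }

body = {
    "size": 100,
    "query": {"match_all": {}},
    "sort": [script_sort(WIDTH_SCRIPT), script_sort(UPPER_SCRIPT)],
}

# json.dumps escapes the script newlines as \n, producing valid JSON
# that can be POSTed to <your_index_name>/_search with any HTTP client.
payload = json.dumps(body)
print(json.dumps(body, indent=2))
```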

This should work with ES 5.4. Hope it helps!

Answered Oct 22 '22 at 00:10 by Kamal