
Elasticsearch complex proximity query

Given a query like the one below:

council* W/5 (tip OR tips)

The above query can be translated as: Find anything that has council* and (tip OR tips) no more than 5 words apart.

So the following texts will match:

  • Shellharbour City Council Tip
  • council best tip
  • councils top 10 tips

But this one should not match:

  • ... City Council at Shellharbour. There is not any good tip at all.
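To make the intended semantics concrete, here is a minimal plain-Python sketch of the W/5 rule checked against the examples above. It is not Elasticsearch code: the function name `within_words`, the simple `\w+` tokenizer and the lowercasing are assumptions made for illustration only, and "N words apart" is read as "at most N other words in between".

```python
import re


def within_words(text, prefix, terms, max_gap=5, in_order=False):
    """Return True if a token starting with `prefix` and a token in `terms`
    occur with at most `max_gap` other tokens between them.

    Hypothetical helper for illustration; real analyzers tokenize differently.
    """
    tokens = re.findall(r"\w+", text.lower())
    firsts = [i for i, tok in enumerate(tokens) if tok.startswith(prefix)]
    seconds = [i for i, tok in enumerate(tokens) if tok in terms]
    for a in firsts:
        for b in seconds:
            if a == b:
                continue  # the same token cannot play both roles
            if in_order and b < a:
                continue  # the prefix term must come first
            if abs(b - a) - 1 <= max_gap:
                return True
    return False


examples = [
    "Shellharbour City Council Tip",
    "council best tip",
    "councils top 10 tips",
    "... City Council at Shellharbour. There is not any good tip at all.",
]
for text in examples:
    print(within_words(text, "council", {"tip", "tips"}), "-", text)
```

In the last example seven words sit between "Council" and "tip", which is why it falls outside the limit of 5.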

I need help building an Elasticsearch query for that. I was thinking about a Regex query, but I'm not sure whether there are better alternatives. Thanks

Van Thoai Nguyen asked Mar 04 '14 03:03




1 Answer

You can combine the span_near, span_multi and span_or queries. The query below performs the same search.

{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi":
          {
            "match":
            {
              "prefix": { "text": "council"}
            }
          }
        },
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "text": {
                    "value": "tip"
                  }
                }
              },
              {
                "span_term": {
                  "text": {
                    "value": "tips"
                  }
                }
              }
            ]
          }
        }
      ],
      "slop": 5,
      "in_order": true
    }
  }
}

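The important thing to look out for is span_term, which holds the exact term you're searching for. It is a term-level query and is not analyzed, so the value must match the indexed token (for example, lowercase if the field is lowercased at index time). In this example I only had one field called "text". Slop indicates the maximum number of intervening words we will allow between the terms, and in_order indicates that the order of the words matters. So "tip council" will not match, whereas "council tip" will.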
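If you need the same shape of query for other terms, a small sketch like the one below can generate the request body. The helper name `build_proximity_query` and its parameters are hypothetical, not part of any Elasticsearch client; it only builds a plain dict mirroring the JSON above, which you would then send with whatever client you use.

```python
import json


def build_proximity_query(field, prefix, terms, slop=5, in_order=True):
    """Build a span_near body: a prefix match near any one of `terms`.

    Hypothetical helper; it mirrors the structure of the query shown above.
    """
    return {
        "query": {
            "span_near": {
                "clauses": [
                    # span_multi wraps a multi-term query (here: prefix)
                    {"span_multi": {"match": {"prefix": {field: prefix}}}},
                    # span_or matches if any one of the term clauses matches
                    {
                        "span_or": {
                            "clauses": [
                                {"span_term": {field: {"value": term}}}
                                for term in terms
                            ]
                        }
                    },
                ],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


query = build_proximity_query("text", "council", ["tip", "tips"])
print(json.dumps(query, indent=2))
```

Passing in_order=False should let the terms appear in either order, which may be closer to the original W/5 operator if word order does not matter to you.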

Akshay answered Sep 17 '22 23:09