
Preserving order of terms in ElasticSearch query

Is it possible in ElasticSearch to form a query that would preserve the ordering of the terms?

A simple example would be having these documents indexed using standard analyzer:

  1. You know for search
  2. You know search
  3. Know search for you

I could query for +you +search, and this would return all three documents, including the third one.

What if I wanted to only retrieve the documents which have the terms in this specific order? Can I form a query that would do that for me?

Phrases can already be matched in order simply by quoting the text: "you know" retrieves the 1st and 2nd documents. So it feels like there should also be a way to preserve the order of multiple terms that aren't adjacent.

In the simple example above I could use proximity searches, but that doesn't cover more complex cases.

asked Oct 29 '14 15:10 by Artur



2 Answers

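You could use a span_near query; it has an in_order parameter. In the example below, slop is the maximum number of unmatched positions allowed between the clauses, and the key inside each span_term (here field) is the name of the field to search: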

{
    "query": {
        "span_near": {
            "clauses": [
                {
                    "span_term": {
                        "field": "you"
                    }
                },
                {
                    "span_term": {
                        "field": "search"
                    }
                }
            ],
            "slop": 2,
            "in_order": true
        }
    }
}
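To see why this query excludes the third document from the question, here is a minimal Python sketch that imitates the in-order, slop-limited matching on token positions. It is only an illustration, not Elasticsearch code: the helper names tokenize and span_near_in_order are hypothetical, and splitting on whitespace is a rough stand-in for the standard analyzer.

```python
def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return text.lower().split()


def span_near_in_order(tokens, first, second, slop):
    """Return True if `first` occurs before `second` with at most `slop`
    unmatched positions between them (hypothetical helper)."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        0 <= q - p - 1 <= slop
        for p in first_positions
        for q in second_positions
    )


docs = [
    "You know for search",
    "You know search",
    "Know search for you",
]

matches = [
    d for d in docs
    if span_near_in_order(tokenize(d), "you", "search", slop=2)
]
print(matches)
```

Under these assumptions only the first two documents are kept; the third has "search" before "you", so no in-order pair exists regardless of slop.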
answered Nov 15 '22 15:11 by Dan Tuffery


Phrase matching with slop doesn't guarantee order ;-). If you specify a large enough slop (2, for example), "hello world" will also match "world hello". This is not necessarily a bad thing, because results are usually more relevant when the two terms are close to each other, whatever their order. And I don't think the authors of this feature had in mind matching words that are 1000 positions apart.
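As a rough illustration of why a slop of 2 is enough to match the reversed phrase, the sketch below uses a simplified model of sloppy phrase matching: shift each term's document position by its offset in the phrase and take the spread of the shifted values. This is an assumption-laden approximation, not the actual Lucene implementation; the function name required_slop is hypothetical and each term is assumed to occur at most once in the document.

```python
def required_slop(doc_tokens, phrase_tokens):
    """Smallest slop at which the phrase would match under this simplified
    model, or None if a phrase term is missing from the document."""
    shifts = []
    for offset, term in enumerate(phrase_tokens):
        if term not in doc_tokens:
            return None
        # Position in the document, shifted back by the term's phrase offset.
        shifts.append(doc_tokens.index(term) - offset)
    return max(shifts) - min(shifts)


phrase = "hello world".split()
exact = required_slop("hello world".split(), phrase)
reversed_order = required_slop("world hello".split(), phrase)
far_apart = required_slop(
    "hello term1 term2 term3 term4 world".split(), phrase
)
print(exact, reversed_order, far_apart)
```

In this model the exact phrase needs no slop, the reversed phrase needs 2, and the document with four intervening terms needs 4, which is why a slop of 5 accepts all three example titles.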

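The one solution I could find that keeps the order is not simple: using scripts. The example uses the filtered query and script access to term positions via _index, as available in Elasticsearch 1.x. Here's one example: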

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "hello world" }
{ "index": { "_id": 2 }}
{ "title": "world hello" }
{ "index": { "_id": 3 }}
{ "title": "hello term1 term2 term3 term4 world" }

POST my_index/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "title": {
            "query": "hello world",
            "slop": 5,
            "type": "phrase"
          }
        }
      },
      "filter": {
        "script": {
          "script": "term1Pos=0;term2Pos=0;term1Info = _index['title'].get('hello',_POSITIONS);term2Info = _index['title'].get('world',_POSITIONS); for(pos in term1Info){term1Pos=pos.position;}; for(pos in term2Info){term2Pos=pos.position;}; return term1Pos<term2Pos;",
          "params": {}
        }
      }
    }
  }
}

For readability, here is the same script again with indentation:

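// Default both positions to 0 (the value kept when a term is absent).
term1Pos = 0;
term2Pos = 0;
// Look up the recorded positions of each term in the 'title' field.
term1Info = _index['title'].get('hello',_POSITIONS);
term2Info = _index['title'].get('world',_POSITIONS);
// Each loop overwrites its variable, so the last position iterated is kept.
for(pos in term1Info) {
  term1Pos = pos.position;
}
for(pos in term2Info) {
  term2Pos = pos.position;
}
// Keep the document only if 'hello' comes before 'world'.
return term1Pos < term2Pos;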

The query above searches for "hello world" with a slop of 5, which matches all three of the documents above. The scripted filter then keeps only documents in which the position of "hello" is lower than the position of "world". That way, however large a slop we set in the query, the position check enforces the order.
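The effect of the position check on the three example titles can be sketched in plain Python. This is only an emulation of the script's logic under simple assumptions (whitespace tokenization, positions counted from 0); the names last_position and in_order are hypothetical and nothing here talks to Elasticsearch.

```python
def last_position(tokens, term):
    # Mirrors the script: start at 0 and keep overwriting, so the last
    # occurrence wins and an absent term is left at 0.
    pos = 0
    for i, t in enumerate(tokens):
        if t == term:
            pos = i
    return pos


def in_order(title, first="hello", second="world"):
    tokens = title.lower().split()
    return last_position(tokens, first) < last_position(tokens, second)


titles = {
    1: "hello world",
    2: "world hello",
    3: "hello term1 term2 term3 term4 world",
}

kept = [doc_id for doc_id, title in titles.items() if in_order(title)]
print(kept)
```

Documents 1 and 3 pass the check and document 2 is filtered out. Because only the last occurrence of each term is compared, a title such as "world hello world" would also pass, so repeated terms may need a stricter comparison.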

This is the section in the documentation that sheds some light on the things used in the script above.

answered Nov 15 '22 17:11 by Andrei Stefan