Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find out result of elasticsearch parsing a query_string?

Is there a way to find out via the elasticsearch API how a query string query is actually parsed? You can do that manually by looking at the lucene query syntax, but it would be really nice if you could look at some representation of the actual results the parser has.

like image 223
Hans-Peter Störr Avatar asked Aug 23 '13 10:08

Hans-Peter Störr


People also ask

How do I retrieve data from Elasticsearch?

You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API's query request body parameter accepts queries written in Query DSL. The following request searches my-index-000001 using a match query. This query matches documents with a user.id value of kimchy .

How do I get all records in Elasticsearch?

Introduction. You can use cURL in a UNIX terminal or Windows command prompt, the Kibana Console UI, or any one of the various low-level clients available to make an API call to get all of the documents in an Elasticsearch index. All of these methods use a variation of the GET request to search the index.

How do I query keyword in Elasticsearch?

Querying Elasticsearch works by matching the queried terms with the terms in the Inverted Index, the terms queried and the one in the Inverted Index must be exactly the same, else it won't get matched.

How do I capture a specific field in Elasticsearch?

There are two recommended methods to retrieve selected fields from a search query: Use the fields option to extract the values of fields present in the index mapping. Use the _source option if you need to access the original data that was passed at index time.


1 Answers

As javanna mentioned in comments there's _validate api. Here's what works on my local elastic (version 1.6):

curl -XGET 'http://localhost:9201/pl/_validate/query?explain&pretty' -d'
{
  "query": {
      "query_string": {
      "query": "a OR (b AND c) OR (d AND NOT(e or f))",
      "default_field": "t"
    }
  }
}
'

pl is name of index on my cluster. Different index could have different analyzers, that's why query validation is executed in a scope of an index.

The result of the above curl is following:

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "pl",
    "valid" : true,
    "explanation" : "filtered(t:a (+t:b +t:c) (+t:d -(t:e t:or t:f)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@ce2d82f1)"
  } ]
}

I made one OR lowercase on purpose and as you can see in explanation, it is interpreted as a token and not as a operator.

As for interpretation of the explanation. Format is similar to +- operators of query string query:

  • ( and ) characters start and end bool query
  • + prefix means clause that will be in must
  • - prefix means clause that will be in must_not
  • no prefix means that it will be in should (with default_operator equal to OR)

So above will be equivalent to following:

{
  "bool" : {
    "should" : [
      {
        "term" : { "t" : "a" }
      },
      {
        "bool": {
          "must": [
            {
              "term" : { "t" : "b" }
            },
            {
              "term" : { "t" : "c" }
            }
          ]
        }
      },
      {
        "bool": {
          "must": {
              "term" : { "t" : "d" }
          },
          "must_not": {
            "bool": {
              "should": [
                {
                  "term" : { "t" : "e" }
                },
                {
                  "term" : { "t" : "or" }
                },
                {
                  "term" : { "t" : "f" }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

I used _validate api quite heavily to debug complex filtered queries with many conditions. It is especially useful if you want to check how analyzer tokenized input like an url or if some filter is cached.

There's also an awesome parameter rewrite that I was not aware of until now, which causes the explanation to be even more detailed showing the actual Lucene query that will be executed.

like image 186
slawek Avatar answered Oct 19 '22 05:10

slawek