How to percolate simple_query_string/query_string query

Tags:

elasticsearch

elasticsearch-percolate

Index:

{
    "settings": {
        "index.percolator.map_unmapped_fields_as_text": true
    },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            }
        }
    }
}

This test percolator query works:

{
    "query": {
        "match": {
            "message": "blah"
        }
    }
}

This query doesn't work:

{
    "query": {
        "simple_query_string": {
            "query": "bl*"
        }
    }
}

Results:

{
    "took": 15,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.13076457,
        "hits": [
            {
                "_index": "my-index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.13076457,
                "_source": {
                    "query": {
                        "match": {
                            "message": "blah"
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

Why doesn't this simple_query_string query match the document?

asked Oct 31 '19 21:10

Rrr

1 Answer

I don't understand what you are asking either. It may be that you do not understand the percolator very well? Here is an example I just tried.

Let's assume you have an index - let's call it test - in which you want to index some documents. This index has the following mapping (just a random test index I have in my test setup):

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

Notice that it has a custom email analyzer that splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, com.
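
One way to check these tokens is the _analyze API. A minimal sketch, assuming the index above was created under the name test:

```console
GET /test/_analyze
{
    "analyzer": "email",
    "text": "foo@bar.com"
}
```

The response lists one entry per emitted token, so it can be compared directly with the tokens listed above.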

As the documentation says, you can create a separate percolator index that holds only your percolator queries and not the documents themselves. Even though the percolator index doesn't contain the documents, it should still hold the mapping of the index that will contain them (test in our case).

This is the mapping of the percolator index (which I called percolator_index). It also has the special analyzer used for splitting the email field:

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            },
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

Its mapping and settings are almost the same as those of my original index. The only difference is the additional query field of type percolator in the mapping.

The query you are interested in, simple_query_string, should go into a document inside percolator_index, like so:

PUT /percolator_index/_doc/1?refresh
{
    "query": {
        "simple_query_string" : {
            "query" : "month foo@bar.com",
            "fields": ["part", "email"]
        }
    }
}

To make it more interesting, I listed the email field explicitly among the fields to search (by default, all fields are searched).

Now the aim is to test a document that should eventually go into the test index against this simple_query_string query from your percolator index. For example:

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
      }
    }
  }
}

What's under document is, obviously, your future document, which does not exist yet. It is matched against the simple_query_string query defined above and results in a match:

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.39324823,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.39324823,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

What if I had percolated this document instead?

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
      }
    }
  }
}

Notice that the email is only foo. This is the result:

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.26152915,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.26152915,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

Notice that the score is a bit lower than for the first percolated document. This is probably because foo (my email) matched only one of the terms produced by analyzing foo@bar.com, while foo@bar.com would have matched all of them (thus giving a better score).

I'm not sure which analyzer you are talking about, though. I think the example above covers the only "analyzer" issue that may be a bit confusing.
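
Applied to the query from the question, the same approach would look roughly like the sketch below. It is untested and rests on several assumptions: that my-index is created fresh, that the percolated document is {"message": "blah"} (the question does not show it), that message is mapped explicitly as text in the percolator index, and that the document ID 2 is free. The likely problem with the original query is that simple_query_string without fields searches whatever fields it finds in the index mapping, and the mapping in the question contains only query, so bl* has no text field to run against. Naming the field explicitly avoids relying on that default.

```console
PUT /my-index
{
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            },
            "message": {
                "type": "text"
            }
        }
    }
}

PUT /my-index/_doc/2?refresh
{
    "query": {
        "simple_query_string": {
            "query": "bl*",
            "fields": ["message"]
        }
    }
}

GET /my-index/_search
{
    "query": {
        "percolate": {
            "field": "query",
            "document": {
                "message": "blah"
            }
        }
    }
}
```

If the assumption about the cause holds, the stored bl* query should be returned as a hit for this document.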

answered Nov 15 '22 07:11

Andrei Stefan
