Index:
{
"settings": {
"index.percolator.map_unmapped_fields_as_text": true
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
This test percolator query works
{
"query": {
"match": {
"message": "blah"
}
}
}
This query doesn't work
{
"query": {
"simple_query_string": {
"query": "bl*"
}
}
}
Results:
{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}
Why doesn't this simple_query_string query match the document?
Percolate queries can be simply thought of as an inverse search. Instead of sending a query to an index and getting the matching documents, you send a document to an index and get the matching queries. This is exactly what most alerting systems need.
The search API allows you to execute a search query and get back search hits that match the query. The query can either be provided using a simple query string as a parameter, or using a request body. As with everything else, Elasticsearch can be searched using HTTP.
I don't understand what you are asking either. It may be that you do not understand percolator very well? This is an example I just tried now.
Let's assume you have an index (let's call it test) in which you want to index some documents. This index has the following mapping (just a random test index I have in my test setup):
{
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}
Notice that it has a custom email analyzer that splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, com.
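To see where those tokens come from, here is a rough Python approximation of the filter chain (pattern_capture with preserve_original, then lowercase, then unique). It is only a sketch, not how Elasticsearch implements it: the function name approximate_email_tokens is made up for illustration, foo@bar.com is an assumed example address, the input is assumed to be a single token as the uax_url_email tokenizer would emit for an email address, and [^\W\d_]+ stands in for \p{L}+ because Python's re module does not support \p{L}.

```python
import re

# Patterns copied from the "email" pattern_capture filter in the mapping above.
# Python's re has no \p{L}, so [^\W\d_]+ is used as a stand-in for "letters".
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def approximate_email_tokens(token: str) -> list[str]:
    """Approximate pattern_capture + lowercase + unique for one input token."""
    # preserve_original: true keeps the whole token as well as the captures.
    candidates = [token]
    for pattern in PATTERNS:
        # Patterns are not anchored, so each one can match several times.
        for match in re.finditer(pattern, token):
            candidates.append(match.group(1))

    seen = set()
    tokens = []
    for candidate in candidates:
        lowered = candidate.lower()          # "lowercase" filter
        if lowered and lowered not in seen:  # "unique" filter
            seen.add(lowered)
            tokens.append(lowered)
    return tokens


print(approximate_email_tokens("foo@bar.com"))
```

To check the real behaviour rather than this approximation, run the same text through the index's analyzer with the _analyze API.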
As the documentation says, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. Even though the percolator index doesn't contain the documents, it must hold the mapping of the index that will hold them (test in our case).
This is the mapping of the percolator index (which I called percolator_index). It also has the special analyzer used for splitting the email field:
{
"settings": {
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": true,
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)",
"([^-@]+)"
]
}
},
"analyzer": {
"email": {
"tokenizer": "uax_url_email",
"filter": [
"email",
"lowercase",
"unique"
]
}
}
}
},
"mappings": {
"properties": {
"query": {
"type": "percolator"
},
"code": {
"type": "long"
},
"date": {
"type": "date"
},
"part": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"val": {
"type": "long"
},
"email": {
"type": "text",
"analyzer": "email"
}
}
}
}
Its mapping and settings are almost the same as those of my original index, the only difference being the additional query field of type percolator added to the mapping.
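If you keep the index definitions in code, the "same mapping plus a percolator field" rule can be expressed as a small helper. This is just a sketch under assumptions: make_percolator_index_body is a hypothetical name, and the source body below is a shortened stand-in for the full test index definition shown earlier.

```python
import copy
import json


def make_percolator_index_body(source_body: dict, query_field: str = "query") -> dict:
    """Copy an index definition and add a field of type percolator to it."""
    body = copy.deepcopy(source_body)  # leave the original definition untouched
    properties = body.setdefault("mappings", {}).setdefault("properties", {})
    if query_field in properties:
        raise ValueError(f"field {query_field!r} already exists in the mapping")
    properties[query_field] = {"type": "percolator"}
    return body


# Shortened stand-in for the "test" index definition (assumption: only two
# fields are kept here to keep the example brief).
test_index_body = {
    "mappings": {
        "properties": {
            "part": {"type": "text"},
            "email": {"type": "text", "analyzer": "email"},
        }
    }
}

percolator_index_body = make_percolator_index_body(test_index_body)
print(json.dumps(percolator_index_body, indent=2))
```

The printed JSON is what you would send as the body when creating percolator_index. The settings block (with the email analyzer) is carried over unchanged by the deep copy when it is present in the source body.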
The query you are interested in (simple_query_string) should go into a document inside percolator_index, like so:
PUT /percolator_index/_doc/1?refresh
{
"query": {
"simple_query_string" : {
"query" : "month foo@bar.com",
"fields": ["part", "email"]
}
}
}
To make it more interesting, I added the email field in there to be specifically searched for in the query (by default, all fields are searched).
Now, the aim is to test a document that should eventually go into the test index against this simple_query_string query from your percolator index. For example:
GET /percolator_index/_search
{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
}
}
}
}
What's under document is your future (not yet existing) document. It will be matched against the simple_query_string query defined above and will result in a match:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.39324823,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.39324823,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}
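The _percolator_document_slot value in the hit refers to the position of the percolated document in the request. With a single document it is always 0, but the percolate query also accepts a documents array, in which case the slot tells you which of the candidates matched. A minimal sketch of building either request body follows; build_percolate_body is a hypothetical helper name, and the sample documents are the two used in this answer, with foo@bar.com as an assumed example address.

```python
import json


def build_percolate_body(documents: list[dict], field: str = "query") -> dict:
    """Build a search body that percolates one or more candidate documents."""
    if not documents:
        raise ValueError("at least one document is required")
    percolate = {"field": field}
    if len(documents) == 1:
        percolate["document"] = documents[0]   # single-document form
    else:
        percolate["documents"] = documents     # multi-document form
    return {"query": {"percolate": percolate}}


candidates = [
    {"date": "2004-07-31T11:57:52.000Z", "part": "month", "code": 109, "val": 0,
     "email": "foo@bar.com"},
    {"date": "2004-07-31T11:57:52.000Z", "part": "month", "code": 109, "val": 0,
     "email": "foo"},
]

body = build_percolate_body(candidates)
print(json.dumps(body, indent=2))
```

The printed JSON would be sent to GET /percolator_index/_search, the same endpoint used above.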
What if I had percolated this document instead:
{
"query": {
"percolate": {
"field": "query",
"document": {
"date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
}
}
}
}
(notice that the email is only foo)
This is the result:
{
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.26152915,
"hits": [
{
"_index": "percolator_index",
"_type": "_doc",
"_id": "1",
"_score": 0.26152915,
"_source": {
"query": {
"simple_query_string": {
"query": "month foo@bar.com",
"fields": [
"part",
"email"
]
}
}
},
"fields": {
"_percolator_document_slot": [
0
]
}
}
]
}
}
Notice that the score is a bit lower than for the first percolated document. This is probably because foo (my email) matched only one of the terms produced by analyzing foo@bar.com, while foo@bar.com would have matched all of them (thus giving a better score).
I'm not sure which analyzer you are talking about, though. I think the example above covers the only "analyzer" issue that may be a bit confusing.