I have an index with multiple fields in it. I want to filter documents based on the presence of a search string in all the fields except one, user_comments. The search query I am running is:
{
  "from": offset,
  "size": limit,
  "_source": [
    "document_title"
  ],
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "query": "#{query}"
            }
          }
        }
      }
    }
  }
}
However, the query string is searched across all the fields, so I also get documents where the string matches only in the user_comments field. I want to run the query against all the fields except user_comments. The whitelist would be very long, and the field names are dynamic, so it is not feasible to list the whitelisted fields with the fields parameter, like this:
"query_string": {
  "query": "#{query}",
  "fields": [
    "document_title",
    "field2"
  ]
}
Can anybody please suggest an idea on how to exclude a field from being searched?
There is a way to make it work; it's not pretty, but it will do the job. You can achieve your goal by using the boost and multi-field type parameters of query_string, a bool query to combine the scores, and min_score:
POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "#{query}",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "#{query}",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}
Let's assume you have the following set of documents:
PUT my-query-string/doc/1
{
  "title": "Prodigy in Bristol",
  "text": "Prodigy in Bristol",
  "comments": "Prodigy in Bristol"
}
PUT my-query-string/doc/2
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Bristol"
}
PUT my-query-string/doc/3
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham and Bristol",
  "comments": "And also in Cardiff"
}
PUT my-query-string/doc/4
{
  "title": "Prodigy in Birmigham",
  "text": "Prodigy in Birmigham",
  "comments": "And also in Cardiff"
}
In your search request you would like to see only documents 1 and 3, but your original query will return 1, 2 and 3.
In Elasticsearch, search results are sorted by relevance _score: the bigger the score, the better.
So let's try to boost down the "comments" field so that its impact on the relevance score is cancelled out. We can do this by combining two queries with a should and using a negative boost:
POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}
This will give us the following output:
{
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      }
    ]
  }
}
Document 2 got penalized, but so did document 1, although it is a desired match for us. Why did that happen?
Here's how Elasticsearch computed _score in this case:
_score = max(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"
Document 1 matches the comments:"Bristol" part, and that also happens to be its best score. According to our formula, the resulting score is 0.
What we would actually like to do is boost the first clause (the one over "all" fields) more when more fields match.
Can we boost query_string matching more fields?
We can: query_string in multi-field mode has a type parameter that does exactly that. The query will look like this:
POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "type": "most_fields",
            "query": "Bristol"
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  }
}
This will give us the following output:
{
  "hits": {
    "total": 3,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "2",
        "_score": 0,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham",
          "comments": "And also in Bristol"
        }
      }
    ]
  }
}
As you can see, the undesired document 2 is at the bottom and has a score of 0. Here's how the score was computed this time:
_score = sum(title:"Bristol", text:"Bristol", comments:"Bristol") - comments:"Bristol"
So the documents matching "Bristol" in any field got selected. The relevance score for comments:"Bristol" got cancelled out, and only documents matching title:"Bristol" or text:"Bristol" got a _score > 0.
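The two formulas above can be replayed on the four sample documents with a few lines of Python. This is only a sketch of the arithmetic, not of how Elasticsearch scores internally: it assumes every field containing "Bristol" gets the same score S (taken from the sample output), whereas real scores depend on field length and term statistics. The names S, DOCS and combined_score are made up for the illustration.

```python
# Assumed per-field score for any field containing "Bristol" (hypothetical
# constant, chosen to mirror the sample output above).
S = 0.2876821

# Which fields of each sample document contain "Bristol".
DOCS = {
    "1": {"title", "text", "comments"},
    "2": {"comments"},
    "3": {"text"},
    "4": set(),
}

FIELDS = ("title", "text", "comments")


def field_scores(matching):
    """Return the assumed score of every field for one document."""
    return {f: (S if f in matching else 0.0) for f in FIELDS}


def combined_score(matching, combine):
    """Score of the first clause (combined with `combine`) minus the comments clause."""
    scores = field_scores(matching)
    return combine(scores.values()) - scores["comments"]


# Default behaviour: the best field wins (max).
best_fields = {doc_id: combined_score(m, max) for doc_id, m in DOCS.items()}
# "type": "most_fields": the per-field scores are added up (sum).
most_fields = {doc_id: combined_score(m, sum) for doc_id, m in DOCS.items()}

for doc_id in DOCS:
    print(doc_id, round(best_fields[doc_id], 7), round(most_fields[doc_id], 7))
```

Under max, document 1 ends up at 0 together with document 2; under sum, document 1 keeps the contribution of title and text (about 2 × S), while document 2 still drops to 0.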
Can we filter out the results with a score of 0?
Yes, we can, using min_score:
POST my-query-string/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "Bristol",
            "type": "most_fields",
            "boost": 1
          }
        },
        {
          "query_string": {
            "fields": [
              "comments"
            ],
            "query": "Bristol",
            "boost": -1
          }
        }
      ]
    }
  },
  "min_score": 0.00001
}
This will work (in our case) since the score of the documents will be 0 if and only if "Bristol" was matched against field "comments" only and didn't match any other field.
The output will be:
{
  "hits": {
    "total": 2,
    "max_score": 0.57536423,
    "hits": [
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "1",
        "_score": 0.57536423,
        "_source": {
          "title": "Prodigy in Bristol",
          "text": "Prodigy in Bristol",
          "comments": "Prodigy in Bristol",
          "discount_percent": 10
        }
      },
      {
        "_index": "my-query-string",
        "_type": "doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Prodigy in Birmigham",
          "text": "Prodigy in Birmigham and Bristol",
          "comments": "And also in Cardiff"
        }
      }
    ]
  }
}
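If the request is assembled in application code rather than by string interpolation, the min_score query above could be built roughly like this. It is a minimal sketch: build_exclusion_query is a hypothetical helper name, and it only produces the JSON body, leaving the HTTP call to whatever client you use. Also note that the approach depends on a negative boost, which, as far as I know, Elasticsearch 7.0 and later no longer accept.

```python
import json


def build_exclusion_query(query, excluded_field, min_score=0.00001):
    """Build the bool/should request body with a negatively boosted clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Searches every field and adds up the per-field scores.
                    {
                        "query_string": {
                            "query": query,
                            "type": "most_fields",
                            "boost": 1,
                        }
                    },
                    # Subtracts the score contributed by the excluded field.
                    {
                        "query_string": {
                            "fields": [excluded_field],
                            "query": query,
                            "boost": -1,
                        }
                    },
                ]
            }
        },
        # Drops the documents whose combined score is (close to) zero.
        "min_score": min_score,
    }


body = build_exclusion_query("Bristol", "comments")
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serialising it with json.dumps also takes care of escaping quotes in the user-supplied query string.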
Is there another way to do this?
Sure. I wouldn't actually advise going with _score tweaking, since it is a pretty complex matter.
I would advise fetching the existing mapping and constructing the list of fields to run the query against beforehand; this will make the code much simpler and more straightforward.
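The mapping-based approach could look roughly like the sketch below. It assumes you have already fetched the mapping (for example with GET my-query-string/_mapping) and extracted its "properties" object; where that object sits in the response differs between Elasticsearch versions, so that step is left out. The helper names, the sample mapping and the set of searchable types are assumptions made for the illustration (older versions use "string" instead of "text"/"keyword").

```python
# Field types treated as searchable with a textual query (an assumption;
# adjust to the types your Elasticsearch version actually uses).
SEARCHABLE_TYPES = {"text", "keyword"}


def collect_text_fields(properties, prefix=""):
    """Recursively list the full paths of text-like fields in a `properties` object."""
    fields = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") in SEARCHABLE_TYPES:
            fields.append(path)
        # Object fields keep their children under a nested `properties` key.
        if "properties" in spec:
            fields.extend(collect_text_fields(spec["properties"], path + "."))
    return fields


def build_query(query, properties, excluded=("user_comments",)):
    """Build a query_string body over every text-like field except the excluded ones."""
    fields = [f for f in collect_text_fields(properties) if f not in excluded]
    return {"query": {"query_string": {"query": query, "fields": fields}}}


# Hypothetical mapping fragment, used only to exercise the helpers.
sample_properties = {
    "document_title": {"type": "text"},
    "user_comments": {"type": "text"},
    "discount_percent": {"type": "integer"},
    "author": {"properties": {"name": {"type": "text"}}},
}

body = build_query("Bristol", sample_properties)
print(body["query"]["query_string"]["fields"])
```

Since the field names are dynamic, the field list would have to be rebuilt (or cached and refreshed) whenever the mapping changes. Numeric fields such as discount_percent are skipped, so they never reach the textual query.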
Originally it was proposed to use this kind of query with exactly the same intent as the solution above:
POST my-query-string/doc/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "fields": ["*", "comments^0"],
              "query": "#{query}"
            }
          }
        }
      }
    }
  },
  "min_score": 0.00001
}
The only problem is that if the index contains any numeric fields, this part:
"fields": ["*"]
raises an error, since a textual query string cannot be applied to a number.
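One thing that may be worth trying for the numeric-field error: query_string has a lenient parameter which, according to the Elasticsearch documentation, ignores format-based errors such as a text value supplied for a numeric field. The sketch below only builds the request body with that flag set; it has not been checked against the index above, so whether it removes the error in your version is an assumption. build_lenient_query is a hypothetical helper name.

```python
import json


def build_lenient_query(query, excluded_field, min_score=0.00001):
    """Build the wildcard-fields query with `lenient` enabled (untested assumption)."""
    return {
        "query": {
            "query_string": {
                # All fields, with the excluded field weighted down to zero.
                "fields": ["*", excluded_field + "^0"],
                "query": query,
                # Ask Elasticsearch to ignore type mismatches, e.g. text vs. number.
                "lenient": True,
            }
        },
        "min_score": min_score,
    }


body = build_lenient_query("Bristol", "comments")
print(json.dumps(body, indent=2))
```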