I'm looking for a way to do exact array matches in Elasticsearch. Let's say these are my documents:
{"id": 1, "categories" : ["c", "d"]}
{"id": 2, "categories" : ["b", "c", "d"]}
{"id": 3, "categories" : ["c", "d", "e"]}
{"id": 4, "categories" : ["d"]}
{"id": 5, "categories" : ["c", "d"]}
Is there a way to search for all documents that have exactly the categories "c" and "d" (documents 1 and 5), no more and no less?
As a bonus: searching for "one of these" categories should still be possible as well (for example, you could search for "c" and get 1, 2, 3 and 5).
Any clever way to tackle this problem?
If you have a discrete, known set of categories, you could use a bool query:
"bool" : {
  "must" : {
    "terms" : {
      "categories" : ["c", "d"],
      "minimum_should_match" : 2
    }
  },
  "must_not" : {
    "terms" : {
      "categories" : ["a", "b", "e"],
      "minimum_should_match" : 1
    }
  }
}
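To avoid writing the `must_not` list by hand, the query body above could be generated from the full category set. This is only a sketch: the helper name `exact_match_bool_query` is hypothetical, it just builds a dictionary shaped like the fragment above, and it does not talk to Elasticsearch or check that your version accepts `minimum_should_match` inside `terms`.

```python
def exact_match_bool_query(field, wanted, all_categories):
    """Build a bool query dict: require every wanted value, forbid all others.

    Hypothetical helper; mirrors the shape of the hand-written query above.
    """
    wanted_set = set(wanted)
    known = set(all_categories)
    unknown = wanted_set - known
    if unknown:
        raise ValueError("not in the known category set: %s" % sorted(unknown))

    required = sorted(wanted_set)
    excluded = sorted(known - wanted_set)  # everything that must be absent

    query = {
        "bool": {
            "must": {
                "terms": {
                    field: required,
                    # every required value has to match
                    "minimum_should_match": len(required),
                }
            }
        }
    }
    if excluded:
        query["bool"]["must_not"] = {
            "terms": {field: excluded, "minimum_should_match": 1}
        }
    return query


query = exact_match_bool_query("categories", ["c", "d"], ["a", "b", "c", "d", "e"])
```

This only works as long as the known category list stays complete; a category missing from it would slip past the `must_not` clause.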
Otherwise, probably the easiest way to accomplish this, I think, is to store another field serving as a categories keyword:
{"id": 1, "categories" : ["c", "d"], "categorieskey" : "cd"}
Something like that. Then you could easily query with a term query for precisely the results you want, like:
"term" : { "categorieskey" : "cd" }
And you could still search non-exclusively, with:
"term" : { "categories" : "c" }
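For the key field to work, it has to come out the same no matter what order the categories arrive in, so it probably wants to be computed at index time from a sorted, de-duplicated list. A minimal sketch, assuming a hypothetical helper `categories_key`; the `|` separator is my assumption (the example above simply concatenates), added so that ["c", "d"] and ["cd"] do not produce the same key:

```python
def categories_key(categories, sep="|"):
    """Return an order-independent key for a list of category strings.

    Hypothetical helper. Pass sep="" to reproduce the plain "cd" form
    used in the example document above.
    """
    unique = sorted(set(categories))
    if sep:
        for category in unique:
            if sep in category:
                raise ValueError("category %r contains the separator" % category)
    return sep.join(unique)


doc = {"id": 1, "categories": ["d", "c"]}
doc["categorieskey"] = categories_key(doc["categories"])  # "c|d"
plain_key = categories_key(["c", "d"], sep="")            # "cd"
```

The same function would be used to build the value for the term query, and the field would presumably need to be indexed as a single untokenized value for the exact match to hold.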
Querying for two categories that must both be present is easy enough, but preventing any other categories from being present is a bit harder. You could probably do it: write a query to find records with both, then apply a filter that eliminates any records with categories other than the ones specified. It's not really the sort of search that Lucene is designed to handle, to my knowledge.
Honestly I'm having a bit of trouble coming up with a good filter to use here. You might need a script filter, or you could filter the results after they have been retrieved.
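The "filter the results after they have been retrieved" option can be sketched on the client side like this. It assumes the hits have already been unpacked into plain dictionaries shaped like the documents in the question; `keep_exact_matches` is a hypothetical name, and the list comprehension building `candidates` only stands in for the "both categories present" query.

```python
def keep_exact_matches(hits, wanted, field="categories"):
    """Keep only hits whose category set equals the wanted set exactly."""
    wanted_set = set(wanted)
    return [hit for hit in hits if set(hit.get(field, [])) == wanted_set]


docs = [
    {"id": 1, "categories": ["c", "d"]},
    {"id": 2, "categories": ["b", "c", "d"]},
    {"id": 3, "categories": ["c", "d", "e"]},
    {"id": 4, "categories": ["d"]},
    {"id": 5, "categories": ["c", "d"]},
]

# Stand-in for the search: every document containing both "c" and "d".
candidates = [d for d in docs if {"c", "d"} <= set(d["categories"])]

exact = keep_exact_matches(candidates, ["c", "d"])
exact_ids = [d["id"] for d in exact]  # [1, 5]
```

The obvious cost is that paging and hit counts from the search no longer line up with what is left after filtering.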
I found a solution for our use case that appears to work. It relies on two filters and on knowing how many categories we want to match against. We use a terms filter and a script filter that checks the size of the array. In this example, marketBasketList is similar to your categories entry.
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "siteId": 4
          }
        },
        {
          "match": {
            "marketBasketList": {
              "query": [
                10,
                11
              ],
              "operator": "and"
            }
          }
        }
      ]
    },
    "boost": 1,
    "filter": {
      "and": {
        "filters": [
          {
            "script": {
              "script": "doc['marketBasketList'].values.length == 2"
            }
          },
          {
            "terms": {
              "marketBasketList": [
                10,
                11
              ],
              "execution": "and"
            }
          }
        ]
      }
    }
  }
}
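Since the length in the script filter and the list in the terms filter have to stay in sync, the filter part could be generated from a single list of values. This is a rough sketch with a hypothetical helper name, `exact_array_filter`; it only reproduces the shape of the filter fragment above as a dictionary and makes no claim about which Elasticsearch versions accept that syntax.

```python
def exact_array_filter(field, values):
    """Build the 'and' filter from the example: size check plus all-terms check.

    Hypothetical helper. Values are de-duplicated first so that the expected
    length matches the number of distinct values being required.
    """
    unique = sorted(set(values))
    size_script = "doc['%s'].values.length == %d" % (field, len(unique))
    return {
        "and": {
            "filters": [
                # the array must contain exactly this many values
                {"script": {"script": size_script}},
                # and every requested value must be present
                {"terms": {field: unique, "execution": "and"}},
            ]
        }
    }


basket_filter = exact_array_filter("marketBasketList", [10, 11])
```

Together the two conditions amount to an exact match: all requested values are present, and there is no room left for any others.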