
Exact search in array object type using elasticsearch

I'm looking for a way to do exact array matches in Elasticsearch. Let's say these are my documents:

{"id": 1, "categories" : ["c", "d"]}
{"id": 2, "categories" : ["b", "c", "d"]}
{"id": 3, "categories" : ["c", "d", "e"]}
{"id": 4, "categories" : ["d"]}
{"id": 5, "categories" : ["c", "d"]}

Is there a way to search for all documents that have exactly the categories "c" and "d" (documents 1 and 5), no more and no less?

As a bonus: searching for "one of these" categories should still be possible as well (for example, you could search for "c" and get 1, 2, 3 and 5).

Any clever way to tackle this problem?

Pascal asked Oct 01 '12 15:10


2 Answers

If you have a discrete, known set of categories, you could use a bool query that requires both wanted categories and excludes every other known one:

"bool" : {
    "must" : {
        "terms" : {
            "categories" : ["c", "d"],
            "minimum_should_match" : 2
        }
    },
    "must_not" : {
        "terms" : {
            "categories" : ["a", "b", "e"],
            "minimum_should_match" : 1
        }
    }
}
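As a rough sketch of how the excluded list could be derived instead of typed by hand, the Python below builds such a request body as a plain dict. `KNOWN_CATEGORIES` and `exact_bool_query` are hypothetical names, and the sketch uses one term clause per required category rather than `minimum_should_match`, so it is a variation on the query above and not the same query verbatim.

```python
# Assumption: the complete set of categories is known up front.
KNOWN_CATEGORIES = {"a", "b", "c", "d", "e"}


def exact_bool_query(wanted, known=KNOWN_CATEGORIES, field="categories"):
    """Build a bool query body: every wanted category, no other known one."""
    wanted = set(wanted)
    unknown = wanted - set(known)
    if unknown:
        raise ValueError("categories not in the known set: %s" % sorted(unknown))

    # Everything in the known set that was not asked for must be absent.
    excluded = sorted(set(known) - wanted)

    # One term clause per wanted category, so all of them are required.
    query = {"bool": {"must": [{"term": {field: c}} for c in sorted(wanted)]}}
    if excluded:
        query["bool"]["must_not"] = [{"terms": {field: excluded}}]
    return query


query = exact_bool_query(["c", "d"])
```

This only works while the known set is accurate: a document carrying a category that is missing from `KNOWN_CATEGORIES` would still slip through.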

Otherwise, probably the easiest way to accomplish this is to store another field that serves as a keyword for the whole set of categories.

{"id": 1, "categories" : ["c", "d"], "categorieskey" : "cd"}

Then you could get precisely the results you want with a term query on that field:

{ "term" : { "categorieskey" : "cd" } }

And you could still search non-exclusively:

{ "term" : { "categories" : "c" } }
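For the key field to work, it has to be built the same way at index time and at query time, regardless of the order the categories arrive in. A minimal sketch in Python, assuming a hypothetical helper named `categories_key`; the `|` separator is my own assumption to keep multi-character category names unambiguous, while the "cd" key above corresponds to an empty separator. The field would also need to be indexed without analysis so that the term query sees the whole key as one term.

```python
def categories_key(categories, sep="|"):
    """Return a canonical key for a set of categories.

    Duplicates are dropped and the values are sorted, so ["d", "c"] and
    ["c", "d", "c"] produce the same key.
    """
    unique = sorted(set(categories))
    for category in unique:
        # A separator inside a category name would make keys ambiguous.
        if sep and sep in category:
            raise ValueError("category %r contains the separator" % category)
    return sep.join(unique)


# Index time: store the key next to the original array.
doc = {"id": 1, "categories": ["c", "d"]}
doc["categorieskey"] = categories_key(doc["categories"], sep="")

# Query time: build the key the same way before running the term query.
term_query = {"term": {"categorieskey": categories_key(["d", "c"], sep="")}}
```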

Querying for two categories that must both be present is easy enough, but preventing any other categories from being present is harder. You would probably want to write a query that finds records with both, then apply a filter that eliminates any records with categories other than the ones specified. To my knowledge, it's not really a kind of search that Lucene is designed to handle.

Honestly I'm having a bit of trouble coming up with a good filter to use here. You might need a script filter, or you could filter the results after they have been retrieved.
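Filtering after retrieval could look roughly like the following Python sketch. It assumes the hits have already been fetched (for example with a query that requires both "c" and "d") and are available as plain dicts; `exact_matches` is a hypothetical helper name, and the sample data is taken from the question.

```python
def exact_matches(hits, wanted, field="categories"):
    """Keep only the hits whose category set equals the wanted set exactly."""
    wanted = set(wanted)
    return [doc for doc in hits if set(doc.get(field, [])) == wanted]


# The documents from the question, standing in for retrieved hits.
DOCS = [
    {"id": 1, "categories": ["c", "d"]},
    {"id": 2, "categories": ["b", "c", "d"]},
    {"id": 3, "categories": ["c", "d", "e"]},
    {"id": 4, "categories": ["d"]},
    {"id": 5, "categories": ["c", "d"]},
]

matching_ids = [doc["id"] for doc in exact_matches(DOCS, ["c", "d"])]
```

The trade-off is that hit counts and paging from the search engine no longer line up with the filtered result, so this suits small result sets best.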

femtoRgon answered Oct 31 '22 00:10


I found a solution for our use case that appears to work. It relies on two filters and on knowing how many categories we want to match against: a terms filter to require the values, and a script filter to check the size of the array. In this example, marketBasketList is similar to your categories entry.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "siteId": 4
          }
        },
        {
          "match": {
            "marketBasketList": {
              "query": [
                10,
                11
              ],
              "operator": "and"
            }
          }
        }
      ]
    },
    "boost": 1,
    "filter": {
      "and": {
        "filters": [
          {
            "script": {
              "script": "doc['marketBasketList'].values.length == 2"
            }
          },
          {
            "terms": {
              "marketBasketList": [
                10,
                11
              ],
              "execution": "and"
            }
          }
        ]
      }
    }
  }
}
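
Since the script filter hard-codes the array length, the two filters have to stay in sync with the list of values. One way to keep them together is to generate both from the same list, as in this Python sketch. `exact_array_filters` is a hypothetical helper, and the filter syntax is copied from the example above, so it may need adjusting for other Elasticsearch versions.

```python
def exact_array_filters(field, values):
    """Build the two filters from one list so the length check cannot drift."""
    unique = sorted(set(values))
    # The script compares the stored array length with the number of
    # distinct requested values.
    script = "doc['%s'].values.length == %d" % (field, len(unique))
    return {
        "and": {
            "filters": [
                {"script": {"script": script}},
                {"terms": {field: unique, "execution": "and"}},
            ]
        }
    }


filters = exact_array_filters("marketBasketList", [10, 11])
```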

Lucas Holt answered Oct 31 '22 01:10