
Implementing Array.Except(Array2) > 0 query in elasticsearch filter?

Let's say I have following documents indexed:

[
    {
        "Id": 1,
        "Numbers": [1, 2, 3]
    },
    {
        "Id": 2,
        "Numbers": [4, 5]
    }    
]

I have a parameter [1,2,4,5] that lists the numbers I am not allowed to see. I want to find documents whose "Numbers" array contains at least one element NOT in my input array (so in this case only the first document should be returned).
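
Put differently, a document qualifies when the set difference between its "Numbers" and the input array is non-empty. A minimal Python sketch of that semantics, using the two documents above (the function name `has_visible_number` is made up for illustration and is not an Elasticsearch API):

```python
def has_visible_number(numbers, restricted):
    """Return True if at least one element of `numbers` is not in `restricted`."""
    restricted_set = set(restricted)
    return any(n not in restricted_set for n in numbers)


docs = [
    {"Id": 1, "Numbers": [1, 2, 3]},
    {"Id": 2, "Numbers": [4, 5]},
]
restricted = [1, 2, 4, 5]

# Keep only documents that still have a number outside the restricted list.
visible_ids = [d["Id"] for d in docs if has_visible_number(d["Numbers"], restricted)]
print(visible_ids)  # [1]
```

Document 1 survives because 3 is not restricted; document 2 is dropped because both 4 and 5 are.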

The real scenario is finding groups which (or whose child groups) do not contain products belonging to a certain product type. I have recursively indexed the product type ids (represented as numbers in the example), and I want to find groups that contain products whose type is not in my input parameter (an array of product type ids I am not allowed to see).

Which query/filter should I use and how should it be constructed? I have considered the following:

        return desc.Bool(b => b
            .MustNot(mn => mn.Bool(mnb => mnb.Must(mnbm =>
                mnbm.Terms(t => t.ItemGroups, permissions.RestrictedItemGroups)
                && mnbm.Term(t => t.ItemGroupCount, permissions.RestrictedItemGroups.Count())))));

but the problem is that if I have 6 restricted item groups, whereas a given group contains 3 restricted groups, then I won't find any matches because the count won't match. That makes quite a bit of sense now. As a workaround I've implemented Results.Except(Restricted) in C# to filter out restricted groups post-search, but I would love to implement it in Elasticsearch.
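
To make the count mismatch concrete, here is a rough Python model of what the must_not clause checks for a single document. It assumes that ItemGroupCount holds the number of item groups stored on the document, which the snippet above suggests but does not state; the function name is hypothetical.

```python
def excluded_by_attempted_filter(item_groups, item_group_count, restricted):
    """Model of the must_not clause: True means the document is excluded."""
    # Terms part: at least one of the document's item groups is restricted.
    terms_match = any(g in restricted for g in item_groups)
    # Term part: the document's group count equals the size of the restricted list.
    count_match = item_group_count == len(restricted)
    return terms_match and count_match


restricted = [1, 2, 3, 4, 5, 6]       # 6 restricted item groups
group = [1, 2, 3]                     # every item group in this document is restricted

is_excluded = excluded_by_attempted_filter(group, len(group), restricted)
print(is_excluded)  # False
```

The group consists only of restricted item groups, yet it is not excluded, because 3 is compared against 6 rather than against the number of the group's own ids that are restricted.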

Evaldas Raisutis asked Dec 13 '18 14:12



1 Answer

New answer

I'm leaving the older answer below as it might be of use to other people. In your case, you want to exclude the non-matching documents rather than just flag them. The following script query does that and returns what you expect, i.e. only the first document:

POST test/_search
{
  "query": {
    "script": {
      "script": {
        "source": """
          // copy the doc values into a temporary list
          def tmp = new ArrayList(doc.Numbers.values);

          // remove all ids from the params
          tmp.removeIf(n -> params.ids.contains((int)n));

          // return true if the array still contains ids, false if not
          return tmp.size() > 0;
        """,
        "params": {
          "ids": [
            1,
            2,
            4,
            5
          ]
        }
      }
    }
  }
}
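
The triple-quoted script above is Kibana Console syntax and is not valid JSON, so a client outside the console has to send the script as an ordinary string. A minimal Python sketch that builds the same request body for an arbitrary list of restricted ids (the helper name `build_except_query` is made up for illustration; sending the body to the cluster is left out):

```python
import json

# Same Painless logic as the query above, written as a plain one-line string.
PAINLESS_SOURCE = (
    "def tmp = new ArrayList(doc.Numbers.values); "
    "tmp.removeIf(n -> params.ids.contains((int)n)); "
    "return tmp.size() > 0;"
)


def build_except_query(restricted_ids):
    """Build the search body that keeps documents with at least one non-restricted number."""
    return {
        "query": {
            "script": {
                "script": {
                    "source": PAINLESS_SOURCE,
                    "params": {"ids": list(restricted_ids)},
                }
            }
        }
    }


body = build_except_query([1, 2, 4, 5])
print(json.dumps(body, indent=2))
```

Passing the ids through params rather than concatenating them into the script source keeps the script text identical between requests.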

Older answer

One way to solve this is by using a script field which will return true or false depending on your condition:

POST test/_search
{
  "_source": true,
  "script_fields": {
    "not_present": {
      "script": {
        "source": """
          // copy the numbers array
          def tmp = params._source.Numbers;

          // remove all ids from the params
          tmp.removeIf(n -> params.ids.contains(n));

          // return true if the array still contains data, false if not
          return tmp.length > 0;
        """,
        "params": {
          "ids": [ 1, 2, 4, 5 ]
        }
      }
    }
  }
}

The result would look like this:

  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "Id" : 2,
          "Numbers" : [
            4,
            5
          ]
        },
        "fields" : {
          "not_present" : [
            false                           <--- you don't want this doc
          ]
        }
      },
      {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "Id" : 1,
          "Numbers" : [
            1,
            2,
            3
          ]
        },
        "fields" : {
          "not_present" : [
            true                            <--- you want this one, though
          ]
        }
      }
    ]
  }
}
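
Since the script field only flags each hit, the unwanted documents still have to be dropped on the client. A minimal Python sketch of that step, run against a trimmed copy of the response above (the helper name `keep_flagged_hits` is hypothetical):

```python
def keep_flagged_hits(response, field="not_present"):
    """Return only the hits whose script field `field` evaluated to true."""
    hits = response["hits"]["hits"]
    # Script field values come back wrapped in a list, hence the [0].
    return [h for h in hits if h.get("fields", {}).get(field, [False])[0]]


response = {
    "hits": {
        "hits": [
            {"_id": "2", "_source": {"Id": 2, "Numbers": [4, 5]},
             "fields": {"not_present": [False]}},
            {"_id": "1", "_source": {"Id": 1, "Numbers": [1, 2, 3]},
             "fields": {"not_present": [True]}},
        ]
    }
}

kept_ids = [h["_id"] for h in keep_flagged_hits(response)]
print(kept_ids)  # ['1']
```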
Val answered Sep 19 '22 13:09
