Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ElasticSearch post_filter and filtered aggregations not behaving the same way

I've been spending a whole week on this with no hope of solving it. I am following this (quite old) article on e-commerce search and faceted filtering, etc., and it's working good so far (the search results are great and the aggregations work great when the filters are applied IN the query. I am using ElasticSearch 6.1.1.

But because I want to allow my users to perform multiple selections on the facets, I've moved the filters into the post_filter section. This still works well, it's filtering the results correctly and shows the aggregation counts for the whole document set accurately.

After reading this question on StackOverflow, I've realised that I have to perform some crazy acrobatics with 'filtered' aggregations alongside 'special' aggregations to reciprocally prune the aggregations in order to show correct counts AND allow multiple filters with them at the same time. I've asked for some clarification on that question but no response yet (it is an old question).

The problem I've been struggling with for so long is to get a set of filtered aggregations on nested fields where ALL the facets are filtered with all the filters.

My plan is to use the general aggregations (unfiltered) and keep the selected facet aggregations unfiltered (so that I can select multiple entries) but filter all the OTHER aggregations with the currently selected facets, so that I can only display the filters I can still apply.

However, if I use THE SAME filter on the documents (which work fine), and put the filter in the filtered aggregations, they do not work as desired. The counts are all wrong. I am aware that aggregations are computed before the filters, which is why I am replicating the filters on the aggregations I want.

Here is my query:

  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "some book"
          }
        }
      ]
    }
  }

Nothing special here, it works great and returns relevant results.

And here is my filter (in post_filter):

"post_filter" : {
    "bool" : {
      "must" : [
      {
        "nested": {
          "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : 
                [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }

      ]
    }
  }

Let me stress: this works FINE. I am seeing the correct results ( in this case '13' results are shown, all matching the correct field - 'Cover colour' = 'Green' ).

Here are my general (unfiltered aggregations), that return all of the facets with correct count for all the products:

    "agg_string_facets": {
  "nested": {
    "path": "string_facets"
  },
  "aggregations": {
      "facet_name": {
        "terms": {
          "field": "string_facets.facet_name"
        },
        "aggregations": {
          "facet_value": {
            "terms": {
              "field": "string_facets.facet_value"
            }
          }
        }
      }
  }
}

This too works perfectly! I am seeing all the aggregations with accurate facet counts for all documents matching my query.

Now, check this out: I am creating an aggregation for the same nested fields but filtered so that I can get the aggregations + facets that 'survive' my filter:

"agg_all_facets_filtered" : {

           "filter" : {
             "bool" : {
               "must" : [
                {
                   "nested": {
                     "path": "string_facets",
                     "query": {
                       "bool" : {
                         "filter" : [
                           { "term" : { "string_facets.facet_name" : "Cover colour" } },
                           { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                          ]
                       }
                    }
                  }
              }]
            }
        },
        "aggs" : {
         "agg_all_facets_filtered" : {
           "nested": { "path": "string_facets" },
           "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                    "facet_value": {
                      "terms": { "field": "string_facets.facet_value" }
                    }
                  }
                }
            }  
         }

       }

Please note the filter I am using on this aggregation is the same as the one that filters my results in the first place (in post).

But for some reason, the aggregations returned are all wrong, namely the facet counts. For example, in my search here, I am getting 13 results, but the aggregation returned from 'agg_all_facets_filtered' only has a count of: 'Cover colour' = 4.

{
  "key": "Cover colour",
  "doc_count": 4,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
          "key": "Green",
          "doc_count": 4
        }
    ]
  }
}

After checking why 4, I noticed that 3 of the documents contain the facet 'Cover colour' twice: once for 'Green' and once for 'Some other colours'... so it seems my aggregations are only counting the entries that have that facet name TWICE - or have it in common with other documents. This is why I think my filter on the aggregation is wrong. I've done a lot of reading on the AND vs OR of the matching/filters, I tried with 'Filter', 'Should', etc. Nothing fixes this.

I am sorry this is was long question but:

HOW can I write the aggregation filter so that the facets returned have the correct counts, given the fact that my filter works perfectly on its own?

Thanks all very much.

UPDATE: Following request for example, here is my full query (please note the filters in post_filter as well as the same filter in the filtered aggregations):

{
  "size" : 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "bible"
          }
        }
      ]
    }
  },

  "post_filter" : {

    "bool" : {
      "must" : [
      {
        "nested": {
          "path": "string_facets",
            "query": {
              "bool" : {
                "filter" : 
                [
                  { "term" : { "string_facets.facet_name" : "Cover colour" } },
                  { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                ]
              }
            }
          }
        }

      ]
    }

  },

  "aggregations": {

        "agg_string_facets": {
      "nested": {
        "path": "string_facets"
      },
      "aggregations": {
          "facet_name": {
            "terms": {
              "field": "string_facets.facet_name"
            },
            "aggregations": {
              "facet_value": {
                "terms": {
                  "field": "string_facets.facet_value"
                }
              }
            }
          }
      }
    },

    "agg_all_facets_filtered" : {

           "filter" : {
             "bool" : {
               "must" : [
                {
                   "nested": {
                     "path": "string_facets",
                     "query": {
                       "bool" : {
                         "filter" : [
                           { "term" : { "string_facets.facet_name" : "Cover colour" } },
                           { "terms" : { "string_facets.facet_value" : [ "Green" ] } }
                          ]
                       }
                    }
                  }
              }]
            }
        },
        "aggs" : {
         "agg_all_facets_filtered" : {
           "nested": { "path": "string_facets" },
           "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                    "facet_value": {
                      "terms": { "field": "string_facets.facet_value" }
                    }
                  }
                }
            }  
         }

       }


    }

  }
}

The returned results are correct (as far as documents go) and here is the aggregation (unfiltered, from the results, for 'agg_string_facets' - notice 'Green' shows 13 documents - which is correct):

{
            "key": "Cover colour",
            "doc_count": 483,
            "facet_value": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 111,
              "buckets": [
                {
                  "key": "Black",
                  "doc_count": 87
                },
                {
                  "key": "Brown",
                  "doc_count": 75
                },
                {
                  "key": "Blue",
                  "doc_count": 45
                },
                {
                  "key": "Burgundy",
                  "doc_count": 43
                },
                {
                  "key": "Pink",
                  "doc_count": 30
                },
                {
                  "key": "Teal",
                  "doc_count": 27
                },
                {
                  "key": "Tan",
                  "doc_count": 20
                },
                {
                  "key": "White",
                  "doc_count": 18
                },
                {
                  "key": "Chocolate",
                  "doc_count": 14
                },
                {
                  "key": "Green",
                  "doc_count": 13
                }
              ]
            }
          }

And here is the aggregation (filtered with the same filter, at the same time from 'agg_all_facets_filtered'), showing only 4 for 'Green':

{
              "key": "Cover colour",
              "doc_count": 4,
              "facet_value": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Green",
                    "doc_count": 4
                  }
                ]
              }
            }

UPDATE 2: Here are some sample documents returned by the query:

"hits": {
    "total": 13,
    "max_score": 17.478987,
    "hits": [
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33107",
        "_score": 17.478987,
        "_source": {
          "type": "product",
          "document_id": 33107,
          "search_data": {
            "full_text": "hcsb compact ultrathin bible mint green leathertouch  holman bible staff leather binding 9781433617751 ",
            "full_text_boosted": "HCSB Compact Ultrathin Bible Mint Green Leathertouch Holman Bible Staff "
          },
          "search_result_data": {
            "name": "HCSB Compact Ultrathin Bible, Mint Green Leathertouch (Leather)",
            "preview_image": "/images/products/medium/0.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33107"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Ultrathin"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17240",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17240,
          "search_data": {
            "full_text": "kjv thinline bible compact  leather binding 9780310439189 ",
            "full_text_boosted": "KJV Thinline Bible Compact "
          },
          "search_result_data": {
            "name": "KJV Thinline Bible, Compact (Leather)",
            "preview_image": "/images/products/medium/17240.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17240"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17243",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17243,
          "search_data": {
            "full_text": "kjv busy mom's bible  leather binding 9780310439134 ",
            "full_text_boosted": "KJV Busy Mom'S Bible "
          },
          "search_result_data": {
            "name": "KJV Busy Mom's Bible (Leather)",
            "preview_image": "/images/products/medium/17243.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17243"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Pocket"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Pink"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33030",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33030,
          "search_data": {
            "full_text": "apologetics study bible for students grass green leathertou  mcdowell sean; holman bible s leather binding 9781433617720 ",
            "full_text_boosted": "Apologetics Study Bible For Students Grass Green Leathertou Mcdowell Sean; Holman Bible S"
          },
          "search_result_data": {
            "name": "Apologetics Study Bible For Students, Grass Green Leathertou (Leather)",
            "preview_image": "/images/products/medium/33030.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33030"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Students"
            },
            {
              "facet_name": "Bible feature",
              "facet_value": "Indexed"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33497",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33497,
          "search_data": {
            "full_text": "hcsb life essentials study bible brown / green  getz gene a.; holman bible st imitation leather 9781586400446 ",
            "full_text_boosted": "HCSB Life Essentials Study Bible Brown  Green Getz Gene A ; Holman Bible St"
          },
          "search_result_data": {
            "name": "HCSB Life Essentials Study Bible Brown / Green (Imitation Leather)",
            "preview_image": "/images/products/medium/33497.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33497"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Imitation Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Binding",
              "facet_value": "Imitation leather"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Brown"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      }
}
like image 790
Cristian Cotovan Avatar asked Dec 21 '18 17:12

Cristian Cotovan


1 Answers

Mystery solved! Thanks for your input, it turns out that the version I was using (6.1.1) has a bug. I don't know what the bug exactly is, but I've installed ElasticSearch 6.5, reindexed my data and with no changes to the queries or mappings, it all works as it should!

Now, I don't know if I should submit a bug report to ES, or just leave it, seeing as it's an older version and they've moved on.

like image 82
Cristian Cotovan Avatar answered Nov 03 '22 23:11

Cristian Cotovan