hierarchical faceting with Elasticsearch

I'm using Elasticsearch and need to implement faceted search for hierarchical objects, as follows:

  • category 1 (10)
    • subcategory 1 (4)
    • subcategory 2 (6)
  • category 2 (X)
    • ...

So I need to get facets for two related objects. The documentation says this kind of facet is possible for numeric values, but I need it for strings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html

Here is another interesting thread, though unfortunately it's old: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html

Is this possible with Elasticsearch? If so, how can I do it?

zonder asked Dec 16 '13 18:12


2 Answers

The simple aggregation from the other answer works well as long as no document carries more than one multi-level tag. Once a single document has several, a simple aggregation no longer works, because Lucene's flat field structure mixes the values seen by the inner aggregation. See the example below:

DELETE /test_category
POST /test_category

# Insert a doc with 2 hierarchical tags 
POST /test_category/test/1 
{
  "categories": [
    {
      "cat_1": "1",
      "cat_2": "1.1"
    },
    {
      "cat_1":  "2",
      "cat_2": "2.2"
    }
  ]
}

# Simple two-level aggregation query
GET /test_category/test/_search?search_type=count
{
  "aggs": {
    "main_category": {
      "terms": {
        "field": "categories.cat_1"
      },
      "aggs": {
        "sub_category": {
          "terms": {
            "field": "categories.cat_2"
          }
        }
      }
    }
  }
}

This is the WRONG response I got on ES 1.4, where the values of the inner aggregation are mixed at the document level:

{
   ...
   "aggregations": {
      "main_category": {
         "buckets": [
            {
               "key": "1",
               "doc_count": 1,
               "sub_category": {
                  "buckets": [
                     {
                        "key": "1.1",
                        "doc_count": 1
                     },
                     {
                        "key": "2.2",  <= WRONG
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "2",
               "doc_count": 1,
               "sub_category": {
                  "buckets": [
                     {
                        "key": "1.1", <= WRONG
                        "doc_count": 1
                     },
                     {
                        "key": "2.2",
                        "doc_count": 1
                     }
                  ]
               }
            }
         ]
      }
   }
}
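
The mixing can be reproduced without a cluster. The following is a minimal Python sketch, not Elasticsearch code: the helper names `flatten` and `two_level_terms` are made up for illustration, and the flattening is a simplified model of how a non-nested array of objects ends up as multi-valued fields.

```python
from collections import defaultdict

def flatten(doc, array_field):
    """Collapse an array of objects into multi-valued dotted fields.

    Simplified model: the link between cat_1 and cat_2 of the same
    object is lost, only the set of values per field survives.
    """
    flat = defaultdict(set)
    for obj in doc[array_field]:
        for key, value in obj.items():
            flat[f"{array_field}.{key}"].add(value)
    return flat

def two_level_terms(flat_docs, outer, inner):
    """Count inner terms per outer term, once per document."""
    buckets = defaultdict(lambda: defaultdict(int))
    for flat in flat_docs:
        for outer_term in sorted(flat[outer]):
            for inner_term in sorted(flat[inner]):
                buckets[outer_term][inner_term] += 1
    return {key: dict(value) for key, value in buckets.items()}

doc = {"categories": [{"cat_1": "1", "cat_2": "1.1"},
                      {"cat_1": "2", "cat_2": "2.2"}]}

result = two_level_terms([flatten(doc, "categories")],
                         "categories.cat_1", "categories.cat_2")
print(result)
```

Every outer term is paired with every inner term of the same document, which is the same cross product marked WRONG in the response above.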

One solution is to use nested objects. The steps are:

1) Define a new type in the schema with nested objects

POST /test_category/test2/_mapping
{
  "test2": {
    "properties": {
      "categories": {
        "type": "nested",
        "properties": {
          "cat_1": {
            "type": "string"
          },
          "cat_2": {
            "type": "string"
          }
        }
      }
    }
  }
}

# Insert a single document 
POST /test_category/test2/1 
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]}

2) Run a nested aggregation query:

GET /test_category/test2/_search?search_type=count
{
  "aggs": {
    "categories": {
      "nested": {
        "path": "categories"
      },
      "aggs": {
        "main_category": {
          "terms": {
            "field": "categories.cat_1"
          },
          "aggs": {
            "sub_category": {
              "terms": {
                "field": "categories.cat_2"
              }
            }
          }
        }
      }
    }
  }
}

This is the response I got, which is now correct:

{
   ...
   "aggregations": {
      "categories": {
         "doc_count": 2,
         "main_category": {
            "buckets": [
               {
                  "key": "1",
                  "doc_count": 1,
                  "sub_category": {
                     "buckets": [
                        {
                           "key": "1.1",
                           "doc_count": 1
                        }
                     ]
                  }
               },
               {
                  "key": "2",
                  "doc_count": 1,
                  "sub_category": {
                     "buckets": [
                        {
                           "key": "2.2",
                           "doc_count": 1
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
}
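
The reason the nested version pairs the values correctly is that each nested object is indexed as its own hidden document, so `cat_1` and `cat_2` of one object stay together. A rough Python sketch of that counting, with a hypothetical helper name `nested_two_level_terms` and no claim to mirror the engine's internals:

```python
from collections import defaultdict

def nested_two_level_terms(docs, path, outer, inner):
    """Count inner terms per outer term, treating every object under
    `path` as a separate unit (as a nested aggregation does)."""
    nested_count = 0
    buckets = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for obj in doc.get(path, []):
            nested_count += 1
            buckets[obj[outer]][obj[inner]] += 1
    return nested_count, {key: dict(value) for key, value in buckets.items()}

doc = {"categories": [{"cat_1": "1", "cat_2": "1.1"},
                      {"cat_1": "2", "cat_2": "2.2"}]}

doc_count, nested_result = nested_two_level_terms(
    [doc], "categories", "cat_1", "cat_2")
print(doc_count, nested_result)
```

The count of 2 corresponds to the `doc_count` of the outer `categories` aggregation: it counts nested objects, not top-level documents.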

The same solution can be extended to hierarchical facets with more than two levels.
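
As a sketch of that extension, the request body can be generated for any number of levels. The function name `nested_terms_aggs` is an assumption made for this example, and it only builds the JSON body; sending it to a cluster is left out.

```python
import json

def nested_terms_aggs(path, levels):
    """Build a nested aggregation with one terms sub-aggregation per level.

    `levels` is an ordered list of (aggregation_name, field_name) pairs,
    outermost level first. Both names are chosen by the caller.
    """
    if not levels:
        raise ValueError("at least one level is required")
    inner = None
    # Build from the deepest level outwards so each level wraps the next.
    for name, field in reversed(levels):
        agg = {"terms": {"field": f"{path}.{field}"}}
        if inner is not None:
            agg["aggs"] = inner
        inner = {name: agg}
    return {"aggs": {path: {"nested": {"path": path}, "aggs": inner}}}

# Two levels: same body as the nested query shown in step 2.
body = nested_terms_aggs("categories",
                         [("main_category", "cat_1"),
                          ("sub_category", "cat_2")])
print(json.dumps(body, indent=2))
```

A third level would be one more pair in the list, for example `("sub_sub_category", "cat_3")`, where `cat_3` is a hypothetical field that would also have to exist in the nested mapping.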

pippobaudos answered Sep 17 '22 22:09


Currently, Elasticsearch does not support hierarchical faceting out of the box. However, the upcoming 1.0 release features a new aggregations module that can be used to get this kind of facet (these are more like pivot facets than hierarchical facets). Version 1.0 is currently in beta; you can download the second beta and try out aggregations yourself. Your example might look like this:

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
   "aggregations": {
      "main category": {
         "terms": {
            "field": "cat_1",
            "order": {"_term": "asc"}
         },
         "aggregations": {
            "sub category": {
               "terms": {
                  "field": "cat_2",
                  "order": {"_term": "asc"}
               }
            }
         }
      }
   }
}'

The idea is to have a separate field for each level of faceting and to bucket your facets by the terms of the first level (cat_1). Each of these buckets then has sub-buckets based on the terms of the second level (cat_2). The result may look like this:

{
  "aggregations" : {
    "main category" : {
      "buckets" : [ {
        "key" : "category 1",
        "doc_count" : 10,
        "sub category" : {
          "buckets" : [ {
            "key" : "subcategory 1",
            "doc_count" : 4
          }, {
            "key" : "subcategory 2",
            "doc_count" : 6
          } ]
        }
      }, {
        "key" : "category 2",
        "doc_count" : 7,
        "sub category" : {
          "buckets" : [ {
            "key" : "subcategory 1",
            "doc_count" : 3
          }, {
            "key" : "subcategory 2",
            "doc_count" : 4
          } ]
        }
      } ]
    }
  }
}
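
To turn a response of this shape into the indented list from the question, the buckets can be walked recursively. This is a small Python sketch; `render_facets` is a made-up helper name, and the response is the sample from above written out as a dictionary.

```python
def render_facets(buckets, sub_agg_names, depth=0):
    """Return one 'key (doc_count)' line per bucket, indented by level.

    `sub_agg_names` lists the names of the sub-aggregations to descend
    into, in order (here just "sub category").
    """
    lines = []
    for bucket in buckets:
        indent = "  " * depth
        lines.append(f"{indent}• {bucket['key']} ({bucket['doc_count']})")
        if sub_agg_names:
            children = bucket.get(sub_agg_names[0], {}).get("buckets", [])
            lines.extend(render_facets(children, sub_agg_names[1:], depth + 1))
    return lines

response = {
    "aggregations": {
        "main category": {
            "buckets": [
                {"key": "category 1", "doc_count": 10,
                 "sub category": {"buckets": [
                     {"key": "subcategory 1", "doc_count": 4},
                     {"key": "subcategory 2", "doc_count": 6}]}},
                {"key": "category 2", "doc_count": 7,
                 "sub category": {"buckets": [
                     {"key": "subcategory 1", "doc_count": 3},
                     {"key": "subcategory 2", "doc_count": 4}]}},
            ]
        }
    }
}

lines = render_facets(response["aggregations"]["main category"]["buckets"],
                      ["sub category"])
print("\n".join(lines))
```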

knutwalker answered Sep 20 '22 22:09