
ElasticSearch - get all available filters (aggregate) from index

Let's say I have:

"hits": [
      {
        "_index": "products",
        "_type": "product",
        "_id": "599c2b3fc991ee0a597034fa",
        "_score": 1,
        "_source": {,
          "attributes": {
            "1": [
              "a"
            ],
            "2": [
              "b",
              "c"
            ],
            "3": [
              "d",
              "e"
            ],
            "4": [
              "f",
              "g"
            ],
            "5": [
              "h",
              "i"
            ]
          }
        }
      },
      {
        "_index": "products",
        "_type": "product",
        "_id": "599c4bb4b970c25976ced8bd",
        "_score": 1,
        "_source": {
          "attributes": {
            "1": [
              "z"
            ],
            "2": [
              "y"
            ]
          }
        }
      }
    ]

Each product has attributes. Each attribute has an ID and a value. I can filter the products by attributes just fine, but for now I am building the "possible attributes" list from MongoDB. I would like to find a way to generate such a list from Elasticsearch alone (and maybe only query MongoDB for additional data).

What I need is:

{
  1: [a, z],
  2: [b, c, y],
  etc.
}

What would such an aggregation look like? I want to get all available attributes (grouped by attribute ID) with all of their possible values (across all products).

Tomasz Gałkowski · asked Sep 19 '17 11:09



1 Answer

You cannot do it in one query, but it is fairly easy in two:

Retrieving the list of attributes

You can use the mapping to get all the fields present in your documents:

curl -XGET "http://localhost:9200/your_index/your_type/_mapping"
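
For the documents above, the response might look roughly like this (abridged; the exact types depend on your mapping. Here the default dynamic mapping is assumed, which indexes strings as text with a keyword sub-field):

{
  "products": {
    "mappings": {
      "product": {
        "properties": {
          "attributes": {
            "properties": {
              "1": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "2": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              ...
            }
          }
        }
      }
    }
  }
}

The attribute IDs you need are the keys under "attributes.properties".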

Retrieving their values

You can then use multiple terms aggregations to get all the values of each field:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Values": {
      "terms": {
        "field": "field1",
        "size": 20
      }
    },
    "field2Values": {
      "terms": {
        "field": "field2",
        "size": 20
      }
    },
    "field3Values": {
      "terms": {
        "field": "field3",
        "size": 20
      }
    },
    ...
  }
}'

This retrieves the top 20 most frequent values for each field.

The limit of 20 values is there to prevent a huge response (if you have a few billion documents with a unique field, for instance). You can raise it via the "size" parameter of each terms aggregation. From your requirements, I would guess that choosing something 10x larger than a rough estimate of the number of distinct values of each field should do the trick.
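
For the documents above, a concrete version of this query could look like the following (only the first two attributes shown; the .keyword sub-fields assume the default dynamic mapping mentioned earlier):

curl -XGET "http://localhost:9200/products/product/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attr1Values": {
      "terms": {
        "field": "attributes.1.keyword",
        "size": 20
      }
    },
    "attr2Values": {
      "terms": {
        "field": "attributes.2.keyword",
        "size": 20
      }
    }
  }
}'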

How to handle huge cardinality on values

You can also run an intermediate query with the cardinality aggregation to get the actual number of distinct values, and then use that as the size of your terms aggregation. Note that cardinality is an estimate when the count gets large, so you may want to use cardinality * 2 to be safe.

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Cardinality": {
      "cardinality": {
        "field": "field1"
      }
    },
    "field2Cardinality": {
      "cardinality": {
        "field": "field2"
      }
    },
    "field3Cardinality": {
      "cardinality": {
        "field": "field3"
      }
    },
    ...
  }
}'
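
A minimal sketch of chaining the two requests from the shell, assuming jq is installed (index, type and field names are the same placeholders as above):

# Fetch the cardinality estimate for field1 ...
CARD=$(curl -s -XGET "http://localhost:9200/your_index/your_type/_search" \
  -H 'Content-Type: application/json' \
  -d'{"size":0,"aggs":{"field1Cardinality":{"cardinality":{"field":"field1"}}}}' \
  | jq '.aggregations.field1Cardinality.value')

# ... then use twice that value as the terms size, to absorb the estimation error.
curl -XGET "http://localhost:9200/your_index/your_type/_search" \
  -H 'Content-Type: application/json' \
  -d"{\"size\":0,\"aggs\":{\"field1Values\":{\"terms\":{\"field\":\"field1\",\"size\":$((CARD * 2))}}}}"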

How to handle a huge number of attributes

The previous approach works if there are not too many different attributes. If there are, you should change how the documents are stored to prevent a mapping explosion: each distinct attribute ID becomes a separate field in the mapping, and Elasticsearch caps the number of fields per index (index.mapping.total_fields.limit, 1000 by default).

Storing them like this:

{
    "attributes":[
        {
            "name":"1",
            "value":[
                "a"
            ]
        },
        {
            "name":"2",
            "value":[
                "b",
                "c"
            ]
        },
        {
            "name":"3",
            "value":[
                "d",
                "e"
            ]
        },
        {
            "name":"4",
            "value":[
                "f",
                "g"
            ]
        },
        {
            "name":"5",
            "value":[
                "h",
                "i"
            ]
        }
    ]
}

Storing the attributes this way fixes the problem: you can then use a terms aggregation on "name" with a sub terms aggregation on "value", both wrapped in a nested aggregation, to get what you want:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attributesNested": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "attributes": {
          "terms": {
            "field": "attributes.name",
            "size": 1000
          },
          "aggs": {
            "values": {
              "terms": {
                "field": "attributes.value",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}'
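
The response then contains one bucket per attribute name, each with a sub-bucket per value, which maps directly onto the {attribute: [values]} structure you asked for. An abridged sketch of what it could look like for your two sample documents (bucket order and counts may differ):

"aggregations": {
  "attributesNested": {
    "doc_count": 7,
    "attributes": {
      "buckets": [
        {
          "key": "2",
          "doc_count": 2,
          "values": {
            "buckets": [
              { "key": "b", "doc_count": 1 },
              { "key": "c", "doc_count": 1 },
              { "key": "y", "doc_count": 1 }
            ]
          }
        },
        ...
      ]
    }
  }
}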

It requires using a nested mapping for attributes; see the sketch below.
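
A possible mapping, set at index-creation time (index and type names are the same placeholders as above; keyword fields are assumed so the values can be aggregated exactly):

curl -XPUT "http://localhost:9200/your_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "your_type": {
      "properties": {
        "attributes": {
          "type": "nested",
          "properties": {
            "name":  { "type": "keyword" },
            "value": { "type": "keyword" }
          }
        }
      }
    }
  }
}'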

Pandawan · answered Nov 15 '22 07:11