Use doc_count as cumulative count

I'm currently trying to generate graphs from data collected in Elasticsearch. Every time a user is created, I insert a record in ES with the following (example) data:

{
  "country": "US",
  "id": "79ca9523dcd62420030de12b75e08bb7",
  "createdAt": "1450912898"
}

The id field is a hash of the User ID, so that the User ID cannot be determined from the value stored in ES, for privacy reasons.

The type mapping in the ES index is as follows:

{
  "user": {
    "_timestamp": {
      "enabled": true
    },
    "properties": {
      "country": {
        "type": "string"
      },
      "createdAt": {
        "type": "date",
        "format": "epoch_second"
      },
      "id": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}

Now, to get a graph of users per day, I have the following query:

{
  "size": 0,
  "query": {
    "type": {
      "value": "user"
    }
  },
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "createdAt",
        "interval": "day"
      }
    }
  }
}

This gives me a nice result like the one below (for this sample I set the interval to minute, so that there are enough buckets to illustrate the problem):

[{
  "key_as_string": "1450909920",
  "key": 1450909920000,
  "doc_count": 8
},
{
  "key_as_string": "1450909980",
  "key": 1450909980000,
  "doc_count": 2
},
{
  "key_as_string": "1450910040",
  "key": 1450910040000,
  "doc_count": 5
},
{
  "key_as_string": "1450910100",
  "key": 1450910100000,
  "doc_count": 8
},
{
  "key_as_string": "1450910160",
  "key": 1450910160000,
  "doc_count": 4
},
{
  "key_as_string": "1450910220",
  "key": 1450910220000,
  "doc_count": 3
},
{
  "key_as_string": "1450910280",
  "key": 1450910280000,
  "doc_count": 6
}]

I'd like to use the doc_count to generate a cumulative graph, so that I can see the growth of my user base instead of the number of accounts created per day. Despite searching the internet, I cannot find a single answer that seems to be related to my question. Most answers I find point me to the Cumulative Sum Aggregation page, but the example given there produces a cumulative sum of a metric computed inside each bucket. I'd like a cumulative sum of the bucket totals themselves, i.e. of each bucket's doc_count.

Ruben Rutten asked Dec 25 '22 11:12


1 Answer

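You're on the right path with the cumulative sum aggregation, and you can definitely use it. Nest it inside the date_histogram and set its buckets_path to the special _count path, which refers to each bucket's doc_count instead of a metric sub-aggregation. That gives you the running total you expect.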

{
  "size": 0,
  "query": {
    "type": {
      "value": "user"
    }
  },
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "createdAt",
        "interval": "day"
      },
      "aggs": {
        "cumulative": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}

The results will look like this:

[{
  "key_as_string": "1450909920",
  "key": 1450909920000,
  "doc_count": 8,
  "cumulative": {"value": 8}
},
{
  "key_as_string": "1450909980",
  "key": 1450909980000,
  "doc_count": 2,
  "cumulative": {"value": 10}
},
{
  "key_as_string": "1450910040",
  "key": 1450910040000,
  "doc_count": 5,
  "cumulative": {"value": 15}
},
{
  "key_as_string": "1450910100",
  "key": 1450910100000,
  "doc_count": 8,
  "cumulative": {"value": 23}
},
{
  "key_as_string": "1450910160",
  "key": 1450910160000,
  "doc_count": 4,
  "cumulative": {"value": 27}
},
{
  "key_as_string": "1450910220",
  "key": 1450910220000,
  "doc_count": 3,
  "cumulative": {"value": 30}
},
{
  "key_as_string": "1450910280",
  "key": 1450910280000,
  "doc_count": 6,
  "cumulative": {"value": 36}
}]
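
As a sanity check on the numbers above, the same running total can be reproduced client-side from the plain doc_count values. This is only a minimal Python sketch: the helper name add_running_total is hypothetical, and the bucket list is copied from the sample response rather than fetched from a cluster.

```python
from itertools import accumulate

# Buckets copied from the sample date_histogram response above
# (only the fields needed for the running total are kept).
buckets = [
    {"key": 1450909920000, "doc_count": 8},
    {"key": 1450909980000, "doc_count": 2},
    {"key": 1450910040000, "doc_count": 5},
    {"key": 1450910100000, "doc_count": 8},
    {"key": 1450910160000, "doc_count": 4},
    {"key": 1450910220000, "doc_count": 3},
    {"key": 1450910280000, "doc_count": 6},
]


def add_running_total(buckets):
    """Hypothetical helper: return copies of the buckets with a
    'cumulative' field shaped like the cumulative_sum output."""
    totals = accumulate(b["doc_count"] for b in buckets)
    return [{**b, "cumulative": {"value": t}} for b, t in zip(buckets, totals)]


result = add_running_total(buckets)
print([b["cumulative"]["value"] for b in result])
# prints [8, 10, 15, 23, 27, 30, 36]
```

Doing the sum in the query keeps the graphing code simpler, but a client-side pass like this can be handy for verifying the aggregation output.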

Val answered Jan 02 '23 00:01