Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch nested object based filter and count operation

I am new on elasticsearch. trying to use it on analytics calculations. I don't know, does it possible to do that but, I am trying to find customers who has 0 purchases. I stored, orders as nested object arrays for each customer. Here you may found an example mapping properties of customers index:

"first_name" => [
 "type" => "text"
],
"last_name" => [
    "type"=> "text"
],
"email" => [
    "type"=> "text"
],
"total_spent" => [
    "type"=> "text"
],
"aov" => [
    "type"=> "float"
],
"orders_count" => [
    "type"=> "integer"
],
"orders" => [
    "type" => "nested",
    "properties" => [
        "order_id" => [
            "type"=>"text"
        ],
        "total_price" => [
            "type"=>"float"
        ]
    ]
]

Example Customer indexes:

    [
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"1",
      "_score":1,
      "_source":{
         "first_name":"Stephen",
         "last_name":"Long",
         "email":"[email protected]",
         "total_spent":"0.00",
         "aov":0,
         "orders":[]
      }
   },
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"2",
      "_score":1,
      "_source":{
         "first_name":"Reece",
         "last_name":"Dixon",
         "email":"[email protected]",
         "total_spent":"0.10",
         "aov":"0.1",
         "orders":[
            {
               "total_price":"0.10",
               "placed_at":"2020-09-24T20:08:35.000000Z",
               "order_id":2723671867546
            }
         ]
      }
   },
   {
      "_index":"customers_index",
      "_type":"_doc",
      "_id":"3",
      "_score":1,
      "_source":{
         "first_name":"John",
         "last_name":"Marshall",
         "email":"[email protected]",
         "total_spent":"0.10",
         "aov":"0.04",
         "orders":[
            {
               "total_price":"0.10",
               "placed_at":"2020-09-24T20:10:52.000000Z",
               "order_id":2723675930778
            },
            {
               "total_price":"0.30",
               "placed_at":"2020-09-24T20:09:45.000000Z",
               "order_id":2723673899162
            },
            {
               "total_price":"0.10",
               "placed_at":"2020-09-16T09:55:22.000000Z",
               "order_id":2704717414554
            }
         ]
      }
   }
]

First of all, I want to ask do you think this kind of mapping fits with nature of elastic search? As an example, I am able to group customers by specific date range and sum total_spent as aggregated data. However, What I want to learn is, does it possible to find customers who has no order with the filtering nested orders array for the specific date range? Do you think, this kind of queries, has some performance issues?

I have not familiar with nosql databases. I am an RDBMS guy. So, I am trying to understand the concept of Elastic Search as an analytics db.

Thanks for responses

Edit:

I am trying to count nested objects within filter of a specified date range between the objects. Does it possible and making sense to do that on elasticsearch? Simply, I want to view, customers who has 1 orders or multiple orders depending to input within a specified dates.

I know how to get, daily count of customers but What If I want to count customers who has 1 orders on the specified date range with in a group of daily statement?

Possible response that I expected:

{
...
"aggregations":[
{
"date":"2020-09-01",
"total_customers_zero_purchased":15
}
...
]
}
like image 831
bl4cksta Avatar asked Oct 10 '20 01:10

bl4cksta


Video Answer


1 Answers

There are a bunch of questions raised here so I'll focus on the most important part.

First off, it's customary to make certain text fields of type .keyword so we can later aggregate on them. This means:

PUT customers_index
{
  "mappings": {
    "properties": {
      "email": {
        "type": "keyword"    <--
      }
    }
  }
}

After that we can proceed to our query but have to note that when we iterate over date ranges, we need to specify a date field. Meaning:

  1. that the iterated range is built automatically from the available/present values (and we can filter to limit its scope)
  2. but also that when a doc does not include a date in a given range, it'll be, understandably, skipped.

Practically speaking, we cannot get a per-day rolling aggregation (because we don't know what we don't know) but only a single-day metric. For example

GET customers_index/_search
{
  "size": 0,
  "aggs": {
    "multibucket_simulator": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "all_customers": {
          "cardinality": {
            "field": "email"
          }
        },
        "customers_who_purchased_at_date": {
          "filter": {
            "nested": {
              "path": "orders",
              "query": {
                "range": {
                  "orders.placed_at": {
                    "gte": "2020-09-16T00:00:00.000000Z",
                    "lt": "2020-09-26T00:00:00.000000Z"
                  }
                }
              }
            }
          },
          "aggs": {
            "customer_count": {
              "cardinality": {
                "field": "email"
              }
            }
          }
        },
        "total_customers_zero_purchased": {
          "bucket_script": {
            "buckets_path": {
              "all": "all_customers.value",
              "filtered": "customers_who_purchased_at_date>customer_count.value"
            },
            "script": "params.all - params.filtered"
          }
        }
      }
    }
  }
}

yielding

"aggregations" : {
  "multibucket_simulator" : {
    ...
    "buckets" : {
      "all" : {
        ...
        "customers_who_purchased_at_date" : {
          ...
        },
        "all_customers" : {
          ...
        },
        "total_customers_zero_purchased" : {       <---
          "value" : 1.0
        }
      }
    }
  }
}

thereby answering the question:

How many customers did not purchase anything between 09/16 and 09/25 inclusive?

like image 58
Joe - Elasticsearch Handbook Avatar answered Oct 19 '22 06:10

Joe - Elasticsearch Handbook