Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use aggregate functions in Amazon Dynamodb

Am New to dynamodb I have a table in DynamoDB with more than 100k items in it. Also, this table gets refreshed frequently. On this table, I want to be able to do something similar to this in the relation database world: how i can get max value from the table.

like image 653
pranay Avatar asked Apr 26 '16 13:04

pranay


People also ask

Can we do aggregation in DynamoDB?

DynamoDB does not provide aggregation functions. You must make creative use of queries, scans, indices, and assorted tools to perform these tasks. In all this, the throughput expense of queries/scans in these operations can be heavy.

How do you count in DynamoDB?

To get the item count of a dynamodb table, you have to: Open the AWS Dynamodb console and click on your table's name. In the Overview tab scroll down to the Items summary section. Click on the Get live item count button.

Can DynamoDB have multiple partition keys?

There are two types of primary keys in DynamoDB: Partition key: This is a simple primary key. If the table has only a partition key, then no two items can have the same partition key value. Composite primary key: This is a combination of partition key and sort key.

What is difference between Scan and query in DynamoDB?

DynamoDB supports two different types of read operations, which are query and scan. A query is a lookup based on either the primary key or an index key. A scan is, as the name indicates, a read call that scans the entire table in order to find a particular result.


1 Answers

DynamoDB is a NoSQL database and therefore is very limited on how you can query data. It is not possible to perform aggregations such as max value from a table by directly calling the DynamoDB API. You will have to look to different tools and approaches to solve this problem.

There are a number of possible solutions you can consider:

Perform A Table Scan

With more than 100k items in your table this is likely a very bad idea. A table scan will read through every single item and you can have application side logic identify the maximum value. This really isn't a workable solution.

Materialized Index in DynamoDB

Depending on your use case you can use DynamoDB streams and a Lambda function to maintain an index in a separate DynamoDB table. If your table is write only, no updates and no deletions, you could store the maximum in a separate table and as new records get inserted you can compare them and perform the necessary updates.

This approach is workable under some constrained circumstances, but is not a generalized solution.

Perform Analytic using Amazon Redshift

DynamoDB is not meant to do analytical operations such as maximum, while Redshift is a very powerful big data platform that can perform these types of calculations with ease. Similar to the DynamoDB index, you can use DynamoDB streams to send data into Redshift as records get inserted to maintain a near real time copy of the table for analytical purposes.

If you are looking for more of an offline or analytical solution this is a good choice.

Perform Analytics using Elasticsearch

While DynamoDB is a powerful NoSQL solution with strong guarantees on data durability, Elasticsearch provides a very flexible querying method that allows for queries such as maximum and these aggregations can be sliced and diced on any attribute value in real time. Similar to the above solutions you can use DynamoDB streams to send record inserts updates and deletions into the Elasticsearch index in real time.

If you want to stick with DynamoDB but need some additional querying capability, this is really a good option especially when using the AWS ES service which will fully manage an Elasticsearch cluster for you. It is important to remember that Elasticsearch doesn't replace your DynamoDB table, it is just an easily searchable index of the same data.

Just use a SQL Database

The obvious solution is if you have SQL requirements then move from a NoSQL based system to a SQL based system. AWS's RDS offering provides a managed solution. While DynamoDB provides a lot of benefits, if your use case is pulling you towards a SQL solution the easiest thing to do may be to not fight it and just change solutions.

This is not to say that a SQL based solution or NoSQL based solution is better, there are pros and cons to each and those vary based on the specific use case, but it is definitely an option to consider.

like image 195
JaredHatfield Avatar answered Sep 28 '22 06:09

JaredHatfield