Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

limit offset, sorting and aggregation challenges in DynamoDB

I am using DynamoDB to store my device events (in JSON format) into table for further analysis and using scan APIs to display the result set on UI, which requires

  • To define limit offset of records,say 10 records per page, means result set should be paginated(e.g. page-1 has 0-10 records, page-2 has 11-20 records and so on), i got an API like scanRequest.withLimit(10) but it has different meaning of limit offset, does DynamoDB API comes with support of limit offset?
  • I also need to sort result set on basis of user input fields like sorting on Date, Serial Number etc, but still didn't get any sorting/order by APIs.
  • I may look for aggregation e.g. on Device Name, Date etc. which also doesn't seems to be available in DynamoDB.

The above situation led me to think about some others noSQL database solutions, Please assist me on above mentioned issues.

like image 255
Anil Shukla Avatar asked Sep 07 '15 08:09

Anil Shukla


2 Answers

The right way to think about DynamoDB is as a key-value store with support for indexes.

"Amazon DynamoDB supports key-value data structures. Each item (row) is a key-value pair where the primary key is the only required attribute for items in a table and uniquely identifies each item. DynamoDB is schema-less. Each item can have any number of attributes (columns). In addition to querying the primary key, you can query non-primary key attributes using Global Secondary Indexes and Local Secondary Indexes." https://aws.amazon.com/dynamodb/details/

A table can have 2 types of keys:

  1. Hash Type Primary Key—The primary key is made of one attribute, a hash attribute. DynamoDB builds an unordered hash index on this primary key attribute. Each item in the table is uniquely identified by its hash key value.
  2. Hash and Range Type Primary Key—The primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. DynamoDB builds an unordered hash index on the hash primary key attribute, and a sorted range index on the range primary key attribute. Each item in the table is uniquely identified by the combination of its hash and range key values. It is possible for two items to have the same hash key value, but those two items must have different range key values.

What kind of primary key have you set up for your Device Events table? I would suggest that you denormalize your data (i.e. pull specific attributes out of the json) and build additional indexes on those attributes that you want to sort and aggregate on: Date, Serial Number, etc. If I know what kind of primary key you have set up on your table, I can point you in the right direction to build these indices so that you can get what you need via the query method. The scan method will be inefficient for you because it reads every row in the table.

Lastly, with regard to your "limit offset" question, I think that you're looking for the ExclusiveStartKey, which will be returned by DynamoDB in the response to your query.

like image 174
readyornot Avatar answered Feb 19 '23 20:02

readyornot


The ExclusiveStartKey is what will help you do pagination. It's not necessary to depend on the LastEvaluatedKey from the response. You'll get LastEvaluatedKey only if you are getting more than a MB worth data. If LIMIT page size is such that total returned data size is less than 1 MB, you'll not get back LastEvaluatedKey. But that does not stop you from using ExclusiveStartKey as an offset.

like image 45
Ouroboros Avatar answered Feb 19 '23 20:02

Ouroboros