Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query dynamoDB by date range

Tags:

I am developing an application that allows users to read books. I am using DynamoDB for storing details of the books that user reads and I plan to use the data stored in DynamoDB for calculating statistics, such as trending books, authors, etc.

My current schema looks like this:

user_id | timestamp | book_id | author_id 

user_id is the partition key, and timestamp is the sort key.

The problem I am having is that, with this schema I am only able to query the details of the books that a single user (partition key) has read. That is one of the requirements for me.

The other requirement is to query all the records that has been created in a certain date range, eg: records created in the past 7 days. With this schema, I am unable to run this query.

I have looked into so many other options, and haven't figured out a way to create a schema that would allow me to run both queries.

  • Retrieve the records of the books read by a single user (Can be done).
  • Retrieve the records of books read by all the users in last x days (Unable to do it).

I do not want to run a scan, since It will be expensive and I looked into the option of using GSI for timestamp, but it requires me to specify a hash key, and therefore I cannot query all the records created between 2 dates.

like image 886
haris Avatar asked Jun 15 '17 20:06

haris


People also ask

Does DynamoDB support range queries?

Querying is a very powerful operation in DynamoDB. It allows you to select multiple Items that have the same partition ("HASH") key but different sort ("RANGE") keys.

Does DynamoDB support datetime?

There are multiple ways to represent a timestamp in DynamoDB. Probably the most common is to use a Number type to represent the timestamp with the value as a Unix timestamp (seconds or milliseconds). Additionally, you can store the timestamp as a String type with the value as an ISO 8601 formatted string.

How do I sort DynamoDB Query results?

You can use the sort-key and apply the ScanIndexForward parameter in a query to sort in either ascending or descending order. Here I limit items returned to 1.

What is the difference between scan and Query in DynamoDB?

DynamoDB supports two types of read operations: Query and Scan. To find information, a query operation uses either the primary key or the index key. Scan, as the name implies, is a read call that scans the entire table for a specified result. DynamoDB is designed to be query-optimized.


2 Answers

One naive solution would be to create a GSI with a constant hash key across all books and timestamp as a range key. This will allow you to perform your type of queries.

The problem with this approach is that it is likely to become a scaling bottleneck, as same hash key means same node. One workaround for this problem is to do sharding: create a set of hash keys (ex: from 1 to 10) and assign random key from this set to every book. Then when you make a query you will need to make 10 queries and merge results. You can even make this set size dynamic, so that it scales with your data.

I would also suggest looking into other tools (not DynamoDB) for this use case, as DDB is not the best tool for data analysis. You might, for example, feed DynamoDB data into CloudSearch or ElasticSearch and do your analysis there.

like image 173
Tofig Hasanov Avatar answered Oct 11 '22 13:10

Tofig Hasanov


One solution could be using GSI and including two more columns, when ever you ingest a record kindly ingest date as a primary key e.g 2017-07-02 and timestamp as range key 04:22:33:000.

Maintain one table for checkpoint which would contain the process name and timestamp of the table, Everytime you read from the table you can update the checkpoint table to get incremental data. if you want to get last 7 day data change timestamp to past 7 date and get data between last 7 day and current time.

You can use query spec for the same by passing date as a partition and using between keywords for timestamp which is range condition.

Date diff you will to calculate from checkpoint table and current date and so day wise you get the data.

like image 23
user3484461 Avatar answered Oct 11 '22 15:10

user3484461