Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do DynamoDB Queries of composite Sort Key's made of two ISO 8061 dates return correct results (with BETWEEN)?

It seems like the consensus for handling Sort Key Date objects (ISO 8061 format) in DynamoDB is to use the STRING key type, where the query case will use the BETWEEN comparison to find results between two dates (also ISO 8061), see answer:

  1. Is it possible to save datetime to DynamoDB?

My Question is does the mechanics of the DynamoDB string comparison 'BETWEEN' that enables the above functionality ALSO allow you to query TWO date objects stitched together (example: 2018-05-01-2018-05-10) as a single Sort Key where the first stitched date is ALWAYS before the second stitched date.

After initial testing it APPEARS that the answer is YES, so long as your BETWEEN query also has two concatenated dates this should be possible.

I am hoping someone can help me figure out whether this would be safe / stable to deploy.

Before we get into the background I figured I would post my test case here to make it clear what I am asking.

Given the Sort Key: 2018-05-01-2018-05-25

  1. Query BETWEEN 2018-05-01-2018-05-20 and 2018-05-01-2018-05-26 (RETURNS item)
  2. Query BETWEEN 2018-05-01-2018-05-20 and 2018-05-01-2018-05-24 (No Data Found)
  3. Query BETWEEN 2018-05-01-2018-05-20 and 2018-05-01-2018-05-25 (RETURNS item)
  4. Query BETWEEN 2018-05-02-2018-05-20 and 2018-05-02-2018-05-26 (No Data Found)

Background


I am working on building a queryable DynamoDB table of flight combinations where I would like the Query to check both whether a departing flight date is AFTER a certain date and ALSO whether a returning flight is before a certain date.

Essentially my understanding is that BETWEEN queries on ISO 8061 dates in DynamoDB work due to the way the numbers cascade from left to right (year-month-date). If this is the case then it should be safe (in my circumstances) to concatenate two dates into one Sort Key.

Representation of Sort Key value would be...

  • departDate1-returnDate2
  • year-month-date-year-month-date

I am pre-checking the data so that a departing flight (first concatenated date) can be assumed to ALWAYS be BEFORE a returning flight (second concatenated date) if it is in our table so therefore given a BETWEEN operation my assumption is that this SHOULD be possible.

I will post back after more testing but I am hoping someone has more experience with exactly how (and why) the DynamoDB BETWEEN Query function works in this case.

like image 750
Necevil Avatar asked Nov 07 '22 07:11

Necevil


1 Answers

Posting this answer as I have been using the above in production for some time.

Short Answer:

Sort of, depending on your goals. You can chain multiple ISO 8061 dates in a Sort Key on DynamoDB and use the Between function — however the second date will still need to be run through a Filter to ensure perfect adherence.

Much longer Answer with Examples

If your goal is to use the Composite Sort Key to first reduce the number of records returned, and then Filter over that — then it's fine. It accomplishes the goal of limiting the number of records you need to filter (as in many cases the Sort Key BETWEEN query actually does filter out junk that doesn't match).

However it does not work / return goal results in all cases.

So for an example piece of data (Sort Key) in DyanmoDB: 2018-09-01_2018-09-07

Our goal here would be to ONLY return data where BOTH the start date (2018-09-01 in the case of the example) and the end date (2018-09-07 in the case of the above example) are within the bounds of our between query.

An simple example query to return the above data object would be:

Between 2018-08-15_2018-09-07 and 2018-09-05_2018-09-08

At first it would appear that the query works (as mentioned in my original question) since the above DOES return the example data. The issue is that this is not always the case.

Breaking Example Sort Key

An example of a composite ISO 8061 Sort Key date that would be returned by the above query but DOES NOT actually fit within the goal start / end date of our Queries intention would be:

2018-09-04_2018-09-15

While the initial start date is within our goal bounds — the fact that the start date is smaller than the start date maximum (2018-09-05) causes the end date to be allowed over our ideal end date maximum — since the total number is still smaller than our maximum.

This is because of how DynamoDB looks at Between queries...

Notes on DynamoDB Between Logic

DynamoDB basically sees the ISO 8061 dates as the numeric representation of that same date without the dash. So 2018-09-01 = 20180901. Since the ISO 8061 dates cascade from largest to smallest two of them together looks (to Dynamo) like this: 2018090120180907.

Once it removes the characters that match with each other it just does a simple comparison of numeric Greater Than / Less Than.

Still Useful?

So as mentioned you do get data that you don't want returned from the Between query where applies to a composite Sort Key with two (or more) ISO 8061 dates concatenated together. BUT if your goal is to first reduce the number of records returned in order to filter more efficiently — then this can still be a good thing (depending on your use case).

You just have to plan to ALSO run a Filter against your returned Query data to ensure that all dates are within bounds.

like image 73
Necevil Avatar answered Nov 15 '22 05:11

Necevil