I have a little problem that needs some suggestions:
The latest entry of each table should be quickly obtainable and will most likely be queried the most (sorta like "follow data in real time"). With the lack of 'Last()' or similar, I was thinking of creating another table "LatestValues" where the latest entry of each data table is updated for a faster retrieval. This, however, would add an extra update for each write operation. Also, most of the traffic would be concentrated on this table (good/bad?). Is there a better solution for this or am I missing something?
Also, lets say we want to query for the values in data tables. Since scanning is obviously out of the question, is the only option left to create a secondary index by duplicating the data, effectively doubling the storaging requirements and the amount write operations? Any other solutions?
I'm primarily looking at DynamoDB and Azure Table Storage, but I'm also curious how BigTable handles this.
GetItem – Retrieves a single item from a table. This is the most efficient way to read a single item because it provides direct access to the physical location of the item. (DynamoDB also provides the BatchGetItem operation, allowing you to perform up to 100 GetItem calls in a single operation.)
When not to use DynamoDB: When multi-item or cross table transactions are required. When complex queries and joins are required. When real-time analytics on historic data is required.
Hash key in DynamoDB The primary reason for that complexity is that you cannot query DynamoDB without the hash key. So, it's not allowed to query the entire database. That means you cannot do what you would call a full table scan in other databases.
I just published an article today with some common "recipes" about DynamoDB. One of them is "Storing article revisions, getting always the latest" I think it might interest you :)
In a nutshell, you can get the latest item using Query(hash_key=..., ScanIndexForward=True, limit=1)
But, this assumes you have a range_key_defined.
With Scan
, you have no such parameter as ScanIndexForward=false
and anyway, you can not rely on the order as data is spread over partitions and the Scan
request is then load balanced.
To achieve you goal with DynamoDB, you may "split" your timestamp this way:
hash_key
: daterange_key
: time or full timestamp, as you preferThen, you can use the 'trick' of Query
+ Limit=1
+ ScanIndexForward=false
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With