
AWS DynamoDB - Pick a record/item randomly?

Any ideas how to pick an item/record randomly from a DynamoDB table? I don't believe there are any provisions for this in the API.

I thought about maintaining a table of NumericId|MyOtherKey ("NumericIdTable"), generating a random number between 0 and the total number of records I have, and then getting that item from NumericIdTable, but that's not going to work in the long run.

Thoughts/ideas welcome.

ben asked May 19 '12 15:05



2 Answers

One approach I came up with to pick a random item from a DynamoDB Table:

  1. Generate a random RangeKey over all possible RangeKeys in your Table
  2. Query the Table with this RangeKey and the RangeKeyCondition GreaterThan and a Limit of 1

For example, if you use a UUID as the identifier for your RangeKey, you could get your random item like this:

    RandomRangeKey = new UUID
    RandomItem = Query( "HashKeyValue": "KeyOfRandomItems",
                        "RangeKeyCondition": { "AttributeValueList": "RandomRangeKey",
                                               "ComparisonOperator": "GT" },
                        "Limit": 1 )

This way you get a random item and, because the query stops after one item, consume only about 1 read capacity unit.

There is a chance that the first query misses: with the GT comparison, a generated UUID that is larger than the largest one used in the table matches nothing. This chance shrinks as the table grows, and you can easily send another request using the LT comparison on the same random key, which then ensures a hit as long as the table is not empty.
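The two requests described above (query in one direction, then retry in the other direction with the same key if nothing comes back) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the attribute names `pk` and `sk` and the helper names are hypothetical, and `client` is assumed to be a boto3 DynamoDB client or anything else with a compatible `query` method. It uses a `KeyConditionExpression` rather than the older `RangeKeyCondition` form used above.

```python
import uuid


def build_query(table_name, hash_value, random_key, operator):
    """Build kwargs for a DynamoDB Query that returns at most one item."""
    return {
        "TableName": table_name,
        # "pk" and "sk" are hypothetical names for the hash and range key.
        "KeyConditionExpression": f"pk = :pk AND sk {operator} :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": hash_value},
            ":sk": {"S": random_key},
        },
        "Limit": 1,
    }


def pick_random_item(client, table_name, hash_value, random_key=None):
    """Return one item next to a random range key, or None if nothing is found."""
    random_key = random_key or str(uuid.uuid4())
    # Try the keys above the random one first, then fall back to the keys below it.
    for operator in (">", "<"):
        response = client.query(
            **build_query(table_name, hash_value, random_key, operator)
        )
        if response.get("Items"):
            return response["Items"][0]
    return None
```

With boto3 installed and credentials configured, a call would look something like `pick_random_item(boto3.client("dynamodb"), "MyTable", "KeyOfRandomItems")`, where the table name is a placeholder.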


If your table design doesn't allow randomizable RangeKeys, you could follow your own approach and create a separate RandomItem table that stores each item's ID under a randomizable RangeKey. A possible table structure for this would be:

    RandomItemTable
        TableName - HashKey
        UUID      - RangeKey
        ItemId

Keep in mind that with this approach you need to manage the redundancy between the original table and the randomization table yourself: every item you add or remove has to be reflected in both.
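One possible way to keep the two tables in step is to write both entries in a single transaction. The sketch below only builds the request for the boto3 client call `transact_write_items`. The helper name, the `id` attribute of the original table, and the example table names are assumptions for illustration, and using transactions is my suggestion here, not something the approach requires.

```python
import uuid


def build_put_with_random_entry(table_name, item, item_id_attr="id",
                                random_table="RandomItemTable"):
    """Build a TransactWriteItems request that stores an item and its lookup entry."""
    random_entry = {
        "TableName": {"S": table_name},    # hash key of the randomization table
        "UUID": {"S": str(uuid.uuid4())},  # randomizable range key
        "ItemId": item[item_id_attr],      # points back at the original item
    }
    return {
        "TransactItems": [
            {"Put": {"TableName": table_name, "Item": item}},
            {"Put": {"TableName": random_table, "Item": random_entry}},
        ]
    }
```

It would be sent with something like `client.transact_write_items(**build_put_with_random_entry("Products", {"id": {"S": "42"}}))`. Deleting an item would need a matching request that removes both entries.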

nenTi answered Oct 12 '22 06:10


If you're using a GUID as the hash key of your table, you can do something like this:

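    using System;
    using System.Collections.Generic;
    using Amazon.DynamoDBv2;
    using Amazon.DynamoDBv2.Model;

    var client = new AmazonDynamoDBClient();

    // Use a freshly generated GUID as the key to start scanning after.
    var lastKeyEvaluated = new Dictionary<string, AttributeValue>()
    {
        { "YOUR_HASH_KEY", new AttributeValue(Guid.NewGuid().ToString()) }
    };

    var request = new ScanRequest()
    {
        TableName = YOUR_TABLE_NAME,
        ExclusiveStartKey = lastKeyEvaluated,
        Limit = 1 // return at most one item
    };
    var response = client.Scan(request);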

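This gives you a random record on each call: because a newly generated GUID is passed as the ExclusiveStartKey (the lastKeyEvaluated above), every scan starts at a different position in the table and returns the item it finds there.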
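For comparison, roughly the same idea in Python with a boto3-style client might look like the sketch below. The helper names are hypothetical, and the retry is my own addition: it assumes that an empty page means the random start position landed past the last item, in which case the scan is repeated from the beginning of the table.

```python
import uuid


def build_random_scan(table_name, hash_key_name, start_guid=None):
    """Build kwargs for a Scan that starts right after a random key."""
    start_guid = start_guid or str(uuid.uuid4())
    return {
        "TableName": table_name,
        "ExclusiveStartKey": {hash_key_name: {"S": start_guid}},
        "Limit": 1,
    }


def scan_random_item(client, table_name, hash_key_name):
    """Return one item from a random scan position, or None if none is found."""
    response = client.scan(**build_random_scan(table_name, hash_key_name))
    if not response.get("Items"):
        # Assumption: nothing came back because the start position was past
        # the last item, so retry once from the beginning of the table.
        response = client.scan(TableName=table_name, Limit=1)
    items = response.get("Items", [])
    return items[0] if items else None
```

A call would be something like `scan_random_item(boto3.client("dynamodb"), "MyTable", "YOUR_HASH_KEY")`, with the table name as a placeholder.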

cmilam answered Oct 12 '22 07:10