
DynamoDB: global secondary index, sparse index

I am considering taking advantage of sparse indexes as described in the AWS guidelines. In the example described --

... in the GameScores table, certain players might have earned a particular achievement for a game - such as "Champ" - but most players have not. Rather than scanning the entire GameScores table for Champs, you could create a global secondary index with a partition key of Champ and a sort key of UserId.

My question is: what happens when the number of champs becomes very large? I suppose that the "Champ" partition will become very large and you would start to experience uneven load distribution. In order to get uniform load distribution, would I need to randomize the "Champ" value by (effectively) sharding over n shards, e.g. Champ.0, Champ.1 ... Champ.99?

Alternatively, is there a different access pattern that can be used when fetching entities with a specific attribute that may grow large over time?

pestrella asked Oct 31 '22 03:10



1 Answer

This is exactly the solution you need (Champ.0, Champ.1 ... Champ.N).

N should be roughly [expected number of partitions for this index + some room for growth], so that the keys hash evenly across partitions. If you expect high load or many champs, you could choose N=200. I recommend deriving the suffix from the user ID (userId modulo N), which also lets you work out a given user's shard directly from their userId.
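A minimal sketch of the write side in Python, assuming user IDs arrive as strings. The names `NUM_SHARDS`, `shard_suffix` and `sharded_key` are hypothetical and chosen for illustration only. Hashing non-numeric IDs with MD5 is my assumption, not part of the suggestion above; any stable hash would do.

```python
import hashlib

NUM_SHARDS = 200  # assumption: the N=200 mentioned above; tune for your load


def shard_suffix(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a stable shard number in [0, num_shards)."""
    if user_id.isascii() and user_id.isdigit():
        # Numeric IDs: plain "userId modulo N".
        return int(user_id) % num_shards
    # Non-numeric IDs: hash first so the result is stable across processes
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def sharded_key(base: str, user_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Build the index partition key value, e.g. 'Champ.34'."""
    return f"{base}.{shard_suffix(user_id, num_shards)}"
```

The returned string would be stored in the attribute that serves as the index partition key. Because the suffix is derived from the user ID rather than chosen at random, the shard for a given user can be recomputed at read time without a lookup.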

We also use this solution when the hash key is a Boolean (in DynamoDB you can represent a Boolean as a string). In that case the hash keys are "true.0", "true.1" ... "true.N", and likewise for "false".
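One consequence of sharding the key is that reading every champ (or every "true" item) means querying each shard and merging the results. A rough sketch in Python, assuming suffixes run from 0 to N-1: `shard_keys`, `query_all_shards` and the `query_partition` callback are hypothetical names. In practice the callback would wrap a DynamoDB Query against the index, and here it is faked with a dictionary.

```python
from typing import Callable, Dict, Iterable, List


def shard_keys(base: str, num_shards: int) -> List[str]:
    """List every partition key value for a base, e.g. 'true.0' .. 'true.9'."""
    return [f"{base}.{i}" for i in range(num_shards)]


def query_all_shards(
    base: str,
    num_shards: int,
    query_partition: Callable[[str], Iterable[dict]],
) -> List[dict]:
    """Query each shard in turn and merge the results into one list."""
    items: List[dict] = []
    for key in shard_keys(base, num_shards):
        items.extend(query_partition(key))
    return items


# Stand-in for the index: partition key value -> items (illustrative data only).
fake_index: Dict[str, List[dict]] = {
    "Champ.0": [{"UserId": "200"}],
    "Champ.2": [{"UserId": "2"}, {"UserId": "402"}],
}

champs = query_all_shards("Champ", 4, lambda key: fake_index.get(key, []))
```

The per-shard queries are independent, so they could also be issued in parallel. The sequential loop is only the simplest form.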

Eyal Ch answered Nov 15 '22 06:11
