Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Designing tables in Amazon dynamodb

I am new to DynamoDB and I have a big mess of: how my tables should be look like.

I have read the posts here: (its recommended for who didn't read it yet) http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/BestPractices.html

And now I have some dilemmas that I think everyone who start using DynamoDB will have.

First, my tables: STUDENTS, TEAMS, PROJECTS

STUDENTS: id, age ...

TEAMS: id, student-1-id, student-2-id, current-project, prev-project, last-updated-on

PROJECTS: id, team-id, list of questions, list student1answers, list student2answers

some comments:

  1. as you can see I don't use range-key. Do I need to?.
  2. each answer is a json of (number of question, text, date of inserted)
  3. every student can be in multiple teams.

My dilemmas:

  1. I want to get all the teams of a specific student that updated after specific date.

for now I am using 2 scans operations: one search the student1 and the second search the student2.

       **Is there a better way ?**

I have thought about adding a new table: user-Battles: student-id, team-id so i can query the teams for the specific students and then batch_get_item all the teams but what with the last-update-on? how can I also query by this inside the batch_get_item ?

  1. When a project overs I don't use it anymore. what to do with the old items ? delete ? Move them to another table ?

  2. In the project table, the attributes that can be updated are the answers attributes so I think to move them to another table for performances.

Do I really need to move them if its updated just twice? (when student1 send answer and when student2 send answer - and then the project is old)

*If I create a new table for the answers I will not have to store them in a JSON format

How would you design the tables? Please let me know.

like image 613
user1623454 Avatar asked Oct 05 '12 18:10

user1623454


1 Answers

Nice question with lot of details :)

If I had only one advise, it would be:

keep in mind that, with NoSQL, it is not only OK but normal, even recommended to de-normalize your data.

This said, for you "dilemna", your suggestion was pretty good. You should de-normalize with the date as the range_key. One way could be to add a table like this:

  • hash_key: student
  • range_key: date
  • team: team_id

But still, this is not perfect as the table would keep on growing. Each updating inserting a new object. Indeed, it is not possible to edit a key. You would have to do your own cleaning code.

In DynamoDB, you do not have to worry about performance slowdown caused by "old" items(except for scan), this is the main strength of DynamoDB. Nonetheless, this is always a good practice to keep data clean but be consistent. If you start moving expired projects then, move all of them or you will end up not knowing where your data are.

Last suggestion: are you sure "ids" are the best thing to describe your objects ? Most of the time, a name, date or any unique attribute makes a better key.

like image 190
yadutaf Avatar answered Oct 14 '22 20:10

yadutaf