I would like to hear your opinion about the effective implementation of one-to-many relationship with Python NDB. (e.g. Person(one)-to-Tasks(many))
In my understanding, there are three ways to implement it.
I choose a way based on the logic below usually, but does it make sense to you? If you have better logic, please teach me.
Use 'parent' argument
Use 'repeated' Structured property
Use 'repeated' Key property
No.2 increases the size of entity, but we can save the datastore operations. (We need to use projection query to reduce CPU time for the deserialization though). Therefore, I use this way as much as I can.
I really appreciate your opinion.
A key thing you are missing: How are you reading the data?
If you are displaying all the tasks for a given person on a request, 2 makes sense: you can query the person and show all his tasks.
However, if you need to query say a list of all tasks say due at a certain time, querying for repeated structured properties is terrible. You will want individual entities for your Tasks.
There's a fourth option, which is to use a KeyProperty in your Task that points to your Person. When you need a list of Tasks for a person you can issue a query.
If you need to search for individual Tasks, then you probably want to go with #4. You can use it in combination with #3 as well.
Also, the number of repeated properties has nothing to do with 100. It has everything to do with the size of your Person and Task entities, and how much will fit into 1MB. This is potentially dangerous, because if your Task entity can potentially be large, you might run out of space in your Person entity faster than you expect.
One thing that most GAE users will come to realize (sooner or later) is that the datastore does not encourage design according to the formal normalization principles that would be considered a good idea in relational databases. Instead it often seems to encourage design that is unintuitive and anathema to established norms. Although relational database design principles have their place, they just don't work here.
I think the basis for the datastore design instead falls into two questions:
How am I going to read this data and how do I read it with the minimum number of read operations?
Is storing it that way going to lead to an explosion in the number of write and indexing operations?
If you answer these two questions with as much foresight and actual tests as you can, I think you're doing pretty well. You could formalize other rules and specific cases, but these questions will work most of the time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With