
Effective implementation of one-to-many relationship with Python NDB

I would like to hear your opinions on how to implement a one-to-many relationship effectively with Python NDB (e.g. one Person to many Tasks).

In my understanding, there are three ways to implement it.

  1. Use 'parent' argument
  2. Use 'repeated' Structured property
  3. Use 'repeated' Key property
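
For concreteness, the three layouts might be declared roughly as follows. This is only a minimal sketch under assumptions: the model and property names (`Task`, `PersonWithChildren`, `tasks`, `task_keys`, and so on) are invented for illustration, and the `ndb` module is passed in as an argument so the sketch does not depend on which NDB package is installed.

```python
def define_models(ndb):
    """Build example models for the three layouts; `ndb` is the NDB module."""

    class Task(ndb.Model):
        title = ndb.StringProperty()

    # Option 1: nothing is stored on Person. The link lives in the Task key's
    # ancestor path, set at creation time with Task(parent=person_key, ...).
    class PersonWithChildren(ndb.Model):
        name = ndb.StringProperty()

    # Option 2: the tasks are embedded inside the Person entity itself.
    class PersonWithEmbeddedTasks(ndb.Model):
        name = ndb.StringProperty()
        tasks = ndb.StructuredProperty(Task, repeated=True)

    # Option 3: Person holds the keys of separately stored Task entities.
    class PersonWithTaskKeys(ndb.Model):
        name = ndb.StringProperty()
        task_keys = ndb.KeyProperty(kind="Task", repeated=True)

    return Task, PersonWithChildren, PersonWithEmbeddedTasks, PersonWithTaskKeys
```

On App Engine this would be called as `define_models(ndb)` after `from google.appengine.ext import ndb`.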

I usually choose among them using the criteria below. Do they make sense to you? If you have better criteria, please let me know.

  1. Use 'parent' argument

    • Transactional operations are required across these entities
    • Bidirectional references are required between these entities
    • The relationship is strongly 'parent-child' in nature
  2. Use 'repeated' Structured property

    • The 'many' entities never need to be used on their own (they are always used with the 'one' entity)
    • The 'many' entities are referenced only by the 'one' entity
    • The number of repeated items is less than 100
  3. Use 'repeated' Key property

    • The 'many' entities need to be used on their own
    • The 'many' entities can be referenced by other entities
    • The number of repeated items is more than 100

Option 2 increases the size of the entity, but it saves datastore operations. (We do need to use projection queries to reduce the CPU time spent on deserialization, though.) Therefore, I use this approach whenever I can.

I would really appreciate your opinions.

Chikashi Kato asked Feb 06 '13 21:02


2 Answers

A key thing you are missing: How are you reading the data?

If you are displaying all the tasks for a given person on a request, option 2 makes sense: you can fetch the person and show all of their tasks.

However, if you need to query for, say, a list of all tasks due at a certain time, querying on repeated structured properties is terrible. You will want individual entities for your Tasks.

There's a fourth option, which is to use a KeyProperty in your Task that points to your Person. When you need a list of Tasks for a person you can issue a query.
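
This fourth option could be sketched roughly as below. It is an illustration under assumptions, not a definitive implementation: the property names (`owner`, `due`, `title`) and the helper functions are hypothetical, and the `ndb` module is passed in so the sketch works with whichever NDB package is in use.

```python
def define_linked_models(ndb):
    """Person and Task as separate kinds; each Task points at its Person."""

    class Person(ndb.Model):
        name = ndb.StringProperty()

    class Task(ndb.Model):
        title = ndb.StringProperty()
        due = ndb.DateTimeProperty()
        owner = ndb.KeyProperty(kind="Person")  # back-reference to the Person

    return Person, Task


def tasks_for(Task, person_key):
    """Build a query for all tasks owned by one person."""
    return Task.query(Task.owner == person_key)


def tasks_due_before(Task, deadline):
    """Build a query across all people, ordered by due time."""
    return Task.query(Task.due <= deadline).order(Task.due)
```

Both helpers only build queries; calling `.fetch()` on the result runs them. Because each Task is its own entity, the "all tasks due by a certain time" query needs no knowledge of any Person.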

If you need to search for individual Tasks, then you probably want to go with #4. You can use it in combination with #3 as well.

Also, the number of repeated properties has nothing to do with 100. It has everything to do with the size of your Person and Task entities, and how much will fit into 1MB. This is potentially dangerous: if your Task entities can be large, you might run out of space in your Person entity faster than you expect.
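
To make the size argument concrete, here is a back-of-the-envelope calculation. It is a rough sketch only: the byte sizes are hypothetical, the limit is approximated as 1024 * 1024 bytes, and per-property and indexing overhead is ignored, so real capacity would be somewhat lower.

```python
MAX_ENTITY_BYTES = 1024 * 1024  # approximation of the ~1MB entity limit


def max_embedded_tasks(person_bytes, task_bytes, limit=MAX_ENTITY_BYTES):
    """Rough upper bound on embedded tasks that fit in one Person entity."""
    if task_bytes <= 0:
        raise ValueError("task_bytes must be positive")
    remaining = limit - person_bytes
    return max(remaining // task_bytes, 0)


# Small tasks (hypothetical 500 bytes each): thousands fit.
print(max_embedded_tasks(2000, 500))    # 2093
# Large tasks (hypothetical 50,000 bytes each): only a handful fit.
print(max_embedded_tasks(2000, 50000))  # 20
```

With large tasks the ceiling falls well below 100, which is why a fixed count is not a reliable rule.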

dragonx answered Sep 20 '22 12:09


One thing that most GAE users will come to realize (sooner or later) is that the datastore does not encourage design according to the formal normalization principles that would be considered a good idea in relational databases. Instead it often seems to encourage design that is unintuitive and anathema to established norms. Although relational database design principles have their place, they just don't work here.

I think datastore design instead comes down to two questions:

  1. How am I going to read this data and how do I read it with the minimum number of read operations?

  2. Is storing it that way going to lead to an explosion in the number of write and indexing operations?

If you answer these two questions with as much foresight and actual tests as you can, I think you're doing pretty well. You could formalize other rules and specific cases, but these questions will work most of the time.

Sudhir Jonathan answered Sep 20 '22 12:09