GAE ndb design, performance and use of repeated properties

Say I have a picture gallery and a picture could potentially have 100k+ fans. Which ndb design is more efficient?

from google.appengine.ext import ndb

class picture(ndb.Model):
    fanIds = ndb.StringProperty(repeated=True)
    # ... [other picture properties]

or

class picture(ndb.Model):
    pass  # ... [other picture properties]

class fan(ndb.Model):
    pictureId = ndb.StringProperty()
    fanId = ndb.StringProperty()

Is there any limit on the number of items you can add to an ndb repeated property, and is there any performance hit from storing a large number of items in one? If repeated properties are less efficient, what is their intended use?

waigani asked Mar 13 '13 04:03



2 Answers

Do not use repeated properties if you have more than 100-1000 values. (1000 is probably already pushing it.) They weren't designed for such use.

Guido van Rossum answered Sep 22 '22 16:09



Generally, v1 (fan IDs in a repeated property on the picture) would be much cheaper.

In terms of read/write costs, you pay per entity fetched or written, so you want to keep the number of entities down. That makes v1 cheaper, and significantly cheaper if you fetch every fan every time you fetch a picture.
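To make the entity-count argument concrete, here is a minimal sketch in plain Python. The function name and the "v1"/"v2" labels are made up for illustration. It counts entities only, mirroring the reasoning above, and ignores index writes and any actual pricing details.

```python
def entities_read(design, fans_fetched):
    """Count entities read to load one picture plus some of its fans.

    design: "v1" (fan IDs in a repeated property on the picture)
            or "v2" (one separate fan entity per fan).
    """
    if design == "v1":
        # All fan IDs ride along inside the single picture entity.
        return 1
    if design == "v2":
        # One picture entity plus one fan entity per fan fetched.
        return 1 + fans_fetched
    raise ValueError("unknown design: %r" % (design,))


# Compare the two designs for a few fetch sizes.
for n in (5, 1000, 100000):
    print(n, entities_read("v1", n), entities_read("v2", n))
```

The gap grows linearly with the number of fans you load, which is why v1 looks so much cheaper when you always load every fan.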

However, each entity is limited to 1MB. With 100k+ fans you could hit that limit, depending on the size of your fanId values, and that's before counting the other picture data stored in the same entity. You'll have to add some more complex code to handle the overflow cases.
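A rough way to check whether you are near the limit, and one possible shape for the overflow handling, sketched in plain Python. Everything here is an assumption for illustration: the limit is treated as 1 MiB, the size estimate counts only the raw UTF-8 bytes of the IDs (real storage overhead will be higher), the ID format is invented, and the helper names are hypothetical rather than part of ndb.

```python
MAX_ENTITY_BYTES = 1024 * 1024  # assumption: treat the 1MB limit as 1 MiB


def fan_ids_bytes(fan_ids):
    """Rough lower bound: raw UTF-8 bytes of the IDs, ignoring overhead."""
    return sum(len(fid.encode("utf-8")) for fid in fan_ids)


def split_into_chunks(fan_ids, budget_bytes):
    """Greedily pack IDs into lists whose raw size stays within budget_bytes.

    Each returned list could be stored in its own overflow entity.
    """
    chunks, current, used = [], [], 0
    for fid in fan_ids:
        size = len(fid.encode("utf-8"))
        if size > budget_bytes:
            raise ValueError("single ID larger than the budget")
        if current and used + size > budget_bytes:
            # Current chunk is full; start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(fid)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical 15-byte IDs for 100k fans.
ids = ["user-%010d" % i for i in range(100000)]
budget = MAX_ENTITY_BYTES // 2  # leave half the entity for other data
chunks = split_into_chunks(ids, budget)
print(fan_ids_bytes(ids), fan_ids_bytes(ids) > MAX_ENTITY_BYTES, len(chunks))
```

With IDs of this length the raw bytes alone exceed the limit, so the list has to be spread over several entities, and each extra entity eats into the cost advantage of v1.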

Large entities take longer to fetch than small ones. If you're going to fetch all the fans at once every time, v1 will be better. If you're only going to fetch, say, 5 fans at any one point, v2 might be faster (only might). If, on the other hand, you try to pull 100k fan entities... that's gonna take forever.

dragonx answered Sep 21 '22 16:09
