Is there any clever way to avoid making a costly query with an IN clause in cases like the following one?
I'm using Google App Engine to build a Facebook application and at some point I (obviously) need to query the datastore to get all the entities that belong to any of the facebook friends of the given user.
Suppose I have a couple of entities modeled as such:
class Thing(db.Model):
owner = db.ReferenceProperty(reference_class=User, required=True)
owner_id = db.StringProperty(required=True)
...
and
class User(db.Model):
id = db.StringProperty(required=True)
...
At some point I query Facebook to get the list of friends of a given user and I need to perform the following query
# get all Thing instances that belong to friends
query = Thing.all()
query.filter('owner_id IN', friend_ids)
If I did that, AppEngine would perform a subquery for each id in friend_ids
, probably exceeding the maximum number of subqueries any query can spawn (30).
Is there any better way to do this (i.e. minimizing the number of queries)?
I understand that there are no relations and joins using the datastore but, in particular, I would consider adding new fields to the User
or Thing
class if it helps in making things easier.
I don't think there's an elegant solution, but you could try this:
On the User model, use Facebook ID as the key name, and store each user's list of things in a ListProperty.
class Thing(db.Model):
...
class User(db.Model):
things = db.ListProperty(db.Key)
...
Entity creation would go like this:
user = User.get_or_insert(my_facebook_id)
thing = Thing()
thing.put()
user.things.append(thing.key())
user.put()
Retrieval takes 2 queries:
friends = User.get_by_key_name(friend_ids)
thing_keys = []
for friend in friends:
thing_keys.extend(friend.things)
things = db.get(thing_keys)
This Google I/O talk by Brett Slatkin addresses the exact situation you're dealing with. See also his follow up talk this year.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With