
How do I ensure data integrity for objects in Google App Engine without using key names?

I'm having a bit of trouble in Google App Engine ensuring that my data is correct when using an ancestor relationship without key names.

Let me explain a little more: I've got a parent entity category, and I want to create a child entity item. I'd like to create a function that takes a category name and item name, and creates both entities if they don't exist. Initially I created one transaction and created both in the transaction if needed using a key name, and this worked fine. However, I realized I didn't want to use the name as the key as it may need to change, and I tried within my transaction to do this:

def add_item_txn(category_name, item_name):
    # This first query has no ancestor, which is what App Engine rejects
    # inside a transaction.
    category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name=category_name)
    category = category_query.get()
    if not category:
        category = Category(name=category_name, count=0)
        category.put()

    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=item_name)
        item.put()

db.run_in_transaction(add_item_txn, "foo", "bar")

When I tried to run this, App Engine rejected it because it won't let you run a non-ancestor query in a transaction: "Only ancestor queries are allowed inside transactions."

Looking at the example Google gives about how to address this:

def decrement(key, amount=1):
    counter = db.get(key)
    counter.count -= amount
    if counter.count < 0:    # don't let the counter go negative
        raise db.Rollback()
    db.put(counter)

q = db.GqlQuery("SELECT * FROM Counter WHERE name = :1", "foo")
counter = q.get()
db.run_in_transaction(decrement, counter.key(), amount=5)

I attempted to move my fetch of the category to before the transaction:

def add_item_txn(category_key, item_name):
    category = db.get(category_key)
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=item_name)
        item.put()

# The category lookup now happens outside the transaction.
category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name="foo")
category = category_query.get()
if not category:
    category = Category(name="foo", count=0)
    category.put()  # the entity must be saved before key() can be called
db.run_in_transaction(add_item_txn, category.key(), "bar")

This seemingly worked, but I found when I ran this with a number of requests that I had duplicate categories created, which makes sense, as the category is queried outside the transaction and multiple requests could create multiple categories.
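
The duplicate creation can be reproduced without App Engine at all. The following is only a plain-Python sketch of the interleaving, not datastore code: the `categories` list and the `query_category` and `create_category` helpers are hypothetical stand-ins for the datastore and the calls above.

```python
categories = []  # stand-in for the datastore; each entry is one Category entity

def query_category(name):
    # Mimics the non-transactional query: returns the first match or None.
    for category in categories:
        if category["name"] == name:
            return category
    return None

def create_category(name):
    category = {"name": name, "count": 0}
    categories.append(category)
    return category

# Two requests arrive at about the same time and both run the query first...
seen_by_request_a = query_category("foo")
seen_by_request_b = query_category("foo")

# ...so both see "no such category" and both create one.
if seen_by_request_a is None:
    create_category("foo")
if seen_by_request_b is None:
    create_category("foo")

duplicates = [c for c in categories if c["name"] == "foo"]
```

Because the check and the create are separate steps with nothing tying them together, `duplicates` ends up holding two "foo" categories.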

Does anyone have any idea how I can create these categories properly? I tried to put the category creation into a transaction, but received the error about ancestor queries only again.

Thanks!

Simon

Asked by Simon, Nov 05 '22 09:11


1 Answer

Here is an approach to solving your problem. It is not an ideal approach in many ways, and I sincerely hope that some other App Engine user will come up with a neater solution than I have. If not, give this a try.

My approach utilizes the following strategy: it creates entities that act as aliases for the Category entities. The name of the Category can change, but the alias entity will retain its key, and we can use elements of the alias's key to create a keyname for your Category entities, so we will be able to look up a Category by its name, but its storage is decoupled from its name.

The aliases are all stored in a single entity group, and that allows us to use a transaction-friendly ancestor query, so we can look up or create a CategoryAlias without risking that multiple copies will be created.

When I want to look up or create a Category and Item combo, I can use the category's keyname to programmatically generate a key inside the transaction, and we are allowed to get an entity via its key inside a transaction.
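
To make the decoupling concrete, here is a minimal plain-Python sketch. It does not use App Engine: `AliasStore`, its methods, and the `categories` dict are hypothetical in-memory stand-ins for the alias entity group and the Category entities, and the counter only imitates datastore-assigned numeric IDs.

```python
import itertools

class AliasStore(object):
    """In-memory stand-in for the CategoryAlias entity group (not App Engine code)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # imitates datastore-assigned IDs
        self._names = {}                    # alias id -> current category name

    def get_or_create(self, name):
        # Look the alias up by name; create it only if it is missing.
        for alias_id, alias_name in self._names.items():
            if alias_name == name:
                return alias_id
        alias_id = next(self._next_id)
        self._names[alias_id] = name
        return alias_id

    def rename(self, alias_id, new_name):
        # Only the stored name changes; the id stays the same.
        self._names[alias_id] = new_name


def keyname_for_category(alias_id):
    # The key name depends only on the alias id, never on the display name.
    return "category_%d" % alias_id


aliases = AliasStore()
categories = {}  # key name -> category record

alias_id = aliases.get_or_create("foo")
categories.setdefault(keyname_for_category(alias_id), {"items": []})

aliases.rename(alias_id, "renamed-foo")
same_id = aliases.get_or_create("renamed-foo")
```

After the rename, looking up "renamed-foo" yields the same alias id, so it still maps to the key name under which the category was first stored.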

from google.appengine.ext import db

class CategoryAliasRoot(db.Model):
    # Not actually used in current code; just here to avoid having an empty
    # model definition.
    count = db.IntegerProperty()

    __singleton_keyname = "categoryaliasroot"

    @classmethod
    def get_instance(cls):
        # get_or_insert is inherently transactional; no chance of
        # getting two of these objects.
        return cls.get_or_insert(cls.__singleton_keyname, count=0)

class CategoryAlias(db.Model):
    alias = db.StringProperty()

    @classmethod
    def get_or_create(cls, category_alias):
        alias_root = CategoryAliasRoot.get_instance()
        def txn():
            existing_alias = cls.all().ancestor(alias_root).filter('alias = ', category_alias).get()
            if existing_alias is None:
                existing_alias = CategoryAlias(parent=alias_root, alias=category_alias)
                existing_alias.put()

            return existing_alias

        return db.run_in_transaction(txn)

    def keyname_for_category(self):
        # Key.id() is a method and returns an integer, so call it and format it.
        return "category_%d" % self.key().id()

    def rename(self, new_name):
        self.alias = new_name
        self.put()

class Category(db.Model):
    pass

class Item(db.Model):
    name = db.StringProperty()

def get_or_create_item(category_name, item_name):

    def txn(category_keyname):
        category_key = db.Key.from_path('Category', category_keyname)

        existing_category = db.get(category_key)
        if existing_category is None:
            existing_category = Category(key_name=category_keyname)
            existing_category.put()

        existing_item = Item.all().ancestor(existing_category).filter('name = ', item_name).get()
        if existing_item is None:
            existing_item = Item(parent=existing_category, name=item_name)
            existing_item.put()

        return existing_item

    cat_alias = CategoryAlias.get_or_create(category_name)
    return db.run_in_transaction(txn, cat_alias.keyname_for_category())

Caveat emptor: I have not tested this code. Obviously, you will need to change it to match your actual models, but I think that the principles that it uses are sound.

UPDATE: Simon, in your comment, you mostly have the right idea, although there is an important subtlety that you shouldn't miss. You'll notice that the Category entities are not children of the dummy root. They do not share a parent, and they are themselves the root entities in their own entity groups. If the Category entities did all have the same parent, that would make one giant entity group, and you'd have a performance nightmare because writes to a single entity group are serialized: only one transaction can commit on it at a time.

Rather, the CategoryAlias entities are the children of the bogus root entity. That allows me to query inside a transaction, but the entity group doesn't get too big because the Items that belong to each Category aren't attached to the CategoryAlias.
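
As a rough illustration of how the entities fall into groups, the sketch below models keys as plain tuples of (kind, id-or-name) pairs and treats the first pair as the entity group root. This is an assumption-laden toy model rather than the datastore API, and the specific IDs and key names are made up for the example.

```python
from collections import defaultdict

def entity_group(path):
    """Return the root (kind, id_or_name) pair that identifies the entity group."""
    return path[0]

# Hypothetical key paths: aliases hang off the shared root, while each
# Category is its own root with its Items underneath it.
keys = [
    (("CategoryAliasRoot", "categoryaliasroot"),),
    (("CategoryAliasRoot", "categoryaliasroot"), ("CategoryAlias", 1)),
    (("CategoryAliasRoot", "categoryaliasroot"), ("CategoryAlias", 2)),
    (("Category", "category_1"),),
    (("Category", "category_1"), ("Item", 10)),
    (("Category", "category_2"),),
    (("Category", "category_2"), ("Item", 11)),
]

groups = defaultdict(list)
for path in keys:
    groups[entity_group(path)].append(path[-1])
```

In this model there are three groups: one small group holding the root and its aliases, and one group per Category, so work on one Category's Items does not contend with another's.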

Also, the data in the CategoryAlias entity can change without changing the entity's key, and I am using the alias's key as a data point for generating a keyname that can be used in creating the actual Category entities themselves. So, I can change the name that is stored in the CategoryAlias without losing my ability to match that entity with the same Category.

Answered by Adam Crossland, Nov 11 '22 04:11