Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GAE transaction failure and idempotency

The Google App Engine documentation contains this paragraph:

Note: If your application receives an exception when committing a transaction, it does not always mean that the transaction failed. You can receive DatastoreTimeoutException, ConcurrentModificationException, or DatastoreFailureException exceptions in cases where transactions have been committed and eventually will be applied successfully. Whenever possible, make your Datastore transactions idempotent so that if you repeat a transaction, the end result will be the same.

Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?

Edit:

it seems that this is problem inherent in distributed systems, as per non other than Guido van Rossum -- see this link:

app engine datastore transaction exception

So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.

I was wondering if it was possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generated a GUID, and then include that GUID with the request (the same GUID would be re-sent on retries for the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found it, then this is a repeated transaction, so it would return without doing anything.

Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. Also there would be a performance hit if failed entity key lookups are slow, because almost every transaction would include a failed lookup, because most GUIDs would be new.

In terms of the additional $ cost in terms of additional datastore interactions, this would probably still be less than if I had to make every transaction idempotent, since that would require a lot of checking what's in the datastore in each level.

like image 855
eeeeaaii Avatar asked Jul 18 '13 11:07

eeeeaaii


2 Answers

dan wilkerson, simon goldsmith, et al. designed a thorough global transaction system on top of app engine's local (per entity group) transactions. at a high level, it uses techniques similar to the GUID one you describe. dan dealt with "submarine writes," ie the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. erick armbrust implemented dan's design in tapioca-orm.

i don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.

in response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. it's only important when you need transactions with certain kinds of guarantees like the ones you describe. it's definitely important to understand when you do need them, but you often don't.

the datastore is implemented on top of megastore, which is described in depth in this paper. in short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. i don't know if there are public numbers on submarine write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.

amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. amazon's SimpleDB is comparable. it originally only provided eventual consistency, and eventually added a very limited kind of transactions they call conditional writes, but it doesn't have true transactions. other NoSQL databases (redis, mongo, couchdb, etc.) have different variations on transactions and consistency.

basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. this is best known by eric brewer's CAP theorem, which says the three axes of the tradeoff are consistency, availability, and partition tolerance.

like image 135
ryan Avatar answered Oct 21 '22 17:10

ryan


The best way I came up with making counters idempotent is using a set instead of an integer in order to count. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing like this:

class Thing {
Set<User> likes = ....

public void like (User u) {
  likes.add(u);
}
public Integer getLikeCount() {
  return likes.size();
}
}

this is in java, but i hope you get my point even if you are using python.

This method is idempotent and you can add a single user for how many times you like, it will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of likes anyway? If you don't want to bloat the Thing object, create another object ThingLikes, and cache the like count on the Thing object.

like image 39
Moshe Shaham Avatar answered Oct 21 '22 16:10

Moshe Shaham