Does the automatic use of caching in NDB, the Google App Engine Datastore library for Python, invalidate the transaction model?

A major selling point of Google Cloud Datastore is that it provides strong consistency within an entity group.

Cloud Datastore ensures that entity lookups by key and ancestor queries always receive strongly consistent data.

[Datastore is good for] Transactions based on ACID properties, for example, transferring funds from one bank account to another.

The NDB library is the documented way to access the Datastore from Google App Engine for Python.

However, by default, the NDB library uses caching to speed up results. The caches used are an "in-context cache" and memcache, but neither of these caches can be updated transactionally with the datastore. It therefore seems that important consistency properties have to be given up (the second sentence below is the crucial one):

when the transaction is committed, its context will attempt to delete all such entities from memcache. Note, however, that some failures may prevent these deletions from happening.
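To make the failure mode concrete, here is a toy, in-process simulation. It is plain Python, not the NDB API, and every class and function name in it is hypothetical. It models an authoritative datastore, a memcache-like cache in front of it, and a commit that writes the datastore first and only then tries to invalidate the cache. When that invalidation step fails, a cache-first reader keeps seeing the old value, while a read that goes straight to the datastore sees the committed one. It deliberately ignores the other details of how NDB actually manages memcache.

```python
class ToyDatastore:
    """Hypothetical authoritative store: a commit applies all writes at once."""

    def __init__(self):
        self.rows = {}

    def commit(self, updates):
        # Apply every write together to model an atomic transaction commit.
        self.rows.update(updates)


class ToyCache:
    """Hypothetical memcache stand-in: best-effort, outside the transaction."""

    def __init__(self, fail_deletes=False):
        self.entries = {}
        self.fail_deletes = fail_deletes

    def delete(self, key):
        if self.fail_deletes:
            # Simulates a timeout or network error during invalidation.
            raise ConnectionError("cache delete failed")
        self.entries.pop(key, None)


def cached_get(key, store, cache):
    """Cache-first read that populates the cache on a miss."""
    if key in cache.entries:
        return cache.entries[key]
    value = store.rows.get(key)
    cache.entries[key] = value
    return value


def transactional_write(updates, store, cache):
    """Commit to the datastore, then *attempt* to invalidate cached copies."""
    store.commit(updates)  # the durable, consistent step
    for key in updates:
        try:
            cache.delete(key)
        except ConnectionError:
            pass  # the commit already succeeded; the stale entry survives


store = ToyDatastore()
cache = ToyCache(fail_deletes=True)
store.rows["account:alice"] = 100

print(cached_get("account:alice", store, cache))  # 100 (cache now warm)
transactional_write({"account:alice": 40}, store, cache)
print(cached_get("account:alice", store, cache))  # 100 (stale cached value)
print(store.rows["account:alice"])                # 40 (committed value)
```

The datastore itself never loses atomicity here; what breaks is the guarantee seen by readers that trust the cache.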

Is my understanding of this correct? That is, when using the NDB library in the default configuration, there is no consistency guarantee for access even within an entity group?

If I am right, this is a big problem.

It sacrifices pretty much the biggest property of the Datastore. All this documentation about consistency and ACID transactions. Talks at Google I/O about how to use entity groups to get consistency. Even research papers. And quietly, in a small corner of the documentation, in the most casual of sentences, I learn that I don't get these properties in the default configuration.

This is incredibly misleading. I'm sure most people have not seen this. Most implementations probably expect ACID transactions within entity groups, but they are not getting them. These are serious bugs in production code.

This is a major failure of implementation and documentation. The default should never have sacrificed consistency for speed. Consistency was the whole point of entity groups. And if the implementation did this unexpected thing that changes the semantics so dramatically, then the documentation should have made it deafeningly clear.

asked Jul 11 '17 20:07 by user2771609




1 Answer

As far as I am aware, if you get entities within transactions, the cache is not used, so you are OK on data modifications.

Direct datastore reads by key are strongly consistent. So if you need strongly consistent reads, you have to disable the NDB caches where needed (for example, key.get(use_cache=False, use_memcache=False)). Otherwise you only get eventual consistency: the cached value becomes correct once cache invalidation succeeds or the cached entry expires or is evicted.
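A minimal sketch of the trade-off, again as a toy model rather than NDB itself (the ToyStore class and its use_cache flag are hypothetical). A read that bypasses the cache always returns the committed value, which is analogous to calling key.get(use_cache=False, use_memcache=False) in NDB; the price is a datastore round trip on every such read.

```python
class ToyStore:
    """Hypothetical stand-in: an authoritative datastore plus a best-effort cache."""

    def __init__(self):
        self.rows = {}   # committed, strongly consistent data
        self.cache = {}  # may lag behind self.rows

    def get(self, key, use_cache=True):
        # With use_cache=False the read goes straight to the committed data
        # and neither consults nor populates the cache.
        if use_cache and key in self.cache:
            return self.cache[key]
        value = self.rows.get(key)
        if use_cache:
            self.cache[key] = value
        return value


store = ToyStore()
store.rows["account:alice"] = 100
store.get("account:alice")        # populates the cache with 100
store.rows["account:alice"] = 40  # a commit whose cache invalidation was lost

print(store.get("account:alice"))                   # 100 (stale, from cache)
print(store.get("account:alice", use_cache=False))  # 40 (direct read by key)
```

Only the reads that actually need strong consistency (for example, a balance check before a transfer) have to pay this cost; other reads can keep using the cache.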

You may also want to manually remove entities from the cache after the transaction has completed, using a cache-only delete such as key.delete(use_datastore=False), to make sure the cache is clean.
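The manual clean-up step can be sketched as a retrying, cache-only eviction that runs after a successful commit. This is a toy model: FlakyCache and evict_after_commit are hypothetical names, and the retry loop is an added assumption, not something NDB does for you. In NDB the delete inside the loop would correspond to a cache-only delete (use_datastore=False).

```python
import time


class FlakyCache:
    """Hypothetical cache whose deletes fail a fixed number of times first."""

    def __init__(self, failures_before_success=1):
        self.entries = {}
        self.remaining_failures = failures_before_success

    def delete(self, key):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("cache delete failed")
        self.entries.pop(key, None)


def evict_after_commit(cache, keys, attempts=3, backoff_s=0.0):
    """Retry cache-only deletes for keys written by a committed transaction.

    Returns the keys that could still not be evicted after all attempts.
    """
    pending = list(keys)
    for attempt in range(attempts):
        still_failing = []
        for key in pending:
            try:
                cache.delete(key)
            except ConnectionError:
                still_failing.append(key)
        pending = still_failing
        if not pending:
            break
        time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    return pending


cache = FlakyCache(failures_before_success=1)
cache.entries["account:alice"] = 100
print(evict_after_commit(cache, ["account:alice"]))  # [] (evicted on retry)
print("account:alice" in cache.entries)              # False
```

Retries shrink the window for stale reads but do not close it: if every attempt fails, the entry stays stale until it expires, so reads that truly require strong consistency should still bypass the cache.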

answered Oct 17 '22 10:10 by Alexander Trakhimenok