
What is the difference between appengine datastore timeout errors 5 and 11?

I'm trying to speed up a Google App Engine request handler that makes one big datastore PutMulti call (500 entities) by splitting the entities into batches and running concurrent goroutines that each send a smaller PutMulti call (100 entities each).

Before this, I had often been getting the datastore error Call error 11: Deadline exceeded (timeout) from my PutMulti calls going over the deadline when I tested the handler on many concurrent requests. After the parallelization, the handler did speed up, but I still occasionally got that error and also another type of error, API error 5 (datastore_v3: TIMEOUT): The datastore operation timed out, or the data was temporarily unavailable.

Is this error 5 due to contention in the datastore, and what is the difference between errors 5 and 11?

Andy Haskell asked Jan 20 '16 18:01



2 Answers

These errors come from two different places. The first, the call error (11), is a local error caused by a timeout in the RPC client: the client timed out while waiting for an RPC to complete. The default RPC timeout in google.golang.org/appengine is 60 seconds.

The second error (API error 5) comes from the service side. It indicates that a timeout occurred while performing operations within the datastore itself. Some of these operations have timeouts much shorter than 60 seconds, so this error typically indicates contention.

A possibly simpler way to understand the difference: if you make a single multi operation with a very large number of changes, you can easily trigger the first timeout. If you run a significant number of concurrent operations against a single key or a small set of keys, you will more readily trigger the second.

Since timeouts are general indicators that a shared resource is saturated, there are of course many ways, and combinations of ways, to produce them. In general, you will want to retry operations where appropriate, size operations appropriately, and aggregate operations on hot keys as much as possible to reduce the chance of contention-related issues. As others have suggested, the Python and Java docs already have some examples of this.

You may wish to make use of IsTimeoutError (https://godoc.org/google.golang.org/appengine#IsTimeoutError). If you need to increase the timeout for the first error class, you may be able to adjust the context deadline; see WithDeadline and the related methods (https://godoc.org/golang.org/x/net/context#WithDeadline). Note: you cannot extend the deadline beyond the request deadline. However, if you are running in tasks or VMs, you can extend to longer deadlines.

raggi answered Sep 28 '22 07:09



The first error you see may just be a timeout in normal operation; the second is likely caused by write contention. More on this: Handling Datastore Errors https://cloud.google.com/appengine/articles/handling_datastore_errors

Jeff Allen answered Sep 28 '22 06:09
