
Actual concurrency in Google App Engine backend/module instances

Google App Engine offers services like Task Queues and Backends (now Modules) to parallelize request handling and do concurrent work. Typical fan-out/fan-in (fork-join) techniques can easily be implemented with the Pipelines API, Fantasm, etc.

When configuring the hardware of Backends/Modules you choose between the B1, B2, B4 and B8 instance classes, but the documentation says nothing about the number of CPU cores; maybe the core count is not relevant here. Backends support spawning "background threads" for each incoming request, but Python cannot do truly parallel CPU work because of the famous GIL (Global Interpreter Lock).
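To see the GIL's effect outside App Engine, here is a minimal CPython sketch: two CPU-bound threads run no faster than the same work done sequentially, because only one thread can execute Python bytecode at a time.

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; it holds the GIL while executing bytecode.
    total = 0
    for i in range(n):
        total += i
    return total

def timed(fn):
    start = time.time()
    fn()
    return time.time() - start

N = 2_000_000

# Sequential: two loops back to back.
seq = timed(lambda: (count(N), count(N)))

# "Parallel": two threads, but the GIL lets only one run bytecode at a time.
def threaded():
    ts = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

par = timed(threaded)

print("sequential: %.2fs, threaded: %.2fs" % (seq, par))
# On a stock CPython build the threaded run is no faster (often slower,
# due to lock contention), because the GIL serializes the bytecode.
```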

One frontend instance will handle up to 8 concurrent requests by default (configurable up to a maximum of 30) before a new instance is fired up.

Python 2.7 with the threadsafe directive is said to handle incoming requests in parallel on one isolated instance. Is this correct, or is it only requests spread across independent instances that are handled with real concurrency?

On Google App Engine, what is technically performed with real concurrency, and conversely, what is the recommended design pattern for gaining the most real concurrency and scaling?

You could make a "manual scaling" Backend/Module with 10-20 resident B8 instances, each spawning 10 long-lived background threads and doing 10 concurrent async URL fetches at all times for I/O work, or should it instead be fanned out with dynamic instance creation?
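The resident-instance pattern above can be sketched with plain `threading` and `queue` (App Engine's actual background-thread API differs, and `fetch_url` here is a hypothetical stand-in for an async URL fetch): a fixed pool of long-lived workers pulls I/O jobs from a shared queue.

```python
import queue
import threading

# Hypothetical stand-in for an App Engine URL fetch; here it just records the URL.
def fetch_url(url, out):
    out.append("fetched:" + url)

NUM_WORKERS = 10          # one long-lived thread per "slot" on the resident instance
jobs = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    # Long-lived loop: each resident worker pulls I/O jobs until told to stop.
    while True:
        url = jobs.get()
        if url is None:   # sentinel: shut this worker down
            jobs.task_done()
            return
        local = []
        fetch_url(url, local)
        with results_lock:
            results.extend(local)
        jobs.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(50):
    jobs.put("https://example.com/item/%d" % i)
for _ in threads:
    jobs.put(None)        # one sentinel per worker
jobs.join()

print(len(results))  # 50
```

Because the work is I/O bound, the GIL is not the bottleneck here: each thread releases it while blocked on the fetch.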

asked Nov 10 '22 by Fredrik Bertin Fjeld


1 Answer

Python 2.7 with the threadsafe directive is said to handle incoming requests in parallel on one isolated instance. Is this correct?

Yes, that's correct. App Engine really does run multiple simultaneous requests on each instance, as opposed to just spreading them across instances. The same goes for Java and Go (though apparently not PHP). It's generally considered a best practice to enable this, since it substantially improves the efficiency of most workloads.

This SO answer has the best details I've seen on how GAE determines whether and when to run requests concurrently.

You're right that Python has a GIL, which prevents more than one thread from executing Python bytecode at a time, so for workloads that truly are CPU bound, extra threads don't help you much. However, the vast majority of workloads are not CPU bound, especially webapps on platforms like GAE. They're usually I/O bound instead, i.e. they spend most of their time waiting on the datastore, HTTP fetches to other services, etc. App Engine uses that blocked time to efficiently run other concurrent requests on the same instance.
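A minimal sketch of why I/O-bound threading works despite the GIL (plain Python threads, with `time.sleep` standing in for a blocking RPC such as a datastore call or urlfetch): while a thread blocks in a syscall it releases the GIL, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # Simulated blocking I/O (e.g. a datastore RPC or urlfetch); a sleeping
    # thread releases the GIL, letting the other threads run.
    time.sleep(0.2)
    return i

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_io, range(8)))
elapsed = time.time() - start

print("%.2fs for 8 x 0.2s waits" % elapsed)
# With 8 worker threads the waits overlap: wall time is close to 0.2s,
# not the 1.6s a purely serial loop would take.
```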

answered Nov 15 '22 by ryan