I am trying to create a simple web app using Python on GAE. The app needs to spawn a few threads for each request it receives. For this I am using Python's threading library. I start all the threads and then wait on them.
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
The application runs fine, except that the threads run serially rather than concurrently (I confirmed this by printing timestamps at the beginning and end of each thread's run() method). I have followed the instructions given in http://code.google.com/appengine/docs/python/python27/using27.html#Multithreading to enable multithreading.
My app.yaml looks like:
application: myapp
version: 1
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
- url: /stylesheet
  static_dir: stylesheet
- url: /javascript
  static_dir: javascript
- url: /pages
  static_dir: pages
- url: .*
  script: main.app
I made sure that my local GoogleAppLauncher uses python 2.7 by setting the path explicitly in the preferences.
My threads have an average run time of 2-3 seconds, during which they make a URL open call and do some processing on the result.
Am I doing something wrong, or missing some configuration to enable multithreading?
A CPython process cannot run Python threads in parallel, but it can run them concurrently by switching between them during I/O-bound operations. This limitation is enforced by the Global Interpreter Lock (GIL), which prevents threads within the same process from executing Python bytecode at the same time.
Each CPU core can run up to two hardware threads if your CPU has hyper-threading (simultaneous multithreading) enabled. You can look up the specifications of your own processor to find out more. On a Mac, you can find this under About This Mac > System Report. For example, my 6-core i7 processor can run up to 12 hardware threads.
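As a quick check from Python itself, the standard library can report the number of logical CPUs the operating system exposes. This is only a minimal sketch; the variable name is illustrative, and with hyper-threading enabled the figure is typically twice the number of physical cores.

```python
import multiprocessing

# Number of logical CPUs (hardware threads) visible to the operating system.
logical_cpus = multiprocessing.cpu_count()
print("Logical CPUs: %d" % logical_cpus)
```

Note that this number describes the hardware only; it says nothing about how many Python threads can execute bytecode at once.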
Threading in Python is used to run multiple threads (tasks, function calls) concurrently. Note that this does not mean that they are executed on different CPUs. Python threads will NOT make your program faster if it already uses 100% CPU time.
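To see the difference between concurrency and parallelism, here is a small sketch that runs outside App Engine. It assumes time.sleep() is a fair stand-in for an I/O wait such as a URL fetch; the function names io_task and run_concurrently are made up for this illustration. Because a sleeping thread releases the GIL, the three waits overlap and the total time stays close to the time of a single task.

```python
import threading
import time

def io_task(name, delay, spans):
    """Simulate an I/O-bound task; time.sleep() releases the GIL while waiting."""
    start = time.time()
    time.sleep(delay)
    spans[name] = (start, time.time())  # record begin/end timestamps

def run_concurrently(delay=0.2, count=3):
    """Start `count` simulated I/O tasks, wait for all of them, return total time."""
    spans = {}
    threads = [
        threading.Thread(target=io_task, args=("t%d" % i, delay, spans))
        for i in range(count)
    ]
    begin = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - begin, spans

elapsed, spans = run_concurrently()
print("3 tasks of 0.2 s each finished in %.2f s" % elapsed)
```

If the tasks were pure CPU work instead of waits, the total would be expected to approach the sum of the individual times, since only one thread can hold the GIL at a time.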
Are you experiencing this in the dev_appserver or after uploading your app to the production service? From your mention of GoogleAppLauncher it sounds like you may be seeing this in the dev_appserver; the dev_appserver does not emulate the threading behavior of the production servers, and you'd be surprised to find that it works just fine after you deploy your app. (If not, add a comment here.)
Another idea: if you are mostly waiting for the urlfetch, you can run many urlfetch calls in parallel by using the async interface to urlfetch: http://code.google.com/appengine/docs/python/urlfetch/asynchronousrequests.html
This approach does not require threads. (It still doesn't properly parallelize the requests in the dev_appserver; but it does do things properly on the production servers.)
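A rough sketch of the async pattern follows. It uses urlfetch.create_rpc(), urlfetch.make_fetch_call() and rpc.get_result() from the App Engine urlfetch API; the helper name fetch_all, the deadline value, and passing the module in as an argument are assumptions made for this example (the argument is there only so the function can be tried outside App Engine).

```python
def fetch_all(urlfetch, urls, deadline=5):
    """Start every fetch first, then collect the results, so the waits overlap.

    `urlfetch` is expected to be the google.appengine.api.urlfetch module.
    It is a parameter here only to keep this sketch importable elsewhere.
    """
    pending = []
    for url in urls:
        rpc = urlfetch.create_rpc(deadline=deadline)
        urlfetch.make_fetch_call(rpc, url)  # returns immediately
        pending.append((url, rpc))

    results = {}
    for url, rpc in pending:
        # get_result() blocks until this particular fetch has finished
        # and raises if the fetch failed.
        results[url] = rpc.get_result().content
    return results
```

Inside a request handler you would import urlfetch from google.appengine.api and call something like fetch_all(urlfetch, list_of_urls). Error handling is left out to keep the sketch short.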
The multithreading notes for GAE are merely for how requests are handled - they don't fundamentally change how Python threads work. Specifically, the "CPython Implementation Detail" note in the threading module docs still applies.
It's also worth mentioning the note in the "Sandboxing" section of the GAE docs:
Note that threads will be joined by the runtime when the request ends, so the threads cannot run past the end of the request.