 

How to build a computationally intensive webservice?

Tags:

python

I need to build a webservice that is very computationally intensive, and I'm trying to get my bearings on how best to proceed.

I expect users to connect to my service, at which point some computation is done for some amount of time, typically less than 60s. The user knows they need to wait, so that is not really a problem. My question is, what's the best way to structure a service like this so that it leaves me with the least amount of headache? Can I use Node.js, web.py, CherryPy, etc.? Do I need a load balancer sitting in front of these pieces if used? I don't expect huge numbers of users, perhaps hundreds or into the thousands. I'll need a number of machines to host this number of users, of course, but this is uncharted territory for me; if someone can give me a few pointers or things to read, that would be great.

Thanks.

user415614 asked Aug 09 '10

3 Answers

Can I use Node.js, web.py, CherryPy, etc.?

Yes. Pick one. Django is nice, also.

Do I need a load balancer sitting in front of these pieces if used?

Almost never.

I'll need a number of machines to host this number of users,

Doubtful.

Remember that each web transaction has several distinct (and almost unrelated) parts.

  1. A front-end (Apache HTTPD or NGINX or similar) accepts the initial web request. It can handle serving static files (.css, .js, images, etc.) so your main web application is uncluttered by this.

  2. A reasonably efficient middleware like mod_wsgi can manage dozens (or hundreds) of backend processes.

  3. If you choose a clever backend processing component like celery, you should be able to distribute the "real work" to the minimal number of processors to get the job done.

  4. The results are fed back into Apache HTTPD (or NGINX) via mod_wsgi to the user's browser.
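The middleware step above can be sketched as a minimal WSGI application — the callable that mod_wsgi invokes (naming it `application` is the mod_wsgi convention). In practice this would be your Django/web.py/CherryPy app rather than hand-written WSGI:

```python
# Minimal WSGI application -- the entry point mod_wsgi invokes.
# In a real deployment the heavy computation would be dispatched
# from here (e.g. handed to a Celery worker), not done inline.

def application(environ, start_response):
    body = b"result placeholder"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

You can exercise a callable like this locally with the standard library's `wsgiref.simple_server` before putting Apache/mod_wsgi in front of it.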

Now the backend processes (managed by Celery) are divorced from the essential web server. Apache HTTPD, mod_wsgi, and Celery together give you a great deal of parallelism, letting you use every scrap of processor resource.

Further, you may be able to decompose your "computationally intensive" process into parallel steps -- a Unix-style pipeline is remarkably efficient and makes use of all available resources. Decompose your problem into step1 | step2 | step3 and have Celery manage those pipelines.
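A sketch of that decomposition with plain Python functions (the step bodies here are placeholders, not your real computation). With Celery, each step would be decorated as a task and the pipeline built with Celery's `chain` primitive instead of direct calls:

```python
# Decomposing one heavy computation into step1 | step2 | step3.
# Plain functions for illustration; with Celery each would be an
# @app.task and the pipeline would be chain(step1.s(...), step2.s(), step3.s()).

def step1(data):
    # e.g. parse / normalize the input
    return [x * 2 for x in data]

def step2(data):
    # e.g. the expensive per-item transform
    return [x + 1 for x in data]

def step3(data):
    # e.g. reduce to the final answer
    return sum(data)

def pipeline(data):
    # The in-process equivalent of the shell's step1 | step2 | step3
    return step3(step2(step1(data)))

print(pipeline([1, 2, 3]))  # 15
```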

You may find that this kind of decomposition leads to serving a far larger workload than you might have originally imagined.

Many Python web frameworks will keep the user's session information in a single common database. This means that all of your backends can -- without any real work -- move the user's session from web server to web server, making "load balancing" seamless and automatic. Just have lots of HTTPD/NGINX front-ends that spawn Django (or web.py or whatever) which all share a common database. It works remarkably well.
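In Django, for example, that shared-session setup is just a settings sketch like the following — sessions go in the database (Django's default session backend), and every front-end points at the same database host. The hostname and credentials below are placeholders:

```python
# Django settings sketch: all web servers share one session store.
# Host, name, and credentials are placeholders for illustration.

SESSION_ENGINE = "django.contrib.sessions.backends.db"  # DB-backed sessions

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "HOST": "db.internal.example",  # the one database every front-end shares
        "USER": "appuser",
        "PASSWORD": "change-me",
    }
}
```

Because any server can load the session from that database, the load balancer needs no sticky sessions.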

S.Lott answered Nov 15 '22


I think you can build it however you like, as long as you can make it an asynchronous service so that the users don't have to wait.

Unless, of course, the users don't mind waiting in this context.
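The usual asynchronous pattern (a framework-agnostic sketch, not a specific library's API): the handler submits the job, immediately returns a job id, and the client polls for the result. Here a thread pool stands in for whatever worker system you choose:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Submit-then-poll sketch: the HTTP handler returns a job id right
# away; the client polls until the result is ready.

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job id -> Future

def expensive_computation(n):
    # Stand-in for the real (< 60s) computation.
    return sum(i * i for i in range(n))

def submit(n):
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(expensive_computation, n)
    return job_id  # handed back to the client immediately

def poll(job_id):
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}
```

The client then hits the poll endpoint (or uses a websocket/long-poll) until the status is "done".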

Robert Harvey answered Nov 15 '22


I'd recommend using nginx as the front-end, since it can handle rewriting, load balancing, SSL termination, etc. with a minimum of fuss.
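A minimal sketch of what that looks like in an nginx config (hostnames, ports, paths, and the rewrite rule are all placeholders):

```nginx
# nginx front-end sketch: SSL termination, a rewrite, and
# load balancing across two backend app servers.

upstream app_backends {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    # example rewrite: old URL scheme to new
    rewrite ^/old/(.*)$ /new/$1 permanent;

    location / {
        proxy_pass http://app_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```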

John La Rooy answered Nov 15 '22