 

Deploying scientific python algorithm on Amazon ec2

I have a Python scientific model that calls some C code and uses numpy, scipy, and many geographic analysis modules. I would like to deploy it on EC2 but I don't know much about EC2 yet.

I have checked that I could use the StarCluster package to deploy my stack after setting up AMIs that are derived from StarCluster AMIs. These already have numpy and scipy and ipython, so all I would have to do is add geographic modules.

My plan was to write a standalone GUI that runs on customers' machines and makes sure their inputs are valid for my model. The standalone GUI then sends zipped archives of up to about 10 GB to an FTP location. Customers then sign in to a web page I run on EC2, where they configure the run properties (# of instances, # of model runs). That web page starts a script that runs the customer's job on a cluster of the size they specified. Then a post-processor processes the model output and writes result web pages and graphs that are initially password-protected for viewing by the customer only. My model runs consist of individual iterations that may take 5 minutes to 3 hours.
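As a minimal sketch of the "configure the run properties" step, assuming each model run is independent and identified by an integer index, the requested runs could be dealt out round-robin across the requested instances. The names `JobSpec` and `assign_runs` are hypothetical and are not part of StarCluster or any AWS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical run configuration a customer submits via the web page."""
    n_instances: int
    n_runs: int

    def __post_init__(self) -> None:
        # Reject configurations that cannot be scheduled at all.
        if self.n_instances < 1 or self.n_runs < 1:
            raise ValueError("n_instances and n_runs must both be >= 1")


def assign_runs(spec: JobSpec) -> List[List[int]]:
    """Deal run indices 0..n_runs-1 round-robin across the instances."""
    buckets: List[List[int]] = [[] for _ in range(spec.n_instances)]
    for run in range(spec.n_runs):
        buckets[run % spec.n_instances].append(run)
    return buckets


if __name__ == "__main__":
    print(assign_runs(JobSpec(n_instances=2, n_runs=5)))  # [[0, 2, 4], [1, 3]]
```

Because individual iterations range from 5 minutes to 3 hours, a static split like this can leave some instances idle while others are still busy; a shared work queue from which each instance pulls its next run is the usual alternative when run times vary that much.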

Can anyone offer advice on an ideal setup for this model? I think I can figure out the scientific part, but I don't see what the starting point is for running the web interface...

Thanks

PeterS asked Nov 04 '22 01:11


1 Answer

Interesting project!

Adding modules to the AMI you deployed on AWS EC2 can be done via pip. First you'll need SSH access to your instance, as described in the EC2 documentation here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html

Then, if you don't have it installed already, you can install pip & your additional packages & modules as follows:

sudo apt-get install -y python-pip
sudo pip install numpy   # already installed on StarCluster AMIs, so no need for this
sudo pip install scipy   # same as above

Ubuntu & Debian:

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

The versions in Ubuntu 12.10 and Debian 7.0 meet the current Scipy stack specification. Users might also want to add the NeuroDebian repository for extra Scipy packages.

Fedora:

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose

Users of Fedora 17 and earlier should then upgrade IPython using pip:

sudo pip install --upgrade ipython

(The install information above is from the SciPy documentation: http://www.scipy.org/install.html)
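Once the packages are installed, a quick way to confirm that the AMI really has everything your model imports is to look up each top-level module name with `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed. This is a minimal sketch; the function name `missing_modules` is illustrative, and the module list is only an example to which you would add your own geographic modules.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this machine."""
    # For a top-level name, find_spec only locates the module; it does not
    # execute it, so this check is cheap even for large packages.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example list; add the geographic modules your model depends on.
    required = ["numpy", "scipy", "IPython"]
    missing = missing_modules(required)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running this on a freshly launched instance (for example as part of a startup script) catches a missing dependency before a multi-hour model run fails on an `ImportError`.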

As for your plans for the GUI & large file uploads, take a look at AWS S3 for file storage. It does have limits: a single PUT can upload at most 5 GB, so archives of around 10 GB need S3's multipart upload. Depending on how far you want to push your solution, you may want to use chunked file uploading or stream a multi-part request, similar to these solutions for the file transfers:

https://github.com/blueimp/jQuery-File-Upload/wiki/Chunked-file-uploads
https://devcenter.heroku.com/articles/paperclip-s3
https://github.com/heiflo/play21-file-upload-streaming
https://github.com/netty/netty/issues/845
https://github.com/playframework/playframework/pull/884
https://github.com/floatingfrisbee/amazonfileupload
http://blog.assimov.net/blog/2011/04/03/multi-file-upload-with-uploadify-and--carrierwave-on-rails-3/

(a quick search for "chunked file uploads github" or "chunked file uploads google code" should turn up lots of options in terms of available code & detailed information.)
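If you go the chunked route with S3, the client has to cut each archive into parts that respect S3's multipart limits: every part except the last must be at least 5 MiB, no part may exceed 5 GiB, and one upload may have at most 10,000 parts, numbered from 1. The following is a minimal, network-free sketch of that planning step; the function name `plan_parts` and the 100 MiB default part size are illustrative choices, not part of any AWS SDK. Each resulting tuple would then be sent as one part, for example with boto3's `upload_part`.

```python
from typing import List, Tuple

MIN_PART = 5 * 1024 ** 2   # S3 minimum part size (all parts except the last)
MAX_PART = 5 * 1024 ** 3   # S3 maximum part size
MAX_PARTS = 10_000         # S3 maximum number of parts per upload


def plan_parts(total_size: int,
               part_size: int = 100 * 1024 ** 2) -> List[Tuple[int, int, int]]:
    """Split total_size bytes into (part_number, offset, length) tuples.

    Part numbers start at 1, as S3 expects. Raises ValueError if the
    chosen part size would violate S3's multipart limits.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    n_parts = -(-total_size // part_size)  # ceiling division
    if n_parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    parts = []
    for i in range(n_parts):
        offset = i * part_size
        # The last part is simply whatever bytes remain.
        length = min(part_size, total_size - offset)
        parts.append((i + 1, offset, length))
    return parts


if __name__ == "__main__":
    parts = plan_parts(10 * 10**9)  # a 10 GB archive
    print(len(parts), parts[0], parts[-1])
```

For a 10 GB archive this yields 96 parts. In practice, boto3's `upload_file` and the AWS CLI's `aws s3 cp` switch to multipart uploads automatically above a size threshold, so a hand-written planner like this is mainly useful if your own GUI client uploads parts itself and needs to resume or retry them individually.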

However, an easier direction for the file uploads/transfer may be to look at solutions like these:

http://www.bucketexplorer.com/be-download.html
https://forums.aws.amazon.com/thread.jspa?messageID=258228&tstart=0
https://forums.aws.amazon.com/thread.jspa?messageID=257781&tstart=0
http://www.jfileupload.com/products/js3upload/index.html
http://codeonaboat.wordpress.com/2011/04/22/uploading-a-file-to-amazon-s3-using-an-asp-net-mvc-application-directly-from-the-users-browser/

Regardless, you'll want to make sure your environment on your EC2 instance &/or your S3 buckets is configured to allow large file uploads & processing. For example, if uploads go through PHP on your AMI, its PHP needs to be compiled & set up via php.ini (see the upload_max_filesize and post_max_size directives) to accept files over a certain size. There are also timeouts you'll need to be aware of, and you will likely need a 64-bit AMI along with a large EBS volume to power all this.
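To see why the timeouts matter, it helps to estimate how long a single archive takes to upload. This is a back-of-the-envelope sketch assuming a constant uplink speed; the function name `upload_seconds` is illustrative, and the 10 Mbit/s figure in the example is an assumption, not a measurement.

```python
def upload_seconds(size_bytes: int, mbit_per_s: float) -> float:
    """Estimate transfer time for size_bytes over a link of mbit_per_s megabits/s.

    Ignores protocol overhead and retries, so treat the result as a lower bound.
    """
    if size_bytes < 0 or mbit_per_s <= 0:
        raise ValueError("size must be >= 0 and bandwidth must be > 0")
    bits = size_bytes * 8
    return bits / (mbit_per_s * 1_000_000)


if __name__ == "__main__":
    # A 10 GB archive over an assumed 10 Mbit/s uplink:
    secs = upload_seconds(10 * 10**9, 10)
    print(f"{secs:.0f} s (~{secs / 3600:.1f} h)")  # 8000 s (~2.2 h)
```

A single request lasting over two hours is likely to exceed the default request and script timeouts of most web stacks, which is one more argument for chunked or multipart uploads that can be retried piece by piece.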

As for the less complex front-end components of your GUI, jQuery or Node.js are good starting points. There are also tons of code packages & documentation on GitHub or in the AWS EC2/S3 forums, such as the following:

https://github.com/josegonzalez/upload

Without knowing your specific requirements, plans & time/budget limitations, that's the most advice I can give. However, feel free to reply to this thread or ping me directly with any other questions.

JaT5 answered Nov 09 '22 07:11