How can I install matplotlib for my AWS Elastic Beanstalk application?

I'm having a hell of a time deploying matplotlib on AWS Elastic Beanstalk. I gather that my issue comes from some dependencies and the way that EB deploys packages installed with PIP, and have attempted to follow the instructions here on SO for resolving the issue.

I first tried incrementally deploying, as suggested in the linked answer, by adding pieces of the matplotlib package stack to my requirements.txt file in stages. But this takes forever (for each stage) and is prone to failure and timing out (which seems to leave build directories behind that stall subsequent package installations).

So the simple solution mentioned offhandedly at the end of the answer appeals to me: just eb ssh, activate the virtualenv with

source /opt/python/run/venv/bin/activate

and pip install packages manually. But I can't get this to work either. First, I'm often confronted with left-behind build directories (as mentioned above):

pip can't proceed with requirement 'xxxx' due to a pre-existing build directory.
 location: /opt/python/run/venv/build/xxxx
This is likely due to a previous installation that failed.
pip is being responsible and not assuming it can delete this.
Please delete it and try again.

But even after removing these, I consistently get

Exception:
Traceback (most recent call last):
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1197, in prepare_files
    do_download,
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/req.py", line 1375, in unpack_url
    self.session,
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/download.py", line 582, in unpack_http_url
    unpack_file(temp_location, location, content_type, link)
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 625, in unpack_file
    untar_file(filename, location)
  File "/opt/python/run/venv/lib/python2.7/site-packages/pip/util.py", line 533, in untar_file
    os.makedirs(location)
  File "/opt/python/run/venv/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/opt/python/run/venv/build/xxxx'

in response to pip install xxxx (and sudo pip fails with sudo: pip: command not found).

What can I do to get this working on AWS-EB? In particular, what do I need to do to get the simple SSH+PIP approach working, or is there some other better (simpler!) approach I should try?


FWIW, I have a .ebextensions/software.config with

packages:
  yum:
    gcc-c++: []
    gcc-gfortran: []
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []

and a requirements.txt that ends with

pytz==2014.10
pyparsing==2.0.3
python-dateutil==2.4.0
nose==1.3.4
six>=1.8.0
mock==1.0.1

numpy==1.9.1

matplotlib==1.4.2

After about 4 hours, I've gotten as far as numpy (as reported by pip list in the EB virtualenv).

And (in case it matters) the user who is SSHing is part of a group with the policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticbeanstalk:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "autoscaling:*",
        "cloudwatch:*",
        "s3:*",
        "sns:*",
        "cloudformation:*",
        "rds:*",
        "sqs:*",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
asked Jan 22 '15 at 22:01 by orome



2 Answers

I have used many approaches to build and deploy numpy/scipy/matplotlib, on Windows as well as Linux systems: system-provided package managers (aptitude, rpm), third-party package managers (pypm), Python package managers (easy_install, pip), source releases, and different build environments and tools (GCC, but also Intel MKL and OpenMP). While doing so, I have run into many annoying situations, but have also learned a lot about the pros and cons of each approach.

I have no experience with Elastic Beanstalk (EB), but I have experience with EC2. I see that you can SSH into an instance and poke around. So, what I suggest further below is based on

  • above-stated experiences and on
  • the more or less obvious boundary conditions regarding Beanstalk and on
  • your application scenario, described in another question here on SO and on
  • the fact that you just want to get things running, quickly

My suggestion: start off with not building these things yourself. Do not use pip. If possible, try to use the package manager of the Linux distribution in place and let it handle the installation of everything required for you, with a single command (e.g. sudo apt-get install python-matplotlib).

Disadvantages:

  • possibly old package versions, depending on the Linux distro in use
  • non-optimized builds (e.g. not built against Intel MKL, not leveraging OpenMP features, or not using special instruction sets)

Advantages:

  • it quickly downloads, because packages are most likely cached near your machine
  • it quickly installs (these packages are pre-built, no compilation involved)
  • it just works

So, I hope you can just use aptitude or rpm or whatever on these machines and inherit the great work that the distribution package maintainers do for you, behind the scenes.

Once you are confident in your application and have identified some bottleneck or issue, you might have reason to use a newer version of numpy/matplotlib/..., or to get a faster version of these by creating an optimized build.

Edit: EB-related details of outlined approach

In the meantime we have learned that EB by default runs Amazon Linux, which is based on Red Hat Enterprise Linux. Accordingly, it uses yum as its package manager, and packages are in RPM format.

Amazon provides documentation about available packages. In Amazon Linux 2014.09, these packages are available: http://aws.amazon.com/de/amazon-linux-ami/2014.09-packages/

In this list we find

  • numpy-1.7.2
  • python-matplotlib-0.99.1.2

This version of matplotlib is very old; according to the changelog, it is from September 2009: "2009-09-21 Tagged for release 0.99.1".

I did not expect it to be so old, but still, it might be sufficient for your needs. So we proceed with our plan (but I'd understand if that's a blocker).
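Whether 0.99.1.2 is "sufficient" can be checked from the application itself. The following is only a minimal sketch: version_tuple and is_at_least are hypothetical helper names, not part of matplotlib, and the comparison ignores anything after the leading digits of each component (so release-candidate suffixes are not ordered). It sticks to syntax that should also run on the Python 2.6/2.7 interpreters discussed here.

```python
def version_tuple(version):
    """Turn a dotted version string such as '0.99.1.2' into (0, 99, 1, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component without a leading number.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required):
    """Return True if the installed version is >= the required one."""
    a = version_tuple(installed)
    b = version_tuple(required)
    # Pad with zeros so that '1.4' and '1.4.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

In the app this could be used as is_at_least(matplotlib.__version__, "1.4.2") right after importing matplotlib, to fail early with a clear message instead of hitting a missing feature later.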

Now, we have learned that system Python and EB Python are isolated from each other. That does not mean that EB Python cannot access system Python site packages. We just need to tell it so. A simple and clean method is to set up a proper directory structure with the packages that should be accessible to EB Python, and to communicate this directory to EB Python via sys.path.

Clearly, we need to customize the bootstrapping phase of EB containers. The available tools are documented here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html

Obviously, we want to make use of the packages approach, and tell EB to install the numpy and python-matplotlib packages via yum. So the corresponding config file section should contain:

packages:
  yum:
    numpy: []
    python-matplotlib: []

Explicitly mentioning numpy might not be necessary; it is likely a dependency of python-matplotlib.

Also, we need to make use of the commands section:

You can use the commands key to execute commands on the EC2 instance. The commands are processed in alphabetical order by name, and they run before the application and web server are set up and the application version file is extracted.

The following three commands create the above-mentioned directory and set up symbolic links to the numpy/mpl installation paths (these paths are hopefully available at the moment these commands are executed):

commands:
  00-create-dir:
    command: "mkdir -p /opt/py26-selected-site-packages"
  # -f and -n replace an existing link, so the commands can be re-run on a later deployment
  01-link-numpy:
    command: "ln -sfn /usr/lib64/python2.6/site-packages/numpy /opt/py26-selected-site-packages/numpy"
  02-link-mpl:
    command: "ln -sfn /usr/lib64/python2.6/site-packages/matplotlib /opt/py26-selected-site-packages/matplotlib"

Two uncertainties: first, the AWS docs do not clarify that packages are processed before commands are executed. You have to try. If it does not work, use container_commands. Secondly, it is just an educated guess that /usr/lib64/python2.6/site-packages/matplotlib is available after installing python-matplotlib. It should be installed to this place, but it may end up somewhere else. This needs to be tested. Numpy should end up where specified, as inferred from this article.

[UPDATE FROM SEB] AWS documentation says "The cfn-init helper script processes these configuration sections in the following order: packages, groups, users, sources, files, commands, and then services." http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html

So, your approach is safe [/UPDATE]
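The remaining uncertainty, where yum actually puts the packages, can be checked on the instance (for example during an eb ssh debugging session) with a few lines of Python. This is a rough sketch: find_package_dir is a hypothetical helper, and the second entry in CANDIDATES is my own guess at an alternative location, not something taken from the Amazon Linux package list.

```python
import os


def find_package_dir(package, candidates):
    """Return the first '<candidate>/<package>' directory that exists, or None."""
    for site_dir in candidates:
        path = os.path.join(site_dir, package)
        if os.path.isdir(path):
            return path
    return None


# The first path is the one assumed in the commands section;
# the second is only a guess at where else a package might land.
CANDIDATES = [
    "/usr/lib64/python2.6/site-packages",
    "/usr/lib/python2.6/site-packages",
]
```

Calling find_package_dir("matplotlib", CANDIDATES) and find_package_dir("numpy", CANDIDATES) on the instance tells you which source paths to use in the two ln commands; a result of None means the package is not in any of the listed directories.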

The crucial step, as pointed out in the comments to this answer, is to tell your Python app where to look for packages. Direct modification of sys.path before attempting to import is a reliable method to take control of this. The following code adds our special directory to the list of directories in which Python looks for packages, and then attempts to import matplotlib:

import sys

# Make the directory with the symlinked packages importable
sys.path.append("/opt/py26-selected-site-packages")
from matplotlib import pyplot

The order in sys.path defines priorities, so in case there is any other matplotlib or numpy package available in one of the other directories, it might be a better idea to

sys.path.insert(0, "/opt/py26-selected-site-packages")

However, this should not be necessary if our whole approach was well thought through.
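If you want the path manipulation to be a bit more defensive, it can be wrapped in a small helper. This is a sketch under my own assumptions: prepend_site_dir is a hypothetical name, and the choice to skip silently when the directory is missing is mine, not something EB requires.

```python
import os
import sys


def prepend_site_dir(path, search_path=None):
    """Put `path` at the front of the module search path, exactly once.

    Returns False (and changes nothing) if the directory does not exist.
    """
    if search_path is None:
        search_path = sys.path
    if not os.path.isdir(path):
        return False
    if path in search_path:
        # Avoid duplicates when the app module is imported more than once.
        search_path.remove(path)
    search_path.insert(0, path)
    return True
```

In the app you would call prepend_site_dir("/opt/py26-selected-site-packages") before from matplotlib import pyplot, and log a warning if it returns False, since that means the commands section did not create the directory on this instance.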

answered Sep 29 '22 at 23:09 by Dr. Jan-Philip Gehrcke


To add to Jan-Philip's answer:

AWS Elastic Beanstalk uses the Amazon Linux distribution (except for .NET environments). Amazon Linux uses the yum package manager. Matplotlib is available in Amazon's software repository.

[ec2-user@ip-1-1-1-174 ~]$ yum list | grep matplot
python-matplotlib.x86_64            0.99.1.2-1.6.amzn1              amzn-main

If this version is the one you need for your application, I would simply modify your .ebextensions/software.config file and add the package to its yum section:

packages:
  yum:
    python-matplotlib: [] 
    python-devel: []
    atlas-sse3-devel: []
    lapack-devel: []
    libpng-devel: []
    freetype-devel: []
    zlib-devel: []

A last note about AWS Elastic Beanstalk and SSH.

While Amazon gives you the possibility to SSH to your Elastic Beanstalk instances, you should use this possibility only for debugging purposes, to understand why your app failed or is not installing as suggested.

Other than that, your deployment must be 100% automatic. When Elastic Beanstalk (Auto Scaling, to be precise) scales out your infrastructure (adds more instances) or scales it in (terminates instances) depending on your application workload, all your manual configuration will be lost.

Best practice is not to install SSH keys in your production environment; this further reduces the attack surface.

answered Sep 29 '22 at 23:09 by Sébastien Stormacq