How to "move" all the libraries a script requires when moving to a new machine

We work on scientific computing and regularly submit calculations to different computing clusters. For that, we connect through a Linux shell and submit jobs via SGE, Slurm, etc. (it depends on the cluster). Our codes are composed of Python and Bash scripts plus several binaries, some of which depend on external libraries such as matplotlib. Starting on a new cluster is a nightmare: we have to tell the admins every library we need, and sometimes they cannot install all of them, or they only have old versions that cannot be upgraded. So we wonder what we could do here. Could we somehow "pack" all the libraries we need along with our codes? Do you think that is possible? Otherwise, how could we move to new clusters without needing the admins to install anything?

asked Sep 15 '16 by Open the way

2 Answers

The key is to compile all the code you need yourself, using the compiler, library, and MPI toolchains installed by the admins of the clusters, so that

  1. your software is compiled properly for the cluster hardware, and
  2. you do not depend on the admin to install the software.

The following are very useful in this case:

  • Ansible, to upload and manage configuration files and rc files, set permissions, compile your binaries, etc., and to deploy a new environment easily on new clusters
  • EasyBuild, to install your own version of Python with all the needed dependencies, and to install other scientific software thanks to the community-supported build procedures
  • CDE, to build a package with all the dependencies of your binaries on your laptop and use it as-is on the clusters (a sketch follows this list).
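
For example, here is a minimal sketch of the CDE workflow, assuming the cde binary is on your PATH; the analysis.py script and all paths are illustrative, and the exact layout of the generated package can vary between CDE versions:

    # Run the script once under CDE: every file it opens (the Python
    # interpreter, shared libraries, imported modules) is copied into
    # a self-contained cde-package/ directory.
    cde python analysis.py input.dat

    # Ship the package to the new cluster; no admin involvement needed.
    scp -r cde-package/ user@newcluster:

    # On the new cluster, run the generated wrapper from inside the package;
    # CDE redirects all file accesses into the bundled cde-root/ tree.
    # (The wrapper sits under cde-root/ at the path mirroring the working
    # directory where you originally ran cde.)
    cd cde-package/cde-root/home/user && ./python.cde analysis.py input.dat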

More specifically for Python, you can use

  • virtual envs, to set up a consistent set of Python modules across all clusters, independently of the modules already installed; or
  • Anaconda or Canopy, to use a Python scientific distribution

to have a consistent Python install across all clusters.
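
As an illustration, here is a minimal sketch of both options, assuming your dependencies are pinned in a requirements.txt file; the environment names, packages, and versions below are placeholders:

    # Option A: a virtual env built on top of the cluster's Python
    python -m venv ~/envs/ourcode
    source ~/envs/ourcode/bin/activate
    pip install -r requirements.txt   # pinned versions travel with your code

    # Option B: a user-local Miniconda install; no admin rights required
    bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
    ~/miniconda3/bin/conda create -n ourcode python=3 numpy matplotlib
    source ~/miniconda3/bin/activate ourcode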

answered Sep 28 '22 by damienfrancois


Don't get me wrong, but I think what you have to do is this: stop behaving like amateurs.

Meaning: the integrity of your "system configuration" is one of the core assets of your "business". And you just told us that you are basically unable to easily reproduce your system configuration.

So the real answer here can't be a recommendation to use this or that technology. The real answer is: you and the other teams involved in running your operations need to come together and define a serious strategy for fixing this.

Maybe you then decide that the way to go is for your development team to provide Docker build files, so that your operations team can easily create images on new machines (a minimal sketch follows below). Or you decide that you need to use something like Ansible to enable centralized control over your complete environment.
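
To make the Docker idea concrete, here is a hedged sketch of what such a build file might look like, written as a bash heredoc; the base image, package list, and run.py entry point are all assumptions, not part of this answer:

    # Hypothetical Dockerfile; every name below is illustrative.
    cat > Dockerfile <<'EOF'
    FROM python:3-slim
    RUN pip install numpy matplotlib
    COPY . /opt/ourcode
    WORKDIR /opt/ourcode
    ENTRYPOINT ["python", "run.py"]
    EOF

    # The operations team builds the image once per machine and runs jobs from it.
    docker build -t ourcode:latest .
    docker run --rm ourcode:latest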

answered Sep 28 '22 by GhostCat