
Python package installation: pip vs yum, or both together?

I've just started administering a Hadoop cluster. We're using Bright Cluster Manager up to the O/S level (CentOS 7.1) and then Ambari together with Hortonworks HDP 2.3 for Hadoop.

I'm constantly getting requests to install new Python modules. We installed some modules with yum at setup, and as the cluster has evolved, others have been installed with pip.

What is the "right" way to do this? Always use yum and not be able to provide the latest and greatest modules? Always use pip and not have one point of truth (yum) showing which packages are installed? Or is it fine to use both pip and yum together?

I'm just worried that I'm filling the system with junk and too many versions of Python modules. Any suggestions?

ClusterAdmin asked Jan 19 '16 10:01


1 Answer

Packages which are part of your distribution should be preferred, because they have been tested to work properly on your system. These packages are installed system-wide.

However, if a suitable RPM package is not provided, go ahead and install it with pip from e.g. PyPI or GitHub, but deploy virtual Python environments whenever possible. With virtual environments you don't have to install third-party packages system-wide. You end up with several smaller sets of packages, which are much easier to manage than one large set.
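As a rough illustration of the virtual environment approach, here is a minimal sketch using the standard library `venv` module. It assumes a Python 3 interpreter is available on the nodes (the stock Python 2.7 on CentOS 7 would need the separate `virtualenv` tool instead). The helper names `create_env` and `pip_install` are made up for this example.

```python
import subprocess
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create an isolated virtual environment and return the path to its interpreter.

    Nothing is written to the system-wide site-packages, so packages managed
    by yum stay untouched.
    """
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    # POSIX layout (bin/python); this sketch does not handle Windows.
    return Path(env_dir) / "bin" / "python"


def pip_install(env_python, *requirements):
    """Install packages into the environment that owns env_python.

    Calling pip through the environment's own interpreter keeps the
    installation inside that environment.
    """
    subprocess.check_call([str(env_python), "-m", "pip", "install", *requirements])
```

For example, `pip_install(create_env("/opt/envs/analytics"), "requests")` would put `requests` into its own environment. Both the path and the package name are placeholders. Removing the environment directory later removes everything that was installed into it.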

VPfB answered Oct 18 '22 03:10