I want to avoid ever "accidentally" working in a default environment.
I want to always have an equivalent to a requirements.txt
or package.json
file available, both to clearly separate one project from another, and so that I can easily look back to see what is installed (and what version of it).
But I work primarily in the data science / analytics world, and primarily with Python.
As such, I use Anaconda, pip, and Homebrew (I have a Mac). It would be great to rely upon just one package manager, and many folks espouse one method or another to accomplish this. Truth is, as of now (Sep 2018), it's impossible to work in any breadth of topics and avoid at least some mixture.
Setting my sights lower and more realistic, I simply want to make sure that there is no default environment wherever possible, to make it cleaner and easier to work on projects with others.
To my knowledge, there is no concept of an environment in Homebrew at all. Conda of course has environments, but it first sets up a default environment before you can create any others.
Is there any way to install Anaconda without any default environment, so that I would always have to source activate <my_env>? If so, how do I do that?
Barring this, what are the best suggestions to accomplish what I want, which is to never accidentally work in an environment where it is unclear what my dependencies are, recognizing that I'm talking primarily but not exclusively about using Python?
(Please don't suggest that I should "just be careful" when installing packages. Yes, I understand that. But I am trying to pre-emptively be careful by making the wrong choices as difficult or impossible as I can. If I had no default environment, for instance, then pip would not even work until I sourced an environment, since it would not be found in my normal environment.)
No, this is not possible. The currently supported install methods include the Anaconda installer and the Miniconda installer, and both set up a default environment.
Installing the Anaconda platform will install the following: Python; specifically the CPython interpreter that we discussed in the previous section. A number of useful Python packages, like matplotlib, NumPy, and SciPy. Jupyter, which provides an interactive “notebook” environment for prototyping code.
What is the default path for installing Anaconda? If you accept the default option to install Anaconda on the “default path” Anaconda is installed in your user home directory: Windows 10: C:\Users\<your-username>\Anaconda3\ macOS: /Users/<your-username>/anaconda3 for the shell install, ~/opt for the graphical install.
The answer for you will be no. If you already have Anaconda installed on your laptop, you will see once you open it that you can install Python from within the software. Anaconda includes not only Python but R as well.
I think your best bet is to simply use a virtual environment and install dependencies as they become necessary, then check in and out of your virtual environment as your work progresses. You can create different virtual environments as you work on different projects and leave their corresponding requirements.txt files inside the directory python creates when installing a virtual environment. Let's say I have python3.5.2 as my normal, go-to python package (because I do).
Using python3.5, let's enter a virtual environment with nothing more than bare-bones python3.5 (no installed dependencies). To do this:
[dkennetz@node venv_test]$ python -m venv my_SO_project
[dkennetz@node venv_test]$ ls
my_SO_project
So we see python has created a directory to house my virtual environment, but the virtual environment is not yet being used as my default python. In order to do this, we must activate it:
[dkennetz@node venv_test]$ source ./my_SO_project/bin/activate
So my shell now looks like this:
(my_SO_project) [dkennetz@nodecn201 venv_test]$
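Beyond watching the shell prompt, you can also check programmatically whether the interpreter you are running belongs to a venv, which is handy as a safety check at the top of project scripts. A minimal sketch (the helper name `in_virtualenv` is mine, not a standard function):

```python
import sys

def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the interpreter the venv
    # was created from. Outside a venv the two are equal.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("running inside a virtual environment:", in_virtualenv())
```

A script could call this and refuse to run (or warn) when it finds itself in the bare system interpreter, which directly serves the "never accidentally work in a default environment" goal.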
While we are here, let's see what our requirements look like:
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201 venv_test]$ ls -alh
drwxr-x--- 3 dkennetz blank 4.0K Oct 9 09:52 .
drwxr-x--- 93 dkennetz root 16K Oct 9 09:40 ..
drwxr-x--- 5 dkennetz blank 4.0K Oct 9 09:47 my_SO_project
-rwxr-x--- 1 dkennetz blank 0 Oct 9 09:47 requirements.txt
(I'm using blank to hide group names.) As we can see, our requirements.txt file is empty, meaning this virtual environment has no dependencies. It is purely python3.5. Now let's go ahead and install pandas and see how our dependencies change.
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip install pandas
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201 venv_test]$ more requirements.txt
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
six==1.11.0
(my_SO_project) [dkennetz@nodecn201 venv_test]$ wc -l requirements.txt
5 requirements.txt
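Those five lines follow the simple name==version format that pip freeze emits for exact pins, so they are easy to inspect or compare in code if you ever need to. A minimal sketch (the helper name `parse_requirements` is mine, and it deliberately handles only exact pins, not the full requirements syntax with extras, markers, or VCS URLs):

```python
def parse_requirements(text):
    # Parse simple "name==version" lines from pip freeze output
    # into a {name: version} dict, skipping blanks and comments.
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

frozen = "numpy==1.15.2\npandas==0.23.4\n"
print(parse_requirements(frozen))  # {'numpy': '1.15.2', 'pandas': '0.23.4'}
```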
Let's say we have written some code inside the environment and we no longer want to do any more work, so we do one final pip freeze > requirements.txt and we leave:
(my_SO_project) [dkennetz@nodecn201 venv_test]$ deactivate
[dkennetz@nodecn201 venv_test]$ pip freeze > requirements_normal.txt
[dkennetz@nodecn201 venv_test]$ wc -l requirements_normal.txt
82 requirements_normal.txt
Many more dependencies there, but nothing has changed in our normal environment, and nothing has changed in our virtual environment. Now let's say we have taken the rest of the day off and wish to go back to the SO_project we created yesterday. Well, that's easy:
[dkennetz@nodecn201 venv_test]$ ls -alh
drwxr-x--- 3 dkennetz blank 4.0K Oct 9 10:01 .
drwxr-x--- 93 dkennetz root 16K Oct 9 09:40 ..
drwxr-x--- 5 dkennetz blank 4.0K Oct 9 09:47 my_SO_project
-rwxr-x--- 1 dkennetz blank 77 Oct 9 09:56 requirements.txt
-rwxr-x--- 1 dkennetz blank 1.3K Oct 9 10:01 requirements_normal.txt
[dkennetz@nodecn201 venv_test]$ source ./my_SO_project/bin/activate
(my_SO_project) [dkennetz@nodecn201 venv_test]$
Let's see where we left off (we should only have pandas installed; let's overwrite our old requirements file):
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201 venv_test]$ more requirements.txt
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
six==1.11.0
Cool, so now we know we are just where we left off. Just a fair warning: I have pandas installed in my root python package, but what I do not have is the awscli (Amazon Web Services command-line interface). Let's say I want that for some reason in my package:
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip install awscli
(my_SO_project) [dkennetz@nodecn201 venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201 venv_test]$ wc -l requirements.txt
15 requirements.txt
(my_SO_project) [dkennetz@nodecn201 venv_test]$ deactivate
[dkennetz@nodecn201 venv_test]$ ls
my_SO_project requirements.txt requirements_normal.txt
[dkennetz@nodecn201 venv_test]$ pip freeze > requirements_normal.txt
[dkennetz@nodecn201 venv_test]$ wc -l requirements_normal.txt
82 requirements_normal.txt
So we now see that installing the awscli has not made a change to our normal python package, but it has for our venv:
[dkennetz@nodecn201 venv_test]$ more requirements_normal.txt
appdirs==1.4.3
arrow==0.7.0
attrdict==2.0.0
avro-cwl==1.8.4
...
[dkennetz@nodecn201 venv_test]$ more requirements.txt
awscli==1.16.29
botocore==1.12.19
colorama==0.3.9
docutils==0.14
...
Finally, let's say you've developed a super cool data science package entirely inside of your venv and you have made it pip install-able. The quick and easy way to handle this is to just:
[dkennetz@nodecn201 venv_test]$ pip install -r requirements.txt
You can now use this as your package list every time your "new program" is pip installed, and better yet, you know every python package you need for it, because those are the only ones you have included in your environment.
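If you also kept the requirements_normal.txt snapshot from earlier, you can compute exactly what your project adds beyond the normal environment, which is a quick way to audit a venv's true footprint. A minimal sketch (the helper name `extra_requirements` is mine):

```python
def extra_requirements(project_reqs, base_reqs):
    # Return the pinned lines present in the project environment
    # but absent from the base environment, sorted for readability.
    base = set(base_reqs.splitlines())
    return sorted(line for line in project_reqs.splitlines()
                  if line and line not in base)

project = "awscli==1.16.29\npandas==0.23.4\n"
base = "pandas==0.23.4\n"
print(extra_requirements(project, base))  # only the venv-specific pins
```

In practice you would read the two files from disk (for example with `open("requirements.txt").read()`) instead of using inline strings.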
All this being said, there is no reason you can't do this every time you start a new project with new people. And if you want to have anaconda in every virtual environment you ever use, install anaconda normally:
[dkennetz@nodecn201 venv_test]$ ./Anaconda-1.6.0-Linux-x86_64.sh
[dkennetz@nodecn201 venv_test]$ source /home/dkennetz/anaconda3/bin/activate
#You will be in your anaconda environment now
(base) [dkennetz@nodecn201 venv_test]$ pip freeze > anaconda_reqs.txt
Say you've now started my_SO_project2 after that first one, and you want to ensure that you have anaconda in this environment. Create your new venv the same way you did last time. Once inside, just install all the dependencies anaconda requires and you will have a fresh anaconda virtual environment:
(my_SO_project2) [dkennetz@nodecn201 venv_test]$ pip install -r anaconda_reqs.txt
And your new venv starts as a fresh environment with nothing but anaconda installed.
I hope this clarifies what I have said in the comments, and it is helpful for you.
This question seems to be asking many different things at once.
Is there any way to install Anaconda without any default environment
As mentioned, conda will always have a base environment, which is essentially the default environment.
As such, I use Anaconda, pip, and Homebrew (I have a Mac).
As mentioned, the big difference here is that Homebrew is for system-wide installs. You should treat pip and conda as per-project installs, as I will explain in answering:
what are the best suggestions to accomplish what I want, which is to never accidentally work in an environment where it is unclear what my dependencies are, recognizing that I'm talking primarily but not exclusively about using Python?
I want to always have an equivalent to a requirements.txt or package.json file available, both to clearly separate one project from another, and so that I can easily look back to see what is installed (and what version of it).
After working in data science for many years, this is the solution I have settled on which solves all of your problems.
(On Mac) install all your system-level tools with Homebrew, but do yourself a favor and try to limit this to 'generic' tools such as GNU tools (e.g. wget, tree) or other things that will not be changing on a per-project basis and/or are otherwise better installed system-wide (e.g. Vagrant, Docker, PostgreSQL).
For each project, have a dedicated wrapper script that installs conda in the current directory. Note here that I do not mean to install a global conda and use conda environments; I mean to literally install a fresh conda in every single project. This will work fine because, within your wrapper scripts, you will include a detailed, version-locked set of the conda install commands required to install the exact versions of all of the packages you require.
Additionally, your wrapper script will contain the system environment modifications required to put this conda in your $PATH and clear out or override lingering references to any other system Pythons. conda is able to install a fairly large number of non-Python packages, so this takes care of your non-Python software dependencies as much as possible. This includes R installations and many R packages (for things like Bioconductor, it is even safer to install this way than the 'vanilla' way, due to greater version control).
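The environment modifications described above (prepending the project-local conda to $PATH, clearing other Python references) can be sketched in a few lines. This is a hypothetical illustration, not the answer's actual wrapper script; the helper name `project_env` and the example project path are mine, and the conda/bin layout follows the per-project install convention described here:

```python
import os

def project_env(project_dir):
    # Build the environment a wrapper script would run commands under:
    # the project-local conda/bin goes first on PATH, and any lingering
    # references to other Pythons are cleared out.
    env = dict(os.environ)
    conda_bin = os.path.join(project_dir, "conda", "bin")
    env["PATH"] = conda_bin + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONPATH", None)
    env.pop("PYTHONHOME", None)
    return env

env = project_env("/home/dkennetz/my_SO_project")
# A wrapper could then run project commands under this environment, e.g.:
# subprocess.run(["python", "myscript.py"], env=env)
```

The same idea expressed in shell is simply `export PATH="$PWD/conda/bin:$PATH"; unset PYTHONPATH PYTHONHOME`, which is what the Makefile shown later does declaratively.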
For packages that must be installed with pip, do not worry, because every conda installation comes with its own pip installation as well. So you can pip install within your conda, and the packages will remain in the conda alone. Your pip install command will also be version-locked, using requirements.txt if you wish, guaranteeing that it is reproducible.
Once you have your conda instance set up, you will use the aforementioned wrapper scripts to wrap up all the commands you are using in your project to run your software. If you need to work interactively, you can just call bash from within the wrapper script and it will drop you into an interactive bash process with your environment from the wrapper script pre-populated. In practice, I prefer to use GNU make with a Makefile to accomplish all of these things. I create a file Makefile at the root of each project directory, with contents that look like this:
SHELL:=/bin/bash
UNAME:=$(shell uname)

# ~~~~~ Setup Conda ~~~~~ #
PATH:=$(CURDIR)/conda/bin:$(PATH)
unexport PYTHONPATH
unexport PYTHONHOME

# pick the conda installer for Mac or Linux
ifeq ($(UNAME), Darwin)
CONDASH:=Miniconda3-4.7.12.1-MacOSX-x86_64.sh
endif
ifeq ($(UNAME), Linux)
CONDASH:=Miniconda3-4.7.12.1-Linux-x86_64.sh
endif
CONDAURL:=https://repo.continuum.io/miniconda/$(CONDASH)

conda:
	@echo ">>> Setting up conda..."
	@wget "$(CONDAURL)" && \
	bash "$(CONDASH)" -b -p conda && \
	rm -f "$(CONDASH)"

install: conda
	conda install -y \
	conda-forge::ncurses=6.1 \
	rabbitmq-server=3.7.16 \
	anaconda::postgresql=11.2
	pip install -r requirements.txt

# start interactive bash session
bash:
	bash

run:
	python myscript.py
Now, when you cd into your project directory, you can just run a command like make install to install all of your dependencies, and a command like make run to run your code for the project.
A tip for building these conda installations: first install all your packages without specifying any version numbers, then after you get them all installed, go back and add the version numbers. This is a lot easier than trying to specify them up front.
Finally, if your software dependencies do not fit into either Homebrew or conda or pip in this manner, you need to start making some choices about how much reproducibility and isolation you really need. You might start to look into Docker containers or Vagrant virtual machines (in both cases you can keep the recipe in your project dir and continue to wrapper-script the commands to run your code, for future reference). I have typically not run into any per-project software that cannot be settled with some combination of conda, pip, Docker, or Vagrant.
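The install-first, pin-later workflow can be partly automated: once everything is installed, pip freeze tells you the exact versions, and you can turn your loose package names into pinned specs mechanically. A minimal sketch (the helper name `pin_versions` is mine; conda pins would use `name=version` with a single `=` instead):

```python
def pin_versions(names, freeze_output):
    # Given loose package names and `pip freeze` output, return
    # exact "name==version" pins. Raises KeyError if a requested
    # name is not present in the freeze output.
    installed = {}
    for line in freeze_output.splitlines():
        if "==" in line:
            name, _, version = line.partition("==")
            installed[name.lower()] = version
    return ["%s==%s" % (n, installed[n.lower()]) for n in names]

freeze = "numpy==1.15.2\npandas==0.23.4\n"
print(pin_versions(["pandas"], freeze))  # ['pandas==0.23.4']
```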
For really extenuating circumstances, for example running RStudio locally (which does not play nice with R and libs installed in conda), I will just concede a bit and brute-force install the required packages globally for development purposes, but I also try to recreate an isolated, version-locked R + library instance in either conda or Docker and run the code as a script there, to verify the results can still be regenerated without the global packages.