conda clean --packages removes unused packages from writable package caches. What is this 'writable package cache', and how is conda able to detect that it's unused?
Is it actually running through all of the python files and looking for dependencies? Or does it keep a record of what has run before?
Does it ever remove packages that I installed via pip but never used?
Conda uses hardlinks to minimize physical disk usage. That is, a single physical copy of lib/libz.a may be referenced both from the package cache (where it was first unpacked) and from multiple environments.
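To see the link count that the filesystem maintains, here is a small sketch using Python's os.link and os.stat in a throwaway temporary directory. The directory names are hypothetical stand-ins for a package cache and an environment, and the sketch assumes a filesystem that supports hardlinks (both paths must be on the same filesystem):

```python
import os
import tempfile

def link_count(path):
    # st_nlink is maintained by the filesystem, not by any application
    return os.stat(path).st_nlink

with tempfile.TemporaryDirectory() as tmp:
    cache_lib = os.path.join(tmp, "pkgs", "zlib-demo", "lib")  # hypothetical cache layout
    env_lib = os.path.join(tmp, "envs", "demo", "lib")         # hypothetical environment
    os.makedirs(cache_lib)
    os.makedirs(env_lib)

    cached = os.path.join(cache_lib, "libz.a")
    with open(cached, "wb") as f:
        f.write(b"placeholder contents")

    print(link_count(cached))  # 1: only the cache refers to this data

    # "install" into the environment by hardlinking instead of copying
    os.link(cached, os.path.join(env_lib, "libz.a"))
    print(link_count(cached))  # 2: cache and environment share one physical copy
```

Nothing in the file itself records where the second link lives; the count is the only evidence available, which is why it suffices for a "used or not" decision.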
Conda determines eligibility for removing a package from the package cache by counting the number of hardlinks for the files in each package. Hardlink counts are tracked by the filesystem, not by Conda. An outline of the relevant code is:
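import os

def num_links(path):
    # hardlink count, as tracked by the filesystem (not by Conda)
    return os.lstat(path).st_nlink

def find_removable_packages(pkgs_dirs):
    # keep a list of packages to remove
    pkgs_to_remove = []
    # look in all package caches (there can be multiple)
    for pkg_cache in pkgs_dirs:
        if not os.path.isdir(pkg_cache):
            continue
        # check all packages (assumed: one extracted directory per package)
        for name in os.listdir(pkg_cache):
            pkg = os.path.join(pkg_cache, name)
            if not os.path.isdir(pkg):
                continue
            # assume removable unless...
            remove_pkg = True
            for root, _dirs, files in os.walk(pkg):
                for file in files:
                    # is there evidence that it is linked elsewhere?
                    if num_links(os.path.join(root, file)) > 1:
                        # if so, don't remove, and move on
                        remove_pkg = False
                        break
                if not remove_pkg:
                    break
            # add it to the list if removable
            if remove_pkg:
                pkgs_to_remove.append(pkg)
    # output some info on `pkgs_to_remove`
    # check if user wants to execute removal
    return pkgs_to_remove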
That is, if any file in a package has more than one link, Conda concludes that the package is in use by at least one environment, keeps it, and moves on to the next package.
Note that symbolic links (a.k.a. symlinks or softlinks) do not increase a file's link count, and Conda doesn't track them either, which is why Conda warns about cleaning packages in combination with the allow_softlinks setting.
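A quick way to see why symlinked installs defeat the hardlink check: creating a symbolic link leaves the target's link count unchanged, so a package used by an environment only through symlinks still looks unreferenced. A minimal sketch, POSIX-oriented (creating symlinks on Windows may need extra privileges); looks_unused and the directory names are hypothetical:

```python
import os
import tempfile

def looks_unused(pkg_dir):
    # True if no file under pkg_dir has a hardlink count above 1
    for root, _dirs, files in os.walk(pkg_dir):
        for name in files:
            if os.lstat(os.path.join(root, name)).st_nlink > 1:
                return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "pkgs", "zlib-demo")  # hypothetical cached package
    env_dir = os.path.join(tmp, "envs", "demo")       # hypothetical environment
    os.makedirs(os.path.join(pkg_dir, "lib"))
    os.makedirs(os.path.join(env_dir, "lib"))

    cached = os.path.join(pkg_dir, "lib", "libz.a")
    with open(cached, "wb") as f:
        f.write(b"placeholder contents")

    # link the file into the environment with a symlink instead of a hardlink
    os.symlink(cached, os.path.join(env_dir, "lib", "libz.a"))

    print(os.lstat(cached).st_nlink)  # 1: the symlink did not bump the count
    print(looks_unused(pkg_dir))      # True, even though the environment uses it
```

Deleting pkg_dir at this point would leave a dangling symlink in the environment, which is exactly the situation the warning is about.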