On Databricks, it is possible to install Python packages directly from a Git repository or from DBFS:
%pip install git+https://github.com/myrepo
%pip install /dbfs/my-library-0.0.0-py3-none-any.whl
Is there a way to enable a live package development mode, similar to pip install -e, such that the Databricks notebook references the library files as-is and it's possible to update the library files on the go?
E.g. something like
%pip install -e /dbfs/my-library/
combined with a way to keep my-library up-to-date?
Thanks!
I would recommend adopting the Databricks Repos functionality, which allows you to import Python code into a notebook as a normal package, including automatic reload of the code when the package's source changes.
You need to add the following two lines to the notebook that uses the Python package you're developing:
%load_ext autoreload
%autoreload 2
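With autoreload active, you can import the package straight from the Repo, and edits take effect on the next cell run. A minimal usage sketch, where my_library and my_function are placeholders for your own code:
from my_library import my_function  # placeholder: a package living in your Repo
my_function()  # edit the source in the Repo, then just re-run this cell; no reinstall needed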
Your library is recognized automatically because the root folders of Databricks Repos are added to sys.path. If your library sits in a subfolder of the Repo, you can add it yourself:
import os, sys
# Make the Repo subfolder importable on the driver; <username> and the path are placeholders
sys.path.append(os.path.abspath('/Workspace/Repos/<username>/path/to/your/library'))
Note that this works on the driver node (where the notebook runs), but not on the worker nodes, so code that executes on the workers (for example, inside UDFs) won't find the library this way.
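If the workers need the library too, one possible workaround is to snapshot the package source into a zip and ship it to the executors with SparkContext.addPyFile. A hedged sketch, assuming the Repo layout from above with the package directory directly inside the subfolder (all paths and the package name are placeholders; unlike autoreload, this is a one-off snapshot that you must re-run after each change you want the workers to pick up):
import shutil

# Zip the Repo subfolder so the package directory ends up at the top level of the archive
shutil.make_archive('/dbfs/tmp/my_library', 'zip',
                    root_dir='/Workspace/Repos/<username>/path/to/your/library')

# Distribute the snapshot to the executors so worker-side code can import it
spark.sparkContext.addPyFile('dbfs:/tmp/my_library.zip')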
P.S. You can see examples in this Databricks cookbook and in this repository.