On Databricks, it is possible to install Python packages directly from a Git repository or from DBFS:
%pip install git+https://github.com/myrepo
%pip install /dbfs/my-library-0.0.0-py3-none-any.whl
Is there a way to enable a live package development mode, similar to pip install -e, such that the Databricks notebook references the library files as-is and it's possible to update the library files on the go?
E.g. something like
%pip install -e /dbfs/my-library/
combined with a way to keep my-library up-to-date?
Thanks!
I would recommend adopting the Databricks Repos functionality, which allows you to import Python code into a notebook as a normal package, including automatic reload of the code when the package's source changes.
You need to add the following two lines to the notebook that uses the Python package you're developing:
%load_ext autoreload
%autoreload 2
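With autoreload active, you can import the package straight from the Repo, and edits take effect on the next cell run. A minimal usage sketch, where my_library and my_function are placeholders for your own code:
from my_library import my_function  # placeholder: a package living in your Repo
my_function()  # edit the source in the Repo, then just re-run this cell; no reinstall needed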
Your library is recognized automatically because the root folders of Databricks Repos are added to sys.path. If your library sits in a subfolder of the Repo, you can add it yourself:
import os, sys
# Make the Repo subfolder importable on the driver; <username> and the path are placeholders
sys.path.append(os.path.abspath('/Workspace/Repos/<username>/path/to/your/library'))
Note that this works on the driver node (where the notebook runs), but not on the worker nodes, so code that executes on the workers (for example, inside UDFs) won't find the library this way.
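If the workers need the library too, one possible workaround is to snapshot the package source into a zip and ship it to the executors with SparkContext.addPyFile. A hedged sketch, assuming the Repo layout from above with the package directory directly inside the subfolder (all paths and the package name are placeholders; unlike autoreload, this is a one-off snapshot that you must re-run after each change you want the workers to pick up):
import shutil

# Zip the Repo subfolder so the package directory ends up at the top level of the archive
shutil.make_archive('/dbfs/tmp/my_library', 'zip',
                    root_dir='/Workspace/Repos/<username>/path/to/your/library')

# Distribute the snapshot to the executors so worker-side code can import it
spark.sparkContext.addPyFile('dbfs:/tmp/my_library.zip')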
P.S. You can see examples in this Databricks cookbook and in this repository.