
How can I develop with Python libraries in editable mode on Databricks?

On Databricks, it is possible to install Python packages directly from a git repo, or from DBFS:

%pip install git+https://github/myrepo
%pip install /dbfs/my-library-0.0.0-py3-none-any.whl 

Is there a way to enable a live package development mode, similar to pip install -e, so that the Databricks notebook references the library files as they are and picks up changes to them on the go?

E.g. something like

%pip install /dbfs/my-library/ -e

combined with a way to keep my-library up-to-date?

Thanks!

asked Oct 17 '25 21:10 by elke


1 Answer

I would recommend adopting the Databricks Repos functionality, which lets you import Python code into a notebook as a normal package and automatically reloads that code when the package source changes.

You need to add the following two lines to your notebook that uses the Python package that you're developing:

%load_ext autoreload
%autoreload 2
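As a rough illustration of what autoreload saves you from doing by hand, the sketch below edits a module on disk and picks up the change with importlib.reload. The module name my_library_demo and the temporary folder are hypothetical stand-ins for your library, not part of Databricks Repos.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical one-function library written to a temporary folder.
lib_dir = Path(tempfile.mkdtemp())
module_file = lib_dir / "my_library_demo.py"
module_file.write_text("def version():\n    return 'first'\n")

# Skip bytecode caching so the edited source is always recompiled.
sys.dont_write_bytecode = True
sys.path.append(str(lib_dir))
importlib.invalidate_caches()

import my_library_demo

before = my_library_demo.version()

# Simulate editing the library while the notebook session is running.
module_file.write_text("def version():\n    return 'second edit'\n")

# Without autoreload, the change is only visible after an explicit reload.
importlib.reload(my_library_demo)
after = my_library_demo.version()

print(before, "->", after)
```

With %autoreload 2 active, the explicit importlib.reload call should not be needed, since modules are reloaded before each cell runs.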

Your library is found because the root folder of each Databricks Repo is automatically added to sys.path. If your library lives in a subfolder of the Repo, you can add that folder yourself:

import os, sys
sys.path.append(os.path.abspath('/Workspace/Repos/<username>/path/to/your/library'))

This works on the driver node, where the notebook code runs, but not on the worker nodes.
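If the cell that extends sys.path is re-run often, a small guard avoids piling up duplicate entries. This is only a sketch: add_library_path is a hypothetical helper name, and it changes sys.path only in the Python process where it is called.

```python
import os
import sys


def add_library_path(path):
    """Append path to sys.path unless it is already there; return the absolute path."""
    abs_path = os.path.abspath(path)
    if abs_path not in sys.path:
        sys.path.append(abs_path)
    return abs_path


# Example call, using the same placeholder path as above:
# add_library_path('/Workspace/Repos/<username>/path/to/your/library')
```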

P.S. You can see examples in this Databricks cookbook and in this repository.

answered Oct 20 '25 11:10 by Alex Ott


