I have a locally created .egg package that depends on boto==2.38.0. I used setuptools to create the built distribution. Everything works in my local environment, since pip fetches boto correctly from PyPI. However, Databricks does not automatically fetch dependencies when I attach the library to the cluster.
I have struggled for a few days now to get the dependency installed automatically when the library is loaded on Databricks. I use setuptools, and install_requires=['boto==2.38.0'] is the relevant field.
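
For reference, a minimal setup.py along these lines reproduces the situation (the name my_package is a placeholder):

    from setuptools import setup, find_packages

    setup(
        name='my_package',  # placeholder package name
        version='0.1.0',
        packages=find_packages(),
        # pip honours this locally, but Databricks does not resolve it
        # when the built .egg is attached to a cluster
        install_requires=['boto==2.38.0'],
    )

The egg itself is built with python setup.py bdist_egg, which writes the distribution into dist/.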
When I install boto directly from PyPI on the Databricks cluster (so not relying on the install_requires field at all) and then call my own .egg, it does recognize that boto is a package, but it does not recognize any of boto's modules (perhaps because they are not imported into my .egg's namespace?). So I cannot get my .egg to work. If this problem has no solution, I'd think that is a really big problem for Databricks users right now. There should be a solution, of course...
Thank you!
In general, your application's dependencies will not work properly on Databricks unless they have uniform Python 2/3 support. The Databricks docs explain that:

    Databricks will install the correct version if the library supports both Python 2 and 3. If the library does not support Python 3 then library attachment will fail with an error.

In either case, Databricks will not automatically fetch dependencies when you attach a library to the cluster.
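
As a workaround, you can install the pinned dependency on the cluster yourself instead of relying on install_requires. A sketch, assuming a Databricks runtime that provides the dbutils.library utilities:

    # Run in a notebook cell before importing your .egg:
    # install the exact dependency version, then restart the Python
    # process so the newly installed package becomes importable.
    dbutils.library.installPyPI("boto", version="2.38.0")
    dbutils.library.restartPython()

Alternatively, add boto==2.38.0 as a separate PyPI library on the cluster through the Databricks library UI, so the dependency is present regardless of what the .egg declares.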