I would like to use this library for anomaly detection in Databricks: iForest. This library cannot be installed through PyPI.
How can I install libraries from GitHub in Databricks? I read about using something called an "egg" but I don't quite understand how it should be used.
You can clone the repo and build a Python package as explained in the project's README at https://github.com/titicaca/spark-iforest:
Step 2. Package pyspark-iforest and install it via pip, skip this step if you don't need the python pkg
cd spark-iforest/python
python setup.py sdist
pip install dist/pyspark-iforest-<version>.tar.gz
Here you only need the first two commands to build the package, but you have to change the second one so that it builds an egg instead of a source distribution:
python3 setup.py bdist_egg
Now you'll find the generated file in the dist folder (spark-iforest/python/dist):
pyspark_iforest-2.4.0-py3.7.egg
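The exact file name depends on the package version and on the Python version used for the build, so it may differ from the one shown above. As a minimal sketch, assuming the build ran in the current directory, the following lists whatever .egg files ended up in dist (find_eggs is a hypothetical helper name, not part of the library):

```python
from pathlib import Path


def find_eggs(dist_dir):
    """Return the .egg files in dist_dir, sorted by name (hypothetical helper)."""
    dist = Path(dist_dir)
    if not dist.is_dir():
        # No build output yet, e.g. setup.py has not been run here.
        return []
    return sorted(dist.glob("*.egg"))


if __name__ == "__main__":
    # Assumes the script is run from spark-iforest/python after the build.
    for egg in find_eggs("dist"):
        print(egg.name)
```

If nothing is printed, the build either failed or wrote its output somewhere else.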
Finally, on Databricks, select Create > Library and choose Python Egg to upload the generated file. More details can be found here.
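Once the library is attached to a cluster, you can check from a notebook cell whether Python can see it. This is only a sketch: the module name pyspark_iforest is an assumption taken from the egg file name above, and is_installed is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# "pyspark_iforest" is assumed from the egg file name; adjust if it differs.
print(is_installed("pyspark_iforest"))
```

If this prints False, check that the library is attached to the cluster the notebook is running on.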