How can I execute an egg file from an Azure Data Factory (ADF) pipeline? Currently I can only call a Databricks notebook from which the egg file is executed. Is there any way to do it directly?
Following this answer, I got the following exception:
{
    "errorCode": "3201",
    "message": "Must specify one jar or maven library for jar task, either via jar_uri or libraries.",
    "failureType": "UserError",
    "target": "Execute Egg",
    "details": []
}
On my local machine I can run python dist/hello_world-1.0-py2.7.egg, which prints 'Hello world!'.
Project structure:

src
|- __init__.py
|- main.py
__main__.py
setup.py
setup.py

from setuptools import setup, find_packages

setup(
    name='hello-world',
    version='1.0',
    packages=find_packages(),
    py_modules=['__main__']
)
__main__.py

from src.main import run

if __name__ == '__main__':
    run()
src/main.py

def run():
    print('Hello world!')

if __name__ == '__main__':
    run()
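As an aside, the reason the egg can be run directly on a local machine is that an egg is a zip archive, and Python will execute any zip archive that contains a top-level __main__.py (which py_modules=['__main__'] puts there). A minimal sketch of that mechanism, using illustrative file names rather than the actual bdist_egg output:

```python
import os
import subprocess
import sys
import tempfile
import zipfile

# Build a zip with the same layout as the egg in the question:
# a src package plus a top-level __main__.py.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "hello_world.egg")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("src/__init__.py", "")
        zf.writestr("src/main.py", "def run():\n    print('Hello world!')\n")
        zf.writestr(
            "__main__.py",
            "from src.main import run\n\nif __name__ == '__main__':\n    run()\n",
        )
    # The interpreter puts the archive on sys.path and runs its __main__.py,
    # so the in-archive `from src.main import run` resolves.
    out = subprocess.run([sys.executable, archive], capture_output=True, text=True)
    print(out.stdout.strip())  # Hello world!
```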
It seems you selected the Jar activity in Azure Data Factory instead of the Python activity.
In the Jar activity, the "Main class name" field expects the full name of the class containing the main method to be executed, and that class must be contained in a JAR attached as a library, hence the error about a jar or maven library.
If you select the Python activity instead, you can specify the Python file to run and attach your egg as a library.
You can find more details about it here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-python
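As a rough sketch, the corresponding activity definition in the pipeline JSON would look something like the following; the DBFS paths, script name, and linked service name are placeholders, and note that the egg is attached via the libraries array rather than executed directly:

```json
{
    "name": "Execute Egg",
    "type": "DatabricksSparkPython",
    "linkedServiceName": {
        "referenceName": "AzureDatabricks",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "pythonFile": "dbfs:/scripts/run_hello.py",
        "libraries": [
            { "egg": "dbfs:/libs/hello_world-1.0-py2.7.egg" }
        ]
    }
}
```

Here run_hello.py would be a small driver script that imports from the egg (for example, from src.main import run; run()), because the activity executes a .py file, not the egg itself.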