
Execute egg directly from Azure Data Factory

Question

How can I execute an egg file from an Azure Data Factory (ADF) pipeline? Currently I can only call a Databricks notebook, which in turn executes the egg file. Is there any way to do that directly?

What has been done

Following this answer, I got the following exception:

{
    "errorCode": "3201",
    "message": "Must specify one jar or maven library for jar task, either via jar_uri or libraries.",
    "failureType": "UserError",
    "target": "Execute Egg",
    "details": []
}


Code and structure

On my local machine I can run python dist/hello_world-1.0-py2.7.egg, which prints 'Hello world!'.

src
 |-__init__.py
 |-main.py
__main__.py
setup.py

setup.py

from setuptools import setup, find_packages

setup(
    name='hello-world',
    version='1.0',
    packages=find_packages(),
    py_modules=['__main__']
)

__main__.py

from src.main import run

if __name__ == '__main__':
    run()

src/main.py

def run():
    print('Hello world!')


if __name__ == '__main__':
    run()

VB_ asked Nov 06 '22 09:11


1 Answer

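It seems you selected the Jar activity in Azure Data Factory instead of the Python activity. That would explain the error: a Jar task requires a JAR or Maven library, and an egg is neither.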

Databricks activities in Azure Data Factory

In the Jar activity, "Main class name" expects the fully qualified name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library.

If you select the Python activity, you can specify the Python file to run and attach your egg as a library.


You can find more details about it here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-python
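
As a rough illustration of the Python activity setup described above, here is a minimal sketch that builds the activity definition as a Python dict and prints it as JSON. The activity type DatabricksSparkPython and the pythonFile, parameters and libraries properties follow the linked documentation; the linked service name and both DBFS paths are placeholders I made up, so replace them with your own values.

```python
import json


def build_python_activity(name, python_file, egg_uri, parameters=None):
    """Return an ADF Databricks Python activity definition as a dict."""
    return {
        "name": name,
        "type": "DatabricksSparkPython",
        "linkedServiceName": {
            # Hypothetical linked service name; use the one defined in your factory.
            "referenceName": "AzureDatabricksLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Entry-point script stored on DBFS (a .py file, not the egg itself).
            "pythonFile": python_file,
            # Command-line arguments passed to the script, if any.
            "parameters": list(parameters or []),
            # The egg is attached as a library so the script can import from it.
            "libraries": [{"egg": egg_uri}],
        },
    }


if __name__ == "__main__":
    activity = build_python_activity(
        name="Execute Egg",
        python_file="dbfs:/example/run_hello.py",  # placeholder path
        egg_uri="dbfs:/example/hello_world-1.0-py2.7.egg",  # placeholder path
    )
    print(json.dumps(activity, indent=4))
```

The script referenced by pythonFile would presumably be a small driver that mirrors the question's __main__.py, that is, it imports run from src.main and calls it, since the activity takes a Python file as its entry point and the egg only as a library.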

Valdas M answered Nov 13 '22 18:11