I am new to Airflow. I have some .jar jobs generated with Talend Open Studio for Big Data, and I want to schedule and manage them with Airflow. My question is: does Airflow support .jar files (or jobs generated by TOS) as DAGs? If it does, how? Or is there an alternative way to run a .jar on Airflow?
I'm using Airflow v1.10.3. The jobs mainly extract and process data from a MongoDB database, then update the database with the newly processed data.
Thanks!
In Airflow, a DAG (Directed Acyclic Graph) is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (its tasks and their dependencies) as code.
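To make "directed acyclic" concrete: the scheduler only runs a task once all of its upstream tasks have finished, which requires the dependencies to contain no cycles. The sketch below is plain Python using the standard-library graphlib module (Python 3.9+), not Airflow's API; the task names extract, process and update are hypothetical and only mirror the MongoDB job described in the question.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its set lists the upstream tasks it depends on.
# Hypothetical names mirroring an extract -> process -> update MongoDB job.
dependencies = {
    "extract": set(),
    "process": {"extract"},
    "update": {"process"},
}

# static_order() yields every task after all of its upstream tasks.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'process', 'update']

# A cycle (a depends on b, b depends on a) is not a valid DAG and is rejected.
cycle_rejected = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    cycle_rejected = True
print(cycle_rejected)  # True
```

In an actual Airflow DAG file the same chain is declared on operator objects, for example extract >> process >> update, and Airflow derives the run order from that.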
Airflow does support running jar files. You do this through the BashOperator.
Quick example:
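from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x import path

default_args = {
    'owner': 'you',
    'start_date': datetime(2019, 4, 24),
}

# A DAG is identified by dag_id (task_id belongs to operators, not to the DAG)
dag = DAG(
    dag_id='run_jar',
    default_args=default_args,
    schedule_interval=None,  # no schedule: manually triggered
)

# 'java -jar' runs an executable jar (one whose manifest declares Main-Class).
# For a plain jar, name the class instead:
#   java -cp /path/to/your/jar.jar com.example.MainClass param1 param2
run_jar_task = BashOperator(
    task_id='run_jar',
    bash_command='java -jar /path/to/your/jar.jar param1 param2',
    dag=dag,
)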
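The bash_command is just a string handed to bash, so jar paths or parameters containing spaces or shell metacharacters need quoting. A minimal sketch, assuming a hypothetical helper build_java_command of your own (it is not part of Airflow), that builds either a java -jar command or a java -cp ... MainClass command and quotes every piece with shlex.quote:

```python
import shlex
from typing import Optional, Sequence


def build_java_command(jar_path: str, args: Sequence[str] = (),
                       main_class: Optional[str] = None) -> str:
    """Return a shell command string usable as BashOperator's bash_command.

    Hypothetical helper: with main_class=None the jar is run with `java -jar`
    (its manifest must declare Main-Class); otherwise the jar is put on the
    classpath and main_class is launched explicitly.
    """
    if main_class is None:
        parts = ["java", "-jar", jar_path]
    else:
        parts = ["java", "-cp", jar_path, main_class]
    parts.extend(args)
    # Quote each piece so spaces or shell metacharacters reach Java unchanged.
    return " ".join(shlex.quote(p) for p in parts)


print(build_java_command("/path/to/your/jar.jar", ["param1", "param2"]))
# java -jar /path/to/your/jar.jar param1 param2
print(build_java_command("/opt/jobs/my job.jar", ["a b"], main_class="com.example.Main"))
# java -cp '/opt/jobs/my job.jar' com.example.Main 'a b'
```

The returned string would then be passed as bash_command=... to the BashOperator shown above.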
Airflow will happily run .jar files. There are a few examples around for you to have a look at:
Running a standard .jar file: run_jar.py
Running a "built" Talend job: loan_application_data.py
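A job exported with Talend's Build Job usually comes as a folder containing a launcher script (typically named <JobName>_run.sh) plus the jars it needs, so one straightforward route is to call that script from a BashOperator. A sketch under assumptions: that layout, a launcher that forwards its arguments to the job, and a job that takes its settings from Talend context variables via --context=... and --context_param name=value. Check the generated script and your job's contexts; build_talend_command and the example path are hypothetical.

```python
import shlex
from typing import Mapping, Optional


def build_talend_command(run_script: str, context: str = "Default",
                         params: Optional[Mapping[str, str]] = None) -> str:
    """Return a bash_command string that launches a built Talend job.

    Hypothetical helper; assumes the exported launcher script forwards its
    arguments to the job, which accepts --context and --context_param.
    """
    # Run the script through bash so it does not need the executable bit.
    parts = ["bash", run_script, f"--context={context}"]
    for name, value in (params or {}).items():
        parts.extend(["--context_param", f"{name}={value}"])
    return " ".join(shlex.quote(p) for p in parts)


print(build_talend_command("/opt/talend/MyJob/MyJob_run.sh",
                           params={"mongo_host": "localhost"}))
# bash /opt/talend/MyJob/MyJob_run.sh --context=Default --context_param mongo_host=localhost
```

If you ever pass a bare script path with no arguments, end bash_command with a space (e.g. '/opt/talend/MyJob/MyJob_run.sh '): Airflow treats a bash_command ending in .sh as a Jinja template file to load rather than a command to run.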
Obviously, with both of the examples above (run_jar.py and loan_application_data.py), the .jar or Talend file(s) need to be on the server Airflow is executing on, along with a Java installation.
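Because a missing java binary or a wrong jar path only shows up as a failed task at run time, it can help to check both up front. A minimal sketch; missing_prerequisites is a hypothetical helper, meant to be run on the machine that actually executes the task (with the CeleryExecutor that is each worker, not the webserver).

```python
import os
import shutil
from typing import List


def missing_prerequisites(jar_path: str) -> List[str]:
    """List what is missing to run jar_path on this machine (empty list = OK).

    Hypothetical helper: checks that a `java` executable is on PATH and that
    the jar file exists at the given path.
    """
    problems = []
    if shutil.which("java") is None:
        problems.append("java executable not found on PATH")
    if not os.path.isfile(jar_path):
        problems.append(f"jar not found: {jar_path}")
    return problems


for problem in missing_prerequisites("/path/to/your/jar.jar"):
    print(problem)
```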