
Multiple BashOperator in Airflow doesn't recognize the current folder

Tags:

airflow

I am evaluating Airflow to see whether it can handle my data ingestion. The original ingestion is done in two steps in the shell:

  1. cd ~/bm3
  2. ./bm3.py runjob -p projectid -j jobid

In Airflow, I have two tasks with BashOperator:

task1 = BashOperator(
    task_id='switch2BMhome',
    bash_command="cd /home/pchoix/bm3",
    dag=dag)

task2 = BashOperator(
    task_id='kickoff_bm3',
    bash_command="./bm3.py runjob -p client1 -j ingestion",
    dag=dag)

task1 >> task2

task1 completed as expected; its log is below:

[2019-03-01 16:50:17,638] {bash_operator.py:100} INFO - Temporary script location: /tmp/airflowtmpkla8w_xd/switch2ALhomeelbcfbxb
[2019-03-01 16:50:17,638] {bash_operator.py:110} INFO - Running command: cd /home/rxie/al2

task2 failed for the reason shown in its log:

[2019-03-01 16:51:19,896] {bash_operator.py:100} INFO - Temporary script location: /tmp/airflowtmp328cvywu/kickoff_al2710f17lm
[2019-03-01 16:51:19,896] {bash_operator.py:110} INFO - Running command: ./bm32.py runjob -p client1 -j ingestion
[2019-03-01 16:51:19,902] {bash_operator.py:119} INFO - Output:
[2019-03-01 16:51:19,903] {bash_operator.py:123} INFO - /tmp/airflowtmp328cvywu/kickoff_al2710f17lm: line 1: ./bm3.py: No such file or directory

So it seems every task is executed from a seemingly unique temp folder, which caused the second task to fail.

How can I run the bash command from a specific location?

Any thoughts you can share here are highly appreciated.

Thank you very much.

UPDATE: Thanks for the suggestion, which almost works.

Using bash_command="cd /home/pchoix/bm3 && ./bm3.py runjob -p client1 -j ingestion" works at first. However, runjob contains multiple tasks: the first task works, but the second invokes impala-shell.py, which specifies python2 as its interpreter, while everything outside it uses python 3.

This is fine when I run the same bash_command in a shell. In Airflow, for an unknown reason, the problem appears even though I set the correct PATH and verified it in the shell:

(base) (venv) [pchoix@hadoop02 ~]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)

The task is still executed with python 3, as seen in the log:

[2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -   File "/data/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/bin/../lib/impala-shell/impala_shell.py", line 220
[2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -     print '\tNo options available.'
[2019-03-01 21:42:08,040] {bash_operator.py:123} INFO -                                   ^
[2019-03-01 21:42:08,040] {bash_operator.py:123} INFO - SyntaxError: Missing parentheses in call to 'print'

Note that this issue doesn't exist when I run the job in a shell environment:

./bm3.py runjob -p client1 -j ingestion
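
The thread does not establish why python 3 is picked up under Airflow. One possible explanation, stated here only as an assumption, is that the process launching the command has a different PATH from the interactive shell, so the same command name resolves to a different executable. The sketch below illustrates that mechanism outside Airflow, using a made-up tool name (`py_demo_interp`) and a temporary directory standing in for a real bin folder:

```python
import os
import subprocess
import tempfile


def resolve(tool, env):
    """Return the path bash resolves `tool` to under the given environment ('' if none)."""
    result = subprocess.run(
        ["bash", "-c", f"command -v {tool}"],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stdout.strip()


# Hypothetical executable in a temporary directory (not part of the original setup).
bin_dir = os.path.realpath(tempfile.mkdtemp())
fake_tool = os.path.join(bin_dir, "py_demo_interp")
with open(fake_tool, "w") as handle:
    handle.write("#!/bin/sh\necho demo\n")
os.chmod(fake_tool, 0o755)

# Environment as inherited from the parent process.
inherited_env = dict(os.environ)
# Same environment, but with the temporary directory placed first on PATH.
adjusted_env = dict(os.environ, PATH=bin_dir + os.pathsep + os.environ.get("PATH", ""))

resolved_inherited = resolve("py_demo_interp", inherited_env)
resolved_adjusted = resolve("py_demo_interp", adjusted_env)

print("inherited PATH:", resolved_inherited or "<not found>")
print("adjusted PATH: ", resolved_adjusted)
```

If this is what happens here, printing `which python` and `echo $PATH` from inside the bash_command itself would show what the task actually sees, which may differ from what the login shell reports.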
mdivk Avatar asked Jun 09 '26 13:06



1 Answer

How about:

task = BashOperator(
    task_id='switch2BMhome',
    bash_command="cd /home/pchoix/bm3 && ./bm3.py runjob -p client1 -j ingestion",
    dag=dag)
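
A rough illustration of why chaining with && helps, assuming each bash_command is launched as its own bash process (which is what the separate temp script locations in the logs suggest): a cd in one process does not carry over to the next. The sketch uses plain subprocess rather than Airflow, and the temporary directories are stand-ins for /home/pchoix/bm3 and the task's temp folder:

```python
import os
import shlex
import subprocess
import tempfile


def run(cmd, cwd):
    """Run cmd in a fresh bash process started in cwd and return its stripped stdout."""
    result = subprocess.run(
        ["bash", "-c", cmd],
        capture_output=True,
        text=True,
        check=True,
        cwd=cwd,
    )
    return result.stdout.strip()


# Stand-ins for the job folder and for the per-task temp folder.
job_dir = os.path.realpath(tempfile.mkdtemp())
start_dir = os.path.realpath(tempfile.mkdtemp())

# Two separate processes: the cd in the first one is lost when it exits.
run(f"cd {shlex.quote(job_dir)}", cwd=start_dir)
separate_result = run("pwd -P", cwd=start_dir)

# One process: the cd is still in effect when the second command runs.
chained_result = run(f"cd {shlex.quote(job_dir)} && pwd -P", cwd=start_dir)

print("separate processes:", separate_result)
print("chained with &&:   ", chained_result)
```

The first pwd still reports the starting folder, while the chained version reports the job folder, which mirrors the difference between the two-task DAG and the single combined task.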
santon Avatar answered Jun 12 '26 12:06



