If we maintain our code/scripts in a GitHub repository, is there any way to copy those scripts from the repository and execute them on another cluster (which could be Hadoop or Spark)?
Does Airflow provide any operator that connects to GitHub to fetch such files?
Keeping the scripts in GitHub would give us more flexibility, since every change to the code would be picked up and used directly from there.
Any ideas on this scenario would really help.
You can use GitPython inside a PythonOperator task to run the pull on a specified schedule.
import git

# git_dir: path to an existing local clone of the repository
g = git.cmd.Git(git_dir)
g.pull()  # runs `git pull` in that directory
Don't forget to add the relevant keys so that the Airflow workers have permission to pull from the repository.
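To show how the pull could be wired into a scheduled task, here is a minimal sketch, assuming Airflow 2.x and GitPython are installed on the workers. The clone path, the DAG name, and the helper names pull_scripts and build_dag are hypothetical, as is the git_factory parameter, which is only there so the function can be exercised without a real repository.

```python
from datetime import datetime


def pull_scripts(repo_dir, git_factory=None):
    """Run `git pull` in an existing local clone and return git's output."""
    if git_factory is None:
        import git  # GitPython, imported lazily so it is only needed at run time
        git_factory = git.cmd.Git
    return git_factory(repo_dir).pull()


def build_dag(repo_dir="/opt/airflow/scripts"):  # hypothetical clone path
    """Build a DAG with a single task that refreshes the clone once a day."""
    # Import paths and `schedule_interval` assume Airflow 2.x.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dag = DAG(
        dag_id="pull_scripts_from_github",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )
    PythonOperator(
        task_id="git_pull",
        python_callable=pull_scripts,
        op_kwargs={"repo_dir": repo_dir},
        dag=dag,
    )
    return dag
```

In a DAG file you would assign the result at module level (for example, dag = build_dag()) so that the scheduler can discover it. The repository must already be cloned at repo_dir on every worker that may run the task.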