After some research and testing, we have decided to start using Google Cloud Composer. Since our current DAGs and tasks are relatively small and don't require the server to run continuously, I am looking at how to manage costs.
Two questions:
Preemptible VMs seem logical. This saves costs considerably, and I'm thinking of going for 3x n1-standard-4. I expect each task to be quite short, so I don't think this will have a significant impact on our workloads. Is it possible to use preemptible VMs with Composer?
Help, anyone?
Pricing for Cloud Composer is consumption based, so you pay for what you use, as measured by vCPU/hour, GB/month, and GB transferred/month. We have multiple pricing units because Cloud Composer uses several Google Cloud products as building blocks.
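To make the consumption-based model concrete, here is a minimal sketch of how the three billed quantities combine into a monthly estimate. The `Rates` class, the function name, and every unit price below are hypothetical placeholders for illustration, not actual Cloud Composer prices; look up the current rates for your region before relying on any figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices (placeholders, NOT real Cloud Composer rates)."""
    per_vcpu_hour: float
    per_gb_month: float
    per_gb_transferred: float


def estimate_monthly_cost(vcpu_hours: float, storage_gb: float,
                          transferred_gb: float, rates: Rates) -> float:
    """Sum the three consumption-based components for one month."""
    return (vcpu_hours * rates.per_vcpu_hour
            + storage_gb * rates.per_gb_month
            + transferred_gb * rates.per_gb_transferred)


if __name__ == "__main__":
    # Placeholder rates chosen only to show the arithmetic.
    rates = Rates(per_vcpu_hour=0.10, per_gb_month=0.20, per_gb_transferred=0.15)
    # Compare an always-on month (~730 hours) with 40 hours of actual use.
    for label, hours in (("always on", 730), ("40 hours", 40)):
        cost = estimate_monthly_cost(vcpu_hours=2 * hours, storage_gb=20,
                                     transferred_gb=10, rates=rates)
        print(f"{label}: {cost:.2f}")
```

Because the vCPU component scales with running hours, it is the part that shrinks when an environment does not run continuously; the storage component is billed regardless.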
Before you begin. Enable the Cloud Composer API. The approximate time to create an environment is 25 minutes.
Cloud Composer is a cross-platform orchestration tool that supports AWS, Azure, and GCP (and more), with management, scheduling, and processing capabilities. Cloud Dataflow handles tasks, whereas Cloud Composer manages entire processes, coordinating tasks that may involve BigQuery, Dataflow, Dataproc, Storage, on-premises systems, etc.
Cloud Composer is built on the popular Apache Airflow open source project and operates using the Python programming language. By using Cloud Composer instead of a local instance of Apache Airflow, you can benefit from the best of Airflow with no installation or management overhead.
This is an interesting question.
One roadblock you may encounter is the nature of Airflow itself. Generally, Airflow is not intended for ephemeral use. Instead, I'd suspect that the vast majority of Airflow use, Cloud Composer or otherwise, is persistent. Ephemerality brings cost benefits, but it also brings risks given Airflow's architecture. For example, what happens if the scheduler that restarts your Airflow resources fails?
To answer your questions:
gcloud) to connect to your project
In the long term, I think Composer will better support ephemeral use of worker resources. In the short term, another option is to run a lightweight Airflow environment on a small(ish) GCE VM and then suspend/resume that VM when you need to use Airflow. You don't get Composer that way, but you do benefit from the team's work improving and expanding GCP support in core Airflow.
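As a rough sketch of the short-term option, the snippet below wraps the `gcloud compute instances stop` and `gcloud compute instances start` commands so the VM only runs when Airflow is needed. It uses stop/start as a stand-in for "suspend/resume", which is my assumption about what fits here. The instance name `airflow-vm`, the zone `us-central1-a`, and both function names are hypothetical, and the script assumes `gcloud` is installed and authenticated against your project.

```python
import subprocess


def build_instance_command(action: str, instance: str, zone: str) -> list[str]:
    """Build the gcloud command that starts or stops a Compute Engine VM."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["gcloud", "compute", "instances", action, instance, f"--zone={zone}"]


def run_instance_command(action: str, instance: str, zone: str) -> None:
    """Run the command; raises CalledProcessError if gcloud exits non-zero."""
    subprocess.run(build_instance_command(action, instance, zone), check=True)


if __name__ == "__main__":
    # Hypothetical instance name and zone; replace with your own.
    run_instance_command("start", "airflow-vm", "us-central1-a")
    # ... let the Airflow scheduler run your DAGs, then shut the VM down:
    run_instance_command("stop", "airflow-vm", "us-central1-a")
```

A stopped VM still incurs charges for its persistent disk, and something outside the VM (a cron job elsewhere, Cloud Scheduler, or a person) has to issue the start command, which is the same "what restarts the scheduler" risk mentioned above.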