
How to configure Google Cloud Composer cost-effectively

After some research and testing, we have decided to start using Google Cloud Composer. Since our current DAGs and tasks are relatively small and don't require the server to run continuously, I am looking at how to manage costs.

Two questions:

  1. The option to use preemptible VMs seems logical. This saves costs considerably, and I'm thinking of going for 3x n1-standard-4. I expect each task to be quite short, so I don't think preemption will have a significant impact on our workloads. Is it possible to use preemptible VMs with Composer?
  2. Schedule the Composer environment to turn on/off, as asked in this post. I can't find how to do this in the documentation, either by taking the whole environment down or by shutting down the workers as proposed in the answer.

Help, anyone?

dkapitan asked Nov 15 '18 12:11



1 Answer

This is an interesting question.

One roadblock you may encounter is the nature of Airflow itself. Generally, Airflow is not intended for ephemeral use. Instead, I'd suspect that the vast majority of Airflow use, Cloud Composer or otherwise, is persistent. Ephemerality brings cost benefits, but it also brings risks given Airflow's architecture. For example, what happens if the job that is supposed to restart your Airflow resources fails?

To answer your questions:

  1. Preemptible VMs (PVMs) are not supported in Composer. While PVMs have a ton of awesome benefits, they could leave tasks in a very weird state, especially if a worker gets preempted several times.
  2. There is no formal documentation for this process because it's generally informal and not recommended if you must depend on your environment. The basic approach, though, would be to:
    1. Create a very small GCE VM
    2. Set up the Cloud SDK (gcloud) to connect to your project
    3. Create a crontab that either does a fresh create/delete of an environment when you need it /or/ pauses the VMs in the Composer worker pool
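As a rough illustration of step 3, here is a minimal Python sketch of a script that the crontab on the small VM could call for the create/delete variant. It only builds the `gcloud composer environments create` / `delete` command and prints it unless `dry_run` is turned off. The environment name `my-environment`, the location `europe-west1`, and the helper names are placeholders I made up, and a real `create` call would likely need more flags (machine type, node count, and so on) for your setup.

```python
import subprocess

# Hypothetical placeholders: substitute your own environment name and region.
ENV_NAME = "my-environment"
LOCATION = "europe-west1"


def composer_command(action, env_name, location):
    """Build the gcloud command that creates or deletes a Composer environment."""
    if action not in ("create", "delete"):
        raise ValueError(f"unsupported action: {action}")
    cmd = ["gcloud", "composer", "environments", action, env_name,
           "--location", location]
    if action == "delete":
        # cron has no terminal, so skip the interactive confirmation prompt.
        cmd.append("--quiet")
    return cmd


def run(cmd, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


# Dry run: shows what the morning cron job would execute.
run(composer_command("create", ENV_NAME, LOCATION))
```

One crontab entry would then call this with `create` before the first DAG run is due and another with `delete` in the evening. Keep in mind that a freshly created environment starts without the previous environment's Airflow run history, so this only suits workloads that can tolerate that.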

In the long term, I think Composer will better support ephemeral use of worker resources. In the short term, another option is to run a lightweight Airflow environment on a small(ish) GCE VM and then suspend/resume that VM when you need to use Airflow. You don't get Composer that way, but you do benefit from the team's work improving and expanding GCP support in core Airflow.
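For the single-VM alternative, a sketch along these lines could run from cron on another machine (or from any scheduler) to decide whether the Airflow VM should be up. It uses `gcloud compute instances start` / `stop` as a simple stand-in for suspend/resume. The instance name `airflow-vm`, the zone `europe-west1-b`, and the weekday 07:00-19:00 window are all assumptions for illustration, not anything prescribed by GCP or Airflow.

```python
from datetime import datetime


def desired_action(now, start_hour=7, stop_hour=19):
    """Return 'start' during weekday working hours, otherwise 'stop'."""
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    in_hours = start_hour <= now.hour < stop_hour
    return "start" if is_weekday and in_hours else "stop"


def instance_command(action, instance, zone):
    """Build the gcloud command that starts or stops a GCE instance."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["gcloud", "compute", "instances", action, instance, "--zone", zone]


# Print the command that matches the current time; hand it to subprocess.run
# once you are happy with the schedule.
action = desired_action(datetime.now())
print(" ".join(instance_command(action, "airflow-vm", "europe-west1-b")))
```

Starting an already-running instance (or stopping a stopped one) should be harmless, so the script can run on every cron tick without tracking state itself.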

James answered Sep 28 '22 09:09