Is there any way I can use preemptible instances for Dataflow jobs?

It's evident that preemptible instances are cheaper than non-preemptible instances. In my organization's project, 400-500 Dataflow jobs run daily; some are time-sensitive and others are not. Is there any way I could use preemptible instances for the non-time-sensitive jobs, so that the overall pipeline execution costs less? Currently I'm running Dataflow jobs with the configuration specified below.

        options.setTempLocation("gs://temp/");               // GCS path for temporary files
        options.setRunner(DataflowRunner.class);              // run on Cloud Dataflow
        options.setTemplateLocation("gs://temp-location/");   // where the template is staged
        options.setWorkerMachineType("n1-standard-4");        // worker VM type
        options.setMaxNumWorkers(20);                         // autoscaling upper bound
        options.setWorkerCacheMb(2000);                       // per-worker cache size in MB

I'm not able to find any pipeline option for a preemptible instance setting.

asked Feb 09 '20 by miles212

People also ask

When can preemptible instances be used?

Preemptible instances behave the same as regular compute instances, but the capacity is reclaimed when it's needed elsewhere, and the instances are terminated. If your workloads are fault-tolerant and can withstand interruptions, then preemptible instances can reduce your costs.

Why would you use preemptible VMs?

Preemptible instances use excess Compute Engine capacity, so their availability varies with usage. If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly.

What is the maximum life of a preemptible VM?

Preemptible VMs always stop after 24 hours. Preemptible VMs are recommended only for fault-tolerant applications that can withstand VM preemption.

Is it possible to share data across pipeline instances?

There is no Dataflow-specific cross-pipeline communication mechanism for sharing data or processing context between pipelines. You can use durable storage like Cloud Storage or an in-memory cache like App Engine to share data between pipeline instances.
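As a minimal sketch of the Cloud Storage approach with the Beam Java SDK (the gs://my-bucket paths are hypothetical), one pipeline writes its results to a shared location that a second pipeline reads later:

        import org.apache.beam.sdk.Pipeline;
        import org.apache.beam.sdk.io.TextIO;
        import org.apache.beam.sdk.options.PipelineOptionsFactory;

        // Pipeline A writes its results to a shared Cloud Storage location.
        Pipeline producer = Pipeline.create(PipelineOptionsFactory.create());
        producer.apply(TextIO.read().from("gs://my-bucket/input/*"))
                .apply(TextIO.write().to("gs://my-bucket/shared/results"));
        producer.run().waitUntilFinish();

        // Pipeline B, started after A has finished, reads the same location.
        Pipeline consumer = Pipeline.create(PipelineOptionsFactory.create());
        consumer.apply(TextIO.read().from("gs://my-bucket/shared/results*"));
        consumer.run().waitUntilFinish();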


1 Answer

Yes, it is possible to do so with Flexible Resource Scheduling in Cloud Dataflow (docs). Note that there are some things to consider:

  • Delayed execution: jobs are scheduled rather than executed right away (you'll see a new QUEUED status for your Dataflow jobs). They are run opportunistically when resources become available within a six-hour window, which makes FlexRS suitable for reducing the cost of non-time-critical workloads. Also, be sure to validate your code before submitting the job.
  • Batch jobs: as of now FlexRS only accepts batch jobs and requires autoscaling to be enabled:

    You cannot set autoscalingAlgorithm=NONE

  • Dataflow Shuffle: it needs to be enabled. With Shuffle on, no data is stored on persistent disks attached to the VMs, so when a preemption happens and resources are reclaimed there is no need to redistribute the data.
  • Regions: following from the previous point, only regions where Dataflow Shuffle is supported can be selected (list here; turn-up of new regions is announced in the release notes). As of now, the zone is chosen automatically within the region.
  • Machine types: FlexRS currently supports n1-standard-2 (default) and n1-highmem-16.
  • SDK: requires 2.12.0 or newer for Java or Python.
  • Quota: quota is reserved upfront (i.e. queued jobs also consume quota).

To run such a job, use --flexRSGoal=COST_OPTIMIZED and make sure the rest of the parameters conform to the FlexRS requirements above.
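As a minimal sketch of how the configuration from the question could request FlexRS programmatically (assuming Beam 2.12+, where DataflowPipelineOptions exposes setFlexRSGoal and the FlexResourceSchedulingGoal enum), note the worker machine type is switched to one FlexRS supports:

        import org.apache.beam.runners.dataflow.DataflowRunner;
        import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
        import org.apache.beam.sdk.options.PipelineOptionsFactory;

        DataflowPipelineOptions options =
                PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setTempLocation("gs://temp/");
        // Request Flexible Resource Scheduling: the job is submitted as QUEUED
        // and started opportunistically within a six-hour window.
        options.setFlexRSGoal(
                DataflowPipelineOptions.FlexResourceSchedulingGoal.COST_OPTIMIZED);
        // FlexRS currently supports only n1-standard-2 (default) and n1-highmem-16.
        options.setWorkerMachineType("n1-standard-2");
        options.setMaxNumWorkers(20); // autoscaling must stay enabled

For the time-sensitive jobs, simply leave the FlexRS goal unset and keep the current configuration.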

A uniform discount rate is applied to FlexRS jobs; you can compare pricing details in the Dataflow pricing documentation.

Note that you might see a Beta disclaimer in the non-English documentation but, as clarified in the release notes, it's Generally Available.

answered Sep 26 '22 by Guillem Xercavins