I run Airflow in a managed Cloud Composer environment (version 1.9.0), which runs on a Kubernetes 1.10.9-gke.5 cluster.
All my DAGs run daily at 3:00 AM or 4:00 AM, but some mornings I see that a few tasks failed during the night without any apparent reason.
When I check the logs in the UI, there is nothing, and there is no log either in the log folder of the GCS bucket.
In the task instance details, it reads "Dependencies Blocking Task From Getting Scheduled", but the listed dependency is the dagrun itself.
Although the DAG is configured with 5 retries and an email on failure, it does not look as if any retry took place, and I haven't received any email about the failure.
I usually just clear the task instance and it then runs successfully on the first try.
Has anyone encountered a similar problem?
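For context, the retry/email setup is along these lines. This is only a minimal sketch, assuming Airflow 1.x-style imports; the DAG id, schedule, and email address are placeholders, not the real DAG:

```python
# Sketch of the retry/email configuration described above.
# DAG id, schedule, and email address are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    "retries": 5,                       # 5 retries, as mentioned above
    "retry_delay": timedelta(minutes=5),
    "email": ["alerts@example.com"],    # placeholder address
    "email_on_failure": True,           # should send a mail when a task fails
    "email_on_retry": False,
}

with DAG(
    dag_id="example_nightly_dag",       # placeholder DAG id
    default_args=default_args,
    schedule_interval="0 3 * * *",      # daily at 3:00 AM
    catchup=False,
) as dag:
    DummyOperator(task_id="placeholder_task")
```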
You can also view the logs in the Airflow web interface. Streaming logs are a superset of the logs in Airflow; to access them, go to the Logs tab of the Environment details page in the Google Cloud console, use Cloud Logging, or use Cloud Monitoring. Logging and Monitoring quotas apply.
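As a hedged illustration, here is one way to query the airflow-worker streaming logs with the Cloud Logging Python client. It assumes the google-cloud-logging library is installed; the project id, environment name, and filter details are placeholders to adapt to your setup:

```python
# Sketch: query Composer's airflow-worker streaming logs via Cloud Logging.
# Replace the placeholder project and environment names with real values.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="my-gcp-project")  # placeholder project

log_filter = (
    'resource.type="cloud_composer_environment" '
    'AND resource.labels.environment_name="my-composer-env" '  # placeholder
    'AND logName:"airflow-worker"'
)

# Print timestamp and payload of the most relevant matching entries.
for entry in client.list_entries(filter_=log_filter, page_size=50):
    print(entry.timestamp, entry.payload)
```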
The service logs are available at /media/ephemeral0/logs/airflow inside the cluster node. Since Airflow runs on a single-node machine, the logs are accessible on that same node. These logs are helpful for troubleshooting cluster bring-up and scheduling issues.
Run airflow dags list with the Airflow CLI to make sure that Airflow has registered the DAG in the metastore. If the DAG appears in the list, try restarting the webserver. Then try restarting the scheduler (if you are using the Astro CLI, run astro dev stop && astro dev start).
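If you prefer to check from Python rather than the CLI, a quick DagBag inspection shows both the parsed DAGs and any import errors that would keep a DAG out of the list. This is a sketch and assumes it runs somewhere that can see the Airflow config and DAGs folder (e.g. on a worker or scheduler):

```python
# Sketch: list parsed DAG ids and surface import errors.
from airflow.models import DagBag

dag_bag = DagBag()  # parses the configured dags folder

print("Parsed DAG ids:", sorted(dag_bag.dags.keys()))

# Any file that fails to import shows up here with its traceback.
for filepath, error in dag_bag.import_errors.items():
    print("Import error in", filepath)
    print(error)
```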
Empty logs often mean that the Airflow worker pod was evicted (i.e., it died before it could flush its logs to GCS), which is usually due to an out-of-memory condition. If you go to your GKE cluster (the one under Composer's hood) you will probably see that there is indeed an evicted pod (GKE > Workloads > "airflow-worker").
You will probably also see in "Task Instances" that the affected tasks have no Start Date, no Job Id, and no worker (Hostname) assigned, which, together with the missing logs, is evidence that the pod died.
Since this normally happens in highly parallelised DAGs, a way to avoid it is to reduce the worker concurrency or use a machine with more memory.
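As one concrete (hedged) way to dial back parallelism without touching the Celery worker_concurrency setting, you can cap what a single DAG is allowed to run at once; the DAG id and numbers below are placeholders to tune:

```python
# Sketch: limit per-DAG parallelism so workers are less likely to be
# overloaded and OOM-killed. Values are placeholders.
from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id="example_nightly_dag",   # placeholder DAG id
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 3 * * *",
    concurrency=4,        # max task instances of this DAG running at once
    max_active_runs=1,    # only one DAG run at a time
    catchup=False,
)
```

In Cloud Composer, lowering the Celery worker_concurrency value through the environment's Airflow configuration overrides (if your Composer version allows overriding it) has a similar effect cluster-wide.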
EDIT: I filed this Feature Request on your behalf to get emails in case of failure, even if the pod was evicted.