 

Google Dataflow hangs with no logs

When I run the example WordCount job from the Dataflow docs with *DataflowPipelineRunner, it launches workers and then just hangs in the Running state.

Last two status messages:

Jan 29, 2016, 22:05:50
S02: (b959a12901787f4d): Executing operation ReadLines+WordCount.CountWords/ParDo(ExtractWords)+WordCount.CountWords/Count.PerElement/Init+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey+WordCount.CountWords/Count.PerElement/Count.PerKey/Combine.GroupedValues/Partial+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Reify+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Write

Jan 29, 2016, 22:06:42
(c3fc1276c0229a41): Workers have started successfully.

and that's it. When I click "Worker logs", it's completely empty. It stays like this for at least 20 minutes.

It works fine with DirectPipelineRunner (it completes within seconds and creates the output file in my gs://... bucket).

What should I look at?

Command-line parameters:

--project=my-project
--stagingLocation=gs://my-project/dataflow/staging
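
For reference, the full launch command looks roughly like the one from the docs. The main class and runner flag below are my assumption of the usual Java SDK 1.x invocation, since I only pasted the two parameters above:

mvn compile exec:java \
  -Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
  -Dexec.args="--project=my-project \
  --stagingLocation=gs://my-project/dataflow/staging \
  --runner=BlockingDataflowPipelineRunner"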
asked by Dzmitry Lazerka


2 Answers

A common cause of no logs showing up is that the Cloud Logging API hasn't been enabled. If not all of the APIs listed in the getting started guide are enabled, that could lead to both of the problems you described (no logging and hanging workers).

Try walking through the getting started guide again and enabling all the relevant APIs.
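
If you prefer the command line, here is a minimal sketch with a recent gcloud CLI; the exact service list is an assumption about what a Dataflow job typically needs:

# see which APIs are already enabled for the project
gcloud services list --enabled
# enable the ones Dataflow relies on
gcloud services enable dataflow.googleapis.com compute.googleapis.com logging.googleapis.com storage-api.googleapis.com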

answered by Ben Chambers


If all the APIs are enabled, next check your user authentication:

gcloud auth login

and

gcloud auth application-default login

Also, ensure you run those commands as a user that has project owner or editor access.
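
A quick sanity check for both (my-project is the placeholder from the question):

# confirm which account is currently active
gcloud auth list
# inspect the roles granted on the project
gcloud projects get-iam-policy my-project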

Otherwise, you can point your job at a service account key file, for example:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'
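
That snippet is Python; the same credentials work for any SDK if you export the variable in your shell before launching the job (a sketch, with <creds.json> standing in for the path to your key file):

export GOOGLE_APPLICATION_CREDENTIALS=<creds.json>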

answered by deepak