When I run the example WordCount job from the Dataflow docs with DataflowPipelineRunner, it launches workers and then just hangs in the Running state.
The last two status messages are:
Jan 29, 2016, 22:05:50
S02: (b959a12901787f4d): Executing operation ReadLines+WordCount.CountWords/ParDo(ExtractWords)+WordCount.CountWords/Count.PerElement/Init+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey+WordCount.CountWords/Count.PerElement/Count.PerKey/Combine.GroupedValues/Partial+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Reify+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Write
Jan 29, 2016, 22:06:42
(c3fc1276c0229a41): Workers have started successfully.
and that's it. When I click "Worker logs", it is completely empty. The job stays like this for at least 20 minutes.
It works fine with DirectPipelineRunner (it completes within seconds and creates the output file on my gs://... bucket).
What should I look at?
Command-line parameters:
--project=my-project
--stagingLocation=gs://my-project/dataflow/staging
A common cause of no logs showing up is that the Cloud Logging API hasn't been enabled. If any of the APIs listed in the getting started guide are not enabled, that could produce both symptoms you described (no logging and hanging workers).
Try walking through the getting started guide again and enabling all the relevant APIs.
If all the APIs are enabled, check your user authentication:
gcloud auth login
and
gcloud auth application-default login
Also, ensure you run those commands as a user who has Owner or Editor access on the project.
Otherwise, you can run your job under a service account, as below:
import os
# Path to the service-account JSON key downloaded from the Cloud Console.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'
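A minimal sketch of the service-account approach, assuming a key file at a hypothetical path; the environment variable must be set before any Google Cloud client (including the pipeline runner) is constructed, since Application Default Credentials are resolved at client-creation time:

```python
import os

# Placeholder path; substitute the JSON key downloaded from the Cloud Console
# for a service account that has Editor access on the project.
key_path = "/path/to/creds.json"

# Point Application Default Credentials at the key. Any Google Cloud client
# created after this line will authenticate as the service account.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path

print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```

Setting the variable in the shell (`export GOOGLE_APPLICATION_CREDENTIALS=...`) before launching the job has the same effect and avoids hard-coding the path in your program.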