I'm using the direct runner of Apache Beam Python SDK to execute a simple pipeline similar to the word count example. Since I'm processing a large file, I want to display metrics during the execution. I know how to report the metrics, but I can't find any way to access the metrics during the run.
I found the metrics()
function in the PipelineResult
, but it seems I only get a PipelineResult
object from the Pipeline.run()
function, which is a blocking call. In the Java SDK I found a MetricsSink
, which can be configured on PipelineOptions
, but I did not find an equivalent in the Python SDK.
How can I access live metrics during pipeline execution?
The direct runner is generally used for testing, development, and small jobs, and Pipeline.run()
was made blocking for simplicity. On other runners Pipeline.run()
is asynchronous and the result can be used to monitor the pipeline progress during execution.
You could try running a local version of an OSS runner like Flink to get this behavior.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With