Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Access Apache Beam metrics values during pipeline run in python?

I'm using the direct runner of Apache Beam Python SDK to execute a simple pipeline similar to the word count example. Since I'm processing a large file, I want to display metrics during the execution. I know how to report the metrics, but I can't find any way to access the metrics during the run.

I found the metrics() function in the PipelineResult, but it seems I only get a PipelineResult object from the Pipeline.run() function, which is a blocking call. In the Java SDK I found a MetricsSink, which can be configured on PipelineOptions, but I did not find an equivalent in the Python SDK.

How can I access live metrics during pipeline execution?

like image 429
aKzenT Avatar asked Sep 21 '25 07:09

aKzenT


1 Answers

The direct runner is generally used for testing, development, and small jobs, and Pipeline.run() was made blocking for simplicity. On other runners Pipeline.run() is asynchronous and the result can be used to monitor the pipeline progress during execution.

You could try running a local version of an OSS runner like Flink to get this behavior.

like image 176
robertwb Avatar answered Sep 22 '25 23:09

robertwb