I'm using Java Beam SDK for my dataflow job, and com.google.api.services.dataflow.model.Job
class gives details about a particular job. However, it doesn't provide any method/property to get dataflow step information such as Elements Added, Estimated Size etc.
Below is the code I'm using to get the job's information,
PipelineResult result = p.run();
String jobId = ((DataflowPipelineJob) result).getJobId();
DataflowClient client = DataflowClient.create(options);
Job job = client.getJob(jobId);
I'm looking for something like,
job.getSteps("step name").getElementsAdded();
job.getSteps("step name").getEstimatedSize();
Thanks in advance.
The SinkMetrics
class provides a bytesWritten()
method and a elementsWritten()
method. In addition, the SourceMetrics
class provides an elementsRead()
and a bytesRead()
method.
If you use the classes in the org.apache.beam.sdk.metrics
package to query for these metrics and filter by step, you should be able to get the underlying metrics for the stop (i.e., elements read).
I would add that if you're willing to look outside of the Beam Java SDK, since you're running on Google Cloud Dataflow you can use the Google Dataflow API
In particular, you can use projects.jobs.getMetrics
to fetch the detailed metrics for the job including the number of elements written/read. You will need to do some parsing of the metrics as there are hundreds of metrics for even a simple job, but the underlying data you're looking for is present via this API call (I just tested).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With