Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get a dataflow job's step details using Java Beam SDK?

I'm using Java Beam SDK for my dataflow job, and com.google.api.services.dataflow.model.Job class gives details about a particular job. However, it doesn't provide any method/property to get dataflow step information such as Elements Added, Estimated Size etc.

enter image description here

Below is the code I'm using to get the job's information,

PipelineResult result = p.run();        
String jobId = ((DataflowPipelineJob) result).getJobId();
DataflowClient client = DataflowClient.create(options);
Job job = client.getJob(jobId);

I'm looking for something like,

job.getSteps("step name").getElementsAdded();
job.getSteps("step name").getEstimatedSize();

Thanks in advance.

like image 623
Vijin Paulraj Avatar asked Oct 29 '22 00:10

Vijin Paulraj


1 Answers

The SinkMetrics class provides a bytesWritten() method and a elementsWritten() method. In addition, the SourceMetrics class provides an elementsRead() and a bytesRead() method.

If you use the classes in the org.apache.beam.sdk.metrics package to query for these metrics and filter by step, you should be able to get the underlying metrics for the stop (i.e., elements read).

I would add that if you're willing to look outside of the Beam Java SDK, since you're running on Google Cloud Dataflow you can use the Google Dataflow API In particular, you can use projects.jobs.getMetrics to fetch the detailed metrics for the job including the number of elements written/read. You will need to do some parsing of the metrics as there are hundreds of metrics for even a simple job, but the underlying data you're looking for is present via this API call (I just tested).

like image 71
eb80 Avatar answered Nov 04 '22 13:11

eb80