
How to extract contents from PCollection in Cloud Dataflow?

I want to know how to extract a value from a PCollection. Say I have applied Count.Globally, so the resulting PCollection contains a single number. How can I extract it as a Long value?

Thanks.

darkjh asked Oct 23 '25 17:10


1 Answer

It depends on how you want to use that value.

If you want to read the value after your pipeline finishes, you can use one of the write transforms (e.g. AvroIO.Write) to write it to an output location, and then read it from whatever code executes after the pipeline finishes.

If you want to use the value in a subsequent part of your pipeline, you can apply a View transform to generate a PCollectionView, which you can then pass as a side input to other transforms.

Consider a simple example where the goal is to print out the Count. The Count won't be available until after the pipeline runs, so in this case we could do the following:

  • Define a DoFn<Long, String> which we apply to the count in order to turn the Long into the message we want to print out.
  • Apply a TextIO.Write transform to write the message to a file.
  • Run the job and wait for it to finish. If we want to execute using the Dataflow Service, we can use BlockingDataflowPipelineRunner to wait for the job to finish.
  • After the job finishes, read the text file that was created to get the message and print it out.
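
The last step above can be sketched in plain Java, with no Dataflow classes involved. This is only a minimal sketch under a few assumptions: the DoFn emits the count as a bare decimal string (e.g. via String.valueOf), TextIO.Write was pointed at a local directory, and the output shards are named with the given prefix followed by a dash (such as count-00000-of-00001). The names CountReader, parseCount and readCount are made up for illustration; they are not part of any SDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the single count written by the pipeline.
class CountReader {

    // Returns the first non-blank line parsed as a long.
    // Assumes the pipeline wrote the count as a bare decimal string.
    public static long parseCount(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return Long.parseLong(trimmed);
            }
        }
        throw new IllegalStateException("no count found in pipeline output");
    }

    // Gathers the lines of every file in dir whose name starts with
    // prefix + "-" (the assumed shard naming), then parses the count.
    public static long readCount(Path dir, String prefix) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> shards =
                 Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path shard : shards) {
                lines.addAll(Files.readAllLines(shard));
            }
        }
        return parseCount(lines);
    }
}
```

After the blocking run returns, something like CountReader.readCount(Paths.get("/tmp/output"), "count") would give the value as a long, where the directory and prefix are whatever was passed to TextIO.Write. If the output lives in Cloud Storage rather than on local disk, the file listing and reading would need a GCS client instead of java.nio.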
Jeremy Lewi answered Oct 25 '25 22:10


