How do I perform a Union in Dataflow?

I am trying to perform a union operation in Dataflow. Is there sample code for taking the union of two PCollections in Dataflow?

Sam McVeety asked Feb 12 '15 20:02


1 Answer

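A simple way to do this is to combine Flatten with RemoveDuplicates. Flatten merges the two PCollections into one and keeps every element, including repeats. RemoveDuplicates then collapses repeats, which gives the set-theoretic union. If you want a union that keeps duplicates instead (a multiset union), omit the RemoveDuplicates call: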

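PCollection<String> pc1 = ...;
PCollection<String> pc2 = ...;
PCollection<String> union = PCollectionList.of(pc1).and(pc2)
  // Merge both inputs into a single PCollection, keeping duplicates.
  .apply(Flatten.<String>pCollections())
  // Drop repeated elements; omit this step to keep duplicates.
  .apply(RemoveDuplicates.<String>create());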
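
To make the difference between the two kinds of union concrete, here is a small plain-Java analogy. It does not use the Dataflow SDK at all: the class name UnionSketch and the helpers bagUnion and setUnion are hypothetical, and ordinary lists stand in for PCollections. Lists are ordered while PCollections are not, so the element order shown is only an artifact of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java analogy only; nothing here is Dataflow API.
class UnionSketch {
    // Analogue of Flatten alone: concatenate both inputs, keeping duplicates.
    static <T> List<T> bagUnion(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Analogue of Flatten followed by RemoveDuplicates:
    // every distinct element appears exactly once.
    static <T> List<T> setUnion(List<T> a, List<T> b) {
        return new ArrayList<>(new LinkedHashSet<>(bagUnion(a, b)));
    }

    public static void main(String[] args) {
        List<String> pc1 = Arrays.asList("a", "b");
        List<String> pc2 = Arrays.asList("b", "c");
        System.out.println(bagUnion(pc1, pc2)); // prints [a, b, b, c]
        System.out.println(setUnion(pc1, pc2)); // prints [a, b, c]
    }
}
```

The element "b" occurs in both inputs, so it appears twice in the first result and once in the second. That is the same choice you make in the pipeline by leaving out or keeping the RemoveDuplicates step.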
Sam McVeety answered Sep 28 '22 05:09