I am trying to perform a union operation in Dataflow. Is there sample code for taking the union of two PCollections in Dataflow?
A simple way to do this is to combine Flatten with RemoveDuplicates, like so. Flatten on its own merges the two collections and keeps every element from both inputs, duplicates included. Adding RemoveDuplicates gives the set-theoretic union, so if you want to keep duplicates you can omit that call:
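PCollection<String> pc1 = ...;
PCollection<String> pc2 = ...;
PCollection<String> union = PCollectionList.of(pc1).and(pc2)
    // Flatten merges both inputs into one PCollection and keeps duplicates.
    .apply(Flatten.<String>pCollections())
    // RemoveDuplicates keeps one copy of each distinct element.
    .apply(RemoveDuplicates.<String>create());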
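To make the difference between the two variants concrete, here is a minimal sketch in plain Java, with ordinary lists standing in for PCollections. It does not use the Dataflow SDK: the class name UnionSemantics and the helpers flatten and removeDuplicates are hypothetical, named only to mirror the transforms in the snippet. Note that, unlike these lists, a PCollection has no guaranteed element order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Illustration only: plain lists stand in for PCollections.
public class UnionSemantics {

    // Analogue of Flatten: concatenate both inputs, keeping duplicates.
    static <T> List<T> flatten(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Analogue of RemoveDuplicates: keep one copy of each distinct element.
    static <T> List<T> removeDuplicates(List<T> xs) {
        return new ArrayList<>(new LinkedHashSet<>(xs));
    }

    public static void main(String[] args) {
        List<String> pc1 = Arrays.asList("a", "b");
        List<String> pc2 = Arrays.asList("b", "c");

        List<String> merged = flatten(pc1, pc2);
        System.out.println(merged);                   // prints [a, b, b, c]
        System.out.println(removeDuplicates(merged)); // prints [a, b, c]
    }
}
```

The shared element "b" appears twice after flattening and once after deduplication, which is the same choice you make by keeping or dropping the RemoveDuplicates step. As far as I know, in Apache Beam (the successor to the Dataflow SDK) the deduplication transform is called Distinct.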