apache-beam==2.23.0
Python 3.8.5
DirectRunner
In my Map transform, I'm trying to extract Key value for each tuple element (after GroupByKey transform upstream). But the Output is always a string 'KeyParam' instead actual key value
here's minimal code:
pipeline code
p| beam.Create([("2","elem2.1"),("1","elem1.1"),("1","elem1.2")]) \
|"group" >>beam.GroupByKey() \
| "log_PCollection_AfterGrouped" >> beam.Map(myRawProcessor.myReader) \
Map transform code
class myRawProcessor():
@classmethod
def myReader(self,e,
timestamp=beam.DoFn.TimestampParam,
window=beam.DoFn.WindowParam,
watermark=beam.DoFn.WatermarkEstimatorParam,
key=beam.DoFn.KeyParam,
*args, **kwargs):
print("=== === ===")
print(e)
print(key)
return e
Output
> === === ===
> ('2', ['elem1.1'])
> KeyParam -----> EXPECTED :: '2'
> === === ===
> ('1', ['elem1.2', 'elem1.3'])
> KeyParam ----> EXPECTED :: '1'
This is a bug, see BEAM-10780. In the meantime, avoid using DoFn.KeyParam
in this context.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With