apache-beam==2.23.0 Python 3.8.5 DirectRunner
In my Map transform, I'm trying to extract Key value for each tuple element (after GroupByKey transform upstream). But the Output is always a string 'KeyParam' instead actual key value
here's minimal code:
pipeline code
p| beam.Create([("2","elem2.1"),("1","elem1.1"),("1","elem1.2")]) \
|"group" >>beam.GroupByKey() \
| "log_PCollection_AfterGrouped" >> beam.Map(myRawProcessor.myReader) \
Map transform code
class myRawProcessor():
    @classmethod
    def myReader(self,e,
              timestamp=beam.DoFn.TimestampParam,
              window=beam.DoFn.WindowParam,
              watermark=beam.DoFn.WatermarkEstimatorParam,
              key=beam.DoFn.KeyParam,
               *args, **kwargs):
        print("=== === ===")
        print(e)
        print(key)
        return e
Output
> === === === 
> ('2', ['elem1.1']) 
> KeyParam -----> EXPECTED :: '2'
> === === === 
> ('1', ['elem1.2', 'elem1.3']) 
> KeyParam ----> EXPECTED :: '1'
This is a bug, see BEAM-10780. In the meantime, avoid using DoFn.KeyParam in this context.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With