Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Incorrect 'key' value in Map transform

apache-beam==2.23.0 Python 3.8.5 DirectRunner

In my Map transform, I'm trying to extract Key value for each tuple element (after GroupByKey transform upstream). But the Output is always a string 'KeyParam' instead actual key value

here's minimal code:

pipeline code

p| beam.Create([("2","elem2.1"),("1","elem1.1"),("1","elem1.2")]) \
|"group" >>beam.GroupByKey() \
| "log_PCollection_AfterGrouped" >> beam.Map(myRawProcessor.myReader) \

Map transform code

class myRawProcessor():
    @classmethod
    def myReader(self,e,
              timestamp=beam.DoFn.TimestampParam,
              window=beam.DoFn.WindowParam,
              watermark=beam.DoFn.WatermarkEstimatorParam,
              key=beam.DoFn.KeyParam,
               *args, **kwargs):
        print("=== === ===")
        print(e)
        print(key)
        return e

Output

> === === === 
> ('2', ['elem1.1']) 
> KeyParam -----> EXPECTED :: '2'
> === === === 
> ('1', ['elem1.2', 'elem1.3']) 
> KeyParam ----> EXPECTED :: '1'

like image 358
Vibhor Jain Avatar asked Nov 06 '22 05:11

Vibhor Jain


1 Answers

This is a bug, see BEAM-10780. In the meantime, avoid using DoFn.KeyParam in this context.

like image 192
robertwb Avatar answered Nov 14 '22 22:11

robertwb