Beam/Dataflow Python: AttributeError: '_UnwindowedValues' object has no attribute 'sort'

I am developing a workflow process to run on Google Cloud Dataflow using the Python SDK from Apache Beam.

When running locally, the workflow completes successfully with no errors and the output data is exactly as expected.

When I try to run it on the Dataflow service, it throws the following error: AttributeError: '_UnwindowedValues' object has no attribute 'sort'

Which is from the following piece of code:

import operator

import apache_beam as beam

class OrderByDate(beam.DoFn):
    def process(self, context):
        (k, v) = context.element
        v.sort(key=operator.itemgetter('date'))
        return [(k, v)]

And this is called using the standard beam.ParDo like so:

'order_by_dates' >> beam.ParDo(OrderByDate())

The data in the (k, v) tuple looks like this example:

('SOME CODE', [{'date': '2017-01-01', 'value': 1}, {'date': '2016-12-14', 'value': 4}])

with v being the collection of date/value records.

I have tried switching to a standard lambda function, which also throws the same error.

Any ideas why this runs differently locally vs. on Dataflow? Or a suggested workaround?

asked Jan 27 '17 by Blakey

1 Answer

Found a solution: I needed to explicitly convert v to a list with list(v) before doing the sort, which worked.

Strange how differently it behaves running locally vs. remotely.
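For reference, a minimal sketch of what the DoFn looks like with that change, keeping the same context-based process signature as in the question (the explanation in the comment about _UnwindowedValues being a lazy iterable is my understanding of why the local runner and Dataflow behave differently):

import operator

import apache_beam as beam

class OrderByDate(beam.DoFn):
    def process(self, context):
        (k, v) = context.element
        # On the Dataflow runner the grouped values arrive as a lazy
        # iterable (_UnwindowedValues) rather than a plain list, so it has
        # no sort() method. Materialize it into a list first, then sort.
        records = list(v)
        records.sort(key=operator.itemgetter('date'))
        return [(k, records)]

Equivalently, sorted(v, key=operator.itemgetter('date')) materializes and sorts the iterable in one step.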

answered Oct 16 '22 by Blakey