
Refusing to split GroupedShuffleRangeTracker proposed split position is out of range

I am sporadically getting the following errors:

W Refusing to split at '\x00\x00\x00\x15\xbc\x19)b\x00\x01': proposed split position is out of range ['\x00\x00\x00\x15\x00\xff\x00\xff\x00\xff\x00\xff\x00\x01', '\x00\x00\x00\x15\xbc\x19)b\x00\x01'). Position of last group processed was '\x00\x00\x00\x15\xbc\x19)a\x00\x01'.

When it happens, the message is logged repeatedly and the job never seems to end, although it appears to have otherwise completed its work.

In the most recent instance I was using 10 workers with autoscaling disabled. I am using the Python implementation of Apache Beam.

de1 asked Jan 29 '18 22:01



1 Answer

This is not an error; it is part of the normal operation of a pipeline. We should probably reduce its logging level to INFO and rephrase it, because it very frequently confuses people.

This message (rather obscurely) signals that Dataflow is trying to apply dynamic rebalancing, but there's no work that can be further subdivided.
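To make the refusal concrete, here is a minimal sketch of the kind of bounds check a range tracker performs when asked to split. It is an illustration only: `ToyRangeTracker` and its methods are hypothetical names, not Beam's `GroupedShuffleRangeTracker`, and the real checks may differ. The byte strings are copied from the warning in the question; the proposed split position equals the exclusive end of the range, so nothing is left to hand to another worker.

```python
class ToyRangeTracker:
    """Hypothetical stand-in for a range tracker; not Beam's implementation."""

    def __init__(self, start, stop):
        self.start = start          # inclusive start of the range
        self.stop = stop            # exclusive end of the range
        self.last_claimed = None    # last position this worker processed

    def try_claim(self, position):
        """Record that the worker is processing `position`, if it is in range."""
        if not (self.start <= position < self.stop):
            return False
        self.last_claimed = position
        return True

    def try_split(self, position):
        """Give away [position, stop) if possible; return None to refuse."""
        if not (self.start <= position < self.stop):
            return None             # out of range: nothing left to subdivide
        if self.last_claimed is not None and position <= self.last_claimed:
            return None             # that part of the range is already processed
        residual = (position, self.stop)
        self.stop = position        # this worker keeps [start, position)
        return residual


# Values taken from the warning message in the question.
start = b"\x00\x00\x00\x15\x00\xff\x00\xff\x00\xff\x00\xff\x00\x01"
stop = b"\x00\x00\x00\x15\xbc\x19)b\x00\x01"
last_group = b"\x00\x00\x00\x15\xbc\x19)a\x00\x01"

tracker = ToyRangeTracker(start, stop)
claimed = tracker.try_claim(last_group)   # the last group processed is in range
refused = tracker.try_split(stop)         # split proposed at the exclusive end
```

In this sketch `claimed` is `True` and `refused` is `None`: the worker has already reached the last group in its range, so a split at the end of the range cannot produce any work for another worker.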

In other words, your job is stuck doing something non-parallelizable on a small number of workers while the other workers sit idle. Investigating this further would require looking at the code of your job and the Dataflow job ID.

jkff answered Sep 28 '22 00:09
