
Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch:

W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Also, after some epochs, it stops showing logs and the step fails with this error:

This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.

Radhi asked Jan 31 '20 08:01



2 Answers

Upgrading TensorFlow from 2.1 to 2.2 fixed this issue for me. I didn't have to switch to the tf-nightly build.
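To check whether an environment is already on the version that worked here, something like the sketch below could be used. The helper name `is_at_least` is hypothetical (not part of TensorFlow), and it only compares the major.minor part of a version string; the 2.2 threshold is simply the version reported above, not a documented fix boundary.

```python
import re


def is_at_least(version: str, minimum: tuple = (2, 2)) -> bool:
    """Return True if the major.minor part of `version` is >= `minimum`.

    Hypothetical helper for illustration; it ignores patch numbers and
    suffixes such as "-rc1" or ".dev20200515".
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

With TensorFlow installed, it would be called as `is_at_least(tf.__version__)` after `import tensorflow as tf`.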

Safwan answered Sep 21 '22 13:09



In my case, batch_size and steps_per_epoch did not match.

For example,

import math

# steps_per_epoch should be a whole number of batches, so round the division up
his = Test_model.fit_generator(datagen.flow(trainrancrop_images, trainrancrop_labels, batch_size=batchsize),
                               steps_per_epoch=math.ceil(len(trainrancrop_images) / batchsize),
                               validation_data=(test_images, test_labels),
                               epochs=1,
                               callbacks=[callback])

The batch_size passed to datagen.flow must be consistent with the steps_per_epoch passed to Test_model.fit_generator (I had actually used the wrong value for steps_per_epoch).

I guess this is one of the causes of the error.

So I think the problem arises when the batch size and the number of steps (iterations) per epoch do not correspond.

Getting the step count by plain division can also produce a float, which may be a problem.

Check your code for this issue.
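As a minimal sketch of the point about floats, the step count can be computed separately and rounded up so that it is always a whole number. The function name `steps_per_epoch` and the sample sizes used here are assumptions for illustration, not taken from the original code.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Return how many batches are needed to see every sample once."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # Plain division gives a float (1000 / 32 == 31.25); rounding up
    # keeps the final, smaller batch inside the epoch.
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    print(steps_per_epoch(1000, 32))  # 32
```

The result can then be passed as steps_per_epoch, using the same batch size that is given to datagen.flow.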

Good luck :)

Jaeyoung Chung answered Sep 18 '22 13:09
