
Canceled future for execute_request message before replies were done

I am running the example TensorFlow convolutional neural network (CNN) code from "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" (https://github.com/ageron/handson-ml3) in VS Code on Windows 11. The crash happens when I step through the Chapter 14 notebook and reach this line:

fmaps = conv_layer(images)

The kernel crashes with the following output:

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.
warn 20:31:46.130: StdErr from Kernel Process 2022-10-12 20:31:46.130634: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8301

error 20:31:46.685: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\ProgramData\Anaconda3\lib\site-packages\traitlets\traitlets.py:2202: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
  warn(
c:\ProgramData\Anaconda3\lib\site-packages\traitlets\traitlets.py:2157: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use 'c780d88a-4eda-4d9c-96ee-78c547d489d5' instead of 'b"c780d88a-4eda-4d9c-96ee-78c547d489d5"'.
  warn(
2022-10-12 20:30:39.777271: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-12 20:30:40.158222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21670 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:02:00.0, compute capability: 8.6
2022-10-12 20:31:46.130634: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8301

info 20:31:46.685: Dispose Kernel process 17032.
error 20:31:46.685: Raw kernel process exited code: 3221226505
error 20:31:46.686: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
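A note on reading the log above: the decimal exit code is easier to search for in its hexadecimal NTSTATUS form. The sketch below is only an illustration. The helper name `to_ntstatus_hex` and the small lookup table are hypothetical, and the table lists just a few well-known Windows status codes. A code of this kind points to a crash in native code (for example a DLL aborting) rather than a Python exception, but it does not identify the cause by itself.

```python
# Hypothetical helper: convert a Windows process exit code to NTSTATUS hex form.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",  # also raised by fail-fast aborts
}


def to_ntstatus_hex(exit_code: int) -> str:
    """Return the exit code as an 8-digit hex string (handles signed values)."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe_exit_code(exit_code: int) -> str:
    """Return the hex form plus a symbolic name when the code is in the table."""
    name = KNOWN_STATUS.get(exit_code & 0xFFFFFFFF, "unknown status")
    return f"{to_ntstatus_hex(exit_code)} ({name})"


# Exit code taken from the Jupyter log in the question.
print(describe_exit_code(3221226505))
```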

CUDA and the GPU driver appear to be installed correctly on my Windows system. For instance, when I run

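import numpy as np
import tensorflow as tf

N = 20000
# GPU: multiply two N x N random matrices with TensorFlow
x1 = tf.random.Generator.from_seed(123).normal(shape=(N, N))
x2 = tf.random.Generator.from_seed(124).normal(shape=(N, N))
x3 = tf.matmul(x1, x2)
# CPU: the same multiplication with NumPy
y1 = np.random.rand(N, N)
y2 = np.random.rand(N, N)
y3 = np.matmul(y1, y2)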

Windows Task Manager shows that the GPU is in use, and computing x3 takes about 2 seconds while computing y3 takes minutes.

Jimmy Li asked Oct 21 '25 20:10



1 Answer

I solved this issue after two weeks of searching for solutions. The fix was simply to install zlib and cuDNN. See the details in NVIDIA's cuDNN installation guide: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html.
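To check whether the libraries are visible to the kernel process, something like the sketch below may help. It only searches the directories listed in PATH for a given file name. The function `find_on_path` is a hypothetical helper, and the DLL names `zlibwapi.dll` and `cudnn64_8.dll` are assumptions based on the cuDNN 8.x Windows packages, so adjust them to your installation.

```python
import os
from pathlib import Path


def find_on_path(filename, search_path=None):
    """Return the full path of every copy of `filename` found on the search path."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(entry.strip('"')) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumed DLL names for cuDNN 8.x on Windows; change them if yours differ.
    for dll in ("zlibwapi.dll", "cudnn64_8.dll"):
        found = find_on_path(dll)
        print(dll, "->", found[0] if found else "NOT FOUND on PATH")
```

If a DLL is reported as not found, adding its directory to PATH and restarting VS Code so the kernel inherits the new PATH is worth trying before anything else.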

For other people who hit this error, the cause may instead be running out of memory. Some people reported that they had to downgrade cuDNN to an older version to solve it. I am using cuDNN v8.3 (I have not tested the latest version, v8.6).
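If GPU memory is the suspect, one thing to try is asking TensorFlow to allocate GPU memory on demand instead of reserving most of it at startup. The sketch below sets the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which must happen before TensorFlow is imported. The helper name `enable_gpu_memory_growth_env` is hypothetical, and this was not part of the fix described above, so treat it as an assumption to test.

```python
import os


def enable_gpu_memory_growth_env(env=None):
    """Set TF_FORCE_GPU_ALLOW_GROWTH to "true" unless it is already defined.

    Returns the value that ends up in effect.
    """
    env = os.environ if env is None else env
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# Call this in the first notebook cell, before TensorFlow is imported.
enable_gpu_memory_growth_env()
# import tensorflow as tf  # import only after the variable is set
```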

Jimmy Li answered Oct 24 '25 11:10



