I am having an issue: the same code runs fine on my local machine with a CPU and TensorFlow 1.14.0, but when I run it on a GPU with TensorFlow 2.0, I get
CancelledError: [_Derived_]RecvAsync is cancelled. [[{{node Adam/Adam/update/AssignSubVariableOp/_65}}]] [[Reshape_13/_62]] [Op:__inference_distributed_function_3722]
Function call stack: distributed_function
Reproducible code is here:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
import matplotlib.pyplot as plt
%matplotlib inline
batch_size = 32
num_obs = 100
num_cats = 1 # number of categorical features
n_steps = 10 # number of timesteps in each sample
n_numerical_feats = 18 # number of numerical features in each sample
cat_size = 12 # number of unique categories in each categorical feature
embedding_size = 1 # embedding dimension for each categorical feature
labels = np.random.random(size=(num_obs*n_steps,1)).reshape(-1,n_steps,1)
print(labels.shape)
#(100, 10, 1)
#18 numerical features
num_data = np.random.random(size=(num_obs*n_steps,n_numerical_feats))
print(num_data.shape)
#(1000, 18)
#Reshaping numeric features to fit into an LSTM network
features = num_data.reshape(-1,n_steps, n_numerical_feats)
print(features.shape)
#(100, 10, 18)
#one categorical variable with 12 levels
cat_data = np.random.randint(0,cat_size,num_obs*n_steps)
print(cat_data.shape)
#(1000,)
idx = cat_data.reshape(-1, n_steps)
print(idx.shape)
#(100, 10)
numerical_inputs = keras.layers.Input(shape=(n_steps, n_numerical_feats), name='numerical_inputs', dtype='float32')
#<tf.Tensor 'numerical_inputs:0' shape=(None, 10, 18) dtype=float32>
cat_input = keras.layers.Input(shape=(n_steps,), name='cat_input')
#<tf.Tensor 'cat_input:0' shape=(None, 10) dtype=float32>
cat_embedded = keras.layers.Embedding(cat_size, embedding_size, embeddings_initializer='uniform')(cat_input)
#<tf.Tensor 'embedding_1/Identity:0' shape=(None, 10, 1) dtype=float32>
merged = keras.layers.concatenate([numerical_inputs, cat_embedded])
#<tf.Tensor 'concatenate_1/Identity:0' shape=(None, 10, 19) dtype=float32>
lstm_out = keras.layers.LSTM(64, return_sequences=True)(merged)
#<tf.Tensor 'lstm_2/Identity:0' shape=(None, 10, 64) dtype=float32>
Dense_layer1 = keras.layers.Dense(32, activation='relu', use_bias=True)(lstm_out)
#<tf.Tensor 'dense_4/Identity:0' shape=(None, 10, 32) dtype=float32>
Dense_layer2 = keras.layers.Dense(1, activation='linear', use_bias=True)(Dense_layer1)
#<tf.Tensor 'dense_5/Identity:0' shape=(None, 10, 1) dtype=float32>
model = keras.models.Model(inputs=[numerical_inputs, cat_input], outputs=Dense_layer2)
#compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae', 'mse'])
EPOCHS = 5
#fit the model
#you can use input layer names instead
history = model.fit([features, idx],
                    y=labels,
                    epochs=EPOCHS,
                    batch_size=batch_size)
Does anyone have similar issues? This looks like a bug, but I do not know how to work around it, because I want to keep using TensorFlow 2.0.
I found that tensorflow-gpu 2.0.0 was compiled against cuDNN 7.6.0 (note the 7.x version numbers here are cuDNN, not CUDA).
After I updated cuDNN from 7.4.2 to 7.6.4, the problem was solved.
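If you are on a newer release (TensorFlow 2.3 or later), you can check which CUDA and cuDNN versions your build was compiled against. A minimal sketch, assuming such a version is installed (tf.sysconfig.get_build_info() does not exist in TF 2.0):
import tensorflow as tf
# Available from TF 2.3 onward; returns a dict of build metadata.
build = tf.sysconfig.get_build_info()
print(build.get('cuda_version'))   # e.g. '10.1'
print(build.get('cudnn_version'))  # e.g. '7'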
I have faced similar issues; these steps may help with code on TF 2.0. You can set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true to force GPU memory growth, or enable it programmatically:
import tensorflow as tf
# In TF 2.0 this lives under tf.config.experimental
# (tf.config.list_physical_devices was added in TF 2.1).
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Allocate GPU memory on demand instead of all at once.
    tf.config.experimental.set_memory_growth(gpus[0], True)
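If the machine has more than one GPU, memory growth has to be configured for every visible device before any of them is initialized. A short sketch following the pattern from the TensorFlow GPU guide:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
try:
    # Must run before any GPU has been initialized,
    # and the setting must match across all visible GPUs.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError as e:
    print(e)  # raised if the GPUs were already initialized
The same timing applies to the environment variable: TF_FORCE_GPU_ALLOW_GROWTH must be set before TensorFlow initializes the GPU, so setting it in the shell before launching the script is the safe route.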