I have a TensorFlow model, built with Python 3.5, registered with Cloud ML Engine, and I want to run a batch prediction job with it. My API request body looks like this:
{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12",
  "accelerator": {
    "count": "1",
    "type": "NVIDIA_TESLA_P100"
  }
}
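For reference, this is how I build and submit the same request programmatically. A minimal sketch assuming the `projects.jobs.create` REST shape (the job body wraps the fields above in a `predictionInput` object alongside a `jobId`); the project, bucket, and job names below are placeholders, not my real values:

```python
import json

def make_batch_prediction_job(job_id, version_name, input_paths, output_path):
    """Build the request body for the ML Engine projects.jobs.create endpoint.

    The "predictionInput" contents mirror the JSON request body shown above;
    jobs.create additionally requires a top-level "jobId".
    """
    return {
        "jobId": job_id,
        "predictionInput": {
            "versionName": version_name,
            "dataFormat": "JSON",
            "inputPaths": input_paths,
            "outputPath": output_path,
            "region": "us-east1",
            "runtimeVersion": "1.12",
            "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
        },
    }

# Placeholder names; the body would then be POSTed to
# https://ml.googleapis.com/v1/projects/PROJECT/jobs (e.g. via the
# google-api-python-client's ml.projects().jobs().create(...).execute()).
body = make_batch_prediction_job(
    "my_batch_job",
    "projects/PROJECT/models/MODEL/versions/v8_0QSZ",
    ["gs://BUCKET/inputs/*"],
    "gs://BUCKET/outputs/",
)
print(json.dumps(body, indent=2))
```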
The batch prediction job runs and returns "Job completed successfully." However, it was completely unsuccessful: every input consistently produced the following error:
Exception during running the graph: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node convolution_layer/conv1d/conv1d/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/google/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](convolution_layer/conv1d/conv1d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, convolution_layer/conv1d/conv1d/ExpandDims_1)]] [[{{node Cast_6/_495}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_789_Cast_6", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
My questions are: why does the job report success when every single prediction failed, and is this caused by my model or by the runtime?
Response from a batch prediction dev: "We don't officially support Python 3 yet. However, the issue you're encountering is a known bug affecting our GPU runtimes for TF 1.11 and 1.12."
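Given that the dev pins the bug to the GPU runtimes for TF 1.11/1.12, one workaround (my assumption based on that statement, not an officially confirmed fix) is to drop the `accelerator` block so the job runs on CPU, which sidesteps the cuDNN initialization path entirely:

```json
{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12"
}
```

The alternative, if GPU throughput is required, would presumably be pinning `runtimeVersion` to a release not named in the bug report, though the dev response doesn't say which versions are unaffected.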