TensorFlow quantization

Question

I would like to optimize a graph using TensorFlow's transform_graph tool. I tried it on the graph from MultiNet (and on others with similar encoder-decoder architectures). However, the optimized graph is slower with quantize_weights, and much slower with quantize_nodes. TensorFlow's documentation does say that quantizing may bring no improvement, or may even make inference slower. Is this normal for the graph, software, and hardware described below?

Here is my system information for your reference:

OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow installed from: source (CPU build) for graph conversion; Python binary (GPU build) for inference
TensorFlow version: r1.3 for both
Python version: 2.7
Bazel version: 0.6.1
CUDA/cuDNN version: 8.0/6.0 (inference only)
GPU model and memory: GeForce GTX 1080 Ti

I can post all the scripts used to reproduce if necessary.
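
For readers who want to reproduce the setup, the transform step described in the question might look roughly like the sketch below. The helper names build_transforms and transform_graph_def are hypothetical, the choice of extra transforms (strip_unused_nodes, fold_constants, fold_batch_norms, sort_by_execution_order) is an assumption and not taken from the question, and the TransformGraph Python wrapper is assumed to be present in the installed TensorFlow 1.x build.

```python
# Sketch only: helper names are hypothetical and the transform list is an
# assumption, not the asker's actual script.

def build_transforms(quantize_weights=True, quantize_nodes=False):
    """Return an ordered list of Graph Transform Tool transform names."""
    transforms = [
        "strip_unused_nodes",
        "fold_constants(ignore_errors=true)",
        "fold_batch_norms",
    ]
    if quantize_weights:
        transforms.append("quantize_weights")
    if quantize_nodes:
        transforms.append("quantize_nodes")
    # Re-sort nodes so the resulting graph can be executed in order.
    transforms.append("sort_by_execution_order")
    return transforms


def transform_graph_def(graph_def, input_names, output_names, transforms):
    """Apply the transforms to a GraphDef (needs a TensorFlow 1.x install)."""
    # Imported lazily so build_transforms() works without TensorFlow.
    from tensorflow.tools.graph_transforms import TransformGraph
    return TransformGraph(graph_def, input_names, output_names, transforms)
```

Building one variant with only quantize_weights and another that also enables quantize_nodes, then timing each against the unmodified graph on the same device, would show which transform is responsible for the slowdown.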

Benjamin Tan Wei Hao · Accepted Answer

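It seems that quantized ops in TensorFlow currently run only on the CPU. Since you run inference with the GPU build, the quantized part of the graph would not benefit from the GPU, which likely explains the slowdown. See: https://github.com/tensorflow/tensorflow/issues/2807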
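
As a rough illustration of why quantize_weights on its own does not speed anything up: as far as I understand the Graph Transform Tool, it stores large float constants as 8-bit values and converts them back to float at run time, which shrinks the file but adds work. The pure-Python sketch below shows that round trip with a simple min/max linear scheme. The function names are hypothetical and this is not TensorFlow's actual implementation.

```python
# Illustrative min/max linear quantization; not TensorFlow's real kernel.

def quantize_to_uint8(values):
    """Map floats to codes in [0, 255] plus the offset and scale to decode."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are equal (range is zero).
    scale = (hi - lo) / 255.0 or 1.0
    codes = [int(round((v - lo) / scale)) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats; this is the extra work done at run time."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize_to_uint8(weights)
restored = dequantize(codes, lo, scale)
```

Each restored value differs from the original by at most about half of scale, so the weights take a quarter of the space of 32-bit floats, while the arithmetic afterwards still happens in float.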

Tags:

tensorflow

tensorflow-gpu

YannickB
