I am trying out a recent arXiv work called "Factorized CNN",
which mainly argues that spatially separated convolution (depthwise convolution), combined with channel-wise linear projection (1x1 convolution), can speed up the convolution operation.
Here is the figure of their conv layer architecture:
I found that I can implement this architecture either with tf.nn.depthwise_conv2d plus a 1x1 convolution, or with tf.nn.separable_conv2d.
Below is my implementation:
import numpy as np
import tensorflow as tf
#conv filter for depthwise convolution
depthwise_filter = tf.get_variable("depth_conv_w", [3,3,64,1], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/9/32)))
#conv filter for linear channel projection
pointwise_filter = tf.get_variable("point_conv_w", [1,1,64,64], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/1/64)))
conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))
#depthwise convolution, with multiplier 1
conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1,1,1,1], padding='SAME'))
#linear channel projection with 1x1 convolution
conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1,1,1,1], padding='VALID'), conv_b)
#residual
tensor = tf.add(tensor, conv_tensor)
This should require roughly 8-9 times fewer multiply-adds than the original 3x3, 64 -> 64 channel convolution.
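For reference, the theoretical per-pixel multiply-add reduction can be checked with a quick back-of-the-envelope calculation (this only counts mult-adds; it says nothing about actual GPU wall-clock time):

```python
# Per-output-pixel multiply-adds for a 3x3, 64 -> 64 conv layer
standard = 3 * 3 * 64 * 64       # full convolution: 36864
depthwise = 3 * 3 * 64           # one 3x3 kernel per input channel: 576
pointwise = 1 * 1 * 64 * 64      # 1x1 channel projection: 4096

# Reduction factor of the factorized layer vs. the standard conv
print(standard / (depthwise + pointwise))   # ~7.9
```

So the factorization cuts the arithmetic by roughly 8x in theory, but that only translates into speed if the kernels are implemented efficiently.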
However, I do not see any performance improvement.
I must assume that I am doing this wrong, or that there is something wrong with TensorFlow's implementation.
Since there are few examples using depthwise_conv2d, I am leaving this question here.
Is this slow speed normal, or is there a mistake somewhere?
Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution.
Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels.
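To make the two steps concrete, here is a minimal NumPy sketch of a depthwise separable convolution (stride 1, 'VALID' padding, single image without the batch dimension). The function name and shapes are illustrative, not TensorFlow's API, and the loops are deliberately naive for readability:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_k):
    """Naive depthwise separable convolution, stride 1, 'VALID' padding.

    x:           (H, W, C_in)    input feature map
    depthwise_k: (kh, kw, C_in)  one spatial kernel per input channel
    pointwise_k: (C_in, C_out)   1x1 channel projection matrix
    """
    H, W, C = x.shape
    kh, kw, _ = depthwise_k.shape
    oh, ow = H - kh + 1, W - kw + 1

    # Step 1 (depthwise): each channel is convolved with its own kernel,
    # with no mixing across channels.
    dw = np.zeros((oh, ow, C))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw, :]            # (kh, kw, C)
            dw[i, j, :] = np.sum(patch * depthwise_k, axis=(0, 1))

    # Step 2 (pointwise): a 1x1 convolution is just a per-pixel linear
    # projection across channels, i.e. a matrix multiply.
    return dw @ pointwise_k                             # (oh, ow, C_out)
```

This mirrors the two TensorFlow calls in the question: step 1 corresponds to tf.nn.depthwise_conv2d and step 2 to the 1x1 tf.nn.conv2d.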
The current implementation of depthwise_conv2d does not fully utilize the parallel power of the GPU, so you will need to wait for a faster implementation. For example, in Caffe there is a faster third-party implementation of this kernel: https://github.com/yonghenglh6/DepthwiseConvolution
Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.
https://arxiv.org/pdf/1803.09926.pdf