I am trying out a recent arXiv work called "Factorized CNN",
which mainly argues that spatially separated convolution (depthwise convolution), combined with channel-wise linear projection (1x1 convolution), can speed up the convolution operation.
Here is the figure of their conv layer architecture:
I found that I can implement this architecture either with tf.nn.depthwise_conv2d plus a 1x1 convolution, or with tf.nn.separable_conv2d.
Below is my implementation:
import numpy as np
import tensorflow as tf
#conv filter for depthwise convolution
depthwise_filter = tf.get_variable("depth_conv_w", [3,3,64,1], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/9/32)))
#conv filter for linear channel projection
pointwise_filter = tf.get_variable("point_conv_w", [1,1,64,64], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/1/64)))
conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))
#depthwise convolution, with multiplier 1
conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1,1,1,1], padding='SAME'))
#linear channel projection with 1x1 convolution
conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1,1,1,1], padding='VALID'), conv_b)
#residual
tensor = tf.add(tensor, conv_tensor)
This should require roughly 8-9 times fewer multiply-adds than the original 3x3, 64 -> 64 channel convolution.
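For reference, the theoretical per-pixel multiply-add reduction can be checked with a quick back-of-the-envelope calculation (this only counts mult-adds; it says nothing about actual GPU wall-clock time):

```python
# Per-output-pixel multiply-adds for a 3x3, 64 -> 64 conv layer
standard = 3 * 3 * 64 * 64       # full convolution: 36864
depthwise = 3 * 3 * 64           # one 3x3 kernel per input channel: 576
pointwise = 1 * 1 * 64 * 64      # 1x1 channel projection: 4096

# Reduction factor of the factorized layer vs. the standard conv
print(standard / (depthwise + pointwise))   # ~7.9
```

So the factorization cuts the arithmetic by roughly 8x in theory, but that only translates into speed if the kernels are implemented efficiently.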
However, I do not see any performance improvement.
I must assume that I am doing this wrong, or that there is something wrong with TensorFlow's implementation.
Since there are few examples using depthwise_conv2d, I am leaving this question here.
Is this slow speed normal, or is there a mistake somewhere?
Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel). You can understand depthwise convolution as the first step in a depthwise separable convolution.
Separable convolutions consist of first performing a depthwise spatial convolution (which acts on each input channel separately) followed by a pointwise convolution which mixes the resulting output channels.
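To make the two steps concrete, here is a minimal NumPy sketch of a depthwise separable convolution (stride 1, 'VALID' padding, single image without the batch dimension). The function name and shapes are illustrative, not TensorFlow's API, and the loops are deliberately naive for readability:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_k):
    """Naive depthwise separable convolution, stride 1, 'VALID' padding.

    x:           (H, W, C_in)    input feature map
    depthwise_k: (kh, kw, C_in)  one spatial kernel per input channel
    pointwise_k: (C_in, C_out)   1x1 channel projection matrix
    """
    H, W, C = x.shape
    kh, kw, _ = depthwise_k.shape
    oh, ow = H - kh + 1, W - kw + 1

    # Step 1 (depthwise): each channel is convolved with its own kernel,
    # with no mixing across channels.
    dw = np.zeros((oh, ow, C))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw, :]            # (kh, kw, C)
            dw[i, j, :] = np.sum(patch * depthwise_k, axis=(0, 1))

    # Step 2 (pointwise): a 1x1 convolution is just a per-pixel linear
    # projection across channels, i.e. a matrix multiply.
    return dw @ pointwise_k                             # (oh, ow, C_out)
```

This mirrors the two TensorFlow calls in the question: step 1 corresponds to tf.nn.depthwise_conv2d and step 2 to the 1x1 tf.nn.conv2d.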
The current implementation of depthwise_conv2d does not fully utilize the parallel power of the GPU, so you will need to wait for a faster implementation. For example, in Caffe there is a faster third-party implementation of this kernel: https://github.com/yonghenglh6/DepthwiseConvolution
Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.
https://arxiv.org/pdf/1803.09926.pdf