As the following piece of code shows, the TensorFlow tf.nn.dilation2d
function doesn't behave like a conventional dilation operator.
import tensorflow as tf
tf.InteractiveSession()
A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
kernel = tf.ones((3,3,1))
input4D = tf.cast(tf.expand_dims(tf.expand_dims(A, -1), 0), tf.float32)
output4D = tf.nn.dilation2d(input4D, filter=kernel, strides=(1,1,1,1), rates=(1,1,1,1), padding="SAME")
print(tf.cast(output4D[0,:,:,0], tf.int32).eval())
This returns the following tensor:
array([[1, 1, 1, 2, 2, 2, 1],
[1, 1, 2, 2, 2, 2, 2],
[1, 1, 2, 2, 2, 2, 2],
[1, 1, 2, 2, 2, 2, 2],
[1, 1, 1, 2, 2, 2, 1],
[1, 1, 1, 1, 1, 1, 1]], dtype=int32)
I understand neither why it behaves like that nor how I should use tf.nn.dilation2d
to retrieve the expected output:
array([[0, 0, 0, 1, 1, 1, 0],
[0, 0, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=int32)
Can someone shed light on TensorFlow's succinct documentation and explain what the tf.nn.dilation2d
function does?
Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels on object boundaries. The number of pixels added or removed from the objects in an image depends on the size and shape of the structuring element used to process the image.
Morphological transformations are simple operations based on the shape of an image, usually performed on binary images. They take two inputs: the original image and a structuring element (kernel), which decides the nature of the operation.
The binary dilation of an image by a structuring element is the locus of the points covered by the structuring element, when its center lies within the non-zero points of the image.
As mentioned in the documentation page linked,
Computes the grayscale dilation of 4-D input and 3-D filter tensors.
and
In detail, the grayscale morphological 2-D dilation is the max-sum correlation [...]
What this means is that the kernel's values are added to the image's values at each position, then the maximum value is taken as the output value.
Compare this to correlation, replacing the multiplication with an addition and the integral (or sum) with the maximum:
correlation: g(t) = ∫ f(𝜏) h(𝜏-t) d𝜏
dilation: g(t) = max_𝜏 { f(𝜏) + h(𝜏-t) }
Or in the discrete world:
correlation: g[n] = ∑_k f[k] h[k-n]
dilation: g[n] = max_k { f[k] + h[k-n] }
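The two discrete formulas can be sketched in plain Python to make the parallel concrete. This is only an illustration, not TensorFlow's implementation: the helper names correlate_1d and dilate_1d are made up here, the kernel is assumed to be indexed from its centre, and samples that fall outside the signal are simply skipped.

```python
NEG_INF = float("-inf")

def correlate_1d(f, h):
    """Sum-of-products: g[n] = sum_k f[k] * h[k - n] (h indexed from its centre)."""
    r = len(h) // 2
    out = []
    for n in range(len(f)):
        total = 0
        for k in range(n - r, n + r + 1):
            if 0 <= k < len(f):  # skip samples outside the signal
                total += f[k] * h[k - n + r]
        out.append(total)
    return out

def dilate_1d(f, h):
    """Max-of-sums: g[n] = max_k { f[k] + h[k - n] } (h indexed from its centre)."""
    r = len(h) // 2
    out = []
    for n in range(len(f)):
        best = NEG_INF
        for k in range(n - r, n + r + 1):
            if 0 <= k < len(f):  # skip samples outside the signal
                best = max(best, f[k] + h[k - n + r])
        out.append(best)
    return out

f = [0, 0, 1, 0, 0]
ones = [1, 1, 1]
zeros = [0, 0, 0]

print(correlate_1d(f, ones))  # [0, 1, 1, 1, 0]
print(dilate_1d(f, ones))     # [1, 2, 2, 2, 1]
print(dilate_1d(f, zeros))    # [0, 1, 1, 1, 0]
```

With a kernel of ones, the dilation adds 1 to every sample before taking the maximum, which is the same offset seen in the question's output. With a kernel of zeros, only the maximum remains.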
A dilation with a binary structuring element (what the question refers to as a “conventional dilation”) uses a kernel that contains only 1s and 0s, meaning “included” and “excluded”. That is, the 1s determine the domain of the structuring element.
To recreate the same behavior with a grey-value dilation, set the “included” pixels to 0 and the “excluded” pixels to minus infinity.
For example, the 3x3 square structuring element used in the question should be a 3x3 matrix of zeros, i.e. tf.zeros((3,3,1)) instead of tf.ones((3,3,1)).
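As a rough check of this recipe without TensorFlow, here is a minimal pure-Python sketch of a 2-D max-sum dilation applied to the image from the question. The function names dilate_2d and grey_kernel_from_binary are hypothetical, and the sketch assumes a centred kernel with out-of-image positions skipped, which is meant to mimic padding="SAME" rather than reproduce TensorFlow's code.

```python
NEG_INF = float("-inf")

def grey_kernel_from_binary(se):
    """Map a binary structuring element to grey values: 1 -> 0, 0 -> -infinity."""
    return [[0 if v else NEG_INF for v in row] for row in se]

def dilate_2d(image, kernel):
    """Grey-value dilation: max over the kernel window of image value + kernel value."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    rc, cc = krows // 2, kcols // 2  # kernel centre
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            best = NEG_INF
            for dy in range(krows):
                for dx in range(kcols):
                    yy, xx = y + dy - rc, x + dx - cc
                    if 0 <= yy < rows and 0 <= xx < cols:  # skip positions outside the image
                        best = max(best, image[yy][xx] + kernel[dy][dx])
            out_row.append(best)
        out.append(out_row)
    return out

A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]

square = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]

# Using the ones directly as grey values shifts every output by +1.
with_ones = dilate_2d(A, square)

# Converting "included" to 0 first gives the conventional binary dilation.
with_zeros = dilate_2d(A, grey_kernel_from_binary(square))

for row in with_zeros:
    print(row)
```

Under these assumptions, with_ones matches the tensor the question observed and with_zeros matches the output the question expected. For a non-square shape such as a cross, the excluded corners become minus infinity and never win the maximum.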