TensorFlow dilation behaves differently from morphological dilation

Tags: python, image-processing, tensorflow, image-morphology, mathematical-morphology

As the following piece of code shows, TensorFlow's tf.nn.dilation2d function doesn't behave like a conventional dilation operator.

import tensorflow as tf
tf.InteractiveSession()
A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
kernel = tf.ones((3,3,1))
input4D = tf.cast(tf.expand_dims(tf.expand_dims(A, -1), 0), tf.float32)
output4D = tf.nn.dilation2d(input4D, filter=kernel, strides=(1,1,1,1), rates=(1,1,1,1), padding="SAME")
print(tf.cast(output4D[0,:,:,0], tf.int32).eval())

This returns the following tensor:

array([[1, 1, 1, 2, 2, 2, 1],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 1, 2, 2, 2, 1],
       [1, 1, 1, 1, 1, 1, 1]], dtype=int32)

I understand neither why it behaves like that nor how I should use tf.nn.dilation2d to get the expected output:

array([[0, 0, 0, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int32)

Can someone shed light on TensorFlow's terse documentation and explain what the tf.nn.dilation2d function does?


asked Feb 14 '19 09:02

Jav

1 Answer

As mentioned in the documentation page linked,

Computes the grayscale dilation of 4-D input and 3-D filter tensors.

and

In detail, the grayscale morphological 2-D dilation is the max-sum correlation [...]

This means that, for each output pixel, the kernel's values are added to the image values in the neighborhood the kernel covers, and the maximum of those sums is taken as the output value.
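As a minimal sketch of that rule for a single output pixel, the snippet below takes the 3x3 neighborhood of the question's image centred on row 2, column 4 and applies the all-ones kernel to it. The helper name max_sum is hypothetical and plain Python lists stand in for tensors.

```python
def max_sum(patch, kernel):
    """Add the kernel to the patch element-wise and return the largest sum."""
    return max(p + k
               for p_row, k_row in zip(patch, kernel)
               for p, k in zip(p_row, k_row))

# 3x3 neighborhood of A (from the question) centred on row 2, column 4
patch = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
# A neighborhood that contains only background pixels
background = [[0, 0, 0] for _ in range(3)]
# The kernel from the question, tf.ones((3, 3, 1)), written as a 3x3 list
ones = [[1, 1, 1] for _ in range(3)]

print(max_sum(patch, ones))       # 2: foreground pixel (1) plus kernel value (1)
print(max_sum(background, ones))  # 1: background pixel (0) plus kernel value (1)
```

Both values match the 2s and 1s in the tensor reported in the question.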

Compare this to correlation, replacing the multiplication with an addition, and the integral (or sum) with the maximum:

correlation: g(t) = ∫ f(𝜏) h(𝜏-t) d𝜏

dilation: g(t) = max_𝜏 { f(𝜏) + h(𝜏-t) }

Or in the discrete world:

correlation: g[n] = ∑_k f[k] h[k-n]

dilation: g[n] = max_k { f[k] + h[k-n] }
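The two discrete formulas above can be compared side by side in one dimension. This is only a sketch: the function names are hypothetical, h is stored for offsets -r..r, and terms whose index k falls outside f are simply skipped (an assumption about border handling).

```python
def correlate_1d(f, h):
    """g[n] = sum_k f[k] * h[k - n], with h stored for offsets -r..r."""
    r = len(h) // 2
    return [sum(f[k] * h[k - n + r]
                for k in range(max(0, n - r), min(len(f), n + r + 1)))
            for n in range(len(f))]

def dilate_1d(f, h):
    """g[n] = max_k (f[k] + h[k - n]), with h stored for offsets -r..r."""
    r = len(h) // 2
    return [max(f[k] + h[k - n + r]
                for k in range(max(0, n - r), min(len(f), n + r + 1)))
            for n in range(len(f))]

f = [0, 0, 1, 0, 0]

print(correlate_1d(f, [1, 1, 1]))  # [0, 1, 1, 1, 0]
print(dilate_1d(f, [1, 1, 1]))     # [1, 2, 2, 2, 1]
print(dilate_1d(f, [0, 0, 0]))     # [0, 1, 1, 1, 0]
```

With a kernel of ones, the sum-product form spreads the single 1 over three samples, while the max-sum form does the same but also lifts every sample by 1. With a kernel of zeros the max-sum form gives the plain binary dilation.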

A dilation with a binary structuring element (what the question refers to as a “conventional dilation”) uses a kernel that contains only 1s and 0s. These mark pixels as “included” and “excluded”; that is, the 1s determine the domain of the structuring element.

To recreate the same behavior with a grey-value dilation, set the “included” pixels to 0 and the “excluded” pixels to minus infinity.

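For example, the 3x3 square structuring element used in the question should be a 3x3 matrix of zeros. The kernel of ones used there adds 1 to every sum, which is why the output contains 1s and 2s instead of 0s and 1s.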
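The following pure-Python sketch emulates the max-sum operation on the question's image to check this. The function grey_dilate is a hypothetical stand-in, not TensorFlow's implementation; it assumes an odd-sized kernel and skips neighbors that fall outside the image, which is assumed here to correspond to padding="SAME".

```python
NEG_INF = float("-inf")

def grey_dilate(image, kernel):
    """Max-sum correlation of a 2-D image with an odd-sized 2-D kernel."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            best = NEG_INF
            for di in range(k_rows):
                for dj in range(k_cols):
                    y, x = i + di - r_off, j + dj - c_off
                    # Neighbors outside the image are ignored (assumption)
                    if 0 <= y < rows and 0 <= x < cols:
                        best = max(best, image[y][x] + kernel[di][dj])
            out_row.append(best)
        out.append(out_row)
    return out

A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]

ones = [[1] * 3 for _ in range(3)]    # kernel used in the question
zeros = [[0] * 3 for _ in range(3)]   # 3x3 square: every pixel "included"
cross = [[NEG_INF, 0, NEG_INF],       # plus shape: corners "excluded"
         [0,       0, 0],
         [NEG_INF, 0, NEG_INF]]

with_ones = grey_dilate(A, ones)      # 1s and 2s, as reported in the question
with_zeros = grey_dilate(A, zeros)    # 0s and 1s, the expected binary dilation
with_cross = grey_dilate(A, cross)    # binary dilation by a plus-shaped element

for row in with_zeros:
    print(row)
```

In the question's code, the corresponding change would be to pass tf.zeros((3, 3, 1)) instead of tf.ones((3, 3, 1)) as the filter.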


answered Oct 27 '22 20:10

Cris Luengo
