Why is my one-filter convolutional neural network unable to learn a simple Gaussian kernel?

Tags: python, tensorflow

I was surprised that the deep learning algorithms I had implemented did not work, so I decided to build a very simple example to better understand how CNNs work. Here is my attempt at constructing a small CNN for a very simple task, which gives unexpected results.

I have implemented a simple CNN with a single layer containing a single filter. I have created a dataset of 5000 samples; the inputs x are 256x256 simulated images, and the outputs y are the corresponding blurred images (y = signal.convolve2d(x, gaussian_kernel, boundary='fill', mode='same')). Thus, I would like my CNN to learn the convolutional filter that transforms the original image into its blurred version. In other words, I would like my CNN to recover the Gaussian filter I used to create the blurred images. Note: as I want to 'imitate' the convolution process as it is described in the mathematical framework, I am using a Gaussian filter which has the same size as my images: 256x256.
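For readers who want to reproduce the setup, here is a minimal sketch of how one (x, y) pair might be generated. The helper `make_gaussian_kernel` is hypothetical (the question does not show how `gaussian_kernel` was built), the random image stands in for the simulated images, and the size and sigma are reduced from the question's 256 and 7 so that the direct 2-D convolution runs quickly.

```python
import numpy as np
from scipy import signal


def make_gaussian_kernel(size, sigma):
    """Hypothetical helper: normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0      # coordinates centred on the kernel
    g = np.exp(-ax**2 / (2.0 * sigma**2))        # 1-D Gaussian profile
    kernel = np.outer(g, g)                      # separable 2-D Gaussian
    return kernel / kernel.sum()                 # weights sum to 1


size = 32    # assumption: small size for speed; the question uses 256
sigma = 3    # assumption: scaled down; the question uses 7

rng = np.random.default_rng(0)
x = rng.random((size, size))                     # stand-in for a simulated image
kernel = make_gaussian_kernel(size, sigma)

# Same call as in the question: zero-filled boundary, output the size of x
y = signal.convolve2d(x, kernel, boundary='fill', mode='same')
```

Because the kernel is non-negative and sums to 1, every blurred pixel is a weighted average of input pixels and zero padding, so y never exceeds the maximum of x.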

This seems to me like quite an easy task, yet the CNN is unable to produce the results I would expect. Please find below the code of my training function and the results.

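import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (placeholders, sessions, tf.contrib)

# Parameters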
size_image = 256
normalization = 1 
sigma = 7

n_train = 4900
ind_samples_training = np.arange(1, n_train + 1)  # sample indices 1..n_train
nb_epochs = 5
minibatch_size = 5
learning_rate = np.logspace(-3,-5,nb_epochs)

tf.reset_default_graph()
tf.set_random_seed(1)                             
seed = 3                                       

n_train = len(ind_samples_training)   

costs = []                                        

# Create Placeholders of the correct shape
X = tf.placeholder(tf.float64, shape=(None, size_image, size_image, 1), name = 'X')
Y_blur_true = tf.placeholder(tf.float64, shape=(None, size_image, size_image, 1), name = 'Y_true')
learning_rate_placeholder = tf.placeholder(tf.float32, shape=[])

# parameters to learn --should be an approximation of the gaussian filter 
filter_to_learn = tf.get_variable('filter_to_learn',
                                  shape=[size_image, size_image, 1, 1],
                                  dtype=tf.float64,
                                  initializer=tf.contrib.layers.xavier_initializer(seed=0),
                                  trainable=True)

# Forward propagation: Build the forward propagation in the tensorflow graph
Y_blur_hat = tf.nn.conv2d(X, filter_to_learn, strides = [1,1,1,1], padding = 'SAME')

# Cost function: Add cost function to tensorflow graph
cost = tf.losses.mean_squared_error(Y_blur_true,Y_blur_hat,weights=1.0)

# Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
opt_adam = tf.train.AdamOptimizer(learning_rate=learning_rate_placeholder)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = opt_adam.minimize(cost)

# Initialize all the variables globally
init = tf.global_variables_initializer()

lr = learning_rate[0]
# Start the session to compute the tensorflow graph
with tf.Session() as sess:

    # Run the initialization
    sess.run(init)

    # Do the training loop
    for epoch in range(nb_epochs):

        minibatch_cost = 0.
        seed = seed + 1

        permutation = list(np.random.permutation(n_train))
        shuffled_ind_samples = np.array(ind_samples_training)[permutation]

        # Learning rate update
        if learning_rate.shape[0]>1:
            lr = learning_rate[epoch]

        nb_minibatches = int(np.ceil(n_train/minibatch_size))

        for num_minibatch in range(nb_minibatches):

            # Minibatch indices
            ind_minibatch = shuffled_ind_samples[num_minibatch*minibatch_size:(num_minibatch+1)*minibatch_size]

            # Loading of the original image (X) and the blurred image (Y)
            minibatch_X, minibatch_Y  = load_dataset_blur(ind_minibatch,size_image, normalization, sigma)

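            # Run one optimization step and fetch the current cost and filter
            _, temp_cost, filter_learnt = sess.run(
                [optimizer, cost, filter_to_learn],
                feed_dict={X: minibatch_X, Y_blur_true: minibatch_Y,
                           learning_rate_placeholder: lr})

            # Accumulate the average cost over the epoch
            minibatch_cost += temp_cost / nb_minibatches

        costs.append(minibatch_cost)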

I have run the training for 5 epochs on 4900 samples, with a batch size of 5. The Gaussian kernel has a variance of 7^2 = 49. I have tried initializing the filter to be learnt both with the Xavier initializer provided by TensorFlow and with the true values of the Gaussian kernel we would actually like to learn. In both cases, the learnt filter is too different from the true Gaussian one, as can be seen in the two images available at https://github.com/megalinier/Helsinki-project.


asked Jun 12 '19 08:06

0spirit0

2 Answers

Judging from the images, the network seems to be learning fine, since the predicted image is not far off the true label. For better results you could tweak some hyperparameters, but that is not the main point here.

I think what you are missing is that different kernels can produce quite similar results, because this is a convolution. Think about it: you multiply a patch of the image element-wise with the kernel and then sum all the products to create a new pixel. Now if the true label's value is 10, it could be the result of 2.5 + 2.5 + 2.5 + 2.5 or of -10 + 10 + 10 + 0. What I am trying to say is that your network could be learning just fine, yet end up with different values in the conv kernel than in the true filter.
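The arithmetic above can be checked with a tiny NumPy sketch. The 2x2 patches and kernels are made-up illustrative values, not taken from the question. Note that the two kernels agree only on this particular patch; on a different patch they give different pixels, so how much freedom the kernel really has depends on how varied the training images are.

```python
import numpy as np

patch = np.ones((2, 2))                            # illustrative image patch
kernel_a = np.full((2, 2), 2.5)                    # 2.5 + 2.5 + 2.5 + 2.5
kernel_b = np.array([[-10.0, 10.0], [10.0, 0.0]])  # -10 + 10 + 10 + 0

# One output pixel = sum of element-wise products of patch and kernel
pixel_a = np.sum(patch * kernel_a)
pixel_b = np.sum(patch * kernel_b)
print(pixel_a, pixel_b)  # 10.0 10.0

# A different patch separates the two kernels again
other_patch = np.array([[1.0, 0.0], [0.0, 0.0]])
other_a = np.sum(other_patch * kernel_a)
other_b = np.sum(other_patch * kernel_b)
print(other_a, other_b)  # 2.5 -10.0
```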


answered Oct 25 '22 18:10

bluesummers

I think this would better serve as a comment as it's somewhat speculative, but it's too long...

Hard to say what exactly is wrong, but there could be multiple culprits here. For one, squared error provides a weak signal when target and prediction are already quite similar, and while the Xavier-initialized filter looks quite bad, the predicted (filtered) image isn't too far off the target. You could experiment with other metrics such as absolute error (i.e. the 1-norm instead of the 2-norm).
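To make the "weak signal" point concrete, here is a small NumPy sketch comparing the per-pixel gradient of squared error and absolute error for a few made-up residuals. In the question's TensorFlow 1.x graph, the analogous change would presumably be swapping `tf.losses.mean_squared_error` for `tf.losses.absolute_difference`, though that substitution is untested here.

```python
import numpy as np

# Illustrative residuals (prediction minus target), shrinking towards zero
residuals = np.array([1.0, 0.1, 0.01])

grad_squared = 2.0 * residuals         # d/dr of r**2: shrinks with the residual
grad_absolute = np.sign(residuals)     # d/dr of |r| (r != 0): constant magnitude

for r, gs, ga in zip(residuals, grad_squared, grad_absolute):
    print(f"residual={r}: squared-error grad={gs}, absolute-error grad={ga}")
```

The squared-error gradient fades as the prediction approaches the target, whereas the absolute-error gradient keeps the same magnitude until the residual changes sign.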

Second, adding regularization should help, i.e. add a weight penalty to the loss function to encourage the filter values to become small where they are not needed. As it is, what I suppose happens is: The random values in the filter average out to about 0, leading to a similar "filtering" effect as if they were actually all 0. As such, the learning algorithm doesn't have much incentive to actually pull them to 0. By adding a weight penalty, you provide this incentive.
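A minimal sketch of that incentive, in NumPy rather than TensorFlow. It assumes the extreme case where the data loss is completely flat in a few filter weights (zero data gradient), and the names `penalized_step`, `lam` and `lr` are hypothetical. In the question's graph, the penalty might roughly correspond to adding a term like `lam * tf.nn.l2_loss(filter_to_learn)` to `cost`.

```python
import numpy as np


def penalized_step(w, data_grad, lam, lr):
    """One gradient step on loss = data_loss + lam * sum(w**2)."""
    return w - lr * (data_grad + 2.0 * lam * w)


w_plain = np.array([0.5, -0.3, 0.2])   # illustrative "unneeded" filter weights
w_penalized = w_plain.copy()
zero_grad = np.zeros_like(w_plain)     # assumption: data loss ignores these weights

for _ in range(100):
    w_plain = penalized_step(w_plain, zero_grad, lam=0.0, lr=0.1)
    w_penalized = penalized_step(w_penalized, zero_grad, lam=0.5, lr=0.1)

print(w_plain)       # unchanged: nothing pulls the weights towards 0
print(w_penalized)   # each step multiplies the weights by 0.9, so they decay
```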

Third, it could just be Adam messing up. It is known to produce "strange" non-optimal solutions on some very simple (e.g. convex) problems. Maybe try plain gradient descent with learning rate decay (and possibly momentum).
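As a rough illustration of that last suggestion, here is a NumPy sketch of gradient descent with momentum and an exponentially decaying learning rate on a toy convex least-squares problem. The problem (recovering five weights from random linear measurements) is only a stand-in for the filter-recovery task, and all hyperparameters are arbitrary assumptions. In TensorFlow 1.x, the corresponding optimizers would be `tf.train.GradientDescentOptimizer` or `tf.train.MomentumOptimizer`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy convex problem: recover w_true from y = A @ w_true
A = rng.standard_normal((200, 5))
w_true = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # a small blur-like kernel
y = A @ w_true

w = rng.standard_normal(5)        # random initial guess
velocity = np.zeros_like(w)
momentum = 0.9                    # assumption: typical momentum value
initial_lr = 0.05                 # assumption: arbitrary starting learning rate

for step in range(500):
    lr = initial_lr * 0.99**step                  # exponential learning rate decay
    grad = 2.0 * A.T @ (A @ w - y) / len(y)       # gradient of the mean squared error
    velocity = momentum * velocity - lr * grad    # heavy-ball momentum update
    w = w + velocity

print(np.round(w, 4))
```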

answered Oct 25 '22 18:10

xdurch0
