Trying to find object coordinates (x,y) in image, my neural network seems to optimize error without learning [closed]

I generate images of a single coin pasted onto a white 200x200 background. The coin is randomly chosen among 8 euro coin images (one for each denomination) and has:

  • random rotation;
  • random size (between fixed bounds);
  • random position (so that the coin is not cropped).
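For reference, the generation scheme above can be sketched like this (a minimal, hypothetical version: a dark square stands in for the rotated euro-coin sprite, and the `coin_min`/`coin_max` size bounds are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_example(canvas=200, coin_min=30, coin_max=60):
    # Paste a dark square (stand-in for a rotated euro-coin sprite)
    # onto a white canvas at a random position, never cropped.
    size = int(rng.integers(coin_min, coin_max + 1))
    img = np.full((canvas, canvas, 3), 255, dtype=np.uint8)
    # choose the top-left corner so the coin fits entirely inside the canvas
    x0 = int(rng.integers(0, canvas - size + 1))
    y0 = int(rng.integers(0, canvas - size + 1))
    img[y0:y0 + size, x0:x0 + size] = 40
    cx, cy = x0 + size / 2.0, y0 + size / 2.0  # regression target (x, y)
    return img, (cx, cy)

img, (cx, cy) = make_example()
```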

Here are two examples (center markers added): [image: two dataset examples]

I am using Python + Lasagne. I feed the color image into a neural network whose output layer is 2 fully connected linear neurons, one for x and one for y. The targets associated with the generated coin images are the coordinates (x, y) of the coin center.

I have tried (following the "Using convolutional neural nets to detect facial keypoints" tutorial):

  • a dense architecture with various numbers of layers and units (500 max);
  • a convolutional architecture (with 2 dense layers before the output);
  • sum or mean of squared differences (MSE) as the loss function;
  • target coordinates in the original range [0, 199] or normalized to [0, 1];
  • dropout layers between layers, with dropout probability 0.2.
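For the normalized-target variant, one way to do the mapping between pixel space and [0, 1] (a sketch assuming the 200x200 images above; the helper names are mine):

```python
import numpy as np

W = H = 200  # image size from the question

def normalize_targets(xy):
    # map pixel coordinates in [0, 199] to [0, 1]
    return np.asarray(xy, dtype=np.float64) / np.array([W - 1, H - 1])

def denormalize(xy01):
    # map network outputs in [0, 1] back to pixel coordinates
    return np.asarray(xy01, dtype=np.float64) * np.array([W - 1, H - 1])

roundtrip = denormalize(normalize_targets([99.5, 50.0]))
```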

I always used plain SGD, tuning the learning rate to get a nicely decreasing error curve.

I found that as the network trains, the error decreases until the output is always the center of the image: the output looks independent of the input and seems to be the average of the targets. Since the coin positions are uniformly distributed over the image, always predicting the center is a simple way to minimize the error, but it is not the behavior I want.
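That intuition checks out: under squared-error loss, the best input-independent prediction is exactly the mean of the targets, which for uniformly placed coins is the image center. A quick numpy check:

```python
import numpy as np

rng = np.random.default_rng(1)
# coin centers uniformly distributed over a 200x200 image, as in the dataset
targets = rng.uniform(0, 199, size=(1000, 2))

def mse(pred):
    # loss of a constant, input-independent prediction
    return np.mean((targets - pred) ** 2)

best_const = targets.mean(axis=0)  # close to the image center (~99.5, ~99.5)
# the target mean beats any other constant guess under MSE
assert mse(best_const) < mse(best_const + 1.0)
```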

I have the feeling that the network is not really learning but is just adjusting the output coordinates to minimize the mean error against the targets. Am I right? How can I prevent this? I tried removing the biases of the output neurons, because I suspected the network was only adjusting the biases while all other parameters went to zero, but this didn't work.

Is it possible for a neural network alone to perform well at this task? I have read that one can instead train a net for present/not-present binary classification and then scan the image to find possible object locations. But I wondered whether it is possible using only the forward computation of a neural net.

Asked Jan 24 '16 by Silicium14



1 Answer

Question: How can I prevent this [overfitting without improvement to test scores]?

You need to re-architect your neural net. A neural net just isn't going to do a good job of predicting an X and Y coordinate directly. It can, though, create a heat map of where it detects a coin; said another way, you can have it turn your color picture into a "coin-here" probability map.

Why? Neurons are good at estimating probabilities, not at outputting coordinates. Neural nets are not the magic machines they are sold as; they really do follow the program laid out by their architecture. You'd have to lay out a pretty fancy architecture for the net to first build an internal representation of where the coins are, then another for their center of mass, then another that uses the center of mass and the original image size to somehow learn to scale the X coordinate, then repeat the whole thing for Y.

Much easier is to create a convolutional coin detector that converts your color image into a black-and-white "probability a coin is here" matrix. Then feed that output to your own hand-written code that turns the probability matrix into an X/Y coordinate.
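That hand-written post-processing step can be a few lines of numpy; here is a center-of-mass sketch (an argmax would also work for a single sharp peak):

```python
import numpy as np

def heatmap_to_xy(prob):
    # Turn a "coin-here" probability map into (x, y).
    # Center of mass is more stable than a bare argmax when the
    # detector fires on a blob rather than a single pixel.
    total = prob.sum()
    if total == 0:
        return None  # no detection anywhere in the map
    ys, xs = np.mgrid[0:prob.shape[0], 0:prob.shape[1]]
    return (xs * prob).sum() / total, (ys * prob).sum() / total

# Toy map: a 3x3 blob of detections centered at (x=10, y=5)
heat = np.zeros((20, 20))
heat[4:7, 9:12] = 1.0
x, y = heatmap_to_xy(heat)  # → (10.0, 5.0)
```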

Question: Is it possible for a neural network alone to perform well at this task?

A resounding YES, so long as you set up the right architecture (like the above), but it would probably be much easier to implement and faster to train if you broke the task into steps and only applied the neural net to the coin-detection step.

Answered Oct 03 '22 by Anton Codes