I generate images of a single coin pasted over a white background of size 200x200. The coin is randomly chosen among 8 euro coin images (one for each coin) and has :
Here are two examples (center markers added): Two dataset examples
I am using Python + Lasagne. I feed the color image into the neural network that has an output layer of 2 linear neurons fully connected, one for x and one for y. The targets associated to the generated coin images are the coordinates (x,y) of the coin center.
I have tried (from Using convolutional neural nets to detect facial keypoints tutorial):
I always used simple SGD, tuning the learning rate trying to have a nice decreasing error curve.
I found that as I train the network, the error decreases until a point where the output is always the center of the image. It looks like the output is independent of the input. It seems that the network output is the average of the targets I give. This behavior looks like a simple minimization of the error since the positions of the coins are uniformly distributed on the image. This is not the wanted behavior.
I have the feeling that the network is not learning but is just trying to optimize the output coordinates to minimize the mean error against the targets. Am I right? How can I prevent this? I tried to remove the bias of the output neurons because I thought maybe I'm just modifying the bias and all others parameters are being set to zero but this didn't work.
Is it possible for a neural network alone to perform well at this task? I have read that one can also train a net for present/not present binary classification and then scan the image to find possible locations of objects. But I just wondered if it was possible just using the forward computation of a neural net.
When my network doesn't learn, I turn off all regularization and verify that the non-regularized network works correctly. Then I add each regularization piece back, and verify that each of those works along the way. L2 regularization (aka weight decay) or L1 regularization is set too large, so the weights can't move.
The input layer of neurons or nodes represents the input x and the number of neurons is the dimension of the input x. The output layer represents the output y and the number of the neurons depends on the nature of the target variable.
Optimize Neural Networks Models are trained by repeatedly exposing the model to examples of input and output and adjusting the weights to minimize the error of the model's output compared to the expected output. This is called the stochastic gradient descent optimization algorithm.
This work proposes the use of artificial neural networks to approximate the objective function in optimization problems to make it possible to apply other techniques to resolve the problem. The objective function is approximated by a non-linear regression that can be used to resolve an optimization problem.
What needs to be done is to re-architect your neural net. A neural net just isn't going to do a good job at predicting an X and Y coordinate. It can through create a heat map of where it detects a coin, or said another way, you could have it turn your color picture into a "coin-here" probability map.
Why? Neurons have a good ability to be used to measure probability, not coordinates. Neural nets are not the magic machines they are sold to be but instead really do follow the program laid out by their architecture. You'd have to lay out a pretty fancy architecture to have the neural net first create an internal space representation of where the coins are, then another internal representation of their center of mass, then another to use the center of mass and the original image size to somehow learn to scale the X coordinate, then repeat the whole thing for Y.
Easier, much easier, is to create a coin detector Convolution that converts your color image to a black and white image of probability-a-coin-is-here matrix. Then use that output for your custom hand written code that turns that probability matrix into an X/Y coordinate.
A resounding YES, so long as you set up the right neural net architecture (like the above), but it would probably be much easier to implement and faster to train if you broke the task into steps and only applied the Neural Net to the coin detection step.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With