Trying to find object coordinates (x,y) in image, my neural network seems to optimize error without learning [closed]

I generate 200x200 images of a single coin pasted over a white background. The coin is randomly chosen among 8 euro coin images (one for each coin) and has:

random rotation;
random size (between fixed bounds);
random position (such that the coin is not cropped).
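
The generation procedure above can be sketched roughly as follows. This is only an illustration, not the original generator: it assumes a plain gray disc stands in for the euro coin images (so rotation has no visible effect and is left out), and `make_sample`, `min_radius` and `max_radius` are hypothetical names and bounds.

```python
import numpy as np

IMAGE_SIZE = 200  # images are 200x200, as described above


def make_sample(rng, min_radius=15, max_radius=40):
    """Return (image, (x, y)): a white RGB image with one disc and its center.

    Assumption: a filled gray disc stands in for a real coin image.
    """
    radius = int(rng.integers(min_radius, max_radius + 1))
    # Keep the whole disc inside the image so the "coin" is never cropped.
    cx = int(rng.integers(radius, IMAGE_SIZE - radius))
    cy = int(rng.integers(radius, IMAGE_SIZE - radius))

    image = np.full((IMAGE_SIZE, IMAGE_SIZE, 3), 255, dtype=np.uint8)
    rows, cols = np.ogrid[:IMAGE_SIZE, :IMAGE_SIZE]
    mask = (cols - cx) ** 2 + (rows - cy) ** 2 <= radius ** 2
    image[mask] = (120, 120, 120)
    return image, (cx, cy)


rng = np.random.default_rng(0)
image, (x, y) = make_sample(rng)
```

The returned `(x, y)` pair is the regression target; x indexes columns and y indexes rows.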

Here are two examples (center markers added): [image: two dataset examples]

I am using Python + Lasagne. I feed the color image into a neural network whose output layer consists of 2 fully connected linear neurons, one for x and one for y. The targets associated with the generated coin images are the coordinates (x,y) of the coin center.

I have tried (following the tutorial "Using convolutional neural nets to detect facial keypoints"):

Dense-layer architectures with various numbers of layers and units (500 max);
Convolutional architectures (with 2 dense layers before the output);
Sum or mean of squared differences (MSE) as the loss function;
Target coordinates in the original range [0,199] or normalized to [0,1];
Dropout between layers, with a dropout probability of 0.2.

I always used plain SGD, tuning the learning rate to get a smoothly decreasing error curve.

I found that as I train the network, the error decreases until the output is always the center of the image. The output appears to be independent of the input: the network simply outputs the average of the targets I give it. Since the coin positions are uniformly distributed over the image, this is the trivial way to minimize the error. This is not the behavior I want.
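
One way to check this suspicion is to compare the training loss with the loss of the best constant prediction. The sketch below assumes the coin centers are uniform over [0,199] in both axes (a simplification, since the no-cropping constraint narrows the range); in that case the constant baseline MSE is about 199²/12 ≈ 3300 per coordinate, an RMSE of roughly 57 pixels. A network whose loss plateaus near that value has learned nothing about the input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumption: coin centers are uniform over the 200x200 image.
targets = rng.uniform(0, 199, size=(10_000, 2))

# Best input-independent prediction under MSE: the mean of the targets.
constant_prediction = targets.mean(axis=0)
baseline_mse = np.mean((targets - constant_prediction) ** 2)

# Any other constant does worse, e.g. always predicting the top-left corner.
corner_mse = np.mean((targets - np.array([0.0, 0.0])) ** 2)

print(constant_prediction)  # close to the image center
print(baseline_mse, corner_mse)
```

With targets normalized to [0,1], the same baseline would be about 1/12 ≈ 0.083.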

I have the feeling that the network is not learning but is just adjusting the output coordinates to minimize the mean error against the targets. Am I right? How can I prevent this? I tried removing the bias of the output neurons, because I thought I might just be modifying the bias while all other parameters were being driven to zero, but this didn't work.

Is it possible for a neural network alone to perform well at this task? I have read that one can also train a net for present/not-present binary classification and then scan the image to find possible object locations. But I wondered whether it is possible using only the forward computation of a neural net.
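
For reference, the scanning approach mentioned above could look roughly like this. It is only a sketch: `coin_score` is a hypothetical stand-in for a trained present/not-present classifier (here it just measures the fraction of non-white pixels), and the `window` and `stride` values are arbitrary.

```python
import numpy as np


def coin_score(patch):
    """Hypothetical stand-in for a trained present/not-present classifier.

    Returns the fraction of non-white pixels in the patch.
    """
    return float(np.mean(patch.min(axis=-1) < 250))


def scan(image, window=64, stride=8):
    """Slide a window over the image; return the center (x, y) of the best one."""
    height, width = image.shape[:2]
    best_score, best_center = -1.0, None
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            score = coin_score(image[top:top + window, left:left + window])
            if score > best_score:
                best_score = score
                best_center = (left + window // 2, top + window // 2)
    return best_center, best_score


# Toy input: white 200x200 RGB image with a gray 40x40 square at a known place.
image = np.full((200, 200, 3), 255, dtype=np.uint8)
image[100:140, 60:100] = 120  # rows 100..139, columns 60..99
center, score = scan(image)
```

The location found this way is only as precise as the window size and stride allow, so it gives a coarse estimate rather than an exact center.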

asked Jan 24 '16 17:01

Silicium14

1 Answer

Question: How can I prevent this [overfitting without improvement to test scores]?

What needs to be done is to re-architect your neural net. A neural net just isn't going to do a good job at predicting an X and Y coordinate. It can, though, create a heat map of where it detects a coin; said another way, you could have it turn your color picture into a "coin-here" probability map.

Why? Neurons are well suited to expressing probabilities, not coordinates. Neural nets are not the magic machines they are sold to be; they really do follow the program laid out by their architecture. You'd have to lay out a pretty fancy architecture for the neural net to first create an internal spatial representation of where the coins are, then another internal representation of their center of mass, then another that uses the center of mass and the original image size to somehow learn to scale the X coordinate, and then repeat the whole thing for Y.

Much easier is to build a convolutional coin detector that converts your color image into a black-and-white "probability a coin is here" matrix. Then pass that output to custom hand-written code that turns the probability matrix into an X/Y coordinate.
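
The hand-written step could be as simple as a thresholded center of mass. The following is a minimal sketch under that assumption; `heatmap_to_xy` and its `threshold` parameter are illustrative names, and the probability map is assumed to have the same resolution as the input image.

```python
import numpy as np


def heatmap_to_xy(prob_map, threshold=0.5):
    """Convert a 2-D coin-probability map into one (x, y) estimate.

    Uses the probability-weighted center of mass of cells above `threshold`;
    falls back to the single most probable cell if nothing passes.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    weights = np.where(prob_map >= threshold, prob_map, 0.0)
    total = weights.sum()
    if total == 0.0:
        row, col = np.unravel_index(np.argmax(prob_map), prob_map.shape)
        return float(col), float(row)
    rows, cols = np.indices(prob_map.shape)
    x = (cols * weights).sum() / total
    y = (rows * weights).sum() / total
    return float(x), float(y)


# Toy map: high probability in a 21x21 block centered at x=150, y=40.
prob_map = np.full((200, 200), 0.05)
prob_map[30:51, 140:161] = 0.9
x, y = heatmap_to_xy(prob_map)
```

If the detector outputs a map at a lower resolution than the image, the result would need to be scaled back to pixel coordinates.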

Question: Is it possible for a neural network alone to perform well at this task?

A resounding YES, so long as you set up the right neural net architecture (like the above). It would probably be much easier to implement and faster to train, though, if you broke the task into steps and applied the neural net only to the coin detection step.

answered Oct 03 '22 23:10

Anton Codes

Tags: neural-network, detection, coordinates, lasagne