
Neural Network Always Produces Same/Similar Outputs for Any Input

I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input.

I did take a look at Artificial neural networks benchmark, but my network implementation uses the same activation function for every neuron, i.e. there are no constant neurons.

To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output) as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the second layer didn't help, and neither did increasing to 8 outputs in the first layer.
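
For concreteness, the derivative shortcut I'm relying on looks like this (a minimal sketch in Python/NumPy rather than my Haskell code, with purely illustrative names):

    import numpy as np

    def sigmoid(x):
        # logistic activation: 1 / (1 + e^(-x))
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-4.0, 4.0, 9)
    out = sigmoid(x)

    # The "error * output * (1 - output)" trick works because, for the logistic
    # function, the derivative can be computed from the output alone.
    deriv_from_output = out * (1.0 - out)

    # Compare against a numerical derivative to confirm they match.
    h = 1e-6
    numeric_deriv = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    assert np.allclose(deriv_from_output, numeric_deriv)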

I then calculated the errors, network output, bias updates, and weight updates by hand based on http://hebb.mit.edu/courses/9.641/2002/lectures/lecture04.pdf to make sure there wasn't an error in those parts of the code (there wasn't, but I will probably do it again just to make sure). Because I am using batch training, I did not multiply by x in equation (4) there. I am adding the weight change, though http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-2.html suggests subtracting it instead.
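
To make the sign convention explicit, here is a minimal sketch (again Python/NumPy rather than my actual Haskell, with illustrative names) of a batch update for a single logistic output unit:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def batch_update(w, b, X, t, lr=0.1):
        # X: (n_samples, n_inputs), t: (n_samples,), w: (n_inputs,), b: scalar
        out = sigmoid(X @ w + b)               # forward pass over the whole batch
        delta = (out - t) * out * (1.0 - out)  # per-sample error term, dE/dnet
        grad_w = X.T @ delta                   # gradients accumulated over the batch
        grad_b = delta.sum()
        # Gradient descent subtracts the gradient; "adding the weight change" is
        # only equivalent if delta is defined with the opposite sign, i.e. (t - out).
        return w - lr * grad_w, b - lr * grad_b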

The problem persisted, even in this simplified network. For example, these are the results after 500 epochs of batch training and of incremental training.

    Input     | Target | Output (Batch)       | Output (Incremental)
    [1.0,1.0] | [0.0]  | [0.5003781562785173] | [0.5009731800870864]
    [1.0,0.0] | [1.0]  | [0.5003740346965251] | [0.5006347214672715]
    [0.0,1.0] | [1.0]  | [0.5003734471544522] | [0.500589332376345]
    [0.0,0.0] | [0.0]  | [0.5003674110937019] | [0.500095157458231]

Subtracting instead of adding produces the same problem, except everything is 0.99 something instead of 0.50 something. 5000 epochs produces the same result, except the batch-trained network returns exactly 0.5 for each case. (Heck, even 10,000 epochs didn't work for batch training.)

Is there anything in general that could produce this behavior?

Also, I looked at the intermediate errors for incremental training, and although the inputs to the hidden/input layers varied, the error for the output neuron was always +/-0.12. For batch training, the errors were increasing, but extremely slowly, and they were all extremely small (on the order of 10^-7). Different initial random weights and biases made no difference, either.

Note that this is a school project, so hints/guides would be more helpful. Although reinventing the wheel and making my own network (in a language I don't know well!) was a horrible idea, I felt it would be more appropriate for a school project (so I know what's going on...in theory, at least. There doesn't seem to be a computer science teacher at my school).

EDIT: A two-layer network (an input layer with 2 inputs and 8 outputs, and an output layer with 8 inputs and 1 output) produces much the same results: roughly 0.5 +/- 0.2 (or so) for each training case. I'm also playing around with pyBrain to see if any network structure there will work.

Edit 2: I am using a learning rate of 0.1. Sorry for forgetting about that.

Edit 3: Pybrain's "trainUntilConvergence" doesn't get me a fully trained network, either, but 20000 epochs does, with 16 neurons in the hidden layer. 10000 epochs and 4 neurons, not so much, but close. So, in Haskell, with the input layer having 2 inputs & 2 outputs, hidden layer with 2 inputs and 8 outputs, and output layer with 8 inputs and 1 output...I get the same problem with 10000 epochs. And with 20000 epochs.
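
For reference, the pyBrain experiment was along these lines (a rough sketch from memory; the exact calls and layer sizes may differ from what I actually ran):

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer

    # 2 inputs, 16 hidden units, 1 output
    net = buildNetwork(2, 16, 1)

    # XOR training set
    ds = SupervisedDataSet(2, 1)
    ds.addSample((0, 0), (0,))
    ds.addSample((0, 1), (1,))
    ds.addSample((1, 0), (1,))
    ds.addSample((1, 1), (0,))

    trainer = BackpropTrainer(net, ds)
    for _ in range(20000):
        trainer.train()  # one epoch over the dataset

    for inp in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(inp, net.activate(inp))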

Edit 4: I ran the network by hand again based on the MIT PDF above, and the values match, so the code should be correct unless I am misunderstanding those equations.

Some of my source code is at http://hpaste.org/42453/neural_network__not_working; I'm working on cleaning my code somewhat and putting it in a Github (rather than a private Bitbucket) repository.

All of the relevant source code is now at https://github.com/l33tnerd/hsann.

— asked by li.davidm, Dec 20 '10

2 Answers

I've had similar problems, but I was able to solve them by changing the following:

  • Scale the problem down to a manageable size. I first tried too many inputs and too many hidden-layer units. Once I scaled the problem down, I could see whether the solution to the smaller problem was working. This also helps because, with a smaller problem, the time to compute the weights drops significantly, so I could try many different things without waiting.
  • Make sure you have enough hidden units. This was a major problem for me. I had about 900 inputs connecting to ~10 units in the hidden layer. This was far too small to converge quickly, but training also became very slow when I added additional units. Scaling down the number of inputs helped a lot.
  • Change the activation function and its parameters. I was using tanh at first. I tried other functions: sigmoid, normalized sigmoid, Gaussian, etc. I also found that changing the function parameters to make the functions steeper or shallower affected how quickly the network converged.
  • Change the learning algorithm's parameters. Try different learning rates (0.01 to 0.9). Also try different momentum parameters, if your algorithm supports them (0.1 to 0.9). A rough sketch of such an update follows the list.
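
As an illustration of that last point, a plain gradient-descent update with a learning rate and a momentum term looks something like this (a generic sketch, not code from my project; names are purely illustrative):

    import numpy as np

    def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
        # The velocity keeps a fraction of the previous step, which smooths the
        # trajectory; the learning rate scales the current gradient step.
        velocity = momentum * velocity - lr * grad
        return w + velocity, velocity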

Hope this helps those who find this thread on Google!

— answered by Justas

So I realise this is extremely late for the original post, but I came across this because I was having a similar problem and none of the reasons posted here cover what was wrong in my case.

I was working on a simple regression problem, but every time I trained the network it would converge to a point where it was giving me the same output (or sometimes a few different outputs) for every input. I played with the learning rate, the number of hidden layers/nodes, the optimization algorithm, etc., but it made no difference. This happened even when I looked at a ridiculously simple example, trying to predict the output (1d) for two different inputs (1d):

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class net(nn.Module):
        def __init__(self, obs_size, hidden_size):
            super(net, self).__init__()
            self.fc = nn.Linear(obs_size, hidden_size)
            self.out = nn.Linear(hidden_size, 1)

        def forward(self, obs):
            h = F.relu(self.fc(obs))
            return self.out(h)

    inputs = np.array([[0.5], [0.9]])
    targets = torch.tensor([3.0, 2.0], dtype=torch.float32)

    network = net(1, 5)
    optimizer = torch.optim.Adam(network.parameters(), lr=0.001)

    for i in range(10000):
        out = network(torch.tensor(inputs, dtype=torch.float32))
        loss = F.mse_loss(out, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print("Loss: %f outputs: %f, %f" % (loss.data.numpy(), out.data.numpy()[0], out.data.numpy()[1]))

but STILL it was always outputting the average value of the outputs for both inputs. It turns out the reason is that the dimensions of my outputs and targets were not the same: the targets were Size[2], and the outputs were Size[2,1], and for some reason PyTorch was broadcasting the outputs to be Size[2,2] in the MSE loss, which completely messes everything up. Once I changed:

targets = torch.tensor([3.0, 2.0], dtype=torch.float32) 

to

targets = torch.tensor([[3.0], [2.0]], dtype=torch.float32) 

It worked as it should. This was obviously done with PyTorch, but I suspect maybe other libraries broadcast variables in the same way.
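
A cheap way to catch this class of bug (just a suggestion, not part of my original fix) is to make the shapes line up explicitly before computing the loss:

    # out has shape [2, 1]; if targets has shape [2], mse_loss broadcasts both
    # operands to [2, 2] and silently computes the wrong loss.
    assert out.shape == targets.shape, (out.shape, targets.shape)
    loss = F.mse_loss(out, targets)

    # Alternatively, reshape one side so the shapes match:
    # loss = F.mse_loss(out.squeeze(1), targets)
    # loss = F.mse_loss(out, targets.unsqueeze(1))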

— answered by Henry