Using R and the package neuralnet, I am trying to model data that looks like this:
These are temperature readings at 10-minute intervals over several days (the plot above shows a two-day excerpt). Using the code below, I fit a neural network to the data. There are probably simpler ways to model this particular data set, but in the future the data might look quite different. A single hidden layer with 2 neurons gives me satisfactory results:
This also works most of the time with more layers and neurons. However, with one hidden layer containing a single neuron, and occasionally with two layers (in my case 3 and 2 neurons respectively), I get rather poor results, always with the same shape:
The only random element is the initialization of the start weights, so I assume the problem is related to that. However, I must admit that I have not fully grasped the theory of neural networks yet. What I would like to know is whether the poor results are due to a local minimum (neuralnet uses resilient backpropagation with weight backtracking by default) and I'm simply out of luck, or whether I can avoid such a scenario. I am under the impression that there is an optimal number of hidden nodes for fitting e.g. polynomials of degree 2, 5, or 10. If not, what's my best course of action? A larger learning rate? A smaller error threshold? Thanks in advance.
I have not tried tuning the rprop parameters yet, so the solution might lie there.
Code:
# DATA ----------------------
# One simulated day of readings at 10-minute intervals:
# baseline of 17 degrees, raised to 20 degrees during the day (roughly 07:00-20:00)
minute <- seq(0, 6*24 - 1)
temp <- rep.int(17, 6*24)
temp[(6*7):(6*20)] <- 20
# Repeat the daily pattern n times to get several days of data
n <- 10
dta <- data.frame(Zeit = minute, Status = temp)
dta <- dta[rep(seq_len(nrow(dta)), n), ]
# Scale everything to [0, 1] (min-max scaling)
maxs <- apply(dta, 2, max)
mins <- apply(dta, 2, min)
nnInput <- data.frame(Zeit = dta$Zeit, Status = dta$Status)
nnInput <- as.data.frame(scale(nnInput, center = mins, scale = maxs - mins))
# Split into training and test sets (every other row)
trainingData <- nnInput[seq(1, nrow(nnInput), 2), ]
testData <- nnInput[seq(2, nrow(nnInput), 2), ]
# MODEL ---------------------
model <- as.formula("Status ~ Zeit")
net <- neuralnet::neuralnet(model,
trainingData,
hidden = 2,
threshold = 0.01,
linear.output = TRUE,
lifesign = "full",
stepmax = 100000,
rep = 1)
# Predict on the test set and rescale the predictions back to the original units
net.results <- neuralnet::compute(net, testData$Zeit)
results <- net.results$net.result * (maxs["Status"] - mins["Status"]) + mins["Status"]
testData <- as.data.frame(t(t(testData) * (maxs - mins) + mins))
cleanOutput <- data.frame(Actual = testData$Status,
                          Prediction = results,
                          diff = abs(results - testData$Status))
summary(cleanOutput)
plot(cleanOutput$Actual[1:144], main = "Zeittabelle", xlab = paste("Min. seit 0:00 *", n), ylab = "Temperatur")
lines(cleanOutput$Prediction[1:144], col = "red", lwd = 3)
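For reference, the knobs mentioned above (error threshold, rprop step-size factors and limits, number of restarts) map onto neuralnet() arguments roughly as in the sketch below; the concrete values are illustrative assumptions, not recommendations:
# Sketch: same model as above, with the rprop-related parameters made explicit
# (values are arbitrary examples, not tuned settings)
net <- neuralnet::neuralnet(model,
                            trainingData,
                            hidden = 2,
                            algorithm = "rprop+",   # resilient backprop with weight backtracking (the default)
                            threshold = 0.005,      # stricter stopping criterion on the error gradient
                            learningrate.factor = list(minus = 0.5, plus = 1.2),  # step-size shrink/growth factors
                            learningrate.limit = list(min = 1e-8, max = 0.1),     # bounds on the step size
                            stepmax = 1e6,
                            rep = 5,                # several random restarts
                            linear.output = TRUE)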
The weights of an artificial neural network must be initialized to small random numbers, because that is what the stochastic optimization algorithms used to train the model (such as stochastic gradient descent) expect.
Random initialization is the usual way to break this symmetry. However, initializing the weights with values that are too large or too small can slow down optimization.
No matter what the input is, if all weights are the same, all units in the hidden layer will be the same too. This is the main issue with symmetry and the reason why you should initialize weights randomly (or at least with different values). Note that this issue affects all architectures with fully connected (each-to-each) layers.
If all the weights are initialized to zero, the derivatives remain the same for every weight w in W[l], so the neurons learn the same features in every iteration. This problem is known as the network failing to break symmetry. And it is not only zero: any constant initialization will produce a poor result.
Basically, initialization really matters. If you don't initialize the weights randomly, the network may not work at all (e.g. if you set all the weights to 0). It has also been shown that for sigmoid and ReLU activations a certain kind of initialization (e.g. Glorot/Xavier and He schemes, respectively) can help in training your network.
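To see the effect in neuralnet itself, you can force constant start weights via the startweights argument and compare the result with the default random initialization. A rough sketch reusing model and trainingData from the question's code (the 7 comes from counting the weights of a 1-2-1 network including intercepts; the looser threshold is only there to make both runs more likely to converge within stepmax):
# Sketch: constant vs. random start weights for the same architecture
# (model / trainingData as defined in the question's code)
net_const <- neuralnet::neuralnet(model, trainingData, hidden = 2,
                                  startweights = rep(0.5, 7),  # every weight identical -> symmetry
                                  threshold = 0.05, stepmax = 100000,
                                  linear.output = TRUE)

net_rand <- neuralnet::neuralnet(model, trainingData, hidden = 2,
                                 threshold = 0.05, stepmax = 100000,
                                 linear.output = TRUE)          # default: random start weights

# Compare the final training errors (each column of result.matrix is one repetition)
c(constant = net_const$result.matrix["error", 1],
  random   = net_rand$result.matrix["error", 1])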
But in your case, I think the differences are mostly due to the complexity of your problem. A model whose complexity matches the complexity of the problem performs well. The other models may suffer for the following reasons:
UPDATE:
With small network sizes it is quite common to get stuck in a local minimum. Depending on how much time you can spend training your network, you may use the following techniques to overcome that (one of them is sketched below):
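One such technique, which neuralnet supports directly, is to restart training from several different random initializations and keep the repetition with the lowest training error. A minimal sketch, reusing the objects from the question's code:
# Sketch: several repetitions from different random start weights,
# then use the best one for prediction (model / trainingData / testData as above)
net <- neuralnet::neuralnet(model, trainingData,
                            hidden = 1,        # the problematic small architecture
                            threshold = 0.01,
                            stepmax = 100000,
                            rep = 10)          # 10 random restarts

best <- which.min(net$result.matrix["error", ])   # repetition with the lowest training error
net.results <- neuralnet::compute(net, testData$Zeit, rep = best)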
About the connection between layer size and polynomial degree: I think the question is not clearly stated. You would have to specify more details, e.g. the activation function. I also think that the class of polynomials and the class of functions that can be modelled by a classic neural network differ a lot. In a polynomial, a small change in the parameter values usually leads to a much larger change in the output than in the neural network case. The derivative of a neural network is usually a bounded function, whereas the derivative of a polynomial is unbounded as soon as the degree is greater than 1. Because of this, I think that looking for a dependency between polynomial degree and hidden layer size is probably not worth serious consideration.
All you need is a good init (Mishkin & Matas, 2016): this paper proposes a simple method for weight initialization for deep-network learning (http://arxiv.org/abs/1511.06422).
Watch this 6-minute video by Andrew Ng (Machine Learning, Coursera -> Week 5 -> Random Initialization); it explains the danger of setting all initial weights to zero in backpropagation (https://www.coursera.org/learn/machine-learning/lecture/ND5G5/random-initialization).
Suppose we initialize all weights to the same value (e.g. zero or one). Then each hidden unit gets exactly the same signal: if all weights are 1, each unit receives a signal equal to the sum of the inputs (and outputs sigmoid(sum(inputs))); if all weights are zero, which is even worse, every hidden unit gets zero signal. Whatever the input, all units in the hidden layer end up identical, which is why one should initialize weights randomly.
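The arithmetic behind that statement, written out in a few lines of R (plain sigmoid units; the input values and layer size are arbitrary choices for illustration):
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.3, 0.7)                     # arbitrary two-dimensional input
W1 <- matrix(1, nrow = 3, ncol = 2)   # 3 hidden units, every weight set to 1
W0 <- matrix(0, nrow = 3, ncol = 2)   # the same layer with every weight set to 0

sigmoid(W1 %*% x)  # all three units output sigmoid(sum(x)) = sigmoid(1): identical
sigmoid(W0 %*% x)  # all three units output sigmoid(0) = 0.5: identical and independent of the input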