So I've started learning about neural networks, but I'm finding it hard to figure out the basics. I'd be grateful for any help anyone can offer.
1) Are there standard values that should be input to a neuron? For example, if a neuron has 5 incoming connections, should each connection be providing a) a continuous value between 0 and 1? b) Either 0 or 1? c) Something else?
2) If you use tanh as the activation function, the neuron will effectively output 1 once the dot-product input reaches about 3 (tanh(3) ≈ 0.995). If I have a layer of 20 hidden nodes, does that mean the weights will need to be small, around 0.05, if we are to avoid maxing out the activation function? Then why do we set the starting weights to be between -1 and 1? Better to start them off very small?
3) What should be the output of a neuron? a) a value between 0 and 1? b) Either 0 or 1? c) Something else? Do some ANNs have neurons outputting between -1 and 1 (I think I've seen that?)
4) Seems like the rules change for the input layer and the output layer? For the input layer, I guess you have to encode your input data into a suitable format. Does that always mean encoding into values between 0 and 1? Likewise for the output layer, presumably you have to massage your output values to something useful? So perhaps if your ANN outputs a continuous value between 0 and 1, and you want a YES or NO, then you can just make a rule that <0.5 is NO and >0.5 is YES. Is that how it works?
5) Are there disadvantages to encoding scalar input values into binary? Seems a little strange that a large number might have a 1 as the final bit, yet that number+1 has a 0 as the final bit? Is there a more continuous way of encoding values that works better?
Sorry, lots of questions. Grateful for any answers. Thanks!
Weighted input: each input is multiplied by the weight associated with the synapse connecting that input to the current neuron. If there are 3 inputs (i.e. 3 neurons in the previous layer), each neuron in the current layer will have 3 distinct weights, one for each synapse.
Since the neuron uses the sigmoid function, the weights are not confined to any fixed range: the sigmoid approaches 0 as its input goes to -∞ and 1 as its input goes to +∞, and you need to be able to produce outputs near 0 and near 1 from your neurons.
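To make that arithmetic concrete, here is a minimal sketch of one neuron's forward pass in Python/NumPy; the input values, weights, and bias are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: 3 inputs from the previous layer and the
# 3 weights of one neuron in the current layer (one per synapse).
inputs = np.array([0.2, 0.7, -0.4])
weights = np.array([0.5, -1.2, 0.8])
bias = 0.1

z = np.dot(inputs, weights) + bias   # weighted input (plus bias term)
activation = sigmoid(z)              # neuron's output, between 0 and 1
print(z, activation)
```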
In common textbook networks like a multilayer perceptron, every hidden layer has weights, and so does the output layer: the linear output layer of a regressor, or the layer feeding the softmax-normalized output of a classifier.
Initializing all the weights with zeros leads the neurons to learn the same features during training. In fact, any constant initialization scheme will perform very poorly.
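Here's a quick sketch of that symmetry problem (toy data, made-up layer sizes): with every weight set to the same constant, all hidden units compute exactly the same activation, and since their incoming gradients are also identical, training never breaks the tie.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # 5 samples, 3 input features (toy data)

W1 = np.full((3, 4), 0.3)            # constant init: 3 inputs -> 4 hidden units
h = np.tanh(x @ W1)                  # hidden activations

# True: all 4 hidden units produce identical outputs for every sample,
# so they receive identical gradients and remain clones of each other.
print(np.allclose(h[:, :1], h))
```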
Normalized values help training a lot, so make sure your inputs fall in a small range. What the range should be depends on the task: sometimes the variables are naturally boolean, but when they're real-valued, you'd better scale them and center them at zero. Otherwise, the network will spend time learning the mean and variance of the data, which is wasteful because there are very fast, very simple algorithms for that.
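A sketch of that preprocessing step (standardization in NumPy; the feature values below are hypothetical):

```python
import numpy as np

# Hypothetical raw training data: 4 samples, 2 real-valued features
# on very different scales (e.g. age in years, income in dollars).
X = np.array([[25.0,  40000.0],
              [37.0,  52000.0],
              [52.0, 110000.0],
              [29.0,  48000.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)

X_scaled = (X - mean) / std          # zero mean, unit variance per feature
print(X_scaled.mean(axis=0))         # ~[0, 0]
print(X_scaled.std(axis=0))          # ~[1, 1]

# Reuse the *same* mean/std for validation and test data so the
# network sees inputs on the distribution it was trained on.
```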
If you start out with large weights, training behavior is unpredictable. I've never heard anyone say that initial weights should be in [-1, 1]; the common recipe, AFAIK, is to use small random Gaussians with mean 0 and variance 1 (what you get from randn in Matlab or NumPy).
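In code, that recipe is just a call to randn, often shrunk further for wide layers. A sketch follows; the 0.01 factor and the 1/sqrt(n_in) scaling are common heuristics I'm assuming here, not universal constants:

```python
import numpy as np

n_in, n_hidden = 20, 20

# Standard-normal initial weights (mean 0, variance 1), as randn gives you.
W = np.random.randn(n_in, n_hidden)

# Shrinking them keeps the tanh/sigmoid inputs in the near-linear region,
# either by a small constant or by scaling with the fan-in:
W_small = 0.01 * np.random.randn(n_in, n_hidden)
W_scaled = np.random.randn(n_in, n_hidden) / np.sqrt(n_in)
```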
Depends on the activation function. For hidden-layer neurons, tanh is a common activation function, and it has range (-1, 1). For the output layer, the appropriate activation function depends on the task: for regression you'd want a linear (unbounded) activation, while for probability estimation and classification you want a logistic or softmax activation with range (0, 1).
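A small sketch of how those output activations differ by task (the pre-activation values are made up):

```python
import numpy as np

z = np.array([1.2, -0.3, 0.4])       # hypothetical pre-activation outputs (logits)

# Regression: linear (identity) output, unbounded.
regression_output = z

# Binary probability estimation: logistic sigmoid, each value in (0, 1).
binary_probs = 1.0 / (1.0 + np.exp(-z))

# Multi-class classification: softmax, values in (0, 1) summing to 1.
exp_z = np.exp(z - z.max())          # subtract the max for numerical stability
softmax_probs = exp_z / exp_z.sum()

print(regression_output, binary_probs, softmax_probs)
```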
This is a repetition of questions 1 and 3.
I really don't understand why you'd want to do this. Is there anything wrong with floating point numbers?