Are there standard input, weight and output values for neural network nodes?

So I've started learning about neural networks, but I'm finding it hard to figure out the basics. I'm grateful for any help anyone can offer.

1) Are there standard values that should be input to a neuron? For example, if a neuron has 5 incoming connections, should each connection provide a) a continuous value between 0 and 1, b) either 0 or 1, or c) something else?

2) If you use tanh as the activation function, the neuron will output nearly 1 once the dot product of its inputs and weights reaches about 3 (tanh(3) ≈ 0.995). If I have a layer of 20 hidden nodes, does that mean the weights need to be small, around 0.05, to avoid maxing out (saturating) the activation function? Then why do we set the starting weights to be between -1 and 1? Wouldn't it be better to start them off very small?

3) What should the output of a neuron be: a) a value between 0 and 1, b) either 0 or 1, or c) something else? Do some ANNs have neurons outputting values between -1 and 1? (I think I've seen that.)

4) It seems like the rules change for the input layer and the output layer. For the input layer, I guess you have to encode your input data into a suitable format. Does that always mean encoding into values between 0 and 1? Likewise, for the output layer, presumably you have to massage the output values into something useful? So if your ANN outputs a continuous value between 0 and 1 and you want a YES or NO, you can just make a rule that < 0.5 is NO and > 0.5 is YES. Is that how it works?

5) Are there disadvantages to encoding scalar input values into binary? It seems a little strange that a large number might have 1 as its final bit while that number + 1 has 0 as its final bit. Is there a more continuous way of encoding values that works better?

Sorry, lots of questions. Grateful for any answers. Thanks!

  1. Normalized values help training a lot, so make sure your inputs lie in a small range. What the range should be depends on the task: sometimes the variables are naturally booleans, but when they're real-valued, you'd better center them at zero and scale them to unit variance. Otherwise, the network will spend time learning the mean and variance of the data, which is wasteful because there are very fast, very simple algorithms for that.

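  2. If you start out with large weights, tanh units saturate: their outputs sit near ±1, where the gradient is close to zero, so training becomes slow and unpredictable. I've never heard anyone say that initial weights should be in [-1, 1]; the common recipe, AFAIK, is to use small random Gaussians with mean 0 and variance of about 1/n, where n is the number of inputs to the neuron (i.e. randn scaled by 1/sqrt(n) in Matlab or NumPy). Plain randn, with variance 1, is too large when a neuron has many inputs: with 20 unit-variance inputs, the dot product would have a standard deviation of about sqrt(20) ≈ 4.5, well into tanh's flat region.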

  3. It depends on the activation function. For hidden-layer neurons, tanh is a common activation function, and it has range (-1, 1). For the output layer, the appropriate activation function depends on the task. For regression you'd want a linear (unbounded) activation, while for probability estimation and classification you want a logistic (sigmoid) or softmax activation with range (0, 1).

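  4. This mostly repeats questions 1 and 3: encode inputs as in point 1 and pick the output activation as in point 3. Thresholding a logistic output at 0.5 to get YES/NO, as you describe, is the usual approach.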

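  5. I really don't see why you'd want to do this. Is there anything wrong with floating-point numbers? Feed the scalar in directly, normalized as in point 1. Binary encoding makes neighboring values look unrelated (7 is 0111 and 8 is 1000, differing in every bit), which is exactly the discontinuity you noticed, and the network would have to learn around it.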
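
A minimal sketch tying points 1 to 3 together, written in plain Python (standard library only) rather than NumPy so it runs anywhere. The function names (`standardize_columns`, `init_weights`, `forward`, `saturated_fraction`) and the sample data are hypothetical, biases are omitted for brevity, the 1/sqrt(n) scaling is one common choice rather than the only one, and the 0.99 "saturated" cutoff is an arbitrary threshold picked for illustration.

```python
import math
import random


def standardize_columns(rows):
    """Point 1: center each input feature at zero and scale it to unit variance."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        # A constant feature carries no information; map it to all zeros.
        out_cols.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


def init_weights(n_inputs, n_units, rng, std=None):
    """Point 2: zero-mean Gaussian weights, by default with std 1/sqrt(n_inputs)."""
    if std is None:
        std = 1.0 / math.sqrt(n_inputs)
    return [[rng.gauss(0.0, std) for _ in range(n_inputs)] for _ in range(n_units)]


def sigmoid(z):
    # Split on sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, w_hidden, w_out):
    """Point 3: tanh hidden units output in (-1, 1); the logistic output is in (0, 1)."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def saturated_fraction(weight_std, n_inputs, n_units, rng, trials=200):
    """Fraction of tanh outputs with |output| > 0.99 on unit-variance random inputs."""
    w = init_weights(n_inputs, n_units, rng, std=weight_std)
    count = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        for row in w:
            if abs(math.tanh(sum(wi * xi for wi, xi in zip(row, x)))) > 0.99:
                count += 1
    return count / (trials * n_units)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical raw data: 4 samples with 3 features on very different scales.
    raw = [[120.0, 3.0, 0.5], [80.0, 1.0, 0.7], [100.0, 2.0, 0.2], [140.0, 5.0, 0.9]]
    xs = standardize_columns(raw)

    w_hidden = init_weights(3, 20, rng)  # 20 hidden tanh units, as in the question
    w_out = init_weights(20, 1, rng)[0]
    for x in xs:
        p = forward(x, w_hidden, w_out)
        # Thresholding the logistic output at 0.5 gives the YES/NO answer.
        print(f"p = {p:.3f} -> {'YES' if p > 0.5 else 'NO'}")

    # Plain randn-style weights (std 1) vs. 1/sqrt(n) scaling, 20 inputs per unit.
    print("std 1:         ", saturated_fraction(1.0, 20, 20, rng))
    print("std 1/sqrt(20):", saturated_fraction(1 / math.sqrt(20), 20, 20, rng))
```

Running it should show a far larger saturated fraction for unit-variance weights than for the scaled ones, which is the worry from question 2 expressed in numbers.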

