 

Neural Networks: What does the input layer consist of?

Final Edit: Cleaned up the question and accepted runDOSrun's answer. IVlad's is equally good, and user3760780's is extremely helpful too. I recommend reading all three of those as well as the comments. The TL;DR answer is that Possibility #1 is more or less the correct one, but I phrased it very badly.

What does the input layer consist of in Neural Networks? What does that layer do?

A similar question was asked here: Neural Networks: Does the input layer consist of neurons? However, the answers there did not clear up my confusion.

Like the poster in the question above, I'm confused by the many contradicting things the Internet has to say about the input layer of a basic feed-forward network.

I'll skip the links to contradicting tutorials and articles and list the three possibilities that I can see. Which one (if any) is the correct one?

  1. The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights.
  2. The input layer passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.
  3. The input layer has its own weights that multiply the incoming data. The input layer then passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

Thanks!

EDIT 1: Here is an image and an example for further clarity.

[Image: the asker's diagram of the three possibilities, with example inputs .5 and .6, weights .4 and .3, and an identity activation]

Asked Sep 11 '15 by MindSeeker



2 Answers

Out of your 3 descriptions, the first one fits best:

  1. The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights.

A standard multilayer perceptron's input layer consists of units. (You can call them input neurons, but I prefer the term units, because a neuron is expected to do some computation, which isn't the case in the input layer.) Each unit is assigned a value: a part of one of your input data instances, or, in machine learning terms, the value of one feature of a single instance. Each unit simply feeds that value to every neuron in the first hidden layer, which is exactly the first case you portray in your image.

I would rephrase it to this for more accuracy:

  • Each unit of the input layer, in top-to-bottom order, passes its assigned value to each neuron of the first hidden layer. Each hidden-layer neuron then multiplies these values (x1, x2, ..., xm) by its weight vector (w1, w2, ..., wm), sums the products (x1*w1 + x2*w2 + ... + xm*wm), applies its activation function (e.g. logistic, tanh, or the identity function) to this sum, and passes the value computed by the activation function on to the next layer.

So, for your example, the top-most neuron in the hidden layer would receive the inputs:

.5, .6

from the input layer, and it would compute and return:

g(.4 * .5 + .3 * .6)

where g is its activation function, which can be anything:

g(x) = x # identity function, like in your picture
g(x) = 1 / (1 + exp(-x)) # logistic sigmoid
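
To make this concrete, here is a minimal Python sketch of that single-neuron computation (a sketch of the example above, not code from the answer):

import math

def identity(x):
    return x  # g(x) = x, like in the picture

def logistic(x):
    return 1 / (1 + math.exp(-x))  # logistic sigmoid

inputs = [0.5, 0.6]   # values passed on by the input layer's units
weights = [0.4, 0.3]  # the hidden neuron's weight vector

# weighted sum: .4 * .5 + .3 * .6 = 0.38
net = sum(w * x for w, x in zip(weights, inputs))

print(identity(net))  # 0.38
print(logistic(net))  # ~0.594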

In my opinion, it is not entirely right to say the weights go into the hidden neuron, since the weights are its own, but I guess this distinction is not very important; it certainly doesn't affect the result.

You have to remember that this is all conceptually speaking. In proper implementations, you won't have any actual layers at all, just some matrix multiplications. But they will implement the same concept. When trying to understand something, you should start by referring to the underlying concept.
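
For instance, the whole first hidden layer can be computed as one matrix-vector product; here is a minimal NumPy sketch (the second weight row and the shapes are my own illustrative choices):

import numpy as np

x = np.array([0.5, 0.6])   # the input "layer" is just a vector of values
W = np.array([[0.4, 0.3],  # row j holds hidden neuron j's weight vector
              [0.1, 0.2]])

def g(z):
    return 1 / (1 + np.exp(-z))  # logistic activation, applied element-wise

h = g(W @ x)  # activations of all hidden neurons in one multiplication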

  2. The input layer passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

This is not correct; the input layer only passes the values assigned to it to every neuron in the next layer.

Is there some reference where you found it? I'm quite sure it's not standard practice to do this.

  3. The input layer has its own weights that multiply the incoming data. The input layer then passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

Again, not the case. It has no weights and no activation functions.

Answered Oct 28 '22 by IVlad


Since I gave an answer in the thread you linked, I'll try my best to clear up your confusion here as well.

The first thing I notice is that you seem to be confused about which layer a weight belongs to. The answer is: not to one layer but to two. The weights in your image are the weights from the input layer to the hidden layer, and they should be referenced as such to avoid ambiguity once there are multiple layers. There are different conventions, but stick to this one, since it best reflects the standard mathematical notation: a weight is written w_ij, indicating that it goes from unit i to unit j (sometimes j to i, depending on the author).
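
A quick sketch of that convention (the shapes here are my own assumption, purely to illustrate the indexing):

import numpy as np

n_in, n_hid = 2, 3

# W[i, j] is the weight from input unit x_i to hidden neuron h_j, so the
# matrix belongs to the (input, hidden) layer pair rather than to one layer
W = np.random.randn(n_in, n_hid)

x = np.array([0.5, 0.6])
net = x @ W  # net[j] = sum over i of x[i] * W[i, j]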

Let me also say that natural language and diagrams are always somewhat ambiguous, and the best way to approach these things is math. It's plain and clear... although most of us have a difficult relationship with it :)

That being said, let's start with an image anyway (this is a single-layer perceptron; just pretend the next layer is actually a hidden layer, it makes no difference):

[Image: single-layer perceptron, broken down into inputs, weights, net input, and activation function]

This image is clearer for beginners since it breaks up the process of activating a single neuron into all its components:

  • The inputs and the weights (between the input and hidden layer) are multiplied and summed. This is the linear combination net_j = Σ_i (x_i * w_ij), with net_j being the net input for neuron j in the next hidden layer.

  • This net input is fed into the activation function f, so that the activation of each neuron in the hidden layer is h_j = f(net_j) (in the original image this is called o_j; I'll refer to it as h_j since we pretend it's in a hidden layer).

So the whole process of getting the value for each hidden neuron h_j can be summed up with the simple formula:

h_j = f(Σ_i (x_i * w_ij))

  • Each input gets multiplied by the weight of its connection to the neuron h_j.
  • All these weight-input products get summed.
  • This is fed into the activation function f.
  • The result is the value of the neuron h_j.

This is done for every neuron h_j, and then repeated for the next layer; see the sketch below.
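
In code, that per-neuron recipe collapses to one line per layer; here is a minimal Python/NumPy sketch (the layer sizes and the logistic f are my own example choices):

import numpy as np

def f(z):
    return 1 / (1 + np.exp(-z))  # activation function

def layer(x, W):
    # h_j = f(sum over i of w_ij * x_i), computed for all neurons j at once
    return f(x @ W)

x = np.array([0.5, 0.6])           # values assigned to the input units
W_in_hid = np.random.randn(2, 3)   # weights from input to hidden layer
W_hid_out = np.random.randn(3, 1)  # weights from hidden to output layer

h = layer(x, W_in_hid)    # all hidden neurons h_j
y = layer(h, W_hid_out)   # repeated for the next layer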

So, strictly speaking, none of your options is 100% correct or complete; option 1 is phrased correctly but incomplete.

Edit: The correct possibility in your image is #1:

[Image: the asker's diagram, with possibility #1 marked as correct]

(As said, weights have two indices and units have one: w_ij is the weight from unit x_i to neuron h_j.)

Answered Oct 28 '22 by runDOSrun