 

Neural Networks: What does the input layer consist of?

Final Edit: Cleaned up the question and accepted runDOSrun's answer. IVlad's is equally good, and user3760780's is extremely helpful too. I recommend reading all three of those as well as the comments. The TL;DR answer is that Possibility #1 is more or less the correct one, but I phrased it very badly.

What does the input layer consist of in Neural Networks? What does that layer do?

A similar question was asked here: Neural Networks: Does the input layer consist of neurons? However, the answers there did not clear up my confusion.

Like the poster in the question above, I'm confused by the many contradicting things the Internet has to say about the input layer of a basic feed-forward network.

I'll skip the links to contradicting tutorials and articles and list the three possibilities that I can see. Which one (if any) is the correct one?

  1. The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights.
  2. The input layer passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.
  3. The input layer has its own weights that multiply the incoming data. The input layer then passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

Thanks!

EDIT 1: Here is an image and an example for further clarity.

[Image: the asker's diagram of the three possibilities, with example inputs .5 and .6, weights .4 and .3, and an identity activation]

Asked Sep 11 '15 by MindSeeker



2 Answers

Out of your 3 descriptions, the first one fits best:

  1. The input layer passes the data directly to the first hidden layer where the data is multiplied by the first hidden layer's weights.

A standard multilayer perceptron's input layer consists of units. (You can call them input neurons, but I prefer the term units, because a neuron is expected to do some computation, which isn't the case in the input layer.) Each unit is assigned a value: a part of one of your input data instances, or, in machine learning terms, the value of one feature of a single instance. Each unit simply feeds that value to every neuron in the first hidden layer, which is exactly the first case you portray in your image.

I would rephrase it to this for more accuracy:

  • Each unit of the input layer, in top-to-bottom order, passes its assigned value to each neuron of the first hidden layer. Each hidden-layer neuron then multiplies these values (x1, x2, ..., xm) by its weight vector (w1, w2, ..., wm), sums the products (x1*w1 + x2*w2 + ... + xm*wm), applies its activation function (e.g. logistic, tanh, or the identity function) to this sum, and passes the value computed by the activation function on to the next layer.

So, for your example, the top-most neuron in the hidden layer would receive the inputs:

.5, .6

from the input layer, and it would compute and return:

g(.4 * .5 + .3 * .6)

where g is its activation function, which can be anything:

g(x) = x # identity function, like in your picture
g(x) = 1 / (1 + exp(-x)) # logistic sigmoid
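
To make this concrete, here is a minimal Python sketch of that single-neuron computation (a sketch of the example above, not code from the answer):

import math

def identity(x):
    return x  # g(x) = x, like in the picture

def logistic(x):
    return 1 / (1 + math.exp(-x))  # logistic sigmoid

inputs = [0.5, 0.6]   # values passed on by the input layer's units
weights = [0.4, 0.3]  # the hidden neuron's weight vector

# weighted sum: .4 * .5 + .3 * .6 = 0.38
net = sum(w * x for w, x in zip(weights, inputs))

print(identity(net))  # 0.38
print(logistic(net))  # ~0.594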

In my opinion, it is not entirely right to say the weights go into the hidden neuron, since the weights are its own, but I guess this distinction is not very important; it certainly doesn't affect the result.

You have to remember that this is all conceptually speaking. In proper implementations, you won't have any actual layers at all, just some matrix multiplications. But they will implement the same concept. When trying to understand something, you should start by referring to the underlying concept.
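
For instance, the whole first hidden layer can be computed as one matrix-vector product; here is a minimal NumPy sketch (the second weight row and the shapes are my own illustrative choices):

import numpy as np

x = np.array([0.5, 0.6])   # the input "layer" is just a vector of values
W = np.array([[0.4, 0.3],  # row j holds hidden neuron j's weight vector
              [0.1, 0.2]])

def g(z):
    return 1 / (1 + np.exp(-z))  # logistic activation, applied element-wise

h = g(W @ x)  # activations of all hidden neurons in one multiplication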

  2. The input layer passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

This is not correct; the input layer only passes the values assigned to it to every neuron in the next layer.

Is there some reference where you found it? I'm quite sure it's not standard practice to do this.

  3. The input layer has its own weights that multiply the incoming data. The input layer then passes the data through the activation function before passing it on. The data is then multiplied by the first hidden layer's weights.

Again, not the case. It has no weights and no activation functions.

Answered Oct 28 '22 by IVlad


Since I gave an answer in the thread you linked, I'll try my best to clear up your confusion here as well.

The first thing I notice is that you seem to be confused about which layer a weight belongs to. The answer is: not to one layer but to two. The weights in your image are the weights from the input layer to the hidden layer, and they should be referenced as such to avoid ambiguity once there are multiple layers. There are different conventions, but stick to this one, since it best reflects the standard mathematical notation: a weight is written w_ij, indicating that it goes from unit i to unit j (sometimes j to i, depending on the author).
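
A quick sketch of that convention (the shapes here are my own assumption, purely to illustrate the indexing):

import numpy as np

n_in, n_hid = 2, 3

# W[i, j] is the weight from input unit x_i to hidden neuron h_j, so the
# matrix belongs to the (input, hidden) layer pair rather than to one layer
W = np.random.randn(n_in, n_hid)

x = np.array([0.5, 0.6])
net = x @ W  # net[j] = sum over i of x[i] * W[i, j]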

Let me also say that natural language and diagrams are always somewhat ambiguous, and the best way to approach these things is math. It's plain and clear... although most of us have a difficult relationship with it :)

That being said, let's start with an image anyway (this is a single-layer perceptron; just pretend the next layer is actually a hidden layer, it makes no difference):

[Image: single-layer perceptron, broken down into inputs, weights, net input, and activation function]

This image is clearer for beginners since it breaks up the process of activating a single neuron into all its components:

  • The inputs and the weights (between the input and hidden layer) are multiplied and summed. This is the linear combination net_j = Σ_i (x_i * w_ij), with net_j being the net input for neuron j in the next hidden layer.

  • This net input is fed into the activation function f, so that the activation of each neuron in the hidden layer is h_j = f(net_j) (in the original image this is called o_j; I'll refer to it as h_j since we pretend it's in a hidden layer).

So the whole process of getting the value for each hidden neuron h_j can be summed up with the simple formula:

h_j = f(Σ_i (x_i * w_ij))

  • Each input gets multiplied by the weight of its connection to the neuron h_j.
  • All these weight-input products get summed.
  • This is fed into the activation function f.
  • The result is the value of the neuron h_j.

This is done for every neuron h_j, and then repeated for the next layer; see the sketch below.
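
In code, that per-neuron recipe collapses to one line per layer; here is a minimal Python/NumPy sketch (the layer sizes and the logistic f are my own example choices):

import numpy as np

def f(z):
    return 1 / (1 + np.exp(-z))  # activation function

def layer(x, W):
    # h_j = f(sum over i of w_ij * x_i), computed for all neurons j at once
    return f(x @ W)

x = np.array([0.5, 0.6])           # values assigned to the input units
W_in_hid = np.random.randn(2, 3)   # weights from input to hidden layer
W_hid_out = np.random.randn(3, 1)  # weights from hidden to output layer

h = layer(x, W_in_hid)    # all hidden neurons h_j
y = layer(h, W_hid_out)   # repeated for the next layer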

So, strictly speaking, none of your options is 100% correct or complete; option 1 is phrased correctly but incomplete.

Edit: The correct possibility in your image is #1:

[Image: the asker's diagram, with possibility #1 marked as correct]

(As said, weights have two indices and units have one: w_ij is the weight from unit x_i to neuron h_j.)

Answered Oct 28 '22 by runDOSrun