I recently started studying ANNs, and there is something I've been trying to figure out that I can't seem to find an answer to (probably because it's too trivial, or because I'm searching for the wrong keywords).
When do you use multiple outputs instead of a single output? I guess in the simplest case of 1/0 classification it's easiest to use the "sign" as the output activation function. But in which cases do you use several outputs? Is it when you have, for instance, a multiclass classification problem, where you want to classify something as A, B or C and you choose one output neuron for each class? How do you then determine which class it belongs to?
In a classification context, there are a couple of situations where using multiple output units can be helpful: multiclass classification, and explicit confidence estimation.
For the multiclass case, as you wrote in your question, you typically have one output unit in your network for each class of data you're interested in. So if you're trying to classify data as one of A, B, or C, you can train your network on labeled data, but convert all of your "A" labels to [1 0 0], all your "B" labels to [0 1 0], and your "C" labels to [0 0 1]. (This is called a "one-hot" encoding.) You also probably want to use a logistic activation on your output units to restrict their activation values to the interval (0, 1).
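As a minimal sketch of the label conversion described above (the `CLASSES` list and the `one_hot` helper are names made up for this illustration, not part of any particular library):

```python
# Hypothetical class list for the A/B/C example in the text.
CLASSES = ["A", "B", "C"]

def one_hot(label, classes=CLASSES):
    """Return a list with a 1 at the position of `label` and 0 elsewhere."""
    vec = [0] * len(classes)
    vec[classes.index(label)] = 1
    return vec

# Convert a small batch of labels into target vectors for training.
labels = ["A", "C", "B"]
targets = [one_hot(label) for label in labels]
print(targets)
```

Each target vector then has exactly one entry per output unit, so the network's outputs can be compared with it element by element.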
Then, when you're training your network, it's often useful to optimize a "cross-entropy" loss (as opposed to a somewhat more intuitive Euclidean distance loss), since you're basically trying to teach your network to output the probability of each class for a given input. Often one uses a "softmax" (also sometimes called a Boltzmann) distribution to define this probability.
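To make the softmax and cross-entropy ideas concrete, here is a small sketch in plain Python. The raw output values in `scores` are invented for illustration, and the function names are my own; a real network would produce the scores from its last layer and a library would normally compute the loss for you.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_target):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)

def predict_class(probs, classes):
    """Pick the class whose output unit has the largest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best]

classes = ["A", "B", "C"]
scores = [2.0, 1.0, 0.1]          # made-up raw outputs, one per class
probs = softmax(scores)

loss_if_a = cross_entropy(probs, [1, 0, 0])  # small: the network favors A
loss_if_c = cross_entropy(probs, [0, 0, 1])  # large: the network disfavors C
print(predict_class(probs, classes), loss_if_a, loss_if_c)
```

This also answers the "which class does it belong to" part of the question: at prediction time you simply take the class whose output unit is largest.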
For more info, please check out http://www.willamette.edu/~gorr/classes/cs449/classify.html (slightly more theoretical) and http://deeplearning.net/tutorial/logreg.html (more aimed at the code side of things).
Another cool use of multiple outputs is to use one output as a standard classifier (e.g., just one output unit that generates a 0 or 1), and a second output to indicate the confidence that this network has in its classification of the input signal (e.g., another output unit that generates a value in the interval (0, 1)).
This could be useful if you trained up a separate network on each of your A, B, and C classes of data, but then also presented data to the system later that came from class D (or whatever) -- in this case, you'd want each of the networks to indicate that they were uncertain of the output because they've never seen something from class D before.
Have a look at the softmax layer, for instance. The index of the largest output of this layer is your predicted class, and the layer has a nice theoretical justification.
In short: you take the previous layer's output and interpret it as a vector in m-dimensional space. You then fit K Gaussians to it, one per class, all sharing the same covariance matrix. If you write out the equations for the class posterior under this model, it amounts to a softmax layer. For more details see "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.
This is just one example of using the last layer for multiclass classification. You can also use multiple outputs for something else. For instance, you can train an ANN to "compress" your data, that is, to compute a function from N-dimensional to M-dimensional space (with M smaller than N) that minimizes the loss of information. This model is called an autoencoder.
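The equations can be sketched as follows. The symbols are my own notation rather than anything fixed by the book: x is the previous layer's output, \mu_k the mean of class k, \Sigma the shared covariance, and \pi_k the class prior.

```latex
p(y = k \mid x)
  = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma)}
         {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma)}
  = \frac{\exp\left(w_k^\top x + b_k\right)}
         {\sum_{j=1}^{K} \exp\left(w_j^\top x + b_j\right)},
\qquad
w_k = \Sigma^{-1} \mu_k,
\quad
b_k = -\tfrac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k .
```

The quadratic term -\tfrac{1}{2} x^\top \Sigma^{-1} x is the same for every class because the covariance is shared, so it cancels between numerator and denominator. What remains is linear in x, which is exactly a linear layer followed by a softmax.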
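A toy illustration of the compression idea, under strong simplifying assumptions: the data are made-up 2-D points that lie on a line, the "network" is purely linear with a single hidden unit (N = 2, M = 1), and it is trained with plain gradient descent. Real autoencoders usually have nonlinear layers and are trained with a library; all names here are invented for the sketch.

```python
# Made-up 2-D data lying on the line x2 = 2 * x1, so one number per point suffices.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def encode(w, x):
    """Compress a 2-D point to a single code value."""
    return w[0] * x[0] + w[1] * x[1]

def decode(v, z):
    """Reconstruct a 2-D point from the code value."""
    return (v[0] * z, v[1] * z)

def mean_loss(w, v):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        r = decode(v, encode(w, x))
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(data)

def train(steps=500, lr=0.05):
    w = [0.1, 0.1]  # encoder weights (fixed start for reproducibility)
    v = [0.1, 0.1]  # decoder weights
    for _ in range(steps):
        gw = [0.0, 0.0]
        gv = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            r = decode(v, z)
            e = (r[0] - x[0], r[1] - x[1])      # reconstruction error
            ev = e[0] * v[0] + e[1] * v[1]      # error projected back to the code
            for i in range(2):
                gv[i] += 2 * e[i] * z / len(data)
                gw[i] += 2 * ev * x[i] / len(data)
        for i in range(2):
            w[i] -= lr * gw[i]
            v[i] -= lr * gv[i]
    return w, v

initial_loss = mean_loss([0.1, 0.1], [0.1, 0.1])
w, v = train()
final_loss = mean_loss(w, v)
print(initial_loss, final_loss)
```

Note that the network here has two output units, but they are not class scores: they are the reconstructed coordinates, and the training target is the input itself.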