I have created an Auto Encoder Neural Network in MATLAB. I have quite large inputs at the first layer which I have to reconstruct through the network's output layer. I cannot use the large inputs as it is,so I convert it to between [0, 1] using sigmf
function of MATLAB. It gives me a values of 1.000000 for all the large values. I have tried using setting the format but it does not help.
Is there a workaround to using large values with my auto encoder?
Bottleneck: It is the lower dimensional hidden layer where the encoding is produced. The bottleneck layer has a lower number of nodes and the number of nodes in the bottleneck layer also gives the dimension of the encoding of the input.
Lossy compression: The output of the autoencoder is not exactly the same as the input, it is a close but degraded representation. For lossless compression, they are not the way to go. Data-specific: Autoencoders are only able to meaningfully compress data similar to what they have been trained on.
Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them.
The process of convert your inputs to the range [0,1] is called normalization, however, as you noticed, the sigmf function is not adequate for this task. This link maybe is useful to you.
Suposse that your inputs are given by a matrix of N rows and M columns, where each row represent an input pattern and each column is a feature. If your first column is:
vec =
-0.1941
-2.1384
-0.8396
1.3546
-1.0722
Then you can convert it to the range [0,1] using:
%# get max and min
maxVec = max(vec);
minVec = min(vec);
%# normalize to -1...1
vecNormalized = ((vec-minVec)./(maxVec-minVec))
vecNormalized =
0.5566
0
0.3718
1.0000
0.3052
As @Dan indicates in the comments, another option is to standarize the data. The goal of this process is to scale the inputs to have mean 0 and a variance of 1. In this case, you need to substract the mean value of the column and divide by the standard deviation:
meanVec = mean(vec);
stdVec = std(vec);
vecStandarized = (vec-meanVec)./ stdVec
vecStandarized =
0.2981
-1.2121
-0.2032
1.5011
-0.3839
Before I give you my answer, let's think a bit about the rationale behind an auto-encoder (AE):
The purpose of auto-encoder is to learn, in an unsupervised manner, something about the underlying structure of the input data. How does AE achieves this goal? If it manages to reconstruct the input signal from its output signal (that is usually of lower dimension) it means that it did not lost information and it effectively managed to learn a more compact representation.
In most examples, it is assumed, for simplicity, that both input signal and output signal ranges in [0..1]. Therefore, the same non-linearity (sigmf
) is applied both for obtaining the output signal and for reconstructing back the inputs from the outputs.
Something like
output = sigmf( W*input + b ); % compute output signal
reconstruct = sigmf( W'*output + b_prime ); % notice the different constant b_prime
Then the AE learning stage tries to minimize the training error || output - reconstruct ||
.
However, who said the reconstruction non-linearity must be identical to the one used for computing the output?
In your case, the assumption that inputs ranges in [0..1] does not hold. Therefore, it seems that you need to use a different non-linearity for the reconstruction. You should pick one that agrees with the actual range of you inputs.
If, for example, your input ranges in (0..inf) you may consider using exp
or ().^2
as the reconstruction non-linearity. You may use polynomials of various degrees, log
or whatever function you think may fit the spread of your input data.
Disclaimer: I never actually encountered such a case and have not seen this type of solution in literature. However, I believe it makes sense and at least worth trying.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With