I am working in a machine learning problem and want to build neural network based classifiers on it in matlab. One problem is that the data is given in the form of features and number of samples is considerably lower. I know about data augmentation techniques for images, by rotating, translating, affine translation, etc.
I would like to know whether there are data augmentation techniques available for general datasets ? Like is it possible to use randomness to generate more data ? I read the answer here but I did not understand it.
Kindly please provide answers with the working details if possible.
Any help will be appreciated.
Data augmentation is the technique of increasing the size of data used for training a model. For reliable predictions, the deep learning models often require a lot of training data, which is not always available. Therefore, the existing data is augmented in order to make a better generalized model.
Two types of augmentation: augmented execution and/or augmented evaluation applied to Augmented Reality (Target of the task = Real World) and Augmented Virtuality (Target of the task = Computer).
This is called “data augmentation.” For example, say you have twenty images of ducks in your image classification dataset. By creating copies of your duck images and flipping them horizontally, you have doubled the training examples for the “duck” class.
You need to look into autoencoders. Effectively you pass your data into a low level neural network, it applies a PCA-like analysis, and you can subsequently use it to generate more data.
Matlab has an autoencoder class as well as a function, that will do all of this for you. From the matlab help files
Generate the training data.
rng(0,'twister'); % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);
Train autoencoder using the training data.
hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
'EncoderTransferFunction','satlin',...
'DecoderTransferFunction','purelin',...
'L2WeightRegularization',0.01,...
'SparsityRegularization',4,...
'SparsityProportion',0.10);
Generate the test data.
n = 1000;
r = sort(-10 + 20*rand(n,1));
xtest = 1 + r*5e-2 + sin(r)./r + 0.4*randn(n,1);
Predict the test data using the trained autoencoder, autoenc .
xReconstructed = predict(autoenc,xtest');
Plot the actual test data and the predictions.
figure;
plot(xtest,'r.');
hold on
plot(xReconstructed,'go');
You can see the green cicrles which represent additional data generated with the auto-encoder.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With