 

Data augmentation techniques for general datasets?

I am working on a machine learning problem and want to build neural-network-based classifiers for it in MATLAB. One problem is that the data is given in the form of feature vectors, and the number of samples is considerably low. I know about data augmentation techniques for images, such as rotation, translation, affine transformation, etc.

I would like to know whether there are data augmentation techniques available for general datasets. For example, is it possible to use randomness to generate more data? I read the answer here but did not understand it.

Kindly provide answers with working details if possible.

Any help will be appreciated.
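For what it's worth, the "use randomness" idea can be sketched for tabular feature vectors as adding small Gaussian noise to each feature (a minimal illustration in Python; `jitter` and its parameters are my own names, not a library API):

```python
import random

def jitter(dataset, copies=2, sigma=0.05, seed=0):
    """Return the original rows plus `copies` noisy duplicates of each row."""
    rng = random.Random(seed)
    out = [list(row) for row in dataset]
    for _ in range(copies):
        for row in dataset:
            # Perturb every feature with zero-mean Gaussian noise.
            out.append([x + rng.gauss(0, sigma) for x in row])
    return out

features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
augmented = jitter(features)
print(len(augmented))  # 6 rows: 2 originals + 2 noisy copies of each
```

This only makes sense when small perturbations plausibly stay within the same class; `sigma` has to be tuned per feature scale.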

roni asked Sep 01 '16


People also ask

What are data augmentation techniques?

Data augmentation is the technique of increasing the size of the data used to train a model. For reliable predictions, deep learning models often require a lot of training data, which is not always available. Therefore, the existing data is augmented in order to produce a better-generalized model.

What types of augmentation are possible?

Two types of augmentation are possible: augmented execution and/or augmented evaluation, applied either to Augmented Reality (where the target of the task is the real world) or to Augmented Virtuality (where the target of the task is the computer).

What is data augmentation Can you give some examples?

This is called “data augmentation.” For example, say you have twenty images of ducks in your image classification dataset. By creating copies of your duck images and flipping them horizontally, you have doubled the training examples for the “duck” class.
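The horizontal-flip example above can be sketched without any image library by treating an image as a nested list of pixel values (a toy illustration; `hflip` is my own name):

```python
def hflip(img):
    """Mirror each row of a 2-D pixel grid left-to-right."""
    return [row[::-1] for row in img]

duck = [[1, 2, 3],
        [4, 5, 6]]
flipped = hflip(duck)      # [[3, 2, 1], [6, 5, 4]]
dataset = [duck, flipped]  # the "duck" examples are now doubled
```

Flipping is an involution, so applying it twice recovers the original image.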


1 Answer

You need to look into autoencoders. Effectively, you pass your data through a neural network with a low-dimensional hidden layer; it performs a PCA-like analysis, and you can subsequently use it to generate more data.
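The encode-perturb-decode idea behind this can be sketched in pure Python with a literal PCA-like projection standing in for the autoencoder (an illustration only; `augment_pca` and its parameters are my own, not the MATLAB API):

```python
import random

def augment_pca(data, n_new, noise=0.1, seed=0):
    """Generate n_new synthetic points by encoding rows onto the leading
    principal direction, jittering the 1-D code, and decoding back."""
    rng = random.Random(seed)
    d, n = len(data[0]), len(data)
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]

    # Power iteration for the leading principal direction v.
    v = [1.0] * d
    for _ in range(100):
        w = [0.0] * d
        for row in centered:
            proj = sum(row[j] * v[j] for j in range(d))
            for j in range(d):
                w[j] += proj * row[j] / n
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]

    out = []
    for _ in range(n_new):
        row = rng.choice(centered)
        code = sum(row[j] * v[j] for j in range(d))  # encode: 1-D latent code
        code += rng.gauss(0, noise)                  # perturb in latent space
        out.append([mean[j] + code * v[j] for j in range(d)])  # decode
    return out

demo = [[float(i), 2.0 * i] for i in range(-5, 6)]
extra = augment_pca(demo, n_new=5)
print(len(extra))  # 5 synthetic points on the data's principal subspace
```

A trained autoencoder replaces the linear projection with a nonlinear encoder/decoder pair, but the generation step is the same: perturb the code, then decode.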

MATLAB has an Autoencoder class as well as a trainAutoencoder function that will do all of this for you. From the MATLAB help files:

Generate the training data.

rng(0,'twister'); % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);

Train autoencoder using the training data.

hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
        'EncoderTransferFunction','satlin',...
        'DecoderTransferFunction','purelin',...
        'L2WeightRegularization',0.01,...
        'SparsityRegularization',4,...
        'SparsityProportion',0.10);

Generate the test data.

n = 1000;
r = sort(-10 + 20*rand(n,1));
xtest = 1 + r*5e-2 + sin(r)./r + 0.4*randn(n,1);

Predict the test data using the trained autoencoder, autoenc.

xReconstructed = predict(autoenc,xtest');

Plot the actual test data and the predictions.

figure;
plot(xtest,'r.');
hold on
plot(xReconstructed,'go');

[Figure: actual test data (red dots) overlaid with the autoencoder's reconstructions (green circles)]

You can see the green circles, which represent the additional data generated by the autoencoder.

zhqiat answered Sep 21 '22