I have a project where I am doing regression with Gradient Boosted Trees (GBT) on tabular data. I want to see whether a denoising autoencoder (DAE) can learn a better representation of my original data and improve my GBT scores. The inspiration comes from the popular Kaggle winner here.
AFAIK I have two main choices for extracting the DAE's activations: building a bottleneck architecture and taking the activations of the single middle layer, or concatenating every layer's activations as the representation.
Let's assume I want all layer activations from the 3x 512 node layers below:
# tf.keras imports (adjust if using standalone Keras)
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ReduceLROnPlateau

inputs = Input(shape=(31,))
encoded = Dense(512, activation='relu')(inputs)
encoded = Dense(512, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(encoded)
decoded = Dense(31, activation='linear')(decoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='Adam', loss='mse')

# reduce_lr is defined elsewhere in my code, e.g.:
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)

history = autoencoder.fit(x_train_noisy, x_train_clean,
                          epochs=100,
                          batch_size=128,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test_clean),
                          callbacks=[reduce_lr])
My questions are:
Taking the activations of the above will give me a new representation of x_train, right? Should I repeat this process for x_test? I need both to train my GBT model.
How can I do inference? Each new data point will need to be "converted" into this new representation format. How can I do that with Keras?
Do I actually need to provide validation_data= to .fit in this situation?
Taking the activations of the above will give me a new representation of x_train, right? Should I repeat this process for x_test? I need both to train my GBT model.
Of course, you need the denoised representation for both the training and the test data, because the GBT model that you train afterwards only accepts the denoised features as input.
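For instance, a minimal sketch of the downstream step (lightgbm and the y_train target are my assumptions, not part of the original post; any GBT library works the same way):

import lightgbm as lgb

# Transform BOTH splits with the same trained model, then train/evaluate the GBT
x_train_repr = autoencoder.predict(x_train_clean)   # or an encoder-only model, see below
x_test_repr = autoencoder.predict(x_test_clean)

gbt = lgb.LGBMRegressor()
gbt.fit(x_train_repr, y_train)          # y_train: the regression target (assumed name)
test_pred = gbt.predict(x_test_repr)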
How can I do inference? Each new data point will need to be "converted" into this new representation format. How can I do that with Keras?
If you want to use the denoised/reconstructed features, you can directly use autoencoder.predict(X_feat) to extract them. If you want to use the middle layer, you need to build a new model, encoder_only = Model(inputs, encoded), first and use it for feature extraction.
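As a concrete sketch of both extraction options (the layer indices, the Concatenate variant, and the x_new name are my additions; Model(inputs, encoded) from the answer is equivalent to using layers[2].output below):

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Concatenate

# Option A: middle-layer representation -- the output of the second 512-unit
# Dense layer (layers[0] is the Input layer, layers[1..3] are the 512-unit layers)
encoder_only = Model(autoencoder.input, autoencoder.layers[2].output)
x_train_repr = encoder_only.predict(x_train_clean)
x_test_repr = encoder_only.predict(x_test_clean)
x_new_repr = encoder_only.predict(x_new)   # inference on new data points (x_new assumed)

# Option B: concatenate the activations of all three 512-unit layers (3 * 512 = 1536 features)
hidden = [autoencoder.layers[i].output for i in (1, 2, 3)]
all_layers = Model(autoencoder.input, Concatenate()(hidden))
x_train_repr_all = all_layers.predict(x_train_clean)
x_test_repr_all = all_layers.predict(x_test_clean)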
Do I actually need to provide validation_data= to .fit in this situation?
You'd better set aside some of the training data for validation to prevent overfitting. However, you can always train multiple models, e.g. in a leave-one-out fashion, so that all of the data is used, and combine them as an ensemble.
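A sketch of that ensemble idea using K-fold splits (scikit-learn's KFold, 5 folds, and averaging the fold models' outputs are my choices, not something specified in the answer):

import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def build_autoencoder():
    # same architecture as in the question
    inputs = Input(shape=(31,))
    h = Dense(512, activation='relu')(inputs)
    h = Dense(512, activation='relu')(h)
    h = Dense(512, activation='relu')(h)
    out = Dense(31, activation='linear')(h)
    model = Model(inputs, out)
    model.compile(optimizer='adam', loss='mse')
    return model

fold_models = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(x_train_noisy):
    ae = build_autoencoder()
    ae.fit(x_train_noisy[train_idx], x_train_clean[train_idx],
           epochs=100, batch_size=128, shuffle=True,
           validation_data=(x_train_noisy[val_idx], x_train_clean[val_idx]))
    fold_models.append(ae)

# Average the fold models' outputs (reconstructions or encoder activations)
x_train_repr = np.mean([m.predict(x_train_clean) for m in fold_models], axis=0)
x_test_repr = np.mean([m.predict(x_test_clean) for m in fold_models], axis=0)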
Additional remark: DropOut.
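If that remark refers to injecting the noise with dropout (a common trick in Kaggle-style DAEs), you can put the corruption inside the model instead of pre-computing x_train_noisy; the rate of 0.1 below is an illustrative assumption:

from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model

inputs = Input(shape=(31,))
corrupted = Dropout(0.1)(inputs)   # randomly zero out ~10% of the input features during training
h = Dense(512, activation='relu')(corrupted)
h = Dense(512, activation='relu')(h)
h = Dense(512, activation='relu')(h)
outputs = Dense(31, activation='linear')(h)

dae = Model(inputs, outputs)
dae.compile(optimizer='adam', loss='mse')
# Targets are the clean inputs; the Dropout layer supplies the "noise" at train
# time and is automatically disabled at inference time
dae.fit(x_train_clean, x_train_clean, epochs=100, batch_size=128, shuffle=True)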
A denoising autoencoder is a model that can help remove noise from noisy data. As training data we use our train data, with the same data as the target.
The model you are describing above is not a denoising autoencoder model. In an autoencoder, the number of units should gradually decrease from layer to layer in the encoder part and then gradually increase again in the decoder part.
A simple autoencoder model should look like this:
input = Input(shape=(31,))

# Encoder: units gradually decrease towards the bottleneck
encoded = Dense(128, activation='relu')(input)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)

# Decoder: units gradually increase back to the input dimension
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(31, activation='sigmoid')(decoded)  # sigmoid outputs lie in (0, 1), so scale the features accordingly

autoencoder = Model(input, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.fit(x_train_noisy, x_train_noisy,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test_noisy, x_test_noisy))
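To then feed the learned representation to the GBT as in the question, the same encoder-only trick from the first answer applies; here the 32-unit layer is the bottleneck (this extraction step is my addition, and x_train_clean/x_test_clean are the feature matrices from the question):

# Encoder-only model that maps the 31 raw features to the 32-dim bottleneck
encoder = Model(input, encoded)   # 'encoded' refers to the 32-unit layer above
x_train_repr = encoder.predict(x_train_clean)
x_test_repr = encoder.predict(x_test_clean)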