
How to reverse max pooling layer in autoencoder to return the original shape in decoder?

I am building an autoencoder to compress images. My input is the MNIST dataset, which contains (28, 28, 1) images, and I want my latent space (the encoded image) to have the shape (10, 10, 1) for a high compression ratio. In the encoder part I don't have any problem, but in the decoder part I can't return the image to the original shape (28, 28, 1).

My code:

#Encoder

from tensorflow import keras
from tensorflow.keras import layers

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((3, 3), padding='same')(x)  # (28, 28) -> (10, 10)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(x)
encoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)

Encoded shape: (10, 10, 1)

#Decoder

x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)  # (10, 10) -> (20, 20)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)

decoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)

Decoded shape: (20, 20, 1), since the single UpSampling2D((2, 2)) only doubles the 10×10 latent to 20×20.
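
These shapes can be confirmed by building models from the tensors above and printing their output shapes (a small sketch, assuming the encoder and decoder code above has already run):

encoder = keras.Model(input_img, encoded)
autoencoder = keras.Model(input_img, decoded)
print(encoder.output_shape)       # (None, 10, 10, 1)
print(autoencoder.output_shape)   # (None, 20, 20, 1) instead of (None, 28, 28, 1)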

How can I return the image to the original shape?

asked Jan 29 '26 08:01 by mohammad

1 Answer

There are multiple ways to upscale a 2D tensor, or alternatively, to project a smaller vector into a larger one.

Here's a non-exhaustive list:

  • Apply one or a couple of upsampling layers, followed by a flatten layer and then a linear layer. Upsampling basically applies standard image upscaling algorithms to increase the size of your image. You then flatten it so that a linear layer can be applied to it, which lets you reach the precise shape you require.
  • Skip the upscaling altogether and just apply a flatten followed by a projection layer (see the sketch after this list). For MNIST this will suffice. For more complex datasets you want the previous suggestion, interspersed with convolutional blocks, to help improve your model's capacity and reconstruction ability.
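
For the second option, a minimal sketch (my own illustration, not code from the question) that maps the (10, 10, 1) latent straight back to (28, 28, 1) with a Flatten, a Dense projection to 28 * 28 = 784 units, and a Reshape; the sigmoid output assumes pixel values scaled to [0, 1]:

from tensorflow import keras
from tensorflow.keras import layers

latent = keras.Input(shape=(10, 10, 1))             # the encoded image
x = layers.Flatten()(latent)                        # (batch, 100)
x = layers.Dense(28 * 28, activation='sigmoid')(x)  # project to 784 units
decoded = layers.Reshape((28, 28, 1))(x)            # back to the original shape
decoder = keras.Model(latent, decoded)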

I can see that you have already attempted the UpSampling + Conv direction. What you want to do next is apply a flatten layer, followed by a projection layer with 784 output units (28 × 28 × 1 = 784), before reshaping into (batch, 28, 28, 1) again to get what you need.
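
Applied to the decoder from the question, that would look roughly like the sketch below (assuming the encoder code above, so input_img and encoded are already defined; the Dense size is 28 * 28 * 1 = 784):

x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)  # (10, 10) -> (20, 20)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.Flatten()(x)                          # (batch, 20 * 20 * 64)
x = layers.Dense(28 * 28, activation='relu')(x)  # project to 784 units
decoded = layers.Reshape((28, 28, 1))(x)         # (28, 28, 1) again

autoencoder = keras.Model(input_img, decoded)
autoencoder.summary()  # output shape should now be (None, 28, 28, 1)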

answered Jan 31 '26 23:01 by AntreasAntoniou


