
How to reverse max pooling layer in autoencoder to return the original shape in decoder?

I am building an autoencoder to compress images. My input is the MNIST dataset, which contains (28, 28, 1) images, and I want my latent space (the encoded image) to have the shape (10, 10, 1) for a high compression ratio. In the encoder part I don't have any problem, but in the decoder part I can't return the image to the original shape (28, 28, 1).

My code:

#Encoder

from tensorflow import keras
from tensorflow.keras import layers

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((3, 3), padding='same')(x)  # (28, 28) -> (10, 10)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(x)
encoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)

Encoded shape: (10, 10, 1)

#Decoder

x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)  # (10, 10) -> (20, 20)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)

decoded = layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)

Decoded shape: (20, 20, 1), since the single UpSampling2D((2, 2)) only doubles the 10×10 latent to 20×20.
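
These shapes can be confirmed by building models from the tensors above and printing their output shapes (a small sketch, assuming the encoder and decoder code above has already run):

encoder = keras.Model(input_img, encoded)
autoencoder = keras.Model(input_img, decoded)
print(encoder.output_shape)       # (None, 10, 10, 1)
print(autoencoder.output_shape)   # (None, 20, 20, 1) instead of (None, 28, 28, 1)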

How can I return the image to the original shape?

asked Jan 29 '26 08:01 by mohammad

1 Answer

There are multiple ways to upscale a 2D tensor, or alternatively, to project a smaller vector into a larger one.

Here's a non-exhaustive list:

  • Apply one or a couple of upsampling layers, followed by a flatten layer and then a linear layer. Upsampling basically applies standard image upscaling algorithms to increase the size of your image. You then flatten it so that a linear layer can be applied to it, which lets you reach the precise shape you require.
  • Skip the upscaling altogether and just apply a flatten followed by a projection layer (see the sketch after this list). For MNIST this will suffice. For more complex datasets you want the previous suggestion, interspersed with convolutional blocks, to help improve your model's capacity and reconstruction ability.
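
For the second option, a minimal sketch (my own illustration, not code from the question) that maps the (10, 10, 1) latent straight back to (28, 28, 1) with a Flatten, a Dense projection to 28 * 28 = 784 units, and a Reshape; the sigmoid output assumes pixel values scaled to [0, 1]:

from tensorflow import keras
from tensorflow.keras import layers

latent = keras.Input(shape=(10, 10, 1))             # the encoded image
x = layers.Flatten()(latent)                        # (batch, 100)
x = layers.Dense(28 * 28, activation='sigmoid')(x)  # project to 784 units
decoded = layers.Reshape((28, 28, 1))(x)            # back to the original shape
decoder = keras.Model(latent, decoded)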

I can see that you have already attempted the UpSampling + Conv direction. What you want to do next is apply a flatten layer, followed by a projection layer with 784 output units (28 × 28 × 1 = 784), before reshaping into (batch, 28, 28, 1) again to get what you need.
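
Applied to the decoder from the question, that would look roughly like the sketch below (assuming the encoder code above, so input_img and encoded are already defined; the Dense size is 28 * 28 * 1 = 784):

x = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2), interpolation="bilinear")(x)  # (10, 10) -> (20, 20)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.Flatten()(x)                          # (batch, 20 * 20 * 64)
x = layers.Dense(28 * 28, activation='relu')(x)  # project to 784 units
decoded = layers.Reshape((28, 28, 1))(x)         # (28, 28, 1) again

autoencoder = keras.Model(input_img, decoded)
autoencoder.summary()  # output shape should now be (None, 28, 28, 1)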

answered Jan 31 '26 23:01 by AntreasAntoniou


