 

DeepMind Deep Q-Network (DQN) 3D Convolution

I was reading the DeepMind Nature paper on the DQN network. I understood almost everything about it except one thing. I don't know why no one has asked this question before, but it seems a little odd to me anyway.

My question: the input to the DQN is an 84*84*4 image. The first convolution layer consists of 32 filters of 8*8 with stride 4. I want to know what the result of this convolution phase is exactly. I mean, the input is 3D, but we have 32 filters which are all 2D. How does the third dimension (which corresponds to the last 4 frames of the game) take part in the convolution?

Any ideas? Thanks Amin

asked by donamin

1 Answer

You can think of the third dimension (representing the last four frames) as input channels to the network.

A similar scenario occurs when you combine the three RGB channels of an image into a single greyscale representation. In this case you perform the convolution for each channel separately and sum the contributions to give the final output feature map. So each 8*8 filter effectively has one kernel per input channel, and the per-channel results are added together.
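To make this concrete, here is a minimal sketch using PyTorch (my choice for illustration; it is not the library DeepMind used). It shows that the layer's weight tensor is 32*4*8*8, so each "2D" filter really spans all 4 stacked frames, and that the layer's output equals the sum of per-channel 2D convolutions:

```python
# Minimal sketch (assumes PyTorch): first DQN conv layer over a 4 x 84 x 84 input.
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 4, 84, 84)  # batch of one stacked-frame observation

conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4, bias=False)
print(conv.weight.shape)   # torch.Size([32, 4, 8, 8]) -> each filter is 8x8 per channel
print(conv(x).shape)       # torch.Size([1, 32, 20, 20])

# Reproduce the first output feature map by convolving each channel
# separately with its own 8x8 kernel and summing the contributions.
manual = sum(
    F.conv2d(x[:, c:c + 1], conv.weight[0:1, c:c + 1], stride=4)
    for c in range(4)
)
print(torch.allclose(manual, conv(x)[:, 0:1], atol=1e-5))  # True
```

The spatial output size follows from (84 - 8) / 4 + 1 = 20, giving a 20*20*32 result for this layer.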

The DeepMind authors cite the paper "What is the Best Multi-Stage Architecture for Object Recognition?" (Jarrett et al., 2009), which may provide a better explanation.

answered by John Wakefield

