DenseNets tend to consume a lot of memory in TensorFlow because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory usage can be dramatically reduced by sharing allocations. A figure in the paper and its accompanying PyTorch implementation illustrates the shared-memory approach.
How can this be implemented in TensorFlow? If it can't be done from Python, how can it properly be implemented in an op with CPU and GPU support?
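For concreteness, a naive dense block in the TF 1.x API looks roughly like the sketch below (the function name and layer sizes are my own illustration, not from the paper). Every iteration materializes a new, strictly larger tf.concat result in its own buffer, which is where the quadratic memory growth comes from:

import tensorflow as tf  # TF 1.x style API

def naive_dense_block(x, num_layers=4, growth_rate=12):
    # Each layer reads the concatenation of all previous feature maps.
    # Every tf.concat below gets its own allocation, so the block keeps
    # on the order of num_layers^2 feature-map copies alive at once.
    features = [x]
    for _ in range(num_layers):
        inputs = tf.concat(features, axis=-1)  # new buffer each iteration
        out = tf.layers.conv2d(tf.nn.relu(inputs), growth_rate, 3,
                               padding='same')
        features.append(out)
    return tf.concat(features, axis=-1)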
I've created a TensorFlow Feature Request for the necessary allocation functionality.
A memory-efficient implementation is now available at:
https://github.com/joeyearsley/efficient_densenet_tensorflow
The relevant function from the above link is:
# _x is a function that builds one dense layer; wrapping it with
# recompute_grad discards its activations after the forward pass and
# recomputes them during backprop (gradient checkpointing).
_x = tf.contrib.layers.recompute_grad(_x)
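For context, here is a minimal sketch of how recompute_grad can be wired into a dense block (TF 1.x, where tf.contrib is available; the helper names dense_layer and checkpointed_dense_block are my own and are not taken from the linked repository):

import tensorflow as tf  # TF 1.x, provides tf.contrib.layers.recompute_grad

def dense_layer(x, growth_rate=12):
    # One ReLU-Conv layer; variables are created via tf.layers so the
    # recomputation pass reuses them instead of creating new ones.
    return tf.layers.conv2d(tf.nn.relu(x), growth_rate, 3, padding='same')

def checkpointed_dense_block(x, num_layers=4):
    for i in range(num_layers):
        with tf.variable_scope('dense_layer_%d' % i):
            # Wrap the layer function: its activations are freed after the
            # forward pass and recomputed during backprop, trading extra
            # compute for a large reduction in peak memory.
            layer_fn = tf.contrib.layers.recompute_grad(dense_layer)
            out = layer_fn(x)
        x = tf.concat([x, out], axis=-1)
    return x

Note that this is recomputation (gradient checkpointing) rather than the paper's exact shared-allocation scheme: intermediate outputs are not kept for backprop but regenerated on demand, which recovers most of the memory savings in practice.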