I am trying to profile DenseNet layer by layer in PyTorch, similar to what the caffe time tool does.
First attempt: using autograd.profiler as below
...
model = models.__dict__['densenet121'](pretrained=True)
model.to(device)
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    model.eval()
    print(prof)
...
But no results are shown, only this message:
<unfinished torch.autograd.profile>
Ultimately, I want to profile network architectures (e.g. DenseNet) to see where the bottlenecks are.
Could anyone help with this?
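To get any profiler output, you have to run some operations inside the profiled block, which means feeding an input tensor through your model. model.eval() only switches the model to evaluation mode and runs no operations. Also print the result after the with block has exited; until then the profile is unfinished.
Change your code as follows.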
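import torch
import torchvision.models as models

model = models.densenet121(pretrained=True)
# Dummy input batch: one 3-channel 224x224 image
x = torch.randn((1, 3, 224, 224), requires_grad=True)

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)  # run a forward pass so the profiler has operations to record
print(prof)  # print after the with block, once the profile is complete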
This is a sample of the output I got:
----------------------------------- --------------- --------------- --------------- --------------- ---------------
Name                                       CPU time       CUDA time           Calls       CPU total      CUDA total
----------------------------------- --------------- --------------- --------------- --------------- ---------------
conv2d                                   9976.544us      9972.736us               1      9976.544us      9972.736us
convolution                              9958.778us      9958.400us               1      9958.778us      9958.400us
_convolution                             9946.712us      9947.136us               1      9946.712us      9947.136us
contiguous                                  6.692us         6.976us               1         6.692us         6.976us
empty                                      11.927us        12.032us               1        11.927us        12.032us
mkldnn_convolution                       9880.452us      9889.792us               1      9880.452us      9889.792us
batch_norm                               1214.791us      1213.440us               1      1214.791us      1213.440us
native_batch_norm                        1190.496us      1193.056us               1      1190.496us      1193.056us
threshold_                                158.258us       159.584us               1       158.258us       159.584us
max_pool2d_with_indices                 28837.682us     28836.834us               1     28837.682us     28836.834us
max_pool2d_with_indices_forward         28813.804us     28822.530us               1     28813.804us     28822.530us
batch_norm                               1780.373us      1778.690us               1      1780.373us      1778.690us
native_batch_norm                        1756.774us      1759.327us               1      1756.774us      1759.327us
threshold_                                 64.665us        66.368us               1        64.665us        66.368us
conv2d                                   6103.544us      6102.142us               1      6103.544us      6102.142us
convolution                              6089.946us      6089.600us               1      6089.946us      6089.600us
_convolution                             6076.506us      6076.416us               1      6076.506us      6076.416us
contiguous                                  7.306us         7.938us               1         7.306us         7.938us
empty                                       9.037us         8.194us               1         9.037us         8.194us
mkldnn_convolution                       6015.653us      6021.408us               1      6015.653us      6021.408us
batch_norm                                700.129us       699.394us               1       700.129us       699.394us
There are many rows below this.
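With that many rows, it can be easier to look at an aggregated view. As a rough sketch (not part of the code above): key_averages() on the profiler result merges rows that share an operator name, and table(sort_by=...) orders them by cost. The small nn.Sequential model here is only a stand-in I am assuming so the example runs quickly on CPU without downloading weights; swap in densenet121 and your own input for the real thing.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model (not from the original post); replace it with
# models.densenet121(...) to profile the real network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
model.eval()
x = torch.randn(1, 3, 64, 64)

# CPU-only profiling here; pass use_cuda=True when model and input live on a GPU.
with torch.autograd.profiler.profile() as prof:
    model(x)

# key_averages() groups events by operator name; sorting by total CPU time
# puts the most expensive operators at the top of the table.
averages = prof.key_averages()
summary = averages.table(sort_by="cpu_time_total", row_limit=10)
print(summary)
```

The totals of a parent operator such as conv2d include the time of the operators it calls (convolution, _convolution, ...), so the rows should not be summed.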
I used a (1, 3, 224, 224) tensor because this DenseNet expects 224x224 images. For other networks, change the tensor size to match the input the network expects.
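The profiler reports operators, not layers. If you want something closer to a per-layer listing like caffe time gives, one possible approach is to time each leaf module with forward hooks. This is only a sketch under some assumptions: time_layers is a helper name I made up, the tiny model is a placeholder, and the timings are wall-clock CPU times (on a GPU you would need torch.cuda.synchronize() before reading the clock). Operations called functionally inside forward(), such as F.relu or torch.cat, are not modules and will not show up.

```python
import time
from collections import OrderedDict

import torch
import torch.nn as nn


def time_layers(model, x):
    """Hypothetical helper: map each leaf module name to its forward time in ms."""
    timings = OrderedDict()
    starts = {}
    handles = []

    def make_hooks(name):
        def pre_hook(module, inputs):
            # Record the start time just before the module's forward() runs.
            starts[name] = time.perf_counter()

        def post_hook(module, inputs, output):
            # Accumulate elapsed time in case a module is called more than once.
            elapsed_ms = (time.perf_counter() - starts[name]) * 1e3
            timings[name] = timings.get(name, 0.0) + elapsed_ms

        return pre_hook, post_hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            pre, post = make_hooks(name)
            handles.append(module.register_forward_pre_hook(pre))
            handles.append(module.register_forward_hook(post))

    model.eval()
    with torch.no_grad():
        model(x)

    # Remove the hooks so the model is left unchanged.
    for handle in handles:
        handle.remove()
    return timings


# Placeholder model and input; use your own network and input size instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
timings = time_layers(model, torch.randn(1, 3, 64, 64))
for name, ms in timings.items():
    print(f"{name:10s} {ms:8.3f} ms")
```

The first forward pass often includes one-time setup cost, so running the model a few times before measuring usually gives more representative numbers.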