Hello, I am trying to reproduce the profiler example from the official PyTorch tutorial. I want to export the stacks of a model's forward pass.
The stack files are created, but they are empty.
import torch
from torch import profiler
from torchvision.models import resnet18

model = resnet18().cuda()
inputs = torch.rand(5, 3, 224, 224).cuda()

with profiler.profile(
    activities=[profiler.ProfilerActivity.CPU,
                profiler.ProfilerActivity.CUDA],
    with_stack=True,
) as p:
    model(inputs)

p.export_stacks(
    f"/tmp/profiler/stacks_cpu.txt", "self_cpu_time_total")
p.export_stacks(
    f"/tmp/profiler/stacks_cuda.txt", "self_cuda_time_total")
Note: I reproduced it on the bare docker image.
The strange thing is that when I print the table from my script, I can see the trace. Here is the exact snippet I use for that; I put it right after the code above.
print(p.key_averages(group_by_stack_n=5).table(
    sort_by="self_cuda_time_total", row_limit=2))
Update: The printing option does not work either. It prints the table but not the stacks. With a debugger I stepped into the function _build_table in module torch.autograd.profiler_util. On line 794, the stacks variable is an empty list. (_build_table is called by the table method in the code snippet above.)
Also, in the key_averages method of the class EventList (which is called by key_averages of the profiler class used in the code snippet), each event has an empty stacks attribute on line 298.
So the question is: why are the stacks not filled in for those events? I will investigate further.
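To make that check repeatable without a debugger, here is a minimal sketch of a helper that counts how many events carry a non-empty stack. The helper name count_events_with_stack is made up for illustration, and it assumes each event exposes its frames as a list attribute named stack (my understanding of torch's FunctionEvent; adjust the attribute name if your version differs). The demo uses stand-in objects so it runs without torch.

```python
from types import SimpleNamespace


def count_events_with_stack(events):
    """Return (events_with_stack, total_events) for an iterable of events."""
    total = 0
    with_stack = 0
    for evt in events:
        total += 1
        # Assumption: the frames live in a list attribute called `stack`.
        # A missing attribute, None, or an empty list all count as "no stack".
        if getattr(evt, "stack", None):
            with_stack += 1
    return with_stack, total


if __name__ == "__main__":
    # Stand-in events so the sketch runs anywhere; these are not real
    # profiler output.
    fake_events = [
        SimpleNamespace(name="aten::add", stack=[]),
        SimpleNamespace(name="aten::conv2d", stack=["model.py(10): forward"]),
    ]
    print(count_events_with_stack(fake_events))  # (1, 2)
```

With a real profiler run, passing p.events() instead of the stand-in list should work; a result whose first number is 0 matches the symptom described here.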
On the PyTorch repo there is an issue, #100253, that answers the final question of my post. Please read the issue for more details.
In brief, there was a bug in the profiler in torch version 2.0.0. Their example is simpler: they profile an addition of two tensors. Their investigation matches mine and reaches the same conclusion: the stacks are not filled in by the profiler because the events have an empty stacks attribute. Their investigation is more targeted, because they compared two versions of torch (1.13.0 vs. 2.0.0) and found that the number of events is not the same. The profiler's tracing is done in C++, so I cannot investigate further.
The current workaround is to go back to torch 1.13.0 while waiting for the fix.
Edit: See Ben's comment and the GitHub issue: to get all the information, we should add experimental_config. My own use of it revealed some other problems, notably when using the Kineto traces with HTA. But these problems are outside the scope of this SO post.
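For reference, here is a minimal sketch of what passing experimental_config can look like. It assumes torch 2.x and uses torch._C._profiler._ExperimentalConfig(verbose=True), which is a private API and may change between versions. To keep the sketch small and runnable without a GPU or torchvision, it profiles a tiny nn.Linear on CPU only instead of resnet18; the helper name profile_with_stacks is made up for illustration.

```python
import os
import tempfile

import torch
from torch import profiler


def profile_with_stacks(model, inputs, out_path):
    """Profile one forward pass and export the CPU stacks to out_path."""
    with profiler.profile(
        activities=[profiler.ProfilerActivity.CPU],
        with_stack=True,
        # Private API (assumption: torch 2.x); verbose=True is the setting
        # I understand the workaround to rely on.
        experimental_config=torch._C._profiler._ExperimentalConfig(verbose=True),
    ) as p:
        model(inputs)
    p.export_stacks(out_path, "self_cpu_time_total")
    return p


if __name__ == "__main__":
    # Tiny stand-in model; swap in resnet18 and CUDA activities as needed.
    model = torch.nn.Linear(8, 4)
    inputs = torch.rand(2, 8)
    out_path = os.path.join(tempfile.mkdtemp(), "stacks_cpu.txt")
    profile_with_stacks(model, inputs, out_path)
    print(os.path.getsize(out_path), "bytes written")
```

If the workaround applies to your version, the exported file should no longer be empty; I have not checked this on every release.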
Thanks to the person who brought this issue on torch repo and thanks to the maintainers of torch!