I have a kernel that performs poorly on CC 3.0 (Kepler) compared with CC 2.0 (Fermi). In the Nsight profiler, the Warp Issue Efficiency chart shows that 60% of the time there were no eligible warps, and the Issue Stall Reasons chart shows that 60% of those stalls are attributed to "Other".
I'm wondering what the Other issue stall reasons are and what I might do to reduce them.
CUDA 5.0 / Nsight 3.0 RC / CC 3.0.
In the Nsight Visual Studio Edition 3.0 CUDA profiler, the Issue Efficiency experiment displays a pie chart of the warp stall reasons. The stall reasons are Instruction Fetch, Execution Dependency, Data Requests, Texture, Synchronization, and Other.
For Compute Capability 3.* devices the Other category is the percentage of time that active warps are stalled due to the following reasons:
For Compute Capability 5.* and 6.* devices the Other category is the percentage of time that active warps are stalled due to the following reasons:
For 5.* and 6.* devices, especially GP100, the last reason can be very high (~75%) if the kernel reaches 32 warps per warp scheduler.
These stall reasons are grouped into the Other category because it is hard to identify actions that a developer can take to resolve them.
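To check whether a kernel actually gets near the 32 warps per warp scheduler mentioned above, you can divide the resident warps per SM by the number of warp schedulers per SM. The sketch below does that arithmetic on the host. The scheduler counts are assumptions taken from NVIDIA's architecture whitepapers (4 per SM for Kepler, Maxwell and GP10x, 2 for GP100), not values reported by the profiler or by `cudaDeviceProp`. The helper names `schedulersPerSM` and `warpsPerScheduler` are hypothetical. Under those assumptions, and given the 64-warp-per-SM limit of these architectures, only GP100 (CC 6.0) can reach 32 warps per scheduler. The others top out at 16.

```cpp
#include <cassert>

// Hypothetical helper. The warp scheduler counts are ASSUMED from NVIDIA
// architecture whitepapers; they are not queryable through cudaDeviceProp.
// Returns -1 for compute capabilities this sketch does not cover.
int schedulersPerSM(int major, int minor) {
    if (major == 6 && minor == 0) return 2;               // GP100: two processing blocks per SM
    if (major == 3 || major == 5 || major == 6) return 4; // Kepler SMX, Maxwell SMM, GP10x
    return -1;
}

// Hypothetical helper: average number of resident warps per warp scheduler,
// given how many blocks of `threadsPerBlock` threads are resident on one SM.
// Returns -1.0 if the scheduler count for this compute capability is unknown.
double warpsPerScheduler(int activeBlocksPerSM, int threadsPerBlock,
                         int major, int minor) {
    assert(activeBlocksPerSM >= 0 && threadsPerBlock > 0);
    int warpsPerBlock = (threadsPerBlock + 31) / 32;  // a partial warp still occupies a warp slot
    int schedulers = schedulersPerSM(major, minor);
    if (schedulers < 0) return -1.0;
    return static_cast<double>(activeBlocksPerSM * warpsPerBlock) / schedulers;
}
```

In a CUDA program, `activeBlocksPerSM` could come from `cudaOccupancyMaxActiveBlocksPerMultiprocessor` (available from CUDA 6.5, so not in the CUDA 5.0 used in the question), and `major`/`minor` from `cudaGetDeviceProperties`. For example, two resident 1024-thread blocks give 64 warps per SM, which this sketch reports as 32 per scheduler on GP100 and 16 on a CC 3.0 device.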