There are a lot of ways to measure CPU context switching overhead, but there seem to be few resources on measuring GPU context switching overhead. CPU context switching and GPU context switching are quite different.
GPU scheduling is based on warp scheduling. To calculate the overhead of GPU context switching, I need the execution time of a warp with context switching and of a warp without context switching, and then subtract the two to get the overhead.
I am confused about how to measure the time of a warp with context switching. Does anyone have ideas on how to measure it?
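The kind of approach I have been considering is sketched below (the kernel name, launch shapes, and iteration count are just placeholders): timestamp each warp from inside the kernel with clock64(), run once with a single resident warp and once with many resident warps, and subtract the per-warp cycle counts. I am not sure this actually isolates the switching cost, which is part of what I am asking.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each warp records how many SM cycles elapsed between its first and last
// instruction, using the per-SM clock64() counter.
__global__ void timed_warp(long long *cycles, float *data, int iters)
{
    int gtid = blockIdx.x * blockDim.x + threadIdx.x;

    long long start = clock64();          // cycle counter when the warp begins
    float x = data[gtid];
    for (int i = 0; i < iters; ++i)       // arithmetic work to keep the warp busy
        x = x * 1.000001f + 0.5f;
    data[gtid] = x;
    long long stop = clock64();           // cycle counter when the warp finishes

    if ((threadIdx.x & 31) == 0)          // lane 0 of each warp stores the result
        cycles[gtid / 32] = stop - start;
}

// Launch with a given shape and return the elapsed cycles seen by warp 0.
static long long run(int blocks, int threads, int iters)
{
    float *data;
    long long *cycles;
    cudaMalloc((void **)&data, blocks * threads * sizeof(float));
    cudaMalloc((void **)&cycles, blocks * threads / 32 * sizeof(long long));
    cudaMemset(data, 0, blocks * threads * sizeof(float));

    timed_warp<<<blocks, threads>>>(cycles, data, iters);
    cudaDeviceSynchronize();

    long long warp0;
    cudaMemcpy(&warp0, cycles, sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(cycles);
    cudaFree(data);
    return warp0;
}

int main()
{
    const int iters = 100000;
    long long alone   = run(1, 32, iters);    // single resident warp: (almost) no switching
    long long crowded = run(64, 256, iters);  // many resident warps sharing each SM
    printf("warp 0 alone:   %lld cycles\n", alone);
    printf("warp 0 crowded: %lld cycles\n", crowded);
    printf("difference:     %lld cycles\n", crowded - alone);
    return 0;
}
```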
Calculating context switch time: one suitable method is to record, for each process, the timestamp of its first instruction, the timestamp of its last instruction, and its waiting time in the queue. If the total time over which all of the processes ran was T, then: context switch time = T − Σ over all processes (waiting time + execution time).
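As a sketch of that bookkeeping (the struct layout, field names, and sample numbers below are illustrative, not a prescribed API):

```cuda
#include <cstdio>
#include <vector>

// Per-process record, following the bookkeeping described above.
struct ProcRecord {
    double start;   // timestamp of the process's first instruction
    double end;     // timestamp of the process's last instruction
    double wait;    // time the process spent waiting in the queue
};

// T is the total time over which all of the processes ran.
double context_switch_time(const std::vector<ProcRecord> &procs, double T)
{
    double accounted = 0.0;
    for (const ProcRecord &p : procs) {
        double exec = p.end - p.start;     // execution time, per the formula above
        accounted += p.wait + exec;        // waiting time + execution time
    }
    return T - accounted;                  // whatever is unaccounted for is switch time
}

int main()
{
    // Made-up numbers, purely to show the arithmetic.
    std::vector<ProcRecord> procs = {
        {0.0, 5.0, 1.0},
        {6.0, 12.0, 2.0},
    };
    printf("context switch time: %.2f\n", context_switch_time(procs, 15.0));
    return 0;
}
```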
To measure how long it takes to switch between two threads, we need a benchmark that deliberately triggers a context switch while doing as little other work as possible. This measures only the direct cost of the switch; in reality there is also an indirect cost, which can be even larger.
Context switching incurs overhead because of TLB flushes, sharing the cache between multiple tasks, running the task scheduler, etc. Context switching between two threads of the same process is faster than between two different processes, since the threads share the same virtual memory map.
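A minimal sketch of such a benchmark, assuming two threads that hand a turn flag back and forth through a condition variable (the names and iteration count are illustrative); pin both threads to a single core, for example with taskset -c 0, so that each handoff really forces a context switch rather than a cross-core wakeup:

```cuda
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main()
{
    constexpr int kIters = 100000;      // number of handoffs per thread
    std::mutex m;
    std::condition_variable cv;
    bool ping_turn = true;              // whose turn it is

    auto worker = [&](bool i_am_ping) {
        for (int i = 0; i < kIters; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return ping_turn == i_am_ping; });
            ping_turn = !i_am_ping;     // hand the turn to the other thread...
            cv.notify_one();            // ...and wake it, forcing a switch
        }
    };

    auto t0 = std::chrono::steady_clock::now();
    std::thread a(worker, true), b(worker, false);
    a.join();
    b.join();
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    // Each of the 2 * kIters handoffs is roughly one switch, plus the (small)
    // cost of the lock and condition variable themselves, so this is an
    // upper bound on the direct cost.
    printf("~%.0f ns per handoff\n", ns / (2.0 * kIters));
    return 0;
}
```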
I don't think it really makes sense to talk about "overhead" of context switching on a GPU.
On a CPU, context switching is done in software, by a function in the kernel called a "scheduler". The scheduler is ordinary code, a sequence of machine instructions that the processor has to run, and time spent running the scheduler is time not spent doing "useful" work.
A GPU, on the other hand, does context switching in hardware, without a scheduler, and it's fast enough that when one task encounters a pipeline stall, another task can be brought in to utilize the pipeline stages that would otherwise be idle. This is called "latency hiding" — delays in one task are hidden by progress in other tasks. The context switches actually allow more useful work to be done in a given timeframe.
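A small experiment makes this visible (a sketch, assuming CUDA; the kernel name, table size, and launch shapes are illustrative): run the same memory-latency-bound pointer chase once with a single warp and once with 32 warps in one block. The total work grows 32-fold, but the elapsed time grows far less, because stalled warps are simply switched out for warps that are ready to run:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// A memory-latency-bound pointer chase: every load stalls the warp until
// the data arrives, giving the warp scheduler something to hide.
__global__ void chase(const int *next, int steps, int *sink)
{
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * 64;
    for (int i = 0; i < steps; ++i)
        idx = next[idx];
    if (idx == -1)       // never true; keeps the loop from being optimized away
        *sink = idx;
}

static float time_launch(int blocks, int threads, const int *next, int steps, int *sink)
{
    cudaEvent_t beg, end;
    cudaEventCreate(&beg);
    cudaEventCreate(&end);
    cudaEventRecord(beg);
    chase<<<blocks, threads>>>(next, steps, sink);
    cudaEventRecord(end);
    cudaEventSynchronize(end);
    float ms;
    cudaEventElapsedTime(&ms, beg, end);
    cudaEventDestroy(beg);
    cudaEventDestroy(end);
    return ms;
}

int main()
{
    const int N = 1 << 22, steps = 100000;          // 16 MB table, too big for cache
    std::vector<int> h_next(N);
    for (int i = 0; i < N; ++i)
        h_next[i] = (int)(((long long)i * 37 + 11) % N);   // scrambled chain

    int *d_next, *d_sink;
    cudaMalloc((void **)&d_next, N * sizeof(int));
    cudaMalloc((void **)&d_sink, sizeof(int));
    cudaMemcpy(d_next, h_next.data(), N * sizeof(int), cudaMemcpyHostToDevice);

    printf("1 warp   : %.2f ms\n", time_launch(1, 32, d_next, steps, d_sink));
    printf("32 warps : %.2f ms\n", time_launch(1, 1024, d_next, steps, d_sink));

    cudaFree(d_next);
    cudaFree(d_sink);
    return 0;
}
```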
For more information, see this answer I wrote to a related question on SuperUser.