Thread synchronization with syncwarp

Tags:

cuda

Apart from the __syncthreads() function, which synchronizes all threads within a thread block, there's another function called __syncwarp(). What exactly does this function do?

The CUDA Programming Guide says:

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.
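The store, __syncwarp(), read pattern described in the quoted passage can be sketched roughly as follows. This is a minimal illustration rather than code from the programming guide: the kernel name rotate_within_warp, the host helper run_rotate_demo, and the constants kWarpSize and kFullMask are hypothetical names chosen for the example, and it assumes a single block of exactly 32 threads on a device supported by CUDA 9 or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;                 // assumption: one full warp per block
constexpr unsigned kFullMask = 0xffffffffu;   // names all 32 lanes

// Each lane publishes a value in shared memory, then reads its neighbour's value.
__global__ void rotate_within_warp(int *out)
{
    __shared__ int slot[kWarpSize];
    const int lane = threadIdx.x % kWarpSize;

    slot[lane] = lane * 10;                    // 1. store to memory
    __syncwarp(kFullMask);                     // 2. wait for all named lanes; orders memory
    out[lane] = slot[(lane + 1) % kWarpSize];  // 3. read a value stored by another lane
}

// Hypothetical host helper: launches one warp and checks the result.
bool run_rotate_demo()
{
    int *d_out = nullptr;
    int h_out[kWarpSize];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;

    rotate_within_warp<<<1, kWarpSize>>>(d_out);
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    for (int lane = 0; ok && lane < kWarpSize; ++lane)
        ok = (h_out[lane] == ((lane + 1) % kWarpSize) * 10);
    return ok;
}

int main()
{
    const bool ok = run_rotate_demo();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Without the __syncwarp() call, nothing guarantees that the neighbouring lane's store is visible before the read in step 3.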

So does this mean that this function synchronizes the threads of a warp that are named in the mask? If so, do we need such synchronization among threads of the same warp, since they are all guaranteed to execute in lockstep?


asked Sep 28 '17 10:09

BAdhi

1 Answer

Yes. __syncwarp() was introduced in CUDA 9, and it synchronizes the threads of a warp that are named in the mask. It is useful when a warp has diverged, and it matters in particular on the Volta architecture, where threads within a warp can be scheduled independently, so lockstep execution is no longer guaranteed.
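As a rough sketch of the divergent case, the kernel below sends even and odd lanes down different branches and then calls __syncwarp() before any lane reads what its partner wrote. The names exchange_after_divergence, run_divergence_demo, and kWarpSize are hypothetical, the launch assumes a single block of 32 threads, and the compiler is free to compile such a small branch without real divergence, so treat it as an illustration of where the barrier goes rather than a demonstration of a failure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // assumption: one full warp per block

__global__ void exchange_after_divergence(int *out)
{
    __shared__ int slot[kWarpSize];
    const int lane = threadIdx.x % kWarpSize;

    // The two halves of the warp take different paths and may be
    // scheduled separately on Volta and later architectures.
    if (lane % 2 == 0) {
        slot[lane] = lane + 100;   // even lanes
    } else {
        slot[lane] = lane + 200;   // odd lanes
    }

    __syncwarp();                  // default mask 0xffffffff: all 32 lanes must arrive
    out[lane] = slot[lane ^ 1];    // read the value written by the partner lane
}

// Hypothetical host helper: launches one warp and checks the result.
bool run_divergence_demo()
{
    int *d_out = nullptr;
    int h_out[kWarpSize];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;

    exchange_after_divergence<<<1, kWarpSize>>>(d_out);
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    for (int lane = 0; ok && lane < kWarpSize; ++lane) {
        const int partner = lane ^ 1;
        const int expected = partner + (partner % 2 == 0 ? 100 : 200);
        ok = (h_out[lane] == expected);
    }
    return ok;
}

int main()
{
    const bool ok = run_divergence_demo();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

The barrier sits after the if/else, in code that every lane named in the mask executes, which is what the programming guide requires of a matching __syncwarp().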


answered Oct 21 '22 13:10

Mo Sani

