...are just mentioned in the PTX manual. There is no hint about what they are good for or how to use them.
Does anyone know more? Am I just missing a common concept?
Bart's comment is basically right. In more detail, as stated in the PTX ISA 3.1 manual,
For some instructions the destination operand is optional. A “bit bucket” operand denoted with an underscore (_) may be used in place of a destination register.
There is actually only one class of instruction listed in the 3.1 PTX spec for which _ is a valid destination: atom. Here are the semantics of atom:
Atomically loads the original value at location a into destination register d, performs a reduction operation with operand b and the value in location a, and stores the result of the specified operation at location a, overwriting the original value.
And there is a note for atom:
Simple reductions may be specified by using the “bit bucket” destination operand ‘_’.
So, we can construct an example:
atom.global.add.s32 _, [a], 4;
This adds 4 to the signed integer at memory location a without returning the previous value of a in a register. So if you don't need the previous value, you can use this form. I assume the compiler would generate this instruction for code such as:
atomicAdd(&a, 4);
since the return value of atomicAdd is not stored to a variable.
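To see the two forms side by side, here is a minimal CUDA sketch using inline PTX. It is not taken from the manual: the names add_no_return, add_with_return, demo and run_demo are made up for illustration, error checking is left out, and it assumes a 64-bit build (hence the "l" constraint for the pointer). The sketch omits the .global qualifier so that generic addressing is used, which should work for a pointer obtained from cudaMalloc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: atomic add whose old value is thrown away.
// The bit bucket "_" takes the place of the destination register.
// No state space is given, so PTX uses generic addressing.
__device__ void add_no_return(int *addr, int val) {
    asm volatile("atom.add.s32 _, [%0], %1;"
                 :
                 : "l"(addr), "r"(val)
                 : "memory");
}

// Hypothetical helper: the same instruction with a real destination
// register, so the value stored at addr before the add is returned.
__device__ int add_with_return(int *addr, int val) {
    int old;
    asm volatile("atom.add.s32 %0, [%1], %2;"
                 : "=r"(old)
                 : "l"(addr), "r"(val)
                 : "memory");
    return old;
}

// Launched with a single thread so the result is deterministic.
__global__ void demo(int *counter, int *old_out) {
    add_no_return(counter, 4);               // counter: 0 -> 4
    *old_out = add_with_return(counter, 4);  // old value 4, counter: 4 -> 8
}

// Hypothetical host driver: returns the final counter value and writes the
// old value seen by the second add to *old_host. Error checks omitted.
int run_demo(int *old_host) {
    int *counter = nullptr;
    int *old_dev = nullptr;
    cudaMalloc(&counter, sizeof(int));
    cudaMalloc(&old_dev, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    demo<<<1, 1>>>(counter, old_dev);

    int result = 0;
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(old_host, old_dev, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(counter);
    cudaFree(old_dev);
    return result;
}

int main() {
    int old = -1;
    int final_value = run_demo(&old);
    printf("old = %d, final = %d\n", old, final_value);
    return 0;
}
```

Both helpers perform the same atomic add; the only difference is whether the previous value ends up in a register. To check what the compiler actually emits for a plain atomicAdd(&a, 4); whose result is ignored, you can inspect the generated PTX with nvcc -ptx.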