
PTX "bit bucket" registers

...are just mentioned in the PTX manual. There is no hint about what they are good for or how to use them.

Does anyone know more? Am I just missing a common concept?

asked Oct 18 '12 01:10 by Dude




1 Answer

Bart's comment is basically right. In more detail, as stated in the PTX ISA 3.1 manual,

For some instructions the destination operand is optional. A “bit bucket” operand denoted with an underscore (_) may be used in place of a destination register.

There is actually only one class of instruction listed in the 3.1 PTX spec for which _ is a valid destination: atom. Here are the semantics of atom:

Atomically loads the original value at location a into destination register d, performs a reduction operation with operand b and the value in location a, and stores the result of the specified operation at location a, overwriting the original value.

And there is a note for atom:

Simple reductions may be specified by using the “bit bucket” destination operand ‘_’.

So, we can construct an example:

atom.global.add.s32 _, [a], 4;

This adds 4 to the signed integer at memory location a without returning the previous value of location a in a register, so you can use it whenever you don't need the previous value. I assume the compiler would generate this instruction for the following code:

atomicAdd(&a, 4);

since the return value of atomicAdd is not stored to a variable.
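To make the two forms concrete, here is a minimal sketch in CUDA that puts them side by side. The kernel names, the launch configuration, and the use of inline PTX are my own assumptions for illustration and do not come from the manual. The inline assembly omits the state space so that the pointer is treated as a generic address, and the "l" constraint assumes a 64-bit build. I have not confirmed what the compiler actually emits for the first kernel; that can be checked by inspecting the output of nvcc -ptx.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Form 1: atomicAdd whose return value (the previous value) is ignored.
__global__ void add_discard_result(int *counter)
{
    atomicAdd(counter, 4);
}

// Form 2: hand-written PTX with the bit bucket "_" as the destination.
// No state space is given, so generic addressing applies to the pointer.
// Assumption: 64-bit pointers, hence the "l" constraint.
__global__ void add_bit_bucket(int *counter)
{
    asm volatile("atom.add.s32 _, [%0], %1;"
                 :                          // no outputs: the old value is discarded
                 : "l"(counter), "r"(4)
                 : "memory");
}

int main()
{
    int *d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    // Each launch runs 256 threads, and each thread adds 4 to the counter.
    add_discard_result<<<1, 256>>>(d_counter);
    add_bit_bucket<<<1, 256>>>(d_counter);

    int h_counter = 0;
    // cudaMemcpy waits for the preceding kernels on the default stream.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("counter = %d\n", h_counter);

    cudaFree(d_counter);
    return 0;
}
```

If both kernels assemble and run, each launch should contribute 256 * 4 to the counter. Neither kernel ever sees the old value, which is exactly the case the bit bucket operand is meant for.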

answered Sep 24 '22 21:09 by harrism