I need to do a lot of bitwise operations on GPUs, but cannot find any information on whether NVIDIA hardware is big- or little-endian.
CUDA parallel computing platform

The next step in understanding GPU architecture is NVIDIA's Compute Unified Device Architecture (CUDA), a popular parallel computing platform.
In CUDA, the GPU is called the device, and GPU memory is likewise called device memory. Executing a typical CUDA program involves three main steps: (1) copy the input data from host memory to device memory (the host-to-device transfer); (2) load the GPU program and execute it, caching data on-chip for performance; (3) copy the results from device memory back to host memory (the device-to-host transfer).
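A minimal sketch of the host-to-device copy, kernel launch, and device-to-host copy, assuming a CUDA-capable GPU and the CUDA runtime API. The names `add_kernel` and `gpu_add` are hypothetical, chosen for illustration only.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: runs on the device; each thread adds one pair of elements.
__global__ void add_kernel(const int* a, const int* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Hypothetical host helper: computes out = a + b on the GPU.
// Returns false on a size mismatch or any CUDA error.
bool gpu_add(const std::vector<int>& a, const std::vector<int>& b, std::vector<int>& out) {
    if (a.size() != b.size()) return false;
    int n = static_cast<int>(a.size());
    out.assign(n, 0);
    if (n == 0) return true;
    size_t bytes = n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess;
    // Step 1: host-to-device transfer of the inputs.
    ok = ok && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
               cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    // Step 2: launch enough 256-thread blocks to cover all n elements.
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // Step 3: device-to-host transfer; cudaMemcpy waits for the kernel to finish.
    ok = ok && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return ok;
}

int main() {
    std::vector<int> a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, out;
    if (!gpu_add(a, b, out)) {
        std::printf("CUDA error\n");
        return 1;
    }
    for (int v : out) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

The blocking `cudaMemcpy` in step 3 doubles as the synchronization point, so no explicit `cudaDeviceSynchronize()` is needed in this simple case.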
In CUDA terminology, the host is the CPU, the device is the GPU, and a kernel is a function that runs on the device. A CUDA program comprises a host program, consisting of one or more sequential threads running on the host, and one or more parallel kernels suited to execution on the GPU.
See: https://devtalk.nvidia.com/default/topic/366773/cuda-programming-and-performance/endian-mode-of-the-device/post/2630674/#2630674
All of the supported CUDA platforms use little-endian CPUs, and cudaMemcpy() can copy data structures to the device without knowing the data format, so I would assume the GPU is also little-endian. The GPU might support both big- and little-endian execution (as some CPUs do) as a hedge against future CUDA platforms being big-endian.
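The argument above can be checked empirically. In this minimal sketch, assuming a CUDA-capable GPU, the device stores the 32-bit value `0x01020304` in global memory and the host copies the raw bytes back with `cudaMemcpy`, which, as noted, knows nothing about the data's layout. If the lowest-addressed byte is `0x04`, the device wrote it little-endian. The helper names `store_word`, `device_lowest_byte`, and `host_lowest_byte` are hypothetical.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Device side: store a known 32-bit pattern into global memory.
__global__ void store_word(uint32_t* out) {
    *out = 0x01020304u;
}

// Hypothetical helper: returns the byte the device placed at the lowest address
// of the word, or -1 on a CUDA error. 0x04 => little-endian, 0x01 => big-endian.
int device_lowest_byte() {
    uint32_t* d_word = nullptr;
    if (cudaMalloc(&d_word, sizeof(uint32_t)) != cudaSuccess) return -1;
    store_word<<<1, 1>>>(d_word);
    cudaError_t err = cudaGetLastError();
    unsigned char bytes[4] = {0, 0, 0, 0};
    // cudaMemcpy moves raw bytes; it does not interpret the uint32_t layout.
    if (err == cudaSuccess)
        err = cudaMemcpy(bytes, d_word, sizeof(bytes), cudaMemcpyDeviceToHost);
    cudaFree(d_word);
    return err == cudaSuccess ? bytes[0] : -1;
}

// Host side: the same check on the CPU, for comparison.
int host_lowest_byte() {
    uint32_t word = 0x01020304u;
    unsigned char bytes[4];
    std::memcpy(bytes, &word, sizeof(bytes));
    return bytes[0];
}

int main() {
    int dev = device_lowest_byte();
    int host = host_lowest_byte();
    if (dev < 0) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("host lowest byte: 0x%02x, device lowest byte: 0x%02x\n", host, dev);
    std::printf("%s\n", dev == host ? "byte layouts match" : "byte layouts differ");
    return 0;
}
```

Reading bytes through `unsigned char` (or `memcpy` on the host) avoids strict-aliasing problems that a `reinterpret_cast` to another integer type would introduce.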
My guess is the answer has to be either "little-endian" or "both".
Per the Hardware Implementation section of the CUDA Programming Guide, the NVIDIA GPU architecture is little-endian.
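For bitwise work, note that operators such as `&`, `|`, `^`, and shifts act on values, not on memory layout, so they give the same results on any byte order. Endianness only matters when reinterpreting memory byte by byte or consuming data produced in big-endian order (e.g. network byte order). For the latter case, a minimal sketch, assuming a CUDA-capable GPU, reverses the bytes of each 32-bit word with the `__byte_perm` intrinsic and selector `0x0123`. The names `bswap32_kernel` and `gpu_bswap32` are hypothetical.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Reverse the byte order of each 32-bit word, e.g. to turn big-endian input
// into the little-endian layout the device uses natively.
__global__ void bswap32_kernel(const uint32_t* in, uint32_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Selector 0x0123: result byte k is taken from input byte (3 - k) of in[i].
        out[i] = __byte_perm(in[i], 0, 0x0123);
    }
}

// Hypothetical host helper: byte-swaps n words on the GPU; false on CUDA error.
bool gpu_bswap32(const uint32_t* in, uint32_t* out, int n) {
    if (n <= 0) return true;
    size_t bytes = n * sizeof(uint32_t);
    uint32_t *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        bswap32_kernel<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    uint32_t in[2] = {0x01020304u, 0xAABBCCDDu};
    uint32_t out[2] = {0, 0};
    if (!gpu_bswap32(in, out, 2)) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("%08x %08x\n", out[0], out[1]);  // 04030201 ddccbbaa
    return 0;
}
```

`__byte_perm` compiles to a single byte-permute instruction, so the swap is cheap enough to apply once on load and then do all further bitwise operations in the native little-endian order.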