PyCUDA's documentation mentions Driver Interface calls in passing, but I'm a bit think and can't see how to get information such as 'SHARED_SIZE_BYTES' out of my code.
Can anyone point me to any examples of querying the device in this way?
Is it possible to / How do I check the device state (eg between malloc/memcpy and kernel launch) to implement some machine-dynamic operations? (I want to be able to deal with devices that support multiple kernels in a 'friendly' way.
Just for anyone else coming across this, spending half an hour with the CUDA API in one hand, and the PyCUDA documentation in another does wonders. Its much simpler than my initial experiments indicated.
Runtime Kernel Info
Incoming lazy lazy code
...
kernel=mod.get_function("foo")
meminfo(kernel)
...
def meminfo(kernel):
shared=kernel.shared_size_bytes
regs=kernel.num_regs
local=kernel.local_size_bytes
const=kernel.const_size_bytes
mbpt=kernel.max_threads_per_block
print("=MEM=\nLocal:%d,\nShared:%d,\nRegisters:%d,\nConst:%d,\nMax Threads/B:%d" % (local,shared,regs,const,mbpt))
Example Output
=MEM=
Local:24,
Shared:64,
Registers:18,
Const:0,
Max Threads/B:512
Static Device Info
Incoming lazy lazy code
import pycuda.autoinit
import pycuda.driver as cuda
(free,total)=cuda.mem_get_info()
print("Global memory occupancy:%f%% free"%(free*100/total))
for devicenum in range(cuda.Device.count()):
device=cuda.Device(devicenum)
attrs=device.get_attributes()
#Beyond this point is just pretty printing
print("\n===Attributes for device %d"%devicenum)
for (key,value) in attrs.iteritems():
print("%s:%s"%(str(key),str(value)))
Example Output
Global memory occupancy:70.000000% free
===Attributes for device 0
MAX_THREADS_PER_BLOCK:512
MAX_BLOCK_DIM_X:512
MAX_BLOCK_DIM_Y:512
MAX_BLOCK_DIM_Z:64
MAX_GRID_DIM_X:65535
MAX_GRID_DIM_Y:65535
MAX_GRID_DIM_Z:1
MAX_SHARED_MEMORY_PER_BLOCK:16384
TOTAL_CONSTANT_MEMORY:65536
WARP_SIZE:32
MAX_PITCH:2147483647
MAX_REGISTERS_PER_BLOCK:8192
CLOCK_RATE:1500000
TEXTURE_ALIGNMENT:256
GPU_OVERLAP:1
MULTIPROCESSOR_COUNT:14
KERNEL_EXEC_TIMEOUT:1
INTEGRATED:0
CAN_MAP_HOST_MEMORY:1
COMPUTE_MODE:DEFAULT
MAXIMUM_TEXTURE1D_WIDTH:8192
MAXIMUM_TEXTURE2D_WIDTH:65536
MAXIMUM_TEXTURE2D_HEIGHT:32768
MAXIMUM_TEXTURE3D_WIDTH:2048
MAXIMUM_TEXTURE3D_HEIGHT:2048
MAXIMUM_TEXTURE3D_DEPTH:2048
MAXIMUM_TEXTURE2D_ARRAY_WIDTH:8192
MAXIMUM_TEXTURE2D_ARRAY_HEIGHT:8192
MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES:512
SURFACE_ALIGNMENT:256
CONCURRENT_KERNELS:0
ECC_ENABLED:0
PCI_BUS_ID:1
PCI_DEVICE_ID:0
TCC_DRIVER:0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With