CUDA

Question

Does anyone have any experience implementing a hash map on a CUDA Device? Specifically, I'm wondering how one might go about allocating memory on the Device and copying the result back to the Host, or whether there are any useful libraries that can facilitate this task.

It seems like I would need to know the maximum size of the hash map a priori in order to allocate Device memory. All my previous CUDA endeavors have used arrays and memcpys and have therefore been fairly straightforward.

Any insight into this problem is appreciated. Thanks.

Mojtaba Valizadeh · Accepted Answer

BTW, warpcore is a framework for creating high-throughput, purpose-built hashing data structures on CUDA accelerators (its tagline is "hashing at the speed of light on modern CUDA accelerators"). You can find it here:

https://github.com/sleeepyjack/warpcore

jopasserat · Answer

There is a GPU hash table implementation presented in "CUDA by Example", by Jason Sanders and Edward Kandrot.

Fortunately, you can find information on this book and download the example source code for free from this page:
http://developer.nvidia.com/object/cuda-by-example.html

In this implementation, the table is pre-allocated from the host (CPU) side, and safe multithreaded access is ensured by a lock built on the atomic function atomicCAS (Compare And Swap).
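To make the allocate / insert / copy-back flow concrete, here is a minimal sketch. It is not the book's code: instead of the book's atomicCAS-based lock, it uses a simpler lock-free variant (open addressing with linear probing) that claims a slot directly with atomicCAS. All names (`Slot`, `kCapacity`, `insert_kernel`, `build_table`, `host_find`) are made up for illustration, the capacity is fixed up front as the question suspects, keys are assumed never to equal the `kEmpty` sentinel, and error checking is omitted.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical names throughout; a sketch, not the book's implementation.
constexpr uint32_t kEmpty = 0xFFFFFFFFu;            // marks an unused slot; real keys must differ
constexpr uint32_t kLogCapacity = 10;
constexpr uint32_t kCapacity = 1u << kLogCapacity;  // fixed in advance, power of two

struct Slot { uint32_t key; uint32_t value; };

// Multiplicative hash that keeps the top kLogCapacity bits.
__host__ __device__ inline uint32_t hash_key(uint32_t k) {
    return (k * 2654435761u) >> (32 - kLogCapacity);
}

__global__ void insert_kernel(Slot* table, const uint32_t* keys,
                              const uint32_t* values, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint32_t key = keys[i];
    uint32_t slot = hash_key(key);
    // Linear probing, bounded so a full table cannot spin forever.
    for (uint32_t probe = 0; probe < kCapacity; ++probe) {
        // Claim the slot if it is empty; prev tells us who owns it now.
        uint32_t prev = atomicCAS(&table[slot].key, kEmpty, key);
        if (prev == kEmpty || prev == key) {
            table[slot].value = values[i];  // duplicate keys: one of the writers wins
            return;
        }
        slot = (slot + 1) & (kCapacity - 1);
    }
    // Table full: the element is silently dropped in this sketch.
}

// Allocates the table on the device, fills it, and copies it back to the host.
std::vector<Slot> build_table(const std::vector<uint32_t>& keys,
                              const std::vector<uint32_t>& values) {
    int n = static_cast<int>(keys.size());
    Slot* d_table = nullptr;
    uint32_t *d_keys = nullptr, *d_values = nullptr;
    cudaMalloc(&d_table, kCapacity * sizeof(Slot));
    cudaMemset(d_table, 0xFF, kCapacity * sizeof(Slot));  // every key becomes kEmpty
    cudaMalloc(&d_keys, n * sizeof(uint32_t));
    cudaMalloc(&d_values, n * sizeof(uint32_t));
    cudaMemcpy(d_keys, keys.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_values, values.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    if (n > 0) insert_kernel<<<(n + 255) / 256, 256>>>(d_table, d_keys, d_values, n);
    std::vector<Slot> host(kCapacity);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), d_table, kCapacity * sizeof(Slot), cudaMemcpyDeviceToHost);
    cudaFree(d_table); cudaFree(d_keys); cudaFree(d_values);
    return host;
}

// Host-side lookup over the copied-back table, using the same probing rule.
bool host_find(const std::vector<Slot>& table, uint32_t key, uint32_t* out) {
    uint32_t slot = hash_key(key);
    for (uint32_t probe = 0; probe < kCapacity; ++probe) {
        if (table[slot].key == key) { *out = table[slot].value; return true; }
        if (table[slot].key == kEmpty) return false;
        slot = (slot + 1) & (kCapacity - 1);
    }
    return false;
}
```

Because the whole table is one flat array, copying it back is a single cudaMemcpy, which is the main reason to prefer a pre-allocated layout over pointer-based buckets. The cost is that the capacity has to be picked generously in advance.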

Moreover, newer hardware generations (compute capability 2.0 and later) combined with CUDA >= 4.0 are supposed to support the new/delete operators directly in device code ( http://developer.nvidia.com/object/cuda_4_0_RC_downloads.html?utm_source=http://forums.nvidia.com&utm_medium=http://forums.nvidia.com&utm_term=Developers&utm_content=Developers&utm_campaign=CUDA4 ), which could help your implementation. I haven't tested these features yet.
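For what it's worth, here is a rough sketch of what device-side new/delete could look like for a chained hash map. The names (`Node`, `kBuckets`, `insert_chained`, `drain_and_count`, `chain_lengths`) are hypothetical, the stored value is simply the element's index, and the sketch assumes the device heap is large enough, so it does not handle allocation failure. As far as I can tell, pointers returned by device-side new cannot be passed to cudaMemcpy, so results are written into an ordinary cudaMalloc'd buffer before being copied back.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical names; a sketch under the assumptions stated above.
constexpr int kBuckets = 16;

struct Node { uint32_t key; uint32_t value; Node* next; };

__global__ void insert_chained(Node** buckets, const uint32_t* keys, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Node* node = new Node;  // comes from the device heap; failure is not handled here
    node->key = keys[i];
    node->value = static_cast<uint32_t>(i);
    // Push the node onto its bucket's list head with a compare-and-swap loop.
    auto* head = reinterpret_cast<unsigned long long*>(&buckets[keys[i] % kBuckets]);
    unsigned long long seen = *head, assumed;
    do {
        assumed = seen;
        node->next = reinterpret_cast<Node*>(assumed);
        seen = atomicCAS(head, assumed, reinterpret_cast<unsigned long long>(node));
    } while (seen != assumed);
}

// One thread per bucket: count the chain, then free every node with delete.
__global__ void drain_and_count(Node** buckets, uint32_t* counts) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= kBuckets) return;
    uint32_t count = 0;
    Node* node = buckets[b];
    while (node != nullptr) {
        Node* next = node->next;
        delete node;
        node = next;
        ++count;
    }
    buckets[b] = nullptr;
    counts[b] = count;
}

// Builds the chains on the device and returns the length of each bucket's chain.
std::vector<uint32_t> chain_lengths(const std::vector<uint32_t>& keys) {
    int n = static_cast<int>(keys.size());
    // Size of the heap that backs device-side new/malloc (16 MiB here).
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16u << 20);
    Node** d_buckets = nullptr;
    uint32_t *d_keys = nullptr, *d_counts = nullptr;
    cudaMalloc(&d_buckets, kBuckets * sizeof(Node*));
    cudaMemset(d_buckets, 0, kBuckets * sizeof(Node*));  // all chains start empty
    cudaMalloc(&d_keys, n * sizeof(uint32_t));
    cudaMalloc(&d_counts, kBuckets * sizeof(uint32_t));
    cudaMemcpy(d_keys, keys.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    if (n > 0) insert_chained<<<(n + 255) / 256, 256>>>(d_buckets, d_keys, n);
    drain_and_count<<<1, kBuckets>>>(d_buckets, d_counts);
    std::vector<uint32_t> counts(kBuckets);
    cudaMemcpy(counts.data(), d_counts, kBuckets * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    cudaFree(d_buckets); cudaFree(d_keys); cudaFree(d_counts);
    return counts;
}
```

This removes the need to know the number of entries per bucket in advance, but the total is still capped by the heap limit, and per-thread allocation from the device heap is generally slower than writing into a pre-allocated pool.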

CUDA - Implementing Device Hash Map?

Tags:

hashmap

parallel-processing

nedblorf
