 

Can I prefetch specific data to a specific cache level in a CUDA kernel?

I understand that Fermi GPUs support prefetching to L1 or L2 cache. However, I cannot find anything about it in the CUDA reference manual.

Does CUDA allow my kernel code to prefetch specific data to a specific level of cache?

asked Jan 21 '11 by dalibocai

People also ask

What does asynchronous memory prefetching do in CUDA?

The prefetch proactively triggers asynchronous data migration to the GPU before the data is accessed, which reduces page faults and, consequently, the overhead of handling them.
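
For illustration, here is a minimal sketch using Unified Memory (assuming a single GPU, device 0, and a CUDA version that provides cudaMemPrefetchAsync; error checking omitted). Note this migrates pages of managed memory to the device; it is not the cache-level prefetch asked about above.

    // Sketch: prefetch managed memory to the GPU before the kernel touches it.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *data;
        cudaMallocManaged(&data, n * sizeof(float));   // managed (unified) memory
        for (int i = 0; i < n; ++i) data[i] = 1.0f;    // touched first on the CPU

        // Migrate the pages to device 0 before the kernel accesses them,
        // so the kernel does not stall on page faults.
        cudaMemPrefetchAsync(data, n * sizeof(float), 0, 0);

        scale<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);
        cudaFree(data);
        return 0;
    }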

What are the three general sections of a CUDA program?

There are three key language extensions CUDA programmers can use: CUDA blocks, shared memory, and synchronization barriers. CUDA blocks contain a collection of threads. A block of threads can share memory, and multiple threads can pause until all threads reach a specified point of execution.
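
A minimal sketch of a hypothetical kernel that uses all three extensions: a grid of blocks, per-block shared memory, and the __syncthreads() barrier. It assumes a launch with 256 threads per block and a data length that is a multiple of 256.

    // Reverse each block-sized segment of the array in place.
    __global__ void reverse_in_block(int *data) {
        __shared__ int tile[256];           // shared memory: visible to the whole block
        int t = threadIdx.x;
        int i = blockIdx.x * blockDim.x + t;

        tile[t] = data[i];                  // each thread loads one element
        __syncthreads();                    // barrier: wait until the whole block has loaded

        data[i] = tile[blockDim.x - 1 - t]; // read a value written by another thread
    }

    // Launch, e.g.: reverse_in_block<<<numBlocks, 256>>>(d_data);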

What is the function of the __global__ qualifier in a CUDA program?

__global__ is a qualifier added to standard C. It alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU).
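
A short sketch showing the qualifier next to __device__ for contrast (function names are illustrative):

    __device__ float helper(int i) {      // runs on the GPU, callable only from GPU code
        return i * 0.5f;
    }

    __global__ void kernel(float *out) {  // runs on the GPU, launched from host code
        out[threadIdx.x] = helper(threadIdx.x);
    }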

What is the correct way to launch a CUDA kernel?

In order to run a kernel on the CUDA threads, we need two things. First, in the main() function of the program, we call the function to be executed by each thread on the GPU. This invocation is called a kernel launch, and with it we need to provide the number of threads and their grouping.
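
A minimal sketch of such a launch (the kernel and sizes are illustrative):

    #include <cuda_runtime.h>

    __global__ void fill(float *out, float value, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = value;
    }

    int main() {
        const int n = 1 << 20;
        float *d_out;
        cudaMalloc(&d_out, n * sizeof(float));

        // <<<numBlocks, threadsPerBlock>>> is the launch configuration:
        // the grouping (blocks) and the number of threads per block.
        int threadsPerBlock = 256;
        int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        fill<<<numBlocks, threadsPerBlock>>>(d_out, 1.0f, n);

        cudaDeviceSynchronize();  // wait for completion and surface launch errors
        cudaFree(d_out);
        return 0;
    }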


1 Answer

Well, not at the instruction level from CUDA C, but there is detailed information about prefetching in GPUs here:

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
(a paper from the ACM/IEEE International Symposium on Microarchitecture, 2010)

You can find the instruction reference in NVIDIA's PTX ISA document; the relevant instructions are prefetch and prefetchu.
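
For reference, a minimal sketch of how those PTX instructions can be issued from CUDA C via inline assembly. The wrapper names and the kernel are illustrative, the instructions are only performance hints, and a 64-bit build is assumed for the "l" pointer constraint.

    // Thin wrappers around the PTX prefetch instruction for global memory.
    __device__ __forceinline__ void prefetch_l1(const void *ptr) {
        asm volatile("prefetch.global.L1 [%0];" : : "l"(ptr));
    }

    __device__ __forceinline__ void prefetch_l2(const void *ptr) {
        asm volatile("prefetch.global.L2 [%0];" : : "l"(ptr));
    }

    __global__ void copy(const float *in, float *out, int n) {
        int stride = blockDim.x * gridDim.x;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
            if (i + stride < n)
                prefetch_l1(&in[i + stride]);  // hint: pull the next iteration's element into L1
            out[i] = in[i];
        }
    }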

answered Sep 20 '22 by kerem