The answer to your question is YES. The nvcc compiler driver does not depend on the physical presence of a device, so you can compile CUDA code even without a CUDA-capable GPU.
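For example, a file like the following (a minimal sketch; the file name and kernel are made up for illustration) compiles with nvcc on a machine that has no GPU at all. A CUDA-capable device is only needed to actually run the resulting binary:

// vec_add.cu -- compiling this needs only the toolkit, not a GPU:
//   nvcc -c vec_add.cu -o vec_add.o     (compile only)
//   nvcc vec_add.cu -o vec_add          (compile and link)
#include <cstdio>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);    // reports an error, not a crash, when no GPU is present
    std::printf("CUDA devices found: %d (%s)\n", count, cudaGetErrorString(err));
    return 0;
}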
Hi, even if you don't have a dedicated NVIDIA GPU in your laptop or computer, you can execute CUDA programs online using Google Colab.
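One way this might look in a Colab notebook with a GPU runtime selected (nvcc is usually already available there, but that is an assumption about the runtime): put the source in a cell starting with %%writefile hello.cu, then compile and run it in a second cell with !nvcc hello.cu -o hello && ./hello. The source itself is an ordinary CUDA program, for example:

// hello.cu -- minimal check that the GPU runtime works.
#include <cstdio>

__global__ void hello() {
    // Device-side printf requires compute capability 2.0 or newer,
    // which any GPU offered by Colab satisfies.
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();           // launch 2 blocks of 4 threads
    cudaDeviceSynchronize();     // wait for the kernel (and its output) to finish
    return 0;
}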
Most online CUDA classes use AWS GPU instances, which are not hard to set up. I may be very biased, but I do recommend CUDA over OpenCL for a number of reasons: about two thirds of GPU-related academic papers use CUDA rather than OpenCL, and most GPU-based and hybrid CPU/GPU supercomputers use NVIDIA GPUs, usually with CUDA.
You can also check the gpuocelot project, which is a true emulator in the sense that PTX (the bytecode into which CUDA code is compiled) is emulated.
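If you want to see the PTX that such an emulator consumes, nvcc can emit it directly; a quick sketch (the file name and kernel are arbitrary examples):

// scale.cu -- trivial kernel used only to inspect the generated PTX:
//   nvcc -ptx scale.cu -o scale.ptx     (no GPU required)
// The resulting scale.ptx is the virtual-ISA code that gpuocelot interprets.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) data[i] *= factor;
}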
This response may be too late, but it's worth noting anyway. GPU Ocelot (of which I am one of the core contributors) can be compiled without CUDA device drivers (libcuda.so) installed if you wish to use the Emulator or LLVM backends. I've demonstrated the emulator on systems without NVIDIA GPUs.
The emulator attempts to faithfully implement the PTX 1.4 and PTX 2.1 specifications, which may include features older GPUs do not support. The LLVM translator strives for correct and efficient translation from PTX to x86 that will hopefully make CUDA an effective way of programming multicore CPUs as well as GPUs. -deviceemu has been a deprecated feature of CUDA for quite some time, but the LLVM translator has always been faster.
Additionally, several correctness checkers are built into the emulator to verify that memory accesses are properly aligned, that accesses to shared memory are properly synchronized, and that global memory dereferences hit allocated regions of memory. We have also implemented a command-line interactive debugger, inspired largely by gdb, for single-stepping through CUDA kernels, setting breakpoints and watchpoints, and so on. These tools were developed specifically to expedite the debugging of CUDA programs; you may find them useful.
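As an illustration of the kind of bug such checkers target (a deliberately broken kernel written for this answer, not Ocelot output): the reduction below omits the barrier between writing and reading shared memory, which is exactly the sort of race a shared-memory synchronization checker is meant to flag.

// racy_reduce.cu -- launch with 256 threads per block.
// Deliberately buggy: the first __syncthreads() is missing, so a thread may
// read partial[tid + stride] before its neighbour has written it.
__global__ void racyReduce(const float *in, float *out) {
    __shared__ float partial[256];
    int tid = threadIdx.x;
    partial[tid] = in[blockIdx.x * blockDim.x + tid];
    // __syncthreads();   <-- required here, intentionally omitted

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();   // every thread reaches this barrier, so it is legal
    }
    if (tid == 0)
        out[blockIdx.x] = partial[0];
}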
Sorry about the Linux-only aspect. We've started a Windows branch (as well as a Mac OS X port) but the engineering burden is already large enough to stress our research pursuits. If anyone has any time and interest, they may wish to help us provide support for Windows!
Hope this helps.
gpuocelot might work if you satisfy its list of dependencies. I've tried to get an emulator running on BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux).
I'll tell you what I've learnt.
nvcc used to have a -deviceemu option back in CUDA Toolkit 3.0. I downloaded CUDA Toolkit 3.0, installed it and tried to run a simple program:
#include <stdio.h>

__global__ void helloWorld() {
    printf("Hello world! I am %d (Warp %d) from %d.\n",
           threadIdx.x, threadIdx.x / warpSize, blockIdx.x);
}

int main() {
    int blocks, threads;
    scanf("%d%d", &blocks, &threads);
    helloWorld<<<blocks, threads>>>();
    cudaDeviceSynchronize();
    return 0;
}
Note that in CUDA Toolkit 3.0, nvcc was located in /usr/local/cuda/bin/.
It turned out that I had difficulties with compiling it:
NOTE: device emulation mode is deprecated in this release and will be removed in a future release.
/usr/include/i386-linux-gnu/bits/byteswap.h(47): error: identifier "__builtin_bswap32" is undefined
/usr/include/i386-linux-gnu/bits/byteswap.h(111): error: identifier "__builtin_bswap64" is undefined
/home/user/Downloads/helloworld.cu(12): error: identifier "cudaDeviceSynchronize" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_000011c2_00000000-4_helloworld.cpp1.ii".
I've found on the Internet that if I used gcc-4.2 or a similarly ancient version instead of gcc-4.9.2, the errors might disappear. I gave up.
As for gpuocelot: the answer by Stringer has a link to a very old gpuocelot project website, so at first I thought the project had been abandoned in 2012 or so. Actually, it was abandoned a few years later.
Here are some up-to-date websites:
I tried to install gpuocelot following the guide. I had several errors during installation, though, and I gave up again. gpuocelot is no longer supported and depends on a set of very specific versions of libraries and software.
You might try to follow this tutorial from July 2015, but I can't guarantee it'll work; I haven't tested it.
The MCUDA translation framework is a Linux-based tool designed to effectively compile the CUDA programming model to a CPU architecture.
It might be useful. Here is a link to the website.
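The general idea behind this kind of source-to-source translation (a hand-written sketch of the concept only, not MCUDA's actual output) is to turn the implicit per-thread execution into explicit loops on the CPU:

// A CUDA kernel ...
__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// ... and roughly what a CPU translation does with the same body:
// the grid and block dimensions become ordinary nested loops.
void saxpyCpu(float a, const float *x, float *y, int n,
              int gridDimX, int blockDimX) {
    for (int bx = 0; bx < gridDimX; ++bx)           // loop over blocks
        for (int tx = 0; tx < blockDimX; ++tx) {    // loop over threads in a block
            int i = bx * blockDimX + tx;
            if (i < n) y[i] = a * x[i] + y[i];
        }
}

A real translator also has to split these loops at __syncthreads() barriers; the sketch ignores that.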
CUDA Waste is an emulator for Windows 7 and 8. I haven't tried it, though, and it doesn't seem to be developed anymore (the last commit is dated Jul 4, 2013). Here's the link to the project's website: https://code.google.com/archive/p/cuda-waste/
Last update: 12.03.2017
As dashesy pointed out in the comments, CU2CL seems to be an interesting project. It appears to be able to translate CUDA code to OpenCL code, so if your GPU is capable of running OpenCL code, CU2CL might be of interest to you.
Links:
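To give a rough idea of what such a translation involves (a hand-written sketch, not CU2CL's actual output), device-side constructs map onto OpenCL counterparts roughly like this:

// The CUDA side ...
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global index
    if (i < n) data[i] += 1.0f;
}

// ... and the approximate OpenCL C a translator would emit:
//
//   __kernel void addOne(__global float *data, int n) {
//       int i = get_global_id(0);   // replaces blockIdx/blockDim/threadIdx
//       if (i < n) data[i] += 1.0f;
//   }
//
// Host code changes in a similar spirit, e.g. cudaMalloc becomes
// clCreateBuffer and kernel<<<grid, block>>>(...) becomes
// clSetKernelArg(...) plus clEnqueueNDRangeKernel(...).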
gpuocelot also includes an LLVM translator; it would be interesting to test whether it's faster than -deviceemu. The CUDA toolkit had an emulator built into it until the CUDA 3.0 release cycle. If you use one of these very old versions of CUDA, make sure to use -deviceemu when compiling with nvcc.