Difference between "CPU OpenCL Project" and "GPU OpenCL Project"

Tags: c++, opencl

I installed the Intel OpenCL SDK and wanted to create a project. Visual Studio 2017 offered me these two templates plus a third, "Empty OpenCL Project". I don't know what the difference between the first two is. I tried reading through the template code, but since I don't (yet) know anything about OpenCL, I couldn't work out how they differ.

License header:

/*****************************************************************************
 * Copyright (c) 2013-2016 Intel Corporation
 * All rights reserved.
 *
 * WARRANTY DISCLAIMER
 *
 * THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
 * MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Intel Corporation is the author of the Materials, and requests that all
 * problem reports or change requests be submitted to it directly
 *****************************************************************************/

I ran a diff as suggested:

625,629c625,626
<     // Create new OpenCL buffer objects
<     // As these buffer are used only for read by the kernel, you are recommended to create it with flag CL_MEM_READ_ONLY.
<     // Always set minimal read/write flags for buffers, it may lead to better performance because it allows runtime
<     // to better organize data copying.
<     // You use CL_MEM_COPY_HOST_PTR here, because the buffers should be populated with bytes at inputA and inputB.
---
>     cl_image_format format;
>     cl_image_desc desc;
631c628,650
<     ocl->srcA = clCreateBuffer(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, inputA, &err);
---
>     // Define the image data-type and order -
>     // one channel (R) with unit values
>     format.image_channel_data_type = CL_UNSIGNED_INT32;
>     format.image_channel_order     = CL_R;
> 
>     // Define the image properties (descriptor)
>     desc.image_type        = CL_MEM_OBJECT_IMAGE2D;
>     desc.image_width       = arrayWidth;
>     desc.image_height      = arrayHeight;
>     desc.image_depth       = 0;
>     desc.image_array_size  = 1;
>     desc.image_row_pitch   = 0;
>     desc.image_slice_pitch = 0;
>     desc.num_mip_levels    = 0;
>     desc.num_samples       = 0;
> #ifdef CL_VERSION_2_0
>     desc.mem_object        = NULL;
> #else
>     desc.buffer            = NULL;
> #endif
> 
>     // Create first image based on host memory inputA
>     ocl->srcA = clCreateImage(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, inputA, &err);
634c653
<         LogError("Error: clCreateBuffer for srcA returned %s\n", TranslateOpenCLError(err));
---
>         LogError("Error: clCreateImage for srcA returned %s\n", TranslateOpenCLError(err));
638c657,658
<     ocl->srcB = clCreateBuffer(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, inputB, &err);
---
>     // Create second image based on host memory inputB
>     ocl->srcB = clCreateImage(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, inputB, &err);
641c661
<         LogError("Error: clCreateBuffer for srcB returned %s\n", TranslateOpenCLError(err));
---
>         LogError("Error: clCreateImage for srcB returned %s\n", TranslateOpenCLError(err));
645,649c665,666
<     // If the output buffer is created directly on top of output buffer using CL_MEM_USE_HOST_PTR,
<     // then, depending on the OpenCL runtime implementation and hardware capabilities, 
<     // it may save you not necessary data copying.
<     // As it is known that output buffer will be write only, you explicitly declare it using CL_MEM_WRITE_ONLY.
<     ocl->dstMem = clCreateBuffer(ocl->context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, outputC, &err);
---
>     // Create third (output) image based on host memory outputC
>     ocl->dstMem = clCreateImage(ocl->context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, outputC, &err);
652c669
<         LogError("Error: clCreateBuffer for dstMem returned %s\n", TranslateOpenCLError(err));
---
>         LogError("Error: clCreateImage for dstMem returned %s\n", TranslateOpenCLError(err));
734c751,755
<     cl_int *resultPtr = (cl_int *)clEnqueueMapBuffer(ocl->commandQueue, ocl->dstMem, true, CL_MAP_READ, 0, sizeof(cl_uint) * width * height, 0, NULL, NULL, &err);
---
>     size_t origin[] = {0, 0, 0};
>     size_t region[] = {width, height, 1};
>     size_t image_row_pitch;
>     size_t image_slice_pitch;
>     cl_int *resultPtr = (cl_int *)clEnqueueMapImage(ocl->commandQueue, ocl->dstMem, true, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &err);
783c804
<     cl_device_type deviceType = CL_DEVICE_TYPE_CPU;
---
>     cl_device_type deviceType = CL_DEVICE_TYPE_GPU;

I could also paste in the two complete source files, but they are long (about 900 lines each).

asked Jul 02 '18 12:07

raldone01

1 Answer

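You've more or less answered it yourself with the diff. The CPU template creates its inputs and output as buffer objects (clCreateBuffer) and selects CL_DEVICE_TYPE_CPU, while the GPU template creates 2D image objects (clCreateImage, mapped with clEnqueueMapImage) and selects CL_DEVICE_TYPE_GPU.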
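One practical consequence of the buffer/image split shows up in the last mapping hunk of the diff: clEnqueueMapImage hands back an image_row_pitch in bytes, and rows of the mapped image are that far apart, which is not necessarily width * sizeof(cl_uint). A minimal sketch in plain C++ (no OpenCL calls; the helper names readFromBuffer and readFromImage are made up for illustration, and a single 32-bit channel per texel is assumed, as in the template's CL_R / CL_UNSIGNED_INT32 format):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mapped buffer: elements are tightly packed, so (x, y) is a linear index
// computed from the width in elements.
std::uint32_t readFromBuffer(const std::uint32_t* base, std::size_t width,
                             std::size_t x, std::size_t y) {
    assert(base != nullptr);
    return base[y * width + x];
}

// Mapped image: consecutive rows are rowPitchBytes apart. The pitch is
// reported by the runtime and may include padding after each row.
std::uint32_t readFromImage(const void* base, std::size_t rowPitchBytes,
                            std::size_t x, std::size_t y) {
    assert(base != nullptr);
    const unsigned char* bytes = static_cast<const unsigned char*>(base);
    std::uint32_t value = 0;
    // memcpy avoids any alignment assumptions about the mapped pointer.
    std::memcpy(&value, bytes + y * rowPitchBytes + x * sizeof(value),
                sizeof(value));
    return value;
}
```

If the pitch happens to equal width * 4, both functions return the same element; when the runtime pads rows, only the pitch-based version reads the right one.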

Image support is optional in the OpenCL standard, so whether it is available depends on the device and driver (a device reports it through CL_DEVICE_IMAGE_SUPPORT). GPUs may perform better with image types, and most if not all Intel integrated GPUs support them (AFAIK).

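Both templates pass CL_MEM_USE_HOST_PTR, so the OpenCL memory objects are created on top of the host arrays. That works well on Intel devices, because the iGPU and CPU can address the same memory, or at least behave that way. It may not be optimal for discrete GPUs, which have their own memory and may need to copy the data across.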
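Whether CL_MEM_USE_HOST_PTR really avoids a copy depends on how the host memory was allocated. As far as I recall, Intel's guidance for its runtimes asks for a pointer aligned to 4096 bytes and a size that is a multiple of 64 bytes; treat both numbers as assumptions to check against the documentation for your driver. A sketch of such an allocation in portable C++17 (the names allocateHostBlock, roundUpTo and HostBlock are hypothetical, not part of any SDK):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

constexpr std::size_t kAlignment = 4096;  // assumed page alignment
constexpr std::size_t kGranule = 64;      // assumed size granularity

// Round bytes up to the next multiple of `multiple`.
constexpr std::size_t roundUpTo(std::size_t bytes, std::size_t multiple) {
    return (bytes + multiple - 1) / multiple * multiple;
}

// Deleter matching the aligned operator new used below.
struct AlignedFree {
    void operator()(void* p) const noexcept {
        ::operator delete(p, std::align_val_t(kAlignment));
    }
};

using HostBlock = std::unique_ptr<void, AlignedFree>;

// Allocate at least `bytes` bytes, padded to kGranule and aligned to
// kAlignment. The result is the kind of pointer one would pass as host_ptr.
HostBlock allocateHostBlock(std::size_t bytes) {
    assert(bytes > 0);
    return HostBlock(::operator new(roundUpTo(bytes, kGranule),
                                    std::align_val_t(kAlignment)));
}
```

The block returned by allocateHostBlock(sizeof(cl_uint) * arrayWidth * arrayHeight) would then stand in for inputA, inputB or outputC in the calls shown in the diff, and must outlive the memory object created on top of it.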

answered Oct 17 '22 00:10

Andreas Gravgaard Andersen
