Enabling HVX SIMD in Hexagon DSP by using instruction intrinsics

I am using Hexagon SDK 3.0 to compile my sample application for the HVX DSP architecture. Many Hexagon-LLVM tools are available in this folder:

~/Qualcomm/HEXAGON_Tools/7.2.12/Tools/bin

I wrote a small example that calculates the product of two arrays (a dot product) to make sure I can use the HVX hardware acceleration. However, when I generate the assembly, either with -S or with -S -emit-llvm, I don't find any HVX instructions such as vmem, vX, etc. For now my C application runs on hexagon-sim, until I find a way to run it on the board as well.

As far as I understand, I need to write the HVX part of the code with C intrinsics, but I was not able to adapt the existing examples to my own needs. It would be great if somebody could demonstrate how this can be done. Also, many of the intrinsics are not defined in the Hexagon V62 Programmer's Reference Manual.

Here is my small app in pure C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#if defined(__hexagon__)
#include "hexagon_standalone.h"
#include "subsys.h"
#endif
#include "io.h"
#include "hvx.cfg.h"


#define KERNEL_SIZE     9
#define Q               8
#define PRECISION       (1<<Q)

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}


int main (int argc, char* argv[])
{
    int n = 1024;   /* illustrative problem size; must be set before allocating */
    int i;
    double result;
    long long start_time, total_cycles;
/* -----------------------------------------------------*/
/*  Allocate memory for input/output                    */
/* -----------------------------------------------------*/
    //double *res  = memalign(VLEN, 4 *sizeof(double));
    double *x  = memalign(VLEN, n * sizeof(double));
    double *y  = memalign(VLEN, n * sizeof(double));

    /* Check the pointers themselves, not the (uninitialized) data */
    if (x == NULL || y == NULL) {
        printf("Error: Could not allocate memory for input arrays\n");
        return 1;
    }

    /* Fill the inputs with known values before using them */
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = (double)i;
    }

    #if defined(__hexagon__)
        subsys_enable();
        SIM_ACQUIRE_HVX;
    #if LOG2VLEN == 7
        SIM_SET_HVX_DOUBLE_MODE;
    #endif
    #endif

    /* -----------------------------------------------------*/
    /*  Call function                                       */
    /* -----------------------------------------------------*/
    RESET_PMU();
    start_time = READ_PCYCLES();

    result = vectors_dot_prod2(x, y, n);

    total_cycles = READ_PCYCLES() - start_time;
    DUMP_PMU();

    printf("Array product of x[i] * y[i] = %f\n", result);

    #if defined(__hexagon__)
        printf("AppReported (HVX%db-mode): Array product of x[i] * y[i] = %f (%lld cycles)\n", VLEN, result, total_cycles);
    #endif

    free(x);
    free(y);
    return 0;
}

I compile it using hexagon-clang:

hexagon-clang -v  -O2 -mv60 -mhvx-double -DLOG2VLEN=7 -I../../common/include -I../include -DQDSP6SS_PUB_BASE=0xFE200000 -o arrayProd.o  -c  arrayProd.c

Then I link it with subsys.o (which ships with the SDK, already compiled) and -lhexagon to generate the executable:

hexagon-clang -O2 -mv60 -o arrayProd.exe  arrayProd.o subsys.o -lhexagon

Finally, run it using the sim:

hexagon-sim -mv60 arrayProd.exe
Amir asked Oct 18 '22 09:10


1 Answer

A bit late, but might still be useful.

Hexagon Vector eXtensions (HVX) are not emitted automatically, and the current instruction set (as of SDK 8.0) only supports integer operations, so the compiler will not emit any HVX code for C code that uses the "double" type. It is similar to SSE programming: you have to manually pack the xmm registers and use SSE intrinsics to do what you need.

You need to define what your application really requires. For example, if you are writing something 3D-related and really need to calculate double (or float) dot products, you might convert your floats to 16.16 fixed point and then use instructions (i.e., C intrinsics) such as Q6_Vw_vmpyio_VwVh and Q6_Vw_vmpye_VwVuh to emulate fixed-point multiplication.

To "enable" HVX, use the HVX-related types and intrinsics defined in:

#include <hexagon_types.h>
#include <hexagon_protos.h>

Instructions like 'vmem' and 'vmemu' are emitted automatically for statements such as:

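// I assume 64-byte mode, no `-mhvx-double`. For 128-byte mode use a 32-int array
// (and aligned(128)). vmem is an aligned load, so the array must be vector-aligned.
int values[16] __attribute__((aligned(64))) = { 1, 2, 3, ..... };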

/* The following line compiles to 
     {
          r4 = __address_of_values
          v1 = vmem(r4 + #0)
     }
   You can get the exact code by using '-S' switch, as you already do
*/
HVX_Vector v = *(HVX_Vector*)values;

Your (fixed-point) version of dot_product may read 16 integers at a time and multiply all 16 in a couple of instructions (the HVX V62 programming manual has a tip on implementing 32-bit integer multiplication with 16-bit multiplies). It can then shuffle/deal/ror the data and sum the rearranged vectors to get the dot products. This way you can calculate 4 dot products almost at once, and if you preload 4 HVX registers (that is, 16 4D vectors) you can calculate 16 dot products in parallel.

If what you are doing is really just byte/int image processing, you might use the dedicated 16-bit and 8-bit hardware dot-product instructions in the Hexagon instruction set instead of emulating doubles and floats.

Viktor Latypov answered Oct 20 '22 23:10