
How to break an array into blocks

I have an array that represents points in a cuboid. It is a one dimensional array, which uses the following indexing function to realise the 3 dimensions:

int getCellIndex(int ix, int iy, int iz) {
    return ix + (iy * numCellsX) + (iz * numCellsX * numCellsY);
}

The number of cells in the domain is:

numCells = (numX + 2) * (numY + 2) * (numZ + 2)

Where numX/numY/numZ are the number of cells in the X/Y/Z direction. The +2 in each direction creates padding cells around the outside of the domain. The number of cells in each direction is given by:

numX = 5 * numY
numZ = numY/2
numY = userInput
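
To make the layout concrete, here is a minimal sketch of how the padded dimensions and the flat index fit together. The `Grid` struct and the `makeGrid`, `cellIndex` and `totalCells` names are hypothetical and only serve the illustration; the sketch also assumes that `numY/2` is an integer division.

```c
#include <assert.h>

/* Hypothetical container for the grid dimensions (not from the original post). */
typedef struct {
    int numX, numY, numZ;                   /* interior cells per direction */
    int numCellsX, numCellsY, numCellsZ;    /* interior plus 2 padding cells */
} Grid;

/* Derive all dimensions from the single user-supplied value. */
static Grid makeGrid(int userInput) {
    assert(userInput >= 2);   /* otherwise numZ = numY / 2 would be 0 */
    Grid g;
    g.numY = userInput;
    g.numX = 5 * g.numY;
    g.numZ = g.numY / 2;      /* assumed to be integer division */
    g.numCellsX = g.numX + 2;
    g.numCellsY = g.numY + 2;
    g.numCellsZ = g.numZ + 2;
    return g;
}

/* Flat index of cell (ix, iy, iz); x varies fastest, then y, then z. */
static int cellIndex(const Grid *g, int ix, int iy, int iz) {
    return ix + iy * g->numCellsX + iz * g->numCellsX * g->numCellsY;
}

/* Total number of cells, padding included. */
static int totalCells(const Grid *g) {
    return g->numCellsX * g->numCellsY * g->numCellsZ;
}
```

With userInput = 4 this gives a 22 x 6 x 4 array of 528 cells, and the first interior cell (1, 1, 1) sits at flat index 155.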

For each cell, I want to calculate a new value based upon its neighbours' values (i.e. a stencil), where its neighbours are above, below, left, right, front and back. However, I only want to do this calculation for cells that aren't bad. I have a boolean array that tracks whether a cell is bad. This is what the computation currently looks like:

for(int z = 1; z < numZ+1; z++) {
    for(int y = 1; y < numY+1; y++) {
        for(int x = 1; x < numX+1; x++) {
            if(!isBadCell[ getCellIndex(x,y,z) ]) {
                // Do stencil Computation
            }
        }
    }
}

This is not great performance-wise. I want to vectorize the loop to improve performance, but I can't because of the if statement. I know in advance which cells are bad, and this does not change throughout the computation. I'd like to split the domain up into blocks, preferably 4x4x4 blocks, so that I can calculate a priori, per block, whether it contains bad cells. If it does, I process it as usual; if not, I use an optimized function that can take advantage of vectorization, e.g.

for(block : blocks) {
    if(isBadBlock[block]) {
        slowProcessBlock(block) // As above
    } else {
        fastVectorizedProcessBlock(block)
    }
}

NOTE: It is not required for the blocks to physically exist i.e. this could be achieved by changing the indexing function, and using different indexes to loop over the array. I'm open to whatever works best.

The fastVectorizedProcessBlock() function would look similar to the slowProcessBlock() function, but with the if statement removed (since we know the block doesn't contain bad cells), and with a vectorization pragma.

How can I split up my domain into blocks so that I can accomplish this? It seems tricky because a) the number of cells is not the same in each direction, and b) we need to take the padding cells into account: we must never attempt to calculate their values, as this would lead to an out-of-bounds memory access.

How can I then process the blocks that don't contain bad cells without using an if statement?

EDIT:

This is the idea I originally had:

for(int i = 0; i < numBlocks; i++) { // use blocks of 4x4x4 = 64
    if(!isBadBlock[i]) {
        // vectorization pragma here
        for(int z = 0; z < 4; z++) {
            for(int y = 0; y < 4; y++) {
                for(int x = 0; x < 4; x++) {
                    // calculate stencil using getCellIndex(x,y,z)*i
                }
            }
        }
    } else {
        for(int z = 0; z < 4; z++) {
            for(int y = 0; y < 4; y++) {
                for(int x = 0; x < 4; x++) {
                    if(!isBadCell[i*getCellIndex(x,y,z)]) {
                        // calculate stencil using getCellIndex(x,y,z)*i
                    }
                }
            }
        }
    }
}

Cells would now be stored in blocks, i.e. all the cells in the first 4x4x4 block would be stored in pos 0-63, then all cells in the second block would be stored in pos 64-127 etc.

However, I don't think this will work if the numX/numY/numZ values are not convenient. For example, what if numY = 2, numZ = 1 and numX = 10? The for loops would expect the z direction to be at least 4 cells deep. Is there a good way to get past this?

UPDATE 2 - Here is what the stencil computation looks like:

if ( !isBadCell[ getCellIndex(x,y,z) ] ) {
  double temp = someOtherArray[ getCellIndex(x,y,z) ] +
                    1.0/CONSTANT/CONSTANT*
                    (
                      - 1.0 * cells[ getCellIndex(x-1,y,z) ]
                      - 1.0 * cells[ getCellIndex(x+1,y,z) ]
                      - 1.0 * cells[ getCellIndex(x,y-1,z) ]
                      - 1.0 * cells[ getCellIndex(x,y+1,z) ]
                      - 1.0 * cells[ getCellIndex(x,y,z-1) ]
                      - 1.0 * cells[ getCellIndex(x,y,z+1) ]
                      + 6.0 * cells[ getCellIndex(x,y,z) ]
                      );
  globalTemp += temp * temp;
  cells[ getCellIndex(x,y,z) ] += -omega * temp / 6.0 * CONSTANT * CONSTANT;
}
asked Jan 27 '17 03:01 by JC2188



1 Answer

Where does getCellIndex() retrieve the values of numCellsX and numCellsY? It would be better to pass them as arguments instead of relying on global variables, and to make this function static inline so that the compiler can optimize it.

static inline int getCellIndex(int ix, int iy, int iz, int numCellsX, int numCellsY) {
    return ix + (iy * numCellsX) + (iz * numCellsX * numCellsY);
}

for (int z = 1; z <= numZ; z++) {
    for (int y = 1; y <= numY; y++) {
        for (int x = 1; x <= numX; x++) {
            if (!isBadCell[getCellIndex(x, y, z, numX + 2, numY + 2)]) {
                // Do stencil Computation
            }
        }
    }
}

You could also remove all multiplications with some local variables:

int index = (numY + 2) * (numX + 2);  // skip top padding plane
for (int z = 1; z <= numZ; z++) {
    index += numX + 2;  // skip first padding row
    for (int y = 1; y <= numY; y++) {
        index += 1;   // skip first padding col
        for (int x = 1; x <= numX; x++, index++) {
            if (!isBadCell[index]) {
                // Do stencil Computation
            }
        }
        index += 1;   // skip last padding col
    }
    index += numX + 2;   // skip last padding row
}
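
As a sanity check on the bookkeeping above, the following sketch walks the interior with the running `index` and compares it against the multiplicative formula. The function name `countIndexMismatches` is hypothetical; it should return 0 whenever the increments agree with `getCellIndex()`.

```c
/* Returns the number of interior cells where the running index
   disagrees with x + y * numCellsX + z * numCellsX * numCellsY. */
static int countIndexMismatches(int numX, int numY, int numZ) {
    const int numCellsX = numX + 2;
    const int numCellsY = numY + 2;
    int mismatches = 0;
    int index = numCellsY * numCellsX;          /* skip the z = 0 padding plane */
    for (int z = 1; z <= numZ; z++) {
        index += numCellsX;                     /* skip the y = 0 padding row */
        for (int y = 1; y <= numY; y++) {
            index += 1;                         /* skip the x = 0 padding column */
            for (int x = 1; x <= numX; x++, index++) {
                int expected = x + y * numCellsX + z * numCellsX * numCellsY;
                if (index != expected)
                    mismatches++;
            }
            index += 1;                         /* skip the last padding column */
        }
        index += numCellsX;                     /* skip the last padding row */
    }
    return mismatches;
}
```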

Whether these directions are promising or not depends a lot on the actual computations performed to get the stencil value. You should post that too.

If you can change the format of the boolean array for bad cells, it would be useful to pad the lines to a multiple of 8 and to use horizontal padding of 8 columns to improve alignment. Making the boolean array an array of bits lets you check 8, 16, 32 or even 64 cells at a time with a single test.

You can adjust the array pointer to use 0 based coordinates.

Here is how it would work:

int numCellsX = 8 + ((numX + 7) & ~7) + 8;
int numCellsY = 1 + numY + 1;
int numCellsXY = numCellsX * numCellsY;
// adjusted array_pointer
array_pointer = allocated_pointer + 8 + numCellsX + numCellsXY;
// assuming the isBadCell array is 0 based too.
for (int z = 0, indexZ = 0; z < numZ; z++, indexZ += numCellsXY) {
    for (int y = 0, indexY = indexZ; y < numY; y++, indexY += numCellsX) {
        // x and index are declared here because the tail loop below needs them
        int x = 0, index = indexY;
        for (; x <= numX - 8; x += 8, index += 8) {
            int mask = isBadCell[index >> 3];
            if (mask == 0) {
                // let the compiler unroll the computation for 8 cells
                for (int i = 0; i < 8; i++) {
                   // compute stencil value for x+i,y,z at index+i
                }
            } else {
                for (int i = 0; i < 8; i++, mask >>= 1) {
                    if (!(mask & 1)) {
                       // compute stencil value for x+i,y,z at index+i
                    }
                }
            }
        }
        // handle the remaining cells of the row (fewer than 8)
        int mask = isBadCell[index >> 3];
        for (; x < numX; x++, index++, mask >>= 1) {
            if (!(mask & 1)) {
                // compute stencil value for x,y,z at index
            }
        }
    }
}
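
The loop above assumes that isBadCell stores one bit per cell, with bit `i & 7` of byte `i >> 3` describing cell `i`. A minimal sketch of building such a bitmap from a byte-per-cell array follows. `packBadCells` and `isBadBit` are hypothetical helpers, and the least-significant-bit-first order is an assumption chosen to match the `mask >>= 1` loop.

```c
#include <stddef.h>
#include <string.h>

/* Pack a byte-per-cell flag array into one bit per cell.
   `packed` must hold at least (count + 7) / 8 bytes. */
static void packBadCells(const unsigned char *isBad, size_t count,
                         unsigned char *packed) {
    memset(packed, 0, (count + 7) / 8);
    for (size_t i = 0; i < count; i++) {
        if (isBad[i])
            packed[i >> 3] |= (unsigned char)(1u << (i & 7));
    }
}

/* Test a single cell in the packed array. */
static int isBadBit(const unsigned char *packed, size_t i) {
    return (packed[i >> 3] >> (i & 7)) & 1;
}
```

A packed byte equal to 0 then means that all 8 cells it covers are good, which is the `mask == 0` fast path.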

EDIT:

The stencil function uses too many calls to getCellIndex. Here is how to optimize it using the index value computed in the above code:

// index is the offset of cell x,y,z
// numCellsX, numCellsY are the dimensions of the plane
// numCellsXY is the offset between planes: numCellsX * numCellsY

if (!isBadCell[index]) {
    double temp = someOtherArray[index] +
                1.0 / CONSTANT / CONSTANT *
                ( - 1.0 * cells[index - 1]
                  - 1.0 * cells[index + 1]
                  - 1.0 * cells[index - numCellsX]
                  - 1.0 * cells[index + numCellsX]
                  - 1.0 * cells[index - numCellsXY]
                  - 1.0 * cells[index + numCellsXY]
                  + 6.0 * cells[index]
                );
    cells[index] += -omega * temp / 6.0 * CONSTANT * CONSTANT;
    globalTemp += temp * temp;
}

Precomputing &cells[index] as a pointer might improve the code, but the compiler should already be able to detect this common subexpression and generate efficient code.
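
For reference, the update can be wrapped in a small function that takes a pointer to the current cell. This is only a sketch: `relaxCell` and its parameter list are assumptions, `other` stands for `someOtherArray[index]`, `constant` stands for `CONSTANT`, and the caller is assumed to have already checked that the cell is neither bad nor a padding cell.

```c
#include <stddef.h>

/* Apply one stencil update to the cell at *c and return the value `temp`.
   strideY is numCellsX, strideZ is numCellsX * numCellsY. */
static double relaxCell(double *c, double other,
                        ptrdiff_t strideY, ptrdiff_t strideZ,
                        double constant, double omega) {
    double temp = other + 1.0 / constant / constant *
                  ( - c[-1]       - c[+1]
                    - c[-strideY] - c[+strideY]
                    - c[-strideZ] - c[+strideZ]
                    + 6.0 * c[0] );
    c[0] += -omega * temp / 6.0 * constant * constant;
    return temp;
}
```

The caller would then accumulate `globalTemp += temp * temp` with the returned value, for example `double temp = relaxCell(&cells[index], someOtherArray[index], numCellsX, numCellsXY, CONSTANT, omega);`.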

EDIT2:

Here is a tiled approach: you can add the missing arguments, most sizes are assumed to be global but you should probably pass a pointer to a context structure with all these values. It uses isBadTile[] and isGoodTile[]: arrays of boolean telling if a given tile has all cells bad and all cells good respectively.

void handle_tile(int x, int y, int z, int nx, int ny, int nz) {
    int index0 = x + y * numCellsX + z * numCellsXY;
    // skipping a tile with all cells bad.
    if (isBadTile[index0] && nx == 4 && ny == 4 && nz == 4)
        return;
    // handling a 4x4x4 tile with all cells OK.
    if (isGoodTile[index0] && nx == 4 && ny == 4 && nz == 4) {
        for (int iz = 0; iz < 4; iz++) {
            for (int iy = 0; iy < 4; iy++) {
                for (int ix = 0; ix < 4; ix++) {
                    int index = index0 + ix + iy * numCellsX + iz * numCellsXY;
                    // Do stencil computation using `index`
                }
            }
        }
    } else {
        // partial or mixed tile: test every cell
        for (int iz = 0; iz < nz; iz++) {
            for (int iy = 0; iy < ny; iy++) {
                for (int ix = 0; ix < nx; ix++) {
                    int index = index0 + ix + iy * numCellsX + iz * numCellsXY;
                    if (!isBadCell[index]) {
                        // Do stencil computation using `index`
                    }
                }
            }
        }
    }
}

void handle_cells() {
    int x, y, z;
    for (z = 1; z <= numZ; z += 4) {
        int nz = min(numZ + 1 - z, 4);
        for (y = 1; y <= numY; y += 4) {
            int ny = min(numY + 1 - y, 4);
            for (x = 1; x <= numX; x += 4) {
                int nx = min(numX + 1 - x, 4);
                handle_tile(x, y, z, nx, ny, nz);
            }
        }
    }
}
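
The clamping in handle_cells() is what copes with dimensions that are not multiples of 4, such as the numX = 10, numY = 2, numZ = 1 case from the question. The sketch below only counts tiles and the cells they cover. `TileStats` and `countTiles` are hypothetical names, and `minInt` is defined locally because standard C has no `min()` function for integers.

```c
typedef struct {
    int fullTiles;      /* tiles that are exactly 4x4x4 */
    int partialTiles;   /* tiles clipped by the domain boundary */
    long cellsCovered;  /* interior cells visited in total */
} TileStats;

static int minInt(int a, int b) { return a < b ? a : b; }

/* Walk the interior [1..numX] x [1..numY] x [1..numZ] in steps of 4,
   clipping each tile so that it never reaches the padding cells. */
static TileStats countTiles(int numX, int numY, int numZ) {
    TileStats s = {0, 0, 0};
    for (int z = 1; z <= numZ; z += 4) {
        int nz = minInt(numZ + 1 - z, 4);
        for (int y = 1; y <= numY; y += 4) {
            int ny = minInt(numY + 1 - y, 4);
            for (int x = 1; x <= numX; x += 4) {
                int nx = minInt(numX + 1 - x, 4);
                if (nx == 4 && ny == 4 && nz == 4)
                    s.fullTiles++;
                else
                    s.partialTiles++;
                s.cellsCovered += (long)nx * ny * nz;
            }
        }
    }
    return s;
}
```

For numX = 10, numY = 2, numZ = 1 this visits three clipped tiles (4x2x1, 4x2x1 and 2x2x1) covering all 20 interior cells, so no tile qualifies for the 4x4x4 fast path in that configuration.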

Here are functions to compute the isGoodTile[] and isBadTile[] arrays. The only entries computed correctly are those where x is a multiple of 4 plus 1, and where y and z are at least 3 below their maximum values.

This implementation is sub-optimal, as it computes more elements than necessary. Incomplete border tiles (less than 4 cells from the edge) could be flagged as not good, so that the fast path is rejected with a single test. The test for bad tiles could also work for these edge tiles if the isBadTile array were computed properly for them, which is currently not the case.

void computeGoodTiles() {
    int start = 1 + numCellsX + numCellsXY;
    int stop = numCellsXY * numCellsZ - 1 - numCellsX - numCellsXY;

    memset(isGoodTile, 0, sizeof(*isGoodTile) * numCellsXY * numCellsZ);
    for (int i = start; i < stop; i += 4) {
        isGoodTile[i] = (isBadCell[i + 0] | isBadCell[i + 1] |
                         isBadCell[i + 2] | isBadCell[i + 3]) ^ 1;
    }
    for (int i = start; i < stop - 3 * numCellsX; i += 4) {
        isGoodTile[i] = isGoodTile[i + 0 * numCellsX] &
                        isGoodTile[i + 1 * numCellsX] &
                        isGoodTile[i + 2 * numCellsX] &
                        isGoodTile[i + 3 * numCellsX];
    }
    for (int i = start; i < stop - 3 * numCellsXY; i += 4) {
        isGoodTile[i] = isGoodTile[i + 0 * numCellsXY] &
                        isGoodTile[i + 1 * numCellsXY] &
                        isGoodTile[i + 2 * numCellsXY] &
                        isGoodTile[i + 3 * numCellsXY];
    }
}

void computeBadTiles() {
    int start = 1 + numCellsX + numCellsXY;
    int stop = numCellsXY * numCellsZ - 1 - numCellsX - numCellsXY;

    memset(isBadTile, 0, sizeof(*isBadTile) * numCellsXY * numCellsZ);
    for (int i = start; i < stop; i += 4) {
        isBadTile[i] = isBadCell[i + 0] & isBadCell[i + 1] &
                       isBadCell[i + 2] & isBadCell[i + 3];
    }
    for (int i = start; i < stop - 3 * numCellsX; i += 4) {
        isBadTile[i] = isBadTile[i + 0 * numCellsX] &
                       isBadTile[i + 1 * numCellsX] &
                       isBadTile[i + 2 * numCellsX] &
                       isBadTile[i + 3 * numCellsX];
    }
    for (int i = start; i < stop - 3 * numCellsXY; i += 4) {
        isBadTile[i] = isBadTile[i + 0 * numCellsXY] &
                       isBadTile[i + 1 * numCellsXY] &
                       isBadTile[i + 2 * numCellsXY] &
                       isBadTile[i + 3 * numCellsXY];
    }
}
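
A simpler, if slower, alternative sketch is to classify each tile directly by scanning its cells, which also handles the incomplete border tiles mentioned above. `classifyTile` and the `TileKind` enum are hypothetical; isBadCell is assumed here to hold one byte per cell in the padded layout.

```c
typedef enum { TILE_MIXED = 0, TILE_ALL_GOOD = 1, TILE_ALL_BAD = 2 } TileKind;

/* Scan the nx * ny * nz cells of the tile whose first cell is (x, y, z). */
static TileKind classifyTile(const unsigned char *isBadCell,
                             int x, int y, int z, int nx, int ny, int nz,
                             int numCellsX, int numCellsXY) {
    int bad = 0;
    int total = nx * ny * nz;
    int index0 = x + y * numCellsX + z * numCellsXY;
    for (int iz = 0; iz < nz; iz++)
        for (int iy = 0; iy < ny; iy++)
            for (int ix = 0; ix < nx; ix++)
                bad += isBadCell[index0 + ix + iy * numCellsX + iz * numCellsXY] != 0;
    if (bad == 0)
        return TILE_ALL_GOOD;
    if (bad == total)
        return TILE_ALL_BAD;
    return TILE_MIXED;
}
```

Since the bad-cell flags do not change during the computation, this classification only needs to run once, before the first sweep.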
answered Oct 23 '22 19:10 by chqrlie