
What's the fastest way to extract non-zero indices from a byte array in C++?

Tags: c++, algorithm

I have a byte array

unsigned char* array=new unsigned char[4000000];
 ...

And I would like to get the indices of all non-zero elements of the array.

Of course, I can do the following:

for(int i=0;i<size;i++)
{
    if(array[i]!=0) somevector.push_back(i);
}

Is there any faster algorithm than this?

Update 1: I can see the majority answer is no. I had hoped there were some magical bit operations I was not aware of. Some people suggested sorting, but that is not feasible in this case. Thanks a lot for all your answers.

Update 2: Four years and four months after this question was posted, @wim suggested an answer that looks promising.

Tae-Sung Shin asked Sep 22 '12 16:09


3 Answers

Unless your array is ordered, this is the most efficient algorithm for what you want to do in a single-threaded program. You can try to optimize the data structure in which you store the result, but in terms of running time this is the best you can do.
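To illustrate the remark about the result container, here is a minimal sketch, assuming the output is a std::vector. The helper name nonzero_indices is made up for this example and is not from the original answer. It counts the non-zero bytes first so that the vector can be reserved once and push_back never reallocates. This reads the array twice, so whether it pays off is something to measure rather than assume.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the same linear scan as in the question, but the
// output vector is sized up front so push_back never has to reallocate.
std::vector<std::size_t> nonzero_indices(const unsigned char* data, std::size_t size)
{
    // First pass: count the non-zero bytes to learn the exact output size.
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
        count += (data[i] != 0);

    std::vector<std::size_t> indices;
    indices.reserve(count);

    // Second pass: record the positions of the non-zero bytes.
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] != 0)
            indices.push_back(i);
    return indices;
}
```

If an upper bound on the number of non-zero bytes is known in advance, reserving that bound directly would avoid the counting pass.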

Hernan Velasquez answered Sep 28 '22 06:09


With a byte array that is mostly zero (a sparse array), you can take advantage of a 32-bit CPU by comparing 4 bytes at a time. If any of the four bytes is non-zero, however, you still have to determine which ones, and that takes extra work. If the array is really sparse, the time saved on the comparisons may compensate for the additional work of locating the non-zero bytes.

The easiest approach is to make the size of the unsigned char array a multiple of 4 bytes, so that you do not need to handle the last few bytes after the loop completes.
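One way to get such a size is to round the allocation up and leave the padding zeroed. This is only a sketch; round_up_to_4 and make_padded_buffer are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round `size` up to the next multiple of 4.
constexpr std::size_t round_up_to_4(std::size_t size)
{
    return (size + 3) & ~static_cast<std::size_t>(3);
}

// Allocate a zero-filled buffer whose length is a multiple of 4.
// The padding bytes stay zero, so a 4-bytes-at-a-time scan over the
// whole buffer never reports an index past the logical size.
std::vector<unsigned char> make_padded_buffer(std::size_t logical_size)
{
    return std::vector<unsigned char>(round_up_to_4(logical_size), 0);
}
```

For the 4000000-byte array in the question no padding is needed, since that size is already a multiple of 4.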

I would suggest doing a timing study, since this is purely conjectural. At some point the array becomes dense enough that this approach takes more time than the simple loop.
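A timing study along those lines could be set up roughly as follows. This is a sketch only: make_test_array, scan_simple and time_scan_us are hypothetical names, and the stride parameter is just one simple way to vary how sparse the test data is. Any alternative scan with the same signature can be passed to time_scan_us and compared against the baseline.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: a zero-filled buffer with every `stride`-th byte set
// to 1, so the density of non-zero values can be varied between runs.
std::vector<unsigned char> make_test_array(std::size_t size, std::size_t stride)
{
    assert(stride > 0);  // a stride of 0 would never advance
    std::vector<unsigned char> data(size, 0);
    for (std::size_t i = 0; i < size; i += stride)
        data[i] = 1;
    return data;
}

// The simple loop from the question, used as the baseline.
std::vector<std::size_t> scan_simple(const std::vector<unsigned char>& data)
{
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] != 0)
            out.push_back(i);
    return out;
}

// Run `scan` once on `data` and return the elapsed time in microseconds.
// The result is stored in `out` so the work cannot be optimized away.
template <typename Scan>
long long time_scan_us(Scan scan, const std::vector<unsigned char>& data,
                       std::vector<std::size_t>& out)
{
    auto start = std::chrono::steady_clock::now();
    out = scan(data);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A single run is noisy, so in practice each variant would be timed several times at several densities, in an optimized build, before drawing conclusions.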

One question I have is what you are doing with the vector of offsets of the non-zero elements, and whether you can do away with it. Another is whether, if you do need the vector, you can build it as you place elements into the array.
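Building the vector while filling the array might look something like the sketch below. TrackedBytes is a hypothetical wrapper, not something from the question, and it assumes that a byte is never reset to zero once it has been set; supporting that would need extra bookkeeping.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: records an index the first time that position
// receives a non-zero value, so no scan is needed afterwards.
// Assumption: a byte is never written back to zero after being set.
class TrackedBytes {
public:
    explicit TrackedBytes(std::size_t size) : bytes_(size, 0) {}

    void set(std::size_t index, unsigned char value)
    {
        if (value != 0 && bytes_[index] == 0)
            nonzero_.push_back(index);  // first non-zero write at this index
        bytes_[index] = value;
    }

    unsigned char get(std::size_t index) const { return bytes_[index]; }

    // Indices in the order they were first written, not sorted.
    const std::vector<std::size_t>& nonzero_indices() const { return nonzero_; }

private:
    std::vector<unsigned char> bytes_;
    std::vector<std::size_t> nonzero_;
};
```

The indices come out in write order, so if sorted output is required they would still need a sort at the end.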

#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy

unsigned char* array=new unsigned char[4000000];
......
// The array size is assumed to be a multiple of 4.
const unsigned char *pEnd = array + 4000000;

for (const unsigned char *pByte = array; pByte < pEnd; pByte += 4) {
    std::uint32_t word;
    std::memcpy(&word, pByte, sizeof word);  // load 4 bytes without an aliasing cast
    if (word) {
        // at least one byte is non-zero
        if (pByte[0])
            somevector.push_back(pByte - array);
        if (pByte[1])
            somevector.push_back(pByte - array + 1);
        if (pByte[2])
            somevector.push_back(pByte - array + 2);
        if (pByte[3])
            somevector.push_back(pByte - array + 3);
    }
}

Richard Chambers answered Sep 28 '22 06:09


If the non-zero values are relatively rare, one trick you can use is a sentinel value:

unsigned char old_value = array[size-1];
array[size-1] = 1; // make sure we find a non-zero eventually

int i=0;

for (;;) {
  while (array[i]==0) ++i; // tighter loop
  if (i==size-1) break;
  somevector.push_back(i);
  ++i;
}

array[size-1] = old_value;
if (old_value!=0) {
  somevector.push_back(size-1);
}

This avoids having to check both the index and the value on each iteration.

Vaughn Cato answered Sep 28 '22 05:09