I have a byte array
unsigned char* array=new unsigned char[4000000];
...
And I would like to get indices of all non-zero elements of the array.
Of course, I can do the following:
for (int i = 0; i < size; i++)
{
    if (array[i] != 0) somevector.push_back(i);
}
Is there any faster algorithm than this?
Update 1: I can see that the majority answer is no. I had hoped there were some magical bit operations I was not aware of. Some people suggested sorting, but that is not feasible in this case. Thanks a lot for all your answers.
Update 2: Four years and four months after this question was posted, @wim suggested this answer, which looks promising.
Unless your array is sorted, a linear scan like yours is the most efficient algorithm for this in a single-threaded program. You can try to optimize the data structure that stores the result, but in terms of running time this is the best you can do.
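As a rough illustration of what optimizing the result storage could look like, here is a minimal sketch that preallocates the output and writes indices without a branch in the loop body. The function name `nonzero_indices` and the choice of `std::uint32_t` indices are assumptions made for this example, not something from the question, and whether it beats the plain `push_back` loop depends on the data and the compiler, so measure it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: collects the indices of all non-zero bytes.
// Assumes size fits in 32 bits (true for the 4,000,000-byte array here).
std::vector<std::uint32_t> nonzero_indices(const unsigned char* data, std::size_t size) {
    // Allocate for the worst case (every byte non-zero) so no reallocation
    // happens inside the loop. This trades memory for a simpler loop body.
    std::vector<std::uint32_t> out(size);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        out[count] = static_cast<std::uint32_t>(i);  // always write the candidate index
        count += (data[i] != 0);                     // keep it only if the byte is non-zero
    }
    out.resize(count);  // drop the unused tail
    return out;
}
```

The worst-case allocation costs four bytes per input byte, which may or may not be acceptable for your use case.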
With a byte array that is mostly zero, that is, a sparse array, you can take advantage of a 32-bit CPU by doing the comparisons 4 bytes at a time. However, if a 4-byte word is non-zero, you then have to determine which of its bytes are non-zero, and that takes extra effort. If the array is really sparse, the time saved on the comparisons may compensate for the additional work of locating the non-zero bytes.
The easiest approach is to size the unsigned char array to a multiple of 4 bytes, so that you do not need to handle the last few bytes separately after the loop completes.
I would suggest doing a timing study on this, as it is purely conjectural; at some point an array becomes dense enough that this takes more time than a simple loop.
One question I would ask is what you are doing with the vector of offsets of non-zero elements, and whether you can do away with the vector. Another is whether, if you do need the vector, you can build it as you place elements into the array.
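For the timing study, a small helper along these lines may be enough to compare variants. This is only a sketch: `time_ms` is a hypothetical name introduced here, and a serious measurement would repeat each run several times and vary the density of non-zero bytes.

```cpp
#include <chrono>

// Hypothetical helper: runs f once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would wrap each candidate loop in a lambda, pass it to `time_ms`, and make sure the resulting vector is actually used afterwards so that the compiler cannot optimize the scan away.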
unsigned char* array=new unsigned char[4000000];
......
// Needs <cstdint> and <cstring>. std::uint32_t is exactly 4 bytes; unsigned long
// is 8 bytes on many 64-bit platforms, which would skip half of the bytes.
for (unsigned char *pByte = array; pByte < array + 4000000; pByte += 4) {
    std::uint32_t word;
    std::memcpy(&word, pByte, sizeof word);  // safe 4-byte load, no aliasing issues
    if (word) {
        // at least one byte is non-zero
        if (*pByte)
            somevector.push_back(pByte - array);
        if (*(pByte+1))
            somevector.push_back(pByte - array + 1);
        if (*(pByte+2))
            somevector.push_back(pByte - array + 2);
        if (*(pByte+3))
            somevector.push_back(pByte - array + 3);
    }
}
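The same idea can be packaged as a self-contained function that also copes with a size that is not a multiple of the word width. The sketch below is a variation rather than the code above: it assumes 8-byte words (`std::uint64_t`) instead of 4-byte ones, and the name `nonzero_indices_wordwise` is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: tests 8 bytes at a time and inspects the individual
// bytes only when the whole word is non-zero.
std::vector<std::size_t> nonzero_indices_wordwise(const unsigned char* data, std::size_t size) {
    std::vector<std::size_t> out;
    std::size_t i = 0;
    for (; i + 8 <= size; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof word);  // alignment-safe load
        if (word == 0) continue;                    // all 8 bytes are zero, skip them
        for (std::size_t j = 0; j < 8; ++j)
            if (data[i + j] != 0) out.push_back(i + j);
    }
    // Handle the last few bytes when size is not a multiple of 8.
    for (; i < size; ++i)
        if (data[i] != 0) out.push_back(i);
    return out;
}
```

As with the 4-byte version, this only pays off when long runs of zeros are common.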
If the non-zero values are relatively rare, one trick you can use is a sentinel value:
// assumes size > 0 and that the array may be modified temporarily
unsigned char old_value = array[size-1];
array[size-1] = 1; // make sure we find a non-zero eventually
int i=0;
for (;;) {
    while (array[i]==0) ++i; // tighter loop: no bounds check needed
    if (i==size-1) break;    // reached the sentinel
    somevector.push_back(i);
    ++i;
}
array[size-1] = old_value;   // restore the original last byte
if (old_value!=0) {
    somevector.push_back(size-1);
}
This avoids having to check both the index and the value on each iteration.
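Wrapped up as a function, the sentinel approach could look like the sketch below. The function name `nonzero_indices_sentinel` and the early return for an empty array are additions for this example. Note that the buffer must be writable, and that it is briefly modified during the scan, so this is not safe if another thread reads the array at the same time.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sentinel-based scan over a writable buffer.
std::vector<std::size_t> nonzero_indices_sentinel(unsigned char* data, std::size_t size) {
    std::vector<std::size_t> out;
    if (size == 0) return out;             // nothing to scan, and no slot for a sentinel

    const std::size_t last = size - 1;
    const unsigned char old_value = data[last];
    data[last] = 1;                        // sentinel: guarantees the inner loop stops

    std::size_t i = 0;
    for (;;) {
        while (data[i] == 0) ++i;          // inner loop tests only the value
        if (i == last) break;              // stopped on the sentinel
        out.push_back(i);
        ++i;
    }

    data[last] = old_value;                // restore the original last byte
    if (old_value != 0) out.push_back(last);
    return out;
}
```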