Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ What is the fastest way to scan for certain elements within a unsigned char array and a unsigned char vector?

Tags:

c++

I have a small question, what is the FASTEST way to scan for certain elements within a LARGE unsigned char array and a vector that contains only unsigned char elements? Straight answer would be great, but in-depth detailed answer would be awesome. What do I mean by fast? Basically, to search for certain characters within at least a second. I know that wasn't a very educated definition...

Note: The array is not sorted.

Common Declaration:

unsigned char* Array = new unsigned char[ 50000 ];
std::vector< unsigned char > Vec( 50000 );
/*
 * Fill Array & Vec with random bytes
 */

Lets say, I want to search for the letter 'a' in Array, I would simply write this loop to search for it:

Note: The searching process will be search for more than one elements. Mainly, 256. Hence, you can exploit that magic number.

For loop method:

unsigned int Count = 0;
for ( unsigned int Index = 0; Index != 50000; ++ Index )
   if( Array[ Index ] == 'a' ) Count ++;

std::count method:

unsigned int Count = std::count ( Array, Array + 50000, 'a' );

Are there any faster way to search for certain elements within Array?

Some IDEAS - Please don't give me a thumbs down for this! Its only an idea. I want some opinions.

Sorting

Would the speed be better if we made a copy of Array and sort it? Why make a copy? Well, because we need to keep the original content. The goal is to basically scan and count the occurrence of a character. Remember, speed matter. That means, the copying process must be fast.

Answer: No and its not worth it!

Why? Well, lets read this:

@Kiril Kirov:

Depends. If you plan to search for a single char - absolutely not. Copying the array is an expensive operation. Sorting it - even more expensive.

Well, if you will have only one array and you plan to search for, let's say, 100 different characters, then this method could give you a better performance. Now, this really depends on your usage. And nobody will be able to give you the absolutely correct answer for this case. You need to run it and profile.

*Scroll down to @Kiril Krov's informative post for more.

Answer: So far, there isn't a solid or an answer, because there isn't a really "fast" method to achieve this goal, especially when its not SORTED. However, threads could be a possible solution. But, watch out for our CPU! This was based on @Andrea's submitted answer (scroll down a little more for more info) - I hoped I read it right.

like image 538
CLearner Avatar asked Mar 21 '13 07:03

CLearner


2 Answers

As others wrote, the complexity of the best algorithm is O(n), especially since your array is not sorted.

To make the search faster, you could subdivide the array and scan each portion separately in separate threads. This would scale linearly with the number of CPU cores you have available on your machine.

If, for example, you have four cores available, then spawn four threads and let each thread scan one fourth of the array.

Probably this discussion might help: Using threads to reduce array search time


In any case (and this is true for any performance related issues), you should profile your code. Create a test case for the approach you have, measure the time it takes and take this as a baseline. Then, for each modification you do, redo the measurement to check if it really improves the execution time. Also make sure to do each measurement more than once (within the same test case) and calculate the average, to reduce caching and other warming up effects (ideally, execute the code at least once before starting the first measurement).

This is Java related, but gives some good feedback that it does not in all cases make sense to parallelize: A Beginner´s Guide to Hardcore Concurrency

like image 140
Andreas Fester Avatar answered Sep 29 '22 09:09

Andreas Fester


The best algorithm would be O(n), where n is the number of elements.

As you need to check each element, you must go through the whole array.

The easies way I can think of, is already written in your own answer.

And there's no faster way to do this - the memory is continuous, the array is not sorted, you need to "touch" each element. That's the fastest possible solution.


Regarding your edit: using std::count and "manually" looping through the array will give you the same performance.


Are there any faster way to search for certain elements within Array

Yes, if the array is sorted. Then you can achieve up to O( log(n) ). Then you would need some existing search algorithm, like binary search, for example.


Would the speed be better if we made a copy of Array and sort it

Depends. If you plan to search for a single char - absolutely not. Copying the array is an expensive operation. Sorting it - even more expensive.

Well, if you will have only one array and you plan to search for, let's say, 100 different characters, then this method could give you a better performance. Now, this really depends on your usage. And nobody will be able to give you the absolutely correct answer for this case. You need to run it and profile.

like image 39
Kiril Kirov Avatar answered Sep 29 '22 08:09

Kiril Kirov