strstr faster than algorithms?

Tags: performance, c, string-matching, algorithm, strstr

I have a file that's 21056 bytes.

I've written a program in C that reads the entire file into a buffer, and then uses multiple search algorithms to search the file for a token that's 82 chars.

I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. I've used: KMP, BM, TBM, and Horspool. And then I used strstr and benchmarked each one.

What I'm wondering about is that strstr outperforms all the other algorithms every time. The only one that is sometimes faster is BM.

Shouldn't strstr be the slowest?

Here's my benchmark code with an example of benchmarking BM:

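#include <windows.h>

/* Returns the current time in seconds from the high-resolution counter. */
double get_time(void)
{
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);   /* counter ticks per second */
    return (double)t.QuadPart / (double)f.QuadPart;
}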

before = get_time();
BM(token, strlen(token), buffer, len);
after = get_time();
printf("Time: %f\n\n", after - before);

Could someone explain to me why strstr is outperforming the other search algorithms? I'll post more code on request if needed.

asked Sep 28 '11 17:09

Josh

1 Answer

Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better, in which case you don't stand a chance of outperforming it in C for such small benchmarks.

(The reason I think this is likely is that programmers love to implement such things.)
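On the "small benchmarks" point: a single search over roughly 21 KB finishes in microseconds, so timing one call mostly measures timer noise and cache warm-up. Below is a minimal sketch of a fairer comparison that averages over many repetitions. The names `search_fn`, `horspool_search`, `strstr_search` and `time_search` are hypothetical and made up for this illustration, and the Horspool routine is a plain-C version written here, not the implementation from the page mentioned in the question. It uses the standard `clock()` so it stays portable; the question's `get_time()` could be swapped in for finer resolution.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature for the searches under test (hypothetical name). */
typedef const char *(*search_fn)(const char *hay, size_t n,
                                 const char *pat, size_t m);

/* Plain-C Horspool search; returns the first match or NULL. */
static const char *horspool_search(const char *hay, size_t n,
                                   const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i;

    if (m == 0) return hay;
    if (m > n) return NULL;

    /* Default shift: the whole pattern length. */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = m;
    /* Characters in the pattern (except its last one) allow shorter shifts. */
    for (i = 0; i + 1 < m; i++) shift[(unsigned char)pat[i]] = m - 1 - i;

    /* Compare at the current window, then shift by the window's last byte. */
    for (i = 0; i + m <= n; i += shift[(unsigned char)hay[i + m - 1]]) {
        if (memcmp(hay + i, pat, m) == 0) return hay + i;
    }
    return NULL;
}

/* Adapter so strstr fits the same signature.
   Assumes hay and pat are NUL-terminated; n and m are ignored. */
static const char *strstr_search(const char *hay, size_t n,
                                 const char *pat, size_t m)
{
    (void)n;
    (void)m;
    return strstr(hay, pat);
}

/* Average seconds per call over `reps` calls of `fn`. */
static double time_search(search_fn fn, const char *hay, size_t n,
                          const char *pat, size_t m, int reps)
{
    /* Volatile sink so the compiler cannot discard the calls. */
    const char *volatile sink = NULL;
    clock_t start = clock();
    int r;

    for (r = 0; r < reps; r++) sink = fn(hay, n, pat, m);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}
```

With the question's variables, something like `time_search(strstr_search, buffer, len, token, strlen(token), 100000)` and the same call with `horspool_search` would give per-call averages that can be compared. The buffer must be NUL-terminated for the `strstr` adapter to be safe. Even with repeated runs, a library `strstr` may still come out ahead, for the reasons given above.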

answered Oct 01 '22 14:10

TonyK
