Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

strstr faster than algorithms?

I have a file that's 21056 bytes.

I've written a program in C that reads the entire file into a buffer, and then uses multiple search algorithms to search the file for a token that's 82 chars.

I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. I've used: KMP, BM, TBM, and Horspool. And then I used strstr and benchmarked each one.

What I'm wondering is, each time the strstr outperforms all the other algorithms. The only one that is faster sometimes is BM.

Shouldn't strstr be the slowest?

Here's my benchmark code with an example of benchmarking BM:

double get_time()
{
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);
    return (double)t.QuadPart/(double)f.QuadPart;
}
before = get_time();
BM(token, strlen(token), buffer, len);
after = get_time();
printf("Time: %f\n\n", after - before);

Could someone explain to me why strstr is outperforming the other search algorithms? I'll post more code on request if needed.

like image 625
Josh Avatar asked Sep 28 '11 17:09

Josh


People also ask

How efficient is Strstr?

strstr is heavily optimized, and usually written in assembly language. It does things like reading data 4 bytes at a time and comparing them (bit-twiddling if necessary if the alignment isn't right) to minimize memory latency. It can also take advantage of things like SSE to load 16 bytes at a time.

What is the time complexity of Strstr?

Following's iterative implementation of the strstr() function. It returns a pointer to the first occurrence of Y in X or a null pointer if Y is not part of X . The time complexity of this solution is O(m.n) where m and n are the length of String X and Y, respectively.

How do you implement Strstr functions?

Syntax: char *strstr (const char *s1, const char *s2); Parameters: s1: This is the main string to be examined. s2: This is the sub-string to be searched in s1 string. Return Value: This function returns a pointer points to the first character of the found s2 in s1 otherwise a null pointer if s2 is not present in s1.


1 Answers

Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.

(The reason I think this is likely is that programmers love to implement such things.)

like image 176
TonyK Avatar answered Oct 01 '22 14:10

TonyK