Optimized version of strstr (search has constant length)

Tags: c, strstr

My C program makes a lot of strstr calls. The standard library strstr is already fast, but in my case the search string always has a length of 5 characters. I replaced it with a special version to gain some speed:

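int strstr5(const char *cs, const char *ct)
{
    /* Loop while at least 5 characters remain from cs onward.
       cs[4] is read before cs[0..3] are checked, so cs must have at least 4 characters. */
    while (cs[4]) {

        /* Compare all 5 characters of ct at the current position. */
        if (cs[0] == ct[0] && cs[1] == ct[1] && cs[2] == ct[2] && cs[3] == ct[3] && cs[4] == ct[4])
            return 1;

        cs++;
    }

    return 0;
}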

The function returns an int because it is enough to know whether ct occurs in cs. My function is simple and, in this special case, faster than the standard strstr, but I'd like to hear whether anybody has performance improvements that could be applied. Even small improvements are welcome.

Summary:

- cs has a length of at least 10 but otherwise varies; it is usually 100 to 200 characters. The length is known in advance (but not used in my function).
- ct always has a length of 5.
- The strings can contain anything.
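Since the length of cs is known in advance, one possible variant passes it in and lets the library's memchr jump straight to each candidate first character, confirming the remaining 4 characters with memcmp. This is a sketch under the assumptions that len equals strlen(cs) and that ct holds 5 characters; the name strstr5_len is hypothetical, and whether it beats the plain loop depends on how fast the platform's memchr is, so it would need measuring.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of strstr5 that uses the already-known length of cs.
 * Assumes len == strlen(cs) and that ct points to at least 5 characters. */
int strstr5_len(const char *cs, size_t len, const char *ct)
{
    if (len < 5)
        return 0;                        /* too short to contain ct */

    const char *p = cs;
    const char *end = cs + len - 4;      /* one past the last possible start */

    /* memchr jumps to the next occurrence of ct's first character. */
    while ((p = memchr(p, ct[0], (size_t)(end - p))) != NULL) {
        /* First character already matches; check the other 4. */
        if (memcmp(p + 1, ct + 1, 4) == 0)
            return 1;
        p++;
    }
    return 0;
}
```

No terminator test is needed inside the loop, because the bound comes from len.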

Edit: Thank you for all the answers and comments. I have to study and test the ideas to see what works best. I will start with MAK's suffix trie idea.


asked Jun 27 '10 19:06

armakuni

2 Answers

There are several fast string search algorithms. Try looking at the Boyer-Moore (as already suggested by Greg Hewgill), Rabin-Karp, and KMP (Knuth-Morris-Pratt) algorithms.
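As one illustration of the algorithms named above, here is a minimal Rabin-Karp sketch specialised to a 5-character pattern. The function name strstr5_rk, the extra len parameter (the question says the length of cs is known), and the hash base 257 are assumptions for illustration. Equal hashes are confirmed with memcmp, so hash collisions cost time but never produce a wrong answer.

```c
#include <stddef.h>
#include <string.h>

/* Rabin-Karp sketch for a 5-character pattern (hypothetical name).
 * Rolling polynomial hash; unsigned arithmetic simply wraps around,
 * which acts as an implicit modulus. */
int strstr5_rk(const char *cs, size_t len, const char *ct)
{
    enum { M = 5 };
    const unsigned B = 257u;            /* arbitrary base for illustration */
    if (len < M)
        return 0;

    unsigned hp = 0, ht = 0, bpow = 1;  /* bpow ends up as B^(M-1) */
    for (size_t i = 0; i < M; i++) {
        hp = hp * B + (unsigned char)ct[i];
        ht = ht * B + (unsigned char)cs[i];
        if (i > 0)
            bpow *= B;
    }

    for (size_t i = 0; ; i++) {
        /* Equal hashes only mean "maybe": confirm byte by byte. */
        if (ht == hp && memcmp(cs + i, ct, M) == 0)
            return 1;
        if (i + M >= len)
            break;                      /* no further window fits */
        /* Roll the window: drop cs[i], append cs[i + M]. */
        ht = (ht - (unsigned char)cs[i] * bpow) * B + (unsigned char)cs[i + M];
    }
    return 0;
}
```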

If you need to search for many small patterns in the same large body of text, you can also try implementing a suffix tree or a suffix array. But these are IMHO somewhat harder to understand and implement correctly.
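A suffix array is the simpler of the two structures to get right. The following is a minimal sketch that builds one naively by sorting pointers to every suffix with qsort and strcmp (O(n^2 log n) in the worst case, acceptable only for short texts) and answers a query by binary search with strncmp. The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: elements are pointers to suffixes of the text. */
static int cmp_suffix(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Builds an array of n suffix pointers in sorted order; caller frees it.
 * text must be NUL-terminated and n must equal strlen(text). */
const char **build_suffix_array(const char *text, size_t n)
{
    const char **sa = malloc(n * sizeof *sa);
    if (sa == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sa[i] = text + i;
    qsort(sa, n, sizeof *sa, cmp_suffix);
    return sa;
}

/* Binary search: does any suffix start with the m-character pattern?
 * Comparing only the first m characters preserves the sorted order. */
int sa_contains(const char **sa, size_t n, const char *pat, size_t m)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strncmp(sa[mid], pat, m);
        if (c == 0)
            return 1;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

The array is built once per text; each later query costs O(m log n) character comparisons.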

Beware, though: these techniques are very fast, but they only give an appreciable speedup when the strings involved are very large. You might not see a noticeable speedup for strings shorter than, say, 1000 characters.

EDIT:

If you are searching the same text over and over again (i.e. the value of cs is always or often the same across calls), you will get a big speedup by using a suffix trie (basically, a trie of all the suffixes of the text). Since your text is only 100 to 200 characters long, you can use the simpler O(n^2) method to build the trie and then do many fast searches on it. Each search would then need only 5 steps, one per pattern character, instead of up to about 5*200 character comparisons.
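A minimal sketch of the suffix trie idea, assuming every query is exactly 5 characters (PAT_LEN). Under that assumption each suffix only needs to be inserted to depth 5, so this variant builds in O(5n) rather than the O(n^2) of a full suffix trie and needs at most 5n + 1 nodes. Children are kept in first-child/next-sibling lists to save memory; with that layout each of the 5 steps scans a sibling list, so a lookup is 5 steps rather than strictly 5 comparisons. All names here are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define PAT_LEN 5   /* assumption: every query is exactly 5 characters */

struct node {
    unsigned char ch;   /* edge label leading into this node */
    int child;          /* index of first child, -1 if none */
    int sibling;        /* index of next sibling, -1 if none */
};

struct trie {
    struct node *nodes; /* node 0 is the root */
    int count;
};

static int find_child(const struct trie *t, int parent, unsigned char c)
{
    for (int k = t->nodes[parent].child; k != -1; k = t->nodes[k].sibling)
        if (t->nodes[k].ch == c)
            return k;
    return -1;
}

/* Inserts every suffix of text, truncated to PAT_LEN characters.
 * Returns 0 on success, -1 if allocation fails; caller frees t->nodes. */
int trie_build(struct trie *t, const char *text, size_t n)
{
    t->nodes = malloc((n * PAT_LEN + 1) * sizeof *t->nodes);
    if (t->nodes == NULL)
        return -1;
    t->nodes[0] = (struct node){0, -1, -1};
    t->count = 1;
    for (size_t i = 0; i < n; i++) {
        int cur = 0;
        for (size_t d = 0; d < PAT_LEN && i + d < n; d++) {
            unsigned char c = (unsigned char)text[i + d];
            int next = find_child(t, cur, c);
            if (next == -1) {           /* new edge: prepend to child list */
                next = t->count++;
                t->nodes[next] = (struct node){c, -1, t->nodes[cur].child};
                t->nodes[cur].child = next;
            }
            cur = next;
        }
    }
    return 0;
}

/* Follows PAT_LEN edges; the cost does not depend on the text length. */
int trie_contains(const struct trie *t, const char *pat)
{
    int cur = 0;
    for (int d = 0; d < PAT_LEN && cur != -1; d++)
        cur = find_child(t, cur, (unsigned char)pat[d]);
    return cur != -1;
}
```

Building the trie costs more than a single linear scan, so this only pays off when the same cs is queried many times.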

Edit 2:

As caf mentioned in a comment, the algorithm behind C's strstr is implementation-dependent. glibc uses a linear-time algorithm, which should be about as fast in practice as any of the methods I've mentioned. Although the OP's method is asymptotically slower (O(n*m) instead of O(n + m), where n and m are the lengths of the text and the pattern), it is probably faster because both n and m are very small and it does none of the lengthy preprocessing the glibc version performs.
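For the sizes in this question the worst case of the naive loop is easy to work out, with n the length of the text and m the length of the pattern:

```latex
$$
\underbrace{(n - m + 1)}_{\text{start positions}} \cdot \underbrace{m}_{\text{comparisons each}}
= (200 - 5 + 1) \cdot 5 = 980
$$
```

That bound is under 1000 character comparisons, in line with the 5*200 estimate above, and in typical text a mismatch ends most attempts after the first character or two.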


answered Oct 20 '22 10:10

MAK

Your code may access cs beyond the bounds of its allocation if cs is shorter than 4 characters.
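A minimal sketch of a bounds-safe variant (the name strstr5_safe is hypothetical), assuming ct always holds exactly 5 non-NUL characters. It relies on && short-circuiting: cs[k] is only read after cs[0] through cs[k-1] have matched non-NUL characters of ct, so the scan can never step past cs's terminator.

```c
/* Bounds-safe variant of strstr5 (hypothetical name).
 * Assumes ct holds exactly 5 non-NUL characters. */
int strstr5_safe(const char *cs, const char *ct)
{
    for (; *cs != '\0'; cs++) {
        /* Short-circuit evaluation stops at the first mismatch, and a
         * NUL in cs always mismatches, so nothing past it is read. */
        if (cs[0] == ct[0] && cs[1] == ct[1] && cs[2] == ct[2] &&
            cs[3] == ct[3] && cs[4] == ct[4])
            return 1;
    }
    return 0;
}
```

The per-iteration cost is the same as the original: one terminator test plus the comparison chain.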

A common optimisation for string search is the Boyer-Moore algorithm, in which you compare ct against cs starting from ct's last character rather than its first. See the linked page for a full description of the algorithm.
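A minimal sketch of the idea, using the Boyer-Moore-Horspool simplification (only the bad-character shift, without the good-suffix rule) rather than full Boyer-Moore. The name strstr5_bmh and the len parameter are assumptions. Comparison starts at ct's last character, and on a mismatch the window jumps ahead by up to 5 positions.

```c
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool sketch for a 5-character pattern (hypothetical name).
 * Assumes len == strlen(cs). */
int strstr5_bmh(const char *cs, size_t len, const char *ct)
{
    enum { M = 5 };
    size_t shift[256];
    if (len < M)
        return 0;

    /* Characters absent from ct[0..M-2] allow a full-length shift. */
    for (size_t c = 0; c < 256; c++)
        shift[c] = M;
    /* Rightmost occurrence in ct (excluding its last char) wins. */
    for (size_t k = 0; k < M - 1; k++)
        shift[(unsigned char)ct[k]] = M - 1 - k;

    size_t pos = 0;
    while (pos + M <= len) {
        /* Compare from the last pattern character backwards. */
        size_t k = M - 1;
        while (cs[pos + k] == ct[k]) {
            if (k == 0)
                return 1;
            k--;
        }
        /* Shift by the text character under ct's last position. */
        pos += shift[(unsigned char)cs[pos + M - 1]];
    }
    return 0;
}
```

Filling the 256-entry shift table on every call is itself a fixed cost that matters for 100 to 200 character texts; if ct is reused across calls, building the table once would be the natural refinement.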

answered Oct 20 '22 09:10

Greg Hewgill
