Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast string comparison in C

Tags:

c

string

compare

I currently have this kind of loop

while(1)
{
    generate_string(&buffer);

    for(int i = 0; i < filelines; i++)
    {
        if(strcmp(buffer,line[i]) == 0)
        {
           /*  do something  */
        }
    }
}

I have a file with a few million strings(which hopefully should be cut by half sometime soon), the number of all these strings is stored in filelines

line[i] is basically where the string itself is stored.

Currently, due to the comparison of these million strings, function generate_string(&buffer); is executed around 42 times per second. Is there a faster way to do string comparison in C?

like image 605
farmdve Avatar asked May 23 '12 14:05

farmdve


People also ask

Is strncmp faster than strcmp?

If n is sufficiently large that strncmp will compare the whole strings (so that the behavior becomes effectively the same as strcmp ) then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable or even present in a given ...

Is strcmp fast?

While comparing the two function, in a loop repeted 500'000'000 times, strcmp execute too much fast, about x750 times faster. The execution time of that function is 3.058s , while strcmp only 0.004s .

How do you compare strings in C?

The strcmp() function, is used to compare the strings (str1,str2). The strings str1 and str2 will be compared using this function. If the function returns a value 0, it signifies that the strings are equal otherwise, strings are not equal.

Can you compare strings with == C?

You can't compare strings in C with ==, because the C compiler does not really have a clue about strings beyond a string-literal.


2 Answers

strcmp is usually optimized by all vendors. However, if you're not satisfied with this you can try:

  • Lookup Burst Tries
  • Use a suffix tree for fast string comparison -- see this article
  • Depending on the size of strings in your application you can write a custom string comparator. E.g: GNU libc used to have this optimization for small strings where they tested strings smaller than five bytes as integers. MS cl also has some optimizations for small-strings (do look it up).

But more importantly make sure strcmp is your real bottleneck.

like image 67
dirkgently Avatar answered Oct 17 '22 16:10

dirkgently


I benchmarked against using strcmp() and a macro to compare byte by byte. The macro version is "very" much faster when compared to that of strcmp(). Usually for string comparation, its much faster to use the byte comparator macro, rather than strcmp(). Ex:

#define str3_cmp(m, c0, c1, c2, c3) m[0] == c0 && m[1] == c1 && m[2] == c2 && m[3] == c3

This is "very" faster when compared to that of strcmp(). But writing those down is a pain, and your required to split out the strings char by char, so I have written a handy PHP script to Generate that as a header file for you.

You can use this string compare in a hot loop where you know exactly the size of char* you would be comparing.

#!/usr/bin/php
<?php
function generate_macro($num) : string {
        $returner = "#define str".$num."cmp_macro(ptr, ";
        for($x = 0; $x < $num; $x++){
                $returner .= "c".$x;
                if($x != $num-1){ $returner .= ", "; }
        }
        $returner .= ") ";
        for($x = 0; $x < $num; $x++){
                $returner .= "*(ptr+".$x.") == c".$x;
                if($x != $num-1){ $returner .= " && "; }
        }
        return $returner;
}
function generate_static_inline_fn(&$generated_macro, $num) : string {
        $generated_macro .= "static inline bool str".$num."cmp(const char* ptr, const char* cmp)".
                                "{\n\t\treturn str".$num."cmp_macro(ptr, ";
        for($x = 0; $x < $num; $x++){
                $generated_macro .= " *(cmp+".$x.")";
                if($x != $num-1){ $generated_macro .= ", "; }
        }
        $generated_macro .= ");\n}\n";
        return $generated_macro;
}

function handle_generation($argc, $argv) : void {
        $out_filename = $argv[$argc-1];
        $gen_macro = "";
        for($x = 0; $x < $argc-2; $x++){
                $macro = generate_macro($argv[$x+1])."\n";
                $gen_macro .= generate_static_inline_fn($macro, $argv[$x+1]);
        }
        file_put_contents($out_filename, $gen_macro);
}
handle_generation($argc, $argv);
?>

This script takes Two arguments.

  1. Size of char* you would be comparing.
  2. Output header file name.

Example: $ ./gen_faststrcmp.php 3 5 fast_strcmp.h And this will generate fast_strcmp.h with contents

#define str3cmp_macro(ptr, c0, c1, c2) *(ptr+0) == c0 && *(ptr+1) == c1 && *(ptr+2) == c2
static inline bool str3cmp(const char* ptr, const char* cmp){
                return str3cmp_macro(ptr,  *(cmp+0),  *(cmp+1),  *(cmp+2));
}
#define str5cmp_macro(ptr, c0, c1, c2, c3, c4) *(ptr+0) == c0 && *(ptr+1) == c1 && *(ptr+2) == c2 && *(ptr+3) == c3 && *(ptr+4) == c4
static inline bool str5cmp(const char* ptr, const char* cmp){
                return str5cmp_macro(ptr,  *(cmp+0),  *(cmp+1),  *(cmp+2),  *(cmp+3),  *(cmp+4));
}

And in your code you can use the function like so,

const char* compare_me = "Hello";
if(str5cmp(compare_me, "Hello")) { /* code goes here */ }
like image 24
sammy Avatar answered Oct 17 '22 16:10

sammy