Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizations in Code related to string functions

Tags:

c

optimization

Are there any guidelines available that can be followed before calling standard string operation related functions in C?

For example, how much optimization will comparing the first character of two strings (and checking if they are equal) before calling strcmp provide?

What types of overhead related to string related functions in C can be expected, and what mechanisms will help avoid them?

Thanks!

like image 733
Jay Avatar asked Dec 29 '09 18:12

Jay


2 Answers

The string functions are there for you to use them. If you need to compare two strings, call strcmp. Don't worry about tiny performance difference, which are mostly imagined anyway. Get your code working first.

First, to answer any question about performance, if you ask "How much optimization will..." the answer is "Profile!" Nobody can predict how fast something is going to run. The implementation of the C stdlib has been continuously improved for years, any optimization tricks you try to come up with might hurt it.

For example, I think GCC will use vectorization when comparing strings, so you're actually comparing some 4-8 elements at a time. Were you expecting that? Doing your single character compare might actually slow it down.

That said, a typical implementation just checks character for character, so you'd simply be moving one comparison out of the loop, for no net gain. (But as stated, maybe for a net loss!)

So the guideline is:

Program now, optimize later.

And optimize the rational way: with evidence and testing, not with guessing.

like image 109
GManNickG Avatar answered Sep 28 '22 20:09

GManNickG


Worrying about strcmp() is a micro-optimization most of the time - not worth the effort put in. There are other things to be warier of, such as:

for (i = 0; i < strlen(s); i++)
{
     ...do stuff with s[i]...
}

The optimizer may not realize that it can and should avoid the function call on each loop iteration - if you increment s in the loop, it may not be able to avoid it. That is expensive.

Once upon a very long time ago, I was using a function such as strstr() on buffers of 20 KB or so, and the program worked fine on the HP box where I developed it. I ported it back to a Sun machine (remember, this was 20 years ago - the problems been fixed long since), and the program didn't even crawl - it was practically stationary. The problem turned out to be a loop in strstr() which used strlen() more or less as shown above. When used on short buffers, there wasn't a major problem; when used on 20 KB buffers, searching for a pattern that appeared every couple of kilobytes, the poor machine ground to a halt. The problem was shown by profiling the application. I plugged in a surrogate strstr() that avoided the mistake and things went back to normal.

Another common source of slowness when carried to excess is the use of strcat(). For example:

strcpy(dst, name);
strcat(dst, "/subdir/");
strcat(dst, file);
strcat(dst, ".sfx");

Usually, these are not actually a source of performance trouble - but in the absence of evidence to the contrary, it could be a source of a buffer overflow. You need to know that the lengths are small enough to fit into dst. But, if you know the lengths of each bit (as you should to be sure that they will fit), you could write instead:

strcpy(dst, name);
dst += len_name;
strcpy(dst, "/subdir/");
dst += sizeof("/subdir/") - 1;
strcpy(dst, file);
dst += len_file;
strcpy(dst, ".sfx");

This saves repeatedly scanning a string of known length to find the end before adding the new material. With short strings, this won't matter much; with long strings and many concatenation operations, it might matter. As before, the key point is to measure the cost.

Kernighan and Pike have an interesting tale of how to improve the performance of a spam filter in the book 'The Practice of Programming'. It started using strstr() in a loop; it ended up with a very different design because the circumstances for which strstr() was designed did not match the requirements of their spam filtering system. But, again, they only did the work because it was demonstrated that there was a performance problem, and they did sufficient work to prevent the program from being the bottleneck on the system, but not more.

like image 37
Jonathan Leffler Avatar answered Sep 28 '22 20:09

Jonathan Leffler