I am reading "Write Great Code Volume 2" and it shows the following strlen implementation:
int myStrlen( char *s )
{
    char *start;

    start = s;
    while( *s != 0 )
    {
        ++s;
    }
    return s - start;
}
The book says that this implementation is typical for an inexperienced C programmer. I have been coding in C for the past 11 years and I can't see how to write a better function than this in C (I can think of writing a better one in assembly). How is it possible to write better code than this in C? I looked at the standard library implementation of the strlen function in glibc and I couldn't understand most of it. Where can I find better information on how to write highly optimized code?
strlen() on C-style strings can be replaced by C++ std::string. sizeof() in C, used as an argument to functions like malloc(), memcpy() or memset(), can be replaced in C++ (use new, std::copy(), and std::fill() or constructors).
The strlen() function calculates the length of a given string. It takes a string as an argument and returns its length as a value of type size_t (an unsigned integer type). It is declared in the <string.h> header file.
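For reference, a minimal usage sketch (the string contents here are just illustrative):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *msg = "hello";   /* 5 bytes plus the terminating '\0' */
        size_t len = strlen(msg);    /* scans until the '\0'; yields 5 */
        printf("%zu\n", len);
        return 0;
    }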
OK, I need to add some explanation. My application gets a string from shared memory (which has some fixed length), so it can be represented as an array of characters. If there is a bug in the library writing this string, the string would not be zero-terminated, and strlen could fail.
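For that specific worry (my suggestion, not something from the book): POSIX provides strnlen(), which bounds the scan, so a missing terminator can never run past the end of the mapped region:

    #include <stddef.h>
    #include <string.h>   /* strnlen() is POSIX, not ISO C */

    /* Returns the string length, or region_size if no '\0' was found
       within the region, i.e. the string is not properly terminated. */
    size_t shared_string_len(const char *region, size_t region_size)
    {
        return strnlen(region, region_size);
    }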
From Optimising strlen(), a blogpost by Colm MacCarthaigh:
Unfortunately in C, we’re doomed to an O(n) implementation, best case, but we’re still not done … we can do something about the very size of n.
It gives a good example of the directions in which you can work to speed it up. And another quote from the same post:
Sometimes going really really fast just makes you really really insane.
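One direction the post points at is shrinking (or eliminating) n itself by remembering the length once you have computed it. A minimal sketch of that idea (my illustration, not code from the post):

    #include <stddef.h>
    #include <string.h>

    /* Carry the length with the data so it is computed at most once. */
    struct lstring {
        const char *data;
        size_t      len;
    };

    struct lstring lstring_from_cstr(const char *s)
    {
        struct lstring ls = { s, strlen(s) };  /* O(n) once, O(1) afterwards */
        return ls;
    }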
Victor, take a look at this:
http://en.wikipedia.org/wiki/Strlen#Implementation
P.S. The reason you don't understand the glibc version is probably that it reads the string one word at a time and uses bit tricks to find the \0.
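The core trick (this is the classic test from the "Bit Twiddling Hacks" collection; glibc's real code is more elaborate) detects whether any byte of a word is zero without examining the bytes one at a time:

    #include <stdint.h>

    /* Nonzero iff some byte of w is 0x00: subtracting 1 from each byte
       borrows into its top bit only when the byte was zero (or wrapped),
       and the & ~w term masks off the wrap-around cases. */
    static int has_zero_byte(uint32_t w)
    {
        return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
    }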
For starters, this is worthless for encodings like UTF-8... that is, calculating the number of characters in a UTF-8 string is more complicated, whereas the number of bytes is, of course, just as easy to calculate as in, say, an ASCII string.
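To make that concrete: counting code points in UTF-8 amounts to skipping continuation bytes (those of the form 10xxxxxx). A minimal sketch, assuming valid, NUL-terminated UTF-8 input:

    #include <stddef.h>

    /* Counts UTF-8 code points by counting only the bytes that start
       a sequence, i.e. skipping continuation bytes (10xxxxxx). */
    size_t utf8_strlen(const char *s)
    {
        size_t count = 0;
        for (; *s != '\0'; ++s) {
            if (((unsigned char)*s & 0xC0) != 0x80)
                ++count;
        }
        return count;
    }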
In general, you can optimize on some platforms by reading into larger registers. Since the other links posted so far don't have an example of that, here's a sketch for a little-endian machine:
#include <stddef.h>
#include <stdint.h>

size_t fastStrlen(const char *yourstring)  /* must be suitably aligned for 32-bit loads */
{
    size_t size = 0;
    const uint32_t *caststring = (const uint32_t *) yourstring;
    for (;;) {
        uint32_t x = *caststring++;
        if (!(x & 0x000000ffu)) return size;      /* first byte in this word is 0 */
        if (!(x & 0x0000ff00u)) return size + 1;  /* second byte is 0 */
        if (!(x & 0x00ff0000u)) return size + 2;  /* third byte is 0 */
        if (!(x & 0xff000000u)) return size + 3;  /* fourth byte is 0 */
        size += sizeof (uint32_t);                /* no zero byte: advance a word */
    }
}
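Two caveats with this approach: the wide load can read up to three bytes past the terminator, which is safe only because an aligned 32-bit load never crosses a page boundary (a production version, like glibc's, first handles any unaligned leading bytes one at a time), and the order of the byte tests must be reversed on a big-endian target.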