
Why is strtolower slightly slower than strtoupper?

I did an experiment out of curiosity. I wanted to see whether there was any micro difference between strtolower() and strtoupper(). I expected strtolower() to be faster on mostly lowercase strings and vice versa. What I found is that strtolower() was slower in all cases (although the difference is completely insignificant unless you're doing it millions of times). This was my test.

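$string = 'hello world';
// microtime(true) returns a float; the default "msec sec" string cannot be subtracted reliably
$start_time = microtime(true);
for ($i = 0; $i < 10000000; $i++) {
    strtolower($string);
}
$timed = microtime(true) - $start_time;
echo 'strtolower ' . $string . ' - ' . $timed . '<br>';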

I repeated this for strtolower() and strtoupper() with hello world, HELLO WORLD, and Hello World. Here is the full gist. I've run the code several times and keep getting roughly the same results. Here's one run of the test.

strtolower hello world - 0.043829
strtoupper hello world - 0.04062
strtolower HELLO WORLD - 0.042691
strtoupper HELLO WORLD - 0.015475
strtolower Hello World - 0.033626
strtoupper Hello World - 0.017022

I believe the C code in the php-src GitHub repository that controls this is here for strtolower() and here for strtoupper().

To be clear, this isn't going to prevent me from ever using strtolower(). I am only trying to understand what is going on here.

Why is strtolower() slower than strtoupper()?

Goose asked Jun 27 '17 13:06


2 Answers

It mostly depends on which character encoding you are currently using, but the main cause of the speed difference is the encoded size of special characters.

Taken from babelstone.co.uk:

For example, lowercase j with caron (ǰ) is represented as a single encoded character (U+01F0 LATIN SMALL LETTER J WITH CARON), but the corresponding uppercase character (J̌) is represented in Unicode as a sequence of two encoded characters (U+004A LATIN CAPITAL LETTER J + U+030C COMBINING CARON).

More data to sift through in the index of Unicode characters will inevitably take a little longer.

Keep in mind that strtolower uses your current locale, so if your server is using a character encoding that does not support strtolower of special characters (such as 'Ê'), it will simply return the special character unchanged. The character mapping for UTF-8 is set up, however, which can be confirmed by running mb_strtolower.
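To illustrate the locale point, here is a minimal C sketch, not taken from php-src. It assumes the process is in the "C" locale and that the text is UTF-8; lower_bytes and utf8_bytes_untouched_in_c_locale are hypothetical names used only for this example. It lowercases a buffer one byte at a time with tolower() and checks that the two bytes encoding 'Ê' come back unchanged while an ASCII 'A' is converted.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of php-src: lowercases a buffer in place,
 * one byte at a time, using the C library's tolower(). */
void lower_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)tolower(buf[i]);
    }
}

/* Returns 1 if the UTF-8 bytes of 'Ê' (0xC3 0x8A) survive unchanged in the
 * "C" locale while the ASCII letter that follows them is lowercased. */
int utf8_bytes_untouched_in_c_locale(void)
{
    unsigned char text[] = { 0xC3, 0x8A, 'A' };

    /* Assumption: the "C" locale, where only A-Z have lowercase mappings. */
    setlocale(LC_CTYPE, "C");
    lower_bytes(text, sizeof text);

    return text[0] == 0xC3 && text[1] == 0x8A && text[2] == 'a';
}
```

Under these assumptions utf8_bytes_untouched_in_c_locale() returns 1. With a different LC_CTYPE locale, such as a single-byte Latin-1 locale, bytes above 0x7F may be mapped, so the result can differ.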

There is also the possibility of comparing the number of characters that fall into the uppercase category with the number you will find in the lowercase category, but once again, that depends on your character encoding.

In short, strtolower has a bigger database of characters to compare each individual string character to when it checks whether or not the character is uppercase.

Frits answered Oct 06 '22 00:10


There are a couple of very slight differences in the implementation of the code:

PHPAPI char *php_strtoupper(char *s, size_t len)
{
    unsigned char *c, *e;

    c = (unsigned char *)s;
    e = (unsigned char *)c+len;    /* strtolower uses e = c+len; */

    while (c < e) {
        *c = toupper(*c);
        c++;
    }
    return s;
}

PHPAPI zend_string *php_string_toupper(zend_string *s)
{
    unsigned char *c, *e;

    c = (unsigned char *)ZSTR_VAL(s);
    e = c + ZSTR_LEN(s);

    while (c < e) {
        if (islower(*c)) {
            register unsigned char *r;
            zend_string *res = zend_string_alloc(ZSTR_LEN(s), 0);

            if (c != (unsigned char*)ZSTR_VAL(s)) {
                memcpy(ZSTR_VAL(res), ZSTR_VAL(s), c - (unsigned char*)ZSTR_VAL(s));
            }
            r = c + (ZSTR_VAL(res) - ZSTR_VAL(s));
            while (c < e) {
                *r = toupper(*c);
                r++;
                c++;
            }
            *r = '\0';
            return res;
        }
        c++;
    }
    return zend_string_copy(s);
}

PHP_FUNCTION(strtoupper)
{
    zend_string *arg;      /* strtolower uses zend_string *str; */

    ZEND_PARSE_PARAMETERS_START(1, 1)
        Z_PARAM_STR(arg)          /* strtolower uses Z_PARAM_STR(str) */
    ZEND_PARSE_PARAMETERS_END();

    RETURN_STR(php_string_toupper(arg));     /* strtolower uses RETURN_STR(php_string_tolower(str)); */
}

And for strtolower:

PHPAPI char *php_strtolower(char *s, size_t len)
{
    unsigned char *c, *e;

    c = (unsigned char *)s;
    e = c+len;                  /* strtoupper uses e = (unsigned char *)c+len; */

    while (c < e) {
        *c = tolower(*c);
        c++;
    }
    return s;
}

PHPAPI zend_string *php_string_tolower(zend_string *s)
{
    unsigned char *c, *e;

    c = (unsigned char *)ZSTR_VAL(s);
    e = c + ZSTR_LEN(s);

    while (c < e) {
        if (isupper(*c)) {
            register unsigned char *r;
            zend_string *res = zend_string_alloc(ZSTR_LEN(s), 0);

            if (c != (unsigned char*)ZSTR_VAL(s)) {
                memcpy(ZSTR_VAL(res), ZSTR_VAL(s), c - (unsigned char*)ZSTR_VAL(s));
            }
            r = c + (ZSTR_VAL(res) - ZSTR_VAL(s));
            while (c < e) {
                *r = tolower(*c);
                r++;
                c++;
            }
            *r = '\0';
            return res;
        }
        c++;
    }
    return zend_string_copy(s);
}

PHP_FUNCTION(strtolower)
{
    zend_string *str;     /* strtoupper uses zend_string *arg; */

    ZEND_PARSE_PARAMETERS_START(1, 1)
        Z_PARAM_STR(str)        /* strtoupper uses Z_PARAM_STR(arg) */
    ZEND_PARSE_PARAMETERS_END();

    RETURN_STR(php_string_tolower(str));    /* strtoupper uses RETURN_STR(php_string_toupper(arg)); */
}

Whether these minor differences are enough to affect performance by those few nanoseconds, I don't know. I'm also unsure why the differences are even there.
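One thing both php_string_toupper and php_string_tolower share is the scan-then-copy shape: the outer loop only looks for the first character that needs converting, and if none is found the original string is returned without allocating anything. The sketch below restates that pattern with plain C strings so it can be compiled outside PHP. It is only an approximation under stated assumptions: lower_lazy is a hypothetical name, malloc stands in for zend_string_alloc, and returning the input pointer stands in for zend_string_copy.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for php_string_tolower using plain C strings.
 * Returns the input pointer itself when no byte needs lowercasing.
 * Otherwise returns a newly malloc'ed lowercase copy that the caller
 * must free, or NULL if the allocation fails. */
const char *lower_lazy(const char *s, size_t len)
{
    const unsigned char *c = (const unsigned char *)s;
    const unsigned char *e = c + len;

    while (c < e) {
        if (isupper(*c)) {
            /* First uppercase byte found: allocate, copy the untouched
             * prefix, then convert the remainder byte by byte. */
            size_t prefix = (size_t)(c - (const unsigned char *)s);
            char *res = malloc(len + 1);
            char *r;

            if (res == NULL) {
                return NULL;
            }
            memcpy(res, s, prefix);
            r = res + prefix;
            while (c < e) {
                *r++ = (char)tolower(*c++);
            }
            *r = '\0';
            return res;
        }
        c++;
    }
    /* Nothing to convert: hand back the original, no allocation. */
    return s;
}
```

With this shape, an input that is already in the target case takes the cheap path (scan only), while any other input pays for an allocation and a copy. That may matter more for timing than the cosmetic differences noted above, although I have not measured it.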

Mark Baker answered Oct 06 '22 00:10