
Converting std::string to upper case: major performance difference?

Tags:

c++

So I was playing around with some code and wanted to see which method of converting a std::string to upper case was most efficient. I figured that the two would be somewhat similar performance-wise, but I was terribly wrong. Now I'd like to find out why.

The first method of converting the string works as follows: for each character in the string (save the length, iterate from 0 to length), if it's between 'a' and 'z', then shift it so that it's between 'A' and 'Z' instead.

The second method works as follows: for each character in the string (start from 0, keep going until we hit the null terminator), apply the built-in toupper() function.

Here's the code:

#include <cctype>   // toupper
#include <iostream>
#include <string>

inline std::string ToUpper_Reg(std::string str)
{
    for (int pos = 0, sz = str.length(); pos < sz; ++pos)
    {
        if (str[pos] >= 'a' && str[pos] <= 'z') { str[pos] += ('A' - 'a'); }
    }

    return str;
}

inline std::string ToUpper_Alt(std::string str)
{
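    // Cast to unsigned char first: passing a negative char value to toupper is undefined behavior.
    for (int pos = 0; str[pos] != '\0'; ++pos) { str[pos] = toupper(static_cast<unsigned char>(str[pos])); }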

    return str;
}


int main()
{
    std::string test = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*()_+=-`'{}[]\\|\";:<>,./?";

    for (size_t i = 0; i < 100000000; ++i) { ToUpper_Reg(test); /* ToUpper_Alt(test); */ }

    return 0;
}

The first method ToUpper_Reg took about 169 seconds per 100 million iterations.
The second method ToUpper_Alt took about 379 seconds per 100 million iterations.
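
For anyone wanting to reproduce numbers like these, here is a minimal timing sketch using std::chrono. The names to_upper_ascii and time_ms are made up for illustration and are not from the question; the harness also assumes the result should be folded into a checksum so the compiler cannot discard the calls. Absolute timings will depend heavily on compiler, optimization flags, and standard library.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// ASCII-only conversion, same logic as ToUpper_Reg in the question.
std::string to_upper_ascii(std::string str)
{
    for (std::size_t pos = 0, sz = str.length(); pos < sz; ++pos)
    {
        if (str[pos] >= 'a' && str[pos] <= 'z') { str[pos] += ('A' - 'a'); }
    }
    return str;
}

// Hypothetical helper: calls fn(input) `iterations` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn fn, const std::string& input, std::size_t iterations, std::size_t& checksum)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        // Use the first character of each result so the call has an observable effect.
        checksum += static_cast<unsigned char>(fn(input)[0]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(to_upper_ascii, test, 1000000, checksum) and then the same with the toupper-based function gives two numbers that can be compared directly, provided both are built with the same flags.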

What gives?


Edit: I changed the second method so that it iterates over the string the same way the first one does (set the length aside, loop while less than length). It's a bit faster, but still takes about twice as long as the first method.
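
The edited version was not posted, so the following is only a sketch of what it might look like. The name ToUpper_AltLen is made up here, and the unsigned char cast is an addition to keep the toupper call well defined for negative char values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical reconstruction of the second method after the edit:
// the length is read once and the loop runs up to it, instead of
// scanning for the null terminator.
std::string ToUpper_AltLen(std::string str)
{
    for (std::size_t pos = 0, sz = str.length(); pos < sz; ++pos)
    {
        // toupper takes an int that must be representable as unsigned char (or EOF).
        str[pos] = static_cast<char>(std::toupper(static_cast<unsigned char>(str[pos])));
    }
    return str;
}
```

One side effect of looping to the stored length is that a string with an embedded '\0' is converted in full, whereas the null-terminator loop would stop at the first '\0'.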


Edit 2: Thanks everybody for your submissions! The data I'll be using it on is guaranteed to be ASCII, so I think I'll be sticking with the first method for the time being. I'll keep in mind that toupper is locale-specific for when/if I need it.

Mr. Llama asked Feb 29 '12 22:02


2 Answers

std::toupper uses the current locale to do case conversions, which involves a function call and other abstractions. So naturally, it will be slower. But it will also work on non-ASCII text.

Nicol Bolas answered Oct 15 '22 15:10


toupper() does more than just shift characters in the range [a-z]. For one thing it's locale dependent and can handle more than just ASCII.
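
To make the locale dependence concrete, here is a small sketch comparing three ways of upper-casing a single char. The function names are made up for illustration. In the default "C" locale all three agree on ASCII letters and leave other bytes alone; what the <cctype> version does with a byte such as 0xE9 after a std::setlocale call depends on which locales are installed, so no output is claimed for that case.

```cpp
#include <cctype>
#include <locale>
#include <string>

// ASCII-only shift, as in the question's first method. Never consults a locale.
char upper_ascii(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c + ('A' - 'a')) : c;
}

// <cctype> version: consults the current C locale (changed via std::setlocale).
char upper_c_locale(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// <locale> version: the locale is passed explicitly instead of being global state.
char upper_in(char c, const std::locale& loc)
{
    return std::toupper(c, loc);
}
```

For example, upper_in('a', std::locale::classic()) returns 'A', the same as the other two. The extra work of looking up the locale's character table on each call is the kind of overhead the answers refer to.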

bames53 answered Oct 15 '22 16:10