I was looking for a trivial way to check if a type std::string
contains only alphanumeric characters. My code looks like the following:
std::string sStr("This is a test");
for (std::string::const_iterator s = sStr.begin(); s != sStr.end(); ++s)
if (! isalnum(*s)) return false;
Is this the right way of doing this? Is there a more efficient way of handling the problem?
I would use std::find_if
:
std::string sStr("This is a test");
auto it = std::find_if(std::begin(sStr), std::end(sStr), [](unsigned char c) {
return !std::isalnum(c);
});
return it == std::end(sStr);
std::find_if_not
maybe would make more sense:
auto it = std::find_if_not(std::begin(sStr), std::end(sStr), [](unsigned char c) {
return std::isalnum(c);
});
Yes, actually*. The looping constructs found in <algorithm>
appear to perform slightly better than a raw loop under optimization, at least with gcc.
Using <algorithm>
and a lambda is a nice way of doing this, at first glance:
bool onlyAlnum(const std::string& str){
return std::all_of(
str.begin(),
str.end(),
[loc = std::locale{}](char c){return std::isalnum(c, loc);});
}
However this has its disadvantages.
locale: the locale
version of isalnum
seemed to be quite a bit slower than the <cctype>
incarnation of the function when I went to test it. the cctype version doesn't pay attention to locale, but testing single char
for whether they're alphanumeric is going to work for a tiny, tiny subset of UTF-8 characters anyway: UTF-8 is a variable-width encoding, and testing a part of a multi-char character will result in a false negative for a test of alphanumerism.
The lambda above is a c++14 lambda, which initializes a variable loc
to hold the locale when it's first created. This allows the function to work dependent on the present locale, while also preventing the cost of constructing a new object to represent the locale every time the predicate is evaluated, as would happen with a lambda like:
[](char c){return std::isalnum(c, std::locale{});}
However, it's still a very slow test, in comparison. If we don't need the limited advantages of the <locale>
incarnation of std::isalnum
, we can use the (much) faster <cctype>
version:
[](char c){return std::isalnum(c);}
So we come to a second way of doing this, which uses the cctype version instead. Testing shows this to be much faster, on par with the raw loop you gave:
bool onlyAlnumAllOf(const std::string& str){
return std::all_of(
str.begin(),
str.end(),
[](char c){return std::isalnum(c);});
}
all_of tests whether a condition is valid for every entry in a range of input iterators. The range is supplied by the first two arguments, here str.begin()
and str.end()
, which naturally define the start and end of the string.
and a demo on coliru shows that onlyAlNum
will return true for any string containing only characters from the alphabet or numbers, though not anything containing a space.
In the end, you can test the difference. With a cursory test evaluating "oyn3478qo47nqooina7o8oao7nroOL" 1000000 times, these are the results:
MinGW-64 port of gcc 5.2.0 on my machine
g++ main.cpp -Wall -Wextra -Wpedantic --std=c++14 -o3
all_of (with locale information): 652ms for 1000000 iterations
all_of: 63ms for 1000000 iterations
find_if: 63ms for 1000000 iterations
loop: 70ms for 1000000 iterations
range-loop: 69ms for 1000000 iterations
and coliru with gcc 6.1.0:
g++ main.cpp -Wall -Wextra -Wpedantic --std=c++14 -o3
all_of (with locale information): 1404ms for 1000000 iterations
all_of: 101ms for 1000000 iterations
find_if: 110ms for 1000000 iterations
loop: 108ms for 1000000 iterations
range-loop: 119ms for 1000000 iterations
and clang 3.8.0 on coliru:
clang++ -std=c++14 -O3 -Wall -Wextra -Wpedantic main.cpp
all_of (with locale information): 1127ms for 1000000 iterations
all_of: 85ms for 1000000 iterations
find_if: 72ms for 1000000 iterations
loop: 128ms for 1000000 iterations
range-loop: 88ms for 1000000 iterations
As you can see, it varies by compiler and version which function is the fastest. Isn't optimization fun?
this is the function I used to test each method:
using StrEvaluator = bool (*)(const std::string&);
using Milliseconds = std::chrono::milliseconds;
void testStrEvaluator(StrEvaluator eval, std::string str){
auto begin = std::chrono::steady_clock::now();
bool result = true;
for(unsigned int i = 0; i < 1000000; ++i){
str.resize(str.size());
result &= eval(str);
}
auto end = std::chrono::steady_clock::now();
std::cout
<< std::chrono::duration_cast<Milliseconds>(end - begin).count()
<< "ms for 1000000 iterations\n";
}
There are flaws with the testing: coliru doesn't provide guarantees regarding consistent resources during execution, and I didn't shut down other programs on my computer, so the variations could be a fluke. However, they seem consistent enough to draw a few conclusions from: both the looping constructs from algorithm and raw loops can perform quite well, and choosing between them based on speed (unless you've discovered the loop is a bottleneck) is more micro-optimization than anything else.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With