I want to get all digits in a std::string
but without using a loop (myself; what the code I'm calling uses, I don't mind). An alternative view of the request is: remove all non-digits from the string, leaving only the digits. I know that I can find all digit in a string using code like this:
std::string get_digits(std::string input) {
std::string::size_type next_digit(0u);
for (std::string::size_type pos(0u);
input.npos != (pos = input.find_first_of("0123456789"));
++pos) {
input[next_digit++] = input[pos];
}
input.resize(next_digit);
return input;
}
However, this function uses a loop. std::string
doesn't provide a function find_all()
or something! Ideally, the string is maniulated in-place (the code above moves it but it is easily changed to take a reference).
When there are multiple alternatives, I'll promise to post profiling results of how good the different approaches work on some lengthy text.
One way would be to use std::copy_if
(or std::remove_if
):
std::string get_digits(std::string input) {
std::string result;
std::copy_if(
input.begin(),
input.end(),
std::back_inserter(result),
[](char c) { return '0' <= c && c <= '9'; });
return result;
}
Obviously this uses a loop internally, but you said you don't care about that...
Edit: With std::remove_if
:
std::string get_digits_remove(std::string input) {
auto itErase = std::remove_if(
input.begin(),
input.end(),
[](char c) { return !('0' <= c && c <= '9'); });
input.erase(itErase, input.end());
return input;
}
Although I primarily had hoped for 5 quick answers (which wasn't achieved, sigh) the answers and comments led to some interesting approaches I hadn't thought of myself. My personal expectation had been that the answers effectively would result in:
If you want to be fast, use
input.erase(std::remove_if(input.begin(), input.end(),
[](unsigned char c){ return !std::isdigit(c); }),
input.end());
If you want to be concise, use
text = std::regex_replace(text, std::regex(R"(\D)"), "");
Instead, there were a number of approaches I hadn't even considered:
Use a recursive function!
Use std::partition()
which seems to require extra work (retain the characters which will be thrown out) and changes the order.
Use std::stable_partition()
which seems to require even more work but doesn't change the order.
Use std::sort()
and extract the substring with the relevant characters although I don't know how to make that one retain the original sequence of character. Just using a stable version doesn't quite to it.
Putting the different approaches together and using a number of variations on how to classify the characters, led to a total of 17 version of roughly the same operation (the code is on github). Most of the versions use std::remove_if()
and std::string::erase()
but differ in the classification of digits.
remove_if()
with [](char c){ return d.find(c) == d.npos; })
.remove_if()
with [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); }
remove_if()
with [](char c){ return !std::binary_search(d.begin(), d.end()); }
remove_if()
with [](char c){ return '0' <= c && c <= '9'; }
remove_if()
with [](unsigned char c){ return !std::isdigit(c); }
(the char
is passed as unsigned char
to avoid undefined behavior in case c
is a char
with a negative value)remove_if()
with std::not1(std::ptr_fun(std::static_cast<int(*)(int)>(&std::isdigit)))
(the cast is necessary to determine the correct overload: std::isdigit()
happens to be overloaded).remove_if()
with [&](char c){ return !hash.count(c); }
remove_if()
with [&](char c){ return filter[c]; }
(the code initializing actually uses a loop)remove_if()
with [&](char c){ return std::isidigit(c, locale); }
remove_if()
with [&](char c){ return ctype.is(std::ctype_base::digit, c); }
str.erase(std::parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
str.erase(std::stable_parition(str.begin(), str.end(), [](unsigned char c){ return !std::isdigit(c); }), str.end())
copy_if()
approach described in one of the answerstext = std::regex_replace(text, std::regex(R"(\D)"), "");
(I didn't manage to get this to work on icc)I have run the benchmark on a MacOS notebook. Since results like this are reasonably easy to graph with Google Chars, here is a graph of the results (although with the versions using regexps removed as these would cause the graph to scale such that the interesting bit isn't really visible). The results of the benchmarks in form of a table:
test clang gcc icc
1 use_remove_if_str_find 22525 26846 24815
2 use_remove_if_find 31787 23498 25379
3 use_remove_if_binary_search 26709 27507 37016
4 use_remove_if_compare 2375 2263 1847
5 use_remove_if_ctype 1956 2209 2218
6 use_remove_if_ctype_ptr_fun 1895 2304 2236
7 use_remove_if_hash 79775 60554 81363
8 use_remove_if_table 1967 2319 2769
9 use_remove_if_locale_naive 17884 61096 21301
10 use_remove_if_locale 2801 5184 2776
11 use_partition 1987 2260 2183
12 use_stable_partition 7134 4085 13094
13 use_sort 59906 100581 67072
14 use_copy_if 3615 2845 3654
15 use_recursive 2524 2482 2560
16 regex_build 758951 531641
17 regex_prebuild 775450 519263
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With