I'd like to write a simple string split function. The function should take one std::basic_string and a delimiter (possibly a CharT or a std::basic_string), and put the result into a ContainerT.
My first try is:

```cpp
#include <cstddef>

template <typename StringT, typename DelimiterT, typename ContainerT>
void split(const StringT &str, const DelimiterT &delimiters, ContainerT &conts) {
    conts.clear();
    std::size_t start = 0, end;
    std::size_t len = delimiters.size();  // compiles only if DelimiterT has a size() member
    while ((end = str.find(delimiters, start)) != StringT::npos) {
        if (end - start) {  // skip empty pieces between adjacent delimiters
            conts.emplace_back(str, start, end - start);
        }
        start = end + len;
    }
    // append whatever follows the last delimiter
    if (start != StringT::npos && start < str.size()) {
        conts.emplace_back(str, start, str.size() - start);
    }
}
```
My final goal is to extend this function to achieve the following:

1. The results are always std::basic_string<CharT> objects put into some conts.
2. str could be a std::basic_string<CharT>, a const CharT* or a string literal.
3. delimiter could be a char, or a std::basic_string<CharT>/const CharT*/string literal, meaning that the delimiter may be longer than one character; e.g. splitting aaa,,bbb,c with ,, gives aaa and bbb,c.
4. conts could be any container from the STL.

Since one usually deals with modern strings in C++, 2 may be restricted to std::basic_string<CharT> only, for simplification.
Given that the function (template) can be overloaded, I wonder: for a const CharT* delimiter, must the line std::size_t len = delimiters.size(); be changed to some std::distance(...)?

Update: A relevant code review has been added here.
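Regarding the delimiters.size() line: std::basic_string::find already has overloads for a CharT, a const CharT* and a std::basic_string, so only the length computation has to vary with the delimiter type. The sketch below is one way to do that without std::distance, assuming str is a std::basic_string (per the simplification above). delimiter_length is a hypothetical helper, not part of the question; for null-terminated delimiters it uses std::char_traits<CharT>::length. The guard for an empty delimiter is also an addition, since an empty delimiter would otherwise never advance start.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: delimiter length for each delimiter kind in the list.
template <typename CharT>
std::size_t delimiter_length(CharT) { return 1; }  // a single character

template <typename CharT>
std::size_t delimiter_length(const CharT *delim) {  // C string or string literal
    return std::char_traits<CharT>::length(delim);
}

template <typename CharT, typename Traits, typename Alloc>
std::size_t delimiter_length(const std::basic_string<CharT, Traits, Alloc> &delim) {
    return delim.size();
}

template <typename StringT, typename DelimiterT, typename ContainerT>
void split(const StringT &str, const DelimiterT &delimiters, ContainerT &conts) {
    conts.clear();
    const std::size_t len = delimiter_length(delimiters);
    if (len == 0) {  // an empty delimiter would loop forever: keep the whole string
        if (!str.empty()) conts.emplace_back(str);
        return;
    }
    std::size_t start = 0, end;
    // basic_string::find accepts CharT, const CharT* and basic_string delimiters
    while ((end = str.find(delimiters, start)) != StringT::npos) {
        if (end != start) conts.emplace_back(str, start, end - start);  // skip empty pieces
        start = end + len;
    }
    if (start < str.size()) conts.emplace_back(str, start, str.size() - start);
}
```

For a string literal such as ",,", overload resolution prefers the const CharT* overload over the by-value one because it is the more specialized template, so the literal is measured with char_traits rather than counted as one character.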
You can use std::string_view for both the text to be split and the delimiter. Additionally, you can use a template template parameter to choose the container, and a second template parameter to choose the type of the elements in the result:
```cpp
#include <string_view>

// Container is variadic so that std::vector/std::list (which also take an
// allocator parameter) bind to it even on compilers that don't implement
// C++17's relaxed template template argument matching.
template<typename Char, template<typename...> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text, std::basic_string_view<Char> delim)
{
    Container<String> result;
    //...
    result.push_back(String(text.substr(start, count)));
    //...
    return result;
}

template<template<typename...> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim)
{ return split_impl<char, Container, String>(text, delim); }

template<template<typename...> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim)
{ return split_impl<char16_t, Container, String>(text, delim); }
```
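The //... parts of split_impl are left open. One way to fill them in is sketched below; the loop is an illustrative assumption, not necessarily what the answer intended: empty pieces are skipped (as in the question's version), an empty delimiter returns the whole text, and Container is declared as template<typename...> class. The two split overloads are the answer's, unchanged apart from that declaration.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <string_view>
#include <vector>

template<typename Char, template<typename...> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text, std::basic_string_view<Char> delim)
{
    Container<String> result;
    if (delim.empty()) {  // avoid an endless loop: treat the whole text as one piece
        if (!text.empty()) result.push_back(String(text));
        return result;
    }
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(delim, start);
        if (end == std::basic_string_view<Char>::npos) end = text.size();  // last piece
        const std::size_t count = end - start;
        if (count != 0) result.push_back(String(text.substr(start, count)));  // skip empty pieces
        start = end + delim.size();
    }
    return result;
}

template<template<typename...> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim)
{ return split_impl<char, Container, String>(text, delim); }

template<template<typename...> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim)
{ return split_impl<char16_t, Container, String>(text, delim); }
```

With String = std::string_view the pieces point into the caller's text, so that text must outlive the result; choosing std::string copies each piece instead.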
This way, it can be used with std::string, std::string_view and const char* without redundant allocations:
```cpp
// vector of std::string_view objects:
auto words_1 = split<std::vector>("hello world", " ");
// list of std::string objects:
auto words_2 = split<std::list, std::string>(std::string("hello world"), " ");
// vector of std::u16string_view objects:
auto words_3 = split<std::vector>(u"hello world", u" ");
```
Edit: added overloads for char and char16_t.
Edit 2
In the code above, split_impl does the actual work. The split overloads are provided only to simplify user code, so that you don't have to specify the character type explicitly. Without the overloads that would be necessary, because the compiler can't deduce Char when the parameter type is basic_string_view and you're passing an argument of a different type (for example, const char* or std::wstring). In general, I don't think this is a big problem: you probably want four overloads (char, char16_t, char32_t, wchar_t), if not fewer.
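To illustrate the deduction point, here is a hypothetical function length_of with the same parameter shape as split_impl (it is not part of the answer). The calls that compile are those where Char is either spelled out or deduced from an argument that already is a basic_string_view; once Char is spelled out, the ordinary implicit conversions to the string_view type apply again.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function with the same parameter shape as split_impl.
template<typename Char>
std::size_t length_of(std::basic_string_view<Char> text) { return text.size(); }

// length_of("abc");                        // error: Char can't be deduced from const char*
// length_of(std::wstring(L"ab"));          // error: std::wstring is not a basic_string_view
// length_of<char>("abc");                  // OK: Char given, "abc" converts implicitly
// length_of<wchar_t>(std::wstring(L"ab")); // OK: Char given, std::wstring converts implicitly
// length_of(std::wstring_view(L"ab"));     // OK: Char deduced as wchar_t
```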
However, for completeness, here's an alternative that doesn't use overloads:
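```cpp
#include <string_view>
#include <type_traits>

template<typename ContainerT, typename TextT, typename DelimT>
ContainerT split(const TextT& text, const DelimT& delim)
{
    // decltype(text[0]) is a const reference (e.g. const char&), so strip
    // both the reference and the const to get a valid character type
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;
    std::basic_string_view<CharT> textView(text);
    std::basic_string_view<CharT> delimView(delim);
    ContainerT result;
    // actual implementation, but using textView and delimView instead of text and delim
    // (emplace_back also works for std::string elements, whose constructor
    // from string_view is explicit)
    result.emplace_back(textView.substr(start, count));
    return result;
}

// usage:
auto words = split<std::vector<std::string_view>>("some text", " ");
```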
With this approach you cannot use a default value for the String template parameter, as above (because it would have to depend on the TextT type). For this reason, I removed it. Also, this code assumes that text and delim use the same character type and can be converted to basic_string_view.
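For the overload-free alternative, the "actual implementation" placeholder could be filled in as sketched below. The loop is an assumption (empty pieces are skipped, an empty delimiter yields the whole text), and CharT strips both reference and const, since decltype(text[0]) on a const argument is a const reference. Because elements are added with emplace_back, ContainerT may hold std::string as well as std::string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

template<typename ContainerT, typename TextT, typename DelimT>
ContainerT split(const TextT& text, const DelimT& delim)
{
    // decltype(text[0]) is e.g. const char&; strip the reference and the const
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;
    std::basic_string_view<CharT> textView(text);
    std::basic_string_view<CharT> delimView(delim);
    ContainerT result;
    if (delimView.empty()) {  // an empty delimiter would never advance start
        if (!textView.empty()) result.emplace_back(textView);
        return result;
    }
    std::size_t start = 0;
    while (start <= textView.size()) {
        std::size_t end = textView.find(delimView, start);
        if (end == textView.npos) end = textView.size();  // last piece
        if (end != start)                                  // skip empty pieces
            result.emplace_back(textView.substr(start, end - start));
        start = end + delimView.size();
    }
    return result;
}
```

As with the string_view version above, string_view elements refer into text, so a temporary std::string argument should only be split into owning std::string elements.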
Personally, I prefer version 1. It doesn't use template types for function parameters, which IMHO is better, as it gives the caller a better idea of what should be passed in. In other words, the interface of the first split is better specified. Also, as noted above, I don't consider having to add four overloads of split a problem.