
What are the best practices for designing function templates to deal with string/char arguments?

I'd like to write a simple string split function.

The function should take one std::basic_string and a delimiter (possibly a CharT or std::basic_string), and put the result into a ContainerT.

My first try is

template <typename StringT, typename DelimiterT, typename ContainerT>
void split(
    const StringT &str, const DelimiterT &delimiters, ContainerT &conts) {
    conts.clear();
    std::size_t start = 0, end;
    std::size_t len = delimiters.size();
    while ((end = str.find(delimiters, start)) != StringT::npos) {
        if (end - start) {
            conts.emplace_back(str, start, end - start);
        }
        start = end + len;
    }

    if (start != StringT::npos && start < str.size()) {
        conts.emplace_back(str, start, str.size() - start);
    }
}

My final goal is to extend this function to achieve:

  1. The final results are always std::basic_string<CharT> put into some conts.
  2. The first argument str could be std::basic_string<CharT>, const CharT* or a string literal.
  3. The second argument delimiter could be a single char, or a std::basic_string<CharT>/const CharT*/string literal whose length may be greater than 1, e.g. splitting aaa,,bbb,c on ,, gives aaa/bbb,c.
  4. The third argument can be any sequence container from STL.

Since one usually deals with modern strings in C++, item 2 may be restricted to std::basic_string<CharT> only, for simplification.

Given that the function (template) can be overloaded, I wonder

  1. What is the minimum number of functions I would need in this situation?
  2. What is the best practice for designing such functions (i.e. how do I write more generic functions)? For example, to make the above function work with a const CharT* delimiter, must the line std::size_t len = delimiters.size(); be changed to some std::distance(...)?

Update:

A relevant code review has been added here.

asked May 27 '18 06:05 by Saddle Point


1 Answer

You can use std::string_view for both the text to be split and the delimiter. Additionally, you can use a template template parameter to choose the result container, plus a type parameter to choose the type of its elements:

template<typename Char, template<typename> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text, std::basic_string_view<Char> delim)
{
    Container<String> result;
    //...
    result.push_back(String(text.substr(start, count)));
    //...
    return result;
}

template<template<typename> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim)
{ return split_impl<char, Container, String>(text, delim); }

template<template<typename> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim)
{ return split_impl<char16_t, Container, String>(text, delim); }

This way, it can be used with std::string, std::string_view and const char* without redundant allocations:

// vector of std::string_view objects:
auto words_1 = split<std::vector>("hello world", " ");

// list of std::string objects:
auto words_2 = split<std::list, std::string>(std::string("hello world"), " ");

// vector of std::u16string_view objects:
auto words_3 = split<std::vector>(u"hello world", u" ");

Edit: added overloads for char and char16_t
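For reference, here is a minimal sketch of how the elided parts of split_impl might be filled in, assuming C++17. The loop mirrors the one in the question (empty tokens are skipped). Two things here are my assumptions rather than part of the answer: an empty delimiter returns the whole text as a single token, and the container parameter is declared as template<typename...> class, because some compilers reject std::vector (which also has an allocator parameter) for a template<typename> class parameter unless relaxed template template argument matching is enabled.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <string_view>
#include <vector>

// Variadic template template parameter so that std::vector<T, Alloc> matches
// even without relaxed template template argument matching (an assumption
// made for portability; the answer above uses template<typename>).
template <typename Char, template <typename...> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text,
                             std::basic_string_view<Char> delim) {
    Container<String> result;
    if (delim.empty()) {  // assumption: an empty delimiter yields the whole text
        if (!text.empty()) result.push_back(String(text));
        return result;
    }
    std::size_t start = 0;
    std::size_t end;
    while ((end = text.find(delim, start)) != std::basic_string_view<Char>::npos) {
        if (end != start) {  // skip empty tokens, as the question's loop does
            result.push_back(String(text.substr(start, end - start)));
        }
        start = end + delim.size();
    }
    if (start < text.size()) {  // trailing token after the last delimiter
        result.push_back(String(text.substr(start)));
    }
    return result;
}

template <template <typename...> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim) {
    return split_impl<char, Container, String>(text, delim);
}

template <template <typename...> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim) {
    return split_impl<char16_t, Container, String>(text, delim);
}
```

Note that when the elements are string views, they point into the original text, so the text must outlive the result. Passing a temporary std::string is only safe when String is an owning type such as std::string.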

Edit 2

In the code above, split_impl does the actual work. The split overloads are provided only to simplify user code, so that you don't have to specify the character type explicitly. Without the overloads that would be necessary, because the compiler can't deduce Char when the parameter type is basic_string_view and you pass an argument of a different type (for example, const char* or std::wstring). In general, I don't think this is a big problem: you probably want at most four overloads (char, char16_t, char32_t, wchar_t).
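The deduction issue can be shown in isolation with a small sketch. The names count_chars and count_chars_narrow are hypothetical and exist only for this illustration; the commented-out calls are the ones that should fail to compile.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: Char can only be deduced from a basic_string_view argument,
// because template argument deduction does not consider implicit conversions.
template <typename Char>
std::size_t count_chars(std::basic_string_view<Char> text) {
    return text.size();
}

// Non-template wrapper in the spirit of the split overloads: the parameter
// type is fixed, so implicit conversions to std::string_view are allowed.
inline std::size_t count_chars_narrow(std::string_view text) {
    return count_chars<char>(text);
}

// count_chars("abc");               // does not compile: Char cannot be deduced from const char[4]
// count_chars(std::string("abc"));  // does not compile: std::string is not a basic_string_view
// count_chars<char>("abc");         // compiles: Char is given explicitly, so conversion applies
```

Each split overload plays the role of count_chars_narrow here: it fixes the character type so that callers can pass string literals or std::string directly.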

However, for completeness, here's an alternative that doesn't use overloads:

template<typename ContainerT, typename TextT, typename DelimT>
ContainerT split(const TextT& text, const DelimT& delim)
{
    // text[0] is a const reference here, so strip both the reference and the
    // const; basic_string_view<const char> would not be usable
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;

    std::basic_string_view<CharT> textView(text);
    std::basic_string_view<CharT> delimView(delim);

    ContainerT result;

    // actual implementation, but using textView and delimView instead of text and delim

    result.push_back(textView.substr(start, count));

    return result;
}

// usage:
auto words = split<std::vector<std::string_view>>("some text", " ");

With this approach you cannot give the String template parameter a default value as above (because it would have to depend on the TextT type). For this reason, I removed it. Also, this code assumes that text and delim use the same character type and can be converted to basic_string_view.
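A complete version of this overload-free variant could look roughly like the sketch below, assuming C++17. The name split_generic is hypothetical and is used only to keep it distinct from the overloads above. It uses emplace_back instead of push_back so that a container of std::string also works, since the std::string constructor taking a string view is explicit. Treating an empty delimiter as "no split" is my assumption.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical name; a sketch of the overload-free variant with the body filled in.
template <typename ContainerT, typename TextT, typename DelimT>
ContainerT split_generic(const TextT& text, const DelimT& delim) {
    // Strip reference and const from the element type to get the character type.
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;
    using ViewT = std::basic_string_view<CharT>;

    ViewT textView(text);    // works for std::basic_string, const CharT* and literals
    ViewT delimView(delim);  // assumes delim has the same character type as text

    ContainerT result;
    if (delimView.empty()) {  // assumption: an empty delimiter yields the whole text
        if (!textView.empty()) result.emplace_back(textView);
        return result;
    }
    std::size_t start = 0;
    std::size_t end;
    while ((end = textView.find(delimView, start)) != ViewT::npos) {
        if (end != start) {  // skip empty tokens
            result.emplace_back(textView.substr(start, end - start));
        }
        start = end + delimView.size();
    }
    if (start < textView.size()) {  // trailing token after the last delimiter
        result.emplace_back(textView.substr(start));
    }
    return result;
}
```

It would be called as split_generic<std::vector<std::string_view>>("some text", " ") or, for owning results, split_generic<std::vector<std::string>>(someString, "--").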

Personally, I prefer version 1. It doesn't use template types for the function parameters, which is IMHO better, as it gives the caller a better idea of what should be passed in. In other words, the interface of the first split is better specified. Also, as noted above, I don't consider having to add four overloads of split a problem.

answered Sep 28 '22 19:09 by joe_chip