
Why does std::string_view::data not include a null terminator?

This code can have undefined behavior:

#include <string_view>
#include <iostream>

using namespace std::string_view_literals;

void foo(std::string_view msg) {
    std::cout << msg.data() << '\n'; // undefined behavior if 'msg' is not
                                     // null-terminated

    // std::cout << msg << '\n'; is fine because operator<< writes exactly
    //                           msg.size() characters, but that's not the point
}

int main() {
    foo("hello"sv); // the view itself carries no null terminator; this only
                    // works because the literal behind it happens to have one
    foo("foo");     // same, even easier to get wrong
}

The reason is that std::string_view can refer to strings that are not null-terminated, and data() does not add a terminator. That's really limiting: to give the above code defined behavior for every input, I have to construct a std::string from the view:

std::string str{ msg };
std::cout << str.data() << '\n';

This really makes std::string_view unnecessary in this case. I still have to copy the string passed to foo, so why not change msg to a std::string and rely on move semantics? That might be faster, but I didn't measure.

Either way, having to construct a std::string every time I want to pass the string to a function that only accepts a const char* seems unnecessary, but there has to be a reason why the Committee decided it this way.

So, why does std::string_view::data not return a null-terminated string like std::string::data?

Rakete1111 asked Jan 18 '17 14:01


3 Answers

So, why does std::string_view::data not return a null-terminated string like std::string::data?

Simply because it can't. A string_view can be a narrower view into a larger string (a substring of a string). That means the viewed characters will not necessarily be followed by a null terminator at the end of a particular view. You can't write a null terminator into the underlying string (it belongs to someone else and may be const), and you can't create a copy of the string and return a char* to it without a memory leak.

If you want a null-terminated string, you have to create a std::string copy of it.
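
A minimal sketch of both points, assuming C++17. The function names here (c_api_length, pass_data_directly, pass_copy) are hypothetical and only stand in for "some API that wants a const char*":

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a C API that expects a null-terminated string.
std::size_t c_api_length(const char* s) {
    return std::strlen(s);
}

// Wrong: data() points into the original buffer, so the C API keeps reading
// past the end of the view until it happens to find a '\0' (if there is one).
std::size_t pass_data_directly(std::string_view view) {
    return c_api_length(view.data());
}

// Right: copy the viewed characters; std::string supplies the terminator.
std::size_t pass_copy(std::string_view view) {
    const std::string owned(view);
    return c_api_length(owned.c_str());
}
```

For a view of the first five characters of a std::string holding "hello world", pass_copy reports 5 while pass_data_directly reports 11, because the view's data() is followed by the rest of the original string rather than by a terminator.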

Let me show a good use of std::string_view:

template <class Pred>
auto tokenize(std::string_view str, Pred is_delim) -> std::vector<std::string_view>;

Here the resulting vector contains tokens as views into the larger string.
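
One possible body for that signature, as a sketch only. The answer gives just the declaration, so the loop, the template header for Pred, and the choice to skip empty tokens are all assumptions made here for illustration:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits 'str' at every character for which is_delim returns true.
// Empty tokens (from adjacent delimiters) are skipped. The returned views
// point into the caller's buffer, which must outlive them.
template <class Pred>
auto tokenize(std::string_view str, Pred is_delim) -> std::vector<std::string_view>
{
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= str.size(); ++i) {
        const bool at_end = (i == str.size());
        if (at_end || is_delim(str[i])) {
            if (i > start) {
                tokens.push_back(str.substr(start, i - start));
            }
            start = i + 1;
        }
    }
    return tokens;
}
```

Called as tokenize("a,b,,c", [](char c) { return c == ','; }), this yields three views, none of which is null-terminated except (by accident of position) the last one. No characters are copied, which is the case where string_view pays off.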

bolov answered Oct 12 '22 08:10


The purpose of string_view is to be a range representing a contiguous sequence of characters. Limiting such a range to one that ends in a NUL-terminator limits the usefulness of the class.

That being said, it would still be useful to have an alternate version of string_view which is intended only to be created from strings that truly are NUL-terminated.

My zstring_view class inherits privately from string_view. It supports removing elements from the front, along with the other operations that cannot make the string non-NUL-terminated. It provides the rest of the operations too, but they return a string_view, not a zstring_view.

You'd be surprised how few operations you have to lose from string_view to make this work:

#include <memory>
#include <string>
#include <string_view>

template<typename charT, typename traits = std::char_traits<charT>>
class basic_zstring_view : private std::basic_string_view<charT, traits>
{
public:
    using base_view_type = std::basic_string_view<charT, traits>;

    // Member types of a dependent base need `typename` in a using-declaration.
    using typename base_view_type::traits_type;
    using typename base_view_type::value_type;
    using typename base_view_type::pointer;
    using typename base_view_type::const_pointer;
    using typename base_view_type::reference;
    using typename base_view_type::const_reference;

    using typename base_view_type::const_iterator;
    using typename base_view_type::iterator;
    using typename base_view_type::const_reverse_iterator;
    using typename base_view_type::reverse_iterator;

    using typename base_view_type::size_type;
    using typename base_view_type::difference_type;

    using base_view_type::npos;

    basic_zstring_view(const charT* str) : base_view_type(str) {}
    constexpr explicit basic_zstring_view(const charT* str, size_type len) : base_view_type(str, len) {}
    constexpr explicit basic_zstring_view(const base_view_type &view) : base_view_type(view) {}

    constexpr basic_zstring_view(const basic_zstring_view&) noexcept = default;
    basic_zstring_view& operator=(const basic_zstring_view&) noexcept = default;

    using base_view_type::begin;
    using base_view_type::end;
    using base_view_type::cbegin;
    using base_view_type::cend;
    using base_view_type::rbegin;
    using base_view_type::rend;
    using base_view_type::crbegin;
    using base_view_type::crend;

    using base_view_type::size;
    using base_view_type::length;
    using base_view_type::max_size;
    using base_view_type::empty;

    using base_view_type::operator[];
    using base_view_type::at;
    using base_view_type::front;
    using base_view_type::back;
    using base_view_type::data;

    using base_view_type::remove_prefix;

    // `using base_view_type::remove_suffix;` is intentionally not provided,
    // because shrinking from the back would drop the NUL terminator.

    /// Creates a `basic_string_view` that lacks the last `n` characters.
    constexpr std::basic_string_view<charT, traits> view_suffix(size_type n) const
    {
        return std::basic_string_view<charT, traits>(data(), size() - n);
    }

    using base_view_type::swap;

    template<class Allocator = std::allocator<charT> >
    std::basic_string<charT, traits, Allocator> to_string(const Allocator& a = Allocator()) const
    {
        return std::basic_string<charT, traits, Allocator>(begin(), end(), a);
    }

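    // Caveat: a conversion function to a base class is never selected for
    // implicit conversion, so a named accessor (hypothetical name) is added.
    constexpr operator base_view_type() const {return base_view_type(data(), size());}
    constexpr base_view_type as_string_view() const {return base_view_type(data(), size());}

    // std::basic_string_view has no to_string member to re-export.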

    using base_view_type::copy;

    using base_view_type::substr;

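    using base_view_type::compare;
    // std::basic_string_view's comparison operators are non-members, so they cannot
    // be re-exported with using-declarations; define them in terms of compare().
    friend constexpr bool operator==(basic_zstring_view a, basic_zstring_view b) noexcept { return a.compare(b) == 0; }
    friend constexpr bool operator!=(basic_zstring_view a, basic_zstring_view b) noexcept { return a.compare(b) != 0; }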
};
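
To see why remove_prefix is exposed while remove_suffix is not, here is a small check written against plain std::string_view. The helper names are hypothetical, and peeking one character past the view is only valid because the test data is known to sit inside a null-terminated buffer:

```cpp
#include <cstddef>
#include <string_view>

// True if the character just past the view is '\0'. Only meaningful when the
// caller knows the view lies inside a null-terminated buffer.
bool ends_at_terminator(std::string_view view) {
    return view.data()[view.size()] == '\0';
}

// Dropping characters from the front keeps the end of the view in place.
std::string_view drop_front(std::string_view view, std::size_t n) {
    view.remove_prefix(n);
    return view;
}

// Dropping characters from the back moves the end away from the terminator.
std::string_view drop_back(std::string_view view, std::size_t n) {
    view.remove_suffix(n);
    return view;
}
```

Starting from a view of the literal "hello", drop_front(view, 2) still ends at the terminator, while drop_back(view, 2) ends at an 'l'. That asymmetry is what lets a NUL-terminated view type keep every front-trimming operation.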

Nicol Bolas answered Oct 12 '22 09:10


When dealing with string literals with known null terminators I usually use something like this to make sure the null is included in the counted chars.

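#include <cstddef>
#include <string_view>
// L counts the literal's trailing '\0', so the returned view includes it.
template <std::size_t L>
std::string_view string_viewz(const char (&t)[L])
{
    return std::string_view(t, L);
}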

The aim here is not to fix the compatibility issue; there are too many cases for that. But if you know what you are doing and want the string_view span to include the null (for serialization, say), it is a nice trick.

auto view = string_viewz("Surrogate String");
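
A short sketch of what that buys and what it costs, assuming C++17. string_viewz is repeated so the block stands alone, and plain_size is a hypothetical helper added only for comparison:

```cpp
#include <cstddef>
#include <string_view>

// Same helper as above, repeated so the example is self-contained.
template <std::size_t L>
std::string_view string_viewz(const char (&t)[L]) {
    return std::string_view(t, L);
}

// Number of characters a plain view of the same literal reports
// (the const char* constructor stops at the first '\0').
template <std::size_t L>
std::size_t plain_size(const char (&t)[L]) {
    return std::string_view(t).size();
}
```

For "Surrogate String", the plain view has 16 characters and the string_viewz view has 17, with '\0' as its last element. The flip side is that the two views no longer compare equal, so this form is best kept to code that really wants the terminator inside the span.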

justdoityourself answered Oct 12 '22 09:10