What is the difference between the split_view and the lazy_split_view in C++?

Tags:

c++

std-ranges

c++23

I have read the latest draft where lazy_split_view is added.

Later I realized that split_view was renamed to lazy_split_view and that a new split_view was introduced in its place.

libstdc++ has also recently implemented this; it is available on GCC trunk: https://godbolt.org/z/9qG5T9n5h

Here is a simple program that uses both views, but I can't see any difference between them:

#include <iostream>
#include <ranges>
#include <string>

int main(){

    std::string str { "one two three  four" };

    for (auto word : str | std::views::split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

    std::cout << '\n';

    for (auto word : str | std::views::lazy_split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

}

Output:

one.two.three..four.
one.two.three..four.

That is, until I noticed a difference when binding each piece to a std::span<const char> in both loops.

With std::views::split:

for (std::span<const char> word : str | std::views::split(' '))

the compiler accepts my code.

With std::views::lazy_split, however:

for (std::span<const char> word : str | std::views::lazy_split(' '))

compilation fails.

I know there will be differences between these two, but I can't easily spot them. Is this a defect report in C++20 or a new feature in C++23 (with changes), or both?

asked Jun 21 '21 12:06

Desmond Gold

1 Answer

I've looked at the relevant paper (P2210R2 from Barry Revzin) and split_view has been renamed to lazy_split_view. The new split_view is different in that it provides you with a different result type that preserves the category of the source range.

For example, our string str is a contiguous range, so split will yield contiguous subranges. Previously it would only give you forward ranges, which is a problem if you need random access or a pointer into the underlying storage.

From the example of the paper:

std::string str = "1.2.3.4";
auto ints = str 
    | std::views::split('.')
    | std::views::transform([](auto v){
        int i = 0;
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

will work now, but

std::string str = "1.2.3.4";
auto ints = str 
    | std::views::lazy_split('.')
    | std::views::transform([](auto v){
        int i = 0;
        // v.data() doesn't exist
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

won't because the range v is only a forward range, which doesn't provide a data() member.

Original Answer

I was under the impression that split must be lazy as well (laziness was one of the selling points of the ranges proposal after all), so I made a little experiment:

#include <iostream>
#include <ranges>
#include <string>

// Counts how often the transform is invoked, i.e. how many chars are read.
struct CallCount {
    int i = 0;

    auto operator()(auto c) {
        i++;
        return c;
    }

    ~CallCount() {
        // The functor is copied several times while the pipeline is built;
        // only report the copy that was actually called.
        if (i > 0)
            std::cout << "number of calls: " << i << "\n";
    }
};

int main() {
    std::string str = "1 3 5 7 9 1";

    std::cout << "split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::split(' ') | std::views::take(2)) {
    }

    std::cout << "lazy_split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::lazy_split(' ') | std::views::take(2)) {
    }
}

This code prints (note that the transform operates on each char in the string):

split_view:
number of calls: 6
lazy_split_view:
number of calls: 4

So what happens?

Indeed, both views are lazy. But there are differences in their laziness. The transform that I put in front of split just counts how many times it has been called. As it turns out split computes the next item eagerly, while lazy_split stops as soon as it hits the whitespace after the current item.

You can see that the string str consists of numbers that also mark their char index (starting at 1). The take(2) should stop the loop after we've seen '3' in str. And indeed lazy_split stops at the whitespace after '3', but split stops at the whitespace after '5'.

This essentially means that split fetches its next item eagerly instead of lazily. This difference probably won't matter most of the time, but it can affect performance-critical code.

I don't know whether that was the reason for this change (I haven't read the paper).

answered Oct 19 '22 15:10

Timo
