
What is the difference between the split_view and the lazy_split_view in C++?

I read the latest draft, where lazy_split_view has been added.

Later I realized that the old split_view was renamed to lazy_split_view, and that split_view itself was redesigned.

libstdc++ has also recently implemented this change; it is available on GCC trunk: https://godbolt.org/z/9qG5T9n5h

Here is a simple program that uses both views, but I can't see any difference between them:

#include <iostream>
#include <ranges>
#include <string>

int main(){

    std::string str { "one two three  four" };

    for (auto word : str | std::views::split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

    std::cout << '\n';

    for (auto word : str | std::views::lazy_split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

}

Output:

one.two.three..four.
one.two.three..four.

I only noticed a difference when I used std::span<const char> as the type of the loop variable for both views.

In the first case, with std::views::split:

for (std::span<const char> word : str | std::views::split(' '))

the compiler accepts my code.

In the second case, with std::views::lazy_split:

for (std::span<const char> word : str | std::views::lazy_split(' ')) 

the code fails to compile.

I know there will be differences between these two, but I can't easily spot them. Is this a defect report in C++20 or a new feature in C++23 (with changes), or both?

Desmond Gold asked Jun 21 '21 12:06







1 Answer

I've looked at the relevant paper (P2210R2 from Barry Revzin) and split_view has been renamed to lazy_split_view. The new split_view is different in that it provides you with a different result type that preserves the category of the source range.

For example, our string str is a contiguous range, so split now yields contiguous subranges. Previously each piece was at most a forward range, which is a problem if you need random access, the size of a piece, or a pointer to the underlying storage.

From the example of the paper:

// Needs <charconv>, <ranges>, and <string>.
std::string str = "1.2.3.4";
auto ints = str 
    | std::views::split('.')
    | std::views::transform([](auto v){
        int i = 0;
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

will work now, but

std::string str = "1.2.3.4";
auto ints = str 
    | std::views::lazy_split('.')
    | std::views::transform([](auto v){
        int i = 0;
        // v.data() doesn't exist
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

won't because the range v is only a forward range, which doesn't provide a data() member.

Original Answer

I was under the impression that split must be lazy as well (laziness was one of the selling points of the ranges proposal after all), so I made a little experiment:

#include <iostream>
#include <ranges>
#include <string>

struct CallCount{
    int i = 0;

    auto operator()(auto c) {
        i++;
        return c;
    }

    ~CallCount(){
        if (i > 0) // there are a lot of copies made when the range is constructed
            std::cout << "number of calls: " << i << "\n";
    }
};


int main() {
    
    std::string str = "1 3 5 7 9 1";

    std::cout << "split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::split(' ') | std::views::take(2)) {
    }

    std::cout << "lazy_split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::lazy_split(' ') | std::views::take(2)) {
    }    
}

This code prints (note that the transform operates on each char in the string):

split_view:
number of calls: 6
lazy_split_view:
number of calls: 4

So what happens?

Both views are indeed lazy, but they differ in how lazy they are. The transform that I put in front of split just counts how many times it has been called. As it turns out, split computes the next item eagerly, while lazy_split stops as soon as it hits the whitespace after the current item.

You can see that the string str consists of numbers that also mark their char index (starting at 1). The take(2) should stop the loop after we've seen '3' in str. And indeed lazy_split stops at the whitespace after '3', but split stops at the whitespace after '5'.

This essentially means that split fetches its next item eagerly rather than lazily. The difference probably won't matter most of the time, but it can affect performance-critical code.

I don't know whether that was the reason for this change (I haven't read the paper).

Timo answered Oct 19 '22 15:10
