Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the main differences between a Rust Iterator and C++ Iterator? [closed]

Tags:

c++

iterator

rust

A typical example of a C++ iterator is a pointer, and can be used to point at an element in a C array like so:

int array[] = {1, 2, 3, 4};
int* begin = std::begin(array); //Starting iterator
int* end = std::end(array) //Ending iterator

for(int* i = begin; i < end; i++)
{
    std::cout << *i << ',';
}

//Prints 1, 2, 3, 4

This is simple enough. The definition of an iterator from cplusplus.com is

An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators...

This makes sense; in the above code there were two iterators (the begin and the end iterators), and it used a for loop and incremented.

In Rust, an iterator is used like this:

let vect = vec![1, 2, 3, 4];

let vect_iter = vect.iter();

What? To iterate it you do:

vect_iter.next();
vect_iter.next();

I couldn't find any exact definition of a pointer in the Rust docs, but looking at the Iterator trait, it is seems that an iterator is a wrapper for a container that enables for easier processing, by standardizing the logic in a way (if that makes sense at all).

The main questions I have are:

  1. What are the main differences?
  2. Why does Rust have iterators in this fashion and why are they expressed so differently?
  3. Are there Rust-type iterators in C++?
  4. Are there C++-type iterators in Rust?
  5. Are they called something specific? (Internal/External?)
like image 806
The_Babu Avatar asked Feb 27 '18 01:02

The_Babu


People also ask

What is a Rust iterator?

In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up. For example, the code in Listing 13-10 creates an iterator over the items in the vector v1 by calling the iter method defined on Vec<T> . This code by itself doesn't do anything useful. let v1 = vec![

What does .iter do Rust?

Iterator. An iterator has a method, next , which when called, returns an Option<Item> . Calling next will return Some(Item) as long as there are elements, and once they've all been exhausted, will return None to indicate that iteration is finished.

What is an iterator C?

An iterator is an object that allows you to step through the contents of another object, by providing convenient operations for getting the first element, testing when you are done, and getting the next element if you are not. In C, we try to design iterators to have operations that fit well in the top of a for loop.


2 Answers

An iterator is a concept found in programming languages to refer to a construct which allows iterating over collections or sequences of elements. The concept is purposely vague, it's a concept! It does not prescribe any specific implementation.

To more easily distinguish C++ from Rust, I am going to use different names:

  • C++ iterators will be named cursors,
  • Rust iterators will be named streams.

Yes, those are completely arbitrary. Note that if you look at languages like Java or C#, you will find that they also use streams.


C++

First of all, don't use cplusplus.com. cppreference.com is much better.

An iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators...

Simple, and wrong.

A cursor may either:

  • point to an element,
  • or be singular and point to no element at all.

In general, the singular value is used to represent:

  • the end of the sequence to iterate over: vec.end(),
  • the absence of an element: std::find(...).

You can increment, and sometimes decrement, a cursor. If you do so, you however generally need a pair of cursors to know when to stop.

Why did C++ use such a representation? Because that's how C did it, and it works pretty well... although it's error prone.


Rust

Rust endeavors to be safe and favors APIs which are easy to use. This rules out a pair of cursors:

  • a pair of cursors is not safe: you can easily iterate out of bounds, and you can obtain aliasing references,
  • a pair of cursors is error-prone: it's easy to accidentally pair off cursors from two different sequences.

In order to control bounds, aliasing and avoid pair mismatch, you have to use a single object; thus the stream-like API.

The Iterator API in Rust is reminiscent of that of Java and C#, although Rust improves upon it by using Option<T> so that instead of the clumsy hasNext()/next() call pair, it offers a single method next() which both advances the stream and may signal its end.


Conclusion

Both Rust and C++ features a way to iterate over a collection of elements:

  • C++ offers a C-like way, flexible but error-prone,
  • Rust offers a modern way, safe but less flexible.

Both languages also offer external and internal iteration:

  • External: the user controls the iteration (calls ++ or next()),
  • Internal: the iterator controls the user code (see std::foreach and Iterator::foreach).
like image 155
Matthieu M. Avatar answered Sep 29 '22 00:09

Matthieu M.


Iterators in Rust and C++ are conceptually quite different.

C++

In C++, an iterator is like a pointer. Iterators refer to an object, they can be incremented to refer to the next object, and they can be compared for equality with other iterators. Iterators can also refer to no object at all—they can refer to the “one past the end” element of a sequence, or they can be “singular” (which is like a null pointer). Some iterators support additional operations like moving both forwards and backwards, random access, and being copied.

A pointer in C++ is a valid iterator, but there are also other types which are iterators.

Iterators do not represent a sequence of elements, at least, that is not the convention. In C++, if you want a sequence of elements, you need a pair of iterators*: one for the beginning and one for the end. You aren't forced to iterate over the elements in sequence, you can do all sorts of other things. For example, if you want to reverse an array in C++, you can do it with iterators:

#include <algorithm>
#include <iterator>
#include <cstdio>
#include <utility>

template <typename T, std::size_t N>
void reverse_array(T (&arr)[N]) {
    using std::swap;
    auto left = std::begin(arr), right = std::end(arr);
    while (left < right) {
        --right;
        swap(*left, *right);
        ++left;
    }
}

int main() {
    int x[] = {1, 2, 3, 4, 5};
    reverse_array(x);
    for (const auto it : x) {
        std::printf("%d\n", it);
    }
    return 0;
}

But you can quickly generalize it to work on any container with bidirectional iterators:

#include <algorithm>
#include <iterator>
#include <list>
#include <cstdio>
#include <utility>

template <typename Iterator>
void reverse_any(Iterator left, Iterator right) {
    using std::swap;
    while (left != right) {
        --right;
        if (left == right)
            break;
        swap(*left, *right);
        ++left;
    }
}

int main() {
    std::list<int> list{1, 2, 3, 4, 5};
    reverse_any(std::begin(list), std::end(list));
    for (const auto it : list) {
        std::printf("%d\n", it);
    }
    return 0;
}

Rust

In Rust, an iterator is like a slice. Iterators refer to a sequence of objects, and elements can be accessed from the iterator using the next() method. In a sense, this means an iterator in Rust has both the begin and end iterator inside it. Reimplementing the C++ code above in Rust, you'll get something like this:

fn reverse_any<'a, T: 'a, Iter>(mut iter: Iter)
where
    Iter: DoubleEndedIterator<Item = &'a mut T>,
{
    while let Some(left) = iter.next() {
        if let Some(right) = iter.next_back() {
            std::mem::swap(left, right);
        }
    }
}

fn main() {
    let mut v = [1, 2, 3, 4, 5];
    reverse_any(v.iter_mut());
    println!("{:?}", v);
}

This has the added benefit of safety. Iterator invalidation is one of the most common sources of errors in C++ programs, but Rust eliminates the problem completely.

The cost is that if you want to mutate the elements, you are limited to a single (possibly double-ended) iterator in Rust, while in C++, you can have as many iterators as you want working with the same container. While single-ended and double-ended ranges are the most common case for iterators, there are some algorithms that use the additional flexibility that C++ provides.

One simple example I can think of is C++'s std::remove_if. A straightforward implementation of remove_if would use three iterators: two iterators for keeping track of the range of elements being scanned, and a third iterator for tracking the elements being written. You could translate std::remove_if to Rust, but it would not be able to operate on normal Rust iterators and still modify the container in-place.

Another simple example is the Dutch National Flag problem, which commonly uses three iterators. The solution to this problem is often used for partitioning elements for quicksort, so it's an important problem.

Summary

A Rust iterator is almost equivalent to a start + end C++ iterator pair. C++ allows you to use multiple iterators and move iterators forwards and backwards. Rust guarantees that you don't accidentally use an invalid iterator, but you can only use one at a time and it can only move in one direction.

I don't know of any terminology for distinguishing these types of iterators. Note that Rust-style iterators are much more common, iterators in C#, Python, Java, etc. work the same way but they might have slightly different names (they are called "enumerators" in C#).

Footnotes

*: Technically this is not true. You only need to have one iterator in C++, however, it is conventional to have a pair and library functions typically operate on pairs of iterators (so you "need" two iterators if you want to use those functions). The fact that you have a (start,end) pair does not mean that sequences are bounded, the end iterator can be infinitely far away. Think of it like having a range (0,∞) in mathematics... ∞ isn't really a number, it's just a placeholder that lets you know that the range is unbounded on the right.

: Remember that just because the “end” iterator exists in C++ does not mean that the sequence actually has an end. Some end iterators in C++ are like infinity. They don’t point to valid elements, and no matter how many times you iterate forward, you won’t reach infinity. In Rust, the equivalent construction is an iterator that never returns None.

like image 27
Dietrich Epp Avatar answered Sep 28 '22 23:09

Dietrich Epp