
Why are the strings in my iterator being concatenated?

My goal is to read a list of words, one per line, into a HashSet, while discarding comment lines and propagating I/O errors properly. Given the file "stopwords.txt":

a
# this is actually a comment
of
the
this

I managed to make the code compile like this:

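use std::collections::HashSet;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::iter::FromIterator;

fn stopword_set() -> io::Result<HashSet<String>> {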
    let words = Result::from_iter(
        BufReader::new(File::open("stopwords.txt")?)
                .lines()
                .filter(|r| match r {
                    &Ok(ref l) => !l.starts_with('#'),
                    _ => true
                }));
    Ok(HashSet::from_iter(words))
}

fn main() {
    let set = stopword_set().unwrap();
    println!("{:?}", set);
    assert_eq!(set.len(), 4);
}

Here's a playground that also creates the file above.

I would expect to have a set of 4 strings at the end of the program. To my surprise, the function actually returns a set containing a single string with all words concatenated:

{"aofthethis"}
thread 'main' panicked at 'assertion failed: `(left == right)` (left: `1`, right: `4`)'

Following a piece of advice in the docs for FromIterator, I replaced every call to from_iter with collect (Playground), which did solve the problem.

fn stopword_set() -> io::Result<HashSet<String>> {
    BufReader::new(File::open("stopwords.txt")?)
            .lines()
            .filter(|r| match r {
                &Ok(ref l) => !l.starts_with('#'),
                _ => true
            }).collect()
}

Why do the earlier calls to from_iter lead to unexpected type inference, while collect() works just as intended?

E_net4 stands with Ukraine asked Feb 07 '17 23:02


1 Answer

A simpler reproduction:

use std::collections::HashSet;
use std::iter::FromIterator;

fn stopword_set() -> Result<HashSet<String>, u8> {
    let input: Vec<Result<_, u8>> = vec![Ok("foo".to_string()), Ok("bar".to_string())];
    let words = Result::from_iter(input.into_iter());
    Ok(HashSet::from_iter(words))
}

fn main() {
    let set = stopword_set().unwrap();
    println!("{:?}", set);
    assert_eq!(set.len(), 2);
}

The problem is that we are collecting from the iterator twice. The type of words is Result<_, u8>. However, Result itself implements IntoIterator (it yields its Ok value, if there is one), so when we call HashSet::from_iter on it at the end, the compiler sees that the Ok type must be String, because the function's return type requires the set's items to be String. Working backwards, a String can be constructed from an iterator of Strings by concatenating them, so that's what the compiler picks for the first from_iter.
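To make that inference visible, here is a minimal sketch of the same reproduction with the inferred types written out by hand. The function name concatenated is only illustrative, and the annotations are what I'd expect the compiler to infer, not something it prints:

```rust
use std::collections::HashSet;
use std::iter::FromIterator;

// Hypothetical name; same steps as the reproduction, with explicit types.
fn concatenated() -> HashSet<String> {
    let input: Vec<Result<String, u8>> =
        vec![Ok("foo".to_string()), Ok("bar".to_string())];

    // First collection: String implements FromIterator<String>,
    // so all the Ok values are concatenated into a single String.
    let words: Result<String, u8> = Result::from_iter(input);

    // Second collection: iterating over a Result yields its Ok value
    // once (or nothing for Err), so the set gets at most one element.
    HashSet::from_iter(words)
}

fn main() {
    let set = concatenated();
    println!("{:?}", set);
    assert_eq!(set.len(), 1);
    assert!(set.contains("foobar"));
}
```

Annotating words as Result<HashSet<String>, u8> instead would collect the strings into the set directly, leaving nothing for a second from_iter to do.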

Removing the second from_iter would solve it:

fn stopword_set() -> Result<HashSet<String>, u8> {
    let input: Vec<Result<_, u8>> = vec![Ok("foo".to_string()), Ok("bar".to_string())];
    Result::from_iter(input.into_iter())
}

Or for your original:

fn stopword_set() -> io::Result<HashSet<String>> {
    Result::from_iter(
        BufReader::new(File::open("stopwords.txt")?)
                .lines()
                .filter(|r| match r {
                    &Ok(ref l) => !l.starts_with('#'),
                    _ => true
                }))
}

Of course, I'd normally recommend using collect instead, as I prefer the chaining:

fn stopword_set() -> io::Result<HashSet<String>> {
    BufReader::new(File::open("stopwords.txt")?)
        .lines()
        .filter(|r| match r {
            &Ok(ref l) => !l.starts_with('#'),
            _ => true,
        })
        .collect()
}
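For trying this without a file on disk, here is a sketch of the collect version that is generic over any BufRead. The helper name stopword_set_from and the in-memory Cursor input are assumptions made for the example, not part of the original code:

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Cursor};

// Hypothetical helper: accepts any BufRead, so a Cursor can stand in
// for the BufReader<File> used in the question.
fn stopword_set_from<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    reader
        .lines()
        // Drop comment lines, but keep errors so that collect()
        // can return the first one it encounters.
        .filter(|r| match r {
            Ok(l) => !l.starts_with('#'),
            Err(_) => true,
        })
        .collect()
}

fn main() -> io::Result<()> {
    let data = "a\n# this is actually a comment\nof\nthe\nthis\n";
    let set = stopword_set_from(Cursor::new(data))?;
    println!("{:?}", set);
    assert_eq!(set.len(), 4);
    Ok(())
}
```

Because the return type fixes the target as io::Result<HashSet<String>>, there is only one collection step and nothing left for inference to guess.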

Shepmaster answered Oct 14 '22 01:10