Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this the right way to read lines from file and split them into words in Rust?

Tags:

rust

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.

I've implemented the following method to return me the words from a file in a 2 dimensional data structure:

fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}

Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?

like image 714
Michał Fronczyk Avatar asked Aug 30 '14 10:08

Michał Fronczyk


2 Answers

There is a shorter and more readable way of getting words from a text file.

use std::io::{BufRead, BufReader};
use std::fs::File;

let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));

for line in reader.lines() {
    for word in line.unwrap().split_whitespace() {
        println!("word '{}'", word);
    }
}
like image 63
katomaso Avatar answered Oct 18 '22 23:10

katomaso


You could instead read the entire file as a single String and then build a structure of references that points to the words inside:

use std::io::{self, Read};
use std::fs::File;

fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}

fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}

This will read the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating, first into the buffer inside BufReader, and then into a newly allocated String for each line, and then into a newly allocated the String for each word. It will also use less memory, because the single large String and vectors of references are more compact than many individual Strings.

A drawback is that you can't directly return the structure of references, because it can't live past the stack frame the holds the single large String. In example_use above, we have to put the large String into a let in order to call words_by_line. It is possible to get around this with unsafe code and wrapping the String and references in a private struct, but that is much more complicated.

like image 32
wingedsubmariner Avatar answered Oct 18 '22 22:10

wingedsubmariner