
What is the most efficient way to read a large file in chunks without loading the entire file in memory at once?


What is the most efficient general-purpose way of reading "large" files (which may be text or binary), without going into unsafe territory? I was surprised how few relevant results there were when I did a web search for "rust read large file in chunks".

For example, one of my use cases is to calculate an MD5 checksum for a file using rust-crypto (the Md5 module allows you to add &[u8] chunks iteratively).

Here is what I have, which seems to perform slightly better than some other methods like read_to_end:

use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            // do stuff with buffer here
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    Ok(())
}
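For the MD5 use case, the same loop can feed each chunk into the digest as it goes; the sketch below is an assumption based on rust-crypto's Digest trait (its input and result_str methods) and reuses the placeholder file name:

use crypto::digest::Digest;
use crypto::md5::Md5;
use std::{
    fs::File,
    io::{self, BufRead, BufReader},
};

fn main() -> io::Result<()> {
    const CAP: usize = 1024 * 128;
    let file = File::open("my.file")?;
    let mut reader = BufReader::with_capacity(CAP, file);
    let mut hasher = Md5::new();

    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.input(buffer); // feed the current chunk into the running digest
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }

    println!("{}", hasher.result_str()); // hex string of the final MD5
    Ok(())
}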
asked May 06 '16 by Jacob Brown




1 Answer

I don't think you can write code more efficient than that. fill_buf on a BufReader over a File is basically just a straight call to read(2).

That said, BufReader isn't really a useful abstraction when you use it like that; it would probably be less awkward to just call file.read(&mut buf) directly.
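A minimal sketch of that direct approach, assuming the same placeholder file name, might look like this:

use std::fs::File;
use std::io::{self, Read};

fn main() -> io::Result<()> {
    let mut file = File::open("my.file")?;
    let mut buffer = [0u8; 1024 * 128]; // fixed-size chunk buffer

    loop {
        let n = file.read(&mut buffer)?; // reads at most buffer.len() bytes per call
        if n == 0 {
            break; // 0 bytes read from a File means end of file
        }
        // do stuff with &buffer[..n] here
    }

    Ok(())
}

Only &buffer[..n] holds fresh data on each iteration, so any processing (hashing, parsing, copying) should be limited to that slice.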

answered Oct 16 '22 by Eli Friedman