What is the correct way to read a binary file in chunks of a fixed size and store all of those chunks into a Vec?

Tags: file, binary, rust

I'm having trouble with opening a file. Most examples read files into a String or read the entire file into a Vec. What I need is to read a file into chunks of a fixed size and store those chunks into an array (Vec) of chunks.

For example, I have a file called my_file that is exactly 64 KB in size, and I want to read it in chunks of 16 KB, so I would end up with a Vec of size 4 where each element is another Vec of size 16 KB (0x4000 bytes).

After reading the docs and checking other Stack Overflow answers, I was able to come up with something like this:

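use std::io::Read; // brings `read_exact` into scope

let mut file = std::fs::File::open("my_file")?;
// ...calculate num_of_chunks (4 in this case)
let mut list_of_chunks = Vec::new(); // must be `mut` to push into it

for _chunk in 0..num_of_chunks {
    let mut data: [u8; 0x4000] = [0; 0x4000];
    // `read` may return fewer bytes than requested; `read_exact` fills the buffer or fails.
    file.read_exact(&mut data[..])?;
    list_of_chunks.push(data.to_vec());
}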

Although this seems to work fine, it looks a bit convoluted. As I read it, the code does the following:

  • For each iteration, create a new array on the stack
  • Read the chunk into the array
  • Copy the contents of the array into a new Vec and then move the Vec into the list_of_chunks Vec.

I'm not sure if it's idiomatic or even possible, but I'd rather have something like this:

  • Create a Vec with num_of_chunks elements where each element is another Vec of size 16 KB.
  • Read file chunk directly into the correct Vec

No copying and we make sure memory is allocated before reading the file.

Is that approach possible? Or is there a more conventional or idiomatic way to do this? I'm also wondering whether Vec is the correct type here, since I won't need the array to grow after reading the file.

asked Apr 07 '19 04:04 by dospro



2 Answers

Read::read_to_end reads directly and efficiently into a Vec. If you want the data in chunks, combine it with Read::take to limit the number of bytes that read_to_end will read.

Example:

use std::io::Read; // needed for by_ref, take, and read_to_end

let mut file = std::fs::File::open("your_file")?;

let mut list_of_chunks = Vec::new();

let chunk_size = 0x4000;

loop {
    let mut chunk = Vec::with_capacity(chunk_size);
    let n = file.by_ref().take(chunk_size as u64).read_to_end(&mut chunk)?;
    if n == 0 { break; }
    list_of_chunks.push(chunk);
    if n < chunk_size { break; }
}

The last if is not strictly necessary, but it saves an extra read call: if read_to_end returned fewer bytes than requested, we have hit the end of the file, so the next read would return nothing.
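If the number of chunks is known before reading, as in the question's 64 KB example, the buffers can also be allocated up front and filled in place, which is what the question asks for. The following is only a minimal sketch under that assumption; `read_exact_chunks` is a hypothetical helper name, and it fails with an UnexpectedEof error if the input is shorter than `num_chunks * chunk_size` bytes.

```rust
use std::io::{self, Read};

/// Reads exactly `num_chunks` chunks of `chunk_size` bytes each from `reader`.
/// Hypothetical helper: every buffer is allocated before the first read.
fn read_exact_chunks<R: Read>(
    mut reader: R,
    chunk_size: usize,
    num_chunks: usize,
) -> io::Result<Vec<Vec<u8>>> {
    // Allocate and zero all chunks up front.
    let mut chunks = vec![vec![0u8; chunk_size]; num_chunks];
    for chunk in chunks.iter_mut() {
        // read_exact fills the whole buffer or returns an error (UnexpectedEof on short input).
        reader.read_exact(chunk)?;
    }
    Ok(chunks)
}
```

Calling `read_exact_chunks(file, 0x4000, 4)` on the 64 KB file from the question should yield four 16 KB vectors without an intermediate stack array. The trade-off is that each buffer is zeroed before it is overwritten, and a file whose length is not a multiple of the chunk size is treated as an error.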

answered Oct 11 '22 10:10 by Mara Bos


I think the most idiomatic way would be to use an iterator. The code below (freely inspired by M-ou-se's answer):

  • Handles many use cases by using generic types
  • Uses a pre-allocated vector
  • Hides the side effects
  • Avoids copying the data twice

use std::io::{self, Read, Seek, SeekFrom};

struct Chunks<R> {
    read: R,
    size: usize,
    hint: (usize, Option<usize>),
}

impl<R> Chunks<R> {
    pub fn new(read: R, size: usize) -> Self {
        Self {
            read,
            size,
            hint: (0, None),
        }
    }

    pub fn from_seek(mut read: R, size: usize) -> io::Result<Self>
    where
        R: Seek,
    {
        let old_pos = read.seek(SeekFrom::Current(0))?;
        let len = read.seek(SeekFrom::End(0))?;

        // A seek position may lie past the end, so avoid underflow on the u64 subtraction.
        let rest = len.saturating_sub(old_pos) as usize;
        if rest != 0 {
            read.seek(SeekFrom::Start(old_pos))?;
        }

        let min = rest / size + if rest % size != 0 { 1 } else { 0 };
        Ok(Self {
            read,
            size,
            hint: (min, None), // lower bound only; assumes the source does not shrink while being read
        })
    }

    // This could be useful if you want to try to recover from an error
    pub fn into_inner(self) -> R {
        self.read
    }
}

impl<R> Iterator for Chunks<R>
where
    R: Read,
{
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        match self
            .read
            .by_ref()
            .take(chunk.capacity() as u64)
            .read_to_end(&mut chunk)
        {
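            Ok(n) => {
                if n != 0 {
                    // One fewer chunk remains, so keep the size_hint lower bound in sync.
                    self.hint.0 = self.hint.0.saturating_sub(1);
                    Some(Ok(chunk))
                } else {
                    None
                }
            }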
            Err(e) => Some(Err(e)),
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks::new(self, size)
    }
}

impl<T: ?Sized> ReadPlus for T where T: Read {}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("src/main.rs")?;
    let iter = Chunks::from_seek(file, 0xFF)?; // 0xFF is an arbitrary chunk size used for testing; replace it as needed

    println!("{:?}", iter.size_hint());
    // After an error this iterator may keep returning Err, so collect into a Result to stop at the first one
    let chunks = iter.collect::<Result<Vec<_>, _>>()?;
    println!("{:?}, {:?}", chunks.len(), chunks.capacity());

    Ok(())
}
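
For cases where the size hint and into_inner are not needed, the same iterator idea can probably be expressed more compactly with std::iter::from_fn. This is only a sketch, not a replacement for the struct above; `chunks` is a hypothetical free function, and unlike the iterator above it deliberately stops after the first error or short chunk.

```rust
use std::io::{self, Read};

/// Returns an iterator over chunks of at most `size` bytes read from `reader`.
/// Hypothetical helper: iteration ends at end of input or after the first error.
fn chunks<R: Read>(mut reader: R, size: usize) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    let mut done = false;
    std::iter::from_fn(move || {
        if done {
            return None;
        }
        let mut chunk = Vec::with_capacity(size);
        match reader.by_ref().take(size as u64).read_to_end(&mut chunk) {
            // Nothing left to read.
            Ok(0) => {
                done = true;
                None
            }
            Ok(n) => {
                // A short chunk means the end of the input was reached.
                done = n < size;
                Some(Ok(chunk))
            }
            // Yield the error once, then stop.
            Err(e) => {
                done = true;
                Some(Err(e))
            }
        }
    })
}
```

Because the iterator ends after an error, collecting with `collect::<io::Result<Vec<_>>>()` or looping with `for` both terminate.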
answered Oct 11 '22 09:10 by Stargateur