I'm having trouble with opening a file. Most examples read files into a String or read the entire file into a Vec. What I need is to read a file in chunks of a fixed size and store those chunks in an array (Vec) of chunks.
For example, I have a file called my_file of exactly 64 KB and I want to read it in chunks of 16 KB, so I would end up with a Vec of size 4 where each element is another Vec of size 16 KB (0x4000 bytes).
After reading the docs and checking other Stack Overflow answers, I was able to come up with something like this:
use std::io::Read;

let mut file = std::fs::File::open("my_file")?;
// ...calculate num_of_chunks (4 in this case)
let mut list_of_chunks = Vec::new();
for _ in 0..num_of_chunks {
    let mut data: [u8; 0x4000] = [0; 0x4000];
    // note: read may fill only part of the buffer
    file.read(&mut data[..])?;
    list_of_chunks.push(data.to_vec());
}
Although this seems to work fine, it looks a bit convoluted. I read each chunk into an array, copy the array into a new Vec, and then move the Vec into the list_of_chunks Vec.
I'm not sure if it's idiomatic or even possible, but I'd rather have something like this: a Vec with num_of_chunks elements where each element is another Vec of size 16 KB, with each chunk read directly into its Vec. No copying, and we make sure memory is allocated before reading the file.
Is that approach possible, or is there a better conventional/idiomatic/correct way to do this?
I'm also wondering if Vec is the correct type for solving this. I mean, I won't need the array to grow after reading the file.
Read::read_to_end reads efficiently directly into a Vec. If you want it in chunks, combine it with Read::take to limit the number of bytes that read_to_end will read.
Example:
use std::io::Read; // brings by_ref, take and read_to_end into scope

let mut file = std::fs::File::open("your_file")?;
let mut list_of_chunks = Vec::new();
let chunk_size = 0x4000;
loop {
    let mut chunk = Vec::with_capacity(chunk_size);
    // take limits read_to_end to at most chunk_size bytes
    let n = file.by_ref().take(chunk_size as u64).read_to_end(&mut chunk)?;
    if n == 0 { break; }
    list_of_chunks.push(chunk);
    if n < chunk_size { break; }
}
The last if is not necessary, but it prevents an extra read call: if read_to_end read fewer bytes than requested, we can expect the next read to read nothing, since we hit the end of the file.
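The loop above can also be wrapped in a small helper that accepts any reader, which makes it easy to try on in-memory data. This is only a minimal sketch: the name `read_chunks` is hypothetical (it is not part of the standard library), and the sketch assumes a non-zero chunk size.

```rust
use std::io::{self, Read};

/// Reads `reader` to the end and returns its contents split into chunks of
/// at most `chunk_size` bytes. `read_chunks` is a hypothetical helper name.
fn read_chunks<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    // Assumption of this sketch: a zero chunk size is a caller error.
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut chunks = Vec::new();
    loop {
        // Reserve space up front; `take` caps how much `read_to_end` appends.
        let mut chunk = Vec::with_capacity(chunk_size);
        let n = reader
            .by_ref()
            .take(chunk_size as u64)
            .read_to_end(&mut chunk)?;
        if n == 0 {
            break; // end of input, nothing to store
        }
        chunks.push(chunk);
        if n < chunk_size {
            break; // short chunk: the input is exhausted
        }
    }
    Ok(chunks)
}
```

Called as `read_chunks(std::fs::File::open("my_file")?, 0x4000)` on the 64 KB file from the question, this should return four chunks of 16 KB each.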
I think the most idiomatic way would be to use an iterator. The code below is freely inspired by M-ou-se's answer:
use std::io::{self, Read, Seek, SeekFrom};

struct Chunks<R> {
    read: R,
    size: usize,
    hint: (usize, Option<usize>),
}

impl<R> Chunks<R> {
    pub fn new(read: R, size: usize) -> Self {
        Self {
            read,
            size,
            hint: (0, None),
        }
    }

    // `size` must be non-zero, otherwise the division below panics
    pub fn from_seek(mut read: R, size: usize) -> io::Result<Self>
    where
        R: Seek,
    {
        let old_pos = read.seek(SeekFrom::Current(0))?;
        let len = read.seek(SeekFrom::End(0))?;
        // saturating_sub guards against a position past the end of the stream
        let rest = len.saturating_sub(old_pos) as usize;
        if len != old_pos {
            read.seek(SeekFrom::Start(old_pos))?;
        }
        // number of chunks left, rounded up
        let min = rest / size + if rest % size != 0 { 1 } else { 0 };
        Ok(Self {
            read,
            size,
            hint: (min, None), // this could be wrong, I'm unsure
        })
    }

    // This could be useful if you want to try to recover from an error
    pub fn into_inner(self) -> R {
        self.read
    }
}

impl<R> Iterator for Chunks<R>
where
    R: Read,
{
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        match self
            .read
            .by_ref()
            // limit by the requested size; the capacity may be larger
            .take(self.size as u64)
            .read_to_end(&mut chunk)
        {
            Ok(n) => {
                if n != 0 {
                    // one chunk fewer remains, so keep the lower bound in step
                    self.hint.0 = self.hint.0.saturating_sub(1);
                    Some(Ok(chunk))
                } else {
                    None
                }
            }
            Err(e) => Some(Err(e)),
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks::new(self, size)
    }
}

impl<T: ?Sized> ReadPlus for T where T: Read {}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("src/main.rs")?;
    let iter = Chunks::from_seek(file, 0xFF)?; // 0xFF was just for testing, replace with any size
    println!("{:?}", iter.size_hint());
    // This iterator could return Err forever, so be careful: collect it into a Result
    let chunks = iter.collect::<Result<Vec<_>, _>>()?;
    println!("{:?}, {:?}", chunks.len(), chunks.capacity());
    Ok(())
}
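As the comment in main notes, the iterator may keep yielding Err if the underlying reader keeps failing. One way to guard against that is to end the iteration after the first error. The following is only a sketch of that idea, built on `std::iter::from_fn` rather than the `Chunks` struct above; the name `fused_chunks` is hypothetical.

```rust
use std::io::{self, Read};

/// Yields chunks of at most `size` bytes and ends for good after the first
/// error or at end of input. `fused_chunks` is a hypothetical helper name.
fn fused_chunks<R: Read>(
    mut reader: R,
    size: usize,
) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    let mut done = false;
    std::iter::from_fn(move || {
        if done {
            return None;
        }
        let mut chunk = Vec::with_capacity(size);
        match reader.by_ref().take(size as u64).read_to_end(&mut chunk) {
            Ok(0) => {
                done = true; // end of input
                None
            }
            Ok(_) => Some(Ok(chunk)),
            Err(e) => {
                done = true; // report the error once, then stop iterating
                Some(Err(e))
            }
        }
    })
}
```

Collecting into a `Result<Vec<_>, _>` still works as in main, but a plain `for` loop over this iterator also terminates when the reader fails.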