
How to create a lazy-evaluated range from a file?

Tags:

file-io

d

The File I/O API in Phobos is relatively easy to use, but right now I feel like it's not very well integrated with D's range interface.

I could create a range delimiting the full contents by reading the entire file into an array:

import std.file;
auto mydata = cast(ubyte[]) read("filename");
processData(mydata); // takes a range of ubytes

But this eager evaluation of the data may be undesirable if I only want to retrieve a file's header, for example. read's upTo parameter doesn't solve the issue if the file format has a variable-length header, or any other element we wish to retrieve: it could even be in the middle of the file, and read forces me to read everything up to that point.
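To illustrate the limitation, here is a hedged sketch of read's upTo parameter (the 64-byte cap and the sample.bin name are arbitrary, for illustration only):

```d
import std.file : read, write;

// Create a sample file, then read only its beginning.
// upTo caps how many bytes are read, but only counts from the
// start of the file -- it cannot skip to a later offset.
write("sample.bin", new ubyte[](128));
auto prefix = cast(ubyte[]) read("sample.bin", 64); // at most 64 bytes
```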

But indeed, there are alternatives. readf, readln, byLine and, most particularly, byChunk let me retrieve pieces of data until I reach the end of the file, or until I decide to stop reading it.

import std.stdio;
auto file = File("filename");
auto chunkRange = file.byChunk(1000); // a range of ubyte[]s
processData(chunkRange); // oops! not expecting chunks!

But now I have introduced the complexity of dealing with fixed size chunks of data, rather than a continuous range of bytes.

So how can I create a simple input range of bytes from a file that is lazy evaluated, either by characters or by small chunks (to reduce the number of reads)? Can the range in the second example be seamlessly encapsulated in a way that the data can be processed like in the first example?
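To make the goal concrete, here is a hedged sketch of what such an encapsulation could look like: an input range of ubyte that refills itself from byChunk on demand (ByteRange is an illustrative name, not a Phobos type):

```d
import std.stdio;

// Illustrative sketch: an input range of ubyte wrapping File.byChunk,
// pulling the next chunk only when the current one is exhausted.
struct ByteRange
{
    typeof(File.init.byChunk(1)) chunks;
    ubyte[] current;

    this(File f, size_t chunkSize)
    {
        chunks = f.byChunk(chunkSize);
        if (!chunks.empty) current = chunks.front;
    }

    @property bool empty() const { return current.length == 0; }
    @property ubyte front() const { return current[0]; }

    void popFront()
    {
        current = current[1 .. $];
        // Current chunk consumed: fetch the next one lazily.
        while (current.length == 0 && !chunks.empty)
        {
            chunks.popFront();
            if (!chunks.empty) current = chunks.front;
        }
    }
}
```

In practice there is no need to hand-roll this: Phobos can express the same flattening through std.algorithm, as the accepted answer notes.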

asked Jan 21 '15 by E_net4


1 Answer

You can use std.algorithm.joiner:

import std.stdio;
import std.algorithm : joiner;

auto r = File("test.txt").byChunk(4096).joiner();

Note that byChunk reuses the same buffer for each chunk, so you may need to add .map!(chunk => chunk.idup) to lazily copy the chunks to the heap.
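Putting the pieces together, a hedged complete sketch (test.txt and processData are placeholders carried over from the question):

```d
import std.stdio;
import std.algorithm : joiner, map;

// Lazily flatten the chunk range into a continuous range of bytes,
// duplicating each chunk so previously yielded bytes remain valid
// despite byChunk reusing its internal buffer.
auto bytes = File("test.txt")
    .byChunk(4096)
    .map!(chunk => chunk.idup)
    .joiner;

processData(bytes); // now a continuous, lazily-read range of bytes
```

Note that with .idup the element type becomes immutable(ubyte) rather than ubyte, which is usually fine for read-only processing.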

answered Jan 01 '23 by Vladimir Panteleev