
Byte InputRange from file

Tags: file-io, d, phobos

How can I easily construct a raw byte-by-byte InputRange, ForwardRange, or RandomAccessRange from a file?

Tamas asked May 16 '15 10:05


2 Answers

file.byChunk(4096).joiner

This reads a file in 4096-byte chunks and lazily joins the chunks together into a single ubyte input range.

joiner is from std.algorithm, so you'll have to import it first.
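
To make the one-liner concrete, here is a minimal sketch with the imports spelled out. It assumes `file` is a `std.stdio.File`; the `byByte` helper name and the scratch file under `tempDir` are hypothetical and only there so the example runs on its own.

```d
import std.algorithm : equal, joiner;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Lazily yields every byte of `file`, reading 4096 bytes at a time.
/// (Hypothetical helper name, not part of Phobos.)
auto byByte(File file)
{
    return file.byChunk(4096).joiner;
}

void main()
{
    // Hypothetical scratch file so the sketch is self-contained.
    auto path = buildPath(tempDir, "byte_range_demo.bin");
    ubyte[] payload = [0x00, 0x01, 0xFE, 0xFF];
    write(path, payload);
    scope (exit) remove(path);

    // The joined range yields the file's bytes one at a time.
    auto bytes = byByte(File(path, "rb"));
    assert(equal(bytes, payload));
}
```

The result is an input range only: byChunk reuses its buffer between chunks, so consume the bytes sequentially rather than holding on to earlier chunks.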

Colonel Thirty Two answered Nov 11 '22 00:11


The easiest way to make a raw byte range from a file is to just read it all right into memory:

import std.file;
auto data = cast(ubyte[]) read("filename");
// data is a full-featured random access range of the contents

If the file is too large for that to be reasonable, you could try a memory-mapped file (std.mmfile, http://dlang.org/phobos/std_mmfile.html) and use its opSlice to get an array over the contents. Since it is an array, you get full range features, and since it is memory-mapped by the operating system, the data is read lazily as you touch it.
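
A rough sketch of the memory-mapped route, assuming read-only access is enough. The `mappedBytes` helper name and the scratch file are hypothetical; the slice is only valid while the MmFile object is alive.

```d
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;

/// Views the whole mapping as bytes. (Hypothetical helper name.)
/// The returned slice must not outlive `mmf`.
ubyte[] mappedBytes(MmFile mmf)
{
    return cast(ubyte[]) mmf[]; // opSlice() returns void[]
}

void main()
{
    // Hypothetical scratch file so the sketch is self-contained.
    auto path = buildPath(tempDir, "mmfile_demo.bin");
    ubyte[] payload = [10, 20, 30, 40];
    write(path, payload);
    scope (exit) remove(path);

    auto mmf = new MmFile(path);   // read-only mapping by default
    scope (exit) destroy(mmf);     // unmap before the file is removed

    auto data = mappedBytes(mmf);

    // Plain array semantics: length, indexing, and slicing all work.
    assert(data.length == 4);
    assert(data[2] == 30);
    assert(data[1 .. 3] == payload[1 .. 3]);
}
```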

For a simple InputRange, there's LockingTextReader (undocumented) in Phobos, or you could construct one yourself over byChunk or even fgetc, the C function. fgetc would be the easiest to write:

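import core.stdc.stdio : EOF, FILE, fgetc;

struct FileByByte {
    private FILE* fp;
    // Keep fgetc's int result so EOF (end of file or read error) is
    // distinguishable from the byte 0xFF; feof alone never fires on an error.
    private int current;
    this(FILE* fp) { this.fp = fp; popFront(); /* prime it */ }
    @property ubyte front() { return cast(ubyte) current; }
    @property bool empty() { return current == EOF; }
    void popFront() { current = fgetc(fp); }
}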

I haven't actually tested that, but I'm pretty sure it'd work. (BTW, opening and closing the file are kept separate from this because ranges are supposed to be just views into data, not managed containers. You wouldn't want the file closed just because you passed this range into a function.)
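
As an illustration of that split, here is a sketch where the caller owns the FILE* and the range only views it. The struct is repeated so the example compiles on its own; this copy keeps fgetc's int result and tests it against EOF, which is my assumption about the safest way to detect the end. The scratch file name is hypothetical.

```d
import core.stdc.stdio : EOF, FILE, fclose, fgetc, fopen;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.range.primitives : isInputRange;
import std.string : toStringz;

struct FileByByte
{
    private FILE* fp;
    private int current; // last fgetc result; EOF once the stream ends

    this(FILE* fp) { this.fp = fp; popFront(); /* prime it */ }
    @property ubyte front() { return cast(ubyte) current; }
    @property bool empty() { return current == EOF; }
    void popFront() { current = fgetc(fp); }
}

// Input range only: no save, no indexing.
static assert(isInputRange!FileByByte);

void main()
{
    // Hypothetical scratch file so the sketch is self-contained.
    auto path = buildPath(tempDir, "fgetc_demo.bin");
    ubyte[] payload = [0x61, 0xFF, 0x00, 0x62];
    write(path, payload);
    scope (exit) remove(path);

    // The caller opens and closes the handle; the range never does.
    FILE* fp = fopen(path.toStringz, "rb");
    assert(fp !is null);
    scope (exit) fclose(fp);

    ubyte[] seen;
    foreach (b; FileByByte(fp))
        seen ~= b;

    // 0xFF and 0x00 come through as ordinary bytes.
    assert(seen == payload);
}
```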

This is neither a forward range nor a random-access range, though. Those are trickier to do on streams without a lot of buffering code, and I think it would be a mistake to try to write that: generally, ranges should be cheap and should not emulate features the underlying container doesn't natively support.

EDIT: The other answer has a non-buffering way! https://stackoverflow.com/a/30278933/1457000 That's awesome.

Adam D. Ruppe answered Nov 11 '22 01:11