
Bytes consumed by StreamReader

Tags: c#, .net

Is there a way to know how many bytes of a stream have been used by StreamReader?

I have a project where we need to read a file that has a text header followed by the start of the binary data. My initial attempt to read this file was something like this:

private long _dataOffset; // Stream.Position is a long

void ReadHeader(string path)
{
    using (FileStream stream = File.OpenRead(path))
    {
        StreamReader textReader = new StreamReader(stream);

        string line; // declared outside the loop so the while condition can see it
        do
        {
            line = textReader.ReadLine();
            handleHeaderLine(line);
        } while (line != null && line != "DATA"); // Yes, they used "DATA" to mark the end of the header

        // This is where the problem shows up: see below
        _dataOffset = stream.Position;
    }
}

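private byte[] ReadDataFrame(string path, int frameNum)
{
    using (FileStream stream = File.OpenRead(path))
    {
        stream.Seek(_dataOffset + (long)frameNum * cbFrame, SeekOrigin.Begin);

        byte[] data = new byte[cbFrame];
        int total = 0;
        while (total < cbFrame)
        {
            // Stream.Read may return fewer bytes than requested
            int read = stream.Read(data, total, cbFrame - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }

        return data;
    }
}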

The problem is that when I set _dataOffset to stream.Position, I get the position that the StreamReader has read to, not the end of the header. As soon as I thought about it, this made sense, but I still need to know where the header ends, and I'm not sure there's a way to do that while still taking advantage of StreamReader.
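The read-ahead is easy to reproduce with an in-memory stream. The following is a minimal sketch, not the asker's code: the class name ReadAheadDemo, the sample header, and the 64-byte payload are all made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Returns the true header length and what stream.Position reports
    // after the header lines have been read through a StreamReader.
    public static (long HeaderBytes, long StreamPosition) Measure()
    {
        byte[] header = Encoding.ASCII.GetBytes("WIDTH 4\nDATA\n"); // hypothetical header
        byte[] payload = new byte[64];                              // stand-in for binary frames
        byte[] file = new byte[header.Length + payload.Length];
        header.CopyTo(file, 0);
        payload.CopyTo(file, header.Length);

        using (var stream = new MemoryStream(file))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            string line;
            do
            {
                line = reader.ReadLine();
            } while (line != null && line != "DATA");

            // The reader filled its internal buffer from the stream, so the
            // stream has advanced past the bytes that ReadLine handed back.
            return (header.Length, stream.Position);
        }
    }
}
```

With input this small the reader will normally buffer the whole stream on the first ReadLine call, so StreamPosition comes back larger than HeaderBytes.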

Jon Norton asked Apr 10 '09 11:04


3 Answers

You can find out how many bytes the StreamReader has actually returned (as opposed to read from the stream) in a number of ways, none of them too straightforward I'm afraid.

  1. Keep track of all the text you have read so far (including the line terminators that ReadLine strips), pass it to textReader.CurrentEncoding.GetByteCount(...), and then seek to that position in the stream.
  2. Use some reflection hackery to retrieve the private field of the StreamReader object that corresponds to the current byte position within its internal buffer (this differs from the stream's position: it is usually behind it and can never be ahead of it). Judging by .NET Reflector, this field seems to be named bytePos. Private field names are an implementation detail and can change between runtime versions.
  3. Don't bother using a StreamReader at all; instead, implement your own ReadLine function on top of the Stream, or even a BinaryReader (BinaryReader is guaranteed never to read further ahead than what you request). This custom function must read from the stream char by char, so you'd actually have to use the low-level Decoder object (unless the encoding is ASCII/ANSI, in which case things are a bit simpler because every character is a single byte).

Option 1 is going to be the least efficient, I would imagine (since you're effectively re-encoding text you just decoded), and option 3 the hardest to implement, though perhaps the most elegant. I'd probably recommend against the ugly reflection hack (option 2), even though it looks tempting, being the most direct solution and only taking a couple of lines. (To be quite honest, the StreamReader class really ought to expose this value via a public property, but alas it does not.) So in the end it's up to you, but either option 1 or option 3 should do the job nicely enough...
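For the simple single-byte case, option 3 might look something like the sketch below. It assumes the header is plain ASCII with "\n" or "\r\n" line endings; HeaderReader, ReadAsciiLine and FindDataOffset are hypothetical names, not framework APIs. A multi-byte encoding would need a Decoder instead of the direct cast.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderReader
{
    // Reads one line byte by byte, so the stream never advances past the
    // line terminator. Returns null at end of stream.
    // Assumption: ASCII text, lines end in "\n" or "\r\n". Every '\r' is
    // dropped, which is a simplification; a lone '\r' is not treated as
    // a line ending.
    public static string ReadAsciiLine(Stream stream)
    {
        var sb = new StringBuilder();
        bool readAny = false;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            readAny = true;
            if (b == '\n') break;
            if (b != '\r') sb.Append((char)b);
        }
        return readAny ? sb.ToString() : null;
    }

    // Returns the offset of the first byte after the "DATA" line.
    public static long FindDataOffset(Stream stream)
    {
        string line;
        do
        {
            line = ReadAsciiLine(stream);
        } while (line != null && line != "DATA");

        if (line == null) throw new InvalidDataException("No DATA marker found.");
        return stream.Position; // accurate: nothing was read ahead
    }
}
```

Because nothing sits between the caller and the stream, stream.Position can be trusted once the "DATA" line has been consumed. FileStream buffers internally, so the per-byte ReadByte calls should not each hit the disk.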

Hope that helps.

Noldorin answered Nov 16 '22 09:11


So the data is UTF-8 (the default encoding for StreamReader). This is a multibyte encoding, so the number of characters read is not the number of bytes consumed, and IndexOf would be inadvisable. You could call:

Encoding.UTF8.GetByteCount(string)

on the text you have read so far, adding 1 or 2 bytes per line for the line ending that ReadLine strips.
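A rough sketch of that byte-counting approach follows. It assumes every header line uses the same terminator, that the file has no byte-order mark, and that the header contains only valid sequences for the chosen encoding (otherwise re-encoding will not give back the original byte count). HeaderLength and Measure are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderLength
{
    // Sums the encoded size of each header line plus its terminator.
    // Assumption: the caller knows the terminator ("\n" or "\r\n") and
    // the file does not start with a byte-order mark.
    public static long Measure(Stream stream, Encoding encoding, string newline)
    {
        long bytes = 0;
        int newlineBytes = encoding.GetByteCount(newline);

        // leaveOpen: true, so the caller can keep using the stream.
        using (var reader = new StreamReader(stream, encoding, false, 1024, true))
        {
            string line;
            do
            {
                line = reader.ReadLine();
                if (line == null) throw new InvalidDataException("No DATA marker found.");
                bytes += encoding.GetByteCount(line) + newlineBytes;
            } while (line != "DATA");
        }
        return bytes;
    }
}
```

The stream itself will have been advanced past the header by the reader's buffering, so the caller would still need to Seek to the returned offset before reading binary data.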

spender answered Nov 16 '22 09:11


If you need to count bytes, I'd go with the BinaryReader. You can take the results and cast them as needed, and I find its idea of its current position to be more reliable (since it reads raw bytes, it's immune to character-set problems).
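As a sketch of how that could look, the following scans the header with BinaryReader.ReadByte and then reads one frame with ReadBytes. It assumes an ASCII header with "\n" or "\r\n" line endings; FrameFile, ReadFrame and the frameSize parameter are hypothetical names standing in for the asker's cbFrame.

```csharp
using System;
using System.IO;
using System.Text;

public static class FrameFile
{
    // Skips the text header, then returns frame number frameNum.
    // Assumption: ASCII header terminated by a "DATA" line.
    public static byte[] ReadFrame(Stream stream, int frameNum, int frameSize)
    {
        // leaveOpen: true, so disposing the reader does not close the stream.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            var line = new StringBuilder();
            while (true)
            {
                // ReadByte throws EndOfStreamException if the marker is missing.
                byte b = reader.ReadByte();
                if (b == '\n')
                {
                    if (line.ToString() == "DATA") break;
                    line.Clear();
                }
                else if (b != '\r')
                {
                    line.Append((char)b);
                }
            }

            // Only whole bytes were requested, so this is the end of the header.
            long dataOffset = reader.BaseStream.Position;
            reader.BaseStream.Seek(dataOffset + (long)frameNum * frameSize, SeekOrigin.Begin);

            byte[] frame = reader.ReadBytes(frameSize);
            if (frame.Length != frameSize) throw new EndOfStreamException("Truncated frame.");
            return frame;
        }
    }
}
```

In real use the data offset would probably be computed once and cached, as in the question, rather than rescanning the header for every frame.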

GWLlosa answered Nov 16 '22 09:11