Reading text files line by line, with exact offset/position reporting

My requirement is simple: read a huge text file (more than a million lines; for this example assume it's a CSV of some sort) and keep a reference to the beginning of each line for faster lookup later (read a line, starting at X).

I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately that doesn't work as I intended:

Given a file containing the following

Foo
Bar
Baz
Bla
Fasel

and this very simple code

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = sr.BaseStream.Position;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos = sr.BaseStream.Position;
  }
}

the output is:

000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel

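I assume the reader is trying to be helpful/efficient and reads the file in (big) chunks whenever it needs new data, so BaseStream.Position reflects how much has been buffered rather than how much ReadLine has handed back. For my purposes this is bad.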
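The buffering effect can be reproduced without a file. The sketch below is only an illustration: the class and method names are made up, and it assumes a MemoryStream behaves like the file for this purpose.

```csharp
using System;
using System.IO;
using System.Text;

static class BufferingDemo
{
    // Returns BaseStream.Position after reading only the first line.
    public static long PositionAfterFirstLine(byte[] data)
    {
        using (var sr = new StreamReader(new MemoryStream(data), Encoding.ASCII))
        {
            sr.ReadLine();                 // hands back just "Foo"...
            return sr.BaseStream.Position; // ...but the reader has already pulled a whole chunk
        }
    }

    static void Main()
    {
        // Same content as the sample file: four CRLF-terminated lines plus "Fasel"
        var data = Encoding.ASCII.GetBytes("Foo\r\nBar\r\nBaz\r\nBla\r\nFasel");
        Console.WriteLine("{0} of {1} bytes", PositionAfterFirstLine(data), data.Length);
    }
}
```

With this 25-byte input the position is already 25 after the first ReadLine, which matches the 025 shown in the output above.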

The question, finally: is there any way to get the (byte or char) offset while reading a file line by line, without using a basic Stream and handling \r, \n, \r\n, string encoding etc. manually? Not a big deal, really, I just don't like to build things that might exist already.

2 Answers

This is a really tough issue. After a long and exhausting search through different solutions on the internet (including the solutions from this thread, thank you!) I had to reinvent the wheel and write my own.

I had the following requirements:

  • Performance - reading must be very fast, so reading one char at a time or using reflection is not acceptable; buffering is required
  • Streaming - the file can be huge, so reading it into memory entirely is not acceptable
  • Tailing - file tailing should be available
  • Long lines - lines can be very long, so the buffer size can't be fixed
  • Stable - a single-byte error was immediately visible during usage. Unfortunately for me, several implementations I found had stability problems

    using System;
    using System.IO;
    using System.Text;

    public class OffsetStreamReader
    {
        private const int InitialBufferSize = 4096;
        private readonly char _bom;
        private readonly byte _end;
        private readonly Encoding _encoding;
        private readonly Stream _stream;
        private readonly bool _tail;

        private byte[] _buffer;
        private int _processedInBuffer;    // index of the first unprocessed byte
        private int _informationInBuffer;  // number of valid bytes in the buffer

        public OffsetStreamReader(Stream stream, bool tail)
        {
            // "Everything processed, nothing stored" marks an empty buffer
            _buffer = new byte[InitialBufferSize];
            _processedInBuffer = InitialBufferSize;

            if (stream == null || !stream.CanRead)
                throw new ArgumentException("Stream must be readable", "stream");

            _stream = stream;
            _tail = tail;
            _encoding = Encoding.UTF8;

            _bom = '\uFEFF';
            _end = _encoding.GetBytes(new [] {'\n'})[0];
        }

        // Byte offset of the start of the next line to be read
        public long Offset { get; private set; }

        public string ReadLine()
        {
            // Underlying stream closed
            if (!_stream.CanRead)
                return null;

            // EOF
            if (_processedInBuffer == _informationInBuffer)
            {
                if (_tail)
                {
                    // Mark the buffer as empty and poll the stream for new data
                    _processedInBuffer = _buffer.Length;
                    _informationInBuffer = 0;
                    ReadBuffer();
                }

                return null;
            }

            var lineEnd = Search(_buffer, _end, _processedInBuffer);
            var haveEnd = true;

            // File ended but no finalizing newline character. Assumes the stream
            // returns fewer bytes than requested only at the end of the file.
            if (!lineEnd.HasValue && _processedInBuffer < _informationInBuffer
                && _informationInBuffer < _buffer.Length)
            {
                if (_tail)
                {
                    // The line may still be growing: poll for more data
                    ReadBuffer();
                    return null;
                }

                lineEnd = _informationInBuffer;
                haveEnd = false;
            }

            // No end in current buffer
            if (!lineEnd.HasValue)
            {
                ReadBuffer();
                if (_informationInBuffer != 0)
                    return ReadLine();

                return null;
            }

            var arr = new byte[lineEnd.Value - _processedInBuffer];
            Array.Copy(_buffer, _processedInBuffer, arr, 0, arr.Length);

            Offset = Offset + lineEnd.Value - _processedInBuffer + (haveEnd ? 1 : 0);
            _processedInBuffer = lineEnd.Value + (haveEnd ? 1 : 0);

            return _encoding.GetString(arr).TrimStart(_bom).TrimEnd('\r', '\n');
        }

        private void ReadBuffer()
        {
            // Unprocessed bytes still in the buffer (none if it is marked empty)
            var notProcessedPartLength = Math.Max(0, _informationInBuffer - _processedInBuffer);

            // Extend buffer to be able to fit whole line to the buffer
            // Was     [NOT_PROCESSED]
            // Become  [NOT_PROCESSED        ]
            if (notProcessedPartLength == _buffer.Length)
            {
                var extendedBuffer = new byte[_buffer.Length + _buffer.Length/2];
                Array.Copy(_buffer, extendedBuffer, _buffer.Length);
                _buffer = extendedBuffer;
            }

            // Copy not processed information to the beginning
            // Was    [PROCESSED NOT_PROCESSED]
            // Become [NOT_PROCESSED          ]
            Array.Copy(_buffer, _processedInBuffer, _buffer, 0, notProcessedPartLength);

            // Read more information to the empty part of buffer
            // Was    [ NOT_PROCESSED                   ]
            // Become [ NOT_PROCESSED NEW_NOT_PROCESSED ]
            _informationInBuffer = notProcessedPartLength + _stream.Read(_buffer, notProcessedPartLength, _buffer.Length - notProcessedPartLength);

            _processedInBuffer = 0;
        }

        private int? Search(byte[] buffer, byte byteToSearch, int bufferOffset)
        {
            // Scan only valid bytes. The last slot of a full buffer is skipped on
            // purpose: that line is picked up after the next refill, so a fully
            // consumed buffer always means end of data.
            var limit = Math.Min(_informationInBuffer, buffer.Length - 1);
            for (int i = bufferOffset; i < limit; i++)
            {
                if (buffer[i] == byteToSearch)
                    return i;
            }

            return null;
        }
    }

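Once byte offsets have been recorded, the lookup the question asks for ("read a line, starting at X") comes down to a seek followed by one ReadLine. The following is a minimal sketch, not part of the class above: OffsetLookup and ReadLineAt are hypothetical names, and the sample assumes UTF-8 data with offsets that point at the first byte of a line.

```csharp
using System;
using System.IO;
using System.Text;

static class OffsetLookup
{
    // Reads one line starting at a known byte offset (hypothetical helper).
    public static string ReadLineAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);

        // A fresh reader per lookup, so no previously buffered data is reused.
        // leaveOpen: true keeps the underlying stream usable for later lookups.
        using (var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, true))
        {
            return reader.ReadLine();
        }
    }

    static void Main()
    {
        var data = Encoding.UTF8.GetBytes("Foo\r\nBar\r\nBaz\r\nBla\r\nFasel");
        using (var stream = new MemoryStream(data))
        {
            // In this sample "Bar" starts at byte 5 and "Baz" at byte 10
            Console.WriteLine(ReadLineAt(stream, 10)); // Baz
            Console.WriteLine(ReadLineAt(stream, 5));  // Bar
        }
    }
}
```

Creating a new StreamReader for each lookup keeps the sketch simple; for many lookups in a row it may be worth reusing one reader and calling DiscardBufferedData() after each seek instead.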
You could create a TextReader wrapper that tracks the current position in the base TextReader:

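public class TrackingTextReader : TextReader
{
    private TextReader _baseReader;
    private int _position;

    public TrackingTextReader(TextReader baseReader)
    {
        _baseReader = baseReader;
    }

    public override int Read()
    {
        // Count only characters that were actually read, not the -1 at the end
        int c = _baseReader.Read();
        if (c >= 0)
            _position++;
        return c;
    }

    public override int Peek()
    {
        return _baseReader.Peek();
    }

    // Number of characters (not bytes) consumed so far
    public int Position
    {
        get { return _position; }
    }
}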

You could then use it as follows:

string text = @"Foo
Bar
Baz
Bla
Fasel";

using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
    string line;
    while ((line = trackingReader.ReadLine()) != null)
    {
        Console.WriteLine("{0:d3} {1}", trackingReader.Position, line);
    }
}
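Note that a wrapper like this counts characters, while seeking in a file needs bytes; the two only agree for single-byte text. A small sketch of the difference, assuming UTF-8 and using a hypothetical helper name (ByteOffset):

```csharp
using System;
using System.Text;

static class CharVsByteOffset
{
    // Converts a character position within text into a UTF-8 byte offset.
    public static int ByteOffset(string text, int charPosition)
    {
        return Encoding.UTF8.GetByteCount(text.Substring(0, charPosition));
    }

    static void Main()
    {
        // \u00EB is a single character that takes two bytes in UTF-8
        string text = "Zo\u00EB\nBar";

        // "Bar" starts at character 4 but at byte 5
        Console.WriteLine(ByteOffset(text, 4)); // 5
    }
}
```

For a huge file the same idea could be applied per line, by adding up the byte count of each line plus its terminator, rather than holding the whole text in memory. A byte order mark at the start of the file would also have to be added to the total.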
