 

c# - Read file with irregular newline characters

Tags:

c#

file

I am trying to read a text file in C# that is formatted like this:

this is a line\r\n
this is a line\r
\r\n
this is a line\r
\r\n
this is a line\r
\r\n
this is a line\r\n
this is a line\r
\r\n
etc...

I am reading each line from the file with StreamReader.ReadLine(), but that does not preserve the newline characters. I need to detect which newline sequence ends each line, because I am counting the number of bytes on each line. For example:

If the line ends with \r, it consists of (number of bytes in the line + 1) bytes; if it ends with \r\n, it consists of (number of bytes in the line + 2) bytes (depending on the encoding, of course).
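To see why ReadLine alone is not enough, here is a small sketch (the names ReadLineDemo and ReadAllLines are hypothetical). It uses StringReader instead of StreamReader so it runs without a file; as far as I know both follow the same rule of treating \r, \n and \r\n as line terminators and stripping them.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Returns every line produced by TextReader.ReadLine, in order.
    // StringReader is used so the demo needs no file on disk.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}

// Usage:
//   ReadLineDemo.ReadAllLines("this is a line\r\r\nthis is a line\r\n")
//   ReadLineDemo.ReadAllLines("this is a line\n\nthis is a line\n")
// Both return { "this is a line", "", "this is a line" }, so the original
// line endings (and therefore the byte counts) cannot be recovered.
```

Because "\r" followed by "\r\n" and two plain "\n" characters produce identical results, the terminators have to be captured while reading rather than reconstructed afterwards.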

EDIT:

I found a solution based on israel altar's answer (Jon Skeet suggested the same approach). I implemented an overridden version of ReadLine that includes the newline characters in the returned string. This is the code of the overridden method:

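    using System.IO;
    using System.Text;

    // Class name is illustrative; any StreamReader subclass works the same way.
    public class NewlinePreservingReader : StreamReader
    {
        public NewlinePreservingReader(string path) : base(path) { }

        // Returns the next line INCLUDING its terminator ("\r\n", "\r" or "\n"),
        // or null at the end of the stream.
        public override string ReadLine()
        {
            StringBuilder sb = new StringBuilder();
            while (true)
            {
                int ch = Read();
                if (ch == -1)
                {
                    break;
                }
                sb.Append((char)ch);
                if (ch == '\r')
                {
                    // "\r\n" is a single terminator; a "\r" followed by anything
                    // else (including another "\r") ends the line on its own.
                    if (Peek() == '\n')
                    {
                        sb.Append((char)Read());
                    }
                    break;
                }
                if (ch == '\n')
                {
                    // A lone "\n" also ends the line.
                    break;
                }
            }
            return sb.Length > 0 ? sb.ToString() : null;
        }
    }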
DrGrid asked Oct 25 '25 22:10


1 Answer

This is how TextReader.ReadLine is implemented in the .NET reference source (StreamReader overrides it with a buffered version, but the line-splitting rules are the same):

    // Reads a line. A line is defined as a sequence of characters followed by
    // a carriage return ('\r'), a line feed ('\n'), or a carriage return
    // immediately followed by a line feed. The resulting string does not
    // contain the terminating carriage return and/or line feed. The returned
    // value is null if the end of the input stream has been reached.
    //
    public virtual String ReadLine()
    {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int ch = Read();
            if (ch == -1) break;
            if (ch == '\r' || ch == '\n')
            {
                if (ch == '\r' && Peek() == '\n') Read();
                return sb.ToString();
            }
            sb.Append((char)ch);
        }
        if (sb.Length > 0) return sb.ToString();
        return null;
    }

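As you can see, you can extend the branch that handles '\r' and '\n' to record which terminator was found, for example:

    if (ch == '\r' || ch == '\n')
    {
        int terminatorLength = 1;          // lone '\r' or lone '\n'
        if (ch == '\r' && Peek() == '\n')
        {
            Read();
            terminatorLength = 2;          // "\r\n"
        }
        // Add terminatorLength (times the bytes per newline character in
        // your encoding) to the byte count of this line.
        return sb.ToString();
    }

or do whatever other manipulation you want.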
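A sketch that puts this suggestion together: a ReadLine-style helper that reports the terminator through an out parameter, plus a byte count computed with Encoding.GetByteCount. The names LineByteCounter, ReadLine and ByteCount are hypothetical, and the helper works on any TextReader rather than requiring a StreamReader subclass.

```csharp
using System.IO;
using System.Text;

public static class LineByteCounter
{
    // Reads one line like TextReader.ReadLine, but also reports the terminator
    // that ended it: "\r\n", "\r", "\n", or "" (end of input without a newline).
    // Returns null once the input is exhausted.
    public static string ReadLine(TextReader reader, out string terminator)
    {
        var sb = new StringBuilder();
        terminator = "";
        while (true)
        {
            int ch = reader.Read();
            if (ch == -1)
            {
                break;
            }
            if (ch == '\r')
            {
                // "\r\n" is one terminator; "\r" followed by anything else is a lone "\r".
                if (reader.Peek() == '\n')
                {
                    reader.Read();
                    terminator = "\r\n";
                }
                else
                {
                    terminator = "\r";
                }
                return sb.ToString();
            }
            if (ch == '\n')
            {
                terminator = "\n";
                return sb.ToString();
            }
            sb.Append((char)ch);
        }
        return sb.Length > 0 ? sb.ToString() : null;
    }

    // Bytes occupied by the line plus its terminator in the given encoding
    // (1 byte per newline character in UTF-8, 2 in UTF-16).
    public static int ByteCount(string line, string terminator, Encoding encoding)
    {
        return encoding.GetByteCount(line + terminator);
    }
}

// Usage, for the input "this is a line\r\r\n":
//   first call:  returns "this is a line", terminator "\r"
//   second call: returns "",               terminator "\r\n"
//   third call:  returns null
// ByteCount("this is a line", "\r", Encoding.UTF8) is 15 (14 + 1).
```

Note that GetByteCount does not include a byte order mark, so if the file starts with one, its bytes would have to be added to the first line separately.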

israel altar answered Oct 28 '25 12:10


