Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read large file and split by "\r\n"

Tags:

c#

.net

I have a large file >200MB. The file is an CSV-file from an external party, but sadly I cannot just read the file line by line, as \r\n is used to define a new line.

Currently I am reading in all the lines using this approach:

var file = File.ReadAllText(filePath, Encoding.Default);
var lines = Regex.Split(file, @"\r\n");

for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    ...
}

How can I optimize this? After calling ReadAllText on my 225MB file, the process is using more than 1GB RAM. Is it possible to use a streaming approach in my case, where I need to split the file using my \r\n pattern?

EDIT1: Your solutions using the File.ReadLines and a StreamReader will not work, as it sees each line in the file as one line. I need to split the file using my \r\n pattern. Reading the file using my code results in 758.371 lines (which is correct), whereas a normal line counts results in more than 1.5 million.

SOLUTION

public static IEnumerable<string> ReadLines(string path)
{
    const string delim = "\r\n";

    using (StreamReader sr = new StreamReader(path))
    {
        StringBuilder sb = new StringBuilder();

        while (!sr.EndOfStream)
        {
            for (int i = 0; i < delim.Length; i++)
            {
                Char c = (char)sr.Read();
                sb.Append(c);

                if (c != delim[i])
                    break;

                if (i == delim.Length - 1)
                {
                    sb.Remove(sb.Length - delim.Length, delim.Length);
                    yield return sb.ToString();
                    sb = new StringBuilder();
                    break;
                }
            }
        }

        if (sb.Length>0)
            yield return sb.ToString();
    }
}
like image 512
dhrm Avatar asked Dec 07 '25 09:12

dhrm


2 Answers

You can use File.ReadLines which returns IEnumerable<string> instead of loading whole file to memory.

foreach(var line in File.ReadLines(@filePath, Encoding.Default)
                        .Where(l => !String.IsNullOrEmpty(l)))
{
}
like image 73
L.B Avatar answered Dec 08 '25 22:12

L.B


using StreamReader it will be easy.

using (StreamReader sr = new StreamReader(path)) 
 {
      foreach(string line = GetLine(sr)) 
      {
           //
      }
 }


    IEnumerable<string> GetLine(StreamReader sr)
    {
        while (!sr.EndOfStream)
            yield return new string(GetLineChars(sr).ToArray());
    }

    IEnumerable<char> GetLineChars(StreamReader sr)
    {
        if (sr.EndOfStream)
            yield break;
        var c1 = sr.Read();
        if (c1 == '\\')
        {
            var c2 = sr.Read();
            if (c2 == 'r')
            {
                var c3 = sr.Read();
                if (c3 == '\\')
                {
                    var c4 = sr.Read();
                    if (c4 == 'n')
                    {
                        yield break;
                    }
                    else
                    {
                        yield return (char)c1;
                        yield return (char)c2;
                        yield return (char)c3;
                        yield return (char)c4;
                    }
                }
                else
                {
                    yield return (char)c1;
                    yield return (char)c2;
                    yield return (char)c3;
                }
            }
            else
            {
                yield return (char)c1;
                yield return (char)c2;
            }
        }
        else
            yield return (char)c1;
    }
like image 21
Rohit Avatar answered Dec 08 '25 21:12

Rohit



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!