I have a large file (>200 MB). It is a CSV file from an external party, but sadly I cannot just read it line by line, as only the sequence \r\n defines a new line.
Currently I am reading in all the lines using this approach:
var file = File.ReadAllText(filePath, Encoding.Default);
var lines = Regex.Split(file, @"\r\n");
for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    ...
}
How can I optimize this? After calling ReadAllText on my 225MB file, the process is using more than 1GB RAM. Is it possible to use a streaming approach in my case, where I need to split the file using my \r\n pattern?
EDIT1:
Your solutions using File.ReadLines and a StreamReader will not work, as they treat every line break in the file as the end of a line. I need to split the file using my \r\n pattern. Reading the file using my code results in 758,371 lines (which is correct), whereas a normal line count results in more than 1.5 million.
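To illustrate the difference described in the edit, here is a minimal sketch on a made-up sample string (the class name, method names and sample data are assumptions for illustration, not taken from the real file). It counts lines the way the built-in readers do, where "\n", "\r" and "\r\n" each end a line, and compares that with splitting on "\r\n" only.

```csharp
using System;
using System.IO;

public static class LineBreakDemo
{
    // Hypothetical sample: the first record contains a lone "\n" inside a field.
    public const string Sample = "1,first\npart\r\n2,second\r\n";

    // Counts lines as TextReader.ReadLine sees them:
    // "\n", "\r" and "\r\n" are all treated as line terminators.
    public static int CountBuiltInLines(string text)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        {
            while (reader.ReadLine() != null)
                count++;
        }
        return count;
    }

    // Counts records separated by "\r\n" only. RemoveEmptyEntries drops the
    // empty tail after the final delimiter (and any genuinely empty records).
    public static int CountCrLfRecords(string text)
    {
        return text.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        Console.WriteLine(CountBuiltInLines(Sample)); // 3
        Console.WriteLine(CountCrLfRecords(Sample));  // 2
    }
}
```

The lone "\n" inside the first record is what inflates the built-in count, which would match the gap between the two totals reported above.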
SOLUTION
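public static IEnumerable<string> ReadLines(string path)
{
    const string delim = "\r\n";
    using (StreamReader sr = new StreamReader(path))
    {
        StringBuilder sb = new StringBuilder();
        int next;
        // Read() returns -1 at the end of the stream, so no bogus char is ever appended.
        while ((next = sr.Read()) != -1)
        {
            sb.Append((char)next);
            int n = sb.Length;
            // A record ends only when the buffer ends with the full "\r\n" delimiter.
            // Checking the buffer tail also handles input such as "\r\r\n".
            if (n >= delim.Length && sb[n - 2] == delim[0] && sb[n - 1] == delim[1])
            {
                sb.Length -= delim.Length; // strip the delimiter from the record
                yield return sb.ToString();
                sb.Clear();
            }
        }
        // Return whatever follows the last delimiter, if anything.
        if (sb.Length > 0)
            yield return sb.ToString();
    }
}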
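As a variation on the approach above, here is a minimal sketch that reads from a TextReader, so the same logic can be tried on an in-memory string and, for a file, opened with an explicit encoding (the question uses Encoding.Default). The class name CrLfRecordReader and the method name ReadRecords are hypothetical, chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CrLfRecordReader
{
    // Hypothetical helper: yields records terminated by "\r\n" only.
    // Lone "\r" or "\n" characters stay inside the record.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        var sb = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (c == '\n' && sb.Length > 0 && sb[sb.Length - 1] == '\r')
            {
                sb.Length--;                // drop the '\r' that was already buffered
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }
        // Emit the text after the last delimiter, if any.
        if (sb.Length > 0)
            yield return sb.ToString();
    }

    // Opens the file with an explicit encoding and streams its records.
    public static IEnumerable<string> ReadRecords(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            foreach (string record in ReadRecords(reader))
                yield return record;
        }
    }

    public static void Main()
    {
        // Made-up input: a record with an embedded "\n", an empty record, and a tail.
        var sample = new StringReader("a\nb\r\n\r\nc");
        foreach (string record in ReadRecords(sample))
            Console.WriteLine(record.Replace("\n", "<LF>"));
    }
}
```

Because the records are yielded one at a time, only the current record is held in memory, rather than the whole file plus the split array.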
You can use File.ReadLines, which returns an IEnumerable<string> instead of loading the whole file into memory.
foreach (var line in File.ReadLines(filePath, Encoding.Default)
                         .Where(l => !String.IsNullOrEmpty(l)))
{
}
Using a StreamReader, it will be easy.
using (StreamReader sr = new StreamReader(path))
{
    foreach (string line in GetLine(sr))
    {
        //
    }
}
IEnumerable<string> GetLine(StreamReader sr)
{
    while (!sr.EndOfStream)
        yield return new string(GetLineChars(sr).ToArray()); // ToArray() needs System.Linq
}
IEnumerable<char> GetLineChars(StreamReader sr)
{
    int c;
    // Keep yielding characters until the record delimiter or the end of the stream.
    while ((c = sr.Read()) != -1)
    {
        // A carriage return immediately followed by a line feed ends the record; consume both.
        if (c == '\r' && sr.Peek() == '\n')
        {
            sr.Read();
            yield break;
        }
        yield return (char)c;
    }
}