I am trying to count the number of rows in a text file (to compare to a control file) before performing a complex SSIS insert package.
Currently I am using a StreamReader and it is breaking a line with a {LF} embedded into a new line, whereas SSIS is using {CR}{LF} (correctly), so the counts are not tallying up.
Does anyone know an alternate method of doing this where I can count the number of lines in the file based on {CR}{LF} Line breaks only?
Thanks in advance
Iterate through the file and count number of CRLFs.
Pretty straightforward implementation:
public int CountLines(Stream stream, Encoding encoding)
{
int cur, prev = -1, lines = 0;
using (var sr = new StreamReader(stream, encoding, false, 4096, true))
{
while ((cur = sr.Read()) != -1)
{
if (prev == '\r' && cur == '\n')
lines++;
prev = cur;
}
}
//Empty stream will result in 0 lines, any content would result in at least one line
if (prev != -1)
lines++;
return lines;
}
Example usage:
using(var s = File.OpenRead(@"<your_file_path>"))
Console.WriteLine("Found {0} lines", CountLines(s, Encoding.Default));
Actually it's a find substring in string task. More generic algorithms can be used.
Here is an extension-method that reads the lines with line-seperator {Cr}{Lf} only, and not {LF}. You could do a count on it.
var count= new StreamReader(@"D:\Test.txt").ReadLinesCrLf().Count()
But could also use it for reading files, sometimes usefull since the normal StreamReader.ReadLine breaks on both {Cr}{Lf} and {LF}. Can be used on any TextReader and works streaming (file size is not an issue).
public static IEnumerable<string> ReadLinesCrLf(this TextReader reader, int bufferSize = 4096)
{
StringBuilder lineBuffer = null;
//read buffer
char[] buffer = new char[bufferSize];
int charsRead;
var previousIsLf = false;
while ((charsRead = reader.Read(buffer, 0, bufferSize)) != 0)
{
int bufferIndex = 0;
int writeIdx = 0;
do
{
var currentChar = buffer[bufferIndex];
switch (currentChar)
{
case '\n':
if (previousIsLf)
{
if (lineBuffer == null)
{
//return from current buffer writeIdx could be higher than 0 when multiple rows are in the buffer
yield return new string(buffer, writeIdx, bufferIndex - writeIdx - 1);
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
}
else
{
Debug.Assert(writeIdx == 0, $"Write index should be 0, when linebuffer != null");
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
Debug.Assert(lineBuffer.ToString().Last() == '\r',$"Last character in linebuffer should be a carriage return now");
lineBuffer.Length--;
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
yield return lineBuffer.ToString();
lineBuffer = null;
}
}
previousIsLf = false;
break;
case '\r':
previousIsLf = true;
break;
default:
previousIsLf = false;
break;
}
bufferIndex++;
} while (bufferIndex < charsRead);
if (writeIdx < bufferIndex)
{
if (lineBuffer == null) lineBuffer = new StringBuilder();
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
}
}
//return last row
if (lineBuffer != null && lineBuffer.Length > 0) yield return lineBuffer.ToString();
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With