
ReadLine() vs Read() to Get CR and LF Efficiently?

I am working on a C# program to determine the line length of each row in multiple large text files with 100,000+ rows before importing them using an SSIS package. I will also be checking other values on each line to verify they are correct before importing them into my database using SSIS.

For example, I am expecting a line length of 3000 characters and then a CR at 3001 and LF at 3002, so overall a total of 3002 characters.

When using ReadLine(), a CR or LF is treated as the end of the line, so I can't check the CR or LF characters themselves. I had been just checking that the length of the line was 3000 to determine whether it was correct. I have just encountered an issue where a file has a LF at position 3001 but is missing the CR. So ReadLine() says the line is 3000 chars, which is correct, but it will fail in my SSIS package because it is missing a CR.

I have verified that Read() will read each char one at a time, so I can determine whether each line has a CR and LF, but this seems rather unproductive, and since some of the files I will encounter have upwards of 5,000,000 rows it seems very inefficient. I would also then need to add each char to a string, or use ReadBlock() and convert a char array into a string, so that I can check other values in the line.

Does anyone have any ideas on an efficient way to check each line for CR and LF, and other values on that line, without wasting resources and while finishing in a relatively timely manner?

buzzzzjay asked Sep 01 '11 21:09


2 Answers

"I have verified that Read() will read each char 1 at a time and I can determine if each line has a CR and LF but this seems rather unproductive"

Think about this. Do you think ReadLine() has a magic wand and does not have to read each char?

Just create your own ReadMyLine(). Something has to read the chars; it doesn't matter whether that's your code or the library. I/O will be buffered by the Stream and Windows.
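A minimal sketch of what such a ReadMyLine() could look like, assuming LF always ends a line and that the terminator is kept in the returned string so the caller can inspect it. The class name LineChecker and the helpers IsValid and FindBadLines are hypothetical names for illustration, not part of the framework:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineChecker
{
    // Hypothetical helper: like ReadLine(), but splits only on LF and keeps
    // the terminator in the returned string. Returns null at end of input.
    public static string ReadMyLine(TextReader reader)
    {
        var sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) >= 0)
        {
            sb.Append((char)c);
            if (c == '\n') break;
        }
        return sb.Length == 0 ? null : sb.ToString();
    }

    // True when the line is exactly dataLength characters followed by CR LF.
    public static bool IsValid(string line, int dataLength)
    {
        return line.Length == dataLength + 2
            && line.EndsWith("\r\n", StringComparison.Ordinal);
    }

    // Returns the 1-based numbers of all lines that fail IsValid.
    public static List<int> FindBadLines(TextReader reader, int dataLength)
    {
        var bad = new List<int>();
        int lineNumber = 0;
        string line;
        while ((line = ReadMyLine(reader)) != null)
        {
            lineNumber++;
            if (!IsValid(line, dataLength)) bad.Add(lineNumber);
        }
        return bad;
    }
}
```

With a StreamReader opened over one of the files, FindBadLines(reader, 3000) would report the line whose CR is missing, and the data portion of each line is still available as a string for the other checks. The per-character Read() calls are served from the reader's internal buffer, so they do not each hit the disk.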

Henk Holterman answered Sep 25 '22 11:09


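Can you use the overload of StreamReader.Read (or TextReader.Read) that accepts three parameters: a char[] buffer (in your case a 3002-character array), a starting index, and the number of characters to read (3002)? Note that the index is the position in the buffer at which writing begins, not a position in the file, so when you reuse the same buffer in a loop it stays at 0 and the reader advances through the file on its own. Read can return fewer characters than requested, so ReadBlock, which keeps reading until the count is reached or the input ends, is the safer choice. From the filled buffer, you can check the last two characters for CR and LF.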
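A rough sketch of this fixed-width approach, assuming every record is supposed to be exactly the data length plus CR LF. It uses ReadBlock rather than Read so that a short read cannot be mistaken for a short record. The names FixedWidthChecker and FirstBadRecord are hypothetical:

```csharp
using System;
using System.IO;

public static class FixedWidthChecker
{
    // Hypothetical helper: returns the 1-based number of the first record that
    // is not exactly dataLength characters followed by CR LF, or 0 if all
    // records are well formed.
    public static int FirstBadRecord(TextReader reader, int dataLength)
    {
        int recordLength = dataLength + 2;
        var buffer = new char[recordLength];
        int recordNumber = 0;
        int read;
        // The index argument (0) is an offset into buffer, not into the file;
        // the reader keeps track of its own position in the input.
        while ((read = reader.ReadBlock(buffer, 0, recordLength)) > 0)
        {
            recordNumber++;
            bool ok = read == recordLength
                && buffer[dataLength] == '\r'
                && buffer[dataLength + 1] == '\n';
            if (!ok) return recordNumber;
            // new string(buffer, 0, dataLength) yields the data part
            // if other values on the line need to be checked.
        }
        return 0;
    }
}
```

The sketch stops at the first bad record on purpose: once a record is a character short or long, every later fixed-size read is misaligned, so subsequent results would be meaningless. It also does not look for a stray CR or LF inside the data part. If every bad line has to be reported, a line-based reader that splits on LF is the better fit.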

Arun answered Sep 23 '22 11:09