I have a text file which has lines of data separated by newlines. What I'm trying to do is count the number of lines in the file, excluding the ones that are only a newline.
I'm trying to use a regular expression to look at each line as it is read, and if it starts with a newline character not include it in my line count, but I can't seem to get it to work. I've searched all over the place for how to do this with no results.
Here's the method I've written to try to do this:
public int LineCounter()
{
    StreamReader myRead = new StreamReader(@"C:\TestFiles\test.txt");
    int lineCount = 0;
    string line;
    while ((line = myRead.ReadLine()) != null)
    {
        string regexExpression = @"^\r?\n";
        RegexOptions myOptions = RegexOptions.Multiline;
        Match stringMatch = Regex.Match(line, regexExpression, myOptions);
        if (stringMatch.Success)
        {
        }
        else
        {
            lineCount++;
        }
    }
    return lineCount;
}
I've tried changing the RegexOptions between Singleline and Multiline, I've tried putting "\r|\n|\r\n" into my pattern match, and I've tried removing the ^ from the expression, but I can't seem to get it to work. No matter what I do, my lineCount always ends up being the total number of lines in the file, including the newlines.
I'm apparently overlooking something obvious, but I'm not yet familiar enough with the C# language to see what's wrong. Everything looks like it should work to me. Can someone please help me out?
"\n" matches a newline character.
If you want to indicate a line break when you construct your RegEx, use the sequence “\r\n”. Whether or not you will have line breaks in your expression depends on what you are trying to match. Line breaks can be useful “anchors” that define where some pattern occurs in relation to the beginning or end of a line.
Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to three digits, \xhh for a two-digit hex code, and \uhhhh for a four-digit Unicode code point. Some engines also accept \Uhhhhhhhh for an eight-digit Unicode code point.
By default in most regex engines, . doesn't match newline characters, so matching stops at the end of each logical line. If you want . to match everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, the re.DOTALL flag in Python, /s in PCRE, or RegexOptions.Singleline in .NET).
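In .NET, the dot-matches-all switch is RegexOptions.Singleline. Here is a minimal sketch of the difference; the class name DotAllDemo and the pattern "a.b" are made up purely for illustration:

```csharp
using System.Text.RegularExpressions;

public static class DotAllDemo
{
    // Returns true if the pattern "a.b" matches the input under the given options.
    // Without RegexOptions.Singleline, '.' does not match '\n'.
    public static bool DotMatches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "a.b", options);
    }
}
```

Calling DotAllDemo.DotMatches("a\nb", RegexOptions.None) should return false, while passing RegexOptions.Singleline should return true.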
Use LINQ to make it a one line counter:
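// Requires: using System.IO; using System.Linq;
private int count(string filePath)
{
    string[] lines = File.ReadAllLines(filePath);
    // Count only lines that contain at least one non-whitespace character.
    return lines.Count(r => !String.IsNullOrWhiteSpace(r));
}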
This will also exclude lines that contain only whitespace. Change it to String.IsNullOrEmpty(r) if you want whitespace-only lines to be counted.
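If the file is large, a streaming variant may be preferable. The sketch below swaps in File.ReadLines, which enumerates lines lazily instead of loading them all into an array; the class and method names are hypothetical:

```csharp
using System.IO;
using System.Linq;

public static class LineCounting
{
    // Counts lines that contain at least one non-whitespace character.
    // File.ReadLines yields lines one at a time, so the whole file
    // is never held in memory at once.
    public static int CountNonBlankLines(string filePath)
    {
        return File.ReadLines(filePath)
                   .Count(line => !string.IsNullOrWhiteSpace(line));
    }
}
```

As with the version above, swapping in string.IsNullOrEmpty would make whitespace-only lines count.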
Try changing your code to something like this (removing the regex). ReadLine strips the line terminator (\r\n or \n) from the string it returns, so your pattern never has a newline to match. A blank line comes back as an empty, non-null string, and that is what you should test for. You also want to wrap your StreamReader in a using statement to ensure the file is closed if an exception is thrown.
NOTE: this code still counts lines that contain only a space. From your description, it sounded like this is what you wanted.
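public int LineCounter()
{
    // Declared outside the using block so it is still in scope at the return.
    int lineCount = 0;
    using (StreamReader myRead = new StreamReader(@"C:\TestFiles\test.txt"))
    {
        string line;
        while ((line = myRead.ReadLine()) != null)
        {
            // ReadLine strips the terminator, so a blank line is "".
            if (line.Length != 0)
            {
                lineCount++;
            }
        }
    }
    return lineCount;
}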
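To see why the regex in the question never matched, here is a small sketch that runs the same pattern over lines returned by ReadLine. It uses a StringReader over an in-memory string as a stand-in for the file, and the class and method names are hypothetical:

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class ReadLineDemo
{
    // Counts how many lines returned by ReadLine match the question's pattern.
    public static int CountRegexHits(string text)
    {
        int hits = 0;
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // The terminator is already gone, so "^\r?\n" has nothing to match.
                if (Regex.IsMatch(line, @"^\r?\n", RegexOptions.Multiline))
                {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Counts lines that are non-empty once ReadLine has stripped the terminator.
    public static int CountNonEmpty(string text)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length != 0)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For an input such as "first\r\n\r\nsecond\n\nthird", CountRegexHits should report no matches at all, while CountNonEmpty should skip the two blank lines and count the three that carry data.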