
How to get the line count in a string using .NET (with any line break)

Tags: string, c#, .net

I need to count the number of lines in a string. Any line break sequence can be present in the string (CR, LF or CRLF).

So possible new line chars:
* \n
* \r
* \r\n

For example, with the following input:

This is [\n]
a string that [\r]
has four [\r\n]
lines

The method should return 4. Is there a built-in function for this, or has someone already implemented it?

static int GetLineCount(string input)
{
   // could you provide a good implementation for this method?
   // I want to avoid string.Split since it performs really badly
}

NOTE: Performance is important for me, because I could read large strings.

Daniel Peñalba asked Sep 17 '25 00:09


1 Answer

int count = 0;
int len = input.Length;
for(int i = 0; i != len; ++i)
  switch(input[i])
  {
    case '\r':
      ++count;
      if (i + 1 != len && input[i + 1] == '\n')
        ++i;
      break;
    case '\n':
    // Uncomment below to include other line break characters as well
    // case '\v':
    // case '\f':
    // case '\u0085':
    // case '\u2028':
    // case '\u2029':
      ++count;
      break;
  }

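Simply scan through, counting the line breaks; when you hit \r, test whether the next character is \n and skip it if it is, so that CRLF counts once. Note that this counts line breaks rather than lines, so the four-line example in the question gives 3.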

"Performance is important for me, because I could read large strings."

If at all possible, then, avoid loading large strings into memory in the first place. For example, if the data comes from a stream, the same count is easy to do directly on the stream, since it never needs more than one character of read-ahead.
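
As a rough sketch of the stream idea, the same scan can run over a TextReader one character at a time. The class and method names (LineBreakCounter, CountLineBreaks) are made up for illustration. Instead of peeking one character ahead, this version remembers whether the previous character was \r, which comes to the same thing and only relies on TextReader.Read(). Like the loop above, it counts line breaks, not lines.

```csharp
using System.IO;

static class LineBreakCounter
{
    // Hypothetical helper: counts CR, LF and CRLF breaks without
    // holding the whole text in memory.
    public static int CountLineBreaks(TextReader reader)
    {
        int count = 0;
        bool previousWasCr = false;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\r')
            {
                ++count;
                previousWasCr = true;
            }
            else
            {
                // An LF directly after a CR belongs to the same CRLF break.
                if (c == '\n' && !previousWasCr)
                    ++count;
                previousWasCr = false;
            }
        }
        return count;
    }
}
```

It could be called with any reader, for instance a StreamReader over a file or a StringReader for a quick check.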

Here's another variant that doesn't count newlines at the very end of a string:

int count = 1;
// The loop stops before the final character, so a trailing break is not counted.
int len = input.Length - 1;
for (int i = 0; i < len; ++i)
  switch (input[i])
  {
    case '\r':
      if (input[i + 1] == '\n')
      {
        // CRLF: skip the LF; if it was the last character, don't count it.
        if (++i >= len)
        {
          break;
        }
      }
      goto case '\n';
    case '\n':
    // Uncomment below to include other line break characters as well
    // case '\v':
    // case '\f':
    // case '\u0085':
    // case '\u2028':
    // case '\u2029':
      ++count;
      break;
  }

This therefore considers "", "a line", "a line\n" and "a line\r\n" to each be one line only, and so on.
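
For anyone who wants to drop this into the stub from the question, here is one possible way to wrap it. This is only a sketch: the LineCounter class name is invented, it uses if/else instead of the switch, and it assumes input is not null.

```csharp
static class LineCounter
{
    // Hypothetical wrapper with the signature from the question.
    // Counts lines, ignoring a single line break at the very end.
    public static int GetLineCount(string input)
    {
        int count = 1;
        int last = input.Length - 1; // the loop stops before this index
        for (int i = 0; i < last; ++i)
        {
            char c = input[i];
            if (c == '\r')
            {
                // CRLF is one break: step over the LF. If that LF was
                // the final character, stop without counting it.
                if (input[i + 1] == '\n' && ++i >= last)
                    break;
                ++count;
            }
            else if (c == '\n')
            {
                ++count;
            }
        }
        return count;
    }
}
```

With the example from the question (three breaks between four pieces of text) this should return 4, and each of the four strings listed above should give 1.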

Jon Hanna answered Sep 18 '25 16:09