Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the fastest way to count newlines in a large .NET string?

Tags:

c#

.net

Is there a way to improve this:

private static int CountNewlines(string s)
{
    int len = s.Length;
    int c = 0;
    for (int i=0; i < len;  i++)
    {
        if (s[i] == '\n') c++;
    }
    return c;
}

I'm particularly concerned about the Item accessor on the string. Not sure if it is just pointer arithmetic like C/C++.

like image 745
Cheeso Avatar asked Mar 31 '10 22:03

Cheeso


4 Answers

I tested these implementations

private static int Count1(string s)
{
    int len = s.Length;
    int c = 0;
    for (int i=0; i < len;  i++)
    {
        if (s[i] == '\n') c++;
    }
    return c+1;
}

private static int Count2(string s)
{
    int count = -1;
    int index = -1;

    do
    {
        count++;
        index = s.IndexOf('\n', index + 1);
    }
    while (index != -1);

    return count+1;
}

private static int Count3(string s)
{
    return s.Count( c => c == '\n' ) + 1;
}


private static int Count4(string s)
{
    int n = 0;
    foreach( var c in s )
    {
        if ( c == '\n' ) n++;
    }
    return n+1;
}

private static int Count5(string s)
{
    var a = s.ToCharArray();
    int c = 0;
    for (int i=0; i < a.Length; i++)
    {
        if (a[i]=='\n') c++;
    }
    return c+1;
}

Here are my timing results for 100000 iterations on a string of ~25k. Lower is faster.

              Time  Factor
Count1   4.8581503     1.4
Count2   4.1406059     1.2
Count3  45.3614124    13.4
Count4   3.3896130     1.0
Count5   5.9304543     1.7

Surprisingly, to me, the Enumerator implementation was fastest for me, by a significant degree - 20% faster than the next closest implementation. The results were repeatable, regardless of the order in which the methods were run. I also used a warmup phase to insure transient effects (jit, etc) were factored out.

This was for a release build (/optimize+)

like image 177
Cheeso Avatar answered Oct 25 '22 13:10

Cheeso


I'm pretty sure this won't be much slower than converting the string to bytes and checking those, if not faster. The String class should be highly optimized.

If this is a big string, maybe a parallel execution by several threads will make things faster :-)

like image 29
M.A. Hanin Avatar answered Oct 25 '22 14:10

M.A. Hanin


This is probably the most efficient option - the item accessor is internally optimized and you can treat it as if it performs pointer arithmentic.

like image 42
LBushkin Avatar answered Oct 25 '22 13:10

LBushkin


Well, String implements IEnumerable<char>, so I'd definitely try:

s.Count( c => c == '\n' )

As nice as this looks, the original method is 30x faster :)

I haven't given up on the IEnumerable yet, so I've also tried:

int n = 0;
foreach( var c in s )
{
    if ( c == '\n' ) n++;
}
return n;

which seems as fast as the original method.

like image 28
Danko Durbić Avatar answered Oct 25 '22 13:10

Danko Durbić