Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does any one know of a faster method to do String.Split()?

I am reading each line of a CSV file and need to get the individual values in each column. So right now I am just using:

values = line.Split(delimiter);

where line is the a string that holds the values that are seperated by the delimiter.

Measuring the performance of my ReadNextRow method I noticed that it spends 66% on String.Split, so I was wondering if someone knows of a faster method to do this.

Thanks!


1 Answers

The BCL implementation of string.Split is actually quite fast, I've done some testing here trying to out preform it and it's not easy.

But there's one thing you can do and that's to implement this as a generator:

public static IEnumerable<string> GetSplit( this string s, char c )
{
    int l = s.Length;
    int i = 0, j = s.IndexOf( c, 0, l );
    if ( j == -1 ) // No such substring
    {
        yield return s; // Return original and break
        yield break;
    }

    while ( j != -1 )
    {
        if ( j - i > 0 ) // Non empty? 
        {
            yield return s.Substring( i, j - i ); // Return non-empty match
        }
        i = j + 1;
        j = s.IndexOf( c, i, l - i );
    }

    if ( i < l ) // Has remainder?
    {
        yield return s.Substring( i, l - i ); // Return remaining trail
    }
}

The above method is not necessarily faster than string.Split for small strings but it returns results as it finds them, this is the power of lazy evaluation. If you have long lines or need to conserve memory, this is the way to go.

The above method is bounded by the performance of IndexOf and Substring which does too much index of out range checking and to be faster you need to optimize away these and implement your own helper methods. You can beat the string.Split performance but it's gonna take cleaver int-hacking. You can read my post about that here.

like image 113
John Leidegren Avatar answered Sep 12 '25 07:09

John Leidegren