
string type .NET vs. char array

I've been working for about a month on some programs here at work that do a lot of string parsing. I've been advised to use a char array for this instead of a string because a char array is faster. I understand why a char array is fast, but what is it about the string type that makes it slower? What data structure does it implement, and is there any way to make it as fast as a char array?

asked Jul 11 '11 13:07 by MGZero


2 Answers

The advantage of char arrays over strings is that you can alter a char array in place. In C#, strings are immutable, so any change creates a new string object on the heap containing the modified text. With a char array you can make many changes without allocating anything on the heap.
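A minimal sketch of that difference. `InPlaceDemo`, `ReplaceInString` and `ReplaceInArray` are hypothetical names chosen for illustration. `string.Replace` returns a separate string and leaves its input unchanged, while the char[] version overwrites matching slots of the array it was given.

```csharp
using System;

static class InPlaceDemo
{
    // Strings are immutable: Replace cannot touch s, so when a match exists
    // it returns a different string object with the change applied.
    public static string ReplaceInString(string s, char oldChar, char newChar)
    {
        return s.Replace(oldChar, newChar);
    }

    // The char[] equivalent mutates the existing array; nothing new is allocated.
    public static void ReplaceInArray(char[] chars, char oldChar, char newChar)
    {
        for (int i = 0; i < chars.Length; i++)
        {
            if (chars[i] == oldChar)
                chars[i] = newChar;
        }
    }
}
```

For example, after `ReplaceInString("a-b-c", '-', '+')` the original string still reads `"a-b-c"`, whereas calling `ReplaceInArray` on `"a-b-c".ToCharArray()` changes that very array to `a+b+c`.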

answered Nov 15 '22 12:11 by antlersoft


The most obvious difference is that string is immutable, so you can't modify part of it; every modification creates a completely new copy.

String itself has a very special implementation: it's a variable-size class whose characters are stored inline in the string object, not in a separate backing array. I see no reason why read-only access to the chars in a string should be slow.
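To illustrate that reading is cheap, a parsing loop can index straight into the string instead of converting it to a char[] first. This is a minimal sketch; `ParseDigits` is a hypothetical helper, not a framework API. It reads a non-negative decimal number from part of a string without calling `Substring`, so no intermediate string is allocated (for brevity, overflow is not checked).

```csharp
using System;

static class StringReadDemo
{
    // Parses the digits in s[start .. start + length) by reading each char
    // through the string's read-only indexer; the string is never copied.
    public static int ParseDigits(string s, int start, int length)
    {
        int value = 0;
        for (int i = start; i < start + length; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9')
                throw new FormatException("Non-digit at index " + i);
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```

For instance, `ParseDigits("id=12345;", 3, 5)` returns 12345.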

So if you want to change small parts of a string, you need to use either StringBuilder or char[]. Of the two, char[] is (or at least was) faster, since StringBuilder adds extra checks and indirection. But since this is an implementation detail, it might have changed since I last tested it.
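A sketch of the same small edit done both ways, assuming a hypothetical masking task (hide all but the last few characters); the class and method names are made up for illustration. Each version copies the input once, edits its buffer in place, and allocates one result string at the end.

```csharp
using System;
using System.Text;

static class SmallEditsDemo
{
    // StringBuilder: the constructor copies s into the builder's buffer,
    // the indexer setter edits that buffer, and ToString() builds the result.
    public static string MaskWithStringBuilder(string s, int visible)
    {
        var sb = new StringBuilder(s);
        for (int i = 0; i < sb.Length - visible; i++)
            sb[i] = '*';
        return sb.ToString();
    }

    // char[]: ToCharArray copies s once, the loop writes straight into the
    // array, and new string(...) copies it back into an immutable string.
    public static string MaskWithCharArray(string s, int visible)
    {
        char[] cs = s.ToCharArray();
        for (int i = 0; i < cs.Length - visible; i++)
            cs[i] = '*';
        return new string(cs);
    }
}
```

Both return `"******7890"` for `("1234567890", 4)`; the difference lies only in the per-character cost of the indexer inside the loop.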


I just benchmarked it: as of .NET 4, setting an element of a char[] is about four times as fast as setting a character in a StringBuilder. But both manage more than 200 million assignments per second, so it rarely matters in practice.

Reading from a char[] is slightly faster (25% in my test code) than reading from a string. Reading from a StringBuilder, on the other hand, is about three times slower than reading from a char[].

In all benchmarks I ignored the overhead of the surrounding loop code, so my test somewhat underestimates the differences.

My conclusion is that while char[] is faster than the alternatives, this only matters if you're processing hundreds of megabytes per second.


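```csharp
using System;
using System.Diagnostics;
using System.Text;

class Benchmark
{
    const int N = 1000000000; // one billion iterations per test

    static void Main()
    {
        var sw = Stopwatch.StartNew();

        //Write StringBuilder
        StringBuilder sb = new StringBuilder();
        sb.Length = 256; // pads the builder with '\0' characters
        for (int i = 0; i < N; i++)
        {
            int j = i & 255; // cycle through indices 0..255
            sb[j] = 'A';
        }
        Report("Write StringBuilder", sw, 0);

        //Write char[]
        char[] cs = new char[256];
        for (int i = 0; i < N; i++)
        {
            int j = i & 255;
            cs[j] = 'A';
        }
        Report("Write char[]", sw, 0);

        // Read string
        string s = new String('A', 256);
        int sum = 0; // wraps on overflow (unchecked); only printed so the result is used
        for (int i = 0; i < N; i++)
        {
            int j = i & 255;
            sum += s[j];
        }
        Report("Read string", sw, sum);

        //Read char[]
        char[] ca = new String('A', 256).ToCharArray();
        sum = 0;
        for (int i = 0; i < N; i++)
        {
            int j = i & 255;
            sum += ca[j];
        }
        Report("Read char[]", sw, sum);

        //Read StringBuilder
        StringBuilder sbRead = new StringBuilder(new String('A', 256));
        sum = 0;
        for (int i = 0; i < N; i++)
        {
            int j = i & 255;
            sum += sbRead[j];
        }
        Report("Read StringBuilder", sw, sum);
    }

    // Prints the elapsed time of the previous test, then restarts the timer.
    static void Report(string name, Stopwatch sw, int sum)
    {
        Console.WriteLine("{0}: {1} ms (sum {2})", name, sw.ElapsedMilliseconds, sum);
        sw.Restart();
    }
}
```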

(Yes, I know my benchmark code isn't very good, but I don't think it makes a big difference.)

answered Nov 15 '22 13:11 by CodesInChaos