
What is the fastest way to iterate through individual characters in a string in C#?

Tags: string, c#

The title is the question. Below is my attempt to answer it through my own testing, but I don't trust my uninformed research, so I still pose the question: what is the fastest way to iterate through individual characters in a string in C#?

Occasionally I want to cycle through the characters of a string one-by-one, such as when parsing for nested tokens -- something which cannot be done with regular expressions. I am wondering what the fastest way is to iterate through the individual characters in a string, particularly very large strings.

I did a bunch of testing myself and my results are below. However, many readers have much more in-depth knowledge of the .NET CLR and C# compiler, so I don't know if I'm missing something obvious or if I made a mistake in my test code. So I solicit your collective response. If anyone has insight into how the string indexer actually works, that would be very helpful. (Is it a C# language feature compiled into something else behind the scenes? Or something built into the CLR?)

The first method using a stream was taken directly from the accepted answer in the thread "how to generate a stream from a string?"

Tests

longString is a 99.1 million character string consisting of 89 copies of the plain-text version of the C# language specification. Results shown are for 20 iterations. Where there is a 'startup' time (such as for the first iteration of the implicitly created array in method #3), I tested that separately, such as by breaking from the loop after the first iteration.
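A minimal sketch of how timings like these could be collected, assuming a Stopwatch-based harness. The names IterationTiming, TimeRuns, Mean, StdDev and SumWithIndexer are hypothetical and are not taken from the original test code; using the population form of the standard deviation is also an assumption, since the post does not say which form was used.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness; not the original test code.
public static class IterationTiming
{
    // Runs `body` `iterations` times and returns each run's elapsed milliseconds.
    public static double[] TimeRuns(Action body, int iterations)
    {
        var timings = new double[iterations];
        var stopwatch = new Stopwatch();
        for (int run = 0; run < iterations; run++)
        {
            stopwatch.Restart();
            body();
            stopwatch.Stop();
            timings[run] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return timings;
    }

    // Arithmetic mean of the timings.
    public static double Mean(double[] values)
    {
        double sum = 0;
        foreach (double v in values) sum += v;
        return sum / values.Length;
    }

    // Population standard deviation (assumed form).
    public static double StdDev(double[] values)
    {
        double mean = Mean(values);
        double sumSquares = 0;
        foreach (double v in values) sumSquares += (v - mean) * (v - mean);
        return Math.Sqrt(sumSquares / values.Length);
    }

    // Method #1 (index accessor). Returns a checksum so the loop has an observable result.
    public static long SumWithIndexer(string text)
    {
        long sum = 0;
        int length = text.Length;
        for (int i = 0; i < length; i++) { sum += text[i]; }
        return sum;
    }
}
```

For example, IterationTiming.TimeRuns(() => IterationTiming.SumWithIndexer(longString), 20) would return 20 per-run timings, which Mean and StdDev can then summarize. A startup cost such as ToCharArray() could be timed in a separate call.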

Results

From my tests, caching the string in a char array using the ToCharArray() method is the fastest for iterating over the entire string. The ToCharArray() call is an upfront expense, but subsequent access to individual characters is faster than the built-in index accessor (410 ms versus 602 ms per pass in the first table).

                                               milliseconds
                                     ---------------------------------
Method                               Startup  Iteration  Total  StdDev
-----------------------------------  -------  ---------  -----  ------
1 index accessor                           0        602    602       3
2 explicit convert ToCharArray           165        410    582       3
3 foreach (c in string.ToCharArray)      168        455    623       3
4 StringReader                             0       1150   1150      25
5 StreamWriter => Stream                 405       1940   2345      20
6 GetBytes() => StreamReader             385       2065   2450      35
7 GetBytes() => BinaryReader             385       5465   5850      80
8 foreach (c in string)                    0        960    960       4

Update: Per @Eric's comment, here are results for 100 iterations over a more normal 1.1 M char string (one copy of the C# spec). Indexer and char arrays are still fastest, followed by foreach(char in string), followed by stream methods.

                                               milliseconds
                                     ---------------------------------
Method                               Startup  Iteration  Total  StdDev
-----------------------------------  -------  ---------  -----  ------
1 index accessor                           0        6.6    6.6    0.11
2 explicit convert ToCharArray           2.4        5.0    7.4    0.30
3 for(c in string.ToCharArray)           2.4        4.7    7.1    0.33
4 StringReader                             0       14.0   14.0    1.21
5 StreamWriter => Stream                 5.3       21.8   27.1    0.46
6 GetBytes() => StreamReader             4.4       23.6   28.0    0.65
7 GetBytes() => BinaryReader             5.0       61.8   66.8    0.79
8 foreach (c in string)                    0       10.3   10.3    0.11

Code Used (tested separately; shown together for brevity)

    //1 index accessor
    int strLength = longString.Length;
    for (int i = 0; i < strLength; i++) { c = longString[i]; }

    //2 explicit convert ToCharArray
    int strLength = longString.Length;
    char[] charArray = longString.ToCharArray();
    for (int i = 0; i < strLength; i++) { c = charArray[i]; }

    //3 for(c in string.ToCharArray)
    foreach (char c in longString.ToCharArray()) { }

    //4 use StringReader
    int strLength = longString.Length;
    StringReader sr = new StringReader(longString);
    for (int i = 0; i < strLength; i++) { c = Convert.ToChar(sr.Read()); }

    //5 StreamWriter => StreamReader
    int strLength = longString.Length;
    MemoryStream stream = new MemoryStream();
    StreamWriter writer = new StreamWriter(stream);
    writer.Write(longString);
    writer.Flush();
    stream.Position = 0;
    StreamReader str = new StreamReader(stream);
    while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); }

    //6 GetBytes() => StreamReader
    int strLength = longString.Length;
    MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString));
    StreamReader str = new StreamReader(stream);
    while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); }

    //7 GetBytes() => BinaryReader
    int strLength = longString.Length;
    MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString));
    BinaryReader br = new BinaryReader(stream, Encoding.Unicode);
    while (stream.Position < strLength) { c = br.ReadChar(); }

    //8 foreach (c in string)
    foreach (char c in longString) { }
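One detail worth checking in snippets 5 to 7: stream.Position is measured in bytes, and StreamReader reads ahead in buffered chunks, whereas strLength counts chars. A loop bounded by stream.Position < strLength may therefore stop before every character has been read. Below is a minimal sketch of an alternative loop that reads until Read() returns -1. The class and method names are hypothetical, and passing Encoding.Unicode to the reader is an assumption made so that it matches the bytes produced by GetBytes().

```csharp
using System.IO;
using System.Text;

// Hypothetical helper; not part of the original test code.
public static class StreamIteration
{
    // Reads `text` back through a StreamReader over its UTF-16 bytes
    // and returns the number of chars read.
    public static int CountCharsViaStreamReader(string text)
    {
        // Encoding.Unicode is UTF-16 (little endian): two bytes per char.
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.Unicode))
        {
            int count = 0;
            // Read() returns the next char as an int, or -1 at the end of the stream,
            // so the loop does not depend on the byte position of the underlying stream.
            while (reader.Read() != -1)
            {
                count++;
            }
            return count;
        }
    }
}
```

With this loop, the number of characters read should equal longString.Length, which gives a quick check that a stream-based variant is doing the same amount of work as the indexer loop before the timings are compared.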

Accepted answer:

I interpreted @CodeInChaos and Ben's notes as follows:

    // Requires an unsafe context (compile with /unsafe).
    int strLength = longString.Length;
    fixed (char* pString = longString) {
        char* pChar = pString;
        for (int i = 0; i < strLength; i++) {
            c = *pChar;
            pChar++;
        }
    }

Execution for 100 iterations over the short string was 4.4 ms, with a standard deviation below 0.1 ms.

Joshua Honig asked Jan 09 '12 19:01





1 Answer

Any reason not to include foreach?

    foreach (char c in text) {
        ...
    }

Is this really going to be your performance bottleneck, by the way? What proportion of your total running time does the iteration itself take?

Jon Skeet answered Sep 18 '22 15:09
