The title is the question. Below is my attempt to answer it through research. But I don't trust my uninformed research so I still pose the question (What is the fastest way to iterate through individual characters in a string in C#?).
Occasionally I want to cycle through the characters of a string one-by-one, such as when parsing for nested tokens -- something which cannot be done with regular expressions. I am wondering what the fastest way is to iterate through the individual characters in a string, particularly very large strings.
I did a bunch of testing myself and my results are below. However there are many readers with much more in depth knowledge of the .NET CLR and C# compiler so I don't know if I'm missing something obvious, or if I made a mistake in my test code. So I solicit your collective response. If anyone has insight into how the string indexer actually works that would be very helpful. (Is it a C# language feature compiled into something else behind the scenes? Or something built in to the CLR?).
The first method using a stream was taken directly from the accepted answer from the thread: how to generate a stream from a string?
Tests
longString
is a 99.1 million character string consisting of 89 copies of the plain-text version of the C# language specification. Results shown are for 20 iterations. Where there is a 'startup' time (such as for the first iteration of the implicitly created array in method #3), I tested that separately, such as by breaking from the loop after the first iteration.
Results
From my tests, caching the string in a char array using the ToCharArray() method is the fastest for iterating over the entire string. The ToCharArray() method is an upfront expense, and subsequent access to individual characters is slightly faster than the built in index accessor.
milliseconds --------------------------------- Method Startup Iteration Total StdDev ------------------------------ ------- --------- ----- ------ 1 index accessor 0 602 602 3 2 explicit convert ToCharArray 165 410 582 3 3 foreach (c in string.ToCharArray)168 455 623 3 4 StringReader 0 1150 1150 25 5 StreamWriter => Stream 405 1940 2345 20 6 GetBytes() => StreamReader 385 2065 2450 35 7 GetBytes() => BinaryReader 385 5465 5850 80 8 foreach (c in string) 0 960 960 4
Update: Per @Eric's comment, here are results for 100 iterations over a more normal 1.1 M char string (one copy of the C# spec). Indexer and char arrays are still fastest, followed by foreach(char in string), followed by stream methods.
milliseconds --------------------------------- Method Startup Iteration Total StdDev ------------------------------ ------- --------- ----- ------ 1 index accessor 0 6.6 6.6 0.11 2 explicit convert ToCharArray 2.4 5.0 7.4 0.30 3 for(c in string.ToCharArray) 2.4 4.7 7.1 0.33 4 StringReader 0 14.0 14.0 1.21 5 StreamWriter => Stream 5.3 21.8 27.1 0.46 6 GetBytes() => StreamReader 4.4 23.6 28.0 0.65 7 GetBytes() => BinaryReader 5.0 61.8 66.8 0.79 8 foreach (c in string) 0 10.3 10.3 0.11
Code Used (tested separately; shown together for brevity)
//1 index accessor int strLength = longString.Length; for (int i = 0; i < strLength; i++) { c = longString[i]; } //2 explicit convert ToCharArray int strLength = longString.Length; char[] charArray = longString.ToCharArray(); for (int i = 0; i < strLength; i++) { c = charArray[i]; } //3 for(c in string.ToCharArray) foreach (char c in longString.ToCharArray()) { } //4 use StringReader int strLength = longString.Length; StringReader sr = new StringReader(longString); for (int i = 0; i < strLength; i++) { c = Convert.ToChar(sr.Read()); } //5 StreamWriter => StreamReader int strLength = longString.Length; MemoryStream stream = new MemoryStream(); StreamWriter writer = new StreamWriter(stream); writer.Write(longString); writer.Flush(); stream.Position = 0; StreamReader str = new StreamReader(stream); while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); } //6 GetBytes() => StreamReader int strLength = longString.Length; MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString)); StreamReader str = new StreamReader(stream); while (stream.Position < strLength) { c = Convert.ToChar(str.Read()); } //7 GetBytes() => BinaryReader int strLength = longString.Length; MemoryStream stream = new MemoryStream(Encoding.Unicode.GetBytes(longString)); BinaryReader br = new BinaryReader(stream, Encoding.Unicode); while (stream.Position < strLength) { c = br.ReadChar(); } //8 foreach (c in string) foreach (char c in longString) { }
Accepted answer:
I interpreted @CodeInChaos and Ben's notes as follows:
fixed (char* pString = longString) { char* pChar = pString; for (int i = 0; i < strLength; i++) { c = *pChar ; pChar++; } }
Execution for 100 iterations over the short string was 4.4 ms, with < 0.1 ms st dev.
Using the character iterator is probably the only correct way to iterate over characters, because Unicode requires more space than a Java char provides. A Java char contains 16 bit and can hold Unicode characters up U+FFFF but Unicode specifies characters up to U+10FFFF.
Another way to iterate over a string is to use for item of str . The variable item receives the character directly so you do not have to use the index. If your code does not need the index value of each character, this loop format is even simpler.
Any reason not to include foreach
?
foreach (char c in text) { ... }
Is this really going to be your performance bottleneck, by the way? What proportion of your total running time does the iteration itself take?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With