string.IndexOf performance

Tags: performance, c#, .net

This simple piece of C# code, which is meant to find script blocks in HTML, takes 0.5 seconds to run on a 74K-character string with only 9 script blocks in it. This is a release binary, not run under the debugger, on a 2.8 GHz i7 CPU. I made several runs through this code to make sure that performance is not impeded by JIT. It is not.

This is VS2010, .NET 4.0 Client Profile, x64.

Why is this so slow?

                int[] _exclStart = new int[100];
                int[] _exclStop = new int[100];
                int _excl = 0;
                for (int f = input.IndexOf("<script", 0); f != -1; )
                {
                    _exclStart[_excl] = f;
                    f = input.IndexOf("</script", f + 8);
                    if (f == -1)
                    {
                        _exclStop[_excl] = input.Length;
                        break;
                    }
                    _exclStop[_excl] = f;
                    f = input.IndexOf("<script", f + 8);
                    ++_excl;
                }


asked Oct 28 '11 23:10

Ghostrider

3 Answers

I used the source of this page as an example and duplicated the content 8 times, resulting in a page some 334,312 bytes long. Using StringComparison.Ordinal yields a massive performance difference.

string newInput = string.Format("{0}{0}{0}{0}{0}{0}{0}{0}", input.Trim().ToLower());
//string newInput = input.Trim().ToLower();

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int[] _exclStart = new int[100];
int[] _exclStop = new int[100];
int _excl = 0;
for (int f = newInput.IndexOf("<script", 0, StringComparison.Ordinal); f != -1; )
{
    _exclStart[_excl] = f;
    f = newInput.IndexOf("</script", f + 8, StringComparison.Ordinal);
    if (f == -1)
    {
        _exclStop[_excl] = newInput.Length;
        break;
    }
    _exclStop[_excl] = f;
    f = newInput.IndexOf("<script", f + 8, StringComparison.Ordinal);
    ++_excl;
}
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalMilliseconds);

Running it 5 times yields almost the same result each time (the loop timings did not change significantly, so for this simple code almost no time is spent on JIT compilation).

Output using your original code (in milliseconds):

10.2786
11.4671
11.1066
10.6537
10.0723

Output using the above code instead (in milliseconds):

0.3055
0.2953
0.2972
0.3112
0.3347

Notice that my test results are around 0.010 seconds (original code) and 0.0003 seconds (ordinal code), which means something other than this code itself is wrong on your side.

If, as you say, using StringComparison.Ordinal does nothing for your performance, then either you're using incorrect timers to measure performance, or you have a large overhead in reading your input value (such as re-reading it from a stream) that you haven't noticed.

Tested under Windows 7 x64 running on a 3GHz i5 using .NET 4 Client Profile.

Suggestions:

1. Use StringComparison.Ordinal.
2. Make sure you're using System.Diagnostics.Stopwatch to time performance.
3. Declare a local variable for the input instead of using values external to the function (e.g. string newInput = input.Trim().ToLower();).

Again, I stress: I am getting 50 times faster speed on test data that is apparently over 4 times larger, using the exact same code that you provided. That means my test is running some 200 times faster than yours, which is not something anyone would expect given that we're both running the same environment and differ only in CPU: i5 (me) versus i7 (you).
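To rule out the timer and the input loading as the cause, the search can be timed in isolation. Here is a minimal sketch of such a harness. The helper names BestOfMs and CountMatches and the synthetic input are my own assumptions for illustration, not part of the original code. StringComparison.CurrentCulture is used to stand in for the overload without a comparison argument, which performs a culture-sensitive search using the current culture.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class IndexOfTiming
{
    // Hypothetical helper: runs the action once as a warm-up (so JIT cost is
    // excluded), then returns the best elapsed time in ms over `runs` runs.
    public static double BestOfMs(Action action, int runs)
    {
        action(); // warm-up run, not timed
        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }

    // Hypothetical helper: counts non-overlapping occurrences of a non-empty
    // needle using the given comparison mode.
    public static int CountMatches(string input, string needle, StringComparison comparison)
    {
        int count = 0;
        int f = input.IndexOf(needle, 0, comparison);
        while (f != -1)
        {
            count++;
            f = input.IndexOf(needle, f + needle.Length, comparison);
        }
        return count;
    }

    static void Main()
    {
        // Synthetic input built in memory (an assumption, not the asker's page),
        // so that no file or stream reading is included in the measurement.
        var sb = new StringBuilder();
        for (int i = 0; i < 2000; i++)
        {
            sb.Append("<div>some filler text that contains no script tag</div>\n");
            if (i % 200 == 0) sb.Append("<script>var x = 1;</script>\n");
        }
        string input = sb.ToString();

        double cultureMs = BestOfMs(
            () => CountMatches(input, "<script", StringComparison.CurrentCulture), 5);
        double ordinalMs = BestOfMs(
            () => CountMatches(input, "<script", StringComparison.Ordinal), 5);

        Console.WriteLine("culture-sensitive: {0} ms", cultureMs);
        Console.WriteLine("ordinal:           {0} ms", ordinalMs);
    }
}
```

If both numbers come out small here but the real program is still slow, the time is probably being spent outside the IndexOf loop.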


answered Oct 12 '22 23:10

Seph

The IndexOf overload you're using is culture-sensitive, which will affect performance. Instead, use:

input.IndexOf("<script", 0, StringComparison.Ordinal);
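Applied to the whole loop, that could look like the sketch below. The method name FindScriptBlocks and the use of a list instead of the two fixed-size arrays are assumptions on my part, not something from the question. Note that ordinal comparison is case-sensitive, so this only finds lower-case tags unless the input is lower-cased first.

```csharp
using System;
using System.Collections.Generic;

static class ScriptBlockScanner
{
    // Hypothetical helper: returns (start, stop) index pairs, where start is the
    // index of "<script" and stop is the index of the following "</script"
    // (or input.Length if the block is never closed).
    public static List<KeyValuePair<int, int>> FindScriptBlocks(string input)
    {
        var blocks = new List<KeyValuePair<int, int>>();
        int start = input.IndexOf("<script", 0, StringComparison.Ordinal);
        while (start != -1)
        {
            // "<script" is 7 characters long, so start + 7 never exceeds input.Length.
            int stop = input.IndexOf("</script", start + 7, StringComparison.Ordinal);
            if (stop == -1)
            {
                blocks.Add(new KeyValuePair<int, int>(start, input.Length));
                break;
            }
            blocks.Add(new KeyValuePair<int, int>(start, stop));
            // "</script" is 8 characters long, so stop + 8 never exceeds input.Length.
            start = input.IndexOf("<script", stop + 8, StringComparison.Ordinal);
        }
        return blocks;
    }

    static void Main()
    {
        foreach (var block in FindScriptBlocks("<script>a</script>b<script>c"))
        {
            Console.WriteLine("{0}..{1}", block.Key, block.Value);
        }
        // Prints:
        // 0..9
        // 19..28
    }
}
```

A list avoids overrunning the 100-element arrays on pages with many script blocks, and it also records an unterminated final block.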

answered Oct 12 '22 22:10

Matthew Flaschen

I would recommend using Regex for this; it can offer a significant performance improvement because the expression is compiled only once. IndexOf, by contrast, is essentially a loop that runs character by character, which probably means you have 3 "loops" within your main for loop. Of course, IndexOf won't be as slow as a hand-written loop, but the time still increases as the input size grows. Regex has built-in functions that return the number and positions of the occurrences of each pattern you define.

Edit: this might shed some more light on the performance of IndexOf: IndexOf Perf
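A rough sketch of what the Regex version might look like is below. The pattern and the names ScriptBlock and FindScriptBlocks are my own assumptions, and RegexOptions.IgnoreCase makes it match upper-case tags too, unlike the IndexOf loop. Whether this actually beats an ordinal IndexOf on your input is something you would need to measure.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexScriptScan
{
    // Built once and reused. Matches from "<script" up to (but not including)
    // the next "</script", or up to the end of the input if there is none.
    static readonly Regex ScriptBlock = new Regex(
        @"<script.*?(?=</script|\z)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: each Match gives the block start (Index) and the
    // block stop (Index + Length).
    public static MatchCollection FindScriptBlocks(string input)
    {
        return ScriptBlock.Matches(input);
    }

    static void Main()
    {
        foreach (Match m in FindScriptBlocks("<script>a</script>b<script>c"))
        {
            Console.WriteLine("{0}..{1}", m.Index, m.Index + m.Length);
        }
        // Prints:
        // 0..9
        // 19..28
    }
}
```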

answered Oct 12 '22 23:10

Aman
