
Help understanding C# optimization

I was playing with C# and wanted to speed up a program. I made a change that did speed it up, but I need help understanding why it made the program faster.

I've attempted to reduce the code to something easier to understand in a question. Score1 and Report1 are the slower way; Score2 and Report2 are the faster way. The first method stores a string and an int in a struct, in parallel, then, in a serial loop, walks the array of those structs and writes their data to a buffer. The second method writes the data to a string buffer in parallel, then, in a serial loop, copies that string data to the output buffer. Here are some sample run times:

Run 1 Total Average Time = 0.492087 sec
Run 2 Total Average Time = 0.273619 sec

When I was working with an earlier non-parallel version of this, the times were almost the same. Why the difference with the parallel version?

Even if I reduce the loop in Report1 to write a single line of output to the buffer, it is still slower (total time about 0.42 sec).

Here is the simplified code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading.Tasks;
using System.IO;

namespace OptimizationQuestion
{
    class Program
    {
        struct ValidWord
        { 
            public string word;
            public int score;
        }
        ValidWord[] valid;
        StringBuilder output;
        int total; 

        public void Score1(string[] words)
        {
            valid = new ValidWord[words.Length];

            for (int i = 0; i < words.Length; i++)
            {
                StringBuilder builder = new StringBuilder();

                foreach (char c in words[i])
                {
                    if (c != 'U')
                        builder.Append(c);
                }
                if (words[i].Length == 3)
                {
                    valid[i] = new ValidWord 
                    { word = builder.ToString(), score = words[i].Length };
                }
            }
        }
        public void Report1(StringBuilder outputBuffer)
        {
            int total = 0;
            foreach (ValidWord wordInfo in valid)
            {
                if (wordInfo.score > 0)
                {
                    outputBuffer.AppendLine(String.Format("{0} {1}", wordInfo.word.ToString(), wordInfo.score));
                    total += wordInfo.score;
                }
            }
            outputBuffer.AppendLine(string.Format("Total = {0}", total));
        }

        public void Score2(string[] words)
        {
            output = new StringBuilder();
            total = 0;           
            for (int i = 0; i < words.Length; i++)
            {
                StringBuilder builder = new StringBuilder();

                foreach (char c in words[i])
                {
                    if (c != 'U')
                        builder.Append(c);
                }
                if (words[i].Length == 3)
                {
                    output.AppendLine(String.Format("{0} {1}", builder.ToString(), words[i].Length));
                    total += words[i].Length;
                }
            }
        }
        public void Report2(StringBuilder outputBuffer)
        {
            outputBuffer.Append(output.ToString());
            outputBuffer.AppendLine(string.Format("Total = {0}", total));
        } 
        static void Main(string[] args)
        {
            Program[] program = new Program[100];
            for (int i = 0; i < program.Length; i++)
                program[i] = new Program(); 

            string[] words = File.ReadAllLines("words.txt");

            Stopwatch stopwatch = new Stopwatch();
            const int TIMING_REPETITIONS = 20;
            double averageTime1 = 0.0;
            StringBuilder output = new StringBuilder();
            for (int i = 0; i < TIMING_REPETITIONS; ++i)
            {
                stopwatch.Reset();
                stopwatch.Start();
                output.Clear();
                Parallel.ForEach<Program>(program, p =>
                    {
                        p.Score1(words);
                    });
                for (int k = 0; k < program.Length; k++)
                    program[k].Report1(output);
                stopwatch.Stop();
                averageTime1 += stopwatch.Elapsed.TotalSeconds;
                GC.Collect();
            }
            averageTime1 /= (double)TIMING_REPETITIONS;
            Console.WriteLine(string.Format("Run 1 Total Average Time = {0:0.000000} sec", averageTime1));
            double averageTime2 = 0.0;
            for (int i = 0; i < TIMING_REPETITIONS; ++i)
            {
                stopwatch.Reset();
                stopwatch.Start();
                output.Clear();
                Parallel.ForEach<Program>(program, p =>
                    {
                        p.Score2(words);
                    });
                for (int k = 0; k < program.Length; k++)
                    program[k].Report2(output);
                stopwatch.Stop();
                averageTime2 += stopwatch.Elapsed.TotalSeconds;
                GC.Collect();
            }
            averageTime2 /= (double)TIMING_REPETITIONS;
            Console.WriteLine(string.Format("Run 2 Total Average Time = {0:0.000000} sec", averageTime2));
            Console.ReadLine();
        }
    }
}
asked Feb 09 '11 by jlim

3 Answers

First of all, you are parallelizing the repeated runs. This will improve your benchmark time, but may not help out your real production run time very much. To accurately measure how long it will take to actually run through one word list, you need to have exactly one word list going at a time. Otherwise, the individual threads processing all the lists are competing with each other to some extent for system resources and the time per list suffers, even if the time to do all the lists in total is faster.

To speed up the time to process one word list, you want to process the individual words in the list in parallel, for exactly one list at a time. To get enough work for a good measurement, either make the list very long or process the list many times in serial.

In your case, this gets a bit tricky because the StringBuilder needed for your final product is not documented as being thread-safe. It's not that bad, though. Here's an example of calling Parallel.ForEach for a single word list:

var locker = new Object(); // I'd actually make this static, but it should end up as a closure and so still work
var OutputBuffer = new StringBuilder(); // you can improve things further if you can make a good estimate of the final size and preallocate all the memory it will need up front
int score = 0;
Parallel.ForEach(words, w =>
{
   // We want to push as much of the work to the individual threads as possible.
   // If run in 1 thread, a StringBuilder per word would be bad.
   // Run in parallel, it lets us do a little more of the work outside of locked code.
   var buf = new StringBuilder(w.Length + 5);
   string line = buf.Append(w.Where(c => c != 'U').ToArray())
                    .Append(' ').Append(w.Length).AppendLine().ToString();

   lock (locker)
   {
       OutputBuffer.Append(line);
       score += w.Length;
   }
});
OutputBuffer.Append("Total = ").Append(score);

Just call that 20 times in a normal, sequentially processed for loop (a minimal sketch follows). Again, it might finish the benchmark a little slower, but I think it will perform a little faster in the real world because of a flaw in your benchmark. Also note that I typed this right into the reply window; I've never even tried to compile it, so it's not likely to be perfect right out of the gate.
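
For reference, here's a minimal sketch of what that sequential benchmark loop could look like, reusing the usings from your program. ProcessWordList is a hypothetical wrapper around the Parallel.ForEach snippet above (that name is not from the original code):

// Benchmark sketch: time one word list at a time, repeating the whole run sequentially.
// ProcessWordList is a hypothetical method wrapping the parallel snippet above and
// returning the finished StringBuilder.
const int TIMING_REPETITIONS = 20;
var stopwatch = new Stopwatch();
double averageTime = 0.0;
for (int i = 0; i < TIMING_REPETITIONS; ++i)
{
    stopwatch.Restart();
    StringBuilder result = ProcessWordList(words); // parallel inside, serial outside
    stopwatch.Stop();
    averageTime += stopwatch.Elapsed.TotalSeconds;
}
averageTime /= TIMING_REPETITIONS;
Console.WriteLine("Average time per list = {0:0.000000} sec", averageTime);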

After fixing your benchmark to more accurately reflect how parallel code will impact your real-world processing time, the next step is to do some profiling to see where your program is actually spending its time. That's how you know which areas to look at for improvement.

Out of curiosity, I'd also like to know how this version performs:

var agg = new { score = 0, OutputBuffer = new StringBuilder() };
agg = words.Where(w => w.Length == 3)
   .Select(w => new string(w.Where(c => c != 'U').ToArray()))
   .Aggregate(agg, (a, w) =>
   {
       // anonymous types are immutable, so return a new accumulator sharing the same StringBuilder
       a.OutputBuffer.AppendFormat("{0} {1}\n", w, w.Length);
       return new { score = a.score + w.Length, OutputBuffer = a.OutputBuffer };
   });
agg.OutputBuffer.Append("Total = ").Append(agg.score);
answered Sep 29 '22 by Joel Coehoorn


If performance is the primary concern, a struct should typically be small; Microsoft's guidance is that a type smaller than 16 bytes can perform better as a struct when reference-type semantics aren't needed. Beyond that, the overhead of passing it around grows (because structs are passed by value) and can exceed the cost of just passing a reference. Your struct contains a string reference plus an int, making it larger than a single pointer, so you may be paying some of that copying overhead.

See the When to use structs section of this article.
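
If you want to check the managed size of the struct on your own machine, here's a minimal sketch (my addition, not from the answer; it uses Unsafe.SizeOf from the System.Runtime.CompilerServices.Unsafe package and assumes a 64-bit runtime):

using System;
using System.Runtime.CompilerServices; // Unsafe.SizeOf, from the System.Runtime.CompilerServices.Unsafe package

struct ValidWord
{
    public string word;
    public int score;
}

class StructSizeDemo
{
    static void Main()
    {
        // On a 64-bit runtime the string reference is 8 bytes and the int is 4,
        // padded to 16 bytes, so copying a ValidWord moves twice as much data
        // as copying a single reference.
        Console.WriteLine(Unsafe.SizeOf<ValidWord>()); // typically 16
        Console.WriteLine(IntPtr.Size);                // 8 on 64-bit
    }
}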

answered Sep 29 '22 by Cornelius


I tried running it through a profiler, but I don't trust the results I got (Run1 takes less time than Run2 under it), so there aren't any concrete answers here. My suspicion, though, is that the valid[] array is the culprit:

  1. That's a potentially large memory allocation that Run1 is doing and Run2 isn't. Allocating big chunks of memory can be time-consuming.

  2. It's possible the array ends up far from the rest of the working data in physical memory. At the very least, it's big enough to land on the large object heap, whereas it looks like almost everything else will end up on the stack or the small object heap. That might mean the Score1 function has to deal with more cache misses than the Score2 function.

It might be a much smaller issue in the serial code, where you've only got that happening once at any given time. When it's happening for a lot of threads simultaneously, though, the problem might compound so that what originally just caused an extra cache miss or two is now causing page faults.
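
One way to test that theory (my own sketch, not part of the original answer) is a variation on Score1, placed inside the Program class, that allocates the valid array once and reuses it across runs, so the per-run large allocation disappears:

// Hypothetical variation on Score1 that reuses one ValidWord[] buffer instead of
// allocating a fresh array on every run. If the big allocation is the cost,
// this version should close most of the gap with Score2.
public void Score1Reusing(string[] words)
{
    if (valid == null || valid.Length < words.Length)
        valid = new ValidWord[words.Length];  // allocate only on first use or growth
    else
        Array.Clear(valid, 0, valid.Length);  // reset entries left over from the last run

    for (int i = 0; i < words.Length; i++)
    {
        StringBuilder builder = new StringBuilder();
        foreach (char c in words[i])
        {
            if (c != 'U')
                builder.Append(c);
        }
        if (words[i].Length == 3)
            valid[i] = new ValidWord { word = builder.ToString(), score = words[i].Length };
    }
}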

answered Sep 29 '22 by Sean U