Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance when Generating CPU Cache Misses

I am trying to learn about CPU cache performance in the world of .NET. Specifically I am working through Igor Ostovsky's article about Processor Cache Effects.

I have gone through the first three examples in his article and have recorded results that widely differ from his. I think I must be doing something wrong because the performance on my machine is showing almost the exact opposite results of what he shows in his article. I am not seeing the large effects from cache misses that I would expect.

What am I doing wrong? (bad code, compiler setting, etc.)

Here are the performance results on my machine:

enter image description here

enter image description here

enter image description here

If it helps, the processor on my machine is an Intel Core i7-2630QM. Here is info on my processor's cache:

enter image description here

I have compiled in x64 Release mode.

Below is my source code:

class Program
    {

        static Stopwatch watch = new Stopwatch();

        static int[] arr = new int[64 * 1024 * 1024];

        static void Main(string[] args)
        {
            Example1();
            Example2();
            Example3();


            Console.ReadLine();
        }

        static void Example1()
        {
            Console.WriteLine("Example 1:");

            // Loop 1
            watch.Restart();
            for (int i = 0; i < arr.Length; i++) arr[i] *= 3;
            watch.Stop();
            Console.WriteLine("     Loop 1: " + watch.ElapsedMilliseconds.ToString() + " ms");

            // Loop 2
            watch.Restart();
            for (int i = 0; i < arr.Length; i += 32) arr[i] *= 3;
            watch.Stop();
            Console.WriteLine("     Loop 2: " + watch.ElapsedMilliseconds.ToString() + " ms");

            Console.WriteLine();
        }

        static void Example2()
        {

            Console.WriteLine("Example 2:");

            for (int k = 1; k <= 1024; k *= 2)
            {

                watch.Restart();
                for (int i = 0; i < arr.Length; i += k) arr[i] *= 3;
                watch.Stop();
                Console.WriteLine("     K = "+ k + ": " + watch.ElapsedMilliseconds.ToString() + " ms");

            }
            Console.WriteLine();
        }

        static void Example3()
        {   

            Console.WriteLine("Example 3:");

            for (int k = 1; k <= 1024*1024; k *= 2)
            {

                //256* 4bytes per 32 bit int * k = k Kilobytes
                arr = new int[256*k];



                int steps = 64 * 1024 * 1024; // Arbitrary number of steps
                int lengthMod = arr.Length - 1;

                watch.Restart();
                for (int i = 0; i < steps; i++)
                {
                    arr[(i * 16) & lengthMod]++; // (x & lengthMod) is equal to (x % arr.Length)
                }

                watch.Stop();
                Console.WriteLine("     Array size = " + arr.Length * 4 + " bytes: " + (int)(watch.Elapsed.TotalMilliseconds * 1000000.0 / arr.Length) + " nanoseconds per element");

            }
            Console.WriteLine();
        }

    }
like image 819
Jason Moore Avatar asked Jun 18 '11 13:06

Jason Moore


People also ask

What is the impact of a cache miss in the overall performance of a program?

When a cache miss occurs, the system or application proceeds to locate the data in the underlying data store, which increases the duration of the request. Typically, the system may write the data to the cache, again increasing the latency, though that latency is offset by the cache hits on other data.

Is cache miss rate a good indicator of performance?

According to this article the cache-misses to instructions is a good indicator of cache performance. The ratio of cache-misses to instructions will give an indication how well the cache is working; the lower the ratio the better.

What affects cache miss rate?

The worst cache miss rate occurs when there is no tiling, but the worst CPI occurs with tile size 288 × 288. CPI improves slightly when tiling is discontinued. This is likely due to lower instruction CPI that results from the reduction of executed branch instructions from needing fewer iterations of the tile loops.

Does CPU cache increase performance?

-The more cache the CPU has, the less time the computer spends accessing slower main memory and as a result programs may run faster.


1 Answers

Why are you using i += 32 in the second loop. You are stepping over cache lines in this way. 32*4 = 128bytes way bigger then 64bytes needed.

like image 187
DiVan Avatar answered Sep 26 '22 18:09

DiVan