I have a simple method which converts an array from one type to another. I wanted to find out which method is the fastest, but so far I get differing results from which I cannot conclude which method is really faster, or by what margin.
Since the conversion is only about allocating memory, reading the array and converting values, I am surprised that the values are not more stable. I want to know how I can make measurements that are accurate, meaningful and do not change from one day to the next. The differences are about 20% from one day to the next.
There are of course differences between the JITers of .NET 3.5 and 4.0, between debug and release mode, between running or not running the executable under a debugger (which disables JIT optimizations unless you turn that behaviour off), and in the code generation of the C# compiler between DEBUG and RELEASE (mainly nop operations and more temporary variables in the IL code).
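One thing I did not show below: before timing anything it also makes sense to rule out first-call JIT cost and a low-resolution timer. A quick sanity check could look roughly like this (just a sketch placed before the timing loops; the warm-up call is not part of my benchmark code below):
// Sketch only: warm up the JIT and check the timer before measuring.
float[] warmup = CastSafe(new int[] { 1, 2, 3 }); // forces JIT compilation of the method under test
Console.WriteLine("HighRes timer: {0}, Frequency: {1} ticks/s, Debugger attached: {2}",
    Stopwatch.IsHighResolution, Stopwatch.Frequency, Debugger.IsAttached);
The full test program is this: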
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace PerfTest
{
    class Program
    {
        const int RUNS = 10 * 1000 * 1000;

        static void Main(string[] args)
        {
            int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };

            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast(array);
            }
            s2.Stop();
            GC.Collect();

            var s3 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast2(array);
            }
            s3.Stop();
            GC.Collect();

            var s4 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                var arr = CastSafe(array);
            }
            s4.Stop();

            Console.WriteLine("Times: {0} {1} {2}", s2.ElapsedMilliseconds, s3.ElapsedMilliseconds, s4.ElapsedMilliseconds);
        }

        // Reference cast implementation (unsafe, pointer increment) to check performance
        public static unsafe float[] Cast(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;
                    for (int i = 0; i < N; i++)
                    {
                        *pO = (float)*pI;
                        pI++;
                        pO++;
                    }
                }
            }
            return output;
        }

        // Reference cast implementation (unsafe, pointer indexing) to check performance
        public static unsafe float[] Cast2(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;
                    for (int i = 0; i < N; i++)
                    {
                        pO[i] = (float)pI[i];
                    }
                }
            }
            return output;
        }

        // Safe implementation with plain array indexing (bounds checks left to the JIT)
        public static float[] CastSafe(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            for (int i = 0; i < input.Length; i++)
            {
                output[i] = (float)input[i];
            }
            return output;
        }
    }
}
From the times I get, it looks like the dumb safe variant is faster than either unsafe variant, although bounds-check elimination in the unsafe methods should make them at least as fast, if not faster. Just for fun I also compiled the same IL via LCG (DynamicMethod), which seems to be even slower than any of these methods, although the additional cost of the delegate invocation does not seem to play a big role here.
The for loop executes this code 10 million times, which should produce stable results. Why am I seeing any differences here at all? Using realtime as process priority did not help either (psexec -realtime executable). How can I get reliable numbers?
My tests covered the configurations mentioned above (debug/release builds, .NET 3.5 and 4.0, with and without a debugger attached).
If I use a profiler, I am not sure whether it will distort the measurements even more. Since it interrupts my application from time to time to collect call stacks, it will certainly destroy any cache locality which might aid performance. So if there is an approach with better (data) cache locality, I will not be able to find it with a profiler.
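As an alternative to psexec -realtime, the priority can also be raised from inside the process. This is only a sketch of the standard System.Diagnostics / System.Threading calls (pinning to one core is an extra idea, not something my numbers above used):
// Sketch: raise priority and optionally pin the benchmark to one core to reduce scheduler noise.
// Requires using System.Diagnostics; and using System.Threading;
using (Process p = Process.GetCurrentProcess())
{
    p.PriorityClass = ProcessPriorityClass.High;  // RealTime needs admin rights and can starve the system
    p.ProcessorAffinity = (IntPtr)1;              // run only on core 0
}
Thread.CurrentThread.Priority = ThreadPriority.Highest;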
Edit1: To take into account that I do not have a real-time OS, I now sample my measurements. Since a single thread gets a 15 ms time slice from the Windows scheduler, I can keep the scheduler out of the measurement if each sample takes less than 15 ms. If I measure for too short a time, I end up with very small tick counts that do not tell me much.
To get stable values I need a time span long enough to let the OS do whatever it does on a regular basis. Empirical tests have shown that 30+ seconds is a good duration for one measurement.
This time span is then divided into sample intervals well below 15 ms. I then get timing information for each sample, from which I can extract min/max and average. This way I can also see first-time initialization effects. The code now looks like this:
class Program
{
    const int RUNS = 100 * 1000 * 1000;  // 100 million runs will take about 30 s
    const int RunsPerSample = 100;       // 100 runs for one sample is about 0.01 ms << 15 ms

    static void Main(string[] args)
    {
        int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };
        long[] sampleTimes = new long[RUNS / RunsPerSample];
        int sample = 0;
        for (int i = 0; i < RUNS; i += RunsPerSample)
        {
            var sw = Stopwatch.StartNew();
            for (int j = i; j < i + RunsPerSample; j++)
            {
                float[] arr = Cast(array);
            }
            sw.Stop();
            sampleTimes[sample] = sw.ElapsedTicks;
            sample++;
        }
        // Min/Max/Average need using System.Linq;
        Console.WriteLine("SampleSize: {0}, Min {1}, Max {2}, Average {3}",
            RunsPerSample, sampleTimes.Min(), sampleTimes.Max(), sampleTimes.Average());
    }

    // Cast, Cast2 and CastSafe unchanged from above
}
The values from these tests still vary (<10%), but I think that if I create a histogram of my values and drop the 10% highest values, which are likely caused by the OS, the GC, ..., I can get really stable numbers which I can trust.
SampleSize: 100, Min 25, Max 86400, Average 28,614631
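One simple way to turn the "drop the highest values" idea into a single number is a truncated mean over the sorted samples. This is only a sketch with LINQ (the 90% cut-off is an arbitrary choice, not a measured one):
// Sketch: average only the fastest 90% of the samples to suppress OS/GC outliers.
// Requires using System.Linq;
long[] sorted = sampleTimes.OrderBy(t => t).ToArray();
double trimmedAverage = sorted.Take((int)(sorted.Length * 0.9)).Average();
Console.WriteLine("Trimmed average (fastest 90%): {0:F2} ticks per {1} runs", trimmedAverage, RunsPerSample);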
Edit2: The histograms show that the measured values are not random. They look like a Landau distribution, which, with the right approximation algorithm, should give me stable values. I wish something like ROOT existed for .NET, where I could interactively fit the right distribution function to my data and get the results.
The code to generate the histogram with the MSChart controls is below:
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;
using System.Windows.Forms.DataVisualization.Charting;

namespace ConsoleApplication4
{
    public partial class Histogram : Form
    {
        public Histogram(long[] sampleTimes)
        {
            InitializeComponent();
            Series histogramSeries = cHistogram.Series.Add("Histogram");

            // Set new series chart type and other attributes
            histogramSeries.ChartType = SeriesChartType.Column;
            histogramSeries.BorderColor = Color.Black;
            histogramSeries.BorderWidth = 1;
            histogramSeries.BorderDashStyle = ChartDashStyle.Solid;

            var filtered = RemoveHighValues(sampleTimes, 40);
            KeyValuePair<long, int>[] histoData = GenerateHistogram(filtered);

            ChartArea chartArea = cHistogram.ChartAreas[histogramSeries.ChartArea];
            chartArea.AxisY.Title = "Frequency";
            chartArea.AxisX.Minimum = histoData.Min(x => x.Key);
            chartArea.AxisX.Maximum = histoData.Max(x => x.Key);

            foreach (var v in histoData)
            {
                histogramSeries.Points.Add(new DataPoint(v.Key, v.Value));
            }

            chartArea.AxisY.Minimum = 0;
            chartArea.AxisY.Maximum = histoData[0].Value + 100;
        }

        // Count the occurrence of each value of input and return an array with the value as key
        // and its count as value, ordered by descending count.
        KeyValuePair<long, int>[] GenerateHistogram(long[] input)
        {
            Dictionary<long, int> counts = new Dictionary<long, int>();
            foreach (var value in input)
            {
                int old;
                counts.TryGetValue(value, out old); // old is 0 if the value has not been seen yet
                counts[value] = old + 1;
            }

            var orderedCounts = (from x in counts
                                 orderby x.Value descending
                                 select x).ToArray();
            return orderedCounts;
        }

        long[] RemoveHighValues(long[] input, int maxDifference)
        {
            var min = input.Min();
            var max = input.Max();
            long[] filtered = input;
            while (max - min > maxDifference) // remove all values which differ by more than maxDifference ticks
            {
                filtered = input.Where(x => x < max).ToArray();
                max = filtered.Max();
            }
            return filtered;
        }
    }
}
You're talking about an average difference of about a hundredth of a nanosecond per method call. Windows doesn't claim to be a realtime OS; these measurements are about as stable as you'll get.
And, by the way, the JITter will eliminate the bounds check inside your CastSafe method. I'd be very surprised if you can find anything faster than that.
(If the bottleneck is the CPU then you might improve performance by using Parallel.For rather than a plain for loop, but to determine that you'd need to test against real-world data. For example, cache behaviour is going to be vastly different for an array of 43 ints than for an array of 43,000,000 ints.)
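A parallel variant of the safe cast could look roughly like this (CastParallel is just an illustrative name; whether it is faster depends entirely on the array size and the machine, so measure it against your real data):
// Sketch: parallel version of the safe cast. Requires using System.Threading.Tasks;
// Only likely to pay off for large arrays where the work outweighs the partitioning overhead.
public static float[] CastParallel(int[] input)
{
    float[] output = new float[input.Length];
    Parallel.For(0, input.Length, i => output[i] = (float)input[i]);
    return output;
}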