
Why is my python/numpy example faster than pure C implementation?

Tags: performance, python, c, numpy

I have pretty much the same code in Python and C. Python example:

import numpy
nbr_values = 8192
n_iter = 100000

a = numpy.ones(nbr_values).astype(numpy.float32)
for i in range(n_iter):
    a = numpy.sin(a)

C example:

#include <stdio.h>
#include <math.h>
int main(void)
{
  int i, j;
  int nbr_values = 8192;
  int n_iter = 100000;
  double x;  
  for (j = 0; j < nbr_values; j++){
    x = 1;
    for (i=0; i<n_iter; i++)
      x = sin(x);
  }
  return 0;
}

Something strange happened when I ran both examples:

$ time python numpy_test.py 
real    0m5.967s
user    0m5.932s
sys     0m0.012s

$ g++ sin.c
$ time ./a.out 
real    0m13.371s
user    0m13.301s
sys     0m0.008s

It looks like Python/NumPy is twice as fast as C. Is there any mistake in the experiment above? How can you explain it?

P.S. I'm running Ubuntu 12.04 with 8 GB of RAM and a Core i5, by the way.

649

asked Jan 22 '13 19:01

Artem Mezhenin

2 Answers

First, turn on optimization. Secondly, subtleties matter. Your C code is definitely not 'basically the same'.

Here is equivalent C code:

sinary2.c:

#include <math.h>
#include <stdlib.h>

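/* Returns a newly allocated array holding sin() of each input element.
 * The caller owns the result and must free() it. */
float *sin_array(const float *input, size_t elements)
{
    size_t i;
    float *output = malloc(sizeof(float) * elements);
    for (i = 0; i < elements; ++i) {
        /* sin() takes a double, so each float is promoted for the call. */
        output[i] = sin(input[i]);
    }
    return output;
}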

sinary.c:

#include <math.h>
#include <stdlib.h>

extern float *sin_array(const float *input, size_t elements);

int main(void)
{
    int i;
    int nbr_values = 8192;
    int n_iter = 100000;
    float *x = malloc(sizeof(float) * nbr_values);  
    for (i = 0; i < nbr_values; ++i) {
        x[i] = 1;
    }
    for (i=0; i<n_iter; i++) {
        float *newary = sin_array(x, nbr_values);
        free(x);
        x = newary;
    }
    return 0;
}

Results:

$ time python foo.py 

real    0m5.986s
user    0m5.783s
sys     0m0.050s
$ gcc -O3 -ffast-math sinary.c sinary2.c -lm
$ time ./a.out 

real    0m5.204s
user    0m4.995s
sys     0m0.208s

The reason the program has to be split in two is to fool the optimizer a bit. Otherwise it will realize that the whole loop has no effect at all and optimize it out. Putting things in two files doesn't give the compiler visibility into the possible side-effects of sin_array when it's compiling main and so it has to assume that it actually has some and repeatedly call it.

Your original program is not at all equivalent for several reasons. One is that you have nested loops in the C version and you don't in Python. Another is that you are working with arrays of values in the Python version and not in the C version. Another is that you are creating and discarding arrays in the Python version and not in the C version. And lastly you are using float in the Python version and double in the C version.

Simply calling the sin function the appropriate number of times does not make for an equivalent test.

Also, the optimizer is a really big deal for C. Comparing unoptimized C code to anything else in a speed comparison is the wrong thing to do. Of course, you also need to be careful in the other direction. The C optimizer is very sophisticated, and if you're testing something that really doesn't do anything, it might well notice this fact and simply not do anything at all, resulting in a program that's ridiculously fast.

135

answered Sep 27 '22 18:09

Omnifarious

Because "numpy" is a dedicated math library implemented for speed. C has standard functions for sin/cos that are generally designed for accuracy.

You are also not comparing apples with apples, as you are using double in C and float32 (float) in Python. If we change the Python code to calculate float64 instead, the time increases by about 2.5 seconds on my machine, making it roughly match the correctly optimized C version.

If the whole test were changed to do something more complicated that requires more control structures (if/else, do/while, etc.), then you would probably see even less difference between C and Python, because the C compiler can't really do "sin" any faster unless you implement a better "sin" function.

Never mind the fact that your code isn't quite the same on both sides... ;)

answered Sep 27 '22 17:09

Mats Petersson
