Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my python/numpy example faster than pure C implementation?

I have pretty much the same code in python and C. Python example:

import numpy
nbr_values = 8192
n_iter = 100000

a = numpy.ones(nbr_values).astype(numpy.float32)
for i in range(n_iter):
    a = numpy.sin(a)

C example:

#include <stdio.h>
#include <math.h>
int main(void)
{
  int i, j;
  int nbr_values = 8192;
  int n_iter = 100000;
  double x;  
  for (j = 0; j < nbr_values; j++){
    x = 1;
    for (i=0; i<n_iter; i++)
    x = sin(x);
  }
  return 0;
}

Something strange happen when I ran both examples:

$ time python numpy_test.py 
real    0m5.967s
user    0m5.932s
sys     0m0.012s

$ g++ sin.c
$ time ./a.out 
real    0m13.371s
user    0m13.301s
sys     0m0.008s

It looks like python/numpy is twice faster than C. Is there any mistake in the experiment above? How you can explain it?

P.S. I have Ubuntu 12.04, 8G ram, core i5 btw

like image 649
Artem Mezhenin Avatar asked Jan 22 '13 19:01

Artem Mezhenin


People also ask

Is NumPy faster than C?

I will leave only new solution + NumPy implementation. See how just by performing memory optimizations we can significantly reduce execution time. From 2.8 seconds to 0.42! It is almost 7 times faster than straightforward C implementation.

Why is NumPy so much faster?

NumPy is fast because it can do all its calculations without calling back into Python. Since this function involves looping in Python, we lose all the performance benefits of using NumPy. For a 10,000,000-entry NumPy array, this functions takes 2.5 seconds to run on my computer.

Why is pandas NumPy faster than pure Python?

Because the Numpy array is densely packed in memory due to its homogeneous type, it also frees the memory faster. So overall a task executed in Numpy is around 5 to 100 times faster than the standard python list, which is a significant leap in terms of speed.

Why NumPy array operations are faster than looping?

Looping over Python arrays, lists, or dictionaries, can be slow. Thus, vectorized operations in Numpy are mapped to highly optimized C code, making them much faster than their standard Python counterparts.


2 Answers

First, turn on optimization. Secondly, subtleties matter. Your C code is definitely not 'basically the same'.

Here is equivalent C code:

sinary2.c:

#include <math.h>
#include <stdlib.h>

float *sin_array(const float *input, size_t elements)
{
    int i = 0;
    float *output = malloc(sizeof(float) * elements);
    for (i = 0; i < elements; ++i) {
        output[i] = sin(input[i]);
    }
    return output;
}

sinary.c:

#include <math.h>
#include <stdlib.h>

extern float *sin_array(const float *input, size_t elements)

int main(void)
{
    int i;
    int nbr_values = 8192;
    int n_iter = 100000;
    float *x = malloc(sizeof(float) * nbr_values);  
    for (i = 0; i < nbr_values; ++i) {
        x[i] = 1;
    }
    for (i=0; i<n_iter; i++) {
        float *newary = sin_array(x, nbr_values);
        free(x);
        x = newary;
    }
    return 0;
}

Results:

$ time python foo.py 

real    0m5.986s
user    0m5.783s
sys 0m0.050s
$ gcc -O3 -ffast-math sinary.c sinary2.c -lm
$ time ./a.out 

real    0m5.204s
user    0m4.995s
sys 0m0.208s

The reason the program has to be split in two is to fool the optimizer a bit. Otherwise it will realize that the whole loop has no effect at all and optimize it out. Putting things in two files doesn't give the compiler visibility into the possible side-effects of sin_array when it's compiling main and so it has to assume that it actually has some and repeatedly call it.

Your original program is not at all equivalent for several reasons. One is that you have nested loops in the C version and you don't in Python. Another is that you are working with arrays of values in the Python version and not in the C version. Another is that you are creating and discarding arrays in the Python version and not in the C version. And lastly you are using float in the Python version and double in the C version.

Simply calling the sin function the appropriate number of times does not make for an equivalent test.

Also, the optimizer is a really big deal for C. Comparing C code on which the optimizer hasn't been used to anything else when you're wondering about a speed comparison is the wrong thing to do. Of course, you also need to be mindful. The C optimizer is very sophisticated and if you're testing something that really doesn't do anything, the C optimizer might well notice this fact and simply not do anything at all, resulting in a program that's ridiculously fast.

like image 135
Omnifarious Avatar answered Sep 27 '22 18:09

Omnifarious


Because "numpy" is a dedicated math library implemented for speed. C has standard functions for sin/cos, that are generally derived for accuracy.

You are also not comparing apples with apples, as you are using double in C, and float32 (float) in python. If we change the python code to calculate float64 instead, the time increases by about 2.5 seconds on my machine, making it roughly match with the correctly optimized C version.

If the whole test was made to do something more complicated that requires more control structres (if/else, do/while, etc), then you would probably see even less difference between C and Python - because the C compiler can't really do "sin" any faster - unless you implement a better "sin" function.

Newer mind the fact that your code isn't quite the same on both sides... ;)

like image 43
Mats Petersson Avatar answered Sep 27 '22 17:09

Mats Petersson