I am currently comparing two loop calculations in Python 3 and C. For Python, I have:
# Python3
import time

t1 = time.process_time()
a = 100234555
b = 22333335
c = 341500
for i in range(1, 10000000001):
    a = a - (b % 2)
    b = b - (c % 2)
print("Sum is", a + b)
t2 = time.process_time()
print(t2 - t1, "Seconds")
Then in C, I do the same thing:
#include <stdio.h>

int main() {
    long long a = 100234555;
    long long b = 22333335;
    long long c = 341500;
    for (long long i = 1; i <= 10000000000; i++) {
        a = a - (b % 2);
        b = b - (c % 2);
    }
    printf("Sum is %lld\n", a + b);
    return 0;
}
I timed both programs. The Python version takes around 3500 seconds, while the C version (including compilation and execution) takes only around 0.3 seconds.
I am wondering how there can be such a big difference in timing. Both were run on a server with 100 GB of RAM and plenty of processing power.
It is partly because Python bytecode is executed by an interpreter program rather than directly by the CPU, but most of the overhead comes from the memory allocation and deallocation caused by the immutability of integers, which is a consequence of Python's object model, not of interpretation itself.
What's going on is that your C code can change the value of the numbers in place, but in Python numbers are immutable, meaning they never change. So every time you compute a new value, Python has to create a new int object and later destroy the old ints once they are no longer referenced. That is much slower than simply modifying a single memory location.
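For example, here is a minimal sketch (assuming CPython) showing that subtracting from an integer rebinds the name to a brand-new object rather than modifying the old one in place:

a = 100234555
print(id(a))      # identity of the current int object
a = a - 1         # creates a new int object and rebinds the name
print(id(a))      # a different identity; the old object is now garbage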
There is also the possibility that your C compiler is being clever and, through a chain of optimisations, reasons that it can remove your for loop entirely while producing exactly the same result as if the loop had actually run. If that had happened in your example I would expect the code to run even faster than it did, but the compiler is certainly allowed to do it.
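In this particular loop the reduction is easy to follow by hand, and it is essentially what the compiler's constant propagation works out: c is even, so b % 2 stays 1 on every iteration and a simply loses 1 ten billion times. A quick sketch of that closed form:

# c % 2 == 0, so b never changes; b stays odd, so a decreases by 1 per iteration
a, b, c = 100234555, 22333335, 341500
iterations = 10_000_000_000
print("Sum is", a - iterations + b)   # Sum is -9877432110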
Python has no such smart compiler. It can't do anything so grand and clever; it simply isn't designed to optimise the code, because doing so reliably is very hard in a dynamically typed language (although the fact that Python is strongly typed does make it somewhat of a possibility).
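You can see this by disassembling the loop with the standard dis module (a rough illustration, not from the original question): the body is compiled to generic stack-machine bytecode that the interpreter must dispatch anew on every one of the ten billion iterations, with no attempt to fold the loop away:

import dis

def loop():
    a, b, c = 100234555, 22333335, 341500
    for i in range(1, 10000000001):
        a = a - (b % 2)
        b = b - (c % 2)
    return a + b

# Prints the LOAD/STORE and arithmetic bytecode instructions run on each iteration
dis.dis(loop)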
As dmuir noticed, the code can be simplified drastically if the compiler propagates the constants correctly. For example, clang -O1 compiles the C code down to this (cf. https://gcc.godbolt.org/z/1ZH8Rm):
main:                                   # @main
        push    rax
        movabs  rsi, -9877432110
        mov     edi, offset .L.str
        xor     eax, eax
        call    printf
        xor     eax, eax
        pop     rcx
        ret
.L.str:
        .asciz  "Sum is %lld\n"
gcc -O1 produces essentially similar code.
Since this boils down to a single call to printf, the explanation seems to be that the optimiser removed the loop entirely, so the 0.3 seconds you measured for C is essentially just compilation time, not ten billion iterations.
Testing the C version with optimisations disabled (-O0) produces this output:
$ time (clang -O0 -o loop10g loop10g.c && ./loop10g)
Sum is -9877432110
real 4m15.352s
user 3m47.232s
sys 0m3.252s
Unoptimized C is still much faster than Python: roughly 255 seconds versus more than 3500 seconds.
The Python code is interpreted as bytecode operating on a stack of dynamically typed values: a slowdown factor of 10 to 20 relative to compiled C is typical. Furthermore, Python's integer arithmetic automatically switches to arbitrary-precision (bignum) mode for large values, which may be happening here, although in that case the penalty would usually be even higher.
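As a rough sanity check of that slowdown (a sketch, not part of the timings above), you can time a much smaller iteration count in Python and extrapolate, since the per-iteration cost is roughly constant:

import time

def run(n):
    a, b, c = 100234555, 22333335, 341500
    t1 = time.process_time()
    for i in range(1, n + 1):
        a = a - (b % 2)
        b = b - (c % 2)
    t2 = time.process_time()
    return t2 - t1

elapsed = run(10_000_000)                     # 10 million iterations
print("Estimated time for 10 billion:", elapsed * 1000, "seconds")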