Both Java and python (talking about CPython only) are interpreted to Java and CPython bytecode respectively. Both bytecodes are then interpreted by their respective virtual Machines (JVM & Cpython VM). (Here I am ignoring the JIT compilation part which kicks in after 10K runs.)
I have 2 questions regarding this:
Although it plays a big role in the runtime, I suppose static vs dynamic typing shouldn't play too big a role during the compilation and should not be the only reason for this difference in timings. Also, I think in both the implementations, some optimisation is done during the bytecode generation.
Is there something that I am missing here? (I do not have much experience working in Java.)
Update:
I actually did time profiling for python first run and later runs and found that statement 2 is wrong. There is a very noticeable difference when running a large python file.
Approach was simple. Created a large file with repeated lines of
a = 5
b = 6
c = a*b
print(str(c))
Then imported it to file large.py
and ran time python large.py
First run result:
python large.py 1.49s user 0.33s system 97% cpu 1.868 total
Second run result:
python large.py 0.20s user 0.08s system 90% cpu 0.312 total
After deleting the __pycache__ folder:
python large.py 1.57s user 0.34s system 97% cpu 1.959 total
So basically in python also, the compilation to bytecode is a costly process, just that it's not as costly as in java.
The Java byte code compiler has to do a lot more checks than the Python byte code compiler. To illustrate, take this line from the "hello world" program:
System.out.println("Hello World!");
To compile this single line of code, the compiler has to find what all of its parts mean. This is more complicated than it sounds: System
could be a package. Or it could be a class, either in the same package where the code is, or in one of the imported packages, or in java.lang
. So the compiler has to check all of those options, in that order. Once it finds the System
class, it has to check if its access modifiers permit this use.
After that, the compiler has to figure out what out
is: is it a nested class, or a class member, and what are its access modifiers? The compiler finds that it's a static member variable, of the PrintStream
type. Then it has to do the same checks for println
. The compiler cannot emit any code for this line of code until it knows all of this because the generated byte code is different based on the types of the objects involved.
All these checks take time, most importantly because the compiler has to load a ton of class definitions from the standard library even for the most trivial program.
In comparison, the Python byte code compiler only needs to parse the line, and it can immediately generate code without looking at extra modules. In Python the code would be compiled to:
The Python compiler doesn't care if some of these lookups failed at run time.
Another important difference is that the Java compiler is written entirely in Java, and compiled to machine code at run time, while much of CPython implementation is ahead-of-time compiled C code. This means Java has a bit of "cold start" problem compared to Python.
Update: Since Java 9, you can run a java program directly from source, without compiling it to byte code. Running a trivial "hello world" program gives you an idea of how much you save by compiling Java to byte code ahead of time, even for a trivial program:
time python hello.py
.time java Hello.java
time java Hello
Disclaimer: No scientific method followed or statistical analysis performed, so take this with a grain of salt. Test environment: Python version 3.8.5, Java version 11.0.8, on Fedora 32, with Intel i7 8750H CPU
hello.py:
print("hello world")
Hello.java:
public class Hello {
public static void main(String[] args) {
System.out.println("Hello world");
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With