
Compilation to Bytecode, Java vs Python. What is the reason for the difference in time taken?

Both Java and Python (talking about CPython only) are compiled to bytecode: Java bytecode and CPython bytecode respectively. Both bytecodes are then interpreted by their respective virtual machines (the JVM and the CPython VM). (Here I am ignoring JIT compilation, which kicks in only once a piece of code has run around 10K times.)

I have 2 questions regarding this:

  1. Why does compiling Java to Java bytecode take so much longer than compiling Python to CPython bytecode? In Java, compilation is an explicit step, while in Python it happens at run time.
  2. Why is there no noticeable difference between the first run and the nth run of a Python program, when the first run has to compile to CPython bytecode and cache it in .pyc files that all later runs reuse? Is this bytecode compilation really an almost zero-cost task in Python?

Although it plays a big role in the runtime, I suppose static vs dynamic typing shouldn't play too big a role during the compilation and should not be the only reason for this difference in timings. Also, I think in both the implementations, some optimisation is done during the bytecode generation.

Is there something that I am missing here? (I do not have much experience working in Java.)

Update:

I actually time-profiled the first and later runs of Python and found that the premise of question 2 is wrong. There is a very noticeable difference when running a large Python file.

Approach was simple. Created a large file with repeated lines of

a = 5
b = 6
c = a*b
print(str(c))

Then I imported that file from large.py and ran time python large.py
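For anyone who wants to reproduce the setup described above, it can be sketched roughly like this. The module name big_module.py, the helper name write_test_files and the repetition count are placeholders of mine, not values from the original experiment.

```python
import tempfile
from pathlib import Path

# The four statements that were repeated in the experiment.
SNIPPET = "a = 5\nb = 6\nc = a*b\nprint(str(c))\n"


def write_test_files(directory, repetitions=100_000):
    """Write big_module.py (hypothetical name) and a large.py that imports it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    big = directory / "big_module.py"
    big.write_text(SNIPPET * repetitions)

    # Only imported modules get a cached .pyc in __pycache__; the script
    # named on the command line is recompiled on every run.
    launcher = directory / "large.py"
    launcher.write_text("import big_module\n")
    return big, launcher


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        big, launcher = write_test_files(tmp, repetitions=1000)
        print(big.name, big.stat().st_size, "bytes;", launcher.name)
```

Running time python large.py in that directory twice, and once more after deleting __pycache__, should then show the same kind of pattern as the timings reported here, though the absolute numbers will depend on the machine.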

First run result:

python large.py  1.49s user 0.33s system 97% cpu 1.868 total

Second run result:

python large.py  0.20s user 0.08s system 90% cpu 0.312 total

After deleting the __pycache__ folder:

python large.py  1.57s user 0.34s system 97% cpu 1.959 total

So basically, compilation to bytecode is a costly process in Python too; it is just not as costly as in Java.
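One way to see where the time goes, without involving the file system, is to time compile() against unmarshalling the resulting code object, which is roughly what loading a cached .pyc does. This is only a rough sketch: the function name time_compile_vs_load and the repetition count are made up for illustration, and a real import also pays for file I/O and the .pyc header check.

```python
import marshal
import time

# Same four statements as in the experiment above.
SNIPPET = "a = 5\nb = 6\nc = a*b\nprint(str(c))\n"


def time_compile_vs_load(repetitions=20_000):
    """Return (compile_seconds, load_seconds) for a synthetic source string."""
    source = SNIPPET * repetitions

    start = time.perf_counter()
    code = compile(source, "<large>", "exec")  # parsing + bytecode generation
    compile_seconds = time.perf_counter() - start

    # A .pyc file is a small header followed by the marshalled code object.
    payload = marshal.dumps(code)

    start = time.perf_counter()
    marshal.loads(payload)  # roughly the work a cached import has to do
    load_seconds = time.perf_counter() - start

    return compile_seconds, load_seconds


if __name__ == "__main__":
    compile_s, load_s = time_compile_vs_load()
    print(f"compile: {compile_s:.3f}s  load marshalled bytecode: {load_s:.3f}s")
```

The code is never executed here, so the print calls in the snippet do not affect the measurement; only the compilation and the loading are timed.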

asked Mar 03 '23 04:03 by sprksh


1 Answer

The Java byte code compiler has to do a lot more checks than the Python byte code compiler. To illustrate, take this line from the "hello world" program:

System.out.println("Hello World!");

To compile this single line of code, the compiler has to find out what each of its parts means. This is more complicated than it sounds: System could be a variable or field that is in scope. Or it could be a class, either in the same package where the code is, or in one of the imported packages, or in java.lang. Or it could be a package. So the compiler has to check all of those options. Once it finds the System class, it has to check whether its access modifiers permit this use.

After that, the compiler has to figure out what out is: is it a nested class, or a class member, and what are its access modifiers? The compiler finds that it's a static member variable, of the PrintStream type. Then it has to do the same checks for println. The compiler cannot emit any code for this line of code until it knows all of this because the generated byte code is different based on the types of the objects involved.

All these checks take time, most importantly because the compiler has to load a ton of class definitions from the standard library even for the most trivial program.

In comparison, the Python byte code compiler only needs to parse the line, and it can immediately generate code without looking at extra modules. In Python the code would be compiled to:

  • looking up a "System" object from the current scope (LOAD_NAME)
  • looking up an "out" attribute from System (LOAD_ATTR)
  • looking up "println" from "out" (LOAD_METHOD)
  • calling it with the string argument (CALL_METHOD)

The Python compiler doesn't care if some of these lookups fail at run time.
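This can be checked with the standard dis module. The sketch below compiles the Java-looking line as Python, lists the instruction names, and then shows that the failure only appears when the code runs. The helper names opnames and compiles_but_fails_at_runtime are mine, and the exact opcode names depend on the CPython version (for example, LOAD_METHOD and CALL_METHOD appear in 3.8 but were replaced in later releases).

```python
import dis

# Valid Python syntax, even though no "System" object exists anywhere.
SOURCE = 'System.out.println("Hello World!")'


def opnames(source):
    """Compile source as a module and return its bytecode instruction names."""
    code = compile(source, "<demo>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


def compiles_but_fails_at_runtime(source):
    """Return True if source compiles cleanly but raises NameError when run."""
    code = compile(source, "<demo>", "exec")  # no name resolution happens here
    try:
        exec(code, {})  # the lookup of "System" only happens now
    except NameError:
        return True
    return False


if __name__ == "__main__":
    print(opnames(SOURCE))
    print(compiles_but_fails_at_runtime(SOURCE))
```

The compile step succeeds without consulting any other module, which is the point being made: the lookup of System is deferred to LOAD_NAME at run time, where it fails with NameError.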

Another important difference is that the Java compiler is written entirely in Java, and is itself compiled to machine code at run time, while much of the CPython implementation is ahead-of-time compiled C code. This means Java has a bit of a "cold start" problem compared to Python.

Update: Since Java 11, you can run a single-file Java program directly from source, without a separate compilation step (the source is compiled in memory on every launch). Running a trivial "hello world" program gives you an idea of how much time you save by compiling Java to byte code ahead of time, even for a trivial program:

  • The Python program runs in 45-50 milliseconds, as measured with time python hello.py
  • The Java program without compiling to byte code ahead of time runs in 350-400 milliseconds, as measured with time java Hello.java
  • The Java program after compiling to byte code runs in 70-80 milliseconds, as measured with time java Hello

Disclaimer: No scientific method followed or statistical analysis performed, so take this with a grain of salt. Test environment: Python version 3.8.5, Java version 11.0.8, on Fedora 32, with Intel i7 8750H CPU

hello.py:

print("hello world")

Hello.java:

public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
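To repeat the comparison with more than a single run per command, a small timing harness along these lines could be used. It is a sketch, not the method behind the numbers above: the helper name time_command and the run count are my own choices, and the demo command only starts the Python interpreter so that it works without the files or a JDK being present.

```python
import subprocess
import sys
import time


def time_command(command, runs=5):
    """Run command `runs` times; return each run's wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so terminal speed does not skew results.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append((time.perf_counter() - start) * 1000)
    return timings


if __name__ == "__main__":
    # Stand-in command; with the files above in the current directory one
    # could pass ["python", "hello.py"], ["java", "Hello.java"] or
    # ["java", "Hello"] instead (assuming those programs are installed).
    timings = time_command([sys.executable, "-c", "print('hello world')"])
    median = sorted(timings)[len(timings) // 2]
    print(f"min {min(timings):.0f} ms, median {median:.0f} ms")
```

Reporting the minimum and the median rather than a single run reduces the effect of disk caching and other noise on the first launch.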
answered Mar 05 '23 16:03 by Joni