
Internal Architecture of Java Compiler [closed]

I have been working with Java for more than 8 years.

Last week, in a small meeting at my company, one of my colleagues asked me how exactly the Java compiler works. I had no answer.

I tried explaining that the Java compiler takes statements one by one and converts them to bytecode that is targeted not at any OS but at the JVM.

No one was satisfied with that answer, not even me.

Now the main question is how exactly the Java compiler works, i.e. how many steps, stages, or phases does the compiler go through when compiling a Java file?

What exactly is the Java compiler's architecture?

What if there are multiple Java classes in the same .java file? How many classes will be compiled?

What if there are imports pointing to uncompiled Java classes? Will the uncompiled classes be compiled or ignored?

I googled for more than half a day, and everything I found gives the same answer I gave my colleagues.

But finally I found a useful tutorial here.

But that tutorial does not go into much depth either, and I could not visualize the process from it.

I am still not satisfied and am eager to learn more about this from you.

So if anyone knows more than I do and more than the above blog, something that would help me visualize the internal architecture of the Java compiler, please explain it to me.

asked Sep 25 '15 09:09 by Jagadeesh



1 Answer

Some basic steps:

  1. parse: Reads a set of *.java source files and maps the resulting token sequence into AST (Abstract Syntax Tree) nodes.
  2. enter: Enters symbols for the definitions into the symbol table.
  3. process annotations: If requested, processes annotations found in the specified compilation units.
  4. attribute: Attributes the syntax trees. This step includes name resolution, type checking and constant folding.
  5. flow: Performs dataflow analysis on the trees from the previous step. This includes checks for assignments and reachability.
  6. desugar: Rewrites the AST and translates away some syntactic sugar.
  7. generate: Generates source files or class files.
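
To make these steps a bit more tangible, here is a minimal sketch that drives javac through its public API and runs parsing, analysis and generation as separate calls. It assumes JDK 11 or later (a full JDK, not a JRE). `PhaseDemo`, `StringSource` and `compileInPhases` are names made up for this example; `JavacTask.parse()`, `analyze()` and `generate()` are the real API methods. As far as I can tell, `analyze()` covers enter, attribute and flow, and `generate()` covers desugar and generate, but treat that mapping as my reading rather than a specification.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PhaseDemo {

    /** Hypothetical helper: keeps a source file in memory instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs parse, analyze and generate one after another; returns the class file names written. */
    static List<String> compileInPhases(String className, String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found (running on a JRE?)");
        }
        // Class files go to a temporary directory so the demo does not litter the working directory.
        Path outDir = Files.createTempDirectory("phase-demo");

        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null,
                List.of("-d", outDir.toString()),
                null,
                List.of(new StringSource(className, code)));

        // parse: source text -> one AST per compilation unit.
        for (CompilationUnitTree unit : task.parse()) {
            for (Tree decl : unit.getTypeDecls()) {
                System.out.println("parsed a declaration of kind " + decl.getKind());
            }
        }

        // analyze: symbol entry, attribution (type checking) and flow analysis.
        task.analyze();

        // generate: writes one class file per class in the compilation unit.
        List<String> generated = new ArrayList<>();
        for (JavaFileObject classFile : task.generate()) {
            generated.add(Path.of(classFile.toUri()).getFileName().toString());
        }
        return generated;
    }

    public static void main(String[] args) throws IOException {
        String code = "class A { int twice(int x) { return 2 * x; } }\n"
                    + "class B { A a = new A(); }\n";
        System.out.println(compileInPhases("A", code));
    }
}
```

The sample source deliberately declares two top-level classes in one file: javac writes a separate class file for each of them (here `A.class` and `B.class`), which also answers the "multiple classes in the same .java file" part of the question.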

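In more detail, these are the phases of a typical compiler that targets machine code. Since javac emits JVM bytecode, the machine-dependent ones (frame layout, instruction selection, register allocation) do not apply to it directly: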

  1. Lex - Break the source file into individual words, or tokens.
  2. Parse - Analyze the phrase structure of the program.
  3. Semantic Actions - Build a piece of abstract syntax tree corresponding to each phrase.
  4. Semantic Analysis - Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase.
  5. Frame Layout - Place variables, function-parameters, etc. into activation records (stack frames) in a machine-dependent way.
  6. Translate - Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target-machine architecture.
  7. Canonicalize - Hoist side effects out of expressions, and clean up conditional branches, for the convenience of the next phases.
  8. Instruction Selection - Group the IR-tree nodes into clumps that correspond to the actions of target-machine instructions.
  9. Control Flow Analysis - Analyze the sequence of instructions into a control flow graph that shows all the possible flows of control the program might follow when it executes.
  10. Dataflow Analysis - Gather information about the flow of information through variables of the program; for example, liveness analysis calculates the places where each program variable holds a still-needed value (is live).
  11. Register Allocation - Choose a register to hold each of the variables and temporary values used by the program; variables not live at the same time can share the same register.
  12. Code Emission - Replace the temporary names in each machine instruction with machine registers.
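
If it helps to visualize the front of that pipeline, here is a toy sketch (not how javac is implemented) that lexes an arithmetic expression, parses it into a small AST with a recursive-descent parser, and emits instructions for an imaginary stack machine. Everything in it is invented for illustration: the `TinyCompiler` class, the grammar, and the `PUSH`/`ADD`/`MUL` instruction names, which are not real JVM opcodes. I picked a stack machine only because JVM bytecode is stack-based too.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {

    // Lex: split the source text into tokens (numbers, '+', '*', parentheses).
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // AST node: a number leaf (op == null) or a binary operation.
    static final class Node {
        final String op;
        final int value;
        final Node left, right;
        Node(int value) { this.op = null; this.value = value; this.left = null; this.right = null; }
        Node(String op, Node left, Node right) { this.op = op; this.value = 0; this.left = left; this.right = right; }
    }

    // Parse + semantic actions: recursive descent that builds AST nodes for
    //   expr   := term ('+' term)*
    //   term   := factor ('*' factor)*
    //   factor := NUMBER | '(' expr ')'
    static final class Parser {
        private final List<String> tokens;
        private int pos = 0;
        Parser(List<String> tokens) { this.tokens = tokens; }

        private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

        Node parseAll() {
            Node n = expr();
            if (pos != tokens.size()) throw new IllegalArgumentException("unexpected token: " + peek());
            return n;
        }
        Node expr() {
            Node n = term();
            while (peek().equals("+")) { pos++; n = new Node("+", n, term()); }
            return n;
        }
        Node term() {
            Node n = factor();
            while (peek().equals("*")) { pos++; n = new Node("*", n, factor()); }
            return n;
        }
        Node factor() {
            String t = peek();
            if (t.equals("(")) {
                pos++;
                Node n = expr();
                if (!peek().equals(")")) throw new IllegalArgumentException("expected )");
                pos++;
                return n;
            }
            if (t.isEmpty() || !Character.isDigit(t.charAt(0))) {
                throw new IllegalArgumentException("expected a number, got: '" + t + "'");
            }
            pos++;
            return new Node(Integer.parseInt(t));
        }
    }

    // Code emission: post-order walk of the AST, operands first, operator last.
    static void emit(Node n, List<String> out) {
        if (n.op == null) { out.add("PUSH " + n.value); return; }
        emit(n.left, out);
        emit(n.right, out);
        out.add(n.op.equals("+") ? "ADD" : "MUL");
    }

    static List<String> compile(String src) {
        List<String> code = new ArrayList<>();
        emit(new Parser(lex(src)).parseAll(), code);
        return code;
    }

    public static void main(String[] args) {
        // prints [PUSH 1, PUSH 2, PUSH 3, PUSH 4, ADD, MUL, ADD]
        System.out.println(compile("1 + 2 * (3 + 4)"));
    }
}
```

The sketch skips semantic analysis and every machine-dependent phase; it only covers the lex, parse, semantic actions and emission end of the list, which is roughly the part a bytecode compiler shares with a native one.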

There is a nice book:

Modern Compiler Implementation in Java

You may also want to look inside the javac source code:

Javac Documentation

OpenJDK source code

Hacker's guide to javac

Don't Panic! To help newcomers to javac navigate their way around the code base

JVM JLS

answered Oct 09 '22 00:10 by ACV