
Java: Do BOTH the compiler AND the JRE require access to all 3rd-party class files?

Tags: java, maven, javac

I have 15 years' C++ experience but am new to Java. I am trying to understand how the absence of header files is handled by Java. I have a few questions related to this issue.

Specifically, suppose that I write source code for a class 'A' that imports a 3rd-party class 'Z' (and uses Z). I understand that at compile-time, the Java compiler must have "access" to the information about Z in order to compile A.java, creating A.class. Therefore, either Z.java or Z.class (or a JAR containing one of these; say Z.jar) must be present on the local filesystem at compile time - correct?

Does the compiler use a class loader to load Z (to reiterate - at compile time)?

If I'm correct that a class loader is used at COMPILE time, what if a user-defined class loader (L) is desired - and is part of the project being compiled? Suppose, for example, that L is responsible for downloading Z.class AT RUNTIME across a network? In this scenario, how will the Java compiler obtain Z.class at compile time? Will it attempt to compile L first, and then use L at compile time to obtain Z?

I understand that using Maven to build the project, Z.jar can be located on a remote repository over the internet at compile time - either on ibiblio, or on a custom repository defined in the POM file. I hope I'm correct that it is MAVEN that is responsible for downloading the 3rd-party JAR file at compile time, rather than the compiler's JVM?

Note, however, that at RUNTIME, A.class again requires Z.class - how will JRE know where to download Z.class from (without Maven to help)? Or is it the developer's responsibility to ship Z.class along with A.class with the application (say in the JAR file)? (...assuming a user-defined class loader is not used.)

Now a related question, just for confirmation: I assume that once compiled, A.class contains only symbolic links to Z.class - the bytecodes of Z.class are not part of A.class; please correct me if I'm wrong. (In C++, static linking would copy the bytes from Z.class into A.class, whereas dynamic linking would not.)

Another related question regarding the compilation process: once the necessary files describing Z are located on the CLASSPATH at compile time, does the compiler require the bytecodes from Z.class in order to compile A.java (and will build Z.class, if necessary, from Z.java), or does Z.java suffice for the compiler?

My overall confusion can be summarized as follows. It seems that the full [byte]code for Z needs to be present TWICE - once during compilation, and a second time during runtime - and that this must be true for ALL classes referenced by a Java program. In other words, every single class must be downloaded/present TWICE. Not a single class can be represented during compile time as just a header file (as it can be in C++).

Dan Nissenbaum asked Feb 25 '23 15:02


2 Answers

Does the compiler use a class loader to load Z (to reiterate - at compile time)?

Almost. It uses a JavaFileManager, which acts like a class loader in many ways. It does not actually load classes, though, since it needs to create class signatures from .java files as well as from .class files.
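As a rough illustration of that lookup, the standard javax.tools API lets a program drive the compiler through a StandardJavaFileManager. This is only a sketch, not how javac is normally invoked: the class name CompileTimeLookupDemo, the StringSource helper, and the package com.example.missing are made up for the example, and it assumes the program runs on a JDK rather than a bare JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.Collections;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

public class CompileTimeLookupDemo {

    /** Hypothetical helper: a source file held in memory instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns true if the given source compiles against the default class path. */
    static boolean compiles(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler; a JDK is required");
        }
        // Collect diagnostics quietly instead of printing them to stderr.
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        try (StandardJavaFileManager files =
                 compiler.getStandardFileManager(diagnostics, null, null)) {
            // Send any generated .class files to a throwaway directory.
            files.setLocation(StandardLocation.CLASS_OUTPUT,
                Collections.singletonList(Files.createTempDirectory("demo").toFile()));
            return compiler.getTask(null, files, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, code))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // java.util.List is found through the file manager, so this compiles.
        System.out.println(compiles("A", "class A { java.util.List<String> names; }"));
        // Nothing on the source or class path describes this type, so this does not.
        System.out.println(compiles("A", "class A { com.example.missing.Z z; }"));
    }
}
```

The second call fails with a "cannot find symbol" style diagnostic because the file manager has neither a source file nor a class file describing the referenced type.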

I hope I'm correct that it is MAVEN that is responsible for downloading the 3rd-party JAR file at compile time, rather than the compiler's JVM?

Yes, Maven pulls down jars, although it is possible to implement a JavaFileManager that behaves like a URLClassLoader. Maven manages a local cache of jars, and will fill that cache from the network as needed.

Another related question regarding the compilation process: once the necessary files describing Z are located on the CLASSPATH at compile time, does the compiler require the bytecodes from Z.class in order to compile A.java (and will build Z.class, if necessary, from Z.java), or does Z.java suffice for the compiler?

It does not require all of the bytecode, just the class, method, and field signatures and metadata. If A depends on Z, that dependency can be satisfied by a Z.java found on the source path, by a Z.class found on the class path or the system class path, or via some custom extension like a Z.jsp.

My overall confusion can be summarized as follows. It seems that the full [byte]code for Z needs to be present TWICE - once during compilation, and a second time during runtime - and that this must be true for ALL classes referenced by a Java program. In other words, every single class must be downloaded/present TWICE. Not a single class can be represented during compile time as just a header file (as it can be in C++).

Maybe an example can help clear this up. The Java Language Specification requires the compiler to perform certain optimizations, such as inlining static final primitives and Strings that are initialized with constant expressions.

If class A depends on B only for a constant:

class B {
  public static final String FOO = "foo";
}

class A {
  A() { System.out.println(B.FOO); }
}

then A, once compiled, can be loaded and instantiated without B.class on the classpath. If you later changed the value of FOO and shipped a new B.class, A would still print the old value until it was recompiled, because the constant was copied into A.class at compile time.

So it is possible to have a compile-time dependency and not a link-time dependency.
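One way to observe the inlining from inside a single program is to watch class initialization: reading an inlined constant never touches the declaring class at runtime, while reading an ordinary static field does. The sketch below assumes that framing; the names ConstantInliningDemo, EVENTS, and BAR are made up for the illustration, and B is nested only to keep the example in one file.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantInliningDemo {
    /** Records what happened, in order. */
    static final List<String> EVENTS = new ArrayList<>();

    static class B {
        // Compile-time constant: javac copies the value into classes that use it.
        static final String FOO = "foo";
        // Initialized by a method call, so it is not a constant expression
        // and users must read the field at runtime.
        static final String BAR = String.valueOf("bar");

        static {
            EVENTS.add("B initialized");
        }
    }

    static List<String> run() {
        EVENTS.add("read FOO=" + B.FOO); // inlined, does not touch B at runtime
        EVENTS.add("read BAR=" + B.BAR); // real field access, initializes B first
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On a fresh run, the "B initialized" entry appears after the FOO read and before the BAR read, which is consistent with FOO having been copied into the using class at compile time.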

It is, of course, possible to have a runtime dependency without a compile-time dependency via reflection.
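A minimal sketch of such a reflection-only dependency follows. The class ReflectiveCallDemo and its callStatic helper are hypothetical; java.lang.Math stands in for a third-party class so that the example runs anywhere, and com.example.missing.Z in the usage note is a deliberately nonexistent name.

```java
import java.lang.reflect.Method;

public class ReflectiveCallDemo {
    /** Calls a public static (int, int) method on a class known only by name. */
    static Object callStatic(String className, String methodName, int a, int b) {
        try {
            Class<?> type = Class.forName(className); // resolved at runtime only
            Method method = type.getMethod(methodName, int.class, int.class);
            return method.invoke(null, a, b);          // null target: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Runtime dependency not satisfied: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The compiler never sees a reference to Math here, only two strings.
        System.out.println(callStatic("java.lang.Math", "max", 3, 7));
    }
}
```

Because the target is named only by strings, this compiles regardless of what is on the class path; a call such as callStatic("com.example.missing.Z", "max", 1, 2) fails only when it is executed.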

To summarize, at compile time, the compiler makes sure that the methods and fields a class accesses are available.

At class load time (runtime), the byte-code verifier checks that the expected methods and fields are really there. So the byte-code verifier double-checks the assumptions the compiler makes (except for inlining assumptions such as those above).

It is possible to blur these distinctions. For example, JSP uses a custom class loader that invokes the Java compiler to compile and load classes from source as needed at runtime.

Mike Samuel answered Feb 27 '23 04:02


The best way to understand how Maven fits into the picture is to realize that it (mostly) doesn't.

Maven is NOT INVOLVED in the processes by which the compiler finds definitions, or the runtime system loads classes. The compiler does this by itself ... based on what the build-time classpath says. By the time that you run the application, Maven is no longer in the picture at all.

At build time, Maven's role is to examine the project dependencies declared in the POM files, check versions, download missing projects, put the JARs in a well known place and create a "classpath" for the compiler (and other tools) to use.

The compiler then "loads" the classes that it needs from those JAR files to extract type signature information from the compiled class files. It doesn't use a regular class loader to do this, but the basic algorithm for locating the classes is the same.

Once the compiler is done, Maven takes care of packaging the results into JAR, WAR, or EAR files and so on, as specified by the POM file(s). In the case of a WAR or EAR file, all of the required dependent JARs are packaged into the file.

No Maven-directed JAR downloading takes place at runtime. However, it is possible that running the application could involve downloading JAR files; e.g. if the application is deployed using Java Web Start. (But the JARs won't be downloaded from a Maven repository in this case ...)

Some more things to note:

  • Maven does not need to be in the picture at all. You could use an IDE to do the building, the Ant build tool (maybe with Ivy), Make or even "dumb" shell scripts. Depending on the build mechanism, you may need to handle external dependencies by hand; e.g. figuring out which external JARs to download, where to put them, and so on.

  • The Java runtime system typically has to load more than the compiler does. The compiler only needs to load those classes that are necessary to type-check the classes that are being compiled.

    For example, suppose class A has a method that uses class B as a parameter, and class B has a method that uses class C as a parameter. When compiling A, B needs to be loaded, but not C (unless A directly depends on C in some way). When executing A, both B and C need to be loaded.

    As a second example, suppose that class A depends on interface I with implementations IC1 and IC2. Unless A explicitly depends on IC1 or IC2, the compiler does not need to load them to compile A.

  • It is also possible to dynamically load classes at runtime; e.g. by calling Class.forName(className) where className is a string-valued expression.
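The last two points can be combined in a short sketch: the code is compiled against an interface only, and the implementation class is picked by name at runtime. Here java.util.List plays the role of interface I and the JDK list classes play IC1 and IC2; the class ImplementationByNameDemo and its newList helper are made up for the example.

```java
import java.util.List;

public class ImplementationByNameDemo {
    /** Creates a List whose implementation class is named only by a string. */
    static List<String> newList(String implClassName) {
        try {
            Object instance = Class.forName(implClassName)
                .getDeclaredConstructor()
                .newInstance();
            @SuppressWarnings("unchecked")
            List<String> list = (List<String>) instance; // fails if it is not a List
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + implClassName, e);
        }
    }

    public static void main(String[] args) {
        // Only the interface List is referenced at compile time.
        List<String> names = newList("java.util.LinkedList");
        names.add("Z");
        System.out.println(names.getClass().getName() + " " + names);
    }
}
```

Since the implementation name is just a string-valued expression, the compiler has no reason to look at LinkedList or ArrayList when compiling this class.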


You wrote:

For the example in your second bullet point - I'd think that the developer could choose to provide, at compile time, a stub file for B that does not include B's method that uses C, and A would compile just fine. This would confirm my assessment that, at compile time, what might be called "header" files with only the necessary functions declared (even as stubs) is perfectly allowed in Java - so it's just for convenience/convention that tools have evolved over time not to use a header/source file distinction. (Correct me if I'm wrong.)

It is not a convenience / evolutionary thing. Java has NEVER supported separate header files. James Gosling et al started from the position that header files and preprocessors were a bad idea.

Your hypothetical stub version of B would have to have all of the visible methods, constructors and fields of the real B, and the methods and constructors would have to have bodies. The stub B wouldn't compile otherwise. (I guess in theory, the bodies could be empty, return a dummy value or throw an unchecked exception.)

The problem with this approach is that it would be horribly fragile. If you made the smallest mistake in keeping the stub and full versions of B in step, the result would be that the class loader (at runtime) would report a fatal error.
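For illustration only, a stub of the kind described might look like the sketch below. The members size() and name() are invented for the example, since the real B is not specified, and B is nested in a hypothetical StubDemo class just to keep everything in one file.

```java
public class StubDemo {
    /** Hypothetical compile-only stand-in for the real third-party class B. */
    public static class B {
        public B() { }

        // Every visible member of the real B needs a matching signature and a body.
        public int size() {
            throw new UnsupportedOperationException("stub");
        }

        public String name() {
            throw new UnsupportedOperationException("stub");
        }
    }

    /** Plays the role of A: it compiles against B's signatures alone. */
    static String describe(B b) {
        return b.name() + ":" + b.size();
    }

    public static void main(String[] args) {
        try {
            describe(new B());
        } catch (UnsupportedOperationException e) {
            System.out.println("stub was called at runtime: " + e.getMessage());
        }
    }
}
```

If the stub, rather than the real B, ends up on the runtime class path, the failure shows up only when a stubbed method is actually called; if the two versions drift apart in their signatures, the failure is a linkage error instead.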

By the way, C and C++ are pretty much the exception in having separate header files. In most other languages that support separate compilation (of different files comprising an application), the compiler can extract the interface information (e.g. signatures) from the implementation source code.

Stephen C answered Feb 27 '23 03:02