Multi-input, multi-output compilers with Shake

Question

I'm experimenting with using Shake to build Java code, and am a bit stuck because of the unusual nature of the javac compiler. In general, for each module of a large project, the compiler is invoked with all of the source files for that module as input, and produces all of the output files in one pass. We then typically take the .class files produced by the compiler and assemble them into a JAR (basically just a ZIP).

For example, a typical Java module project is arranged as follows:

a src directory that contains multiple .java files, some of them nested many levels deep in a tree.
a bin directory that contains the output from the compiler. Typically this output follows the same directory structure and filenames, with .class substituted for each .java file, but the mapping is not necessarily one-to-one: a single .java file can produce zero to many .class files!

The rules I would like to define in Shake are therefore as follows:

1) If any file under src is newer than any file under bin then erase all contents of bin and recreate with:

javac -d bin <recursive list of .java files under src>

I know this rule seems excessive, but without invoking the compiler we cannot know the extent of changes in output resulting from even a small change in a single input file.

2) If any file under bin is newer than module.jar then recreate module.jar with:

jar cf module.jar -C bin .

Many thanks!

PS Responses in the vein of "just use Ant/Maven/Gradle" will not be appreciated! I know those tools offer Java compilation out-of-the-box, but they are much harder to compose and aggregate. This is why I want to experiment with a Haskell/Shake-based tool.

Neil Mitchell · Accepted Answer

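Writing rules which produce multiple outputs whose names cannot be statically determined can be a bit tricky. The usual approach is to find an output whose name is statically known and always need that, or if none exists, create a fake file to use as the static output (as ghc-make does with its .result file). In your case you have module.jar as the ultimate output, so I would write:

-- Requires: import System.Directory (createDirectoryIfMissing)
-- Note: recent Shake versions spell the rule operator %> (formerly *>).
"module.jar" %> \out -> do
    -- Track every source file; adding or removing a .java file reruns the rule.
    javas <- getDirectoryFiles "" ["src//*.java"]
    need javas
    -- Start from an empty bin so stale .class files never reach the JAR.
    liftIO $ removeFiles "" ["bin//*"]
    liftIO $ createDirectoryIfMissing True "bin"
    () <- cmd "javac -d bin" javas
    -- Depend on whatever .class files the compiler actually produced.
    classes <- getDirectoryFiles "" ["bin//*.class"]
    need classes
    cmd "jar cf" [out] "-C bin ."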

There is no advantage to splitting it up into two rules, since you never depend on the .class files (and can't really, since they are unpredictable in name), and if any source file changes then you will always rebuild module.jar anyway. This rule has all the dependencies you mention, plus if you add/rename/delete any .java or .class file then it will automatically recompile, as the getDirectoryFiles call is tracked.
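If some other rule does need to depend on the compiled classes separately from the JAR, the "fake file" approach mentioned above can be sketched as follows. This is only a sketch, not part of the original answer: the stamp file name classes.stamp and the helper stampContents are hypothetical, and it assumes a reasonably recent Shake that exports %>, cmd_, writeFile' and getDirectoryFilesIO.

```haskell
import Data.List (sort)
import Development.Shake
import System.Directory (createDirectoryIfMissing)

-- Hypothetical helper: the stamp simply lists the generated .class files.
stampContents :: [FilePath] -> String
stampContents = unlines . sort

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["module.jar"]

    -- classes.stamp (hypothetical name) stands in for the unpredictable
    -- set of .class files, giving other rules a static name to 'need'.
    "classes.stamp" %> \stamp -> do
        javas <- getDirectoryFiles "" ["src//*.java"]
        need javas
        -- Wipe bin so stale .class files from deleted sources disappear.
        liftIO $ removeFiles "bin" ["//*"]
        liftIO $ createDirectoryIfMissing True "bin"
        cmd_ "javac -d bin" javas
        -- Untracked listing: these files were generated by this very rule.
        classes <- liftIO $ getDirectoryFilesIO "" ["bin//*.class"]
        -- Always rewrite the stamp: class contents may change even when
        -- the file names do not, so writeFileChanged would be wrong here.
        writeFile' stamp (stampContents classes)

    "module.jar" %> \out -> do
        need ["classes.stamp"]
        cmd_ "jar cf" [out] "-C bin ."
```

The stamp lives outside bin so that it is not packed into the JAR. For this particular project the single-rule version remains simpler; the split only pays off once something besides module.jar consumes the compiler output.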

Tags: haskell, shake-build-system

Neil Bartlett
