Stack (Haskell) build cache of source files with GitHub Actions

When building my Haskell project locally using stack build, only the changed source files are re-compiled. Unfortunately, I am not able to make Stack behave like this on GitHub Actions. Any suggestions please?

Example

I created a simple example with Lib.hs and Fib.hs. I even checked that the cached .stack-work folder is updated between builds, but Stack always compiles both files even when just one has changed.

Here is the example:

  1. (no cache used, builds both Lib.hs and Fib.hs + dependencies): https://github.com/MarekSuchanek/stack-test/runs/542163994
  2. (only Lib.hs changes, builds both Lib.hs and Fib.hs): https://github.com/MarekSuchanek/stack-test/runs/542174351

I can see from the logs (verbose Stack) that something in the cache is being updated, but it is not at all clear to me what or why. Stack correctly detects that only Lib.hs has changed: "stack-test-0.1.0.0: unregistering (local file changes: src/Lib.hs)", so I can't understand why everything gets compiled. I noticed that in run 2, Fib.hi is not updated in .stack-work but the others (Fib.o, Fib.dyn_hi, and Fib.dyn_o) are.

Note

Caching of ~/.stack works fine, and nothing is built when no source file has changed. Of course, this is a dummy example, but we have other projects with many more source files where this would significantly speed up the build. When a non-source file is changed (e.g. the README file), nothing is built, as expected.

Marek Suchánek asked Mar 28 '20 20:03


2 Answers

The culprit is that stack uses timestamps (as many other tools do) to figure out whether a source file has changed. When you restore the cache on CI correctly, none of the dependencies get rebuilt. The problem with the source files is that when the CI provider clones the repo for you, the timestamps of all files in the repo are set to the date and time of the clone.
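The effect can be reproduced outside CI. Below is a minimal sketch, assuming GNU coreutils (`stat -c`) and a throwaway repository in a temporary directory; the file name, commit date, and user identity are made up for illustration. It commits a file with a commit date in the past, clones the repository, and prints the commit time next to the mtime of the cloned file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway working area (hypothetical layout, for illustration only).
work="$(mktemp -d)"
git init -q "$work/origin"
cd "$work/origin"
echo 'module Lib where' > Lib.hs
git add Lib.hs

# Commit with a made-up date well in the past.
GIT_AUTHOR_DATE='2020-03-28 20:03:00 +0000' \
GIT_COMMITTER_DATE='2020-03-28 20:03:00 +0000' \
  git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add Lib.hs'

# A fresh clone, like the checkout a CI provider performs.
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"

commit_ts="$(git log -1 --format=%ct -- Lib.hs)"  # committer time, seconds since epoch
file_ts="$(stat -c %Y Lib.hs)"                    # file mtime, seconds since epoch (GNU stat)
echo "commit time: $commit_ts"
echo "file mtime:  $file_ts"
```

The mtime of the cloned file reflects when the clone happened, not when the file was committed, so a timestamp-based check sees every file as newer than whatever was cached.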

Hopefully the cause of the recompilation of unchanged source files makes sense now. How do we work around this problem? The only real way is to restore, for each file, the timestamp of the last git commit that changed it. I noticed this quite a while ago, and a bit of googling gave me some answers on SO. Here is one of them, I think: Restore a file's modification time in Git

I modified it a bit to suit my needs, and this is what I ended up with:

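  git ls-tree -r --name-only HEAD | while IFS= read -r filename; do
    # Committer time (seconds since epoch) of the last commit that touched the file
    TS="$(git log -1 --format="%ct" -- "${filename}")"
    # Set the file's mtime to that commit time (GNU date syntax)
    touch -m -t "$(date --date="@$TS" "+%Y%m%d%H%M.%S")" "${filename}"
  done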

That worked great for a while for me on Ubuntu CI, but solving this problem in an OS-agnostic manner with bash is not something I wanted to do when I needed to set up Azure CI. For that reason I wrote a Haskell script that works with GHC 8.2 and newer without requiring any non-core dependencies. I use it for all of my projects. I'll embed the core of it here and also provide a link to a permanent gist:

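import Data.Time.Format (defaultTimeLocale, iso8601DateFormat, parseTimeM)
import System.Directory (setModificationTime)
import System.Environment (getArgs)
import System.Process (readProcess)

main :: IO ()
main = do
  args <- getArgs
  -- Revision to read commit times from; defaults to HEAD
  let rev = case args of
        [] -> "HEAD"
        (x:_) -> x
  -- All tracked paths (files and directories) at that revision
  fs <- readProcess "git" ["ls-tree", "-r", "-t", "--full-name", "--name-only", rev] ""
  let iso8601 = iso8601DateFormat (Just "%H:%M:%S%z")
      restoreFileModtime fp = do
        -- Committer date of the last commit that touched the path, in strict ISO 8601
        modTimeStr <- readProcess "git" ["log", "--pretty=format:%cI", "-1", rev, "--", fp] ""
        modTime <- parseTimeM True defaultTimeLocale iso8601 modTimeStr
        setModificationTime fp modTime
        putStrLn $ "[" ++ modTimeStr ++ "] " ++ fp
  putStrLn "Restoring modification time for all these files:"
  mapM_ restoreFileModtime $ lines fs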

How would you go about using it without much overhead? The trick is to:

  • use stack itself to run the script
  • use exactly the same resolver as the one for the project.

These two points ensure that no redundant dependencies or ghc versions get installed. All in all, the only two things needed are stack and something like curl or wget, and it will work cross-platform:

# Script for restoring source files modification time from commit to avoid recompilation.
curl -sSkL https://gist.githubusercontent.com/lehins/fd36a8cc8bf853173437b17f6b6426ad/raw/4702d0252731ad8b21317375e917124c590819ce/git-modtime.hs -o git-modtime.hs
# Restore mod time and setup ghc, if it wasn't restored from cache
stack script --resolver ${RESOLVER} git-modtime.hs --package base --package time --package directory --package process

Here is a real project that uses this approach and you can dig through it to see how it works: massiv-io

Edit: @Simon Michael mentioned in the comments that he can't reproduce this issue locally. The reason is that not everything on CI is the same as it is locally. Quite often the absolute path is different, for example, and possibly other things that I can't think of right now. Those things, together with the source file timestamps, cause the recompilation of the source files.

For example, follow these steps and you will find that your project gets rebuilt:

~/tmp$ git clone git@github.com:fpco/safe-decimal.git
~/tmp$ cd safe-decimal
~/tmp/safe-decimal$ stack build
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...
Configuring safe-decimal-0.2.0.0...
safe-decimal> build (lib)
Preprocessing library for safe-decimal-0.2.0.0..
Building library for safe-decimal-0.2.0.0..
[1 of 3] Compiling Numeric.Decimal.BoundedArithmetic
[2 of 3] Compiling Numeric.Decimal.Internal
[3 of 3] Compiling Numeric.Decimal
...
~/tmp/safe-decimal$ cd ../
~/tmp$ mv safe-decimal safe-decimal-moved
~/tmp$ cd safe-decimal-moved/
~/tmp/safe-decimal-moved$ stack build
safe-decimal-0.2.0.0: unregistering (old configure information not found)
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...

You'll see that changing the location of the project triggered a rebuild of the project. Although the project itself was rebuilt, you will notice that none of the source files were recompiled. Now if you combine that procedure with a touch of a source file, that source file will get recompiled.

To sum it up:

  • The environment can cause the project to be rebuilt
  • A change in the contents of a source file can cause that source file (and others that depend on it) to be recompiled
  • The environment together with a change in a source file's contents or timestamp can cause the project, together with that source file, to be recompiled
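
When debugging a cache miss caused by timestamps, it can help to list the tracked files whose mtime does not match their last commit. The following is a rough sketch, not part of the scripts above: the helper name `stale_files`, the demo repository, and the commit date are hypothetical, and it assumes GNU `stat` and `touch` as well as file names without newlines or other characters that git would quote.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every tracked file whose mtime differs from the committer time
# of the last commit that touched it. (Hypothetical helper.)
stale_files() {
  git ls-tree -r --name-only HEAD | while IFS= read -r f; do
    commit_ts="$(git log -1 --format=%ct -- "$f")"
    file_ts="$(stat -c %Y -- "$f")"
    if [ "$commit_ts" != "$file_ts" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Demo in a throwaway repository with two files and a made-up commit date.
work="$(mktemp -d)"
git init -q "$work/repo"
cd "$work/repo"
echo 'module Lib where' > Lib.hs
echo 'module Fib where' > Fib.hs
git add Lib.hs Fib.hs
GIT_AUTHOR_DATE='2020-03-28 20:03:00 +0000' \
GIT_COMMITTER_DATE='2020-03-28 20:03:00 +0000' \
  git -c user.name=demo -c user.email=demo@example.com commit -q -m 'init'

# Restore the mtime of Fib.hs only; Lib.hs keeps its "just written" mtime.
touch -m -d "@$(git log -1 --format=%ct -- Fib.hs)" Fib.hs

stale="$(stale_files)"
echo "$stale"
```

In this demo only Lib.hs should be reported, since Fib.hs had its mtime reset to the commit time. Running such a check in CI right before stack build would show whether the restore step actually ran.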

lehins answered Oct 01 '22 23:10


I have provided a PR fix for this so modified time is no longer relied on!

Andres S answered Oct 02 '22 01:10