When building my Haskell project locally using stack build, only the changed source files are recompiled. Unfortunately, I am not able to make Stack behave like this on GitHub Actions. Any suggestions, please?
Example
I created a simple example with Lib.hs and Fib.hs. I even checked that the cached .stack-work folder is updated between builds, but it always compiles both files even when just one is changed.
Here is the example:
1. First build (builds Lib.hs and Fib.hs + dependencies): https://github.com/MarekSuchanek/stack-test/runs/542163994
2. Only Lib.hs changes (builds both Lib.hs and Fib.hs): https://github.com/MarekSuchanek/stack-test/runs/542174351
I can observe from the logs (verbose Stack) that something in the cache is being updated, but it is not at all clear to me what and why. It correctly finds out that only Lib.hs is changed: "stack-test-0.1.0.0: unregistering (local file changes: src/Lib.hs)", so I can't understand why everything gets compiled. I noticed that in the second build Fib.hi is not updated in .stack-work, but the others (Fib.o, Fib.dyn_hi, and Fib.dyn_o) are.
Note
Caching of ~/.stack works fine, and so does skipping the build when no source file is changed. Of course, this is a dummy example, but we have other projects with many more source files where this would significantly speed up the build. When a non-source file is changed (e.g. the README file), nothing is built, as expected.
The culprit for this problem is that stack uses timestamps (as many other tools do) to figure out whether a source file has changed or not. When you restore the cache on CI and you do it correctly, none of the dependencies will get rebuilt. The problem with the source files is that when the CI provider clones a repo for you, the timestamps of all the files in the repo are set to the date and time of the clone.
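To make the mechanism concrete, here is a minimal sketch of a generic timestamp-based staleness check. It is not stack's or GHC's actual logic; the function name needs_rebuild and the demo file names are made up for illustration. It only shows why an unchanged source file with a fresh modification time looks "changed" to a tool that compares timestamps.

```shell
# Hypothetical helper: succeeds (exit 0) when the build product is missing
# or the source file has a newer modification time than the product.
needs_rebuild() {
  nr_src=$1
  nr_obj=$2
  [ ! -e "$nr_obj" ] || [ "$nr_src" -nt "$nr_obj" ]
}

demo_dir=$(mktemp -d)
printf 'module Lib where\n' > "$demo_dir/Lib.hs"
: > "$demo_dir/Lib.o"

# Pretend the object file was produced five minutes after the source was written.
touch -t 202001010000 "$demo_dir/Lib.hs"
touch -t 202001010005 "$demo_dir/Lib.o"
if needs_rebuild "$demo_dir/Lib.hs" "$demo_dir/Lib.o"; then
  echo "rebuild"
else
  echo "up to date"   # this branch is taken
fi

# Simulate a fresh clone: same contents, but the mtime becomes "now".
touch "$demo_dir/Lib.hs"
if needs_rebuild "$demo_dir/Lib.hs" "$demo_dir/Lib.o"; then
  echo "rebuild"      # this branch is taken although the contents are identical
else
  echo "up to date"
fi

rm -rf "$demo_dir"
```

The object file plays the role of the cached build products restored on CI, and the second touch plays the role of the checkout.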
Hopefully the cause for recompilation of unchanged source files makes sense now. How do we work around this problem? The only real way is to restore, for each file, the timestamp of the last git commit that changed it. I noticed this quite a while ago, and a bit of googling gave me some answers on SO. Here is one of them, I think: Restore a file's modification time in Git
I modified it a bit to suit my needs, and this is what I ended up with:
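git ls-tree -r --name-only HEAD | while IFS= read -r filename; do
  # Commit time (Unix epoch) of the last commit that touched this file
  TS="$(git log -1 --format="%ct" -- "${filename}")"
  # GNU date syntax; -m sets only the modification time
  touch -m -t "$(date --date="@$TS" "+%Y%m%d%H%M.%S")" "${filename}"
done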
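One non-portable piece of that loop is date --date="@...", which is GNU-specific. As a rough sketch only (the helper names set_mtime_epoch and mtime_epoch are made up here, and this is not what the Haskell script further down does), one could try the GNU form first and fall back to the BSD/macOS form:

```shell
# Hypothetical helper: set FILE's modification time to EPOCH (seconds since 1970).
# Tries GNU date (-d @EPOCH) first, then BSD/macOS date (-r EPOCH).
set_mtime_epoch() {
  sm_file=$1
  sm_epoch=$2
  if ! sm_stamp=$(date -d "@${sm_epoch}" '+%Y%m%d%H%M.%S' 2>/dev/null); then
    sm_stamp=$(date -r "${sm_epoch}" '+%Y%m%d%H%M.%S')
  fi
  # -m: change only the modification time; -t: [[CC]YY]MMDDhhmm[.SS] in local time
  touch -m -t "${sm_stamp}" -- "${sm_file}"
}

# Hypothetical helper: print FILE's modification time as a Unix epoch.
# Tries GNU stat (-c %Y) first, then BSD/macOS stat (-f %m).
mtime_epoch() {
  stat -c %Y -- "$1" 2>/dev/null || stat -f %m -- "$1"
}
```

Inside the loop, the touch line would then become set_mtime_epoch "${filename}" "${TS}". This still assumes a POSIX-like shell, so it does not help on a plain Windows agent.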
That bash loop worked great for me for a while on Ubuntu CI, but solving this problem in an OS-agnostic manner with bash is not something I wanted to do when I needed to set up Azure CI. For that reason I wrote a Haskell script that works with GHC 8.2 and newer without requiring any non-core dependencies. I use it for all of my projects, and I'll embed the core of it here, but also provide a link to a permanent gist:
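import Data.Time.Format (defaultTimeLocale, iso8601DateFormat, parseTimeM)
import System.Directory (setModificationTime)
import System.Environment (getArgs)
import System.Process (readProcess)

main :: IO ()
main = do
  args <- getArgs
  -- Optional first argument: the revision to read commit times from
  let rev = case args of
        [] -> "HEAD"
        (x:_) -> x
  -- All tracked paths (directories included, because of -t)
  fs <- readProcess "git" ["ls-tree", "-r", "-t", "--full-name", "--name-only", rev] ""
  let iso8601 = iso8601DateFormat (Just "%H:%M:%S%z")
      restoreFileModtime fp = do
        -- Committer date (strict ISO 8601) of the last commit that touched fp
        modTimeStr <- readProcess "git" ["log", "--pretty=format:%cI", "-1", rev, "--", fp] ""
        modTime <- parseTimeM True defaultTimeLocale iso8601 modTimeStr
        setModificationTime fp modTime
        putStrLn $ "[" ++ modTimeStr ++ "] " ++ fp
  putStrLn "Restoring modification time for all these files:"
  mapM_ restoreFileModtime $ lines fs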
How would you go about using it without much overhead? The trick is to use stack itself to run the script, and to use the same resolver as the one for the project. These two points ensure that no redundant dependencies or GHC versions get installed. All in all, the only two things needed are stack and something like curl or wget, and it will work cross-platform:
# Script for restoring source files modification time from commit to avoid recompilation.
curl -sSkL https://gist.githubusercontent.com/lehins/fd36a8cc8bf853173437b17f6b6426ad/raw/4702d0252731ad8b21317375e917124c590819ce/git-modtime.hs -o git-modtime.hs
# Restore mod time and setup ghc, if it wasn't restored from cache
stack script --resolver ${RESOLVER} git-modtime.hs --package base --package time --package directory --package process
Here is a real project that uses this approach and you can dig through it to see how it works: massiv-io
Edit: @Simon Michael mentioned in the comments that he can't reproduce this issue locally. The reason is that not everything on CI is the same as it is locally. Quite often the absolute path is different, for example, and possibly other things that I can't think of right now. Those things, together with the source file timestamps, cause the recompilation of the source files.
For example, follow these steps and you will find that your project gets rebuilt:
~/tmp$ git clone git@github.com:fpco/safe-decimal.git
~/tmp$ cd safe-decimal
~/tmp/safe-decimal$ stack build
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...
Configuring safe-decimal-0.2.0.0...
safe-decimal> build (lib)
Preprocessing library for safe-decimal-0.2.0.0..
Building library for safe-decimal-0.2.0.0..
[1 of 3] Compiling Numeric.Decimal.BoundedArithmetic
[2 of 3] Compiling Numeric.Decimal.Internal
[3 of 3] Compiling Numeric.Decimal
...
~/tmp/safe-decimal$ cd ../
~/tmp$ mv safe-decimal safe-decimal-moved
~/tmp$ cd safe-decimal-moved/
~/tmp/safe-decimal-moved$ stack build
safe-decimal-0.2.0.0: unregistering (old configure information not found)
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...
You'll see that changing the location of the project triggered a rebuild of the project. Even though the project itself was rebuilt, you will notice that none of the source files were recompiled. Now if you combine that procedure with a touch of a source file, that source file will get recompiled.
To sum it up:
I have provided a PR fix for this so modified time is no longer relied on!