Caching APT packages in GitHub Actions workflow

I use the following GitHub Actions workflow for my C project. The workflow finishes in ~40 seconds, but more than half of that time is spent installing the valgrind package and its dependencies.

I believe caching could help me speed up the workflow. I do not mind waiting a couple of extra seconds, but this just seems like a pointless waste of GitHub's resources.

name: C Workflow
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v1
    - name: make
      run: make
    - name: valgrind
      run: |
        sudo apt-get install -y valgrind
        valgrind -v --leak-check=full --show-leak-kinds=all ./bin

Running sudo apt-get install -y valgrind installs the following packages:

  • gdb
  • gdbserver
  • libbabeltrace1
  • libc6-dbg
  • libipt1
  • valgrind

I know Actions support caching of a specific directory (and there are already several answered SO questions and articles about this), but I am not sure where all the different packages installed by apt end up. I assume /bin/ or /usr/bin/ are not the only directories affected by installing packages.

Is there an elegant way to cache the installed system packages for future workflow runs?

asked Dec 10 '19 at 14:12 by natiiix


2 Answers

The purpose of this answer is to show how caching can be done with GitHub Actions. It does show how to cache valgrind, but the more important point is that not everything can or should be cached, and that the cost of saving and restoring a cache has to be weighed against simply reinstalling the dependency.


You will make use of the actions/cache action to do this.

Add it as a step (before you need to use valgrind):

- name: Cache valgrind
  uses: actions/cache@v2
  id: cache-valgrind
  with:
    path: "~/valgrind"
    key: ${{secrets.VALGRIND_VERSION}}

The next step should restore valgrind from the cache if there was a cache hit; otherwise it should install valgrind from the repositories and populate the cache folder:

- name: Install valgrind
  env:
    CACHE_HIT: ${{steps.cache-valgrind.outputs.cache-hit}}
    VALGRIND_VERSION: ${{secrets.VALGRIND_VERSION}}
  run: |
    if [[ "$CACHE_HIT" == 'true' ]]; then
      sudo cp --verbose --force --recursive ~/valgrind/* /
    else
      sudo apt-get install --yes valgrind="$VALGRIND_VERSION"
      mkdir -p ~/valgrind
      sudo dpkg -L valgrind | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
    fi

Explanation

Set the VALGRIND_VERSION secret to the output of:

apt-cache policy valgrind | grep -oP '(?<=Candidate:\s)(.+)' 

This lets you invalidate the cache when a new version is released, simply by changing the value of the secret.
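
As a hedged illustration of what that command extracts, the sketch below does the same job with awk. The function name candidate_version is hypothetical (it is not part of apt), and it assumes the usual apt-cache policy layout, in which the candidate appears on a line of the form "Candidate: <version>".

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin and
# print the candidate version (the second field of the "Candidate:" line).
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}
```

It would be used as apt-cache policy valgrind | candidate_version, and the printed string is what you would store in the secret.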

dpkg -L valgrind lists every file installed by the valgrind package itself (the files of its dependencies are not included).

With that list, we can copy all of valgrind's files into our cache folder:

dpkg -L valgrind | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
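
To make the pipeline easier to follow, here is a minimal sketch of the same idea written as a shell function. The name copy_listed_files is made up for illustration, and the sketch assumes GNU cp (needed for --parents), which is what Ubuntu provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read one path per line on stdin and copy every
# regular file into the directory $1, recreating its full path there.
copy_listed_files() {
    local target=$1
    mkdir -p "$target"
    while IFS= read -r f; do
        # dpkg -L also lists directories; keep regular files only
        if [ -f "$f" ]; then
            cp --parents "$f" "$target"/
        fi
    done
}
```

It would be used as dpkg -L valgrind | copy_listed_files ~/valgrind. Copying inside the read loop instead of passing the paths through xargs also keeps paths that contain spaces intact.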

Furthermore

In addition to copying valgrind's own files, it may also be necessary to copy its dependencies (such as libc in this case), but I don't recommend continuing down this path because the dependency chain just keeps growing from there. To be precise, the dependencies that need to be copied to end up with an environment in which valgrind can run are as follows:

  • libc6
  • libgcc1
  • gcc-8-base

To copy all these dependencies, you can use the same syntax as above:

for dep in libc6 libgcc1 gcc-8-base; do
    dpkg -L "$dep" | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
done
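
The answer does not say how the dependency list was obtained. One possible way to get a starting point, shown here only as a sketch, is to parse the output of apt-cache depends. The function name direct_depends is hypothetical, and the parsing assumes the usual layout of one indented "Depends: <package>" line per dependency.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `apt-cache depends <pkg>` output on stdin and
# print the names of the direct hard dependencies, one per line.
# Recommends:/Suggests: lines and virtual packages (<...>) are skipped.
direct_depends() {
    awk '$1 ~ /^\|?(Pre)?Depends:$/ && $2 !~ /^</ { print $2 }' | sort -u
}
```

It would be used as apt-cache depends valgrind | direct_depends. It reports direct dependencies only, so each of them may pull in further packages, which is exactly how the chain keeps growing.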

Is all this work really worth the trouble when all that is required to install valgrind in the first place is to run sudo apt-get install valgrind? If your goal is to speed up the build process, you also have to consider how long it takes to restore the cache (downloading and extracting it) compared with simply running the install command again.
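
A rough way to compare the two is to time each of them in a throwaway workflow run. The helper below is a sketch with a made-up name (elapsed_seconds); it relies on bash's built-in SECONDS counter, so it only has whole-second resolution.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command and print how many whole seconds it took.
# The command's own stdout is sent to stderr so only the number is captured.
elapsed_seconds() {
    local start=$SECONDS
    "$@" >&2
    echo $(( SECONDS - start ))
}
```

For example, install_time=$(elapsed_seconds sudo apt-get install --yes valgrind) records the install time, and the same call around the restore command gives the number to compare it with.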


Finally, to restore the cache, assuming it is stored at /tmp/valgrind, you can use the command (sudo is needed because it writes to the root directory):

sudo cp --force --recursive /tmp/valgrind/* /

This copies all the files from the cache onto the root partition.
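
As a sketch of the restore step, the function below takes the destination as a parameter so that it can be tried against a scratch directory before pointing it at /. The name restore_cache is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy everything under the cache directory $1 onto the
# root directory $2 (which defaults to /), overwriting existing files.
restore_cache() {
    local cache=$1 root=${2:-/}
    # A trailing "/." copies the directory's contents, hidden files included,
    # and does not fail on an empty cache the way an unmatched "*" glob would.
    cp --force --recursive "$cache"/. "$root"/
}
```

Trying it first with something like restore_cache /tmp/valgrind /tmp/fake-root shows exactly which files would land where. Restoring onto / itself still requires root privileges.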

In addition to the process above, I also have an example of "caching valgrind" by compiling and installing it from source. The cache is now about 63 MB (compressed) in size, and one still needs to install libc separately, which kind of defeats the purpose.


Note: Another answer to this question proposes what I would consider a safer approach to caching dependencies: using a container that comes with the dependencies pre-installed. The best part is that you can use Actions to keep those containers up to date.

References:

  • https://askubuntu.com/a/408785
  • https://unix.stackexchange.com/questions/83593/copy-specific-file-type-keeping-the-folder-structure
answered Sep 18 '22 at 14:09 by smac89


You could create a Docker image with valgrind preinstalled and run your workflow in a container based on it.

Create a Dockerfile with something like:

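FROM ubuntu
# Refresh the package lists first (the base image ships without them);
# build-essential provides the make and gcc that the workflow's build step needs
RUN apt-get update && apt-get install -y build-essential valgrind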

Build it and push it to Docker Hub:

docker build -t natiiix/valgrind .
docker push natiiix/valgrind

Then use something like the following as your workflow:

name: C Workflow
on: [push, pull_request]
jobs:
  build:
    # runs-on is still required when the job runs inside a container
    runs-on: ubuntu-latest
    container: natiiix/valgrind
    steps:
    - uses: actions/checkout@v1
    - name: make
      run: make
    - name: valgrind
      run: valgrind -v --leak-check=full --show-leak-kinds=all ./bin

Completely untested, but you get the idea.
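
Since the workflow is untested, a small sanity check at the start of the job can show quickly whether the image really contains what the later steps need. The function below is only a sketch, and the name require_tools is made up.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if every named command is on PATH;
# report each missing one on stderr.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling require_tools make valgrind in the first step would make the job fail early with a clear message if the image lacks either tool.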

answered Sep 21 '22 at 14:09 by deivid