 

How to avoid reinstalling dependencies for each job in Gitlab CI

I'm using Gitlab CI 8.0 with gitlab-ci-multi-runner 0.6.0. I have a .gitlab-ci.yml file similar to the following:

```yaml
before_script:
  - npm install

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js
```

This works but it means the dependencies are installed independently before each test job. For a large project with many dependencies this adds a considerable overhead.

In Jenkins I would use one job to install dependencies then TAR them up and create a build artefact which is then copied to downstream jobs. Would something similar work with Gitlab CI? Is there a recommended approach?

Asked Nov 04 '15 by Tamlyn


1 Answer

Update: I now recommend using artifacts with a short expire_in. This is superior to cache because an artifact only has to be written once per pipeline, whereas the cache is updated after every job. The cache is also per runner, so if your jobs run in parallel on multiple runners it's not guaranteed to be populated, unlike artifacts, which are stored centrally.
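A minimal sketch of that approach (job names, stages and the one-hour expiry are illustrative, not prescribed by the original answer):

```yaml
stages:
  - build
  - test

install_deps:
  stage: build
  script:
    - npm install
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 hour   # short expiry: artifacts only need to outlive the pipeline

server_tests:
  stage: test
  script: mocha
  # node_modules/ is downloaded automatically from the install_deps artifact

client_tests:
  stage: test
  script: karma start karma.conf.js
```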


Gitlab CI 8.2 adds runner caching, which lets you reuse files between builds. However, I've found this to be very slow.
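For reference, the built-in caching is configured like this (a sketch; caching node_modules is the obvious choice for this project, but the keyword works with any paths):

```yaml
cache:
  paths:
    - node_modules/

before_script:
  - npm install   # fast when node_modules/ was restored from the cache

server_tests:
  script: mocha
```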

Instead I've implemented my own caching system using a bit of shell scripting:

```yaml
before_script:
  # unique hash of required dependencies
  - PACKAGE_HASH=($(md5sum package.json))
  # path to cache file
  - DEPS_CACHE=/tmp/dependencies_${PACKAGE_HASH}.tar.gz
  # check if the cache file exists and, if not, create it
  - if [ -f $DEPS_CACHE ];
    then
      tar zxf $DEPS_CACHE;
    else
      npm install --quiet;
      tar zcf - ./node_modules > $DEPS_CACHE;
    fi
```

This runs before every job in your .gitlab-ci.yml and only installs your dependencies if package.json has changed or the cache file is missing (e.g. on the first run, or after the file was manually deleted). Note that if you have several runners on different servers, each will have its own cache file.

You may want to clear out the cache file on a regular basis in order to get the latest dependencies. We do this with the following cron entry:

```
@daily    find /tmp/dependencies_* -mtime +1 -type f -delete
```
Answered Oct 11 '22 by Tamlyn