I would like to use Gitlab CI to compile a Latex article as explained in this answer on tex.stackexchange (a similar pdf generation example is shown in the gitlab documentation for artifacts). I use a special latex template given by the journal editor. My Latex article contains figures made with the R statistical software. R and Latex are two large software installations with a lot of dependencies so I decided to use two separate containers for the build, one for the statistical analysis and visualization with R and one to compile a Latex document to pdf.
Here is the content of .gitlab-ci.yml:
knit_rnw_to_tex:
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
The knit_rnw_to_tex job, executed in the R "rocker" container, succeeds and I can download the figure artifacts from the GitLab "jobs" page. The issue in the second job, compile_pdf, is that ls figure shows me an empty folder and the LaTeX compilation fails because of the missing figures. apt install latexmk also fails for an unknown reason. Maybe because it has over a hundred dependencies and that is too much for GitLab CI?
Thank you for the comment, as I wanted to be sure how you do it. An example would help too, but I'll be generic for now (using docker).
To run multiple containers you need the Docker executor. To quote the documentation on it:
The Docker executor, when used with GitLab CI, connects to Docker Engine and runs each build in a separate and isolated container, using the predefined image that is set up in .gitlab-ci.yml and in accordance with config.toml.
The Docker executor divides the job into multiple steps:
- Prepare: Create and start the services.
- Pre-job: Clone, restore cache and download artifacts from previous stages. This is run on a special Docker image.
- Job: User build. This is run on the user-provided Docker image.
- Post-job: Create cache, upload artifacts to GitLab. This is run on a special Docker Image.
Your config.toml could look like this:
[runners.docker]
  image = "rocker/verse:4.0.0"
  builds_dir = "/home/builds/rocker"

  [[runners.docker.services]]
    name = "aergus/latex"
    alias = "latex"
From the above-linked documentation:
The image keyword
The image keyword is the name of the Docker image that is present in the local Docker Engine (list all images with docker images) or any image that can be found at Docker Hub. For more information about images and Docker Hub, please read the Docker Fundamentals documentation.
In short, with image we refer to the Docker image which will be used to create the container that your build runs in.
If you don't specify the namespace, Docker implies library, which includes all official images. That's why you'll often see the library part omitted in .gitlab-ci.yml and config.toml. For example, you can define an image like image: ruby:2.6, which is a shortcut for image: library/ruby:2.6.
Then, for each Docker image there are tags, denoting the version of the image. These are defined with a colon (:) after the image name. For example, for Ruby you can see the supported tags at Docker Hub. If you don't specify a tag (like image: ruby), latest is implied.
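As a minimal sketch of the shorthand rules above (the job names are illustrative, not from the question):

```yaml
# These three jobs illustrate how image names are resolved:
job_short:
  image: ruby:2.6            # namespace "library" is implied

job_full:
  image: library/ruby:2.6    # fully qualified form of the same official image

job_untagged:
  image: ruby                # no tag given, so "latest" is implied
```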
The image you choose to run your build in via the image directive must have a working shell in its operating system PATH. Supported shells are sh, bash, and pwsh (since 13.9) for Linux, and PowerShell for Windows. GitLab Runner cannot execute a command using the underlying OS system calls (such as exec).
The services keyword
The services keyword defines just another Docker image that is run during your build and is linked to the Docker image that the image keyword defines. This allows you to access the service image during build time.
The service image can run any application, but the most common use case is to run a database container, e.g., mysql. It's easier and faster to use an existing image and run it as an additional container than to install mysql every time the project is built.
You can see some widely used services examples in the relevant documentation of CI services examples.
If needed, you can assign an alias to each service.
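A service with an alias can also be declared per job in .gitlab-ci.yml; the following is a sketch using the common mysql case mentioned above (the job name and the alias db are my own illustrative choices):

```yaml
test_job:
  image: ruby:2.6
  services:
    - name: mysql:8.0
      alias: db                          # the service is reachable at hostname "db"
  variables:
    MYSQL_ALLOW_EMPTY_PASSWORD: "yes"    # variable understood by the official mysql image
  script:
    - echo "connect to the database at host 'db'"
```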
It should be possible to use artifacts to pass data between jobs according to this answer and to this well-explained forum post, but they use only one container for different jobs. It doesn't work in my case, probably because I use two different containers?
The Docker executor by default stores all builds in /builds/<namespace>/<project-name> and all caches in /cache (inside the container). You can overwrite the /builds and /cache directories by defining the builds_dir and cache_dir options under the [[runners]] section in config.toml. This will modify where the data are stored inside the container.
If you modify the /cache storage path, you also need to make sure to mark this directory as persistent by defining it in volumes = ["/my/cache/"] under the [runners.docker] section in config.toml.
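Put together, a runner configuration with a custom, persistent cache path might look like this (a sketch; /my/cache/ is the example path from the documentation, not a recommendation):

```toml
[[runners]]
  cache_dir = "/my/cache"

  [runners.docker]
    # mark the custom cache directory as a persistent volume
    volumes = ["/my/cache/"]
```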
builds_dir → Absolute path to a directory where builds are stored in the context of the selected executor (for example, locally, Docker, or SSH). (From the [[runners]] section documentation.)
As you may have noticed, I have customized the builds_dir in your toml file to /home/builds/rocker; please adjust it to your own path.
How can I pass the artifacts from one job to the other?
You can use the builds_dir directive. A second option would be to use the Job Artifacts API.
Should I use cache as explained in docs.gitlab.com / caching?
Yes, you should use cache to store project dependencies. The advantage is that you fetch the dependencies from the internet only once, and subsequent runs are much faster as they can skip this step. Artifacts are used to share results between build stages.
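For instance, a cache could keep R packages between pipeline runs of the knitr job. This is a sketch under assumptions of mine: the cache key r-packages and the ci_libs/ library directory are hypothetical names, not part of the question's setup.

```yaml
knit_rnw_to_tex:
  image: rocker/verse:4.0.0
  cache:
    key: r-packages        # hypothetical key shared across pipelines
    paths:
      - ci_libs/           # hypothetical project-local R library directory
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
```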
I hope it is now clearer and I have pointed you in the right direction.
The two different images are not the cause of your problems. The artifacts are saved in one image (which seems to work), and then restored in the other. I would therefore advise against building (and maintaining) a single image, as that should not be necessary here.
The reason you are having problems is that you are missing build stages, which inform GitLab about the dependencies between the jobs. I would therefore advise you to specify stages as well as their respective jobs in your .gitlab-ci.yml:
stages:
  - do_stats
  - do_compile_pdf

knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
By default, all artifacts of previous build stages are made available in later stages if you add the corresponding specifications.
If you do not specify any stages, GitLab will put all jobs into the default test stage and execute them in parallel, assuming that they are independent and do not require each other's artifacts. It will still store the artifacts but not make them available between the jobs. This is presumably what is causing your problems.
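If you want to be explicit about which artifacts a job consumes, you can additionally declare them with the dependencies keyword (a sketch based on the jobs above):

```yaml
compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  dependencies:
    - knit_rnw_to_tex      # only fetch artifacts from this job
  script:
    - latexmk -pdf -bibtex -use-make article.tex
```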
As for the cache: artifacts are how you pass files between build stages, while caches are for, well, caching. In practice, caches are used for things like external packages in order to avoid having to download them multiple times, see here. Caches are somewhat unpredictable in situations with multiple different runners. They are only used for performance reasons, and passing files between jobs using cache rather than the artifact system is a huge anti-pattern.
Edit: I don't know precisely what your knitr setup is, but if you generate an article.tex from your article.Rnw, then you probably need to add that file to your artifacts as well.
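Under that assumption, the artifacts section of the first job would then look like this:

```yaml
knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/
      - article.tex        # the .tex file generated by knitr
```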
Also, services are used for things like a MySQL server for testing databases, or the dind (Docker-in-Docker) daemon to build Docker images. This should not be necessary in your case. Similarly, you should not need to change any runner configuration (in the respective config.toml) from the defaults.
Edit2: I added a MWE here, which works with my gitlab setup.