
Passing files from a rocker container to a latex container within a gitlab-ci job

I would like to use GitLab CI to compile a LaTeX article as explained in this answer on tex.stackexchange (a similar pdf generation example is shown in the GitLab documentation for artifacts). I use a special LaTeX template given by the journal editor. My LaTeX article contains figures made with the R statistical software. R and LaTeX are two large software installations with a lot of dependencies, so I decided to use two separate containers for the build: one for the statistical analysis and visualization with R and one to compile the LaTeX document to pdf.

Here is the content of .gitlab-ci.yml:

knit_rnw_to_tex:
  image: rocker/verse:4.0.0
  script:
  - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  image: aergus/latex
  script:
  - ls figure
  - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf

The knit_rnw_to_tex job executed in the R "rocker" container is successful and I can download the figure artifacts from the gitlab "jobs" page. The issue in the second job compile_pdf is that ls figure shows me an empty folder and the Latex article compilation fails because of missing figures.

  • It should be possible to use artifacts to pass data between jobs according to this answer and to this well explained forum post but they use only one container for different jobs. It doesn't work in my case. Probably because I use two different containers?
  • Another solution would be to use only the rocker/tidyverse container and install latexmk in there, but apt install latexmk fails for an unknown reason. Maybe because it has over a hundred dependencies and that is too much for GitLab CI?
  • The "dependencies" keyword could help according to that answer, but the artifacts are still not available when I use it.
  • How can I pass the artifacts from one job to the other?
  • Should I use cache as explained in docs.gitlab.com / caching?
Paul Rougieux asked Apr 15 '21 15:04


2 Answers

Thank you for the comment, as I wanted to be sure how you do it. An example would help too, but I'll be generic for now (using Docker).

To run multiple containers you need the Docker executor.

To quote the documentation on it:

The Docker executor when used with GitLab CI, connects to Docker Engine and runs each build in a separate and isolated container using the predefined image that is set up in .gitlab-ci.yml and in accordance in config.toml.

Workflow

The Docker executor divides the job into multiple steps:

  • Prepare: Create and start the services.
  • Pre-job: Clone, restore cache and download artifacts from previous stages. This is run on a special Docker image.
  • Job: User build. This is run on the user-provided Docker image.
  • Post-job: Create cache, upload artifacts to GitLab. This is run on a special Docker Image.

Your config.toml could look like this:

[[runners]]
  # builds_dir is a [[runners]] option and must be a quoted string
  builds_dir = "/home/builds/rocker"

  [runners.docker]
    image = "rocker/verse:4.0.0"

  [[runners.docker.services]]
    name = "aergus/latex"
    alias = "latex"

From above linked documentation:

The image keyword

The image keyword is the name of the Docker image that is present in the local Docker Engine (list all images with docker images) or any image that can be found at Docker Hub. For more information about images and Docker Hub please read the Docker Fundamentals documentation.

In short, with image we refer to the Docker image, which will be used to create a container on which your build will run.

If you don’t specify the namespace, Docker implies library which includes all official images. That’s why you’ll see many times the library part omitted in .gitlab-ci.yml and config.toml. For example you can define an image like image: ruby:2.6, which is a shortcut for image: library/ruby:2.6.

Then, for each Docker image there are tags, denoting the version of the image. These are defined with a colon (:) after the image name. For example, for Ruby you can see the supported tags at docker hub. If you don’t specify a tag (like image: ruby), latest is implied.

The image you choose to run your build in via image directive must have a working shell in its operating system PATH. Supported shells are sh, bash, and pwsh (since 13.9) for Linux, and PowerShell for Windows. GitLab Runner cannot execute a command using the underlying OS system calls (such as exec).

The services keyword

The services keyword defines just another Docker image that is run during your build and is linked to the Docker image that the image keyword defines. This allows you to access the service image during build time.

The service image can run any application, but the most common use case is to run a database container, e.g., mysql. It’s easier and faster to use an existing image and run it as an additional container than install mysql every time the project is built.

You can see some widely used services examples in the relevant documentation of CI services examples.

If needed, you can assign an alias to each service.
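As a rough illustration of how the two keywords sit together in .gitlab-ci.yml, here is a hypothetical job. The job name test_with_db, the mysql:8.0 tag, the alias db and the password value are all made up for this sketch; only ruby:2.6 and mysql come from the text above.

```yaml
# Hypothetical job: the build runs in the ruby:2.6 image, with a MySQL
# container started next to it as a service.
test_with_db:
  image: ruby:2.6
  services:
    - name: mysql:8.0
      alias: db                      # the service is reachable under this hostname
  variables:
    MYSQL_ROOT_PASSWORD: example     # the mysql image needs a root password setting to start
  script:
    - echo "MySQL service should be reachable at host 'db'"
```

Note that a service is a helper process running beside the job; it is not a second place where your script commands are executed.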

As for your questions:

It should be possible to use artifacts to pass data between jobs according to this answer and to this well explained forum post but they use only one container for different jobs. It doesn't work in my case. Probably because I use two different containers?

The builds and cache storage (from documentation)

The Docker executor by default stores all builds in /builds/<namespace>/<project-name> and all caches in /cache (inside the container). You can overwrite the /builds and /cache directories by defining the builds_dir and cache_dir options under the [[runners]] section in config.toml. This will modify where the data are stored inside the container.

If you modify the /cache storage path, you also need to make sure to mark this directory as persistent by defining it in volumes = ["/my/cache/"] under the [runners.docker] section in config.toml.

  • builds_dir -> Absolute path to a directory where builds are stored in the context of the selected executor. For example, locally, Docker, or SSH.

The [[runners]] section documentation

As you may have noticed, I have customized builds_dir in your toml file to /home/builds/rocker; please adjust it to your own path.

How can I pass the artifacts from one job to the other?

You can use the builds_dir directive. A second option would be to use the Job Artifacts API.

Should I use cache as explained in docs.gitlab.com / caching?

Yes, you should use cache to store project dependencies. The advantage is that you fetch the dependencies from the internet only once, and subsequent runs are much faster because they can skip this step. Artifacts are used to share results between build stages.

I hope it is now clearer and that I have pointed you in the right direction.

tukan answered Oct 05 '22 12:10


The two different images are not the cause of your problems. The artifacts are saved in one image (which seems to work), and then restored in the other. I would therefore advise against building (and maintaining) a single image, as that should not be necessary here.

The reason you are having problems is that you are missing build stages which inform gitlab about dependencies between the jobs. I would therefore advise you to specify stages as well as their respective jobs in your .gitlab-ci.yml:

stages:
  - do_stats
  - do_compile_pdf

knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
  - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/

compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  script:
  - ls figure
  - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf

Context:

By default, a job downloads all artifacts from jobs in earlier build stages, provided the stages are declared so that GitLab knows which jobs come first.

If you do not specify any stages, GitLab will put all jobs into the default test stage and execute them in parallel, assuming that they are independent and do not require each other's artifacts. It will still store the artifacts but not make them available between the jobs. This is presumably what is causing your problems.
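To make the relationship between the two jobs explicit, the later job can also name the job whose artifacts it wants with the dependencies keyword. This is only a sketch, assuming the stages from the configuration above are declared; dependencies narrows which earlier-stage artifacts are fetched, it does not replace the stage ordering.

```yaml
compile_pdf:
  stage: do_compile_pdf
  image: aergus/latex
  dependencies:
    - knit_rnw_to_tex   # fetch artifacts from this earlier-stage job only
  script:
    - ls figure
    - latexmk -pdf -bibtex -use-make article.tex
  artifacts:
    paths:
      - article.pdf
```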

As for the cache: Artifacts are how you pass files between build stages. Caches are for, well, caching. In practice, they are used for things like external packages in order to avoid having to download them multiple times, see here. Caches are somewhat unpredictable in situations with multiple different runners. They are only used for performance reasons, and passing files between jobs using the cache rather than the artifact system is a huge anti-pattern.
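For comparison, a cache for R packages might look roughly like this. It is a sketch under assumptions: the .r-lib directory, the cache key r-packages and the xtable package are placeholders I made up, and I am assuming that the image's R picks up R_LIBS_USER as its first library path once the directory exists.

```yaml
knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  variables:
    # Hypothetical project-local library; cache paths must lie inside the project directory
    R_LIBS_USER: "$CI_PROJECT_DIR/.r-lib"
  cache:
    key: r-packages        # placeholder key
    paths:
      - .r-lib/
  script:
    - mkdir -p "$R_LIBS_USER"
    # 'xtable' stands in for whatever extra package your article needs
    - Rscript -e "if (!requireNamespace('xtable', quietly = TRUE)) install.packages('xtable')"
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/
```

If the cache is missing on some runner, the job simply reinstalls the package; the figures still travel to the next stage as artifacts.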

Edit: I don't know precisely what your knitr setup is, but if you generate an article.tex from your article.Rnw, then you probably need to add that to your artifacts as well.
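In that case the first job could look something like this. It assumes that knitr::knit('article.Rnw') writes article.tex next to article.Rnw and that article.tex is not committed to the repository.

```yaml
knit_rnw_to_tex:
  stage: do_stats
  image: rocker/verse:4.0.0
  script:
    - Rscript -e "knitr::knit('article.Rnw')"
  artifacts:
    paths:
      - figure/
      - article.tex   # assumption: generated by knit in the project root
```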

Also, services are used for things like a MySQL server for testing databases, or the dind (docker in docker) daemon to build docker images. This should not be necessary in your case. Similarly, you should not need to change any runner configuration (in their respective config.toml) from the defaults.

Edit2: I added a MWE here, which works with my gitlab setup.

hfhc2 answered Oct 05 '22 11:10