
Jenkins Pipeline: scm checkout shallow copy fails

I am using a Jenkinsfile to build a pipeline. I am trying to clone the repository with the pipeline DSL, as shown below.

checkout([
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    doGenerateSubmoduleConfigurations: false,
    // CloneOption: request a shallow clone with a history depth of 1
    extensions: [[$class: 'CloneOption', depth: 1, noTags: false, reference: '', shallow: true]],
    submoduleCfg: [],
    userRemoteConfigs: [[url: '[email protected]:user_team/infrastructure-as-code.git']]
])

While the pipeline is running, this is translated to the following command:

git fetch --tags --progress [email protected]:userteam/infrastructure-as-code.git +refs/heads/*:refs/remotes/origin/* --depth=1

This clones the whole repository onto my Jenkins server. I only want a shallow copy of my repo so that I can save disk space on the Jenkins server. Please help.

I am using Jenkins 2.58 with the following plugins:

Pipeline SCM Step: 2.4

Git: 3.3.0

Abhishek Lodha asked Feb 12 '19 08:02




2 Answers

I think you are misunderstanding the meaning of a shallow clone.
A shallow clone still clones the entire repository content as of the latest commit.
The difference is that the history is truncated to the specified number of commits (1 in your case, since you set depth to 1). That can save you a lot of space and time.
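To make the difference concrete, here is a minimal sketch that builds a throwaway repository with three commits and clones it twice, once in full and once with --depth 1. The directory names, file name and demo identity are made up for illustration; the only assumption is that git is installed.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name below is illustrative.
work=$(mktemp -d)
origin="$work/origin"
git init -q "$origin"
git -C "$origin" symbolic-ref HEAD refs/heads/master

# Create three commits so there is some history to truncate.
for i in 1 2 3; do
  echo "revision $i" > "$origin/file.txt"
  git -C "$origin" add file.txt
  git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "commit $i"
done

# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q "file://$origin" "$work/full"
git clone -q --depth 1 "file://$origin" "$work/shallow"

full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
shallow_content=$(cat "$work/shallow/file.txt")

echo "full clone:    $full_commits commits"
echo "shallow clone: $shallow_commits commit(s), file.txt = '$shallow_content'"
```

Both clones end up with the same working tree; only the number of commits reachable from HEAD differs.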

For more information, see the git-clone documentation.

For instance, the image below shows me cloning the same repository (https://github.com/spring-cloud/spring-cloud-config.git) twice, once without depth and once with depth=1. In the first case the local repository size is 40 MB; with depth=1 it is a mere 3.4 MB.

[Image: shallow clone]

Swati Kp answered Oct 21 '22 04:10


I recommend looking at https://issues.jenkins-ci.org/browse/JENKINS-43878 for a better understanding. The reporter of that ticket compares the duration of the clone+checkout process in three cases: (1) a non-shallow clone with the git command, (2) a shallow clone with the pipeline, and (3) a shallow clone (depth=1) with the git command. The complaint is that case 2 takes much longer than case 3.

I experimented with the repo https://github.com/tesseract-ocr/tessdata (~5 GB) and could not reproduce the difference in duration, but I did observe a big difference in size. These are my measurements:

  1. Full clone with pipeline: 8 min, total size 4615 MB, "fetch size" 3256 MB.
  2. Full clone with "git clone": 8 min, total size 4615 MB.
  3. Shallow clone (depth=1) with pipeline: 4-5 min, total size 3121 MB, "fetch size" 1762 MB.
  4. Shallow clone (depth=1) with "git clone": 4-5 min, total size 1995 MB.

(The "fetch size" in my comparison is the size of the directory, measured with "du -ms" right after "git fetch" and before "git checkout", when the clone was done by the Jenkins pipeline.)

If you compare cases 3 and 4, you will see that for a shallow clone the pipeline approach (that is, "fetch+checkout") takes up more disk space than a normal "clone".

The pipeline maintainers acknowledged this, but closed the ticket as "Won't fix", because for other reasons they do not want to change the plugin from "fetch+checkout" to "clone".

This is exactly the answer to your question of why you don't see a big difference between a shallow and a full clone in a Jenkins pipeline: the pipeline uses the "fetch+checkout" approach, which with --depth works differently from "clone" and downloads more data than "clone".
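One difference between the two approaches can be observed on a small scale. "git clone --depth" implies --single-branch, whereas a fetch with the refspec from the question (+refs/heads/*:refs/remotes/origin/*) brings in the tip of every branch. The sketch below uses a made-up two-branch repository to show that; whether this alone accounts for the sizes measured above is an assumption on my part, not something the ticket confirms.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; repository layout and names are illustrative.
work=$(mktemp -d)
origin="$work/origin"
git init -q "$origin"
git -C "$origin" symbolic-ref HEAD refs/heads/master

commit() {  # commit <file> <message>: write the file and commit it in $origin
  echo "$2" > "$origin/$1"
  git -C "$origin" add "$1"
  git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "$2"
}

commit a.txt "first on master"
git -C "$origin" checkout -q -b feature
commit b.txt "only on feature"
git -C "$origin" checkout -q master

# Approach 1: plain shallow clone (implies --single-branch).
git clone -q --depth 1 "file://$origin" "$work/cloned"

# Approach 2: init + fetch with the all-branches refspec, then checkout.
git init -q "$work/fetched"
git -C "$work/fetched" fetch -q --tags --depth=1 "file://$origin" \
    '+refs/heads/*:refs/remotes/origin/*'
git -C "$work/fetched" checkout -q -f origin/master

has_feature() {  # prints yes/no: does the repo in $1 know origin/feature?
  if git -C "$1" rev-parse -q --verify refs/remotes/origin/feature >/dev/null
  then echo yes; else echo no; fi
}

cloned_has_feature=$(has_feature "$work/cloned")
fetched_has_feature=$(has_feature "$work/fetched")

echo "clone --depth 1 has origin/feature:  $cloned_has_feature"
echo "fetch --depth=1 has origin/feature:  $fetched_has_feature"
```

In a repository with many branches or tags, those extra tips are additional data that the shallow fetch downloads and the shallow clone does not.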

If you need a normal "clone --depth", it should be run as a shell command from the pipeline script. In my opinion this is a disadvantage of Jenkins pipeline.
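A minimal sketch of such a shell command follows. The function name shallow_clone and the URL in the usage comment are hypothetical; in a Jenkinsfile the body would typically be run from an "sh" step, with credentials handled however your setup requires.

```shell
#!/bin/sh
set -eu

# shallow_clone <url> <branch> <dest>
# Replaces <dest> with a depth-1, single-branch clone of <branch>.
shallow_clone() {
  url=${1:?repository URL required}
  branch=${2:?branch name required}
  dest=${3:?destination directory required}

  # git clone refuses a non-empty target, so start from a clean directory.
  rm -rf "$dest"
  git clone --quiet --depth 1 --branch "$branch" "$url" "$dest"
}

# Example call (hypothetical URL and path):
#   shallow_clone "git@example.com:team/repo.git" master "$WORKSPACE/src"
```

Wiping the destination first gives up incremental updates between builds, which is the price of always getting a plain shallow clone.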

Alexander Samoylov answered Oct 21 '22 02:10