I'm attempting to move our large FPGA build into a Jenkins CI environment, but the build hangs at the end of synthesis when run in a Docker container spawned by Jenkins.
I've attempted to replicate the environment that Jenkins is creating, but when I spawn a Docker container myself, there's no issue with the build.
I've tried the -nolog -nojournal options on the vivado commands to remove any log file collisions.
I also have an extremely small build that makes it through the entire build process in Jenkins with no issue, so I don't think there is a fundamental flaw with my Docker containers.
agent {
    docker {
        image "vivado:2017.4"
        args """
            -v <MOUNT XILINX LICENSE FILE>
            --dns <DNS_ADDRESS>
            --mac-address <MAC_ADDRESS>
        """
    }
}
steps {
    sh "chmod -R 777 ."
    dir(path: "${params.root_dir}") {
        timeout(time: 15, unit: 'MINUTES') {
            // Create HLS IP for use in Vivado project
            sh './run_hls.sh'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create vivado project, add sources, constraints, HLS IP, generated IP
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/setup_proj.tcl'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create block designs from TCL scripts
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_bd.tcl'
        }
        timeout(time: 1, unit: 'HOURS') {
            // Synthesize complete project
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_synth.tcl'
        }
    }
}
The log below is from a run of a single job with a 12-hour timeout. You can see that synthesis finished, then the timeout fired 8 hours later.
[2019-04-17T00:30:06.131Z] Finished Writing Synthesis Report : Time (s): cpu = 00:01:53 ; elapsed = 00:03:03 . Memory (MB): peak = 3288.852 ; gain = 1750.379 ; free physical = 332 ; free virtual = 28594
[2019-04-17T00:30:06.131Z] ---------------------------------------------------------------------------------
[2019-04-17T00:30:06.131Z] Synthesis finished with 0 errors, 0 critical warnings and 671 warnings.
[2019-04-17T08:38:37.742Z] Sending interrupt signal to process
[2019-04-17T08:38:43.013Z] Terminated
[2019-04-17T08:38:43.013Z]
[2019-04-17T08:38:43.013Z] Session terminated, killing shell... ...killed.
[2019-04-17T08:38:43.013Z] script returned exit code 143
Running the same commands in locally spawned Docker containers has no issues whatsoever. Unfortunately, the Jenkins timeout step doesn't appear to flush open buffers, as my post:unsuccessful step that prints out all log files doesn't find synth_1, though I wouldn't expect there to be anything different from the Jenkins capture.
Are there any known issues with Jenkins/Vivado integration? Is there a way to enter a Jenkins spawned container so I can try and duplicate what I'm expecting vs what I'm experiencing?
EDIT: I've since added in a timeout in the actual tcl scripts to move past the wait_on_runs command used in run_synth.tcl, but now I'm experiencing the same hanging behavior during implementation.
The problem lies in the way vivado deals (or doesn't deal...) with its forked processes. Specifically, I think this applies to parallel synthesis, which may be why you only see it in some of your projects. In the state you describe above (stuck after "Synthesis finished") I noticed a couple of abandoned vivado zombie processes. To my understanding, these are child processes that have ended but whose parent exited without collecting their exit status. Tracing with strace even reveals that vivado keeps calling kill on these processes with signal 0, which only checks whether the PID still exists:
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, 0x7f86edcf4dd0) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, <detached ...>
But (as we all know) you can't kill zombies; they are already dead. A zombie keeps its PID until something reaps it, so every check succeeds and vivado keeps waiting.
Normally these processes would be adopted by the init process and reaped there. But in the case of a Jenkins Pipeline in Docker there is no init by default. The pipeline spawns the container and runs cat with no input to keep it alive. This way cat becomes PID 1 and inherits the abandoned children of vivado. cat of course doesn't know what to do with them and ignores them (a tragedy really).
cat,1
  |-(sh,16)
  |-sh,30 -c ...
  |   |-sh,31 -c ...
  |   |   `-sleep,5913 3
  |   `-sh,32 -xe /home/user/.jenkins/workspace...
  |       `-sh,35 -xe /home/user/.jenkins/workspace...
  |           `-vivado,36 /opt/Xilinx/Vivado/2019.2/bin/vivado -mode tcl ...
  |               `-loader,60 /opt/Xilinx/Vivado/2019.2/bin/loader -exec vivado -mode tcl ...
  |                   `-vivado,82 -mode tcl ...
  |                       |-{vivado},84
  |                       |-{vivado},85
  |                       |-{vivado},111
  |                       |-{vivado},118
  |                       `-{vivado},564
  |-(vivado,319)
  |-(vivado,370)
  |-(vivado,422)
  `-(vivado,474)
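If you want to check for this state without reading a full process tree, a small filter over ps output is enough. The sketch below is an illustration only: filter_zombies is a hypothetical helper name, and it assumes the column order produced by ps -eo pid,ppid,stat,comm, where a STAT value starting with Z marks a zombie.

```shell
#!/bin/sh
# Sketch: reduce `ps -eo pid,ppid,stat,comm` output to zombie entries.
# filter_zombies is a hypothetical helper; it reads ps output on stdin
# and prints "PID PPID COMMAND" for each zombie.
filter_zombies() {
    # NR > 1 skips the header row; column 3 is the process state.
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use inside the container:
#   ps -eo pid,ppid,stat,comm | filter_zombies
```

In the situation shown above, I would expect this to list the four orphaned vivado processes with a parent PID of 1.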
Luckily there is a way to have an init process in the Docker container. Passing the --init argument with docker run solves the problem for me.
agent {
    docker {
        image 'vivado:2019.2'
        args '--init'
    }
}
This creates the init process vivado seems to rely on and the build runs without problems.
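A quick way to confirm the flag took effect is to look at what is running as PID 1 from inside the build. The snippet below is a minimal sketch that assumes a Linux container with /proc mounted; pid1_name is a made-up helper name. It could be run from a pipeline sh step or an interactive shell in the container.

```shell
#!/bin/sh
# Sketch: report which program is running as PID 1 in this PID namespace.
# Linux-only; pid1_name is a hypothetical helper name.
pid1_name() {
    cat /proc/1/comm
}

pid1_name
```

With --init I would expect this to print docker-init, the small init that Docker places in front of the container command. Without it, a Jenkins-spawned container should show cat, matching the process tree above.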
Hope this helps you!
Cheers!