
git-ignore dvc.lock in repositories where only the DVC pipelines are used

Tags:

dvc

I want to use the pipeline functionality of dvc in a git repository. The data is managed otherwise and should not be versioned by dvc. The only functionality which is needed is that dvc reproduces the needed steps of the pipeline when dvc repro is called. Checking out the repository on a new system should lead to an 'empty' repository, where none of the pipeline steps are stored.

Thus, if I understand correctly, there is no need to track the dvc.lock file in the repository. However, adding dvc.lock to the .gitignore file leads to an error message:

ERROR: 'dvc.lock' is git-ignored.

Is there any way to disable the check for dvc.lock in .gitignore for this use case?

ppmt asked Oct 27 '25 14:10



1 Answer

This is definitely possible, as DVC features are loosely coupled to one another. You can do pipelining by writing your dvc.yaml file(s), but avoid data management/versioning by using cache: false in the stage outputs (outs field). See also helper dvc stage add -O (big O, alias of --outs-no-cache).
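As a rough sketch of what such a stage could look like, here is a dvc.yaml with one output marked cache: false. The stage name prepare, the script prepare.py, and the data/ paths are hypothetical placeholders, not anything from the question; adapt them to your own pipeline.

```yaml
# dvc.yaml (illustrative only; stage name, script, and paths are assumptions)
stages:
  prepare:
    cmd: python prepare.py data/raw.csv data/clean.csv
    deps:
      - prepare.py
      - data/raw.csv
    outs:
      # cache: false keeps this output out of the DVC cache;
      # DVC still records its hash in dvc.lock to decide what to re-run.
      - data/clean.csv:
          cache: false
```

Something like dvc stage add -n prepare -d prepare.py -d data/raw.csv -O data/clean.csv python prepare.py data/raw.csv data/clean.csv should generate an equivalent entry, with the same placeholder names.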

The same goes for initial data dependencies: you can dvc add --no-commit them, which creates the .dvc file without storing the data in the cache.

You do want to track dvc.lock in Git though, so that DVC can determine the latest stage of the pipeline associated with the Git commit in every repo copy or branch.

You'll be responsible for placing the right data files/dirs (matching .dvc files and dvc.lock) in the workspace for dvc repro or dvc exp run to behave as expected. dvc checkout won't be able to help you.

Jorge Orpinel Pérez answered Oct 30 '25 13:10
