DVC uses Git commits to save experiments and to navigate between them.
Is it possible to avoid making auto-commits in CI/CD (to save data artifacts after dvc repro
on the CI/CD side)?
You use dvc commit when an already tracked file changes. If you make a local change to the data, you commit the change to the cache before uploading it to the remote. You haven't changed your data since it was added, so you can skip the commit step.
A dvc.yaml file is generated. It includes information about the command we want to run (python src/prepare.py data/data.xml), its dependencies, and its outputs. DVC uses these metafiles to track the data used and produced by the stage, so there's no need to use dvc add on data/prepared manually.
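For illustration, a dvc.yaml for this stage might look roughly like the sketch below. The stage name prepare and the exact deps list are assumptions inferred from the command; only the command itself and the data/prepared output come from the text above.

```yaml
# Sketch of a generated dvc.yaml; the stage name "prepare" is an assumption.
stages:
  prepare:
    # Command DVC runs for this stage
    cmd: python src/prepare.py data/data.xml
    # Dependencies (assumed here: the input file and the script itself)
    deps:
      - data/data.xml
      - src/prepare.py
    # Outputs are tracked by DVC, so no manual "dvc add data/prepared" is needed
    outs:
      - data/prepared
```

With a file like this in place, dvc repro reruns the stage only when one of the listed dependencies changes.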
Will you make it part of a CI pipeline?
DVC often serves as part of MLOps infrastructure. There is a popular blog post about CI/CD for ML where DVC is used under the hood, and another example that uses GitLab CI/CD.
Is there a scenario where you would integrate the dvc commit command with CI pipelines?
If you mean git commit of DVC files (not dvc commit), then yes, you need to commit the DVC files into Git during the CI/CD process. Auto-commit is not the best practice.
How to avoid a Git commit in CI/CD: keep the mapping repo state --> run results outside of the Git repo (in the data remote).
Disclaimer: I'm one of the creators of DVC.