I am using DVC for data version control in machine learning projects. Typically, switching between versions of data is done by checking out Git branches, commits, or tags to get the appropriate *.dvc files, which record the data checksums, and then running dvc checkout to update the data, for example:
git checkout ddc5c395b2afb2b2a626c62ef63a2c7d85382aa6 # roll back to an old version of the *.dvc files
dvc checkout mydata.dvc # to roll `mydata` back to the previous version
I now want to use DVC to switch between data versions without using Git. What I am expecting is something like the following:
dvc checkout mydata.dvc --tag v1.0
Could someone please guide me to use dvc in such a way? Thank you for any help.
To follow up on @omessor's comment, there are Python packages that allow you to programmatically work with a git repo (without using CLI git). DVC itself uses both dulwich and pygit2 via scmrepo.
You can do what you are looking for directly through DVC's internal API, for example:
from dvc.repo import Repo
dvc = Repo("path/to/your/repo")
dvc.scm.checkout("tags/v1.0") # git checkout tags/v1.0
dvc.checkout("mydata.dvc") # dvc checkout mydata.dvc
This would only require installing DVC via pip or conda, and does not require a CLI git installation.
Just note that these APIs aren't publicly documented, so you may need to look at the DVC and scmrepo source to see how they work:
https://github.com/iterative/dvc/blob/main/dvc/scm.py
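If you switch versions often, the two calls above can be wrapped in a small helper. This is only a sketch: switch_data_version is a hypothetical name, not part of DVC, and it relies on the same undocumented internals (Repo.scm.checkout and Repo.checkout) used above, so it may break between DVC releases.

```python
def switch_data_version(repo, rev, target):
    """Check out Git revision `rev`, then sync the DVC-tracked `target` to it.

    `repo` is assumed to behave like a dvc.repo.Repo instance, i.e. it has
    `repo.scm.checkout(rev)` and `repo.checkout(target)` as used in the
    answer above. Both are internal, undocumented calls.
    """
    # Equivalent of `git checkout <rev>`, done through DVC's bundled
    # Git backend rather than the git CLI.
    repo.scm.checkout(rev)
    # Equivalent of `dvc checkout <target>`; pass through whatever it returns.
    return repo.checkout(target)


# Hypothetical usage (requires DVC to be installed):
#   from dvc.repo import Repo
#   switch_data_version(Repo("path/to/your/repo"), "tags/v1.0", "mydata.dvc")
```

Taking the Repo object as an argument keeps the helper independent of how the repo is opened, and makes it easy to try out with a stand-in object before pointing it at real data.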