
How to use different remotes for different folders?

Tags:

dvc

I want my data and models stored in separate Google Cloud buckets. The idea is that I want to be able to share the data with others without sharing the models.

One idea I can think of is using separate git submodules for data and models. But that feels cumbersome and imposes additional requirements on the end user (e.g. having to run git submodule update).

So can I do this without using git submodules?

Michael Litvin asked Nov 20 '19 11:11



2 Answers

You can first add the DVC remotes you need (say, data and models, each pointing to a different GC bucket), but don't set either of them as the project's default. That way, dvc push won't work without the -r (or --remote) option.

You would then need to push each directory or file individually to the appropriate remote, like dvc push data/ -r data and dvc push model.dat -r models.
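The two steps above can be sketched as a pair of shell functions. This is only an illustration under stated assumptions: the bucket URLs are hypothetical placeholders, and the function names setup_remotes and push_split are made up for this sketch. Depending on the DVC version, the push target may need to be the corresponding .dvc file rather than the tracked path.

```shell
#!/bin/sh
# Hypothetical bucket URLs (assumptions); replace with your own.
DATA_BUCKET="gs://example-data-bucket/dvc-store"
MODELS_BUCKET="gs://example-models-bucket/dvc-store"

# Register both remotes. The -d/--default flag is deliberately omitted,
# so neither remote becomes the project's default.
setup_remotes() {
    dvc remote add data "$DATA_BUCKET"
    dvc remote add models "$MODELS_BUCKET"
}

# Push each tracked target to its own remote, as described in the answer.
push_split() {
    dvc push data/ -r data
    dvc push model.dat -r models
}
```

Run setup_remotes once per project, then push_split whenever the data or the model changes.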

Note that a feature request to make this configurable also exists on the DVC repo. See "Specify file types that can be pushed to remote".

Jorge Orpinel Pérez answered Dec 15 '22 00:12



Yes, you can use multiple remotes without Git submodules.

There is a separate command for using data artifacts from external repositories: dvc import http://your-repo datadir. The command brings the data into your repo and keeps the connection to the original repo (to avoid duplicating the data in different remotes).

In your case, one repository can hold the dataset with its own data remote. A second repo can hold the code and models; it imports the dataset project, while all its models and outputs go to another data remote.

With import, no dvc push -r myremote is needed. A plain dvc push synchronizes the data with the proper remote.

EDITED: Simply use one Git repo for the dataset with its own data remote/S3 folder, and import it from another repo that holds the code and models and has its own data remote/S3 folder.
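As a rough sketch of the second repo (the one with code and models), the setup might look like the functions below. The dataset URL is the placeholder from the answer, the models bucket URL is a hypothetical value, and the function names setup_model_repo and publish_models are invented for illustration.

```shell
#!/bin/sh
# Placeholder dataset repo URL taken from the answer; replace with the real one.
DATASET_REPO="http://your-repo"
# Hypothetical bucket for models and other outputs (assumption).
MODELS_BUCKET="gs://example-models-bucket/dvc-store"

# Run inside the code/models repo: make the models remote the default,
# then import the dataset directory from the dataset repo.
setup_model_repo() {
    dvc remote add -d models "$MODELS_BUCKET"
    dvc import "$DATASET_REPO" datadir
}

# With a default remote configured, a plain push needs no -r option.
publish_models() {
    dvc push
}
```

The dataset repo keeps its own default remote, so whoever has access to that repo and bucket gets the data without ever touching the models bucket.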

Dmitry Petrov answered Dec 14 '22 23:12
