I am new to Kubeflow and trying to port/adapt an existing solution to run in Kubeflow Pipelines. The issue I am solving now is that the existing solution shares data via a mounted volume. I know this is not best practice for exchanging data between components in Kubeflow; however, this is a temporary proof of concept and I have no other choice.
I am facing issues with accessing an existing volume from the pipeline. I am basically running the code from the Kubeflow documentation here, but pointing it at an existing K8s Volume:
import kfp.dsl as dsl

@dsl.pipeline(name="volume-op-dag")
def volume_op_dag():
    vop = dsl.VolumeOp(
        name="shared-cache",
        resource_name="shared-cache",
        size="5Gi",
        modes=dsl.VOLUME_MODE_RWO,
    )
The Volume shared-cache exists:

However, when I run the pipeline, a new volume is created:

What am I doing wrong? I obviously don't want to create a new volume every time I run the pipeline but instead mount an existing one.
Edit: Adding Kubeflow versions:
Have a look at the function kfp.onprem.mount_pvc. You can find values for the arguments pvc_name and volume_name via the console command:
kubectl -n <your-namespace> get pvc
The way to use it is to write the component as if the volume were already mounted, and then follow the example from the docs when binding it in the pipeline:
from kfp.onprem import mount_pvc

train = train_op(...)
train.apply(mount_pvc('claim-name', 'pipeline', '/mnt/pipeline'))
Also note that both the volume and the pipeline must be in the same namespace.
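For context, here is a minimal end-to-end sketch of this approach, assuming the existing PVC is named shared-cache and lives in the same namespace the pipeline runs in; the component, the volume name "pipeline", and the mount path /mnt/pipeline are illustrative placeholders:

import kfp.dsl as dsl
from kfp.components import create_component_from_func
from kfp.onprem import mount_pvc

# Write the component as if /mnt/pipeline is already mounted.
def prepare_data(data_path: str):
    with open(f"{data_path}/marker.txt", "w") as f:
        f.write("written by the pipeline")

prepare_data_op = create_component_from_func(prepare_data)

@dsl.pipeline(name="mount-existing-pvc")
def pipeline():
    task = prepare_data_op(data_path="/mnt/pipeline")
    # pvc_name is the claim name from `kubectl get pvc`,
    # volume_name is an arbitrary name used in the pod spec,
    # volume_mount_path is where the component sees the data.
    task.apply(mount_pvc("shared-cache", "pipeline", "/mnt/pipeline"))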
You can use an already existing volume with the following step:
volume_name = 'already-existing-volume-name'

# Instead of this (which creates a new volume on every run):
task = create_step_prepare_data().add_pvolumes({data_path: vop.volume})

# use this (just pass dsl.PipelineVolume(pvc=volume_name) instead of vop.volume):
task = create_step_prepare_data().add_pvolumes({data_path: dsl.PipelineVolume(pvc=volume_name)})
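Putting it together, a minimal sketch, assuming a pre-existing PVC named already-existing-volume-name; the component and the mount path /mnt/data are placeholders:

import kfp.dsl as dsl
from kfp.components import create_component_from_func

data_path = "/mnt/data"

# A stand-in for the real component; it sees the PVC's contents at /mnt/data.
def prepare_data():
    print("preparing data under /mnt/data")

create_step_prepare_data = create_component_from_func(prepare_data)

@dsl.pipeline(name="use-existing-pvc")
def pipeline():
    # Reference the existing PVC directly instead of creating one via VolumeOp.
    existing_volume = dsl.PipelineVolume(pvc="already-existing-volume-name")
    task = create_step_prepare_data().add_pvolumes({data_path: existing_volume})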