
Kedro: How to pass multiple files of the same data from a directory as a node input?

Tags: python, kedro

I have a directory with multiple files in the same data format (one file per day). It is effectively one dataset split across multiple files.

Is it possible to pass all the files to a Kedro node without specifying each file, so that they all get processed sequentially or in parallel, depending on the runner?

921kiyo asked Oct 18 '25 01:10


1 Answer

  1. If the number of files is small and fixed, you can define a preprocessing pipeline for each file manually.
  2. If the number of files is large or changes over time, you can build the pipeline definitions programmatically, one per file, and then add them all together. The same probably applies to creating the required datasets programmatically.
  3. Alternatively, read all the files once in the first node, concatenate them into one dataset, and have all subsequent preprocessing nodes use that dataset (or its derivatives) as inputs.
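As a rough sketch of the third option, the first node could look something like the code below. It assumes the node receives a dictionary that maps a partition id (here the file name without extension) to a zero-argument load function, which is how Kedro's partitioned dataset (named `PartitionedDataSet` or `PartitionedDataset` depending on the Kedro version) hands a directory of files to a node. The helper names `make_csv_loader`, `discover_partitions` and `concat_partitions` are made up for illustration, and plain CSV files read with the standard library stand in for whatever format the daily files actually use.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]
Loader = Callable[[], List[Row]]


def make_csv_loader(path: Path) -> Loader:
    """Return a zero-argument function that reads one CSV file when called."""

    def load() -> List[Row]:
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))

    return load


def discover_partitions(directory: Path, pattern: str = "*.csv") -> Dict[str, Loader]:
    """Stand-in for what a partitioned dataset would pass to the node:
    a dict of partition id -> lazy load function, one entry per file."""
    return {path.stem: make_csv_loader(path) for path in sorted(directory.glob(pattern))}


def concat_partitions(partitions: Dict[str, Loader]) -> List[Row]:
    """Hypothetical first node: load every partition and concatenate the rows.

    Partitions are processed in sorted id order, so files named by date
    (for example 2020-01-01.csv) end up in chronological order.
    """
    combined: List[Row] = []
    for partition_id in sorted(partitions):
        combined.extend(partitions[partition_id]())
    return combined
```

In a real project only `concat_partitions` would be registered as a node, with the partitioned dataset as its input and the combined data as its output; `discover_partitions` is there only so the sketch can be run without Kedro. With pandas, the loop body would typically collect DataFrames and finish with a single `pd.concat`.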
921kiyo answered Oct 20 '25 16:10
