
TextIO. Read multiple files from GCS using pattern {}

I tried using the following

TextIO.Read.from("gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv")

That pattern didn't work, as I get

java.lang.IllegalStateException: Unable to find any files matching StaticValueProvider{value=gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv}

Even though those 2 files do exist. And I tried with a local file using a similar expression

TextIO.Read.from("somefolder/xxx_{2017-06-06,2017-06-06}.csv")

And that did work just fine.

I would've thought there would be support for all kinds of globs for files in GCS, but apparently not. Why is that? Is there a way to accomplish what I'm looking for?

CCC asked Jan 04 '23 22:01



1 Answer

This may be another option, in addition to Scott's suggestion and your comment on his answer:

You can define a list with the paths you want to read and then iterate over it, creating a number of PCollections in the usual way:

PCollection<String> events1 = p.apply(TextIO.Read.from(path1));
PCollection<String> events2 = p.apply(TextIO.Read.from(path2));

Then create a PCollectionList:

PCollectionList<String> eventsList = PCollectionList.of(events1).and(events2);

And then flatten this list into your PCollection for your main input:

PCollection<String> events = eventsList.apply(Flatten.pCollections());
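If you'd rather not write the list of paths out by hand, you could build it from the brace pattern yourself. Below is a rough sketch using only the JDK. Note that `BraceExpander` and `expand` are names I made up for illustration (they are not part of the SDK), the second date in `main` is a made-up example, and it only handles a single, non-nested `{a,b,...}` group:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK: turns one {a,b,...} group
// into explicit paths so that each path can be read separately.
public class BraceExpander {

    // Expands the first {a,b,...} group of the pattern into one path per
    // alternative. Patterns without a group are returned unchanged.
    // Assumption: at most one group, no nesting, no empty alternatives.
    public static List<String> expand(String pattern) {
        List<String> paths = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = pattern.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            paths.add(pattern);
            return paths;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String alternative : pattern.substring(open + 1, close).split(",")) {
            String path = prefix + alternative + suffix;
            // Skip duplicates so the same file is not read twice.
            if (!paths.contains(path)) {
                paths.add(path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // The dates here are example values only.
        for (String path : expand("gs://xyz.abc/xxx_{2017-06-06,2017-06-07}.csv")) {
            System.out.println(path);
        }
    }
}
```

Each returned path can then be passed to its own `TextIO.Read.from(path)`, with the resulting PCollections added to the PCollectionList one by one before flattening.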

Matthias Baetens answered Jan 18 '23 03:01
