I have an Apache Spark job running on Google Compute Engine that writes its output to Google Cloud Storage. The output folder contains more than 300 part-00XXX files, and I would like to merge them into a single file.
I tried:
poiuytrez@spark-m:~$ gsutil compose gs://mybucket/data/* gs://mybucket/myfile.csv
But I got this error:
CommandException: "compose" called with too many component objects. Limit is 32.
Any ideas for merging all these part files?
You can only compose 32 objects in a single request, but a composite object may itself have up to 1024 components. So you can do the merge in two passes: compose objects 0-31 into an intermediate object 0', objects 32-63 into 1', and so on. For ~300 part files that yields about 10 intermediate objects (0', 1', ..., 9'), which can then all be composed together in a single final request.
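A minimal shell sketch of this two-pass approach, assuming the parts live under gs://mybucket/data/ and that the intermediate object names (tmp_N) are placeholders you can rename or clean up afterwards:

# Sketch only: compose ~300 part files in batches of 32, then merge the
# intermediate composites into the final object.
i=0; n=0; batch=""
for p in $(gsutil ls gs://mybucket/data/part-*); do
  batch="$batch $p"
  n=$((n+1))
  if [ "$n" -eq 32 ]; then
    # first pass: one composite per batch of 32 parts
    gsutil compose $batch gs://mybucket/tmp_$i
    i=$((i+1)); n=0; batch=""
  fi
done
if [ "$n" -gt 0 ]; then
  # compose the final partial batch, if any
  gsutil compose $batch gs://mybucket/tmp_$i
fi
# second pass: ~10 intermediates fit comfortably in one compose request
gsutil compose gs://mybucket/tmp_* gs://mybucket/myfile.csv

Note that compose simply concatenates the source objects byte-for-byte in the order given, which is what you want for header-less CSV part files; remember to delete the tmp_ intermediates once myfile.csv looks correct.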