Merge more than 32 files in Google Cloud Storage

I have an Apache Spark script running on Google Compute Engine that writes its output to Google Cloud Storage. My Cloud Storage folder now contains more than 300 part-00XXX files, and I would like to merge them into a single file.

I tried:

poiuytrez@spark-m:~$ gsutil compose gs://mybucket/data/* gs://mybucket/myfile.csv

But I got this error:

CommandException: "compose" called with too many component objects. Limit is 32.

Any ideas on how to merge all these part files?

poiuytrez asked Oct 03 '14 12:10


1 Answer

You can only compose 32 objects in a single request, but a composite object may have up to 1024 components. So you can merge in two passes: compose objects 0-31 into an intermediate object 0', objects 32-63 into 1', and so on, and then compose those intermediate objects (0', 1', ..., floor(300/32)') into the final file. With 300 parts that gives 10 intermediate objects, which fits within a single 32-object compose request.
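A minimal sketch of that batching scheme, assuming the part names are already known (for example from `gsutil ls gs://mybucket/data/`). The function `plan_compose` and the `.tmp-<level>-<batch>` naming for intermediate objects are hypothetical; the function only plans the work, and each returned `(components, target)` pair corresponds to one `gsutil compose` call. It repeats the batching until the remaining objects fit in one request, so it also covers inputs that would need more than two passes.

```python
from typing import List, Tuple

# Per-request limit quoted in the gsutil error message.
MAX_COMPONENTS_PER_REQUEST = 32


def plan_compose(sources: List[str], dest: str,
                 limit: int = MAX_COMPONENTS_PER_REQUEST) -> List[Tuple[List[str], str]]:
    """Return ordered (components, target) compose steps that merge
    `sources` into `dest`, never using more than `limit` components
    per step. The order of `sources` is preserved in the result."""
    if not sources:
        raise ValueError("need at least one source object")
    steps: List[Tuple[List[str], str]] = []
    current = list(sources)
    level = 0
    # Group objects into batches of `limit` until one compose call suffices.
    while len(current) > limit:
        next_level = []
        for i in range(0, len(current), limit):
            # Hypothetical naming scheme for intermediate objects.
            tmp = f"{dest}.tmp-{level}-{i // limit}"
            steps.append((current[i:i + limit], tmp))
            next_level.append(tmp)
        current = next_level
        level += 1
    # Final compose of the remaining (at most `limit`) objects.
    steps.append((current, dest))
    return steps


if __name__ == "__main__":
    # Hypothetical listing of 300 part files.
    parts = [f"gs://mybucket/data/part-{n:05d}" for n in range(300)]
    for components, target in plan_compose(parts, "gs://mybucket/myfile.csv"):
        print("gsutil compose", " ".join(components), target)
```

Because the sketch only prints commands, you can inspect the plan before running anything. The intermediate `.tmp-*` objects remain in the bucket afterwards and would need to be deleted separately (for example with `gsutil rm`).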

Zach Wilt answered Oct 19 '22 05:10