According to the Google Cloud Storage documentation, there are a few limitations on using gsutil compose (quoted below).
Is there a more efficient way to combine a large number of files (~1 million) in the same bucket?
If I understand correctly, I would have to compose groups of 32, then keep composing the results in groups of 32 until only one object is left?
Note that there is a limit (currently 32) to the number of components that can be composed in a single operation.
There is a limit (currently 1024) to the total number of components for a given composite object. This means you can append to each object at most 1023 times.
There is a per-project rate limit (currently 200) to the number of components you can compose per second. This rate counts both the components being appended to a composite object as well as the components being copied when the composite object of which they are a part is copied.
GCS no longer enforces a component count limit. You can combine 1 million files as long as the resulting object is no larger than 5 TiB (the maximum object size). You still have to join the files in groups of 32 by composing recursively, as described in the Cloud Storage compose documentation.
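As a rough worked example (assuming exactly 1,000,000 source files and the 32-source limit per compose request), a balanced tree needs four levels, because $32^3 = 32{,}768 < 10^6 \le 32^4 = 1{,}048{,}576$. Counting compose requests level by level:

$$
\begin{aligned}
\lceil 10^6 / 32 \rceil &= 31{,}250 \\
\lceil 31{,}250 / 32 \rceil &= 977 \\
\lceil 977 / 32 \rceil &= 31 \\
\lceil 31 / 32 \rceil &= 1 \\
\text{total} &= 31{,}250 + 977 + 31 + 1 = 32{,}259
\end{aligned}
$$

This is also the minimum for any strategy: a compose of $k$ sources reduces the number of objects by $k - 1 \le 31$, so going from $10^6$ objects to 1 takes at least $\lceil 999{,}999 / 31 \rceil = 32{,}259$ requests. The serial and parallel approaches below issue the same number of requests; they differ only in how many can run at once.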
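A simple way to do this serially is to append to a single object by repeatedly overwriting it: each compose lists the existing composite as its first source, followed by up to 31 new files, and writes the result back to the same name.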
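A minimal sketch of that serial loop, assuming a hypothetical `compose(sources, destination)` callback that concatenates the named objects in order into `destination`. With the real client you might implement it by wrapping `Blob.compose` from the google-cloud-storage Python library, or by shelling out to `gsutil compose`; the callback keeps the batching logic testable without a bucket.

```python
from typing import Callable, Sequence

MAX_SOURCES = 32  # per-request source limit quoted in the question

# Hypothetical callback: compose(sources, destination) concatenates the named
# objects, in order, into `destination`, overwriting it if it already exists.
ComposeFn = Callable[[Sequence[str], str], None]


def append_serially(names: Sequence[str], destination: str, compose: ComposeFn) -> int:
    """Concatenate `names` into `destination`, one batch per compose call.

    Every call after the first lists `destination` itself as the first source,
    so it overwrites the destination with (old contents + up to 31 new files).
    Returns the number of compose calls made.
    """
    if not names:
        raise ValueError("nothing to compose")
    # First call: up to 32 original files.
    first = list(names[:MAX_SOURCES])
    compose(first, destination)
    calls = 1
    i = len(first)
    # Later calls: the current composite plus up to 31 more files.
    while i < len(names):
        batch = list(names[i:i + MAX_SOURCES - 1])
        compose([destination] + batch, destination)
        calls += 1
        i += len(batch)
    return calls
```

Every call depends on the result of the previous one, so for 1 million files this means roughly 32,000 strictly sequential requests; that is the motivation for the parallel variant.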
Since the per-project compose rate limit has also been lifted, you can also do this in parallel: compose batches of up to 32 files into temporary objects, compose those temporaries (again in batches of up to 32) until a single object remains, then delete the temporary objects.
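A minimal sketch of the parallel tree, again using hypothetical `compose(sources, destination)` and `delete(name)` callbacks in place of real Cloud Storage calls. The `tmp/compose/...` naming scheme for intermediate objects is also an assumption; any prefix that cannot collide with real data would do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

MAX_SOURCES = 32  # per-request source limit

# Hypothetical callbacks standing in for real Cloud Storage calls:
#   compose(sources, destination) concatenates sources, in order, into destination
#   delete(name) removes one object
ComposeFn = Callable[[Sequence[str], str], None]
DeleteFn = Callable[[str], None]


def compose_tree(names: Sequence[str], destination: str, compose: ComposeFn,
                 delete: DeleteFn, tmp_prefix: str = "tmp/compose",
                 max_workers: int = 16) -> None:
    """Compose `names` into `destination` as a tree of batches of up to 32.

    Batches within one level are independent, so they run in parallel; levels
    run one after another. Object order is preserved at every level.
    """
    if not names:
        raise ValueError("nothing to compose")
    level: List[str] = list(names)
    temporaries: List[str] = []
    depth = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(level) > MAX_SOURCES:
            batches = [level[i:i + MAX_SOURCES] for i in range(0, len(level), MAX_SOURCES)]
            # Hypothetical naming scheme; zero-padding keeps names sortable.
            outputs = [f"{tmp_prefix}/{depth}/{j:08d}" for j in range(len(batches))]
            # list() waits for every batch and re-raises the first error.
            list(pool.map(compose, batches, outputs))
            temporaries.extend(outputs)
            level = outputs
            depth += 1
    # Final compose of at most 32 objects into the real destination.
    compose(level, destination)
    # The intermediate objects are no longer needed once the final object exists.
    for name in temporaries:
        delete(name)
```

For 1 million files this runs four dependent rounds instead of tens of thousands; the thread count is a tuning knob, not a documented recommendation.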
The only caveat is that the componentCount metadata property saturates at 2,147,483,647, even if the object has more than 2,147,483,647 components. If you don't depend on componentCount being accurate, this should not be a problem, since componentCount does not affect whether compose succeeds.
Unfortunately, composing groups of 32 over and over again won't work, due to the limit of 1024 total components per composite object.
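To see why grouping does not help under that 1024-component limit: a composite object's component count is the sum of its sources' counts, where an ordinary (non-composite) object counts as 1.

$$
\text{componentCount}\big(\text{compose}(s_1, \dots, s_k)\big) = \sum_{i=1}^{k} \text{componentCount}(s_i)
$$

So any single object assembled from $N = 10^6$ original files has a component count of $10^6 > 1024$, no matter how the intermediate composes are nested.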
Instead, what you'd have to do is this:
Much of this work can be done in parallel, which would greatly speed things up.