How do I split a large csv.gz file in Google Cloud Storage?

I get this error when trying to load a table in Google BQ:

Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 56659381010. Max allowed size is: 4294967296.

Is there a way to split the file using gsutil or something like that without having to upload everything again?

asked Dec 31 '25 by Dervin Thunk

1 Answer

The largest compressed CSV file you can load into BigQuery is 4 gigabytes. GCS unfortunately does not provide a way to decompress a compressed object, nor a way to split one. Gzipped files can't be arbitrarily split and reassembled the way a tar file can.

I imagine your best bet would be to spin up a GCE instance in the same region as your GCS bucket, download the object to that instance (which should be pretty fast, given that it's only a few dozen gigabytes), decompress it (which will be slower), break the resulting CSV into a bunch of smaller files (the Linux split command is useful for this), and then upload those objects back to GCS. A sketch of those steps follows.
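Here is a rough sketch of those steps run from the instance's shell. The bucket and object names are placeholders, and the 3 GB chunk size is just a margin under BigQuery's 4 GB compressed-file limit; adjust both to your setup.

# Download the compressed object from GCS (fast within the same region).
gsutil cp gs://my-bucket/big-file.csv.gz .

# Decompress it locally.
gunzip big-file.csv.gz

# Split into ~3 GB chunks without breaking lines (GNU split).
# Note: only the first chunk will contain the CSV header row.
split -C 3G big-file.csv chunk_

# Upload the chunks back to GCS in parallel.
gsutil -m cp chunk_* gs://my-bucket/split/

One caveat: after splitting, only the first chunk has the header row, so either strip the header before splitting or account for it when loading (for example, by skipping the leading row only for that file).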

answered Jan 04 '26 by Brandon Yarbrough


