I get this error when trying to load a table into Google BigQuery:
Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 56659381010. Max allowed size is: 4294967296.
Is there a way to split the file using gsutil or something like that without having to upload everything again?
The largest compressed CSV file you can load into BigQuery is 4 gigabytes (the 4294967296 bytes quoted in the error message). Unfortunately, GCS provides no way to decompress a compressed file, nor a way to split one. Gzipped files also can't be arbitrarily split up and reassembled the way a tar file can.
I imagine your best bet would be to spin up a GCE instance in the same region as your GCS bucket and do the work there: download the object to that instance (which should be pretty fast, given that it's only a few dozen gigabytes), decompress it (which will be slower), break the CSV file into a bunch of smaller ones (the Linux split command is useful for this), and then upload the pieces back to GCS.
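If you'd rather not decompress to one huge intermediate file, the decompress and split steps can be combined. Below is a minimal sketch in Python, not a tested production tool: the function name `split_gzip_csv`, its parameters, and the output file naming are all made up for illustration. It streams the gzip file and cuts only on line boundaries, so like a line-based split it assumes no quoted field contains an embedded newline, and it does not repeat a header row in each piece.

```python
import gzip
import os


def split_gzip_csv(src_path, out_dir, max_bytes, prefix="part-"):
    """Stream-decompress src_path into line-aligned, uncompressed chunks.

    Each chunk holds at most max_bytes, except that a single line longer
    than max_bytes is written to a chunk of its own. Names and layout here
    are hypothetical, chosen only for this example.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    with gzip.open(src_path, "rb") as src:
        for line in src:
            # Start a new chunk if this line would push the current one
            # past the size limit (or if no chunk is open yet).
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"{prefix}{len(paths):05d}.csv")
                out = open(path, "wb")
                paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

Something like `split_gzip_csv("data.csv.gz", "parts", max_bytes=1024**3)` would leave roughly 1 GiB pieces in a `parts` directory (both names are placeholders), which you can then copy back to your bucket with gsutil.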