I have a large CSV file that was compressed with the RAR utility into a multipart archive (100 GB uncompressed, 20 GB compressed), so I have 100 RAR part files, which were uploaded to Google Cloud Storage. I need to extract it within Google Cloud Storage. Ideally I could use Python on GAE. Any ideas? I don't want to download, extract, and re-upload; I want to do it all in the cloud.
This question was already answered (and accepted), but for similar future use cases, I would recommend doing this entirely in the cloud by spinning up a tiny Linux instance on GCE, e.g., an f1-micro, and then running the steps suggested by Marc Cohen in his answer. The instances come with gsutil preinstalled, so it's easy to use. When you're done, just shut down and delete your micro instance; your resulting file is already stored in Google Cloud Storage.
Step-by-step instructions:
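The linked step-by-step instructions aren't reproduced here, but a minimal sketch of the workflow described above might look like the following. The instance name, zone, bucket name, and archive file names are all hypothetical placeholders, not taken from the original answer:

```shell
# Hypothetical sketch: instance/bucket/file names below are placeholders.

# 1. Create a small GCE instance (f1-micro, as suggested above).
#    Note: the 100 GB uncompressed output may require attaching a larger disk.
gcloud compute instances create rar-extractor \
    --machine-type=f1-micro --zone=us-central1-a

# 2. SSH into it.
gcloud compute ssh rar-extractor --zone=us-central1-a

# --- on the instance ---
# 3. Install unrar; gsutil is already preinstalled on GCE images.
sudo apt-get install -y unrar

# 4. Copy the RAR parts down from Cloud Storage in parallel.
gsutil -m cp 'gs://my-bucket/archive.part*.rar' .

# 5. Extract; unrar finds the remaining parts automatically from part 1.
unrar x archive.part001.rar

# 6. Upload the extracted CSV back to Cloud Storage.
gsutil cp extracted.csv gs://my-bucket/extracted.csv

# --- back on your machine ---
# 7. Shut down and delete the instance when done.
gcloud compute instances delete rar-extractor --zone=us-central1-a
```

Because steps 4–6 run inside Google's network, the copies to and from Cloud Storage never touch your own connection.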
The benefit here is that instead of downloading to your own computer, you're transferring all the data within Google Cloud itself, so the transfers should be very fast and neither depend on your Internet connection speed nor consume any of your bandwidth.
Note: network bandwidth is proportional to the size of the VM (in vCPUs), so for faster performance, consider creating a larger VM; see the Google Compute Engine pricing page for VM instance rates.
So, for example, given that an n1-standard-1 costs USD $0.05/hr (as of 8 Oct 2016), 15 minutes of usage will cost USD $0.0125 in total.
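The cost arithmetic above can be checked with a one-liner (the $0.05/hr rate is the Oct 2016 figure quoted above):

```shell
# $0.05 per hour for 15 minutes = $0.05 * (15/60)
cost=$(awk 'BEGIN { printf "%.4f", 0.05 * 15 / 60 }')
echo "$cost"   # 0.0125
```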