Extract RAR files from Google Cloud Storage

I have a large CSV file compressed into a multipart RAR archive (100 GB uncompressed, 20 GB compressed), so there are 100 RAR parts, and they have been uploaded to Google Cloud Storage. I need to extract the CSV into Google Cloud Storage, ideally using Python on Google App Engine (GAE). Any ideas? I don't want to download, extract, and re-upload the file myself; I want to do it all in the cloud.

user1516770 asked Dec 20 '22 14:12


1 Answer

This question has already been answered (and the answer accepted), but for similar use cases in the future, I would recommend doing this entirely in the cloud: spin up a tiny Linux instance on Google Compute Engine (GCE), e.g., an f1-micro, and then run the steps suggested by Marc Cohen in his answer. GCE instances come with gsutil preinstalled, so it's easy to use. When you're done, shut down and delete your micro instance; the resulting file is already stored in Google Cloud Storage.
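The VM lifecycle described above (create, connect, delete) can be sketched by driving the `gcloud` CLI from Python. This is a minimal sketch, not an official recipe: the instance name `rar-extractor`, the zone `us-central1-a`, and the 200GB boot disk are hypothetical placeholders chosen so the disk can hold both the ~20 GB of RAR parts and the ~100 GB CSV. The `storage-rw` scope is an assumption that you want the VM to write back to Cloud Storage. By default the script only prints the commands (a dry run).

```python
import shlex
import subprocess

# Hypothetical placeholders: pick your own instance name and zone.
INSTANCE = "rar-extractor"
ZONE = "us-central1-a"


def create_cmd(instance: str, zone: str, machine_type: str = "f1-micro",
               disk_size: str = "200GB") -> list[str]:
    """Create the VM. The 200GB boot disk is an assumed size that leaves room
    for ~20 GB of RAR parts plus the ~100 GB extracted CSV."""
    return ["gcloud", "compute", "instances", "create", instance,
            f"--zone={zone}", f"--machine-type={machine_type}",
            f"--boot-disk-size={disk_size}",
            # The default scopes only allow reading Cloud Storage; writing
            # the extracted file back needs read/write access.
            "--scopes=storage-rw"]


def ssh_cmd(instance: str, zone: str) -> list[str]:
    """Open an SSH session on the VM."""
    return ["gcloud", "compute", "ssh", instance, f"--zone={zone}"]


def delete_cmd(instance: str, zone: str) -> list[str]:
    """Delete the VM once the output is in Cloud Storage; --quiet skips the prompt."""
    return ["gcloud", "compute", "instances", "delete", instance,
            f"--zone={zone}", "--quiet"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the create, ssh, and delete commands in order.
    for cmd in (create_cmd(INSTANCE, ZONE), ssh_cmd(INSTANCE, ZONE),
                delete_cmd(INSTANCE, ZONE)):
        run(cmd)
```

In practice you would run the create command, do the extraction over SSH, and only then run the delete command.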

Step-by-step instructions:

  1. Create a Google Compute Engine VM instance
  2. SSH to the instance
  3. Follow the instructions in the other answer
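The other answer is not reproduced here, so the following is only a sketch assuming its approach is: copy the parts from Cloud Storage to the VM's disk, extract them, and copy the result back. The bucket paths, the archive name `data.part001.rar` (new-style volume naming), and the working directory are hypothetical, and an `unrar` binary is assumed to be installed on the VM. As above, it prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import PurePosixPath

# Hypothetical locations: replace with your own bucket, prefixes, and file names.
SRC = "gs://my-bucket/rar-parts"
DST = "gs://my-bucket/extracted"
FIRST_PART = "data.part001.rar"
WORKDIR = "rar-work"  # relative to the SSH user's home directory


def extraction_cmds(src: str, dst: str, first_part: str,
                    workdir: str) -> list[list[str]]:
    """Download all parts, extract them, and upload the result, all on the VM."""
    parts_dir = str(PurePosixPath(workdir) / "parts")
    out_dir = str(PurePosixPath(workdir) / "out")
    return [
        ["mkdir", "-p", parts_dir, out_dir],
        # No shell is involved, so gsutil expands the wildcard itself;
        # -m copies the parts in parallel.
        ["gsutil", "-m", "cp", f"{src}/*.rar", parts_dir + "/"],
        # Starting from the first volume, unrar reads the remaining parts in
        # sequence. The trailing "/" marks the destination directory.
        ["unrar", "x", f"{parts_dir}/{first_part}", out_dir + "/"],
        # Copy everything that was extracted back to Cloud Storage.
        ["gsutil", "-m", "cp", "-r", out_dir + "/*", dst + "/"],
    ]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in extraction_cmds(SRC, DST, FIRST_PART, WORKDIR):
        run(cmd)
```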

The benefit here is that instead of downloading to your own computer, you're transferring all the data within Google Cloud itself, so the transfers should be very fast, and do not depend on your own Internet connection speed or consume any of your bandwidth.


Note: network bandwidth scales with the size of the VM (its number of vCPUs), so for faster transfers, consider creating a larger VM. Google Compute Engine pricing for VM instances (as of 2016) is as follows:

  1. a minimum charge of 10 minutes
  2. usage beyond that is rounded up to the nearest minute

For example, given that an n1-standard-1 costs US$0.05/hr (as of 8 Oct 2016), 15 minutes of usage costs US$0.0125 in total.
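The two billing rules and the worked example can be checked with a small function. This models only the 2016 rules stated above; Compute Engine has since moved to per-second billing, so treat it as an illustration of the arithmetic, not a current price calculator. The function names are hypothetical.

```python
import math


def billed_minutes(seconds: float) -> int:
    """Billable minutes under the stated rules: round up to whole minutes, minimum 10."""
    return max(10, math.ceil(seconds / 60))


def vm_cost(seconds: float, usd_per_hour: float) -> float:
    """Cost in USD for a run of the given length at an hourly rate."""
    return usd_per_hour * billed_minutes(seconds) / 60


if __name__ == "__main__":
    print(round(vm_cost(15 * 60, 0.05), 4))  # 15 minutes -> 0.0125
    print(round(vm_cost(3 * 60, 0.05), 4))   # 3 minutes, billed as 10 -> 0.0083
```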

Misha Brukman answered Apr 25 '23 16:04