 

How to download a file from the internet to a Google Cloud bucket directly

I want to download a file of over 20 GB from the internet directly into a Google Cloud bucket, just as I would do from a local command line with:

wget http://some.url.com/some/file.tar 

I don't want to download the file to my own computer and then copy it to the bucket using:

gsutil cp file.tar gs://the-bucket/

For the moment, I am trying to use Datalab to download the file and then copy it from there to the bucket.

asked Jul 24 '19 by bsaldivar


People also ask

Can you download directly to cloud storage?

Downloading directly to the cloud means saving files to online cloud storage via a web link or URL that indicates where the source file is located. The link is usually generated when the owner of the file shares it with others, such as family, friends, or colleagues.

Can I upload files to Google Cloud Storage from URL?

It is not possible to upload a file to Google Cloud Storage directly from a URL. Since you are running the script from a local environment, the file contents you want to upload need to be in that same environment: either stored in memory or in a file.

How do I upload a file to Google bucket?

In the Google Cloud console, go to the Cloud Storage Buckets page. In the list of buckets, click the name of the bucket you want to upload an object to. In the Objects tab for the bucket, drag and drop the desired files from your desktop or file manager into the main pane of the console.


1 Answer

One capability of the Google Cloud Platform, as it relates to Google Cloud Storage, is the functional area known as the "Storage Transfer Service" (see its documentation).

At the highest level, this capability lets you define a source of data external to Google, such as data available at a URL or in AWS S3 storage, and then schedule it to be copied to Google Cloud Storage in the background. This performs exactly the task you want: the data is copied from an internet source directly to GCS.
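As a minimal sketch, assuming the URL and bucket from the question: for an HTTP source, Storage Transfer Service reads a TSV "URL list" manifest, which must itself be reachable at an HTTPS URL or in a bucket. The manifest starts with a `TsvHttpData-1.0` header line, followed by one source URL per line:

```
TsvHttpData-1.0
http://some.url.com/some/file.tar
```

Once the manifest is hosted somewhere the service can read it, you create a transfer job that points at the manifest as the source and `gs://the-bucket/` as the destination, from the console's Data Transfer page or via the `gcloud` CLI.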


A completely different approach starts from the realization that GCP itself provides compute capabilities. This means you can run your own logic on GCP through simple mechanisms such as a VM, Cloud Functions, or Cloud Run. You could therefore execute code inside GCP that downloads the internet-based data to a local temp file, and then uploads that file into GCS, all from within GCP. At no time does the data travel anywhere other than from the source to Google. Once retrieved from the source, the transfer from GCP compute to GCS storage should be optimal, since it passes exclusively over Google's internal ultra-high-speed networks.
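The first hop of that approach can be sketched in Python. This is a hedged sketch, not the answerer's exact method: to keep it runnable anywhere, it fetches a throwaway local file via a `file://` URL; on a GCE VM, the URL would be the real `http://some.url.com/some/file.tar`, and the chunked copy matters because a 20 GB download should never have to fit in memory.

```python
import pathlib
import shutil
import tempfile
import urllib.request

def fetch_to_temp(source_url, chunk_size=1024 * 1024):
    """Stream the URL into a local temp file in fixed-size chunks,
    so even a very large download uses bounded memory."""
    with urllib.request.urlopen(source_url) as resp, \
            tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(resp, tmp, length=chunk_size)
        return tmp.name

# Stand-in for the remote file (on GCP compute, pass the real URL instead).
src = pathlib.Path("file.tar")
src.write_bytes(b"x" * 4096)

local_path = fetch_to_temp(src.resolve().as_uri())
print(local_path)

# Second hop, from GCP compute into the bucket over Google's network:
#   gsutil cp <local_path> gs://the-bucket/file.tar
```

The second hop can stay a plain `gsutil cp` of the temp file, since both ends of that copy are inside Google's network.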

answered Sep 29 '22 by Kolban