Uploading large files to Google Storage GCE from a Kubernetes pod

We get this error when uploading a large file (more than 10 MB but less than 100 MB):

403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')

Or this error when the file is larger than 5 MB:

403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)

It seems that this API looks at the file size and chooses whether to upload it via the multipart or the resumable method. I can't imagine that this is something I, as a caller of this API, should be concerned with. Is the problem somehow related to permissions? Does the bucket need special permissions so that it can accept multipart or resumable uploads?

from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    # zip_path is the local path of the file we are uploading (set elsewhere).
    blob.upload_from_filename(zip_path, content_type='application/gzip')

except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)

We run this inside a Kubernetes pod, so the credentials get picked up automatically by the storage.Client() call.
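For context, our reading of the client library is that it uploads small files with a single multipart request and switches to a resumable upload for larger ones, and that setting an explicit chunk size on the blob forces the resumable path. A minimal sketch of that variant (the 5 MB chunk size is our own choice; it has to be a multiple of 256 KB):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')
# An explicit chunk_size makes the client use a resumable upload regardless of file size.
blob = bucket.blob('blob-name', chunk_size=5 * 1024 * 1024)
blob.upload_from_filename(zip_path, content_type='application/gzip')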

We already tried these:

  • We can't upload with gsutil because the container runs Python 3 and gsutil does not run under Python 3.

  • We tried this example, but it runs into the same error: ('Response headers must contain header', 'location')

  • There is also this library, but it is basically alpha quality, with little activity and no commits in the past year.

  • Upgraded to google-cloud-storage==1.13.0

Thanks in advance

asked Oct 14 '18 by David Dehghan


2 Answers

The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.

# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
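
For completeness, a minimal sketch of the full upload with the explicit credentials (the key file path and the bucket/blob names are placeholders; zip_path is the same variable as in the question). Pointing the GOOGLE_APPLICATION_CREDENTIALS environment variable at the key file achieves the same thing without code changes:

from google.cloud import storage

# Build the client from an explicit service account key file, then upload as before.
storage_client = storage.Client.from_service_account_json('service_account.json')
bucket = storage_client.get_bucket('my-bucket')
blob = bucket.blob('blob-name')
blob.upload_from_filename(zip_path, content_type='application/gzip')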
answered by David Dehghan

I found my node pools had been spec'd with

    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.read_only

and changing it to

    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.full_control

fixed the error. As described in this issue, the problem is an uninformative error message.
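To confirm which scopes a node pool (and therefore any pod relying on the node's default service account) actually has, you can query the GCE metadata server from inside a pod. A minimal sketch, assuming the standard metadata endpoint and the requests package:

import requests

# List the OAuth scopes granted to the node's default service account.
# If only devstorage.read_only shows up, uploads to Cloud Storage will fail with 403.
resp = requests.get(
    'http://metadata.google.internal/computeMetadata/v1/'
    'instance/service-accounts/default/scopes',
    headers={'Metadata-Flavor': 'Google'},
)
print(resp.text)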

answered by Andy Jones