
Can't upload files larger than ~2GB to Google Cloud Storage

Trace below.

The relevant Python snippet:

bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)

Which ultimately triggers (from the ssl library):

OverflowError: string longer than 2147483647 bytes

I assume there is some special configuration option I'm missing?

This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.

Help appreciated!

Full trace:

  File "/usr/src/app/gcloud/download_data.py", line 109, in *******
    blob.upload_from_filename(source_path)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
    size=total_bytes)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
    client, file_obj, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
    client, stream, content_type, size, num_retries)
  File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
    transport, data, object_metadata, content_type)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
    method, url, data=data, headers=request_headers, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.5/ssl.py", line 891, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python3.5/ssl.py", line 861, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.5/ssl.py", line 586, in write
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes

asked Dec 02 '17 at 16:12 by severian


1 Answer

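The problem is that the library attempts to read the entire file into memory and send it in one go. Following the call chain from upload_from_filename shows that it stats the file and passes the full size along as the upload size, so the whole file goes out as a single upload part (note _do_multipart_upload in the trace). That single payload is what overflows the ssl write.

Instead, specifying a chunk_size when creating the blob will make it upload the file in multiple parts: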

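# Must be a multiple of 256KB (262144 bytes) per docstring
CHUNK_SIZE = 10 * 1024 * 1024  # 10MB = 10485760 bytes = 40 * 256KB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
blob.upload_from_filename(source_path)  # same call as in the question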
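If the chunk size comes from configuration rather than a constant, it may be worth rounding it to a valid value first. The following is only a sketch under that assumption: round_chunk_size and upload_large_file are hypothetical helper names, not part of google-cloud-storage, and bucket is assumed to be an already-constructed Bucket like the one in the question.

```python
CHUNK_MULTIPLE = 256 * 1024  # chunk_size must be a multiple of 256KB (262144 bytes)


def round_chunk_size(requested_bytes):
    """Round a requested size down to a multiple of 256KB, with a floor of 256KB.

    Hypothetical helper for illustration; not a library function.
    """
    units = max(1, requested_bytes // CHUNK_MULTIPLE)
    return units * CHUNK_MULTIPLE


def upload_large_file(bucket, blob_path, source_path,
                      requested_chunk=10 * 1024 * 1024):
    """Upload source_path in chunks by setting chunk_size on the blob.

    Hypothetical wrapper: `bucket` is assumed to behave like a
    google.cloud.storage Bucket (blob() accepting a chunk_size keyword).
    """
    blob = bucket.blob(blob_path, chunk_size=round_chunk_size(requested_chunk))
    blob.upload_from_filename(source_path)
    return blob
```

With the names from the question, the call would look like upload_large_file(bucket, location['path'], source_path). Each request then carries at most one chunk, which stays far below the 2147483647-byte limit reported in the OverflowError.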

Happy Hacking!

answered Oct 26 '22 at 04:10 by jkoelker