Uploading a file to Google Cloud Storage fails with an OverflowError (full trace below).
The relevant Python snippet:
# _get_bucket is a local helper that returns a google.cloud.storage Bucket;
# location holds the destination bucket name and object path.
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
This ultimately triggers the following error from the ssl library:
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
This may be related to this roughly 1.5-year-old, apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace:
[File "/usr/src/app/gcloud/download_data.py", line 109, in ******* blob.upload_from_filename(source_path)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename size=total_bytes)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit retry_strategy=self._retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry response = func()
File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send timeout=timeout
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen chunked=chunked)
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 891, in sendall v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 861, in send return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
The issue is that the client reads the entire file into memory and sends it in a single request. Following the call chain from upload_from_filename shows that it stats the file and passes the total size through as a single upload part, which then hits the 2147483647-byte (2 GiB) limit enforced by ssl.write. Specifying a chunk_size when creating the blob instead triggers an upload in multiple parts:
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
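For context, here is a minimal, self-contained sketch of the chunked upload. The client setup, the bucket name 'my-bucket', the object path, and the local file path are placeholders I made up for illustration, not values from the original code:

# Sketch of a chunked (resumable) upload with placeholder names.
from google.cloud import storage

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB, a multiple of 256 KB

client = storage.Client()
bucket = client.bucket('my-bucket')

# Passing chunk_size makes the client send the file in CHUNK_SIZE pieces
# instead of one giant request body.
blob = bucket.blob('path/to/object', chunk_size=CHUNK_SIZE)
blob.upload_from_filename('/tmp/huge_file.bin')

With chunk_size set, the client takes the resumable path rather than the single-request _do_multipart_upload path visible in the trace, so no individual request body has to exceed the 2147483647-byte limit that ssl.write enforces.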
Happy Hacking!