I am trying to migrate an AWS Lambda function written in Python to CF that
The output is > 2GB - but slightly less than 3GB so it fits in Lambda, just.
Well, it seems impossible or way more involved in GCP:
/tmp - limited to 2048MB as of writing this - so Python Client lib upload_from_file (or _filename) cannot be used
boto, a library initially designed for AWS S3, and a quite outdated one since boto3 has been out for some time. No genuine GCP method to stream write or read
createWriteStream() - nice article here btw - but no equivalent one-liner in Python
GCS was a local filesystem. This is not limited to Cloud Functions and is a lacking feature of the Python Client library, but it is more acute in CF due to the resource constraints. Btw, I was part of a discussion to add a writeable IOBase function but it got no traction.
DataFlow are out of question for the task at hand.

In my mind, stream (or stream-like) reading/writing from cloud-based storage should even be included in the Python standard library.
As recommended back then, one can still use GCSFS, which behind the scenes commits the upload in chunks for you while you are writing stuff to a FileObj. The same team wrote s3fs. I don't know about Azure.
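For illustration, here is a minimal sketch of the GCSFS approach, assuming the gcsfs package is installed and default credentials are available. The helper names write_chunks and make_gcs_filesystem, the bucket path and the 8 MiB block size are my own assumptions, not something GCSFS prescribes.

```python
def write_chunks(fs, path, chunks, block_size=8 * 2**20):
    """Stream an iterable of bytes objects to `path` on the filesystem `fs`.

    `fs` is any fsspec-style filesystem, e.g. gcsfs.GCSFileSystem().
    Returns the number of bytes handed to the file object.
    """
    written = 0
    with fs.open(path, "wb", block_size=block_size) as f:
        for chunk in chunks:
            # With GCSFS the file object buffers writes and uploads
            # them in blocks, so the whole output never sits in memory.
            f.write(chunk)
            written += len(chunk)
    return written


def make_gcs_filesystem(project=None):
    """Build the GCSFS filesystem (third-party, imported lazily)."""
    import gcsfs  # pip install gcsfs

    return gcsfs.GCSFileSystem(project=project)


# Hypothetical usage (project, bucket and generator are placeholders):
# fs = make_gcs_filesystem("my-project")
# write_chunks(fs, "my-bucket/big-output.bin", produce_output())
```

Since s3fs exposes the same filesystem interface, the same write_chunks function should work unchanged against S3 by passing an s3fs filesystem instead.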
AFAIC, I will stick to AWS Lambda as the output can fit in memory - for now - but multipart upload is the way to go to support any output size with a minimum of memory.
Thoughts or alternatives?
I got confused with multipart vs. resumable upload. The latter is what you need for "streaming" - it's actually more like uploading chunks of a buffered stream.
Multipart upload is to load data and custom metadata at once, in the same API call.
While I like GCSFS very much - Martin, its main contributor, is very responsive - I recently found an alternative that uses the google-resumable-media library.
GCSFS is built upon the core HTTP API, whereas Seth's solution uses a low-level library maintained by Google, which is more in sync with API changes and includes exponential backoff. The latter is really a must for large/long streams as the connection may drop, even within GCP - we faced the issue with GCF.
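To make the resumable route concrete, here is a rough sketch using google-resumable-media directly. It is not Seth's code; the function names, the OAuth scope and the 8 MiB default are assumptions on my part. The one hard constraint I am sure of is that resumable chunks must be a multiple of 256 KiB, which the small helper enforces.

```python
GCS_CHUNK_MULTIPLE = 256 * 1024  # resumable chunks must be a multiple of 256 KiB


def aligned_chunk_size(requested):
    """Round `requested` bytes up to the next multiple of 256 KiB."""
    if requested <= 0:
        raise ValueError("chunk size must be positive")
    blocks = -(-requested // GCS_CHUNK_MULTIPLE)  # ceiling division
    return blocks * GCS_CHUNK_MULTIPLE


def resumable_upload(stream, bucket, blob_name, chunk_size=8 * 1024 * 1024):
    """Upload a binary file-like `stream` (positioned at 0) chunk by chunk."""
    # Third-party imports are local so the helper above works without them.
    import google.auth
    from google.auth.transport.requests import AuthorizedSession
    from google.resumable_media.requests import ResumableUpload

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/devstorage.read_write"]
    )
    transport = AuthorizedSession(credentials)
    url = (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        f"{bucket}/o?uploadType=resumable"
    )
    upload = ResumableUpload(url, aligned_chunk_size(chunk_size))
    # stream_final=False: the total size need not be known up front.
    upload.initiate(
        transport,
        stream,
        {"name": blob_name},
        "application/octet-stream",
        stream_final=False,
    )
    while not upload.finished:
        # Each call reads and sends one chunk from the stream.
        upload.transmit_next_chunk(transport)
    return upload.bytes_uploaded
```

Only one chunk is read from the stream per request, so memory use stays around chunk_size regardless of the total output size.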
On a closing note, I still believe that the Google Cloud Library is the right place to add stream-like functionality, with basic write and read. It has the core code already.
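To show what I mean by stream-like write, here is a purely hypothetical sketch of a writeable IOBase wrapper. ChunkedWriter and its send_chunk callback are made-up names and not part of any Google library; in a real implementation the callback would push one chunk of a resumable upload.

```python
import io


class ChunkedWriter(io.RawIOBase):
    """Write-only stream that hands fixed-size chunks to a callback.

    `send_chunk(data, final)` is a hypothetical hook, called once per
    full chunk and once more on close with the remainder.
    """

    def __init__(self, send_chunk, chunk_size=256 * 1024):
        super().__init__()
        self._send = send_chunk
        self._chunk_size = chunk_size
        self._buffer = bytearray()

    def writable(self):
        return True

    def write(self, data):
        self._buffer.extend(data)
        # Flush every full chunk; keep the remainder buffered.
        while len(self._buffer) >= self._chunk_size:
            chunk = bytes(self._buffer[: self._chunk_size])
            del self._buffer[: self._chunk_size]
            self._send(chunk, False)
        return len(data)

    def close(self):
        if self.closed:
            return
        try:
            # The last, possibly short or empty, chunk marks the end.
            self._send(bytes(self._buffer), True)
        finally:
            self._buffer.clear()
            super().close()


# Usage: collect the chunks in a list instead of uploading them.
sent = []
with ChunkedWriter(lambda data, final: sent.append((data, final)), chunk_size=4) as w:
    w.write(b"abcdef")
    w.write(b"ghij")
```

Anything that writes to a file object (csv, zipfile, json.dump through a text wrapper) could then write to the bucket with at most one chunk buffered.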
If you too are interested in that feature in the core lib, thumbs up the issue here - assuming priority is based thereon.