I am trying to migrate an AWS Lambda function written in Python to Cloud Functions (CF). The function builds a large output and uploads it to cloud storage. The output is > 2 GB - but slightly less than 3 GB, so it just fits in Lambda's memory.
Well, it seems impossible or way more involved in GCP:
- /tmp - limited to 2048 MB as of this writing - so the Python client library's upload_from_file (or upload_from_filename) cannot be used for an output this size.
- boto, a library initially designed for AWS S3, is quite outdated now that boto3 has been out for some time.
- There is no genuine GCP method to stream-write or stream-read. Node.js has createWriteStream() - nice article here btw - but no equivalent one-liner in Python.
- Nothing lets you treat GCS as if it were a local filesystem. This is not limited to Cloud Functions - it is a missing feature of the Python client library - but it is more acute in CF due to the resource constraints. Btw, I was part of a discussion to add a writable IOBase implementation, but it got no traction.
- Dataflow is out of the question for the task at hand.

In my mind, stream (or stream-like) reading/writing from cloud-based storage should even be included in the Python standard library.
As recommended back then, one can still use GCSFS, which behind the scenes commits the upload in chunks for you while you are writing stuff to a FileObj.
The same team wrote s3fs. I don't know about Azure.
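For illustration, here is a minimal sketch of that chunked-write pattern with GCSFS. The bucket/object path and project name are made up, and the `iter_chunks` helper is mine, not part of the library:

```python
import io

def iter_chunks(fileobj, chunk_size=8 * 1024 * 1024):
    """Yield fixed-size chunks from a file-like object until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk

def stream_to_gcs(source_fileobj, path, project):
    """Write a stream to GCS; gcsfs commits the upload in chunks behind the scenes."""
    import gcsfs  # third-party; needs credentials and network at call time
    fs = gcsfs.GCSFileSystem(project=project)
    with fs.open(path, "wb") as out:  # e.g. "my-bucket/big-output.bin"
        for chunk in iter_chunks(source_fileobj):
            out.write(chunk)
```

Because the file is written chunk by chunk, peak memory stays around one chunk (here 8 MB) regardless of the total output size.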
AFAIC, I will stick to AWS Lambda as the output can fit in memory - for now - but multipart upload is the way to go to support any output size with a minimum of memory.
Thoughts or alternatives?
I got confused between multipart and resumable upload. The latter is what you need for "streaming" - it is actually more like uploading chunks of a buffered stream.
Multipart upload is for loading data and custom metadata at once, in the same API call.
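To illustrate the multipart case, a hedged sketch with the google-cloud-storage client - upload_from_string sends the object data and its metadata together in one call. The bucket, object, and metadata names here are invented:

```python
def multipart_upload(bucket_name, blob_name, data):
    """One-shot upload: object data and custom metadata go in a single API call."""
    from google.cloud import storage  # third-party; needs credentials at call time
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.metadata = {"origin": "lambda-migration"}  # custom metadata, made up
    blob.upload_from_string(data, content_type="application/octet-stream")
```

This is convenient for small payloads, but the whole object must be in memory, which is exactly what doesn't scale for multi-GB outputs.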
While I like GCSFS very much - Martin, its main contributor, is very responsive - I recently found an alternative that uses the google-resumable-media library.
GCSFS is built upon the core HTTP API, whereas Seth's solution uses a low-level library maintained by Google, which is more in sync with API changes and includes exponential backoff. The latter is really a must for large/long streams, as the connection may drop, even within GCP - we faced the issue with GCF.
On a closing note, I still believe that the Google Cloud Library is the right place to add stream-like functionality, with basic write and read methods. It already has the core code.
If you too are interested in that feature in the core lib, thumbs up the issue here - assuming priority is based thereon.