Reading Data From Cloud Storage Via Cloud Functions

I am trying to build a quick proof of concept for a data processing pipeline in Python. To do this, I want to build a Google Cloud Function that is triggered when certain .csv files are dropped into Cloud Storage.

I followed this Google Cloud Functions Python tutorial, and while the sample code does trigger the Function and write some simple logs when a file is dropped, I am stuck on which call I have to make to actually read the contents of the file. I searched for SDK/API guidance but have not been able to find any.

In case this is relevant: once I process the .csv, I want to publish some of the data I extract from it to GCP's Pub/Sub.

rara-aaa asked Nov 17 '18 00:11


2 Answers

The function does not actually receive the contents of the file, just some metadata about it.

You'll want to use the google-cloud-storage client. See the "Downloading Objects" guide for more details.

Putting that together with the tutorial you're using, you get a function like:

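from google.cloud import storage

# Create the client once, at import time, so warm invocations can reuse it.
storage_client = storage.Client()

def hello_gcs_generic(data, context):
    # The event payload carries only metadata: the bucket and object names.
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    # download_as_bytes() replaces the deprecated download_as_string();
    # both return the object's contents as bytes.
    contents = blob.download_as_bytes()
    # Process the file contents, etc...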
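For the "process the file contents" step, here is a minimal sketch of turning the downloaded bytes into rows using only the standard library. It assumes the file is UTF-8 encoded and has a header row; parse_csv_bytes is a hypothetical helper name, not part of the google-cloud-storage client.

```python
import csv
import io


def parse_csv_bytes(contents, encoding="utf-8"):
    """Parse raw CSV bytes into a list of dicts keyed by the header row.

    Hypothetical helper: assumes the first line of the file is a header.
    """
    # Decode the bytes returned by the download call into text.
    text = contents.decode(encoding)
    # DictReader uses the first row as field names for every later row.
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)
```

Inside the function above, this could be called as rows = parse_csv_bytes(contents). All values come back as strings, so any numeric conversion would still need to be done by the caller.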
Dustin Ingram answered Oct 16 '22 22:10


This is an alternative solution using pandas:

Cloud Function Code:

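# Reading a gs:// path with pandas requires the gcsfs package to be
# installed alongside pandas (e.g. listed in requirements.txt).
import pandas as pd

def GCSDataRead(event, context):
    # Build the gs:// URL from the event metadata.
    bucketName = event['bucket']
    blobName = event['name']
    fileName = "gs://" + bucketName + "/" + blobName

    dataFrame = pd.read_csv(fileName, sep=",")
    print(dataFrame)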
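Since the question mentions forwarding extracted data to Pub/Sub, the following is a rough sketch of that last step, not a definitive implementation. It assumes the google-cloud-pubsub package is installed and that the topic already exists; encode_rows, publish_rows, project_id, and topic_id are hypothetical names introduced for illustration.

```python
import json


def encode_rows(rows):
    """Encode each row (a dict) as UTF-8 JSON, since Pub/Sub message data must be bytes."""
    # default=str stringifies values that json cannot encode natively
    # (for example timestamps); sort_keys keeps the output deterministic.
    return [
        json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        for row in rows
    ]


def publish_rows(rows, project_id, topic_id):
    """Publish one message per row and return the resulting message IDs."""
    # Imported lazily so encode_rows works without the Pub/Sub client installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # publish() returns a future for each message.
    futures = [
        publisher.publish(topic_path, data=payload)
        for payload in encode_rows(rows)
    ]
    # result() blocks until the message is accepted and returns its ID.
    return [future.result() for future in futures]
```

With the pandas approach, the rows could be obtained with dataFrame.to_dict(orient="records") before being passed to publish_rows.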
Soumendra Mishra answered Oct 16 '22 22:10