 

How to read csv file with using pandas and cloud functions in GCP?

I am trying to read a CSV file that was uploaded to GCS.

I want to read a CSV file uploaded to GCS with Cloud Functions in GCP, and handle the CSV data as a pandas DataFrame.

But I can't read the CSV file using pandas.

This is the code that reads the CSV file on GCS using Cloud Functions.

from io import BytesIO

from google.cloud import storage as gcs
import pandas as pd

def read_csvfile(data, context):
    try:
        bucket_name = "my_bucket_name"
        file_name = "my_csvfile_name.csv"
        project_name = "my_project_name"

        # create GCS client
        client = gcs.Client(project_name)
        bucket = client.get_bucket(bucket_name)
        # create blob
        blob = gcs.Blob(file_name, bucket)
        content = blob.download_as_string()
        train = pd.read_csv(BytesIO(content))
        print(train.head())

    except Exception as e:
        print("error:{}".format(e))

When I ran my Python code, I got the following error.

No columns to parse from file

Some websites say this error means the CSV file being read is empty, but the file I uploaded is not empty. So how can I solve this problem?
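One thing worth checking before blaming pandas: in a GCS-triggered background function, the event payload passed as `data` already names the object that fired the trigger, so hard-coded bucket and file names can end up pointing at a different (or empty) object than the one just uploaded. A minimal sketch of reading the names from the event instead (the dict shape is the standard `google.storage.object.finalize` payload; the names are placeholders):

```python
def object_from_event(data):
    """Pull the bucket and object name out of a GCS finalize event payload."""
    return data["bucket"], data["name"]

# Example payload shape for a storage-triggered Cloud Function:
event = {"bucket": "my_bucket_name", "name": "my_csvfile_name.csv"}
bucket_name, file_name = object_from_event(event)
print(bucket_name, file_name)  # my_bucket_name my_csvfile_name.csv
```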

Please give me your help. Thanks.

---- Added on 2020/08/08 ----

Thank you for giving me your help! But in the end I could not read the CSV file using your code... I still get the error, No columns to parse from file.

So I tried a new way of reading the CSV file, as bytes. The new Python code is below.

main.py

from google.cloud import storage
import pandas as pd
import io
import csv
from io import BytesIO 

def check_columns(data, context):
    try:
        object_name = data['name']
        bucket_name = data['bucket']

        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        data = blob.download_as_string()

        # read the uploaded CSV file as bytes
        f = io.StringIO(str(data))
        df = pd.read_csv(f, encoding="shift-jis")

        print("df:{}".format(df))
        print("df.columns:{}".format(df.columns))
        print("The number of columns:{}".format(len(df.columns)))

    except Exception as e:
        print("error:{}".format(e))

requirements.txt

Click==7.0
Flask==1.0.2
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
Pillow==5.4.1
qrcode==6.1
six==1.12.0
Werkzeug==0.14.1
google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0

The output I got is below.

df:Empty DataFrame
Columns: [b'Apple, Lemon, Orange, Grape]
Index: []
df.columns:Index(['b'Apple', 'Lemon', 'Orange', 'Grape'])
The number of columns:4

So pandas read only the first record of the CSV file, and it ended up as df.columns!? I could not get the other records, and that first row is not a header but an ordinary data record.
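The stray b' in the column names is the clue: calling `str()` on bytes does not decode them, it produces the bytes *literal* (including the `b'` prefix and escaped `\n`), so the real newlines are lost and pandas sees one giant row. A minimal local sketch of the difference (the sample data is made up; swap in "shift-jis" for a Shift-JIS file):

```python
import io

# Hypothetical bytes payload, like the value returned by download_as_string()
raw = b"Apple,Lemon,Orange,Grape\n1,2,3,4\n"

# str() on bytes builds a literal such as "b'Apple,...\\n...'" -- no decoding
wrong = str(raw)
print(wrong[:10])            # b'Apple,Le

# The "\n" inside that literal is backslash + n, not a real newline, so a CSV
# parser sees a single long row -- exactly the symptom in the output above.
# Decoding with the file's actual encoding restores real newlines:
right = raw.decode("utf-8")  # use "shift-jis" for a Shift-JIS encoded file
buf = io.StringIO(right)
print(buf.readline().rstrip())  # Apple,Lemon,Orange,Grape
```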

So how can I read all the records of the CSV file into a DataFrame using pandas?

Could you help me again? Thank you.

asked Dec 30 '22 by alan

1 Answer

Pandas, since version 0.24.1, can directly read a Google Cloud Storage URI.

For example:

gs://awesomefakebucket/my.csv

The service account attached to your function must have read access to the CSV file.

Please feel free to test and modify this code. I used Python 3.7.

function.py

from google.cloud import storage
import pandas as pd

def hello_world(request):
    # initialize the storage client
    client = storage.Client()
    # change the URI to point at your own file
    temp = pd.read_csv('gs://awesomefakebucket/my.csv', encoding='utf-8')
    print(temp.head())
    return 'check the results in the logs'

requirements.txt

google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0

answered Jan 02 '23 by Jan Hernandez