I am trying to read a CSV file that was uploaded to GCS.
I want to read the file from a Cloud Function in GCP and work with the CSV data as a pandas DataFrame.
However, I can't read the CSV file with pandas.
This is the code I use to read the CSV file on GCS from the Cloud Function:
# imports assumed from the names used below (gcs, pd, BytesIO)
from io import BytesIO

import pandas as pd
from google.cloud import storage as gcs


def read_csvfile(data, context):
    try:
        bucket_name = "my_bucket_name"
        file_name = "my_csvfile_name.csv"
        project_name = "my_project_name"
        # create the GCS client
        client = gcs.Client(project_name)
        bucket = client.get_bucket(bucket_name)
        # create the blob and download its content as bytes
        blob = gcs.Blob(file_name, bucket)
        content = blob.download_as_string()
        train = pd.read_csv(BytesIO(content))
        print(train.head())
    except Exception as e:
        print("error:{}".format(e))
When I ran this code, I got the following error:
No columns to parse from file
Some websites say this error means that I am reading an empty CSV file, but the file I uploaded is not empty. How can I solve this problem?
Please help me. Thanks.
---- Added on 2020/08/08 ----
Thank you for your help!
Unfortunately, I still could not read the CSV file with your code. I still get the error "No columns to parse from file".
So I tried a new approach: reading the CSV file as bytes. The new Python code is below.
MAIN.PY
from google.cloud import storage
import pandas as pd
import io
import csv
from io import BytesIO


def check_columns(data, context):
    try:
        object_name = data['name']
        bucket_name = data['bucket']
        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        data = blob.download_as_string()
        # read the uploaded CSV file as bytes
        f = io.StringIO(str(data))
        df = pd.read_csv(f, encoding="shift-jis")
        print("df:{}".format(df))
        print("df.columns:{}".format(df.columns))
        print("The number of columns:{}".format(len(df.columns)))
    # the except clause is missing from the post; this one mirrors the first snippet
    except Exception as e:
        print("error:{}".format(e))
REQUIREMENTS.TXT
Click==7.0
Flask==1.0.2
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
Pillow==5.4.1
qrcode==6.1
six==1.12.0
Werkzeug==0.14.1
google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0
The output I got is below.
df:Empty DataFrame
Columns: [b'Apple, Lemon, Orange, Grape]
Index: []
df.columns:Index(['b'Apple', 'Lemon', 'Orange', 'Grape'])
The number of columns:4
So it seems I could read only the first record of the CSV file, and it ended up in df.columns. I could not get the other records, and the first row is not actually a header but an ordinary record.
How can I get the records in the CSV file as a DataFrame with pandas?
Could you help me again? Thank you.
Since version 0.24.1, pandas can read a Google Cloud Storage URI directly. This relies on the gcsfs package being installed.
For example:
gs://awesomefakebucket/my.csv
The service account attached to your function must have read access to the CSV file.
Please feel free to test and modify this code.
I used Python 3.7.
function.py
from google.cloud import storage
import pandas as pd


def hello_world(request):
    # initializing the storage client is mandatory
    client = storage.Client()
    # please change the file's URI
    temp = pd.read_csv('gs://awesomefakebucket/my.csv', encoding='utf-8')
    print(temp.head())
    return 'check the results in the logs'
requirements.txt
google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0
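Regarding the b'Apple column in the question's update: calling str() on a bytes object does not decode it. It returns the printable representation, including the leading b' and with line breaks escaped as the literal characters \r\n, so a CSV parser sees one long line and no data rows. The sketch below illustrates this with the standard csv module so it runs without GCS access. The payload is a hypothetical stand-in for what blob.download_as_string() returns, not the asker's actual file.

```python
import csv
import io

# Hypothetical payload standing in for the bytes returned by
# blob.download_as_string(); the real file content is not known.
raw = "Apple,Lemon,Orange,Grape\r\n1,2,3,4\r\n5,6,7,8\r\n".encode("shift-jis")

# str() on bytes gives the repr: a literal b'...' with escaped line breaks.
as_repr = str(raw)

# decode() turns the bytes into real text with real line breaks.
as_text = raw.decode("shift-jis")

# Parse both variants the same way to compare what a CSV reader sees.
repr_rows = list(csv.reader(io.StringIO(as_repr)))
text_rows = list(csv.reader(io.StringIO(as_text)))

print(len(repr_rows), repr_rows[0][0])  # one row, first field starts with b'
print(len(text_rows), text_rows[0])     # header row plus two data rows
```

With pandas, the same idea would be either pd.read_csv(io.StringIO(data.decode("shift-jis"))) or pd.read_csv(io.BytesIO(data), encoding="shift-jis"), assuming the file really is Shift-JIS encoded. Note that the encoding argument has no effect when read_csv is given an already decoded StringIO.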