I have a SNS notification setup that triggers a Lambda function when a .xlsx file is uploaded to S3 bucket.
The lambda function reads the .xlsx file into Pandas DataFrame.
import os
import pandas as pd
import json
import xlrd
import boto3
def main(event, context):
message = event['Records'][0]['Sns']['Message']
parsed_message = json.loads(message)
src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
filepath = parsed_message['Records'][0]['s3']['object']['key']
s3 = boto3.resource('s3')
s3_client = boto3.client('s3')
obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
print(obj['Body'])
df = pd.read_excel(obj, header=2)
print(df.head(2))
I get an error as below:
Invalid file path or buffer object type: <type 'dict'>: ValueError
Traceback (most recent call last):
File "/var/task/handler.py", line 26, in main
df = pd.read_excel(obj, header=2)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/io/excel.py", line 307, in read_excel
io = ExcelFile(io, engine=engine)
File "/var/task/pandas/io/excel.py", line 376, in __init__
io, _, _, _ = get_filepath_or_buffer(self._io)
File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <type 'dict'>
How can I resolve this?
To read an excel file as a DataFrame, use the pandas read_excel() method. You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts this to the DataFrame structure, which is a tabular like structure.
Use pandas. read_excel() function to read excel sheet into pandas DataFrame, by default it loads the first sheet from the excel file and parses the first row as a DataFrame column name. Excel file has an extension . xlsx.
Read an Excel file into a pandas DataFrame. Supports xls , xlsx , xlsm , xlsb , odf , ods and odt file extensions read from a local filesystem or URL. Supports an option to read a single sheet or a list of sheets. Any valid string path is acceptable.
You can directly read xls file from S3 without having to download or save it locally. xlrd module has a provision to provide raw data to create workbook object.
It is perfectly normal! obj is a dictionnary, have u tried ?
df = pd.read_excel(obj['body'], header=2)
Pandas now supports s3 URL as a file path so it can read the excel file directly from s3 without downloading it first.
See here for a CSV example - https://stackoverflow.com/a/51777553/52954
try pd.read_excel(obj['Body'].read())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With