Python: How to read and load an Excel file from AWS S3?

I have uploaded an Excel file to an AWS S3 bucket and now I want to read it in Python. Any help would be appreciated. Here is what I have so far:

import boto3
import os

aws_id = 'aws_id'
aws_secret = 'aws_secret_key'

client = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
bucket_name = 'my_bucket'
object_key = 'my_excel_file.xlsm'
object_file = client.get_object(Bucket=bucket_name, Key=object_key)
body = object_file['Body']
data = body.read()

What do I need to do next in order to read this data and work on it?

exan asked Nov 23 '18 01:11



3 Answers

I spent quite some time on this; here's how I got it working:

import io

import boto3
import pandas as pd

aws_id = ''
aws_secret = ''
bucket_name = ''
object_key = ''

s3 = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
obj = s3.get_object(Bucket=bucket_name, Key=object_key)
data = obj['Body'].read()  # raw bytes of the Excel file
# Wrap the bytes in a file-like buffer; recent pandas versions reject the
# old encoding= argument, and .xlsx/.xlsm files need openpyxl installed.
df = pd.read_excel(io.BytesIO(data))
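
If you need this in more than one place, the same steps can be wrapped in a small helper. This is only a sketch: the function names fetch_s3_object and read_excel_from_s3 are made up for illustration, the client is assumed to be a boto3 S3 client (anything with a compatible get_object works), and the openpyxl engine is assumed to be installed for .xlsx/.xlsm files.

```python
import io


def fetch_s3_object(client, bucket_name, object_key):
    """Download one S3 object and return its bytes in a seekable buffer."""
    response = client.get_object(Bucket=bucket_name, Key=object_key)
    # response["Body"] is a streaming body; read() pulls the whole object into memory.
    return io.BytesIO(response["Body"].read())


def read_excel_from_s3(client, bucket_name, object_key, **read_excel_kwargs):
    """Load an Excel object from S3 into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so fetch_s3_object works without pandas

    buffer = fetch_s3_object(client, bucket_name, object_key)
    # Assumption: the file is .xlsx/.xlsm, so openpyxl is used unless the caller overrides it.
    read_excel_kwargs.setdefault("engine", "openpyxl")
    return pd.read_excel(buffer, **read_excel_kwargs)
```

With client = boto3.client('s3'), a call would look like df = read_excel_from_s3(client, 'my_bucket', 'my_excel_file.xlsm', sheet_name=0). Creating the client without explicit keys lets boto3 pick up credentials from the environment or your AWS config, which avoids hard-coding secrets in the script.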
exan answered Oct 09 '22 16:10


You can read an .xls file directly from S3 without downloading or saving it locally. The xlrd module can build a workbook object from raw bytes. Note that xlrd 2.0 and later reads only the legacy .xls format, so this does not apply to .xlsx or .xlsm files. Here is the code snippet:

from boto3 import Session
from xlrd.book import open_workbook_xls

aws_id = ''
aws_secret = ''
bucket_name = ''
object_key = ''

s3_session = Session(aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)
bucket_object = s3_session.resource('s3').Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()  # raw bytes of the .xls file
workbook = open_workbook_xls(file_contents=content)  # parsed in memory, no temp file
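
Once you have the workbook object, getting at the cell values might look like the sketch below. The helper name workbook_to_rows is made up for illustration; it relies only on the xlrd calls sheet_by_index, nrows and row_values, and assumes you want the first sheet unless told otherwise.

```python
def workbook_to_rows(workbook, sheet_index=0):
    """Return every row of one sheet as a list of cell-value lists."""
    sheet = workbook.sheet_by_index(sheet_index)
    # row_values(i) returns the cell values of row i as a plain Python list.
    return [sheet.row_values(row_idx) for row_idx in range(sheet.nrows)]
```

For example, rows = workbook_to_rows(workbook) gives you a list of lists, where rows[0] is typically the header row if the sheet has one.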
Rhythm Chopra answered Oct 09 '22 16:10


You can read Excel files directly using awswrangler.s3.read_excel. Note that any pandas.read_excel() arguments (sheet name, etc.) can be passed through to it.

import awswrangler as wr
s3_uri = 's3://my_bucket/my_excel_file.xlsm'  # example URI; use your own bucket and key
df = wr.s3.read_excel(path=s3_uri)
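
To show what passing a pandas argument through looks like, here is a rough sketch that builds the URI from the bucket and key used in the question and forwards sheet_name. The helper names build_s3_uri and read_sheet are invented for this example, and it assumes awswrangler is installed and AWS credentials are already configured in your environment.

```python
def build_s3_uri(bucket_name, object_key):
    """Combine a bucket name and an object key into an s3:// URI."""
    return f"s3://{bucket_name}/{object_key}"


def read_sheet(bucket_name, object_key, sheet_name=0):
    """Read one sheet of an Excel object on S3 (hypothetical helper)."""
    import awswrangler as wr  # imported here so build_s3_uri works without awswrangler

    # sheet_name is not an awswrangler option; it is forwarded to pandas.read_excel.
    return wr.s3.read_excel(path=build_s3_uri(bucket_name, object_key), sheet_name=sheet_name)
```

A call such as read_sheet('my_bucket', 'my_excel_file.xlsm', sheet_name='Sheet1') would then return that sheet as a DataFrame; 'Sheet1' is just a placeholder name.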
milihoosh answered Oct 09 '22 17:10