I have already read through the answers available here and here, and they do not help.
I am trying to read a CSV object from an S3 bucket and have been able to read the data successfully using the following code.
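from boto.s3.connection import S3Connection
from boto.s3.key import Key

srcFileName = "gossips.csv"

def on_session_started():
    print("Starting new session.")
    # Credentials are picked up from the environment or the boto config file.
    conn = S3Connection()
    my_bucket = conn.get_bucket("randomdatagossip", validate=False)
    print("Bucket Identified")
    print(my_bucket)
    # Fetch the object by key and print its raw contents.
    key = Key(my_bucket, srcFileName)
    key.open()
    print(key.read())
    conn.close()

on_session_started()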
However, if I try to read the same object into a pandas data frame, I get an error. The most common one is S3ResponseError: 403 Forbidden.
import smart_open

def on_session_started2():
    print("Starting Second new session.")
    conn = S3Connection()
    my_bucket = conn.get_bucket("randomdatagossip", validate=False)
    # url = "https://s3.amazonaws.com/randomdatagossip/gossips.csv"
    # urllib2.urlopen(url)
    for line in smart_open.smart_open('s3://my_bucket/gossips.csv'):
        print line
    # data = pd.read_csv(url)
    # print(data)

on_session_started2()
What am I doing wrong? I am on Python 2.7 and cannot use Python 3.
Sometimes we need to read a CSV file directly from an Amazon S3 bucket. There are several ways to do this; the most common is to use the csv module. Step 1: create an S3 client object with the S3 access key, secret key, and region (assuming the reader already knows what an access key and a secret key are).
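The step described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: it assumes boto3 is installed, is written for Python 3, and uses hypothetical helper names (`make_s3_client`, `read_csv_rows`) and placeholder credentials that are not from the original post.

```python
import csv
import io


def make_s3_client(access_key, secret_key, region):
    """Create an S3 client from explicit credentials (hypothetical helper)."""
    # Imported here so the parsing helper below can be used without boto3.
    import boto3
    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )


def read_csv_rows(s3_client, bucket, key, encoding="utf-8"):
    """Download one object and parse it into a list of rows with the csv module."""
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # 'Body' is a streaming object; read it fully and decode to text.
    text = obj["Body"].read().decode(encoding)
    return list(csv.reader(io.StringIO(text)))
```

With real credentials, usage would be along the lines of `rows = read_csv_rows(make_s3_client(ACCESS_KEY, SECRET_KEY, "us-east-1"), "yourbucket", "your_file.csv")`, where the key values and the region are placeholders.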
Here is what I have done to successfully read the df from a CSV on S3.
import pandas as pd
import boto3

bucket = "yourbucket"
file_name = "your_file.csv"

# 's3' is a key word. Create connection to S3 using default config and all buckets within S3.
s3 = boto3.client('s3')

# Get object and file (key) from bucket.
obj = s3.get_object(Bucket=bucket, Key=file_name)

# 'Body' is a key word.
initial_df = pd.read_csv(obj['Body'])
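Relating this back to the question: the URL in the second snippet, 's3://my_bucket/gossips.csv', names a bucket literally called my_bucket rather than randomdatagossip, which may be one source of the 403. The sketch below splits an s3:// URL into the bucket and key that `get_object` expects, which makes that kind of mix-up easy to spot. The helper names `parse_s3_url` and `fetch_csv_buffer` are hypothetical, and the code is written for Python 3 (on Python 2.7 `urlparse` lives in the `urlparse` module instead of `urllib.parse`).

```python
import io
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split 's3://bucket/key' into a (bucket, key) tuple."""
    parts = urlparse(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError("expected an s3://bucket/key URL, got %r" % url)
    # netloc is the bucket name; the path (minus its leading slash) is the key.
    return parts.netloc, parts.path.lstrip("/")


def fetch_csv_buffer(s3_client, url):
    """Return the object's bytes as a seekable in-memory buffer."""
    bucket, key = parse_s3_url(url)
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # Buffering the whole body gives a seekable file-like object.
    return io.BytesIO(obj["Body"].read())
```

The returned buffer can be handed to pandas, for example `pd.read_csv(fetch_csv_buffer(s3, "s3://yourbucket/your_file.csv"))`. Reading the whole object into memory is a simplification that suits small files.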