Reading a file from a private S3 bucket to a pandas dataframe

I'm trying to read a CSV file from a private S3 bucket to a pandas dataframe:

df = pandas.read_csv('s3://mybucket/file.csv') 

I can read a file from a public bucket, but reading a file from a private bucket results in an HTTP 403 (Forbidden) error.

I have configured the AWS credentials using aws configure.

I can download a file from a private bucket using boto3, which picks up the AWS credentials. It seems that I need to configure pandas to use those credentials as well, but I don't know how. For reference, the boto3 download that does work looks something like the sketch below (bucket and object names are the ones from the example above):
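
import boto3

# boto3 picks up the credentials written by `aws configure`
# (typically from ~/.aws/credentials)
s3 = boto3.client('s3')
s3.download_file('mybucket', 'file.csv', 'file.csv')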

asked Mar 04 '16 by IgorK



2 Answers

Pandas uses boto (not boto3) inside read_csv. You might be able to install boto and have it work correctly.

There are some issues with boto on Python 3.4.4 / 3.5.1. If you're on one of those versions, and until those are fixed, you can use boto3 like so:

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
df = pd.read_csv(obj['Body'])

The Body in that response is a file-like object with a .read method (returning bytes), which is enough for pandas.
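
If your pandas version balks at the streaming body, a variant that should also work (assuming the file fits in memory) is to read the bytes up front and wrap them in a buffer:

import io

# StreamingBody.read() returns the whole object as bytes;
# BytesIO gives pandas a seekable file-like object
body = obj['Body'].read()
df = pd.read_csv(io.BytesIO(body))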

answered Sep 28 '22 by TomAugspurger

Updated for Pandas 0.20.1

Pandas now uses s3fs to handle S3 connections; from the release notes:

pandas now uses s3fs for handling S3 connections. This shouldn’t break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas.

import os

import pandas as pd
from s3fs.core import S3FileSystem

# AWS keys stored in an ini file in the same path;
# refer to the boto3 docs for config settings
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'  # use forward slashes in S3 keys
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
# or with f-strings
df = pd.read_csv(s3.open(f'{bucket}/{key}', mode='rb'))
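
With s3fs installed, the one-liner from the question should also work as-is, since pandas hands the s3:// URL off to s3fs, which in turn picks up the credentials set up by aws configure:

import pandas as pd

# requires the s3fs package; credentials are read from ~/.aws/credentials
df = pd.read_csv('s3://mybucket/file.csv')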
answered Sep 28 '22 by spitfiredd