How do I read a Parquet file on S3 using Dask and a specific AWS profile (stored in a credentials file)? Dask uses s3fs, which uses boto3. This is what I have tried:
>>> import os
>>> import s3fs
>>> import boto3
>>> import dask.dataframe as dd
>>> os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "~/.aws/credentials"
>>> fs = s3fs.S3FileSystem(anon=False, profile_name="some_user_profile")
>>> fs.exists("s3://some.bucket/data/parquet/somefile")
True
>>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile')
NoCredentialsError: Unable to locate credentials
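The exists() check succeeds because fs carries the profile, but dd.read_parquet builds its own S3FileSystem from storage_options and never sees that fs object, so it falls back to the default credential chain. One workaround, a sketch that is not from the original post and assumes botocore honours the standard AWS_PROFILE environment variable, is to select the profile through the environment:
>>> import os
>>> import dask.dataframe as dd
>>> # AWS_PROFILE is read by botocore, so s3fs should pick the profile up
>>> # without any explicit arguments; the profile name here is assumed
>>> os.environ['AWS_PROFILE'] = "some_user_profile"
>>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile')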
Never mind, that was easy, but I did not find any reference online, so here it is:
>>> import os
>>> import dask.dataframe as dd
>>> os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "/path/to/credentials"
>>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
...                      storage_options={"profile_name": "some_user_profile"})
>>> df.head()
# works
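Note that on newer s3fs releases (0.5 and later, aiobotocore-based) the keyword appears to be profile rather than profile_name, so the same call may need to be spelled as below. This variant is an assumption based on the current s3fs API, not part of the original answer; check your installed version.
>>> # assumed spelling for newer s3fs: "profile" instead of "profile_name"
>>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
...                      storage_options={"profile": "some_user_profile"})
>>> df.head()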