I'm storing pandas DataFrames dumped in HDF format on S3. I'm pretty much stuck, as I can't pass a file pointer, a URL, an S3 URL, or a StringIO object to read_hdf. If I understand it correctly, the file must be present on the filesystem.
Source: https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L315
It looks like this is implemented for CSV but not for HDF. Is there any better way to open those HDF files than copying them to the filesystem?
For the record, these HDF files are being handled on a web server, which is why I don't want a local copy.
If I need to stick with a local file: is there any way to emulate that file on the filesystem (with a real path) so it can be destroyed after the read is done (roughly like the sketch below)?
I'm using Python 2.7 with Django 1.9 and pandas 0.18.1.
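The temporary-file route I have in mind would be something like this (just a sketch: boto3 is assumed to be available, and the bucket, object key, and HDF key names are placeholders):

```python
import tempfile

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Download the S3 object into a named temporary file so read_hdf gets a real
# path on disk; the file is deleted automatically when the block exits.
# "my-bucket", "frames/data.h5" and "df" are placeholder names.
with tempfile.NamedTemporaryFile(suffix=".h5") as tmp:
    s3.download_fileobj("my-bucket", "frames/data.h5", tmp)
    tmp.flush()
    df = pd.read_hdf(tmp.name, key="df")
```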
df = pandas.read_hdf('fileName.hdf5') -> this loads the data into a pandas DataFrame that you can use.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
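As a small illustration of that (the file name and keys below are made up, and PyTables must be installed), several DataFrames can live in one HDF5 file and be read back individually:

```python
import pandas as pd

# Two related DataFrames stored in one HDF5 file under separate keys.
orders = pd.DataFrame({"order_id": [1, 2], "total": [9.99, 24.50]})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Ada", "Linus"]})

with pd.HDFStore("shop.h5", mode="w") as store:
    store.put("orders", orders)
    store.put("customers", customers)

# Each object can later be read back on its own by key.
orders_again = pd.read_hdf("shop.h5", key="orders")
```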
Pandas supports writing your files directly to S3 using df.to_csv. It also supports Feather and Parquet files. The only drawback is that it will overwrite any existing object with the same name at the given S3 location.
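A minimal sketch of that, assuming a recent pandas with s3fs installed and credentials configured (the bucket and key names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Writing straight to S3; any existing object at this key is overwritten.
df.to_csv("s3://my-bucket/exports/data.csv", index=False)

# Parquet works the same way (needs pyarrow or fastparquet as well).
df.to_parquet("s3://my-bucket/exports/data.parquet")
```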
Newer versions of pandas allow reading an HDF5 file directly from S3, as mentioned in the read_hdf documentation, so perhaps you should upgrade pandas if you can. This of course assumes you've set the right access rights to read those files: either with a credentials file or with public ACLs.
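As a quick sanity check of those access rights, something like this (s3fs assumed; the bucket name is a placeholder) will fail early if the credentials are not being picked up:

```python
import s3fs

# Picks up credentials from ~/.aws/credentials, environment variables,
# or an instance role; use anon=True for a public bucket instead.
fs = s3fs.S3FileSystem(anon=False)

# Listing the bucket confirms the process can actually see the objects.
print(fs.ls("my-bucket"))
```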
Regarding your last comment, I am not sure why storing several HDF5 files (one per DataFrame) would necessarily argue against using HDF5. Pickle should be much slower than HDF5, though joblib.dump might partially improve on this.
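For reference, a joblib round-trip of a DataFrame is just a dump/load pair (the file name here is arbitrary):

```python
import joblib
import pandas as pd

df = pd.DataFrame({"a": range(1000)})

# joblib uses an optimised pickle that handles the numpy arrays backing a
# DataFrame efficiently; compress=3 trades some speed for smaller files.
joblib.dump(df, "df.joblib", compress=3)

df_restored = joblib.load("df.joblib")
```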