 

Write pandas DataFrame to HDF in memory buffer

Tags:

python

pandas

hdf

I want to get a DataFrame as HDF5 in memory. The code below raises "AttributeError: '_io.BytesIO' object has no attribute 'put'". I am using Python 3.5 and pandas 0.17.

import pandas as pd
import numpy as np
import io

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])
buf = io.BytesIO()
df.to_hdf(buf, 'some_key')

Update: As UpSampler pointed out, "path_or_buf" cannot be an io stream (which I find confusing, since a buf can usually be an io stream; see to_csv). Other than writing to disk and reading the file back in, is there a way to get a DataFrame as HDF5 in memory?
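The to_csv comparison in the update can be illustrated with a short sketch. Text writers such as to_csv accept any object with a write() method, so an io.StringIO buffer works directly. HDF5 output, by contrast, goes through PyTables and the HDF5 C library, which (as far as I know) open files by name rather than through Python stream objects, which is why to_hdf wants a path or an HDFStore.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=["a", "b"])

# to_csv writes into any file-like object, including an in-memory buffer.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# read_csv can likewise read straight from a buffer.
restored = pd.read_csv(io.StringIO(csv_text))
```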

asked Jan 06 '17 13:01 by user2133814




1 Answer

The first argument to df.to_hdf() has to be a "path (string) or HDFStore object", not an io stream. Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.to_hdf.html

answered Sep 18 '22 13:09 by UpSampler