I want to get a DataFrame as HDF in memory. The code below raises "AttributeError: '_io.BytesIO' object has no attribute 'put'". I am using Python 3.5 and pandas 0.17.
import pandas as pd
import numpy as np
import io
df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])
buf = io.BytesIO()
df.to_hdf(buf, 'some_key')
Update: As UpSampler pointed out, "path_or_buf" cannot be an I/O stream (which I find confusing, since buf usually can be an I/O stream; see to_csv). Other than writing to disk and reading it back in, can I get a DataFrame as HDF in memory?
DataFrame - to_hdf() function. The to_hdf() function writes the contained data to an HDF5 file using HDFStore. Its first argument is a file path or an HDFStore object.
The pyarrow library is able to construct a pandas.DataFrame faster than using pandas.
pickle saves the DataFrame in its current state, so the data and its format are preserved. This can lead to large performance gains.
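Since the question is about getting a serialized DataFrame in memory, here is a minimal sketch of the pickle route using only the standard-library pickle module. It produces pickle bytes, not HDF5, so it only helps if the consumer does not specifically need HDF. The variable names payload and restored are illustrative.

```python
import pickle

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=["a", "b"])

# Serialize the DataFrame to a bytes object without touching the disk.
payload = pickle.dumps(df)

# Deserialize it again; values, column names and dtypes come back unchanged.
restored = pickle.loads(payload)
```

Only unpickle bytes from a source you trust, since loading a pickle can execute arbitrary code.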
By default, pandas reports only the memory used by the NumPy array it uses to store the data. For strings, this is just 8 multiplied by the number of strings in the column, since NumPy is only storing 64-bit pointers.
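A small sketch of that difference, assuming a 64-bit Python build and a column explicitly stored with object dtype (newer pandas versions may pick a dedicated string dtype otherwise). The frame names and its contents are made up for illustration.

```python
import pandas as pd

# Force object dtype so the column holds pointers to Python str objects.
names = pd.DataFrame(
    {"name": pd.Series(["alice", "bob", "carol"], dtype=object)}
)

# Default: counts only the array of pointers (8 bytes each on 64-bit).
shallow = names.memory_usage(index=False)

# deep=True also counts the Python string objects being pointed to.
deep = names.memory_usage(index=False, deep=True)
```

Comparing shallow["name"] with deep["name"] shows how much of the real footprint the default figure leaves out.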
The first argument to df.to_hdf() has to be a "path (string) or HDFStore object", not an I/O stream. Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.to_hdf.html
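One possible way to stay in memory is to hand PyTables the HDF5 "core" driver through HDFStore, which forwards extra keyword arguments to PyTables. The sketch below assumes PyTables is installed and reasonably recent; it has not been checked against pandas 0.17. It also relies on store._handle, a private pandas attribute holding the underlying PyTables file, so treat it as a workaround rather than a supported API. The file name "in-memory.h5" is only a label and is never created on disk.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=["a", "b"])

# driver="H5FD_CORE" keeps the HDF5 file in RAM;
# driver_core_backing_store=0 tells it not to write the file on close.
with pd.HDFStore(
    "in-memory.h5",
    mode="a",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
) as store:
    store.put("some_key", df)
    store.flush()
    # _handle is private (assumption about pandas internals): it is the
    # PyTables File object, whose get_file_image() returns the file as bytes.
    hdf_bytes = store._handle.get_file_image()

# Load the bytes back by passing them as the initial image of a core file.
with pd.HDFStore(
    "in-memory.h5",
    mode="r",
    driver="H5FD_CORE",
    driver_core_backing_store=0,
    driver_core_image=hdf_bytes,
) as store:
    restored = store["some_key"]
```

After this runs, hdf_bytes holds a complete HDF5 file image that can be sent over a network or stored in a database, and restored is a copy of the original DataFrame.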