I am retrieving some web data, parsing it, and storing the output as a pandas DataFrame in an HDF5 file. Right before I write the DataFrame
into the H5 file, I attach my own description string as an attribute, to record some metadata about where the data came from and whether anything went wrong while parsing it.
In [1]: my_data_frame.desc = "Some string about the data"
In [2]: my_data_frame.desc
Out[2]: 'Some string about the data'
In [3]: print(type(my_data_frame))
<class 'pandas.core.frame.DataFrame'>
However, after loading the same data with pandas.io.pytables.HDFStore(), my added desc attribute is missing and I get the error AttributeError: 'DataFrame' object has no attribute 'desc', as if I had never added this new attribute.
How can I get my metadata descriptions to persist as an extra attribute of the DataFrame object? (Or is there some existing, recognized attribute of a DataFrame that I can hijack for my metadata purposes?)
Pandas DataFrame describe() Method
The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following information for each column: count, the number of non-empty values; mean, the average (mean) value.
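A minimal sketch of describe() on a small, made-up numeric DataFrame (the column names and values are purely illustrative). The result is itself a DataFrame with one column per numeric input column and one row per statistic.

```python
import pandas as pd

# Small made-up numeric DataFrame for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# describe() returns a new DataFrame: one column per numeric column,
# one row per statistic (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)

# Individual statistics can be read back with .loc[statistic, column].
print(summary.loc["mean", "price"])
```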
You can get the column names of a pandas DataFrame with DataFrame.columns.values, and get them as a Python list by calling tolist() on the result. Each column in a pandas DataFrame has a label (name) that identifies the values it holds.
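A short sketch of both forms, using a hypothetical two-column DataFrame. columns.values gives a NumPy array, and tolist() converts it to a plain list.

```python
import pandas as pd

# Hypothetical example data; only the column labels matter here.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 42]})

# NumPy array of column labels.
names_array = df.columns.values

# Plain Python list of column labels.
names_list = df.columns.values.tolist()
print(names_list)  # ['name', 'age']
```

df.columns.tolist() gives the same list without going through .values.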
Adding DataFrame metadata or per-column metadata is on the roadmap but hasn't been implemented yet. I'm open to ideas about what the API should look like, though.
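Until first-class metadata exists, one commonly used workaround is to attach the metadata to the HDF5 node that stores the DataFrame rather than to the DataFrame object itself, via the storer's PyTables attribute set (HDFStore.get_storer(key).attrs). The sketch below assumes PyTables (the tables package) is installed; the helper names save_with_metadata and load_with_metadata are hypothetical, not part of pandas. Newer pandas releases also add an experimental DataFrame.attrs dict, but I would not count on it surviving a round trip through HDFStore.

```python
import pandas as pd


def save_with_metadata(path, key, df, metadata):
    """Hypothetical helper: write df under `key` and attach `metadata`
    to the same HDF5 node (requires PyTables)."""
    # mode="a" keeps other keys in the file; put() replaces `key` if present.
    with pd.HDFStore(path, mode="a") as store:
        store.put(key, df)
        # PyTables pickles arbitrary Python objects stored as node attributes.
        store.get_storer(key).attrs.metadata = metadata


def load_with_metadata(path, key):
    """Hypothetical helper: read back the DataFrame and its metadata."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        metadata = store.get_storer(key).attrs.metadata
    return df, metadata
```

Usage would look like save_with_metadata("data.h5", "my_data", my_data_frame, {"desc": "Some string about the data"}), followed later by df, meta = load_with_metadata("data.h5", "my_data"). The metadata lives in the file next to the data, so it survives reloading even though the DataFrame object itself never carries it.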