
Adding my own description attribute to a Pandas DataFrame

I am retrieving some web data, parsing it, and storing the output as a Pandas DataFrame in an HDF5 file. Right before I write the DataFrame to the HDF5 file, I add my own description string to annotate some metadata about where the data came from and whether anything went wrong while parsing it.

In [1]: my_data_frame.desc = "Some string about the data"

In [2]: my_data_frame.desc
Out[2]: 'Some string about the data'

In [3]: print(type(my_data_frame))
<class 'pandas.core.frame.DataFrame'>

However, after loading the same data back with pandas.io.pytables.HDFStore(), my added desc attribute is missing, and I get the error AttributeError: 'DataFrame' object has no attribute 'desc', as if I had never added it.
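A small sketch of why `desc` disappears, assuming a recent pandas (1.0 or later): assigning `df.desc = ...` just creates an ordinary Python instance attribute, which is not part of the DataFrame's data, so anything that builds a new DataFrame (a `copy()`, or reading the frame back from a file) comes back without it. For contrast, the sketch also uses `DataFrame.attrs`, an experimental metadata dictionary added in pandas 1.0, which pandas does carry over through operations such as `copy()`. The column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0], "volume": [10, 20]})

# Ad-hoc attribute: stored only on this particular Python object.
df.desc = "Some string about the data"
print(df.desc)

# Building a new DataFrame does not carry the ad-hoc attribute along.
copied = df.copy()
print(hasattr(copied, "desc"))  # False

# DataFrame.attrs (experimental, pandas >= 1.0) is part of the frame's
# metadata, and copy() propagates it to the new object.
df.attrs["desc"] = "Some string about the data"
print(df.copy().attrs["desc"])
```

Note that `attrs` mainly helps in memory; it should not be assumed to survive a round trip through an HDF5 file.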

How can I get my metadata descriptions to persist as an extra attribute of the DataFrame object? (Or is there some existing, recognized attribute of a DataFrame that I can hijack for my metadata purposes?)

asked Jul 26 '12 at 15:07 by ely




1 Answer

Adding DataFrame metadata or per-column metadata is on the roadmap but hasn't been implemented yet. I'm open to ideas about what the API should look like, though.

answered Oct 12 '22 at 04:10 by Wes McKinney
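Without DataFrame-level metadata, one common workaround is to attach the metadata to the HDF5 node that holds the frame rather than to the DataFrame object itself. The sketch below assumes the optional PyTables (`tables`) package is installed; `save_with_metadata` and `load_with_metadata` are hypothetical helper names, and `HDFStore.get_storer(key).attrs` is the PyTables attribute set of the group pandas writes for that key.

```python
import pandas as pd  # HDFStore needs the optional PyTables ("tables") package


def save_with_metadata(path, key, df, metadata):
    """Write df under `key` and attach the `metadata` dict to that node."""
    # mode="w" creates (or overwrites) the file.
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, df)
        # The storer's .attrs is the PyTables attribute set of the HDF5 group;
        # PyTables serializes the dict so it can be read back later.
        store.get_storer(key).attrs.metadata = metadata


def load_with_metadata(path, key):
    """Read the DataFrame stored under `key` plus its metadata dict."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        metadata = store.get_storer(key).attrs.metadata
    return df, metadata
```

Usage would look like `save_with_metadata("data.h5", "mydata", df, {"desc": "Some string about the data"})` followed by `df, meta = load_with_metadata("data.h5", "mydata")`. Keeping the metadata on the file node means it travels with the data on disk, independent of which DataFrame object is in memory.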