
Subclassing pandas DataFrame and setting a field in the constructor

I'm trying to subclass a pandas DataFrame. If I set a field on the instance after creating it, it works fine.

import seaborn as sns
import pandas as pd
df = sns.load_dataset('iris')

class Results(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        # use the __init__ method from DataFrame to ensure
        # that we're inheriting the correct behavior
        super(Results, self).__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return Results
    
result_object = Results(df)
result_object['scheme'] = 'not_default'
print(result_object.head(5))

>>>   sepal_length  sepal_width  petal_length  petal_width species       scheme
0           5.1          3.5           1.4          0.2  setosa  not_default
1           4.9          3.0           1.4          0.2  setosa  not_default
2           4.7          3.2           1.3          0.2  setosa  not_default
3           4.6          3.1           1.5          0.2  setosa  not_default
4           5.0          3.6           1.4          0.2  setosa  not_default

However, I don't understand how _constructor works under the hood well enough to tell why the following version, which sets the field inside __init__, does not work.

import seaborn as sns
import pandas as pd
df = sns.load_dataset('iris')

class Results(pd.DataFrame):
    def __init__(self, *args, scheme='default', **kwargs):
        # use the __init__ method from DataFrame to ensure
        # that we're inheriting the correct behavior
        super(Results, self).__init__(*args, **kwargs)
        self['scheme'] = scheme

    @property
    def _constructor(self):
        return Results

result_object = Results(df.copy(), scheme='not_default')
print(result_object.head(5))

>>>
# scheme is still 'default'
   sepal_length  sepal_width  petal_length  petal_width species   scheme
0           5.1          3.5           1.4          0.2  setosa  default
1           4.9          3.0           1.4          0.2  setosa  default
2           4.7          3.2           1.3          0.2  setosa  default
3           4.6          3.1           1.5          0.2  setosa  default
4           5.0          3.6           1.4          0.2  setosa  default

Notice that the scheme column still says 'default'.

Is there any way to set a field in the instance constructor?

jwillis0720 asked Mar 16 '21 00:03




1 Answer

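In your current version, scheme is set as requested on the instance itself. You can see this by accessing it directly (columns are reachable as attributes, much like .index and .columns):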

result_object.scheme

# 0      not_default
# 1      not_default
#           ...     
# 148    not_default
# 149    not_default
# Name: scheme, Length: 150, dtype: object
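
To see why head() still prints 'default', it can help to log each call to __init__. The following is a minimal sketch, assuming a small hand-made frame in place of the iris data and a hypothetical module-level list init_calls that exists only for logging. pandas builds the result of head() through _constructor, which calls Results(...) again without a scheme argument, so the new object's __init__ overwrites the column with the default value.

```python
import pandas as pd

init_calls = []  # hypothetical log of the scheme value seen by each __init__ call

class Results(pd.DataFrame):
    def __init__(self, *args, scheme='default', **kwargs):
        super().__init__(*args, **kwargs)
        init_calls.append(scheme)
        # runs on every construction, including the ones pandas makes internally
        self['scheme'] = scheme

    @property
    def _constructor(self):
        return Results

result_object = Results(pd.DataFrame({'a': [1, 2, 3]}), scheme='not_default')
# head() builds its result via _constructor, which passes no scheme argument
head = result_object.head(2)

print(init_calls)               # the first entry is the explicit call
print(head['scheme'].tolist())  # the values carried by the object head() returned
```

The log should show the explicit 'not_default' call first, followed by at least one internal call that fell back to 'default'.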

To make it a proper column, you can modify the incoming data before sending it to super():

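import pandas as pd
import seaborn as sns

class Results(pd.DataFrame):
    def __init__(self, data=None, *args, scheme='default', **kwargs):

        # add column to incoming data; this modifies the caller's frame
        # in place, which is why a copy is passed in below
        if isinstance(data, pd.DataFrame):
            data['scheme'] = scheme

        # pass data positionally so extra positional args (e.g. index) don't collide with it
        super().__init__(data, *args, **kwargs)

    @property
    def _constructor(self):
        return Results

df = sns.load_dataset('iris')
result_object = Results(df.copy(), scheme='not_default')
print(result_object.head(5))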

#    sepal_length  sepal_width  petal_length  petal_width species       scheme
# 0           5.1          3.5           1.4          0.2  setosa  not_default
# 1           4.9          3.0           1.4          0.2  setosa  not_default
# 2           4.7          3.2           1.3          0.2  setosa  not_default
# 3           4.6          3.1           1.5          0.2  setosa  not_default
# ...         ...          ...           ...          ...     ...          ...
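
The isinstance check above depends on what pandas passes to _constructor internally, which is an implementation detail and may differ between pandas versions. An alternative sketch that leaves __init__ untouched is to add the column in a separate factory method. The name with_scheme is hypothetical, and the small frame below stands in for the iris data.

```python
import pandas as pd

class Results(pd.DataFrame):
    @property
    def _constructor(self):
        return Results

    @classmethod
    def with_scheme(cls, data, scheme='default'):
        # hypothetical helper: copy the input so the caller's object is not modified
        obj = cls(data, copy=True)
        obj['scheme'] = scheme
        return obj

source = pd.DataFrame({'a': [1, 2, 3]})
result_object = Results.with_scheme(source, scheme='not_default')

# head() is still rebuilt via _constructor, but no code resets the column
head = result_object.head(2)
print(type(head).__name__)
print(head['scheme'].tolist())
```

Because __init__ is inherited unchanged, objects that pandas creates internally keep whatever columns the data already has.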
tdy answered Oct 20 '22 00:10
