I'm trying to subclass a pandas data structure. If I set a field on the instance after constructing it, it works fine.

    import seaborn as sns
    import pandas as pd

    df = sns.load_dataset('iris')

    class Results(pd.DataFrame):
        def __init__(self, *args, **kwargs):
            # use the __init__ method from DataFrame to ensure
            # that we're inheriting the correct behavior
            super(Results, self).__init__(*args, **kwargs)

        @property
        def _constructor(self):
            return Results

    result_object = Results(df)
    result_object['scheme'] = 'not_default'
    print(result_object.head(5))
    #    sepal_length  sepal_width  petal_length  petal_width species       scheme
    # 0           5.1          3.5           1.4          0.2  setosa  not_default
    # 1           4.9          3.0           1.4          0.2  setosa  not_default
    # 2           4.7          3.2           1.3          0.2  setosa  not_default
    # 3           4.6          3.1           1.5          0.2  setosa  not_default
    # 4           5.0          3.6           1.4          0.2  setosa  not_default

I don't understand the _constructor method well enough under the hood to tell why this next version, which sets the column inside __init__, does not work.

    import seaborn as sns
    import pandas as pd

    df = sns.load_dataset('iris')

    class Results(pd.DataFrame):
        def __init__(self, *args, scheme='default', **kwargs):
            # use the __init__ method from DataFrame to ensure
            # that we're inheriting the correct behavior
            super(Results, self).__init__(*args, **kwargs)
            self['scheme'] = scheme

        @property
        def _constructor(self):
            return Results

    result_object = Results(df.copy(), scheme='not_default')
    print(result_object.head(5))
    # scheme is still 'default'
    #    sepal_length  sepal_width  petal_length  petal_width species   scheme
    # 0           5.1          3.5           1.4          0.2  setosa  default
    # 1           4.9          3.0           1.4          0.2  setosa  default
    # 2           4.7          3.2           1.3          0.2  setosa  default
    # 3           4.6          3.1           1.5          0.2  setosa  default
    # 4           5.0          3.6           1.4          0.2  setosa  default

Notice the scheme field still says default.
Is there any way to set a field in the instance constructor?

Your current version creates scheme as an attribute (like .index, .columns):

    result_object.scheme
    # 0      not_default
    # 1      not_default
    #           ...
    # 148    not_default
    # 149    not_default
    # Name: scheme, Length: 150, dtype: object

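As for why `head()` still shows `default`, here is a rough sketch of what seems to be going on: pandas builds the result of operations such as `head()` by calling `_constructor`, which runs `__init__` again without any `scheme` argument, so a line like `self['scheme'] = scheme` would rewrite the column with the default value on the new object. The `Traced` class and its `calls` list are hypothetical names used only to record what `__init__` receives, and a small inline frame stands in for the iris data.

```python
import pandas as pd


class Traced(pd.DataFrame):
    # Hypothetical helper: class-level log of the `scheme` seen by each __init__ call.
    calls = []

    def __init__(self, *args, scheme="default", **kwargs):
        super().__init__(*args, **kwargs)
        Traced.calls.append(scheme)

    @property
    def _constructor(self):
        return Traced


obj = Traced({"a": [1, 2, 3]}, scheme="not_default")
Traced.calls.clear()   # forget the explicit construction above

head = obj.head(2)     # pandas constructs the result via _constructor
print(Traced.calls)    # only internal calls are logged, none of them pass scheme
```

None of the internal calls carry `scheme='not_default'`, which is why anything `__init__` derives from that argument falls back to the default on derived objects.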
To make it a proper column, you can modify the incoming data before sending it to super():

    class Results(pd.DataFrame):
        def __init__(self, data=None, *args, scheme='default', **kwargs):
            # add column to incoming data (note: this modifies the
            # DataFrame that was passed in, so pass a copy)
            if isinstance(data, pd.DataFrame):
                data['scheme'] = scheme
            # pass data positionally so extra positional args
            # don't collide with it
            super(Results, self).__init__(data, *args, **kwargs)

        @property
        def _constructor(self):
            return Results

    df = sns.load_dataset('iris')
    result_object = Results(df.copy(), scheme='not_default')
    #      sepal_length  sepal_width  petal_length  petal_width species       scheme
    # 0             5.1          3.5           1.4          0.2  setosa  not_default
    # 1             4.9          3.0           1.4          0.2  setosa  not_default
    # 2             4.7          3.2           1.3          0.2  setosa  not_default
    # 3             4.6          3.1           1.5          0.2  setosa  not_default
    # ...           ...          ...           ...          ...     ...          ...

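Whether the `isinstance(data, pd.DataFrame)` check is true for the calls pandas makes through `_constructor` may depend on the pandas version, so here is an alternative sketch that does not rely on it: leave `__init__` alone and add the column in a separate factory method. The name `with_scheme` is hypothetical (it is not part of pandas), and the small inline frame is a stand-in for the iris data.

```python
import pandas as pd


class Results(pd.DataFrame):
    @property
    def _constructor(self):
        return Results

    @classmethod
    def with_scheme(cls, data, scheme="default"):
        # Hypothetical factory: copy the input, then add the column once,
        # outside __init__, so later _constructor calls cannot overwrite it.
        obj = cls(data, copy=True)
        obj["scheme"] = scheme
        return obj


frame = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7]})  # stand-in for iris
result_object = Results.with_scheme(frame, scheme="not_default")
print(result_object.head(2))
```

Because the column is ordinary data by the time `head()` runs, derived objects carry it along like any other column, and the original `frame` is left untouched.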