I want to ensure that the first value of val2 corresponding to each vintage is NaN. Currently two are already NaN, but I want to ensure that 0.53 also changes to NaN.
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})
Here's what I've tried so far:
df.groupby('vintage').first().val2 #This gives the first non-NaN values, as shown below
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
df.groupby('vintage').first().val2 = np.nan #This doesn't change anything
df.val2
0 NaN
1 0.66
2 0.81
3 0.53
4 0.62
5 NaN
You can't assign to the result of an aggregation, and first ignores existing NaN values. What you can do instead is call head(1), which returns the first row of each group with its original index, and pass that index to loc to select those rows in the original df and overwrite the column values:
In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df
Out[91]:
date val1 val2 vintage
0 2017-01-01 0.59 NaN 2017-01-01
1 2017-02-01 0.68 0.66 2017-01-01
2 2017-03-01 0.80 0.81 2017-01-01
3 2017-02-01 0.54 NaN 2017-02-01
4 2017-03-01 0.61 0.62 2017-02-01
5 2017-03-01 0.60 NaN 2017-03-01
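For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same approach. The variable name first_rows is only an illustrative choice, and the snippet assumes the default integer index from the question's frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01',
                '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01',
             '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# head(1) keeps the original row labels, so its index identifies
# the first row of every vintage in the original frame.
first_rows = df.groupby('vintage')['val2'].head(1).index

# Overwrite val2 on exactly those rows; all other rows are untouched.
df.loc[first_rows, 'val2'] = np.nan
```

Because the assignment goes through loc on the original df, it modifies the frame itself rather than a temporary aggregation result.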
Here you can see that head(1) returns the first row for each group:
In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]:
0 NaN
3 0.53
5 NaN
Name: val2, dtype: float64
Contrast this with first, which returns the first non-NaN value for each group, or NaN if the group contains only NaN values:
In[95]:
df.groupby('vintage')['val2'].first()
Out[95]:
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
Name: val2, dtype: float64
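The difference between the two calls can also be checked side by side. This is a small sketch on the question's original val2 values; the names grouped, first_rows and first_valid are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01',
                '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

grouped = df.groupby('vintage')['val2']

# head(1): first row of each group, NaN included, labelled by original row index.
first_rows = grouped.head(1)

# first(): first non-NaN value of each group, labelled by the group key.
first_valid = grouped.first()
```

Since first_valid is indexed by vintage rather than by row label, it cannot be used directly to address rows in the original df, which is why head(1) is the one passed to loc.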