Change first element of each group in pandas DataFrame

I want to ensure that the first value of val2 corresponding to each vintage is NaN. Currently two are already NaN, but I want to ensure that 0.53 also changes to NaN.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})

Here's what I've tried so far:

df.groupby('vintage').first().val2 #This gives the first non-NaN values, as shown below

vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN

df.groupby('vintage').first().val2 = np.nan #This doesn't change anything
df.val2

0     NaN
1    0.66
2    0.81
3    0.53
4    0.62
5     NaN
Gaurav Bansal asked Sep 15 '17 14:09


1 Answer

You can't assign to the result of an aggregation, because it is a new object and not a view of the original DataFrame. Also, first skips existing NaN values. What you can do instead is call head(1), which returns the first row of each group with its original index, and pass those indices to loc to overwrite that column's values in the original df:

In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df

Out[91]: 
         date  val1  val2     vintage
0  2017-01-01  0.59   NaN  2017-01-01
1  2017-02-01  0.68  0.66  2017-01-01
2  2017-03-01  0.80  0.81  2017-01-01
3  2017-02-01  0.54   NaN  2017-02-01
4  2017-03-01  0.61  0.62  2017-02-01
5  2017-03-01  0.60   NaN  2017-03-01
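
For anyone who wants to run this end to end, here is a self-contained sketch of the same head(1) approach. It uses the data from the question; the names first_rows and nan_flags are only illustrative, and it assumes the DataFrame has a unique index (as the default RangeIndex here does), since loc selects by label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01',
                '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01',
             '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# head(1) keeps the original row labels, so its index identifies
# the first row of every vintage in the original frame.
first_rows = df.groupby('vintage')['val2'].head(1).index

# Overwrite val2 on exactly those rows, in place.
df.loc[first_rows, 'val2'] = np.nan

# Which rows of val2 are NaN after the assignment?
nan_flags = df['val2'].isna().tolist()
print(nan_flags)
```

Only the rows at index 0, 3 and 5 should end up as NaN; every other val2 value is left untouched.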

Here you can see that head(1) returns the first row of each group:

In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]: 
0     NaN
3    0.53
5     NaN
Name: val2, dtype: float64

Contrast this with first, which returns the first non-NaN value of each group, or NaN if the group contains only NaN values:

In[95]:
df.groupby('vintage')['val2'].first()

Out[95]: 
vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN
Name: val2, dtype: float64
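
As a possible alternative (a swapped-in technique, not part of the original answer), the first row of each group can also be marked with a boolean mask built from cumcount(), which numbers the rows within each group starting at 0. This is only a sketch on the val2 and vintage columns from the question; is_first is a hypothetical name chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01',
                '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# cumcount() gives each row its position within its group (0, 1, 2, ...),
# so comparing with 0 flags the first row of every vintage.
is_first = df.groupby('vintage').cumcount() == 0

# A boolean mask selects by position rather than by index label.
df.loc[is_first, 'val2'] = np.nan

print(df)
```

Because the mask is aligned row by row, this variant should also behave as expected when the index contains duplicate labels, a case where passing head(1).index to loc would overwrite every row sharing a label.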
EdChum answered Sep 21 '22 02:09