Change first element of each group in pandas DataFrame

Tags: python, pandas, dataframe

I want to ensure that the first value of val2 corresponding to each vintage is NaN. Currently two are already NaN, but I want to ensure that 0.53 also changes to NaN.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})

Here's what I've tried so far:

df.groupby('vintage').first().val2  # This gives the first non-NaN value of each group, as shown below

vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN

df.groupby('vintage').first().val2 = np.nan  # This doesn't change anything in df
df.val2

0     NaN
1    0.66
2    0.81
3    0.53
4    0.62
5     NaN


asked Sep 15 '17 14:09

Gaurav Bansal

1 Answer

You can't assign to the result of an aggregation: it is a new object, so the assignment never reaches df. Also, first skips existing NaN values. What you can do instead is call head(1), which returns the first row of each group with its original index labels, and pass those labels to loc to overwrite the column values in the original df:

In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df

Out[91]: 
         date  val1  val2     vintage
0  2017-01-01  0.59   NaN  2017-01-01
1  2017-02-01  0.68  0.66  2017-01-01
2  2017-03-01  0.80  0.81  2017-01-01
3  2017-02-01  0.54   NaN  2017-02-01
4  2017-03-01  0.61  0.62  2017-02-01
5  2017-03-01  0.60   NaN  2017-03-01
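
The index-based assignment above relies on the row labels of df being unique. As a hedged alternative, the sketch below builds a boolean mask with cumcount, which numbers the rows within each group starting at 0. The name is_first is only illustrative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# cumcount() numbers the rows inside each group as 0, 1, 2, ...
# so comparing with 0 marks the first row of every vintage.
is_first = df.groupby('vintage').cumcount() == 0

# Boolean-mask assignment does not depend on index labels being unique.
df.loc[is_first, 'val2'] = np.nan

print(df)
```

Both approaches treat "first" as the first row in the current row order of df, so sort beforehand if a different ordering is intended.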

Here you can see that head(1) returns the first row of each group, NaN included, and keeps the original index:

In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]: 
0     NaN
3    0.53
5     NaN
Name: val2, dtype: float64

Contrast this with first, which returns the first non-NaN value of each group, or NaN if the group contains only NaN values:

In[95]:
df.groupby('vintage')['val2'].first()

Out[95]: 
vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN
Name: val2, dtype: float64
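
The difference in the returned index also explains why the attempt in the question left df untouched. A minimal sketch, assuming the same vintage and val2 data as in the question (the variable names agg and first_rows are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# first() aggregates: one row per group, indexed by the group key.
agg = df.groupby('vintage').first()
agg['val2'] = np.nan  # changes only the aggregated result, not df

# head(1) filters: it keeps the original row labels of df.
first_rows = df.groupby('vintage')['val2'].head(1)

print(list(agg.index))           # group keys
print(list(first_rows.index))    # original row labels
print(df['val2'].notna().sum())  # number of non-NaN values still in df
```

Because only head(1) carries the original labels, only its index can be passed to df.loc to target rows of df.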

answered Sep 21 '22 02:09

EdChum
