Pandas: group by and find the first non-null value for all columns

Tags: python, pandas, group-by, window, pyspark

I have a pandas DataFrame like the one below:

id  age   gender  country  sales_year
1   None   M       India    2016
2   23     F       India    2016
1   20     M       India    2015
2   25     F       India    2015
3   30     M       India    2019
4   36     None    India    2019

I want to group by id and keep one row per id: the latest row according to sales_year, with each column filled from the most recent non-null value.

Expected output:

id  age   gender  country  sales_year
1   20     M       India    2016
2   23     F       India    2016
3   30     M       India    2019
4   36     None    India    2019

In PySpark, I can do this for a single column with a window function:

from pyspark.sql import Window
from pyspark.sql import functions as f

df = df.withColumn('age', f.first('age', True).over(Window.partitionBy("id").orderBy(df.sales_year.desc())))

But I need the same solution in pandas.

EDIT: This can happen in any column, not just age. For every id, I need to pick up the latest non-null value of each column (where one exists).

asked Nov 26 '19 10:11

j '

1 Answer

Use GroupBy.first, which returns the first non-null value of each column within each group:

df1 = df.groupby('id', as_index=False).first()
print(df1)
   id   age gender country  sales_year
0   1  20.0      M   India        2016
1   2  23.0      F   India        2016
2   3  30.0      M   India        2019
3   4  36.0    NaN   India        2019
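The key point is that first() skips nulls column by column, so values in one output row can come from different input rows. The sketch below contrasts it with head(1), which returns the first row of each group unchanged. It rebuilds the question's sample data, assuming None marks a missing value (pandas stores it as NaN in the numeric age column).

```python
import pandas as pd

# Sample data from the question; None marks a missing value
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3, 4],
    "age": [None, 23, 20, 25, 30, 36],
    "gender": ["M", "F", "M", "F", "M", None],
    "country": ["India"] * 6,
    "sales_year": [2016, 2016, 2015, 2015, 2019, 2019],
})

# first(): first non-null value of each column, chosen independently per column
by_first = df.groupby("id", as_index=False).first()

# head(1): the first row of each group exactly as it is, nulls included
by_head = df.groupby("id").head(1).reset_index(drop=True)

print(by_first)
print(by_head)
```

For id 1, by_first takes age 20.0 from the 2015 row while keeping sales_year 2016 from the 2016 row; by_head keeps the 2016 row with its missing age.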

If the sales_year column is not already sorted in descending order within each id, sort it first:

df2 = df.sort_values('sales_year', ascending=False).groupby('id', as_index=False).first()
print(df2)
   id   age gender country  sales_year
0   1  20.0      M   India        2016
1   2  23.0      F   India        2016
2   3  30.0      M   India        2019
3   4  36.0    NaN   India        2019
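Why the sort matters: first() follows the row order inside each group, so on data ordered oldest first it returns the oldest non-null values. The sketch below assumes the same sample data, reverses the rows to simulate unsorted input, and compares both variants.

```python
import pandas as pd

# Sample data from the question; None marks a missing value
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3, 4],
    "age": [None, 23, 20, 25, 30, 36],
    "gender": ["M", "F", "M", "F", "M", None],
    "country": ["India"] * 6,
    "sales_year": [2016, 2016, 2015, 2015, 2019, 2019],
})

# Reverse the rows so that older sales_year values come first within each id
reversed_df = df.iloc[::-1].reset_index(drop=True)

# Without sorting, first() picks values from the oldest rows
unsorted = reversed_df.groupby("id", as_index=False).first()

# Sorting by sales_year descending restores "latest non-null" semantics
latest = (
    reversed_df.sort_values("sales_year", ascending=False)
    .groupby("id", as_index=False)
    .first()
)

print(unsorted)
print(latest)
```

Here unsorted reports sales_year 2015 and age 25.0 for id 2, while latest matches the expected output.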

answered Oct 01 '22 13:10

jezrael