Pandas: Impute NaN's

Tags: python, pandas, dataframe, nan, mean

I have an incomplete dataframe, incomplete_df, as below. I want to impute the missing amounts with the average amount of the corresponding id. If the average for that specific id is itself NaN (see id=4), I want to use the overall average.

Below are the example data and my highly inefficient solution:

import pandas as pd
import numpy as np

incomplete_df = pd.DataFrame({'id': [1,2,3,2,2,3,1,1,1,2,4],
                              'type': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two', 'one', 'three','one'],
                              'amount': [345,928,np.nan,645,113,942,np.nan,539,np.nan,814,np.nan]
                              }, columns=['id','type','amount'])

# per-id mean of 'amount'; NaNs are skipped, so an all-NaN id (id=4) gets NaN
means = incomplete_df.groupby('id')[['amount']].mean()

# Forrest Gump Solution
for idx in incomplete_df.index[incomplete_df['amount'].isna()]: # loop through all rows with amount = NaN
    cur_id = incomplete_df.loc[idx, 'id']
    if cur_id in means.index and not np.isnan(means.loc[cur_id, 'amount']):
        incomplete_df.loc[idx, 'amount'] = means.loc[cur_id, 'amount'] # average amount of that specific id
    else:
        incomplete_df.loc[idx, 'amount'] = means['amount'].mean() # mean of the per-id means (NaN means skipped)

What is the fastest and the most pythonic/pandonic way to achieve this?

asked Jan 10 '14 17:01

Zhubarb

1 Answer

Disclaimer: I'm not really interested in the fastest solution but the most pandorable.

Here, I think that would be something like:

>>> df = incomplete_df.copy()
>>> df["amount"] = df["amount"].fillna(df.groupby("id")["amount"].transform("mean"))
>>> df["amount"] = df["amount"].fillna(df["amount"].mean())

which produces

>>> df
    id   type  amount
0    1    one   345.0
1    2    one   928.0
2    3    two   942.0
3    2  three   645.0
4    2    two   113.0
5    3  three   942.0
6    1    one   442.0
7    1    two   539.0
8    1    one   442.0
9    2  three   814.0
10   4    one   615.2

[11 rows x 3 columns]

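There are lots of obvious tweaks depending upon exactly how you want the chained imputation process to go. For example, because the second line runs after the per-id fill, the fallback for id 4 is the mean of the partially imputed column (615.2), not the mean of the originally non-missing amounts.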
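One way to make those tweaks explicit is sketched below. The helper `impute_amount` and its `fallback` option are hypothetical names for illustration, not part of pandas. The sketch assumes only the `id` and `amount` columns from the question matter. It keeps the per-id step from the answer and lets you pick which global mean fills whatever is still missing: the answer's behaviour, the mean of the original non-missing amounts, or the mean of the per-id means (what the question's loop does).

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({'id': [1, 2, 3, 2, 2, 3, 1, 1, 1, 2, 4],
                              'type': ['one', 'one', 'two', 'three', 'two', 'three',
                                       'one', 'two', 'one', 'three', 'one'],
                              'amount': [345, 928, np.nan, 645, 113, 942,
                                         np.nan, 539, np.nan, 814, np.nan]})


def impute_amount(df, fallback="filled"):
    """Fill NaN amounts with the per-id mean, then with a global fallback.

    fallback (hypothetical option):
      "filled"      - mean of the column after the per-id fill (the answer's behaviour)
      "original"    - mean of the originally non-missing amounts
      "group_means" - mean of the per-id means (the question's loop)
    """
    amount = df["amount"]
    # Broadcast each id's mean back to its rows; an all-NaN id (id=4) stays NaN.
    group_mean = df.groupby("id")["amount"].transform("mean")
    step1 = amount.fillna(group_mean)
    if fallback == "filled":
        overall = step1.mean()
    elif fallback == "original":
        overall = amount.mean()
    elif fallback == "group_means":
        overall = df.groupby("id")["amount"].mean().mean()  # NaN group means are skipped
    else:
        raise ValueError(f"unknown fallback: {fallback!r}")
    out = df.copy()  # leave the caller's frame untouched
    out["amount"] = step1.fillna(overall)
    return out


for how in ("filled", "original", "group_means"):
    print(how, round(float(impute_amount(incomplete_df, how).loc[10, "amount"]), 2))
# filled 615.2
# original 618.0
# group_means 669.67
```

Only row 10 (id 4) differs between the three options; the rows of ids 1, 2 and 3 are filled from their own group means in every case.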

answered Oct 14 '22 23:10

DSM
