
Pandas Dataframe replace Nan from a row when a column value matches

I have the following DataFrame:

Input Dataframe

      class  section  sub  marks  school  city
0     I      A        Eng  80     jghss   salem
1     I      A        Mat  90     jghss   salem 
2     I      A        Eng  50     Nan     salem 
3     III    A        Eng  80     gphss   Nan
4     III    A        Mat  45     Nan     salem
5     III    A        Eng  40     gphss   Nan
6     III    A        Eng  20     gphss   salem
7     III    A        Mat  55     gphss   Nan

I need to replace the NaN values in "school" and "city" with the values from another row that has the same "class" and "section". The expected result is:

Output Dataframe

      class  section  sub  marks  school  city
0     I      A        Eng  80     jghss   salem
1     I      A        Mat  90     jghss   salem 
2     I      A        Eng  50     jghss   salem 
3     III    A        Eng  80     gphss   salem
4     III    A        Mat  45     gphss   salem
5     III    A        Eng  40     gphss   salem
6     III    A        Eng  20     gphss   salem
7     III    A        Mat  55     gphss   salem

Can anyone help me out in this?

Mahamutha M asked Mar 28 '19 04:03


People also ask

How do I change NaN values in pandas based on condition?

You can replace all or selected values in a column of a pandas DataFrame based on a condition by using DataFrame.loc[], np.where(), or DataFrame.mask().

How do I change NaN values in a column in pandas?

You can use the fillna() function to replace NaN values in a pandas DataFrame.
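
As a rough illustration of the two ideas above, the sketch below fills a missing value only where a condition holds (with DataFrame.loc[]) and fills every missing value in a column (with fillna()). The small frame and the "unknown" placeholder are made up for the example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
df = pd.DataFrame({
    "class": ["I", "III", "III"],
    "city": [np.nan, np.nan, "salem"],
})

# Conditional replacement: fill the missing city only for class "III".
by_loc = df.copy()
condition = (by_loc["class"] == "III") & by_loc["city"].isna()
by_loc.loc[condition, "city"] = "salem"

# Unconditional replacement: fill every missing city with a placeholder.
by_fillna = df["city"].fillna("unknown")

print(by_loc)
print(by_fillna)
```

Here by_loc still has a missing city for class "I", because that row does not satisfy the condition.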



2 Answers

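Forward-fill and then back-fill the missing values within each group: group the rows with DataFrame.groupby on class and section, select the columns listed in cols, and fill each group with a lambda function. The back-fill covers a missing value in the first row of a group. This only gives the right result if every combination of class and section has the same school and city in all of its rows: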

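cols = ['school','city']
# transform keeps the original row index, so the result aligns with df on assignment
# (in recent pandas versions, apply adds the group keys to the index unless group_keys=False is passed)
df[cols] = df.groupby(['class','section'])[cols].transform(lambda x: x.ffill().bfill())
print (df)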
  class section  sub  marks school   city
0     I       A  Eng     80  jghss  salem
1     I       A  Mat     90  jghss  salem
2     I       A  Eng     50  jghss  salem
3   III       A  Eng     80  gphss  salem
4   III       A  Mat     45  gphss  salem
5   III       A  Eng     40  gphss  salem
6   III       A  Eng     20  gphss  salem
7   III       A  Mat     55  gphss  salem
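
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the question's input by hand and assumes the missing entries are real NaN values; if the data holds the literal text "Nan", the commented-out replace line would be needed first. The name filled is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input; missing entries are real NaN values here.
df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

cols = ["school", "city"]

# Assumption: only needed if the data stores the text "Nan" instead of NaN.
# df[cols] = df[cols].replace("Nan", np.nan)

# Fill forward, then backward, inside each (class, section) group.
# transform returns a result with the same index as df.
filled = df.copy()
filled[cols] = (
    df.groupby(["class", "section"])[cols]
      .transform(lambda s: s.ffill().bfill())
)
print(filled)
```

Values never cross group boundaries, so a group in which a column is entirely missing stays missing.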
jezrael answered Oct 24 '22 07:10


Assuming that each pair of class and section corresponds to a unique pair of school and city, we can use groupby:

# create a dictionary of class and section with school and city
# here we assume that for each pair of class and section there's a row with both school and city
# if that's not the case, we can separate the two series
school_city_dict = df[['class', 'section','school','city']].dropna().\
                     groupby(['class', 'section'])[['school','city']].\
                     max().to_dict()
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
#                     'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}

# set index, prepare for map function
df.set_index(['class','section'], inplace=True)

df.loc[:,'school'] = df.index.map(school_city_dict['school'])
df.loc[:,'city'] = df.index.map(school_city_dict['city'])

# reset index to the original (reset_index returns a new DataFrame, so assign it back)
df = df.reset_index()
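
If no single row of a group has both school and city, the dropna() step above finds nothing for that group. One possible way to treat the columns separately, sketched here as a swapped-in technique rather than the code above, is to take the first non-null value per group for each column and use it only where a value is missing. The four-row frame and the names lookup and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: no row has both school and city filled in.
df = pd.DataFrame({
    "class":   ["I", "I", "III", "III"],
    "section": ["A", "A", "A", "A"],
    "school":  ["jghss", np.nan, np.nan, "gphss"],
    "city":    [np.nan, "salem", "salem", np.nan],
})

keys = ["class", "section"]
cols = ["school", "city"]

# "first" picks the first non-null value of each column within a group;
# transform broadcasts it back to every row of that group.
lookup = df.groupby(keys)[cols].transform("first")

# fillna only touches the missing cells; existing values are kept.
result = df.copy()
result[cols] = df[cols].fillna(lookup)
print(result)
```

Unlike the index.map version, this leaves existing values untouched, so rows of a group that disagree with each other are not overwritten.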
Quang Hoang answered Oct 24 '22 08:10