Pandas Dataframe replace Nan from a row when a column value matches

I have the following DataFrame:

Input Dataframe

      class  section  sub  marks  school  city
0     I      A        Eng  80     jghss   salem
1     I      A        Mat  90     jghss   salem 
2     I      A        Eng  50     Nan     salem 
3     III    A        Eng  80     gphss   Nan
4     III    A        Mat  45     Nan     salem
5     III    A        Eng  40     gphss   Nan
6     III    A        Eng  20     gphss   salem
7     III    A        Mat  55     gphss   Nan

I need to fill the NaN values in the "school" and "city" columns with the values from other rows that have the same "class" and "section". The expected result is:

Output DataFrame

      class  section  sub  marks  school  city
0     I      A        Eng  80     jghss   salem
1     I      A        Mat  90     jghss   salem 
2     I      A        Eng  50     jghss   salem 
3     III    A        Eng  80     gphss   salem
4     III    A        Mat  45     gphss   salem
5     III    A        Eng  40     gphss   salem
6     III    A        Eng  20     gphss   salem
7     III    A        Mat  55     gphss   salem

Can anyone help me with this?
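The tables spell the missing values as "Nan". If the data really contains the string "Nan" rather than a true missing value, pandas will not treat it as missing, so `ffill`, `bfill` and `dropna` will leave it alone. Below is a minimal sketch that rebuilds the input table (values copied from the question) and converts the placeholder, assuming it is exactly the string "Nan":

```python
import numpy as np
import pandas as pd

# Input table from the question; missing values are written as the string
# "Nan" on purpose, to mimic data read from a text source.
df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", "Nan", "gphss", "Nan", "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", "Nan", "salem", "Nan", "salem", "Nan"],
})

# Only real missing values (np.nan / None) count as missing for
# ffill/bfill/dropna, so turn the placeholder strings into np.nan first.
df = df.replace("Nan", np.nan)

print(df.isna().sum())
```

If the data comes from a CSV file, passing `na_values=["Nan"]` to `pd.read_csv` does the same conversion at load time ("Nan" is not in pandas' default list of NA strings).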

asked Mar 28 '19 04:03

Mahamutha M

2 Answers

Group by "class" and "section" with DataFrame.groupby, then forward-fill and back-fill the missing values within each group for the columns listed in cols. This only gives the right answer if every row of a given (class, section) combination should have the same school and city:

cols = ['school','city']
# transform keeps the original row index, so the result can be assigned straight back
df[cols] = df.groupby(['class','section'])[cols].transform(lambda x: x.ffill().bfill())
print(df)
  class section  sub  marks school   city
0     I       A  Eng     80  jghss  salem
1     I       A  Mat     90  jghss  salem
2     I       A  Eng     50  jghss  salem
3   III       A  Eng     80  gphss  salem
4   III       A  Mat     45  gphss  salem
5   III       A  Eng     40  gphss  salem
6   III       A  Eng     20  gphss  salem
7   III       A  Mat     55  gphss  salem
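A self-contained sketch of this approach, rebuilding the question's data with `np.nan` for the missing values. It uses `transform` because its result is aligned to the original index; with recent pandas versions, `groupby(...).apply` may prepend the group keys to the index, which breaks the direct assignment. It also shows an alternative I am adding here (not part of the answer): `transform("first")`, which broadcasts the first non-missing value of each group to every row. `filled` and `first` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

keys = ["class", "section"]
cols = ["school", "city"]

# Fill forwards within each group, then backwards to cover a NaN
# in the first row(s) of a group (e.g. row 3's city).
filled = df.groupby(keys)[cols].transform(lambda g: g.ffill().bfill())

# Alternative: every row gets the group's first non-missing value.
# This also overwrites existing values, so it only agrees with the fill
# above when each group has a single school and city.
first = df.groupby(keys)[cols].transform("first")

df[cols] = filled
print(df)
```

The ffill/bfill version only touches missing cells, while `transform("first")` rewrites the whole column; for the question's data both produce the expected output.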

answered Oct 24 '22 07:10

jezrael

Assuming that each pair of class and section corresponds to a unique pair of school and city, we can use groupby:

# create a dictionary mapping (class, section) to school and city
# here we assume that for each (class, section) pair there is at least one row with both school and city
# if that's not the case, we can build the school and city lookups separately
school_city_dict = (df[['class', 'section', 'school', 'city']]
                    .dropna()
                    .groupby(['class', 'section'])[['school', 'city']]
                    .max()
                    .to_dict())
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
#                     'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}

# set index, prepare for map function
df.set_index(['class', 'section'], inplace=True)

df.loc[:, 'school'] = df.index.map(school_city_dict['school'])
df.loc[:, 'city'] = df.index.map(school_city_dict['city'])

# reset index to the original (reset_index returns a new DataFrame, so assign it back)
df = df.reset_index()
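Both answers rely on the assumption stated above: each (class, section) pair has only one school and one city. A minimal sketch for checking that before filling, using the question's data rebuilt with `np.nan`; `distinct` and `conflicts` are illustrative names, not part of either answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

keys = ["class", "section"]
cols = ["school", "city"]

# nunique() ignores NaN by default, so this counts the distinct real
# values of school and city inside each (class, section) group.
distinct = df.groupby(keys)[cols].nunique()

# Groups with more than one school or city break the assumption.
conflicts = distinct[(distinct > 1).any(axis=1)]
print(conflicts)  # empty for the question's data
```

Any row in `conflicts` is a group where `max()` would silently pick one of several values and the ffill/bfill result would depend on row order.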

answered Oct 24 '22 08:10

Quang Hoang


Tags: python, python-3.x, pandas, nan