
Python .drop does not give the result I expect

Tags: python, nan

I have a dataframe called xxx. One column of xxx is Final and xxx looks like this

  FpPropeTypCode DTE_DATE_DEATH             Area         Final  
0             FP            NaN  Ame_MidEast_Lnd           NaN  
1             FP            NaN  Southern_Europe  W.E.M. Lines  
2             FP            NaN              NaN           NaN  
3             ZP            NaN  Ame_MidEast_Lnd           NaN  
4             YY            NaN  Ame_MidEast_Lnd           NaN  

I would like to remove all rows that have NaN in Final, so what I did was

xxx= xxx.drop(pd.isnull(data_file_fp4['Final']))

Unfortunately what I got is

  FpPropeTypCode DTE_DATE_DEATH             Area                         Final  
2             FP            NaN              NaN                           NaN  
3             ZP            NaN  Ame_MidEast_Lnd                           NaN  
4             YY            NaN  Ame_MidEast_Lnd                           NaN  
5             NN            NaN  Ame_MidEast_Lnd  NORTH ARM TRANSPORTATION LTD  
6             CP            NaN  Northern_Europe                     MPC Group 

which is obviously not right...

What I actually need to do is drop rows based on two conditions: Final being NaN and Area being Ame_MidEast_Lnd. So I cannot simply use dropna.

What is wrong with my current code, just for the first condition? Thanks in advance.


1 Answer

Are you using pandas? Pandas has a function that will allow you to drop rows based on criteria, in this case a certain column being NaN: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

The specific command you're looking for would probably be something like:

xxx = xxx.dropna(axis=0, subset=['Final'])

axis=0 specifies that you want to drop rows and not columns.
subset specifies that you want to drop rows where 'Final' is NaN.
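
As a quick check of the dropna call above, here is a minimal sketch on a frame rebuilt from the five sample rows in the question. The frame construction is an assumption about the asker's real data, and `kept` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the rows printed in the question
# (an assumption; the asker's real data is not available here).
xxx = pd.DataFrame(
    [
        ["FP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["FP", np.nan, "Southern_Europe", "W.E.M. Lines"],
        ["FP", np.nan, np.nan, np.nan],
        ["ZP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["YY", np.nan, "Ame_MidEast_Lnd", np.nan],
    ],
    columns=["FpPropeTypCode", "DTE_DATE_DEATH", "Area", "Final"],
)

# Drop every row whose 'Final' value is NaN; the other columns are ignored.
kept = xxx.dropna(axis=0, subset=["Final"])
print(kept.index.tolist())
```

On this sample only the row with index 1 has a non-NaN 'Final', so that is the only row left.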

EDIT: The asker cannot use dropna because their filter logic is more complex.

If you want more complex logic, you might be better off just doing bracket logic. I will try and verify in a moment but can you try something like this:

xxx = xxx[~xxx['Final'].isnull()]
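
As for why the original call misbehaved: DataFrame.drop expects index labels, not a boolean mask. The output in the question (rows 0 and 1 missing) looks consistent with the True/False values having been read as the labels 1 and 0, although the exact behavior may differ between pandas versions. Here is a minimal sketch, assuming the sample rows from the question, of two equivalent ways to drop by a mask. The names `mask`, `by_labels` and `by_mask` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (an assumption about the real data).
xxx = pd.DataFrame(
    [
        ["FP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["FP", np.nan, "Southern_Europe", "W.E.M. Lines"],
        ["FP", np.nan, np.nan, np.nan],
        ["ZP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["YY", np.nan, "Ame_MidEast_Lnd", np.nan],
    ],
    columns=["FpPropeTypCode", "DTE_DATE_DEATH", "Area", "Final"],
)

# Boolean mask: True where 'Final' is NaN.
mask = xxx["Final"].isnull()

# Option 1: turn the mask into index labels first, then pass those to drop.
by_labels = xxx.drop(xxx[mask].index)

# Option 2: skip drop entirely and keep the rows where the mask is False.
by_mask = xxx[~mask]

print(by_labels.equals(by_mask))
```

Either way the mask is evaluated on the same frame that is being filtered, which also avoids mixing xxx with a second frame such as data_file_fp4.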

If you want the second part of the logic, where you have both the NaN filter and the column filter, you would do this:

xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]

I have verified that this works by running this python file below:

import pandas as pd
import numpy as np

xxx = pd.DataFrame([
                    ['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
                    ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
                    ['FP', np.nan, np.nan, np.nan],
                    ['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
                    ['YY', np.nan, 'Ame_MidEast_Lnd', np.nan]],
                   columns=['FpPropeTypCode','DTE_DATE_DEATH','Area', 'Final']
                   )

# before
print(xxx)

# whatever rows have both 'Final' as NaN and 'Area' containing Ame_MidEast_Lnd, we do NOT want those rows
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]

# after
print(xxx)

You will see the solution works the way you want.
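
One caveat: str.contains does a substring match (a regex match by default), and depending on the column dtype and pandas version it can return NaN where 'Area' itself is missing. If the intent is an exact match on the Area value, a comparison with == may be a safer sketch, since it simply yields False for NaN. The example below assumes the same sample rows as the script above; `drop_mask` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample rows as in the script above (an assumption about the real data).
xxx = pd.DataFrame(
    [
        ["FP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["FP", np.nan, "Southern_Europe", "W.E.M. Lines"],
        ["FP", np.nan, np.nan, np.nan],
        ["ZP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["YY", np.nan, "Ame_MidEast_Lnd", np.nan],
    ],
    columns=["FpPropeTypCode", "DTE_DATE_DEATH", "Area", "Final"],
)

# Rows to remove: 'Final' is NaN AND 'Area' is exactly "Ame_MidEast_Lnd".
# The == comparison evaluates to False for NaN, so no extra NaN handling is needed.
drop_mask = xxx["Final"].isnull() & (xxx["Area"] == "Ame_MidEast_Lnd")

# Keep everything that does not match both conditions.
result = xxx[~drop_mask]
print(result.index.tolist())
```

On the sample data this keeps rows 1 and 2: row 2 has a NaN 'Final' but its 'Area' is not Ame_MidEast_Lnd, so it does not meet both conditions.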

itsmichaelwang answered Jun 08 '26 02:06
