I have a DataFrame called xxx. One of its columns is Final, and xxx looks like this:
   FpPropeTypCode  DTE_DATE_DEATH             Area         Final
0              FP             NaN  Ame_MidEast_Lnd           NaN
1              FP             NaN  Southern_Europe  W.E.M. Lines
2              FP             NaN              NaN           NaN
3              ZP             NaN  Ame_MidEast_Lnd           NaN
4              YY             NaN  Ame_MidEast_Lnd           NaN
I would like to remove all rows that have NaN for Final, so what I did was:
xxx= xxx.drop(pd.isnull(data_file_fp4['Final']))
Unfortunately, what I got is:
FpPropeTypCode DTE_DATE_DEATH Area Final
2 FP NaN NaN NaN
3 ZP NaN Ame_MidEast_Lnd NaN
4 YY NaN Ame_MidEast_Lnd NaN
5 NN NaN Ame_MidEast_Lnd NORTH ARM TRANSPORTATION LTD
6 CP NaN Northern_Europe MPC Group
which is obviously not right.
What I actually need to do is drop rows based on two conditions: Final being NaN and Area being Ame_MidEast_Lnd. So I cannot really use dropna.
What is wrong with my current code, just for the first condition? Thanks in advance.
Are you using pandas? Pandas has a function that will allow you to drop rows based on criteria, in this case a certain column being NaN: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
The specific command you're looking for would probably be something like:
xxx = xxx.dropna(axis=0, subset=['Final'])
axis=0 specifies that you want to drop rows rather than columns, and subset specifies that a row is dropped only where 'Final' is NaN.
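To make the effect of subset concrete, here is a minimal sketch on a made-up three-row frame (the name demo and its contents are only for illustration, not the asker's real data). It shows that a NaN in another column does not cause a row to be dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame reusing two of the question's column names
demo = pd.DataFrame({
    "Area": ["Ame_MidEast_Lnd", np.nan, np.nan],
    "Final": [np.nan, "W.E.M. Lines", np.nan],
})

# Only NaN in 'Final' matters; row 1 survives even though its 'Area' is NaN
kept = demo.dropna(axis=0, subset=["Final"])
print(kept)
```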
EDIT: The asker cannot use dropna because their filter logic is more complex.
If you need more complex logic, you are probably better off using boolean indexing with brackets. Try something like this:
xxx = xxx[~xxx['Final'].isnull()]
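As for why the original drop call misbehaves: DataFrame.drop expects index labels, not a boolean mask. pd.isnull(...) returns a Series of True/False values, and the pandas version in the question appears to have matched those against the integer labels 1 and 0, which would explain why exactly rows 0 and 1 disappeared (other versions may raise an error instead). A minimal sketch of the two working alternatives, using a made-up frame named demo:

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration
demo = pd.DataFrame({"Final": [np.nan, "W.E.M. Lines", np.nan, "MPC Group"]})

# Boolean Series: True where 'Final' is NaN
mask = demo["Final"].isnull()

# Option 1: boolean indexing keeps the rows where the mask is False
by_mask = demo[~mask]

# Option 2: drop() works once it is given the index labels of the rows to remove
by_labels = demo.drop(demo[mask].index)

print(by_mask.equals(by_labels))
```

Both options leave the same rows; boolean indexing is usually the shorter of the two.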
If you want the second part of the logic, where you have both the NaN filter and the column filter, you would do this:
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]
I have verified that this works by running the Python file below:
import pandas as pd
import numpy as np

xxx = pd.DataFrame([
    ['FP', np.nan, 'Ame_MidEast_Lnd', np.nan],
    ['FP', np.nan, 'Southern_Europe', 'W.E.M. Lines'],
    ['FP', np.nan, np.nan, np.nan],
    ['ZP', np.nan, 'Ame_MidEast_Lnd', np.nan],
    ['YY', np.nan, 'Ame_MidEast_Lnd', np.nan]],
    columns=['FpPropeTypCode', 'DTE_DATE_DEATH', 'Area', 'Final']
)

# before
print(xxx)

# drop every row that has both 'Final' as NaN and 'Area' containing Ame_MidEast_Lnd
xxx = xxx[~(xxx['Final'].isnull() & xxx['Area'].str.contains("Ame_MidEast_Lnd"))]

# after
print(xxx)
You will see the solution works the way you want.
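One caveat: str.contains returns NaN for rows where Area itself is NaN, so the combined mask is not purely boolean. A slightly more defensive variant, assuming an exact match on Area is what is wanted (the names final_missing, area_match and result are mine), could look like this:

```python
import numpy as np
import pandas as pd

xxx = pd.DataFrame(
    [
        ["FP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["FP", np.nan, "Southern_Europe", "W.E.M. Lines"],
        ["FP", np.nan, np.nan, np.nan],
        ["ZP", np.nan, "Ame_MidEast_Lnd", np.nan],
        ["YY", np.nan, "Ame_MidEast_Lnd", np.nan],
    ],
    columns=["FpPropeTypCode", "DTE_DATE_DEATH", "Area", "Final"],
)

final_missing = xxx["Final"].isnull()

# Exact comparison: NaN == "..." evaluates to False, so this mask is purely boolean
area_match = xxx["Area"] == "Ame_MidEast_Lnd"
# Substring alternative: xxx["Area"].str.contains("Ame_MidEast_Lnd", na=False)

# Keep every row that does NOT satisfy both conditions
result = xxx[~(final_missing & area_match)]
print(result)
```

With the sample data, only rows 1 and 2 remain: row 1 has a value in Final, and row 2 has NaN in Area, so it does not match the Area condition.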