I have a df that looks like this:
    A                B    C               D     NEW
0   1       Adhoc_Task  WID          WI_DTL      []  
1   1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
2   1  Arun_adhoc_load  ATT          IXN_10  (IXN,)
3   1  Arun_adhoc_load  ATT         IXN_100  (IXN,)
4   1  Arun_adhoc_load  ATT         IXN_101  (IXN,)
5   2    Batch_Support  ATT      CDS_STATUS      []
6   2    Batch_Support  ATT     CDS_CONTROL      []
7   2    Batch_Support  ATT  CDS_ORA_STATUS      []
8   2    Batch_Support  ATT      REP_FILTER      []
9   1      online_load  ATT           TAX_3  (TAX,)
10  1      online_load  ATT           TAX_4  (TAX,)
11  1      online_load  ATT           TAX_8  (TAX,)
12  1      online_load  ATT          TAX_11  (TAX,)
Desired output would look like this:
    A                B    C               D     NEW
0   1       Adhoc_Task  WID          WI_DTL      []  
1   1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
5   2    Batch_Support  ATT      CDS_STATUS      []
9   1      online_load  ATT           TAX_3  (TAX,)
I'm trying to drop duplicate rows based off column B. However, when I run
df.drop_duplicates(subset = ['B'], keep='first', inplace=True)
I get the following error:
TypeError: drop_duplicates() got an unexpected keyword argument 'subset'
I'm running pandas 0.19.1 from python 3, so I took a look at the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
I haven't the foggiest of what I'm doing wrong with subset. How would I drop duplicates from the DataFrame based off the values in one column?
For whatever reason in your code, df became a Series object. Check type(df) just before the failing drop_duplicates call. That function has no subset argument for the Series.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With