There are several columns in the data, three are named "candidate_id", "enddate", "TitleLevel".
Within the same id, if the enddate is the same, I will delete the lower level record.
For example, given:
candidate_id   startdate     enddate   TitleLevel
    1          2012.1.1      2013.5.1     2
    1          2011.1.1      2013.5.1     4
    1          2008.12.1     2010.1.1     3
    2          2010.10.1     2012.12.1    2
What I want is:
candidate_id   startdate     enddate   TitleLevel
    1          2011.1.1      2013.5.1     4
    1          2008.12.1     2010.1.1     3
    2          2010.10.1     2012.12.1    2
I will delete candidate_id=1, enddate=2013.5.1, and titlelevel=2.
I have come up with a loop.
for i in range(nrow-2,-1, -1):
    if (JobData['enddate'][i] == JobData['enddate'][i+1] 
           and JobData['candidate_id'][i] == JobData['candidate_id'][i+1] 
           and pd.notnull(JobData['enddate'][i]):
        if JobData['TitleLevel'][i] > JobData['TitleLevel'][i+1]:
            JobData= JobData.drop(i+1)
        else:
            JobData= JobData.drop(i) 
The loop really takes some time to delete redundant rows. Is there a faster method?
If you data structure is exactly as you describe, you can use groupby/max:
>>> df
   candidate_id    enddate  TitleLevel
0             1   2013.5.1           2
1             1   2013.5.1           4
2             1   2010.1.1           3
3             2  2012.12.1           2
>>> df.groupby(['candidate_id','enddate']).max().reset_index()
   candidate_id    enddate  TitleLevel
0             1   2010.1.1           3
1             1   2013.5.1           4
2             2  2012.12.1           2
Here groupby groups rows with equal candidate_id and enddate, and max() evaluates maximum TitleLevel within each group. Result is the same as if rows with all other values being dropped.
In case you have more columns,
>>> df
   candidate_id    enddate  TitleLevel other_column
0             1   2013.5.1           2          foo
1             1   2013.5.1           4          bar
2             1   2010.1.1           3       foobar
3             2  2012.12.1           2       barfoo
you can get idexes of rows with max values, without sorting if rows order has to be preserved:
>>> idx = df.groupby(['candidate_id','enddate'], sort=False)['TitleLevel'].agg(lambda x: x.idxmax())
and filter needed rows with ix:
>>> df.ix[idx]
   candidate_id    enddate  TitleLevel other_column
1             1   2013.5.1           4          bar
2             1   2010.1.1           3       foobar
3             2  2012.12.1           2       barfoo
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With