I have a pandas DataFrame with a column holding an open/closed status value and a column holding a rank value. After I sort by the rank field, what would be the best way to drop/delete all rows after the first occurrence of an "Open" value? I'm not sure whether I should use an iterator function or a standard index-based approach in pandas. Any advice would be great!
Edit: This is just what I have started with so far:
df[["Rank", "Status"]].sort_values(by="Rank")
The output I am trying to accomplish would look like the following:
From this:
Rank Status
1 Closed
5 Closed
6 Open
9 Closed
10 Open
To this:
Rank Status
1 Closed
5 Closed
6 Open
You can reset the index of the DataFrame after sorting it, then find the index label of the first 'Open' row and slice the data up to and including that row:
import pandas as pd
#create dataframe
df = pd.DataFrame({
'Rank' : [5, 1, 10, 6, 9],
'Status' : ['Closed', 'Closed', 'Open', 'Open', 'Closed']
})
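# sort by rank and reset the index so the labels run 0..n-1 in sorted order
df = df.sort_values('Rank').reset_index(drop=True)
# slice through the first occurrence of 'Open' (.loc label slices include the end label)
df.loc[: df[df['Status'] == 'Open'].index[0], :]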
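Almost the same answer, but manipulating df directly:
df = df[:df[df['Status'] == 'Open'].index[0] + 1]
This finds the index of the first row with the value and slices the DataFrame through that row. Plain [] slicing is positional and excludes its end point, hence the + 1. It also assumes the index has been reset after sorting, so that labels and positions agree.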
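Both answers above raise an IndexError at .index[0] when no row has the status 'Open'. As a rough sketch of an alternative (not taken from either answer), a boolean mask built with cumsum avoids that and does not need the index to be reset. The helper name keep_through_first_open and its parameters are made up for illustration, and the sample data is chosen to reproduce the table in the question.

```python
import pandas as pd

def keep_through_first_open(df, status_col="Status", value="Open"):
    """Return the rows up to and including the first row where status_col == value.

    Hypothetical helper for illustration; if no row matches, all rows are kept.
    """
    is_match = df[status_col].eq(value)
    # cumsum counts matches seen so far; shifting by one row makes the count
    # "matches strictly before this row", so the first match itself is kept.
    after_first = is_match.cumsum().shift(fill_value=0).gt(0)
    return df[~after_first]

df = pd.DataFrame({
    "Rank": [5, 1, 10, 6, 9],
    "Status": ["Closed", "Closed", "Open", "Open", "Closed"],
})

# Sort first; the mask follows row order, so no reset_index is required.
result = keep_through_first_open(df.sort_values("Rank"))
print(result)
```

With this data the rows with Rank 1, 5 and 6 remain, and they keep their original index labels. Chain .reset_index(drop=True) if a fresh 0..n-1 index is wanted.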