
Get rows from Pandas DataFrame from index until condition

Let's say I have a Pandas DataFrame:

import pandas as pd
x = pd.DataFrame(data=[5,4,3,2,1,0,1,2,3,4,5],columns=['value'])
x
Out[9]: 
    value
0       5
1       4
2       3
3       2
4       1
5       0
6       1
7       2
8       3
9       4
10      5

Given an index, I want to select rows of x starting there until a condition is met. For example, if index = 2:

x.loc[2]
Out[14]: 
value    3
Name: 2, dtype: int64

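Starting from that index, I want to keep taking the following rows for as long as the value is greater than some threshold, up to and including the first row that fails the condition (the number of rows n is not known in advance). For example, if the threshold is 0, the results should be: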

x
Out[9]: 
    value
2       3
3       2
4       1
5       0

How can I do this?

I have tried:

x.loc[2:x['value']>0,:]

But of course this will not work, because x['value']>0 returns a boolean Series rather than a single index label:

Out[20]: 
0      True
1      True
2      True
3      True
4      True
5     False
6      True
7      True
8      True
9      True
10     True
Name: value, dtype: bool
pookie, asked Sep 13 '18 15:09



2 Answers

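Use idxmin and slicing. gt(0) gives a boolean Series, and idxmin returns the label of its first False, which becomes the end of the slice: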

x.loc[2:x['value'].gt(0).idxmin(),:]

   value
2      3
3      2
4      1
5      0

Edit:

For a general formula, use

index = 7
threshold = 2
x.loc[index:x.loc[index:,'value'].gt(threshold).idxmin(),:]

From your description in the comments, it seems you want to begin from index+1 rather than index. If that is the case, just use

x.loc[index+1:x.loc[index+1:,'value'].gt(threshold).idxmin(),:]
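One edge case worth noting: if every value from the starting index onward is above the threshold, the boolean Series contains no False, and idxmin simply returns the first label, so the slice collapses to a single row. Below is a minimal sketch of a wrapper that guards against this. The function name rows_until_not_above and the choice to return everything from the start when the condition never fails are my own assumptions, not part of the answer above.

```python
import pandas as pd

x = pd.DataFrame(data=[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5], columns=['value'])

def rows_until_not_above(df, start, threshold, column='value'):
    """Hypothetical helper: rows from label `start` up to and including the
    first row whose value is not greater than `threshold`."""
    above = df.loc[start:, column].gt(threshold)
    if above.all():
        # No row fails the condition, so idxmin() would just return the
        # first label. Assumption: in that case keep every row from `start`.
        return df.loc[start:]
    # idxmin() on a boolean Series gives the label of the first False.
    return df.loc[start:above.idxmin()]

result = rows_until_not_above(x, 2, 0)    # rows 2 through 5
no_stop = rows_until_not_above(x, 6, 0)   # condition never fails: rows 6 through 10
```

If you would rather get an empty frame or an error when the condition never fails, change the branch under above.all() accordingly.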
rafaelc, answered Sep 20 '22 22:09


You want to filter for index greater than or equal to your index=2, and for x['value']>=threshold, and then select the first n of these rows, which can be accomplished with .head(n).

Say:

idx = 2
threshold = 0
n = 4
x[(x.index>=idx) & (x['value']>=threshold)].head(n)

Out:

#      value
# 2     3
# 3     2
# 4     1
# 5     0

Edit: changed to >=, and updated example to match OP's example.
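Note that this filter keeps every row that satisfies the condition; it does not stop at the first row that fails it, so the result depends on choosing n correctly. A small sketch to illustrate, where n = 6 is just a value I picked for the demonstration:

```python
import pandas as pd

x = pd.DataFrame(data=[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5], columns=['value'])

idx = 2
threshold = 0

# Every row from idx onward has value >= 0, so the mask alone never "stops".
matches = x[(x.index >= idx) & (x['value'] >= threshold)]

first_four = matches.head(4)   # rows 2..5, which happens to end at the 0
first_six = matches.head(6)    # rows 2..7, running past the 0 at index 5
```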

Edit 2, following clarification from OP that n is not known in advance:

idx = 2
threshold = 0
x.loc[idx:(x['value']<=threshold).loc[x.index>=idx].idxmax()]

This selects from the starting idx (here idx=2) up to and including the first row where the condition is not met (here index 5).
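As with idxmin, idxmax returns the first label when the boolean Series has no True at all, so if the condition is always met the slice would contain only the starting row. Here is a sketch of an alternative that avoids idxmax by building a cumulative mask instead. This is a different technique from the one-liner above, and the variable names are mine.

```python
import pandas as pd

x = pd.DataFrame(data=[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5], columns=['value'])

idx = 2
threshold = 0

tail = x.loc[idx:]
stop = tail['value'] <= threshold   # True where the condition is not met
# cummax() marks the first stopping row and everything after it as True;
# shifting by one row keeps the stopping row itself in the selection.
after_stop = stop.cummax().shift(1, fill_value=False).astype(bool)
selected = tail[~after_stop]
```

When no row fails the condition, after_stop is all False and the whole tail is returned.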

Jake Morris, answered Sep 17 '22 22:09