Let's say I have a Pandas DataFrame:
import pandas as pd
x = pd.DataFrame(data=[5,4,3,2,1,0,1,2,3,4,5],columns=['value'])
x
Out[9]:
value
0 5
1 4
2 3
3 2
4 1
5 0
6 1
7 2
8 3
9 4
10 5
Now, given an index, I want to find rows in x until a condition is met. For example, if index = 2:
x.loc[2]
Out[14]:
value 3
Name: 2, dtype: int64
Now, starting from that index, I want to find the next n rows where the value is greater than some threshold. For example, if the threshold is 0, the results should be:
x
Out[9]:
value
2 3
3 2
4 1
5 0
How can I do this?
I have tried:
x.loc[2:x['value']>0,:]
But of course this will not work, because x['value']>0 returns a boolean array:
Out[20]:
0 True
1 True
2 True
3 True
4 True
5 False
6 True
7 True
8 True
9 True
10 True
Name: value, dtype: bool
You can select a single row from a pandas DataFrame by integer position using df.iloc[n]. Replace n with the position you want to select.
To select rows based on multiple conditions, use the & operator. In this example, the conditions are combined as dfobj.loc[(dfobj['Name'] == 'Rack') & (dfobj['Marks'] == 100)]. This returns the subset of DataFrame rows where Name is 'Rack' and Marks is 100.
Using isin() to select rows from a list of values: the DataFrame.isin() method filters rows against a list of values. You can keep the list in a variable and pass it to isin(), or pass the list directly.
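As a rough illustration of the three selection styles just described, here is a minimal sketch. The DataFrame contents are made up for the example; only the column names Name and Marks come from the text above.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({"Name": ["Rack", "Jack", "Rack"], "Marks": [100, 90, 80]})

# Single row by integer position (returns a Series).
second_row = df.iloc[1]

# Rows matching several conditions; each condition is wrapped in
# parentheses and the conditions are combined with &.
rack_100 = df.loc[(df["Name"] == "Rack") & (df["Marks"] == 100)]

# Rows whose Marks value appears in a list of values.
wanted = [90, 80]
in_list = df[df["Marks"].isin(wanted)]
```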
Using idxmin and slicing:
x.loc[2:x['value'].gt(0).idxmin(),:]
2 3
3 2
4 1
5 0
Name: value
Edit:
For a general formula, use
index = 7
threshold = 2
x.loc[index:x.loc[index:,'value'].gt(threshold).idxmin(),:]
From your description in the comments, it seems you want to begin from index+1 and not index. If that is the case, just use
x.loc[index+1:x.loc[index+1:,'value'].gt(threshold).idxmin(),:]
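One edge case worth checking: if every value from the starting index onward is above the threshold, the boolean series is all True and idxmin returns its first label, so the slice collapses to a single row. Below is a minimal sketch of the general formula wrapped in a function with a guard for that case. The function name rows_until_fail and the choice to return all remaining rows when nothing fails are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

x = pd.DataFrame(data=[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5], columns=["value"])

def rows_until_fail(df, start, threshold):
    """Rows from label `start` up to and including the first row
    whose value is not greater than `threshold`."""
    ok = df.loc[start:, "value"].gt(threshold)
    if ok.all():
        # Assumed behaviour: no failing row exists, so keep everything
        # from `start` on (idxmin alone would return the first label).
        return df.loc[start:]
    # idxmin gives the label of the first False in the boolean series.
    return df.loc[start:ok.idxmin()]

result = rows_until_fail(x, 2, 0)  # stops at index 5, where value is 0
tail = rows_until_fail(x, 7, 0)    # nothing fails from index 7 onward
```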
You want to filter for index greater than or equal to your index=2 and for x['value']>=threshold, and then select the first n of these rows, which can be accomplished with .head(n).
Say:
idx = 2
threshold = 0
n = 4
x[(x.index>=idx) & (x['value']>=threshold)].head(n)
Out:
# value
# 2 3
# 3 2
# 4 1
# 5 0
Edit: changed to >=, and updated example to match OP's example.
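Note that this filter drops failing rows rather than stopping at the first one, so the result may not be contiguous. A small sketch, assuming threshold = 1 (a value picked here only so that row 5 fails the condition):

```python
import pandas as pd

x = pd.DataFrame(data=[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5], columns=["value"])

idx = 2
threshold = 1  # assumption: chosen so that row 5 (value 0) fails
n = 4

# Row 5 is filtered out, and head(n) then picks up row 6 instead.
filtered = x[(x.index >= idx) & (x["value"] >= threshold)].head(n)
```

Here filtered holds rows 2, 3, 4 and 6, which is fine if you want the first n matching rows but not if you want to stop at the first failure.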
Edit 2, due to clarification from OP: since n is unknown:
idx = 2
threshold = 0
x.loc[idx:(x['value']<=threshold).loc[x.index>=idx].idxmax()]
This selects from the starting idx, in this case idx=2, up to and including the first row where the condition is not met (in this case index 5).