
Find value greater than level - Python Pandas

Tags: python, pandas

In a time series (ordered tuples), what's the most efficient way to find the first time a criterion is met?

In particular, what's the most efficient way to determine when the value of a column in a pandas DataFrame first goes over 100?

I was hoping for a clever vectorized solution, and not having to use df.iterrows().

For example, for price or count data, I want the first time a value exceeds 100, i.e. the first row where df['col'] > 100.

            price
date
2005-01-01     98
2005-01-02     99
2005-01-03    100
2005-01-04     99
2005-01-05     98
2005-01-06    100
2005-01-07    100
2005-01-08     98

The real series could be very large. Is it better to iterate (slow), or is there a vectorized solution?

A df.iterrows() solution could be:

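def first_breakpoint(df, value_to_check):
    # iterrows() yields (index, row) pairs, in that order
    for ind, row in df.iterrows():
        if row['col'] > value_to_check:
            # record the value from the first row that meets the criterion
            return row['value_to_record']
    return None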

But my question is more about efficiency (potentially, a vectorized solution that will scale well).

Jared asked Aug 10 '16 00:08



1 Answer

Try this, using "> 99" as the criterion:

df[df['price'].gt(99)].index[0]

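returns 2, the index label of the first row whose price is greater than 99. With the default zero-based integer index, that is the third row.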
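One caveat: .index[0] raises an IndexError when no row meets the criterion. A minimal sketch of a guarded variant follows, assuming the default integer index used in this answer. The helper name first_index_above is hypothetical, not part of pandas, and the sample frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data mirroring the question's table (default integer index assumed).
df = pd.DataFrame({'price': [98, 99, 100, 99, 98, 100, 100, 98]})

def first_index_above(series, level):
    """Hypothetical helper: index label of the first value > level, else None."""
    mask = series.gt(level)
    # idxmax() on a boolean mask gives the label of the first True, but it
    # also returns the first label when nothing is True, so check any() first.
    return mask.idxmax() if mask.any() else None

first_hit = first_index_above(df['price'], 99)   # label of the first price > 99
no_hit = first_index_above(df['price'], 100)     # nothing exceeds 100 here
```

This stays vectorized: the comparison and the search both run over the whole column without a Python-level loop.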

To get the index labels of all rows where price is greater than 99:

df[df['price'].gt(99)].index
Int64Index([2, 5, 6], dtype='int64')
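If the frame keeps the date index shown in the question, the same mask yields dates instead of integers. The sketch below assumes daily dates starting 2005-01-01, as in the question's table. The variable names are illustrative, and np.flatnonzero is used here as one possible way to recover the integer positions.

```python
import numpy as np
import pandas as pd

# Rebuild the question's series with a daily DatetimeIndex (assumed from the table).
dates = pd.date_range('2005-01-01', periods=8, freq='D')
prices = pd.Series([98, 99, 100, 99, 98, 100, 100, 98], index=dates, name='price')

mask = prices.gt(99).to_numpy()        # plain boolean array, one entry per row

matching_dates = prices.index[mask]    # index labels (dates) where price > 99
positions = np.flatnonzero(mask)       # integer positions of the same rows

# Guard against an empty result before taking the first match.
first_date = matching_dates[0] if len(matching_dates) else None
```

Here first_date is a pandas Timestamp for the first date on which the price exceeds 99, and positions holds the row offsets that the integer-indexed version above reports as labels.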
Merlin answered Nov 15 '22 22:11