 

Finding the index for a value in a Pandas Dataframe

I've got a problem that shouldn't be that difficult but it's stumping me. There has to be an easy way to do it. I have a series from a dataframe that looks like this:

               value

2001-01-04     0.134
2001-01-05      Nan
2001-01-06      Nan
2001-01-07     0.032
2001-01-08      Nan
2001-01-09     0.113
2001-01-10      Nan
2001-01-11      Nan
2001-01-12     0.112
2001-01-13      Nan
2001-01-14      Nan
2001-01-15     0.136
2001-01-16      Nan
2001-01-17      Nan

Iterating from bottom to top (latest date to earliest), I need to find the index of the first value greater than 0.100 whose next earlier value is less than 0.100.

So in the series above, I want to find the index of the value 0.113, which is 2001-01-09. The next earlier value is below 0.100 (0.032 on 2001-01-07). The two later values are greater than 0.100, but I want the index of the earliest value > 0.100 following a value less than the threshold, iterating bottom to top.

The only way I can think of doing this is reversing the series, iterating to the first (last) value, checking if it is > 0.100, then iterating to the next earlier value and checking whether it's less than 0.100. If it is, I'm done. If it is > 0.100, I have to iterate again and test the earlier number.
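For reference, the stepwise approach described above might look roughly like the sketch below. The function name `last_upward_crossing_loop` and the choice to return `None` when nothing matches are assumptions made for illustration, and the sample frame is rebuilt by hand from the table in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
idx = pd.date_range("2001-01-04", "2001-01-17", freq="D")
vals = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
        np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": vals}, index=idx)


def last_upward_crossing_loop(s, threshold=0.1):
    """Walk from the latest non-null value to the earliest (hypothetical helper)."""
    non_null = s.dropna()
    # Compare each value with the non-null value just before it.
    for i in range(len(non_null) - 1, 0, -1):
        if non_null.iloc[i] > threshold and non_null.iloc[i - 1] < threshold:
            return non_null.index[i]
    return None  # assumption: no match is reported as None


result = last_upward_crossing_loop(df["value"])
print(result)
```

This is only a baseline to compare against; it touches every value one at a time, which is exactly the iteration the question hopes to avoid.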

Surely there is a non-messy way to do this I'm not seeing that avoids all this stepwise iteration.

Thanks in advance for your help.

Windstorm1981 asked Mar 24 '17 18:03



2 Answers

You're essentially looking for two conditions. For the first condition, you want the given value to be greater than 0.1:

df['value'].gt(0.1)

For the second condition, you want the previous non-null value to be less than 0.1:

df['value'].ffill().shift().lt(0.1)

Now, combine the two conditions with the & operator, reverse the resulting Boolean indexer, and use idxmax to find the first (last) instance where your condition holds:

(df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1].idxmax()

Which gives the expected index value.
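To see why this works, here is a small self-contained sketch that rebuilds the sample data from the question (typed in by hand, so treat the frame construction as an assumption) and lays the intermediate Series side by side. The names `above`, `prev_below`, and `steps` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
idx = pd.date_range("2001-01-04", "2001-01-17", freq="D")
vals = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
        np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": vals}, index=idx)

# Condition 1: the value itself exceeds the threshold (NaN compares as False).
above = df["value"].gt(0.1)

# Condition 2: the previous non-null value is below the threshold.
# ffill carries the last seen value forward; shift moves it down one row.
prev_non_null = df["value"].ffill().shift()
prev_below = prev_non_null.lt(0.1)

cond = above & prev_below

# Lay the steps side by side for inspection.
steps = pd.DataFrame({
    "value": df["value"],
    "prev_non_null": prev_non_null,
    "above": above,
    "prev_below": prev_below,
    "cond": cond,
})
print(steps)

# Reversing first means idxmax returns the latest True label.
result = cond[::-1].idxmax()
print(result)
```

On this data only the 2001-01-09 row satisfies both conditions, so reversing makes no difference here; it matters once the series contains more than one such crossing.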

The above method assumes that at least one value satisfies the situation you've described. If none does, idxmax on an all-False Series simply returns the first label of the reversed Series, which would be misleading. If your data might not contain such a value, use any to verify that a solution exists:

# Build the condition.
cond = (df['value'].gt(0.1) & df['value'].ffill().shift().lt(0.1))[::-1]

# Check if the condition is met anywhere.
if cond.any():
    idx = cond.idxmax()
else:
    idx = None  # no match: substitute whatever sentinel suits your code

In your question, you've specified both inequalities to be strict. What happens for a value exactly equal to 0.1? You may want to change one of the gt/lt to ge/le to account for this.
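A tiny sketch of the boundary case, using a made-up three-point series (not data from the question) in which one value sits exactly on the threshold. The variable names are illustrative only.

```python
import pandas as pd

# Made-up series: the middle value equals the threshold exactly.
s = pd.Series([0.05, 0.1, 0.2],
              index=pd.date_range("2001-01-01", periods=3, freq="D"))
prev = s.ffill().shift()

# Both inequalities strict: 0.1 is neither "above" nor "below".
strict = s.gt(0.1) & prev.lt(0.1)

# Treat a previous value equal to the threshold as "below".
inclusive_prev = s.gt(0.1) & prev.le(0.1)

# Treat a current value equal to the threshold as "above".
inclusive_cur = s.ge(0.1) & prev.lt(0.1)

print(strict.any())                      # no crossing is detected at all
print(inclusive_prev[::-1].idxmax())     # crossing reported at the 0.2 row
print(inclusive_cur[::-1].idxmax())      # crossing reported at the 0.1 row
```

Which variant is right depends on how you want to classify a value that lands exactly on the threshold.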

root answered Oct 19 '22 16:10


Bookkeeping

import pandas as pd

# making sure `nan` are actually `nan`
df.value = pd.to_numeric(df.value, errors='coerce')
# making sure strings are actually dates
df.index = pd.to_datetime(df.index)

plan

  • dropna
  • sort_index
  • boolean series of less than 0.1
  • convert to integers to use in diff
  • diff - the boolean is 1 when the value is < .1 and 0 otherwise, so your scenario (going from < .1 to > .1) shows up as a diff of -1
  • idxmax - find the first -1

df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1).idxmax()

2001-01-09 00:00:00
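The chain above is easier to follow when each step is named. The sketch below rebuilds the sample data from the question by hand (an assumption, since the original frame isn't shown being constructed) and prints the intermediate results; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
idx = pd.date_range("2001-01-04", "2001-01-17", freq="D")
vals = [0.134, np.nan, np.nan, 0.032, np.nan, 0.113, np.nan,
        np.nan, 0.112, np.nan, np.nan, 0.136, np.nan, np.nan]
df = pd.DataFrame({"value": vals}, index=idx)

s = df["value"].dropna().sort_index()  # keep only real observations, in date order
below = s.lt(0.1)                      # True where the value is under the threshold
as_int = below.astype(int)             # True/False -> 1/0 so diff is arithmetic
delta = as_int.diff()                  # -1 where "below" flips from 1 to 0
crossing = delta.eq(-1)                # True at each upward crossing

print(pd.DataFrame({"value": s, "below": as_int, "diff": delta, "crossing": crossing}))

result = crossing.idxmax()             # first True label in date order
print(result)
```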

Correction to account for the flaw pointed out by @root.

diffs = df.value.dropna().sort_index().lt(.1).astype(int).diff().eq(-1)
diffs.idxmax() if diffs.any() else pd.NaT
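One thing worth checking against your own data: idxmax returns the first True in date order, so with more than one upward crossing the guarded expression yields the earliest one. If the most recent crossing is what's wanted (as the "bottom to top" wording in the question suggests), reversing before idxmax is one option. The sketch below wraps both in a hypothetical helper, `upward_crossing`, with a made-up four-point series; neither the name nor the `last` flag comes from the original answer.

```python
import pandas as pd


def upward_crossing(s, threshold=0.1, last=False):
    """Label of an upward crossing of `threshold`, or pd.NaT if there is none.

    Hypothetical helper: `last=True` picks the most recent crossing,
    `last=False` the earliest one.
    """
    crossing = s.dropna().sort_index().lt(threshold).astype(int).diff().eq(-1)
    if not crossing.any():
        return pd.NaT
    return crossing[::-1].idxmax() if last else crossing.idxmax()


# Made-up series with two upward crossings (01-02 and 01-04).
s = pd.Series([0.05, 0.2, 0.05, 0.3],
              index=pd.date_range("2001-01-01", periods=4, freq="D"))

first = upward_crossing(s)
latest = upward_crossing(s, last=True)
none_found = upward_crossing(pd.Series([0.2, 0.3]))

print(first, latest, none_found)
```

On the sample data in the question there is only one crossing, so both settings return 2001-01-09.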

editorial

This question highlights an important SO dynamic. Those of us who answer questions often do so by editing our answers until they are in a satisfactory state. I have observed that those of us who answer pandas questions are generally very helpful to each other as well as to those who ask questions.

In this post, I was well informed by @root and subsequently changed my post to reflect the added information. That alone makes @root's post very useful in addition to the other great information they provided.

Please recognize both posts and up vote as many useful posts as you can.

Thx

piRSquared answered Oct 19 '22 14:10