Drop all rows before first occurrence of a value

Question

I have a df like so:

Year  ID Count
1997  1  0
1998  2  0
1999  3  1
2000  4  0
2001  5  1

and I want to remove all rows before the first occurrence of 1 in Count which would give me:

Year  ID Count
1999  3  1
2000  4  0
2001  5  1

I can remove all rows AFTER the first occurrence like this:

df=df.loc[: df[(df['Count'] == 1)].index[0], :]

but I can't seem to follow the slicing logic to make it do the opposite.

piRSquared · Accepted Answer

I'd do:

df[(df.Count == 1).idxmax():]

enter image description here

df.Count == 1 returns a boolean array. idxmax() will identify the index of the maximum value. I know the max value will be True and when there are more than one Trues it will return the position of the first one found. That's exactly what you want. By the way, that value is 2. Finally, I slice the dataframe for everything from 2 onward with df[2:]. I put all that in one line in the answer above.

MaxU - stop WAR against UA · Answer

you can use cumsum() method:

In [13]: df[(df.Count == 1).cumsum() > 0]
Out[13]:
   Year  ID  Count
2  1999   3      1
3  2000   4      0
4  2001   5      1

Explanation:

In [14]: (df.Count == 1).cumsum()
Out[14]:
0    0
1    0
2    1
3    1
4    2
Name: Count, dtype: int32

Timing against 500K rows DF:

In [18]: df = pd.concat([df] * 10**5, ignore_index=True)

In [19]: df.shape
Out[19]: (500000, 3)

In [20]: %timeit df[(df.Count == 1).idxmax():]
100 loops, best of 3: 3.7 ms per loop

In [21]: %timeit df[(df.Count == 1).cumsum() > 0]
100 loops, best of 3: 16.4 ms per loop

In [22]: %timeit df.loc[df[(df['Count'] == 1)].index[0]:, :]
The slowest run took 4.01 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 7.02 ms per loop

Conclusion: @piRSquared's idxmax() solution is a clear winner...

root · Answer

Using np.where:

df[np.where(df['Count']==1)[0][0]:]

Timings

Timings were performed on a larger version of the DataFrame:

df = pd.concat([df]*10**5, ignore_index=True)

Results:

%timeit df[np.where(df['Count']==1)[0][0]:]  
100 loops, best of 3: 2.74 ms per loop

%timeit df[(df.Count == 1).idxmax():] 
100 loops, best of 3: 6.18 ms per loop

%timeit df[(df.Count == 1).cumsum() > 0] 
10 loops, best of 3: 26.6 ms per loop

%timeit df.loc[df[(df['Count'] == 1)].index[0]:, :]
100 loops, best of 3: 11.2 ms per loop

Drop all rows before first occurrence of a value

Tags:

pandas

python-2.7

Stefano Potter

3 Answers

piRSquared

MaxU - stop WAR against UA

root

Recent Activity

Donate For Us

Drop all rows before first occurrence of a value

Tags:

pandas

python-2.7

Stefano Potter

3 Answers

piRSquared

MaxU - stop WAR against UA

root

Related questions

Recent Activity

Donate For Us