Remove many index ranges from Pandas DataFrame

Question

Question + MWE

How can I drop/remove multiple ranges of rows from a Pandas DataFrame with a (two level) multi index looking like this:

idx1    idx2  |  value(s)   ...
------------------------------------------
4       0     |  1.123456   ...
        1     |  2.234567   ...
        2     |  0.012345   ...
8       0     | -1.123456   ...
        1     | -0.973915   ...
        2     |  1.285553   ...
        3     | -0.194625   ...
        4     | -0.144112   ...
...     ...   | ...         ...

The ranges to drop/remove are currently residing in a list like this:

ranges = [[(4, 1), (4, 2)],          # range (4,1):(4,2)
          [(8, 0), (8, 3)],          # range (8,0):(8,3)
          [(8, 5), (8, 10)], ...]    # range (8,5):(8,10)

The main problem is, that most methods I found, don't support either multi indexing or slicing or multiple slices/ranges.

What is the best/fastest way to do so.

Current ugly solution

for range in ranges:
    df = df.drop(df.loc[range[0]:range[1]].index)

Is slow and ugly, but is the only workable solution I found, combining multi indexing, slicing and in a way multiple ranges (by looping over the ranges).

Solution comparison

All three proposed solutions work. They all solve the issue by transforming the list of slices into a list of all individual tuples within those slice.

Slices to complete set of tuples

The fastest way to do so is @ALollz solution:

idx = [(x, z) for (x, i), (_, j) in ranges for z in np.arange(i,j+1,1)]

Performance

Regarding the removal of the rows, all solutions work, but there's a big difference in performance (all performance data based on my data set with ~10 Mio. entries)

@ALollz + @Ben. T's combined solution (~19 sec.)
```
df.drop(pd.MultiIndex.from_tuples(idx))
```
or without creating a MultiIndex object
```
df.drop(idx)
```

@ALollz first solution (~75 sec.)

df.loc[list(set(df.index.values) - set(idx))]

@user3471881's solution (~95 sec.)
```
df.loc[~df.index.isin(ranges)]
```
my ugly solution (~350 sec.)
```
see above
```

ALollz · Accepted Answer

You could create a new list of indices and as Ben.T points out, just drop them.

import numpy as np
import pandas as pd

idx = [(x, z) for (x, i), (_, j) in ranges for z in np.arange(i,j+1,1)]
df.drop(pd.MultiIndex.from_tuples(idx))

Output:

           value(s)
idx1 idx2          
4    0            4
8    4           11

user3471881 · Answer

The range list you are using forces us to use multiple slices, which could be fine but doesn't seem to be what you want.

If you instead fill your list with all indexes that you want removed (you said in a comment that you could do this):

ranges = [(4, 1), (4, 2), (8, 0), (8, 1), (8, 2), (8, 3) ... ]

You could just access the index of the DataFrame and check if it isin() your list of tuples.

df.index.isin(ranges)

To remove the indexes that are in your list of ranges, add a tilde and then use as mask.

df[~df.index.isin(ranges)]

Remove many index ranges from Pandas DataFrame

Tags:

python

pandas

Question + MWE

Current ugly solution

Solution comparison

Slices to complete set of tuples

Performance

Ichixgo

2 Answers

Output:

ALollz

user3471881

Recent Activity

Donate For Us

Remove many index ranges from Pandas DataFrame

Tags:

python

pandas

Question + MWE

Current ugly solution

Solution comparison

Slices to complete set of tuples

Performance

Ichixgo

2 Answers

Output:

ALollz

user3471881

Related questions

Recent Activity

Donate For Us