Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select multiple sections of rows by index in pandas

I have large DataFrame with GPS path and some attributes. A few sections of the path are those which I need to analyse. I would like to subset only those sections to a new DataFrame. I can subset one section at the time but the idea is to have them all and to have an original index.

The problem is similar to:

import pandas as pd 
df = pd.DataFrame({'A':[0,1,2,3,4,5,6,7,8,9],'B':['a','b','c','d','e','f','g','h','i','j']},
                  index=range(10,20,))

I want o get something like:

cdf = df.loc[[11:13] & [17:20]] # SyntaxError: invalid syntax

desired outcome:

    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j

I know the example is easy with cdf = df.loc[[11,12,13,17,18,19],:] but in the original problem I have thousands of lines and some entries already removed, so listing points is rather not an option.

like image 564
tomasz74 Avatar asked Dec 10 '22 15:12

tomasz74


1 Answers

You could use np.r_ to concatenate the slices:

In [16]: df.loc[np.r_[11:13, 17:20]]
Out[16]: 
    A  B
11  1  b
12  2  c
17  7  h
18  8  i
19  9  j

Note, however, that df.loc[A:B] selects labels A through B with B included. np.r_[A:B] returns an array of A through B with B excluded. To include B you would need to use np.r_[A:B+1].

When passed a slice, such as df.loc[A:B], df.loc ignores labels that are not in df.index. In contrast, when passed an array, such as df.loc[np.r_[A:B]], df.loc may add a new row filled with NaNs for each value in the array which is not in df.index.

Thus to produce the desired result, you would need to adjust the right endpoint of the slices and use isin to test for membership in df.index:

In [26]: df.loc[df.index.isin(np.r_[11:14, 17:21])]
Out[26]: 
    A  B
11  1  b
12  2  c
13  3  d
17  7  h
18  8  i
19  9  j
like image 132
unutbu Avatar answered Dec 24 '22 13:12

unutbu