distinct contiguous blocks in pandas dataframe

Question

I have a pandas dataframe looking like this:

 import numpy as np
 import pandas as pd
 x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
 ty1 = pd.DataFrame({'name': x1})

How can I get a list of tuples containing the start and end indices of each distinct contiguous block of non-NaN values? For the dataframe above, the result should be:

[(1,3), (6,7), (8,8), (10,11)].

joris · Accepted Answer

You can use shift and cumsum to create 'id's for each contiguous block:

In [5]: blocks = (ty1 != ty1.shift()).cumsum()

In [6]: blocks
Out[6]:
    name
0      1
1      2
2      2
3      2
4      3
5      4
6      5
7      5
8      6
9      7
10     8
11     8
12     9
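The id goes up by one wherever a row differs from the row before it. Because NaN never compares equal to NaN, every NaN row starts a new id, which is why rows 4 and 5 above get ids 3 and 4. A minimal sketch of that step on a shorter, made-up Series (the names `s`, `changed` and `ids` are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical short input, not the data from the question.
s = pd.Series([np.nan, 'a', 'a', np.nan, np.nan, 'b'])

# True wherever a row differs from the previous row (NaN != NaN is True).
changed = s != s.shift()

# Running count of the changes gives one id per contiguous run.
ids = changed.cumsum()
print(ids.tolist())
```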

You are only interested in those blocks that are not NaN, so filter for that:

In [7]: blocks = blocks[ty1['name'].notnull()]

In [8]: blocks
Out[8]:
    name
1      2
2      2
3      2
6      5
7      5
8      6
10     8
11     8

And then, we can get the first and last index for each 'id':

In [10]: blocks.groupby('name').apply(lambda x: (x.index[0], x.index[-1]))
Out[10]:
name
2      (1, 3)
5      (6, 7)
6      (8, 8)
8    (10, 11)
dtype: object
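If a plain Python list of tuples is needed, as in the question, one possible way is to aggregate the row labels per block id and build the list afterwards. This is a sketch rather than part of the original answer; it works on the 'name' column as a Series, and the variable names `ids`, `spans` and `result` are assumptions made for the example:

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

# One id per run of equal consecutive values, then drop the NaN rows.
ids = (ty1['name'] != ty1['name'].shift()).cumsum()
ids = ids[ty1['name'].notnull()]

# Group the row labels by block id and take the first and last label of each block.
spans = ids.index.to_series().groupby(ids).agg(['first', 'last'])

# Convert to plain Python ints so the result is an ordinary list of tuples.
result = [(int(start), int(end)) for start, end in zip(spans['first'], spans['last'])]
print(result)  # [(1, 3), (6, 7), (8, 8), (10, 11)]
```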

Whether this last step is necessary depends on what you want to do with the result (working with tuples as elements in dataframes is not really recommended). The 'id's alone may already be enough.
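As an illustration of working with the ids directly, the sketch below groups the non-NaN rows by block id and summarises each block without creating any tuples. The names `block_id`, `valid` and `summary`, and the choice of summary columns, are assumptions made for this example:

```python
import numpy as np
import pandas as pd

x1 = [np.nan, 'a', 'a', 'a', np.nan, np.nan, 'b', 'b', 'c', np.nan, 'b', 'b', np.nan]
ty1 = pd.DataFrame({'name': x1})

# One id per run of equal consecutive values, named 'block' to avoid clashing with the column.
block_id = (ty1['name'] != ty1['name'].shift()).cumsum().rename('block')

# Keep only the rows that hold a value and group them by their block id.
valid = ty1[ty1['name'].notnull()]
summary = valid.groupby(block_id[valid.index])['name'].agg(['first', 'size'])
print(summary)
```

Here `first` holds the value of each block and `size` its length, one row per block id.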

Tags:

python

pandas

NickD1
