
How to groupby consecutive values in pandas DataFrame

I have a column in a DataFrame with values:

[1, 1, -1, 1, -1, -1] 

How can I group them like this?

[1, 1] [-1] [1] [-1, -1]

Bryan Fok asked Nov 25 '16 10:11


2 Answers

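You can pass a custom Series to groupby. Comparing each value with the previous one (shift) marks where a new run starts, and cumsum turns those marks into one group id per run: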

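import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print(df)
   a
0  1
1  1
2 -1
3  1
4 -1
5 -1

# True wherever a value differs from the previous one; cumsum numbers the runs
print((df.a != df.a.shift()).cumsum())
0    1
1    1
2    2
3    3
4    4
5    4
Name: a, dtype: int32

# Group by the run id. Pass the Series itself rather than a one-element list,
# which makes the keys one-element tuples in recent pandas versions.
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print(i)
    print(g)
    print(g.a.tolist())

1
   a
0  1
1  1
[1, 1]
2
   a
2 -1
[-1]
3
   a
3  1
[1]
4
   a
4 -1
5 -1
[-1, -1]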

jezrael answered Oct 11 '22 04:10
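
If only the grouped values are needed, as in the question, the shift/cumsum key can be used to collect each run directly. The following is a minimal sketch, not part of the original answer; the names run_id and runs are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# A new run starts wherever a value differs from the one before it;
# the cumulative sum of those change flags gives each run its own id.
run_id = (df['a'] != df['a'].shift()).cumsum()

# Group the column by run id and turn each group into a plain list.
runs = [group.tolist() for _, group in df['a'].groupby(run_id)]
print(runs)  # [[1, 1], [-1], [1], [-1, -1]]
```

Because the run ids increase from top to bottom, the groups come out in the same order as the runs appear in the column.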


Using groupby from itertools, with the data from jezrael's answer:

from itertools import groupby

[list(group) for key, group in groupby(df.a.values.tolist())]
Out[361]: [[1, 1], [-1], [1], [-1, -1]]

BENY answered Oct 11 '22 04:10
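
The itertools approach does not depend on pandas, so it can be tried on a plain list as well. Below is a small sketch under that assumption; consecutive_runs is a hypothetical helper name, not something from the answers above.

```python
from itertools import groupby


def consecutive_runs(values):
    """Split an iterable into lists of consecutive equal values."""
    # itertools.groupby starts a new group every time the value changes,
    # so equal values that are not adjacent end up in separate groups.
    return [list(group) for _, group in groupby(values)]


print(consecutive_runs([1, 1, -1, 1, -1, -1]))  # [[1, 1], [-1], [1], [-1, -1]]
```

Unlike the pandas version, this returns only the values and discards the original index.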