
Pandas Groupby Range of Values

Is there an easy way in pandas to invoke groupby on fixed-width ranges of values? For instance, given the example below, can I bin and group column B in increments of 0.155, so that the first couple of groups in column B cover the ranges 0 - 0.155, 0.155 - 0.31, ...?

    import numpy as np
    import pandas as pd
    df=pd.DataFrame({'A':np.random.random(20),'B':np.random.random(20)})

              A         B
    0  0.383493  0.250785
    1  0.572949  0.139555
    2  0.652391  0.401983
    3  0.214145  0.696935
    4  0.848551  0.516692

Alternatively, could I first categorize the data by those increments into a new column and then use groupby to compute any relevant statistics on column A?

GeoPy asked Jan 29 '14 19:01



2 Answers

You might be interested in pd.cut:

    >>> df.groupby(pd.cut(df["B"], np.arange(0, 1.0+0.155, 0.155))).sum()
                          A         B
    B
    (0, 0.155]     2.775458  0.246394
    (0.155, 0.31]  1.123989  0.471618
    (0.31, 0.465]  2.051814  1.882763
    (0.465, 0.62]  2.277960  1.528492
    (0.62, 0.775]  1.577419  2.810723
    (0.775, 0.93]  0.535100  1.694955
    (0.93, 1.085]       NaN       NaN

    [7 rows x 2 columns]
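The question also asks about first categorizing the data into a new column and then grouping on it. A minimal sketch of that variant follows, assuming a reasonably recent pandas. The column name `B_bin`, the fixed random seed, and the choice of statistics are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption) so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

# Bin edges 0, 0.155, ..., 1.085; the last edge lies past 1.0 so all values are covered.
edges = np.arange(0, 1.0 + 0.155, 0.155)

# Store the interval each row falls into in a new, hypothetically named column.
# By default the intervals are open on the left and closed on the right.
df["B_bin"] = pd.cut(df["B"], edges)

# observed=False keeps bins that contain no rows in the result.
stats = df.groupby("B_bin", observed=False)["A"].agg(["count", "mean", "sum"])
print(stats)
```

Keeping the bin as a column can be convenient when the same grouping is reused for several statistics. If a value exactly equal to 0 must land in the first bin, `pd.cut` accepts `include_lowest=True`.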
DSM answered Sep 19 '22 23:09



Try this:

    df = df.sort_values('B')
    bins = np.arange(0, 1.0, 0.155)
    # np.digitize returns, for each value of B, the number of the bin it falls into
    ind = np.digitize(df['B'], bins)

    print(df.groupby(ind).head())

Of course, you can apply any function to the groups, not just head.
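As an illustration of applying a different function to the same `np.digitize` groups, here is a small sketch that aggregates column A per bin. The fixed seed and the chosen aggregations are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption) so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

# Left edges of the bins: 0, 0.155, ..., 0.93.
bins = np.arange(0, 1.0, 0.155)

# Bin number i means bins[i-1] <= B < bins[i]; the last number covers B >= 0.93.
ind = np.digitize(df["B"], bins)

# Group on the array of bin numbers and aggregate column A.
summary = df.groupby(ind)["A"].agg(["count", "mean"])
print(summary)
```

Unlike `pd.cut`, the group labels here are integer bin numbers rather than intervals, and bins with no rows simply do not appear in the result.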

Alvaro Fuentes answered Sep 20 '22 23:09
