Is there an easy method in pandas to invoke groupby on a range of value increments? For instance, given the example below, can I bin and group column B with a 0.155 increment so that, say, the first couple of groups in column B cover the ranges 0 to 0.155, 0.155 to 0.31, and so on?
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})

          A         B
0  0.383493  0.250785
1  0.572949  0.139555
2  0.652391  0.401983
3  0.214145  0.696935
4  0.848551  0.516692
Alternatively, could I first categorize the data by those increments into a new column and subsequently use groupby to determine whatever statistics are relevant for column A?
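The alternative described above can be sketched directly: bin column B into a new column with pd.cut, then group on it. The column name B_bin and the choice of statistics are my own, not part of the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})

# Bin column B into 0.155-wide intervals and store the interval labels
# in a new column.
df['B_bin'] = pd.cut(df['B'], np.arange(0, 1.0 + 0.155, 0.155))

# Group on the new column and compute statistics on A.
# observed=False keeps empty bins in the result.
stats = df.groupby('B_bin', observed=False)['A'].agg(['mean', 'count'])
print(stats)
```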
You might be interested in pd.cut:
>>> df.groupby(pd.cut(df["B"], np.arange(0, 1.0+0.155, 0.155))).sum()
                      A         B
B
(0, 0.155]     2.775458  0.246394
(0.155, 0.31]  1.123989  0.471618
(0.31, 0.465]  2.051814  1.882763
(0.465, 0.62]  2.277960  1.528492
(0.62, 0.775]  1.577419  2.810723
(0.775, 0.93]  0.535100  1.694955
(0.93, 1.085]       NaN       NaN

[7 rows x 2 columns]
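If you want statistics other than the sum, the grouped object's .agg accepts several aggregations at once; a minimal sketch (the choice of mean/min/max here is illustrative, all three are standard pandas aggregation names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})
bins = np.arange(0, 1.0 + 0.155, 0.155)

# Several statistics on A per bin of B in a single pass.
result = df.groupby(pd.cut(df['B'], bins), observed=False)['A'].agg(['mean', 'min', 'max'])
print(result)
```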
Try this:

df = df.sort_values('B')
bins = np.arange(0, 1.0, 0.155)
ind = np.digitize(df['B'], bins)
print(df.groupby(ind).head())
Of course, you can use any function on the groups, not just head.
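For instance, sticking with the np.digitize approach, you could aggregate per bin instead of peeking at the head; the mean call below is just one possibility:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.random(20), 'B': np.random.random(20)})
bins = np.arange(0, 1.0, 0.155)

# np.digitize returns, for each value, the 1-based index of the bin
# it falls into.
ind = np.digitize(df['B'], bins)

# Group on the bin indices and take the per-bin mean of both columns.
means = df.groupby(ind).mean()
print(means)
```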