I have a dataframe df:
A B
0 28 abc
1 29 def
2 30 hij
3 31 hij
4 32 abc
5 28 abc
6 28 abc
7 29 def
8 30 hij
9 28 abc
10 29 klm
11 30 nop
12 28 abc
13 29 xyz
df.dtypes
A object # A is a string column as well
B object
dtype: object
I want to use the values from this list to groupby:
i = np.array([ 3, 5, 6, 9, 12, 14])
Basically, the rows in df with index 0, 1, 2 are in the first group, the rows with index 3, 4 are in the second group, the row with index 5 is in the third group, and so on.
My end goal is this:
A B
28,29,30 abc,def,hij
31,32 hij,abc
28 abc
28,29,30 abc,def,hij
28,29,30 abc,klm,nop
28,29 abc,xyz
Solution so far using groupby + pd.cut:
df.groupby(pd.cut(df.index, bins=np.append([0], i)), as_index=False).agg(','.join)
A B
0 29,30,31 def,hij,hij
1 32,28 abc,abc
2 28 abc
3 29,30,28 def,hij,abc
4 29,30,28 klm,nop,abc
5 29 xyz
The result is incorrect :-(
How can I do this properly?
You are very close, but pass include_lowest=True and right=False to pd.cut. You want index 0 to fall into the first bin, and you don't want each bin to include its right edge, so the bins become [0, 3), [3, 5), [5, 6) and so on, i.e.
idx = pd.cut(df.index, bins=np.append([0], i),
include_lowest=True, right=False)
df.groupby(idx, as_index=False).agg(','.join)
A B
28,29,30 abc,def,hij
31,32 hij,abc
28 abc
28,29,30 abc,def,hij
28,29,30 abc,klm,nop
28,29 abc,xyz
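For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the left-closed bins. The names result, labels and alt are only illustrative. The np.searchsorted variant is not part of the answer above; it is an alternative that should give the same grouping, assuming i is sorted and its last value is larger than every index in df.

```python
import numpy as np
import pandas as pd

# Sample data from the question; both columns hold strings.
df = pd.DataFrame({
    "A": ["28", "29", "30", "31", "32", "28", "28",
          "29", "30", "28", "29", "30", "28", "29"],
    "B": ["abc", "def", "hij", "hij", "abc", "abc", "abc",
          "def", "hij", "abc", "klm", "nop", "abc", "xyz"],
})
i = np.array([3, 5, 6, 9, 12, 14])

# Left-closed bins: [0, 3), [3, 5), [5, 6), [6, 9), [9, 12), [12, 14)
idx = pd.cut(df.index, bins=np.append([0], i), right=False)

result = (
    df.groupby(idx, observed=True)[["A", "B"]]
      .agg(",".join)
      .reset_index(drop=True)   # drop the interval labels, keep 0..5
)
print(result)

# Alternative (an assumption, not from the answer): label each row with
# the number of break points that are <= its index.
labels = np.searchsorted(i, df.index.to_numpy(), side="right")
alt = df.groupby(labels)[["A", "B"]].agg(",".join).reset_index(drop=True)
print(alt.equals(result))
```

The searchsorted version skips building interval categories, which may be handy if you only need integer group ids.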