I'm currently having trouble combining building a list and filtering while grouping a DataFrame.
Let's say we have a DataFrame of the form:
    A      B  C
0  x2  a32cd  1
1  x1  a11aa  0
2  x1    NaN  1
3  x1  d75dd  0
4  x1  a11aa  1
5  x2  a32cd  1
6  x2  w22xz  0
...
What I'm looking for is to group on column A (strings), then build a list of the non-duplicate, non-null values of B (strings); column C (integers) can be dropped. The final form I am looking for is something like:
    A                    B
0  x1  [a11aa, d75dd, ...]
1  x2  [a32cd, w22xz, ...]
I was thinking of setting it up somehow with the form of:
df_x.groupby('A')['B'].apply(list)
and then applying some conditions to it, but I can't seem to find the right way. Should I write a function for it? I come from a MATLAB background, so I'm inclined to just loop through the entire DataFrame row by row, but I've been told that once you're thinking about doing that in pandas, there is probably a smarter way.
>>> df.dropna().groupby("A")["B"].unique()
A
x1    [a11aa, d75dd]
x2    [a32cd, w22xz]
Name: B, dtype: object
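A couple of details worth noting: `dropna()` with no arguments drops rows that have a NaN in *any* column, so if other columns could contain NaN it is safer to restrict it with `subset=["B"]`; and the result above is a Series indexed by A, so a `reset_index()` is needed to get back the two-column DataFrame shape asked for. A minimal sketch putting this together (the sample data is reconstructed from the question):

```python
import pandas as pd

# Reconstruction of the sample DataFrame from the question
df = pd.DataFrame({
    "A": ["x2", "x1", "x1", "x1", "x1", "x2", "x2"],
    "B": ["a32cd", "a11aa", None, "d75dd", "a11aa", "a32cd", "w22xz"],
    "C": [1, 0, 1, 0, 1, 1, 0],
})

result = (
    df.dropna(subset=["B"])   # drop only rows where B itself is NaN
      .groupby("A")["B"]
      .unique()               # per-group array of distinct B values
      .reset_index()          # turn the Series back into an A/B DataFrame
)
print(result)
#     A               B
# 0  x1  [a11aa, d75dd]
# 1  x2  [a32cd, w22xz]
```

`unique()` preserves order of first appearance within each group, which matches the expected output; if you need genuine Python lists rather than NumPy arrays, follow up with `result["B"].apply(list)`.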