I'm currently having trouble combining building a list and filtering while grouping a DataFrame.
Let's say we have a DataFrame of the form:
    A      B  C
0  x2  a32cd  1
1  x1  a11aa  0
2  x1    NaN  1
3  x1  d75dd  0
4  x1  a11aa  1
5  x2  a32cd  1
6  x2  w22xz  0
...
What I'm looking for is to group on column A (strings), then build a list of the non-duplicate, non-null values of B (strings); column C (integers) can be dropped. The final form I am looking for is something like:
    A                    B
0  x1  [a11aa, d75dd, ...]
1  x2  [a32cd, w22xz, ...]
I was thinking of setting it up somehow with the form of:
df_x.groupby('A')['B'].apply(list)
and then applying some conditions to it, but I can't seem to find the right way. Should I write a function for it? I come from a MATLAB background, so I'm inclined to just loop through the entire DataFrame row by row, but I've been told that once you're thinking about doing that in pandas, there is probably a smarter way.
>>> df.dropna().groupby("A")["B"].unique()
A
x1    [a11aa, d75dd]
x2    [a32cd, w22xz]
Name: B, dtype: object
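A couple of details worth noting: `dropna()` with no arguments drops rows that have a NaN in *any* column, so if other columns could contain NaN it is safer to restrict it with `subset=["B"]`; and the result above is a Series indexed by A, so a `reset_index()` is needed to get back the two-column DataFrame shape asked for. A minimal sketch putting this together (the sample data is reconstructed from the question):

```python
import pandas as pd

# Reconstruction of the sample DataFrame from the question
df = pd.DataFrame({
    "A": ["x2", "x1", "x1", "x1", "x1", "x2", "x2"],
    "B": ["a32cd", "a11aa", None, "d75dd", "a11aa", "a32cd", "w22xz"],
    "C": [1, 0, 1, 0, 1, 1, 0],
})

result = (
    df.dropna(subset=["B"])   # drop only rows where B itself is NaN
      .groupby("A")["B"]
      .unique()               # per-group array of distinct B values
      .reset_index()          # turn the Series back into an A/B DataFrame
)
print(result)
#     A               B
# 0  x1  [a11aa, d75dd]
# 1  x2  [a32cd, w22xz]
```

`unique()` preserves order of first appearance within each group, which matches the expected output; if you need genuine Python lists rather than NumPy arrays, follow up with `result["B"].apply(list)`.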