Using groupby with expanding and a custom function

Question

I have a dataframe that consists of truthIds and trackIds:

truthId = ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'C', 'B', 'A', 'A', 'C', 'C']
trackId = [1, 1, 2, 2, 3, 4, 5, 3, 2, 1, 5, 4, 6]
df1 = pd.DataFrame({'truthId': truthId, 'trackId': trackId})
    trackId truthId
0         1       A
1         1       A
2         2       B
3         2       B
4         3       C
5         4       C
6         5       A
7         3       C
8         2       B
9         1       A
10        5       A
11        4       C
12        6       C

I wish to add a column that calculates, for each unique truthId, the length of the set of unique tracksIds that have previously (i.e. from the top of the data to that row) been associated with it:

       truthId  trackId  unique_Ids
0        A        1           1
1        A        1           1
2        B        2           1
3        B        2           1
4        C        3           1
5        C        4           2
6        A        5           2
7        C        3           2
8        B        2           1
9        A        1           2
10       A        5           2
11       C        4           2
12       C        6           3

I am very close to accomplishing this. I can use:

df.groupby('truthId').expanding().agg({'trackId': lambda x: len(set(x))})

Which produces the following output:

                trackId
truthId            
A       0       1.0
        1       1.0
        6       2.0
        9       2.0
        10      2.0
B       2       1.0
        3       1.0
        8       1.0
C       4       1.0
        5       2.0
        7       2.0
        11      2.0
        12      3.0

This is consistent with the documentation

However, it throws an error when I attempt to assign this output to a new column:

df['unique_Ids'] = df.groupby('truthId').expanding().agg({'trackId': lambda x: len(set(x))})

I have used this workflow before and ideally the new column is put back into the original DateFrame with no issues (i.e. Split-Apply-Combine). How can I get it to work?

BENY · Accepted Answer

You need reset_index

df['Your']=(df.groupby('truthId').expanding().agg({'trackId': lambda x: len(set(x))})).reset_index(level=0,drop=True)
df
Out[1162]: 
    trackId truthId  Your
0         1       A   1.0
1         1       A   1.0
2         2       B   1.0
3         2       B   1.0
4         3       C   1.0
5         4       C   2.0
6         5       A   2.0
7         3       C   2.0
8         2       B   1.0
9         1       A   2.0
10        5       A   2.0
11        4       C   2.0
12        6       C   3.0

Using groupby with expanding and a custom function

Tags:

python

pandas

lambda

pandas-groupby

split-apply-combine

Tara S

1 Answers

BENY

Recent Activity

Donate For Us

Using groupby with expanding and a custom function

Tags:

python

pandas

lambda

pandas-groupby

split-apply-combine

Tara S

1 Answers

BENY

Related questions

Recent Activity

Donate For Us