Create an aggregated dataframe from column of lists

Tags: python, pandas

I have a dataframe of issues pulled from a GitHub repo using PyGithub. It is currently structured like this:

assignees  |  open?  |  issue_id
n/a           yes        1
[p1, p2]      no         2
[p5]          no         3
[p1, p5, p2]  yes        4

I would like to reshape it (for example with a pivot table) so that it looks like this:

assignee  |   yes   |    no
n/a            1          0
p1             1          1
p2             1          1
p5             1          1

I tried pd.Series(Counter(chain.from_iterable(df['assignees']))), but this split the 'n/a' string into three separate values ("n", "/", and "a"), because chain.from_iterable iterates over a string character by character. I'm also not sure how this approach would take the values in the other column into account. I looked for a reverse groupby method, but so far my googling skills have failed me.
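For reference, here is a minimal sketch of how the Counter/chain idea could be repaired, assuming every entry that is not a list (such as 'n/a') should count as one assignee. Counting (assignee, status) pairs instead of bare assignees keeps the open? column in play. The names as_lists, pair_counts and result are illustrative only.

```python
from collections import Counter

import pandas as pd

data = {'assignees': ['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']],
        'open?': ['yes', 'no', 'no', 'yes'],
        'issue_id': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Assumption: any non-list entry (e.g. 'n/a') is one assignee, so wrap it in a
# one-element list instead of letting it be iterated character by character.
as_lists = [a if isinstance(a, list) else [a] for a in df['assignees']]

# Count (assignee, status) pairs so the 'open?' value stays attached to each assignee.
pair_counts = Counter(
    (assignee, status)
    for assignees, status in zip(as_lists, df['open?'])
    for assignee in assignees
)

# Tuple keys become a two-level index; unstack moves the status level into columns,
# filling missing combinations (e.g. 'n/a' with 'no') with 0.
result = pd.Series(pair_counts).unstack(fill_value=0)[['yes', 'no']]
print(result)
```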

To create the test dataframe:

import pandas as pd

data = {'assignees': ['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']],
        'open?': ['yes', 'no', 'no', 'yes'],
        'issue_id': [1, 2, 3, 4]}

df = pd.DataFrame(data)

Thanks so much in advance!

asked by eleanore on Nov 30 '25 at 07:11



1 Answer

If I understand correctly, let's try Series.explode (pandas 0.25.0+):

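# Move the other columns into the index so explode acts only on 'assignees';
# each list becomes one row per assignee, while 'n/a' (not list-like) stays as is
df_out = df.set_index(['open?','issue_id'])['assignees'].explode().reset_index()

# Count issues per assignee and status; fill_value=0 covers combinations that never occur
df_out.pivot_table(index='assignees', 
                   columns='open?', 
                   values='issue_id', 
                   aggfunc='count', 
                   fill_value=0)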

Output:

open?      no  yes
assignees         
n/a         0    1
p1          1    1
p2          1    1
p5          1    1
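A sketch of an equivalent route, assuming DataFrame.explode is available (it was added in the same pandas 0.25.0 release as Series.explode): explode the frame directly, then count with pd.crosstab. Reindexing the columns puts yes before no, as in the layout requested in the question. This is an alternative to the pivot_table call, not the answer's own method; the names exploded and table are illustrative.

```python
import pandas as pd

data = {'assignees': ['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']],
        'open?': ['yes', 'no', 'no', 'yes'],
        'issue_id': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# One row per (issue, assignee); 'n/a' is not list-like, so explode leaves it unchanged.
# reset_index(drop=True) replaces the repeated row labels with a clean 0..n-1 index.
exploded = df.explode('assignees').reset_index(drop=True)

# crosstab counts how often each assignee occurs with each 'open?' value.
table = pd.crosstab(exploded['assignees'], exploded['open?'])

# Order the columns as in the question's desired output.
table = table.reindex(columns=['yes', 'no'], fill_value=0)
print(table)
```

Compared with pivot_table, crosstab needs no values column or aggfunc, since counting occurrences is its default behaviour.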
answered by Scott Boston on Dec 02 '25 at 19:12
