I have a dataframe of issues pulled from a GitHub repo using PyGithub. It is currently structured like this:
assignees    | open? | issue_id
n/a          | yes   | 1
[p1, p2]     | no    | 2
[p5]         | no    | 3
[p1, p5, p2] | yes   | 4
I would like to reshape/pivot it so that it looks like this:
assignee | yes | no
n/a      | 1   | 0
p1       | 1   | 1
p2       | 1   | 1
p5       | 1   | 1
I tried pd.Series(Counter(chain.from_iterable(df['assignees']))), but this split the n/a value into three separate values ("n", "/", and "a"). I'm also not sure how that approach would take the values in another column into account. I was looking into a reverse groupby method, but so far my googling skills have failed me.
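For context, here is a minimal sketch of why the n/a value gets split: a Python string is itself iterable, so chain.from_iterable walks it character by character. The names assignees, naive, as_lists and counts are made up for this illustration, and wrapping bare strings in a one-element list is only one possible workaround. It still ignores the open? column.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Same assignee values as the test dataframe: one bare string, three lists.
assignees = pd.Series(['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']])

# A str is iterable, so 'n/a' is flattened into 'n', '/', 'a'.
naive = Counter(chain.from_iterable(assignees))

# Wrap bare strings in a one-element list so every row is a list of names.
as_lists = assignees.apply(lambda v: v if isinstance(v, list) else [v])
counts = pd.Series(Counter(chain.from_iterable(as_lists)))
print(counts)
```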
To create the test dataframe:
import pandas as pd

data = {'assignees': ['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']],
        'open?': ['yes', 'no', 'no', 'yes'],
        'issue_id': [1, 2, 3, 4]}
df = pd.DataFrame(data)
Thanks so much in advance!
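IIUC, let's try explode, available in pandas 0.25.0+:
# One row per (open?, issue_id, assignee); scalar entries like 'n/a' are kept as-is.
df_out = df.set_index(['open?', 'issue_id'])['assignees'].explode().reset_index()
# Count issues per assignee and open? status; missing combinations become 0.
df_out.pivot_table(index='assignees',
                   columns='open?',
                   values='issue_id',
                   aggfunc='count',
                   fill_value=0)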
Output:
open? no yes
assignees
n/a 0 1
p1 1 1
p2 1 1
p5 1 1
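As a possible alternative, the same counts can probably be obtained with pd.crosstab after exploding the column directly. This is only a sketch under a few assumptions: the names long_df and result are made up here, the index is reset after explode so the repeated row labels do not get in the way, and the final column selection is there just to match the yes | no order asked for in the question.

```python
import pandas as pd

data = {'assignees': ['n/a', ['p1', 'p2'], ['p5'], ['p1', 'p5', 'p2']],
        'open?': ['yes', 'no', 'no', 'yes'],
        'issue_id': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# explode gives each list element its own row and leaves scalars such as
# 'n/a' untouched; reset_index drops the duplicated row labels it creates.
long_df = df.explode('assignees').reset_index(drop=True)

# crosstab counts rows per (assignee, open?) pair; absent pairs become 0.
result = pd.crosstab(long_df['assignees'], long_df['open?'])

# Reorder the columns to match the yes | no layout from the question.
result = result[['yes', 'no']]
print(result)
```

Since crosstab counts rows rather than a specific column, issue_id is not needed here; pivot_table is the better fit if you later want a different aggregation over issue_id.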