Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas Unstacking Unique Column Values to Columns Of Their Own

df = pd.DataFrame({'Col1': ['label1', 'label1', 'label2', 'label2',
          'label3', 'label3', 'label4'],
 'Col2': ['a', 'd', 'b', 'e', 'c', 'f', 'q']}, columns=['Col1', 'Col2'])

Looks like this

     Col1 Col2
0  label1    a
1  label1    d
2  label2    b
3  label2    e
4  label3    c
5  label3    f
6  label4    q

For the unique values in Col1, I want to turn a column's unique values into columns. In a sense, I'm trying to "unstack" Col1 values to be column headers and the row values would be the values in Col2. My key main issue is that I'm not computing any numeric data - it's all text - and I'm simply trying to reshape the structure.

Here's the desired result:

  label1 label2 label3 label4
0      a      b      c      q
1      d      e      f    NaN

I've tried: stack, unstack, pd.melt, pivot_table, pivot.

This ALMOST gets me there, but not quite, and doesn't seem very concise:

df.groupby('Col1').apply(lambda x: x['Col2'].values).to_frame().T

Col1  label1  label2  label3 label4
0     [a, d]  [b, e]  [c, f]    [q]

This question shows how to do it with a pivot table.. but the numeric index in my case is not something I care about.

This question shows how to also do it with a pivot table.. using aggfunc first or ' '.join but that returns CSV instead of values on respective rows.

like image 616
Jarad Avatar asked Mar 11 '23 20:03

Jarad


1 Answers

You can use cumcount for creating column for new index and then pivot_table with aggregating join:

df['g'] = df.groupby('Col1')['Col1'].cumcount()

print (df.pivot_table(index='g', columns='Col1', values='Col2', aggfunc=''.join))
Col1 label1 label2 label3 label4
g                               
0         a      b      c      q
1         d      e      f   None

Thank you for comment Jeff L.:

df['g'] = df.groupby('Col1')['Col1'].cumcount()

print (df.pivot(index='g', columns='Col1', values='Col2'))
Col1 label1 label2 label3 label4
g                               
0         a      b      c      q
1         d      e      f   None

Or:

print (pd.pivot(index=df.groupby('Col1')['Col1'].cumcount(),
                columns=df['Col1'], 
                values=df['Col2']))

Col1 label1 label2 label3 label4
0         a      b      c      q
1         d      e      f   None
like image 60
jezrael Avatar answered Apr 26 '23 04:04

jezrael