Add columns to a pivot table (pandas)

I know in R I can use tidyr for the following:

data_wide <- spread(data_protein, Fraction, Count)

and data_wide will inherit all the columns from data_protein that are not spread.

Protein  Peptide  Start  Fraction  Count
      1        A    122        F1      1
      1        A    122        F2      2
      1        B    230        F1      3
      1        B    230        F2      4

becomes

Protein  Peptide  Start  F1  F2
      1        A    122   1   2
      1        B    230   3   4

But in pandas (Python),

data_wide = data_prot2.reset_index(drop=True).pivot(index='Peptide', columns='Fraction', values='Count').fillna(0)

doesn't keep any column that isn't passed to the function (index, columns, values). So I tried adding the missing column back with df.join():

data_wide2 = data_wide.join(data_prot2.set_index('Peptide')['Start']).sort_values('Start')

But that produces duplicate peptide rows, because data_prot2 has one Start entry per Fraction row, so each Peptide appears several times in the joined Series. Is there a more straightforward way to solve this, or a join parameter that skips repeats? Thank you in advance.
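A minimal sketch of the join approach, assuming each Peptide maps to exactly one Start value. `join` repeats rows because the right-hand side has one entry per original row; `join` itself has no option that drops those repeats, so deduplicating the right-hand side first keeps one row per peptide. The name `start_by_peptide` is illustrative. If a peptide could have several Start values, `drop_duplicates` would silently keep only the first, and Start belongs in the pivot index instead.

```python
import pandas as pd

# Sample data from the question (long format).
data_prot2 = pd.DataFrame({
    "Protein": [1, 1, 1, 1],
    "Peptide": ["A", "A", "B", "B"],
    "Start": [122, 122, 230, 230],
    "Fraction": ["F1", "F2", "F1", "F2"],
    "Count": [1, 2, 3, 4],
})

# Wide table keyed by Peptide; recent pandas versions require keyword arguments here.
data_wide = (
    data_prot2.pivot(index="Peptide", columns="Fraction", values="Count")
    .fillna(0)
)

# Assumption: one Start per Peptide. Deduplicating gives one row per key,
# which is what stops join from multiplying rows.
start_by_peptide = (
    data_prot2[["Peptide", "Start"]]
    .drop_duplicates(subset="Peptide")
    .set_index("Peptide")
)

data_wide2 = data_wide.join(start_by_peptide).sort_values("Start")
print(data_wide2)
```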

asked Jul 27 '16 at 20:07 by Nico





2 Answers

Try this:

In [144]: df
Out[144]:
   Protein Peptide  Start Fraction  Count
0        1       A    122       F1      1
1        1       A    122       F2      2
2        1       B    230       F1      3
3        1       B    230       F2      4

In [145]: df.pivot_table(index=['Protein','Peptide','Start'], columns='Fraction').reset_index()
Out[145]:
         Protein Peptide Start Count
Fraction                          F1 F2
0              1       A   122     1  2
1              1       B   230     3  4
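Without `values=`, the result has two-level columns (`Count` over `F1`/`F2`, with the index columns filled in as `('Protein', '')` and so on). A small sketch for flattening them, assuming you want the inner label where there is one and the outer label otherwise:

```python
import pandas as pd

df = pd.DataFrame({
    "Protein": [1, 1, 1, 1],
    "Peptide": ["A", "A", "B", "B"],
    "Start": [122, 122, 230, 230],
    "Fraction": ["F1", "F2", "F1", "F2"],
    "Count": [1, 2, 3, 4],
})

wide = df.pivot_table(index=["Protein", "Peptide", "Start"], columns="Fraction").reset_index()

# Columns are tuples such as ('Protein', '') and ('Count', 'F1').
# Keep the inner label when it is non-empty, otherwise the outer one.
wide.columns = [inner if inner else outer for outer, inner in wide.columns]
print(wide)
```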

You can also specify the Count column explicitly with values=, which gives single-level columns:

In [146]: df.pivot_table(index=['Protein','Peptide','Start'], columns='Fraction', values='Count').reset_index()
Out[146]:
Fraction  Protein Peptide  Start  F1  F2
0               1       A    122   1   2
1               1       B    230   3   4
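Note that `pivot_table` aggregates, with `aggfunc="mean"` by default, so duplicate (Protein, Peptide, Start, Fraction) rows would be averaged silently. A sketch of the non-aggregating alternative, assuming pandas 1.1 or later (where `pivot` accepts a list for `index`); the `aggfunc="sum"` call is only an illustrative choice for when duplicates are expected:

```python
import pandas as pd

df = pd.DataFrame({
    "Protein": [1, 1, 1, 1],
    "Peptide": ["A", "A", "B", "B"],
    "Start": [122, 122, 230, 230],
    "Fraction": ["F1", "F2", "F1", "F2"],
    "Count": [1, 2, 3, 4],
})
keys = ["Protein", "Peptide", "Start"]

# pivot does no aggregation; rename_axis drops the leftover "Fraction" label.
wide = (
    df.pivot(index=keys, columns="Fraction", values="Count")
    .reset_index()
    .rename_axis(columns=None)
)
print(wide)

# With a duplicated row, pivot raises instead of averaging.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
pivot_failed = False
try:
    dup.pivot(index=keys, columns="Fraction", values="Count")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table combines the duplicates with whatever aggfunc you choose.
summed = dup.pivot_table(index=keys, columns="Fraction",
                         values="Count", aggfunc="sum").reset_index()
print(summed)
```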
answered Oct 23 '22 at 04:10 by MaxU - stop WAR against UA



Using unstack:

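(df.set_index(df.columns[:4].tolist())   # Protein, Peptide, Start, Fraction
   .Count.unstack()                       # move the Fraction level into columns
   .reset_index()                         # turn the remaining index back into columns
   .rename_axis(None, axis=1))            # drop the "Fraction" columns label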
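`df.columns[:4]` depends on column order. A variant sketch that names the key columns explicitly and, assuming you want 0 rather than NaN where a peptide lacks a fraction, passes `fill_value` to `unstack`; the three-row `partial` frame is a made-up case with B's F2 row missing:

```python
import pandas as pd

df = pd.DataFrame({
    "Protein": [1, 1, 1, 1],
    "Peptide": ["A", "A", "B", "B"],
    "Start": [122, 122, 230, 230],
    "Fraction": ["F1", "F2", "F1", "F2"],
    "Count": [1, 2, 3, 4],
})

# Every column except Count, listed by name instead of by position.
keys = ["Protein", "Peptide", "Start", "Fraction"]


def to_wide(frame):
    return (
        frame.set_index(keys)["Count"]
        .unstack("Fraction", fill_value=0)   # missing fractions become 0
        .reset_index()
        .rename_axis(None, axis=1)
    )


wide = to_wide(df)
partial = df.iloc[:3]          # hypothetical: peptide B has no F2 measurement
wide_partial = to_wide(partial)
print(wide)
print(wide_partial)
```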


answered Oct 23 '22 at 03:10 by piRSquared
