Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a pivot table that lists out values

What aggfunc do I need to use to produce a list using a pivot table? I tried using str which doesn't quite work.

Inputs

import pandas as pd
data = {
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)
print df

pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=len)
print pivot

pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=str)
print pivot

Outputs

   Experiment  Test point
0           1           0
1           2           1
2           3           2
3           4           0
4           5           1
            Experiment
Test point            
0                    2
1                    2
2                    1
                                                Experiment
Test point                                                
0           0    1\n3    4\nName: Experiment, dtype: int64
1           1    2\n4    5\nName: Experiment, dtype: int64
2                   2    3\nName: Experiment, dtype: int64

Desired output

            Experiment
Test point                                                
0           1, 4
1           2, 5
2           3
like image 221
bluprince13 Avatar asked Oct 14 '17 10:10

bluprince13


People also ask

How do I list values in a PivotTable?

In the PivotTable, right-click the value field, and then click Show Values As. Note: In Excel for Mac, the Show Values As menu doesn't list all the same options as Excel for Windows, but they are available. Select More Options on the menu if you don't see the choice you want listed.

Can a PivotTable just show value?

Introduction. In Microsoft Excel, usually you can only show numbers in a pivot table values area, even if you add a text field there. By default, Excel shows a count for text data, and a sum for numerical data.

How do I show values and not sum in PivotTable?

Inside the Pivot Column dialog, select the column with the values that will populate the new columns to be created. In this case "Time" but could be any field type, including text. In the Advanced Options part, select "Don´t Aggregate" so the values will displayed without any modification.


Video Answer


3 Answers

you can use list itself as a function:

>>> pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=lambda x:list(x))
           Experiment
Test point           
0              [1, 4]
1              [2, 5]
2                 [3]
like image 63
Roman Pekar Avatar answered Oct 17 '22 11:10

Roman Pekar


Use

In [1830]: pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                          aggfunc=lambda x: ', '.join(x.astype(str)))
Out[1830]:
           Experiment
Test point
0                1, 4
1                2, 5
2                   3

Or, groupby would do.

In [1831]: df.groupby('Test point').agg({
                'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})
Out[1831]:
           Experiment
Test point
0                1, 4
1                2, 5
2                   3

But, if you want then as list.

In [1861]: df.groupby('Test point').agg({'Experiment': lambda x: x.tolist()})
Out[1861]:
           Experiment
Test point
0              [1, 4]
1              [2, 5]
2                 [3]

x.astype(str).str.cat(sep=', ') is similar to ', '.join(x.astype(str))

like image 22
Zero Avatar answered Oct 17 '22 13:10

Zero


Option 1
str Pre-conversion + groupby + apply.

You could pre-convert to string to simplify the groupby call.

df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

And a modification of this would involve inplace assignment, for speed (assign returns a copy and is slower):

df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

With the downside of modifying the original dataframe as well.

Performance

# Zero's 1st solution
%%timeit
df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})

100 loops, best of 3: 3.72 ms per loop
# Zero's second solution
%%timeit
pd.pivot_table(df, index=['Test point'], values=['Experiment'], 
               aggfunc=lambda x: ', '.join(x.astype(str)))

100 loops, best of 3: 5.17 ms per loop
# proposed in this post
%%timeit -n 1
df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

1 loop, best of 3: 2.02 ms per loop

Note that the .assign method is only a few ms slower than this. Larger performance gains should be seen for larger dataframes.


Option 2
groupby + agg:

A similar operation follows with agg:

df.assign(Experiment=df.Experiment.astype(str))\
         .groupby('Test point').agg({'Experiment' : ', '.join})

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

And the in-place version of this would be the same as above.

# proposed in this post
%%timeit -n 1
df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').agg({'Experiment' : ', '.join})

1 loop, best of 3: 2.21 ms per loop

agg should see speed gains over apply for larger dataframes.

like image 40
cs95 Avatar answered Oct 17 '22 12:10

cs95