Create a pivot table that lists out values

Tags:

python

pandas

pivot-table

What aggfunc do I need to use to produce a list using a pivot table? I tried using str which doesn't quite work.

Inputs

import pandas as pd
data = {
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)
print(df)

pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=len)
print(pivot)

pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=str)
print(pivot)

Outputs

   Experiment  Test point
0           1           0
1           2           1
2           3           2
3           4           0
4           5           1
            Experiment
Test point            
0                    2
1                    2
2                    1
                                                Experiment
Test point                                                
0           0    1\n3    4\nName: Experiment, dtype: int64
1           1    2\n4    5\nName: Experiment, dtype: int64
2                   2    3\nName: Experiment, dtype: int64

Desired output

            Experiment
Test point                                                
0           1, 4
1           2, 5
2           3

221

asked Oct 14 '17 10:10

bluprince13

3 Answers

You can use list as the aggregation function (wrapped in a lambda here):

>>> pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=lambda x:list(x))
           Experiment
Test point           
0              [1, 4]
1              [2, 5]
2                 [3]
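
For reference, here is a minimal self-contained sketch of the same idea that also goes one step further and turns each list into the comma-separated text from the question's desired output. The variable names as_lists and as_text are mine, not part of the original answer, and the join step is just one way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# Collect each group's values into a Python list.
as_lists = pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                          aggfunc=lambda x: list(x))

# Join each list into a comma-separated string.
as_text = as_lists['Experiment'].apply(
    lambda values: ', '.join(str(v) for v in values))

print(as_lists)
print(as_text)
```

If you only need the strings, joining directly inside aggfunc avoids the intermediate list.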

answered Oct 17 '22 11:10

Roman Pekar

Use a lambda that converts the values to strings and joins them:

In [1830]: pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                          aggfunc=lambda x: ', '.join(x.astype(str)))
Out[1830]:
           Experiment
Test point
0                1, 4
1                2, 5
2                   3

Alternatively, groupby works as well:

In [1831]: df.groupby('Test point').agg({
                'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})
Out[1831]:
           Experiment
Test point
0                1, 4
1                2, 5
2                   3

But if you want them as lists:

In [1861]: df.groupby('Test point').agg({'Experiment': lambda x: x.tolist()})
Out[1861]:
           Experiment
Test point
0              [1, 4]
1              [2, 5]
2                 [3]

x.astype(str).str.cat(sep=', ') is similar to ', '.join(x.astype(str))
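
A quick way to convince yourself of that, as a small sketch on a standalone Series (x here is a made-up stand-in for one group's values):

```python
import pandas as pd

# Stand-in for the 'Experiment' values of a single group.
x = pd.Series([1, 4], name='Experiment')

# Both expressions build one string from the stringified values.
joined_with_cat = x.astype(str).str.cat(sep=', ')
joined_with_join = ', '.join(x.astype(str))

print(joined_with_cat)
print(joined_with_join)
```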

answered Oct 17 '22 13:10

Zero

Option 1
str Pre-conversion + groupby + apply.

You could pre-convert to string to simplify the groupby call.

df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

A variation of this assigns in place for speed (assign returns a copy and is slower):

df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

The downside is that this also modifies the original dataframe.
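
To make the trade-off concrete, here is a small sketch comparing the two variants. The names converted, mutated and result are only for illustration, and I work on a copy for the in-place variant so both cases can be inspected side by side.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# assign returns a new dataframe; df itself keeps its integer column.
converted = df.assign(Experiment=df['Experiment'].astype(str))
result = (converted.groupby('Test point')['Experiment']
          .apply(', '.join)
          .to_frame('Experiment'))

# In-place assignment overwrites the column on the frame it is applied to.
mutated = df.copy()
mutated['Experiment'] = mutated['Experiment'].astype(str)

print(is_integer_dtype(df['Experiment']))       # original column is still integer
print(is_integer_dtype(mutated['Experiment']))  # overwritten column is no longer integer
print(result)
```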

Performance

# Zero's 1st solution
%%timeit
df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})

100 loops, best of 3: 3.72 ms per loop

# Zero's second solution
%%timeit
pd.pivot_table(df, index=['Test point'], values=['Experiment'], 
               aggfunc=lambda x: ', '.join(x.astype(str)))

100 loops, best of 3: 5.17 ms per loop

# proposed in this post
%%timeit -n 1
df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

1 loop, best of 3: 2.02 ms per loop

Note that the .assign method is only a few ms slower than this. Larger performance gains should be seen for larger dataframes.

Option 2
groupby + agg:

A similar operation follows with agg:

df.assign(Experiment=df.Experiment.astype(str))\
         .groupby('Test point').agg({'Experiment' : ', '.join})

           Experiment
Test point           
0                1, 4
1                2, 5
2                   3

And the in-place version of this would be the same as above.

# proposed in this post
%%timeit -n 1
df.Experiment = df.Experiment.astype(str)
df.groupby('Test point').agg({'Experiment' : ', '.join})

1 loop, best of 3: 2.21 ms per loop

agg should see speed gains over apply for larger dataframes.
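
If you want to check this on your own data, a rough sketch along these lines could be used. The dataframe size, the number of groups and the helper names with_apply and with_agg are arbitrary choices of mine, and the timings will vary by machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger dataframe: 10,000 rows spread over up to 100 groups.
rng = np.random.default_rng(0)
n_rows = 10_000
big = pd.DataFrame({
    'Test point': rng.integers(0, 100, size=n_rows),
    'Experiment': np.arange(n_rows),
})
# Pre-convert to string once, as in the options above.
big['Experiment'] = big['Experiment'].astype(str)


def with_apply():
    return (big.groupby('Test point')['Experiment']
            .apply(', '.join)
            .to_frame('Experiment'))


def with_agg():
    return big.groupby('Test point').agg({'Experiment': ', '.join})


# Total seconds for 5 runs of each variant.
print('apply:', timeit.timeit(with_apply, number=5))
print('agg:  ', timeit.timeit(with_agg, number=5))
```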

answered Oct 17 '22 12:10

cs95
