
How can I pivot a dataframe?

Tags:

python

pandas

group-by

pandas-groupby

pivot

What is pivot?
How do I pivot?
Is this a pivot?
Long format to wide format?

I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting...

... But I'm going to give it a go.

The problem with existing questions and answers is that the question often focuses on a nuance that the OP has trouble generalizing, which keeps them from using a number of the existing good answers. However, none of the answers attempts a comprehensive explanation (because it's a daunting task).

Look at a few examples from my Google search:

How to pivot a dataframe in Pandas?

Good question and answer. But the answer only answers the specific question with little explanation.

pandas pivot table to data frame

In this question, the OP is concerned with the output of the pivot, namely how the columns look. The OP wanted it to look like R. This isn't very helpful for pandas users.

pandas pivoting a dataframe, duplicate rows

Another decent question, but the answer focuses on one method, namely pd.DataFrame.pivot.

So whenever someone searches for pivot they get sporadic results that are likely not going to answer their specific question.

Setup

You may notice that I conspicuously named my columns and relevant column values to correspond with how I'm going to pivot in the answers below.

```
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
n = 20

cols = np.array(['key', 'row', 'item', 'col'])
arr1 = (np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str)

# np.char.add concatenates the column prefixes with the random digits
df = pd.DataFrame(
    np.char.add(cols, arr1), columns=cols
).join(
    pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')
)
print(df)

     key   row   item   col  val0  val1
0   key0  row3  item1  col3  0.81  0.04
1   key1  row2  item1  col2  0.44  0.07
2   key1  row0  item1  col0  0.77  0.01
3   key0  row4  item0  col2  0.15  0.59
4   key1  row0  item2  col1  0.81  0.64
5   key1  row2  item2  col4  0.13  0.88
6   key2  row4  item1  col3  0.88  0.39
7   key1  row4  item1  col1  0.10  0.07
8   key1  row0  item2  col4  0.65  0.02
9   key1  row2  item0  col2  0.35  0.61
10  key2  row0  item2  col1  0.40  0.85
11  key2  row4  item1  col2  0.64  0.25
12  key0  row2  item2  col3  0.50  0.44
13  key0  row4  item1  col4  0.24  0.46
14  key1  row3  item2  col3  0.28  0.11
15  key0  row3  item1  col1  0.31  0.23
16  key0  row0  item2  col3  0.86  0.01
17  key0  row4  item0  col3  0.64  0.21
18  key2  row2  item2  col0  0.13  0.45
19  key0  row2  item0  col4  0.37  0.70
```

Question(s)

1. Why do I get ValueError: Index contains duplicate entries, cannot reshape?

2. How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

```
col   col0   col1   col2   col3  col4
row
row0  0.77  0.605    NaN  0.860  0.65
row2  0.13    NaN  0.395  0.500  0.25
row3   NaN  0.310    NaN  0.545   NaN
row4   NaN  0.100  0.395  0.760  0.24
```

3. How do I pivot df such that the col values are columns, row values are the index, mean of val0 are the values, and missing values are 0?

```
col   col0   col1   col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.100  0.395  0.760  0.24
```

4. Can I get something other than mean, like maybe sum?

```
col   col0  col1  col2  col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65
row2  0.13  0.00  0.79  0.50  0.50
row3  0.00  0.31  0.00  1.09  0.00
row4  0.00  0.10  0.79  1.52  0.24
```

5. Can I do more than one aggregation at a time?

```
      sum                           mean
col   col0  col1  col2  col3  col4  col0   col1   col2   col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.00  0.79  0.50  0.50  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.31  0.00  1.09  0.00  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.10  0.79  1.52  0.24  0.00  0.100  0.395  0.760  0.24
```

6. Can I aggregate over multiple value columns?

```
      val0                             val1
col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46
```

7. Can I subdivide by multiple columns?

```
item item0             item1                         item2
col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
row
row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00
```

8. Or:

```
item      item0             item1                         item2
col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
key  row
key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00
```

9. Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

```
col   col0  col1  col2  col3  col4
row
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
```

10. How do I convert a DataFrame from long to wide by pivoting on ONLY two columns? Given:

```
np.random.seed([3, 1415])
df2 = pd.DataFrame({'A': list('aaaabbbc'), 'B': np.random.choice(15, 8)})
df2

   A   B
0  a   0
1  a  11
2  a   2
3  a  11
4  b  10
5  b  10
6  b  14
7  c   7
```

The expected output should look something like this:

```
      a     b    c
0   0.0  10.0  7.0
1  11.0  10.0  NaN
2   2.0  14.0  NaN
3  11.0   NaN  NaN
```

11. How do I flatten the MultiIndex columns to a single-level index after pivot?

From

```
   1  2
   1  1  2
a  2  1  1
b  2  1  0
c  1  0  0
```

To

```
   1|1  2|1  2|2
a    2    1    1
b    2    1    0
c    1    0    0
```


asked Nov 07 '17 08:11

piRSquared

2 Answers

We start by answering the first question:

Question 1

Why do I get ValueError: Index contains duplicate entries, cannot reshape?

This occurs because pandas is attempting to reindex either a columns or index object with duplicate entries. There are various methods that can perform a pivot, and some of them are not well suited to cases where the keys being pivoted on contain duplicates. For example, consider pd.DataFrame.pivot. I know there are duplicate entries that share the row and col values:

```
df.duplicated(['row', 'col']).any()

True
```

So when I pivot using

df.pivot(index='row', columns='col', values='val0')

I get the error mentioned above. In fact, I get the same error when I try to perform the same task with:

df.set_index(['row', 'col'])['val0'].unstack()
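To see the failure in isolation, here is a minimal sketch on a hypothetical three-row frame (small, row, col and val are made-up names, not part of the setup above). The pair (r0, c0) appears twice, so both pivot and set_index + unstack should refuse to reshape, while pivot_table resolves the collision by averaging.

```python
import pandas as pd

# Hypothetical frame: the (row, col) pair ('r0', 'c0') appears twice.
small = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1'],
    'val': [1.0, 3.0, 5.0],
})

def reshape_error(func):
    """Return the ValueError message raised by func(), or None if it succeeds."""
    try:
        func()
    except ValueError as err:
        return str(err)
    return None

# Both reshaping idioms need unique (row, col) keys, so both fail here.
pivot_error = reshape_error(
    lambda: small.pivot(index='row', columns='col', values='val'))
unstack_error = reshape_error(
    lambda: small.set_index(['row', 'col'])['val'].unstack())

# pivot_table aggregates the duplicates instead: mean of 1.0 and 3.0 is 2.0.
resolved = small.pivot_table(index='row', columns='col', values='val', aggfunc='mean')

print(pivot_error)
print(resolved)
```

The missing combination (r1, c0) simply shows up as NaN in the pivot_table result.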

Here is a list of idioms we can use to pivot:

pd.DataFrame.groupby + pd.DataFrame.unstack
- Good general approach for doing just about any type of pivot
- You specify all columns that will constitute the pivoted row levels and column levels in one group by. You follow that by selecting the remaining columns you want to aggregate and the function(s) you want to perform the aggregation. Finally, you unstack the levels that you want to be in the column index.
pd.DataFrame.pivot_table
- A glorified version of groupby with a more intuitive API. For many people this is the preferred approach, and it is the approach the developers intend.
- Specify row level, column levels, values to be aggregated, and function(s) to perform aggregations.
pd.DataFrame.set_index + pd.DataFrame.unstack
- Convenient and intuitive for some (myself included). Cannot handle duplicate grouped keys.
- Similar to the groupby paradigm, we specify all columns that will eventually be either row or column levels and set those to be the index. We then unstack the levels we want in the columns. If either the remaining index levels or column levels are not unique, this method will fail.
pd.DataFrame.pivot
- Very similar to set_index in that it shares the duplicate key limitation. The API is very limited as well. Originally it only took scalar values for index, columns, and values (newer pandas versions also accept lists for index and columns).
- Similar to the pivot_table method in that we select rows, columns, and values on which to pivot. However, we cannot aggregate and if either rows or columns are not unique, this method will fail.
pd.crosstab
- This is a specialized version of pivot_table and, in its purest form, the most intuitive way to perform several tasks.
pd.factorize + np.bincount
- This is a highly advanced technique that is very obscure but is very fast. It cannot be used in all circumstances, but when it can be used and you are comfortable using it, you will reap the performance rewards.
pd.get_dummies + pd.DataFrame.dot
- I use this for cleverly performing cross tabulation.
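As a rough sketch of how the first few idioms relate, the following builds a hypothetical long frame (long_df is an illustrative name, not from the question) and computes the same mean pivot four ways. The plain pivot call only works after the duplicates have been aggregated away with groupby.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data; ('r0', 'c0') is duplicated on purpose.
long_df = pd.DataFrame({
    'row': ['r0', 'r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1', 'c1'],
    'val': [1.0, 3.0, 4.0, 6.0],
})

# 1. groupby + unstack: aggregate on all keys, then move 'col' into the columns.
via_groupby = long_df.groupby(['row', 'col'])['val'].mean().unstack()

# 2. pivot_table: the same operation behind a single call.
via_pivot_table = long_df.pivot_table(
    index='row', columns='col', values='val', aggfunc='mean')

# 3. crosstab: takes the key columns as Series rather than column names.
via_crosstab = pd.crosstab(
    long_df['row'], long_df['col'], values=long_df['val'], aggfunc='mean')

# 4. pivot: only safe once the keys are unique, so de-duplicate first.
via_pivot = (long_df.groupby(['row', 'col'], as_index=False)['val'].mean()
             .pivot(index='row', columns='col', values='val'))

for result in (via_groupby, via_pivot_table, via_crosstab, via_pivot):
    print(result, end='\n\n')
```

All four should print the same 2x2 table, with NaN where (r1, c0) never occurs.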

Examples

What I'm going to do for each subsequent answer and question is to answer it using pd.DataFrame.pivot_table. Then I'll provide alternatives to perform the same task.

Question 3

How do I pivot df such that the col values are columns, row values are the index, mean of val0 are the values, and missing values are 0?

pd.DataFrame.pivot_table

fill_value is not set by default. I tend to set it appropriately; in this case I set it to 0. Notice I skipped Question 2, as it's the same as this answer without the fill_value.

aggfunc='mean' is the default and I didn't have to set it. I included it to be explicit.

```
df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc='mean')

col   col0   col1   col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65
row2  0.13  0.000  0.395  0.500  0.25
row3  0.00  0.310  0.000  0.545  0.00
row4  0.00  0.100  0.395  0.760  0.24
```

pd.DataFrame.groupby

  df.groupby(['row', 'col'])['val0'].mean().unstack(fill_value=0)

pd.crosstab

```
pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc='mean').fillna(0)
```

Question 4

Can I get something other than mean, like maybe sum?

pd.DataFrame.pivot_table

```
df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc='sum')

col   col0  col1  col2  col3  col4
row
row0  0.77  1.21  0.00  0.86  0.65
row2  0.13  0.00  0.79  0.50  0.50
row3  0.00  0.31  0.00  1.09  0.00
row4  0.00  0.10  0.79  1.52  0.24
```

pd.DataFrame.groupby

  df.groupby(['row', 'col'])['val0'].sum().unstack(fill_value=0)

pd.crosstab

```
pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc='sum').fillna(0)
```

Question 5

Can I do more than one aggregation at a time?

Notice that for pivot_table and crosstab I needed to pass a list of callables. On the other hand, groupby.agg is able to take strings for a limited number of special functions. groupby.agg would also have taken the same callables we passed to the others, but it is often more efficient to use the string function names, because pandas can dispatch them to its optimized built-in implementations.

pd.DataFrame.pivot_table

```
df.pivot_table(
    values='val0', index='row', columns='col',
    fill_value=0, aggfunc=[np.size, np.mean])

     size                      mean
col  col0 col1 col2 col3 col4  col0   col1   col2   col3  col4
row
row0    1    2    0    1    1  0.77  0.605  0.000  0.860  0.65
row2    1    0    2    1    2  0.13  0.000  0.395  0.500  0.25
row3    0    1    0    2    0  0.00  0.310  0.000  0.545  0.00
row4    0    1    2    2    1  0.00  0.100  0.395  0.760  0.24
```

pd.DataFrame.groupby

  df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

pd.crosstab

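```
pd.crosstab(
    index=df['row'], columns=df['col'],
    values=df['val0'], aggfunc=[np.size, np.mean]).fillna(0, downcast='infer')
```

The downcast argument of fillna is deprecated as of pandas 2.1; on newer versions, drop it and convert the size columns back to integers separately if you need them as ints.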
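A small sketch, on a hypothetical three-row frame (long_df is an illustrative name), of what the groupby version produces: agg(['size', 'mean']) yields one column per statistic, and unstack turns that into a (statistic, col) column MultiIndex whose top level can be used to pull out one block at a time.

```python
import pandas as pd

# Hypothetical data: two observations for (r0, c0), one for (r1, c1).
long_df = pd.DataFrame({
    'row': ['r0', 'r0', 'r1'],
    'col': ['c0', 'c0', 'c1'],
    'val': [1.0, 3.0, 5.0],
})

# String names are dispatched to pandas' built-in groupby aggregations.
stats = long_df.groupby(['row', 'col'])['val'].agg(['size', 'mean'])

# Moving 'col' into the columns gives a (statistic, col) MultiIndex.
wide = stats.unstack(fill_value=0)

# The top column level selects one statistic at a time.
sizes = wide['size']
means = wide['mean']
print(wide)
```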

Question 6

Can I aggregate over multiple value columns?

With pd.DataFrame.pivot_table, we pass values=['val0', 'val1'], but we could have left that off completely.

```
df.pivot_table(
    values=['val0', 'val1'], index='row', columns='col',
    fill_value=0, aggfunc='mean')

      val0                             val1
col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4
row
row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02
row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79
row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00
row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46
```

pd.DataFrame.groupby

  df.groupby(['row', 'col'])[['val0', 'val1']].mean().unstack(fill_value=0)

Question 7

Can I subdivide by multiple columns?

pd.DataFrame.pivot_table

```
df.pivot_table(
    values='val0', index='row', columns=['item', 'col'],
    fill_value=0, aggfunc='mean')

item item0             item1                         item2
col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4
row
row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65
row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13
row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00
row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00
```

pd.DataFrame.groupby

```
df.groupby(
    ['row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)
```

Question 8

Can I subdivide by multiple columns in both the index and the columns?

pd.DataFrame.pivot_table

```
df.pivot_table(
    values='val0', index=['key', 'row'], columns=['item', 'col'],
    fill_value=0, aggfunc='mean')

item      item0             item1                         item2
col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4
key  row
key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00
     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00
     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00
     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00
key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65
     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13
     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00
     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00
key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00
     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00
     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00
```

pd.DataFrame.groupby

```
df.groupby(
    ['key', 'row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)
```

pd.DataFrame.set_index works here because each (key, row, item, col) combination is unique, so the keys are unique for both rows and columns:

```
df.set_index(
    ['key', 'row', 'item', 'col']
)['val0'].unstack(['item', 'col']).fillna(0).sort_index(axis=1)
```
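Since the set_index route only works when the keys are unique, one way to choose between the idioms is to check with duplicated first. A minimal sketch; pivot_mean, unique and dupes are hypothetical names, not pandas functions or part of the setup above:

```python
import pandas as pd

def pivot_mean(frame, index, columns, values):
    """Hypothetical helper: reshape with set_index + unstack when the keys
    are unique, otherwise fall back to pivot_table with a mean aggregation."""
    index, columns = list(index), list(columns)
    keys = index + columns
    if frame.duplicated(keys).any():
        # Duplicated keys: aggregate them instead of failing.
        return frame.pivot_table(index=index, columns=columns,
                                 values=values, aggfunc='mean')
    # Unique keys: a plain reshape, no aggregation needed.
    return frame.set_index(keys)[values].unstack(columns)

unique = pd.DataFrame({'row': ['r0', 'r1'], 'col': ['c0', 'c1'], 'val': [1.0, 2.0]})
dupes = pd.DataFrame({'row': ['r0', 'r0'], 'col': ['c0', 'c0'], 'val': [1.0, 3.0]})
print(pivot_mean(unique, ['row'], ['col'], 'val'))
print(pivot_mean(dupes, ['row'], ['col'], 'val'))
```

The duplicated check costs an extra pass over the key columns, which is usually negligible next to the reshape itself.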

Question 9

Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

pd.DataFrame.pivot_table

```
df.pivot_table(index='row', columns='col', fill_value=0, aggfunc='size')

col   col0  col1  col2  col3  col4
row
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
```

pd.DataFrame.groupby

  df.groupby(['row', 'col'])['val0'].size().unstack(fill_value=0)

pd.crosstab
```
  pd.crosstab(df['row'], df['col']) 
```

pd.factorize + np.bincount

```
# get integer factorization `i` and unique values `r`
# for column `'row'`
i, r = pd.factorize(df['row'].values)
# get integer factorization `j` and unique values `c`
# for column `'col'`
j, c = pd.factorize(df['col'].values)
# `n` will be the number of rows
# `m` will be the number of columns
n, m = r.size, c.size
# `i * m + j` maps each (row, col) pair to its own bin in a
# flat array of length `n * m`, which is why we subsequently
# reshape the counts to `(n, m)`
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
# BTW, whenever I read this, I think 'Bean, Rice, and Cheese'
pd.DataFrame(b, r, c)

      col3  col2  col0  col1  col4
row3     2     0     0     1     0
row2     1     2     1     0     2
row0     1     0     1     2     1
row4     2     2     0     1     1
```
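To check the factorize + bincount trick on something small, here is a sketch with two hypothetical Series (rows and cols are illustrative names). Because factorize numbers labels in order of first appearance while crosstab sorts them, the fast result has to be reordered before the two can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical observations: five (row, col) pairs.
rows = pd.Series(['r1', 'r0', 'r1', 'r1', 'r0'], name='row')
cols = pd.Series(['c0', 'c1', 'c0', 'c1', 'c1'], name='col')

# Integer codes plus unique labels, in order of first appearance.
i, r = pd.factorize(rows.to_numpy())
j, c = pd.factorize(cols.to_numpy())
n, m = r.size, c.size

# Flat bin index i * m + j, counted and reshaped into an n x m grid.
counts = np.bincount(i * m + j, minlength=n * m).reshape(n, m)
fast = pd.DataFrame(counts, index=r, columns=c)

# crosstab sorts its labels, so reorder the fast result before comparing.
reference = pd.crosstab(rows, cols)
aligned = fast.loc[reference.index, reference.columns]
print(fast)
print(reference)
```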

pd.get_dummies

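```
pd.get_dummies(df['row'], dtype=int).T.dot(pd.get_dummies(df['col'], dtype=int))

      col0  col1  col2  col3  col4
row0     1     2     0     1     1
row2     1     0     2     1     2
row3     0     1     0     2     0
row4     0     1     2     2     1
```

Passing dtype=int matters on pandas 2.0 and later, where get_dummies returns boolean columns by default; the matrix product of two boolean frames stays boolean, so a count of 2 would show up as True.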
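Why the dot product counts co-occurrences: each one-hot matrix has one row per observation, so entry (a, b) of R.T @ C sums R[k, a] * C[k, b] over observations k, which is 1 exactly when observation k has row a and col b. A small sketch on hypothetical Series (rows, cols, R and C are illustrative names):

```python
import pandas as pd

rows = pd.Series(['r1', 'r0', 'r1', 'r1', 'r0'], name='row')
cols = pd.Series(['c0', 'c1', 'c0', 'c1', 'c1'], name='col')

# One-hot encode each side; dtype=int keeps the matrix product an integer count.
R = pd.get_dummies(rows, dtype=int)  # shape (n_obs, n_row_labels)
C = pd.get_dummies(cols, dtype=int)  # shape (n_obs, n_col_labels)

# (R.T @ C)[a, b] = sum over observations of R[k, a] * C[k, b],
# i.e. the number of observations with row == a and col == b.
counts = R.T.dot(C)
print(counts)
```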

Question 10

How do I convert a DataFrame from long to wide by pivoting on ONLY two columns?

DataFrame.pivot

The first step is to assign a number to each row - this number will be the row index of that value in the pivoted result. This is done using GroupBy.cumcount:

```
df2.insert(0, 'count', df2.groupby('A').cumcount())
df2

   count  A   B
0      0  a   0
1      1  a  11
2      2  a   2
3      3  a  11
4      0  b  10
5      1  b  10
6      2  b  14
7      0  c   7
```

The second step is to use the newly created column as the index to call DataFrame.pivot.

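```
df2.pivot(index='count', columns='A', values='B')
# The shorthand df2.pivot(*df2) relied on positional arguments,
# which DataFrame.pivot no longer accepts as of pandas 2.0.

A         a     b    c
count
0       0.0  10.0  7.0
1      11.0  10.0  NaN
2       2.0  14.0  NaN
3      11.0   NaN  NaN
```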

DataFrame.pivot_table

Whereas DataFrame.pivot only accepts columns, DataFrame.pivot_table also accepts arrays, so the GroupBy.cumcount can be passed directly as the index without creating an explicit column.

```
df2.pivot_table(index=df2.groupby('A').cumcount(), columns='A', values='B')

A     a     b    c
0   0.0  10.0  7.0
1  11.0  10.0  NaN
2   2.0  14.0  NaN
3  11.0   NaN  NaN
```
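The same cumcount idea also works with set_index + unstack, since the (position, A) pairs are unique by construction. A sketch, with the B values copied from the printed df2 in the question rather than regenerated from the random seed (position and wide are illustrative names):

```python
import pandas as pd

# Values copied from the printed df2 (hard-coded instead of seeded).
df2 = pd.DataFrame({'A': list('aaaabbbc'),
                    'B': [0, 11, 2, 11, 10, 10, 14, 7]})

# Position of each row within its 'A' group: 0, 1, 2, ... per group.
position = df2.groupby('A').cumcount()

# (position, A) is unique, so unstack cannot hit duplicate entries.
wide = df2.set_index([position, 'A'])['B'].unstack('A')
print(wide)
```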

Question 11

How do I flatten the MultiIndex columns to a single-level index after pivot?

If the column levels are all strings (object dtype), use a string join:

df.columns = df.columns.map('|'.join)

Otherwise, use a format string:

df.columns = df.columns.map('{0[0]}|{0[1]}'.format)
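A self-contained sketch using the small frame from Question 11, whose column levels are integers (wide, flat_format and flat_join are illustrative names). '|'.join would fail on these tuples because join only accepts strings, so the format version is used, alongside an equivalent that converts each level to str first.

```python
import pandas as pd

# MultiIndex columns with integer levels, like the Question 11 example.
wide = pd.DataFrame(
    [[2, 1, 1], [2, 1, 0], [1, 0, 0]],
    index=['a', 'b', 'c'],
    columns=pd.MultiIndex.from_tuples([(1, 1), (2, 1), (2, 2)]),
)

# Each column label is a tuple such as (2, 1); format it as '2|1'.
flat_format = wide.columns.map('{0[0]}|{0[1]}'.format)

# Equivalent alternative: stringify every level, then join.
flat_join = wide.columns.map(lambda tup: '|'.join(map(str, tup)))

wide.columns = flat_format
print(wide)
```

The join-based variant has the small advantage of working for any number of levels.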

answered Oct 12 '22 00:10

piRSquared

To extend @piRSquared's answer, here is another version of Question 10.

Question 10.1

DataFrame:

```
d = {'A': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 5},
     'B': {0: 'a', 1: 'b', 2: 'c', 3: 'a', 4: 'b', 5: 'a', 6: 'c'}}
df = pd.DataFrame(d)

   A  B
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  3  a
6  5  c
```

Output:

```
   0     1     2
A
1  a     b     c
2  a     b  None
3  a  None  None
5  c  None  None
```

Using df.groupby and pd.Series.tolist

```
t = df.groupby('A')['B'].apply(list)
out = pd.DataFrame(t.tolist(), index=t.index)
out

   0     1     2
A
1  a     b     c
2  a     b  None
3  a  None  None
5  c  None  None
```

Or, a much better alternative, using pd.pivot_table with DataFrame.squeeze:

```
t = df.pivot_table(index='A', values='B', aggfunc=list).squeeze()
out = pd.DataFrame(t.tolist(), index=t.index)
```
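The cumcount + pivot pattern from Question 10 gives the same layout here too, as a sketch (pos is an illustrative column name). One difference: the gaps come out as NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 5],
                   'B': ['a', 'b', 'c', 'a', 'b', 'a', 'c']})

# Number each B within its A group, then use that number as the column key.
out = (df.assign(pos=df.groupby('A').cumcount())
         .pivot(index='A', columns='pos', values='B'))
print(out)
```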

answered Oct 11 '22 23:10

Ch3steR
