ValueError: Index contains duplicate entries, cannot reshape

I'm trying to reshape my pandas DataFrame with the following call:

 ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')

The dataset looks like:

     Merch Ref            Received  acceptance_rate
0           SF 2014-08-28 15:38:00                0
1           SF 2014-08-28 15:44:00                0
2           SF 2014-08-28 16:04:00                0
3           WF 2014-08-28 16:05:00                0
4           WF 2014-08-28 16:07:00                0
5           SF 2014-08-28 16:34:00                0
6           SF 2014-08-28 16:55:00                0
7           BF 2014-08-28 17:59:00                0
8           BF 2014-08-29 15:05:00                0
9           SF 2014-08-29 21:25:00                0
10          SF 2014-08-30 10:29:00                0
...

What I'd like to obtain is:

                      SF WF BF 
2014-08-28 15:38:00    0  1  0
2014-08-28 15:44:00    0  1  0
2014-08-28 16:04:00    0  0  1
2014-08-28 16:05:00    1  1  0
2014-08-28 16:07:00    0  0  1
2014-08-28 16:34:00    1  1  0
2014-08-28 16:55:00    1  1  0
2014-08-28 17:59:00    0  1  0
2014-08-29 15:05:00    0  0  1
2014-08-29 21:25:00    0  0  1 
2014-08-30 10:29:00    0  1  0

However, I get the error:

 ValueError: Index contains duplicate entries, cannot reshape

This is because I have some orders placed at the same time. Is there a way to sum/aggregate these orders?

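Try removing the duplicates first:

ar = ar.drop_duplicates(['Received','Merch Ref'])

That should let pivot work. Note that this keeps only the first row for each (Received, Merch Ref) pair and discards the rest, so it does not sum the orders.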
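
To make the trade-off of drop_duplicates concrete, here is a minimal sketch. The three-row frame is made-up sample data (not the asker's dataset), and the names `deduped` and `wide` are only for illustration. It shows that the pivot succeeds after de-duplication, but the second order at the same timestamp is silently dropped.

```python
import pandas as pd

# Hypothetical sample data: two SF orders share the same timestamp.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 16:05:00",
    ]),
    "acceptance_rate": [1, 0, 1],
})

# Keep only the first row for each (Received, Merch Ref) pair.
deduped = ar.drop_duplicates(subset=["Received", "Merch Ref"])

# With unique pairs, pivot no longer raises.
wide = deduped.pivot(index="Received", columns="Merch Ref",
                     values="acceptance_rate")
print(wide)
```

If the duplicated rows carry information you need (as in the question, where the orders should be summed), aggregating is probably the better fit than dropping.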

As you identified, the error comes from duplicate (x, y) pairs, where x is a value in Received and y is a value in Merch Ref.

If you would like to aggregate by sum, use pivot_table:

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc='sum')

Passing aggfunc=np.sum also works if NumPy is imported as np, but recent pandas versions may emit a FutureWarning and recommend the string 'sum' instead. The default aggregation function is mean. That is,

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')

will pivot the table and aggregate all entries that share the same (x, y) pair with the mean.
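
As a rough end-to-end illustration of the difference between pivot and pivot_table, the sketch below uses a small made-up frame (not the asker's data) in which one (Received, Merch Ref) pair appears twice. The `fill_value=0` argument is an optional extra, not part of the answer above; it replaces missing combinations with 0 instead of NaN.

```python
import pandas as pd

# Hypothetical sample data: SF has two orders at the same timestamp.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF", "BF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 17:59:00",
    ]),
    "acceptance_rate": [1, 0, 0, 1],
})

# pivot cannot decide which of the two SF values to keep, so it raises.
try:
    ar.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the duplicated pairs instead of raising.
summed = ar.pivot_table(index="Received", columns="Merch Ref",
                        values="acceptance_rate", aggfunc="sum",
                        fill_value=0)

# With no aggfunc, the mean is used.
averaged = ar.pivot_table(index="Received", columns="Merch Ref",
                          values="acceptance_rate")

print(summed)
print(averaged)
```

Whether sum or mean is appropriate depends on what acceptance_rate represents; for a rate, the mean may be the more natural choice, while for counts the sum usually is.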

Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turned out that some of the pairs were of the form (nan, nan) and had been omitted from the iteration. So if you are debugging pairs that you believe are unique, consider checking for NaN values with pd.isnull or pd.notnull.
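
One way to carry out the check described in the remark is sketched below. The sample frame and the names `keys`, `dupes`, and `missing` are hypothetical. As far as I understand, DataFrame.duplicated treats missing keys as equal to each other, so rows with (NaN, NaN) keys should show up as duplicates here even if a manual loop skipped them.

```python
import pandas as pd

# Hypothetical sample data: rows 2 and 3 have missing values in both key columns.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", None, None, "WF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        None,
        None,
        "2014-08-28 16:05:00",
    ]),
    "acceptance_rate": [0, 1, 0, 0, 1],
})

keys = ["Received", "Merch Ref"]

# All rows whose (Received, Merch Ref) pair occurs more than once.
dupes = ar[ar.duplicated(subset=keys, keep=False)]

# All rows with a missing value in at least one key column.
missing = ar[ar[keys].isna().any(axis=1)]

print(dupes)
print(missing)
```

Inspecting `missing` before pivoting makes it easier to decide whether those rows should be dropped or filled in.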

Tags: pandas, pivot, pivot-table

Contributors: Blue Moon, Farid, timctran
