
ValueError: Index contains duplicate entries, cannot reshape

I'm trying to reshape my pandas DataFrame with the following call:

 ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')

The dataset looks like:

     Merch Ref            Received  acceptance_rate
0           SF 2014-08-28 15:38:00                0
1           SF 2014-08-28 15:44:00                0
2           SF 2014-08-28 16:04:00                0
3           WF 2014-08-28 16:05:00                0
4           WF 2014-08-28 16:07:00                0
5           SF 2014-08-28 16:34:00                0
6           SF 2014-08-28 16:55:00                0
7           BF 2014-08-28 17:59:00                0
8           BF 2014-08-29 15:05:00                0
9           SF 2014-08-29 21:25:00                0
10          SF 2014-08-30 10:29:00                0
...

What I'd like to obtain is:

                      SF WF BF 
2014-08-28 15:38:00    0  1  0
2014-08-28 15:44:00    0  1  0
2014-08-28 16:04:00    0  0  1
2014-08-28 16:05:00    1  1  0
2014-08-28 16:07:00    0  0  1
2014-08-28 16:34:00    1  1  0
2014-08-28 16:55:00    1  1  0
2014-08-28 17:59:00    0  1  0
2014-08-29 15:05:00    0  0  1
2014-08-29 21:25:00    0  0  1 
2014-08-30 10:29:00    0  1  0

However, I get the error:

 ValueError: Index contains duplicate entries, cannot reshape

This is because I have some orders placed at the same time. Is there a way to sum/aggregate these orders?

Blue Moon Avatar asked Aug 03 '15 10:08




2 Answers

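Try removing the duplicates first:

ar = ar.drop_duplicates(['Received', 'Merch Ref'])

The pivot should then work. Note that this keeps only the first row for each (Received, Merch Ref) pair and discards the others; it does not sum them.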
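As a minimal sketch of this approach, the snippet below builds a small made-up frame (the values are hypothetical, not the asker's data) in which two SF orders share a timestamp, drops the duplicate key pair, and then pivots:

```python
import pandas as pd

# Hypothetical sample data: the first two rows share (Received, Merch Ref).
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(
        ["2014-08-28 15:38", "2014-08-28 15:38", "2014-08-28 16:05"]
    ),
    "acceptance_rate": [1, 0, 1],
})

# Keep only the first row for each (Received, Merch Ref) pair.
deduped = ar.drop_duplicates(["Received", "Merch Ref"])

# With unique key pairs, pivot no longer raises ValueError.
wide = deduped.pivot(index="Received", columns="Merch Ref",
                     values="acceptance_rate")
print(wide)
```

Here the second SF row is thrown away, so this only suits cases where the repeated rows are true duplicates.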

Farid Avatar answered Oct 19 '22 04:10



As you identified, the error occurs from duplicates in pairs (x, y) for x in Received and y in Merch Ref.

If you would like to aggregate by sum, use pivot_table with an aggregation function:

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate', aggfunc='sum')

(Passing aggfunc=np.sum also works if NumPy is imported as np.) The default aggregation function is the mean. That is,

ar.pivot_table(index='Received', columns='Merch Ref',
               values='acceptance_rate')

will pivot the table, and all entries with the same (x, y) pair will be averaged.
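The difference between pivot and pivot_table can be sketched on a small made-up frame (the values are hypothetical and chosen only to make the sum and mean easy to tell apart):

```python
import pandas as pd

# Hypothetical sample data: two SF rows share the same timestamp.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime(
        ["2014-08-28 15:38", "2014-08-28 15:38", "2014-08-28 16:05"]
    ),
    "acceptance_rate": [1, 3, 5],
})

# pivot refuses to reshape when a key pair appears more than once.
error_message = None
try:
    ar.pivot(index="Received", columns="Merch Ref", values="acceptance_rate")
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# pivot_table aggregates the repeated pair instead; fill_value replaces
# the gaps left by combinations that never occur.
summed = ar.pivot_table(index="Received", columns="Merch Ref",
                        values="acceptance_rate", aggfunc="sum",
                        fill_value=0)

# Without aggfunc, the repeated pair is averaged.
averaged = ar.pivot_table(index="Received", columns="Merch Ref",
                          values="acceptance_rate")
print(summed)
print(averaged)
```

The fill_value=0 argument is an optional extra here; leaving it out yields NaN for timestamps where a merchant has no order.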

Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turned out that some of the pairs were of the form (nan, nan) and were omitted from the iteration. So for other users trying to debug what they believe are unique pairs, consider checking for NaNs with pd.isnull or pd.notnull.
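One way to run that check, sketched on made-up data (the frame and the variable names keys, dupes, and missing are illustrative assumptions), is to list repeated key pairs and rows with missing keys separately:

```python
import pandas as pd

# Hypothetical sample data: one repeated key pair and two rows whose
# keys are both missing.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", None, None],
    "Received": pd.to_datetime(
        ["2014-08-28 15:38", "2014-08-28 15:38", None, None]
    ),
    "acceptance_rate": [1, 0, 0, 1],
})

keys = ["Received", "Merch Ref"]

# Every row whose key pair occurs more than once (keep=False marks all
# members of a repeated group, not just the later ones).
dupes = ar[ar.duplicated(subset=keys, keep=False)]

# Rows where at least one of the key columns is missing.
missing = ar[ar[keys].isnull().any(axis=1)]

print(dupes)
print(missing)
```

Rows that show up in missing can then be dropped or filled before pivoting, depending on what they represent.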

timctran Avatar answered Oct 19 '22 04:10
