I'm trying to reshape my pandas DataFrame with the following call:
ar = ar.pivot(index='Received', columns='Merch Ref', values='acceptance_rate')
The dataset looks like:
    Merch Ref             Received  acceptance_rate
0          SF  2014-08-28 15:38:00                0
1          SF  2014-08-28 15:44:00                0
2          SF  2014-08-28 16:04:00                0
3          WF  2014-08-28 16:05:00                0
4          WF  2014-08-28 16:07:00                0
5          SF  2014-08-28 16:34:00                0
6          SF  2014-08-28 16:55:00                0
7          BF  2014-08-28 17:59:00                0
8          BF  2014-08-29 15:05:00                0
9          SF  2014-08-29 21:25:00                0
10         SF  2014-08-30 10:29:00                0
...
What I'd like to obtain is:
                     SF  WF  BF
2014-08-28 15:38:00   0   1   0
2014-08-28 15:44:00   0   1   0
2014-08-28 16:04:00   0   0   1
2014-08-28 16:05:00   1   1   0
2014-08-28 16:07:00   0   0   1
2014-08-28 16:34:00   1   1   0
2014-08-28 16:55:00   1   1   0
2014-08-28 17:59:00   0   1   0
2014-08-29 15:05:00   0   0   1
2014-08-29 21:25:00   0   0   1
2014-08-30 10:29:00   0   1   0
However, I get the error:
ValueError: Index contains duplicate entries, cannot reshape
This is because I have some orders placed at the same time. Is there a way to sum/aggregate these orders?
To fix this error, use the pivot_table() function with an aggfunc argument, which aggregates the values that share the same index and column combination instead of raising an error. With a sum as the aggregation function, for example, each cell holds the sum of the values for that combination.
Use DataFrame.drop_duplicates() to drop duplicate rows and keep the first occurrence. Called without any arguments, DataFrame.drop_duplicates() drops rows that have the same values in all columns.
A pivot table is a similar operation that is commonly seen in spreadsheets and other programs that operate on tabular data. The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
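Try removing the duplicates first:
ar = ar.drop_duplicates(['Received','Merch Ref'])
The pivot should then work. Note that this keeps only the first row for each (Received, Merch Ref) pair and discards the rest.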
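A minimal sketch of the drop-then-pivot approach, using a small made-up DataFrame (the sample values are assumptions, not the asker's data). It shows that the default keep='first' behaviour silently discards the later row of a duplicated pair, which may or may not be what you want:

```python
import pandas as pd

# Hypothetical sample: rows 0 and 1 share the same (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", "WF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        "2014-08-28 16:05:00",
    ]),
    "acceptance_rate": [0, 1, 1],
})

# Keep only the first row for each (Received, Merch Ref) pair.
deduped = ar.drop_duplicates(["Received", "Merch Ref"])

# With unique pairs, pivot() no longer raises ValueError.
wide = deduped.pivot(index="Received", columns="Merch Ref",
                     values="acceptance_rate")
print(wide)
```

If the discarded rows carry information you need, aggregating with pivot_table is likely the safer choice.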
As you identified, the error occurs from duplicates in pairs (x, y) for x in Received and y in Merch Ref.
If you would like to aggregate by sum, then use
ar.pivot_table(index='Received', columns='Merch Ref', values='acceptance_rate', aggfunc=np.sum)
The default aggregation function is mean. That is,
ar.pivot_table(index='Received', columns='Merch Ref', values='acceptance_rate')
will pivot the table, and all entries with the same (x, y) pair will be aggregated with the np.mean function.
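To make the difference between the two calls concrete, here is a small sketch on made-up data (the sample values are assumptions). It passes the string 'sum' as aggfunc, which pandas also accepts and which avoids the numpy import:

```python
import pandas as pd

# Hypothetical sample: the last two rows share the same (Received, Merch Ref) pair.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "WF", "SF", "SF"],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 16:05:00",
        "2014-08-28 16:34:00",
        "2014-08-28 16:34:00",
    ]),
    "acceptance_rate": [0, 1, 1, 1],
})

# pivot() would raise ValueError on this data; pivot_table() aggregates instead.
summed = ar.pivot_table(index="Received", columns="Merch Ref",
                        values="acceptance_rate", aggfunc="sum")

# Without aggfunc, duplicated pairs are averaged (the default is mean).
averaged = ar.pivot_table(index="Received", columns="Merch Ref",
                          values="acceptance_rate")

print(summed)
print(averaged)
```

Combinations that never occur in the data come out as NaN; pivot_table also takes a fill_value argument (for example fill_value=0) if you would rather have a number there.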
Remark: I initially received the same error, but after iterating through the (x, y) pairs I didn't find any duplicates. It turns out some of the pairs were of the form (nan, nan) and were omitted from the iteration process. Thus, for other users trying to debug what they believe are unique pairs, consider checking for NaNs with pd.isnull or pd.notnull.
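One way to run both checks without iterating by hand is sketched below. The sample data and the names keys, dupes, and missing are illustrative assumptions; the calls themselves (DataFrame.duplicated and isnull) are standard pandas:

```python
import pandas as pd

# Hypothetical sample: rows 0 and 1 repeat a key pair, rows 2 and 3 have missing keys.
ar = pd.DataFrame({
    "Merch Ref": ["SF", "SF", None, None],
    "Received": pd.to_datetime([
        "2014-08-28 15:38:00",
        "2014-08-28 15:38:00",
        None,
        None,
    ]),
    "acceptance_rate": [0, 1, 0, 1],
})
keys = ["Received", "Merch Ref"]

# Rows whose key pair occurs more than once (keep=False flags every copy).
dupes = ar[ar.duplicated(subset=keys, keep=False)]

# Rows with a missing value in either key column.
missing = ar[ar[keys].isnull().any(axis=1)]

print(dupes)
print(missing)
```

If missing is non-empty, decide whether to drop those rows (for example with ar.dropna(subset=keys)) or fill them in before pivoting.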