My dataframe looks like this:
    SKU #       GRP  CATG      PRD
 0  54995   9404000  4040    99999
 1  54999   9404000  4040    99999
 2  55037   9404000  4040  1556894
 3  55148   9404000  4040  1556894
 4  55254   9404000  4040  1556894
 5  55291   9404000  4040  1556894
 6  55294   9404000  4040  1556895
 7  55445   9404000  4040  1556895
 8  55807   9404001  4040  1556896
 9  49021   9404002  4040  1556897
10  49035   9404002  4040  1556897
11  27538   9404000  4040  1556898
12  27539   9404000  4040  1556899
13  27540   9404000  4040  1556894
14  27542   9404000  4040  1556900
15  27543   9404000  4040  1556900
16  27544   9404003  4040  1556901
17  27546   9404004  4040  1556902
18  99111   9404005  4040  1556903
19  99112   9404006  4040  1556904
20  99113   9404007  4040  1556905
21  99116   9404008  4040  1556906
22  99119   9404009  4040  1556907
23  99122  94040010  4040  1556908
24  99125  94040011  4040  1556909
25  86007  94040012  4040  1556910
26  86010  94040013  4040  1556911
When I try to perform a groupby operation on the above dataframe, I get the "cannot reindex from a duplicate axis" error:
df.groupby(['GRP','CATG'],as_index=False)['PRD'].min()
I tried to find out the duplicate indices using:
df[df.index.duplicated()]
But it didn't return anything. How can I go about resolving this issue?
Index.duplicated() indicates duplicate index values: duplicated values are marked as True in the resulting boolean array. Its keep parameter selects which occurrences in a set of duplicates are marked: 'first' (the default) marks all except the first occurrence, 'last' marks all except the last occurrence, and False marks all duplicates.
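To make the keep options concrete, here is a minimal sketch on a made-up index. The labels are hypothetical and are not taken from the question's dataframe.

```python
import pandas as pd

# Hypothetical index in which the label "a" appears three times.
idx = pd.Index(["a", "b", "a", "c", "a"])

# keep="first" (default): every "a" after the first one is marked True.
mark_after_first = idx.duplicated(keep="first")

# keep="last": every "a" before the last one is marked True.
mark_before_last = idx.duplicated(keep="last")

# keep=False: every occurrence of a repeated label is marked True.
mark_all = idx.duplicated(keep=False)

print(mark_after_first)
print(mark_before_last)
print(mark_all)
```

If df[df.index.duplicated()] comes back empty, as in the question, the row index has no repeated labels, so the duplication is probably somewhere else.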
The pandas drop_duplicates() function takes a keep parameter. Allowed values are {'first', 'last', False}, default 'first'. If 'first', all duplicate rows except the first occurrence are dropped. If 'last', all duplicate rows except the last occurrence are dropped. If False, all duplicate rows are dropped.
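A small sketch of the three keep settings, assuming a toy frame (df_demo and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are exact duplicates of each other.
df_demo = pd.DataFrame({"GRP": [1, 1, 2, 2], "PRD": [10, 10, 20, 30]})

# keep="first" (default): row 1 is dropped, row 0 survives.
kept_first = df_demo.drop_duplicates(keep="first")

# keep="last": row 0 is dropped, row 1 survives.
kept_last = df_demo.drop_duplicates(keep="last")

# keep=False: both duplicated rows are dropped.
dropped_all = df_demo.drop_duplicates(keep=False)

print(kept_first)
print(kept_last)
print(dropped_all)
```

Note that the surviving rows keep their original index labels, so the index may have gaps afterwards.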
Use DataFrame.reset_index() to reset the index of the updated DataFrame. By default, it adds the current row index as a new column called 'index' and creates a new row index as a range of numbers starting at 0. Pass drop=True to discard the old index instead of keeping it as a column.
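As a rough illustration, assuming a toy frame whose row index repeats the label 5 (the index labels are invented; the PRD values are borrowed from the question):

```python
import pandas as pd

# Hypothetical frame with a repeated row label (5 appears twice).
df_demo = pd.DataFrame({"PRD": [99999, 1556894, 1556895]}, index=[5, 5, 7])

# Default: the old index becomes a column named "index".
with_old_index = df_demo.reset_index()

# drop=True: the old index is thrown away.
without_old_index = df_demo.reset_index(drop=True)

print(with_old_index)
print(without_old_index)
```

Either way the result has a fresh 0, 1, 2, ... row index, which removes any duplicate row labels.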
This error is often raised because of duplicated column names (not necessarily duplicated values).
First, check whether any column names are duplicated:
df.columns.duplicated().any()
If it returns True, remove the duplicated columns, keeping the first occurrence of each name, and assign the result back:
df = df.loc[:, ~df.columns.duplicated()]
After you remove the duplicated columns, you should be able to run your groupby operation.
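Putting the check, the fix, and the groupby together, here is a minimal sketch. It uses a few rows from the question and assumes, purely for illustration, that the PRD column name appears twice. The question's output does not show which column is actually duplicated, and whether the original error reproduces depends on the pandas version.

```python
import pandas as pd

# A few rows from the question. The second "PRD" column is an assumption
# added here to simulate a duplicated column name.
df = pd.DataFrame(
    [
        [54995, 9404000, 4040, 99999, 99999],
        [55037, 9404000, 4040, 1556894, 1556894],
        [55807, 9404001, 4040, 1556896, 1556896],
        [49021, 9404002, 4040, 1556897, 1556897],
    ],
    columns=["SKU #", "GRP", "CATG", "PRD", "PRD"],
)

# Step 1: detect duplicated column names.
has_dup_cols = df.columns.duplicated().any()
print(has_dup_cols)

# Step 2: keep only the first column for each name.
deduped = df.loc[:, ~df.columns.duplicated()]

# Step 3: run the groupby from the question on the cleaned frame.
result = deduped.groupby(["GRP", "CATG"], as_index=False)["PRD"].min()
print(result)
```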