Convenient way to deal with ValueError: cannot reindex from a duplicate axis

I can find posts that explain the cause of this error message, but not how to address it.

I encounter this problem every time I try to add a new column to a pandas dataframe by concatenating string values in 2 existing columns.

For instance:

wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']

It works when the first item and the second item, joined with ' ', are each a separate dataframe/series.

The goal is to merge the date and time into a single column so that the pandas library recognizes them as datetime stamps.

I am not certain whether I am using the command incorrectly or whether pandas itself is limited here, as it keeps returning the duplicate axis error message. I understand the latter is highly unlikely hahaha ...

Is there a quick and easy solution for this?

I mean, I thought addition, subtraction and similar operations between column values in a dataframe would be quite easy. It shouldn't be too hard to have the result show up in the table either, right?

dia asked Aug 21 '18 17:08


1 Answer

Operations between series align values by index label, so they require non-duplicated indices; otherwise pandas doesn't know which rows to pair up in the calculation. Your data currently has duplicate index labels.
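A minimal sketch of how you might confirm that diagnosis before doing any arithmetic. The `temp` frame below is made-up sample data standing in for your real frame; `Index.is_unique` and `Index.duplicated` are standard pandas.

```python
import pandas as pd

# Hypothetical sample data: index label 1 appears twice.
temp = pd.DataFrame({'stamp': ['1', '2', '3']}, index=[0, 1, 1])

# False means at least one index label is repeated.
index_is_unique = temp.index.is_unique

# keep=False marks every occurrence of a repeated label,
# not only the second and later ones.
duplicated_rows = temp[temp.index.duplicated(keep=False)]

print(index_is_unique)
print(duplicated_rows)
```

Running the same check on both frames shows which one carries the repeated labels.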

If you are certain that your series are aligned by position, you can call reset_index on each dataframe:

import pandas as pd

wind = pd.DataFrame({'DATE (MM/DD/YYYY)': ['2018-01-01', '2018-02-01', '2018-03-01']})
# temp has a duplicated index label (1 appears twice)
temp = pd.DataFrame({'stamp': ['1', '2', '3']}, index=[0, 1, 1])

# ATTEMPT 1: FAIL
wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']
# ValueError: cannot reindex from a duplicate axis

# ATTEMPT 2: SUCCESS
wind = wind.reset_index(drop=True)
temp = temp.reset_index(drop=True)
wind['timestamp'] = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp']

print(wind)

  DATE (MM/DD/YYYY)     timestamp
0        2018-01-01  2018-01-01 1
1        2018-02-01  2018-02-01 2
2        2018-03-01  2018-03-01 3
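
If you would rather not touch the index of either frame, one alternative sketch is to strip the index from the right-hand side with `.to_numpy()`, which makes the concatenation purely positional, and then parse the result with `pd.to_datetime`. The sample values and the `'%m/%d/%Y %H:%M'` format are assumptions for illustration; adjust them to whatever your columns actually contain. Both frames must have the same number of rows.

```python
import pandas as pd

# Hypothetical sample data; the real date and time formats may differ.
wind = pd.DataFrame({'DATE (MM/DD/YYYY)': ['01/01/2018', '01/01/2018', '01/02/2018']})
temp = pd.DataFrame({'stamp': ['00:00', '12:00', '00:00']}, index=[0, 1, 1])

# .to_numpy() returns a plain array with no index, so values are
# combined strictly by position and no label alignment takes place.
combined = wind['DATE (MM/DD/YYYY)'] + ' ' + temp['stamp'].to_numpy()

# Assumed format string: month/day/year followed by hour:minute.
wind['timestamp'] = pd.to_datetime(combined, format='%m/%d/%Y %H:%M')

print(wind.dtypes)
```

This only makes sense when the rows really are in matching order, the same condition as for the reset_index approach.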

jpp answered Sep 30 '22 10:09