Reindex a dataframe with duplicate index values

Tags: python, pandas, reindex

I imported and merged 4 CSV files into one dataframe called data. However, upon inspecting the dataframe's index with:

index_series = pd.Series(data.index.values)
index_series.value_counts()

I see that multiple index values each appear 4 times. I want to completely reindex the data dataframe so that each row has a unique index value. I tried:

data.reindex(np.arange(len(data)))

which gave the error "ValueError: cannot reindex from a duplicate axis." A Google search leads me to think this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows either, as I can always sort them.

UPDATE: So in the end I did find a way to reindex like I wanted.

data['index'] = np.arange(len(data))
data = data.set_index('index')

As I understand it, I just added a new column called 'index' to my dataframe and then set that column as my index. As for my CSV files, they were the four files under "download loan data" on this page of Lending Club loan stats.
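
A minimal sketch of how duplicate index values like these can arise, assuming the files were combined with pd.concat (the question does not say how they were merged). The four small frames are hypothetical stand-ins for the CSV files, not the Lending Club data. Under that assumption, passing ignore_index=True at concatenation time gives a unique index from the start:

```python
import pandas as pd

# Hypothetical stand-ins for the four CSV files; each has a default 0..n-1 index.
frames = [pd.DataFrame({"x": [i, i + 10]}) for i in range(4)]

# Plain concatenation keeps each frame's own labels, so 0 and 1 each appear 4 times.
stacked = pd.concat(frames)
print(stacked.index.is_unique)  # False

# ignore_index=True discards the old labels and assigns a fresh 0..n-1 index.
data = pd.concat(frames, ignore_index=True)
print(data.index.is_unique)  # True
```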

asked Jun 22 '15 18:06

Justin H

1 Answer

It's pretty easy to replicate your error with this sample data:

In [92]: data = pd.DataFrame( [33,55,88,22], columns=['x'], index=[0,0,1,2] )

In [93]: data.index.is_unique
Out[93]: False

In [94]: data.reindex(np.arange(len(data)))  # same error message

The problem is that reindex requires the existing index to be unique: it aligns rows by label, and a duplicated label is ambiguous. In this case, you don't need to preserve the old index values; you just want new index values that are unique. The easiest way to do that is:

In [95]: data.reset_index(drop=True)
Out[95]: 
    x
0  33
1  55
2  88
3  22

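Note that you can leave off drop=True if you want to retain the old index values; they are then kept as a regular column. Also note that reset_index returns a new dataframe, so assign the result back (data = data.reset_index(drop=True)) to keep the change.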
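
As a small illustration of the difference, here is a sketch using the same sample data as above. The variable names kept and dropped are just for this example:

```python
import pandas as pd

data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# Without drop=True, the old index is moved into a column named 'index'.
kept = data.reset_index()
print(kept.columns.tolist())

# With drop=True, the old index values are discarded entirely.
dropped = data.reset_index(drop=True)
print(dropped.columns.tolist())

# Either way, the result has a fresh, unique 0..n-1 index.
print(kept.index.is_unique, dropped.index.is_unique)
```

The original data is left unchanged in both cases, since reset_index returns a new dataframe.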

answered Nov 08 '22 18:11

JohnE
