 

How to reindex a pandas DataFrame after concatenation

Tags: python, pandas

Suppose I concatenate two DataFrames like so:

import numpy as np
import pandas as pd

array1 = np.random.randn(3,3)
array2 = np.random.randn(3,3)

df1 = pd.DataFrame(array1, columns=list('ABC'))
df2 = pd.DataFrame(array2, columns=list('ABC'))

df = pd.concat([df1, df2])

The resulting DataFrame df looks like this:

          A         B         C
0  1.297362  0.745510 -0.206756
1 -0.056807 -1.875149 -0.210556
2  0.310837 -1.068873  2.054006
0  1.163739 -0.678165  2.626052
1 -0.557625 -1.448195 -1.391434
2  0.222607 -0.334348  0.672643

Note that the indices are the same as in the original DataFrames. I'd like to re-index df such that the indices simply run from 0 to 5. How can I do this?

(I've tried df = df.reindex(index=range(df.shape[0])), but this raises ValueError: cannot reindex from a duplicate axis. This is because the original axis contains duplicates: two 0s, two 1s, and so on.)
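The failure described above can be reproduced with a minimal sketch like the one below. It assumes the same setup as in the question; the variable error_message is only an illustrative name, and the exact wording of the exception text may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df = pd.concat([df1, df2])

# Each input kept its own 0..2 labels, so the combined index repeats.
print(df.index.tolist())   # [0, 1, 2, 0, 1, 2]
print(df.index.is_unique)  # False

# reindex has to look rows up by label, which is ambiguous with duplicates.
try:
    df.reindex(index=range(df.shape[0]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```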

Kurt Peek asked Jul 14 '16 14:07





1 Answer

You want to pass ignore_index=True to concat:

In [68]:
array1 = np.random.randn(3,3)
array2 = np.random.randn(3,3)

df1 = pd.DataFrame(array1, columns=list('ABC'))
df2 = pd.DataFrame(array2, columns=list('ABC'))

df = pd.concat([df1, df2], ignore_index=True)
df

Out[68]:
          A         B         C
0 -0.091094  0.460133 -0.548937
1 -0.839469 -1.354138 -0.823666
2  0.088581 -1.142542 -1.746608
3  0.067320  1.014533 -1.294371
4  2.094135  0.622129  1.203257
5  0.415768 -0.467081 -0.740371

This ignores the existing indices, so the concatenated DataFrame gets a new index that starts from 0 (here, 0 to 5).
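If the DataFrames have already been concatenated without ignore_index=True, the same result can be obtained afterwards. The sketch below is an addition rather than part of the original answer: it uses reset_index(drop=True), and the names combined, stacked and renumbered are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))
df2 = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'))

# Option 1: renumber the rows while concatenating.
combined = pd.concat([df1, df2], ignore_index=True)

# Option 2: concatenate first, then discard the old labels.
# drop=True prevents the old index from being kept as a new column.
stacked = pd.concat([df1, df2])
renumbered = stacked.reset_index(drop=True)

print(combined.index.tolist())      # [0, 1, 2, 3, 4, 5]
print(combined.equals(renumbered))  # True
```

Both options leave the data unchanged and only replace the row labels, so the choice mostly depends on whether the concatenation has already happened.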

EdChum answered Sep 25 '22 06:09
