pandas DataFrame reset_index which can handle duplicate column names?

Tags: python, pandas, dataframe, duplicates, reindex

Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to throw away the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when I have duplicate column names.)

Sample input:

 pd.DataFrame(np.random.rand(5, 3), columns = ['A', 'A', 'B'])

     A   A   B
0   0.5 0.3 0.9
1   0.7 0.9 0.3
2   0.9 0.4 0.8
3   0.6 0.2 0.9
4   0.7 0.4 0.6

Expected output:

     0   1   2
0   0.8 0.1 0.2
1   0.4 0.2 0.4
2   0.3 0.3 0.4
3   0.4 0.1 0.8
4   1.0 0.9 0.9

asked Jul 21 '16 10:07

FLab

2 Answers

You can use the set_axis() method:

In [54]: df
Out[54]:
          A         A         B
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993

In [55]: df = df.set_axis(range(len(df.columns)), axis=1)  # labels first, then axis; current pandas returns a new DataFrame

In [56]: df
Out[56]:
          0         1         2
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993
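
For reference, here is a minimal self-contained sketch of the set_axis() approach, assuming a recent pandas where the labels are passed first and the method returns a new DataFrame instead of modifying df in place. The integer data and the name renumbered are only illustrative.

```python
import numpy as np
import pandas as pd

# Frame with a duplicated column label, as in the question.
df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["A", "A", "B"])

# Labels come first; axis=1 targets the columns. The result is a new frame.
renumbered = df.set_axis(range(df.shape[1]), axis=1)

print(list(renumbered.columns))
print(list(df.columns))  # the original labels are left untouched
```

Because the original frame is not changed, the result has to be assigned to a name if you want to keep it.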

answered Oct 14 '22 05:10

MaxU - stop WAR against UA

Assign a range whose length is the number of columns, taken from df.shape:

df.columns = range(df.shape[1])
print(df)
          0         1         2
0  0.228080  0.884450  0.753401
1  0.176790  0.741979  0.525305
2  0.680255  0.730258  0.449681
3  0.169420  0.660825  0.986554
4  0.302204  0.040413  0.902899
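
If you would rather not modify df in place, the same assignment can be wrapped in a small function. This is only a sketch: reset_columns is a hypothetical helper name, not a pandas API, and it works on a copy so the caller's frame keeps its labels.

```python
import pandas as pd


def reset_columns(frame):
    """Return a copy of frame with columns renumbered 0..n-1 (hypothetical helper)."""
    out = frame.copy()
    # Positional assignment, so duplicate labels are not a problem.
    out.columns = pd.RangeIndex(out.shape[1])
    return out


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])
result = reset_columns(df)

print(list(result.columns))
print(list(df.columns))  # unchanged
```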

Another solution is to transpose with T, call reset_index with drop=True, and transpose back:

df = df.T.reset_index(drop=True).T
print(df)
          0         1         2
0  0.024846  0.688193  0.887926
1  0.284681  0.895319  0.142876
2  0.440834  0.299527  0.762815
3  0.936967  0.928907  0.642960
4  0.801077  0.085773  0.866651
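
One caveat with the double transpose, sketched below with made-up data: when the columns have different dtypes, transposing goes through an object array, so the numeric column comes back as object. Assigning to df.columns directly leaves the dtypes alone. The frame mixed and the names via_transpose and via_assign are illustrative only.

```python
import pandas as pd

# One integer column and one string column, then duplicate labels.
mixed = pd.DataFrame({"n": [1, 2], "s": ["x", "y"]})
mixed.columns = ["A", "A"]

# Double transpose: column labels are reset, but dtypes are not preserved.
via_transpose = mixed.T.reset_index(drop=True).T

# Direct assignment: column labels are reset and dtypes are kept.
via_assign = mixed.copy()
via_assign.columns = range(via_assign.shape[1])

print(via_transpose.dtypes)
print(via_assign.dtypes)
```

For an all-float frame like the one in the question, both approaches should give the same result.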

answered Oct 14 '22 04:10

jezrael
