Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to throw away the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when I have duplicate column names.)
Sample input:
pd.DataFrame(np.random.rand(5, 3), columns = ['A', 'A', 'B'])
A A B
0 0.5 0.3 0.9
1 0.7 0.9 0.3
2 0.9 0.4 0.8
3 0.6 0.2 0.9
4 0.7 0.4 0.6
Expected output:
0 1 2
0 0.8 0.1 0.2
1 0.4 0.2 0.4
2 0.3 0.3 0.4
3 0.4 0.1 0.8
4 1.0 0.9 0.9
When we take a subset of a DataFrame, the smaller DataFrame may still carry the row index of the original. If the original index was numeric, the remaining labels are no longer contiguous. To reset the index to the default integer index starting at 0, use the reset_index() function.
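As a minimal sketch of the row-index case described above (the column name "x" and the filter threshold are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

# Filtering keeps the original row labels, so the index now starts at 2.
subset = df[df["x"] > 20]

# drop=True discards the old labels instead of inserting them as a column.
subset = subset.reset_index(drop=True)
print(subset)
```

Without drop=True, the old labels would be kept as an extra column named "index".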
To remove the duplicate columns, pass the list of duplicate column names returned by our user-defined function getDuplicateColumns() to the DataFrame.drop() method.
Code 1: Find duplicate columns in a DataFrame. To find duplicate columns, iterate through all columns of the DataFrame and, for each column, check whether another column with the same contents already exists. If so, that column's name is stored in the set of duplicate columns.
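The original getDuplicateColumns() helper is not shown here, so the following is only a sketch of how such a content-based check might look. The name get_duplicate_columns and the sample data are assumptions, not the source's code.

```python
import pandas as pd

def get_duplicate_columns(frame):
    """Return labels of columns whose contents repeat an earlier column.

    Hypothetical helper: columns are compared by position, so the check
    does not depend on the column labels being unique.
    """
    duplicates = set()
    n_cols = frame.shape[1]
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if frame.iloc[:, i].equals(frame.iloc[:, j]):
                duplicates.add(frame.columns[j])
    return duplicates

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 2, 3]})
dupes = get_duplicate_columns(df)
deduped = df.drop(columns=list(dupes))
print(deduped)
```

Note that drop() works by label, so if several columns share a name it removes all of them. This is about duplicate contents, which is a different problem from the duplicate names in the question.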
pandas.DataFrame.reset_index: Reset the index, or a level of it. Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels. The level parameter removes only the given levels from the index; by default all levels are removed.
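You can use the set_axis() method. In current pandas versions the labels are passed first and a new DataFrame is returned, so assign the result back:
In [54]: df
Out[54]:
A A B
0 0.934900 0.817182 0.166270
1 0.064543 0.139431 0.249576
2 0.709349 0.731913 0.965048
3 0.284955 0.479898 0.496652
4 0.520749 0.464256 0.999993
In [55]: df = df.set_axis(range(len(df.columns)), axis=1)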
In [56]: df
Out[56]:
0 1 2
0 0.934900 0.817182 0.166270
1 0.064543 0.139431 0.249576
2 0.709349 0.731913 0.965048
3 0.284955 0.479898 0.496652
4 0.520749 0.464256 0.999993
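As a standalone sketch of the same idea, assuming a pandas version where set_axis() takes the labels first and returns a new object (the variable name renumbered is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 3), columns=["A", "A", "B"])

# set_axis returns a relabelled DataFrame; the original is left untouched.
renumbered = df.set_axis(range(df.shape[1]), axis=1)

print(list(df.columns))
print(list(renumbered.columns))
```

This form is convenient in method chains, since it does not modify df.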
Use range with the number of columns taken from shape:
df.columns = range(df.shape[1])
print (df)
0 1 2
0 0.228080 0.884450 0.753401
1 0.176790 0.741979 0.525305
2 0.680255 0.730258 0.449681
3 0.169420 0.660825 0.986554
4 0.302204 0.040413 0.902899
Another solution is to transpose twice with T, calling reset_index with drop=True in between:
df = df.T.reset_index(drop=True).T
print (df)
0 1 2
0 0.024846 0.688193 0.887926
1 0.284681 0.895319 0.142876
2 0.440834 0.299527 0.762815
3 0.936967 0.928907 0.642960
4 0.801077 0.085773 0.866651
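One caveat with the double transpose: when the columns have different dtypes, transposing mixes them together. The sketch below uses a made-up mixed-type frame to compare this with assigning a range to the columns directly:

```python
import pandas as pd

df = pd.DataFrame([[1, "x", 2.5], [2, "y", 3.5]], columns=["A", "A", "B"])

# Double transpose: every column comes back as object dtype.
via_transpose = df.T.reset_index(drop=True).T

# Direct assignment: only the labels change, the dtypes are kept.
via_assignment = df.copy()
via_assignment.columns = range(via_assignment.shape[1])

print(via_transpose.dtypes)
print(via_assignment.dtypes)
```

For an all-float frame like the one in the question, both approaches give the same result.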