pandas DataFrame reset_index which can handle duplicate column names?

Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to discard the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when I have duplicate column names.)

Sample input:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'A', 'B'])

     A   A   B
0   0.5 0.3 0.9
1   0.7 0.9 0.3
2   0.9 0.4 0.8
3   0.6 0.2 0.9
4   0.7 0.4 0.6

Expected output:

     0   1   2
0   0.8 0.1 0.2
1   0.4 0.2 0.4
2   0.3 0.3 0.4
3   0.4 0.1 0.8
4   1.0 0.9 0.9
FLab asked Jul 21 '16 10:07


2 Answers

You can use the set_axis() method:

In [54]: df
Out[54]:
          A         A         B
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993

In [55]: df = df.set_axis(range(len(df.columns)), axis=1)  # labels first; recent pandas returns a new DataFrame

In [56]: df
Out[56]:
          0         1         2
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993
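
For reference, here is a minimal self-contained sketch of the same idea, assuming a recent pandas release in which set_axis takes the labels first and returns a relabelled copy rather than modifying the frame in place. The variable name renumbered is only illustrative.

```python
import numpy as np
import pandas as pd

# Frame with a duplicated column name, as in the question.
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'A', 'B'])

# Assumption: recent pandas, where set_axis returns a new DataFrame.
# The original frame keeps its duplicate names.
renumbered = df.set_axis(range(len(df.columns)), axis=1)

print(list(df.columns))
print(list(renumbered.columns))
```

Because set_axis replaces the labels by position, it never has to look a column up by name, which is why duplicate names are not a problem here.
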
MaxU - stop WAR against UA answered Oct 14 '22 05:10


Assign a range to df.columns, using df.shape[1] for the number of columns:

df.columns = range(df.shape[1])
print(df)
          0         1         2
0  0.228080  0.884450  0.753401
1  0.176790  0.741979  0.525305
2  0.680255  0.730258  0.449681
3  0.169420  0.660825  0.986554
4  0.302204  0.040413  0.902899
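
If you would rather not modify the original frame, the assignment can be wrapped in a small helper. This is only a sketch: reset_columns is a hypothetical name, not a pandas function, and it copies the data before relabelling.

```python
import pandas as pd


def reset_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame whose columns are numbered 0..n-1.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    # Positional relabelling, so duplicate names in the input do not matter.
    out.columns = pd.RangeIndex(out.shape[1])
    return out


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'A', 'B'])
result = reset_columns(df)
print(result)
```

The caller's df still has the columns A, A, B afterwards; only the returned copy is renumbered.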

Another solution is to transpose with T, call reset_index with drop=True, and transpose back:

df = df.T.reset_index(drop=True).T
print(df)
          0         1         2
0  0.024846  0.688193  0.887926
1  0.284681  0.895319  0.142876
2  0.440834  0.299527  0.762815
3  0.936967  0.928907  0.642960
4  0.801077  0.085773  0.866651
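
One caveat with the transpose approach: when the columns have different dtypes, transposing typically upcasts everything to object, and the second transpose does not restore the original dtypes. The sketch below compares the two approaches on a small made-up frame; the names mixed, via_transpose and via_assign are illustrative only.

```python
import pandas as pd

# Made-up frame with one integer column and one string column.
mixed = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
mixed.columns = ['A', 'A']  # force duplicate column names

# Double transpose: the values pass through a mixed-dtype intermediate frame.
via_transpose = mixed.T.reset_index(drop=True).T

# Direct assignment: only the labels change, the data is untouched.
via_assign = mixed.copy()
via_assign.columns = range(mixed.shape[1])

print(via_transpose.dtypes)
print(via_assign.dtypes)
```

For an all-float frame like the one in the question, both approaches give the same result, so the difference only matters for frames with mixed column types.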
jezrael answered Oct 14 '22 04:10