Is there a reset_index for columns or a way to move column headers to an inner index leaving their index positions as the outer index?

Tags:

pandas

Sample DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))

Is there a way to reset the index for columns, or to easily insert a row containing the column index positions? I'd prefer the index positions to be the outermost index, with the column headers left as the innermost index.


asked Apr 27 '17 18:04

Yale Newman

2 Answers

a.1) Drop column names

df.columns = pd.RangeIndex(df.columns.size)
df

Output:

    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0

a.2) Drop column names (one-liner)
Could have performance issues and side effects; see the discussion below.

df.T.reset_index(drop=True).T

Output:

    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0
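If you want a one-liner that returns a new frame but does not transpose, DataFrame.set_axis (not part of the original answer) can relabel the columns directly. A minimal sketch, assuming pandas 1.0 or later, where set_axis returns a relabelled copy by default; because nothing is transposed, each column keeps its dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list('ABCD'))

# set_axis returns a relabelled copy; axis=1 targets the column labels
renumbered = df.set_axis(pd.RangeIndex(df.shape[1]), axis=1)

print(renumbered.columns.tolist())  # [0, 1, 2, 3]
print(df.columns.tolist())          # ['A', 'B', 'C', 'D'] (df itself is unchanged)
```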

b.1) Move column names into a row (one-liner)
Same issues; see the discussion below.

df.T.reset_index().T

Output:

        0   1   2   3
#-------------------#
index   A   B   C   D
   0    0   1   3   3
   1    2   2   0   2
   2    2   1   3   1
   3    2   1   0   0
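A small sketch (not from the original answer) that makes the side effect of b.1 visible: the old header lands in a row labelled 'index', and because that row holds strings, every column of the result ends up as object dtype even though the input was all integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list('ABCD'))

moved = df.T.reset_index().T

# the old header now sits in a row labelled 'index'
print(moved.loc['index'].tolist())  # ['A', 'B', 'C', 'D']
# that row holds strings, so every column is now object dtype
print(moved.dtypes.tolist())
```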

b.2) Move column names into a row
Efficient way (no transpose).

#heterogeneous DataFrame creation
df = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=list('789')).join(
     pd.DataFrame(list('bcde'),columns=['A']))
df.index.name = '4'

#save column names as a row, then reindex column names
#(DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
header = pd.DataFrame([list(df.columns)], columns=df.columns,
                      index=pd.Index([df.index.name], name=df.index.name))
df = pd.concat([df, header])
df.columns = pd.RangeIndex(df.columns.size)
print(df)
df.info()

Output (NB: extra effort is needed to prevent all data from being upcast to object):

   0  1  2  3
#-----------#
4            
0  2  3  2  b
1  1  0  2  c
2  3  1  3  d
3  3  3  2  e
4  7  8  9  A

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 0 to 4
Data columns (total 4 columns):
0    5 non-null object
1    5 non-null object
2    5 non-null object
3    5 non-null object
dtypes: object(4)
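One possible form of that "extra effort", sketched here as an assumption rather than the answer's own method: keep the header row separately and let infer_objects (a standard DataFrame method) re-infer the dtypes of the remaining data rows. The frame construction mirrors b.2 but uses a default integer index for brevity:

```python
import numpy as np
import pandas as pd

# same heterogeneous frame as in b.2, with the header appended as the last row
df = pd.DataFrame(np.random.randint(0, 4, size=(4, 3)), columns=list('789')).join(
    pd.DataFrame(list('bcde'), columns=['A']))
header = pd.DataFrame([list(df.columns)], columns=df.columns)
stacked = pd.concat([df, header], ignore_index=True)
stacked.columns = pd.RangeIndex(stacked.columns.size)

# pull the header row back out and let pandas re-infer the data columns
saved_header = stacked.iloc[-1].tolist()  # ['7', '8', '9', 'A']
data = stacked.iloc[:-1].infer_objects()
print(data.dtypes.tolist())
```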

c) Add secondary column index (one-liner)
Could have performance issues and side effects; see the discussion below.

df.T.set_index(pd.RangeIndex(df.columns.size),append=True).T

Output:

    A   B   C   D
    0   1   2   3
#---------------#
0   0   1   3   3
1   2   2   0   2
2   2   1   3   1
3   2   1   0   0
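The output of c) has the names on the outer level and the positions on the inner one. The question asks for the opposite order, and DataFrame.swaplevel can flip the two column levels. A minimal sketch (the swaplevel step is an addition, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(4, 4)), columns=list('ABCD'))

# c) yields names on the outer level and positions on the inner level
two_level = df.T.set_index(pd.RangeIndex(df.columns.size), append=True).T

# swap the two column levels so positions are outer and names inner
positions_outer = two_level.swaplevel(axis=1)
print(positions_outer.columns.tolist())
```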

Criticism of the one-line approaches

Performance issues:
For huge datasets the cost of the double transpose (.T.T) could be unacceptable, but in simple cases a one-liner that returns a copy of the DataFrame may be useful. See the test results below (note that they time a single .T plus reset_index):

In [294]: for i in range (3,7):
     ...:     df = pd.DataFrame(np.random.randint(0,9,size=(10**i, 10**3)))
     ...:     print ('shape:',df.shape)
     ...:     %timeit df.T.reset_index(drop=True)
     ...: 
shape: (1000, 1000)
100 loops, best of 3: 3.2 ms per loop
shape: (10000, 1000)
10 loops, best of 3: 29.3 ms per loop
shape: (100000, 1000)
1 loop, best of 3: 546 ms per loop
shape: (1000000, 1000)
1 loop, best of 3: 9.9 s per loop

In [295]: %timeit df.columns = pd.RangeIndex(df.columns.size)
The slowest run took 28.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.74 µs per loop
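To repeat a similar comparison in a plain script instead of IPython, the standard-library timeit module works. This sketch times the full a.2 one-liner (both transposes) against the RangeIndex assignment; the 10**4 x 10**2 frame size is an arbitrary choice to keep it quick, and absolute numbers will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 9, size=(10**4, 10**2)))

# the one-liner transposes twice and builds a new frame on every call
double_t = timeit.timeit(lambda: df.T.reset_index(drop=True).T, number=10)

# relabelling only replaces the column index; work on a copy made outside the timer
target = df.copy()

def relabel():
    target.columns = pd.RangeIndex(target.columns.size)

in_place = timeit.timeit(relabel, number=10)

print(f"double transpose:      {double_t / 10:.6f} s per call")
print(f"RangeIndex assignment: {in_place / 10:.6f} s per call")
```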

Side effect (upcasting):
Heterogeneous DataFrames will be upcast (here every column becomes object):

In [352]: df = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=list('789')).join(
     ...:          pd.DataFrame(list('bcde'),columns=['A']))

In [353]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7    4 non-null int64
8    4 non-null int64
9    4 non-null int64
A    4 non-null object
dtypes: int64(3), object(1)
memory usage: 208.0+ bytes

.T.T upcasting

In [354]: df.T.T.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7    4 non-null object
8    4 non-null object
9    4 non-null object
A    4 non-null object
dtypes: object(4)
memory usage: 208.0+ bytes
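If a transpose round trip is unavoidable, one way to undo the upcasting (an assumption on my part, not something the answer does) is to record the dtypes beforehand and cast back with DataFrame.astype, which accepts a mapping of column name to dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 4, size=(4, 3)), columns=list('789')).join(
    pd.DataFrame(list('bcde'), columns=['A']))

flipped = df.T.T
print(flipped.dtypes.tolist())  # all object after the round trip

# cast each column back using the dtypes recorded before transposing
restored = flipped.astype(df.dtypes.to_dict())
print(restored.dtypes.tolist())
```

For the renumbered variant in a.2, the same idea would use a mapping keyed by position, e.g. dict(enumerate(df.dtypes)).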


answered Oct 21 '22 07:10

ilia timofeev

I think you can use numpy.arange or range:

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))

df.columns = np.arange(len(df.columns))
#alternatively
#df.columns = range(len(df.columns))
print (df)
   0  1  2  3
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0

But the original column names are lost.
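If you may need the names again later, a simple workaround (a sketch, not part of the original answer) is to save a position-to-name mapping before overwriting the header and restore it with rename:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))

# remember position -> name before overwriting the header
position_to_name = dict(enumerate(df.columns))  # {0: 'A', 1: 'B', 2: 'C', 3: 'D'}
df.columns = range(len(df.columns))

# later, restore the original names from the saved mapping
df = df.rename(columns=position_to_name)
print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
```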

If you need a MultiIndex without level names (starting again from the original df with columns A-D):

df.columns = [np.arange(len(df.columns)), df.columns]
print (df)
   0  1  2  3
   A  B  C  D
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0

And for level names, use MultiIndex.from_arrays (again starting from the original columns A-D):

names = ['a','b']
df.columns = pd.MultiIndex.from_arrays([np.arange(len(df.columns)), df.columns], names=names)
print (df)
a  0  1  2  3
b  A  B  C  D
0  9  4  0  1
1  9  0  1  8
2  9  0  8  6
3  4  3  0  4
4  6  8  1  8
5  4  1  3  6
6  5  3  9  6
7  9  1  9  4
8  2  6  7  8
9  8  9  2  0
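With positions on the outer level 'a' and names on the inner level 'b', columns can be selected either way. A short sketch (the selection calls are an illustration, not from the answer): plain indexing picks an outer position, and DataFrame.xs with level='b' picks by original header:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))
df.columns = pd.MultiIndex.from_arrays(
    [np.arange(len(df.columns)), df.columns], names=['a', 'b'])

# select by outer position (level 'a') ...
by_position = df[2]
# ... or by the original header on the inner level 'b'
by_name = df.xs('C', axis=1, level='b')
```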

answered Oct 21 '22 06:10

jezrael
