Sample DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))
Is there a way to reset the index for columns, or to easily insert a row holding the column index positions? I'd prefer the index positions to be the outermost level, with the column headers left as the innermost level.
When you set inplace=True, the reset_index method does not create a new DataFrame. Instead, it modifies your original DataFrame directly and returns None.
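A minimal sketch of that difference, assuming a small made-up frame (the names `demo`, `returned`, and `result` are illustrative only):

```python
import pandas as pd

# Hypothetical frame with a non-default index, for illustration only.
demo = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Without inplace, reset_index returns a new DataFrame and leaves demo untouched.
returned = demo.reset_index()
index_before = list(demo.index)  # still the original labels

# With inplace=True, the method returns None and modifies demo itself.
result = demo.reset_index(inplace=True)
index_after = list(demo.index)   # now the default integer positions
```

Because the in-place call returns None, writing `demo = demo.reset_index(inplace=True)` would replace the frame with None.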
You can use the rename() method of pandas.DataFrame to change individual column or index names. Pass a dict of the form {original name: new name} to the columns or index parameter of rename(): columns renames column labels, and index renames row labels.
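As a small sketch of the dict-based form, assuming a made-up two-column frame (the labels here are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical frame, for illustration only.
frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r0", "r1"])

# Only the labels listed in each dict change; rename returns a new DataFrame
# and leaves the original untouched.
renamed = frame.rename(columns={"A": "alpha"}, index={"r1": "last"})
```

Labels that are not mentioned in the dicts ("B" and "r0" here) are kept as they are.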
The reset_index() method resets the index back to the default 0, 1, 2, ... labels. By default it keeps the old index values in a column named "index"; to avoid this, use the drop parameter.
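The effect of the drop parameter can be sketched like this, assuming a small made-up frame (names are illustrative):

```python
import pandas as pd

# Hypothetical frame with custom row labels, for illustration only.
frame = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])

# Default: the old labels are saved in a new column named "index".
kept = frame.reset_index()

# drop=True: the old labels are discarded instead of being stored.
dropped = frame.reset_index(drop=True)
```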
a.1) Drop column names
df.columns = pd.RangeIndex(df.columns.size)
df
Output:
0 1 2 3
#---------------#
0 0 1 3 3
1 2 2 0 2
2 2 1 3 1
3 2 1 0 0
a.2) Drop column names (one-liner)
Could have performance issues and side effects, see discussion below.
df.T.reset_index(drop=True).T
Output:
0 1 2 3
#---------------#
0 0 1 3 3
1 2 2 0 2
2 2 1 3 1
3 2 1 0 0
b.1) Move column names into a row (one-liner)
Same issues, see discussion below.
df.T.reset_index().T
Output:
0 1 2 3
#-------------------#
index A B C D
0 0 1 3 3
1 2 2 0 2
2 2 1 3 1
3 2 1 0 0
b.2) Move column names into a row
An efficient way.
# heterogeneous DataFrame creation
df = pd.DataFrame(np.random.randint(0, 4, size=(4, 3)), columns=list('789')).join(
    pd.DataFrame(list('bcde'), columns=['A']))
df.index.name = '4'
# save the column names as a new row, then replace the column names with positions
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
header_row = pd.DataFrame([df.columns.tolist()], columns=df.columns, index=[df.index.name])
df = pd.concat([df, header_row]).rename_axis(df.index.name)
df.columns = pd.RangeIndex(df.columns.size)
print(df)
df.info()
Output (NB: all columns are upcast to object; preventing that takes extra effort):
0 1 2 3
#-----------#
4
0 2 3 2 b
1 1 0 2 c
2 3 1 3 d
3 3 3 2 e
4 7 8 9 A
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 0 to 4
Data columns (total 4 columns):
0 5 non-null object
1 5 non-null object
2 5 non-null object
3 5 non-null object
dtypes: object(4)
c) Add secondary column index (one-liner)
Could have performance issues and side effects, see discussion below.
df.T.set_index(pd.RangeIndex(df.columns.size),append=True).T
Output:
A B C D
0 1 2 3
#---------------#
0 0 1 3 3
1 2 2 0 2
2 2 1 3 1
3 2 1 0 0
Performance issues:
For huge datasets the cost of the double transpose (.T twice) can be unacceptable, but in simple cases a one-liner that returns a copy of the DataFrame may be useful. See the test results:
In [294]: for i in range (3,7):
...: df = pd.DataFrame(np.random.randint(0,9,size=(10**i, 10**3)))
...: print ('shape:',df.shape)
...: %timeit df.T.reset_index(drop=True)
...:
shape: (1000, 1000)
100 loops, best of 3: 3.2 ms per loop
shape: (10000, 1000)
10 loops, best of 3: 29.3 ms per loop
shape: (100000, 1000)
1 loop, best of 3: 546 ms per loop
shape: (1000000, 1000)
1 loop, best of 3: 9.9 s per loop
In [295]: %timeit df.columns = pd.RangeIndex(df.columns.size)
The slowest run took 28.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.74 µs per loop
Side effect (upcasting):
Heterogeneous DataFrames will be upcast to object dtype.
In [352]: df = pd.DataFrame(np.random.randint(0,4,size=(4, 3)), columns=list('789')).join(
...: pd.DataFrame(list('bcde'),columns=['A']))
In [353]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7 4 non-null int64
8 4 non-null int64
9 4 non-null int64
A 4 non-null object
dtypes: int64(3), object(1)
memory usage: 208.0+ bytes
Upcasting after .T.T:
In [354]: df.T.T.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
7 4 non-null object
8 4 non-null object
9 4 non-null object
A 4 non-null object
dtypes: object(4)
memory usage: 208.0+ bytes
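One possible way to recover the numeric dtypes after such a round trip is `infer_objects()`. This is only a sketch and not part of the original answer; the names `mixed`, `round_trip`, and `restored` are made up, and the extra call adds another pass over the data:

```python
import numpy as np
import pandas as pd

# Same kind of heterogeneous frame as above: three integer columns, one string column.
mixed = pd.DataFrame(np.random.randint(0, 4, size=(4, 3)), columns=list("789")).join(
    pd.DataFrame(list("bcde"), columns=["A"]))

# The double transpose leaves the integer columns with object dtype.
round_trip = mixed.T.T

# infer_objects() tries to find a better dtype for each object column,
# so the integer columns become integer-typed again.
restored = round_trip.infer_objects()
```

This repairs the dtypes after the fact, but it does not avoid the cost of the two transposes.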
I think you can use numpy.arange or range:
np.random.seed(10)
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df.columns = np.arange(len(df.columns))
#alternatively
#df.columns = range(len(df.columns))
print (df)
0 1 2 3
0 9 4 0 1
1 9 0 1 8
2 9 0 8 6
3 4 3 0 4
4 6 8 1 8
5 4 1 3 6
6 5 3 9 6
7 9 1 9 4
8 2 6 7 8
9 8 9 2 0
But this loses the original column names.
If you need a MultiIndex without names (starting from the original df with columns A to D):
df.columns = [np.arange(len(df.columns)), df.columns]
print (df)
0 1 2 3
A B C D
0 9 4 0 1
1 9 0 1 8
2 9 0 8 6
3 4 3 0 4
4 6 8 1 8
5 4 1 3 6
6 5 3 9 6
7 9 1 9 4
8 2 6 7 8
9 8 9 2 0
And to name the levels, use MultiIndex.from_arrays (starting from the original df):
names = ['a','b']
df.columns = pd.MultiIndex.from_arrays([np.arange(len(df.columns)), df.columns], names=names)
print (df)
a 0 1 2 3
b A B C D
0 9 4 0 1
1 9 0 1 8
2 9 0 8 6
3 4 3 0 4
4 6 8 1 8
5 4 1 3 6
6 5 3 9 6
7 9 1 9 4
8 2 6 7 8
9 8 9 2 0
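Once the positions form the outer level, columns can be selected by either level. The following is a rough sketch of how that might look, rebuilding the frame from scratch; the variable names `by_position`, `by_name`, and `flat` are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))
df.columns = pd.MultiIndex.from_arrays(
    [np.arange(len(df.columns)), df.columns], names=["a", "b"])

# Select by outer (position) label; the inner level remains as the column header.
by_position = df[0]

# Select by inner (name) label; the outer level remains as the column header.
by_name = df.xs("B", axis=1, level="b")

# Drop the position level to get back the plain A..D headers.
flat = df.droplevel("a", axis=1)
```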