I want to take a column indexed 'length' and make it my second column. It currently exists as the 5th column. I have tried:
colnames = big_df.columns.tolist()
# make index "length" the second column in the big_df
colnames = colnames[0] + colnames[4] + colnames[:-1]
big_df = big_df[colnames]
I see the following error:
TypeError: must be str, not list
I'm not sure how to interpret this error because it actually should be a list
, right?
Also, is there a general method to move any column by label to a specified position? My columns only have one level, i.e. no MultiIndex
involved.
The basic idea to move a column in a pandas dataframe is to remove the column from its current place and insert it in the desired position. The pandas library offers many useful functions such as pop() and insert(). We will make use of these two functions to manipulate with our dataframe.
Pandas Change Position of a Column (Last to the First) You can change the position of a pandas column in multiple ways, the simplest way would be to select the columns by positioning the last column in the first position. You can also use this approach to change the order of pandas columns in the desired order.
Move or copy rows or columns Drag the rows or columns to another location. Hold down OPTION and drag the rows or columns to another location. Hold down SHIFT and drag your row or column between existing rows or columns. Excel makes space for the new row or column.
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
I'm not sure how to interpret this error because it actually should be a list, right?
No: colnames[0]
and colnames[4]
are scalars, not lists. You can't concatenate a scalar with a list. To make them lists, use square brackets:
colnames = [colnames[0]] + [colnames[4]] + colnames[:-1]
You can either use df[[colnames]]
or df.reindex(columns=colnames)
: both necessarily trigger a copy operation as this transformation cannot be processed in place.
But converting arrays to lists and then concatenating lists manually is not only expensive, but prone to error. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index
objects are stored as NumPy arrays.
The key here is to modify the NumPy array via slicing rather than concatenation. There are only 2 cases to handle: when the desired position exists after the current position, and vice versa.
import pandas as pd, numpy as np
from string import ascii_uppercase
df = pd.DataFrame(columns=list(ascii_uppercase))
def shifter(df, col_to_shift, pos_to_move):
arr = df.columns.values
idx = df.columns.get_loc(col_to_shift)
if idx == pos_to_move:
pass
elif idx > pos_to_move:
arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]
else:
arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]
arr[pos_to_move] = col_to_shift
df = df.reindex(columns=arr)
return df
df = df.pipe(shifter, 'J', 1)
print(df.columns)
Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',
'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
dtype='object')
Using NumPy slicing is more efficient with a large number of columns versus a list-based method:
n = 10000
df = pd.DataFrame(columns=list(range(n)))
def shifter2(df, col_to_shift, pos_to_move):
cols = df.columns.tolist()
cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
df = df.reindex(columns=cols)
return df
%timeit df.pipe(shifter, 590, 5) # 381 µs
%timeit df.pipe(shifter2, 590, 5) # 1.92 ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With