How to move a column in a pandas dataframe

Tags: python, indexing, pandas, dataframe, numpy

I want to take a column indexed 'length' and make it my second column. It currently exists as the 5th column. I have tried:

colnames = big_df.columns.tolist()

# make index "length" the second column in the big_df
colnames = colnames[0] + colnames[4] + colnames[:-1] 

big_df = big_df[colnames]

I see the following error:

TypeError: must be str, not list

I'm not sure how to interpret this error because it actually should be a list, right?

Also, is there a general method to move any column by label to a specified position? My columns only have one level, i.e. no MultiIndex involved.

898

asked Oct 02 '18 21:10

GAD

1 Answer

Correcting your error

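"I'm not sure how to interpret this error because it actually should be a list, right?"

No: colnames[0] and colnames[4] are strings (single column labels), not lists. colnames[0] + colnames[4] concatenates the two strings, and Python then refuses to add the list colnames[:-1] to that string. To build a list, wrap the labels in square brackets. The remaining labels also need to be sliced around position 4, because colnames[:-1] would repeat the first and fifth labels and drop the last one:

colnames = [colnames[0], colnames[4]] + colnames[1:4] + colnames[5:]

You can then use either big_df[colnames] or big_df.reindex(columns=colnames). Both necessarily return a reordered copy, as this transformation cannot be processed in place.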
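The failure and a working reordering can be reproduced on a small frame. This is a minimal sketch: only the 'length' label comes from the question, and the other column names and values are placeholders made up for illustration.

```python
import pandas as pd

# Hypothetical frame: 'length' is the 5th column (position 4), as in the question.
big_df = pd.DataFrame(
    {"id": [1], "a": [2], "b": [3], "c": [4], "length": [5], "d": [6]}
)
colnames = big_df.columns.tolist()

# The original expression adds two strings, then tries to add a list to the result.
try:
    colnames[0] + colnames[4] + colnames[:-1]
except TypeError as exc:
    error_message = str(exc)  # exact wording depends on the Python version

# Build the new order from lists only: first label, 'length', then everything else.
reordered = [colnames[0], colnames[4]] + colnames[1:4] + colnames[5:]
result = big_df[reordered]
print(result.columns.tolist())
```

The hard-coded positions (0, 4) only suit this particular layout, which is why a label-based helper is more convenient in general.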

Generic solution

But converting arrays to lists and then concatenating lists manually is not only expensive but also prone to error. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index objects are backed by NumPy arrays.

The key here is to modify the NumPy array via slicing rather than concatenation. There are only two cases to handle: the target position comes after the current position, or before it.

import pandas as pd, numpy as np
from string import ascii_uppercase

df = pd.DataFrame(columns=list(ascii_uppercase))

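def shifter(df, col_to_shift, pos_to_move):
    # Work on a copy: df.columns.values can share memory with the Index,
    # and an Index must not be modified in place.
    arr = df.columns.values.copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx == pos_to_move:
        pass
    elif idx > pos_to_move:
        # Moving left: shift the labels in between one step to the right.
        arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]
    else:
        # Moving right: shift the labels in between one step to the left.
        arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]
    arr[pos_to_move] = col_to_shift
    # reindex returns a new DataFrame with the columns in the new order.
    df = df.reindex(columns=arr)
    return df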
    
df = df.pipe(shifter, 'J', 1)

print(df.columns)

Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',
       'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
      dtype='object')
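Because the example frame above has no rows, it only shows the labels moving. A quick check on a frame with data confirms that values travel with their labels, in both directions. The sketch below restates the slicing logic under the hypothetical name move_column; it assumes unique column labels (so that get_loc returns a single integer) and works on a copy of the label array so the input frame is left untouched.

```python
import pandas as pd

def move_column(df, col_to_shift, pos_to_move):
    """Return a copy of df with col_to_shift placed at position pos_to_move."""
    arr = df.columns.to_numpy().copy()      # private, writable copy of the labels
    idx = df.columns.get_loc(col_to_shift)  # assumes the label is unique
    if idx > pos_to_move:
        # Moving left: slide the labels in between one step to the right.
        arr[pos_to_move + 1: idx + 1] = arr[pos_to_move: idx]
    elif idx < pos_to_move:
        # Moving right: slide the labels in between one step to the left.
        arr[idx: pos_to_move] = arr[idx + 1: pos_to_move + 1]
    arr[pos_to_move] = col_to_shift
    return df.reindex(columns=arr)

sample = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

left = move_column(sample, "D", 1)   # columns become A, D, B, C
right = move_column(sample, "A", 2)  # columns become B, C, A, D

print(left)
print(right)
```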

Performance benchmarking

Using NumPy slicing is more efficient with a large number of columns versus a list-based method:

n = 10000
df = pd.DataFrame(columns=list(range(n)))

def shifter2(df, col_to_shift, pos_to_move):
    cols = df.columns.tolist()
    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
    df = df.reindex(columns=cols)
    return df

%timeit df.pipe(shifter, 590, 5)   # 381 µs
%timeit df.pipe(shifter2, 590, 5)  # 1.92 ms
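%timeit is an IPython magic, so the two lines above only run in IPython or Jupyter. Outside those environments, a comparison along the same lines could be sketched with the standard timeit module. The function names below are hypothetical stand-ins for the two approaches, and the timings will vary with machine and library versions, so no figures are claimed here.

```python
import timeit

import pandas as pd

def shift_with_numpy(df, col, pos):
    # Slicing approach on a private copy of the label array.
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col)
    if idx > pos:
        arr[pos + 1: idx + 1] = arr[pos: idx]
    elif idx < pos:
        arr[idx: pos] = arr[idx + 1: pos + 1]
    arr[pos] = col
    return df.reindex(columns=arr)

def shift_with_list(df, col, pos):
    # List approach: pop the label and insert it at the target position.
    cols = df.columns.tolist()
    cols.insert(pos, cols.pop(df.columns.get_loc(col)))
    return df.reindex(columns=cols)

n = 10_000
wide = pd.DataFrame(columns=list(range(n)))

# Both approaches should agree before their speed is compared.
same_order = shift_with_numpy(wide, 590, 5).columns.equals(
    shift_with_list(wide, 590, 5).columns
)

runs = 100
for fn in (shift_with_numpy, shift_with_list):
    seconds = timeit.timeit(lambda: fn(wide, 590, 5), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.0f} µs per call")
```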

139

answered Sep 23 '22 19:09

jpp
