I'm trying to rearrange columns in a DataFrame, by putting a few columns first, and then all the others after.
With R's dplyr
, this would look like:
library(dplyr)
df = tibble(col1 = c("a", "b", "c"),
id = c(1, 2, 3),
col2 = c(2, 4, 6),
date = c("1 Feb", "2 Feb", "3 Feb"))
df2 = select(df,
id, date, everything())
Easy. With Python's pandas
, here's what I've tried:
import pandas as pd
df = pd.DataFrame({
"col1": ["a", "b", "c"],
"id": [1, 2, 3],
"col2": [2, 4, 6],
"date": ["1 Feb", "2 Feb", "3 Feb"]
})
# using sets
cols = df.columns.tolist()
cols_1st = {"id", "date"}
cols = set(cols) - cols_1st
cols = list(cols_1st) + list(cols)
# wrong column order
df2 = df[cols]
# using lists
cols = df.columns.tolist()
cols_1st = ["id", "date"]
cols = [c for c in cols if c not in cols_1st]
cols = cols_1st + cols
# right column order, but is there a better way?
df3 = df[cols]
The pandas
way is more tedious, but I'm fairly new to this. Is there a better way?
Reorder Columns using Pandas . Another way to reorder columns is to use the Pandas . reindex() method. This allows you to pass in the columns= parameter to pass in the order of columns that you want to use.
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
order() is used to rearrange the dataframe columns in alphabetical order.
Dplython. Package dplython is dplyr for Python users. It provide infinite functionality for data preprocessing.
You can use df.drop
:
>>> df = pd.DataFrame({
"col1": ["a", "b", "c"],
"id": [1, 2, 3],
"col2": [2, 4, 6],
"date": ["1 Feb", "2 Feb", "3 Feb"]
})
>>> df
col1 id col2 date
0 a 1 2 1 Feb
1 b 2 4 2 Feb
2 c 3 6 3 Feb
>>> cols_1st = ["id", "date"]
>>> df[cols_1st + list(df.drop(cols_1st, 1))]
id date col1 col2
0 1 1 Feb a 2
1 2 2 Feb b 4
2 3 3 Feb c 6
Usually, the best translation between R and Python Pandas is with base R which follow the same semantics such as logical indexing on a vector, here being column names. Notice the similarity below with negation and in
functions:
# R
mycols <- c("id", "date")
df2 <- df[c(mycols, colnames(df)[!colnames(df) %in% c(mycols)])]
# PANDAS (OLDER, NON-RECOMMENDED WAY)
mycols = ["id", "date"]
df2 = df[mycols + df.columns[~df.columns.isin(mycols)].tolist()]
# PANDAS (CURRENT, RECOMMENDED WAY WITH reindex)
df2 = df.reindex(mycols + df.columns[~df.columns.isin(mycols)].tolist(),
axis='columns')
As easy as you do it in R with datar
:
>>> from datar.all import c, f, tibble, select, everything
>>> df = tibble(col1 = c("a", "b", "c"),
... id = c(1, 2, 3),
... col2 = c(2, 4, 6),
... date = c("1 Feb", "2 Feb", "3 Feb"))
>>>
>>> df2 = select(df,
... f.id, f.date, everything())
>>>
>>> df2
id date col1 col2
<int64> <object> <object> <int64>
0 1 1 Feb a 2
1 2 2 Feb b 4
2 3 3 Feb c 6
I am the author of the package. Feel free to submit issues if you have any questions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With