Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rearranging columns with pandas: Is there an equivalent to dplyr's select(..., everything())?

I'm trying to rearrange columns in a DataFrame, by putting a few columns first, and then all the others after.

With R's dplyr, this would look like:

library(dplyr)

df = tibble(col1 = c("a", "b", "c"),
            id = c(1, 2, 3),
            col2 = c(2, 4, 6),
            date = c("1 Feb", "2 Feb", "3 Feb"))

df2 = select(df,
             id, date, everything())

Easy. With Python's pandas, here's what I've tried:

import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"]
    })

# using sets
cols = df.columns.tolist()
cols_1st = {"id", "date"}
cols = set(cols) - cols_1st
cols = list(cols_1st) + list(cols)

# wrong column order
df2 = df[cols]

# using lists
cols = df.columns.tolist()
cols_1st = ["id", "date"]
cols = [c for c in cols if c not in cols_1st]
cols = cols_1st + cols

# right column order, but is there a better way?
df3 = df[cols]

The pandas way is more tedious, but I'm fairly new to this. Is there a better way?

like image 501
ardaar Avatar asked Mar 01 '20 18:03

ardaar


People also ask

How do I rearrange the order of columns in pandas?

Reorder Columns using Pandas . Another way to reorder columns is to use the Pandas . reindex() method. This allows you to pass in the columns= parameter to pass in the order of columns that you want to use.

How do you rearrange the columns of a Dataframe having columns?

You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.

Which function is used to rearrange columns in Dataframe?

order() is used to rearrange the dataframe columns in alphabetical order.

What is the equivalent of dplyr in Python?

Dplython. Package dplython is dplyr for Python users. It provide infinite functionality for data preprocessing.


3 Answers

You can use df.drop:

>>> df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "id": [1, 2, 3],
    "col2": [2, 4, 6],
    "date": ["1 Feb", "2 Feb", "3 Feb"]
    })

>>> df

  col1  id  col2   date
0    a   1     2  1 Feb
1    b   2     4  2 Feb
2    c   3     6  3 Feb

>>> cols_1st = ["id", "date"]

>>> df[cols_1st + list(df.drop(cols_1st, 1))]

   id   date col1  col2
0   1  1 Feb    a     2
1   2  2 Feb    b     4
2   3  3 Feb    c     6
like image 95
Sayandip Dutta Avatar answered Nov 10 '22 00:11

Sayandip Dutta


Usually, the best translation between R and Python Pandas is with base R which follow the same semantics such as logical indexing on a vector, here being column names. Notice the similarity below with negation and in functions:

# R 
mycols <- c("id", "date")
df2 <- df[c(mycols, colnames(df)[!colnames(df) %in% c(mycols)])]


# PANDAS (OLDER, NON-RECOMMENDED WAY)
mycols = ["id", "date"]
df2 = df[mycols + df.columns[~df.columns.isin(mycols)].tolist()]

# PANDAS (CURRENT, RECOMMENDED WAY WITH reindex)
df2 = df.reindex(mycols + df.columns[~df.columns.isin(mycols)].tolist(),
                 axis='columns')
like image 38
Parfait Avatar answered Nov 09 '22 23:11

Parfait


As easy as you do it in R with datar:

>>> from datar.all import c, f, tibble, select, everything
>>> df = tibble(col1 = c("a", "b", "c"),
...             id = c(1, 2, 3),
...             col2 = c(2, 4, 6),
...             date = c("1 Feb", "2 Feb", "3 Feb"))
>>>             
>>> df2 = select(df,
...              f.id, f.date, everything())
>>>              
>>> df2
       id     date     col1    col2
  <int64> <object> <object> <int64>
0       1    1 Feb        a       2
1       2    2 Feb        b       4
2       3    3 Feb        c       6

I am the author of the package. Feel free to submit issues if you have any questions.

like image 41
Panwen Wang Avatar answered Nov 10 '22 00:11

Panwen Wang