
Selecting and renaming columns at the same time

I looked around but could not find a solution for this. In R's dplyr, we can select and rename columns in one line of code:

select(Com=Commander,Sco=Score)

I'm trying to do the same thing in pandas but have not found a feasible solution yet.

Let's say we have this sample data:

import pandas as pd

# Create an example dataframe
data = {'Commander': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'Date': ['2012, 02, 08', '2012, 02, 08', '2012, 02, 08', '2012, 02, 08', '2012, 02, 08'],
        'Score': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df


           Commander          Date  Score
Cochice        Jason  2012, 02, 08      4
Pima           Molly  2012, 02, 08     24
Santa Cruz      Tina  2012, 02, 08     31
Maricopa        Jake  2012, 02, 08      2
Yuma             Amy  2012, 02, 08      3

and want to select and rename the Commander and Score columns, like this:

df[['Com'=='Commander','Sco'=='Score']]

ValueError: Item wrong length 2 instead of 5.

How can I do that?

Alexander asked Mar 03 '23 14:03


1 Answer

A bit late, and maybe you've already figured this out, but I had the same problem and the answers here got me most of the way to the solution I used.

The shortest answer is to rename first and then pass the list of new column names to the dataframe returned by the rename operation:

df.rename(columns = {"Commander": "Com", "Score": "Sco"})[['Com', 'Sco']]

              Com  Sco
Cochice     Jason    4
Pima        Molly   24
Santa Cruz   Tina   31
Maricopa     Jake    2
Yuma          Amy    3

But it's a little tedious to write the new column names twice, right? So you can define the mapping once as a dictionary:

selector_d = {'Commander': 'Com', 'Score': 'Sco'}

and pass that to the rename and select operations:

df.rename(columns=selector_d)[[*selector_d.values()]]

              Com  Sco
Cochice     Jason    4
Pima        Molly   24
Santa Cruz   Tina   31
Maricopa     Jake    2
Yuma          Amy    3
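
The dictionary-driven pattern above can also be wrapped in a small helper. This is only a sketch under stated assumptions: the function name select_and_rename is hypothetical, not a pandas API, and it selects first and renames second (the reverse order of the one-liner above), so that a missing source column raises a KeyError instead of being silently skipped by rename.

```python
import pandas as pd


def select_and_rename(df, mapping):
    """Return only the columns named in `mapping`, renamed to its values.

    Hypothetical helper for illustration; `mapping` is {old_name: new_name}.
    """
    # list(mapping) yields the original column names in dictionary order;
    # a column that does not exist raises KeyError here, before renaming.
    return df[list(mapping)].rename(columns=mapping)


df = pd.DataFrame(
    {
        'Commander': ['Jason', 'Molly'],
        'Date': ['2012, 02, 08', '2012, 02, 08'],
        'Score': [4, 24],
    },
    index=['Cochice', 'Pima'],
)

result = select_and_rename(df, {'Commander': 'Com', 'Score': 'Sco'})
print(result)
```

Selecting before renaming also means the columns you drop are never touched by the rename step, which should not matter for small frames like this one.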

My scenario was close to this: I had columns that I wanted to select but not rename. This can be done by including those columns in the rename/select dictionary, mapped to their own names.

Here's the whole process with another column added:

data = {
    'Commander': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'Date': ['2012, 02, 08', '2012, 02, 08', '2012, 02, 08',
             '2012, 02, 08', '2012, 02, 08'],
    'Score': [4, 24, 31, 2, 3],
    'Team': ['Green', 'Yellow', 'Green', 'Yellow', 'Yellow'],
}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df

           Commander          Date  Score    Team
Cochice        Jason  2012, 02, 08      4   Green
Pima           Molly  2012, 02, 08     24  Yellow
Santa Cruz      Tina  2012, 02, 08     31   Green
Maricopa        Jake  2012, 02, 08      2  Yellow
Yuma             Amy  2012, 02, 08      3  Yellow

selector_d = {'Team': 'Team', 'Commander': 'Com', 'Score': 'Sco'}

df.rename(columns=selector_d)[[*selector_d.values()]]

              Team    Com  Sco
Cochice      Green  Jason    4
Pima        Yellow  Molly   24
Santa Cruz   Green   Tina   31
Maricopa    Yellow   Jake    2
Yuma        Yellow    Amy    3

As you can see, this also allows reordering of the columns in the final dataframe.
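
To illustrate the reordering, here is a minimal sketch that applies two dictionaries with the same entries in a different order. It relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7); the variable names first and second are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Commander': ['Jason', 'Molly'],
        'Score': [4, 24],
        'Team': ['Green', 'Yellow'],
    },
    index=['Cochice', 'Pima'],
)

# Same mapping entries, different insertion order.
first = {'Team': 'Team', 'Commander': 'Com', 'Score': 'Sco'}
second = {'Score': 'Sco', 'Team': 'Team', 'Commander': 'Com'}

# The selection list is built from the dictionary values, so the
# resulting column order follows the order of the dictionary.
cols_first = list(df.rename(columns=first)[[*first.values()]].columns)
cols_second = list(df.rename(columns=second)[[*second.values()]].columns)

print(cols_first)   # ['Team', 'Com', 'Sco']
print(cols_second)  # ['Sco', 'Team', 'Com']
```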

Edited on 2021-08-28, per comment by @Hedge92

Actually, you don't need to unpack selector_d.values() into a list inside the selection brackets; passing it directly gives the same result, as seen here:

df.rename(columns=selector_d)[[*selector_d.values()]].equals(
    df.rename(columns=selector_d)[selector_d.values()]
)
True

So, df.rename(columns=selector_d)[selector_d.values()] will suffice to select the new columns.
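
If you would rather not depend on how a given pandas version treats a dict_values object as an indexer, an explicit list(...) call is an equivalent spelling of the unpacking form. The sketch below is an addition for illustration, not part of the original edit, and only checks that the two list-based forms agree.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Commander': ['Jason', 'Molly'],
        'Score': [4, 24],
        'Team': ['Green', 'Yellow'],
    },
    index=['Cochice', 'Pima'],
)
selector_d = {'Team': 'Team', 'Commander': 'Com', 'Score': 'Sco'}

renamed = df.rename(columns=selector_d)

# Both expressions build a plain Python list of the new column names.
explicit = renamed[list(selector_d.values())]
unpacked = renamed[[*selector_d.values()]]

print(explicit.equals(unpacked))  # True
```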

pjdrew answered Mar 17 '23 04:03