Pandas: Efficiently perform numerous modifications to column names

Question

How can you make numerous modifications to DataFrame column names without writing boilerplate code?

Reproducible example:

import pandas as pd

data = {'Subject Id': ['1', '2', '3'],
        'First-Name': ['Alex', 'Amy', 'Allen'],
        'Last, name': ['Anderson', 'Ackerman', 'Ali']}

df = pd.DataFrame(data, columns=['Subject Id', 'First-Name', 'Last, name'])

df

    Subject Id  First-Name  Last, name
0   1           Alex        Anderson
1   2           Amy         Ackerman
2   3           Allen       Ali

To clean the column names I'd usually do something like this:

df.columns = [l.lower() for l in df.columns]
df.columns = [s.replace('-', ' ') for s in df.columns]
df.columns = [d.replace(',', ' ') for d in df.columns]

But sometimes I need to make far more than 3 modifications. Is there a way to chain such operations together or otherwise do this more efficiently?

EdChum · Accepted Answer

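You can call the vectorised .str methods on your columns and chain them; here we use str.lower and str.replace. Pass regex=True explicitly: since pandas 2.0 the default is regex=False, so '-|,' would otherwise be matched as a literal string rather than as "hyphen or comma":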

In [91]:
df.columns = df.columns.str.lower().str.replace('-|,', ' ', regex=True)
df

Out[91]:
  subject id first name last  name
0          1       Alex   Anderson
1          2        Amy   Ackerman
2          3      Allen        Ali
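When the number of substitutions grows well beyond three, one option is to keep them in a list of (pattern, replacement) pairs and fold that list over the column Index with functools.reduce, so adding a rule is a one-line change. This is a minimal sketch, not part of the original answer: RULES and clean_columns are hypothetical names, and the second rule (whitespace to underscore) is an illustrative extra rule, not something the question asked for.

```python
from functools import reduce

import pandas as pd

# Hypothetical rule table: each (pattern, replacement) pair is applied in order.
RULES = [
    (r'[-,]', ' '),   # hyphens and commas become spaces
    (r'\s+', '_'),    # then collapse runs of whitespace into one underscore
]


def clean_columns(columns: pd.Index) -> pd.Index:
    """Lower-case the labels, then apply every rule in RULES in sequence."""
    return reduce(
        # regex=True is required: pandas >= 2.0 defaults to literal matching.
        lambda idx, rule: idx.str.replace(rule[0], rule[1], regex=True),
        RULES,
        columns.str.lower(),
    )


data = {'Subject Id': ['1', '2', '3'],
        'First-Name': ['Alex', 'Amy', 'Allen'],
        'Last, name': ['Anderson', 'Ackerman', 'Ali']}
df = pd.DataFrame(data)
df.columns = clean_columns(df.columns)
print(list(df.columns))  # ['subject_id', 'first_name', 'last_name']
```

Because the rules run in order, later patterns see the output of earlier ones; here the underscore rule relies on the first rule having already turned "," into a space.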

Note also that nothing stops you from combining everything in a single list comprehension:

In [93]:
df.columns = [l.lower().replace('-', ' ').replace(',',' ') for l in df.columns]
df

Out[93]:
  subject id first name last  name
0          1       Alex   Anderson
1          2        Amy   Ackerman
2          3      Allen        Ali
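If the same cleanup is needed across several frames, the list-comprehension logic can be moved into a named function and passed to DataFrame.rename, which accepts a callable and applies it to every column label. The sketch below also swaps the chained .replace calls for str.translate with a table built by str.maketrans, which handles any number of single-character substitutions in one pass. clean_name and _TO_SPACE are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical translation table: every character listed maps to a space.
# str.maketrans builds it once; str.translate applies all mappings in one pass.
_TO_SPACE = str.maketrans({'-': ' ', ',': ' '})


def clean_name(name: str) -> str:
    """Lower-case a single column label and replace '-' and ',' with spaces."""
    return name.lower().translate(_TO_SPACE)


data = {'Subject Id': ['1', '2', '3'],
        'First-Name': ['Alex', 'Amy', 'Allen'],
        'Last, name': ['Anderson', 'Ackerman', 'Ali']}
df = pd.DataFrame(data)

# DataFrame.rename accepts a callable, which is applied to every column label.
df = df.rename(columns=clean_name)
print(list(df.columns))  # ['subject id', 'first name', 'last  name']
```

The translation table only covers single characters; multi-character or pattern-based rules still need str.replace or re.sub.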

A list comprehension may be quicker for such a small number of columns. Timings:

In [96]:
%timeit [l.lower().replace('-', ' ').replace(',',' ') for l in df.columns]
%timeit df.columns.str.lower().str.replace('-|,', ' ', regex=True)

100000 loops, best of 3: 5.26 µs per loop
1000 loops, best of 3: 284 µs per loop
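Those numbers come from IPython's %timeit on a three-column frame, where the fixed per-call overhead of the .str accessor dominates. To check whether the ranking holds for your own column count and pandas version, a rough sketch using the standard timeit module could look like this. The 10,000-column index is a made-up example chosen only to show scaling, and no particular timings are implied; results will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def with_list_comprehension(columns):
    return [c.lower().replace('-', ' ').replace(',', ' ') for c in columns]


def with_str_accessor(columns):
    return list(columns.str.lower().str.replace('-|,', ' ', regex=True))


small = pd.Index(['Subject Id', 'First-Name', 'Last, name'])
# Hypothetical wide index, only to see how the two approaches scale.
wide = pd.Index([f'Col-{i}, Name' for i in range(10_000)])

N = 20  # repetitions per measurement
for label, cols in (('3 columns', small), ('10,000 columns', wide)):
    # Both approaches must produce identical labels before timing them.
    assert with_list_comprehension(cols) == with_str_accessor(cols)
    for fn in (with_list_comprehension, with_str_accessor):
        seconds = timeit.timeit(lambda: fn(cols), number=N)
        print(f'{label:>14} {fn.__name__}: {seconds / N * 1e3:.3f} ms per call')
```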


Tags: python, pandas

Question by RDJ · 1 answer (EdChum)
