
How to apply a function to multiple columns in a pandas DataFrame at one time

I frequently deal with data that is poorly formatted (i.e. number fields are not consistent, etc.).

There may be other ways that I am not aware of, but the way I format a single column in a dataframe is by writing a function and mapping the column to that function.

format = df.column_name.map(format_number)

Question 1: what if I have a dataframe with 50 columns and want to apply that formatting to multiple columns, e.g. columns 1, 3, 5, 7, 9?

Can you go:

format = df.1,3,5,9.map(format_number)

That way I could format all my number columns in one line.

yoshiserry asked Feb 28 '14 04:02


2 Answers

You can do df[['Col1', 'Col2', 'Col3']].applymap(format_number). Note, though, that this returns a new DataFrame; it won't modify the existing one. If you want to put the values back in the original, you'll have to do df[['Col1', 'Col2', 'Col3']] = df[['Col1', 'Col2', 'Col3']].applymap(format_number). (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which is called the same way.)
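A minimal sketch of the above, under some assumptions: the question never shows format_number, so the version here (strip currency symbols and thousands separators) is a hypothetical stand-in, and the column names and sample data are made up. The elementwise helper is also an assumption, there only so the sketch runs on pandas versions with either applymap or the newer DataFrame.map. The last lines show one way to pick columns by position rather than by name, which is closer to the "columns 1, 3, 5, 7, 9" case in the question.

```python
import pandas as pd

def format_number(value):
    """Hypothetical cleaner (stand-in for the asker's function): '$1,234.50' -> 1234.5."""
    return float(str(value).replace("$", "").replace(",", "").strip())

# Made-up sample data with inconsistently formatted number fields
df = pd.DataFrame({
    "name": ["a", "b"],
    "Col1": ["1,000", "2500"],
    "Col2": ["$3.50", " 4 "],
    "Col3": [7, "8,000.25"],
})

def elementwise(frame, func):
    # DataFrame.map was added in pandas 2.1; older versions only have applymap
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

# Format several columns by name and write the result back
cols = ["Col1", "Col2", "Col3"]
df[cols] = elementwise(df[cols], format_number)

# Select columns by position instead (here the 2nd and 4th columns)
by_position = df.columns[[1, 3]]
doubled = elementwise(df[by_position], lambda v: v * 2)
```

Columns that are not listed (here, name) are left untouched.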

BrenBarn answered Oct 05 '22 13:10


You could use apply like this:

df = df.apply(format_number, axis=1)

You would need to specify the columns in your format_number function, though, and return the modified row:

def format_number(row):
    # doSomething stands in for whatever per-value formatting you need
    row['Col1'] = doSomething(row['Col1'])
    row['Col2'] = doSomething(row['Col2'])
    row['Col3'] = doSomething(row['Col3'])
    return row

This is not as elegant as @BrenBarn's answer. Note that apply returns a new DataFrame, and changes made to row inside the function are not guaranteed to reach the original, so assign the result back rather than relying on the dataframe being modified in place.
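A runnable sketch of the row-wise approach, with made-up data. doSomething is not defined in the answer, so the version here (drop thousands separators, convert to float) is a hypothetical stand-in. The function returns the row and the result of apply is assigned to a new name, so nothing depends on the original frame being mutated.

```python
import pandas as pd

def doSomething(value):
    # Hypothetical formatter: drop thousands separators and convert to float
    return float(str(value).replace(",", ""))

def format_number(row):
    # Work on a copy so the object handed over by apply is never mutated
    row = row.copy()
    for col in ("Col1", "Col2", "Col3"):
        row[col] = doSomething(row[col])
    return row

# Made-up sample data
df = pd.DataFrame({
    "name": ["a", "b"],
    "Col1": ["1,000", "2"],
    "Col2": ["3", "4,500"],
    "Col3": ["5", "6"],
})

# axis=1 calls format_number once per row; the returned rows form a new DataFrame
formatted = df.apply(format_number, axis=1)
```

Use df = df.apply(format_number, axis=1) instead if the original values are no longer needed.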

EdChum answered Oct 05 '22 14:10