I frequently deal with poorly formatted data (i.e. number fields are not consistent, etc.).
There may be other ways that I am not aware of, but the way I format a single column in a DataFrame is by writing a function and mapping the column to that function:
format = df.column_name.map(format_number)
Question: what if I have a DataFrame with 50 columns and want to apply that formatting to several of them, e.g. columns 1, 3, 5, 7 and 9?
Can you do something like:
format = df.1,3,5,9.map(format_number)
That way I could format all my number columns in one line.
You can do df[['Col1', 'Col2', 'Col3']].applymap(format_number). Note, though, that this returns a new DataFrame; it won't modify the existing one. If you want to put the values back in the original, you'll have to do df[['Col1', 'Col2', 'Col3']] = df[['Col1', 'Col2', 'Col3']].applymap(format_number). (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which is called the same way.)
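To make the select-and-assign-back pattern concrete, here is a minimal sketch. The sample data and the body of format_number are assumptions made up for illustration (the question does not show the real formatter), and it uses apply with Series.map per column so that it does not depend on which pandas version you have:

```python
import pandas as pd

def format_number(value):
    # Hypothetical formatter: drop thousands separators and stray
    # whitespace, then convert to float.
    return float(str(value).replace(",", "").strip())

# Made-up sample data with inconsistently formatted number fields.
df = pd.DataFrame({
    "Col1": ["1,000", " 2.5", "3"],
    "Name": ["a", "b", "c"],
    "Col2": ["10", "20 ", "3,000.5"],
})

# Select columns by 0-based position, similar to "columns 1 and 3" in the question.
number_cols = df.columns[[0, 2]]

# Apply format_number element-wise to each selected column and write the
# result back, since the call itself returns a new DataFrame.
df[number_cols] = df[number_cols].apply(lambda col: col.map(format_number))
```

Selecting by position with df.columns[[...]] avoids typing out the names when there are many columns; a plain list of column names works the same way.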
You could use apply like this:
df.apply(lambda row: format_number(row), axis=1)
You would need to specify the columns in your format_number function, though:
def format_number(row):
    row['Col1'] = doSomething(row['Col1'])
    row['Col2'] = doSomething(row['Col2'])
    row['Col3'] = doSomething(row['Col3'])
    return row  # apply builds its result from the returned rows
This is not as elegant as @BrenBarn's answer. Also, don't rely on the DataFrame being modified in place: apply returns a new DataFrame, so you should still assign the result back, e.g. df = df.apply(format_number, axis=1).
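A minimal sketch of the row-wise approach, under stated assumptions: do_something is a hypothetical stand-in for doSomething (here it just rounds to one decimal place), and the sample data is made up. The function works on a copy of each row and returns it, and the result of apply is captured in a new variable:

```python
import pandas as pd

def do_something(value):
    # Hypothetical stand-in for doSomething: round to one decimal place.
    return round(float(value), 1)

def format_number(row):
    # Work on a copy so the function never depends on mutating df itself.
    row = row.copy()
    for col in ("Col1", "Col2"):
        row[col] = do_something(row[col])
    return row

# Made-up sample data.
df = pd.DataFrame({
    "Col1": [1.234, 5.678],
    "Col2": [9.99, 0.01],
    "Name": ["a", "b"],
})

# axis=1 passes each row to format_number as a Series; the returned rows
# are assembled into a new DataFrame.
result = df.apply(format_number, axis=1)
```

Row-wise apply calls the Python function once per row, so on large frames the column-wise approach is usually the faster of the two.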