I'm new to pandas and trying to figure out how to convert multiple columns that are formatted as strings to float64. Currently I'm doing the below, but it seems like apply() or applymap() should be able to do this more efficiently; unfortunately, I'm too much of a rookie to figure out how. The values are percentages formatted as strings, like '15.5%':
for column in ['field1', 'field2', 'field3']:
    # strip the trailing '%', cast to float, and scale 15.5 -> 0.155
    data[column] = data[column].str.rstrip('%').astype('float64') / 100
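The loop can be folded into a single apply() call over the selected columns. This is a minimal sketch, assuming a DataFrame named `data` with made-up sample values. It is mainly a readability change: apply() hands each column to the lambda as a whole Series, so the same vectorized string methods still do the work.

```python
import pandas as pd

# Hypothetical sample data: percentage strings like the ones in the question
data = pd.DataFrame({
    "field1": ["15.5%", "20%"],
    "field2": ["1.0%", "99.9%"],
    "field3": ["0%", "100%"],
})

cols = ["field1", "field2", "field3"]

# apply() calls the lambda once per column (each one a Series), so
# .str.rstrip and .astype still operate on whole columns at a time
data[cols] = data[cols].apply(lambda s: s.str.rstrip("%").astype("float64") / 100)

print(data.dtypes)
```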
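To convert the data type of multiple columns to float, use the DataFrame apply() method with pandas.to_numeric(). Note that to_numeric() cannot parse a trailing '%', so percentage strings have to be stripped first.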
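A small sketch of the apply() + to_numeric() combination, assuming a hypothetical DataFrame `df` with columns "a", "b", and "c". The second half shows the extra stripping step needed for percentage strings.

```python
import pandas as pd

# Hypothetical DataFrame whose "a" and "b" columns hold plain numeric strings
df = pd.DataFrame({"a": ["1.5", "2.25"], "b": ["3", "4.5"], "c": ["x", "y"]})

cols = ["a", "b"]
# apply() passes each selected column (a Series) to pd.to_numeric
df[cols] = df[cols].apply(pd.to_numeric)

# Percent strings must be stripped first: pd.to_numeric on "15.5%" raises ValueError
pct = pd.DataFrame({"p": ["15.5%", "20%"]})
pct["p"] = pd.to_numeric(pct["p"].str.rstrip("%")) / 100
```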
Alternatively, you can convert string columns one at a time with pandas.to_numeric(). For example, df['Discount'] = pd.to_numeric(df['Discount']) converts the 'Discount' column to a numeric dtype.
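If a column contains some entries that aren't numbers, to_numeric() raises by default. A sketch using its `errors="coerce"` option, with a hypothetical "Discount" column, turns those entries into NaN instead:

```python
import pandas as pd

# Hypothetical column with one unparseable entry
df = pd.DataFrame({"Discount": ["10", "12.5", "n/a"]})

# errors="coerce" turns unparseable entries into NaN instead of raising
df["Discount"] = pd.to_numeric(df["Discount"], errors="coerce")
print(df["Discount"])
```

Coercing silently hides bad data, so it's worth checking `df["Discount"].isna().sum()` afterwards.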
The most direct way to convert one or more columns of a DataFrame to numeric values is pandas.to_numeric(). It tries to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
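"As appropriate" means the result dtype depends on the data: whole-number strings come back as integers, not floats. This sketch (sample values made up) shows that, plus two ways to get a float column when that matters:

```python
import pandas as pd

ints = pd.to_numeric(pd.Series(["1", "2", "3"]))   # integer dtype
floats = pd.to_numeric(pd.Series(["1", "2.5"]))    # float64

# Force float64 explicitly when integers would be wrong downstream
forced = pd.to_numeric(pd.Series(["1", "2"])).astype("float64")

# downcast="float" picks the smallest float dtype that fits (float32 here)
small = pd.to_numeric(pd.Series(["1", "2"]), downcast="float")
```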
# applymap is deprecated since pandas 2.1; use df.map(...) there instead
df.applymap(lambda x: float(x.rstrip('%')) / 100)
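A self-contained version of the element-wise approach, assuming hypothetical columns "x" and "y". It calls a Python function once per cell, and falls back to applymap on pandas versions older than 2.1, where DataFrame.map does not exist:

```python
import pandas as pd

df = pd.DataFrame({"x": ["15.5%", "20%"], "y": ["1%", "50%"]})

def pct_to_float(s):
    """Convert one string like '15.5%' to 0.155."""
    return float(s.rstrip("%")) / 100

# DataFrame.map (pandas >= 2.1) replaces the deprecated applymap
if hasattr(pd.DataFrame, "map"):
    out = df.map(pct_to_float)
else:
    out = df.applymap(pct_to_float)
```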
Starting in pandas 0.11.1 (coming out this week), replace() has a new option to replace using a regex, so this becomes possible:
In [14]: df = DataFrame('10.0%',index=range(100),columns=range(10))
In [15]: df.replace('%','',regex=True).astype('float')/100
Out[15]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
0 100 non-null values
1 100 non-null values
2 100 non-null values
3 100 non-null values
4 100 non-null values
5 100 non-null values
6 100 non-null values
7 100 non-null values
8 100 non-null values
9 100 non-null values
dtypes: float64(10)
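The same idea as a standalone script against a current pandas. It is a sketch that mirrors the 100 x 10 frame from the session above; the info-style summary printed there is an older repr, so this only prints the resulting dtypes:

```python
import pandas as pd

# Same shape as the session above: 100 rows x 10 columns of "10.0%"
df = pd.DataFrame("10.0%", index=range(100), columns=range(10))

# regex=True makes replace() substitute inside strings; without it,
# only cells exactly equal to "%" would be replaced
out = df.replace("%", "", regex=True).astype("float64") / 100
print(out.dtypes.unique())
```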
And it's a bit faster:
In [16]: %timeit df.replace('%','',regex=True).astype('float')/100
1000 loops, best of 3: 1.16 ms per loop
In [18]: %timeit df.applymap(lambda x: float(x[:-1]))/100
1000 loops, best of 3: 1.67 ms per loop
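The timings above come from the pandas of that era, so they may not hold today. A sketch for re-measuring both approaches outside IPython with the standard `timeit` module. No expected numbers are given because results depend on the pandas version, data size, and hardware:

```python
import timeit
import pandas as pd

df = pd.DataFrame("10.0%", index=range(100), columns=range(10))

def with_replace():
    return df.replace("%", "", regex=True).astype("float64") / 100

def strip_last(x):
    # x[:-1] drops the trailing '%' character
    return float(x[:-1])

def with_elementwise():
    # DataFrame.map on pandas >= 2.1, applymap on older versions
    mapper = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return mapper(strip_last) / 100

# Sanity check: both approaches produce the same frame
pd.testing.assert_frame_equal(with_replace(), with_elementwise())

for f in (with_replace, with_elementwise):
    t = timeit.timeit(f, number=100) / 100
    print(f"{f.__name__}: {t * 1e3:.2f} ms per loop")
```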
Answering a comment on the accepted answer: for specific columns, make sure you don't do it in place; assign the result back to the column instead:
df['Column1'] = df['Column1'].replace('%','',regex=True).astype('float')/100
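The same assign-back pattern extends to several columns at once. This sketch assumes hypothetical columns "Column1" and "Column2" plus a text column that must stay untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["15.5%", "20%"],
    "Column2": ["1%", "2%"],
    "Label": ["a", "b"],  # not a percentage column; left as-is
})

cols = ["Column1", "Column2"]
# Convert only the selected columns and assign the result back
df[cols] = df[cols].replace("%", "", regex=True).astype("float64") / 100
```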