
pandas convert strings to float for multiple columns in dataframe

Tags:

python

pandas

I'm new to pandas and trying to figure out how to convert multiple columns that are formatted as strings to float64. Currently I'm doing the below, but it seems like apply() or applymap() should be able to accomplish this more efficiently. Unfortunately, I'm too much of a rookie to figure out how. The values are percentages formatted as strings, like '15.5%'.

for column in ['field1', 'field2', 'field3']:
    data[column] = data[column].str.rstrip('%').astype('float64') / 100
user1507844 asked May 20 '13 06:05


People also ask

How do I change multiple columns to float in pandas?

To convert the data type of multiple columns to float, use the DataFrame's apply() method with pd.to_numeric().

How do I convert all columns to float in pandas?

Alternatively, you can convert string columns to float using pandas.to_numeric(). For example, df['Discount'] = pd.to_numeric(df['Discount']) converts the 'Discount' column to float.

How do I change the datatype of multiple columns in pandas?

The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
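A minimal sketch of the to_numeric() approach applied to several columns at once. The frame and its column names (name, price, discount) are made up for illustration. Note that to_numeric() only parses plain numeric strings, so values with a trailing '%' like those in the question still need the sign stripped first.

```python
import pandas as pd

# Hypothetical sample data: numeric values stored as strings.
df = pd.DataFrame({
    "name": ["a", "b"],
    "price": ["1.5", "2"],
    "discount": ["0.25", "n/a"],
})

numeric_cols = ["price", "discount"]

# apply() calls pd.to_numeric once per column; errors="coerce" turns
# values that cannot be parsed (here "n/a") into NaN instead of raising.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Without errors="coerce", the unparseable "n/a" would raise a ValueError, which may be preferable when bad values should not pass silently.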


3 Answers

df.applymap(lambda x:float(x.rstrip('%'))/100)
waitingkuo answered Oct 10 '22 22:10
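The one-liner above converts every cell in the frame. When only some columns hold percentages, the loop from the question could be expressed with a column-wise apply() instead. This is a sketch, assuming the field1 and field2 names from the question; the label column and the sample values are invented. In recent pandas releases (2.1 and later), applymap() is deprecated in favor of DataFrame.map(), so the sketch avoids both and uses the vectorized .str accessor.

```python
import pandas as pd

# Hypothetical sample data: two percentage columns plus one text column.
data = pd.DataFrame({
    "label": ["x", "y"],
    "field1": ["15.5%", "3%"],
    "field2": ["100%", "0.5%"],
})

percent_cols = ["field1", "field2"]

# apply() passes each selected column (a Series) to the lambda, so the
# string operations run once per column rather than once per cell.
data[percent_cols] = data[percent_cols].apply(
    lambda col: col.str.rstrip("%").astype("float64") / 100
)

print(data)
```

The label column is left untouched because it is not part of the selection.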



Starting in pandas 0.11.1 (coming out this week), replace has a new option to replace with a regex, so this becomes possible:

In [13]: from pandas import DataFrame

In [14]: df = DataFrame('10.0%',index=range(100),columns=range(10))

In [15]: df.replace('%','',regex=True).astype('float')/100
Out[15]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
0    100  non-null values
1    100  non-null values
2    100  non-null values
3    100  non-null values
4    100  non-null values
5    100  non-null values
6    100  non-null values
7    100  non-null values
8    100  non-null values
9    100  non-null values
dtypes: float64(10)

It is also a bit faster than applymap:

In [16]: %timeit df.replace('%','',regex=True).astype('float')/100
1000 loops, best of 3: 1.16 ms per loop

In [18]: %timeit df.applymap(lambda x: float(x[:-1]))/100
1000 loops, best of 3: 1.67 ms per loop
Jeff answered Oct 10 '22 22:10



Answering a comment on the accepted answer: for specific columns, make sure you don't do it in place; assign the result back to the column instead.

df['Column1'] = df['Column1'].replace('%','',regex=True).astype('float')/100
nigel76 answered Oct 10 '22 23:10
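The same assign-back pattern should extend to several columns at once by selecting them with a list. A sketch, assuming made-up column names (Column2 and Other are not from the original answers) and sample values:

```python
import pandas as pd

# Hypothetical sample data: two percentage columns and one unrelated column.
df = pd.DataFrame({
    "Column1": ["10.0%", "25%"],
    "Column2": ["50%", "7.5%"],
    "Other": ["keep", "me"],
})

cols = ["Column1", "Column2"]

# Strip the percent sign with a regex replace, convert to float, and
# assign the result back to the selected columns (no inplace=True).
df[cols] = df[cols].replace("%", "", regex=True).astype("float") / 100

print(df.dtypes)
```

Assigning back matters because the selection df[cols] is a new object, so modifying it in place would not reliably update the original frame.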
