
Trouble converting string to float in python

Tags: python, pandas

I am fairly new to Python, so forgive me this simple question. I'm trying to convert a string column to float. Here is a sample of the data:

0     10.65%
1      7.90%

When I try:

 df['int_rate'] = df['int_rate'].astype('float')

I get:

ValueError: could not convert string to float: '13.75%'

When I try:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

And check my data, I get:

0     None
1     None

Any ideas what I'm doing wrong? Many thanks!

Minsky asked Sep 20 '17 12:09




2 Answers

You can use Series.replace with the parameter regex=True to replace substrings:

df = pd.DataFrame({'int_rate':['10.65%','7.90%']})
df['int_rate'] = df['int_rate'].replace("%","", regex=True).astype(float)
print (df)
   int_rate
0     10.65
1      7.90

Or Series.str.replace:

df['int_rate'] = df['int_rate'].str.replace("%","").astype(float)
print (df)
   int_rate
0     10.65
1      7.90

Or Series.str.rstrip:

df['int_rate'] = df['int_rate'].str.rstrip("%").astype(float)
print (df)
   int_rate
0     10.65
1      7.90

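See the difference without regex=True: Series.replace then matches only whole cell values, so only a cell that is exactly "%" is replaced: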

df = pd.DataFrame({'int_rate':['10.65%','7.90%', '%']})

df['int_rate_subs'] = df['int_rate'].replace("%","", regex=True)
df['int_rate_val'] = df['int_rate'].replace("%","")
print (df)
  int_rate int_rate_subs int_rate_val
0   10.65%         10.65       10.65%
1    7.90%          7.90        7.90%
2        %                           
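
The "%"-only row in the example above would break a plain astype(float) after stripping, because an empty string is not a valid float. A minimal sketch, assuming you would rather turn such cells into NaN than raise, using pd.to_numeric with errors='coerce' (the column name int_rate_float is only illustrative):

```python
import pandas as pd

# Same sample data as above, including a cell that holds only "%".
df = pd.DataFrame({'int_rate': ['10.65%', '7.90%', '%']})

# Strip the trailing "%" first; the "%"-only cell becomes an empty string.
stripped = df['int_rate'].str.rstrip('%')

# Parse to float. With errors='coerce', cells that cannot be parsed
# end up as NaN, whereas astype(float) raises on the empty string.
df['int_rate_float'] = pd.to_numeric(stripped, errors='coerce')

print(df)
print(df['int_rate_float'].dtype)  # float64
```

After this, int_rate_float holds 10.65, 7.90 and NaN, and the original strings stay available in int_rate for checking.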
jezrael answered Oct 20 '22 08:10


As you guessed, ValueError: could not convert string to float: '13.75%' indicates that the % character blocks the conversion.

Now when you try to remove it:

df['int_rate'] = df['int_rate'].replace("%","", inplace=True) 

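You set inplace=True in your replacement, which as the name suggests modifies the Series in place, so the replace() call returns None. You then assign that None back to df['int_rate'] and end up with a column containing only None values. Note also that without regex=True, replace() only matches whole cell values, so the "%" inside "10.65%" is left untouched. You should either do:

df['int_rate'] = df['int_rate'].replace("%","", regex=True) 

or

df['int_rate'].replace("%","", regex=True, inplace=True)

The first form is the safer choice: in recent pandas versions with copy-on-write enabled, an in-place call on a column selected with df['int_rate'] no longer updates df. Either way, the column still holds strings afterwards, so finish with .astype(float).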
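
To see the inplace behaviour in isolation, here is a small sketch on a standalone Series (the variable names s, result and converted are made up for illustration). It assumes a pandas version where in-place methods return None, as described above; treat that return value as an assumption on newer releases.

```python
import pandas as pd

s = pd.Series(['10.65%', '7.90%'], name='int_rate')

# With inplace=True the method mutates s itself; on the pandas versions
# discussed here the call returns None, so assigning the result anywhere
# stores None rather than the cleaned data.
result = s.replace('%', '', regex=True, inplace=True)
print(result)
print(s.tolist())          # ['10.65', '7.90']

# Without inplace, a new Series is returned and can be chained or assigned.
converted = pd.Series(['10.65%', '7.90%']).replace('%', '', regex=True).astype(float)
print(converted.tolist())  # [10.65, 7.9]
```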
Guillaume answered Oct 20 '22 06:10