 

Change column with string of percent to float pandas dataframe

Tags:

python

pandas

So I've essentially got this:

,    pct_intl_student
2879      %
2880     9%
2881    NaN
2882     1%
2883    NaN
Name: pct_intl_student, Length: 2884, dtype: object

Would it be possible in some easy way to change all the strings with a percent sign in them to a decimal number? So basically this:

,    pct_intl_student
2879    0
2880    0.09
2881    NaN
2882    0.01
2883    NaN
Name: pct_intl_student, Length: 2884, dtype: object

I do need the NaN values to stay in place, because they will be replaced with the average percentage afterwards. In other words, every NaN value should stay NaN, and the rows that contain only the string '%' need to become 0.
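As a minimal sketch (not part of the original question), the two requirements can be written as boolean masks over the column. The five sample rows are reconstructed from the printout above, and the names is_missing and is_bare_pct are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the printout in the question (assumption).
s = pd.Series(["%", "9%", np.nan, "1%", np.nan],
              index=[2879, 2880, 2881, 2882, 2883],
              name="pct_intl_student")

is_missing = s.isna()     # genuinely missing cells: must stay NaN
is_bare_pct = s.eq("%")   # cells that are exactly "%": must become 0 (NaN compares as False)

print(pd.DataFrame({"value": s, "is_missing": is_missing, "is_bare_pct": is_bare_pct}))
```

Because NaN never compares equal to '%', the two masks do not overlap, so a correct conversion leaves every is_missing row as NaN and sets every is_bare_pct row to 0.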

I tried:

df['pct_intl_student'] = df['pct_intl_student'].str.rstrip('%').astype('float') / 100.0

But this raises this error:

ValueError: could not convert string to float:
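The error comes from the row that holds only '%': rstrip('%') leaves an empty string there, and an empty string cannot be converted to a float. A small reproduction, assuming four sample values taken from the question (the names stripped and error_message are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(["%", "9%", np.nan, "1%"], name="pct_intl_student")

# rstrip("%") turns "9%" into "9", but the bare "%" becomes an empty string.
stripped = s.str.rstrip("%")
print(stripped.tolist())

error_message = None
try:
    stripped.astype("float")
except ValueError as exc:
    # An empty string is not a valid float, so the whole astype call fails.
    error_message = str(exc)
print("ValueError:", error_message)
```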

So I'm kind of at a loss right now.

Hopefully someone can help me out.

PEREZje asked Jun 04 '18 17:06




2 Answers

Here is an example that better describes your issue:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["9%", "10%", np.nan, "%"]})
print(df)
#     a
#0   9%
#1  10%
#2  NaN
#3    %

You want the string '%' to turn into the value 0.

One way is to change your code to use str.replace instead of str.rstrip. Here I replace each % with .0, so "9%" becomes "9.0" and a bare "%" becomes ".0", which converts to 0.0:

df['a'].str.replace('%', '.0', regex=False).astype(float) / 100.0
#0    0.09
#1    0.10
#2     NaN
#3    0.00
#Name: a, dtype: float64
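A self-contained version of this approach, run on the example frame above, that also prints the intermediate strings (the name as_text is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["9%", "10%", np.nan, "%"]})

# Replace each literal "%" with ".0": "9%" -> "9.0", "10%" -> "10.0", "%" -> ".0".
# regex=False keeps the pattern literal on every pandas version.
as_text = df["a"].str.replace("%", ".0", regex=False)
print(as_text.tolist())

# ".0" parses as 0.0, NaN passes through unchanged, then scale percent to fraction.
df["a"] = as_text.astype(float) / 100.0
print(df)
```

This trick assumes the numbers in front of '%' are whole numbers. A value such as '9.5%' (hypothetical, not in the question's data) would become '9.5.0', which cannot be parsed as a float.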
pault answered Nov 14 '22 21:11


Update (this version also turns rows containing only '%' into 0):

df['pct_intl_student'] = (pd.to_numeric(df['pct_intl_student'].str[:-1])
                            .div(100)
                            .mask(df['pct_intl_student'] == '%', 0))

Output:

      pct_intl_student
2879              0.00
2880              0.09
2881               NaN
2882              0.01
2883               NaN
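A runnable version of the update, applied to five rows reconstructed from the question's printout (the sample data is an assumption about the real column):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"pct_intl_student": ["%", "9%", np.nan, "1%", np.nan]},
    index=[2879, 2880, 2881, 2882, 2883],
)

col = df["pct_intl_student"]
df["pct_intl_student"] = (
    pd.to_numeric(col.str[:-1])   # "9%" -> 9, "%" -> "" (parsed as NaN), NaN stays NaN
    .div(100)                     # percent -> fraction
    .mask(col == "%", 0)          # cells that were exactly "%" become 0
)
print(df)
```

The mask is evaluated against the original strings, so only cells that were exactly '%' are set to 0, while the genuinely missing cells stay NaN for the later fill with the average.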

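Original answer (with this version, a row containing only '%' is stripped to an empty string, which pd.to_numeric turns into NaN rather than 0):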

df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str.strip('%')).div(100)

Or

df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str[:-1]).div(100)

Output:

2880    0.09
2881     NaN
2882    0.01
2883     NaN
Name: pct_intl_student, dtype: float64
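A sketch contrasting the strip-only version with one possible fix, again on sample rows reconstructed from the question. The fix maps the exact value '%' to '0%' with Series.replace before stripping; it is an alternative to the .mask step, not something from the answer. errors="coerce" is added to the strip-only line as an assumption so the sketch cannot raise on the empty string (the names strip_only and fixed are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(["%", "9%", np.nan, "1%", np.nan],
              index=[2879, 2880, 2881, 2882, 2883])

# Strip-only: the bare "%" becomes "" and ends up NaN,
# indistinguishable from the genuinely missing rows.
strip_only = pd.to_numeric(s.str.strip("%"), errors="coerce").div(100)

# Variant: Series.replace with a scalar matches whole cells only,
# so just the bare "%" becomes "0%"; "9%" and "1%" are untouched.
fixed = pd.to_numeric(s.replace("%", "0%").str.strip("%")).div(100)

print(pd.DataFrame({"strip_only": strip_only, "fixed": fixed}))
```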
Scott Boston answered Nov 14 '22 22:11