So I've essentially got this:
, pct_intl_student
2879 %
2880 9%
2881 NaN
2882 1%
2883 NaN
Name: pct_intl_student, Length: 2884, dtype: object
Would it be possible in some easy way to change all the strings with a percent sign in them to a decimal number? So basically this:
, pct_intl_student
2879 0
2880 0.09
2881 NaN
2882 0.01
2883 NaN
Name: pct_intl_student, Length: 2884, dtype: object
I need the NaN values to stay NaN, since they will be replaced with the average percentage afterwards. Rows that contain only the string '%' need to become 0.
I tried:
df['pct_intl_student'] = df['pct_intl_student'].str.rstrip('%').astype('float') / 100.0
But this raises this error:
ValueError: could not convert string to float:
So I'm kind of at a loss right now.
Hopefully someone can help me out.
Here is an example that better describes your issue:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["9%", "10%", np.nan, "%"]})
print(df)
#      a
# 0   9%
# 1  10%
# 2  NaN
# 3    %
You want the string % to turn into the value 0. One way is to change your code to use str.replace instead of str.rstrip. Here I replace each % with .0, so '9%' becomes '9.0' and a bare '%' becomes '.0', which parses as 0.0:
df['a'].str.replace(r'%', r'.0').astype('float') / 100.0
#0 0.09
#1 0.10
#2 NaN
#3 0.00
#Name: a, dtype: float64
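For context, the original attempt fails because stripping a bare '%' leaves an empty string, and float('') raises ValueError. Below is a minimal plain-Python sketch of that failure and of the per-value logic the question asks for. The helper name percent_to_fraction is hypothetical, made up for illustration, and not part of pandas.

```python
import math


def percent_to_fraction(value):
    """Hypothetical helper: '9%' -> 0.09, '%' -> 0.0, NaN stays NaN."""
    if isinstance(value, float) and math.isnan(value):
        return value  # leave missing values untouched
    digits = value.rstrip("%")
    # A bare '%' leaves an empty string, which float() would reject,
    # so treat it as zero explicitly.
    return float(digits) / 100.0 if digits else 0.0


# Reproduce the error from the question on a single value.
try:
    float("%".rstrip("%"))
    raised = False
except ValueError:
    raised = True
```

Something like df['a'].map(percent_to_fraction) would apply it row by row, though the vectorized string methods are usually the better fit for a large column.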
Update:
df['pct_intl_student'] = (pd.to_numeric(df['pct_intl_student'].str[:-1])
                            .div(100)
                            .mask(df['pct_intl_student'] == '%', 0))
Output:
pct_intl_student
2879 0.00
2880 0.09
2881 NaN
2882 0.01
2883 NaN
Use (note that a bare '%' becomes NaN here, not 0; the update above handles that case):
df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str.strip('%')).div(100)
Or:
df['pct_intl_student'] = pd.to_numeric(df['pct_intl_student'].str[:-1]).div(100)
Output:
2880 0.09
2881 NaN
2882 0.01
2883 NaN
Name: pct_intl_student, dtype: float64
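Another option that might work is to turn the empty strings left by rstrip into '0' before converting. This is only a sketch: the sample values below are made up to mirror the question, and the final fillna step is my assumption about the "average percentage" follow-up the question mentions.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the question; the values are made up for illustration.
s = pd.Series(["%", "9%", np.nan, "1%", np.nan], name="pct_intl_student")

stripped = s.str.rstrip("%")          # "%" -> "", "9%" -> "9", NaN stays NaN
stripped = stripped.replace("", "0")  # bare "%" rows should count as 0
result = pd.to_numeric(stripped).div(100)

# Assumed follow-up from the question: fill NaN with the column mean.
filled = result.fillna(result.mean())
```

Doing the replacement before pd.to_numeric keeps the real NaN rows distinct from the bare '%' rows, which is the distinction the question cares about.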