
Convert percent string to float in pandas read_csv

Tags: python, pandas

Is there a way to convert values like '34%' directly to int or float when using read_csv in pandas? I want '34%' to be read directly as 0.34.

  1. Using this in read_csv did not work:

    read_csv(..., dtype={'col':np.float})

  2. After loading the CSV as 'df', this also did not work, failing with the error "invalid literal for float(): 34%":

    df['col'] = df['col'].astype(float)

  3. I ended up using this, which works but is long-winded:

    df['col'] = df['col'].apply(lambda x: np.nan if x in ['-'] else x[:-1]).astype(float)/100

KieranPC asked Sep 04 '14 15:09



2 Answers

You were very close with your df attempt. Try changing:

    df['col'] = df['col'].astype(float)

to:

    df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
    #                     ^ use str funcs to elim '%'     ^ divide by 100
    # could also be:     .str[:-1].astype(...

Pandas supports Python's string processing functions on string columns. Just precede the string function you want with .str and see if it does what you need. (This includes string slicing, too, of course.)

Above we utilize .str.rstrip() to get rid of the trailing percent sign, then we divide the array in its entirety by 100.0 to convert from percentage to actual value. For example, 45% is equivalent to 0.45.

Although .str.rstrip('%') could also just be .str[:-1], I prefer to explicitly remove the '%' rather than blindly removing the last char, just in case...
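To illustrate the "just in case" above, here is a minimal sketch on made-up data in which one value has no percent sign. The Series contents and variable names are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Made-up data: the last value is missing its trailing '%'.
s = pd.Series(['34%', '50%', '7'])

# rstrip('%') only removes percent signs, so '7' survives intact.
stripped = s.str.rstrip('%').astype(float) / 100.0

# Slicing blindly drops the last character, so '7' becomes ''.
sliced = s.str[:-1]

print(stripped.tolist())
print(sliced.tolist())
```

With the sliced version, a later .astype(float) would fail on the empty string, whereas the rstrip version still converts every value.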

Gary02127 answered Sep 28 '22 09:09


You can define a custom function to convert your percents to floats at read_csv() time:

    import io
    import pandas as pd

    # dummy data
    temp1 = """index col
    113 34%
    122 50%
    123 32%
    301 12%"""

    # Custom function taken from https://stackoverflow.com/questions/12432663/what-is-a-clean-way-to-convert-a-string-percent-to-a-float
    def p2f(x):
        return float(x.strip('%'))/100

    # Pass to `converters` param as a dict...
    # (raw string for the regex separator avoids an invalid-escape warning)
    df = pd.read_csv(io.StringIO(temp1), sep=r'\s+', index_col=[0], converters={'col': p2f})
    df

             col
    index
    113     0.34
    122     0.50
    123     0.32
    301     0.12

    # Check that dtypes really are floats
    df.dtypes

    col    float64
    dtype: object

My percent to float code is courtesy of ashwini's answer: What is a clean way to convert a string percent to a float?
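The question's own workaround also maps '-' to NaN. A possible sketch of a converter that does the same at read_csv() time follows; the name p2f_or_nan and the sample data are hypothetical, and it assumes '-' or an empty field is the only placeholder for a missing value.

```python
import io
import math

import pandas as pd

def p2f_or_nan(x):
    # Hypothetical variant of p2f: treat '-' or an empty field as missing.
    x = x.strip()
    if x in ('', '-'):
        return float('nan')
    return float(x.rstrip('%')) / 100

# Made-up data with one '-' placeholder.
data = """index col
113 34%
122 -
123 32%"""

df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0,
                 converters={'col': p2f_or_nan})
print(df)
```

Converters receive each cell as a string, so the placeholder check has to happen before the call to float().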

EdChum answered Sep 28 '22 09:09