
Pandas convert object column to str - column contains unicode, float etc

I have a pandas DataFrame with a column whose dtype shows as object, but when I try to convert it to string,

df['column'] = df['column'].astype('str')

a UnicodeEncodeError gets thrown: *** UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

My next approach was to handle the encoding part: df['column'] = filtered_df['column'].apply(lambda x: x.encode('utf-8').strip())

But that gives the following error: *** AttributeError: 'float' object has no attribute 'encode'
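The AttributeError suggests the object column holds more than text, for example numbers or missing values (NaN is a float). A minimal sketch for checking which Python types are actually in the column, using made-up sample data rather than the real frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: text, a number, and a missing value (NaN is a float)
df = pd.DataFrame({'column': ['Thank you :)', 'uh ™ oh', 123, np.nan]})

# Count the Python type of each value stored in the object column
type_counts = df['column'].map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

Any type other than str in the counts is a value that has no encode method and would trigger the error above.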

What's the best approach to convert this column to string?

Sample of strings in the column:

Thank you :)
Thank You !!!
responsibilities/assigned job.
add-semi-colons asked Jan 09 '18 22:01




1 Answer

I had the same problem in Python 2.7 when trying to run a script that was originally written for Python 3. In Python 2.7, converting to str encodes the text as ASCII by default, which fails on the non-ASCII characters in your data. This can be reproduced with a simple example:

import pandas as pd
df = pd.DataFrame({'column': ['asdf', u'uh ™ oh', 123]})
df['column'] = df['column'].astype('str')

Results in:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 3: ordinal not in range(128)

Instead, you can specify unicode:

df['column'] = df['column'].astype('unicode')

Verify that the number has been converted to a string:

df['column'][2]

This outputs u'123', so it has been converted to a unicode string. The special character ™ has been properly preserved as well.
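For comparison, here is a minimal sketch for Python 3, where str is already Unicode and the ASCII encoding step does not apply. It assumes you want missing values left as NaN rather than turned into text; the helper name to_text and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing text, a number, and a missing value
df = pd.DataFrame({'column': ['asdf', 'uh ™ oh', 123, np.nan]})


def to_text(value):
    """Return str(value), leaving missing values (NaN/None) untouched."""
    return value if pd.isna(value) else str(value)


# Convert element by element so that non-string values are handled too
df['column'] = df['column'].map(to_text)
print(df['column'].tolist())
```

Converting element by element avoids calling encode on floats, which is what raised the AttributeError in the question.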

Nigel answered Oct 20 '22 18:10