How to convert pandas dataframe to unicode?
`messages = pandas.read_csv('data/SMSSpamCollection', sep='\t', quoting=csv.QUOTE_NONE, names=["label", "message"])

def split_into_tokens(message):
    message = unicode(message, 'utf8')  # convert bytes into proper unicode
    return TextBlob(message).words

messages.head().apply(split_into_tokens(messages))`
It gives this error:
Traceback (most recent call last):
  File "minor.py", line 46, in <module>
    messages.head().apply(split_into_tokens(messages))
  File "minor.py", line 42, in split_into_tokens
    message = unicode(message, 'utf8') # convert bytes into proper unicode
TypeError: coercing to Unicode: need string or buffer, DataFrame found
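df.x.str.encode('utf-8')
should fix your problem, where df is your DataFrame and x is the text column. Note that str.encode turns unicode into bytes; to go from bytes to unicode, the counterpart is df.x.str.decode('utf-8').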
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.encode.html
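To make the column-wise approach concrete, here is a minimal sketch in Python 3 syntax. The Series contents are made up for illustration and are not from the question's data; only Series.str.decode and Series.str.encode are real pandas calls here.

```python
import pandas as pd

# Hypothetical byte strings standing in for a raw text column.
raw = pd.Series([b"caf\xc3\xa9", b"ok"])

# Decode every element from UTF-8 bytes to text in one vectorized call.
decoded = raw.str.decode("utf-8")

# Encode goes the other way: text back to UTF-8 bytes.
encoded = decoded.str.encode("utf-8")

print(decoded.tolist())
print(encoded.tolist())
```

Both calls work on a single column (a Series), so they avoid passing the whole DataFrame to a function that expects one string.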
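Change the line
messages.head().apply(split_into_tokens(messages))
to
messages.message.head().apply(split_into_tokens)
When you use 'apply', pass the function itself, not the result of calling it. Your code calls split_into_tokens(messages) first, which hands the whole DataFrame to unicode() and raises the TypeError. Select the message column as well, because DataFrame.apply passes each column to the function as a Series, while Series.apply passes one value at a time.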
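Here is a small self-contained sketch of applying the tokenizer to a single column, written for Python 3. The two-row DataFrame is a made-up stand-in for the SMS file, and a plain whitespace split stands in for TextBlob(message).words so the example runs without TextBlob; both are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the data loaded with read_csv in the question.
messages = pd.DataFrame({
    "label": ["ham", "spam"],
    "message": [b"Go until jurong point", b"Free entry in 2 a wkly comp"],
})

def split_into_tokens(message):
    # Python 3 equivalent of unicode(message, 'utf8'): decode bytes to text.
    if isinstance(message, bytes):
        message = message.decode("utf8")
    # Whitespace split used here in place of TextBlob(message).words.
    return message.split()

# Pass the function object, not a call; Series.apply invokes it once per value.
tokens = messages["message"].head().apply(split_into_tokens)
print(tokens.tolist())
```

Because apply is called on the message column, each call receives one message rather than a Series or the whole DataFrame.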