I have been following this tutorial, https://radimrehurek.com/data_science_python/, to apply a bag-of-words model to a list of tweets.
import csv
from textblob import TextBlob
import pandas

messages = pandas.read_csv('C:/Users/Suki/Project/Project12/newData1.csv', sep='\t', quoting=csv.QUOTE_NONE,
                           names=["label", "message"])

def split_into_tokens(message):
    message = unicode(message, encoding="utf8")  # convert bytes into proper unicode
    return TextBlob(message).words

messages.message.head().apply(split_into_tokens)
print(messages)
However, I keep getting the error below. I've checked that I am following the code on the site exactly, but the error keeps occurring.
Error
Traceback (most recent call last):
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 34, in <module>
    messages.message.head().apply(split_into_tokens)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\series.py", line 2510, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src\inference.pyx", line 1521, in pandas._libs.lib.map_infer
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 31, in split_into_tokens
    message = unicode(message, encoding="utf8") # convert bytes into proper unicode
NameError: name 'unicode' is not defined
Can someone offer advice on how I could rectify this?
Thanks
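unicode is a Python 2 built-in type, not a method. It does not exist in Python 3, where str is already a Unicode string, and your traceback shows the script running under Python 3.6. If you are not sure which version will run this code, you can add this at the beginning of your code so that the name unicode refers to str on Python 3: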
import sys
if sys.version_info[0] >= 3:
    unicode = str
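One caveat with the alias: on Python 3, pandas normally hands the function values that are already str, and calling str(message, encoding="utf8") on a str raises a TypeError. A minimal sketch of a way around that, assuming a hypothetical helper named to_text (it is not part of the tutorial, pandas, or TextBlob), is to decode only when the value is actually bytes:

```python
def to_text(message, encoding="utf8"):
    """Return message as text, decoding only when it is raw bytes.

    to_text is a hypothetical helper name used for illustration.
    On Python 2, plain str is bytes, so it gets decoded to unicode.
    On Python 3, str is already text and is returned unchanged.
    """
    if isinstance(message, bytes):
        return message.decode(encoding)
    return message


# Intended use inside the question's function (requires textblob):
#     return TextBlob(to_text(message)).words
```

With this in place, the unicode(...) call in split_into_tokens can be dropped, and the same script should run under either Python version.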