I have been following this site, https://radimrehurek.com/data_science_python/, to apply bag of words on a list of tweets.
import csv
from textblob import TextBlob
import pandas
messages = pandas.read_csv('C:/Users/Suki/Project/Project12/newData1.csv', sep='\t', quoting=csv.QUOTE_NONE,
                               names=["label", "message"])
def split_into_tokens(message):
    message = unicode(message, encoding="utf8")  # convert bytes into proper unicode
    return TextBlob(message).words
messages.message.head().apply(split_into_tokens)
print (messages)
However, I keep getting this error. I've checked that I am following the code on the site, but the error keeps arising.
Error
Traceback (most recent call last):
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 34, in <module>
    messages.message.head().apply(split_into_tokens)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\series.py", line 2510, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src\inference.pyx", line 1521, in pandas._libs.lib.map_infer
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 31, in split_into_tokens
    message = unicode(message, encoding="utf8")  # convert bytes into proper unicode
NameError: name 'unicode' is not defined
Can someone offer advice on how I could rectify this?
Thanks
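unicode is a built-in type in Python 2, not a method, and it does not exist in Python 3, where str already holds Unicode text. Your traceback shows Python 3.6, which is why you get the NameError. If you are not sure which version will run this code, you can add this at the beginning of your code so that the name unicode refers to str on Python 3: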
import sys
if sys.version_info[0] >= 3:
    unicode = str
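One caveat with the alias above: on Python 3, pandas.read_csv already returns str values, and calling str(message, encoding="utf8") on something that is already a str raises "TypeError: decoding str is not supported". A minimal sketch of a safer approach is to decode only when the value is actually bytes. The helper name to_text is hypothetical, and the whitespace split is only a stand-in for TextBlob(message).words so the example runs without TextBlob installed:

```python
def to_text(message, encoding="utf8"):
    """Return message as a Python 3 str, decoding only if it is bytes."""
    if isinstance(message, bytes):
        # Raw bytes need an explicit decode step.
        return message.decode(encoding)
    # Already text (or another type such as a float): no decoding needed.
    return str(message)


def split_into_tokens(message):
    # Assumption: str.split() stands in for TextBlob(message).words here.
    return to_text(message).split()


print(split_into_tokens(b"good morning"))
print(split_into_tokens("already text"))
```

In the original script, the same idea would mean replacing the unicode(...) line with a call such as to_text(message) before passing the result to TextBlob.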