My project is sentiment analysis in Arabic: classifying text as either positive or negative. I am using NLTK and Python for this task, but when I enter tweets in Arabic, an error occurs:
>>> pos_tweets = [(' أساند كل عون أمن شريف', 'positive'),
('ما أحلى الثورة التونسية', 'positive'),
('أجمل طفل في العالم', 'positive'),
('الشعب يحرس', 'positive'),
('ثورة شعبنا هي ثورة الكـــرامة وثـــورة الأحــــرار', 'positive')]
Unsupported characters in input
How can I solve this problem?
Your problem comes from the IDLE shell. AFAIK, IDLE won't accept UTF-8 input in interactive mode.
I suggest you use an alternative (and better) shell such as DreamPie or PythonWin.
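Another way to sidestep the shell is to keep the tweets in a UTF-8 text file and load them from a script, so no Arabic text is typed into IDLE at all. Here is a minimal sketch, assuming one tweet per line; the helper name `load_labeled_tweets` and the temporary demo file are hypothetical and only there for illustration:

```python
import io
import os
import tempfile


def load_labeled_tweets(path, label):
    """Read one tweet per line from a UTF-8 file and pair each with `label`."""
    with io.open(path, encoding="utf-8") as handle:
        # Skip blank lines and trim surrounding whitespace.
        return [(line.strip(), label) for line in handle if line.strip()]


# Demo only: write two sample tweets to a temporary file, then load them back.
sample = u"ما أحلى الثورة التونسية\nأجمل طفل في العالم\n"
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(tmp_path, "w", encoding="utf-8") as handle:
    handle.write(sample)

pos_tweets = load_labeled_tweets(tmp_path, "positive")
os.remove(tmp_path)
```

In real use you would point `load_labeled_tweets` at your own file of positive tweets and at a second file for the negative ones.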
There is a simple hack that I usually use to get UTF-8 text into my Python code. I don't know why it works, but Python accepts the Unicode strings and runs the script smoothly after I add these lines:
#!/usr/local/bin/python
# -*- coding: UTF-8 -*-
pos_tweets = [(u' أساند كل عون أمن شريف', 'positive'),
(u'ما أحلى الثورة التونسية', 'positive'),
(u'أجمل طفل في العالم', 'positive'),
(u'الشعب يحرس', 'positive'),
(u'ثورة شعبنا هي ثورة الكـــرامة وثـــورة الأحــــرار', 'positive')]
for i in pos_tweets:
    print i[0], i[1]
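As for why the hack works, the likely explanation is this: in Python 2 the `coding` comment tells the interpreter how the source file is encoded, and the `u` prefix turns each literal into a Unicode string rather than a raw byte string. In Python 3 source files are UTF-8 by default and every `str` is Unicode, so neither is needed. A small sketch, assuming Python 3 and a file saved as UTF-8, shows the difference between the text and its encoded bytes:

```python
# Assumes Python 3 and that this file is saved as UTF-8.
pos_tweets = [
    ("ما أحلى الثورة التونسية", "positive"),
    ("الشعب يحرس", "positive"),
]

for text, label in pos_tweets:
    raw = text.encode("utf-8")           # the bytes as they would be stored on disk
    assert raw.decode("utf-8") == text   # decoding the bytes gives the text back
    # Each Arabic letter takes more than one byte in UTF-8,
    # so the byte count is larger than the character count.
    print(len(text), len(raw), label)
```

Run this as a script rather than pasting it into the IDLE shell, since the shell is where the original error came from.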