My project is to classify sentiment as positive or negative (sentiment analysis) in Arabic. I am using NLTK and Python for this task, but when I enter tweets in Arabic, an error occurs:
>>> pos_tweets = [(' أساند كل عون أمن شريف', 'positive'),
              ('ما أحلى الثورة التونسية', 'positive'),
              ('أجمل طفل في العالم', 'positive'),
              ('الشعب يحرس', 'positive'),
              ('ثورة شعبنا هي ثورة الكـــرامة وثـــورة الأحــــرار', 'positive')]
Unsupported characters in input
How can I solve this problem?
Your problem comes from the IDLE shell. AFAIK, IDLE won't accept UTF-8 input in interactive mode.
I suggest you use an alternative (and better) shell such as DreamPie or PythonWin.
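If you are not sure whether the shell is the culprit, one quick check is to print the encodings Python reports for its standard streams. This is only a rough sketch: `describe_encodings` is a helper name made up for illustration, the output depends on your shell and Python version, and the stream encodings can be None when no terminal is attached.

```python
import sys

def describe_encodings():
    """Return the default string encoding and the encodings of stdin/stdout."""
    return {
        'default': sys.getdefaultencoding(),
        # getattr guards against shells that replace the streams with
        # objects lacking an 'encoding' attribute.
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
    }

for name, value in sorted(describe_encodings().items()):
    print('%s: %s' % (name, value))
```

If stdin or stdout reports something other than a UTF-8 codec, that is a hint the shell cannot pass Arabic text through unchanged.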
There is a simple hack that I usually use to get UTF-8 text into my Python code. I don't know why it works, but Python accepts the Unicode strings and runs the script smoothly once I add the encoding declaration at the top of the file and prefix each Arabic string with u:
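#!/usr/local/bin/python
# -*- coding: UTF-8 -*-
from __future__ import print_function  # print(...) then behaves the same on Python 2 and 3

# The u prefix makes each tweet a unicode string rather than a byte string.
pos_tweets = [(u' أساند كل عون أمن شريف', 'positive'),
              (u'ما أحلى الثورة التونسية', 'positive'),
              (u'أجمل طفل في العالم', 'positive'),
              (u'الشعب يحرس', 'positive'),
              (u'ثورة شعبنا هي ثورة الكـــرامة وثـــورة الأحــــرار', 'positive')]

# Print each tweet followed by its label.
for tweet, label in pos_tweets:
    print(tweet, label)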
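As for why this works, my understanding is that the coding line tells the interpreter how to decode the bytes of the source file, and the u prefix makes each literal a unicode string (a sequence of characters) instead of a raw byte string. A small sketch of the difference, assuming the file is saved as UTF-8; the variable names are just for illustration:

```python
# -*- coding: utf-8 -*-
# Assumes this file is saved as UTF-8, matching the declaration above.

tweet = u'ما أحلى الثورة التونسية'  # unicode string: a sequence of characters
raw = tweet.encode('utf-8')  # byte string: what is actually stored on disk

print(len(tweet))  # number of characters
print(len(raw))    # number of bytes; larger, since each Arabic letter takes 2 bytes in UTF-8

# Decoding the bytes with the right codec recovers the original text.
assert raw.decode('utf-8') == tweet
```

Working with unicode strings throughout, and encoding only when writing to a file or terminal, tends to avoid most of these errors.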