 

UnicodeWarning: special characters in Tkinter

I have written a Tkinter program (Python 2.7), a Scrabble helper for Norwegian. Norwegian uses some special characters (æøå), so my word list (ordliste) contains words with those characters.

When I run my function finnord(c*), it returns 'cd'. I use entry.get() to read the word that I pass to the function.

My problem is with the encoding of entry.get(). My local encoding is UTF-8, but I get a UnicodeWarning whenever I type special characters into my entry box and match them against my word list.

Here is my output.

Warning (from warnings module):
  File "C:\pythonprog\scrabble\feud.py", line 46
if s not in liste and s in ordliste:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

When I type the following in my shell:

> ordinn.get()
u'k\xf8**e'
> ordinn.get().encode('utf-8')
'k\xc3\xb8**e'
> print ordinn.get()
kø**e
> print ordinn.get().encode('utf-8')
kø**e

Does anyone know why I can't match ordinn.get() (the entry) against my word list?

Martol1ni asked Nov 07 '11 12:11


1 Answer

I can reproduce the error this way:

% python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'k\xf8**e' in [u'k\xf8**e']
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

So perhaps s is a str object, and liste or ordliste contains unicode, or (as eryksun points out in the comments) vice versa. The solution is to decode the str objects (most likely with the utf-8 codec) to make them unicode.
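
A minimal sketch of that normalization, assuming the byte strings are UTF-8 encoded. The helper name to_unicode is hypothetical (it is not part of the question's code), and the snippet is written so that it runs on Python 2.7 and also on Python 3:

```python
def to_unicode(value, encoding='utf-8'):
    """Return value as Unicode text, decoding byte strings if needed.

    to_unicode is a hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

byte_word = b'k\xc3\xb8**e'  # UTF-8 encoded byte string
text_word = u'k\xf8**e'      # the same word as Unicode text

# Mixed comparison: treated as unequal (Python 2 also emits the UnicodeWarning).
mixed_match = byte_word in [text_word]

# After decoding, both sides are Unicode and the lookup succeeds.
decoded_match = to_unicode(byte_word) in [text_word]
```

The point is to decode once, at the boundary where the data enters the program, so that every later comparison is unicode against unicode.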

If that does not help, please print out and post the output of

print(repr(s))
print(repr(liste))
print(repr(ordliste))

I believe the problem can be avoided by converting all strings to unicode.

  1. When you generate ordliste from norsk.txt, use codecs.open('norsk.txt','r','utf-8'):

    import codecs
    with codecs.open('norsk.txt','r','utf-8') as fil:
        ordliste = [line.rstrip(u'\n') for line in fil]
    
  2. Convert all user input to unicode as soon as possible:

    def get_unicode(widget):
        streng = widget.get()
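        try:
            # A str (ASCII-only input) decodes cleanly to unicode.
            streng = streng.decode('utf-8')
        except UnicodeEncodeError:
            # A unicode object with non-ASCII characters raises here;
            # it is already unicode, so keep it unchanged.
            pass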
        return streng
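
To check step 1 in isolation, here is a small sketch that reads a UTF-8 word list into a set of unicode strings. It uses io.open rather than codecs.open (both return unicode lines on Python 2.7), and the function name load_wordlist and the temporary file standing in for norsk.txt are assumptions made for the demo:

```python
import io
import os
import tempfile

def load_wordlist(path, encoding='utf-8'):
    """Read one word per line and return a set of Unicode strings."""
    with io.open(path, 'r', encoding=encoding) as fil:
        return set(line.strip() for line in fil if line.strip())

# Throwaway file standing in for norsk.txt (assumed to be UTF-8 encoded).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as tmp:
    tmp.write(u'k\xf8lle\nk\xe5pe\n'.encode('utf-8'))
try:
    ordliste = load_wordlist(path)
finally:
    os.remove(path)

# A Unicode lookup now matches without any UnicodeWarning.
found = u'k\xf8lle' in ordliste
```

If the real file turns out to be saved in another encoding (for example cp1252), pass that encoding instead of relying on the default.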
    

So perhaps try this:

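# -*- coding: utf-8 -*-
# The declaration above is required because this file contains non-ASCII literals.
import Tkinter as tk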
import tkMessageBox
import codecs
import itertools
import sys

alfabetet = (u"abcdefghijklmnopqrstuvwxyz"
             u"\N{LATIN SMALL LETTER AE}"
             u"\N{LATIN SMALL LETTER O WITH STROKE}"
             u"\N{LATIN SMALL LETTER A WITH RING ABOVE}")

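# sys.stdin.encoding can be None (e.g. when stdin is not a console);
# fall back to UTF-8, which assumes norsk.txt is saved as UTF-8.
encoding = sys.stdin.encoding or 'utf-8'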
with codecs.open('norsk.txt','r',encoding) as fil:
    ordliste = set(line.rstrip(u'\n') for line in fil)

def get_unicode(widget):
    streng = widget.get()
    if isinstance(streng,str):
        streng = streng.decode('latin-1')
    return streng

def siord():
    alfa=lagtabell()
    try:
        streng = get_unicode(ordinn)
        ordene=finnord(streng,alfa)
        if len(ordene) == 0:
            # There are no words that match
            tkMessageBox.showinfo('Dessverre..','Det er ingen ord som passer...')
        else:
            # Done: The words that fit the pattern
            tkMessageBox.showinfo('Ferdig',
                'Ordene som passer er:\n'+ordene.encode('utf-8'))
    except Exception as err:
        # There has been a mistake .. Check your word
        print(repr(err))
        tkMessageBox.showerror('ERROR','Det har skjedd en feil.. Sjekk ordet ditt.')

def finnord(streng,alfa): 
    liste = set()
    for substitution in itertools.permutations(alfa,streng.count(u'*')):
        s = streng
        for ch in substitution:
            s = s.replace(u'*',ch,1)
        if s in ordliste:
            liste.add(s)
    liste = [streng]+list(liste)
    return u','.join(liste)+u'.'

def lagtabell():
    tinbox = get_unicode(bokstinn)
    if not tinbox.isalpha():
        alfa = alfabetet
    else:
        alfa = tinbox.lower()
    return alfa

root = tk.Tk()
root.title('FeudHjelper av Martin Skow Røed')
root.geometry('400x250+450+200')
# root.iconbitmap('data/ikon.ico')

skrift1 = tk.Label(root,
                text = '''\
Velkommen til FeudHjelper. Skriv inn de bokstavene du har, og erstatt ukjente med *.
F. eks: sl**ge
Det er kun lov til å bruke tre stjerner, altså tre ukjente bokstaver.''',
                font = ('Verdana',8), wraplength=350)
skrift1.pack(pady = 5)

ordinn = tk.StringVar(None)
tekstboks = tk.Entry(root, textvariable = ordinn)
tekstboks.pack(pady = 5)

# What letters do you have? Eg "ahneki". Leave blank here if you want all the words.
skrift2 = tk.Label(root, text = '''Hvilke bokstaver har du? F. eks "ahneki". La det være blankt her hvis du vil ha alle ordene.''',
                font = ('Verdana',8), wraplength=350)
skrift2.pack(pady = 10)

bokstinn = tk.StringVar(None)
tekstboks2 = tk.Entry(root, textvariable = bokstinn)
tekstboks2.pack()

knapp = tk.Button(text = 'Finn ord!', command = siord)
knapp.pack(pady = 10)
root.mainloop()
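
The wildcard search in finnord can also be tried without the GUI. This is only a sketch of the same idea, with a hypothetical function name (match_pattern) and a tiny made-up word set in place of norsk.txt:

```python
import itertools

def match_pattern(pattern, letters, words):
    """Fill each '*' in pattern with a distinct tile from letters
    (each tile used at most once) and return the sorted matches."""
    hits = set()
    for substitution in itertools.permutations(letters, pattern.count(u'*')):
        candidate = pattern
        for ch in substitution:
            # Replace only the first remaining '*' with this tile.
            candidate = candidate.replace(u'*', ch, 1)
        if candidate in words:
            hits.add(candidate)
    return sorted(hits)

# Made-up word set for the demo; everything is Unicode text.
words = set([u'k\xf8lle', u'k\xe5pe', u'kule'])

two_wildcards = match_pattern(u'k\xf8**e', u'lxl', words)
one_wildcard = match_pattern(u'k*le', u'ua', words)
```

Because the pattern, the tiles and the word set are all unicode, the membership test behaves the same for æøå as for plain ASCII letters.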
unutbu answered Nov 09 '22 06:11