TypeError: decoding Unicode is not supported

Tags: python, unicode

I'm new to Python. I'm trying to get the parsed HTML decoded properly and stored in an SQLite database, but it just won't work :(

# coding: utf8
from pysqlite2 import dbapi2 as sqlite3
import urllib2
from bs4 import BeautifulSoup
from string import *


conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# # create a table
def createTable():
    cursor.execute("""CREATE TABLE characters
                      (rank INTEGER PRIMARY KEY, word TEXT, definition TEXT) 
                   """)


def insertChar(rank,word,definition):
    cursor.execute("""INSERT INTO characters (rank,word,definition)
                        VALUES (?,?,?)""",(rank,word,definition))


def main():
    createTable()

    # u = unicode("辣", "utf-8")

    # insertChar(1,u,"123123123")

    soup = BeautifulSoup(urllib2.urlopen('http://www.zein.se/patrick/3000char.html').read())
    # print (html_doc.prettify())   

    tables = soup.blockquote.table

    # print tables

    rows = tables.find_all('tr')
    result=[]
    for tr in rows:
        cols = tr.find_all('td')
        character = []
        x = cols[0].string 
        y = cols[1].string 
        z = cols[2].string 
        xx = unicode(x, "utf-8")
        yy = unicode(y , "utf-8")
        zz = unicode(z , "utf-8")
        insertChar(xx,yy,zz)

    conn.commit() 

main()

I keep getting the following error: TypeError: decoding Unicode is not supported

WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
Traceback (most recent call last):
  File "sqlitetestbed.py", line 64, in <module>
    main()
  File "sqlitetestbed.py", line 48, in main
    xx = unicode(x, "utf-8")


When I pass x, y and z to insertChar directly instead, I get:

Traceback (most recent call last):
  File "sqlitetestbed.py", line 52, in <module>
    main()
  File "sqlitetestbed.py", line 48, in main
    insertChar(x,y,z)
  File "sqlitetestbed.py", line 20, in insertChar
    VALUES (?,?,?)""",(rank,word,definition))
pysqlite2.dbapi2.IntegrityError: datatype mismatch

I'm probably doing something really stupid :( Please tell me what I'm doing wrong. Thanks!

asked Feb 25 '13 at 22:02 by user805981



1 Answer

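x is already unicode: in BeautifulSoup 4, cols[0].string returns a NavigableString, which is a subclass of unicode (just as documented). unicode() only accepts an encoding argument when its input is a byte string (str in Python 2), so unicode(x, "utf-8") raises TypeError: decoding Unicode is not supported. Drop the unicode(...) calls and pass x, y and z straight to insertChar.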
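To make the distinction concrete, here is a minimal sketch in Python 3. This is an assumption on top of the question, which uses Python 2: Python 3's str plays the role of Python 2's unicode, and bytes plays the role of Python 2's str. The helper to_text is hypothetical; it decodes only values that are still raw bytes and passes text through unchanged. Decoding text a second time fails in Python 3 too, with the message "decoding str is not supported".

```python
# Python 3 sketch: str is decoded text (Python 2's unicode), bytes is raw data (Python 2's str).


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, decoding it only if it is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # bytes -> str: this is a real decode
    return value  # already text (like BeautifulSoup's .string): nothing to decode


raw = "辣".encode("utf-8")  # undecoded UTF-8 bytes
text = "辣"                 # already-decoded text

print(to_text(raw) == text)   # True
print(to_text(text) == text)  # True

# Passing an encoding for something that is already text is the mistake from the question.
try:
    str(text, "utf-8")
except TypeError as exc:
    print(exc)  # decoding str is not supported
```

The isinstance check keeps the helper safe to call on either kind of value, which is the point of the answer: decode bytes once, at the boundary, and never decode text.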

answered Oct 15 '22 at 10:10 by wRAR
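The answer addresses the TypeError. The second traceback in the question (IntegrityError: datatype mismatch) has a different cause: rank is declared INTEGER PRIMARY KEY, and SQLite rejects a value for such a column that cannot be stored as an integer. A likely trigger, though the question does not confirm it, is a row whose first cell is not a number, such as a header row. The sketch below assumes Python 3's standard sqlite3 module instead of pysqlite2, and uses a hypothetical list of already-scraped rows in place of the BeautifulSoup table; the sample values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters (rank INTEGER PRIMARY KEY, word TEXT, definition TEXT)"
)

# Hypothetical stand-in for the scraped table: each tuple holds the text of one row's cells.
rows = [
    ("Rank", "Word", "Definition"),      # header-like row: rank cell is not a number
    ("1", "辣", "placeholder definition"),
    ("2", "的", "placeholder definition"),
    (None, "x", "y"),                    # .string is None when a cell has several children
]

for cells in rows:
    if len(cells) < 3:
        continue  # skip rows without three cells
    rank, word, definition = cells[:3]
    try:
        rank = int(rank)  # INTEGER PRIMARY KEY only accepts integer values
    except (TypeError, ValueError):
        continue  # skip rows whose rank cell is missing or not numeric
    conn.execute(
        "INSERT INTO characters (rank, word, definition) VALUES (?, ?, ?)",
        (rank, word, definition),
    )
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM characters").fetchone()[0])  # 2

# Inserting the header row's text as the rank fails just like in the question.
try:
    conn.execute("INSERT INTO characters VALUES (?, ?, ?)", ("Rank", "x", "y"))
except sqlite3.IntegrityError as exc:
    print(exc)  # datatype mismatch
```

Skipping non-numeric rows instead of raising is a design choice suited to scraping, where some table rows are layout rather than data.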