Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unicode Hell in Pyramid: MySQL -> SQLAlchemy -> Pyramid -> JSON

Background

I'm in a real mess with unicode and Python. It seems to be a common angst and I've tried using other solutions out there but I just can't get my head around it.

Setup

MySQL Database Setup

  • collation_database: utf8_general_ci
  • character_set_database: utf8

SQLAlchemy Model

class Product(Base):
    id = Column('product_id', Integer, primary_key=True)
    name = Column('product_name', String(64)) #Tried using Unicode() but didn't help

Pyramid View

@view_config(renderer='json', route_name='products_search')
def products_search(request):
    json_products = []
    term = "%%%s%%" % request.params['term']

    products = dbsession.query(Product).filter(Product.name.like(term)).all()

    for prod in products:
        json_prod = {'id': prod.id, 'label': prod.name, 'value': prod.name, 'sku': prod.sku, 'price': str(prod.price[0].price)}
        json_products.append(json_prod)

    return json_products

Problem

I get encoding errors reported from the json module (which is called as its the renderer for this route) like so:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 37: invalid start byte

The culprit is a "-" (dash symbol) in the prod.name value. Full stack trace here. If the returned products don't have a "-" in then it all works fine!

Tried

I've tried encoding, decoding with various types before returning the json_products variable.

like image 526
RustyFluff Avatar asked Aug 21 '11 15:08

RustyFluff


1 Answers

The above comment is right, but more specifially, you can replace 'label': prod.name with 'label': prod.name.decode("cp1252"). You should probably also do this for all of the strings in your json_prod dictionary since it's likely you'll see cp1252 encoded characters elsewhere in the real-world use of your application.

On that note, depending on the source of these strings and how widely that source is used in your app, you may run into such problems elsewhere in your app and generally when you least expect it. To investigate further you may want to figure What the source for these strings is and if you can do the decoding/re-encoding at a lower level to correct most future problems with this.

like image 193
Ross Patterson Avatar answered Oct 23 '22 03:10

Ross Patterson