How can I convert a dict to a unicode JSON string?

This doesn't appear to be possible with the standard library json module. json.dumps automatically escapes all non-ASCII characters and then encodes the string to ASCII. I can tell it not to escape non-ASCII characters, but then it crashes when it tries to convert the output to ASCII.

The problem is - I don't want ASCII! I just want my JSON string back as a unicode (or UTF-8) string. Are there any convenient ways to do that?

Here's an example to demonstrate what I want:

d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d, output_encoding='utf8')
# => '{"stilling": "Lærling", "navn": "Åge"}'

But of course, there is no such option as output_encoding, so here's the actual output:

d = {'navn': 'Åge', 'stilling': 'Lærling'}
json.dumps(d)
# => '{"stilling": "L\\u00e6rling", "navn": "\\u00c5ge"}'

So to summarize - I want to convert a Python dict to a UTF-8 JSON string without any escapes. How can I do that?

I'll accept solutions like:

Hacks (pre- and post processing input to dumps to achieve the desired effect)
Subclassing the JSONEncoder (I have no idea how it works and the documentation isn't very helpful)
Third-party libraries available on PyPI

450

asked Jul 28 '12 08:07

Hubro

2 Answers

Requirements

Make sure your Python files are encoded in UTF-8. Otherwise your non-ASCII characters may turn into question marks (?). Notepad++ has excellent encoding options for this.
Make sure that you have the appropriate fonts installed. If you want to display Japanese characters, for example, you need to install Japanese fonts.
Make sure that your IDE or console supports displaying Unicode characters. Otherwise you might get a UnicodeEncodeError.

Example:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-23: character maps to <undefined>

PyScripter works for me. It's included with "Portable Python" at http://portablepython.com/wiki/PortablePython3.2.1.1

Make sure you're using Python 3+, since this version offers better unicode support.
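To illustrate where the 'charmap' error above comes from, here is a minimal sketch. It assumes the console uses a legacy code page such as cp1252 (an assumption; your console may use something else), and can_encode is a hypothetical helper, not part of any library.

```python
def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# cp1252 is one of the "charmap" codecs; it has no Japanese characters.
print(can_encode("日本", "cp1252"))  # False
print(can_encode("日本", "utf-8"))   # True
print(can_encode("Åge", "cp1252"))   # True
```

If print fails this way, two possible workarounds are calling sys.stdout.reconfigure(encoding='utf-8') (Python 3.7+) or setting the PYTHONIOENCODING environment variable to utf-8 before starting Python.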

Problem

json.dumps() escapes unicode characters.

Solution

Read the update at the bottom. Or...

Replace each escaped character with the parsed unicode character.

I created a simple lambda function called getStringWithDecodedUnicode that does just that.

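import re
# Raw string avoids invalid-escape warnings; the parameter is named s so it does not shadow the built-in str
getStringWithDecodedUnicode = lambda s : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), s )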

Here's getStringWithDecodedUnicode as a regular function.

def getStringWithDecodedUnicode( value ):
    # Matches a literal backslash, "u", and four lowercase hex digits
    findUnicodeRE = re.compile( r'\\u([\da-f]{4})' )
    def getParsedUnicode(x):
        return chr( int( x.group(1), 16 ) )

    return findUnicodeRE.sub(getParsedUnicode, str( value ) )

Example

testJSONWithUnicode.py (Using PyScripter as the IDE)

import re
import json
getStringWithDecodedUnicode = lambda s : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), s )

data = {"Japan":"日本"}
jsonString = json.dumps( data )
print( "json.dumps({0}) = {1}".format( data, jsonString ) )
jsonString = getStringWithDecodedUnicode( jsonString )
print( "Decoded Unicode: %s" % jsonString )

Output

json.dumps({'Japan': '日本'}) = {"Japan": "\u65e5\u672c"}
Decoded Unicode: {"Japan": "日本"}
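One caveat with the regex approach: it cannot tell a real \uXXXX escape from a literal backslash followed by "u" in the data. The sketch below shows one such case. The name decode_escapes and the sample dict are hypothetical, chosen only for illustration.

```python
import json
import re

def decode_escapes(text):
    # Same idea as getStringWithDecodedUnicode above (hypothetical name).
    return re.sub(r'\\u([\da-f]{4})', lambda m: chr(int(m.group(1), 16)), text)

# The value holds a literal backslash followed by "u00e6", not the letter æ.
data = {"note": "\\u00e6"}

escaped = json.dumps(data)          # the backslash is doubled in the JSON text
mangled = decode_escapes(escaped)   # the regex eats the second backslash

try:
    json.loads(mangled)
    round_trips = True
except ValueError:                  # json.JSONDecodeError subclasses ValueError
    round_trips = False

print(round_trips)
# Letting the encoder skip escaping avoids the post-processing step entirely.
print(json.loads(json.dumps(data, ensure_ascii=False)) == data)
```

The first print shows that the post-processed text no longer parses, while the ensure_ascii=False output still round-trips.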

Update

Or... just pass ensure_ascii=False as an option for json.dumps.

Note: You need to meet the requirements that I outlined at the beginning or else this isn't going to work.

import json
data = {'navn': 'Åge', 'stilling': 'Lærling'}
result = json.dumps(data, ensure_ascii=False)
print( result ) # prints {"navn": "Åge", "stilling": "Lærling"} (key order may differ on older Python versions)
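If you need actual UTF-8 bytes rather than a str (for a socket or a binary file, say), a minimal sketch under Python 3 could look like this. The variable names are only illustrative.

```python
import json

data = {'navn': 'Åge', 'stilling': 'Lærling'}

text = json.dumps(data, ensure_ascii=False)   # a str containing Å and æ directly
utf8_bytes = text.encode('utf-8')             # explicit UTF-8 bytes

print('\\u' in text)                          # False: nothing was escaped
print(json.loads(utf8_bytes.decode('utf-8')) == data)
```

Encoding explicitly keeps the result independent of the platform's default encoding.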

146

answered Oct 14 '22 11:10

Larry Battle

ensure_ascii=False is the best solution IMHO.

If you are using Python 2.7, here is an example Python file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# example.py
from __future__ import unicode_literals
from json import dumps as json_dumps
d = {'navn': 'Åge', 'stilling': 'Lærling'}
print json_dumps(d, ensure_ascii=False).encode('utf-8')
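For comparison, a rough Python 3 counterpart that writes the JSON to a file might look like the sketch below. The temporary path is a stand-in for a real output location (hypothetical), and passing encoding='utf-8' to open is my assumption about what you want on disk.

```python
import json
import os
import tempfile

d = {'navn': 'Åge', 'stilling': 'Lærling'}

# A temporary path stands in for a real output file (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), 'example.json')

# Name the encoding explicitly so the platform default is not used.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(d, f, ensure_ascii=False)

# Read the raw bytes back to confirm what was written.
with open(path, 'rb') as f:
    raw = f.read()

print(raw.decode('utf-8'))
```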

answered Oct 14 '22 09:10

Xiao
