
Writing pandas DataFrame to JSON in unicode

I'm trying to write a pandas DataFrame containing unicode to JSON, but the built-in .to_json function escapes the non-ASCII characters. How do I fix this?

Example:

    import pandas as pd

    df = pd.DataFrame([['τ', 'a', 1], ['π', 'b', 2]])
    df.to_json('df.json')

This gives:

{"0":{"0":"\u03c4","1":"\u03c0"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}} 

Which differs from the desired result:

{"0":{"0":"τ","1":"π"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}} 


I have tried adding the force_ascii=False argument:

    import pandas as pd

    df = pd.DataFrame([['τ', 'a', 1], ['π', 'b', 2]])
    df.to_json('df.json', force_ascii=False)

But this gives the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u03c4' in position 11: character maps to <undefined> 


I'm using WinPython 3.4.4.2 64bit with pandas 0.18.0
Asked by Swier on Sep 21 '16 at 09:09



1 Answer

Opening a file with the encoding set to utf-8, and then passing that file to the .to_json function fixes the problem:

    with open('df.json', 'w', encoding='utf-8') as file:
        df.to_json(file, force_ascii=False)

This gives the correct output:

{"0":{"0":"τ","1":"π"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}} 

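Note: the force_ascii=False argument is still required. Without it, .to_json escapes non-ASCII characters as \uXXXX sequences regardless of the file's encoding.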
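For anyone wondering why the explicit encoding matters, here is a minimal sketch using only the standard library, with no pandas involved. It assumes the default file encoding on the Windows machine was a legacy code page such as cp1252; the question does not say which one it was, so treat that as a guess. The JSON text is copied from the desired output above, and the temporary file path is just for illustration.

```python
import os
import tempfile

# JSON text as it should look unescaped (copied from the desired output above).
json_text = '{"0":{"0":"τ","1":"π"},"1":{"0":"a","1":"b"},"2":{"0":1,"1":2}}'

# Assumption: the failing default encoding was a legacy Windows code page
# such as cp1252, which has no mapping for Greek letters.
try:
    json_text.encode("cp1252")
    error_position = None
except UnicodeEncodeError as exc:
    error_position = exc.start  # index of the first character that cannot be encoded

# Writing through a file handle opened as UTF-8 succeeds.
path = os.path.join(tempfile.mkdtemp(), "df.json")
with open(path, "w", encoding="utf-8") as file:
    file.write(json_text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as file:
    round_trip = file.read()

print(error_position)
print(round_trip == json_text)
```

Under that assumption the encode step fails on the 'τ' character, which is the same character the error message in the question complains about, while the UTF-8 file handle writes and reads the text back unchanged.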

Answered by Swier on Sep 18 '22 at 19:09