
Extract json fields and write them into a csv with python

Tags: python, json, csv

I have a very large JSON file with many fields. I want to extract only some of them and write them to a CSV file.

Here is my code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import json

import csv

data_file = open("book_data.json", "r")
values = json.load(data_file)
data_file.close()

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
         value = data["identifier"]
         value = data["authors"]
         for key, value in data.iteritems():
               wr.writerow([key, value])

It gives me this error:

File "json_to_csv.py", line 22, in <module>
wr.writerow([key, value])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 8: ordinal not in range(128)

But I declared the utf-8 encoding at the top of the script, so I don't know what's wrong.

Thanks

Lara M. asked Nov 27 '25 13:11



1 Answer

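You need to encode the data yourself. The coding line at the top of the script only declares the encoding of the source file; it does not change how unicode values are converted when they are written: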

wr.writerow([key.encode("utf-8"), value.encode("utf-8")])

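The difference is the same as between an explicit encode and an implicit str() conversion, which falls back to the ASCII codec in Python 2: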

In [8]: print u'\u2019'.encode("utf-8")
’

In [9]: print str(u'\u2019')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-9-4e3ad09ee31b> in <module>()
----> 1 print str(u'\u2019')

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

If you have a mixture of strings and lists, you can use isinstance to check what you have. If a value is a list, iterate over it and encode each item:

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
        for key, value in data.iteritems():
            if isinstance(value, list):
                # Encode each item, then join the list into one cell.
                cell = ",".join([v.encode("utf-8") for v in value])
            else:
                cell = value.encode("utf-8")
            wr.writerow([key, cell])
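
The snippet above assumes every value is either a string or a list of strings; a number, a null or a nested dict has no .encode method and would raise AttributeError. A minimal sketch of a more tolerant helper, assuming Python 3 (where text is already Unicode and the csv module needs no manual .encode()). The name to_cell, its sep parameter and the sample record are hypothetical, not part of the original code:

```python
import csv
import io
import json


def to_cell(value, sep=","):
    """Flatten one JSON value into the text of a single CSV cell."""
    if isinstance(value, list):
        # Join list items, recursing so nested lists are flattened too.
        return sep.join(to_cell(v, sep) for v in value)
    if isinstance(value, dict):
        # Keep nested objects as JSON text, without escaping non-ASCII.
        return json.dumps(value, ensure_ascii=False)
    if value is None:
        # JSON null becomes an empty cell.
        return ""
    # Strings pass through unchanged; numbers and booleans are stringified.
    return str(value)


# Illustrative record, not taken from the asker's file.
record = {"identifier": "b1", "authors": ["Ann", "Bob"], "pages": 120}

buf = io.StringIO(newline="")
wr = csv.writer(buf)
for key, value in record.items():
    wr.writerow([key, to_cell(value)])
```

On Python 2 the same idea would still need a final .encode("utf-8") on each cell before writing.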

To write only the three columns creator, contributor and identifier, pull the data out by key:

import csv

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for dct in values:
        authors = dct["authors"]
        wr.writerow((",".join(authors["creator"]).encode("utf-8"),
                     "".join(authors["contributor"]).encode("utf-8"),
                     dct["identifier"].encode("utf-8")))
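
On Python 3 the manual .encode() calls should not be needed: opening the file in text mode with an explicit encoding lets the csv module write Unicode directly. A sketch, assuming each record has the shape used above (an "identifier" string and an "authors" dict holding "creator" and "contributor" lists). The function name write_books and the sample record are hypothetical, and contributors are joined with a comma here, which is an assumption:

```python
import csv
import json
import os
import tempfile


def write_books(values, csv_path):
    """Write creator, contributor and identifier columns for each record."""
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" fixes the file encoding regardless of platform default.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        wr = csv.writer(f)
        for dct in values:
            authors = dct["authors"]
            wr.writerow((",".join(authors["creator"]),
                         ",".join(authors["contributor"]),
                         dct["identifier"]))


# Hypothetical sample in the assumed shape, including the U+2019 character
# that triggered the original error.
sample = json.loads(
    '[{"identifier": "id-1", "authors": '
    '{"creator": ["O\\u2019Brien"], "contributor": ["A", "B"]}}]'
)

# Write to a temporary directory and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book_data.csv")
    write_books(sample, path)
    with open(path, encoding="utf-8", newline="") as f:
        written = f.read()
```

The json.load call from the question also works unchanged on Python 3, provided the input file is opened with encoding="utf-8".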
Padraic Cunningham answered Nov 30 '25 03:11
