
How to speed up process of loading and reading JSON files in Python?

Tags: python, json

I am running a script (in multiprocessing mode) that extracts some parameters from a bunch of JSON files, but it is currently very slow. Here is the script:

from __future__ import print_function, division
import os
from glob import glob
from os import getpid
from time import time
from sys import stdout
import resource
from multiprocessing import Pool
import subprocess
try:
    import simplejson as json
except ImportError:
    import json


path = '/data/data//*.A.1'
print("Running with PID: %d" % getpid())

def process_file(file):
    start = time()
    filename = file.split('/')[-1]
    print(file)
    with open('/data/data/A.1/%s_DI' % filename, 'w') as w:
        with open(file, 'r') as f:
            for n, line in enumerate(f):
                d = json.loads(line)
                try:
                    domain = d['rrname']
                    ips = d['rdata']
                    for i in ips:
                        print("%s|%s" % (i, domain), file=w)
                except:
                    print(d)

if __name__ == "__main__":
    files_list = glob(path)
    cores = 12
    print("Using %d cores" % cores)
    pp = Pool(processes=cores)
    pp.imap_unordered(process_file, files_list)
    pp.close()
    pp.join()

Does anybody know how to speed this up?

UserYmY asked Dec 10 '14 17:12


2 Answers

Switch from

import json 

to

import ujson

https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/

Or switch to orjson:

import orjson 

https://github.com/ijl/orjson
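Which library is fastest depends on the data and the installed versions, so it may be worth measuring on a line that looks like your own. The following is a minimal sketch, not a definitive benchmark: the sample record and the helper names `available_loaders` and `benchmark` are made up for illustration, and it simply skips any library that is not installed.

```python
import json
import timeit

# Hypothetical sample line shaped like the records in the question.
SAMPLE = json.dumps({"rrname": "example.com.", "rdata": ["192.0.2.1", "192.0.2.2"]})


def available_loaders():
    """Return {name: loads function} for each JSON library that is installed."""
    loaders = {"json": json.loads}
    for name in ("simplejson", "ujson", "orjson"):
        try:
            module = __import__(name)
        except ImportError:
            continue  # library not installed; skip it
        loaders[name] = module.loads
    return loaders


def benchmark(line=SAMPLE, number=10000):
    """Time `number` decodes of `line` with each available loader (seconds)."""
    return {
        name: timeit.timeit(lambda loads=loads: loads(line), number=number)
        for name, loads in available_loaders().items()
    }


if __name__ == "__main__":
    # Print the fastest loader first.
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print("%-10s %.4f s" % (name, seconds))
```

Replacing `SAMPLE` with a real line from one of your files should give a more representative comparison than a synthetic record.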

Ryabchenko Alexander answered Oct 12 '22 16:10


First, find out where your bottlenecks are.
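One way to look for the bottleneck is to run the per-line work under the standard library's cProfile on a sample of the data. The sketch below is only an illustration under stated assumptions: `extract_pairs` and `profile_extract` are hypothetical helpers that mimic the inner loop of the question, and the sample records are invented. It profiles decoding and formatting only, not disk I/O.

```python
import cProfile
import io
import json
import pstats


def extract_pairs(lines):
    """Mimic the question's inner loop: decode each line, build 'ip|domain' strings."""
    out = []
    for line in lines:
        d = json.loads(line)
        for ip in d["rdata"]:
            out.append("%s|%s" % (ip, d["rrname"]))
    return out


def profile_extract(lines, top=5):
    """Run extract_pairs under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(extract_pairs, lines)
    buf = io.StringIO()
    # Sort by cumulative time and keep only the `top` entries.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data shaped like the records in the question.
    sample = [json.dumps({"rrname": "example.com.", "rdata": ["192.0.2.1"]})] * 50000
    _, report = profile_extract(sample)
    print(report)
```

If most of the cumulative time lands inside the JSON decoder, a faster decoder is likely to help; if it does not, the time is probably going to reading or writing the files instead.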

If it is in the JSON decoding/encoding step, try switching to UltraJSON (ujson):

UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 2.5+ and 3.

The changes would be as simple as changing the import part:

try:
    import ujson as json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        import json
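With that import chain the rest of the script can keep calling `json.loads` unchanged. One thing to keep in mind is that each library has its own decode-error class; to my knowledge they all derive from `ValueError`, so catching `ValueError` should work whichever one gets imported. A small sketch of the per-line step, where `parse_record` is a hypothetical helper and returning an empty list for bad lines is an assumption (the original script prints the record instead):

```python
try:
    import ujson as json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        import json


def parse_record(line):
    """Return 'ip|domain' strings for one JSON line, or [] if the line is unusable."""
    try:
        d = json.loads(line)
        domain = d["rrname"]
        return ["%s|%s" % (ip, domain) for ip in d["rdata"]]
    except (ValueError, KeyError, TypeError):
        # ValueError: malformed JSON; KeyError: missing field;
        # TypeError: the decoded value is not a dict.
        return []
```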

I've also done a simple benchmark in "What is faster - Loading a pickled dictionary object or Loading a JSON file - to a dictionary?"; take a look.

alecxe answered Oct 12 '22 15:10