
How to read json file containing ObjectId and ISODate in Python?

I want to read a JSON file that contains ObjectId and ISODate.

JSON data:

{
    "_id" : ObjectId("5baca841d25ce14b7d3d017c"),
    "country" : "in",
    "state" : "",
    "date" : ISODate("1902-01-31T00:00:00.000Z")
}
Rochit Jain asked Oct 05 '18 20:10


1 Answer

I want to expand on Maviles' answer with notes from two other SO questions.

First, from «Unable to deserialize PyMongo ObjectId from JSON» we learn that this data is not strict JSON: it looks like the Python representation of an actual BSON/Mongo Extended JSON object. Native BSON files are also binary, not text.
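A quick way to confirm that the standard parser rejects this text is to feed it the sample document from the question. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import json

# Sample document copied from the question (not strict JSON).
sample = '''{
    "_id" : ObjectId("5baca841d25ce14b7d3d017c"),
    "country" : "in",
    "state" : "",
    "date" : ISODate("1902-01-31T00:00:00.000Z")
}'''

try:
    json.loads(sample)
    parsed = True
except json.JSONDecodeError as exc:
    # The parser stops at the bare ObjectId(...) constructor call.
    parsed = False
    error_message = str(exc)

print(parsed)
```

The constructor calls ObjectId(...) and ISODate(...) are what have to be rewritten before json.loads can accept the file.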

Second, from «How can I use Python to transform MongoDB's bsondump into JSON?» we can expand on Fabian Fagerholm's answer:

import json
import re

from bson import json_util  # the bson module that ships with pymongo


def read_mongoextjson_file(filename):
    with open(filename, "r") as f:
        # read the entire input; in a real application,
        # you would want to read a chunk at a time
        bsondata = '['+f.read()+']'

        # convert the TenGen JSON to Strict JSON
        # here, I just convert the ObjectId and Date structures,
        # but it's easy to extend to cover all structures listed at
        # http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON
        jsondata = re.sub(r'ObjectId\s*\(\s*\"(\S+)\"\s*\)',
                          r'{"$oid": "\1"}',
                          bsondata)
        jsondata = re.sub(r'ISODate\s*\(\s*(\S+)\s*\)',
                          r'{"$date": \1}',
                          jsondata)
        jsondata = re.sub(r'NumberInt\s*\(\s*(\S+)\s*\)',
                          r'{"$numberInt": "\1"}',
                          jsondata)

        # now we can parse this as JSON, and use MongoDB's object_hook
        # function to get rich Python data structures inside a dictionary
        data = json.loads(jsondata, object_hook=json_util.object_hook)

        return data

Comparing this version with the original one shows that handling an additional type only takes one more substitution. Use the MongoDB Extended JSON reference for any other types you need.
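One way to keep the substitutions manageable as more types are added is a table of (pattern, replacement) pairs. The sketch below is an assumption about how that could look, not part of the original answer: the names SHELL_TO_EXTJSON and shell_to_extjson are hypothetical, and the NumberLong rule is added purely as an illustration. It uses only the standard library, so the wrapped values stay plain dictionaries.

```python
import json
import re

# (pattern, replacement) pairs: shell-style constructor -> Extended JSON.
# The first three mirror the function above; NumberLong is an illustrative addition.
SHELL_TO_EXTJSON = [
    (re.compile(r'ObjectId\s*\(\s*"(\S+)"\s*\)'), r'{"$oid": "\1"}'),
    (re.compile(r'ISODate\s*\(\s*(\S+)\s*\)'), r'{"$date": \1}'),
    (re.compile(r'NumberInt\s*\(\s*(\S+)\s*\)'), r'{"$numberInt": "\1"}'),
    (re.compile(r'NumberLong\s*\(\s*"?(\d+)"?\s*\)'), r'{"$numberLong": "\1"}'),
]


def shell_to_extjson(text):
    """Apply every substitution in order and return the rewritten text."""
    for pattern, replacement in SHELL_TO_EXTJSON:
        text = pattern.sub(replacement, text)
    return text


sample = (
    '{"_id": ObjectId("5baca841d25ce14b7d3d017c"), '
    '"date": ISODate("1902-01-31T00:00:00.000Z"), '
    '"n": NumberInt(3), "big": NumberLong("12345678901")}'
)

# Without an object_hook the typed values remain small wrapper dicts.
converted = json.loads(shell_to_extjson(sample))
print(converted["_id"])
```

Passing object_hook=json_util.object_hook to json.loads, as the function above does, should then turn these wrapper dictionaries into ObjectId and datetime values.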

A couple of additional caveats:

  • The file I was working on was a series of objects, not a list. I worked around this by putting everything in square brackets:
   bsondata = '['+f.read()+']'

Otherwise I got a JSONDecodeError: Extra data: line 38 column 2 (char 1016) at the end of the first object.

  • In my first attempts I had an issue importing json_util from bson, because the standalone bson package conflicts with the bson module bundled with pymongo. The thread «importing json_utils issues ImportError» helped me, i.e.:
pip uninstall bson
pip uninstall pymongo
pip install pymongo
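On the first caveat: wrapping the text in square brackets only yields valid JSON when the objects are separated by commas. If the file holds objects that simply follow one another, an alternative is to decode them one at a time with json.JSONDecoder.raw_decode. This is a sketch under the assumption that the text has already been converted to strict JSON; iter_json_objects is a hypothetical helper name.

```python
import json


def iter_json_objects(text, object_hook=None):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder(object_hook=object_hook)
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here;
        # a separating comma between documents is tolerated as well.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


docs = list(iter_json_objects('{"a": 1}\n{"a": 2}\n'))
print(docs)
```

This avoids the "Extra data" error without editing the input, and json_util.object_hook can be passed as object_hook to get the same rich types as before.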

Here's a paste with a complete working example.

CristianCantoro answered Oct 19 '22 21:10
