Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

json.dump to a gzip file in python 3

Tags:

python-3.x

I'm trying to write a object to a gzipped json file in one step (minimising code, and possibly saving memory space). My initial idea (python3) was thus:

import gzip, json
with gzip.open("/tmp/test.gz", mode="wb") as f:
  json.dump({"a": 1}, f)

This however fails: TypeError: 'str' does not support the buffer interface, which I think has to do with a string not being encoded to bytes. So what is the proper way to do this?

Current solution that I'm unhappy with:

Opening the file in text-mode solves the problem:

import gzip, json
with gzip.open("/tmp/test.gz", mode="wt") as f:
  json.dump({"a": 1}, f)

however I don't like text-mode. In my mind (and maybe this is wrong, but supported by this), text mode is used to fix line-endings. This shouldn't be an issue because json doesn't have line endings, but I don't like it (possibly) messing with my bytes, it (possibly) is slower because it's looking for line-endings to fix, and (worst of all) I don't understand why something about line-endings fixes my encoding problems?

like image 348
Claude Avatar asked May 18 '15 08:05

Claude


People also ask

Can I gzip JSON?

Compressing with gzip As text data, JSON data compresses nicely. That's why gzip is our first option to reduce the JSON data size. Moreover, it can be automatically applied in HTTP, the common protocol for sending and receiving JSON.

How do I parse JSON in Python 3?

If you need to parse a JSON string that returns a dictionary, then you can use the json. loads() method. If you need to parse a JSON file that returns a dictionary, then you can use the json. load() method.

How do you minify JSON in Python?

The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.


1 Answers

offtopic: I should have dived into the docs a bit further than I initially did.

The python docs show:

Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding (the default being UTF-8). 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

I don't fully agree that the result from a json encode is a string (I think it should be a set of bytes, since it explicitly defines that it uses utf-8 encoding), but I noticed this before. So I guess text mode it is.

like image 78
Claude Avatar answered Oct 22 '22 06:10

Claude