I have a binary file whose contents were created by zlib.compress in Python.
Is there an easy way to open and decompress it in Clojure?
import zlib
import json
with open('data.json.zlib', 'wb') as f:
    f.write(zlib.compress(json.dumps(data).encode('utf-8')))
Basically, it isn't a gzip file; it is just bytes representing deflated data.
I could only find these references, but they are not quite what I'm looking for (I think the first two are the most relevant):
Must I really implement this multi-line wrapper around java.util.zip,
or is there a nice library out there? Actually, I'm not even sure whether these byte streams are compatible across libraries, or whether I'm just trying to mix and match the wrong libraries.
Steps in Python:
>>> '{"hello": "world"}'.encode('utf-8')
b'{"hello": "world"}'
>>> zlib.compress(b'{"hello": "world"}')
b'x\x9c\xabV\xcaH\xcd\xc9\xc9W\xb2RP*\xcf/\xcaIQ\xaa\x05\x009\x99\x06\x17'
>>> [int(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23]
>>> import numpy
>>> [numpy.int8(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]
>>> zlib.decompress(bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])).decode('utf-8')
'{"hello": "world"}'
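As a side note, the signed values above can be produced without numpy. The following is a minimal sketch using only the standard library; the helper name `to_signed_bytes` is made up for illustration. It reuses the exact bytes printed above so that the result does not depend on the local zlib build.

```python
# Bytes copied from the zlib.compress output shown above.
compressed = bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80,
                    42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])


def to_signed_bytes(data: bytes) -> list:
    """Map each unsigned byte (0..255) to the signed range the JVM uses (-128..127)."""
    return [b - 256 if b > 127 else b for b in data]


signed = to_signed_bytes(compressed)
print(signed[:4])  # [120, -100, -85, 86]

# Masking with 0xFF reverses the mapping and recovers the original bytes.
restored = bytes(b & 0xFF for b in signed)
print(restored == compressed)  # True
```

The signed list is only a different view of the same bytes, so it can be pasted into a Clojure `byte-array` literal as is.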
Decode attempt in Clojure:
; https://github.com/funcool/buddy-core/blob/master/src/buddy/util/deflate.clj#L40 without try-catch
(ns so.core
  (:import java.io.ByteArrayInputStream
           java.io.ByteArrayOutputStream
           java.util.zip.Deflater
           java.util.zip.DeflaterOutputStream
           java.util.zip.InflaterInputStream
           java.util.zip.Inflater
           java.util.zip.ZipException)
  (:gen-class))

(defn uncompress
  "Given a compressed data as byte-array, uncompress it and return as an other byte array."
  ([^bytes input] (uncompress input nil))
  ([^bytes input {:keys [nowrap buffer-size]
                  :or {nowrap true buffer-size 2048}
                  :as opts}]
   (let [buf (byte-array (int buffer-size))
         os (ByteArrayOutputStream.)
         inf (Inflater. ^Boolean nowrap)]
     (with-open [is (ByteArrayInputStream. input)
                 iis (InflaterInputStream. is inf)]
       (loop []
         (let [readed (.read iis buf)]
           (when (pos? readed)
             (.write os buf 0 readed)
             (recur)))))
     (.toByteArray os))))
(uncompress (byte-array [120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]))
ZipException invalid stored block lengths java.util.zip.InflaterInputStream.read (InflaterInputStream.java:164)
Any help would be appreciated. I would rather not use zip or gzip files, as I only care about the raw content, not file names or modification dates, in this context. However, it is possible to use another compression algorithm on the Python side if that is the only option.
Here is an easy way to do it with gzip:
Python code:
import gzip

content = b"the quick brown fox"  # bytes, because the file is opened in binary mode
with gzip.open('fox.txt.gz', 'wb') as f:
    f.write(content)
Clojure code:
(with-open [in (java.util.zip.GZIPInputStream.
                 (clojure.java.io/input-stream
                   "fox.txt.gz"))]
  (println "result:" (slurp in)))
;=> result: the quick brown fox
Keep in mind that "gzip" is an algorithm and a format, and does not mean you need to use the "gzip" command-line tool.
Please note that the input to Clojure doesn't have to be a file. You could send the gzip compressed data as raw bytes over a socket and still decompress it on the Clojure side. Full details at: https://clojuredocs.org/clojure.java.io/input-stream
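To illustrate the point about raw bytes, here is a minimal Python sketch that produces the gzip data in memory instead of writing a file. How the bytes are transported (socket, message queue, and so on) is left out, and the variable names are only illustrative.

```python
import gzip

payload = "the quick brown fox".encode("utf-8")

# gzip.compress returns the same gzip format that gzip.open writes,
# but as a bytes object that can be sent anywhere.
blob = gzip.compress(payload)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(blob[:2].hex())  # 1f8b

# Round trip on the Python side, just to check the data is intact.
restored = gzip.decompress(blob).decode("utf-8")
print(restored)  # the quick brown fox
```

On the Clojure side, the received bytes could presumably be wrapped in a `java.io.ByteArrayInputStream` and passed to `GZIPInputStream` in the same way as the file stream above.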
If you need to use the pure zlib format instead of gzip, the result is very similar:
Python code:
import zlib

fp = open('balloon.txt.z', 'wb')
fp.write(zlib.compress(b'the big red baloon'))  # zlib.compress requires bytes
fp.close()
Clojure code:
(with-open [in (java.util.zip.InflaterInputStream.
                 (clojure.java.io/input-stream
                   "balloon.txt.z"))]
  (println "result:" (slurp in)))
;=> result: the big red baloon
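This also suggests why the `uncompress` function in the question fails. It defaults `nowrap` to `true`, which tells `java.util.zip.Inflater` to expect raw deflate data without the zlib header and checksum, whereas `zlib.compress` in Python emits the zlib wrapper. The one-argument `InflaterInputStream` constructor used above relies on a default `Inflater`, which expects the wrapper. The mismatch can be reproduced from Python alone, where a negative `wbits` value plays the role of `nowrap`. This is a sketch of the likely cause, using the bytes from the question:

```python
import zlib

data = b'{"hello": "world"}'

# The zlib.compress output from the question: 2-byte header, deflate data, 4-byte Adler-32.
zlib_stream = bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80,
                     42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])
print(zlib_stream[:2].hex())  # 789c

# Default settings expect the zlib wrapper (like Inflater with nowrap = false).
print(zlib.decompress(zlib_stream) == data)  # True

# Stripping the header and the checksum leaves raw deflate data,
# which is what wbits=-15 (like nowrap = true) expects.
raw_stream = zlib_stream[2:-4]
print(zlib.decompress(raw_stream, wbits=-15) == data)  # True

# Feeding the wrapped stream to a raw inflater fails, as in the question.
failed = False
try:
    zlib.decompress(zlib_stream, wbits=-15)
except zlib.error as exc:
    failed = True
    print("raw inflate of a zlib stream failed:", exc)
```

If that reading is right, passing `{:nowrap false}` to the question's `uncompress` function should decode the same bytes as well.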