
Decompress zlib stream in Clojure

I have a binary file whose contents were created by zlib.compress in Python. Is there an easy way to open and decompress it in Clojure?

import zlib
import json

with open('data.json.zlib', 'wb') as f:
    f.write(zlib.compress(json.dumps(data).encode('utf-8')))

Basically, it isn't a gzip file; it is just bytes representing deflated data.

I could only find these references, but none is quite what I'm looking for (I think the first two are the most relevant):

  • deflateclj_hatemogi_clojure/deflate.clj
  • funcool/buddy-core/deflate.clj
  • Compressing / Decompressing strings in clojure
  • Reading and Writing Compressed Files
  • clj-http

Must I really implement this multi-line wrapper around java.util.zip, or is there a nice library out there? Actually, I'm not even sure whether these byte streams are compatible across libraries, or whether I'm just trying to mix and match the wrong libs.

Steps in Python:

>>> '{"hello": "world"}'.encode('utf-8')
b'{"hello": "world"}'
>>> zlib.compress(b'{"hello": "world"}')
b'x\x9c\xabV\xcaH\xcd\xc9\xc9W\xb2RP*\xcf/\xcaIQ\xaa\x05\x009\x99\x06\x17'
>>> [int(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23]
>>> import numpy
>>> [numpy.int8(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]
>>> zlib.decompress(bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])).decode('utf-8')
'{"hello": "world"}'

Decode attempt in Clojure:

; https://github.com/funcool/buddy-core/blob/master/src/buddy/util/deflate.clj#L40 without try-catch
(ns so.core
  (:import java.io.ByteArrayInputStream
           java.io.ByteArrayOutputStream
           java.util.zip.Deflater
           java.util.zip.DeflaterOutputStream
           java.util.zip.InflaterInputStream
           java.util.zip.Inflater
           java.util.zip.ZipException)
  (:gen-class))

(defn uncompress
  "Given a compressed data as byte-array, uncompress it and return as an other byte array."
  ([^bytes input] (uncompress input nil))
  ([^bytes input {:keys [nowrap buffer-size]
                  :or {nowrap true buffer-size 2048}
                  :as opts}]
   (let [buf  (byte-array (int buffer-size))
         os   (ByteArrayOutputStream.)
         inf  (Inflater. ^Boolean nowrap)]
     (with-open [is  (ByteArrayInputStream. input)
                 iis (InflaterInputStream. is inf)]
       (loop []
         (let [readed (.read iis buf)]
           (when (pos? readed)
             (.write os buf 0 readed)
             (recur)))))
     (.toByteArray os))))

(uncompress (byte-array [120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]))
ZipException invalid stored block lengths  java.util.zip.InflaterInputStream.read (InflaterInputStream.java:164)

Any help would be appreciated. I would rather not use zip or gzip files, as I only care about the raw content, not file names or modification dates in this context. But it is possible to use another compression algorithm on the Python side if that is the only option.

NikoNyrh asked Jan 31 '17 13:01


1 Answer

Here is an easy way to do it with gzip:

Python code:

import gzip
content = "the quick brown fox"
# In binary mode gzip expects bytes, so encode the string first (Python 3)
with gzip.open('fox.txt.gz', 'wb') as f:
    f.write(content.encode('utf-8'))

Clojure code:

(with-open [in (java.util.zip.GZIPInputStream.
                (clojure.java.io/input-stream
                 "fox.txt.gz"))]
  (println "result:" (slurp in)))

;=>  result: the quick brown fox

Keep in mind that "gzip" is an algorithm and a format, and does not mean you need to use the "gzip" command-line tool.
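To illustrate the format point, here is a minimal Python sketch (the payload is taken from the question; the variable names are only for illustration). It compresses the same bytes into the zlib and gzip containers and checks the leading bytes that tell them apart. It assumes default compression settings, under which a zlib stream starts with 0x78.

```python
import gzip
import zlib

payload = b'{"hello": "world"}'

zlib_bytes = zlib.compress(payload)   # zlib container (RFC 1950)
gzip_bytes = gzip.compress(payload)   # gzip container (RFC 1952)

# A gzip stream always starts with the magic bytes 0x1f 0x8b.
is_gzip = gzip_bytes[:2] == b'\x1f\x8b'

# A zlib stream starts with a two-byte header; with default settings
# the first byte is 0x78, and the two bytes read as a big-endian
# number are a multiple of 31.
is_zlib = (zlib_bytes[0] == 0x78
           and int.from_bytes(zlib_bytes[:2], 'big') % 31 == 0)
```

Both containers wrap the same deflate algorithm, so the choice between them is mostly about which header and checksum the reading side expects.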

Please note that the input to Clojure doesn't have to be a file. You could send the gzip compressed data as raw bytes over a socket and still decompress it on the Clojure side. Full details at: https://clojuredocs.org/clojure.java.io/input-stream
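As a rough sketch of the same idea on the Python side, the bytes below never touch the file system. The name wire_bytes is hypothetical and stands in for whatever arrives over a socket; io.BytesIO plays the role of the input stream.

```python
import gzip
import io

content = "the quick brown fox"

# Compress in memory; these are the bytes that would be sent over the wire.
wire_bytes = gzip.compress(content.encode("utf-8"))

# Decompress from an in-memory stream instead of a file on disk.
with gzip.GzipFile(fileobj=io.BytesIO(wire_bytes)) as stream:
    result = stream.read().decode("utf-8")
```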

Update

If you need to use the pure zlib format instead of gzip, the result is very similar:

Python code:

import zlib
# zlib.compress expects bytes in Python 3, hence the b'' literal
with open('balloon.txt.z', 'wb') as fp:
    fp.write(zlib.compress(b'the big red baloon'))

Clojure code:

(with-open [in (java.util.zip.InflaterInputStream.
                (clojure.java.io/input-stream
                 "balloon.txt.z"))]
  (println "result:" (slurp in)))

;=> result: the big red baloon
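This probably also explains the ZipException in the question: the uncompress function there defaults to nowrap true, and an Inflater created that way expects raw deflate data without the zlib header, while the default InflaterInputStream used above expects the header. The Python sketch below reproduces the same mismatch using the wbits argument of zlib.decompress; treat it as an illustration under that assumption, not a port of the Clojure code.

```python
import zlib

data = zlib.compress(b'{"hello": "world"}')

# wbits=15 (the default) expects the zlib header and Adler-32 trailer.
wrapped_ok = zlib.decompress(data, 15)

# wbits=-15 expects raw deflate data with no header, so the zlib
# header bytes are misread as a deflate block and decoding fails.
try:
    zlib.decompress(data, -15)
    raw_failed = False
except zlib.error:
    raw_failed = True

# Dropping the 2-byte header and the 4-byte trailer leaves raw deflate.
raw_ok = zlib.decompress(data[2:-4], -15)
```

On the Clojure side, the equivalent change would be passing {:nowrap false} to the question's uncompress function, or simply using the plain InflaterInputStream as shown above.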
Alan Thompson answered Sep 22 '22 15:09