Read a stream from S3 with Clojure/Java

I have a large file on S3 that I want to decode and parse as it downloads. I happen to be using the Clojure Amazonica library, but any library will do.

I can easily get a stream:

(def stream (-> (get-object "some-s3-bucket" "some-object-key") :input-stream))

; returns: #<S3ObjectInputStream com.amazonaws.services.s3.model.S3ObjectInputStream

but how do I read the stream? Can I read it a line at a time (the decompressed contents are JSON lines)?

(If there's any ambiguity in my question, I am only concerned with the reading of the stream, not any part of the gzip decoding)

David asked Nov 02 '15 07:11


1 Answer

If it's helpful to anyone, this is what I came up with following D-Side's helpful response.

(ns some-project.get-s3-stream
  ;; get-object comes from Amazonica; the unused aws.sdk.s3 require is dropped
  (:require [amazonica.aws.s3 :refer [get-object]]
            [clojure.java.io :as io])
  (:import [java.util.zip GZIPInputStream]))

(def bucket "some-s3-bucket")
(def object-key "some-object-key")

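;; Note: this def runs when the namespace loads, and the stream is never
;; explicitly closed. line-seq is lazy, so lines are read as they are consumed.
(def seq-of-json-lines
  (-> (get-object bucket object-key)
      :object-content
      (GZIPInputStream.)   ; decompress on the fly
      io/reader            ; wrap in a buffered character reader
      line-seq))           ; lazy seq of lines, one JSON document each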
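
One thing the snippet above leaves open is closing the stream. A minimal sketch of a variant that does so, assuming the lines can be processed eagerly inside `with-open`: the names `gzip-bytes`, `read-gzipped-lines` and `fake-stream` are hypothetical and not part of Amazonica, and an in-memory gzipped byte array stands in for the S3 object so the example runs without AWS credentials.

```clojure
(ns example.gzip-lines
  (:require [clojure.java.io :as io])
  (:import [java.io ByteArrayInputStream ByteArrayOutputStream]
           [java.util.zip GZIPInputStream GZIPOutputStream]))

(defn gzip-bytes
  "Hypothetical helper: gzips a string in memory so the example needs no S3."
  [^String s]
  (let [out (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz (.getBytes s "UTF-8")))
    (.toByteArray out)))

(defn read-gzipped-lines
  "Applies f to every line of a gzipped InputStream and returns a vector
  of the results. mapv is eager, so all lines are consumed before
  with-open closes the reader (and the underlying stream)."
  [input-stream f]
  (with-open [rdr (io/reader (GZIPInputStream. input-stream))]
    (mapv f (line-seq rdr))))

;; Stand-in for the stream returned by S3 (assumption for this sketch).
(def fake-stream
  (ByteArrayInputStream. (gzip-bytes "{\"a\":1}\n{\"a\":2}\n")))

(def lines (read-gzipped-lines fake-stream identity))
```

With Amazonica, the stand-in would presumably be replaced by `(:input-stream (get-object bucket object-key))`. Returning a lazy `line-seq` out of `with-open` would fail once the reader is closed, which is why the sketch uses `mapv`; for files too large to hold in memory, `doseq` over the lines inside `with-open` would be the streaming alternative.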
David answered Oct 21 '22 09:10