Haskell streaming download

The two resources I found that suggested recipes for streaming downloads using popular Haskell libraries were:

  • https://haskell-lang.org/library/http-client#Streaming
  • http://www.alfredodinapoli.com/posts/2013-07-20-slick-http-download-in-haskell.html

How would I modify the code in the former to (a) save to file, and (b) print only a (take 5) of the byte response, rather than the whole response to stdout?

My attempt at (b) is:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-5.13 runghc
   --package http-conduit
 -}
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString        as S
import qualified Data.Conduit.List      as CL
import           Network.HTTP.Simple
import           System.IO              (stdout)

main :: IO ()
main = httpSink "http://httpbin.org/get" $ \response -> do
    liftIO $ putStrLn
           $ "The status code was: "
          ++ show (getResponseStatusCode response)

    CL.mapM_ (take 5) (S.hPut stdout)

This fails to apply the (take 5), which suggests, among other things, that I still don't understand how mapping over monads, or liftIO, works.

Also, this resource:

http://haskelliseasy.readthedocs.io/en/latest/#note-on-streaming

...gave me a warning, under "I know what I'm doing and I'd like more fine-grained control over resources, such as streaming", that this is not easily or generally supported.

Other places I looked:

  • Downloading large files from the Internet in Haskell
  • https://hackage.haskell.org/package/wreq
  • https://hackage.haskell.org/package/pipes-http

If there's anything in the Haskellverse that makes this easier, more like Python's requests:

import requests

response = requests.get(URL, stream=True)
with open(FILENAME, "wb") as f:  # URL, FILENAME and BLOCK are placeholders
  for chunk in response.iter_content(BLOCK):
    f.write(chunk)

I'd appreciate the tip there, too, or pointers towards the 2016 state of the art.

Mittenchops asked Nov 28 '16 03:11


1 Answer

You are probably looking for httpSource from the latest version of http-conduit. It behaves pretty much exactly like Python's requests: you get back a stream of chunks.

save to file

This is easy: just pipe the source straight into a file sink.

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| sinkFile "data_file"
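
If you want something closer to the Python loop in the question, where each chunk is visible as it arrives, one option is to put a pass-through stage in front of the file sink. This is only a sketch: reportChunks is a hypothetical helper name, and it assumes the conduit, http-conduit and bytestring packages are available. iterMC runs an action on every chunk and forwards the chunk unchanged.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString)
import qualified Data.ByteString as S
import Network.HTTP.Simple (httpSource, getResponseBody)

-- Hypothetical helper: report each chunk's size, then pass the chunk on unchanged.
reportChunks :: MonadIO m => ConduitT ByteString ByteString m ()
reportChunks = iterMC $ \chunk ->
    liftIO $ putStrLn ("got " ++ show (S.length chunk) ++ " bytes")

main :: IO ()
main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| reportChunks
                    .| sinkFile "data_file"   -- plays the role of f.write(chunk)
```

The chunk sizes are whatever the HTTP client hands over, so there is no direct counterpart to BLOCK in this sketch.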

print only a (take 5) of the byte response

Once we have the source, we take the first 5 bytes with takeCE 5 and then print these via printC.

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| takeCE 5
                    .| printC
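
Note that takeCE counts bytes inside the chunks, while takeC counts whole chunks. The difference is easy to see without any network access. The following is a small sketch using a made-up list of chunks (the names chunks, firstFiveBytes and firstTwoChunks are only for illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import qualified Data.ByteString as S

-- Made-up stand-in for the chunks an HTTP response body might arrive in.
chunks :: [S.ByteString]
chunks = ["ab", "cde", "fgh"]

-- takeCE 5 lets through the first 5 bytes, however they are split into chunks.
firstFiveBytes :: S.ByteString
firstFiveBytes = runConduitPure $ yieldMany chunks .| takeCE 5 .| foldC

-- takeC 2 lets through the first 2 chunks, whatever their size.
firstTwoChunks :: [S.ByteString]
firstTwoChunks = runConduitPure $ yieldMany chunks .| takeC 2 .| sinkList

main :: IO ()
main = do
    print firstFiveBytes   -- "abcde"
    print firstTwoChunks   -- ["ab","cde"]
```

This is also why printC in the script above may print more than one line: it prints each chunk that reaches it, and the 5 bytes can be spread over several chunks.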

save to file and print only a (take 5) of the byte response

To do this, you want zipSinks or, for more general cases that involve zipping multiple sinks, ZipSink:

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Data.Conduit.Internal (zipSinks)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| zipSinks (takeCE 5 .| printC)
                                (sinkFile "data_file")
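
For the more general ZipSink case, here is a sketch with three sinks fed from the same stream: the 5-byte preview, the file, and a byte count. previewSaveCount is a hypothetical name, and the byte count is an extra added purely for illustration. ZipSink's Applicative instance feeds every chunk to all wrapped sinks, and *> keeps the result of the last one.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.ByteString (ByteString)
import Network.HTTP.Simple (httpSource, getResponseBody)

-- Hypothetical combined sink: preview 5 bytes, save everything to a file,
-- and return the total number of bytes seen.
previewSaveCount :: MonadResource m => FilePath -> ConduitT ByteString Void m Int
previewSaveCount path = getZipSink $
       ZipSink (takeCE 5 .| printC)
    *> ZipSink (sinkFile path)
    *> ZipSink lengthCE

main :: IO ()
main = do
    total <- runConduitRes $
        httpSource "http://httpbin.org/get" getResponseBody
            .| previewSaveCount "data_file"
    putStrLn ("bytes received: " ++ show total)
```

The preview sink finishes after 5 bytes, but the combined sink keeps running until the file sink and the counter have seen the whole body.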
Alec answered Nov 13 '22 23:11