
How can I download a file from the Internet using Haskell?

Tags:

haskell

I'm trying to do something similar to wget: download a file from the Internet. I saw that there used to be a package called http-wget, but it has been deprecated in favor of http-conduit.

http-conduit has a simple example showing how to get the contents of a web page using httpBS. Following that, I got this to work:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  resp <- httpBS url
  B8.putStrLn $ getResponseBody resp

And this works for getting the filename (sitemap.xml) from the URL:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  let urlParts = B8.split '/' $ B8.pack url
  let fileName = Prelude.last urlParts
  B8.putStrLn fileName

But I can't put them together:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  let urlParts = B8.split '/' $ B8.pack url
  let fileName = Prelude.last urlParts
  resp <- httpBS url
  B8.putStrLn $ getResponseBody resp

That gives the error:

ny1920-parse.hs:12:41: error:
    • Couldn't match type ‘Request’ with ‘[Char]’
      Expected type: String
        Actual type: Request
    • In the first argument of ‘B8.pack’, namely ‘url’
      In the second argument of ‘($)’, namely ‘B8.pack url’
      In the expression: B8.split '/' $ B8.pack url
   |
12 |   let urlParts = B8.split '/' $ B8.pack url
   |                                         ^^^

So I just need to convert String -> Request? There's apparently a function for that in http-conduit, but it doesn't work as expected—I still get the same error.

I can force the URL to be a Request like this:

  let url = "https://www.example.com/sitemap.xml" :: Request

But then of course that breaks the part where I break up the filename, because it expects a [Char] and not a Request.

So I'm stuck—if I make the URL a String, it breaks http-conduit. And if I make it a Request, it breaks the string manipulation.

I feel like something this simple shouldn't be this hard, no?

Edit: Ok, so I can almost get it to work with this addition:

  let urlParts = B8.split '/' $ B8.pack (show url)

That compiles, but it makes the filename corrupt. Trying to print out the filename gives: "1.1\n}\n" instead of sitemap.xml.

Jonathan asked Jan 25 '23 08:01



1 Answer

I'm going to disagree with the other answer here: splitting on / yourself is a bad idea. Don't try to implement an ad-hoc URL parser; it's way harder than you think. Instead, re-use the parse that you already have:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client
import Network.HTTP.Simple
import Network.URI
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
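    -- With OverloadedStrings, this literal is inferred as a Request
    -- because of how it is used below.
    let request = "https://www.example.com/sitemap.xml"
        -- getUri recovers the already-parsed URI; pathSegments splits its path.
        fileName = Prelude.last . pathSegments . getUri $ request
    resp <- httpBS request
    putStrLn fileName
    B8.putStrLn $ getResponseBody resp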

See the Network.URI documentation (in the network-uri package) for more on the parts you can extract from a URI.
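To get closer to what wget does, the response body can be written to a file named after the last path segment instead of printed. This is only a sketch under a few assumptions: fileNameFor is a hypothetical helper, not part of any library, and falling back to "index.html" when the path has no usable segment is an arbitrary choice. Note that pathSegments returns the segments as they appear in the URI, so percent-encoded characters are not decoded.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client (getUri)
import Network.HTTP.Simple (Request, getResponseBody, httpBS)
import Network.URI (pathSegments)
import qualified Data.ByteString as B

-- Hypothetical helper: the last non-empty path segment of the request's URI.
-- Falls back to "index.html" (an arbitrary choice) when there is none,
-- e.g. for "https://www.example.com/".
fileNameFor :: Request -> FilePath
fileNameFor request =
    case filter (not . null) (pathSegments (getUri request)) of
        []       -> "index.html"
        segments -> last segments

main :: IO ()
main = do
    -- The annotation makes the OverloadedStrings literal a Request.
    let request = "https://www.example.com/sitemap.xml" :: Request
    resp <- httpBS request
    -- Write the raw response body to a file in the current directory.
    B.writeFile (fileNameFor request) (getResponseBody resp)
```

httpBS reads the whole body into memory before it is written, which should be fine for something like a sitemap; for large downloads a streaming approach would be a better fit.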

Daniel Wagner answered Mar 07 '23 07:03
