Haskell Network.HTTP incorrectly downloading image

I'm trying to download images using the Network.HTTP module and having little success.

import Network.HTTP

main = do
  jpg <- get "http://www.irregularwebcomic.net/comics/irreg2557.jpg"
  writeFile "irreg2557.jpg" jpg where
       get url = simpleHTTP (getRequest url) >>= getResponseBody

The output file appears in the current directory, but fails to display in Chromium or Ristretto. Ristretto reports "Error interpreting JPEG image file (Not a JPEG file: starts with 0xc3 0xbf)".

Jack Kelly asked Jul 17 '12 00:07


1 Answer

writeFile :: FilePath -> String -> IO ()

String. That's your problem, right there. String is for Unicode text. Attempting to store binary data in it will lead to corruption. It's not clear in this case whether the corruption is being done by simpleHTTP or by writeFile, but it's ultimately unimportant. You're using the wrong type, and something is corrupting the data when confronted with bytes that don't make up a valid Unicode encoding.
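To make the corruption concrete, here is a small sketch that needs no network access. It assumes the file handle encodes text as UTF-8 (what writeFile does under a UTF-8 locale) and uses the characters '\xFF' and '\xD8' as stand-ins for the first two bytes of a JPEG. The helper name bytesAfterTextWrite and the file name demo.bin are made up for illustration.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (IOMode (WriteMode), hPutStr, hSetEncoding, utf8, withFile)

-- Hypothetical helper: write a String through a handle that encodes text as
-- UTF-8, then read the same file back as raw bytes.
bytesAfterTextWrite :: FilePath -> String -> IO [Word8]
bytesAfterTextWrite path s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8   -- assumption: mimics writeFile under a UTF-8 locale
    hPutStr h s
  B.unpack <$> B.readFile path

main :: IO ()
main = do
  -- '\xFF' and '\xD8' stand in for the two bytes that open a JPEG file.
  bytes <- bytesAfterTextWrite "demo.bin" "\xFF\xD8"
  print bytes  -- prints [195,191,195,152], i.e. 0xc3 0xbf 0xc3 0x98
```

The first two bytes written are 0xc3 0xbf, the UTF-8 encoding of U+00FF, which matches what Ristretto reports. That is at least consistent with the damage happening when the String is encoded on its way to disk.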

As for fixing this, newer versions of the HTTP package are polymorphic in their return type, and can handle returning the raw bytes in a ByteString. You just need to change how you're writing the bytes to the file, so that it won't infer that you want a String.

import qualified Data.ByteString as B
import Network.HTTP
import Network.URI (parseURI)

main = do
    jpg <- get "http://www.irregularwebcomic.net/comics/irreg2557.jpg"
    B.writeFile "irreg2557.jpg" jpg
  where
    get url = let uri = case parseURI url of
                          Nothing -> error $ "Invalid URI: " ++ url
                          Just u -> u in
              simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

The construction to get a polymorphic Request is a bit clumsy. If issue #1 ever gets fixed then using getRequest url will suffice.
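If the clumsiness bothers you, one option is to hide it behind a small helper. The following is only a sketch: getBytes is a hypothetical name, not something the HTTP package provides. The explicit type signature is what pins the response body to a strict ByteString, and failures come back as Left values instead of going through error.

```haskell
import qualified Data.ByteString as B
import Network.HTTP (defaultGETRequest_, rspBody, simpleHTTP)
import Network.URI (parseURI)

-- Hypothetical helper (not part of the HTTP package): fetch a URL as raw
-- bytes. The ByteString in the signature selects the polymorphic request type.
getBytes :: String -> IO (Either String B.ByteString)
getBytes url =
  case parseURI url of
    Nothing  -> pure (Left ("Invalid URI: " ++ url))
    Just uri -> do
      result <- simpleHTTP (defaultGETRequest_ uri)
      pure $ case result of
        Left err   -> Left (show err)        -- connection-level failure
        Right resp -> Right (rspBody resp)   -- body as raw bytes, status not checked

main :: IO ()
main = do
  r <- getBytes "http://www.irregularwebcomic.net/comics/irreg2557.jpg"
  case r of
    Left msg  -> putStrLn ("Download failed: " ++ msg)
    Right jpg -> B.writeFile "irreg2557.jpg" jpg
```

Note that this does not look at the HTTP status code, so an error page would be saved as if it were the image, and simpleHTTP may still throw IO exceptions for some failures.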

Carl answered Nov 09 '22 02:11