Why is my https getter in Haskell so slow compared to curl?

Tags: curl, haskell

I'm writing a simple HTTPS getter in Haskell. After I get the response, I compress it with gzip and save it to a file. However, my version is very slow compared to the combination of curl and gzip. How can I make it faster than curl? Details are below.

Haskell code (fetcher.hs):

import Control.Lens
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BL
import Network.Wreq

writeURIBodyToFile :: FilePath -> String -> IO()
writeURIBodyToFile filePath uri = do
  response <- get uri
  let body = (response ^. responseBody)
  BL.writeFile filePath (GZip.compress body)

main :: IO ()
main = writeURIBodyToFile "out.html.gz" "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/"

Haskell result:

$ ghc -o fetcher fetcher.hs
$ time ./fetcher 

real    0m9.240s
user    0m8.840s
sys     0m0.232s

curl result:

$ time curl "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/" | gzip > out.html.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  102k  100  102k    0     0   331k      0 --:--:-- --:--:-- --:--:--  332k

real    0m0.524s
user    0m0.156s
sys     0m0.040s

Edit: I also tried the http-conduit package; nothing changed.

import qualified Data.ByteString.Lazy as BL
import           Network.HTTP.Simple

main :: IO ()
main = do
    response <- httpLBS "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/"
    BL.writeFile "outnew.html" $ getResponseBody response

Edit2: I also checked the connection with tcpdump, and there is no connection issue.

Edit3: GHCi, version 7.10.3

Edit4: compile command ghc -o fetcher fetcher.hs

Edit5: As of Feb 2019, the problem could not be reproduced with this code:

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = httpBS "https://www.sahibinden.com/ilan/vasita-otomobil-mercedes-benz-mercedes-benz-c-180-fascination-7g-tronic-ozel-renk-652750468/detay" >>= B8.putStrLn . getResponseBody

Result:

$ ghc -o fetcher fetcher.hs
$ time ./fetcher 
real    0m0,549s
user    0m0,093s
sys     0m0,021s

Edit6: Again as of Feb 2019, the problem could not be reproduced with the first code sample either (GHCi, version 8.0.2).

Mustafa İlhan asked Aug 20 '17 08:08


1 Answer

My best guess is that your HTTP client ignores the Content-Length HTTP header and simply keeps reading until the remote server closes the connection, which is

a: potentially much slower than honoring the Content-Length header, because many web servers keep sockets open far longer than necessary (usually so the connection can be reused), and

b: a common pattern among naive/simple HTTP clients.

You can confirm this with a little netcat HTTP server like this:

printf "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabcx" | nc -l 9999

Now request http://127.0.0.1:9999 with your client and check the response: an HTTP client that honors the Content-Length header will report the response body as abc, while an HTTP client that ignores it will report the response body as abcx.
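
To make the difference concrete, here is a minimal Haskell sketch of the two reading strategies applied to the exact bytes the netcat server sends. It is only an illustration, not how wreq or http-conduit are implemented: the names splitResponse, contentLength, bodyByContentLength, bodyUntilClose and demoResponse are hypothetical, the response is handled as a plain String, and chunked transfer encoding is ignored.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd, isPrefixOf)
import Text.Read (readMaybe)

-- Split a raw HTTP response at the first blank line (CRLF CRLF) into
-- the header block and everything that arrived after it.
splitResponse :: String -> (String, String)
splitResponse = go ""
  where
    go acc rest
      | "\r\n\r\n" `isPrefixOf` rest = (reverse acc, drop 4 rest)
    go acc (c:cs) = go (c : acc) cs
    go acc []     = (reverse acc, "")

-- Look up the Content-Length header (case-insensitively) in the header block.
contentLength :: String -> Maybe Int
contentLength headers =
  case [ trim (drop 1 rest)
       | line <- lines headers
       , let (name, rest) = break (== ':') line
       , map toLower name == "content-length" ] of
    (v:_) -> readMaybe v
    []    -> Nothing
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- A client that honors Content-Length stops after the announced byte count.
bodyByContentLength :: String -> String
bodyByContentLength raw =
  case contentLength headers of
    Just n  -> take n rest
    Nothing -> rest
  where
    (headers, rest) = splitResponse raw

-- A naive client takes everything it received before the connection closed.
bodyUntilClose :: String -> String
bodyUntilClose = snd . splitResponse

-- The same bytes the printf | nc command above sends.
demoResponse :: String
demoResponse = "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabcx"

main :: IO ()
main = do
  putStrLn ("honoring Content-Length: " ++ bodyByContentLength demoResponse)  -- abc
  putStrLn ("reading until close:     " ++ bodyUntilClose demoResponse)       -- abcx
```

The first strategy can return as soon as three body bytes have arrived; the second has to wait for the server to close the socket, which is where the suspected delay would come from.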

Note: this command should work on Unix-like systems (Linux, *BSD, macOS), but probably won't work on Windows. If you're running Windows, it will work under Cygwin (and probably under WSL, but I haven't tried; I am still running Windows 7, which doesn't support WSL).

hanshenrik answered Nov 12 '22 06:11