I'm writing a simple HTTPS fetcher in Haskell. After I get the response, I compress it and save it to a file. However, my version is very slow compared to the combination of curl and gzip. How can I make it faster than curl? Details are below.
Haskell code (fetcher.hs):
import Control.Lens
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BL
import Network.Wreq
writeURIBodyToFile :: FilePath -> String -> IO ()
writeURIBodyToFile filePath uri = do
  response <- get uri
  let body = response ^. responseBody
  BL.writeFile filePath (GZip.compress body)
main :: IO ()
main = writeURIBodyToFile "out.html.gz" "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/"
Haskell result:
$ ghc -o fetcher fetcher.hs
$ time ./fetcher
real 0m9.240s
user 0m8.840s
sys 0m0.232s
curl result:
$ time curl "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/" | gzip > out.html.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  102k  100  102k    0     0   331k      0 --:--:-- --:--:-- --:--:--  332k
real 0m0.524s
user 0m0.156s
sys 0m0.040s
Edit: I also tried the http-conduit package; nothing changed.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Simple
main :: IO ()
main = do
  response <- httpLBS "https://www.sahibinden.com/ilan/vasita-otomobil-seat-hatasiz-boyasiz-tramersiz-dsg-leon-469484363/detay/"
  BL.writeFile "outnew.html" $ getResponseBody response
Edit2: I also checked the connection with tcpdump, and there is no connection issue.
Edit3: GHCi, version 7.10.3
Edit4: compile command ghc -o fetcher fetcher.hs
Edit5: I could not reproduce the problem with this code in Feb 2019:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = httpBS "https://www.sahibinden.com/ilan/vasita-otomobil-mercedes-benz-mercedes-benz-c-180-fascination-7g-tronic-ozel-renk-652750468/detay" >>= B8.putStrLn . getResponseBody
Result:
$ ghc -o fetcher fetcher.hs
$ time ./fetcher
real 0m0,549s
user 0m0,093s
sys 0m0,021s
Edit6: Likewise, I could not reproduce the problem with the first code sample in Feb 2019 (GHCi, version 8.0.2).
My best guess is that your HTTP client ignores the Content-Length HTTP header and simply keeps reading until the remote server closes the connection, which is
a: potentially much slower than stopping after Content-Length bytes, because many web servers keep sockets open for much longer than needed (usually so the connection can be reused), and
b: a common theme among naive/simple HTTP clients.
You can confirm this with a little netcat HTTP server like this:
printf "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabcx" | nc -l 9999
Now hit http://127.0.0.1:9999 and check the response: an HTTP client that honors the Content-Length header will report the response body as abc, while a client that ignores the header will report the response body as abcx.
Note: this command should work on Unix-like systems (Linux, *BSD, macOS), but probably won't work on Windows. If you're running Windows, it will work under Cygwin (and probably under WSL, but I haven't tried; I'm still on Windows 7, which doesn't support WSL).
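To make the difference concrete, here is a minimal pure Haskell sketch of the two ways a client can decide where the body ends. It is only an illustration: the function names (splitHeadBody, contentLength, bodyByContentLength, bodyUntilClose) are made up for this example, the response is treated as a plain String, and this is not how wreq or http-conduit are implemented.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)
import Text.Read (readMaybe)

-- Split a raw HTTP response at the first blank line (CRLF CRLF)
-- into the header block and everything that follows it.
splitHeadBody :: String -> (String, String)
splitHeadBody = go []
  where
    go acc rest
      | "\r\n\r\n" `isPrefixOf` rest = (reverse acc, drop 4 rest)
    go acc (c:cs) = go (c : acc) cs
    go acc []     = (reverse acc, [])

-- Look up the Content-Length value in a header block, if there is one.
-- Header names are compared case-insensitively.
contentLength :: String -> Maybe Int
contentLength headers =
  case [ v | l <- lines headers
           , let (k, v) = break (== ':') l
           , map toLower k == "content-length" ] of
    (v:_) -> readMaybe (trim (drop 1 v))
    []    -> Nothing
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- A client that honors Content-Length: it keeps exactly that many
-- characters of the body and does not need to wait for the socket to close.
bodyByContentLength :: String -> String
bodyByContentLength raw =
  let (headers, rest) = splitHeadBody raw
  in maybe rest (`take` rest) (contentLength headers)

-- A naive client: everything received before the connection closed
-- counts as the body.
bodyUntilClose :: String -> String
bodyUntilClose = snd . splitHeadBody
```

Applied to the exact bytes the netcat server sends, "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabcx", bodyByContentLength gives "abc" and bodyUntilClose gives "abcx".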