http-conduit browser usage

I'm trying to scrape data from a site using HTTPS. I managed to make basic requests using Network.HTTP.Conduit successfully (posting credentials, etc.), but failed at extracting cookie information from the response headers (Set-Cookie). It looks like http-conduit has its own mechanism for dealing with cookies, which I failed to understand.

Network.HTTP.Conduit.Browser seems to deal with cookies automatically (which is fine by me), but I couldn't get it to work because the documentation is lacking.

Could someone with more experience dealing with the http-conduit browser module show me how to:

  1. Deal with self-signed certificates (I managed to do this with managerCheckCerts in the base module)
  2. Send a POST request with URL-encoded parameters in the body, not following any redirects (I used urlEncodedBody from the base module for this)
  3. Use the cookie from step 2 in a simple GET request and read the response as a (lazy) ByteString (I would have used httpLbs for this)

To me it looks like the abstraction level of Network.HTTP.Conduit.Browser is more suited for my application compared to Network.HTTP.Conduit, so I would like to make the switch even if I could deal with cookies manually using the latter.

akosch asked Feb 26 '12 12:02


1 Answer

I've never used Browser, but I have used http-conduit. I read the source code to answer these questions, so I apologize if I make any mistakes.

  1. Do the same thing you're doing. When you've created the Manager with the right managerCheckCerts, pass that along to browse :: Manager -> BrowserAction a -> ResourceT IO a.

  2. makeRequest :: Request IO -> BrowserAction (Response (Source IO BS.ByteString)) takes a Request IO; use urlEncodedBody like before to create a POST request with parameters in the body and pass it to makeRequest. Set redirectCount to 0 to disable redirect following, I believe.

  3. I believe you just need to use getCookieJar :: BrowserAction CookieJar; the BrowserAction comes from getBrowserState :: BrowserAction BrowserState.

The way http-conduit manages cookies outside the Browser module is that it doesn't. Cookies are returned in the HTTP response; what you can do is parse the response and store the cookies in a cookie jar. That's actually all Browser really does.
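For illustration, here is a minimal sketch of that manual approach. It is written against the current http-client and http-client-tls packages (which http-conduit builds on), not the 2012 Browser module, so the API differs from the one discussed above. The URLs, form field names, and the helper names loginRequest and dataRequest are made up for the example, and it uses the default TLS settings, so accepting a self-signed certificate is not covered here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as L8
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)

-- Hypothetical login request: urlEncodedBody turns it into a POST with a
-- URL-encoded body, and redirectCount = 0 disables redirect following.
loginRequest :: Request
loginRequest =
  (urlEncodedBody
     [("username", "alice"), ("password", "secret")]  -- assumed field names
     (parseRequest_ "https://example.com/login"))     -- assumed URL
    { redirectCount = 0 }

-- Hypothetical follow-up GET that sends the cookies collected earlier.
dataRequest :: CookieJar -> Request
dataRequest jar =
  (parseRequest_ "https://example.com/data")          -- assumed URL
    { cookieJar = Just jar }

main :: IO ()
main = do
  manager <- newManager tlsManagerSettings

  -- Step 2: POST the credentials without following redirects.
  loginResp <- httpLbs loginRequest manager

  -- The jar attached to the response includes any Set-Cookie values
  -- the server sent back.
  let jar = responseCookieJar loginResp

  -- Step 3: reuse the jar and read the body as a lazy ByteString.
  dataResp <- httpLbs (dataRequest jar) manager
  L8.putStrLn (responseBody dataResp)
```

Keeping the two requests as plain values makes it easy to inspect them before anything is sent; only main touches the network.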

hao answered Nov 22 '22 06:11