How to properly set cookies to get URL content using httr

Tags:

r

cookies

httr

I need to download information from a website that is protected by cookies. I log in manually in the browser and then pass the resulting cookies to httr.

There is a similar topic, "Copying cookie for httr", but it does not solve my problem.

library(httr)
url<-"http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ"

cook<-"_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

response <- GET(url, config(cookie= cook))

content(x = response,as = 'text', encoding = "UTF-8")   

When I call content(), the returned page says that I am not logged in, exactly as it does when I send no cookies at all.

How can I solve this problem?

(Test credentials are login: mytest2, pass: qwerty12)

asked Oct 18 '14 16:10 by VadymB


1 Answer

With httr, the way to send cookies is to pass set_cookies() to GET():

GET("http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ", 
    set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                `__utmb` = "29983421.5.10.1413649536",
                `__utmc` = "29983421",
                `__utmt` = "1",
                `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"))

That worked for me, or at least I think it did, as I cannot read the language. A table comes back with the same structure and there is no prompt to log in.
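Since the question starts from a raw cookie string copied from the browser, it may help to split that string into name/value pairs rather than retyping each one. The following is only a sketch: parse_cookie_string() is a hypothetical helper, not part of httr, and the cookie string is shortened for illustration. It assumes pairs are separated by ";" and that a name ends at the first "=".

```r
# Hypothetical helper (not part of httr): turn a raw "Cookie" header string
# into a named character vector of cookie values.
parse_cookie_string <- function(cookie_string) {
  pairs <- trimws(strsplit(cookie_string, ";", fixed = TRUE)[[1]])
  # Position of the first "=" in each pair; values may contain "=" themselves.
  eq <- regexpr("=", pairs, fixed = TRUE)
  # Drop empty or malformed entries that have no "=".
  keep <- eq > 0
  pairs <- pairs[keep]
  eq <- eq[keep]
  stats::setNames(substring(pairs, eq + 1), substring(pairs, 1, eq - 1))
}

# Shortened version of the string from the question.
cook <- paste0(
  "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; ",
  "__utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
)
cookies <- parse_cookie_string(cook)

# Not run here: needs network access and a still-valid session cookie.
# response <- httr::GET(url, httr::set_cookies(.cookies = cookies))
```

The named vector goes to set_cookies() through its .cookies argument, so the individual names do not have to be written out by hand.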

Unfortunately, the captcha on the login page prevents the use of RSelenium (or other, similar crawling packages), so you'll have to keep grabbing those cookies manually (or use a utility to extract them from the browser session).

Finally, you probably want to seriously consider changing those credentials, now :-)


EDIT: @VadymB and I both found that the code didn't work until we restarted RStudio. Your mileage may vary.

answered Oct 04 '22 06:10 by hrbrmstr