I have several apps in which I retrieve a cookie from a WebView after logging in to a web page and reuse it directly with jsoup to scrape content, as follows:
final String url = "https://need.authentication.com";
// -- Android Cookie part here --
CookieSyncManager.getInstance().sync();
CookieManager cm = CookieManager.getInstance();
String cookie = cm.getCookie(url); // returns cookie for url
// ...
// -- JSoup part here --
// Jsoup uses cookies as "name/value pairs"
doc = Jsoup.connect("https://need.authentication.com").cookie(url, cookie).get();
This doesn't work for all URLs. Retrieving the cookie is never a problem, but jsoup sometimes cannot use it.
All I would like to do now is add this existing cookie to an HttpClient, or another non-deprecated option, to download the page and then hand it to jsoup for further scraping, as I have the feeling jsoup isn't handling cookies correctly.
Jsoup debug is only showing:
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:512)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
03-19 03:06:16.394 1317-3369/mysource.internationsexpress W/System.err: at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
For reference, the cookie looks like this:
__indbg=481084b1-3d71-461a-b6e1-93d;
__gads=ID=0058c3ccb75f72f2:T=1458162316:S=ALN;
INSESSION=ct8njokkc4uadlmjjg8a3gvp1ng4m0acvvveea66bkpmn32fvc;
INEP=%5B%22nw01_101_B_0%22%2C%22mp04_103_B_0%22%2C%22in01_244;
WASLOGGEDIN=1;
INREMEMBERME=cHlMQlRVbzVOUkhJTU5kU25tMlplZ2RvNWxvbkN4TmdsR0RBVWp6Qkp6dkpONW1Tb2o3MH;
INBP=mobile;
__utmt=1;
__utma=68558281.1607821733.1458162272.1458240416.1;
__utmb=68558281.1.10.1458327475;
__utmc=68558281;
__utmz=68558281.1458162272.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
__utmv=68558281.|2=community=sanj=1^3=loggedIn=1=1^5=experiment=%7Cst01_267_B_2%7Cmt01
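cookie(name, value) expects the name of the cookie, not the URL it belongs to. The string returned by CookieManager.getCookie(url) is already in the format of a Cookie request header (name=value pairs separated by semicolons), so it can be sent as that header directly.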
Try this instead:
doc = Jsoup
        .connect("https://need.authentication.com")
        .header("Cookie", cookie) // send the whole WebView cookie string as one header
        .get();
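If you would rather keep jsoup's name/value cookie API, the WebView string can be split into pairs first. The following is only a minimal sketch: the class name CookieHeaderParser and its parse method are made up for illustration, and it assumes the input has the "name=value; name2=value2" shape shown in the question. It splits each pair on the first "=" only, because values such as __gads contain further "=" characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of jsoup or Android.
final class CookieHeaderParser {

    private CookieHeaderParser() {
    }

    /** Splits a "name=value; name2=value2" string into name/value pairs. */
    static Map<String, String> parse(String cookieHeader) {
        // LinkedHashMap keeps the cookies in the order they appeared.
        Map<String, String> cookies = new LinkedHashMap<>();
        if (cookieHeader == null) {
            return cookies; // getCookie(url) returns null when no cookie is stored
        }
        for (String pair : cookieHeader.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty segments and pairs without a name
            }
            // Split on the first '=' only; the rest belongs to the value.
            cookies.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return cookies;
    }
}
```

The resulting map could then be handed to jsoup with Jsoup.connect(url).cookies(CookieHeaderParser.parse(cookie)).get(), which sets each cookie under its own name instead of under the URL.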