
Jsoup Cookies for HTTPS scraping


I am experimenting with this site, trying to scrape my username from the welcome page, as a way to learn Jsoup and Android. I am using the following code:

    Connection.Response res = Jsoup.connect("http://www.mikeportnoy.com/forum/login.aspx")
        .data("ctl00$ContentPlaceHolder1$ctl00$Login1$UserName", "username",
              "ctl00$ContentPlaceHolder1$ctl00$Login1$Password", "password")
        .method(Method.POST)
        .execute();
    String sessionId = res.cookie(".ASPXAUTH");

    Document doc2 = Jsoup.connect("http://www.mikeportnoy.com/forum/default.aspx")
        .cookie(".ASPXAUTH", sessionId)
        .get();

My cookie (.ASPXAUTH) always ends up null. If I delete this cookie in a web browser, I lose my connection, so I am sure it is the correct cookie. In addition, if I change the code to

    .cookie(".ASPXAUTH", "jkaldfjjfasldjf")

(using the correct value, of course)

I am able to scrape my login name from this page. This also makes me think I have the correct cookie. So why does my cookie come back null? Are the field names for my username and password incorrect? Something else?

Thanks.
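One way to narrow a problem like this down is to look at every cookie the login response actually set, instead of asking only for `.ASPXAUTH`. The following is a minimal sketch in plain Java, not a definitive diagnosis: `CookieCheck` and `describe` are hypothetical names that are not part of Jsoup, and the hard-coded map (including the `ASP.NET_SessionId` name and its value) is an assumed stand-in for what `res.cookies()` would return.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Jsoup): reports whether an expected cookie
// is present in a cookie map such as the one returned by res.cookies().
final class CookieCheck {

    static String describe(Map<String, String> cookies, String expectedName) {
        String value = cookies.get(expectedName);
        if (value != null && !value.isEmpty()) {
            // Deliberately do not echo the value: an auth cookie is a secret.
            return "found " + expectedName;
        }
        // Sort the names so the report is stable regardless of map implementation.
        return "missing " + expectedName + "; server set: " + new TreeMap<>(cookies).keySet();
    }

    public static void main(String[] args) {
        // Assumed stand-in for res.cookies() after the login POST; values are made up.
        Map<String, String> afterLogin = Map.of("ASP.NET_SessionId", "abc123");
        System.out.println(describe(afterLogin, ".ASPXAUTH"));
    }
}
```

If the report lists other cookies but not `.ASPXAUTH`, that suggests the server answered the POST without treating it as a successful login, which would point at the submitted form data rather than at the cookie handling.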

Brian asked Aug 21 '11 15:08




1 Answer

I know I'm about 10 months late here, but a good option with Jsoup is this easy-peasy piece of code:

    import java.util.Map;
    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    // This will get you the response.
    Response res = Jsoup
        .connect("url")
        .data("loginField", "[email protected]", "passField", "pass1234")
        .method(Method.POST)
        .execute();

    // This will get you the cookies.
    Map<String, String> cookies = res.cookies();

    // And this is the easiest way I've found to remain in session.
    Document doc = Jsoup.connect("url").cookies(cookies).get();

Though I'm still having trouble connecting to SOME websites, I can connect to a whole lot of them with this same basic piece of code. Oh, and before I forget: what I figured out is that my problem is SSL certificates. You have to manage them properly, in a way I still haven't quite figured out.
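To make the "remain in session" step a bit more concrete: passing the whole map with `.cookies(cookies)` amounts, in effect, to sending every cookie back to the server in one `Cookie` request header. Here is a rough plain-Java sketch of what that header value looks like. `CookieHeader` and `format` are hypothetical names, not part of Jsoup, and the cookie names and values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Jsoup): builds the value of a Cookie
// request header from a name -> value map.
final class CookieHeader {

    static String format(Map<String, String> cookies) {
        // Cookie pairs are written as name=value and separated by "; ".
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Made-up values standing in for the map returned by res.cookies().
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("ASP.NET_SessionId", "abc123");
        cookies.put(".ASPXAUTH", "0123ABCD");
        System.out.println("Cookie: " + format(cookies));
    }
}
```

Sending the full map rather than one hand-picked cookie means the session keeps working even when the site relies on more than one cookie.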

Igor Brusamolin Lobo Santos answered Sep 17 '22 13:09