
jsoup posting and cookie

I'm trying to use jsoup to log in to a site and then scrape information, but I'm running into a problem. I can log in successfully and create a Document from index.php, but I cannot get other pages on the site. I know I need to keep the cookie the site sets after I post and then send it when I open another page on the site. But how do I do this? The following code lets me log in and get index.php:

    Document doc = Jsoup.connect("http://www.example.com/login.php")
        .data("username", "myUsername",
              "password", "myPassword")
        .post();

I know I can use Apache HttpClient to do this, but I don't want to.

asked Jun 21 '11 at 22:06 by Gwindow




2 Answers

When you login to the site, it is probably setting an authorised session cookie that needs to be sent on subsequent requests to maintain the session.

You can get the cookie like this:

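    import org.jsoup.Connection;
    import org.jsoup.Connection.Method;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    // Use execute() instead of post() so the full response, including cookies, is available.
    Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
        .data("username", "myUsername", "password", "myPassword")
        .method(Method.POST)
        .execute();

    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID"); // you will need to check what the right cookie name is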

And then send it on the next request like:

    Document doc2 = Jsoup.connect("http://www.example.com/otherPage")
        .cookie("SESSIONID", sessionId)
        .get();
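
To see why sending the cookie back keeps the session alive, it may help to look at what travels over the wire: the login response carries `Set-Cookie` headers, and each later request repeats the name/value pairs in a single `Cookie` header. The following is only a rough sketch of that round trip using the JDK alone, not jsoup's own implementation. The class name `CookieHeaderSketch`, its two methods, and the header values are hypothetical, and the parsing deliberately ignores cookie attributes such as `Path` or `Expires`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CookieHeaderSketch {

    // Extracts name=value pairs from raw Set-Cookie header values,
    // dropping attributes such as Path, Expires or HttpOnly.
    public static Map<String, String> parseSetCookie(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieValues) {
            // Everything before the first ';' is the cookie itself.
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    // Builds the value of the Cookie request header from a cookie map.
    public static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Made-up header values of the kind a login response might carry.
        List<String> fromLogin = List.of(
                "SESSIONID=abc123; Path=/; HttpOnly",
                "lang=en; Path=/");
        Map<String, String> cookies = parseSetCookie(fromLogin);
        System.out.println(toCookieHeader(cookies)); // prints "SESSIONID=abc123; lang=en"
    }
}
```

In the jsoup code above, `res.cookie("SESSIONID")` plays the role of the parsing step and `.cookie("SESSIONID", sessionId)` the role of the header-building step, so there is normally no need to handle the headers by hand.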

answered Sep 19 '22 at 16:09 by Jonathan Hedley


    // This will get you the response.
    Connection.Response res = Jsoup
        .connect("loginPageUrl")
        .data("loginField", "[email protected]", "passField", "pass1234")
        .method(Method.POST)
        .execute();

    // This will get you the cookies.
    Map<String, String> loginCookies = res.cookies();

    // And this is the easiest way I've found to remain in session.
    Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess")
        .cookies(loginCookies)
        .get();
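
Passing the whole cookie map avoids having to know the session cookie's name. If later pages set further cookies, or refresh the session value, one option is to keep a single map and merge each response's cookies into it. The sketch below assumes that approach and uses plain JDK collections only; the class name `SessionCookies`, its methods, and the cookie names and values are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionCookies {
    private final Map<String, String> jar = new LinkedHashMap<>();

    // Merges the cookies from one response; a newer value replaces
    // an older one stored under the same name.
    public void update(Map<String, String> responseCookies) {
        jar.putAll(responseCookies);
    }

    // Read-only view of the cookies to send with the next request.
    public Map<String, String> current() {
        return Collections.unmodifiableMap(jar);
    }

    public static void main(String[] args) {
        SessionCookies session = new SessionCookies();
        session.update(Map.of("SESSIONID", "abc123")); // e.g. from the login response
        session.update(Map.of("csrf", "t0k3n"));       // e.g. from a later page
        System.out.println(session.current());
    }
}
```

With jsoup, this would roughly correspond to calling `session.update(res.cookies())` after each `execute()` and passing `session.current()` to `.cookies(...)` on the next connection.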

answered Sep 22 '22 at 16:09 by Igor Brusamolin Lobo Santos