Need help logging into website and retrieving information

I have read through many similar questions, but I am still stuck with logging in to my school gradebook (https://parents.mtsd.k12.nj.us/genesis/parents) to retrieve data. My networking class is shown below:

public class WebLogin {

    public String login(String username, String password, String url) throws IOException {
        URL address = new URL(url);

        HttpURLConnection connection = (HttpURLConnection) address.openConnection();
        connection.setDoOutput(true);
        connection.setRequestProperty("j_username", username);
        connection.setRequestProperty("j_password", password);
        connection.setRequestProperty("__submit1__","Login");

        InputStream response = connection.getInputStream();
        Document document = Jsoup.parse(response, null, "");

        //don't know what to do here!

        return null;
    }
 }

I am not sure what to do with the InputStream, or whether I am going about logging in correctly. I have read several online resources to try to understand this concept, but I am still confused. Any help is appreciated!

UPDATE: So now I understand the basic process and I know what I have to do. When using Jsoup to handle the connection, I know you can do something like this:

    Connection.Response res = Jsoup.connect("---website---")
                .data("j_username", username, "j_password", password)
                .followRedirects(true)
                .method(Method.POST)
                .execute();

However, I am still a little confused about how to actually send the data (such as the user's username and password) to the website with HttpURLConnection, and how to store the cookie I get back. Otherwise, the help has been really useful and I am fine with everything else.

asked Jul 25 '15 18:07 by mlz7



2 Answers

EXAMPLE HOW-TO:

(the reasoning behind these steps is explained in the EDIT part below)

  1. connect to the website (login page) with the user credentials [with HttpURLConnection]
  2. grab the cookies
  3. reconnect to the page with the user data
  4. parse the data [with Jsoup]
  5. present it
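The five steps above can be sketched with the JDK alone. This sketch swaps the manual cookie handling for `java.net.CookieManager`: once it is installed as the default `CookieHandler`, `HttpURLConnection` stores the `Set-Cookie` values it receives and sends them back on later requests, which takes care of step 2. The class and method names, the `login`/`password` field names and the `blabla` URLs are placeholders, not the school site's real form, and Java 9+ is assumed for `readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LoginFlow {

    /** Step 1: POST the url-encoded form; the installed CookieManager records any Set-Cookie headers. */
    static int login(String loginUrl, String formBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        int code = conn.getResponseCode(); // sends the request and reads the status line
        conn.disconnect();
        return code;
    }

    /** Step 3: GET a page; the CookieManager attaches the stored cookies automatically. */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Step 2 without manual header handling: cookies are kept in memory for this JVM.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        // Placeholder pages and field names; a real site will use its own.
        String body = "login=" + URLEncoder.encode("me", "UTF-8")
                + "&password=" + URLEncoder.encode("secret", "UTF-8");
        login("http://blabla/login.php", body);
        String html = fetch("http://blabla/userdata.php");
        // Step 4 would hand `html` to a parser such as Jsoup.parse(html).
        System.out.println(html.length());
    }
}
```

The trade-off: `CookieManager` is global for the JVM and hides the cookie from you, while the manual approach in this answer keeps the cookie string in your hands so you can store it yourself.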

(using key/value pairs with HttpURLConnection):

1-2. Assuming we have a URL address (login page "http://blabla/login.php") and we want to log in, this is what we do:

// create & open the connection
HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();

// we send a form body: use POST and enable output
connection.setRequestMethod("POST");
connection.setDoOutput(true);

/** charset used for encoding */
String CHARSET = "UTF-8";

// Construct the POST key/value pair data.
String data = "login=" + URLEncoder.encode(login, CHARSET)
        + "&password=" + URLEncoder.encode(password, CHARSET)
        + "&remember_me=on";

byte[] dataBytes = data.getBytes(CHARSET);

// tell the server how the body is encoded (must be set before getOutputStream())
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=" + CHARSET);

// create an output stream to write our credentials; try-with-resources closes it
try (OutputStream outputStream = new BufferedOutputStream(connection.getOutputStream())) {
    // write key/value data to the output stream
    outputStream.write(dataBytes);
    outputStream.flush();
}

// getResponseCode() sends the request and reads the response code;
// from here on we can also get the input stream, headers, etc.
int responseCode = connection.getResponseCode();

/**
 *     here we grab cookies (how? - in other story)
 */

connection.disconnect();
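The "how" of grabbing cookies is left open above. One minimal sketch, assuming the server sets an ordinary session cookie: read the `Set-Cookie` response headers from `connection.getHeaderFields()` before calling `disconnect()`, and keep only the `name=value` part of each one, which is the form the `Cookie` request header in step 3 expects. `CookieGrabber` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CookieGrabber {

    /**
     * Builds a value for a "Cookie" request header from a response's header map
     * (e.g. connection.getHeaderFields()). Only the leading "name=value" of each
     * Set-Cookie line is kept; attributes such as Path, Expires or HttpOnly are dropped.
     */
    static String cookieHeader(Map<String, List<String>> headers) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // the status line is stored under a null key; header names are case-insensitive
            if (e.getKey() == null || !e.getKey().equalsIgnoreCase("Set-Cookie")) {
                continue;
            }
            for (String setCookie : e.getValue()) {
                int semicolon = setCookie.indexOf(';');
                String pair = (semicolon >= 0 ? setCookie.substring(0, semicolon) : setCookie).trim();
                if (!pair.isEmpty()) {
                    pairs.add(pair);
                }
            }
        }
        return String.join("; ", pairs);
    }
}
```

For example, `String _cookie = CookieGrabber.cookieHeader(connection.getHeaderFields());` right after `getResponseCode()`. If the login page answers with a redirect, `HttpURLConnection` follows it by default and the headers you read belong to the final response; calling `connection.setInstanceFollowRedirects(false)` before sending the request keeps the login response's own `Set-Cookie` headers.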

3. THEN WE GET THE SECOND PAGE WITH THE USER DATA (http://blabla/userdata.php) (if we were not already redirected there; we can also reuse the connection, or make this request as the next step after the one above)

// create & open another connection to the new address, as at the beginning
HttpURLConnection userDataConnection = (HttpURLConnection) new URL(addressToUserData).openConnection();

// this time we do not build key/value data or create an output stream;
// we just add the obtained cookies as a request property
userDataConnection.addRequestProperty("Cookie", _cookie);

// connecting
userDataConnection.connect();

// getting the input stream
InputStream is = userDataConnection.getInputStream();

// parse the data, for example with Jsoup
// (null charset = take it from the page's meta tag, else UTF-8)
Document doc = Jsoup.parse(is, null, addressToUserData);

// show the parsed result, for example in a grid view
4. To parse the data you need to know the structure of the web page after login (the part with the user's grades). To examine a page you can use the developer tools built into the Chrome browser (right-click an element and choose Inspect).

Then you have the "page" as a Document from point 3, and you select what you need:

Elements tableWithGrades = doc.select("table.grades"); // hypothetical class name: use the selector that matches the real page

You can select a single element such as a table row (tr), cell (td), span or div by id, class, name, etc. from the HTML, whatever you like. You just need to learn Jsoup's selector syntax and have some basic knowledge of HTML :)

EDIT:

"I am not sure what to do with the InputStream or if I am going about logging in correctly, and once again, I have gone through and read several online resources to try to understand this concept but I am still confused."

To be clear:

  1. you need to set yourself a clear objective (goal): what you want to achieve
  2. then work out how to get there.

You want to (--goals--):

  • present data from the school server that is available to a logged-in user

So you need to ACT and "THINK" like a WEB BROWSER (--way--):

  1. make a request with the user credentials (login)
  2. grab the data for the logged-in user
  3. parse the data (get what you want out of it)
  4. show the results

You now know how to do 1 and 2 from my previous answers.

Now you need to think about how to do 3 and 4. There are many ways, many roads to one place :P It's your choice which one you take, but you still need to be aware of those roads :)

So which part of the gathered data do you want, and how do you want to present it?

EDIT2:

"However, I am still a little confused as to how to actually send the data (such as a user's username/password to the website with HttpURLConnection"

  • by implementing a login form (activity) that gathers the user's data
  • by providing the data from any other source (as you wish)

before you call:

// change the post string "login=xxxx&password=zzz" to a byte array
byte[] dataBytes = data.getBytes(CHARSET);

which goes together with:

//write value/key data to output stream
outputStream.write(dataBytes);
outputStream.flush();

or, in your Jsoup example, before you call:

.execute();

you need to have set login = "..."; and password = "...". Why?

Because your code runs sequentially (apart from any parallel parts), so the data string is built from whatever values the variables hold at the moment that line runs.

"and how to actually store the obtained cookie..."

  • use persistent storage such as shared preferences, a file, or a database
  • or, for the session only (the app's process/VM lifecycle), a singleton instance (for example the Application class or any other singleton) or some helper class/variable that will not be collected by the GC during your session
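A sketch of both options from the list above, assuming a plain desktop JVM rather than Android (where `SharedPreferences` would take the place of the file): a hypothetical `SessionStore` singleton keeps the cookie in memory for the session, and its `save`/`load` methods persist it to a `.properties` file with `java.util.Properties`. The file name and class are illustrative, and the cookie is written as plain text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical session holder: one instance per JVM, optionally backed by a file. */
public final class SessionStore {
    private static final SessionStore INSTANCE = new SessionStore();

    private volatile String cookie; // value for the "Cookie" request header

    private SessionStore() {}

    static SessionStore get() {
        return INSTANCE;
    }

    String getCookie() {
        return cookie;
    }

    void setCookie(String cookie) {
        this.cookie = cookie;
    }

    /** Writes the cookie to a properties file (plain text) so it survives a restart. */
    void save(Path file) throws IOException {
        Properties p = new Properties();
        if (cookie != null) {
            p.setProperty("cookie", cookie);
        }
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            p.store(w, "session cookie");
        }
    }

    /** Restores a previously saved cookie, if the file exists. */
    void load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return;
        }
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            p.load(r);
        }
        cookie = p.getProperty("cookie");
    }
}
```

Usage: `SessionStore.get().setCookie(_cookie);` after login, then `connection.addRequestProperty("Cookie", SessionStore.get().getCookie());` for later requests. Session cookies usually expire on the server side, so a restored cookie may need a fresh login anyway.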

"and does writing to the outputStream do the same thing as what my Jsoup example would do? "

In your statement, Jsoup uses the well-known builder pattern, which does the same as the code I wrote in points 1-4, but it's more like: "the details of how you do it are irrelevant to me as long as it works". You can dig a hole with an axe or a shovel; either way you get the expected result.

"when you are posting login information, why would that be an encoded param?"

Imagine your password looks like this: $$$s-_xxx.php?xxaasfs??dfsdfśś%%___////**"* and try to make a browser request with that kind of URL: http://server.com/home.php?action=login&username=xxx&password=$$$s-_xxx.php?xxaasfs??dfsdfśś :)))

URLEncoder.encode(password, CHARSET) 

/**
* This class is used to encode a string using the format required by
* {@code application/x-www-form-urlencoded} MIME content type.
*
* <p>All characters except letters ('a'..'z', 'A'..'Z') and numbers ('0'..'9')
* and characters '.', '-', '*', '_' are converted into their hexadecimal value
* prepended by '%'. For example: '#' -> %23. In addition, spaces are
* substituted by '+'.
*/
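To see that effect, here is a minimal sketch that builds a form body from key/value pairs, encoding every key and value separately so that `&`, `=` or `?` inside a password cannot be mistaken for separators. `FormBody` and the sample values are made up for illustration; the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode` need Java 10+ (older code passes `"UTF-8"` as in the answer).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {

    /** Joins key/value pairs into an application/x-www-form-urlencoded body, encoding each part. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            body.add(URLEncoder.encode(f.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(f.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("login", "student");
        fields.put("password", "p@ss word&x=1?");
        System.out.println(encode(fields)); // login=student&password=p%40ss+word%26x%3D1%3F

        // the server decodes each value back to the original text
        System.out.println(URLDecoder.decode("p%40ss+word%26x%3D1%3F", StandardCharsets.UTF_8)); // p@ss word&x=1?
    }
}
```

Non-ASCII characters such as `ś` become their UTF-8 bytes in `%XX` form (`%C5%9B`), which is why the charset passed to the encoder has to match what the server expects.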

SOME OTHER HINTS:

  • when you open a stream, close it when you're done
  • the 'best way' (not always) to close a stream is in the finally block of a try/catch (or with try-with-resources on Java 7+)
  • before you proceed with a stream, check that it is not null; it will save you time
  • if you use Jsoup, check elements.size() > 0 or element != null before you perform any string/element operations
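The first three hints fit in one small sketch: a hypothetical `readFully` helper that treats a null stream as an empty body and closes the stream with try-with-resources, plus `readBody`, which reads `getErrorStream()` for 4xx/5xx responses because `getInputStream()` throws an `IOException` in that case. Both method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

public class StreamHints {

    /** Reads the whole stream as UTF-8 and always closes it; a null stream yields "". */
    static String readFully(InputStream in) throws IOException {
        if (in == null) {
            return ""; // e.g. getErrorStream() when the response has no error body
        }
        try (InputStream stream = in) { // closed even if read() throws
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.UTF_8.name());
        }
    }

    /** Picks the right stream: getInputStream() throws for 4xx/5xx, so use getErrorStream() there. */
    static String readBody(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        return readFully(code >= 400 ? conn.getErrorStream() : conn.getInputStream());
    }
}
```

With this, a failed login (for example a 401 or 403) still gives you the server's error page to inspect instead of an exception.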

PS: for your case you should also look at:

Using Form-Based Login in JavaServer Faces Web Applications

I hope this gives you an overview of one of the possible paths, and sorry for the typos; I don't really know English ;p

answered Sep 20 '22 16:09 by ceph3us


One way to do this is to use Apache HttpClient (its fluent API). Maybe something like the code below could work.

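import org.apache.http.client.fluent.Form;
import org.apache.http.client.fluent.Request;

String page = Request.Post("https://parents.mtsd.k12.nj.us/genesis/j_security_check")
    .bodyForm(
        Form.form()
            .add("j_username", username)
            .add("j_password", password)
            .build()
    )
    .execute().returnContent().asString();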

You'll have to figure out what information is necessary for logging in (URL, request method, names and values of the parameters, etc.).

Here is an example I found for logging into Gmail.

answered Sep 22 '22 16:09 by XiaoChuan Yu