How to use webclient in a secure site?

I need to automate a process involving a website that is using a login form. I need to capture some data in the pages following the login page.

I know how to screen-scrape normal pages, but not those behind a secure site.

  1. Can this be done with the .NET WebClient class?
    • How would I automatically login?
    • How would I keep logged in for the other pages?
asked Sep 07 '08 07:09 by Oded


1 Answer

One way would be to automate a browser, but since you mentioned WebClient, I'll assume you mean the WebClient class in .NET.

Two main points:

  • HTTPS needs no special handling in WebClient; it just works
  • Cookies are typically used to carry authentication, so you'll need to capture them from responses and replay them on later requests
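
To illustrate the capture-and-replay idea, here is a minimal sketch in Python's standard library rather than .NET (in .NET, the corresponding piece would be a CookieContainer attached to each HttpWebRequest). The helper names `make_session` and `fetch` and the example.com URLs are assumptions made up for this sketch.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return (opener, jar). The opener stores cookies from every response
    in the jar and sends matching cookies back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, data=None):
    """GET the URL, or POST to it when data (bytes) is given."""
    with opener.open(url, data) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (hypothetical URLs): reuse the same opener for every request so
# the cookie set at login is replayed on the pages that follow.
# opener, jar = make_session()
# fetch(opener, "https://example.com/login")
# fetch(opener, "https://example.com/members/report")
```

The important design point is that one client object lives for the whole session; creating a fresh client per request would drop the authentication cookie.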

Here are the steps I'd follow:

  1. GET the login form and capture the cookie in the response.
  2. Using XPath and HtmlAgilityPack, find the "input type=hidden" field names and values.
  3. POST to the login form's action with the user name, password, and hidden field values in the request body. Include the cookie in the request headers. Again, capture the cookie in the response.
  4. GET the pages you want, again with the cookie in the request headers.
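
Steps 2 and 3 above can be sketched as follows. This is a Python standard-library illustration, not the HtmlAgilityPack/XPath approach itself: it collects hidden inputs with `html.parser` instead of XPath. The names `hidden_fields` and `build_login_body`, and the credential field names `user` and `pass`, are assumptions for the example; read the real field names from the form you are targeting.

```python
import urllib.parse
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(form_html):
    """Step 2: return the hidden fields found in the login page HTML."""
    collector = HiddenInputCollector()
    collector.feed(form_html)
    return collector.fields


def build_login_body(form_html, credentials):
    """Step 3: merge hidden fields with the credentials and URL-encode
    the result as a form POST body (bytes)."""
    fields = hidden_fields(form_html)
    fields.update(credentials)
    return urllib.parse.urlencode(fields).encode("ascii")


# Usage (hypothetical field names):
# body = build_login_body(login_page_html, {"user": "alice", "pass": "s3cret"})
```

The resulting bytes would then be POSTed to the form's action URL through a cookie-carrying HTTP client, so the cookie from step 1 goes out with the request.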

Step 2 describes a somewhat complicated way to automate the login. Usually, you can POST the username and password directly to the known login form action without fetching the initial form or relaying the hidden fields. Some sites, however, validate the form itself (as opposed to individual fields), for example by checking a hidden token, and the shortcut will not work against them.

HtmlAgilityPack is a .NET library that parses ill-formed HTML into a document object (an HtmlDocument, much like an XmlDocument) so you can run XPath queries over it. Quite useful.

Finally, you may run into a situation where the form relies on client script to alter the form values before submitting. You may need to simulate this behavior.
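
As a purely hypothetical illustration of simulating such a script, suppose the page's JavaScript replaces the plain `password` field with its MD5 hex digest in a `password_hash` field just before submitting. Both field names and the hashing scheme are invented for this Python sketch; the real transformation has to be read from the site's own script.

```python
import hashlib


def simulate_submit_script(fields):
    """Mimic a hypothetical page script: drop the plain password and send
    its MD5 hex digest in a 'password_hash' field instead."""
    out = dict(fields)  # leave the caller's dictionary untouched
    password = out.pop("password")
    out["password_hash"] = hashlib.md5(password.encode("utf-8")).hexdigest()
    return out
```

You would apply a function like this to the form values just before encoding the POST body, so the server sees the same values a real browser would have sent.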

A tool for viewing the HTTP traffic is extremely helpful for this type of work. I recommend ieHttpHeaders, Fiddler, or Firebug (Net tab).

answered Oct 03 '22 09:10 by Hafthor