How to check programmatically if the URL of a page is redirecting?

Tags:

java

groovy

I am trying to extract the content of webpage A. Using Groovy, I've tried the following:

// ...
String urlStr = "url-of-webpage-A"
String pageText = urlStr.toURL().text
//println pageText
// ...

The above code retrieves the text of webpage A as long as it doesn't redirect to another webpage B. If A redirects to B, the content of webpage B ends up in the pageText variable instead. Is there a way to check programmatically, in Groovy or Java, whether webpage A redirects to another page?

PS: The above code is not part of any server-side logic. I am running it on the client side, within a desktop application.

Vamsi Emani asked Dec 03 '22 01:12


2 Answers

In Java, you can use URL.openConnection() to get an HttpURLConnection (you'll need to cast the result). On it, call setInstanceFollowRedirects(false) so that the connection does not follow redirects automatically.

Then call getResponseCode() and check whether it is HTTP_MOVED_PERM (301), HTTP_MOVED_TEMP (302) or HTTP_SEE_OTHER (303). All of these indicate a redirect.
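A minimal Java sketch of that status check, for illustration only. The class and method names (`RedirectStatus`, `isRedirect`) are hypothetical. It also accepts 307 (Temporary Redirect) and 308 (Permanent Redirect), which go beyond the list above; they are spelled out because `HttpURLConnection` has no named constants for them.

```java
import java.net.HttpURLConnection;

class RedirectStatus {

    // HttpURLConnection has no named constants for these two, so spell them out.
    static final int HTTP_TEMPORARY_REDIRECT = 307;
    static final int HTTP_PERMANENT_REDIRECT = 308;

    /** Returns true if the status code tells the client to fetch another URL. */
    static boolean isRedirect(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_MOVED_PERM:  // 301
            case HttpURLConnection.HTTP_MOVED_TEMP:  // 302
            case HttpURLConnection.HTTP_SEE_OTHER:   // 303
            case HTTP_TEMPORARY_REDIRECT:            // 307
            case HTTP_PERMANENT_REDIRECT:            // 308
                return true;
            default:
                return false;                        // e.g. 200, 304, 404
        }
    }

    public static void main(String[] args) {
        // Print the classification of a few sample status codes
        for (int code : new int[] {200, 301, 302, 303, 304, 307, 404}) {
            System.out.println(code + " redirect? " + isRedirect(code));
        }
    }
}
```

Listing the codes explicitly, rather than testing for the whole 3xx range, avoids treating 304 (Not Modified) as a redirect.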

If you need to know where you're being redirected, use getHeaderField("Location") to read the Location header.
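Putting these steps together, a minimal Java sketch, assuming the page is served over HTTP or HTTPS. `RedirectProbe`, `redirectTarget` and the `http://example.com/` default are hypothetical. It treats any 3xx response that carries a Location header as a redirect and resolves a relative Location against the requested URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class RedirectProbe {

    /**
     * Returns the absolute redirect target of urlStr, or null if the server
     * did not answer with a 3xx status plus a Location header.
     */
    static String redirectTarget(String urlStr) throws IOException {
        URI base = URI.create(urlStr);
        URLConnection raw = base.toURL().openConnection();
        if (!(raw instanceof HttpURLConnection)) {
            return null; // not HTTP(S), so there is no HTTP redirect to detect
        }
        HttpURLConnection con = (HttpURLConnection) raw;
        con.setInstanceFollowRedirects(false); // we want to see the 3xx response itself
        try {
            int status = con.getResponseCode();
            String location = con.getHeaderField("Location");
            if (status < 300 || status > 399 || location == null) {
                return null; // answered directly, no redirect
            }
            // The Location value may be relative, so resolve it against the request URL
            return base.resolve(location).toString();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String target = redirectTarget(args.length > 0 ? args[0] : "http://example.com/");
        System.out.println(target == null ? "no redirect" : "redirects to " + target);
    }
}
```

Run with a URL as the first argument: it prints the target for a redirecting URL and "no redirect" otherwise.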

Joachim Sauer answered Mar 24 '23 17:03


In Groovy, you could do what Joachim suggests like this:

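String location = "url-of-webpage-A"
boolean wasRedirected = false
String pageContent = null
int hops = 0

// Follow at most 10 redirects, so that a redirect loop cannot hang us
while( location && hops++ < 10 ) {
  URL current = new URL( location )
  current.openConnection().with { con ->
    // We'll do redirects ourselves
    con.instanceFollowRedirects = false

    // Get the response code, and the location to jump to (in case of a redirect)
    int code = con.responseCode
    location = ( code in 300..399 ) ? con.getHeaderField( "Location" ) : null
    if( location ) {
      wasRedirected = true
      // The Location header may be relative, so resolve it against the current URL
      location = new URL( current, location ).toString()
    }

    // Read the HTML and close the inputstream
    pageContent = con.inputStream.withReader { it.text }
  }
}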

println "wasRedirected:$wasRedirected contentLength:${pageContent.length()}"
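For comparison, the same manual loop sketched in Java, under a few assumptions: `RedirectFollower` and `finalUrl` are hypothetical names, `maxHops` is an arbitrary safety limit against redirect loops, and only the final URL is returned rather than the page text.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class RedirectFollower {

    /** Follows up to maxHops redirects by hand and returns the last URL reached. */
    static String finalUrl(String start, int maxHops) throws IOException {
        URI current = URI.create(start);
        for (int hop = 0; hop <= maxHops; hop++) {
            URLConnection raw = current.toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return current.toString(); // non-HTTP URLs have no HTTP redirects
            }
            HttpURLConnection con = (HttpURLConnection) raw;
            con.setInstanceFollowRedirects(false); // handle each hop ourselves
            try {
                int status = con.getResponseCode();
                String location = con.getHeaderField("Location");
                if (status < 300 || status > 399 || location == null) {
                    return current.toString(); // not a redirect: we have arrived
                }
                current = current.resolve(location); // Location may be relative
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("More than " + maxHops + " redirects from " + start);
    }

    public static void main(String[] args) throws IOException {
        String start = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println("ends at " + finalUrl(start, 10));
    }
}
```

One reason to follow redirects by hand is that HttpURLConnection's automatic following does not cross protocols (for example from http to https), whereas a manual loop like this one does.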

If you don't want to follow the redirect and just want the contents of the first page, you only need:

String location = "url-of-webpage-A"
String pageContent = new URL( location ).openConnection().with { con ->
  // Don't follow redirects; we only want the first response
  con.instanceFollowRedirects = false

  // Get the location to jump to (in case of a redirect)
  location = con.getHeaderField( "Location" )

  // Read the HTML and close the inputstream
  con.inputStream.withReader { it.text }
}

if( location ) { 
  println "Page wanted to redirect to $location"
}
println "Content was:"
println pageContent    

tim_yates answered Mar 24 '23 19:03